These days I have started learning Hadoop at last. I had been planning to learn this promising technology for a while and finally found the time to do it. Currently, I'm learning the Hadoop architecture and what functionality it can provide that an RDBMS can't.
As part of learning the product, I downloaded two VMs (VMware) with preinstalled single-node Hadoop installations. One VM is available from Apache itself. It runs Ubuntu, which I'm not so accustomed to; it is very minimalistic and doesn't provide any GUI. The other free VM is from Cloudera and can be downloaded here. It is based on CentOS 5.8 and comes with more features and additional products installed than the VM from Apache.
I just want to point out a caveat that I hit with the Cloudera VM. By default, the VM is allocated only 1 GB of RAM. To my surprise, any attempt to run even the simplest WordCount example resulted in "out of memory" errors from Java. With the default RAM allocation, the VM can also become virtually inaccessible very easily, because there are a lot of Hadoop-related services running on it. Fortunately, the workaround is quite simple: allocate at least 2 GB of RAM to the VM, and everything will work much better. Most probably there are additional solutions to the "out of memory" problem that involve tuning the "-Xmx" parameter, but simply giving the VM a bit more RAM is much easier. I hope this tip will save some time for another Hadoop beginner like me.
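For anyone who does want to go the "-Xmx" route instead of enlarging the VM: assuming the VM runs the classic MRv1 daemons (JobTracker/TaskTracker), the heap given to each map and reduce task is controlled by the mapred.child.java.opts property in mapred-site.xml. Here is a minimal sketch of what lowering the per-task heap could look like; the 256m value is just an illustration, not an official Cloudera recommendation:

    <!-- mapred-site.xml: JVM options passed to every map/reduce child task -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx256m</value>  <!-- smaller per-task heap so tasks fit into the VM's limited RAM -->
    </property>

Jobs submitted from the VM after this change should pick up the new value. Still, simply giving the VM 2 GB is the quicker fix.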