[Note] Hadoop, IMHO, is history. Rather than spend time on all this, I suggest you check out my blog post on Spark with Python.

In an earlier post, Data Science - A DIY approach, I explained how one can start a career in data science, or data analytics, using free resources available on the web. Since Hadoop and MapReduce are a tool and a technique that are very popular in data science, this post will get you started and help you:

- Install Hadoop 2.2 in single-machine cluster mode on a machine running Ubuntu
- Compile and run the standard WordCount example in Java
- Compile and run another, non-WordCount, program in Java
- Use the Hadoop streaming utility to run a WordCount program written in Python, as an example of a non-Java application
- Compile and run a Java program that actually solves a small but representative Predictive Analytics problem

All the information presented here has been gathered from various sites on the web and has been tested on dual-boot ...