Nutch Hadoop Tutorial: This tutorial shows you how to set up Apache Nutch on a running Hadoop cluster without diving too deeply into architectural detail, which makes it a perfect tutorial for me.
A few assumptions before following this tutorial:
1. root access 2. ssh 3. a cluster 4. the mailing list for Q&A 5. a Java programming background
Hadoop Cluster Setup:
Download Hadoop and Nutch:
Set Up the Deployment Architecture
Deploy Nutch to a Single Machine
Deploy Nutch to Multiple Machines
Performing a Crawl
Testing the Crawl
Performing a Search