Nutch Hadoop

Nutch Hadoop Tutorial – This is a tutorial that shows you how to set up Apache Nutch on a running hadoop cluster and won’t dive into the architect detail too much, which is a perfect tutorial for me. 

A few assumptions before following this tutorial:

1. root 2. ssh 3. cluster 4. maillist for Q&A 5. Java programming background

Hadoop Cluster Setup:

Download Hadoop and Nutch:

Setup the Deployment Architecture

Deploy Nutch to a Single Machine

Deploy Nutch to multiple Machines

Performing a Crawl

Testing the Crawl 

Performing a Search

 

Leave a comment