Nutch Hadoop

Nutch Hadoop Tutorial – This is a tutorial that shows you how to set up Apache Nutch on a running hadoop cluster and won’t dive into the architect detail too much, which is a perfect tutorial for me. 

A few assumptions before following this tutorial:

1. root 2. ssh 3. cluster 4. maillist for Q&A 5. Java programming background

Hadoop Cluster Setup:

Download Hadoop and Nutch:

Setup the Deployment Architecture

Deploy Nutch to a Single Machine

Deploy Nutch to multiple Machines

Performing a Crawl

Testing the Crawl 

Performing a Search

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s