Cloudera – Cloudera Search Flume + HBase + SolrCloud


I've been meaning to look at Cloudera Search. Solr and Elasticsearch are both awesome tools, and both can scale to fit the big data world as SolrCloud and an Elasticsearch cluster. I set up an Elasticsearch cluster on top of Hadoop and it was really easy to use. However, there are two downsides:

(1) It is a bit hard to administer. Since we are using CDH, it would be easier to monitor all the animals in one place: Cloudera Manager.

(2) When you need to index data, you have to convert it to JSON, copy it from HDFS to local disk, and write some Python code to index it. At least that was how I did it.
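To give a feel for the manual route in (2), here is a minimal sketch, not my exact script. It assumes the data has already been copied out of HDFS (for example with `hdfs dfs -get`) into a local file with one JSON document per line, and that Elasticsearch is listening on `http://localhost:9200`. The function names, the file layout, and the URL are all made up for illustration. It uses the `_bulk` endpoint, which takes an action line followed by the document itself; older Elasticsearch versions also expect a `_type` in the action line.

```python
import json
import urllib.request


def build_bulk_body(records, index_name):
    """Turn an iterable of dicts into a newline-delimited _bulk request body."""
    lines = []
    for record in records:
        # Action line first, then the document source on the next line.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(record))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


def read_json_lines(path):
    """Yield one dict per non-empty line of a local JSON-lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def bulk_index(path, index_name, es_url="http://localhost:9200"):
    """POST the file's documents to the _bulk endpoint (es_url is an assumption)."""
    body = build_bulk_body(read_json_lines(path), index_name).encode("utf-8")
    request = urllib.request.Request(
        es_url + "/_bulk",
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling something like `bulk_index("events.json", "events")` would push the whole file in one request, so for anything big you would want to send it in chunks. The point is that every step (export, copy, convert, post) is yours to write and babysit.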

I have known for a long time that Cloudera ships with Solr built in, but I never really looked into it. Today I did some research, and it seems they have a decent architecture that works right out of the box. You can have data flow through Flume into your HDFS environment, store it in HBase, and index it with Solr from within Hue. If everything works as they claim, the whole stack might be really stable and awesome! Your team might not need to spend another 200K on an expensive big data programmer, since those few tools will just do what you want out of the box. 🙂

As for the front end, Kibana is another reason why I like Elasticsearch. Solr has Silk, which is an equivalent of Kibana, and the newer versions of Hue have some pretty kickass dashboards with bar charts, pie charts, and your favorite map.

hue-3.6-search-v2(copyright reserved to
