I found a project on GitHub contributed by Lewis Zhang and Mohammed Omer (momer). momer has not only written a Nutch plugin that makes HTTP requests using Selenium and Firefox, but also finished another plugin on top of Selenium Grid, which should improve performance when running in parallel and lets the grid handle any hanging processes. He also offers two Docker images to help you get started. I have not really used Docker, so I thought this would be a great chance to learn it. This post is about my experience building his project using Docker.
You can clone the GitHub repositories locally and run docker build. However, there is an easier way: you can run the docker build command directly against the GitHub project URL. In that case, Docker treats the repository as a whole. It first pulls the content into a temporary directory on your machine, then sends it to the Docker daemon to use as the `context` for building the image.
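As a rough sketch of what building straight from a repository URL looks like: the repository URL and image name below are hypothetical placeholders, since I am not reproducing the real ones here. The script only prints the command, so nothing is fetched from the placeholder URL.

```shell
#!/bin/sh
# Hypothetical placeholders: substitute the real repository URL and an image name you like.
REPO_URL="https://github.com/<user>/<repo>.git"
IMAGE_NAME="demo/selenium-hub"

# docker build accepts a Git URL in place of a local path.
# -t gives the resulting image a repository name so it is easy to find later.
build_cmd="docker build -t $IMAGE_NAME $REPO_URL"

# Print the command rather than run it, because the URL above is not real.
echo "$build_cmd"
```

Once you fill in a real URL, you can paste the printed command into a terminal. If the Dockerfile lives in a subdirectory, Docker also accepts a fragment of the form `#branch:subdir` at the end of the Git URL.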
Two things are worth mentioning. First, you can pass a tarball to the build command on stdin, and Docker will decompress it and use it as the context. Second, many intermediate containers are created along the way to the final image. Those are deleted by default, but you can keep them by setting --rm=false. When I built the hub image, I noticed that its repository name was missing, and the same thing happened when I rebuilt it. I ended up using the 12-character image ID to start the container, and at least that works.
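Here is a minimal sketch of the tarball route together with the flags mentioned above. The Dockerfile content and the image name are made up for illustration. My guess is that the missing repository name came from building without `-t`, though I have not confirmed that. The docker commands are printed rather than executed so the script runs even without a Docker daemon.

```shell
#!/bin/sh
set -e

# Hypothetical name and Dockerfile, used only for illustration.
IMAGE_NAME="demo/selenium-hub"
workdir=$(mktemp -d)

# Write a stand-in Dockerfile and pack it into a gzipped tarball to act as the context.
printf 'FROM busybox\nCMD ["echo", "hub placeholder"]\n' > "$workdir/Dockerfile"
tar -czf "$workdir/context.tar.gz" -C "$workdir" Dockerfile
tarball_listing=$(tar -tzf "$workdir/context.tar.gz")
echo "tarball contains: $tarball_listing"

# "-" tells docker build to read the context from stdin.
# --rm=false keeps the intermediate containers; -t names the resulting image.
echo "docker build --rm=false -t $IMAGE_NAME - < $workdir/context.tar.gz"

# For an image that was already built without a name, tag it by its ID afterwards.
echo "docker tag <image-id> $IMAGE_NAME"
```

An untagged image shows up as `<none>` in the output of `docker images`, and that listing is also where the image ID comes from.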
Now the challenging part is how to start the node. momer mentioned that you need to use a tool called MaestroNG to make it work.
TO BE CONTINUED