Scrapyd – Manage Your Spiders Through a Web UI

“Scrapyd is an application for deploying and running Scrapy spiders. It enables you to deploy (upload) your projects and control their spiders using a JSON API.”

You first need to package and deploy your project as an egg by running `scrapy deploy` inside the project folder.

Then you can schedule a spider run through the JSON API: `curl http://localhost:6800/schedule.json -d project=datafireball -d spider=datafireball`
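Beyond scheduling, Scrapyd's JSON API also lets you list and cancel jobs. Here is a hedged sketch of those calls (endpoint names are from the Scrapyd docs; the project/spider name `datafireball` is the one used above, so adjust for your setup):

```shell
# Assumes Scrapyd is listening on its default port, 6800.
SCRAPYD=http://localhost:6800

# Schedule a run (the server replies with JSON containing a jobid):
#   curl $SCRAPYD/schedule.json -d project=datafireball -d spider=datafireball
# Poll job state (pending / running / finished):
#   curl "$SCRAPYD/listjobs.json?project=datafireball"
# Cancel a job by its jobid:
#   curl $SCRAPYD/cancel.json -d project=datafireball -d job=<jobid>

# The server may not be running here, so just print the schedule call:
echo "curl $SCRAPYD/schedule.json -d project=datafireball -d spider=datafireball"
```

The `listjobs.json` endpoint is handy for a quick health check before digging into the web UI.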

Docs and GitHub:

[Screenshot: Scrapyd homepage]

[Screenshots: Scrapyd jobs, items, and item detail pages]

Docker – Remove Existing Docker Images

I wanted to remove all the existing Docker images from my VirtualBox VM, and I ran into errors like this for a few of them.

[Screenshot: `docker rmi` failure messages]

However, when I ran `sudo docker ps` I could not see any running containers, which confused me a lot until I came across Docker issue #3258 on GitHub. In the end, I realized that there is a difference between running and stopped containers: `docker ps` only lists the running ones. You need to remove both kinds of containers before removing all the images.

Here is the solution in the end:

sudo docker ps -a | grep Exit | awk '{print $1}' | sudo xargs docker rm
sudo docker rmi $(sudo docker images -q)

More information about what the commands do:

`sudo docker ps -a` lists information about all containers, including running ones, exited ones, etc.

[Screenshot: `sudo docker ps -a` output]

The pipeline then extracts the container IDs (first column) of the entries whose status contains `Exit` and passes them to `docker rm`, which removes stopped containers.
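To see what the `grep`/`awk` stage actually feeds to `docker rm`, here is the same filter run over a canned `docker ps -a` listing (the container IDs and images below are made up for illustration):

```shell
# A fake `docker ps -a` listing: one exited container, one running one.
SAMPLE='CONTAINER ID   IMAGE          COMMAND   STATUS
3f2a1b9c0d11   ubuntu:14.04   "bash"    Exited (0) 2 days ago
9e8d7c6b5a44   redis:2.8      "redis"   Up 3 hours'

# Same filter as the cleanup one-liner: keep lines containing "Exit",
# then print the first column (the container ID).
echo "$SAMPLE" | grep Exit | awk '{print $1}'
# prints: 3f2a1b9c0d11
```

Only the exited container's ID survives the filter; the running `redis` container is left alone.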

After that, you can easily remove all the images, because no containers will be left referencing them. Here is also a helpful post from Stack Overflow.