Today I learned a few Bash tips from a colleague.
We have a whole series of shell commands that need to run sequentially.
set -e makes the shell script stop as soon as one command fails.
Without set -e, the script keeps running to the end even if an intermediate step fails.
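A quick sketch of the difference, with `false` standing in for any failing step in the pipeline:

```shell
# Without set -e: the failing step is skipped over and the script finishes.
bash -c 'echo step1; false; echo step2'
# prints step1 and step2

# With set -e: the script aborts at the first non-zero exit status.
bash -c 'set -e; echo step1; false; echo step2'
# prints only step1
```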
GF: hey baby, can you help me get the pricing for my wishlist?
BF: absolutely, I can help you scrape Lancôme's whole website and give you all the pricing! (fingers crossed)
I took a quick look at Lancôme's website and realized that it is a very typical one: every tag has a well-formatted class name or id. Even though monitoring their network activity shows no well-formatted JSON requests, it is still doable: just collect the populated HTML and use BeautifulSoup to narrow down to the desired elements and content.
I used urllib2 to make a simple HTTP request, and what I want is clearly captured in the response; after that it is just a matter of parsing the HTML. Thanks to BeautifulSoup, life is so easy now.
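The request-then-parse step above can be sketched roughly as follows. The URL and the CSS class names here are hypothetical, substitute whatever the real page uses; and this is written for Python 3, where `urllib.request` replaces the old urllib2 module:

```python
from urllib.request import urlopen  # urllib2's successor on Python 3
from bs4 import BeautifulSoup

def extract_prices(html):
    """Narrow the parsed tree down to product names and prices."""
    soup = BeautifulSoup(html, "html.parser")
    prices = {}
    for item in soup.select(".product"):  # hypothetical class names
        name = item.select_one(".product-name").get_text(strip=True)
        price = item.select_one(".product-price").get_text(strip=True)
        prices[name] = price
    return prices

# html = urlopen("https://www.lancome-usa.com/").read()
# print(extract_prices(html))
```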
Then the idea of using requests quickly came to me. However, the requests library did not seem to work right out of the box.
Here is a screenshot comparing urllib2 with requests.
Eventually I realized that `r.text` is not enough: r.text returns the HTML already decoded to unicode, which screws up the soup when passed to the BeautifulSoup constructor.
In the end, I had to upgrade my Python version and use r.content instead of r.text to make it work.
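The root of the r.text vs. r.content difference: r.content is the raw bytes off the wire, which lets the HTML parser honor the charset declared in the page itself, while r.text is requests' *guess* at a decoded string. When that guess is wrong, non-ASCII characters get garbled. A minimal stdlib-only illustration of what a bad encoding guess does:

```python
# Bytes as they might arrive over the network (UTF-8 encoded).
raw = "Crème de la Crème".encode("utf-8")

wrong = raw.decode("latin-1")  # a bad encoding guess garbles the accents
right = raw.decode("utf-8")    # decoding with the page's actual charset

print(wrong)  # CrÃ¨me de la CrÃ¨me
print(right)  # Crème de la Crème
```

This is why passing the bytes (r.content) through to BeautifulSoup is the safer default: the parser can pick the encoding from the document's own `<meta charset>` declaration instead of inheriting requests' guess.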
CartoDB is an open source library that specializes in creating maps easily.
Apache CloudStack is an open source cloud computing platform.
Here is a great tutorial that walks you through some of the basic features of CloudStack via the frontend management console.