If you are interested in playing with time series data like stock price, it is usually a good idea to start with Finance, probably the most frequently used exchange stock price. There are many exchanges out there and NASDAQ is a good one. I shopped around on the Internet but it is a bit hard to find some good dataset with fine grained data without paying. However, there are plenty of free APIs out there but they are all based on tickers so in this case, we can put together a solution where we can first get a list of public company names and then loop through each company but making an API call each specifying the time range.
1. Get Company List
By visiting Nasdaq website, you can easily find a download file which contains all the tickers that listed there (not only Nasdaq but also Amex and NYSE).
And this is how the data file looks like.
This is the first time I ever see 3435 public companies listed in such a clean format, let’s do some quick analysis. Since the industry is a subcategory of Sector and have if not hundreds, at least tens of different categories that might be difficult to display. For now, let’s aggregate by sector and see what are the total market cap, number of companies, and maybe how “young” each sector is by calculating the median IPO year.
As you can see, technology sector has the most market cap (5.9trillion usd) which is almost half (46%) of the whole market. And the whole Nasdaq total market share is about ~ 13 trillion USD. At the same time, it is interesting to find that it is actually the finance industry who has the highest average IPO year and not surprisingly, consumer durables have the lowest/oldest average IPO year. From the company count perspective, Health care has the most number of public companies.
Anyway, now we have a trustworthy list of tickers, the next step is to hit the API and get the time series stock price for those companies via alpha vantage.
2. Get Time Series data
I put together this little program so that I can make calls and then store the raw response to my local disk for later processing. Sqlite is a good option and you can use dbbrowser to view the table content easily.
One small tip is that the insert statement above is an easy way to escape all the characters by using the question mark placeholder. Quite neat so that you don’t have to play with double quotes and single quotes, which is a big pain in the ass.
Unfortunately, my job couldn’t finish even for my tests against just the first 10 companies, and I should have guess it way ahead of time, it is a “free-mium” service, the API has an extremely small limit which if you are trying to make more than 5 calls in a minute, you need to upgrade to the premium services, which I am not fully ready to do that yet. I guess this is the end of post. A perfect example how difficult and time consuming it could be to hunt down the good data sources.
Frankly speaking, it is indeed not that expensive but I guess as a hobbyist, you probably want to shop around and see if there is a better choice for your weekend project.