Pandas – iPython Notebook Study Notes

My colleagues and I have been using R for data crunching, analysis and visualization for a while. My friend Alex told me that the more he used Pandas together with the iPython notebook, the more he preferred it to writing R code inside RStudio. From my own experience, there have been a few times when it was really convenient to share work with others through an iPython notebook, and I also find the random forest library in Python (scikit-learn) better. I have decided to spend more time learning how to use pandas and scikit-learn. Thanks Alex!

The easiest way to get started is to download the Anaconda distribution of Python. That way you don't have to go through the painful process of installing the pandas and scikit-learn libraries yourself, and trust me, that pain adds very little value for you, my friend :).
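Once Anaconda is installed, a quick sanity check is to confirm the libraries can be imported. The sketch below is only an illustration: the `installed` helper and the `PACKAGES` list are hypothetical names, and it relies on the standard-library `importlib.util.find_spec`, which returns `None` when a top-level package cannot be found. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util

# Hypothetical list of import names to check (scikit-learn imports as "sklearn").
PACKAGES = ["pandas", "sklearn", "IPython"]

def installed(names):
    """Return {name: bool} telling whether each top-level package is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

for name, ok in installed(PACKAGES).items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```

If anything prints as missing, the Anaconda install (or the Python interpreter the notebook is using) is the first thing to check.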

Then you are good to go. Here are a few screenshots of how I write Python code to read a CSV file into a pandas DataFrame.
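In case the screenshots are hard to follow, here is a minimal text version of the same idea. It is a sketch under one assumption: the CSV contents (`csv_text`, with made-up columns `id`, `name`, `score`) are hypothetical and are wrapped in `io.StringIO` so the example runs without a file on disk. With a real file you would pass its path to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for a file on disk.
csv_text = """id,name,score
1,alice,90
2,bob,85
3,carol,77
"""

# read_csv accepts either a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (3, 3)
print(df.dtypes)  # column types pandas inferred
print(df.head())  # first rows of the DataFrame
```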

You can read more about the read_csv function here. The pandas documentation goes into more detail and also provides more examples to get you started. The index_col=0 argument looks like one I am going to use now and then, so I highlighted it here.
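To see what index_col=0 changes, the sketch below reads the same hypothetical CSV text twice, once with the default settings and once with index_col=0. It also shows one common reason to reach for it: a DataFrame saved with `to_csv` writes its index as the first column, and reading that file back without index_col produces an extra `Unnamed: 0` column. All data and variable names here are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV contents for illustration.
csv_text = """id,name,score
1,alice,90
2,bob,85
3,carol,77
"""

# Default: pandas adds a 0..n-1 RangeIndex and keeps "id" as an ordinary column.
df_default = pd.read_csv(io.StringIO(csv_text))

# index_col=0: the first column ("id") becomes the row index instead.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(df_default.columns))   # ['id', 'name', 'score']
print(list(df_indexed.columns))   # ['name', 'score']
print(df_indexed.loc[2, "name"])  # bob  (looked up by the "id" value, not by position)

# Round trip: to_csv writes the (unnamed) index as the first column by default.
scores = pd.DataFrame({"score": [90, 85]}, index=["alice", "bob"])
buf = io.StringIO()
scores.to_csv(buf)

buf.seek(0)
roundtrip_plain = pd.read_csv(buf)                  # index comes back as "Unnamed: 0"
buf.seek(0)
roundtrip_indexed = pd.read_csv(buf, index_col=0)   # index restored

print(list(roundtrip_plain.columns))    # ['Unnamed: 0', 'score']
print(list(roundtrip_indexed.columns))  # ['score']
```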