Pandas – iPython Notebook Study Notes

My colleagues and I have been using R for data crunching, analysis and visualization for a while. My friend Alex has told me that the more he uses pandas together with the iPython notebook, the more he prefers it to writing R code inside RStudio. From my own perspective, there were a few times when it was really convenient to share work with others through an iPython notebook, and I also find the random forest library in Python to be better. I have decided to spend more time learning how to use pandas and scikit-learn in the future. Thanks Alex!

The easiest way to get started is to download the Anaconda distribution of Python. That way, you don't have to go through the painful process of installing pandas, scikit-learn and the other Python libraries yourself. Trust me, that pain does not add much value for you, my friend :).

Then you are good to go. Here are a few screenshots of how I wrote Python code to read a csv file into a pandas dataframe.

[Screenshot: excel_csv]

[Screenshot: pandas_read_csv]

You can read more about the function read_csv here. The pandas documentation goes into more detail and also provides more examples to get you started. It seems like index_col=0 is an argument I am going to use now and then, so I highlighted it here.
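To make the index_col=0 point concrete, here is a minimal sketch of read_csv with and without that argument. The column names and values are made up for illustration, and the CSV text is held in a string (via io.StringIO) instead of a real file so the snippet runs on its own. With an actual file you would pass its path to read_csv instead.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """name,city,score
alice,Boston,90
bob,Chicago,85
carol,Denver,78
"""

# Default behaviour: pandas adds its own integer index (0, 1, 2, ...)
# and keeps "name" as an ordinary column.
df_default = pd.read_csv(io.StringIO(csv_text))

# index_col=0: the first column ("name") becomes the row labels.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df_default)
print(df_indexed)

# With "name" as the index, rows can be looked up by label.
print(df_indexed.loc["bob", "score"])
```

Setting the index at read time saves a separate set_index call later, which is handy when the first column of the file is already a natural row label.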
