There are a few options for reading data that has been cleaned up by Python into R. You can dump the Python result into a common interchange format such as JSON, or you can do some ETL work on either side so that the Python output is easy for R to read. However, there is already an R package called rPython that fills this gap seamlessly.
If you look at the documentation of this package, you will find only 4 functions on the contents page, but they make my life so much easier.
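For comparison, the JSON route mentioned above could look roughly like the sketch below on the Python side. The records in cleaned and the helper to_json are made up for illustration; your own cleaning step would produce the real data.

```python
import json

# Hypothetical cleaned-up records; in practice these come from your own cleaning step.
cleaned = [
    {"name": "a", "value": 1},
    {"name": "B", "value": 2},
]

def to_json(records):
    """Serialize a list of flat dicts to a JSON string that R can parse."""
    return json.dumps(records, sort_keys=True)

payload = to_json(cleaned)
print(payload)
```

On the R side you would then parse that string or file with a JSON package, for example fromJSON from jsonlite. It works, but it is an extra hop that rPython saves you.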
1. python.get()
library('rPython')
str_py_list <- "[1,'a', 'B']"
str(python.get(str_py_list))
List of 3
$ : num 1
$ : chr "a"
$ : chr "B"
str_py_tuple <- "(1,'a', 'B')"
str(python.get(str_py_tuple))
List of 3
$ : num 1
$ : chr "a"
$ : chr "B"
str_py_dict <- "{1:2, 'a':'A', 'B': 1+1}"
str(python.get(str_py_dict))
List of 3
$ a: chr "A"
$ 1: num 2
$ B: num 2
As you can see, python.get reads in the string and parses the Python object into a list in R. You can then use data.frame to convert it into a data frame.
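The outputs above (a tuple arriving as a plain list, the integer key 1 arriving as the name "1") look like what you get when values pass through JSON. As far as I understand, rPython moves data between the two languages as JSON, but treat that as an assumption. The sketch below only uses Python's standard json module to show the same flattening, and round_trip is a helper name made up for illustration.

```python
import json

def round_trip(obj):
    """Encode a Python object as JSON and decode it again."""
    return json.loads(json.dumps(obj))

print(round_trip([1, 'a', 'B']))                 # [1, 'a', 'B']
print(round_trip((1, 'a', 'B')))                 # [1, 'a', 'B']  (the tuple becomes a list)
print(round_trip({1: 2, 'a': 'A', 'B': 1 + 1}))  # {'1': 2, 'a': 'A', 'B': 2}  (the key 1 becomes '1')
```

So if you need to tell tuples from lists, or integer keys from string keys, that distinction is probably gone by the time the value reaches R.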
2. python.assign(PyObject,RObject) reads the R object and translates it into a Python object. Combined with python.exec("python command"), this lets you read R data into Python.
> data(iris)
> df <- iris
> python.assign('py_iris', df)
> python.exec('print len(py_iris)')
5
> python.exec('print py_iris.keys()')
[u'Petal.Length', u'Sepal.Length', u'Petal.Width', u'Sepal.Width', u'Species']
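The length of 5 and the column names returned by keys() suggest that the data frame arrives in Python as a dictionary with one entry per column, not as a list of rows. Assuming that layout, a minimal sketch of turning the columns back into rows is below. It is written in Python 3 (the session above is Python 2), uses only the first three rows of iris, and columns_to_rows is a hypothetical helper, not part of rPython.

```python
# First three rows of iris, laid out column-wise (the layout assumed for py_iris).
py_iris = {
    "Sepal.Length": [5.1, 4.9, 4.7],
    "Sepal.Width": [3.5, 3.0, 3.2],
    "Petal.Length": [1.4, 1.4, 1.3],
    "Petal.Width": [0.2, 0.2, 0.2],
    "Species": ["setosa", "setosa", "setosa"],
}

def columns_to_rows(columns):
    """Turn a dict of equal-length column lists into a list of row dicts."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]

rows = columns_to_rows(py_iris)
print(len(py_iris))         # 5 (number of columns)
print(len(rows))            # 3 (number of rows)
print(rows[0]["Species"])   # setosa
```

This is worth keeping in mind before you loop over py_iris: iterating over the dictionary gives you column names, not observations.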
3. python.load() runs a script of Python code.
$ cat datafireball.py
import urllib2, sys
# Make the locally installed BeautifulSoup egg importable
sys.path.append('/Library/Python/2.7/site-packages/beautifulsoup4-4.2.1-py2.7.egg')
from bs4 import BeautifulSoup
# Fetch the home page and parse the returned HTML
stream = urllib2.urlopen('https://datafireball.com/')
soup = BeautifulSoup(stream)
# Print the text of the site-description div
print soup.find('div', {'class':'site-description'}).text.encode('utf-8')
Above is a very basic Python script that uses the urllib2 library to make an HTTP request to datafireball.com and then uses the BeautifulSoup package to parse the returned HTML. In the end, it prints the site description of datafireball.com to the screen.
Note: if you know how to capture this output in an R variable, please leave a comment. For now, this is what happens in R:
python.load('/tmp/datafireball.py')
a journey of a data guy
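One approach I would try for getting that text into R, though I have not confirmed it with rPython: have the script store the text in a variable instead of printing it, then ask for that variable with python.get. The sketch below shows only the Python side, in Python 3, with a hard-coded HTML string standing in for the downloaded page and the standard-library HTMLParser standing in for BeautifulSoup. DescriptionParser and description are names made up for illustration.

```python
from html.parser import HTMLParser

class DescriptionParser(HTMLParser):
    """Collect the text inside <div class="site-description">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "div" and ("class", "site-description") in attrs:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "div":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.text += data

# Stand-in for the page that the real script downloads.
html = '<div class="site-description">a journey of a data guy</div>'

parser = DescriptionParser()
parser.feed(html)

# Keep the result in a top-level variable instead of printing it.
description = parser.text.strip()
```

If python.load runs the script in the same Python session that python.get reads from, then python.get('description') should hand the string back to R.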