R in command line – stdin/stdout

Most people use R inside an IDE or an interactive session, such as the R console or RStudio. From there, you can import all kinds of datasets using read.csv and similar functions. However, I have sometimes found it extremely helpful to write the R code as an Rscript that reads its data from standard input and writes its result back to standard output. This way, you can pipe your R script together with Bash and Python tools in a single command line.

Hadoop Streaming also makes it easy to combine different languages and take full advantage of cluster computing. In this series of posts, I will introduce how to use stdin/stdout in R, how to parse the input strings into arguments, how to organize the output, and how to apply all of this to Hadoop Streaming.

Here I will post a few tips for R users to get started working with stdin and stdout:
1.

#!/usr/bin/Rscript

First of all, your script needs to start with this line. This “shebang” tells the system which interpreter to use to run the code: not /usr/bin/R or anything else, but Rscript. Adjust the path if Rscript is installed somewhere else on your machine.
2.

input<-file('stdin', 'r')

As mentioned in the help page for function file:
Use “stdin” to refer to the C-level ‘standard input’ of the process (which need not be connected to anything in a console or embedded version of R, and is not in RGui on Windows).
Then we’ve successfully created the connection.
3.

row <- readLines(input, n=1)

The ‘r’ flag in file('stdin', 'r') is actually very important here: without it, every call to readLines reads the first line again.
By default, file() creates the connection without opening it. From the help page of readLines: If the connection is open it is read from its current position. If it is not open, it is opened in “rt” mode for the duration of the call and then closed again. Data stored in a flat file usually has one record per line, so n=1 tells R to read one row at a time.
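
To see why opening the connection matters, here is a small sketch. It assumes a temporary file standing in for stdin, so that it can be run anywhere; the variable names are only for illustration.

```r
# Write three records to a temporary file that stands in for stdin.
tmp <- tempfile()
writeLines(c("a", "b", "c"), tmp)

# Unopened connection: each readLines() call opens it, reads, and closes it
# again, so every call starts from the first line.
closed_con <- file(tmp)
first_try  <- readLines(closed_con, n = 1)
second_try <- readLines(closed_con, n = 1)
close(closed_con)

# Connection opened with "r": the read position is kept between calls.
open_con    <- file(tmp, "r")
first_line  <- readLines(open_con, n = 1)
second_line <- readLines(open_con, n = 1)
close(open_con)

unlink(tmp)
```

Both reads on the unopened connection return "a", while the opened connection returns "a" and then "b".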
4.

while(length(row)>0) {
    # do something with your row (record)

    # read the next record; without this the loop never ends
    row <- readLines(input, n=1)
}

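To make sure every row gets processed, check the length of what readLines returned and use it as the condition of a while loop. At the end of the input, readLines returns a zero-length character vector, so the loop stops. Remember to read the next line at the end of each iteration.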
5.

write(result, "")

The signature is write(x, file = "data", ...). The file argument can be a connection, but since we want the result written to standard output, we can pass either an empty string or stdout().
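
Putting the five tips together, a complete script might look like the sketch below. The helper name process_stream and the toupper() transformation are placeholders I made up for illustration; replace them with whatever your records need.

```r
#!/usr/bin/Rscript

# process_stream() is a hypothetical helper: it reads records one line at a
# time from `input` and writes one transformed line per record to `output`.
process_stream <- function(input, output = stdout()) {
  row <- readLines(input, n = 1)
  while (length(row) > 0) {
    result <- toupper(row)          # placeholder transformation
    write(result, output)
    row <- readLines(input, n = 1)  # fetch the next record
  }
  invisible(NULL)
}

# Read from stdin only when run non-interactively (for example via Rscript).
if (!interactive()) {
  input <- file("stdin", "r")
  process_stream(input)
  close(input)
}
```

Assuming the script is saved as upper.R (a hypothetical file name) and made executable, it can sit in the middle of a pipe, for example: cat data.txt | ./upper.R | sort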
