From Python to R and back

This post is a short introduction to rpy2, a library that lets you use R code from Python almost as if it were native Python code.

Technical requirements

Currently the development is done on UNIX-like operating systems with the following software versions. Those are the recommended versions to run rpy2 with.
Software    Version
Python      3.4
R           3.2+
Running rpy2 requires compiled libraries for R, Python, and readline; building rpy2 requires the corresponding development headers (check the documentation for more information about building rpy2).

Getting started

It is assumed here that the rpy2 package has been properly installed. In Python, a package or module is made available by importing it, and rpy2 is just a Python package. We are going to interact with R through the robjects layer, a high-level interface that tries to hide R behind Python-like behavior.
import rpy2.robjects as robjects

The r instance

The object r in rpy2.robjects represents the running embedded R process.
If you are familiar with R and the R console, r is a little like a communication channel from Python to R.

Getting R objects

In Python, the [] operator is an alias for the method __getitem__.
The __getitem__ method of rpy2.robjects.r looks up a variable by name, as if that name had been typed at the R console.
Example in R:
pi
[1] 3.141593

With rpy2:
pi = robjects.r['pi']
pi[0]
3.141592653589793

Under the hood, the variable pi is looked up in the R base package by default, unless another variable named pi was created in R's .GlobalEnv.
Whenever one wishes to be specific about where a symbol should be looked up (which should be most of the time), it is possible to wrap R packages in Python namespace objects (see the rpy2 documentation on R packages).
Also keep in mind that pi is not a scalar but a vector of length 1.
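The lookup order described above can be mimicked in plain Python. This is a minimal sketch and not rpy2 itself: the class REnvironmentLookup and its two dictionaries are hypothetical stand-ins for R's base package and global environment, and values are stored as one-element lists to mirror length-1 vectors.

```python
import math


class REnvironmentLookup:
    """Toy stand-in for robjects.r: obj[name] calls obj.__getitem__(name)."""

    def __init__(self):
        # Hypothetical stand-ins for R's base package and global environment.
        self.base = {"pi": [math.pi]}
        self.globalenv = {}

    def __getitem__(self, name):
        # Symbols defined in the global environment shadow those in base.
        if name in self.globalenv:
            return self.globalenv[name]
        return self.base[name]


r = REnvironmentLookup()
pi_from_base = r["pi"]          # same as r.__getitem__("pi")
r.globalenv["pi"] = [3.0]       # a user-defined pi now shadows the base one
pi_after_shadowing = r["pi"]
```

Being explicit about where a symbol comes from avoids surprises of this kind, which is why the text recommends it.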


Evaluating R code

The evaluation is performed in what is known to R users as the Global Environment, that is the place one starts at when starting the R console. Whenever the R code creates variables, those variables are “located” in that Global Environment by default.
Example:
robjects.r('''
    f <- function(r, verbose=FALSE) {
        if (verbose) {
            cat("I am calling f().\n")
        }
        2 * pi * r
    }
    f(3)
''')
<FloatVector - Python:0x7f1ab8c31a08 / R:0x2e71cc8>
[18.849556]

The expression above returns the value 18.85, but first creates an R function f. That function f lives in the R Global Environment and can be accessed with the __getitem__ mechanism outlined above:
r_f = robjects.globalenv['f']
print(r_f.r_repr())
function (r, verbose = FALSE)
{
    if (verbose) {
        cat("I am calling f().\n")
    }
    2 * pi * r
}

As shown earlier, an alternative is to get the function from the R singleton:
r_f = robjects.r['f']

The function r_f is callable, and can be used like a regular Python function.
res = r_f(3)
See the section "Calling R functions" below for more on calling functions.
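To check the value that the R function f returns, it can help to have a plain-Python counterpart. The sketch below is not part of rpy2; it only reproduces the arithmetic 2 * pi * r and the optional message using the standard math module.

```python
import math


def f(r, verbose=False):
    """Plain-Python counterpart of the R function f defined earlier."""
    if verbose:
        print("I am calling f().")
    return 2 * math.pi * r


circumference = f(3)
print(round(circumference, 6))  # 18.849556
```

The rounded result matches the [18.849556] printed for the R vector.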

Interpolating R objects into R code strings

Despite what the title of this section may suggest, the feature presented here is simple and handy.
An R object has a string representation that can be used directly in R code to be evaluated.
Simple example:
letters = robjects.r['letters']
rcode = 'paste(%s, collapse="-")' % (letters.r_repr())
res = robjects.r(rcode)
print(res)
"a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z"
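To see what the interpolated code string looks like before R evaluates it, the same formatting can be done by hand. This sketch does not use rpy2: r_string_vector_literal is a hypothetical helper that only imitates the shape of r_repr() output for simple strings.

```python
def r_string_vector_literal(values):
    """Hypothetical helper: format Python strings as an R c("...", ...) literal.

    Only handles strings without quotes or backslashes. In rpy2 the real
    mechanism is r_repr(); this helper merely imitates its output shape.
    """
    quoted = ", ".join('"%s"' % v for v in values)
    return "c(%s)" % quoted


first_letters = ["a", "b", "c"]
rcode = 'paste(%s, collapse="-")' % r_string_vector_literal(first_letters)
print(rcode)  # paste(c("a", "b", "c"), collapse="-")

# The same joining that R's paste(..., collapse="-") performs, done in Python.
joined = "-".join(first_letters)
```

Printing the code string before passing it to R is a cheap way to catch quoting mistakes.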

R vectors

In R, data are mostly represented by vectors, even when they look like scalars.
Looking closely at the R object pi used previously, we can see that it is in fact a vector of length 1.
len(robjects.r['pi'])
1
As such, the Python method __add__ (the + operator) results in a concatenation (function c() in R), as is the case for regular Python lists.
Accessing the one value in that vector has to be stated explicitly:
robjects.r['pi'][0]
3.141592653589793
Much more can be done with vectors, which can be made to behave more like Python lists or more like R vectors. A comprehensive description of the behavior of vectors is found in the documentation for rpy2.robjects.vectors.
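The comparison with regular Python lists can be made concrete without rpy2. In this sketch a one-element list stands in for the length-1 R vector; it illustrates the list behavior the text refers to, not the rpy2 vector classes themselves.

```python
import math

pi_vector = [math.pi]        # a one-element list models R's length-1 vector
length = len(pi_vector)      # 1, like len(robjects.r['pi'])

# "+" on Python lists concatenates, which is the behavior the text
# compares to R's c() function.
combined = pi_vector + [2.0, 3.0]

# The single value still has to be extracted by index.
value = pi_vector[0]
```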

Creating rpy2 vectors

Creating R vectors can be achieved simply:
res = robjects.StrVector(['abc', 'def'])
print(res.r_repr())
c("abc", "def")

res = robjects.IntVector([1, 2, 3])
print(res.r_repr())
1:3

res = robjects.FloatVector([1.1, 2.2, 3.3])
print(res.r_repr())
c(1.1, 2.2, 3.3)
R matrices and arrays are just vectors with a dim attribute.
The easiest way to create such objects is to do it through R functions:
v = robjects.FloatVector([1.1, 2.2, 3.3, 4.4, 5.5, 6.6])
m = robjects.r['matrix'](v, nrow = 2)
print(m)
     [,1] [,2] [,3]
[1,]  1.1  3.3  5.5
[2,]  2.2  4.4  6.6
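The printed matrix shows the six values laid out column by column, which is how R's matrix() fills a matrix unless byrow = TRUE is given. The sketch below reproduces that layout in plain Python; column_major_rows is a hypothetical helper and not an rpy2 function.

```python
def column_major_rows(values, nrow):
    """Arrange a flat sequence into nrow rows, filling column by column."""
    if len(values) % nrow != 0:
        raise ValueError("length of values must be a multiple of nrow")
    # Row i takes every nrow-th element, starting at offset i.
    return [list(values[i::nrow]) for i in range(nrow)]


v = [1.1, 2.2, 3.3, 4.4, 5.5, 6.6]
rows = column_major_rows(v, nrow=2)
for row in rows:
    print(row)
```

The two printed rows hold the same values as rows [1,] and [2,] of the R output.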

Calling R functions

Calling R functions is similar to calling Python functions:
rsum = robjects.r['sum']
rsum(robjects.IntVector([1, 2, 3]))[0]
6
Keyword arguments also work:
rsort = robjects.r['sort']
res = rsort(robjects.IntVector([1, 2, 3]), decreasing=True)
print(res.r_repr())
c(3L, 2L, 1L)
By default, calling an R function returns an R object.
More information on functions can be found in the rpy2 documentation on functions.

This is the nice-looking version of the official rpy2 documentation, which can be found here: http://rpy2.readthedocs.org
