November 10th, 18:00 @ CRI, by Thomas Julou
About this Workshop
R was designed with data analysis, graphical representation, and statistics in mind. We will focus on data manipulation and graphics, and mostly skip the statistics. Once your data are in the right format (thanks to skillful data manipulation) and you know which statistics to compute (generally after careful graphical exploration), running the statistical functions is a piece of cake; the hard part is knowing and understanding the statistics, which is well beyond the scope of this workshop.
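More precisely, after a brief R survival guide, we'll dive into an amazing package: ggplot2 (and its companion plyr). ggplot2 is an implementation of Wilkinson's grammar of graphics, which adds a very handy layer of abstraction to plotting commands: a plot is described by its data, the mapping of variables to visual properties (position, colour, size...), and the layers drawn from them, rather than by low-level drawing instructions. This makes R a very efficient alternative to other plotting tools, be they spreadsheets or scripts.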
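To make this layer of abstraction concrete, here is a minimal sketch using the faithful data frame that ships with R (272 eruptions of the Old Faithful geyser; both columns are in minutes). It assumes ggplot2 and plyr are installed, and the 3-minute threshold used to separate short from long eruptions is an arbitrary choice for illustration. plyr's ddply() splits the data into groups, summarises each group and recombines the results; the ggplot2 call then describes the plot as data, aesthetic mappings and stacked layers.

```r
library(ggplot2)
library(plyr)

# Add a grouping variable: is the eruption longer than 3 minutes? (arbitrary threshold)
df <- transform(faithful, long = eruptions > 3)

# plyr: split by 'long', summarise each piece, combine into one data frame
summary_df <- ddply(df, "long", summarise,
                    n = length(waiting),
                    mean_wait = mean(waiting))
print(summary_df)

# ggplot2: data + aesthetic mappings + layers
p <- ggplot(df, aes(x = eruptions, y = waiting, colour = long)) +
  geom_point() +                                  # layer 1: raw observations
  geom_smooth(method = "lm", formula = y ~ x) +   # layer 2: one linear fit per colour group
  labs(x = "Eruption duration (minutes)",
       y = "Waiting time to next eruption (minutes)")
print(p)
```

Swapping geom_point() for another geom, or colour = long for another mapping, changes the plot without rewriting any drawing code; that is the practical benefit of the grammar.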
Finally, depending on participants' interests, we may touch on various advanced aspects of R: connecting to MySQL databases, representing geographical data, bridging with Python, and optimising performance by embedding C code in R.
From Wikipedia: In computing, R is a programming language and software environment for statistical computing and graphics. R is an implementation of the S programming language created by John Chambers while at Bell Labs combined with lexical scoping semantics inspired by Scheme. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is now developed by the R Development Core Team, of which Chambers is a member. R is named partly after the first names of the first two R authors (Robert Gentleman and Ross Ihaka), and partly as a play on the name of S. The R language has become a de facto standard among statisticians for the development of statistical software, and is widely used for statistical software development and data analysis. R is part of the GNU project. Its source code is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems. R uses a command line interface, though several graphical user interfaces are available.
Preparing for the Workshop
All R resources are available from CRAN. Depending on your system, follow the appropriate link to install R; for this workshop, you only need the base R distribution. In addition, you must install ggplot2. Just type
install.packages("ggplot2")
at the R prompt. All dependencies should be installed automatically. plyr was pulled in as a dependency of ggplot2 at the time of this workshop; recent ggplot2 versions no longer depend on it, so install it explicitly with install.packages("plyr").
Linux users may want to install a GUI; JGR is a good option. The Mac and Windows distributions of R include a GUI by default.
Here is the script file with all examples used in this session: File:Rfabelier 20101110.r.pdf
Here are the presentation slides: File:Rfabelier 20101110.pdf
R and ggplot2 from Python
Use the rpy2 package.
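To install it on Ubuntu/Debian:
sudo apt-get install python-rpy2
On current releases, which only ship Python 3, the package is named python3-rpy2; rpy2 can also be installed with pip install rpy2.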
```python
# Requires R with the ggplot2 package installed, plus rpy2
from rpy2 import robjects
from rpy2.robjects.lib import ggplot2

# Old Faithful data frame from R's built-in datasets package
faithful_data = robjects.r['faithful']

ggplot2.theme_set(ggplot2.theme_bw())
p = (ggplot2.ggplot(faithful_data)
     + ggplot2.aes_string(x='eruptions')
     + ggplot2.geom_histogram(fill='lightblue')
     # density curve expressed as counts (density x number of observations)
     + ggplot2.geom_density(ggplot2.aes_string(y='..count..'), colour='orange')
     + ggplot2.geom_rug()
     # eruption durations in 'faithful' are in minutes
     + ggplot2.scale_x_continuous('Eruption duration (minutes)')
     # opts() no longer exists in ggplot2; ggtitle() sets the title
     + ggplot2.ggtitle('Old Faithful eruptions'))
p.plot()
```