Part 3 - Advanced features of sparklyr

August 23, 2017 Javier Luraschi

RStudio recently announced a new open-source package called sparklyr that facilitates a connection between R and Spark using a full-fledged dplyr backend with support for the entirety of Spark’s MLlib library. Due to Spark’s ability to interact with distributed data with little latency, it is becoming an attractive tool for interfacing with large datasets in an interactive environment. In addition to handling the storage of data, Spark also incorporates a variety of other tools including stream processing, computing on graphs, and a distributed machine learning framework. Some of these tools are available to R programmers via the sparklyr package.

In this four-part series, we’ll discuss how to leverage Spark’s capabilities in a modern R environment.

The sparklyr Series:

  1. Introducing an R interface for Apache Spark
  2. Extending Spark using sparklyr and R
  3. Advanced Features of sparklyr
  4. Understanding Spark and sparklyr deployment modes 

About the Author

Javier Luraschi

Javier is a Software Engineer with experience in technologies ranging from desktop, web, mobile, and backend development to augmented reality and deep learning applications. He previously worked for Microsoft Research and SAP and holds a double degree in Mathematics and Software Engineering.
