Posit

Videos
Part 4 - Understanding sparklyr deployment modes

Videos

RStudio recently announced a new open-source package called sparklyr that facilitates a connection between R and Spark using a full-fledged dplyr backend with support for the entirety of Spark’s MLlib library. Due to Spark’s ability to interact with distributed data with little latency, it is becoming an attractive tool for interfacing with large datasets in an interactive environment. In addition to handling the storage of data, Spark also incorporates a variety of other tools including stream processing, computing on graphs, and a distributed machine learning framework. Some of these tools are available to R programmers via the sparklyr package.

In this four-part series, we’ll discuss how to leverage Spark’s capabilities in a modern R environment. The sparklyr Series:

Download Materials

Tags: Spark sparklyr

Edgar Ruiz

Edgar Ruiz es un Ingeniero de Soluciones en RStudio. Es el administrador de los sitios oficiales de sparklyr y de R para bases de datos. También es autor de los paquetes de R: dbplot, tidypredict y modeldb, y co-autor de el paquete dbplyr.