Building Spark ML pipelines with sparklyr

rstudio::conf 2018 Interoperability

Kevin Kuo

March 4, 2018

We provide an overview of the recently implemented Pipelines API in sparklyr, an R package for interfacing with Apache Spark. This new feature allows users to build and tune data transformation and machine learning pipelines that are interoperable with Scala and Python, simplifying handoffs between data science and data engineering. We go over the components of pipelines and walk through practical examples.


About the speaker

Kevin Kuo

Kevin is a software engineer working on open source packages for big data analytics and machine learning. He has held data science positions in a variety of industries and was a credentialed actuary. He likes mixing cocktails and studying wine.