Building Spark ML pipelines with sparklyr – Kevin Kuo

March 3, 2018


We provide an overview of the recently implemented Pipelines API in sparklyr, an R package for interfacing with Apache Spark. This new feature allows users to build and tune data transformation and machine learning pipelines that are interoperable with Scala and Python, simplifying handoffs between data science and data engineering. We go over the components of pipelines and walk through practical examples.

About the speaker

Kevin Kuo
Software Engineer

Kevin is a software engineer focused on building R interfaces to big data and machine learning tools such as Spark and TensorFlow. He has experience applying data analytics in a variety of settings, from insurance claims analytics to predictive maintenance of industrial assets. Outside of data science, Kevin enjoys wine tasting and crafting fancy cocktails.
