rstudio::conf 2018 Interoperability
Building Spark ML pipelines with sparklyr
March 4, 2018
We provide an overview of the recently implemented Pipelines API in sparklyr, an R package for interfacing with Apache Spark. This new feature allows users to build and tune data transformation and machine learning pipelines that are interoperable with Scala and Python, simplifying handoffs between data science and data engineering. We go over the components of pipelines and walk through practical examples.
Kevin is a software engineer working on open source packages for big data analytics and machine learning. He has held data science positions in a variety of industries and was a credentialed actuary. He likes mixing cocktails and studying wine.