We provide an overview of the recently implemented Pipelines API in sparklyr, an R package for interfacing with Apache Spark. This new feature allows users to build and tune data transformation and machine learning pipelines that are interoperable with Scala and Python, simplifying handoffs between data science and data engineering teams. We cover the components of ML pipelines and walk through practical examples.
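As a rough illustration of the workflow described above, a sparklyr pipeline chains feature transformers and an estimator, is fit to produce a PipelineModel, and can be saved in Spark's native format so Scala or Python users can reload it. The sketch below uses `mtcars` purely as an illustrative dataset; the connection settings, formula, and save path are placeholder assumptions, not from the talk itself.

```r
library(sparklyr)
library(dplyr)

# Connect to a local Spark instance (placeholder master setting)
sc <- spark_connect(master = "local")

# Copy an illustrative dataset into Spark
mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)

# Build a pipeline: feature assembly via an R formula, then logistic regression
pipeline <- ml_pipeline(sc) %>%
  ft_r_formula(am ~ mpg + wt) %>%
  ml_logistic_regression()

# Fit the pipeline to obtain a PipelineModel
model <- ml_fit(pipeline, mtcars_tbl)

# Persist the fitted pipeline in Spark's format; because the saved artifact
# is plain Spark ML, it can be reloaded from Scala or Python as well
ml_save(model, "mtcars_pipeline_model", overwrite = TRUE)
```

The interoperability claim in the abstract rests on this last step: the saved model is an ordinary Spark ML PipelineModel, independent of R.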
About the speaker
Kevin is a software engineer focused on building R interfaces to big data and machine learning tools such as Spark and TensorFlow. He has experience applying data analytics in a variety of settings, from insurance claims analytics to predictive maintenance of industrial assets. Outside of data science, Kevin enjoys wine tasting and crafting fancy cocktails.