Building a new data science pipeline for the FT with RStudio Connect

January 30, 2020

We have recently implemented a new Data Science workflow and pipeline, using RStudio Connect and Google Cloud Services. This has vastly decreased our pipeline complexity, allowing us to bring our models and products into scheduled production more quickly. In addition, our workflow, in which we work closely together as a team on all projects in a regular two-week sprint cycle, has increased the range of projects we have been able to take on and complete. To detail some of the key lessons we’ve learned (and some of the difficulties!), we’ll walk you through one of our recent sprints, in which we productionised the generation of a suite of behavioural and demographic features so that they can be more easily plugged into a range of models and used across the business by the FT’s platform and product teams.

About the speaker

I completed my first degree, in Electrical Engineering, in Athens, Greece, and after working for almost two years as a software engineer I decided I was ready to move into the world of data science. That is when I started my master's in Machine Learning at University College London. I then joined the FT straight away through a graduate scheme, in order to gain a wider view of the business, before moving to the data science team. As part of that team I have been involved in building both of our data/model pipelines, and in developing and maintaining various models around user churn, user engagement and subscription lifetime value. My greatest passions outside technology and data science are travelling and hiking; I have been to Nepal and the Himalayas twice over the past two years.