I am a longtime R and Tidyverse user, and I recently joined a small data team of 3 at a company that uses GCP extensively for data pipelines and analysis. Our data engineer pipes data from our sources into BigQuery, and I create many views and tables in BigQuery from that data.
I must say BigQuery is great and handles GB-sized tables with ease, but I'd also like to use R on some of the smaller tables stored in BigQuery. Each tool has fairly clear strengths: BigQuery for crunching big data, R for its flexibility with smaller data. I'd like to introduce R into our stack to handle what it's good at. When I proposed this, I received the following feedback:
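For context, the analysis side of what I have in mind is straightforward: the bigrquery package lets dplyr verbs be translated to BigQuery SQL and run server-side, so only small result sets ever land in R's memory. A minimal sketch (the project, dataset, table, and column names here are placeholders, not our real schema):

```r
library(DBI)
library(dplyr)
library(bigrquery)

# Authenticate interactively, or with a service account key in production.
# "service-account.json" is a placeholder path.
bq_auth(path = "service-account.json")

# "my-project" and "analytics" are placeholder project / dataset names
con <- dbConnect(
  bigquery(),
  project = "my-project",
  dataset = "analytics",
  billing = "my-project"
)

# dplyr verbs on this lazy table are translated to BigQuery SQL
# and executed in BigQuery, not locally
daily <- tbl(con, "events") %>%
  filter(event_date >= "2024-01-01") %>%
  count(event_date, event_type)

# collect() pulls only the aggregated (small) result into local memory,
# where the full Tidyverse is available
daily_df <- collect(daily)
```

The appeal is that BigQuery still does the heavy lifting; R only touches the already-reduced output.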
"deploying r onto our pipelines environment will not be straightforward. we'd have to make a special kubernetes pod just for r and maintain that. also r isn't as performant and library management isn't as easy. it isn't a language built for building production pipelines."
With that said, I am interested in hearing whether anybody here has successfully integrated R and BigQuery into a production GCP data pipeline. I don't know enough about data engineering or our pipeline to second-guess our engineer, and his points about R's weaknesses here (library management, performance, setup work) sound convincing. However, I am highly proficient in R and confident that if my team allowed me to introduce it into our stack, it would be helpful.
Any thoughts or related experiences on this would be greatly appreciated, thanks!