Data science is now a hot area of investment for many organizations. Countless blogs, articles, and analyst reports emphasize that effective data science is critical for competitive advantage, and many business leaders believe that data science is vital for an organization to survive, much less thrive, over the next several years.
However, many data science leaders grapple with an existential crisis for their teams. On the one hand, many vendors and analyst reports emphasize the rise of Citizen Data Scientists, empowered by tools that promise to augment and automate the hard work of data science to automagically answer vital questions, no Data Scientist required. On the other hand, machine learning and deep learning methods in the hands of software engineers, fueled by lots of computational power, answer more and more questions (as long as the problem is well-defined, and there is sufficient data available). Squeezed in between these trends, what is the role of a data scientist?
Even worse, nearly as many blogs and analyst reports emphasize the challenges of effectively implementing data science in an organization, and emphatically state that most analytics and data science projects fail, and most companies don’t achieve the revenue and profit growth that they hoped their data science investments would deliver.
We will dive into the role of a data scientist in more detail in the coming weeks, but here we will focus on this question: Why is getting real, lasting value from data science investments so difficult?
In talking to many different organizations implementing data science projects, we have seen many challenges that prevent data science investments from delivering the value they should. These typically fall into three categories:
Lack of credibility: Data science leaders grapple with whether their team has the necessary training and the right tools to discover relevant and valuable insights in their data. Once the team has found something interesting, how can others in the organization understand and trust those insights enough to actually change their behavior, and make decisions based on them? This problem is compounded if the approach is a difficult-to-explain, black box model.
Slow path to value: Seemingly simple questions like “Which customers will be our most profitable next quarter?” often turn into month-long research projects as data scientists scour the firm for data and struggle to wrangle it into shape (a topic we discussed in a recent blog post, Wrangling Unruly Data). Then once the data scientists start to develop an analysis, they find iterating and refining their results with stakeholders slow and unwieldy (something we covered in another blog post, Getting to the Right Question). These slow response times frustrate business sponsors and often stymie putting data insights into action. Worse, they encourage decision makers to go with their gut intuition instead of data.
Ephemeral benefits: Once a valuable insight or tool has reached a decision maker, organizations struggle with maintaining and growing the value of these data science investments over time. They find it difficult to implement repeatable and reproducible processes as their systems and data science tools evolve, which often forces them to start from scratch when solving a new problem, or to reimplement old analyses when needed. Furthermore, data science practice at an organization often become dependent on a single software vendor, and that vendor may try to extract more of the value the customer receives as software license revenue.
Andrew Mangano, Data Intelligence Lead at Salesforce, spoke at rstudio::conf 2020 about the importance of delivering useful insights to your stakeholders.
So what’s the answer? And how do you cut through all the hype and confusion?
The reality is that hard, vaguely defined but valuable to solve, problems exist in the world. Commodity approaches (whether via augmented analytics for citizen data scientists, or standard machine learning approaches for software engineers) yield commodity answers. Real-world business problems require smart, agile data science teams empowered with the flexibility and breadth of open source languages like R and Python. We know this because tens of thousands of you use our software every day to do amazing things.
To deliver real, lasting value, organizations need to set aside the hype and build on a strong foundation. We recommend adopting a strategy we call Serious Data Science. As shown in Figure 1, Serious Data Science is an approach to data science designed to deliver insights that are:
We’ll be writing in detail about these components of Serious Data Science in the weeks to come. But before we get to that, we must address a topic near and dear to every data science leader: the role of the data scientist within the organization. Our post next Tuesday will address how that role is changing in today’s organizations, and why they will need the Serious Data Science framework to continue demonstrating their value in the months and years to come.
If you’d like to learn more about Serious Data Science, we recommend:
A team from AstraZeneca describes how they connected and grew a community of R users at their organization.
With the vetiver package, data scientists have a streamlined, consistent way to maintain machine learning pipelines. We recently updated our Bike Share prediction application using vetiver and Quarto.