Part 2: Data Science at Coinbase: Invest In Data Foundation & Embrace AI Capabilities

TL;DR: Data scientists operate in a "transformation layer" to turn raw data into useful information. The field is evolving from seeking mythical perfect candidates to empowering all professionals with data. Prioritizing data engineering and infrastructure investments is crucial, and Coinbase is leading the way by embracing Generative AI and advanced tools to democratize data accessibility.

By Justin Chen, Engineering, November 7, 2023

Coinbase Blog

Interestingly, data scientists and analytics professionals are seldom the recipients of their own analytical outputs. They operate within what can be described as a "transformation layer": they turn raw data into helpful information that can be used to make decisions or improve products. This intricate process involves developing data pipelines, crafting machine learning models, and composing data narratives. Each element is as indispensable as the engineering components behind a successful product. In this context, data and its subsequent transformation are the foundation for effective business decisions. Looking ahead, the world of data science anticipates new contributors and transformative methodologies, potentially redefining industry practices.

Evolving: The Transformation and Consumption Layer

Today's data science and analytics field is marked by a few noticeable traits:

  • Ad-hoc manual queries 

  • Time-consuming analyses

  • Over-reliance on individual expertise and tribal knowledge 

  • A lack of structured approach 

Every company is searching for a unicorn: the rare individual who can parachute into the role and handle all of the above with years of experience. However, it is important to remember that unicorns are mythical creatures that do not exist. Even if the "unicorn" was a star performer at their previous company, they might find themselves stepping into a role in this industry with a limited knowledge base, a weak data foundation, and a highly customized workflow. This common situation highlights a key challenge: it is hard to standardize skills and knowledge in this field. What a data scientist excels at can be heavily influenced by their past experiences and training.

The goal moving forward is to evolve from seeking mythical, perfect candidates to establishing a robust system that empowers all data professionals to do their best work.

However, change is unfolding rapidly in the field of data science, and even more quickly with the rise of automation and AI. Imagine most data insights being automated, allowing business professionals to access the information they need with ease and efficiency. This future is fast becoming a reality, especially with technological advancements in Generative AI.

At Coinbase, our data science team is leading this change. We have introduced an early model of a Slackbot powered by an LLM, designed to answer users' data questions instantly, streamlining how we interact with data and get ad hoc insights. Moreover, by using platforms like ThoughtSpot, we are making analytics and dashboard creation accessible through everyday language, even for non-experts. These innovations are not just adaptations, but mark a significant stride towards a more efficient data-driven future.
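
As a rough illustration of the idea, and not a description of the Slackbot Coinbase built, the sketch below shows one way a data question could be grounded in certified metric definitions before it is passed to a language model. Everything here is an assumption made for the example: the metric names and descriptions, the `build_prompt` and `answer_question` helpers, and the `llm` argument, which stands in for whatever model client is actually used.

```python
from typing import Callable, Dict

# Hypothetical certified metric descriptions; the names and wording are
# invented for this example and do not come from any real catalog.
CERTIFIED_METRICS: Dict[str, str] = {
    "active_users": "Distinct users with at least one session in the period.",
    "total_volume": "Sum of the amount column over the period.",
}


def build_prompt(question: str, metrics: Dict[str, str]) -> str:
    """Combine the user's question with the certified metric definitions."""
    metric_lines = [f"- {name}: {desc}" for name, desc in sorted(metrics.items())]
    return (
        "Answer the question using only the certified metrics below.\n"
        "If none of them apply, say so.\n\n"
        "Certified metrics:\n"
        + "\n".join(metric_lines)
        + f"\n\nQuestion: {question}\n"
    )


def answer_question(
    question: str,
    llm: Callable[[str], str],
    metrics: Dict[str, str] = CERTIFIED_METRICS,
) -> str:
    """Send a grounded prompt to `llm`, a placeholder for a real model client."""
    if not question.strip():
        raise ValueError("question must not be empty")
    return llm(build_prompt(question, metrics)).strip()
```

Passing the model in as a plain callable keeps the sketch independent of any vendor SDK, and makes it easy to substitute a stub when testing the prompt construction.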

By refining our approach, we will make these tools more precise and practical. As these changes become integral to our operations and our business expands, the idea of offering "insights as a service" becomes increasingly important: using advanced tools and models to increase analytical output exponentially rather than relying solely on people.

Investing in the data foundation layer now, by prioritizing funding and initiatives for data engineering, is essential for future preparedness. It will yield much higher and longer-term dividends, and it makes the delivery of data insights more efficient and dependable. Initiatives like Analytics Data Models (design principles to govern certified data sources), Cerberus (a data quality framework), and Metrics Cube (a single source of truth for core metrics semantics) are concrete investments that Coinbase has prioritized over the years. Such commitments have facilitated the swift integration of advanced capabilities like LLMs, paving the way for immediate and actionable business insights. This strategic foresight ensures not only the relevance, but also the resilience, of our data infrastructure in the face of future demands.
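
To make the foundation-layer idea concrete, here is a minimal sketch of how data quality checks and a single registry of metric definitions can work together. It is not the design of Cerberus or Metrics Cube, whose internals are not described here. The class names, the two checks, the `total_volume` metric, and the `amount` column are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass(frozen=True)
class QualityCheck:
    """A named rule that a dataset must satisfy before it is used."""
    name: str
    passes: Callable[[List[Row]], bool]


@dataclass(frozen=True)
class MetricDefinition:
    """One agreed-upon definition of a metric, stored in a single place."""
    name: str
    description: str
    compute: Callable[[List[Row]], float]


# Hypothetical checks: the dataset has rows, and no row lacks an amount.
CHECKS = [
    QualityCheck("not_empty", lambda rows: len(rows) > 0),
    QualityCheck(
        "no_null_amounts",
        lambda rows: all(r.get("amount") is not None for r in rows),
    ),
]

# Hypothetical registry: every consumer computes the metric the same way.
METRICS = {
    "total_volume": MetricDefinition(
        name="total_volume",
        description="Sum of the amount column across all rows.",
        compute=lambda rows: float(sum(r["amount"] for r in rows)),
    ),
}


def failed_checks(rows: List[Row]) -> List[str]:
    """Return the names of the quality checks that the rows do not pass."""
    return [check.name for check in CHECKS if not check.passes(rows)]


def compute_metric(name: str, rows: List[Row]) -> float:
    """Compute a registered metric, refusing data that fails quality checks."""
    failures = failed_checks(rows)
    if failures:
        raise ValueError(f"data quality checks failed: {failures}")
    return METRICS[name].compute(rows)
```

Because dashboards, reports, and automated tools would all go through the same registry and the same checks in a setup like this, they cannot silently disagree on what a metric means or be computed from data that failed validation.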

To summarize, a data tooling strategy and roadmap that sustainably propels a data-driven culture is critical to a company's success. It should do so by:

  • Embracing the capabilities of LLMs. A sandbox environment can be pivotal for tooling and exploration.

  • Investing in a solid data foundation. It is often said that the tangible results we see from data science (be it ML models, insights, or reports) are just the tip of the iceberg. Everything beneath the surface depends on a strong foundation.

  • Encouraging product, sales, marketing, and finance teams to self-serve their data consumption. Not all teams need to learn how to write SQL or pivot data in spreadsheets, but they should have an analytical mindset and, through training, learn to navigate simple tools using natural language.

Coinbase is at the forefront of a transformative era in data science, pivoting from the elusive search for “unicorns” to fostering an environment of collective technological empowerment. Our initiatives signify a move towards a new approach, one that democratizes data, enabling more interactions across teams and reducing reliance on niche expertise. By investing in foundational data strategies, we are setting the stage for a future where real-time, accessible insights form the backbone of strategic decision-making for businesses. As we embrace this evolution, our commitment is clear: to harness the best of technology and human expertise to chart a course that is both revolutionary and rooted in value. 

About Justin Chen

Sr. Director of Data Science on the Strategy, Execution, and Analytics Team