While building our data team at Coinbase, I have found the following question to be the most insightful:
How would you describe yourself as a histogram?
In 2012, Harvard Business Review described data scientist as the sexiest job of the 21st century. However, the term has evolved considerably over the past few years, to the extent that everyone seems to have a different definition of the role. Data professionals come from a variety of backgrounds. On one end of the spectrum, there are people with advanced quantitative degrees who picked up some programming skills. On the other end, there are software engineers who taught themselves stats and machine learning. Given these diverse backgrounds, I have found that this question lets me understand a candidate's strengths and interests:
How would you describe yourself as a histogram, with the following skills on the x-axis: SQL, statistics, machine learning, backend services and distributed computing? More specifically, you have 100 points that you have to distribute across these skills.
Based on the scores that candidates give themselves on this histogram, it quickly becomes obvious what their persona is, and thereby which role they could most successfully fill within our data team:
All data professionals can be described as a histogram of their skill sets
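As a rough illustration of how a 100-point histogram could be mapped to a persona, here is a minimal Python sketch. The reference profiles in PERSONA_PROFILES are hypothetical numbers invented for this example (they are not the rubric used at Coinbase), and the nearest-profile matching rule is likewise an assumption; in practice the judgment is made by a person reading the histogram.

```python
# Skills on the x-axis, in the order used throughout this sketch.
SKILLS = ("SQL", "statistics", "machine learning",
          "backend services", "distributed computing")
TOTAL_POINTS = 100

# Hypothetical reference profiles (points per skill, in SKILLS order).
# These numbers are illustrative assumptions, not an official rubric.
PERSONA_PROFILES = {
    "data analyst": (60, 25, 5, 5, 5),
    "quantitative analyst": (20, 60, 15, 0, 5),
    "data scientist": (30, 30, 30, 5, 5),
    "machine learning engineer": (10, 15, 45, 20, 10),
    "machine learning platform engineer": (5, 5, 20, 40, 30),
    "data engineer": (25, 0, 5, 30, 40),
}


def validate(histogram):
    """Check that a histogram has one non-negative score per skill, summing to 100."""
    if len(histogram) != len(SKILLS):
        raise ValueError(f"expected {len(SKILLS)} scores, got {len(histogram)}")
    if any(points < 0 for points in histogram):
        raise ValueError("scores must be non-negative")
    if sum(histogram) != TOTAL_POINTS:
        raise ValueError(f"scores must sum to {TOTAL_POINTS}, got {sum(histogram)}")


def closest_persona(histogram):
    """Return the persona whose hypothetical profile has the smallest
    sum of absolute differences from the candidate's histogram."""
    validate(histogram)

    def distance(name):
        profile = PERSONA_PROFILES[name]
        return sum(abs(a - b) for a, b in zip(histogram, profile))

    return min(PERSONA_PROFILES, key=distance)


# A candidate who spreads points fairly evenly across SQL, stats and ML.
candidate = (35, 30, 25, 5, 5)
print(closest_persona(candidate))  # prints "data scientist"
```

The sum of absolute differences is used only because it is easy to reason about by hand; any other distance measure would serve the same illustrative purpose.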
To make this more concrete, I want to provide a view into the type of work that each of those data personas performs at Coinbase.
Data analysts: work with product or platform teams to perform deep analyses that help determine what to build next. For instance, one question an analyst at Coinbase may spend a lot of time answering is: what types of users does Coinbase have in terms of their trading and withdrawal patterns? A data analyst's tool set typically involves SQL, Excel and, in some cases, complex analyses using Pandas dataframes and Jupyter notebooks.
Quantitative analysts: work on more complex data analyses that involve deep knowledge of statistics, Bayesian math and time series (ARIMA) modeling. In a fintech domain such as Coinbase's, they may study the behavior of machine learning systems, e.g. how quickly the risk score for a user (and hence for the user population in aggregate) converges to the true score for that user (and hence to the true distribution).
Data scientists: are generalists who are adept at SQL, statistics and machine learning. They straddle the area between quants and machine learning engineers in that they can produce statistically sound analyses and also build prototype machine learning models. At Coinbase, they may work on problems such as building user Lifetime Value (LTV) models to inform our user acquisition and referral programs.
Machine learning engineers: in contrast to data scientists, machine learning engineers implement production-ready models, which involves adding features to the model that can scale at scoring time. At Coinbase, this involves building machine learning models to prevent payment fraud or detect potential account takeovers. They also work on deep learning and computer vision to help with our identity verification systems, e.g. determining whether an uploaded ID document is blurry, photoshopped, or similar to a previous upload despite some alterations.
Machine learning platform engineers: are backend software engineers who build scalable training and scoring infrastructure for machine learning. At Coinbase, this involves building pipelines to enable parallelized model training on large data sets or implementing a shared feature store with transformed features (aka signals) that can be shared across multiple machine learning models. They also work on building scalable backend systems that can return results from machine learning models in near real time.
Data engineers: build a reliable data warehouse that lays the foundation for all data use cases, including business intelligence and machine learning. At Coinbase, this involves building Extract, Transform, Load (ETL) systems that can efficiently move data from many different services to our data warehouse (AWS Redshift in our case). They write streaming code to perform in-memory aggregations on the data (Apache Flink) or batch MapReduce code to perform batch aggregations.
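To illustrate the difference between the batch and streaming aggregations mentioned for data engineers, here is a minimal sketch in plain Python. It does not use the Apache Flink or Redshift APIs; the event shape (a source name and an amount) and the function names are assumptions made for this example only.

```python
from collections import defaultdict


def batch_totals(events):
    """Batch style: read the whole, finite data set, then emit one result."""
    totals = defaultdict(float)
    for key, amount in events:
        totals[key] += amount
    return dict(totals)


def streaming_totals(event_stream):
    """Streaming style: keep running state in memory and emit an
    updated total for the affected key after every incoming event."""
    totals = defaultdict(float)
    for key, amount in event_stream:
        totals[key] += amount
        yield key, totals[key]


# Hypothetical events: (source, amount).
events = [("trades", 10.0), ("withdrawals", 4.0), ("trades", 2.5)]

# One answer once all data has been read.
print(batch_totals(events))

# A new answer after each event arrives.
for key, running_total in streaming_totals(iter(events)):
    print(key, running_total)
```

The batch function only returns once the input is exhausted, while the streaming generator produces an up-to-date aggregate as each event arrives, which is the trade-off a real streaming engine manages at much larger scale.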
I also ask a follow-up question about where they want to grow, to determine how I should help them with their career progression at Coinbase. For example, a data analyst may want to move into data engineering, or a data engineer may want to move into machine learning. The answer helps determine which future projects we could offer them so that they can learn the needed skill sets while doing those projects.
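One way to picture this kind of career conversation is as the difference between two histograms: where someone is today and where they want to be. The sketch below assumes the desired future state is also expressed as a 100-point histogram, which is an assumption of this example rather than something stated above; the function name and the sample numbers are hypothetical.

```python
SKILLS = ("SQL", "statistics", "machine learning",
          "backend services", "distributed computing")
TOTAL_POINTS = 100


def skill_gaps(current, target):
    """Return (skill, points_to_gain) pairs for every skill the person
    wants to grow, sorted with the largest desired increase first."""
    for name, histogram in (("current", current), ("target", target)):
        if set(histogram) != set(SKILLS):
            raise ValueError(f"{name} histogram must score exactly {SKILLS}")
        if sum(histogram.values()) != TOTAL_POINTS:
            raise ValueError(f"{name} histogram must sum to {TOTAL_POINTS}")
    deltas = [(skill, target[skill] - current[skill]) for skill in SKILLS]
    growth = [(skill, delta) for skill, delta in deltas if delta > 0]
    return sorted(growth, key=lambda item: item[1], reverse=True)


# Hypothetical data analyst who wants to move into data engineering.
current = {"SQL": 50, "statistics": 25, "machine learning": 5,
           "backend services": 10, "distributed computing": 10}
target = {"SQL": 30, "statistics": 10, "machine learning": 5,
          "backend services": 25, "distributed computing": 30}

for skill, delta in skill_gaps(current, target):
    print(f"{skill}: +{delta}")
```

The skills with the largest positive gap are natural candidates for the next projects to offer that person.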
If you liked this article, we’d love to hear in the comments which data persona you would use to describe yourself, as well as any feedback on whether you found this hiring rubric useful.