Detecting Fraudulent Transactions: Coinbase Scalable Blockchain Address Risk Scoring System

Tl;dr: Coinbase introduces a risk-scoring system that generates risk scores for blockchain addresses, using machine learning to predict potentially fraudulent transactions and extend protection for users.

By Ayush Agarwal, Engineering, December 6, 2023, 4 min read

While cryptocurrencies are becoming increasingly safe, a few bad actors still exploit the pseudonymous nature of blockchains and the lack of identity tied to a blockchain address. To tackle these challenges, Coinbase introduced a system that generates a risk score for blockchain addresses to detect whether they may be involved in fraudulent transactions.

By combining machine learning with features that capture the transaction pattern of an address, the Scalable Blockchain Address Risk Scoring System can help predict the potential riskiness of a particular address. While no predictive system is perfect, this system has allowed us to expand our visibility to millions of blockchain addresses that have no association with Coinbase, thus extending protection to our users.

The risk associated with an address is used in different systems, such as Coinbase Retail and Coinbase Wallet, to caution the user and, in some cases, to intentionally delay transactions for manual review. This system has enabled significant savings of user funds, ensuring a safer and more reliable crypto experience.

Ensuring Trust in Crypto Investment

At Coinbase, we work hard to strengthen trust and security within the crypto ecosystem. Our system for assessing the risk associated with blockchain addresses is designed to help identify malicious addresses and fraudulent transactions. It handles the very large scale of blockchain networks and accounts for their evolving nature. By stopping or delaying fraudulent transactions, or by showing warnings to users, we enhance security and instill confidence in our users. Coinbase also has a system for detecting ERC-20 scam tokens. Read more about it here.

Scalable Graph Embedding Based Risk Scoring System

[Figure: Risk score generation]

The system incorporates two major components: a Risk Scoring System and a Graph Embedding System for Blockchain Addresses. The graph in question is the transaction graph of a given blockchain, such as Ethereum or Bitcoin, where nodes represent addresses and directed edges represent transactions between addresses.
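As a rough illustration of that graph structure, the sketch below builds a directed adjacency map from a handful of transfers. The function name build_transaction_graph, the (sender, receiver, amount) tuple layout, and the sample addresses are assumptions made for this example, not Coinbase's actual data model.

```python
from collections import defaultdict


def build_transaction_graph(transactions):
    """Build a directed, weighted adjacency map.

    Each transaction is a hypothetical (sender, receiver, amount) tuple.
    Nodes are addresses; an edge sender -> receiver carries the total
    amount transferred in that direction.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for sender, receiver, amount in transactions:
        graph[sender][receiver] += amount
        graph[receiver]  # touch the key so receive-only addresses are nodes too
    return {node: dict(neighbors) for node, neighbors in graph.items()}


# Made-up sample transfers between three made-up addresses.
sample_transactions = [
    ("0xA", "0xB", 1.5),
    ("0xA", "0xB", 0.5),
    ("0xB", "0xC", 2.0),
]
transaction_graph = build_transaction_graph(sample_transactions)
print(transaction_graph)
```

Repeated transfers between the same pair of addresses are collapsed into one weighted edge here. A production system might instead keep every transaction as a separate edge with its own timestamp.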

Risk Scoring System for Blockchain Addresses

The Risk Scoring System predicts the risk associated with each blockchain address. It uses a comprehensive set of features capturing an address's activity on the blockchain. These features comprise (a) aggregate transaction behaviors of an address and (b) graph-based features that capture structural and transactional information. The system uses the well-known Node2Vec algorithm to generate graph node embeddings. The combined feature set feeds a predictive machine learning model that generates the risk score of an address. The model is trained with a supervised learning approach, using ground truth labels obtained from validated sources.
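To make the feature-combination step more concrete, here is a minimal sketch under stated assumptions. The post does not say which model Coinbase uses. Plain logistic regression trained with stochastic gradient descent is swapped in here only because it is small enough to write out in full. The feature values, embedding values, labels, and all function names are made up for illustration.

```python
import math


def combine_features(aggregate, embedding):
    """Concatenate aggregate transaction features with a node embedding."""
    return list(aggregate) + list(embedding)


def sigmoid(z):
    """Numerically stable logistic function; output lies in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows, labels, lr=0.1, epochs=500):
    """Fit logistic regression with per-example gradient steps."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = pred - y  # gradient of log loss w.r.t. the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def risk_score(weights, bias, x):
    """Return a score between 0 and 1 for one feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical training data: two aggregate features plus a 2-d embedding.
rows = [
    combine_features([1.0, 0.0], [0.9, 0.1]),
    combine_features([0.0, 1.0], [0.1, 0.8]),
    combine_features([1.0, 0.0], [0.8, 0.2]),
    combine_features([0.0, 1.0], [0.2, 0.9]),
]
labels = [1, 0, 1, 0]  # 1 = labeled fraudulent, 0 = labeled benign (made up)

weights, bias = train_logistic(rows, labels)
risky = risk_score(weights, bias, rows[0])
benign = risk_score(weights, bias, rows[1])
print(round(risky, 3), round(benign, 3))
```

The point of the sketch is the shape of the pipeline: hand-built aggregate features and learned graph embeddings end up in one vector, and a supervised model maps that vector to a score between 0 and 1.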

Scalable Graph Embeddings for Blockchain Addresses

[Figure: Horizontally scalable random walks]

As of October 2023, Ethereum has approximately 270M addresses and Bitcoin has approximately 1.2B. Generating graph embeddings for addresses at this scale, on a graph that keeps growing, is a unique challenge. At Coinbase, we therefore built a scalable approach that generates graph embeddings for blockchain addresses through incremental training. The system combines a dynamic Node2Vec learning algorithm with a distributed MapReduce approach to generate Node2Vec embeddings efficiently. It performs an initial training of the Node2Vec model, then incrementally adds addresses that have transacted since the last training run. This addresses the challenges posed by the large scale and dynamically evolving nature of blockchain transaction graphs. Because the Node2Vec embeddings capture the evolving transaction patterns of an address, including patterns employed by fraudsters, the system is better able to detect fraudulent addresses and transactions.
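The random-walk step at the heart of Node2Vec can be sketched as follows. This is a simplified, single-machine illustration, not Coinbase's implementation: the function names, the toy graph, and the choice to regenerate walks only from newly active addresses are assumptions about what an incremental update might look like. The parameters p and q are the standard Node2Vec return and in-out parameters.

```python
import random


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """One second-order biased walk over an adjacency map {node: neighbors}."""
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = sorted(graph.get(current, ()))
        if not neighbors:
            break  # dead end: no outgoing edges
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))  # first step is unbiased
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbors:
            if candidate == previous:
                weights.append(1.0 / p)  # step back to the previous node
            elif candidate in graph.get(previous, ()):
                weights.append(1.0)      # stay close to the previous node
            else:
                weights.append(1.0 / q)  # move further away
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def walks_for_addresses(graph, addresses, walks_per_node=2, length=5, seed=0):
    """'Map' step: generate walks starting from each given address."""
    rng = random.Random(seed)
    return [
        node2vec_walk(graph, address, length, rng=rng)
        for address in addresses
        for _ in range(walks_per_node)
    ]


# Initial run over a toy graph with made-up addresses.
toy_graph = {"A": {"B", "C"}, "B": {"C"}, "C": {"A"}}
initial_walks = walks_for_addresses(toy_graph, ["A", "B", "C"])

# Later, a new address D transacts; only D is walked in this update.
toy_graph["D"] = {"A"}
toy_graph["C"].add("D")
incremental_walks = walks_for_addresses(toy_graph, ["D"])
print(incremental_walks)
```

In a full pipeline, the walks would be fed to a skip-gram model to learn the embeddings, and the per-address walk generation is the part that lends itself to being distributed across workers.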

What does it all mean for you? 

The blockchain address risk scoring system is used in multiple Coinbase products to detect and address potential risk associated with cryptocurrency transactions and to better secure customer funds. This innovation has already saved millions of dollars for our customers by protecting their funds from fraudsters. It aids in preventing fraudulent activities and alerts users when they interact with high-risk blockchain addresses.

Together, these algorithmic advancements in detecting fraud in the crypto world underscore Coinbase's commitment to user security and our continuous innovation in the field of cryptocurrency. With the addition of a Risk Scoring system for detecting fraudulent transactions, you can invest and transact with increased confidence, knowing that Coinbase is vigilantly working to keep your crypto assets safe. 

Special thanks to Prof. Bhaskar Krishnamachari, Arjun Maheswaram and Varsha Mahadevan for their work on this piece. Additionally, I express gratitude to the ML team for their dedicated efforts and to the Coinbase Risk Team for their valuable partnership.

Coinbase practices and processes may evolve or change over time. Posts may not be updated to reflect those changes and should only be read to represent practices at the time of publication.

*Risk scores are numerical scores between 0 and 1 that denote the risk associated with a blockchain address.