Blockchain client types
January 26, 2022
There are four basic blockchain client types, each with a different role within a network. Understanding the differences between them can help you choose the right one for your needs.
Blockchains are decentralized networks that require many parties to sync with them in a peer-to-peer manner. Each blockchain may support multiple software implementations, or clients, that enable users to take part. A user can run one of these implementations in one of several configurations, called client types or nodes, and these nodes are the parties that together form the decentralized network.
Typically, each node stores its own copy of the blockchain and keeps track of incoming transactions to have an up-to-date view of the network. These nodes are necessary to ensure the correctness of the chain, prevent malicious activity from occurring, and maintain decentralized consensus.
However, for some users the storage and computing requirements of blockchains can be a barrier to entry, and it is technically difficult to run nodes that require a full download of the blockchain. New ways to sync with the chain securely and trustlessly are therefore needed.
The four basic client types of a blockchain network are:
Full nodes
Archive (or archival) nodes
Light clients
Stateless clients
Each has its own characteristics, and each can be used in a different way within the context of a blockchain network.
The most standard type of node is a full node. Full nodes store all the blockchain data on disk and verify the rules of the network — which include tasks such as participating in block validation, receiving and verifying all transactions, and generally serving the network with data.
Full nodes must also store a copy of the state, a data structure that holds the status of users in the network, such as the UTXO set in Bitcoin or all the accounts and balances in Ethereum (where the state is represented as a modified Merkle Patricia Trie). Full nodes are distinct from miners: miners select and order transactions received from the network into candidate blocks and then perform the mining process of solving cryptographic puzzles.
While clients in general must follow a formal specification, a given network can be open to different client implementations. For example, the Ethereum network consists mostly of Geth and Parity nodes (with Geth making up the majority), and eth2 will support a wide variety of client implementations, including Prysm, Lighthouse, and Lodestar.
However, full nodes must keep track of a significant quantity of information, requiring large amounts of storage and bandwidth to operate (SSDs are needed because of the volume of read/write operations). For example, as of early October 2020, the Bitcoin blockchain took up around 300 GB, and the Ethereum blockchain used around 500 GB on Geth.
Moreover, although full nodes require 24/7 uptime and a high level of technical knowledge to maintain, there is generally no direct economic incentive to run one. As a result, many of the users running full nodes are businesses, such as exchanges or infrastructure providers, that rely on the other benefits of full nodes.
While running a full node is technically challenging, there are benefits to doing so. First, running a full node is the most secure way to access a network. It guarantees maximum self-sovereignty because you can trustlessly verify that all network rules are being followed. By running a full node you also improve the decentralization and overall health of the network by acting as a data provider and protecting other clients from being tricked by malicious nodes or miners.
Full nodes help secure the network by verifying all transactions rather than just those that are relevant to them, and further secure the network by alerting other client types of invalid blocks. In some networks, certain other client types rely on full nodes to verify transactions and cannot access the blockchain without connecting to a full node.
In some networks you may also be able to receive rewards for running a full node. For example, Celo aims to address the lack of economic incentives for operating a full node: its network allows individuals who run non-validating full nodes to set gateway fees for answering requests and forwarding transactions on behalf of other types of clients. We hope to see further experimentation to incentivize participants to run full nodes in future blockchain networks.
While full nodes already store a large amount of data, archival nodes take storage even further by retaining everything included in a full node along with an archive of the historical states of the chain.
Full node | Archival node
Stores the current balance of any account in the chain | Stores the full balance history of any account in the chain
Stores the state of the last few blocks in the network | Stores the history of every state change in the network
Has the information needed to re-compute historical network data | Has historical network data stored and does not need to re-compute it
An archival node could be described as a full node with a massive cache of historical data. Importantly, however, an archival node does not provide any more validation or security than a full node.
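To make the comparison above concrete, here is a minimal Python sketch using a hypothetical toy ledger in which each block is a list of balance changes. `FullNode` keeps every block but only the last few state snapshots, so older balances must be re-computed by replaying blocks; `ArchivalNode` keeps every historical state and simply looks it up. Class and method names are illustrative assumptions, not any real client's API, and real clients store state in tries rather than Python dicts.

```python
# Hypothetical toy ledger: a block is a list of (account, balance_change) pairs.
Block = list[tuple[str, int]]
State = dict[str, int]


def apply_block(state: State, block: Block) -> State:
    """Return a new state with the block's balance changes applied."""
    new_state = dict(state)
    for account, delta in block:
        new_state[account] = new_state.get(account, 0) + delta
    return new_state


class FullNode:
    """Keeps all blocks, but only state snapshots for the most recent blocks."""

    def __init__(self, keep: int = 2):
        self.blocks: list[Block] = []
        self.recent: list[State] = [{}]  # recent[-1] is the current state
        self.keep = keep

    def add_block(self, block: Block) -> None:
        self.blocks.append(block)
        self.recent.append(apply_block(self.recent[-1], block))
        self.recent = self.recent[-(self.keep + 1):]  # prune old snapshots

    def balance_at(self, account: str, height: int) -> int:
        tip = len(self.blocks)
        if not 0 <= height <= tip:
            raise ValueError("unknown height")
        offset = tip - height
        if offset < len(self.recent):
            return self.recent[-1 - offset].get(account, 0)
        # Snapshot was pruned: re-compute by replaying blocks from genesis.
        state: State = {}
        for block in self.blocks[:height]:
            state = apply_block(state, block)
        return state.get(account, 0)


class ArchivalNode:
    """Keeps the state after every block, so historical queries are lookups."""

    def __init__(self):
        self.history: list[State] = [{}]  # history[h] = state after h blocks

    def add_block(self, block: Block) -> None:
        self.history.append(apply_block(self.history[-1], block))

    def balance_at(self, account: str, height: int) -> int:
        if not 0 <= height < len(self.history):
            raise ValueError("unknown height")
        return self.history[height].get(account, 0)
```

For example, after feeding both nodes the blocks `[("alice", 10)]`, `[("alice", -3), ("bob", 3)]`, and `[("bob", 5)]` with `keep=1`, both return 10 for `balance_at("alice", 1)`, but the full node gets there by replaying a block while the archival node reads a stored snapshot. The answers are identical, which mirrors the point that archival nodes add no extra validation.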
As of October 2020, archival nodes on Ethereum occupy more than 5.3 TB of data. With such a large volume of data, syncing an archival node can take approximately two weeks. Using an infrastructure-as-a-service product can reduce that time dramatically: the Coinbase Cloud ETH archival node is production-ready in a few hours. Because of the effort involved in spinning up an archival node on one’s own, only a very small number of them run on the network, and they are typically operated by entities such as block explorers, data analytics companies, and infrastructure providers.
Because of their heavy storage requirements and the constant uptime needed to remain functioning, most users choose not to run full nodes. Light clients, however, improve the accessibility of blockchain networks for resource-constrained devices, offering strong security while requiring little computing power.
As a low-resource node, a light client allows users to sync with a blockchain in a cryptographically secure manner without having to store the whole blockchain. Light clients can be used to find out the state of an account, check that a transaction was confirmed, or watch for logged events.
Light clients operate by downloading and verifying a chain of block headers and requesting any other relevant information, such as transaction data, from full nodes. The header is the smallest unit that forms a chain, and each header refers back to the previous block’s header. The block header stores a condensed version of information in the block, including the hash of the previous block, the timestamp, and the Merkle tree root.
This Merkle root acts as a compact fingerprint of the block’s contents: it commits to the set of all transactions in the block (and, in networks such as Ethereum, separate roots in the header also commit to the state). The goal of a light client is to verify and store the headers, then check any information it receives against the relevant Merkle root. Only the specific portion of the state that is relevant to the light client needs to be verified, and proofs received from full nodes can be checked against the root in the block header.
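The header-chain check described above can be sketched in Python. This is a simplified illustration under stated assumptions: the `Header` fields mirror the three items named in the text, the hash is SHA-256 over a JSON encoding (real chains hash a fixed binary format), and consensus checks such as proof of work or validator signatures are omitted entirely.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    prev_hash: str    # hash of the previous block's header
    timestamp: int
    merkle_root: str  # commitment to the block's contents

    def hash(self) -> str:
        # Simplified encoding; real chains serialize headers in a fixed binary layout.
        encoded = json.dumps([self.prev_hash, self.timestamp, self.merkle_root]).encode()
        return hashlib.sha256(encoded).hexdigest()


def verify_header_chain(headers: list[Header], trusted_hash: str) -> bool:
    """Check that each header refers back to the hash of the header before it.

    `trusted_hash` is the hash of a header the client already trusts
    (for example, the genesis header).
    """
    expected_prev = trusted_hash
    for header in headers:
        if header.prev_hash != expected_prev:
            return False
        expected_prev = header.hash()
    return True
```

Because each header commits to its predecessor's hash, changing any field of an earlier header changes its hash and breaks the link from the next header, which is why a light client can detect tampering from headers alone.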
While light clients do not need to be run constantly, they must connect to intermediary full nodes to request data and interact with the blockchain. Verification is trust-minimized: proofs can be verified regardless of whether the light clients trust the full node.
In Bitcoin, this method is known as Simplified Payment Verification (SPV). SPV clients accept downloaded headers as long as they belong to the longest chain (strictly, the chain with the most accumulated proof of work). For any given transaction, a full node provides the light client with an SPV proof: the Merkle path from the transaction to the Merkle root in the block header, which is the data needed to verify that the transaction is included in that block. This method can also be used for cross-chain interactions such as bridges or sidechains.
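A minimal Python sketch of the SPV-style inclusion check follows. It assumes a simplified binary Merkle tree hashed with a single SHA-256 over raw bytes; Bitcoin actually uses double SHA-256 and specific byte-ordering conventions, so this is illustrative rather than byte-compatible. The light client would compare the computed root with the Merkle root in a header it has already verified.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level: list[bytes]) -> list[bytes]:
    if len(level) % 2 == 1:
        level = level + [level[-1]]  # duplicate the last node on odd levels, as Bitcoin does
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: list[bytes]) -> bytes:
    if not leaves:
        raise ValueError("no leaves")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_path(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Full-node side: sibling hashes from leaf to root (True = sibling on the right)."""
    level = [h(leaf) for leaf in leaves]
    path = []
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 == 1 else level
        sibling = index ^ 1
        path.append((padded[sibling], sibling > index))
        level = _next_level(level)
        index //= 2
    return path


def verify_inclusion(leaf: bytes, path: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Light-client side: hash up the path and compare with the header's root."""
    node = h(leaf)
    for sibling, sibling_on_right in path:
        node = h(node + sibling) if sibling_on_right else h(sibling + node)
    return node == root
```

The proof contains one hash per tree level, so its size grows logarithmically with the number of transactions in the block, which is what keeps SPV proofs small.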
Light clients are well suited for low-capacity users, such as those on smartphones or browser extensions, because they can still maintain strong security assurances about the state of a chain. While light clients do not write data to the network, they do make blockchains accessible to a wider variety of users.
Light client designs
The design space of a light client is enormous and there is always room for improvement and more features. Light clients can borrow techniques from cryptography and distributed systems to construct complex yet innovative solutions.
Below are some examples of cutting-edge light client designs.
Celo’s ultralight client, Plumo, uses a mix of cryptographic techniques to achieve lightweight validation. In general, SPV-style verification for proof-of-stake networks is expensive: for each header, users need to verify that validators holding two-thirds of the stake have signed the block, and blocks occur frequently.
Celo improves on this with epoch-based syncing, in which only one header is downloaded per epoch. In Celo, the validator set changes only once per epoch, and an epoch is one day, so the load on light clients is already drastically reduced: they need to verify headers only once a day rather than once per block.
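The epoch-based approach can be sketched as a loop that checks one header per epoch and then hands trust over to the validator set that header announces. This is a simplified illustration, not Celo's implementation: it assumes the `signers` of each header have already been authenticated cryptographically, and the `EpochHeader` fields and function names are hypothetical. For scale, assuming a 5-second block time (an illustrative assumption), a one-day epoch spans 86,400 / 5 = 17,280 blocks, so a light client checks one header where per-block verification would check 17,280.

```python
from dataclasses import dataclass


@dataclass
class EpochHeader:
    epoch: int
    signers: frozenset[str]          # validators whose signatures were verified (assumed)
    next_validators: dict[str, int]  # validator -> stake for the following epoch


def has_quorum(signers: frozenset[str], validators: dict[str, int]) -> bool:
    """True if the signers hold at least two-thirds of the total stake."""
    total = sum(validators.values())
    signed = sum(stake for v, stake in validators.items() if v in signers)
    # Integer comparison avoids floating-point rounding at the 2/3 boundary.
    return total > 0 and 3 * signed >= 2 * total


def epoch_sync(genesis_validators: dict[str, int],
               headers: list[EpochHeader]) -> dict[str, int]:
    """Verify one header per epoch and return the latest trusted validator set."""
    validators = genesis_validators
    for header in headers:
        if not has_quorum(header.signers, validators):
            raise ValueError(f"epoch {header.epoch}: not enough stake signed")
        validators = header.next_validators  # trust passes to the next set
    return validators
```

The key design point is the hand-off: each epoch header is judged only against the validator set the client already trusts, so the client never needs the intermediate blocks within an epoch.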
Cryptographic primitives such as BLS signatures can be used to aggregate all of the validators’ signatures into one, and SNARKs (proofs that verify the correctness of a computation without the verifier having to execute it) can be submitted by full nodes to prove that the light client protocol was executed correctly. This process consists of checking the signatures on the last header of each epoch plus any validator set changes. Using SNARKs, one could (relatively) quickly prove validator set changes spanning months.
Light clients are also an active area of research. For example, the storage and bandwidth requirements of SPV scale linearly with chain length and can still be a burden on a larger blockchain. Flyclient is an efficient solution for light client block header verification. It improves on a protocol called Non-Interactive Proofs of Proof of Work (NIPoPoWs) by being compatible with variable difficulty and hashrates, and its inclusion proofs are 10 times smaller than those of previous solutions. Flyclient works by downloading only a logarithmic number of block headers (instead of every one) while storing only a single block header between executions.
With Flyclient, one can prove that the whole chain is valid using as little information as possible, enabling easier cross-chain interoperability in decentralized protocols that require light client verification. Zcash specifically plans to use Flyclient research to implement a ZEC-ETH bridge (tZEC will implement light client verification of the Zcash blockchain inside an Ethereum smart contract).
Blockchains depend on a shared state that corresponds to the values in a block at any given time. As explained earlier, the state changes as transactions are executed and is typically stored in a tree data structure such as a Merkle tree or Merkle Patricia Trie. However, the state can become very large, and rebuilding the tree for verification can be expensive. This can make node sync times very long, making nodes harder to operate and ultimately reducing how many of them are run.
A research initiative called “Stateless Ethereum” aims to make nodes in Ethereum easier and faster to spin up by requiring the bare minimum amount of information to ensure the validity of the state. This could enable nodes to begin functioning in minutes rather than days, which could serve as an enormous improvement on the status quo.
The most traditional way to sync a node is by using the full sync method, which involves starting at the genesis block to sync. Alternatively, fast sync could be used, which starts requesting blocks from a trusted checkpoint and then switches to full sync as soon as it catches up.
The closest iteration of Stateless Ethereum in research has been the exploration of a “beam sync” mode, which only pulls the data it needs to execute changes to the state, rather than downloading the whole state.
In beam sync, clients begin watching and executing transactions as they happen, and request a witness (proof) for each block for any information it does not have. The client can then gradually rely more on its locally computed state as it builds up its own history of transactions.
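The on-demand behavior of beam sync can be illustrated with a toy Python client. This is a sketch under simplifying assumptions: state is a flat map of account balances, `fetch_witness` is a hypothetical callback standing in for a network request to a peer, and the verification of each witness against the block's state root, which a real client must perform, is omitted.

```python
from typing import Callable


class BeamSyncClient:
    """Toy client that starts with no state and fetches only what it touches."""

    def __init__(self, fetch_witness: Callable[[str], int]):
        self.state: dict[str, int] = {}     # locally known balances
        self.fetch_witness = fetch_witness  # hypothetical peer request
        self.witness_requests = 0

    def read(self, account: str) -> int:
        if account not in self.state:
            # Not known locally yet: request it. A real client would verify
            # this witness against the block's state root before using it.
            self.state[account] = self.fetch_witness(account)
            self.witness_requests += 1
        return self.state[account]

    def execute_transfer(self, sender: str, receiver: str, amount: int) -> None:
        # Read both accounts first so any missing state is fetched up front.
        if self.read(sender) < amount:
            raise ValueError("insufficient balance")
        self.read(receiver)
        self.state[sender] -= amount
        self.state[receiver] += amount
```

For example, if a peer holds balances `{"alice": 100, "bob": 5, "carol": 0}`, executing transfers from alice to bob (10), bob to carol (5), and alice to carol (1) costs three witness requests in total: the third transfer touches only accounts the client has already pulled in, so it runs entirely on locally computed state.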
It is prudent to note that statelessness is a spectrum: a truly stateless client would not store any state itself; instead it would store only the latest transactions, together with witnesses, to execute the next block.
In practice, there will probably be a spectrum of stateful nodes, some providing full information and some receiving selected portions of it. For example, full state nodes would compute a witness and attach it to a block; partial state nodes would only keep state for a few blocks, or would simply watch the state relevant to them and request the rest of the data from witnesses. (Zero state nodes would rely entirely on witnesses to verify blocks.)
Fraud/validity proofs and data availability
Most light clients operate on the assumption that the majority of miners/validators are honest, and simply check that the miners/validators have supported a given block rather than verifying the block themselves. However, a set of malicious nodes might be able to attack light clients and submit invalid blocks.
One way to protect against such behavior is to introduce a system of alerts so that honest full nodes can report an invalid block to light clients. Specifically, fraud proofs can be used to report dishonest behavior and thereby relax the honest-majority assumption. If a verifying node processes a block and finds that it is invalid, it can create a “fraud proof” containing information from the block and Merkle tree to convince any light client that the block is invalid.
Light clients can take this proof and verify the block’s invalidity themselves, even if they are given no other data. With fraud proofs, light clients gain much stronger assurance about the state of a blockchain and a better security model, one that holds as long as there is at least one honest node (1 of N). In a stateless validation setting, light clients would need to verify individual blocks only if they hear alarms (and where the alarms are verifiable).
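A fraud proof can be sketched as follows, under heavy simplifying assumptions: the state commitment is a single hash of the whole state, the proof carries the entire pre-state plus the disputed transaction, and transactions are simple transfers. Real designs instead include Merkle proofs for just the state entries the transaction touches, but the light client's check is the same in spirit: confirm the pre-state matches the committed root, re-execute, and compare with the root the block claims. All function names here are hypothetical.

```python
import hashlib
import json

State = dict[str, int]
Transfer = tuple[str, str, int]  # (sender, receiver, amount)


def state_root(state: State) -> str:
    # Toy commitment over the whole state; real systems use a Merkle trie.
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


def apply_tx(state: State, tx: Transfer) -> State:
    sender, receiver, amount = tx
    if amount < 0 or state.get(sender, 0) < amount:
        raise ValueError("invalid transaction")
    new_state = dict(state)
    new_state[sender] = new_state.get(sender, 0) - amount
    new_state[receiver] = new_state.get(receiver, 0) + amount
    return new_state


def verify_fraud_proof(pre_root: str, claimed_post_root: str,
                       pre_state: State, tx: Transfer) -> bool:
    """Return True if the proof shows the block's claimed post-state is wrong."""
    if state_root(pre_state) != pre_root:
        return False  # proof does not match the committed pre-state
    try:
        correct_post = apply_tx(pre_state, tx)
    except ValueError:
        return True   # the block included a transaction that cannot execute
    return state_root(correct_post) != claimed_post_root
```

The light client needs nothing beyond the two roots from the header and the proof itself, which is what lets a single honest full node (the 1 of N) protect every light client that receives its alert.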
However, what happens if an attacker creates an invalid block but does not release data about it (called the data availability problem)? Fishermen — actors who check for invalid blocks — would not have enough data to prove that the block is invalid. Furthermore, the resulting game between fisherman and attacker could become complicated, as the attacker could publish the data at any time if accused of bad behavior.
One solution is to create “proofs” of data availability using erasure codes (e.g., Reed-Solomon codes), a coding technique that extends a piece of information into many pieces (chunks) such that the original can be reconstructed from only a subset of them. Using erasure codes, light clients could verify the data availability of a block probabilistically by downloading only a few randomly chosen chunks of data.
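The erasure-coding idea can be sketched with a small Reed-Solomon-style code in Python: the k data symbols are treated as values of a polynomial of degree below k, which is evaluated at n > k points, so any k of the n chunks determine the polynomial and hence the data. This sketch assumes symbols are integers below a chosen prime `P` and omits the commitments that production data availability schemes use to prove chunks are consistent. The sampling helper assumes a 2x extension (n = 2k): to make the data unrecoverable an attacker must withhold more than half of the chunks, so each uniformly random sample hits a missing chunk with probability above 1/2, and s independent samples all miss with probability below (1/2)^s, which is under one in a billion at s = 30.

```python
P = 2**61 - 1  # a Mersenne prime; all arithmetic is done modulo P


def lagrange_eval(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate at x the unique polynomial through `points` (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data: list[int], n: int) -> list[tuple[int, int]]:
    """Extend k data symbols to n chunks (x, value); chunks 0..k-1 equal the data."""
    points = list(enumerate(data))
    return [(x, lagrange_eval(points, x)) for x in range(n)]


def decode(chunks: list[tuple[int, int]], k: int) -> list[int]:
    """Recover the k data symbols from any k distinct chunks."""
    if len(chunks) < k:
        raise ValueError("not enough chunks to reconstruct")
    points = chunks[:k]
    return [lagrange_eval(points, x) for x in range(k)]


def undetected_probability(samples: int) -> float:
    """Upper bound on missing an unrecoverable block with a 2x extension."""
    return 0.5 ** samples
```

For example, encoding `[5, 17, 42, 99]` into 8 chunks and then discarding any 4 of them still lets `decode` return the original list.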
Another source of improvement is using SNARKs or STARKs to create validity proofs, which are cryptographically verifiable proofs that allow block producers to prove to clients that a block satisfies some arbitrarily complex conditions. The light client would simply need to download the header, verify the proof, and then randomly sample some Merkle tree branches of erasure-coded data for data availability checks.
It is clear that a wide ecosystem of client types is required to serve a variety of blockchain users and use cases, and to maintain truly healthy blockchain networks.
While full nodes must exist for decentralized blockchain networks, the barriers to entry remain high, and not every user can run a full node.
Light and stateless clients are therefore necessary to improve the accessibility and decentralization of blockchains by increasing participation — and are more convenient for most users. The easier validation is, the greater the chance that new nodes can sync with a chain, which makes the network more resilient to attacks.
The future of blockchain clients is exciting. As new research is implemented, we will see designs that are drastically more functional, performant, and accessible. Novel cryptographic tools such as SNARKs and STARKs will accelerate the progress of light clients and lead to improvements in areas such as sharding and cross-chain protocols — or simply enable use cases that have not yet been imagined.
As these systems progress, through the development of different technologies and the adoption of new trust models, our definition of validation, and even of decentralization, may change. Lightweight verification has already enabled more robust social coordination with 1-of-N trust models, and we now realize that not everyone is required to validate everything in a blockchain.
And so, perhaps some day, anyone with a smartphone and internet connection will be securely connected to blockchain networks, and we will all have access to a truly global financial system.
Query & Transact
Anyone building products and services with blockchain data needs access to reliable read/write nodes, as nodes are the access points into the entire ecosystem. But developing and managing decentralized and resilient node infrastructure in-house is not a simple task — especially when trying to support a diverse range of blockchain protocols. Relying on a provider that rate-limits data usage, or only supports a few networks, is not an option for many businesses that anticipate rapid growth.
Query & Transact by Coinbase Cloud is an infrastructure product designed for companies and entrepreneurs who face the challenges of developing and managing decentralized and resilient node infrastructure as they build secure Web 3.0 applications. QT provides a robust link between off-chain systems and blockchain networks, making it significantly easier for companies to add blockchain support and expand their protocol coverage without investing in developing in-house capabilities.
“Whether you’re an established company looking to free up engineering resources, or you’re a team just getting started in the blockchain space and you want to build something with secure and reliable access to these chains, QT clusters make it fast and easy to build on any of these blockchains. We’re thankful to our early QT customers who helped make this a better product than anything that exists in the ecosystem.”
— Joe Lallouz, Coinbase Cloud
In addition to offering full nodes, we’re proud to offer archival nodes as part of QT clusters. QT Archival includes complete block-by-block information about the state of the network, data that is not included in a full node’s ledger. Data and machine learning (ML) companies can make use of archival nodes without the hassle and expense of maintaining them in-house.