Dimensions of design space for data availability

The problem of data availability must be solved in order to unlock huge increases in blockchain throughput. Data availability sampling (DAS) is a hot topic, but it is only one dimension of the multi-dimensional design space for data availability. Verifiable information dispersal (VID) is another, underrated dimension of this space. In this post we shine a spotlight on the distinction between DAS and VID, and we guide you along Espresso’s path through this design space.

The data availability problem

A core challenge for public blockchains is to scale to support billions of users with high data throughput and low latency. In such a demanding performance regime, most network nodes do not have enough resources to process a new block’s entire payload in the short time before it is finalized by the network. Instead, this payload data must be stored off-chain by specialized storage nodes. A data availability (DA) solution is needed to guarantee that a new block will be finalized only if its payload is available for retrieval by anyone who requests it.

All existing solutions for DA begin with an erasure code: the block’s payload is supplemented with redundant data so that the entire payload can be recovered from any sufficiently large subset of the erasure-coded data. This erasure encoding ensures that no small set of rogue storage nodes could cause the loss of precious payload data, either by accident or by attack.
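To make this concrete, here is a minimal sketch of a Reed-Solomon-style erasure code (an illustration only, not the scheme any particular network uses): the payload symbols become the coefficients of a polynomial over a prime field, more evaluations are published than there are coefficients, and any k of the n evaluations recover the payload by interpolation.

```python
# Minimal Reed-Solomon-style erasure code (illustration only, not a
# production scheme). The payload symbols are the coefficients of a
# polynomial over a prime field; we publish n > k evaluations, and any
# k of them recover the payload by Lagrange interpolation.

P = 2**31 - 1  # a Mersenne prime; real schemes use larger fields

def encode(payload: list[int], n: int) -> list[tuple[int, int]]:
    """Evaluate the degree-(k-1) polynomial at x = 1..n, where k = len(payload)."""
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(payload)) % P)
            for x in range(1, n + 1)]

def decode(shares: list[tuple[int, int]], k: int) -> list[int]:
    """Recover the k payload coefficients from any k shares."""
    xs, ys = zip(*shares[:k])
    coeffs = [0] * k
    for j in range(k):
        basis, denom = [1], 1          # Lagrange basis polynomial for point j
        for m in range(k):
            if m == j:
                continue
            # Multiply basis by (x - xs[m]), tracking coefficients.
            basis = [((basis[i - 1] if i > 0 else 0)
                      - xs[m] * (basis[i] if i < len(basis) else 0)) % P
                     for i in range(len(basis) + 1)]
            denom = denom * (xs[j] - xs[m]) % P
        inv = pow(denom, P - 2, P)     # field inverse via Fermat's little theorem
        for i in range(k):
            coeffs[i] = (coeffs[i] + ys[j] * basis[i] * inv) % P
    return coeffs

payload = [11, 22, 33]                     # k = 3 payload symbols
shares = encode(payload, n=6)              # 6 coded shares
assert decode(shares[3:], k=3) == payload  # recover from the last 3 shares alone
```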

What is data availability sampling (DAS)?

Data availability sampling (DAS) is a “random audit” strategy for DA in which some nodes in the network—call them sampling nodes—each select a small number of random pieces of erasure-coded payload data and query the storage nodes for those pieces. If the storage nodes respond successfully to sufficiently many of these queries, then it is cryptographically certain that the payload is available. This certainty grows with the number and diversity of sampling nodes.

It does not take much bandwidth or compute to run a sampling node. A sampling node could, for example, be a light client run by an end user. In a sense, each additional light client that participates as a sampling node makes a meaningful contribution to the security of the network.
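Here is a back-of-the-envelope calculation of why a handful of samples suffices. It assumes (as the erasure code guarantees) that an attacker must withhold a constant fraction f of the coded chunks to make the payload unrecoverable:

```python
# Detection probability for a single sampling node (a sketch, assuming an
# attacker must withhold at least a fraction f of the erasure-coded chunks
# to make the payload unrecoverable; each uniformly random sample then
# hits withheld data with probability >= f).

def detection_probability(f: float, samples: int) -> float:
    """P(at least one of `samples` independent random queries goes unanswered)."""
    return 1 - (1 - f) ** samples

# With a rate-1/2 erasure code, an attacker must withhold over half of the
# coded chunks (f >= 0.5), so even 30 samples from a single light client
# detect the attack with probability > 1 - 2**-30.
print(detection_probability(0.5, 30))  # 0.9999999990686774
```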

DAS is employed by Celestia and it is a part of Ethereum’s danksharding proposal.

What is verifiable information dispersal (VID)?

Verifiable information dispersal (VID) is another strategy for DA where the erasure-coded payload is partitioned into small shares. These shares are dispersed among the storage nodes so that

  1. The full payload can be reconstructed from a sufficiently large subset of the shares. Thus, the payload remains available even when some storage nodes are unwilling or unable to provide their shares on request.
  2. Individual shares can be verified against a block commitment, as in the sketch below. This property protects against corrupted data from malicious storage nodes or a malicious block disperser.
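Property 2 can be instantiated in several ways. As a minimal sketch (not Espresso’s actual scheme, which uses more sophisticated commitments), here is how a Merkle tree over the shares lets each storage node verify its share against a block commitment:

```python
import hashlib

# Minimal sketch of share verification via a Merkle commitment (illustration
# only; production VID schemes typically use more sophisticated commitments).

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Commitment to the list of shares: the root of a binary Merkle tree."""
    layer = [h(leaf) for leaf in leaves]
    while len(layer) > 1:
        if len(layer) % 2:                      # duplicate last node if odd
            layer.append(layer[-1])
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

def merkle_proof(leaves: list[bytes], index: int) -> list[bytes]:
    """Sibling hashes on the path from leaf `index` to the root."""
    layer = [h(leaf) for leaf in leaves]
    proof = []
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])
        sibling = index + 1 if index % 2 == 0 else index - 1
        proof.append(layer[sibling])
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
        index //= 2
    return proof

def verify_share(root: bytes, share: bytes, index: int, proof: list[bytes]) -> bool:
    """A storage node checks its share against the block commitment."""
    node = h(share)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

shares = [f"share-{i}".encode() for i in range(8)]
root = merkle_root(shares)  # published as the block commitment
assert verify_share(root, shares[5], 5, merkle_proof(shares, 5))
```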

VID’s security rests on an honest majority of storage nodes retaining and serving their shares. Proof-of-stake networks such as Ethereum already rely on honest-majority assumptions, so VID solves DA under existing threat models.

VID is employed by Espresso as part of our three-layer DA solution called Tiramisu. VID is also part of Ethereum’s danksharding proposal.

DAS and VID are independent

It might seem at first glance that DAS and VID are competing solutions to DA, but in fact these two strategies are completely independent of each other. A network such as Ethereum could benefit from deploying either solution, or both. These DA solutions differ in both performance and security guarantees. They are orthogonal dimensions in the DA design space, as the table below shows.

|          | Uses VID              | No VID   |
| -------- | --------------------- | -------- |
| Uses DAS | Ethereum danksharding | Celestia |
| No DAS   | Espresso              | Bitcoin  |

Verifiable information dispersal (VID)

Scalability for high-performance blockchains

VID unlocks a huge improvement in both:

  1. The bandwidth required to run a node, and
  2. The total network communication summed over all nodes.

Thus, a network can use VID to squeeze much more throughput out of a given amount of bandwidth.

Without VID, the full payload must be broadcast to all storage nodes. In this case, the throughput of the entire network is limited by the throughput of each individual storage node. Such a severe limitation is incompatible with high network throughput.

By contrast, with VID the size of the message sent to each storage node is proportional to [payload size] divided by [number of storage nodes]. This is a significant asymptotic improvement. For example, a 1GB payload is too large to send to all storage nodes. But in a network with 1000 storage nodes, each node receives only ~1MB of data. Each additional storage node in the network reduces the bandwidth required of every storage node.

Similar observations can be made about the total network communication summed over all storage nodes. Without VID, total network communication scales with [number of storage nodes] times [payload size]. This cost can be prohibitive in a network with many storage nodes. For example, Ethereum currently has almost 1 million validators. Total communication to broadcast a 32MB block payload among all Ethereum validators is on the order of 32TB.

By contrast, VID guarantees DA using total network communication proportional to [payload size] with no dependence whatsoever on the number of storage nodes. Ethereum could use VID to improve total network communication by a factor of ~1 million.
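The arithmetic behind these examples is a quick sanity check (decimal units, 1 GB = 1000 MB):

```python
# Back-of-the-envelope arithmetic for the bandwidth claims above
# (decimal units: 1 GB = 1000 MB, 1 TB = 1000 GB).

MB, GB, TB = 10**6, 10**9, 10**12

# Per-node bandwidth with VID: ~[payload size] / [number of storage nodes].
print(1 * GB / 1000 / MB, "MB per storage node")                       # 1.0 MB

# Total communication without VID: every node receives the full payload.
print(32 * MB * 1_000_000 / TB, "TB to broadcast to all validators")   # 32.0 TB

# Total communication with VID: proportional to the payload alone,
# independent of the number of storage nodes.
print(32 * MB / MB, "MB total, regardless of node count")              # 32.0 MB
```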

Data availability for fast finality

Espresso’s VID scheme needs only one round of communication between the block proposer and the storage nodes: the disperser delivers a share to each storage node, and each storage node responds with an attestation that it has received and validated its share. Most consensus protocols need at least one round of communication per block anyway, so this VID traffic fits naturally into existing message flows.

This single round of communication is fast, even on the critical path to consensus: a VID execution can conclude before a new block has been finalized. That’s a remarkable property, because it means DA is guaranteed at the very moment a block is finalized.
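Conceptually, the dispersal round looks like the following sketch. All names here are hypothetical placeholders; a real implementation signs attestations and verifies shares against a cryptographic commitment:

```python
from dataclasses import dataclass

# Conceptual sketch of the single dispersal round (hypothetical names; a
# real implementation signs attestations and uses a real commitment scheme).

@dataclass
class Share:
    index: int
    data: bytes
    proof: bytes  # purported proof that `data` is consistent with the commitment

@dataclass
class Attestation:
    node_id: int
    share_index: int  # "I have received and validated this share"

STORAGE: dict[int, Share] = {}

def share_is_valid(commitment: bytes, share: Share) -> bool:
    # Placeholder check; a real scheme verifies `share.proof` cryptographically.
    return share.proof == commitment

def storage_node(node_id: int, commitment: bytes, share: Share) -> Attestation | None:
    """A storage node validates, stores, and attests to its share."""
    if not share_is_valid(commitment, share):
        return None                # refuse to attest to corrupted data
    STORAGE[share.index] = share   # retain the share for later retrieval
    return Attestation(node_id, share.index)

def disperse(commitment: bytes, shares: list[Share], threshold: int) -> bool:
    """One round: deliver each node its share, collect attestations.
    DA is certified once at least `threshold` nodes have attested."""
    attestations = [storage_node(i, commitment, s) for i, s in enumerate(shares)]
    return sum(a is not None for a in attestations) >= threshold

commitment = b"block-commitment"
shares = [Share(i, f"chunk-{i}".encode(), commitment) for i in range(10)]
assert disperse(commitment, shares, threshold=7)
```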

Data availability sampling (DAS)

Detection of lapses in data retention

In any blockchain with off-chain data storage, it is natural to have a data retention policy that stipulates how long the data for a newly finalized block must remain available before storage nodes are permitted to delete it to free storage for future blocks. Given such a policy, a natural challenge is how to enforce it, or at least how to detect a breach of it.

DAS is well suited to this challenge. Sampling nodes can continue to post sampling queries to storage nodes throughout the retention period. If data goes missing during that period, then the sampling nodes are certain to discover it.
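A sampling node’s retention audit could look like the following loop. This is a hypothetical sketch; `query_storage` and `raise_alarm` are placeholder callables, not any real client’s API:

```python
import random

# Hypothetical sketch of a retention audit. `query_storage` and `raise_alarm`
# are injected placeholders, not any real client's API.

def audit_block(height: int, num_chunks: int, samples: int,
                query_storage, raise_alarm) -> None:
    """Re-sample a finalized block; alarm on any chunk that goes missing."""
    for index in random.sample(range(num_chunks), samples):
        if query_storage(height, index) is None:
            raise_alarm(height, index)

def audit_retention_window(oldest: int, newest: int, num_chunks: int,
                           samples: int, query_storage, raise_alarm) -> None:
    """Audit every block still inside the retention window, not just the newest."""
    for height in range(oldest, newest + 1):
        audit_block(height, num_chunks, samples, query_storage, raise_alarm)

# Demo with stub storage that has lost chunk 7 of block 42; sampling all 64
# chunks here guarantees detection.
audit_retention_window(
    oldest=40, newest=42, num_chunks=64, samples=64,
    query_storage=lambda h, i: None if (h, i) == (42, 7) else b"chunk-data",
    raise_alarm=lambda h, i: print(f"ALARM: block {h} is missing chunk {i}"))
```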

For example, Ethereum’s danksharding proposal stipulates that storage nodes shall make data available for not just the current block, but all previous blocks in the current epoch. The purpose of this policy is to allow third parties adequate opportunity to retrieve new block data before storage nodes delete it. Accordingly, danksharding also stipulates that sampling nodes shall enforce this policy by querying all blocks in the current epoch.

Attributable fault for malicious nodes

DAS makes it possible to attribute fault to a malicious storage node that hides data. In order to explain this property, we must first take a detour to discuss what “attributable fault” means and why it matters.

The security of decentralized networks such as Ethereum typically relies on an honest-majority assumption. For example, a typical security guarantee has the form: “Blockchain B is secure if at least ⅔ of all validators (or, perhaps, of all stake) honestly follow the protocol for B.”

Some malicious behaviors are attributable. For example, a malicious validator might sign attestations to two conflicting blocks. In this case, the conflicting signatures are cryptographic proof of malicious behavior: a neutral observer can be convinced to assign blame after the fact.

Other malicious behaviors are not so easily attributable. For example, suppose a malicious participant M has declined to acknowledge messages from an honest participant H, causing H to raise an alarm about M. After the fact, a neutral observer cannot distinguish between:

  1. M is malicious and withheld requested data.
  2. H is malicious: M delivered the requested data but H falsely claims otherwise.

This is the fundamental challenge described in this 2017 presentation by Vitalik Buterin on the DA problem.

In most cases, it does not matter whether the network can attribute fault to the malicious parties. It matters only that the number of malicious parties complies with the honest-majority assumption. For example, a network could be secure if at most ⅓ of all validators are malicious, regardless of whether it’s possible to attribute fault to any of those validators.

Nonetheless, the ability to attribute fault has value. For example, attributable fault allows the network to punish bad actors by, say, slashing a security deposit. Such slashing increases the economic incentive for network participants to be honest.

For another example, it could be argued that an honest-majority assumption is more likely to be violated if fault cannot be attributed to the malicious actors. Perhaps fewer than ⅓ of all validators would be dishonest if their dishonesty could be cryptographically proven. But perhaps a ⅔ fraction of those same validators could be bribed into dishonesty on the condition that no neutral observer could distinguish honest actors from dishonest ones.

The importance of fault attribution in extreme network conditions

It is under extreme network conditions such as these that DAS could provide a security benefit that is not known to be achievable any other way: DAS can turn a data-hiding attack from non-attributable into attributable.

For example, suppose an attacker who controls many storage nodes attempts to hide data. Recall our earlier observation: if the attacker hides only a small piece of the payload, then the missing piece can be recovered thanks to the erasure encoding. Thus, in order to successfully hide even a tiny amount of payload data, the attacker must withhold a large amount of erasure-coded data. Given sufficiently many sample requests from light clients, it is cryptographically certain that many light clients will detect malicious behavior by the storage nodes the attacker controls.

We also observed that if only a single light client L raises an alarm about a storage node S, then it is impossible to distinguish whether S or L is malicious. But alarms about S from many independent light clients might constitute sufficient evidence against S to convince a neutral observer after the fact.
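To see why many independent alarms are persuasive, consider this sketch: if each honest light client independently detects the attack with probability p (per the sampling argument earlier), the number of alarms follows a binomial distribution, and a large shortfall of alarms is astronomically unlikely:

```python
from math import comb

# Sketch: with n independent light clients, each detecting a data-hiding
# attack with probability p, the probability that at least m of them raise
# an alarm is a binomial tail.

def prob_at_least(n: int, m: int, p: float) -> float:
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

# With 1000 clients each detecting with probability 0.999, seeing at least
# 900 alarms is a near-certainty; a lone accuser is weak evidence, but
# hundreds of independent accusers are compelling.
print(prob_at_least(1000, 900, 0.999))  # ~1.0
```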

Thus, if we accept the premise that an honest-majority assumption could get violated in the presence of non-attributable faults, then any process that successfully assigns blame for an otherwise non-attributable fault could bolster the security of the network. DAS is one example of such a process for data hiding attacks.

DAS on the critical path to consensus

The process by which DAS attributes fault for a data-hiding attack adds an extra round of messages among many nodes on the network. This additional multi-party back-and-forth could severely degrade the performance of a latency-sensitive process.

Consider the critical path to consensus for finalization of a new block. High-performance blockchains are designed with the explicit goal to minimize this important time-to-finality metric. The decision to put a latency-heavy protocol such as DAS on the critical path is not to be made lightly.

For example, Ethereum’s danksharding proposal as described by Valeria Nikolaenko and Dan Boneh calls for each validator V, after receiving a candidate block from the proposer, to collect successful DA samples from many other validators before V replies to the proposer with its attestation.

For another example, Celestia does no DA sampling on the critical path to consensus. Instead, all DA sampling is done by light nodes after blocks are already finalized.

DAS off the critical path

What is the utility of DAS for blocks that have already been finalized? Specifically, what actions should a blockchain protocol take if sampling nodes discover that a finalized block has missing data? The defective block cannot be removed from the chain, as that would violate finality. It seems that the only remaining course of action is to invoke so-called “social consensus” (or, jokingly, “twitter consensus”), wherein protocol participants hold a discussion outside the protocol on how to proceed.

Conclusion: DA in Espresso

VID and DAS are defenses against data-hiding attacks on high-performance blockchains. These defenses are independent of each other—a network could use either or both.

At Espresso we have chosen VID for its security, its minimal back-and-forth communication among nodes on the critical path, and its asymptotic improvements in bandwidth use. We leave open the option of adding DAS in future iterations of Espresso.
