Dimensions of design space for data availability

The problem of data availability must be solved in order to unlock huge increases in blockchain throughput. Data availability sampling (DAS) is a hot topic, but it is only one dimension of the multi-dimensional design space for data availability. Verifiable information dispersal (VID) is another, underrated dimension of this space. In this post we shine a spotlight on the distinction between DAS and VID, and we guide you along Espresso’s path through this design space.

The data availability problem

A core challenge for public blockchains is to scale to support billions of users with high data throughput and low latency. In such a demanding performance regime, most network nodes do not have enough resources to process a new block’s entire payload in the short time before it is finalized by the network. Instead, this payload data must be stored off-chain by specialized storage nodes. A data availability (DA) solution is needed to guarantee that a new block will be finalized only if its payload is available for retrieval by anyone who requests it.

All existing solutions for DA begin with an erasure code: the block’s payload is supplemented with redundant data so that the entire payload can be recovered from any sufficiently large subset of the erasure-coded data. This erasure encoding ensures that no small set of rogue storage nodes could cause the loss of precious payload data, either by accident or by attack.
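To make this concrete, here is a minimal sketch of a Reed-Solomon-style erasure code (an illustration only, not the scheme any particular network uses): the payload symbols become the coefficients of a polynomial over a prime field, more evaluations are published than there are coefficients, and any k of the n evaluations recover the payload by interpolation.

```python
# Minimal Reed-Solomon-style erasure code (illustration only, not a
# production scheme). The payload symbols are the coefficients of a
# polynomial over a prime field; we publish n > k evaluations, and any
# k of them recover the payload by Lagrange interpolation.

P = 2**31 - 1  # a Mersenne prime; real schemes use larger fields

def encode(payload: list[int], n: int) -> list[tuple[int, int]]:
    """Evaluate the degree-(k-1) polynomial at x = 1..n, where k = len(payload)."""
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(payload)) % P)
            for x in range(1, n + 1)]

def decode(shares: list[tuple[int, int]], k: int) -> list[int]:
    """Recover the k payload coefficients from any k shares."""
    xs, ys = zip(*shares[:k])
    coeffs = [0] * k
    for j in range(k):
        basis, denom = [1], 1          # Lagrange basis polynomial for point j
        for m in range(k):
            if m == j:
                continue
            # Multiply basis by (x - xs[m]), tracking coefficients.
            basis = [((basis[i - 1] if i > 0 else 0)
                      - xs[m] * (basis[i] if i < len(basis) else 0)) % P
                     for i in range(len(basis) + 1)]
            denom = denom * (xs[j] - xs[m]) % P
        inv = pow(denom, P - 2, P)     # field inverse via Fermat's little theorem
        for i in range(k):
            coeffs[i] = (coeffs[i] + ys[j] * basis[i] * inv) % P
    return coeffs

payload = [11, 22, 33]                     # k = 3 payload symbols
shares = encode(payload, n=6)              # 6 coded shares
assert decode(shares[3:], k=3) == payload  # recover from the last 3 shares alone
```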

What is data availability sampling (DAS)?

Data availability sampling (DAS) is a “random audit” strategy for DA in which some nodes in the network—call them sampling nodes—each select a small number of random pieces of erasure-coded payload data and query the storage nodes for those pieces. If the storage nodes respond successfully to sufficiently many of these queries, then it is cryptographically certain that the payload is available. This certainty grows with the number and diversity of sampling nodes.

It does not take much bandwidth or compute to run a sampling node. A sampling node could, for example, be a light client run by an end user. In a sense, each additional light client that participates as a sampling node makes a meaningful contribution to the security of the network.
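Here is a back-of-the-envelope calculation of why a handful of samples suffices. It assumes (as the erasure code guarantees) that an attacker must withhold a constant fraction f of the coded chunks to make the payload unrecoverable:

```python
# Detection probability for a single sampling node (a sketch, assuming an
# attacker must withhold at least a fraction f of the erasure-coded chunks
# to make the payload unrecoverable; each uniformly random sample then
# hits withheld data with probability >= f).

def detection_probability(f: float, samples: int) -> float:
    """P(at least one of `samples` independent random queries goes unanswered)."""
    return 1 - (1 - f) ** samples

# With a rate-1/2 erasure code, an attacker must withhold over half of the
# coded chunks (f >= 0.5), so even 30 samples from a single light client
# detect the attack with probability > 1 - 2**-30.
print(detection_probability(0.5, 30))  # 0.9999999990686774
```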

DAS is employed by Celestia and it is a part of Ethereum’s danksharding proposal.

What is verifiable information dispersal (VID)?

Verifiable information dispersal (VID) is another strategy for DA where the erasure-coded payload is partitioned into small shares. These shares are dispersed among the storage nodes so that

  1. The full payload can be reconstructed from a sufficiently large subset of the shares. Thus, the payload remains available even when some storage nodes are unwilling or unable to provide their shares on request.
  2. Individual shares can be verified against a block commitment, as in the sketch below. This property protects against corrupted data from malicious storage nodes or a malicious block disperser.
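Property 2 can be instantiated in several ways. As a minimal sketch (not Espresso’s actual scheme, which uses more sophisticated commitments), here is how a Merkle tree over the shares lets each storage node verify its share against a block commitment:

```python
import hashlib

# Minimal sketch of share verification via a Merkle commitment (illustration
# only; production VID schemes typically use more sophisticated commitments).

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Commitment to the list of shares: the root of a binary Merkle tree."""
    layer = [h(leaf) for leaf in leaves]
    while len(layer) > 1:
        if len(layer) % 2:                      # duplicate last node if odd
            layer.append(layer[-1])
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

def merkle_proof(leaves: list[bytes], index: int) -> list[bytes]:
    """Sibling hashes on the path from leaf `index` to the root."""
    layer = [h(leaf) for leaf in leaves]
    proof = []
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])
        sibling = index + 1 if index % 2 == 0 else index - 1
        proof.append(layer[sibling])
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
        index //= 2
    return proof

def verify_share(root: bytes, share: bytes, index: int, proof: list[bytes]) -> bool:
    """A storage node checks its share against the block commitment."""
    node = h(share)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

shares = [f"share-{i}".encode() for i in range(8)]
root = merkle_root(shares)  # published as the block commitment
assert verify_share(root, shares[5], 5, merkle_proof(shares, 5))
```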

VID’s security rests on an honest majority of storage nodes retaining and serving their shares. Proof-of-stake networks such as Ethereum already rely on honest-majority assumptions, so VID solves DA under existing threat models.

VID is employed by Espresso as part of our three-layer DA solution called Tiramisu. VID is also part of Ethereum’s danksharding proposal.

DAS and VID are independent

It might seem at first glance that DAS and VID are competing solutions to DA, but in fact these two strategies are completely independent of each other. A network such as Ethereum could benefit from deploying either solution, or both. These DA solutions differ in both performance and security guarantees. They are orthogonal dimensions in the DA design space, as the table below shows.

|          | Uses VID              | No VID   |
| -------- | --------------------- | -------- |
| Uses DAS | Ethereum danksharding | Celestia |
| No DAS   | Espresso              | Bitcoin  |

Verifiable information dispersal (VID)

Scalability for high-performance blockchains

VID unlocks a huge improvement in both:

  1. The bandwidth required to run a node, and
  2. The total network communication summed over all nodes.

Thus, a network can use VID to squeeze much more throughput out of a given amount of bandwidth.

Without VID, the full payload must be broadcast to all storage nodes. In this case, the throughput of the entire network is limited by the throughput of each individual storage node. Such a severe limitation is incompatible with high network throughput.

By contrast, with VID the size of the message sent to each storage node is proportional to [payload size] divided by [number of storage nodes]. This is a significant asymptotic improvement. For example, a 1GB payload is too large to send to all storage nodes. But in a network with 1000 storage nodes, each node receives only ~1MB of data. Each additional storage node in the network reduces the bandwidth required of every storage node.

Similar observations can be made about the total network communication summed over all storage nodes. Without VID, total network communication scales with [number of storage nodes] times [payload size]. This cost can be prohibitive in a network with many storage nodes. For example, Ethereum currently has almost 1 million validators. Total communication to broadcast a 32MB block payload among all Ethereum validators is on the order of 32TB.

By contrast, VID guarantees DA using total network communication proportional to [payload size] with no dependence whatsoever on the number of storage nodes. Ethereum could use VID to improve total network communication by a factor of ~1 million.
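The arithmetic behind these examples is a quick sanity check (decimal units, 1 GB = 1000 MB):

```python
# Back-of-the-envelope arithmetic for the bandwidth claims above
# (decimal units: 1 GB = 1000 MB, 1 TB = 1000 GB).

MB, GB, TB = 10**6, 10**9, 10**12

# Per-node bandwidth with VID: ~[payload size] / [number of storage nodes].
print(1 * GB / 1000 / MB, "MB per storage node")                       # 1.0 MB

# Total communication without VID: every node receives the full payload.
print(32 * MB * 1_000_000 / TB, "TB to broadcast to all validators")   # 32.0 TB

# Total communication with VID: proportional to the payload alone,
# independent of the number of storage nodes.
print(32 * MB / MB, "MB total, regardless of node count")              # 32.0 MB
```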

Data availability for fast finality

Espresso’s VID scheme needs only one round of communication between the block proposer and the storage nodes: the disperser delivers a share to each storage node, and each storage node responds with an attestation that it has received and validated its share. Most consensus protocols need at least one round of communication per block anyway, so this VID traffic fits naturally into existing message flows.

This single round of communication is fast, even on the critical path to consensus: a VID execution can conclude before a new block has been finalized. That’s a remarkable property, because it means DA is guaranteed at the very moment a block is finalized.
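Conceptually, the dispersal round looks like the following sketch. All names here are hypothetical placeholders; a real implementation signs attestations and verifies shares against a cryptographic commitment:

```python
from dataclasses import dataclass

# Conceptual sketch of the single dispersal round (hypothetical names; a
# real implementation signs attestations and uses a real commitment scheme).

@dataclass
class Share:
    index: int
    data: bytes
    proof: bytes  # purported proof that `data` is consistent with the commitment

@dataclass
class Attestation:
    node_id: int
    share_index: int  # "I have received and validated this share"

STORAGE: dict[int, Share] = {}

def share_is_valid(commitment: bytes, share: Share) -> bool:
    # Placeholder check; a real scheme verifies `share.proof` cryptographically.
    return share.proof == commitment

def storage_node(node_id: int, commitment: bytes, share: Share) -> Attestation | None:
    """A storage node validates, stores, and attests to its share."""
    if not share_is_valid(commitment, share):
        return None                # refuse to attest to corrupted data
    STORAGE[share.index] = share   # retain the share for later retrieval
    return Attestation(node_id, share.index)

def disperse(commitment: bytes, shares: list[Share], threshold: int) -> bool:
    """One round: deliver each node its share, collect attestations.
    DA is certified once at least `threshold` nodes have attested."""
    attestations = [storage_node(i, commitment, s) for i, s in enumerate(shares)]
    return sum(a is not None for a in attestations) >= threshold

commitment = b"block-commitment"
shares = [Share(i, f"chunk-{i}".encode(), commitment) for i in range(10)]
assert disperse(commitment, shares, threshold=7)
```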

Data availability sampling (DAS)

Detection of lapses in data retention

In any blockchain with off-chain data storage, it is natural to have a data retention policy that stipulates how long the data for a newly finalized block must remain available before storage nodes are permitted to delete it to free storage for future blocks. Given such a policy, a natural challenge is how to enforce it, or at least how to detect a breach of it.

DAS is well suited to this challenge. Sampling nodes can continue to post sampling queries to storage nodes throughout the retention period. If data goes missing during that period, then the sampling nodes are certain to discover it.
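A sampling node’s retention audit could look like the following loop. This is a hypothetical sketch; `query_storage` and `raise_alarm` are placeholder callables, not any real client’s API:

```python
import random

# Hypothetical sketch of a retention audit. `query_storage` and `raise_alarm`
# are injected placeholders, not any real client's API.

def audit_block(height: int, num_chunks: int, samples: int,
                query_storage, raise_alarm) -> None:
    """Re-sample a finalized block; alarm on any chunk that goes missing."""
    for index in random.sample(range(num_chunks), samples):
        if query_storage(height, index) is None:
            raise_alarm(height, index)

def audit_retention_window(oldest: int, newest: int, num_chunks: int,
                           samples: int, query_storage, raise_alarm) -> None:
    """Audit every block still inside the retention window, not just the newest."""
    for height in range(oldest, newest + 1):
        audit_block(height, num_chunks, samples, query_storage, raise_alarm)

# Demo with stub storage that has lost chunk 7 of block 42; sampling all 64
# chunks here guarantees detection.
audit_retention_window(
    oldest=40, newest=42, num_chunks=64, samples=64,
    query_storage=lambda h, i: None if (h, i) == (42, 7) else b"chunk-data",
    raise_alarm=lambda h, i: print(f"ALARM: block {h} is missing chunk {i}"))
```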

For example, Ethereum’s danksharding proposal stipulates that storage nodes shall make data available for not just the current block, but all previous blocks in the current epoch. The purpose of this policy is to allow third parties adequate opportunity to retrieve new block data before storage nodes delete it. Accordingly, danksharding also stipulates that sampling nodes shall enforce this policy by querying all blocks in the current epoch.

Attributable fault for malicious nodes

DAS makes it possible to attribute fault to a malicious storage node that hides data. In order to explain this property, we must first take a detour to discuss what “attributable fault” means and why it matters.

The security of decentralized networks such as Ethereum typically relies on an honest-majority assumption. For example, a typical security guarantee has the form: “Blockchain B is secure if at least ⅔ of all validators (or, perhaps, of all stake) honestly follow the protocol for B.”

Some malicious behaviors are attributable. For example, a malicious validator might sign attestations to two conflicting blocks. In this case, the conflicting signatures are cryptographic proof of malicious behavior: a neutral observer can be convinced to assign blame after the fact.

Other malicious behaviors are not so easily attributable. For example, suppose a malicious participant M has declined to acknowledge messages from an honest participant H, causing H to raise an alarm about M. After the fact, a neutral observer cannot distinguish between:

  1. M is malicious and withheld requested data.
  2. H is malicious: M delivered the requested data but H falsely claims otherwise.

This is the fundamental challenge described in this 2017 presentation by Vitalik Buterin on the DA problem.

In most cases, it does not matter whether the network can attribute fault to the malicious parties. It matters only that the number of malicious parties complies with the honest-majority assumption. For example, a network could be secure if at most ⅓ of all validators are malicious, regardless of whether it’s possible to attribute fault to any of those validators.

Nonetheless, the ability to attribute fault has value. For example, attributable fault allows the network to punish bad actors by, say, slashing a security deposit. Such slashing increases the economic incentive for network participants to be honest.

For another example, it could be argued that an honest-majority assumption is more likely to be violated if fault cannot be attributed to the malicious actors. Perhaps fewer than ⅓ of all validators would be dishonest if their dishonesty could be cryptographically proven. But perhaps a ⅔ fraction of those same validators could be bribed into dishonesty on the condition that no neutral observer could distinguish honest actors from dishonest ones.

The importance of fault attribution in extreme network conditions

It is under extreme network conditions such as these that DAS could provide a security benefit that is not known to be achievable any other way: DAS can turn a data-hiding attack from non-attributable into attributable.

For example, suppose an attacker who controls many storage nodes attempts to hide data. Recall our earlier observation: if the attacker hides only a small piece of the payload, then the missing piece can be recovered thanks to the erasure encoding. Thus, in order to successfully hide even a tiny amount of payload data, the attacker must withhold a large amount of erasure-coded data. Given sufficiently many sample requests from light clients, it is cryptographically certain that many light clients will detect malicious behavior by the storage nodes the attacker controls.

We also observed that if only a single light client L raises an alarm about a storage node S, then it is impossible to distinguish whether S or L is malicious. But alarms about S from many independent light clients might constitute sufficient evidence against S to convince a neutral observer after the fact.
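To see why many independent alarms are persuasive, consider this sketch: if each honest light client independently detects the attack with probability p (per the sampling argument earlier), the number of alarms follows a binomial distribution, and a large shortfall of alarms is astronomically unlikely:

```python
from math import comb

# Sketch: with n independent light clients, each detecting a data-hiding
# attack with probability p, the probability that at least m of them raise
# an alarm is a binomial tail.

def prob_at_least(n: int, m: int, p: float) -> float:
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

# With 1000 clients each detecting with probability 0.999, seeing at least
# 900 alarms is a near-certainty; a lone accuser is weak evidence, but
# hundreds of independent accusers are compelling.
print(prob_at_least(1000, 900, 0.999))  # ~1.0
```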

Thus, if we accept the premise that an honest-majority assumption could get violated in the presence of non-attributable faults, then any process that successfully assigns blame for an otherwise non-attributable fault could bolster the security of the network. DAS is one example of such a process for data hiding attacks.

DAS on the critical path to consensus

The process by which DAS attributes fault for a data-hiding attack adds an extra round of messages among many nodes on the network. This additional multi-party back-and-forth could severely degrade the performance of a latency-sensitive process.

Consider the critical path to consensus for finalization of a new block. High-performance blockchains are designed with the explicit goal to minimize this important time-to-finality metric. The decision to put a latency-heavy protocol such as DAS on the critical path is not to be made lightly.

For example, Ethereum’s danksharding proposal as described by Valeria Nikolaenko and Dan Boneh calls for each validator V, after receiving a candidate block from the proposer, to collect successful DA samples from many other validators before V replies to the proposer with its attestation.

For another example, Celestia does no DA sampling on the critical path to consensus. Instead, all DA sampling is done by light nodes after blocks are already finalized.

DAS off the critical path

What is the utility of DAS for blocks that have already been finalized? Specifically, what actions should a blockchain protocol take if sampling nodes discover that a finalized block has missing data? The defective block cannot be removed from the chain, as that would violate finality. It seems that the only remaining course of action is to invoke so-called “social consensus” (or, jokingly, “twitter consensus”), wherein protocol participants hold a discussion outside the protocol on how to proceed.

Conclusion: DA in Espresso

VID and DAS are defenses against data-hiding attacks on high-performance blockchains. These defenses are independent of each other—a network could use either or both.

At Espresso we have chosen VID for its security, its minimal back-and-forth communication among nodes on the critical path, and its asymptotic improvements in bandwidth use. We leave open the option of adding DAS in future iterations of Espresso.
