Zero-Knowledge, the Blockchain, and Media: Is Data Monetization Dead?

Preface: This paper was originally written as a final essay for a Big Data class in my MS program. It is therefore written in a largely academic tone and may be a bit dense, and it uses little web3-native terminology. Proceed assuming the audience for this paper is not a web3-literate professional, but a web3-curious disciple.

Thank you for reading, and feel free to reach out to discuss more (& maybe inspire my MS capstone thesis)!


The rise of cryptocurrency and web3 in the wider social consciousness has, understandably, led to investigations and experiments into how a fully decentralized digital society could manifest. Newer technologies like non-fungible tokens (NFTs) have begun to embed themselves more fully in the media landscape, with mainstream adoption already under way. In light of this, there is also a rethinking of what decentralized media platforms look like: whether they be replacements for social media platforms, or media-hosting platforms for video or audio.

One of the more fundamental aspects of our current media platforms (Netflix, Instagram, Spotify, etc.) is that they are centralized stores of big data. Millions of users have voluntarily entrusted these media companies with enough data that the companies have, in turn, been able to make assumptions about and characterizations of their users in order to improve their services, or to sell that data to third parties through data monetization. Enough of our data, whether personally identifiable information or behavioral trends, has been distributed to other companies, which have used it to churn out more media to sustain consumption and influence public opinion.

One of the basic tenets of decentralization is increased privacy. In a financial setting, which is what most people think of when they hear “crypto”, this is equivalent to not trusting “big banks” with an individual’s money, and instead holding custody of one’s own assets and being able to transact with others “privately”, without an intermediary third party. For media platforms, eliminating an intermediary third party would be tricky, as it may translate to eliminating the platform itself.

How does the media industry adapt to this landscape, then?

In this essay, I will explore a possible avenue for web3 media companies to nurture a decentralized media landscape while still engaging with data, despite anonymization and augmented privacy. I will first introduce the concept of zero-knowledge proofs in a basic manner to illustrate how one may create a platform that preserves user privacy. Then, I will explore what data is available to media platforms despite self-custodial data privacy. Finally, I will explore how media platforms could use this data to preserve user privacy while continuing a legacy of data-driven decision making and monetization.

Zero-Knowledge Proofs and Privacy

Zero-knowledge proofs (ZKPs) are a cryptographic way to confirm that a party knows some data X without that party revealing the value of X itself. Suppose there are two parties: the verifier V and the prover P. P claims to know some data X that would grant them access to a facility. V needs to verify that P actually does know X, without learning what X’s value is, as P refuses to reveal it.

To engage with this, both V and P must participate in a proof system. The proof is established when (1) if P does know the requisite data for access, V is convinced of it (completeness), and (2) if P does not know the requisite data, P cannot fake this knowledge to V, except with negligible probability (soundness). In other words, the proof must be complete and sound (or valid).

The final criterion for this proof to be considered “zero-knowledge” is that V learns nothing about X beyond the fact that P knows it. As a consequence, even after being convinced, V cannot pass themselves off as knowing the data either.

The entire proof process is a mathematical one. A simple way to picture it: if P is required to know the solution to, for example, a sudoku puzzle in order to enter a room, P is able to prove this knowledge to V without revealing the solution itself, and V must be confident enough in P’s proof to believe that P has this knowledge.
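To make the prover and verifier roles concrete, here is a minimal Python sketch of one well-known interactive proof of knowledge, the Schnorr identification protocol. It is not taken from this essay's sources: the secret X is an exponent x, the public value is y = G^x mod P, and the parameters P, Q, and G are toy values chosen purely for illustration (far too small to be secure). The protocol is zero-knowledge only against a verifier who follows the rules (honest-verifier zero-knowledge).

```python
import secrets

# Toy public parameters, assumed for illustration only (not secure sizes).
P = 2039   # prime modulus, P = 2*Q + 1
Q = 1019   # prime order of the subgroup generated by G
G = 4      # generator of the order-Q subgroup modulo P


def keygen():
    """Prover's secret x and the matching public value y = G^x mod P."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def commit():
    """Prover picks a fresh random nonce r and sends t = G^r mod P."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)


def challenge():
    """Verifier replies with a random challenge c."""
    return secrets.randbelow(Q)


def respond(r, c, x):
    """Prover answers s = r + c*x mod Q; the random r masks x."""
    return (r + c * x) % Q


def verify(y, t, c, s):
    """Verifier accepts if G^s == t * y^c (mod P)."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P


def run_round(x, y):
    """One full commit / challenge / response exchange."""
    r, t = commit()
    c = challenge()
    s = respond(r, c, x)
    return verify(y, t, c, s)


if __name__ == "__main__":
    x, y = keygen()
    # An honest prover passes every round (completeness).
    print(all(run_round(x, y) for _ in range(20)))
```

The verifier only ever sees t, c, and s, none of which reveals x on its own because the nonce r is fresh and random each round. A prover who answers with the wrong secret fails the check for any nonzero challenge, which is the soundness side of the sudoku analogy.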

An extension of ZKPs is the zk-SNARK (zero-knowledge succinct non-interactive argument of knowledge), which has the additional requirements of being “succinct” and “non-interactive.” These essentially mean that the proof is small and quick to verify, and that it requires only a single message from the prover to the verifier as opposed to a back-and-forth dialogue.
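The “non-interactive” property can be illustrated without building a full zk-SNARK. The sketch below applies the Fiat-Shamir heuristic to a Schnorr-style proof of knowledge of a discrete logarithm: the verifier's random challenge is replaced by a hash of the prover's own message, so the whole proof fits in one message. This is an illustration of non-interactivity only. Real zk-SNARKs rely on different machinery and can prove far more general statements. The parameters P, Q, and G are toy values assumed for the example.

```python
import hashlib
import secrets

# Toy parameters, assumed for illustration only (not secure sizes).
P, Q, G = 2039, 1019, 4   # G generates a subgroup of prime order Q modulo P


def hash_to_challenge(y, t):
    """Derive the challenge from a hash instead of asking the verifier."""
    data = f"{G}|{y}|{t}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(x):
    """Produce a one-message proof of knowledge of x, where y = G^x mod P."""
    y = pow(G, x, P)
    r = secrets.randbelow(Q)          # fresh random nonce
    t = pow(G, r, P)                  # commitment
    c = hash_to_challenge(y, t)       # challenge computed by the prover
    s = (r + c * x) % Q               # response
    return y, (t, s)


def verify(y, proof):
    """Check the proof with no further messages from the prover."""
    t, s = proof
    c = hash_to_challenge(y, t)
    return pow(G, s, P) == (t * pow(y, c, P)) % P
```

A prover would call `prove(x)` once and hand the result to anyone; each recipient can run `verify` independently, which is the property that makes such proofs practical to post on a public ledger.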

The goal of ZKPs is to establish a completely private interaction between two parties, wherein only one party knows certain data and is not required to expose that data in order to gain access to a service or other information. In administrative environments, this can amount to not needing to disclose one’s social security number in order to prove creditworthiness when opening a new credit line, or not needing to reveal one’s passport details in order to prove citizenship or identity. At a smaller scale, this can amount to not needing to enter a username and password to prove to a platform that one owns an account, or has permission to access certain data or content. From a security perspective, this is safer because no password is transmitted over a network, which removes the risk of that password being intercepted or leaked in transit.

Selective disclosure, which ZKPs can permit, allows users to take ownership of their data and make active decisions about its availability to platforms and services. It is not enforced privacy, but rather elective privacy: consumers are given the opportunity to choose which of their data to make available and which to keep private, while continuing to access the same functions and services. As decentralization is tied to privacy, this is conceivably the path forward for the next generation of data-based applications and platforms.
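Selective disclosure can be sketched with a simpler tool than a full ZKP: salted hash commitments. In the Python sketch below, each attribute is committed to separately, and the user later reveals only the attributes (and salts) they choose. This is a stand-in technique used for illustration, not a zero-knowledge proof, and the function names and sample attributes are hypothetical. In a real system the commitments would also be signed by a trusted issuer.

```python
import hashlib
import secrets


def commit(name, value, salt):
    """Salted hash that hides the value until the holder reveals it."""
    return hashlib.sha256(f"{salt}|{name}|{value}".encode()).hexdigest()


def issue(attributes):
    """Return the public commitments and the holder's private salts."""
    salts = {name: secrets.token_hex(16) for name in attributes}
    commitments = {
        name: commit(name, value, salts[name])
        for name, value in attributes.items()
    }
    return commitments, salts


def disclose(attributes, salts, chosen):
    """Holder side: reveal only the chosen attributes, with their salts."""
    return {name: (attributes[name], salts[name]) for name in chosen}


def check(commitments, disclosed):
    """Platform side: every revealed value must match its commitment."""
    return all(
        commit(name, value, salt) == commitments[name]
        for name, (value, salt) in disclosed.items()
    )


if __name__ == "__main__":
    attrs = {"country": "US", "age_over_18": True, "email": "user@example.com"}
    commitments, salts = issue(attrs)
    shared = disclose(attrs, salts, ["age_over_18"])  # email stays private
    print(check(commitments, shared))
```

The platform learns that the user is over 18 and nothing else; the undisclosed attributes remain hidden behind their salted hashes.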

In light of this increased privacy, what data remains to be collected and analyzed for media platforms and companies?

Blockchain Analysis and What We Have Left

Suppose there exists a web3 media platform whose primary function is to host video for streaming. Users sign into the platform, which is available through a monthly or annual subscription, by signing a blockchain contract that affirms their subscription and permits them access to the platform. The platform only has their wallet address, which is used to sign the contract for entry, and can verify their subscription thanks to ZKP payment confirmation protocols.

In this scenario, the media company does not have any of the data that media companies are able to collect at present: no demographic information, no email addresses or passwords, and no personally identifiable information. The only data they have on a single user is that user’s wallet address (“wallet” is a bit of a misnomer here).

While at first glance it may seem like wallet addresses are a bit useless for gathering information about their owners, they are quite useful for establishing one’s on-chain identity and network. This is because wallet addresses are identifiers on the blockchain: the public ledger of cryptographic interactions. Therefore, the data they make public is related to wallet activity: how many assets are contained in this wallet, which other wallets does this wallet transact with, and what services does this wallet frequently interact with? The answers to these questions can help build a holistic profile of a single address, and fall in line with what some call blockchain analytics.
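The three questions above can be answered mechanically once a list of public transfers is in hand. The Python sketch below is a minimal illustration: the `Transfer` record and its fields are hypothetical stand-ins for whatever a real chain indexer would return, and the addresses in the usage example are made up.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """Hypothetical, simplified view of one public on-chain transfer."""
    sender: str
    receiver: str
    asset: str
    amount: int        # in the asset's smallest unit
    to_contract: bool  # True when the receiver is a contract (a service)


def profile(address, transfers):
    """Summarize holdings, counterparties, and services for one address."""
    balances = Counter()
    counterparties = Counter()
    contracts = Counter()
    for tx in transfers:
        if tx.sender == address:
            balances[tx.asset] -= tx.amount
            counterparties[tx.receiver] += 1
            if tx.to_contract:
                contracts[tx.receiver] += 1
        elif tx.receiver == address:
            balances[tx.asset] += tx.amount
            counterparties[tx.sender] += 1
    return {
        "balances": dict(balances),
        "top_counterparties": counterparties.most_common(3),
        "top_contracts": contracts.most_common(3),
    }


if __name__ == "__main__":
    history = [
        Transfer("0xFAUCET", "0xUSER", "ETH", 10, False),
        Transfer("0xUSER", "0xSTREAM", "ETH", 2, True),
        Transfer("0xUSER", "0xSTREAM", "ETH", 2, True),
        Transfer("0xUSER", "0xFRIEND", "ETH", 1, False),
    ]
    print(profile("0xUSER", history))
```

Nothing in this profile requires a name, email, or any off-chain identifier; it is built entirely from activity that the ledger already makes public.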

In some cases, it is also possible to associate an identity with an address. This is possible in two ways: either soul-bound NFTs, or verifiable credentials. A soul-bound NFT is permanently tied to an address and cannot be transferred or destroyed, thereby acting as an identifier for the address’ ownership. On the other hand, verifiable credentials are digital assets that contain information about an address that has been verified by a trusted source (and thus can be considered credible). An address that has any of these assets can be associated with an identity.

The scope of this paper is restricted to wallets without such identifiers, as these identifying assets make the collection of data very similar to that of web2 media platforms.

So where does this leave media platforms and companies, which have made big data an integral part of their revenue and content generation pipelines? Quite simply, web3 media platforms and companies must focus a large part of their research and development on blockchain analysis in order to gain insights about their consumer base in a manner that respects the privacy provided by ZKPs but also maintains an echo of the web2 data model.

The New “Big Data” and Media

Considering that the data media companies would now have to work with is public blockchain data, which is restricted to publicly signed transactions and interactions, we must rethink the insights we can gain from it. While demographics and other identity-first data metrics are now scarce, there are more avenues available to explore behavioral data and the significance of blockchain permanence; that is, having one’s assets and interactions immortalized for analysis.

With data related to transaction history (asset ownership), connected addresses (networks and popularly-interacted contracts and services), and identity (if available), blockchain analysis by media platforms can produce insights that influence subscription analytics, billing patterns, and market research.

Subscription analytics for a web3-first platform would be less about demographic-based subscription (country of residence, age, gender, etc) and more about networks and assets. Assuming most addresses don’t have identity-related assets associated with them, a preliminary analysis of their on-chain data would reveal which contracts (services) they actively interact with most (that is, either transact with or access in a permanent-enough manner to be recorded on-chain). This would enable platforms to create a profile of a user based solely on their contributions and on-chain activity, which is more active than passive.
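As a rough illustration of network-based (rather than demographic-based) subscription analytics, the Python sketch below computes what share of a platform's subscriber addresses has interacted with each contract. The input mapping, the contract labels, and the `exclude` parameter (for addresses carrying identity assets, which are out of scope here) are all hypothetical.

```python
from collections import Counter


def contract_reach(interactions, exclude=frozenset()):
    """Share of subscriber addresses that touched each contract at least once.

    `interactions` maps a subscriber address to the contracts it has called.
    `exclude` lists addresses to leave out, such as those with identity assets.
    """
    addresses = [a for a in interactions if a not in exclude]
    if not addresses:
        return {}
    counts = Counter()
    for address in addresses:
        # Count each contract once per address, however often it was called.
        counts.update(set(interactions[address]))
    total = len(addresses)
    return {contract: n / total for contract, n in counts.most_common()}


if __name__ == "__main__":
    subscribers = {
        "0x1": ["nft_market", "dex", "dex"],
        "0x2": ["nft_market"],
        "0x3": ["dao"],
        "0x4": ["nft_market", "dao"],
    }
    print(contract_reach(subscribers))
```

A result such as “three quarters of subscribers use the same NFT marketplace” plays the role that an age or country breakdown plays in a web2 subscription report.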

Billing patterns can also be informed by this data analysis: knowing the average net asset worth of a platform’s users could inform billing strategies and prices. In a self-perpetuating cycle, this would result in market forces, rather than intra-market collusion, playing a larger role in deciding subscription rates.
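One hedged way to picture this: derive a candidate subscription price from the typical on-chain holdings of the subscriber base. The Python sketch below is an assumption-laden toy, not a pricing model. The `share`, `floor`, and `cap` parameters are hypothetical, and it uses the median rather than a mean, since a handful of very large wallets would otherwise dominate the figure.

```python
from statistics import median


def suggest_price(holdings, share=0.01, floor=5.0, cap=20.0):
    """Candidate price: a fixed share of the median holding, clamped.

    `holdings` is a list of per-address asset values in one common unit.
    `share`, `floor`, and `cap` are illustrative knobs, not recommendations.
    """
    if not holdings:
        return floor
    return min(cap, max(floor, share * median(holdings)))


if __name__ == "__main__":
    # One very large wallet barely moves the median-based suggestion.
    print(suggest_price([500, 800, 900, 1200, 100000]))
```

Any real strategy would need to account for asset volatility and for users who spread holdings across several addresses, neither of which this sketch attempts.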

Finally, being able to understand the behavioral patterns of the platform’s user base can better inform media programming. On a strategic level, this information is more active than the passive streaming data we currently deal with through Nielsen or self-reported streaming activity via existing platforms, for certain definitive actions are made permanent on the blockchain unless actively bypassed. Ownership of NFT assets, interactions with other wallets or contracts, and any on-chain public information can help platforms better assess what their next strategic move can be based on a real-time feedback loop.

Of course, the association of identity-based information with an address can still provide media platforms with the same data with which they currently work. There are provisions to allow selective de-anonymization even within a ZKP framework for subscription and payment: voluntary selective de-anonymization, for example, can always be made available to media consumers who would be amenable to revealing their identity to the platform in exchange for personalized services, as opposed to those who would prefer to remain anonymous to the platform.

This can also serve as a precaution in the event of fraud or suspected misconduct: in an effort to prove innocence or rightful ownership, a user could elect to voluntarily and selectively de-anonymize themselves. They would no longer have the protection of a privacy-protecting protocol, as their other identifying information (like an email address and password, or any multi-factor authentication-based information) would be linked, though not necessarily on-chain, with their wallet address. Technology like decentralized identifiers could be integrated into the platform for these purposes, and could remain obscured until voluntary disclosure.

Overall, the advantages for a media company pivoting to adapt to a decentralized landscape far outweigh the limitations it may experience in terms of a data-driven decision making process. A ZKP-first, private approach to user information increases trust and transparency with a user base that is pointedly looking for platforms that do not act as interceptors for personally identifiable information. This creates an overall more secure environment for not only financial transactions for subscriptions but also interactions with the platform and any affiliated contracts, potentially also reducing centralized abuse of power. Finally, ZKP sign-ins can be integrated with other web3 decentralized apps, creating a more seamless experience for users who are looking for a cross-functional and cooperative landscape.

Further Research

While this paper sets up a preliminary working framework for a privacy-first media approach in order for media companies to pivot and adapt to a decentralized web3 landscape, there is still a long way to go in establishing the protocols, procedures, and standards for such platforms. Further research and experimentation are required to fully integrate zero-knowledge proof-based sign-in structures with voluntary de-anonymization via verifiable credentials and decentralized identifiers. Additionally, there needs to be a detailed delineation of on- and off-chain data and what exactly remains accessible as part (or not part) of either. Finally, there needs to be more critical examination of how ZKPs can fit into media companies’ data practices and user interactions.
