CoSnarks in Action at Devcon7

It's a wrap! Devcon7 flew by, and with everyone back in the TACEO office we are taking some time to recap a week full of events, talks, and cryptographic connections. Allowing you to make those connections in a secure and privacy-preserving manner was our main focus during the event! Or, to be more precise, it was allowing you to get nice swag for meeting as many people as possible ;)

We will dive into more details later in this post, but Devcon7 also marks a big milestone for TACEO, and we want to celebrate with you. In collaboration with Cursive and PSE, we built the first production-ready coSNARKs and generated the first proofs with real-world data in our alphanet.

Announcement of Alpha Network launch during Devcon 7

Valid Taps and the Leaderboard

First things first: before we talk about the alphanet, we want to explain what exactly we built for Devcon. The Cursive team built an impressive demo, highlighting the capabilities of a lot of cool crypto building blocks. Visitors to the Cursive booth received an NFC sticker or card with cryptographic key material on it. Pairing the sticker with your mobile phone allowed you to register at Cursive's backend with the public key you obtained from the sticker. Locally in the browser, you provided contact information, like your Telegram handle, X account, notes from your encounter, and whatnot.

To get the contact information of a new friend, you simply tapped your phone on their sticker. If the other person agreed, you got their contact information, signed by their private key. The (plain) contact information never left the device, and you were the only person who received it, building the Devcon Social Graph. A lot of cryptography is involved here[1], but we of course want to focus on the coSNARKs part, so bear with us for one more minute.

Participation in building the social graph was emphasized on the coSNARK Tap Leaderboard (yeah, we are closing in). On the Leaderboard, every tapper could find their rank among all other tappers. At the Cursive booth, if you could prove (*wink*) to them that you had more than a certain number of valid taps, you got some merch, e.g., a nice hat or an NFC ring!

The alert reader may already see where this is going. We somehow need to create proofs that attest to a valid tap on a device with limited resources (your mobile phone). In come coSNARKs!

The rest of the blog post will give you a high-level overview of the alphanet and how we integrated it into the Cursive tech stack. We may write another post about the technical intricacies in the future (especially interesting is how we managed to cut the time of the witness extension from 40 seconds to ~8 seconds). This post, though, serves as a high-level overview of how you will be able to use our coSNARKs network to easily delegate proof generation!

Private Proof Delegation

This specific use-case is one of the two main problems we want to solve with coSNARKs. We have a dedicated blog post about the use-case, highlighting the problem and how coSNARKs can solve it, in case you want to dive into that before going on. It is not necessary for understanding the rest of this post, though.[2]

But first things first: why the heck do we even need coSNARKs here? It all boils down to the trust assumptions of the social graph. Neither Cursive nor TACEO nor anyone else knows your contact information. Additionally, we also don't want anyone to know your connections. We only want to know the number of valid taps you made during the Devcon week, so that the Cursive team can give you your well-earned swag.

The Wrong Way

Of course we could simply store all your contact information in some database, send it out on every tap, and increase some counter by one to build the leaderboard. I think all of us can trivially see why we should opt against that. We can do better than Google and Meta and all the rest of them. Another easy solution to this problem is that on every tap, both tappers sign a message indicating that they, in fact, tapped. This would remove the need to store the contact information, but at least Cursive would still learn what the social graph looks like in its entirety, as they would learn who connected with whom.

The solution is to build a zkSNARK that attests that you received a valid signature from another NFC chip. We do not need to verify the signature inside the zk-circuit, just that it is a valid signature, and is from the key set of the Devcon participants. Additionally, we produce some nullifiers that prevent users from submitting the same tap multiple times.[3]
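To make the nullifier idea concrete, here is a minimal sketch of how a backend could reject a tap that is submitted twice. It assumes the nullifier is a deterministic hash of the tap's signature bytes; the actual circuit may derive it differently and most likely uses a SNARK-friendly hash. SHA-256, `make_nullifier`, and `NullifierSet` are hypothetical stand-ins, not names from the Cursive or TACEO code.

```python
import hashlib


def make_nullifier(signature: bytes) -> str:
    """Derive a deterministic tag from a tap signature (SHA-256 is a stand-in hash)."""
    return hashlib.sha256(b"tap-nullifier:" + signature).hexdigest()


class NullifierSet:
    """Hypothetical backend-side store that accepts each nullifier only once."""

    def __init__(self):
        self._seen = set()

    def submit(self, nullifier: str) -> bool:
        """Return True if the tap is new, False if it was already counted."""
        if nullifier in self._seen:
            return False
        self._seen.add(nullifier)
        return True


store = NullifierSet()
tap = make_nullifier(b"signature-from-a-new-friend")
first = store.submit(tap)   # the first submission is accepted
replay = store.submit(tap)  # submitting the same tap again is rejected
print(first, replay)
```

Because the same tap always yields the same nullifier, the backend can count it once without learning which two people tapped.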

The circuit has 7,736 constraints, so it is not too large, all things considered. Nevertheless, if we want to compute a zkSNARK with a domain size of $2^{13}$ on your mobile phone, we run into problems. First of all, the generated WASM from snarkJS to produce your extended witness is just shy of 3MB. This is quite a lot for mobile devices. Still, if we can assume good WiFi at the venue, this should not be that much of a problem. More pressing are the resource limits that mobile phones enforce on the applications that run on the device (which is a good thing). For a reasonable machine (e.g., a desktop), building a proof with $2^{13}$ constraints is not a problem, but for a mobile device, especially a lower-end one, this is actually quite a bit of work. We estimated that on average a valid tap proof should take about 15 seconds on lower-end devices. During these 15 seconds, users cannot put their phone away, as the browser would go to the background. Background applications that try to compute FFTs and IFFTs are killed faster by the mobile phone than you can say Multi-Scalar Multiplication.
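As a quick check on where the $2^{13}$ comes from: FFT-based provers work over a power-of-two domain, so the constraint count is rounded up. The sketch below assumes the domain is simply the smallest power of two at or above the number of constraints (real provers may add a few extra rows before rounding); `fft_domain_size` is a hypothetical helper name.

```python
def fft_domain_size(num_constraints: int) -> int:
    """Smallest power of two that is >= num_constraints."""
    if num_constraints < 1:
        raise ValueError("need at least one constraint")
    return 1 << (num_constraints - 1).bit_length()


size = fft_domain_size(7736)
log2_size = size.bit_length() - 1
print(size, log2_size)  # 8192 13
```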

The Good Way

As we can see, building the proof on your mobile phone may be possible, but the UX is just horrendous. After a tap, I don't want to stand there for 15 seconds with my phone in hand, hoping the OS doesn't kill the application, potentially with other people waiting for me to tap. So what do we do? We delegate the proof creation to the coSNARKs alphanet!

Have a look at the following picture:[4]

Depiction of the different actors in the alpha-network

This figure shows all the involved actors in this setting:

  1. The user's device which submits a new job (proof delegation) to the management server.
  2. The node providers, which are identified by a public key. They provide MPC nodes that can produce coSNARKs. Every node provider can have one or more nodes. We need nodes from three different providers, otherwise the security assumptions would break.[5]
  3. Cursive DB/backend, which verifies the proofs and keeps track of the leaderboard.
  4. The management server, a delegation entity that connects open jobs to node providers.

After you tapped and got the signature and contact information from your new friend, you secret-share all the information needed to build the proof (the signature, which keys were involved, ...) on your device and encrypt the shares under the public keys of the node providers. This also happens in the browser, but it takes less than 10 milliseconds and needs only a 170KB WASM package. Compare that with the 3MB witness extension WASM package and the 15 seconds of proof generation time for client-side proving. The mobile phone then submits the encrypted shares to the management server. Shares are encrypted to maintain privacy, as otherwise the management server would be able to reconstruct the input. The management server creates a new job for the network and returns a unique identifier for the job. Lastly, the mobile phone sends this job id to the Cursive backend, and that's it! That is all your mobile phone has to do: secret-share 12 field elements, encrypt them, and send two HTTP requests. All of this happens in an instant - I am sure that you all didn't even notice that this was going on when you tapped during Devcon ;)
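The secret-sharing step can be sketched as follows. This is only an illustration under stated assumptions: it uses plain additive sharing over the BN254 scalar field (Circom's default), whereas the three-party honest-majority protocol used in the alphanet may use a different sharing scheme, and the encryption of each share under the provider's public key is left out. `share` and `reconstruct` are hypothetical helper names.

```python
import secrets

# Scalar field modulus of BN254, Circom's default curve (an assumption here).
P = 21888242871839275222246405745257275088548364400416034343698204186575808495617


def share(value, parties=3):
    """Split value into additive shares that sum to value modulo P."""
    shares = [secrets.randbelow(P) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares


def reconstruct(shares):
    """Recombine all shares; any strict subset looks uniformly random."""
    return sum(shares) % P


# 12 placeholder field elements standing in for the real circuit inputs.
inputs = [secrets.randbelow(P) for _ in range(12)]
per_input = [share(x) for x in inputs]

# Regroup so that each of the three providers receives one share of every input.
per_provider = list(zip(*per_input))

recovered = [reconstruct(s) for s in per_input]
print(recovered == inputs)
```

Each entry of `per_provider` is what one node provider would receive after decryption. On its own it reveals nothing about the inputs, which is why the shares can pass through the management server once they are encrypted.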

Workflow that user device needs to perform to request a coSnark

Even though your phone is done, the real work for the network has just begun. Finding node providers is very easy, as we only have three for the demo, namely Cursive, PSE, and TACEO. The waiting nodes (the ones that are not currently producing a proof) now race to the management server to retrieve the new job. As soon as there is one available node per node provider, the management server sends out the necessary information to the computing nodes. For each node, this includes its encrypted input and its peers (the other nodes).
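The matching rule described above (a job starts only once every provider has an idle node) can be sketched like this. It is a simplified model, not the management server's actual code: `assign_jobs`, the queue layout, and the node ids are hypothetical.

```python
from collections import deque

PROVIDERS = ("Cursive", "PSE", "TACEO")


def assign_jobs(open_jobs, idle_nodes):
    """Give each job one idle node from every provider, while supplies last.

    open_jobs: deque of job ids.
    idle_nodes: dict mapping provider name to a deque of idle node ids.
    Returns a list of (job_id, {provider: node_id}) assignments.
    """
    assignments = []
    while open_jobs and all(idle_nodes[p] for p in PROVIDERS):
        job_id = open_jobs.popleft()
        team = {p: idle_nodes[p].popleft() for p in PROVIDERS}
        assignments.append((job_id, team))
    return assignments


jobs = deque(["job-1", "job-2", "job-3"])
idle = {
    "Cursive": deque(["c0"]),
    "PSE": deque(["p0", "p1"]),
    "TACEO": deque(["t0", "t1"]),
}
started = assign_jobs(jobs, idle)
print(len(started), list(jobs))  # 1 ['job-2', 'job-3']
```

In this toy run only one job starts, because Cursive has a single idle node. The other two jobs stay in the queue even though PSE and TACEO still have capacity.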

The nodes then create the extended witness and produce a Groth16 proof[6] with our coCircom tooling, verify it locally, and send it back to the management server. The whole process, end-to-end, took approximately 15 seconds, with the bulk of it being the witness extension, so about the same amount of time a traditional zkSNARK would take on your device.

While all of this is happening, the Cursive backend that keeps track of the leaderboard periodically asks the management server whether the nodes produced some proofs for the currently ongoing jobs (remember, your device forwarded the job id to the Cursive backend). As soon as the management server provides a proof, the backend verifies it, and upon success, updates the leaderboard.
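The polling loop on the backend side might look roughly like the sketch below. The network calls are replaced by injected functions so the example runs on its own; `poll_jobs`, `fetch_proof`, and `verify` are hypothetical names, and the real backend presumably talks to the management server over HTTP and runs a Groth16 verifier instead.

```python
import time


def poll_jobs(pending, fetch_proof, verify, leaderboard, interval=0.0, max_rounds=10):
    """Poll until every pending job has a proof or max_rounds is reached.

    pending: dict mapping job id to the user who submitted it.
    fetch_proof(job_id): returns a proof, or None if the job is not finished.
    verify(proof): returns True for a valid proof.
    leaderboard: dict mapping user to their number of valid taps.
    """
    for _ in range(max_rounds):
        if not pending:
            break
        for job_id in list(pending):
            proof = fetch_proof(job_id)
            if proof is None:
                continue  # the nodes are still working on this job
            user = pending.pop(job_id)
            if verify(proof):
                leaderboard[user] = leaderboard.get(user, 0) + 1
        time.sleep(interval)
    return leaderboard


# Toy stand-ins: the proof for "job-b" only becomes available on the second poll.
calls = {"job-a": 0, "job-b": 0}


def fake_fetch(job_id):
    calls[job_id] += 1
    ready = job_id == "job-a" or calls[job_id] >= 2
    return "proof" if ready else None


board = poll_jobs({"job-a": "alice", "job-b": "bob"}, fake_fetch, lambda p: p == "proof", {})
print(board)  # {'alice': 1, 'bob': 1}
```

Note that every round asks about every open job, so the load on the management server grows with the length of the queue.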

Workflow that Cursive server performs to check the completion of a coSnark

We of course skipped some technical details, but stay tuned for a more in-depth follow-up post!

The Alphanet in Numbers

During the four-day event the alphanet produced more than 15k(!) coSNARKs, which means more than 3.7k proofs per day, and at peak times the network produced up to 40 proofs per minute. In total there were a little over 1.8k tappers. So thank you to all of you who participated ❤️

Our original goal was to reach 2k proofs, which was already exceeded on day one, when 3k proofs were generated. In the initial setup, meaning on day one, every node provider maintained three nodes (so, 9 in total). Our upper limit was therefore three proofs in parallel. Remember, proofs can only be generated across nodes from different providers, otherwise the MPC security guarantees would simply break. This resulted in a throughput of roughly 10 proofs/minute, which was more than enough according to our original estimate, based on past data from Cursive's NFC tech.

Fortunately, the alphanet makes it easy to scale horizontally. Each provider simply spins up additional nodes (they're just Docker containers running on AWS machines), increasing throughput to $x$ proofs in parallel, where $x$ is determined by the provider with the fewest nodes. In the DC7 case, we scaled up to eight nodes per node provider.
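The scaling rule is just a minimum over the providers, which the following sketch spells out. The throughput estimate is an idealized upper bound that assumes the roughly 15 seconds per proof mentioned earlier and ignores scheduling overhead; `parallel_proofs` and `proofs_per_minute` are hypothetical helper names.

```python
def parallel_proofs(nodes_per_provider):
    """Number of proofs that can run at once: the provider with the fewest nodes."""
    return min(nodes_per_provider.values())


def proofs_per_minute(nodes_per_provider, seconds_per_proof=15.0):
    """Idealized upper bound on throughput, ignoring scheduling overhead."""
    return parallel_proofs(nodes_per_provider) * 60.0 / seconds_per_proof


day_one = {"Cursive": 3, "PSE": 3, "TACEO": 3}
scaled = {"Cursive": 8, "PSE": 8, "TACEO": 8}
lopsided = {"Cursive": 1, "PSE": 8, "TACEO": 8}

print(parallel_proofs(day_one), proofs_per_minute(day_one))  # 3 12.0
print(parallel_proofs(scaled), proofs_per_minute(scaled))    # 8 32.0
print(parallel_proofs(lopsided))                              # 1
```

The day-one bound of 12 proofs per minute is in the same ballpark as the roughly 10 proofs/minute quoted above. The last line shows why a single provider with few nodes limits the whole network, no matter how many nodes the others run.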

The plan was to go strong into day two at Devcon (in Austria we were six hours behind Bangkok) with our new setup and adjusted goals for the week (we were pretty hyped, to be frank). This time around, though, we woke up to a text message saying that people were reporting client errors and that the leaderboard was not updating. Looking at our dashboard, we found the following picture:

Dashboard view during day 2 outage

The third row indicates how many jobs are currently waiting for nodes to retrieve them. The row in the center indicates the jobs that are currently running. Yes, exactly: that row shows that there were no running proofs at that moment. This put us in a predicament, as, first of all, the leaderboard obviously was not receiving any proofs. Additionally, the management server (the machine that delegates the proofs) was running on a lower-end machine. We had estimated the requirements for this machine as pretty low, as its job was simply to handle some 10 HTTPS requests and at most 30 gRPC calls from the nodes per minute. Remember the Cursive DB/backend server? It periodically queries the management server and tries to retrieve the results of the open jobs. With 2.5k(!)[7] open jobs, the management server was mostly occupied with trying to answer those queries, and it did a poor job at that, because the machine was so weak.

What happened was that, apparently, the nodes of Cursive died overnight. At first there was still one node up and running, trying to do everything on its own. As Cursive only had one node, the alphanet could only produce one proof at a time, leading to this immense open job queue. Luckily for us, we identified this rather fast. After bringing everything back up to eight nodes per provider, we could grab some coffee and watch the open job queue slowly being worked through. I would be lying if I said it was not super satisfying to watch the queue slowly decrease, as seen in this chart:
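A back-of-the-envelope estimate shows why the node count mattered so much for the backlog. The sketch assumes roughly 15 seconds per proof (the end-to-end figure from earlier), a queue of 2,500 jobs, and no new taps arriving while the queue drains, so treat the results as rough orders of magnitude only. `drain_minutes` is a hypothetical helper name.

```python
import math


def drain_minutes(open_jobs, parallel, seconds_per_proof=15.0):
    """Minutes needed to work through a queue, processing `parallel` jobs at a time."""
    batches = math.ceil(open_jobs / parallel)
    return batches * seconds_per_proof / 60.0


one_node = drain_minutes(2500, parallel=1)
eight_nodes = drain_minutes(2500, parallel=8)
print(one_node, eight_nodes)  # 625.0 78.25
```

Under these assumptions, a single proof at a time would need more than ten hours to clear the queue, while eight proofs in parallel get through it in well under two hours.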

Dashboard view after day 2 outage was resolved

Summary

To sum things up, the alphanet produced 15,266 proofs in total (some of you still tapped after Devcon ended), and by the end we had scaled to 16 nodes per provider. Working through the jitters of day two also showed us the beauty of distributed systems. Working through the problem with the engineers from Cursive and PSE was, albeit stressful, also really fun. Generally, working with them over the last couple of weeks was awesome. Creating such a network and trying to fix and maintain it would have been impossible without them. So, we want to conclude this post with a big thank you to Cursive and PSE. Thanks also to the 1800 of you who were part of our alphanet launch. We are so grateful to be part of the Devcon social graph! On to hopefully many more collaborations and future joint efforts 🔥

In the coming weeks we will have more examples of coSNARKs in action, so follow along on X for updates. If you're interested in building with coSNARKs or in using the (invite-only) alphanet, join the conversation on Discord!

Footnotes

[1] I am sure the Cursive folks will write a dedicated blog post about all the cryptography involved in this step.

[2] We also have a post for the other use-case, Private Shared State, if you are interested.

[3] You can find the actual circuit on GitHub.

[4] We already had a similar setup for our Max Pick Challenge, where TACEO hosted all nodes.

[5] We use a semi-honest honest-majority MPC protocol with three parties. Therefore we need three different entities. If a node provider owned two nodes participating in the MPC, it would learn the secret values.

[6] We of course had a key ceremony between Cursive and TACEO engineers to create the proving key in a trustless manner ;)

[7] Remember that our initial goal was to produce 2k proofs. Now more than 2.5k were in a queue and still waiting.