Too Packed to Hack
During our internal hackathon week, each of us picked a topic to focus on for the entire week. After brainstorming a bit (and discarding some initial ideas), I settled on adding packed secret sharing to our tooling. As a starting point I took the zkSaaS paper from 2023, and in the following I will go into more detail on how that worked out!
On Packed Secret Sharing
Packed secret sharing (PSS for short, though ambiguous; go here for the other PSS) had its first mention in the 1992 paper Communication complexity of secure computation. As a quick reminder, in Shamir's (single-) secret sharing scheme (How to share a secret) you embed a secret $s$ into the constant term of a degree-$(t-1)$ polynomial $f$, i.e., $f(0)=s$, while all the other coefficients are chosen at random. Then, typically, party $i$ receives $f(i)$ as their secret share. By Lagrange interpolation, any $t$ of these shares suffice to reconstruct the secret.
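To make this concrete, here is a toy Shamir sharing in Python over a prime field. The field, parameters, and helper names are my own choices for the sketch; this is illustrative code, not a hardened implementation.

```python
import random

P = 2**61 - 1  # a Mersenne prime, large enough for the demo

def share(secret, t, n):
    """Split `secret` into n shares; any t of them reconstruct it."""
    # f(0) = secret, the remaining t-1 coefficients are random
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
    return [(i, f(i)) for i in range(1, n + 1)]

def reconstruct(shares, x0=0):
    """Lagrange interpolation at x0 (the secret sits at x0 = 0)."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (x0 - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

shares = share(42, t=3, n=5)
assert reconstruct(shares[:3]) == 42  # any 3 of the 5 shares suffice
```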
Packed secret sharing is a more or less straightforward generalization of Shamir's secret sharing (SSS), as follows. Assume you have $\ell$ secrets $s_1,\dots,s_\ell$ that you want to secret-share among $n$ parties. To this end, you again pick evaluation points, which for simplicity we take to be $1,\dots,n$, together with disjoint points $e_1,\dots,e_\ell$ at which we hide our secrets: we choose a polynomial $f$ of degree $d$ with $f(e_i)=s_i$ for all $i=1,\dots,\ell$ and random values at enough further points, and party $i$ again receives $f(i)$ as their share.
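As a toy Python sketch (field, packing points, and helper names are my own choices, not our implementation): we interpolate a degree-$d$ polynomial through the secrets at $e_i = -(i-1) \bmod p$ plus enough random padding points, and hand party $i$ the evaluation at $i$.

```python
import random

P = 2**61 - 1

def lagrange_eval(points, x0):
    """Evaluate the unique polynomial through `points` at x0 (over F_p)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x0 - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def pss_share(secrets, d, n):
    """Hide secrets at e_i = -(i-1) mod p, pad with random points up to
    degree d, and hand party i the evaluation at i."""
    ell = len(secrets)
    fixed = [(-i % P, s) for i, s in enumerate(secrets)]          # f(e_i) = s_i
    pad = [(P - ell - k, random.randrange(P)) for k in range(d + 1 - ell)]
    return [(i, lagrange_eval(fixed + pad, i)) for i in range(1, n + 1)]

def pss_open(shares, ell):
    """Any d+1 shares recover all ell packed secrets."""
    return [lagrange_eval(shares, -i % P) for i in range(ell)]

shares = pss_share([10, 20, 30], d=4, n=7)
assert pss_open(shares[:5], ell=3) == [10, 20, 30]  # one share each, 3 secrets
```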
If you think of our packed secrets as a vector $s=(s_1,\dots,s_\ell)$, then the parties doing local addition/multiplication on their packed shares corresponds to coordinate-wise addition and multiplication of the underlying secret vectors. Addition can be performed without hesitation, just as in ordinary Shamir sharing. However, during multiplication (also just as in ordinary Shamir sharing), you must account for the degree of the polynomial doubling, which increases the number of shares required to reconstruct the secrets. Therefore, degree reduction protocols must also be considered in PSS.
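As a toy illustration of this SIMD behavior and of the degree doubling (with my own helper functions and parameters, not our production code):

```python
import random

P = 2**61 - 1

def lagrange_eval(points, x0):
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x0 - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def pss_share(secrets, d, n):
    ell = len(secrets)
    points = [(-i % P, s) for i, s in enumerate(secrets)]
    points += [(P - ell - k, random.randrange(P)) for k in range(d + 1 - ell)]
    return [(i, lagrange_eval(points, i)) for i in range(1, n + 1)]

def pss_open(shares, ell):
    return [lagrange_eval(shares, -i % P) for i in range(ell)]

d, n, ell = 3, 7, 2
a = pss_share([2, 3], d, n)   # packs the vector (2, 3)
b = pss_share([5, 7], d, n)   # packs the vector (5, 7)

# Addition: still degree d, so d+1 shares open the coordinate-wise sum.
summed = [(x, (ya + yb) % P) for (x, ya), (_, yb) in zip(a, b)]
assert pss_open(summed[:d + 1], ell) == [7, 10]

# Multiplication: the degree doubles to 2d, so 2d+1 = 7 shares are needed.
prod = [(x, ya * yb % P) for (x, ya), (_, yb) in zip(a, b)]
assert pss_open(prod[:2 * d + 1], ell) == [10, 21]
```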
So that sounds pretty cool, right? Every party receives just one share, as they would in SSS, but it actually hides multiple secrets! But everything that sounds too good to be true usually is. The downside of PSS compared to SSS is that it comes with a threshold reduction. If we consider polynomials of degree $d$, we have the following:
- SSS is secure against any set of $d$ corrupt parties, whereas
- PSS is secure against any set of $d-\ell+1$ corrupt parties.¹
The zkSaaS idea
As my project was planned to revolve around PSS, I started with its implementation. Since our tooling already features SSS, and from the outside a packed secret share looks like an ordinary Shamir share, I could reuse a lot of the existing functionality. After a bit of playing around, sharing and reconstructing a vector of secrets was working, so what should I do with it?
As already mentioned, I started from the zkSaaS paper, so let's dive a little into what it does. Similarly to the Collaborative zk-SNARKs paper, it aims to distribute the generation of zk-SNARKs. The high-level idea is that a client who wants to compute a zk-SNARK delegates the proving to a set of (possibly untrusted) servers, among which there is one more powerful "king" server. Using custom (distributed) variants of ZK-relevant functionalities like FFTs and MSMs, the hope is to leverage the PSS scheme in a SIMD manner.
For the FFT, the rough outline goes like this: each party computes the first levels of the FFT locally on its shares and sends the results to the king server, who reconstructs, computes the remaining levels of the FFT, and sends out fresh packed shares.
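The reason the local levels work at all is that each FFT layer is linear, and linear operations across a vector of packed shares can be applied share-wise without any interaction. Here is a toy illustration with a naive length-4 DFT over a small prime field (my own parameters and helpers, not the zkSaaS protocol):

```python
import random

P = 97           # small prime with a 4th root of unity
OMEGA = 22       # 22^2 = -1 mod 97, so 22 has multiplicative order 4

def lagrange_eval(points, x0):
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x0 - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def pss_share(secrets, d, n):
    ell = len(secrets)
    points = [(-i % P, s) for i, s in enumerate(secrets)]
    points += [(P - ell - k, random.randrange(P)) for k in range(d + 1 - ell)]
    return [lagrange_eval(points, i) for i in range(1, n + 1)]

def dft(vec):
    """Naive DFT over F_P with a root of unity of order len(vec)."""
    m = len(vec)
    return [sum(pow(OMEGA, j * k, P) * vec[j] for j in range(m)) % P
            for k in range(m)]

d, n = 2, 5
w = [3, 1, 4, 1]                       # first packed vector
v = [2, 7, 1, 8]                       # second packed vector
# shares[j][i] = party i's packed share of the pair (w[j], v[j])
shares = [pss_share([w[j], v[j]], d, n) for j in range(4)]

# Each party applies the DFT to its own column of shares, no interaction needed.
local = [dft([shares[j][i] for j in range(4)]) for i in range(n)]

# Reconstructing coordinate k now yields (DFT(w)[k], DFT(v)[k]).
for k in range(4):
    pts = [(i + 1, local[i][k]) for i in range(d + 1)]
    assert lagrange_eval(pts, 0) == dft(w)[k]
    assert lagrange_eval(pts, P - 1) == dft(v)[k]
```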
The MSM protocol consists of local MSMs, a conversion to regular Shamir shares, and another local multiplication, so nothing too fancy except for the conversion.
So at first glance it seemed like the only things I needed to implement were:
- PSS version of FFT and MSMs.
- Conversion from PSS to SSS.
- Multiplication with degree reduction for PSS.
- Generation of single and correlated (packed) randomness, which is needed in some protocols, e.g., when multiplying packed shares.
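To give an idea of the last two points, here is a toy sketch of multiplication with degree reduction, using a correlated pair $([r]_d, [r]_{2d})$, i.e., the same random vector packed at degree $d$ and at degree $2d$: the parties mask their degree-$2d$ product shares, open the masked values, and locally re-share. This is my own illustrative protocol flow with my own helper names (and it glosses over who generates the randomness); it is not our actual implementation.

```python
import random

P = 2**61 - 1

def lagrange_eval(points, x0):
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x0 - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def pss_share(secrets, d, n):
    ell = len(secrets)
    points = [(-i % P, s) for i, s in enumerate(secrets)]
    points += [(P - ell - k, random.randrange(P)) for k in range(d + 1 - ell)]
    return [(i, lagrange_eval(points, i)) for i in range(1, n + 1)]

def pss_open(shares, ell):
    return [lagrange_eval(shares, -i % P) for i in range(ell)]

d, n, ell = 2, 7, 2
x = pss_share([3, 4], d, n)
y = pss_share([5, 6], d, n)

# Local multiplication: parties now hold a degree-2d sharing of (15, 24).
z2d = [(i, xi * yi % P) for (i, xi), (_, yi) in zip(x, y)]

# Correlated randomness: the same random vector r, shared at degree d and 2d.
r = [random.randrange(P) for _ in range(ell)]
r_d = pss_share(r, d, n)
r_2d = pss_share(r, 2 * d, n)

# Parties open the masked value z - r (needs 2d+1 shares) ...
masked = [(i, (zi - ri) % P) for (i, zi), (_, ri) in zip(z2d, r_2d)]
c = pss_open(masked[:2 * d + 1], ell)            # public vector z - r

# ... and locally add [r]_d plus the public constants, re-shared at the
# packing points via a deterministic degree-(ell-1) polynomial.
const_pts = [(-i % P, ci) for i, ci in enumerate(c)]
z_d = [(i, (ri + lagrange_eval(const_pts, i)) % P) for i, ri in r_d]

# The result is again a degree-d sharing: d+1 shares now suffice.
assert pss_open(z_d[:d + 1], ell) == [15, 24]
```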
Motivated to start implementing, I dove right in. As I wanted to reuse as much of the existing code as possible, it took me some time to figure out what was compatible and what was not. Time went by, and I spent nearly half of the week implementing all of the above points.
Then, while talking with a colleague about some doubts I had about my implementation, we also took a closer look at the concrete protocol zkSaaS uses in its Groth16 version. After a little while we came to the conclusion that their way of distributing the proof is unfortunately not that compatible² with how we designed the proving part of our coCircom tooling, so we decided that I should refocus my goal.
A new goal: Batched Proving
This was something I had considered doing anyway, but only after finishing the zkSaaS Groth16 prover, so it moved into the spotlight now. Let me explain how this works. Since coCircom (if you want to read more about our tooling and how to use it, I refer to our docs) works with Circom circuits, let us assume we are given a circuit, some inputs with corresponding witnesses, and the circuit-specific zkey (which contains the proving and verification key).
Our goal is batched proving: call the prover once and get a proof for each witness in a single run.
Since one major aspect of our tooling is to not reveal the witnesses, we have to secret-share them. This is where packed secret sharing comes into play!
The way we would normally call our (single) Groth16 prover goes like this. First, for a value $x$, let $[x]$ denote the (single) share of $x$ held by some party. Each party then executes the prover on their vector of shares $([w_1],\dots,[w_\ell])$ and in the end obtains the proof for the witness $w=(w_1,\dots,w_\ell)$.
Now let's look at what's different for PSS.
For simplicity assume that we have two witnesses $w=(w_1,\dots,w_\ell)$ and $v=(v_1,\dots,v_\ell)$ for the same circuit but different inputs.
We now denote by $\llbracket x_1,\dots,x_k\rrbracket$ the packed share of $x_1,\dots,x_k$.
So we pack the witnesses like this: each party gets a vector of packed shares $\llbracket w_1,v_1\rrbracket, \dots, \llbracket w_\ell,v_\ell\rrbracket$. Thus everyone still holds just one vector of shares, except that it now hides two witnesses.
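As a toy illustration of why this packing is useful (my own helper names and parameters, not coCircom code): a public linear combination of witness coordinates, as it appears in the linear parts of Groth16, applied locally to the packed shares produces the result for both witnesses at once.

```python
import random

P = 2**61 - 1

def lagrange_eval(points, x0):
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x0 - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def pss_share(secrets, d, n):
    ell = len(secrets)
    points = [(-i % P, s) for i, s in enumerate(secrets)]
    points += [(P - ell - k, random.randrange(P)) for k in range(d + 1 - ell)]
    return [lagrange_eval(points, i) for i in range(1, n + 1)]

d, n = 2, 5
w = [3, 1, 4]                          # witness for input 1
v = [2, 7, 1]                          # witness for input 2
# shares[j][i] = party i's packed share of the pair (w[j], v[j])
shares = [pss_share([wj, vj], d, n) for wj, vj in zip(w, v)]

# One local pass with public coefficients computes both linear combinations.
coeffs = [5, 11, 13]
acc = [sum(c * shares[j][i] for j, c in enumerate(coeffs)) % P
       for i in range(n)]

pts = [(i + 1, acc[i]) for i in range(d + 1)]
assert lagrange_eval(pts, 0) == sum(c * x for c, x in zip(coeffs, w)) % P
assert lagrange_eval(pts, P - 1) == sum(c * x for c, x in zip(coeffs, v)) % P
```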
In a perfect world (or at least it would have been a perfect world for me during the hackathon), each party could simply use this packed vector to execute the same (single) Groth16 prover. Guess what: that does not work. One of the problems is that you have to think carefully about how to handle operations on the public inputs, which (may) differ between witnesses and which occur in several places during the execution of the prover. Also, the generation of the randomness (the correlated $r,s$ in Groth16) has to be adapted for the batched proving use case.
But besides that, since a lot of Groth16 is linear, we can leave most of the prover as is and just pay attention at the end, where we have to open to (in this case) two proofs.
Results and learnings
I have to disclaim right away that, although my code produces proofs, it still has some bug, so the proofs do not verify :(. So far I have not found the time to go hunting for it, but if I do find it in the future, you may hear from me again!
Nevertheless, I did some testing on basic example circuits and found that the overhead of going from single proving with 3 parties to double proving with 5 parties, or triple proving with 7 parties, is little to none; sometimes it is even faster. So if you have enough willing parties at hand, why not go for two proofs instead of one in the same time?
One major personal learning is that I should definitely have looked further ahead before implementing the FFTs, MSMs, etc., because that time was missing when I tried to fix the bug in the batched proving. Despite this letdown, it was fun to take a whole week and just work on a project of my own choosing. Plus, ordering food for lunch every day was a nice extra.
¹ Sometimes it's a competition in our office about who's got the most formulas, so let's add some here. A further restriction we have is that $2d<n$, because otherwise we could not do multiplication: the product polynomial has degree $2d$, so reconstructing it requires $2d+1$ shares. We of course also have the bounds $\ell-1\leq d\leq n-1$. Since we simplified the evaluation and sharing points in our PSS sketch, here are the details: in the literature it is common to use a prime field $\mathbb F_p$ with $p$ prime and $p>\ell+n$, so that $-\ell+1,\dots,n$ are distinct and you can hide the secrets via $f(-i+1)=s_i$ for $i=1,\dots,\ell$.
² One of the main differences is that they get into the proving a little later than we do, so this would have required me to modify/extend our witness generation as well, which was not so appealing to me.
Thank you for following our hackathon series! If you'd like to read more about coSNARKs and the rest of the TACEO stack, feel free to check out the docs, chat with us in Discord, and follow for news on X!