Message ID | 20210528035520.3445-1-harishankar.vishwanathan@rutgers.edu (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | [bpf-next] bpf: tnums: Provably sound, faster, and more precise algorithm for tnum_mul | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for bpf-next |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | warning | 8 maintainers not CCed: netdev@vger.kernel.org yhs@fb.com kpsingh@kernel.org daniel@iogearbox.net andrii@kernel.org kafai@fb.com john.fastabend@gmail.com songliubraving@fb.com |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 1 this patch: 1 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | fail | CHECK: Blank lines aren't necessary after an open brace '{' CHECK: Please don't use multiple blank lines ERROR: trailing whitespace |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 1 this patch: 1 |
netdev/header_inline | success | Link |
On Thu, May 27, 2021 at 11:14 PM <hv90@scarletmail.rutgers.edu> wrote: > > From: Harishankar Vishwanathan <harishankar.vishwanathan@rutgers.edu> > > This patch introduces a new algorithm for multiplication of tristate > numbers (tnums) that is provably sound. It is faster and more precise when > compared to the existing method. > > Like the existing method, this new algorithm follows the long > multiplication algorithm. The idea is to generate partial products by > multiplying each bit in the multiplier (tnum a) with the multiplicand > (tnum b), and adding the partial products after appropriately bit-shifting > them. The new algorithm, however, uses just a single loop over the bits of > the multiplier (tnum a) and accumulates only the uncertain components of > the multiplicand (tnum b) into a mask-only tnum. The following paper > explains the algorithm in more detail: https://arxiv.org/abs/2105.05398. This is a nice paper, I appreciated tables with algorithms pseudo-code and specific examples with uncertain bits, thanks! I think your algorithm makes sense, but I've also CC'ed original author of tnum logic. Edward, please take a look as well. See below mostly styling nits. > > A natural way to construct the tnum product is by performing a tnum > addition on all the partial products. This algorithm presents another > method of doing this: decompose each partial product into two tnums, > consisting of the values and the masks separately. The mask-sum is > accumulated within the loop in acc_m. The value-sum tnum is generated > using a.value * b.value. The tnum constructed by tnum addition of the > value-sum and the mask-sum contains all possible summations of concrete > values drawn from the partial product tnums pairwise. We prove this result > in the paper. > > Our evaluations show that the new algorithm is overall more precise > (producing tnums with less uncertain components) than the existing method. > As an illustrative example, consider the input tnums A and B. The numbers > in the paranthesis correspond to (value;mask). > > A = 000000x1 (1;2) > B = 0010011x (38;1) > A * B (existing) = xxxxxxxx (0;255) > A * B (new) = 0x1xxxxx (32;95) > > Importantly, we present a proof of soundness of the new algorithm in the > aforementioned paper. Additionally, we show that this new algorithm is > empirically faster than the existing method. > > Co-developed-by: Matan Shachnai <m.shachnai@rutgers.edu> > Signed-off-by: Matan Shachnai <m.shachnai@rutgers.edu> > Co-developed-by: Srinivas Narayana <srinivas.narayana@rutgers.edu> > Signed-off-by: Srinivas Narayana <srinivas.narayana@rutgers.edu> > Co-developed-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu> > Signed-off-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu> > Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@rutgers.edu> > --- > kernel/bpf/tnum.c | 38 +++++++++++++++++++------------------- > 1 file changed, 19 insertions(+), 19 deletions(-) > > diff --git a/kernel/bpf/tnum.c b/kernel/bpf/tnum.c > index ceac5281bd31..bb1fa1cc181d 100644 > --- a/kernel/bpf/tnum.c > +++ b/kernel/bpf/tnum.c > @@ -111,30 +111,30 @@ struct tnum tnum_xor(struct tnum a, struct tnum b) > return TNUM(v & ~mu, mu); > } > > -/* half-multiply add: acc += (unknown * mask * value). > - * An intermediate step in the multiply algorithm. > - */ > -static struct tnum hma(struct tnum acc, u64 value, u64 mask) > +struct tnum tnum_mul(struct tnum a, struct tnum b) It's probably a good idea to have a short description (from your commit description above) of the algorithm in the comment above this function, with a link to your paper. > { > - while (mask) { > - if (mask & 1) > - acc = tnum_add(acc, TNUM(0, value)); > - mask >>= 1; > - value <<= 1; > - } > - return acc; > -} > + u64 acc_v = a.value * b.value; > + struct tnum acc_m = TNUM(0, 0); > > -struct tnum tnum_mul(struct tnum a, struct tnum b) > -{ > - struct tnum acc; > - u64 pi; > + while (a.value > 0 || a.mask > 0) { `while (a.value || a.mask)` is shorter and doesn't imply that a.value or a.mask can be < 0 (otherwise you'd write != 0, right? ;) > + unnecessary empty line > + // LSB of tnum a is a certain 1 please use C-style comments /* */ > + if (((a.value & 1) == 1) && ((a.mask & 1) == 0)) just if (a.value & 1) is enough. if a.value == 1, a.mask has to be 0, right? and (x & 1) == 1 is just a more verbose way of saying (x & 1) is non-zero, which in C is just if (x & 1). > + acc_m = tnum_add(acc_m, TNUM(0, b.mask)); > > - pi = a.value * b.value; > - acc = hma(TNUM(pi, 0), a.mask, b.mask | b.value); > - return hma(acc, b.mask, a.value); > + // LSB of tnum a is uncertain > + else if ((a.mask & 1) == 1) same about comment style and simpler if statement; also another comment below > + acc_m = tnum_add(acc_m, TNUM(0, b.value | b.mask)); > + > + // Note: no case for LSB is certain 0 > + a = tnum_rshift(a, 1); > + b = tnum_lshift(b, 1); > + } > + > + return tnum_add(TNUM(acc_v, 0), acc_m); > } > > + another unnecessary empty line > /* Note that if a and b disagree - i.e. one has a 'known 1' where the other has > * a 'known 0' - this will return a 'known 1' for that bit. > */ > -- > 2.25.1 >
Thanks for reviewing this patch, Andrii. All of your comments make sense to us. We will resend the patch with the fixes you requested. On Sun, May 30, 2021 at 1:59 AM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Thu, May 27, 2021 at 11:14 PM <hv90@scarletmail.rutgers.edu> wrote: > > > > From: Harishankar Vishwanathan <harishankar.vishwanathan@rutgers.edu> > > > > This patch introduces a new algorithm for multiplication of tristate > > numbers (tnums) that is provably sound. It is faster and more precise when > > compared to the existing method. > > > > Like the existing method, this new algorithm follows the long > > multiplication algorithm. The idea is to generate partial products by > > multiplying each bit in the multiplier (tnum a) with the multiplicand > > (tnum b), and adding the partial products after appropriately bit-shifting > > them. The new algorithm, however, uses just a single loop over the bits of > > the multiplier (tnum a) and accumulates only the uncertain components of > > the multiplicand (tnum b) into a mask-only tnum. The following paper > > explains the algorithm in more detail: https://arxiv.org/abs/2105.05398. > > This is a nice paper, I appreciated tables with algorithms pseudo-code > and specific examples with uncertain bits, thanks! > > I think your algorithm makes sense, but I've also CC'ed original > author of tnum logic. Edward, please take a look as well. > > See below mostly styling nits. > > > > > A natural way to construct the tnum product is by performing a tnum > > addition on all the partial products. This algorithm presents another > > method of doing this: decompose each partial product into two tnums, > > consisting of the values and the masks separately. The mask-sum is > > accumulated within the loop in acc_m. The value-sum tnum is generated > > using a.value * b.value. The tnum constructed by tnum addition of the > > value-sum and the mask-sum contains all possible summations of concrete > > values drawn from the partial product tnums pairwise. We prove this result > > in the paper. > > > > Our evaluations show that the new algorithm is overall more precise > > (producing tnums with less uncertain components) than the existing method. > > As an illustrative example, consider the input tnums A and B. The numbers > > in the paranthesis correspond to (value;mask). > > > > A = 000000x1 (1;2) > > B = 0010011x (38;1) > > A * B (existing) = xxxxxxxx (0;255) > > A * B (new) = 0x1xxxxx (32;95) > > > > Importantly, we present a proof of soundness of the new algorithm in the > > aforementioned paper. Additionally, we show that this new algorithm is > > empirically faster than the existing method. > > > > Co-developed-by: Matan Shachnai <m.shachnai@rutgers.edu> > > Signed-off-by: Matan Shachnai <m.shachnai@rutgers.edu> > > Co-developed-by: Srinivas Narayana <srinivas.narayana@rutgers.edu> > > Signed-off-by: Srinivas Narayana <srinivas.narayana@rutgers.edu> > > Co-developed-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu> > > Signed-off-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu> > > Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@rutgers.edu> > > --- > > kernel/bpf/tnum.c | 38 +++++++++++++++++++------------------- > > 1 file changed, 19 insertions(+), 19 deletions(-) > > > > diff --git a/kernel/bpf/tnum.c b/kernel/bpf/tnum.c > > index ceac5281bd31..bb1fa1cc181d 100644 > > --- a/kernel/bpf/tnum.c > > +++ b/kernel/bpf/tnum.c > > @@ -111,30 +111,30 @@ struct tnum tnum_xor(struct tnum a, struct tnum b) > > return TNUM(v & ~mu, mu); > > } > > > > -/* half-multiply add: acc += (unknown * mask * value). > > - * An intermediate step in the multiply algorithm. > > - */ > > -static struct tnum hma(struct tnum acc, u64 value, u64 mask) > > +struct tnum tnum_mul(struct tnum a, struct tnum b) > > It's probably a good idea to have a short description (from your > commit description above) of the algorithm in the comment above this > function, with a link to your paper. > > > { > > - while (mask) { > > - if (mask & 1) > > - acc = tnum_add(acc, TNUM(0, value)); > > - mask >>= 1; > > - value <<= 1; > > - } > > - return acc; > > -} > > + u64 acc_v = a.value * b.value; > > + struct tnum acc_m = TNUM(0, 0); > > > > -struct tnum tnum_mul(struct tnum a, struct tnum b) > > -{ > > - struct tnum acc; > > - u64 pi; > > + while (a.value > 0 || a.mask > 0) { > > `while (a.value || a.mask)` is shorter and doesn't imply that a.value > or a.mask can be < 0 (otherwise you'd write != 0, right? ;) > > > + > > unnecessary empty line > > > + // LSB of tnum a is a certain 1 > > please use C-style comments /* */ > > > + if (((a.value & 1) == 1) && ((a.mask & 1) == 0)) > > just if (a.value & 1) is enough. if a.value == 1, a.mask has to be 0, > right? and (x & 1) == 1 is just a more verbose way of saying (x & 1) > is non-zero, which in C is just if (x & 1). > > > + acc_m = tnum_add(acc_m, TNUM(0, b.mask)); > > > > - pi = a.value * b.value; > > - acc = hma(TNUM(pi, 0), a.mask, b.mask | b.value); > > - return hma(acc, b.mask, a.value); > > + // LSB of tnum a is uncertain > > + else if ((a.mask & 1) == 1) > > same about comment style and simpler if statement; also another comment below > > > + acc_m = tnum_add(acc_m, TNUM(0, b.value | b.mask)); > > + > > + // Note: no case for LSB is certain 0 > > + a = tnum_rshift(a, 1); > > + b = tnum_lshift(b, 1); > > + } > > + > > + return tnum_add(TNUM(acc_v, 0), acc_m); > > } > > > > + > > another unnecessary empty line > > > /* Note that if a and b disagree - i.e. one has a 'known 1' where the other has > > * a 'known 0' - this will return a 'known 1' for that bit. > > */ > > -- > > 2.25.1 > >
On 30/05/2021 06:59, Andrii Nakryiko wrote: > I think your algorithm makes sense, but I've also CC'ed original > author of tnum logic. Edward, please take a look as well.Yep, I've been in discussions with them about the paper and their algorithm appears fine to me. As for the patch, nothing to add beyond your observations. -ed
On 6/1/21 10:11 AM, Edward Cree wrote: > On 30/05/2021 06:59, Andrii Nakryiko wrote: >> I think your algorithm makes sense, but I've also CC'ed original >> author of tnum logic. Edward, please take a look as well.Yep, I've been in discussions with them about the paper and their > algorithm appears fine to me. As for the patch, nothing to add > beyond your observations. SGTM, Edward, okay to add your Acked-by or Reviewed-by in that case when applying v2 [0]? Thanks, Daniel [0] https://patchwork.kernel.org/project/netdevbpf/patch/20210531020157.7386-1-harishankar.vishwanathan@rutgers.edu/
diff --git a/kernel/bpf/tnum.c b/kernel/bpf/tnum.c index ceac5281bd31..bb1fa1cc181d 100644 --- a/kernel/bpf/tnum.c +++ b/kernel/bpf/tnum.c @@ -111,30 +111,30 @@ struct tnum tnum_xor(struct tnum a, struct tnum b) return TNUM(v & ~mu, mu); } -/* half-multiply add: acc += (unknown * mask * value). - * An intermediate step in the multiply algorithm. - */ -static struct tnum hma(struct tnum acc, u64 value, u64 mask) +struct tnum tnum_mul(struct tnum a, struct tnum b) { - while (mask) { - if (mask & 1) - acc = tnum_add(acc, TNUM(0, value)); - mask >>= 1; - value <<= 1; - } - return acc; -} + u64 acc_v = a.value * b.value; + struct tnum acc_m = TNUM(0, 0); -struct tnum tnum_mul(struct tnum a, struct tnum b) -{ - struct tnum acc; - u64 pi; + while (a.value > 0 || a.mask > 0) { + + // LSB of tnum a is a certain 1 + if (((a.value & 1) == 1) && ((a.mask & 1) == 0)) + acc_m = tnum_add(acc_m, TNUM(0, b.mask)); - pi = a.value * b.value; - acc = hma(TNUM(pi, 0), a.mask, b.mask | b.value); - return hma(acc, b.mask, a.value); + // LSB of tnum a is uncertain + else if ((a.mask & 1) == 1) + acc_m = tnum_add(acc_m, TNUM(0, b.value | b.mask)); + + // Note: no case for LSB is certain 0 + a = tnum_rshift(a, 1); + b = tnum_lshift(b, 1); + } + + return tnum_add(TNUM(acc_v, 0), acc_m); } + /* Note that if a and b disagree - i.e. one has a 'known 1' where the other has * a 'known 0' - this will return a 'known 1' for that bit. */