diff mbox series

[bpf-next] bpf: tnums: Provably sound, faster, and more precise algorithm for tnum_mul

Message ID 20210528035520.3445-1-harishankar.vishwanathan@rutgers.edu (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series [bpf-next] bpf: tnums: Provably sound, faster, and more precise algorithm for tnum_mul | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for bpf-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 8 maintainers not CCed: netdev@vger.kernel.org yhs@fb.com kpsingh@kernel.org daniel@iogearbox.net andrii@kernel.org kafai@fb.com john.fastabend@gmail.com songliubraving@fb.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 1 this patch: 1
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch fail CHECK: Blank lines aren't necessary after an open brace '{' CHECK: Please don't use multiple blank lines ERROR: trailing whitespace
netdev/build_allmodconfig_warn success Errors and warnings before: 1 this patch: 1
netdev/header_inline success Link

Commit Message

HARISHANKAR VISHWANATHAN May 28, 2021, 3:55 a.m. UTC
From: Harishankar Vishwanathan <harishankar.vishwanathan@rutgers.edu>

This patch introduces a new algorithm for multiplication of tristate
numbers (tnums) that is provably sound. It is faster and more precise when
compared to the existing method.

Like the existing method, this new algorithm follows the long
multiplication algorithm. The idea is to generate partial products by
multiplying each bit in the multiplier (tnum a) with the multiplicand
(tnum b), and adding the partial products after appropriately bit-shifting
them. The new algorithm, however, uses just a single loop over the bits of
the multiplier (tnum a) and accumulates only the uncertain components of
the multiplicand (tnum b) into a mask-only tnum. The following paper
explains the algorithm in more detail: https://arxiv.org/abs/2105.05398.

A natural way to construct the tnum product is by performing a tnum
addition on all the partial products. This algorithm presents another
method of doing this: decompose each partial product into two tnums,
consisting of the values and the masks separately. The mask-sum is
accumulated within the loop in acc_m. The value-sum tnum is generated
using a.value * b.value. The tnum constructed by tnum addition of the
value-sum and the mask-sum contains all possible summations of concrete
values drawn from the partial product tnums pairwise. We prove this result
in the paper.

Our evaluations show that the new algorithm is overall more precise
(producing tnums with less uncertain components) than the existing method.
As an illustrative example, consider the input tnums A and B. The numbers
in the paranthesis correspond to (value;mask).

A                = 000000x1 (1;2)
B                = 0010011x (38;1)
A * B (existing) = xxxxxxxx (0;255)
A * B (new)      = 0x1xxxxx (32;95)

Importantly, we present a proof of soundness of the new algorithm in the
aforementioned paper. Additionally, we show that this new algorithm is
empirically faster than the existing method.

Co-developed-by: Matan Shachnai <m.shachnai@rutgers.edu>
Signed-off-by: Matan Shachnai <m.shachnai@rutgers.edu>
Co-developed-by: Srinivas Narayana <srinivas.narayana@rutgers.edu>
Signed-off-by: Srinivas Narayana <srinivas.narayana@rutgers.edu>
Co-developed-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu>
Signed-off-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu>
Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@rutgers.edu>
---
 kernel/bpf/tnum.c | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

Comments

Andrii Nakryiko May 30, 2021, 5:59 a.m. UTC | #1
On Thu, May 27, 2021 at 11:14 PM <hv90@scarletmail.rutgers.edu> wrote:
>
> From: Harishankar Vishwanathan <harishankar.vishwanathan@rutgers.edu>
>
> This patch introduces a new algorithm for multiplication of tristate
> numbers (tnums) that is provably sound. It is faster and more precise when
> compared to the existing method.
>
> Like the existing method, this new algorithm follows the long
> multiplication algorithm. The idea is to generate partial products by
> multiplying each bit in the multiplier (tnum a) with the multiplicand
> (tnum b), and adding the partial products after appropriately bit-shifting
> them. The new algorithm, however, uses just a single loop over the bits of
> the multiplier (tnum a) and accumulates only the uncertain components of
> the multiplicand (tnum b) into a mask-only tnum. The following paper
> explains the algorithm in more detail: https://arxiv.org/abs/2105.05398.

This is a nice paper, I appreciated tables with algorithms pseudo-code
and specific examples with uncertain bits, thanks!

I think your algorithm makes sense, but I've also CC'ed original
author of tnum logic. Edward, please take a look as well.

See below mostly styling nits.

>
> A natural way to construct the tnum product is by performing a tnum
> addition on all the partial products. This algorithm presents another
> method of doing this: decompose each partial product into two tnums,
> consisting of the values and the masks separately. The mask-sum is
> accumulated within the loop in acc_m. The value-sum tnum is generated
> using a.value * b.value. The tnum constructed by tnum addition of the
> value-sum and the mask-sum contains all possible summations of concrete
> values drawn from the partial product tnums pairwise. We prove this result
> in the paper.
>
> Our evaluations show that the new algorithm is overall more precise
> (producing tnums with less uncertain components) than the existing method.
> As an illustrative example, consider the input tnums A and B. The numbers
> in the paranthesis correspond to (value;mask).
>
> A                = 000000x1 (1;2)
> B                = 0010011x (38;1)
> A * B (existing) = xxxxxxxx (0;255)
> A * B (new)      = 0x1xxxxx (32;95)
>
> Importantly, we present a proof of soundness of the new algorithm in the
> aforementioned paper. Additionally, we show that this new algorithm is
> empirically faster than the existing method.
>
> Co-developed-by: Matan Shachnai <m.shachnai@rutgers.edu>
> Signed-off-by: Matan Shachnai <m.shachnai@rutgers.edu>
> Co-developed-by: Srinivas Narayana <srinivas.narayana@rutgers.edu>
> Signed-off-by: Srinivas Narayana <srinivas.narayana@rutgers.edu>
> Co-developed-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu>
> Signed-off-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu>
> Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@rutgers.edu>
> ---
>  kernel/bpf/tnum.c | 38 +++++++++++++++++++-------------------
>  1 file changed, 19 insertions(+), 19 deletions(-)
>
> diff --git a/kernel/bpf/tnum.c b/kernel/bpf/tnum.c
> index ceac5281bd31..bb1fa1cc181d 100644
> --- a/kernel/bpf/tnum.c
> +++ b/kernel/bpf/tnum.c
> @@ -111,30 +111,30 @@ struct tnum tnum_xor(struct tnum a, struct tnum b)
>         return TNUM(v & ~mu, mu);
>  }
>
> -/* half-multiply add: acc += (unknown * mask * value).
> - * An intermediate step in the multiply algorithm.
> - */
> -static struct tnum hma(struct tnum acc, u64 value, u64 mask)
> +struct tnum tnum_mul(struct tnum a, struct tnum b)

It's probably a good idea to have a short description (from your
commit description above) of the algorithm in the comment above this
function, with a link to your paper.

>  {
> -       while (mask) {
> -               if (mask & 1)
> -                       acc = tnum_add(acc, TNUM(0, value));
> -               mask >>= 1;
> -               value <<= 1;
> -       }
> -       return acc;
> -}
> +       u64 acc_v = a.value * b.value;
> +       struct tnum acc_m = TNUM(0, 0);
>
> -struct tnum tnum_mul(struct tnum a, struct tnum b)
> -{
> -       struct tnum acc;
> -       u64 pi;
> +       while (a.value > 0 || a.mask > 0) {

`while (a.value || a.mask)` is shorter and doesn't imply that a.value
or a.mask can be < 0 (otherwise you'd write != 0, right? ;)

> +

unnecessary empty line

> +               // LSB of tnum a is a certain 1

please use C-style comments /* */

> +               if (((a.value & 1) == 1) && ((a.mask & 1) == 0))

just if (a.value & 1) is enough. if a.value == 1, a.mask has to be 0,
right? and (x & 1) == 1 is just a more verbose way of saying (x & 1)
is non-zero, which in C is just if (x & 1).

> +                       acc_m = tnum_add(acc_m, TNUM(0, b.mask));
>
> -       pi = a.value * b.value;
> -       acc = hma(TNUM(pi, 0), a.mask, b.mask | b.value);
> -       return hma(acc, b.mask, a.value);
> +               // LSB of tnum a is uncertain
> +               else if ((a.mask & 1) == 1)

same about comment style and simpler if statement; also another comment below

> +                       acc_m = tnum_add(acc_m, TNUM(0, b.value | b.mask));
> +
> +               // Note: no case for LSB is certain 0
> +               a = tnum_rshift(a, 1);
> +               b = tnum_lshift(b, 1);
> +       }
> +
> +       return tnum_add(TNUM(acc_v, 0), acc_m);
>  }
>
> +

another unnecessary empty line

>  /* Note that if a and b disagree - i.e. one has a 'known 1' where the other has
>   * a 'known 0' - this will return a 'known 1' for that bit.
>   */
> --
> 2.25.1
>
HARISHANKAR VISHWANATHAN May 30, 2021, 11:05 p.m. UTC | #2
Thanks for reviewing this patch, Andrii. All of your comments make sense to us.
We will resend the patch with the fixes you requested.


On Sun, May 30, 2021 at 1:59 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, May 27, 2021 at 11:14 PM <hv90@scarletmail.rutgers.edu> wrote:
> >
> > From: Harishankar Vishwanathan <harishankar.vishwanathan@rutgers.edu>
> >
> > This patch introduces a new algorithm for multiplication of tristate
> > numbers (tnums) that is provably sound. It is faster and more precise when
> > compared to the existing method.
> >
> > Like the existing method, this new algorithm follows the long
> > multiplication algorithm. The idea is to generate partial products by
> > multiplying each bit in the multiplier (tnum a) with the multiplicand
> > (tnum b), and adding the partial products after appropriately bit-shifting
> > them. The new algorithm, however, uses just a single loop over the bits of
> > the multiplier (tnum a) and accumulates only the uncertain components of
> > the multiplicand (tnum b) into a mask-only tnum. The following paper
> > explains the algorithm in more detail: https://arxiv.org/abs/2105.05398.
>
> This is a nice paper, I appreciated tables with algorithms pseudo-code
> and specific examples with uncertain bits, thanks!
>
> I think your algorithm makes sense, but I've also CC'ed original
> author of tnum logic. Edward, please take a look as well.
>
> See below mostly styling nits.
>
> >
> > A natural way to construct the tnum product is by performing a tnum
> > addition on all the partial products. This algorithm presents another
> > method of doing this: decompose each partial product into two tnums,
> > consisting of the values and the masks separately. The mask-sum is
> > accumulated within the loop in acc_m. The value-sum tnum is generated
> > using a.value * b.value. The tnum constructed by tnum addition of the
> > value-sum and the mask-sum contains all possible summations of concrete
> > values drawn from the partial product tnums pairwise. We prove this result
> > in the paper.
> >
> > Our evaluations show that the new algorithm is overall more precise
> > (producing tnums with less uncertain components) than the existing method.
> > As an illustrative example, consider the input tnums A and B. The numbers
> > in the paranthesis correspond to (value;mask).
> >
> > A                = 000000x1 (1;2)
> > B                = 0010011x (38;1)
> > A * B (existing) = xxxxxxxx (0;255)
> > A * B (new)      = 0x1xxxxx (32;95)
> >
> > Importantly, we present a proof of soundness of the new algorithm in the
> > aforementioned paper. Additionally, we show that this new algorithm is
> > empirically faster than the existing method.
> >
> > Co-developed-by: Matan Shachnai <m.shachnai@rutgers.edu>
> > Signed-off-by: Matan Shachnai <m.shachnai@rutgers.edu>
> > Co-developed-by: Srinivas Narayana <srinivas.narayana@rutgers.edu>
> > Signed-off-by: Srinivas Narayana <srinivas.narayana@rutgers.edu>
> > Co-developed-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu>
> > Signed-off-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu>
> > Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@rutgers.edu>
> > ---
> >  kernel/bpf/tnum.c | 38 +++++++++++++++++++-------------------
> >  1 file changed, 19 insertions(+), 19 deletions(-)
> >
> > diff --git a/kernel/bpf/tnum.c b/kernel/bpf/tnum.c
> > index ceac5281bd31..bb1fa1cc181d 100644
> > --- a/kernel/bpf/tnum.c
> > +++ b/kernel/bpf/tnum.c
> > @@ -111,30 +111,30 @@ struct tnum tnum_xor(struct tnum a, struct tnum b)
> >         return TNUM(v & ~mu, mu);
> >  }
> >
> > -/* half-multiply add: acc += (unknown * mask * value).
> > - * An intermediate step in the multiply algorithm.
> > - */
> > -static struct tnum hma(struct tnum acc, u64 value, u64 mask)
> > +struct tnum tnum_mul(struct tnum a, struct tnum b)
>
> It's probably a good idea to have a short description (from your
> commit description above) of the algorithm in the comment above this
> function, with a link to your paper.
>
> >  {
> > -       while (mask) {
> > -               if (mask & 1)
> > -                       acc = tnum_add(acc, TNUM(0, value));
> > -               mask >>= 1;
> > -               value <<= 1;
> > -       }
> > -       return acc;
> > -}
> > +       u64 acc_v = a.value * b.value;
> > +       struct tnum acc_m = TNUM(0, 0);
> >
> > -struct tnum tnum_mul(struct tnum a, struct tnum b)
> > -{
> > -       struct tnum acc;
> > -       u64 pi;
> > +       while (a.value > 0 || a.mask > 0) {
>
> `while (a.value || a.mask)` is shorter and doesn't imply that a.value
> or a.mask can be < 0 (otherwise you'd write != 0, right? ;)
>
> > +
>
> unnecessary empty line
>
> > +               // LSB of tnum a is a certain 1
>
> please use C-style comments /* */
>
> > +               if (((a.value & 1) == 1) && ((a.mask & 1) == 0))
>
> just if (a.value & 1) is enough. if a.value == 1, a.mask has to be 0,
> right? and (x & 1) == 1 is just a more verbose way of saying (x & 1)
> is non-zero, which in C is just if (x & 1).
>
> > +                       acc_m = tnum_add(acc_m, TNUM(0, b.mask));
> >
> > -       pi = a.value * b.value;
> > -       acc = hma(TNUM(pi, 0), a.mask, b.mask | b.value);
> > -       return hma(acc, b.mask, a.value);
> > +               // LSB of tnum a is uncertain
> > +               else if ((a.mask & 1) == 1)
>
> same about comment style and simpler if statement; also another comment below
>
> > +                       acc_m = tnum_add(acc_m, TNUM(0, b.value | b.mask));
> > +
> > +               // Note: no case for LSB is certain 0
> > +               a = tnum_rshift(a, 1);
> > +               b = tnum_lshift(b, 1);
> > +       }
> > +
> > +       return tnum_add(TNUM(acc_v, 0), acc_m);
> >  }
> >
> > +
>
> another unnecessary empty line
>
> >  /* Note that if a and b disagree - i.e. one has a 'known 1' where the other has
> >   * a 'known 0' - this will return a 'known 1' for that bit.
> >   */
> > --
> > 2.25.1
> >
Edward Cree June 1, 2021, 8:11 a.m. UTC | #3
On 30/05/2021 06:59, Andrii Nakryiko wrote:
> I think your algorithm makes sense, but I've also CC'ed original
> author of tnum logic. Edward, please take a look as well.Yep, I've been in discussions with them about the paper and their
 algorithm appears fine to me.  As for the patch, nothing to add
 beyond your observations.

-ed
Daniel Borkmann June 1, 2021, 9:41 a.m. UTC | #4
On 6/1/21 10:11 AM, Edward Cree wrote:
> On 30/05/2021 06:59, Andrii Nakryiko wrote:
>> I think your algorithm makes sense, but I've also CC'ed original
>> author of tnum logic. Edward, please take a look as well.Yep, I've been in discussions with them about the paper and their
>   algorithm appears fine to me.  As for the patch, nothing to add
>   beyond your observations.

SGTM, Edward, okay to add your Acked-by or Reviewed-by in that case when applying v2 [0]?

Thanks,
Daniel

   [0] https://patchwork.kernel.org/project/netdevbpf/patch/20210531020157.7386-1-harishankar.vishwanathan@rutgers.edu/
diff mbox series

Patch

diff --git a/kernel/bpf/tnum.c b/kernel/bpf/tnum.c
index ceac5281bd31..bb1fa1cc181d 100644
--- a/kernel/bpf/tnum.c
+++ b/kernel/bpf/tnum.c
@@ -111,30 +111,30 @@  struct tnum tnum_xor(struct tnum a, struct tnum b)
 	return TNUM(v & ~mu, mu);
 }
 
-/* half-multiply add: acc += (unknown * mask * value).
- * An intermediate step in the multiply algorithm.
- */
-static struct tnum hma(struct tnum acc, u64 value, u64 mask)
+struct tnum tnum_mul(struct tnum a, struct tnum b) 
 {
-	while (mask) {
-		if (mask & 1)
-			acc = tnum_add(acc, TNUM(0, value));
-		mask >>= 1;
-		value <<= 1;
-	}
-	return acc;
-}
+	u64 acc_v = a.value * b.value;
+	struct tnum acc_m = TNUM(0, 0);
 
-struct tnum tnum_mul(struct tnum a, struct tnum b)
-{
-	struct tnum acc;
-	u64 pi;
+	while (a.value > 0 || a.mask > 0) {
+
+		// LSB of tnum a is a certain 1
+		if (((a.value & 1) == 1) && ((a.mask & 1) == 0))
+			acc_m = tnum_add(acc_m, TNUM(0, b.mask));
 
-	pi = a.value * b.value;
-	acc = hma(TNUM(pi, 0), a.mask, b.mask | b.value);
-	return hma(acc, b.mask, a.value);
+		// LSB of tnum a is uncertain
+		else if ((a.mask & 1) == 1)
+			acc_m = tnum_add(acc_m, TNUM(0, b.value | b.mask));
+
+		// Note: no case for LSB is certain 0
+		a = tnum_rshift(a, 1);
+		b = tnum_lshift(b, 1);
+	}
+
+	return tnum_add(TNUM(acc_v, 0), acc_m);
 }
 
+
 /* Note that if a and b disagree - i.e. one has a 'known 1' where the other has
  * a 'known 0' - this will return a 'known 1' for that bit.
  */