
[v4,bpf-next] bpf: Explicitly zero-extend R0 after 32-bit cmpxchg

Message ID 20210223150845.1857620-1-jackmanb@google.com
State Superseded
Delegated to: BPF

Checks

Context Check Description
netdev/cover_letter success
netdev/fixes_present success
netdev/patch_count success
netdev/tree_selection success Clearly marked for bpf-next
netdev/subject_prefix success
netdev/cc_maintainers fail 1 blamed authors not CCed: yhs@fb.com; 9 maintainers not CCed: netdev@vger.kernel.org kpsingh@kernel.org shuah@kernel.org songliubraving@fb.com linux-kselftest@vger.kernel.org yhs@fb.com kafai@fb.com john.fastabend@gmail.com andrii@kernel.org
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 24 this patch: 24
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success
netdev/checkpatch warning CHECK: multiple assignments should be avoided; WARNING: line length of 84 exceeds 80 columns; WARNING: line length of 86 exceeds 80 columns; WARNING: line length of 91 exceeds 80 columns; WARNING: line length of 95 exceeds 80 columns; WARNING: line length of 98 exceeds 80 columns
netdev/build_allmodconfig_warn success Errors and warnings before: 24 this patch: 24
netdev/header_inline success
netdev/stable success Stable not CCed

Commit Message

Brendan Jackman Feb. 23, 2021, 3:08 p.m. UTC
As pointed out by Ilya and explained in the new comment, there's a
discrepancy between x86 and BPF CMPXCHG semantics: BPF always loads
the value from memory into r0, while x86 only does so when r0 and the
value in memory are different. The same issue affects s390.

At first this might sound like pure semantics, but it makes a real
difference when the comparison is 32-bit, since the load will
zero-extend r0/rax.
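
For a 32-bit operand the difference looks roughly like this
(illustrative C only, using kernel-style u32/u64 types — not actual
kernel code):

	/* BPF semantics: r0 is unconditionally written with the old
	 * 32-bit value, so its upper 32 bits always end up zeroed. */
	u64 bpf_cmpxchg32(u32 *mem, u64 r0, u32 src)
	{
		u32 old = *mem;

		if (old == (u32)r0)
			*mem = src;
		return old;
	}

	/* x86 semantics: when the comparison succeeds, eax is not
	 * written at all, so the upper 32 bits of rax survive. */
	u64 x86_cmpxchg32(u32 *mem, u64 rax, u32 src)
	{
		if (*mem == (u32)rax) {
			*mem = src;
			return rax;	/* upper 32 bits unchanged */
		}
		return *mem;	/* the 32-bit load zero-extends rax */
	}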

The fix is to explicitly zero-extend rax after doing such a
CMPXCHG. Since this problem affects multiple archs, this is done in
the verifier by patching in a BPF_ZEXT_REG instruction after every
32-bit cmpxchg. Any archs that don't need such manual zero-extension
can do a look-ahead with insn_is_zext to skip the unnecessary mov.
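
For example, a JIT whose 32-bit cmpxchg already zero-extends might do
something like this (pure sketch — jit_ctx, emit_cmpxchg32() and the
consumed-insn-count convention are all hypothetical here):

	/* Returns how many BPF insns were consumed (1 or 2). */
	static int jit_atomic_w(struct jit_ctx *ctx, const struct bpf_insn *insn)
	{
		if (insn->imm == BPF_CMPXCHG) {
			emit_cmpxchg32(ctx, insn);
			/* This arch's cmpxchg already zero-extends, so
			 * also swallow the verifier-inserted mov. */
			if (insn_is_zext(insn + 1))
				return 2;
			return 1;
		}
		/* ... other BPF_ATOMIC ops ... */
		return 1;
	}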

There was actually already logic to patch in zero-extension insns
after 32-bit cmpxchgs, in opt_subreg_zext_lo32_rnd_hi32. To avoid
bloating the prog with unnecessary movs, we now explicitly check and
skip that logic for this case.

Reported-by: Ilya Leoshkevich <iii@linux.ibm.com>
Fixes: 5ffa25502b5a ("bpf: Add instructions for atomic_[cmp]xchg")
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---

Differences v3->v4[1]:
 - Moved the optimization against pointless zext into the correct place:
   opt_subreg_zext_lo32_rnd_hi32 is called _after_ fixup_bpf_calls.

Differences v2->v3[1]:
 - Moved patching into fixup_bpf_calls (patch incoming to rename this function)
 - Added extra commentary on bpf_jit_needs_zext
 - Added check to avoid adding a pointless zext(r0) if there's already one there.

Difference v1->v2[1]: Now solved centrally in the verifier instead of
  specifically for the x86 JIT. Thanks to Ilya and Daniel for the suggestions!

[1] v3: https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@iogearbox.net/T/#t
    v2: https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@iogearbox.net/T/#t
    v1: https://lore.kernel.org/bpf/d7ebaefb-bfd6-a441-3ff2-2fdfe699b1d2@iogearbox.net/T/#t

 kernel/bpf/core.c                             |  4 +++
 kernel/bpf/verifier.c                         | 33 +++++++++++++++++--
 .../selftests/bpf/verifier/atomic_cmpxchg.c   | 25 ++++++++++++++
 .../selftests/bpf/verifier/atomic_or.c        | 26 +++++++++++++++
 4 files changed, 86 insertions(+), 2 deletions(-)


base-commit: 7b1e385c9a488de9291eaaa412146d3972e9dec5
--
2.30.0.617.g56c4b15f3c-goog

Comments

Martin KaFai Lau Feb. 24, 2021, 5:47 a.m. UTC | #1
On Tue, Feb 23, 2021 at 03:08:45PM +0000, Brendan Jackman wrote:
[ ... ]

> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 0ae015ad1e05..dcf18612841b 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2342,6 +2342,10 @@ bool __weak bpf_helper_changes_pkt_data(void *func)
>  /* Return TRUE if the JIT backend wants verifier to enable sub-register usage
>   * analysis code and wants explicit zero extension inserted by verifier.
>   * Otherwise, return FALSE.
> + *
> + * The verifier inserts an explicit zero extension after BPF_CMPXCHGs even if
> + * you don't override this. JITs that don't want these extra insns can detect
> + * them using insn_is_zext.
>   */
>  bool __weak bpf_jit_needs_zext(void)
>  {
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 3d34ba492d46..ec1cbd565140 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -11061,8 +11061,16 @@ static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
>  			 */
>  			if (WARN_ON(!(insn.imm & BPF_FETCH)))
>  				return -EINVAL;
> -			load_reg = insn.imm == BPF_CMPXCHG ? BPF_REG_0
> -							   : insn.src_reg;
> +			/* There should already be a zero-extension inserted after BPF_CMPXCHG. */
> +			if (insn.imm == BPF_CMPXCHG) {
> +				struct bpf_insn *next = &insns[adj_idx + 1];
> +
> +				if (WARN_ON(!insn_is_zext(next) || next->dst_reg != insn.src_reg))
> +					return -EINVAL;
> +				continue;
This is to avoid zext_patch again for the JITs with
bpf_jit_needs_zext() == true.

IIUC, at this point, aux[adj_idx].zext_dst == true which
means that the check_atomic() has already marked the
reg0->subreg_def properly.

> +			}
> +
> +			load_reg = insn.src_reg;
>  		} else {
>  			load_reg = insn.dst_reg;
>  		}
> @@ -11666,6 +11674,27 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
>  			continue;
>  		}
> 
> +		/* BPF_CMPXCHG always loads a value into R0, therefore always
> +		 * zero-extends. However some archs' equivalent instruction only
> +		 * does this load when the comparison is successful. So here we
> +		 * add a BPF_ZEXT_REG after every 32-bit CMPXCHG, so that such
> +		 * archs' JITs don't need to deal with the issue. Archs that
> +		 * don't face this issue may use insn_is_zext to detect and skip
> +		 * the added instruction.
> +		 */
> +		if (insn->code == (BPF_STX | BPF_W | BPF_ATOMIC) && insn->imm == BPF_CMPXCHG) {
> +			struct bpf_insn zext_patch[2] = { *insn, BPF_ZEXT_REG(BPF_REG_0) };
Then should this zext_patch only be done for "!bpf_jit_needs_zext()"
such that the above change in opt_subreg_zext_lo32_rnd_hi32()
becomes unnecessary?

> +
> +			new_prog = bpf_patch_insn_data(env, i + delta, zext_patch, 2);
> +			if (!new_prog)
> +				return -ENOMEM;
> +
> +			delta    += 1;
> +			env->prog = prog = new_prog;
> +			insn      = new_prog->insnsi + i + delta;
> +			continue;
> +		}
> +
>  		if (insn->code != (BPF_JMP | BPF_CALL))
>  			continue;
>  		if (insn->src_reg == BPF_PSEUDO_CALL)
Brendan Jackman Feb. 24, 2021, 9:32 a.m. UTC | #2
On Wed, 24 Feb 2021 at 06:48, Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Tue, Feb 23, 2021 at 03:08:45PM +0000, Brendan Jackman wrote:
> [ ... ]
>
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 0ae015ad1e05..dcf18612841b 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -2342,6 +2342,10 @@ bool __weak bpf_helper_changes_pkt_data(void *func)
> >  /* Return TRUE if the JIT backend wants verifier to enable sub-register usage
> >   * analysis code and wants explicit zero extension inserted by verifier.
> >   * Otherwise, return FALSE.
> > + *
> > + * The verifier inserts an explicit zero extension after BPF_CMPXCHGs even if
> > + * you don't override this. JITs that don't want these extra insns can detect
> > + * them using insn_is_zext.
> >   */
> >  bool __weak bpf_jit_needs_zext(void)
> >  {
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 3d34ba492d46..ec1cbd565140 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -11061,8 +11061,16 @@ static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
> >                        */
> >                       if (WARN_ON(!(insn.imm & BPF_FETCH)))
> >                               return -EINVAL;
> > -                     load_reg = insn.imm == BPF_CMPXCHG ? BPF_REG_0
> > -                                                        : insn.src_reg;
> > +                     /* There should already be a zero-extension inserted after BPF_CMPXCHG. */
> > +                     if (insn.imm == BPF_CMPXCHG) {
> > +                             struct bpf_insn *next = &insns[adj_idx + 1];
> > +
> > +                             if (WARN_ON(!insn_is_zext(next) || next->dst_reg != insn.src_reg))
> > +                                     return -EINVAL;
> > +                             continue;
> This is to avoid zext_patch again for the JITs with
> bpf_jit_needs_zext() == true.
>
> IIUC, at this point, aux[adj_idx].zext_dst == true which
> means that the check_atomic() has already marked the
> reg0->subreg_def properly.

That's right... sorry I'm not sure if you're implying something here
or just checking understanding?

> > +                     }
> > +
> > +                     load_reg = insn.src_reg;
> >               } else {
> >                       load_reg = insn.dst_reg;
> >               }
> > @@ -11666,6 +11674,27 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
> >                       continue;
> >               }
> >
> > +             /* BPF_CMPXCHG always loads a value into R0, therefore always
> > +              * zero-extends. However some archs' equivalent instruction only
> > +              * does this load when the comparison is successful. So here we
> > +              * add a BPF_ZEXT_REG after every 32-bit CMPXCHG, so that such
> > +              * archs' JITs don't need to deal with the issue. Archs that
> > +              * don't face this issue may use insn_is_zext to detect and skip
> > +              * the added instruction.
> > +              */
> > +             if (insn->code == (BPF_STX | BPF_W | BPF_ATOMIC) && insn->imm == BPF_CMPXCHG) {
> > +                     struct bpf_insn zext_patch[2] = { *insn, BPF_ZEXT_REG(BPF_REG_0) };
> Then should this zext_patch only be done for "!bpf_jit_needs_zext()"
> such that the above change in opt_subreg_zext_lo32_rnd_hi32()
> becomes unnecessary?

Yep that would work but IMO it would be a more fragile expression of
the logic: instead of directly checking whether something was done
we'd be looking at a proxy for another part of the system's behaviour.
I don't think it would win us anything in terms of clarity either?

Thanks for taking a look!
Ilya Leoshkevich Feb. 24, 2021, 12:02 p.m. UTC | #3
On Tue, 2021-02-23 at 15:08 +0000, Brendan Jackman wrote:
> As pointed out by Ilya and explained in the new comment, there's a
> discrepancy between x86 and BPF CMPXCHG semantics: BPF always loads
> the value from memory into r0, while x86 only does so when r0 and the
> value in memory are different. The same issue affects s390.
> 
> At first this might sound like pure semantics, but it makes a real
> difference when the comparison is 32-bit, since the load will
> zero-extend r0/rax.
> 
> The fix is to explicitly zero-extend rax after doing such a
> CMPXCHG. Since this problem affects multiple archs, this is done in
> the verifier by patching in a BPF_ZEXT_REG instruction after every
> 32-bit cmpxchg. Any archs that don't need such manual zero-extension
> can do a look-ahead with insn_is_zext to skip the unnecessary mov.
> 
> There was actually already logic to patch in zero-extension insns
> after 32-bit cmpxchgs, in opt_subreg_zext_lo32_rnd_hi32. To avoid
> bloating the prog with unnecessary movs, we now explicitly check and
> skip that logic for this case.
> 
> Reported-by: Ilya Leoshkevich <iii@linux.ibm.com>
> Fixes: 5ffa25502b5a ("bpf: Add instructions for atomic_[cmp]xchg")
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
> 
> Differences v3->v4[1]:
>  - Moved the optimization against pointless zext into the correct
> place:
>    opt_subreg_zext_lo32_rnd_hi32 is called _after_ fixup_bpf_calls.
> 
> Differences v2->v3[1]:
>  - Moved patching into fixup_bpf_calls (patch incoming to rename this
> function)
>  - Added extra commentary on bpf_jit_needs_zext
>  - Added check to avoid adding a pointless zext(r0) if there's
> already one there.
> 
> Difference v1->v2[1]: Now solved centrally in the verifier instead of
>   specifically for the x86 JIT. Thanks to Ilya and Daniel for the
> suggestions!
> 
> [1] v3: 
> https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@iogearbox.net/T/#t
>     v2: 
> https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@iogearbox.net/T/#t
>     v1: 
> https://lore.kernel.org/bpf/d7ebaefb-bfd6-a441-3ff2-2fdfe699b1d2@iogearbox.net/T/#t
> 
>  kernel/bpf/core.c                             |  4 +++
>  kernel/bpf/verifier.c                         | 33 +++++++++++++++++--
>  .../selftests/bpf/verifier/atomic_cmpxchg.c   | 25 ++++++++++++++
>  .../selftests/bpf/verifier/atomic_or.c        | 26 +++++++++++++++
>  4 files changed, 86 insertions(+), 2 deletions(-)

Unfortunately this still gives me 2 x `w0 = w0` on s390, but the
culprit seems to be not your patch, but rather that
adjust_insn_aux_data() is messing up zext_dst. I'll try to debug
further and come up with a fix.
Ilya Leoshkevich Feb. 24, 2021, 2:16 p.m. UTC | #4
On Tue, 2021-02-23 at 15:08 +0000, Brendan Jackman wrote:
> As pointed out by Ilya and explained in the new comment, there's a
> discrepancy between x86 and BPF CMPXCHG semantics: BPF always loads
> the value from memory into r0, while x86 only does so when r0 and the
> value in memory are different. The same issue affects s390.
> 
> At first this might sound like pure semantics, but it makes a real
> difference when the comparison is 32-bit, since the load will
> zero-extend r0/rax.
> 
> The fix is to explicitly zero-extend rax after doing such a
> CMPXCHG. Since this problem affects multiple archs, this is done in
> the verifier by patching in a BPF_ZEXT_REG instruction after every
> 32-bit cmpxchg. Any archs that don't need such manual zero-extension
> can do a look-ahead with insn_is_zext to skip the unnecessary mov.
> 
> There was actually already logic to patch in zero-extension insns
> after 32-bit cmpxchgs, in opt_subreg_zext_lo32_rnd_hi32. To avoid
> bloating the prog with unnecessary movs, we now explicitly check and
> skip that logic for this case.
> 
> Reported-by: Ilya Leoshkevich <iii@linux.ibm.com>
> Fixes: 5ffa25502b5a ("bpf: Add instructions for atomic_[cmp]xchg")
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
> 
> Differences v3->v4[1]:
>  - Moved the optimization against pointless zext into the correct
> place:
>    opt_subreg_zext_lo32_rnd_hi32 is called _after_ fixup_bpf_calls.
> 
> Differences v2->v3[1]:
>  - Moved patching into fixup_bpf_calls (patch incoming to rename this
> function)
>  - Added extra commentary on bpf_jit_needs_zext
>  - Added check to avoid adding a pointless zext(r0) if there's
> already one there.
> 
> Difference v1->v2[1]: Now solved centrally in the verifier instead of
>   specifically for the x86 JIT. Thanks to Ilya and Daniel for the
> suggestions!
> 
> [1] v3: 
> https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@iogearbox.net/T/#t
>     v2: 
> https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@iogearbox.net/T/#t
>     v1: 
> https://lore.kernel.org/bpf/d7ebaefb-bfd6-a441-3ff2-2fdfe699b1d2@iogearbox.net/T/#t
> 
>  kernel/bpf/core.c                             |  4 +++
>  kernel/bpf/verifier.c                         | 33 +++++++++++++++++--
>  .../selftests/bpf/verifier/atomic_cmpxchg.c   | 25 ++++++++++++++
>  .../selftests/bpf/verifier/atomic_or.c        | 26 +++++++++++++++
>  4 files changed, 86 insertions(+), 2 deletions(-)

I think I managed to figure out what is wrong with
adjust_insn_aux_data(): insn_has_def32() does not know about BPF_FETCH.
I'll post a fix shortly; in the meantime, based on my debugging
experience and on looking at the code for a while, I have a few
comments regarding the patch.
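
The underlying issue: when instructions are patched, the zext_dst
marking is recomputed via insn_has_def32(), which doesn't know that a
fetching atomic defines a register. A fix needs to teach the
def-tracking code which register a BPF_ATOMIC insn writes; as a rough
sketch only (the fix I'll post is authoritative), something like:

static int insn_def_regno(const struct bpf_insn *insn)
{
	switch (BPF_CLASS(insn->code)) {
	case BPF_JMP:
	case BPF_JMP32:
	case BPF_ST:
		return -1;	/* no register is written */
	case BPF_STX:
		if (BPF_MODE(insn->code) == BPF_ATOMIC &&
		    (insn->imm & BPF_FETCH)) {
			if (insn->imm == BPF_CMPXCHG)
				return BPF_REG_0;	/* cmpxchg loads into R0 */
			else
				return insn->src_reg;	/* other fetches load into src */
		}
		return -1;
	default:
		return insn->dst_reg;
	}
}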

[...]

> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 3d34ba492d46..ec1cbd565140 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -11061,8 +11061,16 @@ static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
>                          */
>                         if (WARN_ON(!(insn.imm & BPF_FETCH)))
>                                 return -EINVAL;
> -                       load_reg = insn.imm == BPF_CMPXCHG ? BPF_REG_0
> -                                                          : insn.src_reg;
> +                       /* There should already be a zero-extension inserted after BPF_CMPXCHG. */
> +                       if (insn.imm == BPF_CMPXCHG) {
> +                               struct bpf_insn *next = &insns[adj_idx + 1];

Would it make sense to check bounds here? Not sure whether the
verification process might come that far with the last instruction
being cmpxchg and not ret, but still...

> +
> +                               if (WARN_ON(!insn_is_zext(next) || next->dst_reg != insn.src_reg))

We generate BPF_ZEXT_REG(BPF_REG_0), so we should probably use
BPF_REG_0 instead of insn.src_reg here.
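
(For reference, the relevant helpers in include/linux/filter.h look
roughly like this — sketched here, the header is authoritative:

#define BPF_ZEXT_REG(DST)					\
	((struct bpf_insn) {					\
		.code  = BPF_ALU | BPF_MOV | BPF_X,		\
		.dst_reg = DST,					\
		.src_reg = DST,					\
		.off   = 0,					\
		.imm   = 1 })

static inline bool insn_is_zext(const struct bpf_insn *insn)
{
	return insn->code == (BPF_ALU | BPF_MOV | BPF_X) && insn->imm == 1;
}

so the inserted insn's dst_reg is BPF_REG_0 by construction.)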

> +                                       return -EINVAL;
> +                               continue;

I think we need i++ before continue, otherwise we would stumble upon
BPF_ZEXT_REG itself on the next iteration, and it is also marked with
zext_dst.

> +                       }
> +
> +                       load_reg = insn.src_reg;
>                 } else {
>                         load_reg = insn.dst_reg;
>                 }
> @@ -11666,6 +11674,27 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
>                         continue;
>                 }
> 
> +               /* BPF_CMPXCHG always loads a value into R0, therefore always
> +                * zero-extends. However some archs' equivalent instruction only
> +                * does this load when the comparison is successful. So here we
> +                * add a BPF_ZEXT_REG after every 32-bit CMPXCHG, so that such
> +                * archs' JITs don't need to deal with the issue. Archs that
> +                * don't face this issue may use insn_is_zext to detect and skip
> +                * the added instruction.
> +                */
> +               if (insn->code == (BPF_STX | BPF_W | BPF_ATOMIC) && insn->imm == BPF_CMPXCHG) {

Since we want this only for JITs and not the interpreter, would it make
sense to check prog->jit_requested, like some other fragments of this
function do?

[...]
Martin KaFai Lau Feb. 24, 2021, 10:14 p.m. UTC | #5
On Wed, Feb 24, 2021 at 10:32:28AM +0100, Brendan Jackman wrote:
> On Wed, 24 Feb 2021 at 06:48, Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Tue, Feb 23, 2021 at 03:08:45PM +0000, Brendan Jackman wrote:
> > [ ... ]
> >
> > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > > index 0ae015ad1e05..dcf18612841b 100644
> > > --- a/kernel/bpf/core.c
> > > +++ b/kernel/bpf/core.c
> > > @@ -2342,6 +2342,10 @@ bool __weak bpf_helper_changes_pkt_data(void *func)
> > >  /* Return TRUE if the JIT backend wants verifier to enable sub-register usage
> > >   * analysis code and wants explicit zero extension inserted by verifier.
> > >   * Otherwise, return FALSE.
> > > + *
> > > + * The verifier inserts an explicit zero extension after BPF_CMPXCHGs even if
> > > + * you don't override this. JITs that don't want these extra insns can detect
> > > + * them using insn_is_zext.
> > >   */
> > >  bool __weak bpf_jit_needs_zext(void)
> > >  {
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index 3d34ba492d46..ec1cbd565140 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -11061,8 +11061,16 @@ static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
> > >                        */
> > >                       if (WARN_ON(!(insn.imm & BPF_FETCH)))
> > >                               return -EINVAL;
> > > -                     load_reg = insn.imm == BPF_CMPXCHG ? BPF_REG_0
> > > -                                                        : insn.src_reg;
> > > +                     /* There should already be a zero-extension inserted after BPF_CMPXCHG. */
> > > +                     if (insn.imm == BPF_CMPXCHG) {
> > > +                             struct bpf_insn *next = &insns[adj_idx + 1];
> > > +
> > > +                             if (WARN_ON(!insn_is_zext(next) || next->dst_reg != insn.src_reg))
> > > +                                     return -EINVAL;
> > > +                             continue;
> > This is to avoid zext_patch again for the JITs with
> > bpf_jit_needs_zext() == true.
> >
> > IIUC, at this point, aux[adj_idx].zext_dst == true which
> > means that the check_atomic() has already marked the
> > reg0->subreg_def properly.
> 
> That's right... sorry I'm not sure if you're implying something here
> or just checking understanding?
> 
> > > +                     }
> > > +
> > > +                     load_reg = insn.src_reg;
> > >               } else {
> > >                       load_reg = insn.dst_reg;
> > >               }
> > > @@ -11666,6 +11674,27 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
> > >                       continue;
> > >               }
> > >
> > > +             /* BPF_CMPXCHG always loads a value into R0, therefore always
> > > +              * zero-extends. However some archs' equivalent instruction only
> > > +              * does this load when the comparison is successful. So here we
> > > +              * add a BPF_ZEXT_REG after every 32-bit CMPXCHG, so that such
> > > +              * archs' JITs don't need to deal with the issue. Archs that
> > > +              * don't face this issue may use insn_is_zext to detect and skip
> > > +              * the added instruction.
> > > +              */
> > > +             if (insn->code == (BPF_STX | BPF_W | BPF_ATOMIC) && insn->imm == BPF_CMPXCHG) {
> > > +                     struct bpf_insn zext_patch[2] = { *insn, BPF_ZEXT_REG(BPF_REG_0) };
> > Then should this zext_patch only be done for "!bpf_jit_needs_zext()"
> > such that the above change in opt_subreg_zext_lo32_rnd_hi32()
> > becomes unnecessary?
> 
> Yep that would work but IMO it would be a more fragile expression of
> the logic: instead of directly checking whether something was done
> we'd be looking at a proxy for another part of the system's behaviour.
> I don't think it would win us anything in terms of clarity either?
hmmm... I find it quite confusing to read.

While the current opt_subreg_zext_lo32_rnd_hi32() has
already been doing the actual zext patching work based
on the zext_dst marking, this patch does the zext patch
for cmpxchg before opt_subreg_zext_lo32_rnd_hi32() even
though zext_dst has already been marked.

Then later in opt_subreg_zext_lo32_rnd_hi32(), code is
added to avoid doing the zext patch again for the
"!bpf_jit_needs_zext()" case.

If there are other cases later, then changes have to be made
in both places: one to do the zext patch and another to
avoid double-patching for the "!bpf_jit_needs_zext()" case.

Why not only patch it when there are no other places doing it?

It may be better to do the zext patch for cmpxchg in
opt_subreg_zext_lo32_rnd_hi32() also.  Then all zext patching
is done in one place.  Something like:

static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
					const union bpf_attr *attr)
{

	for (i = 0; i < len; i++) {
		/* ... */
	
		if (!bpf_jit_needs_zext() && !is_cmpxchg_insn(insn))
			continue;

		/* do zext patch */
	}
}
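
(is_cmpxchg_insn() doesn't exist yet; as a sketch, something like
this would do:

static bool is_cmpxchg_insn(const struct bpf_insn *insn)
{
	return BPF_CLASS(insn->code) == BPF_STX &&
	       BPF_MODE(insn->code) == BPF_ATOMIC &&
	       insn->imm == BPF_CMPXCHG;
}
)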

Would that work?
Martin KaFai Lau Feb. 24, 2021, 10:34 p.m. UTC | #6
On Wed, Feb 24, 2021 at 03:16:18PM +0100, Ilya Leoshkevich wrote:
> On Tue, 2021-02-23 at 15:08 +0000, Brendan Jackman wrote:
> > As pointed out by Ilya and explained in the new comment, there's a
> > discrepancy between x86 and BPF CMPXCHG semantics: BPF always loads
> > the value from memory into r0, while x86 only does so when r0 and the
> > value in memory are different. The same issue affects s390.
> > 
> > At first this might sound like pure semantics, but it makes a real
> > difference when the comparison is 32-bit, since the load will
> > zero-extend r0/rax.
> > 
> > The fix is to explicitly zero-extend rax after doing such a
> > CMPXCHG. Since this problem affects multiple archs, this is done in
> > the verifier by patching in a BPF_ZEXT_REG instruction after every
> > 32-bit cmpxchg. Any archs that don't need such manual zero-extension
> > can do a look-ahead with insn_is_zext to skip the unnecessary mov.
> > 
> > There was actually already logic to patch in zero-extension insns
> > after 32-bit cmpxchgs, in opt_subreg_zext_lo32_rnd_hi32. To avoid
> > bloating the prog with unnecessary movs, we now explicitly check and
> > skip that logic for this case.
> > 
> > Reported-by: Ilya Leoshkevich <iii@linux.ibm.com>
> > Fixes: 5ffa25502b5a ("bpf: Add instructions for atomic_[cmp]xchg")
> > Signed-off-by: Brendan Jackman <jackmanb@google.com>
> > ---
> > 
> > Differences v3->v4[1]:
> >  - Moved the optimization against pointless zext into the correct
> > place:
> >    opt_subreg_zext_lo32_rnd_hi32 is called _after_ fixup_bpf_calls.
> > 
> > Differences v2->v3[1]:
> >  - Moved patching into fixup_bpf_calls (patch incoming to rename this
> > function)
> >  - Added extra commentary on bpf_jit_needs_zext
> >  - Added check to avoid adding a pointless zext(r0) if there's
> > already one there.
> > 
> > Difference v1->v2[1]: Now solved centrally in the verifier instead of
> >   specifically for the x86 JIT. Thanks to Ilya and Daniel for the
> > suggestions!
> > 
> > [1] v3: 
> > https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@iogearbox.net/T/#t
> >     v2: 
> > https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@iogearbox.net/T/#t
> >     v1: 
> > https://lore.kernel.org/bpf/d7ebaefb-bfd6-a441-3ff2-2fdfe699b1d2@iogearbox.net/T/#t
> > 
> >  kernel/bpf/core.c                             |  4 +++
> >  kernel/bpf/verifier.c                         | 33 +++++++++++++++++--
> >  .../selftests/bpf/verifier/atomic_cmpxchg.c   | 25 ++++++++++++++
> >  .../selftests/bpf/verifier/atomic_or.c        | 26 +++++++++++++++
> >  4 files changed, 86 insertions(+), 2 deletions(-)
> 
> I think I managed to figure out what is wrong with
> adjust_insn_aux_data(): insn_has_def32() does not know about BPF_FETCH.
> I'll post a fix shortly; in the meantime, based on my debugging
> experience and on looking at the code for a while, I have a few
> comments regarding the patch.
Ah. good catch.

If adjust_insn_aux_data()/insn_has_def32() is fixed to set zext_dst
properly for BPF_FETCH, then that alone should be enough for s390?
Ilya Leoshkevich Feb. 24, 2021, 11:07 p.m. UTC | #7
On Wed, 2021-02-24 at 14:34 -0800, Martin KaFai Lau wrote:
> On Wed, Feb 24, 2021 at 03:16:18PM +0100, Ilya Leoshkevich wrote:
> > On Tue, 2021-02-23 at 15:08 +0000, Brendan Jackman wrote:
> > > As pointed out by Ilya and explained in the new comment, there's a
> > > discrepancy between x86 and BPF CMPXCHG semantics: BPF always loads
> > > the value from memory into r0, while x86 only does so when r0 and
> > > the
> > > value in memory are different. The same issue affects s390.
> > > 
> > > At first this might sound like pure semantics, but it makes a real
> > > difference when the comparison is 32-bit, since the load will
> > > zero-extend r0/rax.
> > > 
> > > The fix is to explicitly zero-extend rax after doing such a
> > > CMPXCHG. Since this problem affects multiple archs, this is done in
> > > the verifier by patching in a BPF_ZEXT_REG instruction after every
> > > 32-bit cmpxchg. Any archs that don't need such manual zero-
> > > extension
> > > can do a look-ahead with insn_is_zext to skip the unnecessary mov.
> > > 
> > > There was actually already logic to patch in zero-extension insns
> > > after 32-bit cmpxchgs, in opt_subreg_zext_lo32_rnd_hi32. To avoid
> > > bloating the prog with unnecessary movs, we now explicitly check
> > > and
> > > skip that logic for this case.
> > > 
> > > Reported-by: Ilya Leoshkevich <iii@linux.ibm.com>
> > > Fixes: 5ffa25502b5a ("bpf: Add instructions for atomic_[cmp]xchg")
> > > Signed-off-by: Brendan Jackman <jackmanb@google.com>
> > > ---
> > > 
> > > Differences v3->v4[1]:
> > >  - Moved the optimization against pointless zext into the correct
> > > place:
> > >    opt_subreg_zext_lo32_rnd_hi32 is called _after_ fixup_bpf_calls.
> > > 
> > > Differences v2->v3[1]:
> > >  - Moved patching into fixup_bpf_calls (patch incoming to rename
> > > this
> > > function)
> > >  - Added extra commentary on bpf_jit_needs_zext
> > >  - Added check to avoid adding a pointless zext(r0) if there's
> > > already one there.
> > > 
> > > Difference v1->v2[1]: Now solved centrally in the verifier instead
> > > of
> > >   specifically for the x86 JIT. Thanks to Ilya and Daniel for the
> > > suggestions!
> > > 
> > > [1] v3: 
> > > https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@iogearbox.net/T/#t
> > >     v2: 
> > > https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@iogearbox.net/T/#t
> > >     v1: 
> > > https://lore.kernel.org/bpf/d7ebaefb-bfd6-a441-3ff2-2fdfe699b1d2@iogearbox.net/T/#t
> > > 
> > >  kernel/bpf/core.c                             |  4 +++
> > >  kernel/bpf/verifier.c                         | 33 +++++++++++++++++--
> > >  .../selftests/bpf/verifier/atomic_cmpxchg.c   | 25 ++++++++++++++
> > >  .../selftests/bpf/verifier/atomic_or.c        | 26 +++++++++++++++
> > >  4 files changed, 86 insertions(+), 2 deletions(-)
> > 
> > I think I managed to figure out what is wrong with
> > adjust_insn_aux_data(): insn_has_def32() does not know about
> > BPF_FETCH.
> > I'll post a fix shortly; in the meantime, based on my debugging
> > experience and on looking at the code for a while, I have a few
> > comments regarding the patch.
> Ah. good catch.
> 
> If adjust_insn_aux_data()/insn_has_def32() is fixed to set zext_dst
> properly for BPF_FETCH, then that alone should be enough for s390?

Yes, my fix [1] + this patch (with conflicts resolved) seem to work
really nicely on s390 for me: no duplicate zexts and one less check
that the JIT needs to do.

[1]
https://lore.kernel.org/bpf/20210224141837.104654-1-iii@linux.ibm.com/
Brendan Jackman March 1, 2021, 4:48 p.m. UTC | #8
On Wed, 24 Feb 2021 at 23:14, Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Wed, Feb 24, 2021 at 10:32:28AM +0100, Brendan Jackman wrote:
> > On Wed, 24 Feb 2021 at 06:48, Martin KaFai Lau <kafai@fb.com> wrote:
> > >
> > > On Tue, Feb 23, 2021 at 03:08:45PM +0000, Brendan Jackman wrote:
> > > [ ... ]
> > >
> > > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > > > index 0ae015ad1e05..dcf18612841b 100644
> > > > --- a/kernel/bpf/core.c
> > > > +++ b/kernel/bpf/core.c
> > > > @@ -2342,6 +2342,10 @@ bool __weak bpf_helper_changes_pkt_data(void *func)
> > > >  /* Return TRUE if the JIT backend wants verifier to enable sub-register usage
> > > >   * analysis code and wants explicit zero extension inserted by verifier.
> > > >   * Otherwise, return FALSE.
> > > > + *
> > > > + * The verifier inserts an explicit zero extension after BPF_CMPXCHGs even if
> > > > + * you don't override this. JITs that don't want these extra insns can detect
> > > > + * them using insn_is_zext.
> > > >   */
> > > >  bool __weak bpf_jit_needs_zext(void)
> > > >  {
> > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > index 3d34ba492d46..ec1cbd565140 100644
> > > > --- a/kernel/bpf/verifier.c
> > > > +++ b/kernel/bpf/verifier.c
> > > > @@ -11061,8 +11061,16 @@ static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
> > > >                        */
> > > >                       if (WARN_ON(!(insn.imm & BPF_FETCH)))
> > > >                               return -EINVAL;
> > > > -                     load_reg = insn.imm == BPF_CMPXCHG ? BPF_REG_0
> > > > -                                                        : insn.src_reg;
> > > > +                     /* There should already be a zero-extension inserted after BPF_CMPXCHG. */
> > > > +                     if (insn.imm == BPF_CMPXCHG) {
> > > > +                             struct bpf_insn *next = &insns[adj_idx + 1];
> > > > +
> > > > +                             if (WARN_ON(!insn_is_zext(next) || next->dst_reg != insn.src_reg))
> > > > +                                     return -EINVAL;
> > > > +                             continue;
> > > This is to avoid zext_patch again for the JITs with
> > > bpf_jit_needs_zext() == true.
> > >
> > > IIUC, at this point, aux[adj_idx].zext_dst == true which
> > > means that the check_atomic() has already marked the
> > > reg0->subreg_def properly.
> >
> > That's right... sorry I'm not sure if you're implying something here
> > or just checking understanding?
> >
> > > > +                     }
> > > > +
> > > > +                     load_reg = insn.src_reg;
> > > >               } else {
> > > >                       load_reg = insn.dst_reg;
> > > >               }
> > > > @@ -11666,6 +11674,27 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
> > > >                       continue;
> > > >               }
> > > >
> > > > +             /* BPF_CMPXCHG always loads a value into R0, therefore always
> > > > +              * zero-extends. However some archs' equivalent instruction only
> > > > +              * does this load when the comparison is successful. So here we
> > > > +              * add a BPF_ZEXT_REG after every 32-bit CMPXCHG, so that such
> > > > +              * archs' JITs don't need to deal with the issue. Archs that
> > > > +              * don't face this issue may use insn_is_zext to detect and skip
> > > > +              * the added instruction.
> > > > +              */
> > > > +             if (insn->code == (BPF_STX | BPF_W | BPF_ATOMIC) && insn->imm == BPF_CMPXCHG) {
> > > > +                     struct bpf_insn zext_patch[2] = { *insn, BPF_ZEXT_REG(BPF_REG_0) };
> > > Then should this zext_patch only be done for "!bpf_jit_needs_zext()"
> > > such that the above change in opt_subreg_zext_lo32_rnd_hi32()
> > > becomes unnecessary?
> >
> > Yep that would work but IMO it would be a more fragile expression of
> > the logic: instead of directly checking whether something was done
> > we'd be looking at a proxy for another part of the system's behaviour.
> > I don't think it would win us anything in terms of clarity either?
> hmmm... I find it quite confusing to read.
>
> While the current opt_subreg_zext_lo32_rnd_hi32() has
> already been doing the actual zext patching work based
> on the zext_dst marking, this patch does the zext patch
> for cmpxchg before opt_subreg_zext_lo32_rnd_hi32() even
> though zext_dst has already been marked.
>
> Then later in opt_subreg_zext_lo32_rnd_hi32(), code is
> added to avoid doing the zext patch again for the
> "!bpf_jit_needs_zext()" case.
>
> If there are other cases later, then changes have to be made
> in both places: one to do the zext patch and another to
> avoid double-patching for the "!bpf_jit_needs_zext()" case.
>
> Why not only patch it when there are no other places doing it?
>
> It may be better to do the zext patch for cmpxchg in
> opt_subreg_zext_lo32_rnd_hi32() also.  Then all zext patching
> is done in one place.  Something like:
>
> static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
>                                         const union bpf_attr *attr)
> {
>
>         for (i = 0; i < len; i++) {
>                 /* ... */
>
>                 if (!bpf_jit_needs_zext() && !is_cmpxchg_insn(insn))
>                         continue;
>
>                 /* do zext patch */
>         }
> }
>
> Would that work?

Yep - this is so much simpler and clearer. Thanks! Sending another spin.

Patch

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 0ae015ad1e05..dcf18612841b 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2342,6 +2342,10 @@  bool __weak bpf_helper_changes_pkt_data(void *func)
 /* Return TRUE if the JIT backend wants verifier to enable sub-register usage
  * analysis code and wants explicit zero extension inserted by verifier.
  * Otherwise, return FALSE.
+ *
+ * The verifier inserts an explicit zero extension after BPF_CMPXCHGs even if
+ * you don't override this. JITs that don't want these extra insns can detect
+ * them using insn_is_zext.
  */
 bool __weak bpf_jit_needs_zext(void)
 {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3d34ba492d46..ec1cbd565140 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11061,8 +11061,16 @@  static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
 			 */
 			if (WARN_ON(!(insn.imm & BPF_FETCH)))
 				return -EINVAL;
-			load_reg = insn.imm == BPF_CMPXCHG ? BPF_REG_0
-							   : insn.src_reg;
+			/* There should already be a zero-extension inserted after BPF_CMPXCHG. */
+			if (insn.imm == BPF_CMPXCHG) {
+				struct bpf_insn *next = &insns[adj_idx + 1];
+
+				if (WARN_ON(!insn_is_zext(next) || next->dst_reg != insn.src_reg))
+					return -EINVAL;
+				continue;
+			}
+
+			load_reg = insn.src_reg;
 		} else {
 			load_reg = insn.dst_reg;
 		}
@@ -11666,6 +11674,27 @@  static int fixup_bpf_calls(struct bpf_verifier_env *env)
 			continue;
 		}

+		/* BPF_CMPXCHG always loads a value into R0, therefore always
+		 * zero-extends. However some archs' equivalent instruction only
+		 * does this load when the comparison is successful. So here we
+		 * add a BPF_ZEXT_REG after every 32-bit CMPXCHG, so that such
+		 * archs' JITs don't need to deal with the issue. Archs that
+		 * don't face this issue may use insn_is_zext to detect and skip
+		 * the added instruction.
+		 */
+		if (insn->code == (BPF_STX | BPF_W | BPF_ATOMIC) && insn->imm == BPF_CMPXCHG) {
+			struct bpf_insn zext_patch[2] = { *insn, BPF_ZEXT_REG(BPF_REG_0) };
+
+			new_prog = bpf_patch_insn_data(env, i + delta, zext_patch, 2);
+			if (!new_prog)
+				return -ENOMEM;
+
+			delta    += 1;
+			env->prog = prog = new_prog;
+			insn      = new_prog->insnsi + i + delta;
+			continue;
+		}
+
 		if (insn->code != (BPF_JMP | BPF_CALL))
 			continue;
 		if (insn->src_reg == BPF_PSEUDO_CALL)
diff --git a/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c b/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
index 2efd8bcf57a1..6e52dfc64415 100644
--- a/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
+++ b/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
@@ -94,3 +94,28 @@ 
 	.result = REJECT,
 	.errstr = "invalid read from stack",
 },
+{
+	"BPF_W cmpxchg should zero top 32 bits",
+	.insns = {
+		/* r0 = U64_MAX; */
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_ALU64_IMM(BPF_SUB, BPF_REG_0, 1),
+		/* u64 val = r0; */
+		BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
+		/* r0 = (u32)atomic_cmpxchg((u32 *)&val, r0, 1); */
+		BPF_MOV32_IMM(BPF_REG_1, 1),
+		BPF_ATOMIC_OP(BPF_W, BPF_CMPXCHG, BPF_REG_10, BPF_REG_1, -8),
+		/* r1 = 0x00000000FFFFFFFFull; */
+		BPF_MOV64_IMM(BPF_REG_1, 1),
+		BPF_ALU64_IMM(BPF_LSH, BPF_REG_1, 32),
+		BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 1),
+		/* if (r0 != r1) exit(1); */
+		BPF_JMP_REG(BPF_JEQ, BPF_REG_0, BPF_REG_1, 2),
+		BPF_MOV32_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+		/* exit(0); */
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+},
diff --git a/tools/testing/selftests/bpf/verifier/atomic_or.c b/tools/testing/selftests/bpf/verifier/atomic_or.c
index 70f982e1f9f0..0a08b99e6ddd 100644
--- a/tools/testing/selftests/bpf/verifier/atomic_or.c
+++ b/tools/testing/selftests/bpf/verifier/atomic_or.c
@@ -75,3 +75,29 @@ 
 	},
 	.result = ACCEPT,
 },
+{
+	"BPF_W atomic_fetch_or should zero top 32 bits",
+	.insns = {
+		/* r1 = U64_MAX; */
+		BPF_MOV64_IMM(BPF_REG_1, 0),
+		BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 1),
+		/* u64 val = r1; */
+		BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_1, -8),
+		/* r1 = (u32)atomic_fetch_or((u32 *)&val, 2); */
+		BPF_MOV32_IMM(BPF_REG_1, 2),
+		BPF_ATOMIC_OP(BPF_W, BPF_OR | BPF_FETCH, BPF_REG_10, BPF_REG_1, -8),
+		/* r2 = 0x00000000FFFFFFFF; */
+		BPF_MOV64_IMM(BPF_REG_2, 1),
+		BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 32),
+		BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, 1),
+		/* if (r2 != r1) exit(r1); */
+		BPF_JMP_REG(BPF_JEQ, BPF_REG_2, BPF_REG_1, 2),
+		/* exit with the bad value so the mismatch is visible */
+		BPF_MOV64_REG(BPF_REG_0, BPF_REG_1),
+		BPF_EXIT_INSN(),
+		/* exit(0); */
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+},