diff mbox series

[bpf-next] selftests/bpf: Fix implementation-defined behavior in sk_lookup test

Message ID 20220221180358.169101-1-jakub@cloudflare.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series [bpf-next] selftests/bpf: Fix implementation-defined behavior in sk_lookup test | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers fail 1 blamed authors not CCed: yhs@fb.com; 9 maintainers not CCed: linux-kselftest@vger.kernel.org kpsingh@kernel.org john.fastabend@gmail.com kafai@fb.com shuah@kernel.org songliubraving@fb.com lmb@cloudflare.com davemarchevsky@fb.com yhs@fb.com
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 15 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next success VM_Test

Commit Message

Jakub Sitnicki Feb. 21, 2022, 6:03 p.m. UTC
Shifting 16-bit type by 16 bits is implementation-defined for BPF programs.
Don't rely on it in case it is causing the test failures we are seeing on
s390x z15 target.

Fixes: 2ed0dc5937d3 ("selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup")
Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

I don't have a dev env for s390x/z15 set up yet, so can't definitely confirm the fix.
That said, it seems worth fixing either way.

 tools/testing/selftests/bpf/progs/test_sk_lookup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Ilya Leoshkevich Feb. 21, 2022, 9:39 p.m. UTC | #1
On Mon, 2022-02-21 at 19:03 +0100, Jakub Sitnicki wrote:
> Shifting 16-bit type by 16 bits is implementation-defined for BPF
> programs.
> Don't rely on it in case it is causing the test failures we are
> seeing on
> s390x z15 target.
> 
> Fixes: 2ed0dc5937d3 ("selftests/bpf: Cover 4-byte load from
> remote_port in bpf_sk_lookup")
> Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
> 
> I don't have a dev env for s390x/z15 set up yet, so can't definitely
> confirm the fix.
> That said, it seems worth fixing either way.
> 
>  tools/testing/selftests/bpf/progs/test_sk_lookup.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> index bf5b7caefdd0..7d47276a8964 100644
> --- a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> +++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> @@ -65,6 +65,7 @@ static const __u32 KEY_SERVER_A = SERVER_A;
>  static const __u32 KEY_SERVER_B = SERVER_B;
>  
>  static const __u16 SRC_PORT = bpf_htons(8008);
> +static const __u32 SRC_PORT_U32 = bpf_htonl(8008U << 16);
>  static const __u32 SRC_IP4 = IP4(127, 0, 0, 2);
>  static const __u32 SRC_IP6[] = IP6(0xfd000000, 0x0, 0x0,
> 0x00000002);
>  
> @@ -421,7 +422,7 @@ int ctx_narrow_access(struct bpf_sk_lookup *ctx)
>  
>         /* Load from remote_port field with zero padding (backward
> compatibility) */
>         val_u32 = *(__u32 *)&ctx->remote_port;
> -       if (val_u32 != bpf_htonl(bpf_ntohs(SRC_PORT) << 16))
> +       if (val_u32 != SRC_PORT_U32)
>                 return SK_DROP;
>  
>         /* Narrow loads from local_port field. Expect DST_PORT. */

Unfortunately this doesn't help with the s390 problem.
I'll try to debug this.
Ilya Leoshkevich Feb. 22, 2022, 12:43 a.m. UTC | #2
On Mon, 2022-02-21 at 22:39 +0100, Ilya Leoshkevich wrote:
> On Mon, 2022-02-21 at 19:03 +0100, Jakub Sitnicki wrote:
> > Shifting 16-bit type by 16 bits is implementation-defined for BPF
> > programs.
> > Don't rely on it in case it is causing the test failures we are
> > seeing on
> > s390x z15 target.
> > 
> > Fixes: 2ed0dc5937d3 ("selftests/bpf: Cover 4-byte load from
> > remote_port in bpf_sk_lookup")
> > Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> > ---
> > 
> > I don't have a dev env for s390x/z15 set up yet, so can't
> > definitely
> > confirm the fix.
> > That said, it seems worth fixing either way.
> > 
> >  tools/testing/selftests/bpf/progs/test_sk_lookup.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> > b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> > index bf5b7caefdd0..7d47276a8964 100644
> > --- a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> > +++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> > @@ -65,6 +65,7 @@ static const __u32 KEY_SERVER_A = SERVER_A;
> >  static const __u32 KEY_SERVER_B = SERVER_B;
> >  
> >  static const __u16 SRC_PORT = bpf_htons(8008);
> > +static const __u32 SRC_PORT_U32 = bpf_htonl(8008U << 16);
> >  static const __u32 SRC_IP4 = IP4(127, 0, 0, 2);
> >  static const __u32 SRC_IP6[] = IP6(0xfd000000, 0x0, 0x0,
> > 0x00000002);
> >  
> > @@ -421,7 +422,7 @@ int ctx_narrow_access(struct bpf_sk_lookup
> > *ctx)
> >  
> >         /* Load from remote_port field with zero padding (backward
> > compatibility) */
> >         val_u32 = *(__u32 *)&ctx->remote_port;
> > -       if (val_u32 != bpf_htonl(bpf_ntohs(SRC_PORT) << 16))
> > +       if (val_u32 != SRC_PORT_U32)
> >                 return SK_DROP;
> >  
> >         /* Narrow loads from local_port field. Expect DST_PORT. */
> 
> Unfortunately this doesn't help with the s390 problem.
> I'll try to debug this.

I have to admit I have a hard time wrapping my head around the
requirements here.

Based on the pre-9a69e2b385f4 code, do I understand correctly that
for the following input

Port:     0x1f48
SRC_PORT: 0x481f

we expect the following results for different kinds of loads:

Size   Offset  LE      BE
BPF_B  0       0x1f    0
BPF_B  1       0x48    0
BPF_B  2       0       0x48
BPF_B  3       0       0x1f
BPF_H  0       0x481f  0
BPF_H  1       0       0x481f
BPF_W  0       0x481f  0x481f

and this is guaranteed by the struct bpf_sk_lookup ABI? Because then it
looks as if 9a69e2b385f4 breaks it on big-endian as follows:

Size   Offset  BE-9a69e2b385f4
BPF_B  0       0x48
BPF_B  1       0x1f
BPF_B  2       0
BPF_B  3       0
BPF_H  0       0x481f
BPF_H  1       0
BPF_W  0       0x481f0000

Or is the old behavior a bug and this new one is desirable?
9a69e2b385f4 has no Fixes: tag, so I assume that's the former :-(

In which case, would it make sense to fix it by swapping remote_port
and :16 in bpf_sk_lookup on big-endian?

Best regards,
Ilya
Ilya Leoshkevich Feb. 22, 2022, 2:22 a.m. UTC | #3
On Tue, 2022-02-22 at 01:43 +0100, Ilya Leoshkevich wrote:
> On Mon, 2022-02-21 at 22:39 +0100, Ilya Leoshkevich wrote:
> > On Mon, 2022-02-21 at 19:03 +0100, Jakub Sitnicki wrote:
> > > Shifting 16-bit type by 16 bits is implementation-defined for BPF
> > > programs.
> > > Don't rely on it in case it is causing the test failures we are
> > > seeing on
> > > s390x z15 target.
> > > 
> > > Fixes: 2ed0dc5937d3 ("selftests/bpf: Cover 4-byte load from
> > > remote_port in bpf_sk_lookup")
> > > Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> > > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> > > ---
> > > 
> > > I don't have a dev env for s390x/z15 set up yet, so can't
> > > definitely
> > > confirm the fix.
> > > That said, it seems worth fixing either way.
> > > 
> > >  tools/testing/selftests/bpf/progs/test_sk_lookup.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> > > b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> > > index bf5b7caefdd0..7d47276a8964 100644
> > > --- a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> > > +++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> > > @@ -65,6 +65,7 @@ static const __u32 KEY_SERVER_A = SERVER_A;
> > >  static const __u32 KEY_SERVER_B = SERVER_B;
> > >  
> > >  static const __u16 SRC_PORT = bpf_htons(8008);
> > > +static const __u32 SRC_PORT_U32 = bpf_htonl(8008U << 16);
> > >  static const __u32 SRC_IP4 = IP4(127, 0, 0, 2);
> > >  static const __u32 SRC_IP6[] = IP6(0xfd000000, 0x0, 0x0,
> > > 0x00000002);
> > >  
> > > @@ -421,7 +422,7 @@ int ctx_narrow_access(struct bpf_sk_lookup
> > > *ctx)
> > >  
> > >         /* Load from remote_port field with zero padding
> > > (backward
> > > compatibility) */
> > >         val_u32 = *(__u32 *)&ctx->remote_port;
> > > -       if (val_u32 != bpf_htonl(bpf_ntohs(SRC_PORT) << 16))
> > > +       if (val_u32 != SRC_PORT_U32)
> > >                 return SK_DROP;
> > >  
> > >         /* Narrow loads from local_port field. Expect DST_PORT.
> > > */
> > 
> > Unfortunately this doesn't help with the s390 problem.
> > I'll try to debug this.
> 
> I have to admit I have a hard time wrapping my head around the
> requirements here.
> 
> Based on the pre-9a69e2b385f4 code, do I understand correctly that
> for the following input
> 
> Port:     0x1f48
> SRC_PORT: 0x481f
> 
> we expect the following results for different kinds of loads:
> 
> Size   Offset  LE      BE
> BPF_B  0       0x1f    0
> BPF_B  1       0x48    0
> BPF_B  2       0       0x48
> BPF_B  3       0       0x1f
> BPF_H  0       0x481f  0
> BPF_H  1       0       0x481f
> BPF_W  0       0x481f  0x481f
> 
> and this is guaranteed by the struct bpf_sk_lookup ABI? Because then
> it
> looks as if 9a69e2b385f4 breaks it on big-endian as follows:
> 
> Size   Offset  BE-9a69e2b385f4
> BPF_B  0       0x48
> BPF_B  1       0x1f
> BPF_B  2       0
> BPF_B  3       0
> BPF_H  0       0x481f
> BPF_H  1       0
> BPF_W  0       0x481f0000

Sorry, I worded this incorrectly: 9a69e2b385f4 did not change the
kernel behavior, the ABI is not broken and the old compiled code should
continue to work.
What the second table really shows are what the results should be
according to the 9a69e2b385f4 struct bpf_sk_lookup definition, which I
still think is broken on big-endian and needs to be adjusted to match
the ABI.

I noticed one other strange thing in the meantime: loads from
*(__u32 *)&ctx->remote_port, *(__u16 *)&ctx->remote_port and
*((__u16 *)&ctx->remote_port + 1) all produce 8008 on s390, which is
clearly inconsistent. It looks as if convert_ctx_accesses() needs to be
adjusted to handle combinations like ctx_field_size == 4 && size == 2
&& target_size == 2. I will continue with this tomorrow.

> Or is the old behavior a bug and this new one is desirable?
> 9a69e2b385f4 has no Fixes: tag, so I assume that's the former :-(
> 
> In which case, would it make sense to fix it by swapping remote_port
> and :16 in bpf_sk_lookup on big-endian?
> 
> Best regards,
> Ilya
Jakub Sitnicki Feb. 22, 2022, 2:53 p.m. UTC | #4
On Tue, Feb 22, 2022 at 03:22 AM +01, Ilya Leoshkevich wrote:
> On Tue, 2022-02-22 at 01:43 +0100, Ilya Leoshkevich wrote:
>> On Mon, 2022-02-21 at 22:39 +0100, Ilya Leoshkevich wrote:
>> > On Mon, 2022-02-21 at 19:03 +0100, Jakub Sitnicki wrote:
>> > > Shifting 16-bit type by 16 bits is implementation-defined for BPF
>> > > programs.
>> > > Don't rely on it in case it is causing the test failures we are
>> > > seeing on
>> > > s390x z15 target.
>> > > 
>> > > Fixes: 2ed0dc5937d3 ("selftests/bpf: Cover 4-byte load from
>> > > remote_port in bpf_sk_lookup")
>> > > Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
>> > > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> > > ---
>> > > 
>> > > I don't have a dev env for s390x/z15 set up yet, so can't
>> > > definitely
>> > > confirm the fix.
>> > > That said, it seems worth fixing either way.
>> > > 
>> > >  tools/testing/selftests/bpf/progs/test_sk_lookup.c | 3 ++-
>> > >  1 file changed, 2 insertions(+), 1 deletion(-)
>> > > 
>> > > diff --git a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
>> > > b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
>> > > index bf5b7caefdd0..7d47276a8964 100644
>> > > --- a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
>> > > +++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
>> > > @@ -65,6 +65,7 @@ static const __u32 KEY_SERVER_A = SERVER_A;
>> > >  static const __u32 KEY_SERVER_B = SERVER_B;
>> > >  
>> > >  static const __u16 SRC_PORT = bpf_htons(8008);
>> > > +static const __u32 SRC_PORT_U32 = bpf_htonl(8008U << 16);
>> > >  static const __u32 SRC_IP4 = IP4(127, 0, 0, 2);
>> > >  static const __u32 SRC_IP6[] = IP6(0xfd000000, 0x0, 0x0,
>> > > 0x00000002);
>> > >  
>> > > @@ -421,7 +422,7 @@ int ctx_narrow_access(struct bpf_sk_lookup
>> > > *ctx)
>> > >  
>> > >         /* Load from remote_port field with zero padding
>> > > (backward
>> > > compatibility) */
>> > >         val_u32 = *(__u32 *)&ctx->remote_port;
>> > > -       if (val_u32 != bpf_htonl(bpf_ntohs(SRC_PORT) << 16))
>> > > +       if (val_u32 != SRC_PORT_U32)
>> > >                 return SK_DROP;
>> > >  
>> > >         /* Narrow loads from local_port field. Expect DST_PORT.
>> > > */
>> > 
>> > Unfortunately this doesn't help with the s390 problem.
>> > I'll try to debug this.
>> 
>> I have to admit I have a hard time wrapping my head around the
>> requirements here.
>> 
>> Based on the pre-9a69e2b385f4 code, do I understand correctly that
>> for the following input
>> 
>> Port:     0x1f48
>> SRC_PORT: 0x481f
>> 
>> we expect the following results for different kinds of loads:
>> 
>> Size   Offset  LE      BE
>> BPF_B  0       0x1f    0
>> BPF_B  1       0x48    0
>> BPF_B  2       0       0x48
>> BPF_B  3       0       0x1f
>> BPF_H  0       0x481f  0
>> BPF_H  1       0       0x481f
>> BPF_W  0       0x481f  0x481f
>> 
>> and this is guaranteed by the struct bpf_sk_lookup ABI? Because then
>> it
>> looks as if 9a69e2b385f4 breaks it on big-endian as follows:
>> 
>> Size   Offset  BE-9a69e2b385f4
>> BPF_B  0       0x48
>> BPF_B  1       0x1f
>> BPF_B  2       0
>> BPF_B  3       0
>> BPF_H  0       0x481f
>> BPF_H  1       0
>> BPF_W  0       0x481f0000
>
> Sorry, I worded this incorrectly: 9a69e2b385f4 did not change the
> kernel behavior, the ABI is not broken and the old compiled code should
> continue to work.
> What the second table really shows are what the results should be
> according to the 9a69e2b385f4 struct bpf_sk_lookup definition, which I
> still think is broken on big-endian and needs to be adjusted to match
> the ABI.
>
> I noticed one other strange thing in the meantime: loads from
> *(__u32 *)&ctx->remote_port, *(__u16 *)&ctx->remote_port and
> *((__u16 *)&ctx->remote_port + 1) all produce 8008 on s390, which is
> clearly inconsistent. It looks as if convert_ctx_accesses() needs to be
> adjusted to handle combinations like ctx_field_size == 4 && size == 2
> && target_size == 2. I will continue with this tomorrow.
>
>> Or is the old behavior a bug and this new one is desirable?
>> 9a69e2b385f4 has no Fixes: tag, so I assume that's the former :-(
>> 
>> In which case, would it make sense to fix it by swapping remote_port
>> and :16 in bpf_sk_lookup on big-endian?

Thanks for looking into it.

When it comes to requirements, my intention was to keep the same
behavior as before the split up of the remote_port field in 9a69e2b385f4
("bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide").

9a69e2b385f4 was supposed to be a formality, after a similar change in
4421a582718a ("bpf: Make dst_port field in struct bpf_sock 16-bit
wide"), which went in earlier.

In 4421a582718a I've provided a bit more context. I understand that the
remote_port value, even before the type changed from u32 to u16,
appeared to the BPF program as if laid out in memory like so:

      offsetof(struct bpf_sk_lookup, remote_port) +0  <port MSB>
                                                  +1  <port LSB>
                                                  +2  0x00
                                                  +3  0x00

Translating it to your handy table format, I expect should result in
loads as so if port is 8008 = 0x1f48:

      Size   Offset  LE      BE
      BPF_B  0       0x1f    0x1f
      BPF_B  1       0x48    0x48
      BPF_B  2       0       0
      BPF_B  3       0       0
      BPF_H  0       0x481f  0x1f48
      BPF_H  1       0       0
      BPF_W  0       0x481f  0x1f480000

But since the fix does not work, there must be a mistake somewhere in my
reasoning.

I expect I should be able to get virtme for s390 working sometime this
week to check my math. I've seen your collegue had some luck with it
[1].

Looking forward to your findings.

[1] https://github.com/cilium/ebpf/issues/86#issuecomment-623945549
Ilya Leoshkevich Feb. 22, 2022, 5:42 p.m. UTC | #5
On Tue, 2022-02-22 at 15:53 +0100, Jakub Sitnicki wrote:
> On Tue, Feb 22, 2022 at 03:22 AM +01, Ilya Leoshkevich wrote:
> > On Tue, 2022-02-22 at 01:43 +0100, Ilya Leoshkevich wrote:
> > > On Mon, 2022-02-21 at 22:39 +0100, Ilya Leoshkevich wrote:
> > > > On Mon, 2022-02-21 at 19:03 +0100, Jakub Sitnicki wrote:
> > > > > Shifting 16-bit type by 16 bits is implementation-defined for
> > > > > BPF
> > > > > programs.
> > > > > Don't rely on it in case it is causing the test failures we
> > > > > are
> > > > > seeing on
> > > > > s390x z15 target.
> > > > > 
> > > > > Fixes: 2ed0dc5937d3 ("selftests/bpf: Cover 4-byte load from
> > > > > remote_port in bpf_sk_lookup")
> > > > > Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> > > > > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> > > > > ---
> > > > > 
> > > > > I don't have a dev env for s390x/z15 set up yet, so can't
> > > > > definitely
> > > > > confirm the fix.
> > > > > That said, it seems worth fixing either way.
> > > > > 
> > > > >  tools/testing/selftests/bpf/progs/test_sk_lookup.c | 3 ++-
> > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git
> > > > > a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> > > > > b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> > > > > index bf5b7caefdd0..7d47276a8964 100644
> > > > > --- a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> > > > > +++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> > > > > @@ -65,6 +65,7 @@ static const __u32 KEY_SERVER_A = SERVER_A;
> > > > >  static const __u32 KEY_SERVER_B = SERVER_B;
> > > > >  
> > > > >  static const __u16 SRC_PORT = bpf_htons(8008);
> > > > > +static const __u32 SRC_PORT_U32 = bpf_htonl(8008U << 16);
> > > > >  static const __u32 SRC_IP4 = IP4(127, 0, 0, 2);
> > > > >  static const __u32 SRC_IP6[] = IP6(0xfd000000, 0x0, 0x0,
> > > > > 0x00000002);
> > > > >  
> > > > > @@ -421,7 +422,7 @@ int ctx_narrow_access(struct
> > > > > bpf_sk_lookup
> > > > > *ctx)
> > > > >  
> > > > >         /* Load from remote_port field with zero padding
> > > > > (backward
> > > > > compatibility) */
> > > > >         val_u32 = *(__u32 *)&ctx->remote_port;
> > > > > -       if (val_u32 != bpf_htonl(bpf_ntohs(SRC_PORT) << 16))
> > > > > +       if (val_u32 != SRC_PORT_U32)
> > > > >                 return SK_DROP;
> > > > >  
> > > > >         /* Narrow loads from local_port field. Expect
> > > > > DST_PORT.
> > > > > */
> > > > 
> > > > Unfortunately this doesn't help with the s390 problem.
> > > > I'll try to debug this.
> > > 
> > > I have to admit I have a hard time wrapping my head around the
> > > requirements here.
> > > 
> > > Based on the pre-9a69e2b385f4 code, do I understand correctly
> > > that
> > > for the following input
> > > 
> > > Port:     0x1f48
> > > SRC_PORT: 0x481f
> > > 
> > > we expect the following results for different kinds of loads:
> > > 
> > > Size   Offset  LE      BE
> > > BPF_B  0       0x1f    0
> > > BPF_B  1       0x48    0
> > > BPF_B  2       0       0x48
> > > BPF_B  3       0       0x1f
> > > BPF_H  0       0x481f  0
> > > BPF_H  1       0       0x481f
> > > BPF_W  0       0x481f  0x481f
> > > 
> > > and this is guaranteed by the struct bpf_sk_lookup ABI? Because
> > > then
> > > it
> > > looks as if 9a69e2b385f4 breaks it on big-endian as follows:
> > > 
> > > Size   Offset  BE-9a69e2b385f4
> > > BPF_B  0       0x48
> > > BPF_B  1       0x1f
> > > BPF_B  2       0
> > > BPF_B  3       0
> > > BPF_H  0       0x481f
> > > BPF_H  1       0
> > > BPF_W  0       0x481f0000
> > 
> > Sorry, I worded this incorrectly: 9a69e2b385f4 did not change the
> > kernel behavior, the ABI is not broken and the old compiled code
> > should
> > continue to work.
> > What the second table really shows are what the results should be
> > according to the 9a69e2b385f4 struct bpf_sk_lookup definition,
> > which I
> > still think is broken on big-endian and needs to be adjusted to
> > match
> > the ABI.
> > 
> > I noticed one other strange thing in the meantime: loads from
> > *(__u32 *)&ctx->remote_port, *(__u16 *)&ctx->remote_port and
> > *((__u16 *)&ctx->remote_port + 1) all produce 8008 on s390, which
> > is
> > clearly inconsistent. It looks as if convert_ctx_accesses() needs
> > to be
> > adjusted to handle combinations like ctx_field_size == 4 && size ==
> > 2
> > && target_size == 2. I will continue with this tomorrow.
> > 
> > > Or is the old behavior a bug and this new one is desirable?
> > > 9a69e2b385f4 has no Fixes: tag, so I assume that's the former :-(
> > > 
> > > In which case, would it make sense to fix it by swapping
> > > remote_port
> > > and :16 in bpf_sk_lookup on big-endian?
> 
> Thanks for looking into it.
> 
> When it comes to requirements, my intention was to keep the same
> behavior as before the split up of the remote_port field in
> 9a69e2b385f4
> ("bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide").
> 
> 9a69e2b385f4 was supposed to be a formality, after a similar change
> in
> 4421a582718a ("bpf: Make dst_port field in struct bpf_sock 16-bit
> wide"), which went in earlier.
> 
> In 4421a582718a I've provided a bit more context. I understand that
> the
> remote_port value, even before the type changed from u32 to u16,
> appeared to the BPF program as if laid out in memory like so:
> 
>       offsetof(struct bpf_sk_lookup, remote_port) +0  <port MSB>
>                                                   +1  <port LSB>
>                                                   +2  0x00
>                                                   +3  0x00
> 
> Translating it to your handy table format, I expect should result in
> loads as so if port is 8008 = 0x1f48:
> 
>       Size   Offset  LE      BE
>       BPF_B  0       0x1f    0x1f
>       BPF_B  1       0x48    0x48
>       BPF_B  2       0       0
>       BPF_B  3       0       0
>       BPF_H  0       0x481f  0x1f48
>       BPF_H  1       0       0
>       BPF_W  0       0x481f  0x1f480000

Hmm, I think for big-endian the layout is different.
If we look at test_sk_lookup.c from 9a69e2b385f4^:

        /* Narrow loads from remote_port field. Expect SRC_PORT. */
        if (LSB(ctx->remote_port, 0) != ((SRC_PORT >> 0) & 0xff) ||
            LSB(ctx->remote_port, 1) != ((SRC_PORT >> 8) & 0xff) ||
            LSB(ctx->remote_port, 2) != 0 || LSB(ctx->remote_port, 3)
!= 0)
                return SK_DROP;

LSB() on little-endian is just byte indexing, so it's indeed 
1f,48,00,00. However, on big-endian it's indexing from the end, so
it's 00,00,48,1f.

> But since the fix does not work, there must be a mistake somewhere in
> my
> reasoning.
> 
> I expect I should be able to get virtme for s390 working sometime
> this
> week to check my math. I've seen your collegue had some luck with it
> [1].

Yeah, I think it should work. In the worst case it should be possible
to tweak vmtest.sh to cross-compile and emulate s390.

> Looking forward to your findings.
> 
> [1] https://github.com/cilium/ebpf/issues/86#issuecomment-623945549
Jakub Sitnicki Feb. 22, 2022, 6:24 p.m. UTC | #6
On Tue, Feb 22, 2022 at 06:42 PM +01, Ilya Leoshkevich wrote:
> On Tue, 2022-02-22 at 15:53 +0100, Jakub Sitnicki wrote:
>> On Tue, Feb 22, 2022 at 03:22 AM +01, Ilya Leoshkevich wrote:
>> > On Tue, 2022-02-22 at 01:43 +0100, Ilya Leoshkevich wrote:
>> > > On Mon, 2022-02-21 at 22:39 +0100, Ilya Leoshkevich wrote:
>> > > > On Mon, 2022-02-21 at 19:03 +0100, Jakub Sitnicki wrote:
>> > > > > Shifting 16-bit type by 16 bits is implementation-defined for
>> > > > > BPF
>> > > > > programs.
>> > > > > Don't rely on it in case it is causing the test failures we
>> > > > > are
>> > > > > seeing on
>> > > > > s390x z15 target.
>> > > > > 
>> > > > > Fixes: 2ed0dc5937d3 ("selftests/bpf: Cover 4-byte load from
>> > > > > remote_port in bpf_sk_lookup")
>> > > > > Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
>> > > > > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> > > > > ---
>> > > > > 
>> > > > > I don't have a dev env for s390x/z15 set up yet, so can't
>> > > > > definitely
>> > > > > confirm the fix.
>> > > > > That said, it seems worth fixing either way.
>> > > > > 
>> > > > >  tools/testing/selftests/bpf/progs/test_sk_lookup.c | 3 ++-
>> > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
>> > > > > 
>> > > > > diff --git
>> > > > > a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
>> > > > > b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
>> > > > > index bf5b7caefdd0..7d47276a8964 100644
>> > > > > --- a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
>> > > > > +++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
>> > > > > @@ -65,6 +65,7 @@ static const __u32 KEY_SERVER_A = SERVER_A;
>> > > > >  static const __u32 KEY_SERVER_B = SERVER_B;
>> > > > >  
>> > > > >  static const __u16 SRC_PORT = bpf_htons(8008);
>> > > > > +static const __u32 SRC_PORT_U32 = bpf_htonl(8008U << 16);
>> > > > >  static const __u32 SRC_IP4 = IP4(127, 0, 0, 2);
>> > > > >  static const __u32 SRC_IP6[] = IP6(0xfd000000, 0x0, 0x0,
>> > > > > 0x00000002);
>> > > > >  
>> > > > > @@ -421,7 +422,7 @@ int ctx_narrow_access(struct
>> > > > > bpf_sk_lookup
>> > > > > *ctx)
>> > > > >  
>> > > > >         /* Load from remote_port field with zero padding
>> > > > > (backward
>> > > > > compatibility) */
>> > > > >         val_u32 = *(__u32 *)&ctx->remote_port;
>> > > > > -       if (val_u32 != bpf_htonl(bpf_ntohs(SRC_PORT) << 16))
>> > > > > +       if (val_u32 != SRC_PORT_U32)
>> > > > >                 return SK_DROP;
>> > > > >  
>> > > > >         /* Narrow loads from local_port field. Expect
>> > > > > DST_PORT.
>> > > > > */
>> > > > 
>> > > > Unfortunately this doesn't help with the s390 problem.
>> > > > I'll try to debug this.
>> > > 
>> > > I have to admit I have a hard time wrapping my head around the
>> > > requirements here.
>> > > 
>> > > Based on the pre-9a69e2b385f4 code, do I understand correctly
>> > > that
>> > > for the following input
>> > > 
>> > > Port:     0x1f48
>> > > SRC_PORT: 0x481f
>> > > 
>> > > we expect the following results for different kinds of loads:
>> > > 
>> > > Size   Offset  LE      BE
>> > > BPF_B  0       0x1f    0
>> > > BPF_B  1       0x48    0
>> > > BPF_B  2       0       0x48
>> > > BPF_B  3       0       0x1f
>> > > BPF_H  0       0x481f  0
>> > > BPF_H  1       0       0x481f
>> > > BPF_W  0       0x481f  0x481f
>> > > 
>> > > and this is guaranteed by the struct bpf_sk_lookup ABI? Because
>> > > then
>> > > it
>> > > looks as if 9a69e2b385f4 breaks it on big-endian as follows:
>> > > 
>> > > Size   Offset  BE-9a69e2b385f4
>> > > BPF_B  0       0x48
>> > > BPF_B  1       0x1f
>> > > BPF_B  2       0
>> > > BPF_B  3       0
>> > > BPF_H  0       0x481f
>> > > BPF_H  1       0
>> > > BPF_W  0       0x481f0000
>> > 
>> > Sorry, I worded this incorrectly: 9a69e2b385f4 did not change the
>> > kernel behavior, the ABI is not broken and the old compiled code
>> > should
>> > continue to work.
>> > What the second table really shows are what the results should be
>> > according to the 9a69e2b385f4 struct bpf_sk_lookup definition,
>> > which I
>> > still think is broken on big-endian and needs to be adjusted to
>> > match
>> > the ABI.
>> > 
>> > I noticed one other strange thing in the meantime: loads from
>> > *(__u32 *)&ctx->remote_port, *(__u16 *)&ctx->remote_port and
>> > *((__u16 *)&ctx->remote_port + 1) all produce 8008 on s390, which
>> > is
>> > clearly inconsistent. It looks as if convert_ctx_accesses() needs
>> > to be
>> > adjusted to handle combinations like ctx_field_size == 4 && size ==
>> > 2
>> > && target_size == 2. I will continue with this tomorrow.
>> > 
>> > > Or is the old behavior a bug and this new one is desirable?
>> > > 9a69e2b385f4 has no Fixes: tag, so I assume that's the former :-(
>> > > 
>> > > In which case, would it make sense to fix it by swapping
>> > > remote_port
>> > > and :16 in bpf_sk_lookup on big-endian?
>> 
>> Thanks for looking into it.
>> 
>> When it comes to requirements, my intention was to keep the same
>> behavior as before the split up of the remote_port field in
>> 9a69e2b385f4
>> ("bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide").
>> 
>> 9a69e2b385f4 was supposed to be a formality, after a similar change
>> in
>> 4421a582718a ("bpf: Make dst_port field in struct bpf_sock 16-bit
>> wide"), which went in earlier.
>> 
>> In 4421a582718a I've provided a bit more context. I understand that
>> the
>> remote_port value, even before the type changed from u32 to u16,
>> appeared to the BPF program as if laid out in memory like so:
>> 
>>       offsetof(struct bpf_sk_lookup, remote_port) +0  <port MSB>
>>                                                   +1  <port LSB>
>>                                                   +2  0x00
>>                                                   +3  0x00
>> 
>> Translating it to your handy table format, I expect should result in
>> loads as so if port is 8008 = 0x1f48:
>> 
>>       Size   Offset  LE      BE
>>       BPF_B  0       0x1f    0x1f
>>       BPF_B  1       0x48    0x48
>>       BPF_B  2       0       0
>>       BPF_B  3       0       0
>>       BPF_H  0       0x481f  0x1f48
>>       BPF_H  1       0       0
>>       BPF_W  0       0x481f  0x1f480000
>
> Hmm, I think for big-endian the layout is different.
> If we look at test_sk_lookup.c from 9a69e2b385f4^:
>
>         /* Narrow loads from remote_port field. Expect SRC_PORT. */
>         if (LSB(ctx->remote_port, 0) != ((SRC_PORT >> 0) & 0xff) ||
>             LSB(ctx->remote_port, 1) != ((SRC_PORT >> 8) & 0xff) ||
>             LSB(ctx->remote_port, 2) != 0 || LSB(ctx->remote_port, 3)
> != 0)
>                 return SK_DROP;
>
> LSB() on little-endian is just byte indexing, so it's indeed 
> 1f,48,00,00. However, on big-endian it's indexing from the end, so
> it's 00,00,48,1f.

I understood that LSB() is indexing from the end on BE because SRC_PORT
constant value differs on LE (= 0x481f) and BE (= 0x1f48) platforms, so

                 LE  BE
  SRC_PORT >> 0  1f  48
  SRC_PORT >> 8  48  1f

So on LE we first compare remote_port MSB, then LSB.
While on BE we start with remote_port LSB, then MSB.

But, now that you have pointed it out, I notice that sizeof(remote_port)
has changed and from 4 to 2, and I can't see how LSB(…, 3) and LSB(…, 4)
loads can keep working on big-endian.

[...]
Ilya Leoshkevich Feb. 22, 2022, 9:51 p.m. UTC | #7
On Tue, 2022-02-22 at 19:24 +0100, Jakub Sitnicki wrote:
> On Tue, Feb 22, 2022 at 06:42 PM +01, Ilya Leoshkevich wrote:
> > On Tue, 2022-02-22 at 15:53 +0100, Jakub Sitnicki wrote:
> > > On Tue, Feb 22, 2022 at 03:22 AM +01, Ilya Leoshkevich wrote:
> > > > On Tue, 2022-02-22 at 01:43 +0100, Ilya Leoshkevich wrote:
> > > > > On Mon, 2022-02-21 at 22:39 +0100, Ilya Leoshkevich wrote:
> > > > > > On Mon, 2022-02-21 at 19:03 +0100, Jakub Sitnicki wrote:
> > > > > > > Shifting 16-bit type by 16 bits is implementation-defined
> > > > > > > for
> > > > > > > BPF
> > > > > > > programs.
> > > > > > > Don't rely on it in case it is causing the test failures
> > > > > > > we
> > > > > > > are
> > > > > > > seeing on
> > > > > > > s390x z15 target.
> > > > > > > 
> > > > > > > Fixes: 2ed0dc5937d3 ("selftests/bpf: Cover 4-byte load
> > > > > > > from
> > > > > > > remote_port in bpf_sk_lookup")
> > > > > > > Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> > > > > > > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> > > > > > > ---
> > > > > > > 
> > > > > > > I don't have a dev env for s390x/z15 set up yet, so can't
> > > > > > > definitely
> > > > > > > confirm the fix.
> > > > > > > That said, it seems worth fixing either way.
> > > > > > > 
> > > > > > >  tools/testing/selftests/bpf/progs/test_sk_lookup.c | 3
> > > > > > > ++-
> > > > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git
> > > > > > > a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> > > > > > > b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> > > > > > > index bf5b7caefdd0..7d47276a8964 100644
> > > > > > > --- a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> > > > > > > +++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> > > > > > > @@ -65,6 +65,7 @@ static const __u32 KEY_SERVER_A =
> > > > > > > SERVER_A;
> > > > > > >  static const __u32 KEY_SERVER_B = SERVER_B;
> > > > > > >  
> > > > > > >  static const __u16 SRC_PORT = bpf_htons(8008);
> > > > > > > +static const __u32 SRC_PORT_U32 = bpf_htonl(8008U <<
> > > > > > > 16);
> > > > > > >  static const __u32 SRC_IP4 = IP4(127, 0, 0, 2);
> > > > > > >  static const __u32 SRC_IP6[] = IP6(0xfd000000, 0x0, 0x0,
> > > > > > > 0x00000002);
> > > > > > >  
> > > > > > > @@ -421,7 +422,7 @@ int ctx_narrow_access(struct
> > > > > > > bpf_sk_lookup
> > > > > > > *ctx)
> > > > > > >  
> > > > > > >         /* Load from remote_port field with zero padding
> > > > > > > (backward
> > > > > > > compatibility) */
> > > > > > >         val_u32 = *(__u32 *)&ctx->remote_port;
> > > > > > > -       if (val_u32 != bpf_htonl(bpf_ntohs(SRC_PORT) <<
> > > > > > > 16))
> > > > > > > +       if (val_u32 != SRC_PORT_U32)
> > > > > > >                 return SK_DROP;
> > > > > > >  
> > > > > > >         /* Narrow loads from local_port field. Expect
> > > > > > > DST_PORT.
> > > > > > > */
> > > > > > 
> > > > > > Unfortunately this doesn't help with the s390 problem.
> > > > > > I'll try to debug this.
> > > > > 
> > > > > I have to admit I have a hard time wrapping my head around
> > > > > the
> > > > > requirements here.
> > > > > 
> > > > > Based on the pre-9a69e2b385f4 code, do I understand correctly
> > > > > that
> > > > > for the following input
> > > > > 
> > > > > Port:     0x1f48
> > > > > SRC_PORT: 0x481f
> > > > > 
> > > > > we expect the following results for different kinds of loads:
> > > > > 
> > > > > Size   Offset  LE      BE
> > > > > BPF_B  0       0x1f    0
> > > > > BPF_B  1       0x48    0
> > > > > BPF_B  2       0       0x48
> > > > > BPF_B  3       0       0x1f
> > > > > BPF_H  0       0x481f  0
> > > > > BPF_H  1       0       0x481f
> > > > > BPF_W  0       0x481f  0x481f
> > > > > 
> > > > > and this is guaranteed by the struct bpf_sk_lookup ABI?
> > > > > Because
> > > > > then
> > > > > it
> > > > > looks as if 9a69e2b385f4 breaks it on big-endian as follows:
> > > > > 
> > > > > Size   Offset  BE-9a69e2b385f4
> > > > > BPF_B  0       0x48
> > > > > BPF_B  1       0x1f
> > > > > BPF_B  2       0
> > > > > BPF_B  3       0
> > > > > BPF_H  0       0x481f
> > > > > BPF_H  1       0
> > > > > BPF_W  0       0x481f0000
> > > > 
> > > > Sorry, I worded this incorrectly: 9a69e2b385f4 did not change
> > > > the
> > > > kernel behavior, the ABI is not broken and the old compiled
> > > > code
> > > > should
> > > > continue to work.
> > > > What the second table really shows are what the results should
> > > > be
> > > > according to the 9a69e2b385f4 struct bpf_sk_lookup definition,
> > > > which I
> > > > still think is broken on big-endian and needs to be adjusted to
> > > > match
> > > > the ABI.
> > > > 
> > > > I noticed one other strange thing in the meantime: loads from
> > > > *(__u32 *)&ctx->remote_port, *(__u16 *)&ctx->remote_port and
> > > > *((__u16 *)&ctx->remote_port + 1) all produce 8008 on s390,
> > > > which
> > > > is
> > > > clearly inconsistent. It looks as if convert_ctx_accesses()
> > > > needs
> > > > to be
> > > > adjusted to handle combinations like ctx_field_size == 4 &&
> > > > size ==
> > > > 2
> > > > && target_size == 2. I will continue with this tomorrow.
> > > > 
> > > > > Or is the old behavior a bug and this new one is desirable?
> > > > > 9a69e2b385f4 has no Fixes: tag, so I assume that's the former
> > > > > :-(
> > > > > 
> > > > > In which case, would it make sense to fix it by swapping
> > > > > remote_port
> > > > > and :16 in bpf_sk_lookup on big-endian?
> > > 
> > > Thanks for looking into it.
> > > 
> > > When it comes to requirements, my intention was to keep the same
> > > behavior as before the split up of the remote_port field in
> > > 9a69e2b385f4
> > > ("bpf: Make remote_port field in struct bpf_sk_lookup 16-bit
> > > wide").
> > > 
> > > 9a69e2b385f4 was supposed to be a formality, after a similar
> > > change
> > > in
> > > 4421a582718a ("bpf: Make dst_port field in struct bpf_sock 16-bit
> > > wide"), which went in earlier.
> > > 
> > > In 4421a582718a I've provided a bit more context. I understand
> > > that
> > > the
> > > remote_port value, even before the type changed from u32 to u16,
> > > appeared to the BPF program as if laid out in memory like so:
> > > 
> > >       offsetof(struct bpf_sk_lookup, remote_port) +0  <port MSB>
> > >                                                   +1  <port LSB>
> > >                                                   +2  0x00
> > >                                                   +3  0x00
> > > 
> > > Translating it to your handy table format, I expect should result
> > > in
> > > loads as so if port is 8008 = 0x1f48:
> > > 
> > >       Size   Offset  LE      BE
> > >       BPF_B  0       0x1f    0x1f
> > >       BPF_B  1       0x48    0x48
> > >       BPF_B  2       0       0
> > >       BPF_B  3       0       0
> > >       BPF_H  0       0x481f  0x1f48
> > >       BPF_H  1       0       0
> > >       BPF_W  0       0x481f  0x1f480000
> > 
> > Hmm, I think for big-endian the layout is different.
> > If we look at test_sk_lookup.c from 9a69e2b385f4^:
> > 
> >         /* Narrow loads from remote_port field. Expect SRC_PORT. */
> >         if (LSB(ctx->remote_port, 0) != ((SRC_PORT >> 0) & 0xff) ||
> >             LSB(ctx->remote_port, 1) != ((SRC_PORT >> 8) & 0xff) ||
> >             LSB(ctx->remote_port, 2) != 0 || LSB(ctx->remote_port,
> > 3)
> > != 0)
> >                 return SK_DROP;
> > 
> > LSB() on little-endian is just byte indexing, so it's indeed 
> > 1f,48,00,00. However, on big-endian it's indexing from the end, so
> > it's 00,00,48,1f.
> 
> I understood that LSB() is indexing from the end on BE because
> SRC_PORT
> constant value differs on LE (= 0x481f) and BE (= 0x1f48) platforms,
> so
> 
>                  LE  BE
>   SRC_PORT >> 0  1f  48
>   SRC_PORT >> 8  48  1f
> 
> So on LE we first compare remote_port MSB, then LSB.
> While on BE we start with remote_port LSB, then MSB.
> 
> But, now that you have pointed it out, I notice that
> sizeof(remote_port)
> has changed and from 4 to 2, and I can't see how LSB(…, 3) and LSB(…,
> 4)
> loads can keep working on big-endian.

Oh, right - it should be 00,00,1f,48 on big-endian.
Out-of-bounds LSB is indeed an issue. I've posted my current
thoughts as an RFC series [1], this one is addressed in patch 3.

[1]
https://lore.kernel.org/bpf/20220222182559.2865596-1-iii@linux.ibm.com/

> 
> [...]
Jakub Sitnicki Feb. 25, 2022, 10:01 a.m. UTC | #8
On Tue, Feb 22, 2022 at 10:51 PM +01, Ilya Leoshkevich wrote:
> On Tue, 2022-02-22 at 19:24 +0100, Jakub Sitnicki wrote:
>> On Tue, Feb 22, 2022 at 06:42 PM +01, Ilya Leoshkevich wrote:
>> > On Tue, 2022-02-22 at 15:53 +0100, Jakub Sitnicki wrote:
>> > > On Tue, Feb 22, 2022 at 03:22 AM +01, Ilya Leoshkevich wrote:
>> > > > On Tue, 2022-02-22 at 01:43 +0100, Ilya Leoshkevich wrote:
>> > > > > On Mon, 2022-02-21 at 22:39 +0100, Ilya Leoshkevich wrote:
>> > > > > > On Mon, 2022-02-21 at 19:03 +0100, Jakub Sitnicki wrote:
>> > > > > > > Shifting 16-bit type by 16 bits is implementation-defined
>> > > > > > > for
>> > > > > > > BPF
>> > > > > > > programs.
>> > > > > > > Don't rely on it in case it is causing the test failures
>> > > > > > > we
>> > > > > > > are
>> > > > > > > seeing on
>> > > > > > > s390x z15 target.
>> > > > > > > 
>> > > > > > > Fixes: 2ed0dc5937d3 ("selftests/bpf: Cover 4-byte load
>> > > > > > > from
>> > > > > > > remote_port in bpf_sk_lookup")
>> > > > > > > Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
>> > > > > > > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> > > > > > > ---
>> > > > > > > 
>> > > > > > > I don't have a dev env for s390x/z15 set up yet, so can't
>> > > > > > > definitely
>> > > > > > > confirm the fix.
>> > > > > > > That said, it seems worth fixing either way.
>> > > > > > > 
>> > > > > > >  tools/testing/selftests/bpf/progs/test_sk_lookup.c | 3
>> > > > > > > ++-
>> > > > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
>> > > > > > > 
>> > > > > > > diff --git
>> > > > > > > a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
>> > > > > > > b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
>> > > > > > > index bf5b7caefdd0..7d47276a8964 100644
>> > > > > > > --- a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
>> > > > > > > +++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
>> > > > > > > @@ -65,6 +65,7 @@ static const __u32 KEY_SERVER_A =
>> > > > > > > SERVER_A;
>> > > > > > >  static const __u32 KEY_SERVER_B = SERVER_B;
>> > > > > > >  
>> > > > > > >  static const __u16 SRC_PORT = bpf_htons(8008);
>> > > > > > > +static const __u32 SRC_PORT_U32 = bpf_htonl(8008U <<
>> > > > > > > 16);
>> > > > > > >  static const __u32 SRC_IP4 = IP4(127, 0, 0, 2);
>> > > > > > >  static const __u32 SRC_IP6[] = IP6(0xfd000000, 0x0, 0x0,
>> > > > > > > 0x00000002);
>> > > > > > >  
>> > > > > > > @@ -421,7 +422,7 @@ int ctx_narrow_access(struct
>> > > > > > > bpf_sk_lookup
>> > > > > > > *ctx)
>> > > > > > >  
>> > > > > > >         /* Load from remote_port field with zero padding
>> > > > > > > (backward
>> > > > > > > compatibility) */
>> > > > > > >         val_u32 = *(__u32 *)&ctx->remote_port;
>> > > > > > > -       if (val_u32 != bpf_htonl(bpf_ntohs(SRC_PORT) <<
>> > > > > > > 16))
>> > > > > > > +       if (val_u32 != SRC_PORT_U32)
>> > > > > > >                 return SK_DROP;
>> > > > > > >  
>> > > > > > >         /* Narrow loads from local_port field. Expect
>> > > > > > > DST_PORT.
>> > > > > > > */
>> > > > > > 
>> > > > > > Unfortunately this doesn't help with the s390 problem.
>> > > > > > I'll try to debug this.
>> > > > > 
>> > > > > I have to admit I have a hard time wrapping my head around
>> > > > > the
>> > > > > requirements here.
>> > > > > 
>> > > > > Based on the pre-9a69e2b385f4 code, do I understand correctly
>> > > > > that
>> > > > > for the following input
>> > > > > 
>> > > > > Port:     0x1f48
>> > > > > SRC_PORT: 0x481f
>> > > > > 
>> > > > > we expect the following results for different kinds of loads:
>> > > > > 
>> > > > > Size   Offset  LE      BE
>> > > > > BPF_B  0       0x1f    0
>> > > > > BPF_B  1       0x48    0
>> > > > > BPF_B  2       0       0x48
>> > > > > BPF_B  3       0       0x1f
>> > > > > BPF_H  0       0x481f  0
>> > > > > BPF_H  1       0       0x481f
>> > > > > BPF_W  0       0x481f  0x481f
>> > > > > 
>> > > > > and this is guaranteed by the struct bpf_sk_lookup ABI?
>> > > > > Because
>> > > > > then
>> > > > > it
>> > > > > looks as if 9a69e2b385f4 breaks it on big-endian as follows:
>> > > > > 
>> > > > > Size   Offset  BE-9a69e2b385f4
>> > > > > BPF_B  0       0x48
>> > > > > BPF_B  1       0x1f
>> > > > > BPF_B  2       0
>> > > > > BPF_B  3       0
>> > > > > BPF_H  0       0x481f
>> > > > > BPF_H  1       0
>> > > > > BPF_W  0       0x481f0000
>> > > > 
>> > > > Sorry, I worded this incorrectly: 9a69e2b385f4 did not change
>> > > > the
>> > > > kernel behavior, the ABI is not broken and the old compiled
>> > > > code
>> > > > should
>> > > > continue to work.
>> > > > What the second table really shows are what the results should
>> > > > be
>> > > > according to the 9a69e2b385f4 struct bpf_sk_lookup definition,
>> > > > which I
>> > > > still think is broken on big-endian and needs to be adjusted to
>> > > > match
>> > > > the ABI.
>> > > > 
>> > > > I noticed one other strange thing in the meantime: loads from
>> > > > *(__u32 *)&ctx->remote_port, *(__u16 *)&ctx->remote_port and
>> > > > *((__u16 *)&ctx->remote_port + 1) all produce 8008 on s390,
>> > > > which
>> > > > is
>> > > > clearly inconsistent. It looks as if convert_ctx_accesses()
>> > > > needs
>> > > > to be
>> > > > adjusted to handle combinations like ctx_field_size == 4 &&
>> > > > size ==
>> > > > 2
>> > > > && target_size == 2. I will continue with this tomorrow.
>> > > > 
>> > > > > Or is the old behavior a bug and this new one is desirable?
>> > > > > 9a69e2b385f4 has no Fixes: tag, so I assume that's the former
>> > > > > :-(
>> > > > > 
>> > > > > In which case, would it make sense to fix it by swapping
>> > > > > remote_port
>> > > > > and :16 in bpf_sk_lookup on big-endian?
>> > > 
>> > > Thanks for looking into it.
>> > > 
>> > > When it comes to requirements, my intention was to keep the same
>> > > behavior as before the split up of the remote_port field in
>> > > 9a69e2b385f4
>> > > ("bpf: Make remote_port field in struct bpf_sk_lookup 16-bit
>> > > wide").
>> > > 
>> > > 9a69e2b385f4 was supposed to be a formality, after a similar
>> > > change
>> > > in
>> > > 4421a582718a ("bpf: Make dst_port field in struct bpf_sock 16-bit
>> > > wide"), which went in earlier.
>> > > 
>> > > In 4421a582718a I've provided a bit more context. I understand
>> > > that
>> > > the
>> > > remote_port value, even before the type changed from u32 to u16,
>> > > appeared to the BPF program as if laid out in memory like so:
>> > > 
>> > >       offsetof(struct bpf_sk_lookup, remote_port) +0  <port MSB>
>> > >                                                   +1  <port LSB>
>> > >                                                   +2  0x00
>> > >                                                   +3  0x00
>> > > 
>> > > Translating it to your handy table format, I expect should result
>> > > in
>> > > loads as so if port is 8008 = 0x1f48:
>> > > 
>> > >       Size   Offset  LE      BE
>> > >       BPF_B  0       0x1f    0x1f
>> > >       BPF_B  1       0x48    0x48
>> > >       BPF_B  2       0       0
>> > >       BPF_B  3       0       0
>> > >       BPF_H  0       0x481f  0x1f48
>> > >       BPF_H  1       0       0
>> > >       BPF_W  0       0x481f  0x1f480000
>> > 
>> > Hmm, I think for big-endian the layout is different.
>> > If we look at test_sk_lookup.c from 9a69e2b385f4^:
>> > 
>> >         /* Narrow loads from remote_port field. Expect SRC_PORT. */
>> >         if (LSB(ctx->remote_port, 0) != ((SRC_PORT >> 0) & 0xff) ||
>> >             LSB(ctx->remote_port, 1) != ((SRC_PORT >> 8) & 0xff) ||
>> >             LSB(ctx->remote_port, 2) != 0 || LSB(ctx->remote_port,
>> > 3)
>> > != 0)
>> >                 return SK_DROP;
>> > 
>> > LSB() on little-endian is just byte indexing, so it's indeed 
>> > 1f,48,00,00. However, on big-endian it's indexing from the end, so
>> > it's 00,00,48,1f.
>> 
>> I understood that LSB() is indexing from the end on BE because
>> SRC_PORT
>> constant value differs on LE (= 0x481f) and BE (= 0x1f48) platforms,
>> so
>> 
>>                  LE  BE
>>   SRC_PORT >> 0  1f  48
>>   SRC_PORT >> 8  48  1f
>> 
>> So on LE we first compare remote_port MSB, then LSB.
>> While on BE we start with remote_port LSB, then MSB.
>> 
>> But, now that you have pointed it out, I notice that
>> sizeof(remote_port)
>> has changed and from 4 to 2, and I can't see how LSB(…, 3) and LSB(…,
>> 4)
>> loads can keep working on big-endian.
>
> Oh, right - it should be 00,00,1f,48 on big-endian.
> Out-of-bounds LSB is indeed an issue. I've posted my current
> thoughts as an RFC series [1], this one is addressed in patch 3.
>
> [1]
> https://lore.kernel.org/bpf/20220222182559.2865596-1-iii@linux.ibm.com/

Sorry for the radio silience.

I have patched my way through cross-compiling bpf selftests to s390, and
can reproduce the sk_lookup test failure on big-endian.

bpftool gen object/skeleton refusing to process BE BPF objects on an LE
host got me stuck for a moment, but user mode qemu saved the day.

FWIW, sock_fields test are also broken on BE, but failing silently, as I
observed by instrumenting the read_sk_dst_port BPF prog with bpf_printk.

I will be able check out the RFC series over this weekend.
diff mbox series

Patch

diff --git a/tools/testing/selftests/bpf/progs/test_sk_lookup.c b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
index bf5b7caefdd0..7d47276a8964 100644
--- a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
+++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
@@ -65,6 +65,7 @@  static const __u32 KEY_SERVER_A = SERVER_A;
 static const __u32 KEY_SERVER_B = SERVER_B;
 
 static const __u16 SRC_PORT = bpf_htons(8008);
+static const __u32 SRC_PORT_U32 = bpf_htonl(8008U << 16);
 static const __u32 SRC_IP4 = IP4(127, 0, 0, 2);
 static const __u32 SRC_IP6[] = IP6(0xfd000000, 0x0, 0x0, 0x00000002);
 
@@ -421,7 +422,7 @@  int ctx_narrow_access(struct bpf_sk_lookup *ctx)
 
 	/* Load from remote_port field with zero padding (backward compatibility) */
 	val_u32 = *(__u32 *)&ctx->remote_port;
-	if (val_u32 != bpf_htonl(bpf_ntohs(SRC_PORT) << 16))
+	if (val_u32 != SRC_PORT_U32)
 		return SK_DROP;
 
 	/* Narrow loads from local_port field. Expect DST_PORT. */