diff mbox series

[dwarves,v2] btf_encoder: sanitize non-regular int base type

Message ID 20210207071726.3969978-1-yhs@fb.com (mailing list archive)
State Not Applicable
Delegated to: BPF
Headers show
Series [dwarves,v2] btf_encoder: sanitize non-regular int base type | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Yonghong Song Feb. 7, 2021, 7:17 a.m. UTC
clang with dwarf5 may generate non-regular int base type,
i.e., not a signed/unsigned char/short/int/longlong/__int128.
Such base types are often used to describe
how an actual parameter or variable is generated. For example,

0x000015cf:   DW_TAG_base_type
                DW_AT_name      ("DW_ATE_unsigned_1")
                DW_AT_encoding  (DW_ATE_unsigned)
                DW_AT_byte_size (0x00)

0x00010ed9:         DW_TAG_formal_parameter
                      DW_AT_location    (DW_OP_lit0,
                                         DW_OP_not,
                                         DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
                                         DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
                                         DW_OP_stack_value)
                      DW_AT_abstract_origin     (0x00013984 "branch")

What it does is with a literal "0", did a "not" operation, and the converted to
one-bit unsigned int and then 8-bit unsigned int.

Another example,

0x000e97e4:   DW_TAG_base_type
                DW_AT_name      ("DW_ATE_unsigned_24")
                DW_AT_encoding  (DW_ATE_unsigned)
                DW_AT_byte_size (0x03)

0x000f88f8:     DW_TAG_variable
                  DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
                     [0xffffffff82808812, 0xffffffff82808817):
                         DW_OP_breg0 RAX+0,
                         DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
                         DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
                         DW_OP_stack_value,
                         DW_OP_piece 0x1,
                         DW_OP_breg0 RAX+0,
                         DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
                         DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
                         DW_OP_lit8,
                         DW_OP_shr,
                         DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
                         DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
                         DW_OP_stack_value,
                         DW_OP_piece 0x3
                     ......

At one point, a right shift by 8 happens and the result is converted to
32-bit unsigned int and then to 24-bit unsigned int.

BTF does not need any of these DW_OP_* information and such non-regular int
types will cause libbpf to emit errors.
Let us sanitize them to generate BTF acceptable to libbpf and kernel.

Cc: Sedat Dilek <sedat.dilek@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 libbtf.c | 39 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

Comments

Sedat Dilek Feb. 7, 2021, 10:32 a.m. UTC | #1
On Sun, Feb 7, 2021 at 8:17 AM Yonghong Song <yhs@fb.com> wrote:
>
> clang with dwarf5 may generate non-regular int base type,
> i.e., not a signed/unsigned char/short/int/longlong/__int128.
> Such base types are often used to describe
> how an actual parameter or variable is generated. For example,
>
> 0x000015cf:   DW_TAG_base_type
>                 DW_AT_name      ("DW_ATE_unsigned_1")
>                 DW_AT_encoding  (DW_ATE_unsigned)
>                 DW_AT_byte_size (0x00)
>
> 0x00010ed9:         DW_TAG_formal_parameter
>                       DW_AT_location    (DW_OP_lit0,
>                                          DW_OP_not,
>                                          DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
>                                          DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
>                                          DW_OP_stack_value)
>                       DW_AT_abstract_origin     (0x00013984 "branch")
>
> What it does is with a literal "0", did a "not" operation, and the converted to
> one-bit unsigned int and then 8-bit unsigned int.
>
> Another example,
>
> 0x000e97e4:   DW_TAG_base_type
>                 DW_AT_name      ("DW_ATE_unsigned_24")
>                 DW_AT_encoding  (DW_ATE_unsigned)
>                 DW_AT_byte_size (0x03)
>
> 0x000f88f8:     DW_TAG_variable
>                   DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
>                      [0xffffffff82808812, 0xffffffff82808817):
>                          DW_OP_breg0 RAX+0,
>                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
>                          DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
>                          DW_OP_stack_value,
>                          DW_OP_piece 0x1,
>                          DW_OP_breg0 RAX+0,
>                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
>                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
>                          DW_OP_lit8,
>                          DW_OP_shr,
>                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
>                          DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
>                          DW_OP_stack_value,
>                          DW_OP_piece 0x3
>                      ......
>
> At one point, a right shift by 8 happens and the result is converted to
> 32-bit unsigned int and then to 24-bit unsigned int.
>
> BTF does not need any of these DW_OP_* information and such non-regular int
> types will cause libbpf to emit errors.
> Let us sanitize them to generate BTF acceptable to libbpf and kernel.
>
> Cc: Sedat Dilek <sedat.dilek@gmail.com>

Thanks for v2.

For both v1 and v2:

   Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
   Reported-by: Sedat Dilek <sedat.dilek@gmail.com>

My development and testing environment:

1. Debian/testing AMD64
2. Linux v5.11-rc6+ with custom mostly Clang fixes
3. Debug-Info: BTF + DWARF-v5 enabled
4. Compiler and tools (incl. Clang's Integrated ASsembler IAS):
LLVM/Clang v12.0.0-rc1 (make LLVM=1 LLVM_IAS=1)

Build and boot on bare metal.

- Sedat -

> Acked-by: Andrii Nakryiko <andrii@kernel.org>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  libbtf.c | 39 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 38 insertions(+), 1 deletion(-)
>
> diff --git a/libbtf.c b/libbtf.c
> index 9f76283..5843200 100644
> --- a/libbtf.c
> +++ b/libbtf.c
> @@ -373,6 +373,7 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
>         struct btf *btf = btfe->btf;
>         const struct btf_type *t;
>         uint8_t encoding = 0;
> +       uint16_t byte_sz;
>         int32_t id;
>
>         if (bt->is_signed) {
> @@ -384,7 +385,43 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
>                 return -1;
>         }
>
> -       id = btf__add_int(btf, name, BITS_ROUNDUP_BYTES(bt->bit_size), encoding);
> +       /* dwarf5 may emit DW_ATE_[un]signed_{num} base types where
> +        * {num} is not power of 2 and may exceed 128. Such attributes
> +        * are mostly used to record operation for an actual parameter
> +        * or variable.
> +        * For example,
> +        *     DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> +        *         [0xffffffff82808812, 0xffffffff82808817):
> +        *             DW_OP_breg0 RAX+0,
> +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> +        *             DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> +        *             DW_OP_stack_value,
> +        *             DW_OP_piece 0x1,
> +        *             DW_OP_breg0 RAX+0,
> +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> +        *             DW_OP_lit8,
> +        *             DW_OP_shr,
> +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> +        *             DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> +        *             DW_OP_stack_value, DW_OP_piece 0x3
> +        *     DW_AT_name    ("ebx")
> +        *     DW_AT_decl_file       ("/linux/arch/x86/events/intel/core.c")
> +        *
> +        * In the above example, at some point, one unsigned_32 value
> +        * is right shifted by 8 and the result is converted to unsigned_32
> +        * and then unsigned_24.
> +        *
> +        * BTF does not need such DW_OP_* information so let us sanitize
> +        * these non-regular int types to avoid libbpf/kernel complaints.
> +        */
> +       byte_sz = BITS_ROUNDUP_BYTES(bt->bit_size);
> +       if (!byte_sz || (byte_sz & (byte_sz - 1))) {
> +               name = "__SANITIZED_FAKE_INT__";
> +               byte_sz = 4;
> +       }
> +
> +       id = btf__add_int(btf, name, byte_sz, encoding);
>         if (id < 0) {
>                 btf_elf__log_err(btfe, BTF_KIND_INT, name, true, "Error emitting BTF type");
>         } else {
> --
> 2.24.1
>
Mark Wielaard Feb. 7, 2021, 2:18 p.m. UTC | #2
Hi,

On Sat, 2021-02-06 at 23:17 -0800, Yonghong Song wrote:
> clang with dwarf5 may generate non-regular int base type,
> i.e., not a signed/unsigned char/short/int/longlong/__int128.
> Such base types are often used to describe
> how an actual parameter or variable is generated. For example,
> 
> 0x000015cf:   DW_TAG_base_type
>                 DW_AT_name      ("DW_ATE_unsigned_1")
>                 DW_AT_encoding  (DW_ATE_unsigned)
>                 DW_AT_byte_size (0x00)
> 
> 0x00010ed9:         DW_TAG_formal_parameter
>                       DW_AT_location    (DW_OP_lit0,
>                                          DW_OP_not,
>                                          DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
>                                          DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
>                                          DW_OP_stack_value)
>                       DW_AT_abstract_origin     (0x00013984 "branch")
> 
> What it does is with a literal "0", did a "not" operation, and the converted to
> one-bit unsigned int and then 8-bit unsigned int.

Thanks for tracking this down. Do you have any idea why the clang
compiler emits this? You might be right that it is intended to do what
you describe it does (but then it would simply encode an unsigned
constant 1 char in a very inefficient way). But as implemented it
doesn't seem to make any sense. What would DW_OP_convert of an zero
sized base type even mean (if it is intended as a 1 bit sized typed,
then why is there no DW_AT_bit_size)?

So I do think your patch makes sense. clang clearly is emitting
something bogus. And so some fixup is needed. But maybe we should at
least give a warning about it, otherwise it might never get fixed.

BTW. If these bogus base types are only emitted as part of a location
expression and not as part of an actual function or variable type
description, then why are we even trying to encode it as a BTF type? It
might be cheaper to just skip/drop it. But maybe the code setup makes
it hard to know whether or not such a (bogus) type is actually
referenced from a function or variable description?

Cheers,

Mark
Sedat Dilek Feb. 7, 2021, 3:15 p.m. UTC | #3
On Sun, Feb 7, 2021 at 3:18 PM Mark Wielaard <mark@klomp.org> wrote:
>
> Hi,
>
> On Sat, 2021-02-06 at 23:17 -0800, Yonghong Song wrote:
> > clang with dwarf5 may generate non-regular int base type,
> > i.e., not a signed/unsigned char/short/int/longlong/__int128.
> > Such base types are often used to describe
> > how an actual parameter or variable is generated. For example,
> >
> > 0x000015cf:   DW_TAG_base_type
> >                 DW_AT_name      ("DW_ATE_unsigned_1")
> >                 DW_AT_encoding  (DW_ATE_unsigned)
> >                 DW_AT_byte_size (0x00)
> >
> > 0x00010ed9:         DW_TAG_formal_parameter
> >                       DW_AT_location    (DW_OP_lit0,
> >                                          DW_OP_not,
> >                                          DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
> >                                          DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
> >                                          DW_OP_stack_value)
> >                       DW_AT_abstract_origin     (0x00013984 "branch")
> >
> > What it does is with a literal "0", did a "not" operation, and the converted to
> > one-bit unsigned int and then 8-bit unsigned int.
>
> Thanks for tracking this down. Do you have any idea why the clang
> compiler emits this? You might be right that it is intended to do what
> you describe it does (but then it would simply encode an unsigned
> constant 1 char in a very inefficient way). But as implemented it
> doesn't seem to make any sense. What would DW_OP_convert of an zero
> sized base type even mean (if it is intended as a 1 bit sized typed,
> then why is there no DW_AT_bit_size)?
>
> So I do think your patch makes sense. clang clearly is emitting
> something bogus. And so some fixup is needed. But maybe we should at
> least give a warning about it, otherwise it might never get fixed.
>
> BTW. If these bogus base types are only emitted as part of a location
> expression and not as part of an actual function or variable type
> description, then why are we even trying to encode it as a BTF type? It
> might be cheaper to just skip/drop it. But maybe the code setup makes
> it hard to know whether or not such a (bogus) type is actually
> referenced from a function or variable description?
>

As said this is with LLVM/Clang v12.0.0-rc1 and `make LLVM=1
LLVM_IAS=1` means use all LLVM tools (do not use GNU/binutils like
GNU/ld BFD) and Clang's Integrated ASsembler (means do not use GNU
AS).

When doing an indexed search for "DW_ATE_unsigned"...

...this points to changes recently done in places like
llvm/lib/CodeGen/AsmPrinter/.

I see 16 changes there touching DWARF-x area since llvmorg-12.0.0-rc1 release:

$ git log --oneline llvmorg-12.0.0-rc1.. llvm/lib/CodeGen/AsmPrinter/
e44a10094283 .gcc_except_table: Set SHF_LINK_ORDER if binutils>=2.36,
and drop unneeded unique ID for -fno-unique-section-names
853a2649160c [AsmPrinter] __patchable_function_entries: Set
SHF_LINK_ORDER for binutils 2.36 and above
e3c0b0fe0958 [WebAssembly] locals can now be indirect in DWARF
34f3249abdff [DebugInfo] Fix error from D95893, where I accidentally
used an unsigned int in a loop and it wraps around.
a740af4de970 [CodeView][DebugInfo] Update the code for removing
template arguments from the display name of a codeview function id.
56fa34ae3570 DebugInfo: Temporarily work around -gsplit-dwarf + LTO
.debug_gnu_pubnames regression after D94976
8998f5843503 Re-land D94976 after revert in e29552c5aff6
d32deaab4d53 Revert "[DWARF] Location-less inlined variables should
not have DW_TAG_variable"
ddc2f1e3fb4f [DWARF] Location-less inlined variables should not have
DW_TAG_variable
511c9a76fb98 [AsmPrinter] Use ListSeparator (NFC)
85b7b5625a00 Fix memory leak in 4318028cd2d7633a0cdeb0b5d4d2ed81fab87864
4318028cd2d7 DebugInfo: Add a DWARF FORM extension for addrx+offset
references to reduce relocations
e29552c5aff6 Revert "[DWARF] Create subprogram's DIE in DISubprogram's unit"
dd7297e1bffe DebugInfo: Fix bug in addr+offset exprloc to use DWARFv5
addrx op instead of DWARFv4 GNU extension
7e6c87ee0454 DebugInfo: Deduplicate addresses in debug_addr
ef0dcb506300 [DWARF] Create subprogram's DIE in DISubprogram's unit

What I try to say is with LLVM-13 (git) this might look different?

Here, I have a LLVM-12 ThinLTO+PGO optimized toolchain which saves 1/3
of Linux-kernel build-time.
So, I do not want to switch to Debian's or packages from <apt.llvm.org>.
These binaries take much much longer and I do not know if I get some
new issues with Linux v5.11-rc6+.

Again, I am not a LLVM toolchain expert.
Best is to ask on llvm-dev mailing-list?

Thanks.

- Sedat -



[1] https://github.com/llvm/llvm-project/search?o=desc&q=DW_ATE_unsigned&s=indexed
Yonghong Song Feb. 7, 2021, 6:14 p.m. UTC | #4
On 2/7/21 6:18 AM, Mark Wielaard wrote:
> Hi,
> 
> On Sat, 2021-02-06 at 23:17 -0800, Yonghong Song wrote:
>> clang with dwarf5 may generate non-regular int base type,
>> i.e., not a signed/unsigned char/short/int/longlong/__int128.
>> Such base types are often used to describe
>> how an actual parameter or variable is generated. For example,
>>
>> 0x000015cf:   DW_TAG_base_type
>>                  DW_AT_name      ("DW_ATE_unsigned_1")
>>                  DW_AT_encoding  (DW_ATE_unsigned)
>>                  DW_AT_byte_size (0x00)
>>
>> 0x00010ed9:         DW_TAG_formal_parameter
>>                        DW_AT_location    (DW_OP_lit0,
>>                                           DW_OP_not,
>>                                           DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
>>                                           DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
>>                                           DW_OP_stack_value)
>>                        DW_AT_abstract_origin     (0x00013984 "branch")
>>
>> What it does is with a literal "0", did a "not" operation, and the converted to
>> one-bit unsigned int and then 8-bit unsigned int.
> 
> Thanks for tracking this down. Do you have any idea why the clang
> compiler emits this? You might be right that it is intended to do what

No, I don't know.

> you describe it does (but then it would simply encode an unsigned
> constant 1 char in a very inefficient way). But as implemented it
> doesn't seem to make any sense. What would DW_OP_convert of an zero
> sized base type even mean (if it is intended as a 1 bit sized typed,
> then why is there no DW_AT_bit_size)?

We need to report back to llvm community about this instance.
DW_AT_byte_size = 0 only for DW_ATE_unsigned_1 does not sound right.
DW_AT_bit_size might be needed as you suggested.

> 
> So I do think your patch makes sense. clang clearly is emitting
> something bogus. And so some fixup is needed. But maybe we should at
> least give a warning about it, otherwise it might never get fixed.

In pahole, to deal with dwarf weirdness, we have several other places
for workaround without warning. I think it is okay not to have
warning here as users cannot do anything about it. Also we have
this special int type with name __SANITIZED_FAKE_INT__ which will
signal the issue still exists if looking at BTF or generated vmlinux.h.

> 
> BTW. If these bogus base types are only emitted as part of a location
> expression and not as part of an actual function or variable type
> description, then why are we even trying to encode it as a BTF type? It

Let us still encode it as a BTF type as an evidence that we do
changed some types. After deduplication, we will only have ONE 
__SANITIZED_FAKE_INT__ type. I think leave this one type in BTF is okay.

> might be cheaper to just skip/drop it. But maybe the code setup makes
> it hard to know whether or not such a (bogus) type is actually
> referenced from a function or variable description?

I guess it is possible that we do a preprocessing to check whether
any types BTF intends to emit (var/func definition, etc.) referencing
such types or not. This will involves overhead for every CU most of 
which does not have this issue. And this should not really happen given 
a sane dwarf.

With __SANITIZED_FAKE_INT__ as the type name, if something e.g., a 
variable definition use this type, you will have
    __SANITIZED_FAKE_INT__ a;
in skeleton, or have
    struct {
       __SANITIZED_FAKE_INT__ member;
       ...
    };
in vmlinux.h,
we should be able to catch such an error easily.

> 
> Cheers,
> 
> Mark
>
Nick Desaulniers Feb. 8, 2021, 7:16 p.m. UTC | #5
On Sun, Feb 7, 2021 at 6:18 AM Mark Wielaard <mark@klomp.org> wrote:
>
> Hi,
>
> On Sat, 2021-02-06 at 23:17 -0800, Yonghong Song wrote:
> > clang with dwarf5 may generate non-regular int base type,
> > i.e., not a signed/unsigned char/short/int/longlong/__int128.
> > Such base types are often used to describe
> > how an actual parameter or variable is generated. For example,
> >
> > 0x000015cf:   DW_TAG_base_type
> >                 DW_AT_name      ("DW_ATE_unsigned_1")
> >                 DW_AT_encoding  (DW_ATE_unsigned)
> >                 DW_AT_byte_size (0x00)
> >
> > 0x00010ed9:         DW_TAG_formal_parameter
> >                       DW_AT_location    (DW_OP_lit0,
> >                                          DW_OP_not,
> >                                          DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
> >                                          DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
> >                                          DW_OP_stack_value)
> >                       DW_AT_abstract_origin     (0x00013984 "branch")
> >
> > What it does is with a literal "0", did a "not" operation, and the converted to
> > one-bit unsigned int and then 8-bit unsigned int.
>
> Thanks for tracking this down. Do you have any idea why the clang
> compiler emits this? You might be right that it is intended to do what
> you describe it does (but then it would simply encode an unsigned
> constant 1 char in a very inefficient way). But as implemented it
> doesn't seem to make any sense. What would DW_OP_convert of an zero
> sized base type even mean (if it is intended as a 1 bit sized typed,
> then why is there no DW_AT_bit_size)?

David,
Any thoughts on the above sequence of DW_OP_ entries?  This is a part
of DWARF I'm unfamiliar with.

>
> So I do think your patch makes sense. clang clearly is emitting
> something bogus. And so some fixup is needed. But maybe we should at
> least give a warning about it, otherwise it might never get fixed.
>
> BTW. If these bogus base types are only emitted as part of a location
> expression and not as part of an actual function or variable type
> description, then why are we even trying to encode it as a BTF type? It
> might be cheaper to just skip/drop it. But maybe the code setup makes
> it hard to know whether or not such a (bogus) type is actually
> referenced from a function or variable description?
>
> Cheers,
>
> Mark
Nick Desaulniers Feb. 8, 2021, 7:22 p.m. UTC | #6
On Sat, Feb 6, 2021 at 11:17 PM Yonghong Song <yhs@fb.com> wrote:
>
> clang with dwarf5 may generate non-regular int base type,
> i.e., not a signed/unsigned char/short/int/longlong/__int128.
> Such base types are often used to describe
> how an actual parameter or variable is generated. For example,
>
> 0x000015cf:   DW_TAG_base_type
>                 DW_AT_name      ("DW_ATE_unsigned_1")
>                 DW_AT_encoding  (DW_ATE_unsigned)
>                 DW_AT_byte_size (0x00)
>
> 0x00010ed9:         DW_TAG_formal_parameter
>                       DW_AT_location    (DW_OP_lit0,
>                                          DW_OP_not,
>                                          DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
>                                          DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
>                                          DW_OP_stack_value)
>                       DW_AT_abstract_origin     (0x00013984 "branch")
>
> What it does is with a literal "0", did a "not" operation, and the converted to
> one-bit unsigned int and then 8-bit unsigned int.
>
> Another example,
>
> 0x000e97e4:   DW_TAG_base_type
>                 DW_AT_name      ("DW_ATE_unsigned_24")
>                 DW_AT_encoding  (DW_ATE_unsigned)
>                 DW_AT_byte_size (0x03)
>
> 0x000f88f8:     DW_TAG_variable
>                   DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
>                      [0xffffffff82808812, 0xffffffff82808817):
>                          DW_OP_breg0 RAX+0,
>                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
>                          DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
>                          DW_OP_stack_value,
>                          DW_OP_piece 0x1,
>                          DW_OP_breg0 RAX+0,
>                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
>                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
>                          DW_OP_lit8,
>                          DW_OP_shr,
>                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
>                          DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
>                          DW_OP_stack_value,
>                          DW_OP_piece 0x3
>                      ......
>
> At one point, a right shift by 8 happens and the result is converted to
> 32-bit unsigned int and then to 24-bit unsigned int.
>
> BTF does not need any of these DW_OP_* information and such non-regular int
> types will cause libbpf to emit errors.
> Let us sanitize them to generate BTF acceptable to libbpf and kernel.
>
> Cc: Sedat Dilek <sedat.dilek@gmail.com>
> Acked-by: Andrii Nakryiko <andrii@kernel.org>
> Signed-off-by: Yonghong Song <yhs@fb.com>

Thanks for the patch!

Tested-by: Nick Desaulniers <ndesaulniers@google.com>

> ---
>  libbtf.c | 39 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 38 insertions(+), 1 deletion(-)
>
> diff --git a/libbtf.c b/libbtf.c
> index 9f76283..5843200 100644
> --- a/libbtf.c
> +++ b/libbtf.c
> @@ -373,6 +373,7 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
>         struct btf *btf = btfe->btf;
>         const struct btf_type *t;
>         uint8_t encoding = 0;
> +       uint16_t byte_sz;
>         int32_t id;
>
>         if (bt->is_signed) {
> @@ -384,7 +385,43 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
>                 return -1;
>         }
>
> -       id = btf__add_int(btf, name, BITS_ROUNDUP_BYTES(bt->bit_size), encoding);
> +       /* dwarf5 may emit DW_ATE_[un]signed_{num} base types where
> +        * {num} is not power of 2 and may exceed 128. Such attributes
> +        * are mostly used to record operation for an actual parameter
> +        * or variable.
> +        * For example,
> +        *     DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> +        *         [0xffffffff82808812, 0xffffffff82808817):
> +        *             DW_OP_breg0 RAX+0,
> +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> +        *             DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> +        *             DW_OP_stack_value,
> +        *             DW_OP_piece 0x1,
> +        *             DW_OP_breg0 RAX+0,
> +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> +        *             DW_OP_lit8,
> +        *             DW_OP_shr,
> +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> +        *             DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> +        *             DW_OP_stack_value, DW_OP_piece 0x3
> +        *     DW_AT_name    ("ebx")
> +        *     DW_AT_decl_file       ("/linux/arch/x86/events/intel/core.c")
> +        *
> +        * In the above example, at some point, one unsigned_32 value
> +        * is right shifted by 8 and the result is converted to unsigned_32
> +        * and then unsigned_24.
> +        *
> +        * BTF does not need such DW_OP_* information so let us sanitize
> +        * these non-regular int types to avoid libbpf/kernel complaints.
> +        */
> +       byte_sz = BITS_ROUNDUP_BYTES(bt->bit_size);
> +       if (!byte_sz || (byte_sz & (byte_sz - 1))) {
> +               name = "__SANITIZED_FAKE_INT__";
> +               byte_sz = 4;
> +       }
> +
> +       id = btf__add_int(btf, name, byte_sz, encoding);
>         if (id < 0) {
>                 btf_elf__log_err(btfe, BTF_KIND_INT, name, true, "Error emitting BTF type");
>         } else {
> --
> 2.24.1
>
Sedat Dilek Feb. 8, 2021, 7:27 p.m. UTC | #7
On Mon, Feb 8, 2021 at 8:23 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> On Sat, Feb 6, 2021 at 11:17 PM Yonghong Song <yhs@fb.com> wrote:
> >
> > clang with dwarf5 may generate non-regular int base type,
> > i.e., not a signed/unsigned char/short/int/longlong/__int128.
> > Such base types are often used to describe
> > how an actual parameter or variable is generated. For example,
> >
> > 0x000015cf:   DW_TAG_base_type
> >                 DW_AT_name      ("DW_ATE_unsigned_1")
> >                 DW_AT_encoding  (DW_ATE_unsigned)
> >                 DW_AT_byte_size (0x00)
> >
> > 0x00010ed9:         DW_TAG_formal_parameter
> >                       DW_AT_location    (DW_OP_lit0,
> >                                          DW_OP_not,
> >                                          DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
> >                                          DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
> >                                          DW_OP_stack_value)
> >                       DW_AT_abstract_origin     (0x00013984 "branch")
> >
> > What it does is with a literal "0", did a "not" operation, and the converted to
> > one-bit unsigned int and then 8-bit unsigned int.
> >
> > Another example,
> >
> > 0x000e97e4:   DW_TAG_base_type
> >                 DW_AT_name      ("DW_ATE_unsigned_24")
> >                 DW_AT_encoding  (DW_ATE_unsigned)
> >                 DW_AT_byte_size (0x03)
> >
> > 0x000f88f8:     DW_TAG_variable
> >                   DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> >                      [0xffffffff82808812, 0xffffffff82808817):
> >                          DW_OP_breg0 RAX+0,
> >                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> >                          DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> >                          DW_OP_stack_value,
> >                          DW_OP_piece 0x1,
> >                          DW_OP_breg0 RAX+0,
> >                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> >                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> >                          DW_OP_lit8,
> >                          DW_OP_shr,
> >                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> >                          DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> >                          DW_OP_stack_value,
> >                          DW_OP_piece 0x3
> >                      ......
> >
> > At one point, a right shift by 8 happens and the result is converted to
> > 32-bit unsigned int and then to 24-bit unsigned int.
> >
> > BTF does not need any of these DW_OP_* information and such non-regular int
> > types will cause libbpf to emit errors.
> > Let us sanitize them to generate BTF acceptable to libbpf and kernel.
> >
> > Cc: Sedat Dilek <sedat.dilek@gmail.com>
> > Acked-by: Andrii Nakryiko <andrii@kernel.org>
> > Signed-off-by: Yonghong Song <yhs@fb.com>
>
> Thanks for the patch!
>
> Tested-by: Nick Desaulniers <ndesaulniers@google.com>
>

Cool, thanks for testing Nick.

- Sedat -

> > ---
> >  libbtf.c | 39 ++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 38 insertions(+), 1 deletion(-)
> >
> > diff --git a/libbtf.c b/libbtf.c
> > index 9f76283..5843200 100644
> > --- a/libbtf.c
> > +++ b/libbtf.c
> > @@ -373,6 +373,7 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
> >         struct btf *btf = btfe->btf;
> >         const struct btf_type *t;
> >         uint8_t encoding = 0;
> > +       uint16_t byte_sz;
> >         int32_t id;
> >
> >         if (bt->is_signed) {
> > @@ -384,7 +385,43 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
> >                 return -1;
> >         }
> >
> > -       id = btf__add_int(btf, name, BITS_ROUNDUP_BYTES(bt->bit_size), encoding);
> > +       /* dwarf5 may emit DW_ATE_[un]signed_{num} base types where
> > +        * {num} is not power of 2 and may exceed 128. Such attributes
> > +        * are mostly used to record operation for an actual parameter
> > +        * or variable.
> > +        * For example,
> > +        *     DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> > +        *         [0xffffffff82808812, 0xffffffff82808817):
> > +        *             DW_OP_breg0 RAX+0,
> > +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > +        *             DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> > +        *             DW_OP_stack_value,
> > +        *             DW_OP_piece 0x1,
> > +        *             DW_OP_breg0 RAX+0,
> > +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > +        *             DW_OP_lit8,
> > +        *             DW_OP_shr,
> > +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > +        *             DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> > +        *             DW_OP_stack_value, DW_OP_piece 0x3
> > +        *     DW_AT_name    ("ebx")
> > +        *     DW_AT_decl_file       ("/linux/arch/x86/events/intel/core.c")
> > +        *
> > +        * In the above example, at some point, one unsigned_32 value
> > +        * is right shifted by 8 and the result is converted to unsigned_32
> > +        * and then unsigned_24.
> > +        *
> > +        * BTF does not need such DW_OP_* information so let us sanitize
> > +        * these non-regular int types to avoid libbpf/kernel complaints.
> > +        */
> > +       byte_sz = BITS_ROUNDUP_BYTES(bt->bit_size);
> > +       if (!byte_sz || (byte_sz & (byte_sz - 1))) {
> > +               name = "__SANITIZED_FAKE_INT__";
> > +               byte_sz = 4;
> > +       }
> > +
> > +       id = btf__add_int(btf, name, byte_sz, encoding);
> >         if (id < 0) {
> >                 btf_elf__log_err(btfe, BTF_KIND_INT, name, true, "Error emitting BTF type");
> >         } else {
> > --
> > 2.24.1
> >
>
>
> --
> Thanks,
> ~Nick Desaulniers
Arnaldo Carvalho de Melo Feb. 8, 2021, 7:55 p.m. UTC | #8
Em Sun, Feb 07, 2021 at 11:32:00AM +0100, Sedat Dilek escreveu:
> On Sun, Feb 7, 2021 at 8:17 AM Yonghong Song <yhs@fb.com> wrote:
> >
> > clang with dwarf5 may generate non-regular int base type,
> > i.e., not a signed/unsigned char/short/int/longlong/__int128.
> > Such base types are often used to describe
> > how an actual parameter or variable is generated. For example,
> >
> > 0x000015cf:   DW_TAG_base_type
> >                 DW_AT_name      ("DW_ATE_unsigned_1")
> >                 DW_AT_encoding  (DW_ATE_unsigned)
> >                 DW_AT_byte_size (0x00)
> >
> > 0x00010ed9:         DW_TAG_formal_parameter
> >                       DW_AT_location    (DW_OP_lit0,
> >                                          DW_OP_not,
> >                                          DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
> >                                          DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
> >                                          DW_OP_stack_value)
> >                       DW_AT_abstract_origin     (0x00013984 "branch")
> >
> > What it does is with a literal "0", did a "not" operation, and the converted to
> > one-bit unsigned int and then 8-bit unsigned int.
> >
> > Another example,
> >
> > 0x000e97e4:   DW_TAG_base_type
> >                 DW_AT_name      ("DW_ATE_unsigned_24")
> >                 DW_AT_encoding  (DW_ATE_unsigned)
> >                 DW_AT_byte_size (0x03)
> >
> > 0x000f88f8:     DW_TAG_variable
> >                   DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> >                      [0xffffffff82808812, 0xffffffff82808817):
> >                          DW_OP_breg0 RAX+0,
> >                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> >                          DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> >                          DW_OP_stack_value,
> >                          DW_OP_piece 0x1,
> >                          DW_OP_breg0 RAX+0,
> >                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> >                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> >                          DW_OP_lit8,
> >                          DW_OP_shr,
> >                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> >                          DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> >                          DW_OP_stack_value,
> >                          DW_OP_piece 0x3
> >                      ......
> >
> > At one point, a right shift by 8 happens and the result is converted to
> > 32-bit unsigned int and then to 24-bit unsigned int.
> >
> > BTF does not need any of these DW_OP_* information and such non-regular int
> > types will cause libbpf to emit errors.
> > Let us sanitize them to generate BTF acceptable to libbpf and kernel.
> >
> > Cc: Sedat Dilek <sedat.dilek@gmail.com>
> 
> Thanks for v2.
> 
> For both v1 and v2:
> 
>    Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
>    Reported-by: Sedat Dilek <sedat.dilek@gmail.com>

Thanks, applied.

- Arnaldo
 
> My development and testing environment:
> 
> 1. Debian/testing AMD64
> 2. Linux v5.11-rc6+ with custom mostly Clang fixes
> 3. Debug-Info: BTF + DWARF-v5 enabled
> 4. Compiler and tools (incl. Clang's Integrated ASsembler IAS):
> LLVM/Clang v12.0.0-rc1 (make LLVM=1 LLVM_IAS=1)
> 
> Build and boot on bare metal.
> 
> - Sedat -
> 
> > Acked-by: Andrii Nakryiko <andrii@kernel.org>
> > Signed-off-by: Yonghong Song <yhs@fb.com>
> > ---
> >  libbtf.c | 39 ++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 38 insertions(+), 1 deletion(-)
> >
> > diff --git a/libbtf.c b/libbtf.c
> > index 9f76283..5843200 100644
> > --- a/libbtf.c
> > +++ b/libbtf.c
> > @@ -373,6 +373,7 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
> >         struct btf *btf = btfe->btf;
> >         const struct btf_type *t;
> >         uint8_t encoding = 0;
> > +       uint16_t byte_sz;
> >         int32_t id;
> >
> >         if (bt->is_signed) {
> > @@ -384,7 +385,43 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
> >                 return -1;
> >         }
> >
> > -       id = btf__add_int(btf, name, BITS_ROUNDUP_BYTES(bt->bit_size), encoding);
> > +       /* dwarf5 may emit DW_ATE_[un]signed_{num} base types where
> > +        * {num} is not power of 2 and may exceed 128. Such attributes
> > +        * are mostly used to record operation for an actual parameter
> > +        * or variable.
> > +        * For example,
> > +        *     DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> > +        *         [0xffffffff82808812, 0xffffffff82808817):
> > +        *             DW_OP_breg0 RAX+0,
> > +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > +        *             DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> > +        *             DW_OP_stack_value,
> > +        *             DW_OP_piece 0x1,
> > +        *             DW_OP_breg0 RAX+0,
> > +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > +        *             DW_OP_lit8,
> > +        *             DW_OP_shr,
> > +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > +        *             DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> > +        *             DW_OP_stack_value, DW_OP_piece 0x3
> > +        *     DW_AT_name    ("ebx")
> > +        *     DW_AT_decl_file       ("/linux/arch/x86/events/intel/core.c")
> > +        *
> > +        * In the above example, at some point, one unsigned_32 value
> > +        * is right shifted by 8 and the result is converted to unsigned_32
> > +        * and then unsigned_24.
> > +        *
> > +        * BTF does not need such DW_OP_* information so let us sanitize
> > +        * these non-regular int types to avoid libbpf/kernel complaints.
> > +        */
> > +       byte_sz = BITS_ROUNDUP_BYTES(bt->bit_size);
> > +       if (!byte_sz || (byte_sz & (byte_sz - 1))) {
> > +               name = "__SANITIZED_FAKE_INT__";
> > +               byte_sz = 4;
> > +       }
> > +
> > +       id = btf__add_int(btf, name, byte_sz, encoding);
> >         if (id < 0) {
> >                 btf_elf__log_err(btfe, BTF_KIND_INT, name, true, "Error emitting BTF type");
> >         } else {
> > --
> > 2.24.1
> >
Arnaldo Carvalho de Melo Feb. 8, 2021, 8:13 p.m. UTC | #9
Em Mon, Feb 08, 2021 at 11:22:48AM -0800, Nick Desaulniers escreveu:
> On Sat, Feb 6, 2021 at 11:17 PM Yonghong Song <yhs@fb.com> wrote:
> >
> > clang with dwarf5 may generate non-regular int base type,
> > i.e., not a signed/unsigned char/short/int/longlong/__int128.
> > Such base types are often used to describe
> > how an actual parameter or variable is generated. For example,
> >
> > 0x000015cf:   DW_TAG_base_type
> >                 DW_AT_name      ("DW_ATE_unsigned_1")
> >                 DW_AT_encoding  (DW_ATE_unsigned)
> >                 DW_AT_byte_size (0x00)
> >
> > 0x00010ed9:         DW_TAG_formal_parameter
> >                       DW_AT_location    (DW_OP_lit0,
> >                                          DW_OP_not,
> >                                          DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
> >                                          DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
> >                                          DW_OP_stack_value)
> >                       DW_AT_abstract_origin     (0x00013984 "branch")
> >
> > What it does is with a literal "0", did a "not" operation, and the converted to
> > one-bit unsigned int and then 8-bit unsigned int.
> >
> > Another example,
> >
> > 0x000e97e4:   DW_TAG_base_type
> >                 DW_AT_name      ("DW_ATE_unsigned_24")
> >                 DW_AT_encoding  (DW_ATE_unsigned)
> >                 DW_AT_byte_size (0x03)
> >
> > 0x000f88f8:     DW_TAG_variable
> >                   DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> >                      [0xffffffff82808812, 0xffffffff82808817):
> >                          DW_OP_breg0 RAX+0,
> >                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> >                          DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> >                          DW_OP_stack_value,
> >                          DW_OP_piece 0x1,
> >                          DW_OP_breg0 RAX+0,
> >                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> >                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> >                          DW_OP_lit8,
> >                          DW_OP_shr,
> >                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> >                          DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> >                          DW_OP_stack_value,
> >                          DW_OP_piece 0x3
> >                      ......
> >
> > At one point, a right shift by 8 happens and the result is converted to
> > 32-bit unsigned int and then to 24-bit unsigned int.
> >
> > BTF does not need any of these DW_OP_* information and such non-regular int
> > types will cause libbpf to emit errors.
> > Let us sanitize them to generate BTF acceptable to libbpf and kernel.
> >
> > Cc: Sedat Dilek <sedat.dilek@gmail.com>
> > Acked-by: Andrii Nakryiko <andrii@kernel.org>
> > Signed-off-by: Yonghong Song <yhs@fb.com>
> 
> Thanks for the patch!
> 
> Tested-by: Nick Desaulniers <ndesaulniers@google.com>

Thanks for testing and documenting that you tested, added the tag to
the commit,

- Arnaldo
Sedat Dilek Feb. 8, 2021, 8:46 p.m. UTC | #10
On Mon, Feb 8, 2021 at 8:55 PM Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
>
> Em Sun, Feb 07, 2021 at 11:32:00AM +0100, Sedat Dilek escreveu:
> > On Sun, Feb 7, 2021 at 8:17 AM Yonghong Song <yhs@fb.com> wrote:
> > >
> > > clang with dwarf5 may generate non-regular int base type,
> > > i.e., not a signed/unsigned char/short/int/longlong/__int128.
> > > Such base types are often used to describe
> > > how an actual parameter or variable is generated. For example,
> > >
> > > 0x000015cf:   DW_TAG_base_type
> > >                 DW_AT_name      ("DW_ATE_unsigned_1")
> > >                 DW_AT_encoding  (DW_ATE_unsigned)
> > >                 DW_AT_byte_size (0x00)
> > >
> > > 0x00010ed9:         DW_TAG_formal_parameter
> > >                       DW_AT_location    (DW_OP_lit0,
> > >                                          DW_OP_not,
> > >                                          DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
> > >                                          DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
> > >                                          DW_OP_stack_value)
> > >                       DW_AT_abstract_origin     (0x00013984 "branch")
> > >
> > > What it does is with a literal "0", did a "not" operation, and the converted to
> > > one-bit unsigned int and then 8-bit unsigned int.
> > >
> > > Another example,
> > >
> > > 0x000e97e4:   DW_TAG_base_type
> > >                 DW_AT_name      ("DW_ATE_unsigned_24")
> > >                 DW_AT_encoding  (DW_ATE_unsigned)
> > >                 DW_AT_byte_size (0x03)
> > >
> > > 0x000f88f8:     DW_TAG_variable
> > >                   DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> > >                      [0xffffffff82808812, 0xffffffff82808817):
> > >                          DW_OP_breg0 RAX+0,
> > >                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > >                          DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> > >                          DW_OP_stack_value,
> > >                          DW_OP_piece 0x1,
> > >                          DW_OP_breg0 RAX+0,
> > >                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > >                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > >                          DW_OP_lit8,
> > >                          DW_OP_shr,
> > >                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > >                          DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> > >                          DW_OP_stack_value,
> > >                          DW_OP_piece 0x3
> > >                      ......
> > >
> > > At one point, a right shift by 8 happens and the result is converted to
> > > 32-bit unsigned int and then to 24-bit unsigned int.
> > >
> > > BTF does not need any of these DW_OP_* information and such non-regular int
> > > types will cause libbpf to emit errors.
> > > Let us sanitize them to generate BTF acceptable to libbpf and kernel.
> > >
> > > Cc: Sedat Dilek <sedat.dilek@gmail.com>
> >
> > Thanks for v2.
> >
> > For both v1 and v2:
> >
> >    Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
> >    Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
>
> Thanks, applied.
>

Great.
I cannot see it yet in [1] or [2].

More important to me is is this worth a pahole v1.20.1 release?
This patch is required to successfully build with BTF and DWARF-5 and Clang-12.

I have Nathan's "bpf: Hoise pahole version checks into Kconfig" patch
in my custom patchset (together with Nick's DWARF-v5 patchset) which
makes it easier to do (depends on PAHOLE_VERSION >= 1201) via Kconfig
for example.

- Sedat -

[1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/
[2] https://github.com/acmel/dwarves/

> - Arnaldo
>
> > My development and testing environment:
> >
> > 1. Debian/testing AMD64
> > 2. Linux v5.11-rc6+ with custom mostly Clang fixes
> > 3. Debug-Info: BTF + DWARF-v5 enabled
> > 4. Compiler and tools (incl. Clang's Integrated ASsembler IAS):
> > LLVM/Clang v12.0.0-rc1 (make LLVM=1 LLVM_IAS=1)
> >
> > Build and boot on bare metal.
> >
> > - Sedat -
> >
> > > Acked-by: Andrii Nakryiko <andrii@kernel.org>
> > > Signed-off-by: Yonghong Song <yhs@fb.com>
> > > ---
> > >  libbtf.c | 39 ++++++++++++++++++++++++++++++++++++++-
> > >  1 file changed, 38 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/libbtf.c b/libbtf.c
> > > index 9f76283..5843200 100644
> > > --- a/libbtf.c
> > > +++ b/libbtf.c
> > > @@ -373,6 +373,7 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
> > >         struct btf *btf = btfe->btf;
> > >         const struct btf_type *t;
> > >         uint8_t encoding = 0;
> > > +       uint16_t byte_sz;
> > >         int32_t id;
> > >
> > >         if (bt->is_signed) {
> > > @@ -384,7 +385,43 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
> > >                 return -1;
> > >         }
> > >
> > > -       id = btf__add_int(btf, name, BITS_ROUNDUP_BYTES(bt->bit_size), encoding);
> > > +       /* dwarf5 may emit DW_ATE_[un]signed_{num} base types where
> > > +        * {num} is not power of 2 and may exceed 128. Such attributes
> > > +        * are mostly used to record operation for an actual parameter
> > > +        * or variable.
> > > +        * For example,
> > > +        *     DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> > > +        *         [0xffffffff82808812, 0xffffffff82808817):
> > > +        *             DW_OP_breg0 RAX+0,
> > > +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > > +        *             DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> > > +        *             DW_OP_stack_value,
> > > +        *             DW_OP_piece 0x1,
> > > +        *             DW_OP_breg0 RAX+0,
> > > +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > > +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > > +        *             DW_OP_lit8,
> > > +        *             DW_OP_shr,
> > > +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > > +        *             DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> > > +        *             DW_OP_stack_value, DW_OP_piece 0x3
> > > +        *     DW_AT_name    ("ebx")
> > > +        *     DW_AT_decl_file       ("/linux/arch/x86/events/intel/core.c")
> > > +        *
> > > +        * In the above example, at some point, one unsigned_32 value
> > > +        * is right shifted by 8 and the result is converted to unsigned_32
> > > +        * and then unsigned_24.
> > > +        *
> > > +        * BTF does not need such DW_OP_* information so let us sanitize
> > > +        * these non-regular int types to avoid libbpf/kernel complaints.
> > > +        */
> > > +       byte_sz = BITS_ROUNDUP_BYTES(bt->bit_size);
> > > +       if (!byte_sz || (byte_sz & (byte_sz - 1))) {
> > > +               name = "__SANITIZED_FAKE_INT__";
> > > +               byte_sz = 4;
> > > +       }
> > > +
> > > +       id = btf__add_int(btf, name, byte_sz, encoding);
> > >         if (id < 0) {
> > >                 btf_elf__log_err(btfe, BTF_KIND_INT, name, true, "Error emitting BTF type");
> > >         } else {
> > > --
> > > 2.24.1
> > >
>
> --
>
> - Arnaldo
Andrii Nakryiko Feb. 8, 2021, 9:07 p.m. UTC | #11
On Mon, Feb 8, 2021 at 12:46 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>
> On Mon, Feb 8, 2021 at 8:55 PM Arnaldo Carvalho de Melo
> <arnaldo.melo@gmail.com> wrote:
> >
> > Em Sun, Feb 07, 2021 at 11:32:00AM +0100, Sedat Dilek escreveu:
> > > On Sun, Feb 7, 2021 at 8:17 AM Yonghong Song <yhs@fb.com> wrote:
> > > >
> > > > clang with dwarf5 may generate non-regular int base type,
> > > > i.e., not a signed/unsigned char/short/int/longlong/__int128.
> > > > Such base types are often used to describe
> > > > how an actual parameter or variable is generated. For example,
> > > >
> > > > 0x000015cf:   DW_TAG_base_type
> > > >                 DW_AT_name      ("DW_ATE_unsigned_1")
> > > >                 DW_AT_encoding  (DW_ATE_unsigned)
> > > >                 DW_AT_byte_size (0x00)
> > > >
> > > > 0x00010ed9:         DW_TAG_formal_parameter
> > > >                       DW_AT_location    (DW_OP_lit0,
> > > >                                          DW_OP_not,
> > > >                                          DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
> > > >                                          DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
> > > >                                          DW_OP_stack_value)
> > > >                       DW_AT_abstract_origin     (0x00013984 "branch")
> > > >
> > > > What it does is with a literal "0", did a "not" operation, and the converted to
> > > > one-bit unsigned int and then 8-bit unsigned int.
> > > >
> > > > Another example,
> > > >
> > > > 0x000e97e4:   DW_TAG_base_type
> > > >                 DW_AT_name      ("DW_ATE_unsigned_24")
> > > >                 DW_AT_encoding  (DW_ATE_unsigned)
> > > >                 DW_AT_byte_size (0x03)
> > > >
> > > > 0x000f88f8:     DW_TAG_variable
> > > >                   DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> > > >                      [0xffffffff82808812, 0xffffffff82808817):
> > > >                          DW_OP_breg0 RAX+0,
> > > >                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > > >                          DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> > > >                          DW_OP_stack_value,
> > > >                          DW_OP_piece 0x1,
> > > >                          DW_OP_breg0 RAX+0,
> > > >                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > > >                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > > >                          DW_OP_lit8,
> > > >                          DW_OP_shr,
> > > >                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > > >                          DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> > > >                          DW_OP_stack_value,
> > > >                          DW_OP_piece 0x3
> > > >                      ......
> > > >
> > > > At one point, a right shift by 8 happens and the result is converted to
> > > > 32-bit unsigned int and then to 24-bit unsigned int.
> > > >
> > > > BTF does not need any of these DW_OP_* information and such non-regular int
> > > > types will cause libbpf to emit errors.
> > > > Let us sanitize them to generate BTF acceptable to libbpf and kernel.
> > > >
> > > > Cc: Sedat Dilek <sedat.dilek@gmail.com>
> > >
> > > Thanks for v2.
> > >
> > > For both v1 and v2:
> > >
> > >    Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
> > >    Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
> >
> > Thanks, applied.
> >
>
> Great.
> I cannot see it yet in [1] or [2].
>
> More important to me is is this worth a pahole v1.20.1 release?
> This patch is required to successfully build with BTF and DWARF-5 and Clang-12.
>
> I have Nathan's "bpf: Hoise pahole version checks into Kconfig" patch
> in my custom patchset (together with Nick's DWARF-v5 patchset) which
> makes it easier to do (depends on PAHOLE_VERSION >= 1201) via Kconfig

I don't think the math works out with 1201 vs 121 (in the future), so
I'd rather prefer 1.21, if necessary.

> for example.
>
> - Sedat -
>
> [1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/
> [2] https://github.com/acmel/dwarves/
>
> > - Arnaldo
> >
> > > My development and testing environment:
> > >
> > > 1. Debian/testing AMD64
> > > 2. Linux v5.11-rc6+ with custom mostly Clang fixes
> > > 3. Debug-Info: BTF + DWARF-v5 enabled
> > > 4. Compiler and tools (incl. Clang's Integrated ASsembler IAS):
> > > LLVM/Clang v12.0.0-rc1 (make LLVM=1 LLVM_IAS=1)
> > >
> > > Build and boot on bare metal.
> > >
> > > - Sedat -
> > >
> > > > Acked-by: Andrii Nakryiko <andrii@kernel.org>
> > > > Signed-off-by: Yonghong Song <yhs@fb.com>
> > > > ---
> > > >  libbtf.c | 39 ++++++++++++++++++++++++++++++++++++++-
> > > >  1 file changed, 38 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/libbtf.c b/libbtf.c
> > > > index 9f76283..5843200 100644
> > > > --- a/libbtf.c
> > > > +++ b/libbtf.c
> > > > @@ -373,6 +373,7 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
> > > >         struct btf *btf = btfe->btf;
> > > >         const struct btf_type *t;
> > > >         uint8_t encoding = 0;
> > > > +       uint16_t byte_sz;
> > > >         int32_t id;
> > > >
> > > >         if (bt->is_signed) {
> > > > @@ -384,7 +385,43 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
> > > >                 return -1;
> > > >         }
> > > >
> > > > -       id = btf__add_int(btf, name, BITS_ROUNDUP_BYTES(bt->bit_size), encoding);
> > > > +       /* dwarf5 may emit DW_ATE_[un]signed_{num} base types where
> > > > +        * {num} is not power of 2 and may exceed 128. Such attributes
> > > > +        * are mostly used to record operation for an actual parameter
> > > > +        * or variable.
> > > > +        * For example,
> > > > +        *     DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> > > > +        *         [0xffffffff82808812, 0xffffffff82808817):
> > > > +        *             DW_OP_breg0 RAX+0,
> > > > +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > > > +        *             DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> > > > +        *             DW_OP_stack_value,
> > > > +        *             DW_OP_piece 0x1,
> > > > +        *             DW_OP_breg0 RAX+0,
> > > > +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > > > +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > > > +        *             DW_OP_lit8,
> > > > +        *             DW_OP_shr,
> > > > +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > > > +        *             DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> > > > +        *             DW_OP_stack_value, DW_OP_piece 0x3
> > > > +        *     DW_AT_name    ("ebx")
> > > > +        *     DW_AT_decl_file       ("/linux/arch/x86/events/intel/core.c")
> > > > +        *
> > > > +        * In the above example, at some point, one unsigned_32 value
> > > > +        * is right shifted by 8 and the result is converted to unsigned_32
> > > > +        * and then unsigned_24.
> > > > +        *
> > > > +        * BTF does not need such DW_OP_* information so let us sanitize
> > > > +        * these non-regular int types to avoid libbpf/kernel complaints.
> > > > +        */
> > > > +       byte_sz = BITS_ROUNDUP_BYTES(bt->bit_size);
> > > > +       if (!byte_sz || (byte_sz & (byte_sz - 1))) {
> > > > +               name = "__SANITIZED_FAKE_INT__";
> > > > +               byte_sz = 4;
> > > > +       }
> > > > +
> > > > +       id = btf__add_int(btf, name, byte_sz, encoding);
> > > >         if (id < 0) {
> > > >                 btf_elf__log_err(btfe, BTF_KIND_INT, name, true, "Error emitting BTF type");
> > > >         } else {
> > > > --
> > > > 2.24.1
> > > >
> >
> > --
> >
> > - Arnaldo
Sedat Dilek Feb. 8, 2021, 9:11 p.m. UTC | #12
On Mon, Feb 8, 2021 at 10:08 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Feb 8, 2021 at 12:46 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >
> > On Mon, Feb 8, 2021 at 8:55 PM Arnaldo Carvalho de Melo
> > <arnaldo.melo@gmail.com> wrote:
> > >
> > > Em Sun, Feb 07, 2021 at 11:32:00AM +0100, Sedat Dilek escreveu:
> > > > On Sun, Feb 7, 2021 at 8:17 AM Yonghong Song <yhs@fb.com> wrote:
> > > > >
> > > > > clang with dwarf5 may generate non-regular int base type,
> > > > > i.e., not a signed/unsigned char/short/int/longlong/__int128.
> > > > > Such base types are often used to describe
> > > > > how an actual parameter or variable is generated. For example,
> > > > >
> > > > > 0x000015cf:   DW_TAG_base_type
> > > > >                 DW_AT_name      ("DW_ATE_unsigned_1")
> > > > >                 DW_AT_encoding  (DW_ATE_unsigned)
> > > > >                 DW_AT_byte_size (0x00)
> > > > >
> > > > > 0x00010ed9:         DW_TAG_formal_parameter
> > > > >                       DW_AT_location    (DW_OP_lit0,
> > > > >                                          DW_OP_not,
> > > > >                                          DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
> > > > >                                          DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
> > > > >                                          DW_OP_stack_value)
> > > > >                       DW_AT_abstract_origin     (0x00013984 "branch")
> > > > >
> > > > > What it does is with a literal "0", did a "not" operation, and the converted to
> > > > > one-bit unsigned int and then 8-bit unsigned int.
> > > > >
> > > > > Another example,
> > > > >
> > > > > 0x000e97e4:   DW_TAG_base_type
> > > > >                 DW_AT_name      ("DW_ATE_unsigned_24")
> > > > >                 DW_AT_encoding  (DW_ATE_unsigned)
> > > > >                 DW_AT_byte_size (0x03)
> > > > >
> > > > > 0x000f88f8:     DW_TAG_variable
> > > > >                   DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> > > > >                      [0xffffffff82808812, 0xffffffff82808817):
> > > > >                          DW_OP_breg0 RAX+0,
> > > > >                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > > > >                          DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> > > > >                          DW_OP_stack_value,
> > > > >                          DW_OP_piece 0x1,
> > > > >                          DW_OP_breg0 RAX+0,
> > > > >                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > > > >                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > > > >                          DW_OP_lit8,
> > > > >                          DW_OP_shr,
> > > > >                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > > > >                          DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> > > > >                          DW_OP_stack_value,
> > > > >                          DW_OP_piece 0x3
> > > > >                      ......
> > > > >
> > > > > At one point, a right shift by 8 happens and the result is converted to
> > > > > 32-bit unsigned int and then to 24-bit unsigned int.
> > > > >
> > > > > BTF does not need any of these DW_OP_* information and such non-regular int
> > > > > types will cause libbpf to emit errors.
> > > > > Let us sanitize them to generate BTF acceptable to libbpf and kernel.
> > > > >
> > > > > Cc: Sedat Dilek <sedat.dilek@gmail.com>
> > > >
> > > > Thanks for v2.
> > > >
> > > > For both v1 and v2:
> > > >
> > > >    Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
> > > >    Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
> > >
> > > Thanks, applied.
> > >
> >
> > Great.
> > I cannot see it yet in [1] or [2].
> >
> > More important to me is is this worth a pahole v1.20.1 release?
> > This patch is required to successfully build with BTF and DWARF-5 and Clang-12.
> >
> > I have Nathan's "bpf: Hoise pahole version checks into Kconfig" patch
> > in my custom patchset (together with Nick's DWARF-v5 patchset) which
> > makes it easier to do (depends on PAHOLE_VERSION >= 1201) via Kconfig
>
> I don't think the math works out with 1201 vs 121 (in the future), so
> I'd rather prefer 1.21, if necessary.
>

Aaargh, you are right.

# /opt/pahole/bin/pahole --numeric_version
120

The wish is to have a pahole v1.21.

- Sedat -

> > for example.
> >
> > - Sedat -
> >
> > [1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/
> > [2] https://github.com/acmel/dwarves/
> >
> > > - Arnaldo
> > >
> > > > My development and testing environment:
> > > >
> > > > 1. Debian/testing AMD64
> > > > 2. Linux v5.11-rc6+ with custom mostly Clang fixes
> > > > 3. Debug-Info: BTF + DWARF-v5 enabled
> > > > 4. Compiler and tools (incl. Clang's Integrated ASsembler IAS):
> > > > LLVM/Clang v12.0.0-rc1 (make LLVM=1 LLVM_IAS=1)
> > > >
> > > > Build and boot on bare metal.
> > > >
> > > > - Sedat -
> > > >
> > > > > Acked-by: Andrii Nakryiko <andrii@kernel.org>
> > > > > Signed-off-by: Yonghong Song <yhs@fb.com>
> > > > > ---
> > > > >  libbtf.c | 39 ++++++++++++++++++++++++++++++++++++++-
> > > > >  1 file changed, 38 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/libbtf.c b/libbtf.c
> > > > > index 9f76283..5843200 100644
> > > > > --- a/libbtf.c
> > > > > +++ b/libbtf.c
> > > > > @@ -373,6 +373,7 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
> > > > >         struct btf *btf = btfe->btf;
> > > > >         const struct btf_type *t;
> > > > >         uint8_t encoding = 0;
> > > > > +       uint16_t byte_sz;
> > > > >         int32_t id;
> > > > >
> > > > >         if (bt->is_signed) {
> > > > > @@ -384,7 +385,43 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
> > > > >                 return -1;
> > > > >         }
> > > > >
> > > > > -       id = btf__add_int(btf, name, BITS_ROUNDUP_BYTES(bt->bit_size), encoding);
> > > > > +       /* dwarf5 may emit DW_ATE_[un]signed_{num} base types where
> > > > > +        * {num} is not power of 2 and may exceed 128. Such attributes
> > > > > +        * are mostly used to record operation for an actual parameter
> > > > > +        * or variable.
> > > > > +        * For example,
> > > > > +        *     DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> > > > > +        *         [0xffffffff82808812, 0xffffffff82808817):
> > > > > +        *             DW_OP_breg0 RAX+0,
> > > > > +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > > > > +        *             DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> > > > > +        *             DW_OP_stack_value,
> > > > > +        *             DW_OP_piece 0x1,
> > > > > +        *             DW_OP_breg0 RAX+0,
> > > > > +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > > > > +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > > > > +        *             DW_OP_lit8,
> > > > > +        *             DW_OP_shr,
> > > > > +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > > > > +        *             DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> > > > > +        *             DW_OP_stack_value, DW_OP_piece 0x3
> > > > > +        *     DW_AT_name    ("ebx")
> > > > > +        *     DW_AT_decl_file       ("/linux/arch/x86/events/intel/core.c")
> > > > > +        *
> > > > > +        * In the above example, at some point, one unsigned_32 value
> > > > > +        * is right shifted by 8 and the result is converted to unsigned_32
> > > > > +        * and then unsigned_24.
> > > > > +        *
> > > > > +        * BTF does not need such DW_OP_* information so let us sanitize
> > > > > +        * these non-regular int types to avoid libbpf/kernel complaints.
> > > > > +        */
> > > > > +       byte_sz = BITS_ROUNDUP_BYTES(bt->bit_size);
> > > > > +       if (!byte_sz || (byte_sz & (byte_sz - 1))) {
> > > > > +               name = "__SANITIZED_FAKE_INT__";
> > > > > +               byte_sz = 4;
> > > > > +       }
> > > > > +
> > > > > +       id = btf__add_int(btf, name, byte_sz, encoding);
> > > > >         if (id < 0) {
> > > > >                 btf_elf__log_err(btfe, BTF_KIND_INT, name, true, "Error emitting BTF type");
> > > > >         } else {
> > > > > --
> > > > > 2.24.1
> > > > >
> > >
> > > --
> > >
> > > - Arnaldo
Mark Wielaard Feb. 8, 2021, 9:24 p.m. UTC | #13
Hi David,

(Nice to have seen you just recently!)

On Mon, 2021-02-08 at 12:43 -0800, David Blaikie wrote:
> DW_OP_convert was added in DWARFv5 and it can be used for type conversions,
> these are created in the LLVM middle/backend during optimizations, not by
> the frontend. So the middle/backend doesn't have a way to create canonical
> DW_TAG_base_types for frontend language integer types - so it creates
> synthesized ones. So this is intentional/not a particularly quirky thing.
> 
> LLVM creates locations incrementally as optimizations are applied - without
> doing any particular canonicalization/reduction later on (maybe it has some
> canonicalization/reduction, but not a very wholistic/aggressive approach
> there) - so it can end up with something that is not the most compact
> representation.

But it seems to end up with a bogus representation (see below in the
original quoted message for an example). The issue isn't really that it
isn't an optimized constant representation (although it is kind of an
inefficient one). It is this DW_OP_convert applied to a
DW_TAG_base_type DIE with a DW_AT_byte_size of zero. Which doesn't
really make any sense to me. It feels like it is asking the DWARF
consumer to do a divide by zero here ("now express this value using
zero bits!"). Could you explain these syntactic "DW_ATE_unsigned_1"
base_type DIEs? What do they represent? Why do they have a zero size?
Should they really have a DW_AT_bit_size and a DW_AT_byte_size of 1?

Thanks,

Mark

> On Mon, Feb 8, 2021 at 11:17 AM Nick Desaulniers <ndesaulniers@google.com>
> wrote:
> 
> > On Sun, Feb 7, 2021 at 6:18 AM Mark Wielaard <mark@klomp.org> wrote:
> > > 
> > > Hi,
> > > 
> > > On Sat, 2021-02-06 at 23:17 -0800, Yonghong Song wrote:
> > > > clang with dwarf5 may generate non-regular int base type,
> > > > i.e., not a signed/unsigned char/short/int/longlong/__int128.
> > > > Such base types are often used to describe
> > > > how an actual parameter or variable is generated. For example,
> > > > 
> > > > 0x000015cf:   DW_TAG_base_type
> > > >                 DW_AT_name      ("DW_ATE_unsigned_1")
> > > >                 DW_AT_encoding  (DW_ATE_unsigned)
> > > >                 DW_AT_byte_size (0x00)
> > > > 
> > > > 0x00010ed9:         DW_TAG_formal_parameter
> > > >                       DW_AT_location    (DW_OP_lit0,
> > > >                                          DW_OP_not,
> > > >                                          DW_OP_convert (0x000015cf)
> > 
> > "DW_ATE_unsigned_1",
> > > >                                          DW_OP_convert (0x000015d4)
> > 
> > "DW_ATE_unsigned_8",
> > > >                                          DW_OP_stack_value)
> > > >                       DW_AT_abstract_origin     (0x00013984 "branch")
> > > > 
> > > > What it does is with a literal "0", did a "not" operation, and the
> > 
> > converted to
> > > > one-bit unsigned int and then 8-bit unsigned int.
> > > 
> > > Thanks for tracking this down. Do you have any idea why the clang
> > > compiler emits this? You might be right that it is intended to do what
> > > you describe it does (but then it would simply encode an unsigned
> > > constant 1 char in a very inefficient way). But as implemented it
> > > doesn't seem to make any sense. What would DW_OP_convert of an zero
> > > sized base type even mean (if it is intended as a 1 bit sized typed,
> > > then why is there no DW_AT_bit_size)?
> > 
> > David,
> > Any thoughts on the above sequence of DW_OP_ entries?  This is a part
> > of DWARF I'm unfamiliar with.
> > 
> > > 
> > > So I do think your patch makes sense. clang clearly is emitting
> > > something bogus. And so some fixup is needed. But maybe we should at
> > > least give a warning about it, otherwise it might never get fixed.
> > > 
> > > BTW. If these bogus base types are only emitted as part of a location
> > > expression and not as part of an actual function or variable type
> > > description, then why are we even trying to encode it as a BTF type? It
> > > might be cheaper to just skip/drop it. But maybe the code setup makes
> > > it hard to know whether or not such a (bogus) type is actually
> > > referenced from a function or variable description?
> > > 
> > > Cheers,
> > > 
> > > Mark
> > 
> > 
> > 
> > --
> > Thanks,
> > ~Nick Desaulniers
> >
diff mbox series

Patch

diff --git a/libbtf.c b/libbtf.c
index 9f76283..5843200 100644
--- a/libbtf.c
+++ b/libbtf.c
@@ -373,6 +373,7 @@  int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
 	struct btf *btf = btfe->btf;
 	const struct btf_type *t;
 	uint8_t encoding = 0;
+	uint16_t byte_sz;
 	int32_t id;
 
 	if (bt->is_signed) {
@@ -384,7 +385,43 @@  int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
 		return -1;
 	}
 
-	id = btf__add_int(btf, name, BITS_ROUNDUP_BYTES(bt->bit_size), encoding);
+	/* dwarf5 may emit DW_ATE_[un]signed_{num} base types where
+	 * {num} is not power of 2 and may exceed 128. Such attributes
+	 * are mostly used to record operation for an actual parameter
+	 * or variable.
+	 * For example,
+	 *     DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
+	 *         [0xffffffff82808812, 0xffffffff82808817):
+	 *             DW_OP_breg0 RAX+0,
+	 *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
+	 *             DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
+	 *             DW_OP_stack_value,
+	 *             DW_OP_piece 0x1,
+	 *             DW_OP_breg0 RAX+0,
+	 *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
+	 *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
+	 *             DW_OP_lit8,
+	 *             DW_OP_shr,
+	 *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
+	 *             DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
+	 *             DW_OP_stack_value, DW_OP_piece 0x3
+	 *     DW_AT_name    ("ebx")
+	 *     DW_AT_decl_file       ("/linux/arch/x86/events/intel/core.c")
+	 *
+	 * In the above example, at some point, one unsigned_32 value
+	 * is right shifted by 8 and the result is converted to unsigned_32
+	 * and then unsigned_24.
+	 *
+	 * BTF does not need such DW_OP_* information so let us sanitize
+	 * these non-regular int types to avoid libbpf/kernel complaints.
+	 */
+	byte_sz = BITS_ROUNDUP_BYTES(bt->bit_size);
+	if (!byte_sz || (byte_sz & (byte_sz - 1))) {
+		name = "__SANITIZED_FAKE_INT__";
+		byte_sz = 4;
+	}
+
+	id = btf__add_int(btf, name, byte_sz, encoding);
 	if (id < 0) {
 		btf_elf__log_err(btfe, BTF_KIND_INT, name, true, "Error emitting BTF type");
 	} else {