Message ID | 20230517161648.17582-6-alan.maguire@oracle.com (mailing list archive) |
---|---|
State | RFC |
Headers | show |
Series | Encoding function addresses using DECL_TAGs | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
On Wed, May 17, 2023 at 05:16:47PM +0100, Alan Maguire wrote: > By making sorting function for our ELF function list match on > both name and function, we ensure that the set of ELF functions > includes multiple copies for functions which have multiple instances > across CUs. For example, cpumask_weight has 22 instances in > System.map/kallsyms: > > ffffffff8103b530 t cpumask_weight > ffffffff8103e300 t cpumask_weight > ffffffff81040d30 t cpumask_weight > ffffffff8104fa00 t cpumask_weight > ffffffff81064300 t cpumask_weight > ffffffff81082ba0 t cpumask_weight > ffffffff81084f50 t cpumask_weight > ffffffff810a4ad0 t cpumask_weight > ffffffff810bb740 t cpumask_weight > ffffffff8110a6c0 t cpumask_weight > ffffffff81118ab0 t cpumask_weight > ffffffff81129b50 t cpumask_weight > ffffffff81137dc0 t cpumask_weight > ffffffff811aead0 t cpumask_weight > ffffffff811d6800 t cpumask_weight > ffffffff811e1370 t cpumask_weight > ffffffff812fae80 t cpumask_weight > ffffffff81375c50 t cpumask_weight > ffffffff81634b60 t cpumask_weight > ffffffff817ba540 t cpumask_weight > ffffffff819abf30 t cpumask_weight > ffffffff81a7cb60 t cpumask_weight > > With ELF representations for each address, and DWARF info about > addresses (low_pc) we can match DWARF with ELF accurately. > The result for the BTF representation is that we end up with > a single de-duped function: > > [9287] FUNC 'cpumask_weight' type_id=9286 linkage=static > > ...and 22 DECL_TAGs for each address that point at it: > > 9288] DECL_TAG 'address=0xffffffff8103b530' type_id=9287 component_idx=-1 > [9623] DECL_TAG 'address=0xffffffff8103e300' type_id=9287 component_idx=-1 > [9829] DECL_TAG 'address=0xffffffff81040d30' type_id=9287 component_idx=-1 > [11609] DECL_TAG 'address=0xffffffff8104fa00' type_id=9287 component_idx=-1 > [13299] DECL_TAG 'address=0xffffffff81064300' type_id=9287 component_idx=-1 > [15704] DECL_TAG 'address=0xffffffff81082ba0' type_id=9287 component_idx=-1 > [15731] DECL_TAG 'address=0xffffffff81084f50' type_id=9287 component_idx=-1 > [18582] DECL_TAG 'address=0xffffffff810a4ad0' type_id=9287 component_idx=-1 > [20234] DECL_TAG 'address=0xffffffff810bb740' type_id=9287 component_idx=-1 > [25384] DECL_TAG 'address=0xffffffff8110a6c0' type_id=9287 component_idx=-1 > [25798] DECL_TAG 'address=0xffffffff81118ab0' type_id=9287 component_idx=-1 > [26285] DECL_TAG 'address=0xffffffff81129b50' type_id=9287 component_idx=-1 > [27040] DECL_TAG 'address=0xffffffff81137dc0' type_id=9287 component_idx=-1 > [32900] DECL_TAG 'address=0xffffffff811aead0' type_id=9287 component_idx=-1 > [35059] DECL_TAG 'address=0xffffffff811d6800' type_id=9287 component_idx=-1 > [35353] DECL_TAG 'address=0xffffffff811e1370' type_id=9287 component_idx=-1 > [48934] DECL_TAG 'address=0xffffffff812fae80' type_id=9287 component_idx=-1 > [54476] DECL_TAG 'address=0xffffffff81375c50' type_id=9287 component_idx=-1 > [87772] DECL_TAG 'address=0xffffffff81634b60' type_id=9287 component_idx=-1 > [108841] DECL_TAG 'address=0xffffffff817ba540' type_id=9287 component_idx=-1 > [132557] DECL_TAG 'address=0xffffffff819abf30' type_id=9287 component_idx=-1 > [143689] DECL_TAG 'address=0xffffffff81a7cb60' type_id=9287 component_idx=-1 right, Yonghong pointed this out in: https://lore.kernel.org/bpf/49e4fee2-8be0-325f-3372-c79d96b686e9@meta.com/ it's problem, because we pass btf id as attach id during bpf program load, and kernel does not have a way to figure out which address from the associated DECL_TAGs to use if we could change dedup algo to take the function address into account and make it not de-duplicate equal functions with different addresses, then we could: - find btf id that properly and uniquely identifies the function we want to trace - store the vmlinux base address and treat stored function addresses as offsets, so the verifier can get proper address even if the kernel is relocated or - add support for kernel's kallsyms to return address for given btf id, I plan to check on this one jirka > > Consider another case where the same name - wakeup_show() - is > used for two different function signatures: > > From kernel/irq/irqdesc.c > > static ssize_t wakeup_show(struct kobject *kobj, > struct kobj_attribute *attr, char *buf) > > ...and from drivers/base/power/sysfs.c > > static ssize_t wakeup_show(struct device *dev, struct device_attribute *attr, > char *buf); > > We see both defined in BTF, along with the addresses that > tell us which is which: > > [28472] FUNC 'wakeup_show' type_id=11214 linkage=static > > specifies > > [11214] FUNC_PROTO '(anon)' ret_type_id=76 vlen=3 > 'kobj' type_id=877 > 'attr' type_id=11200 > 'buf' type_id=56 > > ...and has declaration tag > > [28473] DECL_TAG 'address=0xffffffff8115eab0' type_id=28472 component_idx=-1 > > which identifies > > ffffffff8115eab0 t wakeup_show > > ...as the function with the first signature. > > Similarly, > > [114375] FUNC 'wakeup_show' type_id=4750 linkage=static > > [4750] FUNC_PROTO '(anon)' ret_type_id=76 vlen=3 > 'dev' type_id=1488 > 'attr' type_id=3909 > 'buf' type_id=56 > ...and > > [114376] DECL_TAG 'address=0xffffffff8181eac0' type_id=114375 component_idx=-1 > > ...tell us that > > ffffffff8181eac0 t wakeup_show > > ...has the second signature. So we can accommodate multiple > functions with conflicting signatures in BTF, since we have > added extra info to map from function description in BTF > to address. > > In total for vmlinux 52006 DECL_TAGs are added, and these add > 2MB to the BTF representation. > > Signed-off-by: Alan Maguire <alan.maguire@oracle.com> > --- > btf_encoder.c | 26 ++++++++++++++++++++------ > 1 file changed, 20 insertions(+), 6 deletions(-) > > diff --git a/btf_encoder.c b/btf_encoder.c > index 3bd0fe0..315053d 100644 > --- a/btf_encoder.c > +++ b/btf_encoder.c > @@ -988,13 +988,25 @@ static int functions_cmp(const void *_a, const void *_b) > { > const struct elf_function *a = _a; > const struct elf_function *b = _b; > + int ret; > > /* if search key allows prefix match, verify target has matching > * prefix len and prefix matches. > */ > if (a->prefixlen && a->prefixlen == b->prefixlen) > - return strncmp(a->name, b->name, b->prefixlen); > - return strcmp(a->name, b->name); > + ret = strncmp(a->name, b->name, b->prefixlen); > + else > + ret = strcmp(a->name, b->name); > + > + if (ret || !b->addr) > + return ret; > + > + /* secondarily sort/search by address. */ > + if (a->addr < b->addr) > + return -1; > + if (a->addr > b->addr) > + return 1; > + return 0; > } > > #ifndef max > @@ -1044,9 +1056,11 @@ static int btf_encoder__collect_function(struct btf_encoder *encoder, GElf_Sym * > } > > static struct elf_function *btf_encoder__find_function(const struct btf_encoder *encoder, > - const char *name, size_t prefixlen) > + struct function *fn, size_t prefixlen) > { > - struct elf_function key = { .name = name, .prefixlen = prefixlen }; > + struct elf_function key = { .name = function__name(fn), > + .addr = fn->low_pc, > + .prefixlen = prefixlen }; > > return bsearch(&key, encoder->functions.entries, encoder->functions.cnt, sizeof(key), functions_cmp); > } > @@ -1846,7 +1860,7 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co > continue; > > /* prefer exact function name match... */ > - func = btf_encoder__find_function(encoder, name, 0); > + func = btf_encoder__find_function(encoder, fn, 0); > if (func) { > if (func->generated) > continue; > @@ -1863,7 +1877,7 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co > * it does not have optimized-out parameters > * in any cu. > */ > - func = btf_encoder__find_function(encoder, name, > + func = btf_encoder__find_function(encoder, fn, > strlen(name)); > if (func) { > save = true; > -- > 2.31.1 >
On 18/05/2023 09:39, Jiri Olsa wrote: > On Wed, May 17, 2023 at 05:16:47PM +0100, Alan Maguire wrote: >> By making sorting function for our ELF function list match on >> both name and function, we ensure that the set of ELF functions >> includes multiple copies for functions which have multiple instances >> across CUs. For example, cpumask_weight has 22 instances in >> System.map/kallsyms: >> >> ffffffff8103b530 t cpumask_weight >> ffffffff8103e300 t cpumask_weight >> ffffffff81040d30 t cpumask_weight >> ffffffff8104fa00 t cpumask_weight >> ffffffff81064300 t cpumask_weight >> ffffffff81082ba0 t cpumask_weight >> ffffffff81084f50 t cpumask_weight >> ffffffff810a4ad0 t cpumask_weight >> ffffffff810bb740 t cpumask_weight >> ffffffff8110a6c0 t cpumask_weight >> ffffffff81118ab0 t cpumask_weight >> ffffffff81129b50 t cpumask_weight >> ffffffff81137dc0 t cpumask_weight >> ffffffff811aead0 t cpumask_weight >> ffffffff811d6800 t cpumask_weight >> ffffffff811e1370 t cpumask_weight >> ffffffff812fae80 t cpumask_weight >> ffffffff81375c50 t cpumask_weight >> ffffffff81634b60 t cpumask_weight >> ffffffff817ba540 t cpumask_weight >> ffffffff819abf30 t cpumask_weight >> ffffffff81a7cb60 t cpumask_weight >> >> With ELF representations for each address, and DWARF info about >> addresses (low_pc) we can match DWARF with ELF accurately. >> The result for the BTF representation is that we end up with >> a single de-duped function: >> >> [9287] FUNC 'cpumask_weight' type_id=9286 linkage=static >> >> ...and 22 DECL_TAGs for each address that point at it: >> >> 9288] DECL_TAG 'address=0xffffffff8103b530' type_id=9287 component_idx=-1 >> [9623] DECL_TAG 'address=0xffffffff8103e300' type_id=9287 component_idx=-1 >> [9829] DECL_TAG 'address=0xffffffff81040d30' type_id=9287 component_idx=-1 >> [11609] DECL_TAG 'address=0xffffffff8104fa00' type_id=9287 component_idx=-1 >> [13299] DECL_TAG 'address=0xffffffff81064300' type_id=9287 component_idx=-1 >> [15704] DECL_TAG 'address=0xffffffff81082ba0' type_id=9287 component_idx=-1 >> [15731] DECL_TAG 'address=0xffffffff81084f50' type_id=9287 component_idx=-1 >> [18582] DECL_TAG 'address=0xffffffff810a4ad0' type_id=9287 component_idx=-1 >> [20234] DECL_TAG 'address=0xffffffff810bb740' type_id=9287 component_idx=-1 >> [25384] DECL_TAG 'address=0xffffffff8110a6c0' type_id=9287 component_idx=-1 >> [25798] DECL_TAG 'address=0xffffffff81118ab0' type_id=9287 component_idx=-1 >> [26285] DECL_TAG 'address=0xffffffff81129b50' type_id=9287 component_idx=-1 >> [27040] DECL_TAG 'address=0xffffffff81137dc0' type_id=9287 component_idx=-1 >> [32900] DECL_TAG 'address=0xffffffff811aead0' type_id=9287 component_idx=-1 >> [35059] DECL_TAG 'address=0xffffffff811d6800' type_id=9287 component_idx=-1 >> [35353] DECL_TAG 'address=0xffffffff811e1370' type_id=9287 component_idx=-1 >> [48934] DECL_TAG 'address=0xffffffff812fae80' type_id=9287 component_idx=-1 >> [54476] DECL_TAG 'address=0xffffffff81375c50' type_id=9287 component_idx=-1 >> [87772] DECL_TAG 'address=0xffffffff81634b60' type_id=9287 component_idx=-1 >> [108841] DECL_TAG 'address=0xffffffff817ba540' type_id=9287 component_idx=-1 >> [132557] DECL_TAG 'address=0xffffffff819abf30' type_id=9287 component_idx=-1 >> [143689] DECL_TAG 'address=0xffffffff81a7cb60' type_id=9287 component_idx=-1 > > right, Yonghong pointed this out in: > https://lore.kernel.org/bpf/49e4fee2-8be0-325f-3372-c79d96b686e9@meta.com/ > > it's problem, because we pass btf id as attach id during bpf program load, > and kernel does not have a way to figure out which address from the associated > DECL_TAGs to use > > if we could change dedup algo to take the function address into account and > make it not de-duplicate equal functions with different addresses, then we > could: > > - find btf id that properly and uniquely identifies the function we > want to trace > So maybe a more natural approach would be to extend BTF_KIND_FUNC (I think Alexei suggested something this earlier but I could be misremembering) as follows: 2.2.12 BTF_KIND_FUNC ~~~~~~~~~~~~~~~~~~~~ ``struct btf_type`` encoding requirement: * ``name_off``: offset to a valid C identifier - * ``info.kind_flag``: 0 + * ``info.kind_flag``: 0 or 1 if additional ``struct btf_func`` follows * ``info.kind``: BTF_KIND_FUNC * ``info.vlen``: linkage information (BTF_FUNC_STATIC, BTF_FUNC_GLOBAL or BTF_FUNC_EXTERN) * ``type``: a BTF_KIND_FUNC_PROTO type - No additional type data follow ``btf_type``. + If ``info.kind_flag`` is specified, a ``struct btf_func`` follows.:: + + struct btf_func { + __u64 addr; + }; + Otherwise no additional type data follows ``btf_type``. With the above, dedup could be made to fail when functions have non- identical associated addresses. Judging by the number of DECL_TAGs in the RFC, we'd end up with ~1000 extra BTF_KIND_FUNCs, and the extra space for struct btf_funcs would require roughly 400k. We'd still get dedup of FUNC_PROTOs, so I suspect the extra size would be < 1MB. > - store the vmlinux base address and treat stored function addresses as > offsets, so the verifier can get proper address even if the kernel > is relocated > yep; when we read kernel/module BTF in we could hopefully carry out this recalculation and update the vmlinux/module BTF addresses accordingly. Thanks! Alan
On 5/18/23 6:23 AM, Alan Maguire wrote: > On 18/05/2023 09:39, Jiri Olsa wrote: >> On Wed, May 17, 2023 at 05:16:47PM +0100, Alan Maguire wrote: >>> By making sorting function for our ELF function list match on >>> both name and function, we ensure that the set of ELF functions >>> includes multiple copies for functions which have multiple instances >>> across CUs. For example, cpumask_weight has 22 instances in >>> System.map/kallsyms: >>> >>> ffffffff8103b530 t cpumask_weight >>> ffffffff8103e300 t cpumask_weight >>> ffffffff81040d30 t cpumask_weight >>> ffffffff8104fa00 t cpumask_weight >>> ffffffff81064300 t cpumask_weight >>> ffffffff81082ba0 t cpumask_weight >>> ffffffff81084f50 t cpumask_weight >>> ffffffff810a4ad0 t cpumask_weight >>> ffffffff810bb740 t cpumask_weight >>> ffffffff8110a6c0 t cpumask_weight >>> ffffffff81118ab0 t cpumask_weight >>> ffffffff81129b50 t cpumask_weight >>> ffffffff81137dc0 t cpumask_weight >>> ffffffff811aead0 t cpumask_weight >>> ffffffff811d6800 t cpumask_weight >>> ffffffff811e1370 t cpumask_weight >>> ffffffff812fae80 t cpumask_weight >>> ffffffff81375c50 t cpumask_weight >>> ffffffff81634b60 t cpumask_weight >>> ffffffff817ba540 t cpumask_weight >>> ffffffff819abf30 t cpumask_weight >>> ffffffff81a7cb60 t cpumask_weight >>> >>> With ELF representations for each address, and DWARF info about >>> addresses (low_pc) we can match DWARF with ELF accurately. >>> The result for the BTF representation is that we end up with >>> a single de-duped function: >>> >>> [9287] FUNC 'cpumask_weight' type_id=9286 linkage=static >>> >>> ...and 22 DECL_TAGs for each address that point at it: >>> >>> 9288] DECL_TAG 'address=0xffffffff8103b530' type_id=9287 component_idx=-1 >>> [9623] DECL_TAG 'address=0xffffffff8103e300' type_id=9287 component_idx=-1 >>> [9829] DECL_TAG 'address=0xffffffff81040d30' type_id=9287 component_idx=-1 >>> [11609] DECL_TAG 'address=0xffffffff8104fa00' type_id=9287 component_idx=-1 >>> [13299] DECL_TAG 'address=0xffffffff81064300' type_id=9287 component_idx=-1 >>> [15704] DECL_TAG 'address=0xffffffff81082ba0' type_id=9287 component_idx=-1 >>> [15731] DECL_TAG 'address=0xffffffff81084f50' type_id=9287 component_idx=-1 >>> [18582] DECL_TAG 'address=0xffffffff810a4ad0' type_id=9287 component_idx=-1 >>> [20234] DECL_TAG 'address=0xffffffff810bb740' type_id=9287 component_idx=-1 >>> [25384] DECL_TAG 'address=0xffffffff8110a6c0' type_id=9287 component_idx=-1 >>> [25798] DECL_TAG 'address=0xffffffff81118ab0' type_id=9287 component_idx=-1 >>> [26285] DECL_TAG 'address=0xffffffff81129b50' type_id=9287 component_idx=-1 >>> [27040] DECL_TAG 'address=0xffffffff81137dc0' type_id=9287 component_idx=-1 >>> [32900] DECL_TAG 'address=0xffffffff811aead0' type_id=9287 component_idx=-1 >>> [35059] DECL_TAG 'address=0xffffffff811d6800' type_id=9287 component_idx=-1 >>> [35353] DECL_TAG 'address=0xffffffff811e1370' type_id=9287 component_idx=-1 >>> [48934] DECL_TAG 'address=0xffffffff812fae80' type_id=9287 component_idx=-1 >>> [54476] DECL_TAG 'address=0xffffffff81375c50' type_id=9287 component_idx=-1 >>> [87772] DECL_TAG 'address=0xffffffff81634b60' type_id=9287 component_idx=-1 >>> [108841] DECL_TAG 'address=0xffffffff817ba540' type_id=9287 component_idx=-1 >>> [132557] DECL_TAG 'address=0xffffffff819abf30' type_id=9287 component_idx=-1 >>> [143689] DECL_TAG 'address=0xffffffff81a7cb60' type_id=9287 component_idx=-1 >> >> right, Yonghong pointed this out in: >> https://lore.kernel.org/bpf/49e4fee2-8be0-325f-3372-c79d96b686e9@meta.com/ >> >> it's problem, because we pass btf id as attach id during bpf program load, >> and kernel does not have a way to figure out which address from the associated >> DECL_TAGs to use >> >> if we could change dedup algo to take the function address into account and >> make it not de-duplicate equal functions with different addresses, then we >> could: >> >> - find btf id that properly and uniquely identifies the function we >> want to trace >> > > So maybe a more natural approach would be to extend BTF_KIND_FUNC > (I think Alexei suggested something this earlier but I could be > misremembering) as follows: > > > 2.2.12 BTF_KIND_FUNC > ~~~~~~~~~~~~~~~~~~~~ > > ``struct btf_type`` encoding requirement: > * ``name_off``: offset to a valid C identifier > - * ``info.kind_flag``: 0 > + * ``info.kind_flag``: 0 or 1 if additional ``struct btf_func`` follows > * ``info.kind``: BTF_KIND_FUNC > * ``info.vlen``: linkage information (BTF_FUNC_STATIC, BTF_FUNC_GLOBAL > or BTF_FUNC_EXTERN) > * ``type``: a BTF_KIND_FUNC_PROTO type > > - No additional type data follow ``btf_type``. > + If ``info.kind_flag`` is specified, a ``struct btf_func`` follows.:: > + > + struct btf_func { > + __u64 addr; > + }; > + Otherwise no additional type data follows ``btf_type``. > > > With the above, dedup could be made to fail when functions have non- > identical associated addresses. Judging by the number of DECL_TAGs in > the RFC, we'd end up with ~1000 extra BTF_KIND_FUNCs, and the extra > space for struct btf_funcs would require roughly 400k. We'd still get > dedup of FUNC_PROTOs, so I suspect the extra size would be < 1MB. Agree that this seems a better idea to save space and also not impacting existing dedup algorithm. The only weird thing is previously we use KIND + option vlen to decide overall size of the type, but now we need KIND + kflag to decide FUNC type size. But this should be okay. We need to add an option to pahole to enable this feature and in kernel to enable this option only if the kernel supports new format. Do we want to add addresses to all functions in which case we will have FUNC types with both kflag 0 and kflag 1? Do you imagine whether we potentially need to encode other additional information to FUNC type? > > > >> - store the vmlinux base address and treat stored function addresses as >> offsets, so the verifier can get proper address even if the kernel >> is relocated >> > > yep; when we read kernel/module BTF in we could hopefully carry out > this recalculation and update the vmlinux/module BTF addresses > accordingly. > > Thanks! > > Alan
On Thu, May 18, 2023 at 02:23:34PM +0100, Alan Maguire wrote: > On 18/05/2023 09:39, Jiri Olsa wrote: > > On Wed, May 17, 2023 at 05:16:47PM +0100, Alan Maguire wrote: > >> By making sorting function for our ELF function list match on > >> both name and function, we ensure that the set of ELF functions > >> includes multiple copies for functions which have multiple instances > >> across CUs. For example, cpumask_weight has 22 instances in > >> System.map/kallsyms: > >> > >> ffffffff8103b530 t cpumask_weight > >> ffffffff8103e300 t cpumask_weight > >> ffffffff81040d30 t cpumask_weight > >> ffffffff8104fa00 t cpumask_weight > >> ffffffff81064300 t cpumask_weight > >> ffffffff81082ba0 t cpumask_weight > >> ffffffff81084f50 t cpumask_weight > >> ffffffff810a4ad0 t cpumask_weight > >> ffffffff810bb740 t cpumask_weight > >> ffffffff8110a6c0 t cpumask_weight > >> ffffffff81118ab0 t cpumask_weight > >> ffffffff81129b50 t cpumask_weight > >> ffffffff81137dc0 t cpumask_weight > >> ffffffff811aead0 t cpumask_weight > >> ffffffff811d6800 t cpumask_weight > >> ffffffff811e1370 t cpumask_weight > >> ffffffff812fae80 t cpumask_weight > >> ffffffff81375c50 t cpumask_weight > >> ffffffff81634b60 t cpumask_weight > >> ffffffff817ba540 t cpumask_weight > >> ffffffff819abf30 t cpumask_weight > >> ffffffff81a7cb60 t cpumask_weight > >> > >> With ELF representations for each address, and DWARF info about > >> addresses (low_pc) we can match DWARF with ELF accurately. > >> The result for the BTF representation is that we end up with > >> a single de-duped function: > >> > >> [9287] FUNC 'cpumask_weight' type_id=9286 linkage=static > >> > >> ...and 22 DECL_TAGs for each address that point at it: > >> > >> 9288] DECL_TAG 'address=0xffffffff8103b530' type_id=9287 component_idx=-1 > >> [9623] DECL_TAG 'address=0xffffffff8103e300' type_id=9287 component_idx=-1 > >> [9829] DECL_TAG 'address=0xffffffff81040d30' type_id=9287 component_idx=-1 > >> [11609] DECL_TAG 'address=0xffffffff8104fa00' type_id=9287 component_idx=-1 > >> [13299] DECL_TAG 'address=0xffffffff81064300' type_id=9287 component_idx=-1 > >> [15704] DECL_TAG 'address=0xffffffff81082ba0' type_id=9287 component_idx=-1 > >> [15731] DECL_TAG 'address=0xffffffff81084f50' type_id=9287 component_idx=-1 > >> [18582] DECL_TAG 'address=0xffffffff810a4ad0' type_id=9287 component_idx=-1 > >> [20234] DECL_TAG 'address=0xffffffff810bb740' type_id=9287 component_idx=-1 > >> [25384] DECL_TAG 'address=0xffffffff8110a6c0' type_id=9287 component_idx=-1 > >> [25798] DECL_TAG 'address=0xffffffff81118ab0' type_id=9287 component_idx=-1 > >> [26285] DECL_TAG 'address=0xffffffff81129b50' type_id=9287 component_idx=-1 > >> [27040] DECL_TAG 'address=0xffffffff81137dc0' type_id=9287 component_idx=-1 > >> [32900] DECL_TAG 'address=0xffffffff811aead0' type_id=9287 component_idx=-1 > >> [35059] DECL_TAG 'address=0xffffffff811d6800' type_id=9287 component_idx=-1 > >> [35353] DECL_TAG 'address=0xffffffff811e1370' type_id=9287 component_idx=-1 > >> [48934] DECL_TAG 'address=0xffffffff812fae80' type_id=9287 component_idx=-1 > >> [54476] DECL_TAG 'address=0xffffffff81375c50' type_id=9287 component_idx=-1 > >> [87772] DECL_TAG 'address=0xffffffff81634b60' type_id=9287 component_idx=-1 > >> [108841] DECL_TAG 'address=0xffffffff817ba540' type_id=9287 component_idx=-1 > >> [132557] DECL_TAG 'address=0xffffffff819abf30' type_id=9287 component_idx=-1 > >> [143689] DECL_TAG 'address=0xffffffff81a7cb60' type_id=9287 component_idx=-1 > > > > right, Yonghong pointed this out in: > > https://lore.kernel.org/bpf/49e4fee2-8be0-325f-3372-c79d96b686e9@meta.com/ > > > > it's problem, because we pass btf id as attach id during bpf program load, > > and kernel does not have a way to figure out which address from the associated > > DECL_TAGs to use > > > > if we could change dedup algo to take the function address into account and > > make it not de-duplicate equal functions with different addresses, then we > > could: > > > > - find btf id that properly and uniquely identifies the function we > > want to trace > > > > So maybe a more natural approach would be to extend BTF_KIND_FUNC > (I think Alexei suggested something this earlier but I could be > misremembering) as follows: > > > 2.2.12 BTF_KIND_FUNC > ~~~~~~~~~~~~~~~~~~~~ > > ``struct btf_type`` encoding requirement: > * ``name_off``: offset to a valid C identifier > - * ``info.kind_flag``: 0 > + * ``info.kind_flag``: 0 or 1 if additional ``struct btf_func`` follows > * ``info.kind``: BTF_KIND_FUNC > * ``info.vlen``: linkage information (BTF_FUNC_STATIC, BTF_FUNC_GLOBAL > or BTF_FUNC_EXTERN) > * ``type``: a BTF_KIND_FUNC_PROTO type > > - No additional type data follow ``btf_type``. > + If ``info.kind_flag`` is specified, a ``struct btf_func`` follows.:: > + > + struct btf_func { > + __u64 addr; > + }; > + Otherwise no additional type data follows ``btf_type``. > > > With the above, dedup could be made to fail when functions have non- > identical associated addresses. Judging by the number of DECL_TAGs in > the RFC, we'd end up with ~1000 extra BTF_KIND_FUNCs, and the extra > space for struct btf_funcs would require roughly 400k. We'd still get > dedup of FUNC_PROTOs, so I suspect the extra size would be < 1MB. nice, I think it's better solution > > > > > - store the vmlinux base address and treat stored function addresses as > > offsets, so the verifier can get proper address even if the kernel > > is relocated > > > > yep; when we read kernel/module BTF in we could hopefully carry out > this recalculation and update the vmlinux/module BTF addresses > accordingly. I wonder now when the address will be stored as number (not string) we could somehow generate relocation records and have the module loader do the relocation automatically not sure how that works for vmlinux when it's loaded/relocated on boot jirka
On 5/18/23 9:22 AM, Jiri Olsa wrote: > On Thu, May 18, 2023 at 02:23:34PM +0100, Alan Maguire wrote: >> On 18/05/2023 09:39, Jiri Olsa wrote: >>> On Wed, May 17, 2023 at 05:16:47PM +0100, Alan Maguire wrote: >>>> By making sorting function for our ELF function list match on >>>> both name and function, we ensure that the set of ELF functions >>>> includes multiple copies for functions which have multiple instances >>>> across CUs. For example, cpumask_weight has 22 instances in >>>> System.map/kallsyms: >>>> >>>> ffffffff8103b530 t cpumask_weight >>>> ffffffff8103e300 t cpumask_weight >>>> ffffffff81040d30 t cpumask_weight >>>> ffffffff8104fa00 t cpumask_weight >>>> ffffffff81064300 t cpumask_weight >>>> ffffffff81082ba0 t cpumask_weight >>>> ffffffff81084f50 t cpumask_weight >>>> ffffffff810a4ad0 t cpumask_weight >>>> ffffffff810bb740 t cpumask_weight >>>> ffffffff8110a6c0 t cpumask_weight >>>> ffffffff81118ab0 t cpumask_weight >>>> ffffffff81129b50 t cpumask_weight >>>> ffffffff81137dc0 t cpumask_weight >>>> ffffffff811aead0 t cpumask_weight >>>> ffffffff811d6800 t cpumask_weight >>>> ffffffff811e1370 t cpumask_weight >>>> ffffffff812fae80 t cpumask_weight >>>> ffffffff81375c50 t cpumask_weight >>>> ffffffff81634b60 t cpumask_weight >>>> ffffffff817ba540 t cpumask_weight >>>> ffffffff819abf30 t cpumask_weight >>>> ffffffff81a7cb60 t cpumask_weight >>>> >>>> With ELF representations for each address, and DWARF info about >>>> addresses (low_pc) we can match DWARF with ELF accurately. >>>> The result for the BTF representation is that we end up with >>>> a single de-duped function: >>>> >>>> [9287] FUNC 'cpumask_weight' type_id=9286 linkage=static >>>> >>>> ...and 22 DECL_TAGs for each address that point at it: >>>> >>>> 9288] DECL_TAG 'address=0xffffffff8103b530' type_id=9287 component_idx=-1 >>>> [9623] DECL_TAG 'address=0xffffffff8103e300' type_id=9287 component_idx=-1 >>>> [9829] DECL_TAG 'address=0xffffffff81040d30' type_id=9287 component_idx=-1 >>>> [11609] DECL_TAG 'address=0xffffffff8104fa00' type_id=9287 component_idx=-1 >>>> [13299] DECL_TAG 'address=0xffffffff81064300' type_id=9287 component_idx=-1 >>>> [15704] DECL_TAG 'address=0xffffffff81082ba0' type_id=9287 component_idx=-1 >>>> [15731] DECL_TAG 'address=0xffffffff81084f50' type_id=9287 component_idx=-1 >>>> [18582] DECL_TAG 'address=0xffffffff810a4ad0' type_id=9287 component_idx=-1 >>>> [20234] DECL_TAG 'address=0xffffffff810bb740' type_id=9287 component_idx=-1 >>>> [25384] DECL_TAG 'address=0xffffffff8110a6c0' type_id=9287 component_idx=-1 >>>> [25798] DECL_TAG 'address=0xffffffff81118ab0' type_id=9287 component_idx=-1 >>>> [26285] DECL_TAG 'address=0xffffffff81129b50' type_id=9287 component_idx=-1 >>>> [27040] DECL_TAG 'address=0xffffffff81137dc0' type_id=9287 component_idx=-1 >>>> [32900] DECL_TAG 'address=0xffffffff811aead0' type_id=9287 component_idx=-1 >>>> [35059] DECL_TAG 'address=0xffffffff811d6800' type_id=9287 component_idx=-1 >>>> [35353] DECL_TAG 'address=0xffffffff811e1370' type_id=9287 component_idx=-1 >>>> [48934] DECL_TAG 'address=0xffffffff812fae80' type_id=9287 component_idx=-1 >>>> [54476] DECL_TAG 'address=0xffffffff81375c50' type_id=9287 component_idx=-1 >>>> [87772] DECL_TAG 'address=0xffffffff81634b60' type_id=9287 component_idx=-1 >>>> [108841] DECL_TAG 'address=0xffffffff817ba540' type_id=9287 component_idx=-1 >>>> [132557] DECL_TAG 'address=0xffffffff819abf30' type_id=9287 component_idx=-1 >>>> [143689] DECL_TAG 'address=0xffffffff81a7cb60' type_id=9287 component_idx=-1 >>> >>> right, Yonghong pointed this out in: >>> https://lore.kernel.org/bpf/49e4fee2-8be0-325f-3372-c79d96b686e9@meta.com/ >>> >>> it's problem, because we pass btf id as attach id during bpf program load, >>> and kernel does not have a way to figure out which address from the associated >>> DECL_TAGs to use >>> >>> if we could change dedup algo to take the function address into account and >>> make it not de-duplicate equal functions with different addresses, then we >>> could: >>> >>> - find btf id that properly and uniquely identifies the function we >>> want to trace >>> >> >> So maybe a more natural approach would be to extend BTF_KIND_FUNC >> (I think Alexei suggested something this earlier but I could be >> misremembering) as follows: >> >> >> 2.2.12 BTF_KIND_FUNC >> ~~~~~~~~~~~~~~~~~~~~ >> >> ``struct btf_type`` encoding requirement: >> * ``name_off``: offset to a valid C identifier >> - * ``info.kind_flag``: 0 >> + * ``info.kind_flag``: 0 or 1 if additional ``struct btf_func`` follows >> * ``info.kind``: BTF_KIND_FUNC >> * ``info.vlen``: linkage information (BTF_FUNC_STATIC, BTF_FUNC_GLOBAL >> or BTF_FUNC_EXTERN) >> * ``type``: a BTF_KIND_FUNC_PROTO type >> >> - No additional type data follow ``btf_type``. >> + If ``info.kind_flag`` is specified, a ``struct btf_func`` follows.:: >> + >> + struct btf_func { >> + __u64 addr; >> + }; >> + Otherwise no additional type data follows ``btf_type``. >> >> >> With the above, dedup could be made to fail when functions have non- >> identical associated addresses. Judging by the number of DECL_TAGs in >> the RFC, we'd end up with ~1000 extra BTF_KIND_FUNCs, and the extra >> space for struct btf_funcs would require roughly 400k. We'd still get >> dedup of FUNC_PROTOs, so I suspect the extra size would be < 1MB > nice, I think it's better solution > >> >> >> >>> - store the vmlinux base address and treat stored function addresses as >>> offsets, so the verifier can get proper address even if the kernel >>> is relocated >>> >> >> yep; when we read kernel/module BTF in we could hopefully carry out >> this recalculation and update the vmlinux/module BTF addresses >> accordingly. > > I wonder now when the address will be stored as number (not string) we > could somehow generate relocation records and have the module loader > do the relocation automatically > > not sure how that works for vmlinux when it's loaded/relocated on boot Right, actual module address will mostly not match the one in dwarf. Some during module btf load, we should modify btf address as well for later use? Yes, may need to reuse some routines used in initial module relocation. > > jirka
On 5/18/23 11:25 AM, Yonghong Song wrote: > > > On 5/18/23 9:22 AM, Jiri Olsa wrote: >> On Thu, May 18, 2023 at 02:23:34PM +0100, Alan Maguire wrote: >>> On 18/05/2023 09:39, Jiri Olsa wrote: >>>> On Wed, May 17, 2023 at 05:16:47PM +0100, Alan Maguire wrote: >>>>> By making sorting function for our ELF function list match on >>>>> both name and function, we ensure that the set of ELF functions >>>>> includes multiple copies for functions which have multiple instances >>>>> across CUs. For example, cpumask_weight has 22 instances in >>>>> System.map/kallsyms: >>>>> >>>>> ffffffff8103b530 t cpumask_weight >>>>> ffffffff8103e300 t cpumask_weight >>>>> ffffffff81040d30 t cpumask_weight >>>>> ffffffff8104fa00 t cpumask_weight >>>>> ffffffff81064300 t cpumask_weight >>>>> ffffffff81082ba0 t cpumask_weight >>>>> ffffffff81084f50 t cpumask_weight >>>>> ffffffff810a4ad0 t cpumask_weight >>>>> ffffffff810bb740 t cpumask_weight >>>>> ffffffff8110a6c0 t cpumask_weight >>>>> ffffffff81118ab0 t cpumask_weight >>>>> ffffffff81129b50 t cpumask_weight >>>>> ffffffff81137dc0 t cpumask_weight >>>>> ffffffff811aead0 t cpumask_weight >>>>> ffffffff811d6800 t cpumask_weight >>>>> ffffffff811e1370 t cpumask_weight >>>>> ffffffff812fae80 t cpumask_weight >>>>> ffffffff81375c50 t cpumask_weight >>>>> ffffffff81634b60 t cpumask_weight >>>>> ffffffff817ba540 t cpumask_weight >>>>> ffffffff819abf30 t cpumask_weight >>>>> ffffffff81a7cb60 t cpumask_weight >>>>> >>>>> With ELF representations for each address, and DWARF info about >>>>> addresses (low_pc) we can match DWARF with ELF accurately. >>>>> The result for the BTF representation is that we end up with >>>>> a single de-duped function: >>>>> >>>>> [9287] FUNC 'cpumask_weight' type_id=9286 linkage=static >>>>> >>>>> ...and 22 DECL_TAGs for each address that point at it: >>>>> >>>>> 9288] DECL_TAG 'address=0xffffffff8103b530' type_id=9287 >>>>> component_idx=-1 >>>>> [9623] DECL_TAG 'address=0xffffffff8103e300' type_id=9287 >>>>> component_idx=-1 >>>>> [9829] DECL_TAG 'address=0xffffffff81040d30' type_id=9287 >>>>> component_idx=-1 >>>>> [11609] DECL_TAG 'address=0xffffffff8104fa00' type_id=9287 >>>>> component_idx=-1 >>>>> [13299] DECL_TAG 'address=0xffffffff81064300' type_id=9287 >>>>> component_idx=-1 >>>>> [15704] DECL_TAG 'address=0xffffffff81082ba0' type_id=9287 >>>>> component_idx=-1 >>>>> [15731] DECL_TAG 'address=0xffffffff81084f50' type_id=9287 >>>>> component_idx=-1 >>>>> [18582] DECL_TAG 'address=0xffffffff810a4ad0' type_id=9287 >>>>> component_idx=-1 >>>>> [20234] DECL_TAG 'address=0xffffffff810bb740' type_id=9287 >>>>> component_idx=-1 >>>>> [25384] DECL_TAG 'address=0xffffffff8110a6c0' type_id=9287 >>>>> component_idx=-1 >>>>> [25798] DECL_TAG 'address=0xffffffff81118ab0' type_id=9287 >>>>> component_idx=-1 >>>>> [26285] DECL_TAG 'address=0xffffffff81129b50' type_id=9287 >>>>> component_idx=-1 >>>>> [27040] DECL_TAG 'address=0xffffffff81137dc0' type_id=9287 >>>>> component_idx=-1 >>>>> [32900] DECL_TAG 'address=0xffffffff811aead0' type_id=9287 >>>>> component_idx=-1 >>>>> [35059] DECL_TAG 'address=0xffffffff811d6800' type_id=9287 >>>>> component_idx=-1 >>>>> [35353] DECL_TAG 'address=0xffffffff811e1370' type_id=9287 >>>>> component_idx=-1 >>>>> [48934] DECL_TAG 'address=0xffffffff812fae80' type_id=9287 >>>>> component_idx=-1 >>>>> [54476] DECL_TAG 'address=0xffffffff81375c50' type_id=9287 >>>>> component_idx=-1 >>>>> [87772] DECL_TAG 'address=0xffffffff81634b60' type_id=9287 >>>>> component_idx=-1 >>>>> [108841] DECL_TAG 'address=0xffffffff817ba540' type_id=9287 >>>>> component_idx=-1 >>>>> [132557] DECL_TAG 'address=0xffffffff819abf30' type_id=9287 >>>>> component_idx=-1 >>>>> [143689] DECL_TAG 'address=0xffffffff81a7cb60' type_id=9287 >>>>> component_idx=-1 >>>> >>>> right, Yonghong pointed this out in: >>>> >>>> https://lore.kernel.org/bpf/49e4fee2-8be0-325f-3372-c79d96b686e9@meta.com/ >>>> >>>> it's problem, because we pass btf id as attach id during bpf program >>>> load, >>>> and kernel does not have a way to figure out which address from the >>>> associated >>>> DECL_TAGs to use >>>> >>>> if we could change dedup algo to take the function address into >>>> account and >>>> make it not de-duplicate equal functions with different addresses, >>>> then we >>>> could: >>>> >>>> - find btf id that properly and uniquely identifies the function we >>>> want to trace >>>> >>> >>> So maybe a more natural approach would be to extend BTF_KIND_FUNC >>> (I think Alexei suggested something this earlier but I could be >>> misremembering) as follows: >>> >>> >>> 2.2.12 BTF_KIND_FUNC >>> ~~~~~~~~~~~~~~~~~~~~ >>> >>> ``struct btf_type`` encoding requirement: >>> * ``name_off``: offset to a valid C identifier >>> - * ``info.kind_flag``: 0 >>> + * ``info.kind_flag``: 0 or 1 if additional ``struct btf_func`` >>> follows >>> * ``info.kind``: BTF_KIND_FUNC >>> * ``info.vlen``: linkage information (BTF_FUNC_STATIC, >>> BTF_FUNC_GLOBAL >>> or BTF_FUNC_EXTERN) >>> * ``type``: a BTF_KIND_FUNC_PROTO type >>> >>> - No additional type data follow ``btf_type``. >>> + If ``info.kind_flag`` is specified, a ``struct btf_func`` follows.:: >>> + >>> + struct btf_func { >>> + __u64 addr; >>> + }; >>> + Otherwise no additional type data follows ``btf_type``. >>> >>> >>> With the above, dedup could be made to fail when functions have non- >>> identical associated addresses. Judging by the number of DECL_TAGs in >>> the RFC, we'd end up with ~1000 extra BTF_KIND_FUNCs, and the extra >>> space for struct btf_funcs would require roughly 400k. We'd still get >>> dedup of FUNC_PROTOs, so I suspect the extra size would be < 1MB >> nice, I think it's better solution >> >>> >>> >>> >>>> - store the vmlinux base address and treat stored function >>>> addresses as >>>> offsets, so the verifier can get proper address even if the kernel >>>> is relocated >>>> >>> >>> yep; when we read kernel/module BTF in we could hopefully carry out >>> this recalculation and update the vmlinux/module BTF addresses >>> accordingly. >> >> I wonder now when the address will be stored as number (not string) we >> could somehow generate relocation records and have the module loader >> do the relocation automatically This is an interesting idea if bpf subsystem does not want to do it... >> >> not sure how that works for vmlinux when it's loaded/relocated on boot If no KASLR, address in the vmlinux dwarf should match the one in BTF based on my experience. With KASLR, yes, relocation needs to be done for those addresses in BTF as they won't match the actual func address in the dwarf. Do most distributions enable KASLR (CONFIG_RANDOMIZE_BASE)? > > Right, actual module address will mostly not match the one in dwarf. > Some during module btf load, we should modify btf address as well > for later use? Yes, may need to reuse some routines used in initial > module relocation. > >> >> jirka
On Thu, May 18, 2023 at 11:26 AM Yonghong Song <yhs@meta.com> wrote: > > I wonder now when the address will be stored as number (not string) we > > could somehow generate relocation records and have the module loader > > do the relocation automatically > > > > not sure how that works for vmlinux when it's loaded/relocated on boot > > Right, actual module address will mostly not match the one in dwarf. > Some during module btf load, we should modify btf address as well > for later use? Yes, may need to reuse some routines used in initial > module relocation. Few thoughts: Initially I felt that single FUNC with multiple DECL_TAG(addr) is better, since BTF for all funcs is the same and it's likely one static inline function that the compiler decided not to inline (like cpumask_weight), so when libbpf wants to attach prog to it the kernel should automatically attach in all places. But then noticed that actually different functions with the same name and proto will be deduplicated into one. Their bodies at different locations will be different. Example: seq_show. In this case it's better to let libbpf pick the exact one to attach. Then realized that even the same function like cpumask_weight() might have different body at different locations due to optimizations. I don't think dwarf contains enough info to distinguish all the combinations. Considering all that it's better to keep one BTF kind_func -> one addr. If it's extended the way Alan is proposing with kind_flag the dedup logic will not combine them due to different addresses. Also turned out that the kernel doesn't validate decl_tag string. The following code loads without error: __attribute__((btf_decl_tag("\x10\xf0"))); I'm not sure whether we want to tighten decl_tag validation and how. If we keep it as-is we can use func+decl_tag approach to add 4 bytes of addr in the binary format (if 1st byte is not zero). But it feels like a hack, since the kernel needs to be changed anyway to adjust the addresses after module loading and kernel relocation. So func with kind_flag seems like the best approach. Regarding relocation of address in the kernel and modules... We just need to add base_addr to all addrs-es recorded in BTF. Both for kernel and for module BTFs. Shouldn't be too complicated.
On Thu, May 18, 2023 at 5:26 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Thu, May 18, 2023 at 11:26 AM Yonghong Song <yhs@meta.com> wrote: > > > I wonder now when the address will be stored as number (not string) we > > > could somehow generate relocation records and have the module loader > > > do the relocation automatically > > > > > > not sure how that works for vmlinux when it's loaded/relocated on boot > > > > Right, actual module address will mostly not match the one in dwarf. > > Some during module btf load, we should modify btf address as well > > for later use? Yes, may need to reuse some routines used in initial > > module relocation. > > > Few thoughts: > > Initially I felt that single FUNC with multiple DECL_TAG(addr) > is better, since BTF for all funcs is the same and it's likely > one static inline function that the compiler decided not to inline > (like cpumask_weight), so when libbpf wants to attach prog to it > the kernel should automatically attach in all places. > But then noticed that actually different functions with > the same name and proto will be deduplicated into one. > Their bodies at different locations will be different. > Example: seq_show. > In this case it's better to let libbpf pick the exact one to attach. > Then realized that even the same function like cpumask_weight() > might have different body at different locations due to optimizations. > I don't think dwarf contains enough info to distinguish all the combinations. > > Considering all that it's better to keep one BTF kind_func -> one addr. > If it's extended the way Alan is proposing with kind_flag > the dedup logic will not combine them due to different addresses. I've discussed this w/ Alexei and Yonghong offline, so will summarize what I said here. I don't think that we should go the route of adding kflag to BTF_KIND_FUNC. As Yonghong pointed out, previously only vlen and kind determined byte size of the type, and so adding a third variable (kflag), which would apply only to BTF_KIND_FUNC, seems like an unnecessary new complication. I propose to go with an entirely new kind instead, we have plenty of them left. This new kind will be pretty kernel-specific, so could be targeted for kernel use cases better without adding unnecessary complications to Clang. BTF_KIND_FUNCs generated by Clang for .bpf.o files don't need addr, they are meaningless and Clang doesn't know anything about addresses anyways. So we can keep Clang unchanged and more backwards compatible. But now that this new kind (BTF_KIND_KERNEL_FUNC? KFUNC would be misleading, unfortunately) is kernel-specific and generated by pahole only, besides addr we can add some flags field and use them to mark function as defined as kfunc or not, or (as a hypothetical example) traceable or not, or maybe we even have inline flag some day, etc. Something that makes sense mostly for kernel functions. Having said all that, given we are going to break all existing BTF-aware tools again with a new kind, we should really couple all this work with making BTF self-describing as discussed in [0], so that future changes like this won't break older bpftool and other similar tools, unnecessarily. Which, btw, is another reason to not use kflag to determine the size of btf_type. Proposed solution in [0] assumes that kind + vlen defines the size. We should probably have dedicated discussion for self-describing BTF, but I think both changes have to be done in the same release window. [0] https://lore.kernel.org/bpf/CAEf4BzYjWHRdNNw4B=eOXOs_ONrDwrgX4bn=Nuc1g8JPFC34MA@mail.gmail.com/#t > > Also turned out that the kernel doesn't validate decl_tag string. > The following code loads without error: > __attribute__((btf_decl_tag("\x10\xf0"))); > > I'm not sure whether we want to tighten decl_tag validation and how. > If we keep it as-is we can use func+decl_tag approach > to add 4 bytes of addr in the binary format (if 1st byte is not zero). > But it feels like a hack, since the kernel needs to be changed > anyway to adjust the addresses after module loading and kernel relocation. > So func with kind_flag seems like the best approach. > > Regarding relocation of address in the kernel and modules... > We just need to add base_addr to all addrs-es recorded in BTF. > Both for kernel and for module BTFs. > Shouldn't be too complicated. yep, KASLR seems simple enough to handle by the kernel itself at boot time.
On Mon, May 22, 2023 at 02:31:01PM -0700, Andrii Nakryiko wrote: > On Thu, May 18, 2023 at 5:26 PM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Thu, May 18, 2023 at 11:26 AM Yonghong Song <yhs@meta.com> wrote: > > > > I wonder now when the address will be stored as number (not string) we > > > > could somehow generate relocation records and have the module loader > > > > do the relocation automatically > > > > > > > > not sure how that works for vmlinux when it's loaded/relocated on boot > > > > > > Right, actual module address will mostly not match the one in dwarf. > > > Some during module btf load, we should modify btf address as well > > > for later use? Yes, may need to reuse some routines used in initial > > > module relocation. > > > > > > Few thoughts: > > > > Initially I felt that single FUNC with multiple DECL_TAG(addr) > > is better, since BTF for all funcs is the same and it's likely > > one static inline function that the compiler decided not to inline > > (like cpumask_weight), so when libbpf wants to attach prog to it > > the kernel should automatically attach in all places. > > But then noticed that actually different functions with > > the same name and proto will be deduplicated into one. > > Their bodies at different locations will be different. > > Example: seq_show. > > In this case it's better to let libbpf pick the exact one to attach. > > Then realized that even the same function like cpumask_weight() > > might have different body at different locations due to optimizations. > > I don't think dwarf contains enough info to distinguish all the combinations. > > > > Considering all that it's better to keep one BTF kind_func -> one addr. > > If it's extended the way Alan is proposing with kind_flag > > the dedup logic will not combine them due to different addresses. > > I've discussed this w/ Alexei and Yonghong offline, so will summarize > what I said here. I don't think that we should go the route of adding > kflag to BTF_KIND_FUNC. As Yonghong pointed out, previously only vlen > and kind determined byte size of the type, and so adding a third > variable (kflag), which would apply only to BTF_KIND_FUNC, seems like > an unnecessary new complication. > > I propose to go with an entirely new kind instead, we have plenty of > them left. This new kind will be pretty kernel-specific, so could be > targeted for kernel use cases better without adding unnecessary > complications to Clang. BTF_KIND_FUNCs generated by Clang for .bpf.o > files don't need addr, they are meaningless and Clang doesn't know > anything about addresses anyways. So we can keep Clang unchanged and > more backwards compatible. > > But now that this new kind (BTF_KIND_KERNEL_FUNC? KFUNC would be > misleading, unfortunately) is kernel-specific and generated by pahole > only, besides addr we can add some flags field and use them to mark > function as defined as kfunc or not, or (as a hypothetical example) > traceable or not, or maybe we even have inline flag some day, etc. > Something that makes sense mostly for kernel functions. > > Having said all that, given we are going to break all existing > BTF-aware tools again with a new kind, we should really couple all > this work with making BTF self-describing as discussed in [0], so that > future changes like this won't break older bpftool and other similar > tools, unnecessarily. nice, would be great to have this and eventually got rid of new pahole enable/disable options, makes sense to do this before adding new type jirka > > Which, btw, is another reason to not use kflag to determine the size > of btf_type. Proposed solution in [0] assumes that kind + vlen defines > the size. We should probably have dedicated discussion for > self-describing BTF, but I think both changes have to be done in the > same release window. > > [0] https://lore.kernel.org/bpf/CAEf4BzYjWHRdNNw4B=eOXOs_ONrDwrgX4bn=Nuc1g8JPFC34MA@mail.gmail.com/#t > > > > > Also turned out that the kernel doesn't validate decl_tag string. > > The following code loads without error: > > __attribute__((btf_decl_tag("\x10\xf0"))); > > > > I'm not sure whether we want to tighten decl_tag validation and how. > > If we keep it as-is we can use func+decl_tag approach > > to add 4 bytes of addr in the binary format (if 1st byte is not zero). > > But it feels like a hack, since the kernel needs to be changed > > anyway to adjust the addresses after module loading and kernel relocation. > > So func with kind_flag seems like the best approach. > > > > Regarding relocation of address in the kernel and modules... > > We just need to add base_addr to all addrs-es recorded in BTF. > > Both for kernel and for module BTFs. > > Shouldn't be too complicated. > > yep, KASLR seems simple enough to handle by the kernel itself at boot time.
On 25/05/2023 09:52, Jiri Olsa wrote: > On Mon, May 22, 2023 at 02:31:01PM -0700, Andrii Nakryiko wrote: >> On Thu, May 18, 2023 at 5:26 PM Alexei Starovoitov >> <alexei.starovoitov@gmail.com> wrote: >>> >>> On Thu, May 18, 2023 at 11:26 AM Yonghong Song <yhs@meta.com> wrote: >>>>> I wonder now when the address will be stored as number (not string) we >>>>> could somehow generate relocation records and have the module loader >>>>> do the relocation automatically >>>>> >>>>> not sure how that works for vmlinux when it's loaded/relocated on boot >>>> >>>> Right, actual module address will mostly not match the one in dwarf. >>>> Some during module btf load, we should modify btf address as well >>>> for later use? Yes, may need to reuse some routines used in initial >>>> module relocation. >>> >>> >>> Few thoughts: >>> >>> Initially I felt that single FUNC with multiple DECL_TAG(addr) >>> is better, since BTF for all funcs is the same and it's likely >>> one static inline function that the compiler decided not to inline >>> (like cpumask_weight), so when libbpf wants to attach prog to it >>> the kernel should automatically attach in all places. >>> But then noticed that actually different functions with >>> the same name and proto will be deduplicated into one. >>> Their bodies at different locations will be different. >>> Example: seq_show. >>> In this case it's better to let libbpf pick the exact one to attach. >>> Then realized that even the same function like cpumask_weight() >>> might have different body at different locations due to optimizations. >>> I don't think dwarf contains enough info to distinguish all the combinations. >>> >>> Considering all that it's better to keep one BTF kind_func -> one addr. >>> If it's extended the way Alan is proposing with kind_flag >>> the dedup logic will not combine them due to different addresses. >> >> I've discussed this w/ Alexei and Yonghong offline, so will summarize >> what I said here. I don't think that we should go the route of adding >> kflag to BTF_KIND_FUNC. As Yonghong pointed out, previously only vlen >> and kind determined byte size of the type, and so adding a third >> variable (kflag), which would apply only to BTF_KIND_FUNC, seems like >> an unnecessary new complication. >> >> I propose to go with an entirely new kind instead, we have plenty of >> them left. This new kind will be pretty kernel-specific, so could be >> targeted for kernel use cases better without adding unnecessary >> complications to Clang. BTF_KIND_FUNCs generated by Clang for .bpf.o >> files don't need addr, they are meaningless and Clang doesn't know >> anything about addresses anyways. So we can keep Clang unchanged and >> more backwards compatible. >> >> But now that this new kind (BTF_KIND_KERNEL_FUNC? KFUNC would be >> misleading, unfortunately) is kernel-specific and generated by pahole >> only, besides addr we can add some flags field and use them to mark >> function as defined as kfunc or not, or (as a hypothetical example) >> traceable or not, or maybe we even have inline flag some day, etc. >> Something that makes sense mostly for kernel functions. >> >> Having said all that, given we are going to break all existing >> BTF-aware tools again with a new kind, we should really couple all >> this work with making BTF self-describing as discussed in [0], so that >> future changes like this won't break older bpftool and other similar >> tools, unnecessarily. > > nice, would be great to have this and eventually got rid of new pahole > enable/disable options, makes sense to do this before adding new type > > jirka > agreed; I'd been thinking the same and I've been working on a proof-of- concept of this based on our previous discussions, I'll send it out as soon as I've got it roughly working. With respect to the question of having a new kind, I'm not sure I agree with the above. We've already broken the "vlen == number of objects following" for BTF_KIND_FUNC, where vlen is used to represent linkage information instead. To me, it feels more natural to have continuity across different object types (kernel versus BPF program) with BTF_KIND_FUNC: the fact that it's hard to come up with an alternate name is perhaps a reflection of this. Most characteristics (aside from "is a kfunc") seem to be shared across kernel and BPF program functions, but the best way to judge is probably to come up with as complete a list as is possible I suppose. In order to accommodate a metadata description using existing BTF_KIND_FUNC, we can have a metadata flag that can say "KFLAG set means singular object following of object_size" that is set for BTF_KIND_FUNC. We can mark it as discouraged for future use. One argument I definitely see for a new kind representing kernel functions is if it were the case that we might need N elements _and_ a singular object following the btf_type to represent it. I don't currently see any use for such a model for function representation, but if that is anticipated somehow, it might be worth having a new kind to support that sort of representation. Alan >> >> Which, btw, is another reason to not use kflag to determine the size >> of btf_type. Proposed solution in [0] assumes that kind + vlen defines >> the size. We should probably have dedicated discussion for >> self-describing BTF, but I think both changes have to be done in the >> same release window. >> >> [0] https://lore.kernel.org/bpf/CAEf4BzYjWHRdNNw4B=eOXOs_ONrDwrgX4bn=Nuc1g8JPFC34MA@mail.gmail.com/#t >> >>> >>> Also turned out that the kernel doesn't validate decl_tag string. >>> The following code loads without error: >>> __attribute__((btf_decl_tag("\x10\xf0"))); >>> >>> I'm not sure whether we want to tighten decl_tag validation and how. >>> If we keep it as-is we can use func+decl_tag approach >>> to add 4 bytes of addr in the binary format (if 1st byte is not zero). >>> But it feels like a hack, since the kernel needs to be changed >>> anyway to adjust the addresses after module loading and kernel relocation. >>> So func with kind_flag seems like the best approach. >>> >>> Regarding relocation of address in the kernel and modules... >>> We just need to add base_addr to all addrs-es recorded in BTF. >>> Both for kernel and for module BTFs. >>> Shouldn't be too complicated. >> >> yep, KASLR seems simple enough to handle by the kernel itself at boot time. >
On Thu, May 25, 2023 at 3:20 AM Alan Maguire <alan.maguire@oracle.com> wrote: > > On 25/05/2023 09:52, Jiri Olsa wrote: > > On Mon, May 22, 2023 at 02:31:01PM -0700, Andrii Nakryiko wrote: > >> On Thu, May 18, 2023 at 5:26 PM Alexei Starovoitov > >> <alexei.starovoitov@gmail.com> wrote: > >>> > >>> On Thu, May 18, 2023 at 11:26 AM Yonghong Song <yhs@meta.com> wrote: > >>>>> I wonder now when the address will be stored as number (not string) we > >>>>> could somehow generate relocation records and have the module loader > >>>>> do the relocation automatically > >>>>> > >>>>> not sure how that works for vmlinux when it's loaded/relocated on boot > >>>> > >>>> Right, actual module address will mostly not match the one in dwarf. > >>>> Some during module btf load, we should modify btf address as well > >>>> for later use? Yes, may need to reuse some routines used in initial > >>>> module relocation. > >>> > >>> > >>> Few thoughts: > >>> > >>> Initially I felt that single FUNC with multiple DECL_TAG(addr) > >>> is better, since BTF for all funcs is the same and it's likely > >>> one static inline function that the compiler decided not to inline > >>> (like cpumask_weight), so when libbpf wants to attach prog to it > >>> the kernel should automatically attach in all places. > >>> But then noticed that actually different functions with > >>> the same name and proto will be deduplicated into one. > >>> Their bodies at different locations will be different. > >>> Example: seq_show. > >>> In this case it's better to let libbpf pick the exact one to attach. > >>> Then realized that even the same function like cpumask_weight() > >>> might have different body at different locations due to optimizations. > >>> I don't think dwarf contains enough info to distinguish all the combinations. > >>> > >>> Considering all that it's better to keep one BTF kind_func -> one addr. > >>> If it's extended the way Alan is proposing with kind_flag > >>> the dedup logic will not combine them due to different addresses. > >> > >> I've discussed this w/ Alexei and Yonghong offline, so will summarize > >> what I said here. I don't think that we should go the route of adding > >> kflag to BTF_KIND_FUNC. As Yonghong pointed out, previously only vlen > >> and kind determined byte size of the type, and so adding a third > >> variable (kflag), which would apply only to BTF_KIND_FUNC, seems like > >> an unnecessary new complication. > >> > >> I propose to go with an entirely new kind instead, we have plenty of > >> them left. This new kind will be pretty kernel-specific, so could be > >> targeted for kernel use cases better without adding unnecessary > >> complications to Clang. BTF_KIND_FUNCs generated by Clang for .bpf.o > >> files don't need addr, they are meaningless and Clang doesn't know > >> anything about addresses anyways. So we can keep Clang unchanged and > >> more backwards compatible. > >> > >> But now that this new kind (BTF_KIND_KERNEL_FUNC? KFUNC would be > >> misleading, unfortunately) is kernel-specific and generated by pahole > >> only, besides addr we can add some flags field and use them to mark > >> function as defined as kfunc or not, or (as a hypothetical example) > >> traceable or not, or maybe we even have inline flag some day, etc. > >> Something that makes sense mostly for kernel functions. > >> > >> Having said all that, given we are going to break all existing > >> BTF-aware tools again with a new kind, we should really couple all > >> this work with making BTF self-describing as discussed in [0], so that > >> future changes like this won't break older bpftool and other similar > >> tools, unnecessarily. > > > > nice, would be great to have this and eventually got rid of new pahole > > enable/disable options, makes sense to do this before adding new type > > > > jirka > > > > agreed; I'd been thinking the same and I've been working on a proof-of- > concept of this based on our previous discussions, I'll send it out as > soon as I've got it roughly working. nice! it would be great to not have every older tool break whenever we add a tiny extension to BTF > > With respect to the question of having a new kind, I'm not sure I agree > with the above. We've already broken the "vlen == number of objects > following" for BTF_KIND_FUNC, where vlen is used to represent linkage > information instead. I'd say BTF_KIND_FUNC and its abuse of vlen is just an example of what we shouldn't do going forward. But even that still allows us to compactly describe each btf_type's size as a pair of (fixed_sz, vlen_sz), where the total size will be fixed_sz + vlen * vlen_sz. For BTF_KIND_FUNC vlen_sz will be zero, even if vlen is non-zero. If we allow klag to change bytes size of a type, we'd need another sizing dimension, which is what I try to avoid here. Look at ENUM and ENUM64. We chose to add a new kind for ENUM64 because of different byte sizing needs, instead of abusing kflag for this, and I think this was the right decision. Let's do that here as well. > > To me, it feels more natural to have continuity across different object > types (kernel versus BPF program) with BTF_KIND_FUNC: the fact that > it's hard to come up with an alternate name is perhaps a reflection of > this. Most characteristics (aside from "is a kfunc") seem to be shared > across kernel and BPF program functions, but the best way to judge > is probably to come up with as complete a list as is possible I suppose. Even the address that you'll add to BTF_KIND_FUNC doesn't apply to BTF emitted for BPF object files. So while I can sympathize (conceptually) with the desire to have one kind for "all thing function", it will cause more problems. Again, look at ENUM and ENUM64 case. Sure, we had to teach BTF dedup and BPF CO-RE about both kinds to represent the same enum concept, but we'd have to do the same even if just added a kflag for 64-bit enum. So my point is that tools will still need to be extended to take advantage of new information, but reusing the same kind is causing more problems and is not solving any. > > In order to accommodate a metadata description using existing > BTF_KIND_FUNC, we can have a metadata flag that can say > "KFLAG set means singular object following of object_size" that is > set for BTF_KIND_FUNC. We can mark it as discouraged for future > use. > > One argument I definitely see for a new kind representing kernel > functions is if it were the case that we might need N elements > _and_ a singular object following the btf_type to represent it. > I don't currently see any use for such a model for function > representation, but if that is anticipated somehow, it might be > worth having a new kind to support that sort of representation. We can never foresee stuff like this, but it's best to not design ourselves into the corner. We have enough space for BTF_KIND_xxx, which gives you all the same benefits. Let's keep it simple and straightforward. > > Alan > > >> > >> Which, btw, is another reason to not use kflag to determine the size > >> of btf_type. Proposed solution in [0] assumes that kind + vlen defines > >> the size. We should probably have dedicated discussion for > >> self-describing BTF, but I think both changes have to be done in the > >> same release window. > >> > >> [0] https://lore.kernel.org/bpf/CAEf4BzYjWHRdNNw4B=eOXOs_ONrDwrgX4bn=Nuc1g8JPFC34MA@mail.gmail.com/#t > >> > >>> > >>> Also turned out that the kernel doesn't validate decl_tag string. > >>> The following code loads without error: > >>> __attribute__((btf_decl_tag("\x10\xf0"))); > >>> > >>> I'm not sure whether we want to tighten decl_tag validation and how. > >>> If we keep it as-is we can use func+decl_tag approach > >>> to add 4 bytes of addr in the binary format (if 1st byte is not zero). > >>> But it feels like a hack, since the kernel needs to be changed > >>> anyway to adjust the addresses after module loading and kernel relocation. > >>> So func with kind_flag seems like the best approach. > >>> > >>> Regarding relocation of address in the kernel and modules... > >>> We just need to add base_addr to all addrs-es recorded in BTF. > >>> Both for kernel and for module BTFs. > >>> Shouldn't be too complicated. > >> > >> yep, KASLR seems simple enough to handle by the kernel itself at boot time. > >
diff --git a/btf_encoder.c b/btf_encoder.c index 3bd0fe0..315053d 100644 --- a/btf_encoder.c +++ b/btf_encoder.c @@ -988,13 +988,25 @@ static int functions_cmp(const void *_a, const void *_b) { const struct elf_function *a = _a; const struct elf_function *b = _b; + int ret; /* if search key allows prefix match, verify target has matching * prefix len and prefix matches. */ if (a->prefixlen && a->prefixlen == b->prefixlen) - return strncmp(a->name, b->name, b->prefixlen); - return strcmp(a->name, b->name); + ret = strncmp(a->name, b->name, b->prefixlen); + else + ret = strcmp(a->name, b->name); + + if (ret || !b->addr) + return ret; + + /* secondarily sort/search by address. */ + if (a->addr < b->addr) + return -1; + if (a->addr > b->addr) + return 1; + return 0; } #ifndef max @@ -1044,9 +1056,11 @@ static int btf_encoder__collect_function(struct btf_encoder *encoder, GElf_Sym * } static struct elf_function *btf_encoder__find_function(const struct btf_encoder *encoder, - const char *name, size_t prefixlen) + struct function *fn, size_t prefixlen) { - struct elf_function key = { .name = name, .prefixlen = prefixlen }; + struct elf_function key = { .name = function__name(fn), + .addr = fn->low_pc, + .prefixlen = prefixlen }; return bsearch(&key, encoder->functions.entries, encoder->functions.cnt, sizeof(key), functions_cmp); } @@ -1846,7 +1860,7 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co continue; /* prefer exact function name match... */ - func = btf_encoder__find_function(encoder, name, 0); + func = btf_encoder__find_function(encoder, fn, 0); if (func) { if (func->generated) continue; @@ -1863,7 +1877,7 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co * it does not have optimized-out parameters * in any cu. */ - func = btf_encoder__find_function(encoder, name, + func = btf_encoder__find_function(encoder, fn, strlen(name)); if (func) { save = true;
By making sorting function for our ELF function list match on both name and function, we ensure that the set of ELF functions includes multiple copies for functions which have multiple instances across CUs. For example, cpumask_weight has 22 instances in System.map/kallsyms: ffffffff8103b530 t cpumask_weight ffffffff8103e300 t cpumask_weight ffffffff81040d30 t cpumask_weight ffffffff8104fa00 t cpumask_weight ffffffff81064300 t cpumask_weight ffffffff81082ba0 t cpumask_weight ffffffff81084f50 t cpumask_weight ffffffff810a4ad0 t cpumask_weight ffffffff810bb740 t cpumask_weight ffffffff8110a6c0 t cpumask_weight ffffffff81118ab0 t cpumask_weight ffffffff81129b50 t cpumask_weight ffffffff81137dc0 t cpumask_weight ffffffff811aead0 t cpumask_weight ffffffff811d6800 t cpumask_weight ffffffff811e1370 t cpumask_weight ffffffff812fae80 t cpumask_weight ffffffff81375c50 t cpumask_weight ffffffff81634b60 t cpumask_weight ffffffff817ba540 t cpumask_weight ffffffff819abf30 t cpumask_weight ffffffff81a7cb60 t cpumask_weight With ELF representations for each address, and DWARF info about addresses (low_pc) we can match DWARF with ELF accurately. The result for the BTF representation is that we end up with a single de-duped function: [9287] FUNC 'cpumask_weight' type_id=9286 linkage=static ...and 22 DECL_TAGs for each address that point at it: 9288] DECL_TAG 'address=0xffffffff8103b530' type_id=9287 component_idx=-1 [9623] DECL_TAG 'address=0xffffffff8103e300' type_id=9287 component_idx=-1 [9829] DECL_TAG 'address=0xffffffff81040d30' type_id=9287 component_idx=-1 [11609] DECL_TAG 'address=0xffffffff8104fa00' type_id=9287 component_idx=-1 [13299] DECL_TAG 'address=0xffffffff81064300' type_id=9287 component_idx=-1 [15704] DECL_TAG 'address=0xffffffff81082ba0' type_id=9287 component_idx=-1 [15731] DECL_TAG 'address=0xffffffff81084f50' type_id=9287 component_idx=-1 [18582] DECL_TAG 'address=0xffffffff810a4ad0' type_id=9287 component_idx=-1 [20234] DECL_TAG 'address=0xffffffff810bb740' type_id=9287 component_idx=-1 [25384] DECL_TAG 'address=0xffffffff8110a6c0' type_id=9287 component_idx=-1 [25798] DECL_TAG 'address=0xffffffff81118ab0' type_id=9287 component_idx=-1 [26285] DECL_TAG 'address=0xffffffff81129b50' type_id=9287 component_idx=-1 [27040] DECL_TAG 'address=0xffffffff81137dc0' type_id=9287 component_idx=-1 [32900] DECL_TAG 'address=0xffffffff811aead0' type_id=9287 component_idx=-1 [35059] DECL_TAG 'address=0xffffffff811d6800' type_id=9287 component_idx=-1 [35353] DECL_TAG 'address=0xffffffff811e1370' type_id=9287 component_idx=-1 [48934] DECL_TAG 'address=0xffffffff812fae80' type_id=9287 component_idx=-1 [54476] DECL_TAG 'address=0xffffffff81375c50' type_id=9287 component_idx=-1 [87772] DECL_TAG 'address=0xffffffff81634b60' type_id=9287 component_idx=-1 [108841] DECL_TAG 'address=0xffffffff817ba540' type_id=9287 component_idx=-1 [132557] DECL_TAG 'address=0xffffffff819abf30' type_id=9287 component_idx=-1 [143689] DECL_TAG 'address=0xffffffff81a7cb60' type_id=9287 component_idx=-1 Consider another case where the same name - wakeup_show() - is used for two different function signatures: From kernel/irq/irqdesc.c static ssize_t wakeup_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) ...and from drivers/base/power/sysfs.c static ssize_t wakeup_show(struct device *dev, struct device_attribute *attr, char *buf); We see both defined in BTF, along with the addresses that tell us which is which: [28472] FUNC 'wakeup_show' type_id=11214 linkage=static specifies [11214] FUNC_PROTO '(anon)' ret_type_id=76 vlen=3 'kobj' type_id=877 'attr' type_id=11200 'buf' type_id=56 ...and has declaration tag [28473] DECL_TAG 'address=0xffffffff8115eab0' type_id=28472 component_idx=-1 which identifies ffffffff8115eab0 t wakeup_show ...as the function with the first signature. Similarly, [114375] FUNC 'wakeup_show' type_id=4750 linkage=static [4750] FUNC_PROTO '(anon)' ret_type_id=76 vlen=3 'dev' type_id=1488 'attr' type_id=3909 'buf' type_id=56 ...and [114376] DECL_TAG 'address=0xffffffff8181eac0' type_id=114375 component_idx=-1 ...tell us that ffffffff8181eac0 t wakeup_show ...has the second signature. So we can accommodate multiple functions with conflicting signatures in BTF, since we have added extra info to map from function description in BTF to address. In total for vmlinux 52006 DECL_TAGs are added, and these add 2MB to the BTF representation. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> --- btf_encoder.c | 26 ++++++++++++++++++++------ 1 file changed, 20 insertions(+), 6 deletions(-)