Message ID: 20230316170149.4106586-1-jolsa@kernel.org (mailing list archive)
Series: mm/bpf/perf: Store build id in file object
On Thu, Mar 16, 2023 at 06:01:40PM +0100, Jiri Olsa wrote:
> hi,
> this patchset adds build id object pointer to struct file object.
>
> We have several use cases for build id to be used in BPF programs [2][3].

Yes, you have use cases, but you never answered the question I asked:

Is this going to be enabled by every distro kernel, or is it for special use-cases where only people doing a very specialised thing who are willing to build their own kernels will use it?

Saying "hubble/tetragon" doesn't answer that question. Maybe it does to you, but I have no idea what that software is.

Put it another way: how does this make *MY* life better? Literally me. How will it affect my life?
On Thu, Mar 16, 2023 at 10:35 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Mar 16, 2023 at 06:01:40PM +0100, Jiri Olsa wrote:
> > hi,
> > this patchset adds build id object pointer to struct file object.
> >
> > We have several use cases for build id to be used in BPF programs [2][3].
>
> Yes, you have use cases, but you never answered the question I asked:
>
> Is this going to be enabled by every distro kernel, or is it for special use-cases where only people doing a very specialised thing who are willing to build their own kernels will use it?
>
> Saying "hubble/tetragon" doesn't answer that question. Maybe it does to you, but I have no idea what that software is.
>
> Put it another way: how does this make *MY* life better? Literally me. How will it affect my life?

So at Google we use build IDs for all profiling; I believe Meta is the same, but obviously I can't speak for them. For BPF program stack traces, using build ID + offset stack traces is preferable to perf's whole-system synthesis of mmap events based on data held in /proc/pid/maps. Individual stack traces are larger, but you avoid the ever-growing problem of coming up with some initial virtual memory state that will allow you to identify samples.

This doesn't answer the question about how this will help you, but I expect over time you will see scalability issues and also want to use tools assuming build IDs are present and cheap to access.

Thanks,
Ian
On Thu, Mar 16, 2023 at 10:50 AM Ian Rogers <irogers@google.com> wrote:
>
> On Thu, Mar 16, 2023 at 10:35 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Thu, Mar 16, 2023 at 06:01:40PM +0100, Jiri Olsa wrote:
> > > hi,
> > > this patchset adds build id object pointer to struct file object.
> > >
> > > We have several use cases for build id to be used in BPF programs [2][3].
> >
> > Yes, you have use cases, but you never answered the question I asked:
> >
> > Is this going to be enabled by every distro kernel, or is it for special use-cases where only people doing a very specialised thing who are willing to build their own kernels will use it?
> >
> > Saying "hubble/tetragon" doesn't answer that question. Maybe it does to you, but I have no idea what that software is.
> >
> > Put it another way: how does this make *MY* life better? Literally me. How will it affect my life?
>
> So at Google we use build IDs for all profiling, I believe Meta is the same but obviously I can't speak for them. For BPF program stack

Yep, Meta is capturing stack traces with build IDs as well, if possible. Build IDs help with profiling short-lived processes which exit before the profiling session is done and before user-space tooling is able to collect /proc/<pid>/maps contents (which is what Ian is referring to here). But build IDs also allow offloading more of the expensive stack symbolization process (converting raw memory addresses into human-readable function+offset+file path+line number information) to dedicated remote servers, by allowing preprocessed DWARF/ELF information to be cached and reused based on build ID.

I believe the perf tool is also using build IDs, so any tool relying on perf capturing full and complete profiling data for system-wide performance analysis would benefit as well.

Generally speaking, there is a whole ecosystem built on top of the assumption that binaries have a build ID, and profiling tooling is able to provide more value if those build IDs are more reliably collected. Which ultimately benefits the entire open-source ecosystem by allowing people to spot issues (not necessarily just performance, it could be correctness issues as well) more reliably, fix them, and benefit every user.

> traces, using build ID + offset stack traces is preferable to perf's whole system synthesis of mmap events based on data held in /proc/pid/maps. Individual stack traces are larger, but you avoid the ever growing problem of coming up with some initial virtual memory state that will allow you to identify samples.
>
> This doesn't answer the question about how this will help you, but I expect over time you will see scalability issues and also want to use tools assuming build IDs are present and cheap to access.
>
> Thanks,
> Ian
On Thu, Mar 16, 2023 at 02:51:52PM -0700, Andrii Nakryiko wrote:
> Yep, Meta is also capturing stack traces with build ID as well, if possible. Build IDs help with profiling short-lived processes which exit before the profiling session is done and user-space tooling is able to collect /proc/<pid>/maps contents (which is what Ian is referring to here). But also build ID allows to offload more of the expensive stack symbolization process (converting raw memory addresses into human readable function+offset+file path+line numbers information) to dedicated remote servers, by allowing to cache and reuse preprocessed DWARF/ELF information based on build ID.
>
> I believe perf tool is also using build ID, so any tool relying on perf capturing full and complete profiling data for system-wide performance analysis would benefit as well.
>
> Generally speaking, there is a whole ecosystem built on top of assumption that binaries have build ID and profiling tooling is able to provide more value if those build IDs are more reliably collected. Which ultimately benefits the entire open-source ecosystem by allowing people to spot issues (not necessarily just performance, it could be correctness issues as well) more reliably, fix them, and benefit every user.

But build IDs are _generally_ available. The only problem (AIUI) is when you're trying to examine the contents of one container from another container. And to solve that problem, you're imposing a cost on everybody else with (so far) pretty vague justifications. I really don't like to see you growing struct file for this (nor struct inode, nor struct vm_area_struct). It's all quite unsatisfactory and I don't have a good suggestion.
On Thu, Mar 16, 2023 at 8:51 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Mar 16, 2023 at 02:51:52PM -0700, Andrii Nakryiko wrote:
> > Yep, Meta is also capturing stack traces with build ID as well, if possible. Build IDs help with profiling short-lived processes which exit before the profiling session is done and user-space tooling is able to collect /proc/<pid>/maps contents (which is what Ian is referring to here). But also build ID allows to offload more of the expensive stack symbolization process (converting raw memory addresses into human readable function+offset+file path+line numbers information) to dedicated remote servers, by allowing to cache and reuse preprocessed DWARF/ELF information based on build ID.
> >
> > I believe perf tool is also using build ID, so any tool relying on perf capturing full and complete profiling data for system-wide performance analysis would benefit as well.
> >
> > Generally speaking, there is a whole ecosystem built on top of assumption that binaries have build ID and profiling tooling is able to provide more value if those build IDs are more reliably collected. Which ultimately benefits the entire open-source ecosystem by allowing people to spot issues (not necessarily just performance, it could be correctness issues as well) more reliably, fix them, and benefit every user.
>
> But build IDs are _generally_ available. The only problem (AIUI) is when you're trying to examine the contents of one container from another container. And to solve that problem, you're imposing a cost on everybody else with (so far) pretty vague justifications. I really don't like to see you growing struct file for this (nor struct inode, nor struct vm_area_struct). It's all quite unsatisfactory and I don't have a good suggestion.

There is a lot of profiling, observability and debugging tooling built using BPF. And when capturing stack traces from BPF programs, if the build ID note is not physically present in memory, fetching it from the BPF program might fail in NMI (and other non-faultable) contexts. This patch set is about making sure we can always fetch the build ID, even from the most restrictive environments. It's guarded by a Kconfig option to avoid adding 8 bytes of overhead to struct file for environments where this might be unacceptable, giving users and distros a choice.
On Fri, Mar 17, 2023 at 09:33:17AM -0700, Andrii Nakryiko wrote:
> > But build IDs are _generally_ available. The only problem (AIUI) is when you're trying to examine the contents of one container from another container. And to solve that problem, you're imposing a cost on everybody else with (so far) pretty vague justifications. I really don't like to see you growing struct file for this (nor struct inode, nor struct vm_area_struct). It's all quite unsatisfactory and I don't have a good suggestion.
>
> There is a lot of profiling, observability and debugging tooling built using BPF. And when capturing stack traces from BPF programs, if the build ID note is not physically present in memory, fetching it from the BPF program might fail in NMI (and other non-faultable contexts). This patch set is about making sure we always can fetch build ID, even from most restrictive environments. It's guarded by Kconfig to avoid adding 8 bytes of overhead to struct file for environment where this might be unacceptable, giving users and distros a choice.

Lovely. As an exercise you might want to collect the stats on the number of struct file instances on the system vs. the number of files that happen to be ELF objects and are currently mmapped anywhere. That does depend upon the load, obviously, but it's not hard to collect - you already have more than enough hooks inserted in the relevant places. That might give a better appreciation of the reactions...
On Fri, Mar 17, 2023 at 09:14:03PM +0000, Al Viro wrote:
> On Fri, Mar 17, 2023 at 09:33:17AM -0700, Andrii Nakryiko wrote:
> > > But build IDs are _generally_ available. The only problem (AIUI) is when you're trying to examine the contents of one container from another container. And to solve that problem, you're imposing a cost on everybody else with (so far) pretty vague justifications. I really don't like to see you growing struct file for this (nor struct inode, nor struct vm_area_struct). It's all quite unsatisfactory and I don't have a good suggestion.
> >
> > There is a lot of profiling, observability and debugging tooling built using BPF. And when capturing stack traces from BPF programs, if the build ID note is not physically present in memory, fetching it from the BPF program might fail in NMI (and other non-faultable contexts). This patch set is about making sure we always can fetch build ID, even from most restrictive environments. It's guarded by Kconfig to avoid adding 8 bytes of overhead to struct file for environment where this might be unacceptable, giving users and distros a choice.
>
> Lovely. As an exercise you might want to collect the stats on the number of struct file instances on the system vs. the number of files that happen to be ELF objects and are currently mmapped anywhere. That does depend upon the load, obviously, but it's not hard to collect - you already have more than enough hooks inserted in the relevant places. That might give a better appreciation of the reactions...

One possibility would be a bit stolen from inode flags + a hash keyed by struct inode address (middle bits make for a decent hash function); inode eviction would check that bit and kick the corresponding thing from the hash if the bit is set.

Associating that thing with an inode => hash lookup/insert + set the bit.
On Fri, Mar 17, 2023 at 2:21 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Fri, Mar 17, 2023 at 09:14:03PM +0000, Al Viro wrote:
> > On Fri, Mar 17, 2023 at 09:33:17AM -0700, Andrii Nakryiko wrote:
> > > > But build IDs are _generally_ available. The only problem (AIUI) is when you're trying to examine the contents of one container from another container. And to solve that problem, you're imposing a cost on everybody else with (so far) pretty vague justifications. I really don't like to see you growing struct file for this (nor struct inode, nor struct vm_area_struct). It's all quite unsatisfactory and I don't have a good suggestion.
> > >
> > > There is a lot of profiling, observability and debugging tooling built using BPF. And when capturing stack traces from BPF programs, if the build ID note is not physically present in memory, fetching it from the BPF program might fail in NMI (and other non-faultable contexts). This patch set is about making sure we always can fetch build ID, even from most restrictive environments. It's guarded by Kconfig to avoid adding 8 bytes of overhead to struct file for environment where this might be unacceptable, giving users and distros a choice.
> >
> > Lovely. As an exercise you might want to collect the stats on the number of struct file instances on the system vs. the number of files that happen to be ELF objects and are currently mmapped anywhere.

That's a good suggestion. I wrote a simple script that uses the drgn tool ([0]); it enables nice introspection of the state of kernel memory for the running kernel. The script is at the bottom ([1]) for anyone to sanity check. I didn't try to figure out which files are mmaped as executable and which aren't, so let's do the worst case and assume that none of the files is executable, and thus that the 8-byte pointer is a waste for all of them.

On my devserver I got:

task_cnt=15984 uniq_file_cnt=56780

On a randomly chosen production host I got:

task_cnt=6387 uniq_file_cnt=22514

So it seems like my devserver is "busier" than the production host. :)

The above numbers suggest that my devserver's kernel has about 57000 *unique* `struct file *` instances. That's 450KB of overhead. That's not much by any modern standard. But let's say I'm way off, and we have 1 million struct files. That's 8MB of overhead. I'd argue that those 8MB are not a big deal even on a normal laptop, even less so on production servers. Especially if you have 1 million active struct file instances created in the system, as way more memory will be used for application-specific needs.

> > That does depend upon the load, obviously, but it's not hard to collect - you already have more than enough hooks inserted in the relevant places. That might give a better appreciation of the reactions...
>
> One possibility would be a bit stolen from inode flags + hash keyed by struct inode address (middle bits make for a decent hash function); inode eviction would check that bit and kick the corresponding thing from hash if the bit is set.
>
> Associating that thing with inode => hash lookup/insert + set the bit.

This is an interesting idea, but now we are running into a few unnecessary problems. We need to have a global dynamically sized hash map in the system. If we fix the number of buckets, we risk either wasting memory on an underutilized system (if we oversize) or performance problems due to collisions (if we undersize) on a busy system with lots of executables mapped in memory. If we don't pre-size, then we are talking about reallocations, rehashing, and doing that under a global lock or something like that. Further, we'd have to take locks on buckets, which causes further problems for looking up the build ID from this hashmap in NMI context for perf events and BPF programs, as locks can't be safely taken under those conditions, and thus fetching the build ID would still be unreliable (though less so than it is today, of course).

All of this is solvable to some degree (but not perfectly and not with simple and elegant approaches), but it seems like an unnecessary overcomplication compared to the amount of memory that we hope to save. It still feels like a Kconfig-guarded 8-byte field per struct file is a reasonable price for gaining reliable build ID information for profiling/tracing tools.

[0] https://drgn.readthedocs.io/en/latest/index.html

[1] Script I used:

from drgn.helpers.linux.pid import for_each_task
from drgn.helpers.linux.fs import for_each_file

task_cnt = 0
file_set = set()

for task in for_each_task(prog):
    task_cnt += 1
    try:
        for (fd, file) in for_each_file(task):
            file_set.add(file.value_())
    except:
        pass

uniq_file_cnt = len(file_set)
print(f"task_cnt={task_cnt} uniq_file_cnt={uniq_file_cnt}")
On Thu, Mar 16, 2023 at 05:34:41PM +0000, Matthew Wilcox wrote:
> On Thu, Mar 16, 2023 at 06:01:40PM +0100, Jiri Olsa wrote:
> > hi,
> > this patchset adds build id object pointer to struct file object.
> >
> > We have several use cases for build id to be used in BPF programs [2][3].
>
> Yes, you have use cases, but you never answered the question I asked:
>
> Is this going to be enabled by every distro kernel, or is it for special use-cases where only people doing a very specialised thing who are willing to build their own kernels will use it?

I hope so, but I guess only time will tell.. given the responses by Ian and Andrii there are 3 big users already

> Saying "hubble/tetragon" doesn't answer that question. Maybe it does to you, but I have no idea what that software is.
>
> Put it another way: how does this make *MY* life better? Literally me. How will it affect my life?

sorry, I'm not sure how to answer this.. there are possible users of the feature and the price seems reasonable to me, but of course I understand file/inode objects are tricky to mess with

jirka
On Fri, Mar 17, 2023 at 11:08:44PM -0700, Andrii Nakryiko wrote:

SNIP

> > > That does depend upon the load, obviously, but it's not hard to collect - you already have more than enough hooks inserted in the relevant places. That might give a better appreciation of the reactions...
> >
> > One possibility would be a bit stolen from inode flags + hash keyed by struct inode address (middle bits make for a decent hash function); inode eviction would check that bit and kick the corresponding thing from hash if the bit is set.
> >
> > Associating that thing with inode => hash lookup/insert + set the bit.
>
> This is an interesting idea, but now we are running into a few unnecessary problems. We need to have a global dynamically sized hash map in the system. If we fix the number of buckets, we risk either wasting memory on an underutilized system (if we oversize), or performance problems due to collisions (if we undersize) if we have a busy system with lots of executables mapped in memory. If we don't pre-size, then we are talking about reallocations, rehashing, and doing that under global lock or something like that. Further, we'd have to take locks on buckets, which causes further problems for looking up build ID from this hashmap in NMI context for perf events and BPF programs, as locks can't be safely taken under those conditions, and thus fetching build ID would still be unreliable (though less so than it is today, of course).
>
> All of this is solvable to some degree (but not perfectly and not with simple and elegant approaches), but seems like an unnecessary overcomplication compared to the amount of memory that we hope to save. It still feels like a Kconfig-guarded 8 byte field per struct file is a reasonable price for gaining reliable build ID information for profiling/tracing tools.
>
> [0] https://drgn.readthedocs.io/en/latest/index.html
>
> [1] Script I used:
>
> from drgn.helpers.linux.pid import for_each_task
> from drgn.helpers.linux.fs import for_each_file
>
> task_cnt = 0
> file_set = set()
>
> for task in for_each_task(prog):
>     task_cnt += 1
>     try:
>         for (fd, file) in for_each_file(task):
>             file_set.add(file.value_())
>     except:
>         pass
>
> uniq_file_cnt = len(file_set)
> print(f"task_cnt={task_cnt} uniq_file_cnt={uniq_file_cnt}")

great you beat me to this, I wouldn't have thought of using drgn for this ;-) I'll see if I can install it on some of our test servers

thanks,
jirka
On Sat, Mar 18, 2023 at 09:33:49AM +0100, Jiri Olsa wrote:
> On Thu, Mar 16, 2023 at 05:34:41PM +0000, Matthew Wilcox wrote:
> > On Thu, Mar 16, 2023 at 06:01:40PM +0100, Jiri Olsa wrote:
> > > hi,
> > > this patchset adds build id object pointer to struct file object.
> > >
> > > We have several use cases for build id to be used in BPF programs [2][3].
> >
> > Yes, you have use cases, but you never answered the question I asked:
> >
> > Is this going to be enabled by every distro kernel, or is it for special use-cases where only people doing a very specialised thing who are willing to build their own kernels will use it?
>
> I hope so, but I guess only time tell.. given the response by Ian and Andrii there are 3 big users already

So the whole "There's a config option to turn it off" shtick is just a fig-leaf. I won't ever see it turned off. You're imposing the cost of this on EVERYONE who runs a distro kernel. And almost nobody will see any benefits from it. Thanks for admitting that.
On Sat, Mar 18, 2023 at 03:16:45PM +0000, Matthew Wilcox wrote:
> On Sat, Mar 18, 2023 at 09:33:49AM +0100, Jiri Olsa wrote:
> > On Thu, Mar 16, 2023 at 05:34:41PM +0000, Matthew Wilcox wrote:
> > > On Thu, Mar 16, 2023 at 06:01:40PM +0100, Jiri Olsa wrote:
> > > > hi,
> > > > this patchset adds build id object pointer to struct file object.
> > > >
> > > > We have several use cases for build id to be used in BPF programs [2][3].
> > >
> > > Yes, you have use cases, but you never answered the question I asked:
> > >
> > > Is this going to be enabled by every distro kernel, or is it for special use-cases where only people doing a very specialised thing who are willing to build their own kernels will use it?
> >
> > I hope so, but I guess only time tell.. given the response by Ian and Andrii there are 3 big users already
>
> So the whole "There's a config option to turn it off" shtick is just a fig-leaf. I won't ever see it turned off. You're imposing the cost of this on EVERYONE who runs a distro kernel. And almost nobody will see any benefits from it. Thanks for admitting that.

sure, I understand that's a legit way of looking at this

I can imagine distros would have that enabled for the debugging version of the kernel (like in fedora), and if that proves to be useful the standard kernel might take it, but yes, there's a price (for discussion, as pointed out by Andrii) and it'd be for the distro maintainers to decide

jirka
Em Sat, Mar 18, 2023 at 03:16:45PM +0000, Matthew Wilcox escreveu:
> On Sat, Mar 18, 2023 at 09:33:49AM +0100, Jiri Olsa wrote:
> > On Thu, Mar 16, 2023 at 05:34:41PM +0000, Matthew Wilcox wrote:
> > > On Thu, Mar 16, 2023 at 06:01:40PM +0100, Jiri Olsa wrote:
> > > > hi,
> > > > this patchset adds build id object pointer to struct file object.
> > > >
> > > > We have several use cases for build id to be used in BPF programs [2][3].
> > >
> > > Yes, you have use cases, but you never answered the question I asked:
> > >
> > > Is this going to be enabled by every distro kernel, or is it for special use-cases where only people doing a very specialised thing who are willing to build their own kernels will use it?
> >
> > I hope so, but I guess only time tell.. given the response by Ian and Andrii there are 3 big users already
>
> So the whole "There's a config option to turn it off" shtick is just a fig-leaf. I won't ever see it turned off. You're imposing the cost of this on EVERYONE who runs a distro kernel. And almost nobody will see any benefits from it. Thanks for admitting that.

I agree that build-ids are not useful for all 'struct file' uses, just for executable files and for people wanting better observability capabilities.

Having said that, it seems there will be no extra memory overhead, at least for a fedora:36 x86_64 kernel:

void __init files_init(void)
{
	filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0,
			SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT, NULL);
	percpu_counter_init(&nr_files, 0, GFP_KERNEL);
}

[root@quaco ~]# pahole file | grep size: -A2
	/* size: 232, cachelines: 4, members: 20 */
	/* sum members: 228, holes: 1, sum holes: 4 */
	/* last cacheline: 40 bytes */
[acme@quaco perf-tools]$ uname -a
Linux quaco 6.1.11-100.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Feb 9 20:36:30 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
[root@quaco ~]# head -2 /proc/slabinfo
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
[root@quaco ~]# grep -w filp /proc/slabinfo
filp               12452  13056    256   32    2 : tunables    0    0    0 : slabdata    408    408      0
[root@quaco ~]#

so there are 24 bytes on the 4th cacheline that are not being used, right?

One other observation is that maybe we could do it as in the 'struct sock' hierarchy in networking, where we would have a 'struct exec_file' that would be:

struct exec_file {
	struct file file;
	char	    build_id[20];
};

say, and then when we create the 'struct file' in __alloc_file() we could check some bit in 'flags', like Al Viro suggested, and pick a different slab than 'filp_cachep', one that has that extra space for the build_id (and whatever else exec-related state we may end up wanting, if ever).

No core fs code would need to know about that except when we go to free it, to free from the right slab cache.

With current distro configs, no overhead would take place if I read that SLAB_HWCACHE_ALIGN thing right, no?

- Arnaldo
On Wed, Mar 22, 2023 at 8:45 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>
> Em Sat, Mar 18, 2023 at 03:16:45PM +0000, Matthew Wilcox escreveu:
> > On Sat, Mar 18, 2023 at 09:33:49AM +0100, Jiri Olsa wrote:
> > > On Thu, Mar 16, 2023 at 05:34:41PM +0000, Matthew Wilcox wrote:
> > > > On Thu, Mar 16, 2023 at 06:01:40PM +0100, Jiri Olsa wrote:
> > > > > hi,
> > > > > this patchset adds build id object pointer to struct file object.
> > > > >
> > > > > We have several use cases for build id to be used in BPF programs [2][3].
> > > >
> > > > Yes, you have use cases, but you never answered the question I asked:
> > > >
> > > > Is this going to be enabled by every distro kernel, or is it for special use-cases where only people doing a very specialised thing who are willing to build their own kernels will use it?
> > >
> > > I hope so, but I guess only time tell.. given the response by Ian and Andrii there are 3 big users already
> >
> > So the whole "There's a config option to turn it off" shtick is just a fig-leaf. I won't ever see it turned off. You're imposing the cost of this on EVERYONE who runs a distro kernel. And almost nobody will see any benefits from it. Thanks for admitting that.
>
> I agree that build-ids are not useful for all 'struct file' uses, just for executable files and for people wanting to have better observability capabilities.
>
> Having said that, it seems there will be no extra memory overhead, at least for a fedora:36 x86_64 kernel:
>
> void __init files_init(void)
> {
> 	filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0,
> 			SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT, NULL);
> 	percpu_counter_init(&nr_files, 0, GFP_KERNEL);
> }
>
> [root@quaco ~]# pahole file | grep size: -A2
> 	/* size: 232, cachelines: 4, members: 20 */
> 	/* sum members: 228, holes: 1, sum holes: 4 */
> 	/* last cacheline: 40 bytes */
> [acme@quaco perf-tools]$ uname -a
> Linux quaco 6.1.11-100.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Feb 9 20:36:30 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
> [root@quaco ~]# head -2 /proc/slabinfo
> slabinfo - version: 2.1
> # name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
> [root@quaco ~]# grep -w filp /proc/slabinfo
> filp               12452  13056    256   32    2 : tunables    0    0    0 : slabdata    408    408      0
> [root@quaco ~]#
>
> so there are 24 bytes on the 4th cacheline that are not being used, right?

Well, even better then!

> One other observation is that maybe we could do it as in the 'struct sock' hierarchy in networking, where we would have a 'struct exec_file' that would be:
>
> struct exec_file {
> 	struct file file;
> 	char	    build_id[20];
> };
>
> say, and then when we create the 'struct file' in __alloc_file() we could check some bit in 'flags', like Al Viro suggested, and pick a different slab than 'filp_cachep', one that has that extra space for the build_id (and whatever else exec-related state we may end up wanting, if ever).
>
> No core fs code would need to know about that except when we go to free it, to free from the right slab cache.
>
> With current distro configs, no overhead would take place if I read that SLAB_HWCACHE_ALIGN thing right, no?

Makes sense to me as well.

Whatever the solution, as long as it's usable from NMI contexts it would be fine for the purposes of fetching build IDs. It would be good to hear from the folks that are opposing adding a pointer field to struct file whether they prefer this way instead?

> - Arnaldo
On Fri, Mar 31, 2023 at 11:19:45AM -0700, Andrii Nakryiko wrote:
> On Wed, Mar 22, 2023 at 8:45 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > Having said that, it seems there will be no extra memory overhead at least for a fedora:36 x86_64 kernel:
>
> Makes sense to me as well. Whatever the solution, as long as it's usable from NMI contexts would be fine for the purposes of fetching build ID. It would be good to hear from folks that are opposing adding a pointer field to struct file whether they prefer this way instead?

Still no. While it may not take up any room right now, this will surely not be the last thing added to struct file. When something which is genuinely useful needs to be added, that person should not have to sort out your mess first.

NAK now, NAK tomorrow, NAK forever. Al told you how you could do it without trampling on core data structures.
On Fri, Mar 31, 2023 at 11:36 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Mar 31, 2023 at 11:19:45AM -0700, Andrii Nakryiko wrote:
> > On Wed, Mar 22, 2023 at 8:45 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > > Having said that, it seems there will be no extra memory overhead at least for a fedora:36 x86_64 kernel:
> >
> > Makes sense to me as well. Whatever the solution, as long as it's usable from NMI contexts would be fine for the purposes of fetching build ID. It would be good to hear from folks that are opposing adding a pointer field to struct file whether they prefer this way instead?
>
> Still no. While it may not take up any room right now, this will surely not be the last thing added to struct file. When something which is genuinely useful needs to be added, that person should not have to sort out your mess first,

So I assume you are talking about adding a pointer field to the struct file, right? What about the alternative proposed by Arnaldo to have a struct exec_file that extends a struct file?

> NAK now, NAK tomorrow, NAK forever. Al told you how you could do it without trampling on core data structures.

As I replied to Al, any solution that has a lookup table on the side isn't compatible with usage from NMI context due to locking. And lots of tracing use cases are based on perf counters, which are handled in NMI context. And that's besides all the complexities with right-sizing hash maps (if hashmaps are to be used).