Message ID: 20240321163705.3067592-1-surenb@google.com (mailing list archive)
Series: Memory allocation profiling
On Thu, 21 Mar 2024 09:36:22 -0700 Suren Baghdasaryan <surenb@google.com> wrote:

> Low overhead [1] per-callsite memory allocation profiling. Not just for
> debug kernels, overhead low enough to be deployed in production.
>
> Example output:
>   root@moria-kvm:~# sort -rn /proc/allocinfo
>    127664128    31168 mm/page_ext.c:270 func:alloc_page_ext
>     56373248     4737 mm/slub.c:2259 func:alloc_slab_page
>     14880768     3633 mm/readahead.c:247 func:page_cache_ra_unbounded
>     14417920     3520 mm/mm_init.c:2530 func:alloc_large_system_hash
>     13377536      234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs
>     11718656     2861 mm/filemap.c:1919 func:__filemap_get_folio
>      9192960     2800 kernel/fork.c:307 func:alloc_thread_stack_node
>      4206592        4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable
>      4136960     1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start
>      3940352      962 mm/memory.c:4214 func:alloc_anon_folio
>      2894464    22613 fs/kernfs/dir.c:615 func:__kernfs_new_node

Did you consider adding a knob to permit all the data to be wiped out?
So people can zap everything, run the chosen workload then go see what
happened?

Of course, this can be done in userspace by taking a snapshot before
and after, then crunching on the two....
On Thu, Mar 21, 2024 at 1:42 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 21 Mar 2024 09:36:22 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
>
> > Low overhead [1] per-callsite memory allocation profiling. Not just for
> > debug kernels, overhead low enough to be deployed in production.
> >
> > Example output:
> >   root@moria-kvm:~# sort -rn /proc/allocinfo
> > [...]
>
> Did you consider adding a knob to permit all the data to be wiped out?
> So people can zap everything, run the chosen workload then go see what
> happened?
>
> Of course, this can be done in userspace by taking a snapshot before
> and after, then crunching on the two....

Yeah, that's exactly what I was envisioning. I don't think we need to
complicate things further by adding reset functionality unless there are
other reasons for it.
Thanks!
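The snapshot-before-and-after workflow discussed here can live entirely in
userspace. A minimal sketch in Python, assuming the "<bytes> <calls>
<callsite>" layout shown in the example output above (reading /proc/allocinfo
usually requires root):

#!/usr/bin/env python3
# Rough sketch of the before/after approach: snapshot /proc/allocinfo, run
# the workload, snapshot again, and print the largest per-callsite deltas.
ALLOCINFO = "/proc/allocinfo"

def snapshot(path=ALLOCINFO):
    tags = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            # Keep only "<bytes> <calls> <callsite>" rows; skip anything else.
            if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
                tags[" ".join(fields[2:])] = (int(fields[0]), int(fields[1]))
    return tags

def diff(before, after):
    deltas = []
    for tag, (size, calls) in after.items():
        old_size, old_calls = before.get(tag, (0, 0))
        if (size, calls) != (old_size, old_calls):
            deltas.append((size - old_size, calls - old_calls, tag))
    return sorted(deltas, reverse=True)

if __name__ == "__main__":
    before = snapshot()
    input("Snapshot taken; run the workload, then press Enter... ")
    after = snapshot()
    for dsize, dcalls, tag in diff(before, after)[:20]:
        print(f"{dsize:>12} {dcalls:>8} {tag}")

Run it as root, trigger the workload when prompted, and the callsites whose
tracked bytes grew the most are printed first.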
Hi,

On 2024-03-21 17:36, Suren Baghdasaryan wrote:
> Overview:
> Low overhead [1] per-callsite memory allocation profiling. Not just for
> debug kernels, overhead low enough to be deployed in production.
>
> Example output:
>   root@moria-kvm:~# sort -rn /proc/allocinfo
>    127664128    31168 mm/page_ext.c:270 func:alloc_page_ext
>     56373248     4737 mm/slub.c:2259 func:alloc_slab_page
>     14880768     3633 mm/readahead.c:247 func:page_cache_ra_unbounded
>     14417920     3520 mm/mm_init.c:2530 func:alloc_large_system_hash
>     13377536      234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs
>     11718656     2861 mm/filemap.c:1919 func:__filemap_get_folio
>      9192960     2800 kernel/fork.c:307 func:alloc_thread_stack_node
>      4206592        4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable
>      4136960     1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start
>      3940352      962 mm/memory.c:4214 func:alloc_anon_folio
>      2894464    22613 fs/kernfs/dir.c:615 func:__kernfs_new_node
> ...
>
> Since v5 [2]:
> - Added Reviewed-by and Acked-by, per Vlastimil Babka and Miguel Ojeda
> - Changed pgalloc_tag_{add|sub} to use number of pages instead of order, per Matthew Wilcox
> - Changed pgalloc_tag_sub_bytes to pgalloc_tag_sub_pages and adjusted the usage, per Matthew Wilcox
> - Moved static key check before prepare_slab_obj_exts_hook(), per Vlastimil Babka
> - Fixed RUST helper, per Miguel Ojeda
> - Fixed documentation, per Randy Dunlap
> - Rebased over mm-unstable
>
> Usage:
> kconfig options:
>  - CONFIG_MEM_ALLOC_PROFILING
>  - CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
>  - CONFIG_MEM_ALLOC_PROFILING_DEBUG
>    adds warnings for allocations that weren't accounted because of a
>    missing annotation
>
> sysctl:
>   /proc/sys/vm/mem_profiling
>
> Runtime info:
>   /proc/allocinfo
>
> Notes:
>
> [1]: Overhead
> To measure the overhead we are comparing the following configurations:
> (1) Baseline with CONFIG_MEMCG_KMEM=n
> (2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
>     CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n)
> (3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
>     CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y)
> (4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING=y &&
>     CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n && /proc/sys/vm/mem_profiling=1)
> (5) Baseline with CONFIG_MEMCG_KMEM=y && allocating with __GFP_ACCOUNT
> (6) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
>     CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n) && CONFIG_MEMCG_KMEM=y
> (7) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
>     CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y) && CONFIG_MEMCG_KMEM=y
>
> Performance overhead:
> To evaluate performance we implemented an in-kernel test executing
> multiple get_free_page/free_page and kmalloc/kfree calls with allocation
> sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU
> affinity set to a specific CPU to minimize the noise. Below are results
> from running the test on Ubuntu 22.04.2 LTS with 6.8.0-rc1 kernel on
> 56 core Intel Xeon:
>
>                         kmalloc               pgalloc
> (1 baseline)            6.764s                16.902s
> (2 default disabled)    6.793s  (+0.43%)      17.007s (+0.62%)
> (3 default enabled)     7.197s  (+6.40%)      23.666s (+40.02%)
> (4 runtime enabled)     7.405s  (+9.48%)      23.901s (+41.41%)
> (5 memcg)               13.388s (+97.94%)     48.460s (+186.71%)
> (6 def disabled+memcg)  13.332s (+97.10%)     48.105s (+184.61%)
> (7 def enabled+memcg)   13.446s (+98.78%)     54.963s (+225.18%)
>
> Memory overhead:
> Kernel size:
>
>         text        data         bss         dec      diff
> (1) 26515311    18890222    17018880    62424413
> (2) 26524728    19423818    16740352    62688898    264485
> (3) 26524724    19423818    16740352    62688894    264481
> (4) 26524728    19423818    16740352    62688898    264485
> (5) 26541782    18964374    16957440    62463596     39183
>
> Memory consumption on a 56 core Intel CPU with 125GB of memory:
> Code tags:           192 kB
> PageExts:         262144 kB (256MB)
> SlabExts:           9876 kB (9.6MB)
> PcpuExts:            512 kB (0.5MB)
>
> Total overhead is 0.2% of total memory.
>
> Benchmarks:
>
> Hackbench tests run 100 times:
> hackbench -s 512 -l 200 -g 15 -f 25 -P
>         baseline    disabled profiling    enabled profiling
> avg     0.3543      0.3559 (+0.0016)      0.3566 (+0.0023)
> stdev   0.0137      0.0188                0.0077
>
> hackbench -l 10000
>         baseline    disabled profiling    enabled profiling
> avg     6.4218      6.4306 (+0.0088)      6.5077 (+0.0859)
> stdev   0.0933      0.0286                0.0489
>
> stress-ng tests:
> stress-ng --class memory --seq 4 -t 60
> stress-ng --class cpu --seq 4 -t 60
> Results posted at: https://evilpiepirate.org/~kent/memalloc_prof_v4_stress-ng/
>
> [2] https://lore.kernel.org/all/20240306182440.2003814-1-surenb@google.com/

If I enable this, I consistently get percpu allocation failures. I can
occasionally reproduce it in qemu. I've attached the logs and my config,
please let me know if there's anything else that could be relevant.

Kind regards,
Klara Modin
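The runtime interface quoted above (the /proc/sys/vm/mem_profiling sysctl plus
/proc/allocinfo) can be exercised with a short script along these lines. This
is an illustrative sketch, not part of the series; it assumes a kernel built
with CONFIG_MEM_ALLOC_PROFILING=y, the "<bytes> <calls> <callsite>" layout from
the example output, and root privileges:

#!/usr/bin/env python3
# Enable allocation profiling at runtime, then summarize what it is tracking.
SYSCTL = "/proc/sys/vm/mem_profiling"
ALLOCINFO = "/proc/allocinfo"

def set_profiling(enabled: bool):
    # Equivalent to: echo 1 > /proc/sys/vm/mem_profiling (echo 0 to disable).
    with open(SYSCTL, "w") as f:
        f.write("1" if enabled else "0")

def total_tracked_bytes():
    total = 0
    with open(ALLOCINFO) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 3 and fields[0].isdigit():
                total += int(fields[0])  # first column: bytes per callsite
    return total

if __name__ == "__main__":
    set_profiling(True)
    print(f"currently tracked: {total_tracked_bytes() / 1024:.1f} KiB")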
On Fri, Apr 5, 2024 at 6:37 AM Klara Modin <klarasmodin@gmail.com> wrote:
>
> Hi,
>
> On 2024-03-21 17:36, Suren Baghdasaryan wrote:
> > Overview:
> > Low overhead [1] per-callsite memory allocation profiling. Not just for
> > debug kernels, overhead low enough to be deployed in production.
> >
> > [...]
> >
> > [2] https://lore.kernel.org/all/20240306182440.2003814-1-surenb@google.com/
>
> If I enable this, I consistently get percpu allocation failures. I can
> occasionally reproduce it in qemu. I've attached the logs and my config,
> please let me know if there's anything else that could be relevant.

Thanks for the report!
In debug_alloc_profiling.log I see:

[ 7.445127] percpu: limit reached, disable warning

That's probably the reason. I'll take a closer look at the cause of
that and how we can fix it.

In qemu-alloc3.log I see a couple of warnings:

[ 1.111620] alloc_tag was not set
[ 1.111880] WARNING: CPU: 0 PID: 164 at include/linux/alloc_tag.h:118
kfree (./include/linux/alloc_tag.h:118 (discriminator 1)
./include/linux/alloc_tag.h:161 (discriminator 1) mm/slub.c:2043 ...

[ 1.161710] alloc_tag was not cleared (got tag for fs/squashfs/cache.c:413)
[ 1.162289] WARNING: CPU: 0 PID: 195 at include/linux/alloc_tag.h:109
kmalloc_trace_noprof (./include/linux/alloc_tag.h:109 (discriminator 1)
./include/linux/alloc_tag.h:149 (discriminator 1) ...

Which means we missed instrumenting some allocation. Can you please
check if disabling CONFIG_MEM_ALLOC_PROFILING_DEBUG fixes the QEMU case?
In the meantime I'll try to reproduce and fix this.
Thanks,
Suren.

> Kind regards,
> Klara Modin
On 2024-04-05 16:14, Suren Baghdasaryan wrote:
> On Fri, Apr 5, 2024 at 6:37 AM Klara Modin <klarasmodin@gmail.com> wrote:
>> If I enable this, I consistently get percpu allocation failures. I can
>> occasionally reproduce it in qemu. I've attached the logs and my config,
>> please let me know if there's anything else that could be relevant.
>
> Thanks for the report!
> In debug_alloc_profiling.log I see:
>
> [ 7.445127] percpu: limit reached, disable warning
>
> That's probably the reason. I'll take a closer look at the cause of
> that and how we can fix it.

Thanks!

> In qemu-alloc3.log I see a couple of warnings:
>
> [ 1.111620] alloc_tag was not set
> [ 1.111880] WARNING: CPU: 0 PID: 164 at
> include/linux/alloc_tag.h:118 kfree (./include/linux/alloc_tag.h:118
> (discriminator 1) ./include/linux/alloc_tag.h:161 (discriminator 1)
> mm/slub.c:2043 ...
>
> [ 1.161710] alloc_tag was not cleared (got tag for fs/squashfs/cache.c:413)
> [ 1.162289] WARNING: CPU: 0 PID: 195 at
> include/linux/alloc_tag.h:109 kmalloc_trace_noprof
> (./include/linux/alloc_tag.h:109 (discriminator 1)
> ./include/linux/alloc_tag.h:149 (discriminator 1) ...
>
> Which means we missed instrumenting some allocation. Can you please
> check if disabling CONFIG_MEM_ALLOC_PROFILING_DEBUG fixes the QEMU case?
> In the meantime I'll try to reproduce and fix this.
> Thanks,
> Suren.

That does seem to be the case from what I can tell. I didn't get the
warning in qemu consistently, but it hasn't reappeared over a number of
runs at least with the debugging option off.

Regards,
Klara Modin
On Fri, Apr 5, 2024 at 7:30 AM Klara Modin <klarasmodin@gmail.com> wrote:
>
> On 2024-04-05 16:14, Suren Baghdasaryan wrote:
> > On Fri, Apr 5, 2024 at 6:37 AM Klara Modin <klarasmodin@gmail.com> wrote:
> >> If I enable this, I consistently get percpu allocation failures. I can
> >> occasionally reproduce it in qemu. I've attached the logs and my config,
> >> please let me know if there's anything else that could be relevant.
> >
> > Thanks for the report!
> > In debug_alloc_profiling.log I see:
> >
> > [ 7.445127] percpu: limit reached, disable warning
> >
> > That's probably the reason. I'll take a closer look at the cause of
> > that and how we can fix it.
>
> Thanks!

In the build that produced debug_alloc_profiling.log I think we are
consuming all the per-cpu memory reserved for the modules. Could you
please try this change and see if that fixes the issue:

 include/linux/percpu.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index a790afba9386..03053de557cf 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -17,7 +17,7 @@
 /* enough to cover all DEFINE_PER_CPUs in modules */
 #ifdef CONFIG_MODULES
 #ifdef CONFIG_MEM_ALLOC_PROFILING
-#define PERCPU_MODULE_RESERVE (8 << 12)
+#define PERCPU_MODULE_RESERVE (8 << 13)
 #else
 #define PERCPU_MODULE_RESERVE (8 << 10)
 #endif

> > In qemu-alloc3.log I see a couple of warnings:
> >
> > [...]
> >
> > Which means we missed instrumenting some allocation. Can you please
> > check if disabling CONFIG_MEM_ALLOC_PROFILING_DEBUG fixes the QEMU case?
> > In the meantime I'll try to reproduce and fix this.
> > Thanks,
> > Suren.
>
> That does seem to be the case from what I can tell. I didn't get the
> warning in qemu consistently, but it hasn't reappeared over a number of
> runs at least with the debugging option off.
>
> Regards,
> Klara Modin
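For scale, PERCPU_MODULE_RESERVE is the per-cpu space set aside for static
per-cpu data in modules: the stock 8 << 10 is 8 KiB, the 8 << 12 used when
profiling is enabled is 32 KiB, and the proposed 8 << 13 doubles that to
64 KiB, presumably giving the per-module allocation tag counters that
exhausted the reserve enough room.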
On 2024-04-05 17:20, Suren Baghdasaryan wrote:
> On Fri, Apr 5, 2024 at 7:30 AM Klara Modin <klarasmodin@gmail.com> wrote:
>>
>> On 2024-04-05 16:14, Suren Baghdasaryan wrote:
>>> On Fri, Apr 5, 2024 at 6:37 AM Klara Modin <klarasmodin@gmail.com> wrote:
>>>> If I enable this, I consistently get percpu allocation failures. I can
>>>> occasionally reproduce it in qemu. I've attached the logs and my config,
>>>> please let me know if there's anything else that could be relevant.
>>>
>>> Thanks for the report!
>>> In debug_alloc_profiling.log I see:
>>>
>>> [ 7.445127] percpu: limit reached, disable warning
>>>
>>> That's probably the reason. I'll take a closer look at the cause of
>>> that and how we can fix it.
>>
>> Thanks!
>
> In the build that produced debug_alloc_profiling.log I think we are
> consuming all the per-cpu memory reserved for the modules. Could you
> please try this change and see if that fixes the issue:
>
>  include/linux/percpu.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/percpu.h b/include/linux/percpu.h
> index a790afba9386..03053de557cf 100644
> --- a/include/linux/percpu.h
> +++ b/include/linux/percpu.h
> @@ -17,7 +17,7 @@
>  /* enough to cover all DEFINE_PER_CPUs in modules */
>  #ifdef CONFIG_MODULES
>  #ifdef CONFIG_MEM_ALLOC_PROFILING
> -#define PERCPU_MODULE_RESERVE (8 << 12)
> +#define PERCPU_MODULE_RESERVE (8 << 13)
>  #else
>  #define PERCPU_MODULE_RESERVE (8 << 10)
>  #endif
>

Yeah, that patch fixes the issue for me.

Thanks,
Tested-by: Klara Modin
On Fri, Apr 5, 2024 at 8:38 AM Klara Modin <klarasmodin@gmail.com> wrote:
>
> On 2024-04-05 17:20, Suren Baghdasaryan wrote:
> > In the build that produced debug_alloc_profiling.log I think we are
> > consuming all the per-cpu memory reserved for the modules. Could you
> > please try this change and see if that fixes the issue:
> >
> > [...]
> > -#define PERCPU_MODULE_RESERVE (8 << 12)
> > +#define PERCPU_MODULE_RESERVE (8 << 13)
> > [...]
>
> Yeah, that patch fixes the issue for me.
>
> Thanks,
> Tested-by: Klara Modin

Official fix is posted at
https://lore.kernel.org/all/20240406214044.1114406-1-surenb@google.com/

Thanks,
Suren.
On Thu, Mar 21, 2024 at 09:36:22AM -0700, Suren Baghdasaryan wrote:
> Low overhead [1] per-callsite memory allocation profiling. Not just for
> debug kernels, overhead low enough to be deployed in production.

Okay, I think I'm holding it wrong. With next-20240424 if I set:

CONFIG_CODE_TAGGING=y
CONFIG_MEM_ALLOC_PROFILING=y
CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y

My test system totally freaks out:

...
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
Oops: general protection fault, probably for non-canonical address 0xc388d881e4808550: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 0 Comm: swapper Not tainted 6.9.0-rc5-next-20240424 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
RIP: 0010:__kmalloc_node_noprof+0xcd/0x560

Which is:

__kmalloc_node_noprof+0xcd/0x560:
__slab_alloc_node at mm/slub.c:3780 (discriminator 2)
(inlined by) slab_alloc_node at mm/slub.c:3982 (discriminator 2)
(inlined by) __do_kmalloc_node at mm/slub.c:4114 (discriminator 2)
(inlined by) __kmalloc_node_noprof at mm/slub.c:4122 (discriminator 2)

Which is:

	tid = READ_ONCE(c->tid);

I haven't gotten any further than that; I'm EOD. Anyone seen anything
like this with this series?

-Kees
On Wed, Apr 24, 2024 at 06:59:01PM -0700, Kees Cook wrote:
> On Thu, Mar 21, 2024 at 09:36:22AM -0700, Suren Baghdasaryan wrote:
> > Low overhead [1] per-callsite memory allocation profiling. Not just for
> > debug kernels, overhead low enough to be deployed in production.
>
> Okay, I think I'm holding it wrong. With next-20240424 if I set:
>
> CONFIG_CODE_TAGGING=y
> CONFIG_MEM_ALLOC_PROFILING=y
> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
>
> My test system totally freaks out:
>
> [...]
> RIP: 0010:__kmalloc_node_noprof+0xcd/0x560
> [...]
>
> I haven't gotten any further than that; I'm EOD. Anyone seen anything
> like this with this series?

I certainly haven't. That looks like some real corruption; we're in slub
internal data structures and derefing a garbage address. Check kasan and
all that?
On Wed, Apr 24, 2024 at 8:26 PM Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> On Wed, Apr 24, 2024 at 06:59:01PM -0700, Kees Cook wrote:
> > Okay, I think I'm holding it wrong. With next-20240424 if I set:
> >
> > CONFIG_CODE_TAGGING=y
> > CONFIG_MEM_ALLOC_PROFILING=y
> > CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
> >
> > My test system totally freaks out:
> >
> > [...]
> >
> > I haven't gotten any further than that; I'm EOD. Anyone seen anything
> > like this with this series?
>
> I certainly haven't. That looks like some real corruption; we're in slub
> internal data structures and derefing a garbage address. Check kasan and
> all that?

Hi Kees,
I tested next-20240424 yesterday with defconfig and
CONFIG_MEM_ALLOC_PROFILING enabled but didn't see any issue like that.
Could you share your config file please?
Thanks,
Suren.
On Thu, Apr 25, 2024 at 08:39:37AM -0700, Suren Baghdasaryan wrote:
> On Wed, Apr 24, 2024 at 8:26 PM Kent Overstreet
> <kent.overstreet@linux.dev> wrote:
> >
> > On Wed, Apr 24, 2024 at 06:59:01PM -0700, Kees Cook wrote:
> > > Okay, I think I'm holding it wrong. With next-20240424 if I set:
> > >
> > > CONFIG_CODE_TAGGING=y
> > > CONFIG_MEM_ALLOC_PROFILING=y
> > > CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
> > >
> > > My test system totally freaks out:
> > >
> > > [...]
>
> Hi Kees,
> I tested next-20240424 yesterday with defconfig and
> CONFIG_MEM_ALLOC_PROFILING enabled but didn't see any issue like that.
> Could you share your config file please?

Well *that* took a while to .config bisect. I probably should have found
it sooner, but CONFIG_DEBUG_KMEMLEAK=y is what broke me. Without that,
everything is lovely! :)

I can reproduce it now with:

$ make defconfig kvm_guest.config
$ ./scripts/config -e CONFIG_MEM_ALLOC_PROFILING -e CONFIG_DEBUG_KMEMLEAK

-Kees
On Thu, Mar 21, 2024 at 09:36:22AM -0700, Suren Baghdasaryan wrote:
> Overview:
> Low overhead [1] per-callsite memory allocation profiling. Not just for
> debug kernels, overhead low enough to be deployed in production.

A bit late to actually _running_ this code, but I remain a fan:

Tested-by: Kees Cook <keescook@chromium.org>

I have a little tweak patch I'll send out too...
On Thu, Apr 25, 2024 at 1:01 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Thu, Apr 25, 2024 at 08:39:37AM -0700, Suren Baghdasaryan wrote:
> > Hi Kees,
> > I tested next-20240424 yesterday with defconfig and
> > CONFIG_MEM_ALLOC_PROFILING enabled but didn't see any issue like that.
> > Could you share your config file please?
>
> Well *that* took a while to .config bisect. I probably should have found
> it sooner, but CONFIG_DEBUG_KMEMLEAK=y is what broke me. Without that,
> everything is lovely! :)
>
> I can reproduce it now with:
>
> $ make defconfig kvm_guest.config
> $ ./scripts/config -e CONFIG_MEM_ALLOC_PROFILING -e CONFIG_DEBUG_KMEMLEAK

Thanks! I'll use this to reproduce the issue and will see if we can
handle that recursion in a better way.

>
> -Kees
>
> --
> Kees Cook