Message ID | 20220809094933.2203087-1-alexander.atanasov@virtuozzo.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v1,1/2] Enable balloon drivers to report inflated memory |
On Tue, Aug 09, 2022 at 12:49:32PM +0300, Alexander Atanasov wrote:
> Display reported in /proc/meminfo as:
>
> Inflated(total) or Inflated(free)
>
> depending on the driver.
>
> Drivers use the sign bit to indicate where they do account
> the inflated memory.
>
> Amount of inflated memory can be used:
> - as a hint for the OOM killer
> - by user space software that monitors memory pressure
>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Wei Liu <wei.liu@kernel.org>
> Cc: Nadav Amit <namit@vmware.com>
>
> Signed-off-by: Alexander Atanasov <alexander.atanasov@virtuozzo.com>
> ---
>  Documentation/filesystems/proc.rst |  5 +++++
>  fs/proc/meminfo.c                  | 11 +++++++++++
>  include/linux/mm.h                 |  4 ++++
>  mm/page_alloc.c                    |  4 ++++
>  4 files changed, 24 insertions(+)
>
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index 1bc91fb8c321..064b5b3d5bd8 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -986,6 +986,7 @@ Example output. You may not have all of these fields.
>      VmallocUsed:       40444 kB
>      VmallocChunk:          0 kB
>      Percpu:            29312 kB
> +    Inflated(total): 2097152 kB
>      HardwareCorrupted:     0 kB
>      AnonHugePages:   4149248 kB
>      ShmemHugePages:        0 kB
> @@ -1133,6 +1134,10 @@ VmallocChunk
>  Percpu
>                Memory allocated to the percpu allocator used to back percpu
>                allocations. This stat excludes the cost of metadata.
> +Inflated(total) or Inflated(free)
> +              Amount of memory that is inflated by the balloon driver.
> +              Due to differences among balloon drivers inflated memory
> +              is either subtracted from TotalRam or from MemFree.
>  HardwareCorrupted
>                The amount of RAM/memory in KB, the kernel identifies as
>                corrupted.
> diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> index 6e89f0e2fd20..ebbe52ccbb93 100644
> --- a/fs/proc/meminfo.c
> +++ b/fs/proc/meminfo.c
> @@ -38,6 +38,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>  	unsigned long pages[NR_LRU_LISTS];
>  	unsigned long sreclaimable, sunreclaim;
>  	int lru;
> +#ifdef CONFIG_MEMORY_BALLOON
> +	long inflated_kb;
> +#endif
>
>  	si_meminfo(&i);
>  	si_swapinfo(&i);
> @@ -153,6 +156,14 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>  		    global_zone_page_state(NR_FREE_CMA_PAGES));
>  #endif
>
> +#ifdef CONFIG_MEMORY_BALLOON
> +	inflated_kb = atomic_long_read(&mem_balloon_inflated_kb);
> +	if (inflated_kb >= 0)
> +		seq_printf(m, "Inflated(total): %8ld kB\n", inflated_kb);
> +	else
> +		seq_printf(m, "Inflated(free): %8ld kB\n", -inflated_kb);
> +#endif
> +
>  	hugetlb_report_meminfo(m);
>
>  	arch_report_meminfo(m);

This seems too baroque for my taste.
Why not just have two counters for the two purposes?

And is there any value in having this atomic?
We want a consistent value but just READ_ONCE seems sufficient ...

> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 7898e29bcfb5..b190811dc16e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2582,6 +2582,10 @@ extern int watermark_boost_factor;
>  extern int watermark_scale_factor;
>  extern bool arch_has_descending_max_zone_pfns(void);
>
> +#ifdef CONFIG_MEMORY_BALLOON
> +extern atomic_long_t mem_balloon_inflated_kb;
> +#endif
> +
>  /* nommu.c */
>  extern atomic_long_t mmap_pages_allocated;
>  extern int nommu_shrink_inode_mappings(struct inode *, size_t, size_t);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b0bcab50f0a3..12359179a3a2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -194,6 +194,10 @@ EXPORT_SYMBOL(init_on_alloc);
>  DEFINE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_FREE_DEFAULT_ON, init_on_free);
>  EXPORT_SYMBOL(init_on_free);
>
> +#ifdef CONFIG_MEMORY_BALLOON
> +atomic_long_t mem_balloon_inflated_kb = ATOMIC_LONG_INIT(0);
> +#endif
> +
>  static bool _init_on_alloc_enabled_early __read_mostly
>  				= IS_ENABLED(CONFIG_INIT_ON_ALLOC_DEFAULT_ON);
>  static int __init early_init_on_alloc(char *buf)
> --
> 2.31.1
>
>
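For illustration only, the alternative raised in the review above (two separate counters, each read once instead of through an atomic type) might look roughly like the userspace sketch below. It is not kernel code and not part of the patch: the counter names `balloon_inflated_total_kb` and `balloon_inflated_free_kb`, the field labels `InflatedTotal` and `InflatedFree`, and the simplified `READ_ONCE`/`WRITE_ONCE` macros are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Simplified userspace stand-ins for the kernel macros of the same name.
 * They force exactly one load or store through a volatile access
 * (uses the GCC/Clang __typeof__ extension).
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical counters; the patch has a single mem_balloon_inflated_kb. */
static unsigned long balloon_inflated_total_kb;
static unsigned long balloon_inflated_free_kb;

/*
 * Format both counters the way a meminfo-style reader might see them.
 * Each counter is sampled once; no sign trick is needed to tell them apart.
 * Returns the snprintf() result (number of characters that were needed).
 */
static int show_inflated(char *buf, size_t len)
{
	unsigned long total_kb = READ_ONCE(balloon_inflated_total_kb);
	unsigned long free_kb = READ_ONCE(balloon_inflated_free_kb);

	assert(buf != NULL && len > 0);
	return snprintf(buf, len,
			"InflatedTotal: %8lu kB\nInflatedFree:  %8lu kB\n",
			total_kb, free_kb);
}
```

In this sketch a driver would publish its value with `WRITE_ONCE(balloon_inflated_total_kb, kb)` or the `free` variant. Whether plain loads and stores are sufficient depends on how the value is updated and consumed, which is exactly the point under discussion in the thread.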
> On Aug 9, 2022, at 17:49, Alexander Atanasov <alexander.atanasov@virtuozzo.com> wrote:
>
> Display reported in /proc/meminfo as:

Hi,

I am not sure if this is the right place (meminfo) to put this statistic in since
it is the accounting from a specific driver. IIUC, this driver is only installed
in a VM, so this accounting will always be zero if this driver is not installed.
Is it possible to put it in a driver-specific sysfs file (maybe it is better)?
Just some thoughts from me.

Muchun,
Thanks.

>
> Inflated(total) or Inflated(free)
>
> depending on the driver.
>
> Drivers use the sign bit to indicate where they do account
> the inflated memory.
>
> Amount of inflated memory can be used:
> - as a hint for the OOM killer
> - by user space software that monitors memory pressure
>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Wei Liu <wei.liu@kernel.org>
> Cc: Nadav Amit <namit@vmware.com>
>
> Signed-off-by: Alexander Atanasov <alexander.atanasov@virtuozzo.com>
> ---
>  Documentation/filesystems/proc.rst |  5 +++++
>  fs/proc/meminfo.c                  | 11 +++++++++++
>  include/linux/mm.h                 |  4 ++++
>  mm/page_alloc.c                    |  4 ++++
>  4 files changed, 24 insertions(+)
>
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index 1bc91fb8c321..064b5b3d5bd8 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -986,6 +986,7 @@ Example output. You may not have all of these fields.
>      VmallocUsed:       40444 kB
>      VmallocChunk:          0 kB
>      Percpu:            29312 kB
> +    Inflated(total): 2097152 kB
>      HardwareCorrupted:     0 kB
>      AnonHugePages:   4149248 kB
>      ShmemHugePages:        0 kB
> @@ -1133,6 +1134,10 @@ VmallocChunk
>  Percpu
>                Memory allocated to the percpu allocator used to back percpu
>                allocations. This stat excludes the cost of metadata.
> +Inflated(total) or Inflated(free)
> +              Amount of memory that is inflated by the balloon driver.
> +              Due to differences among balloon drivers inflated memory
> +              is either subtracted from TotalRam or from MemFree.
>  HardwareCorrupted
>                The amount of RAM/memory in KB, the kernel identifies as
>                corrupted.
> diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> index 6e89f0e2fd20..ebbe52ccbb93 100644
> --- a/fs/proc/meminfo.c
> +++ b/fs/proc/meminfo.c
> @@ -38,6 +38,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>  	unsigned long pages[NR_LRU_LISTS];
>  	unsigned long sreclaimable, sunreclaim;
>  	int lru;
> +#ifdef CONFIG_MEMORY_BALLOON
> +	long inflated_kb;
> +#endif
>
>  	si_meminfo(&i);
>  	si_swapinfo(&i);
> @@ -153,6 +156,14 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>  		    global_zone_page_state(NR_FREE_CMA_PAGES));
>  #endif
>
> +#ifdef CONFIG_MEMORY_BALLOON
> +	inflated_kb = atomic_long_read(&mem_balloon_inflated_kb);
> +	if (inflated_kb >= 0)
> +		seq_printf(m, "Inflated(total): %8ld kB\n", inflated_kb);
> +	else
> +		seq_printf(m, "Inflated(free): %8ld kB\n", -inflated_kb);
> +#endif
> +
>  	hugetlb_report_meminfo(m);
>
>  	arch_report_meminfo(m);
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 7898e29bcfb5..b190811dc16e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2582,6 +2582,10 @@ extern int watermark_boost_factor;
>  extern int watermark_scale_factor;
>  extern bool arch_has_descending_max_zone_pfns(void);
>
> +#ifdef CONFIG_MEMORY_BALLOON
> +extern atomic_long_t mem_balloon_inflated_kb;
> +#endif
> +
>  /* nommu.c */
>  extern atomic_long_t mmap_pages_allocated;
>  extern int nommu_shrink_inode_mappings(struct inode *, size_t, size_t);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b0bcab50f0a3..12359179a3a2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -194,6 +194,10 @@ EXPORT_SYMBOL(init_on_alloc);
>  DEFINE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_FREE_DEFAULT_ON, init_on_free);
>  EXPORT_SYMBOL(init_on_free);
>
> +#ifdef CONFIG_MEMORY_BALLOON
> +atomic_long_t mem_balloon_inflated_kb = ATOMIC_LONG_INIT(0);
> +#endif
> +
>  static bool _init_on_alloc_enabled_early __read_mostly
>  				= IS_ENABLED(CONFIG_INIT_ON_ALLOC_DEFAULT_ON);
>  static int __init early_init_on_alloc(char *buf)
> --
> 2.31.1
>
>
Hello,

On 10.08.22 6:05, Muchun Song wrote:
>
>
>> On Aug 9, 2022, at 17:49, Alexander Atanasov <alexander.atanasov@virtuozzo.com> wrote:
>>
>> Display reported in /proc/meminfo as:
>
> Hi,
>
> I am not sure if this is the right place (meminfo) to put this statistic in since
> it is the accounting from a specific driver. IIUC, this driver is only installed
> in a VM, so this accounting will always be zero if this driver is not installed.
> Is it possible to put it in a driver-specific sysfs file (maybe it is better)?
> Just some thoughts from me.

Yes, it is only used when running under a hypervisor, which is why it is
under a config option. Several balloon drivers will use it, not only
one: virtio, VMware, Hyper-V, Xen and possibly others. I made one as an example.

Initially I worked on patches for debugfs, but the discussion led to putting
it into a centralized place.
On 9.08.22 13:32, Michael S. Tsirkin wrote:
> On Tue, Aug 09, 2022 at 12:49:32PM +0300, Alexander Atanasov wrote:
>> @@ -153,6 +156,14 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>>  		    global_zone_page_state(NR_FREE_CMA_PAGES));
>>  #endif
>>
>> +#ifdef CONFIG_MEMORY_BALLOON
>> +	inflated_kb = atomic_long_read(&mem_balloon_inflated_kb);
>> +	if (inflated_kb >= 0)
>> +		seq_printf(m, "Inflated(total): %8ld kB\n", inflated_kb);
>> +	else
>> +		seq_printf(m, "Inflated(free): %8ld kB\n", -inflated_kb);
>> +#endif
>> +
>>  	hugetlb_report_meminfo(m);
>>
>>  	arch_report_meminfo(m);
>
>
> This seems too baroque for my taste.
> Why not just have two counters for the two purposes?

I agree it is not good, but it reflects the current situation.
Drivers account in only one way - either used or total - which I don't like.
So, to save space and to avoid the possibility that some driver starts to use
both at the same time, I suggest having only one value.

> And is there any value in having this atomic?
> We want a consistent value but just READ_ONCE seems sufficient ...

I do not see this as a value that is only going to be displayed.
I tried to be defensive here and to avoid premature optimization.
One possible scenario that will need it is the OOM killer (using the value)
vs. balloon deflate on OOM. Any other user of that value will likely need it
to be atomic too. Drivers use spinlocks for their calculations; they might
find a way to reduce spinlock usage and use the atomic instead.
Making it a plain long could only bring bugs without benefits.
It is not on a fast path either, so I prefer to be safe.
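To make the sign convention under discussion concrete, here is a small userspace model of the display logic from the quoted hunk. It is a sketch, not the kernel code: `format_inflated` is a hypothetical helper that takes the counter value as a parameter and writes into a caller-supplied buffer with `snprintf` instead of calling `seq_printf`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Userspace model of the sign convention in the patch (not kernel code):
 *   value >= 0  ->  reported as "Inflated(total)"
 *   value <  0  ->  reported as "Inflated(free)", magnitude printed
 * Returns the snprintf() result (number of characters that were needed).
 */
static int format_inflated(char *buf, size_t len, long inflated_kb)
{
	assert(buf != NULL && len > 0);

	if (inflated_kb >= 0)
		return snprintf(buf, len, "Inflated(total): %8ld kB\n",
				inflated_kb);

	return snprintf(buf, len, "Inflated(free): %8ld kB\n", -inflated_kb);
}
```

One consequence visible in this model: a value of 0 always takes the `Inflated(total)` branch, so with a single signed counter a driver that accounts against free memory cannot be told apart from one that accounts against the total while nothing is inflated.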
On Wed, Aug 10, 2022 at 08:54:52AM +0300, Alexander Atanasov wrote:
> On 9.08.22 13:32, Michael S. Tsirkin wrote:
> > On Tue, Aug 09, 2022 at 12:49:32PM +0300, Alexander Atanasov wrote:
> > > @@ -153,6 +156,14 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
> > >  		    global_zone_page_state(NR_FREE_CMA_PAGES));
> > >  #endif
> > > +#ifdef CONFIG_MEMORY_BALLOON
> > > +	inflated_kb = atomic_long_read(&mem_balloon_inflated_kb);
> > > +	if (inflated_kb >= 0)
> > > +		seq_printf(m, "Inflated(total): %8ld kB\n", inflated_kb);
> > > +	else
> > > +		seq_printf(m, "Inflated(free): %8ld kB\n", -inflated_kb);
> > > +#endif
> > > +
> > >  	hugetlb_report_meminfo(m);
> > >  	arch_report_meminfo(m);
> >
> >
> > This seems too baroque for my taste.
> > Why not just have two counters for the two purposes?
>
> I agree it is not good, but it reflects the current situation.
> Drivers account in only one way - either used or total - which I don't like.
> So, to save space and to avoid the possibility that some driver starts to use
> both at the same time, I suggest having only one value.

I don't see what would be wrong if some driver used both
at some point.

>
> > And is there any value in having this atomic?
> > We want a consistent value but just READ_ONCE seems sufficient ...
>
> I do not see this as a value that is only going to be displayed.
> I tried to be defensive here and to avoid premature optimization.
> One possible scenario that will need it is the OOM killer (using the value)
> vs. balloon deflate on OOM. Any other user of that value will likely need it
> to be atomic too. Drivers use spinlocks for their calculations; they might
> find a way to reduce spinlock usage and use the atomic instead.
> Making it a plain long could only bring bugs without benefits.
> It is not on a fast path either, so I prefer to be safe.

Well, we do not normally spread atomics around just because we
can; it does not magically make the code safe.
If this needs atomics, we need to document why.

> --
> Regards,
> Alexander Atanasov
Hello,

On 10.08.22 9:05, Michael S. Tsirkin wrote:
> On Wed, Aug 10, 2022 at 08:54:52AM +0300, Alexander Atanasov wrote:
>> On 9.08.22 13:32, Michael S. Tsirkin wrote:
>>> On Tue, Aug 09, 2022 at 12:49:32PM +0300, Alexander Atanasov wrote:
>>>> @@ -153,6 +156,14 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>>>>  		    global_zone_page_state(NR_FREE_CMA_PAGES));
>>>>  #endif
>>>> +#ifdef CONFIG_MEMORY_BALLOON
>>>> +	inflated_kb = atomic_long_read(&mem_balloon_inflated_kb);
>>>> +	if (inflated_kb >= 0)
>>>> +		seq_printf(m, "Inflated(total): %8ld kB\n", inflated_kb);
>>>> +	else
>>>> +		seq_printf(m, "Inflated(free): %8ld kB\n", -inflated_kb);
>>>> +#endif
>>>> +
>>>>  	hugetlb_report_meminfo(m);
>>>>  	arch_report_meminfo(m);
>>>
>>>
>>> This seems too baroque for my taste.
>>> Why not just have two counters for the two purposes?
>>
>> I agree it is not good, but it reflects the current situation.
>> Drivers account in only one way - either used or total - which I don't like.
>> So, to save space and to avoid the possibility that some driver starts to use
>> both at the same time, I suggest having only one value.
>
> I don't see what would be wrong if some driver used both
> at some point.

If you don't see what's wrong with using both, I might as well add
Cached and Buffers - the next hypervisor might want to use them, or any
other field, at its discretion, leaving the fun of figuring it out to
userspace?

A single definitive value is much better and clearer from the user's
perspective, and meminfo is exactly for the users.

If a driver for some weird reason needs to do both, it is a whole new topic
that I don't want to go into. The good news is that currently no such driver
exists.

>
>>
>>> And is there any value in having this atomic?
>>> We want a consistent value but just READ_ONCE seems sufficient ...
>>
>> I do not see this as a value that is only going to be displayed.
>> I tried to be defensive here and to avoid premature optimization.
>> One possible scenario that will need it is the OOM killer (using the value)
>> vs. balloon deflate on OOM. Any other user of that value will likely need it
>> to be atomic too. Drivers use spinlocks for their calculations; they might
>> find a way to reduce spinlock usage and use the atomic instead.
>> Making it a plain long could only bring bugs without benefits.
>> It is not on a fast path either, so I prefer to be safe.
>
> Well, we do not normally spread atomics around just because we
> can; it does not magically make the code safe.
> If this needs atomics, we need to document why.

Of course it does not. In one of your comments on my other patches you said
you do not like patches that add a line and then remove it in the next patch.
To avoid that I used an atomic - if at some point it is clear it is not
required, I would be happy to change it, but it is more likely to be needed
than not. So I will probably have to document it instead.

At this point the decision whether it should be in meminfo at all is more
important - if the general opinion is positive, I will address the
technical details.
On Wed, Aug 10, 2022 at 10:50:10AM +0300, Alexander Atanasov wrote:
> Hello,
>
> On 10.08.22 9:05, Michael S. Tsirkin wrote:
> > On Wed, Aug 10, 2022 at 08:54:52AM +0300, Alexander Atanasov wrote:
> > > On 9.08.22 13:32, Michael S. Tsirkin wrote:
> > > > On Tue, Aug 09, 2022 at 12:49:32PM +0300, Alexander Atanasov wrote:
> > > > > @@ -153,6 +156,14 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
> > > > >  		    global_zone_page_state(NR_FREE_CMA_PAGES));
> > > > >  #endif
> > > > > +#ifdef CONFIG_MEMORY_BALLOON
> > > > > +	inflated_kb = atomic_long_read(&mem_balloon_inflated_kb);
> > > > > +	if (inflated_kb >= 0)
> > > > > +		seq_printf(m, "Inflated(total): %8ld kB\n", inflated_kb);
> > > > > +	else
> > > > > +		seq_printf(m, "Inflated(free): %8ld kB\n", -inflated_kb);
> > > > > +#endif
> > > > > +
> > > > >  	hugetlb_report_meminfo(m);
> > > > >  	arch_report_meminfo(m);
> > > >
> > > >
> > > > This seems too baroque for my taste.
> > > > Why not just have two counters for the two purposes?
> > >
> > > I agree it is not good, but it reflects the current situation.
> > > Drivers account in only one way - either used or total - which I don't like.
> > > So, to save space and to avoid the possibility that some driver starts to use
> > > both at the same time, I suggest having only one value.
> >
> > I don't see what would be wrong if some driver used both
> > at some point.
>
> If you don't see what's wrong with using both, I might as well add
> Cached and Buffers - the next hypervisor might want to use them, or any
> other field, at its discretion, leaving the fun of figuring it out to
> userspace?

Assuming you document what these mean, sure.

> A single definitive value is much better and clearer from the user's
> perspective, and meminfo is exactly for the users.

Not really, the negative value trick is anything but clear.

> If a driver for some weird reason needs to do both, it is a whole new topic
> that I don't want to go into. The good news is that currently no such driver
> exists.
>
> >
> > >
> > > > And is there any value in having this atomic?
> > > > We want a consistent value but just READ_ONCE seems sufficient ...
> > >
> > > I do not see this as a value that is only going to be displayed.
> > > I tried to be defensive here and to avoid premature optimization.
> > > One possible scenario that will need it is the OOM killer (using the value)
> > > vs. balloon deflate on OOM. Any other user of that value will likely need it
> > > to be atomic too. Drivers use spinlocks for their calculations; they might
> > > find a way to reduce spinlock usage and use the atomic instead.
> > > Making it a plain long could only bring bugs without benefits.
> > > It is not on a fast path either, so I prefer to be safe.
> >
> > Well, we do not normally spread atomics around just because we
> > can; it does not magically make the code safe.
> > If this needs atomics, we need to document why.
>
> Of course it does not. In one of your comments on my other patches you said
> you do not like patches that add a line and then remove it in the next patch.
> To avoid that I used an atomic - if at some point it is clear it is not
> required, I would be happy to change it, but it is more likely to be needed
> than not. So I will probably have to document it instead.
>
> At this point the decision whether it should be in meminfo at all is more
> important - if the general opinion is positive, I will address the
> technical details.

Not up to me; you need an ack from the linux-mm guys for that.

> --
> Regards,
> Alexander Atanasov
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 1bc91fb8c321..064b5b3d5bd8 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -986,6 +986,7 @@ Example output. You may not have all of these fields.
     VmallocUsed:       40444 kB
     VmallocChunk:          0 kB
     Percpu:            29312 kB
+    Inflated(total): 2097152 kB
     HardwareCorrupted:     0 kB
     AnonHugePages:   4149248 kB
     ShmemHugePages:        0 kB
@@ -1133,6 +1134,10 @@ VmallocChunk
 Percpu
               Memory allocated to the percpu allocator used to back percpu
               allocations. This stat excludes the cost of metadata.
+Inflated(total) or Inflated(free)
+              Amount of memory that is inflated by the balloon driver.
+              Due to differences among balloon drivers inflated memory
+              is either subtracted from TotalRam or from MemFree.
 HardwareCorrupted
               The amount of RAM/memory in KB, the kernel identifies as
               corrupted.
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 6e89f0e2fd20..ebbe52ccbb93 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -38,6 +38,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 	unsigned long pages[NR_LRU_LISTS];
 	unsigned long sreclaimable, sunreclaim;
 	int lru;
+#ifdef CONFIG_MEMORY_BALLOON
+	long inflated_kb;
+#endif
 
 	si_meminfo(&i);
 	si_swapinfo(&i);
@@ -153,6 +156,14 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 		    global_zone_page_state(NR_FREE_CMA_PAGES));
 #endif
 
+#ifdef CONFIG_MEMORY_BALLOON
+	inflated_kb = atomic_long_read(&mem_balloon_inflated_kb);
+	if (inflated_kb >= 0)
+		seq_printf(m, "Inflated(total): %8ld kB\n", inflated_kb);
+	else
+		seq_printf(m, "Inflated(free): %8ld kB\n", -inflated_kb);
+#endif
+
 	hugetlb_report_meminfo(m);
 
 	arch_report_meminfo(m);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7898e29bcfb5..b190811dc16e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2582,6 +2582,10 @@ extern int watermark_boost_factor;
 extern int watermark_scale_factor;
 extern bool arch_has_descending_max_zone_pfns(void);
 
+#ifdef CONFIG_MEMORY_BALLOON
+extern atomic_long_t mem_balloon_inflated_kb;
+#endif
+
 /* nommu.c */
 extern atomic_long_t mmap_pages_allocated;
 extern int nommu_shrink_inode_mappings(struct inode *, size_t, size_t);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b0bcab50f0a3..12359179a3a2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -194,6 +194,10 @@ EXPORT_SYMBOL(init_on_alloc);
 DEFINE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_FREE_DEFAULT_ON, init_on_free);
 EXPORT_SYMBOL(init_on_free);
 
+#ifdef CONFIG_MEMORY_BALLOON
+atomic_long_t mem_balloon_inflated_kb = ATOMIC_LONG_INIT(0);
+#endif
+
 static bool _init_on_alloc_enabled_early __read_mostly
 				= IS_ENABLED(CONFIG_INIT_ON_ALLOC_DEFAULT_ON);
 static int __init early_init_on_alloc(char *buf)
Display reported in /proc/meminfo as:

Inflated(total) or Inflated(free)

depending on the driver.

Drivers use the sign bit to indicate where they do account
the inflated memory.

Amount of inflated memory can be used:
- as a hint for the OOM killer
- by user space software that monitors memory pressure

Cc: David Hildenbrand <david@redhat.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Nadav Amit <namit@vmware.com>

Signed-off-by: Alexander Atanasov <alexander.atanasov@virtuozzo.com>
---
 Documentation/filesystems/proc.rst |  5 +++++
 fs/proc/meminfo.c                  | 11 +++++++++++
 include/linux/mm.h                 |  4 ++++
 mm/page_alloc.c                    |  4 ++++
 4 files changed, 24 insertions(+)
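As a rough illustration of the second use case named in the commit message (user space software monitoring memory pressure), a monitor could pick the proposed field out of meminfo-style text along the following lines. This is a sketch under assumptions: `parse_inflated` and `enum inflated_kind` are hypothetical names, the function works on an in-memory string rather than opening /proc/meminfo, and the two labels exist only in the format proposed by this patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Which accounting style the (proposed) field reported, if any. */
enum inflated_kind { INFLATED_NONE, INFLATED_TOTAL, INFLATED_FREE };

/*
 * Scan meminfo-style text line by line for the field proposed in the patch.
 * On a match, stores the value in *kb and returns which label was seen.
 * Returns INFLATED_NONE (leaving *kb untouched) if neither label appears.
 */
static enum inflated_kind parse_inflated(const char *meminfo, long *kb)
{
	const char *line = meminfo;

	assert(meminfo != NULL && kb != NULL);

	while (line && *line) {
		if (sscanf(line, "Inflated(total): %ld kB", kb) == 1)
			return INFLATED_TOTAL;
		if (sscanf(line, "Inflated(free): %ld kB", kb) == 1)
			return INFLATED_FREE;

		/* Advance to the character after the next newline, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return INFLATED_NONE;
}
```

A caller would read the contents of /proc/meminfo into a buffer and pass it in. Note that the consumer has to know both labels and what each implies for MemTotal and MemFree, which is the userspace burden debated in the thread.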