[RFC,v4,12/13] mm/vmscan: Export drop_slab() and drop_slab_node()

Message ID 20191212171137.13872-13-david@redhat.com (mailing list archive)
State New, archived
Series virtio-mem: paravirtualized memory

Commit Message

David Hildenbrand Dec. 12, 2019, 5:11 p.m. UTC
We already have a way to trigger reclaiming of all reclaimable slab objects
from user space (echo 2 > /proc/sys/vm/drop_caches). Let's allow drivers
to also trigger this when they really want to make progress and know what
they are doing.
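
For reference, the existing user-space trigger can be sketched in C as below. This is only an illustration: write_sysctl_value() and drop_reclaimable_slab() are names made up for this example, not kernel or libc APIs, and writing to /proc/sys/vm/drop_caches requires root.

```c
#include <stdio.h>

/* Path of the existing user-space interface named in the commit message. */
#define DROP_CACHES_PATH "/proc/sys/vm/drop_caches"

/*
 * Hypothetical helper (not a kernel or libc API): write a short string to
 * a sysctl-style file. Returns 0 on success, -1 on failure.
 */
int write_sysctl_value(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ret = 0;

	if (!f)
		return -1;
	if (fputs(value, f) == EOF)
		ret = -1;
	/* fclose() flushes; a failed flush is also reported as an error. */
	if (fclose(f) == EOF)
		ret = -1;
	return ret;
}

/* Same effect as "echo 2 > /proc/sys/vm/drop_caches"; needs root. */
int drop_reclaimable_slab(void)
{
	return write_sysctl_value(DROP_CACHES_PATH, "2\n");
}
```

The path is a parameter of the helper so that it can be exercised against an ordinary file without touching the real sysctl.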

virtio-mem wants to use these functions when it has failed to unplug memory
for quite some time (e.g., after 30 minutes). It will then try to
free up reclaimable objects by dropping the slab caches every now and
then (e.g., every 30 minutes) as long as necessary. There will be a way to
disable this feature and info messages will be logged.
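
The retry policy described above could look roughly like the following user-space sketch. It is not the virtio-mem implementation: the struct, the function name and the constants are hypothetical, and the two 30-minute values are simply the examples given in the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Example values from the commit message; all names here are hypothetical. */
#define UNPLUG_STALL_SECS	(30 * 60)	/* unplug has been failing this long */
#define DROP_INTERVAL_SECS	(30 * 60)	/* minimum gap between two drops */

struct unplug_state {
	bool drop_enabled;	/* the "way to disable this feature" */
	bool failing;		/* are unplug requests currently failing? */
	uint64_t first_failure;	/* start of the current failure streak (secs) */
	bool dropped_before;	/* did we already drop during this streak? */
	uint64_t last_drop;	/* time of the most recent drop (secs) */
};

/*
 * Decide whether the slab caches should be dropped at time "now".
 * Assumes now >= first_failure and now >= last_drop.
 */
bool should_drop_slab(const struct unplug_state *s, uint64_t now)
{
	if (!s->drop_enabled || !s->failing)
		return false;
	/* Only act once unplug has been stuck for quite some time. */
	if (now - s->first_failure < UNPLUG_STALL_SECS)
		return false;
	/* Afterwards, drop at most once per interval. */
	if (s->dropped_before && now - s->last_drop < DROP_INTERVAL_SECS)
		return false;
	return true;
}
```

A caller would update dropped_before and last_drop after each drop and reset the streak once unplug makes progress again.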

In the future, we want to have a drop_slab_range() functionality
instead. Memory offlining code has similar demands and also other
alloc_contig_range() users (e.g., gigantic pages) could make good use of
this feature. Adding it, however, requires more work/thought.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/mm.h | 4 ++--
 mm/vmscan.c        | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

Comments

Michal Hocko Feb. 25, 2020, 2:58 p.m. UTC | #1
On Thu 12-12-19 18:11:36, David Hildenbrand wrote:
> We already have a way to trigger reclaiming of all reclaimable slab objects
> from user space (echo 2 > /proc/sys/vm/drop_caches). Let's allow drivers
> to also trigger this when they really want to make progress and know what
> they are doing.

I cannot say I am a fan of this. This is a global action with a
user-visible performance impact. I am worried that we will find out that
all sorts of drivers have a very good idea that dropping slab caches is
going to help their problem, whatever it is. We have seen the same pattern
in userspace already, and that is the reason we log each use and count
invocations in a counter.

> virtio-mem wants to use these functions when it failed to unplug memory
> for quite some time (e.g., after 30 minutes). It will then try to
> free up reclaimable objects by dropping the slab caches every now and
> then (e.g., every 30 minutes) as long as necessary. There will be a way to
> disable this feature and info messages will be logged.
> 
> In the future, we want to have a drop_slab_range() functionality
> instead. Memory offlining code has similar demands and also other
> alloc_contig_range() users (e.g., gigantic pages) could make good use of
> this feature. Adding it, however, requires more work/thought.

We already have memory_notify(MEM_GOING_OFFLINE) for that purpose,
and the slab allocator implements a callback (slab_mem_going_offline_callback).
The callback is quite dumb: it doesn't really try to free objects
from the given memory range, or even try to drop active objects, which
might turn out to be hard. But this sounds like a more robust way to
achieve what you want.
 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  include/linux/mm.h | 4 ++--
>  mm/vmscan.c        | 2 ++
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 64799c5cb39f..483300f58be8 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2706,8 +2706,8 @@ int drop_caches_sysctl_handler(struct ctl_table *, int,
>  					void __user *, size_t *, loff_t *);
>  #endif
>  
> -void drop_slab(void);
> -void drop_slab_node(int nid);
> +extern void drop_slab(void);
> +extern void drop_slab_node(int nid);
>  
>  #ifndef CONFIG_MMU
>  #define randomize_va_space 0
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c3e53502a84a..4e1cdaaec5e6 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -719,6 +719,7 @@ void drop_slab_node(int nid)
>  		} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
>  	} while (freed > 10);
>  }
> +EXPORT_SYMBOL(drop_slab_node);
>  
>  void drop_slab(void)
>  {
> @@ -728,6 +729,7 @@ void drop_slab(void)
>  		drop_slab_node(nid);
>  	count_vm_event(DROP_SLAB);
>  }
> +EXPORT_SYMBOL(drop_slab);
>  
>  static inline int is_page_cache_freeable(struct page *page)
>  {
> -- 
> 2.23.0
David Hildenbrand Feb. 25, 2020, 3:09 p.m. UTC | #2
On 25.02.20 15:58, Michal Hocko wrote:
> On Thu 12-12-19 18:11:36, David Hildenbrand wrote:
>> We already have a way to trigger reclaiming of all reclaimable slab objects
>> from user space (echo 2 > /proc/sys/vm/drop_caches). Let's allow drivers
>> to also trigger this when they really want to make progress and know what
>> they are doing.
> 
> I cannot say I would be fan of this. This is a global action with user
> visible performance impact. I am worried that we will find out that all
> sorts of drivers have a very good idea that dropping slab caches is
> going to help their problem whatever it is. We have seen the same pattern
> in the userspace already and that is the reason we are logging the usage
> to the log and count invocations in the counter.

Yeah, I decided to hold back patches 11-13 for the v1 (which I am planning
to post in March after more testing). What we really want is to make
memory offlining and alloc_contig_range() work better with reclaimable
objects.

> 
>> virtio-mem wants to use these functions when it failed to unplug memory
>> for quite some time (e.g., after 30 minutes). It will then try to
>> free up reclaimable objects by dropping the slab caches every now and
>> then (e.g., every 30 minutes) as long as necessary. There will be a way to
>> disable this feature and info messages will be logged.
>>
>> In the future, we want to have a drop_slab_range() functionality
>> instead. Memory offlining code has similar demands and also other
>> alloc_contig_range() users (e.g., gigantic pages) could make good use of
>> this feature. Adding it, however, requires more work/thought.
> 
> We already do have a memory_notify(MEM_GOING_OFFLINE) for that purpose
> and slab allocator implements a callback (slab_mem_going_offline_callback).
> The callback is quite dumb and it doesn't really try to free objects
> from the given memory range or even try to drop active objects which
> might turn out to be hard but this sounds like a more robust way to
> achieve what you want.

Two things:

1. memory_notify(MEM_GOING_OFFLINE) is called after trying to isolate
the page range and checking if we only have movable pages. Won't help
much I guess.

2. alloc_contig_range() won't benefit from that.

Something like drop_slab_range() would be better, called from the right
spots in the core (e.g., set_migratetype_isolate(), see below).

In particular, have a look at mm/page_isolation.c:set_migratetype_isolate():

"FIXME: Now, memory hotplug doesn't call shrink_slab() by itself. We
just check MOVABLE pages."
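
The ordering problem in point 1 can be illustrated with a toy user-space model. This is not kernel code: every name below is hypothetical, drop_slab_range() does not exist yet, and the model optimistically assumes that every reclaimable slab page in the range can actually be freed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy page states; real kernel page types are far richer. */
enum page_kind { PAGE_FREE, PAGE_MOVABLE, PAGE_SLAB_RECLAIMABLE };

/* Toy isolation check: any slab page makes the range non-isolatable. */
bool range_isolatable(const enum page_kind *pages, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (pages[i] == PAGE_SLAB_RECLAIMABLE)
			return false;
	return true;
}

/* Stand-in for a hypothetical drop_slab_range(): free slab pages in range. */
void toy_drop_slab_range(enum page_kind *pages, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (pages[i] == PAGE_SLAB_RECLAIMABLE)
			pages[i] = PAGE_FREE;
}

/* Current order: isolate first, MEM_GOING_OFFLINE-style callback second. */
bool toy_offline_notifier_after(enum page_kind *pages, size_t n)
{
	if (!range_isolatable(pages, n))
		return false;	/* the callback never gets a chance to run */
	toy_drop_slab_range(pages, n);
	return true;
}

/* Suggested order: reclaim in the range as part of the isolation attempt. */
bool toy_offline_reclaim_first(enum page_kind *pages, size_t n)
{
	toy_drop_slab_range(pages, n);
	return range_isolatable(pages, n);
}
```

In this model a range containing a single reclaimable slab page fails with the first ordering and succeeds with the second. It says nothing about the races with other page isolation users that make a real reordering hard.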
Michal Hocko Feb. 25, 2020, 5:06 p.m. UTC | #3
On Tue 25-02-20 16:09:29, David Hildenbrand wrote:
> On 25.02.20 15:58, Michal Hocko wrote:
> > On Thu 12-12-19 18:11:36, David Hildenbrand wrote:
> >> We already have a way to trigger reclaiming of all reclaimable slab objects
> >> from user space (echo 2 > /proc/sys/vm/drop_caches). Let's allow drivers
> >> to also trigger this when they really want to make progress and know what
> >> they are doing.
> > 
> > I cannot say I would be fan of this. This is a global action with user
> > visible performance impact. I am worried that we will find out that all
> > sorts of drivers have a very good idea that dropping slab caches is
> > going to help their problem whatever it is. We have seen the same pattern
> > in the userspace already and that is the reason we are logging the usage
> > to the log and count invocations in the counter.
> 
> Yeah, I decided to hold back patch 11-13 for the v1 (which I am planning
> to post in March after more testing). What we really want is to make
> memory offlining and alloc_contig_range() work better with reclaimable
> objects.
> 
> > 
> >> virtio-mem wants to use these functions when it failed to unplug memory
> >> for quite some time (e.g., after 30 minutes). It will then try to
> >> free up reclaimable objects by dropping the slab caches every now and
> >> then (e.g., every 30 minutes) as long as necessary. There will be a way to
> >> disable this feature and info messages will be logged.
> >>
> >> In the future, we want to have a drop_slab_range() functionality
> >> instead. Memory offlining code has similar demands and also other
> >> alloc_contig_range() users (e.g., gigantic pages) could make good use of
> >> this feature. Adding it, however, requires more work/thought.
> > 
> > We already do have a memory_notify(MEM_GOING_OFFLINE) for that purpose
> > and slab allocator implements a callback (slab_mem_going_offline_callback).
> > The callback is quite dumb and it doesn't really try to free objects
> > from the given memory range or even try to drop active objects which
> > might turn out to be hard but this sounds like a more robust way to
> > achieve what you want.
> 
> Two things:
> 
> 1. memory_notify(MEM_GOING_OFFLINE) is called after trying to isolate
> the page range and checking if we only have movable pages. Won't help
> much I guess.

You are right, I have missed that. Can we reorder those two calls?

> 2. alloc_contig_range() won't benefit from that.

True.

> Something like drop_slab_range() would be better, and calling it from
> the right spots in the core (e.g., set_migratetype_isolate() see below).
> 
> Especially, have a look at mm/page_isolation.c:set_migratetype_isolate()
> 
> "FIXME: Now, memory hotplug doesn't call shrink_slab() by itself. We
> just check MOVABLE pages."

shrink_slab() is really a large hammer for this purpose. The notifier
mechanism sounds more appropriate to me. If that means moving it
from its current position, then let's experiment with that.
But it is a long way to per-pfn-range reclaim.
David Hildenbrand Feb. 25, 2020, 5:23 p.m. UTC | #4
On 25.02.20 18:06, Michal Hocko wrote:
> On Tue 25-02-20 16:09:29, David Hildenbrand wrote:
>> On 25.02.20 15:58, Michal Hocko wrote:
>>> On Thu 12-12-19 18:11:36, David Hildenbrand wrote:
>>>> We already have a way to trigger reclaiming of all reclaimable slab objects
>>>> from user space (echo 2 > /proc/sys/vm/drop_caches). Let's allow drivers
>>>> to also trigger this when they really want to make progress and know what
>>>> they are doing.
>>>
>>> I cannot say I would be fan of this. This is a global action with user
>>> visible performance impact. I am worried that we will find out that all
>>> sorts of drivers have a very good idea that dropping slab caches is
>>> going to help their problem whatever it is. We have seen the same pattern
>>> in the userspace already and that is the reason we are logging the usage
>>> to the log and count invocations in the counter.
>>
>> Yeah, I decided to hold back patch 11-13 for the v1 (which I am planning
>> to post in March after more testing). What we really want is to make
>> memory offlining and alloc_contig_range() work better with reclaimable
>> objects.
>>
>>>
>>>> virtio-mem wants to use these functions when it failed to unplug memory
>>>> for quite some time (e.g., after 30 minutes). It will then try to
>>>> free up reclaimable objects by dropping the slab caches every now and
>>>> then (e.g., every 30 minutes) as long as necessary. There will be a way to
>>>> disable this feature and info messages will be logged.
>>>>
>>>> In the future, we want to have a drop_slab_range() functionality
>>>> instead. Memory offlining code has similar demands and also other
>>>> alloc_contig_range() users (e.g., gigantic pages) could make good use of
>>>> this feature. Adding it, however, requires more work/thought.
>>>
>>> We already do have a memory_notify(MEM_GOING_OFFLINE) for that purpose
>>> and slab allocator implements a callback (slab_mem_going_offline_callback).
>>> The callback is quite dumb and it doesn't really try to free objects
>>> from the given memory range or even try to drop active objects which
>>> might turn out to be hard but this sounds like a more robust way to
>>> achieve what you want.
>>
>> Two things:
>>
>> 1. memory_notify(MEM_GOING_OFFLINE) is called after trying to isolate
>> the page range and checking if we only have movable pages. Won't help
>> much I guess.
> 
> You are right, I have missed that. Can we reorder those two calls?

AFAIK no (I would have to look up the details, but there was a good reason
for the order, e.g., to avoid races with other users of page isolation like
alloc_contig_range()).

In particular, "[PATCH RFC v4 06/13] mm: Allow to offline unmovable
PageOffline() pages via MEM_GOING_OFFLINE" (which is still impatiently
waiting for an ACK ;) ) also works around that ordering issue in a way
we discussed back then.

Patch

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 64799c5cb39f..483300f58be8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2706,8 +2706,8 @@  int drop_caches_sysctl_handler(struct ctl_table *, int,
 					void __user *, size_t *, loff_t *);
 #endif
 
-void drop_slab(void);
-void drop_slab_node(int nid);
+extern void drop_slab(void);
+extern void drop_slab_node(int nid);
 
 #ifndef CONFIG_MMU
 #define randomize_va_space 0
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c3e53502a84a..4e1cdaaec5e6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -719,6 +719,7 @@  void drop_slab_node(int nid)
 		} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
 	} while (freed > 10);
 }
+EXPORT_SYMBOL(drop_slab_node);
 
 void drop_slab(void)
 {
@@ -728,6 +729,7 @@  void drop_slab(void)
 		drop_slab_node(nid);
 	count_vm_event(DROP_SLAB);
 }
+EXPORT_SYMBOL(drop_slab);
 
 static inline int is_page_cache_freeable(struct page *page)
 {