
[v8,03/17] mm: Assign id to every memcg-aware shrinker

Message ID 153063054586.1818.6041047871606697364.stgit@localhost.localdomain (mailing list archive)
State New, archived

Commit Message

Kirill Tkhai July 3, 2018, 3:09 p.m. UTC
The patch introduces shrinker::id, a number used to enumerate
memcg-aware shrinkers. Numbering starts from 0, and the code tries
to keep the IDs as small as possible.

This will be used to represent a memcg-aware shrinker in the memcg
shrinkers map.

Since all memcg-aware shrinkers are based on list_lru, which is per-memcg
in case of CONFIG_MEMCG_KMEM only, the new functionality will be under
this config option.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Tested-by: Shakeel Butt <shakeelb@google.com>
---
 include/linux/shrinker.h |    4 +++
 mm/vmscan.c              |   62 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+)

Comments

Matthew Wilcox (Oracle) July 3, 2018, 3:27 p.m. UTC | #1
On Tue, Jul 03, 2018 at 06:09:05PM +0300, Kirill Tkhai wrote:
> +++ b/mm/vmscan.c
> @@ -169,6 +169,49 @@ unsigned long vm_total_pages;
>  static LIST_HEAD(shrinker_list);
>  static DECLARE_RWSEM(shrinker_rwsem);
>  
> +#ifdef CONFIG_MEMCG_KMEM
> +static DEFINE_IDR(shrinker_idr);
> +static int shrinker_nr_max;

So ... we've now got a list_head (shrinker_list) which contains all of
the shrinkers, plus a shrinker_idr which contains the memcg-aware shrinkers?

Why not replace the shrinker_list with the shrinker_idr?  It's only used
twice in vmscan.c:

void register_shrinker_prepared(struct shrinker *shrinker)
{
        down_write(&shrinker_rwsem);
        list_add_tail(&shrinker->list, &shrinker_list);
        up_write(&shrinker_rwsem);
}

        list_for_each_entry(shrinker, &shrinker_list, list) {
...

The first is simply idr_alloc() and the second is

	idr_for_each_entry(&shrinker_idr, shrinker, id) {

I understand there's a difference between allocating the shrinker's ID and
adding it to the list.  You can do this by calling idr_alloc with NULL
as the pointer, and then using idr_replace() when you want to add the
shrinker to the list.  idr_for_each_entry() skips over NULL entries.

This will actually reduce the size of each shrinker and be more
cache-efficient when calling the shrinkers.  I think we can also get
rid of the shrinker_rwsem eventually, but let's leave it for now.
Shakeel Butt July 3, 2018, 3:46 p.m. UTC | #2
On Tue, Jul 3, 2018 at 8:27 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Jul 03, 2018 at 06:09:05PM +0300, Kirill Tkhai wrote:
> > +++ b/mm/vmscan.c
> > @@ -169,6 +169,49 @@ unsigned long vm_total_pages;
> >  static LIST_HEAD(shrinker_list);
> >  static DECLARE_RWSEM(shrinker_rwsem);
> >
> > +#ifdef CONFIG_MEMCG_KMEM
> > +static DEFINE_IDR(shrinker_idr);
> > +static int shrinker_nr_max;
>
> So ... we've now got a list_head (shrinker_list) which contains all of
> the shrinkers, plus a shrinker_idr which contains the memcg-aware shrinkers?
>
> Why not replace the shrinker_list with the shrinker_idr?  It's only used
> twice in vmscan.c:
>
> void register_shrinker_prepared(struct shrinker *shrinker)
> {
>         down_write(&shrinker_rwsem);
>         list_add_tail(&shrinker->list, &shrinker_list);
>         up_write(&shrinker_rwsem);
> }
>
>         list_for_each_entry(shrinker, &shrinker_list, list) {
> ...
>
> The first is simply idr_alloc() and the second is
>
>         idr_for_each_entry(&shrinker_idr, shrinker, id) {
>
> I understand there's a difference between allocating the shrinker's ID and
> adding it to the list.  You can do this by calling idr_alloc with NULL
> as the pointer, and then using idr_replace() when you want to add the
> shrinker to the list.  idr_for_each_entry() skips over NULL entries.
>
> This will actually reduce the size of each shrinker and be more
> cache-efficient when calling the shrinkers.  I think we can also get
> rid of the shrinker_rwsem eventually, but let's leave it for now.

Can you explain how you envision shrinker_rwsem can be removed? I am
very much interested in doing that.

thanks,
Shakeel
Kirill Tkhai July 3, 2018, 3:46 p.m. UTC | #3
On 03.07.2018 18:27, Matthew Wilcox wrote:
> On Tue, Jul 03, 2018 at 06:09:05PM +0300, Kirill Tkhai wrote:
>> +++ b/mm/vmscan.c
>> @@ -169,6 +169,49 @@ unsigned long vm_total_pages;
>>  static LIST_HEAD(shrinker_list);
>>  static DECLARE_RWSEM(shrinker_rwsem);
>>  
>> +#ifdef CONFIG_MEMCG_KMEM
>> +static DEFINE_IDR(shrinker_idr);
>> +static int shrinker_nr_max;
> 
> So ... we've now got a list_head (shrinker_list) which contains all of
> the shrinkers, plus a shrinker_idr which contains the memcg-aware shrinkers?
> 
> Why not replace the shrinker_list with the shrinker_idr?  It's only used
> twice in vmscan.c:
> 
> void register_shrinker_prepared(struct shrinker *shrinker)
> {
>         down_write(&shrinker_rwsem);
>         list_add_tail(&shrinker->list, &shrinker_list);
>         up_write(&shrinker_rwsem);
> }
> 
>         list_for_each_entry(shrinker, &shrinker_list, list) {
> ...
> 
> The first is simply idr_alloc() and the second is
> 
> 	idr_for_each_entry(&shrinker_idr, shrinker, id) {
> 
> I understand there's a difference between allocating the shrinker's ID and
> adding it to the list.  You can do this by calling idr_alloc with NULL
> as the pointer, and then using idr_replace() when you want to add the
> shrinker to the list.  idr_for_each_entry() skips over NULL entries.

shrinker_idr now contains only memcg-aware shrinkers, so all bits of a memcg
map may potentially be populated. If memcg-aware and !memcg-aware shrinkers
shared the same ID space, as you suggest, this would increase the size of the
memcg maps, which is bad for memory consumption. So, memcg-aware shrinkers
should have their own IDR and their own numbers. Tricks like allocating big
IDs for !memcg-aware shrinkers look bad to me, since they make the code more
complicated.

> This will actually reduce the size of each shrinker and be more
> cache-efficient when calling the shrinkers.  I think we can also get
> rid of the shrinker_rwsem eventually, but let's leave it for now.

This patchset does not hurt cache efficiency: without the patchset the
situation is already so bad that it's just impossible to talk about cache
efficiency at all, so let's leave lockless iteration etc. for future work.

Kirill
Kirill Tkhai July 3, 2018, 4:17 p.m. UTC | #4
Hi, Shakeel,

On 03.07.2018 18:46, Shakeel Butt wrote:
> On Tue, Jul 3, 2018 at 8:27 AM Matthew Wilcox <willy@infradead.org> wrote:
>>
>> On Tue, Jul 03, 2018 at 06:09:05PM +0300, Kirill Tkhai wrote:
>>> +++ b/mm/vmscan.c
>>> @@ -169,6 +169,49 @@ unsigned long vm_total_pages;
>>>  static LIST_HEAD(shrinker_list);
>>>  static DECLARE_RWSEM(shrinker_rwsem);
>>>
>>> +#ifdef CONFIG_MEMCG_KMEM
>>> +static DEFINE_IDR(shrinker_idr);
>>> +static int shrinker_nr_max;
>>
>> So ... we've now got a list_head (shrinker_list) which contains all of
>> the shrinkers, plus a shrinker_idr which contains the memcg-aware shrinkers?
>>
>> Why not replace the shrinker_list with the shrinker_idr?  It's only used
>> twice in vmscan.c:
>>
>> void register_shrinker_prepared(struct shrinker *shrinker)
>> {
>>         down_write(&shrinker_rwsem);
>>         list_add_tail(&shrinker->list, &shrinker_list);
>>         up_write(&shrinker_rwsem);
>> }
>>
>>         list_for_each_entry(shrinker, &shrinker_list, list) {
>> ...
>>
>> The first is simply idr_alloc() and the second is
>>
>>         idr_for_each_entry(&shrinker_idr, shrinker, id) {
>>
>> I understand there's a difference between allocating the shrinker's ID and
>> adding it to the list.  You can do this by calling idr_alloc with NULL
>> as the pointer, and then using idr_replace() when you want to add the
>> shrinker to the list.  idr_for_each_entry() skips over NULL entries.
>>
>> This will actually reduce the size of each shrinker and be more
>> cache-efficient when calling the shrinkers.  I think we can also get
>> rid of the shrinker_rwsem eventually, but let's leave it for now.
> 
> Can you explain how you envision shrinker_rwsem can be removed? I am
> very much interested in doing that.

Have you tried playing some games with SRCU? It looks like we just need to
teach count_objects() and scan_objects() to work with semi-destructed
shrinkers. Though, it looks like this would make it impossible to introduce
shrinkers which do synchronize_srcu() in scan_objects(), for example.
Not sure anyone will actually use that, so it may be acceptable
as a limitation.

Kirill
Shakeel Butt July 3, 2018, 5 p.m. UTC | #5
On Tue, Jul 3, 2018 at 9:17 AM Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
>
> Hi, Shakeel,
>
> On 03.07.2018 18:46, Shakeel Butt wrote:
> > On Tue, Jul 3, 2018 at 8:27 AM Matthew Wilcox <willy@infradead.org> wrote:
> >>
> >> On Tue, Jul 03, 2018 at 06:09:05PM +0300, Kirill Tkhai wrote:
> >>> +++ b/mm/vmscan.c
> >>> @@ -169,6 +169,49 @@ unsigned long vm_total_pages;
> >>>  static LIST_HEAD(shrinker_list);
> >>>  static DECLARE_RWSEM(shrinker_rwsem);
> >>>
> >>> +#ifdef CONFIG_MEMCG_KMEM
> >>> +static DEFINE_IDR(shrinker_idr);
> >>> +static int shrinker_nr_max;
> >>
> >> So ... we've now got a list_head (shrinker_list) which contains all of
> >> the shrinkers, plus a shrinker_idr which contains the memcg-aware shrinkers?
> >>
> >> Why not replace the shrinker_list with the shrinker_idr?  It's only used
> >> twice in vmscan.c:
> >>
> >> void register_shrinker_prepared(struct shrinker *shrinker)
> >> {
> >>         down_write(&shrinker_rwsem);
> >>         list_add_tail(&shrinker->list, &shrinker_list);
> >>         up_write(&shrinker_rwsem);
> >> }
> >>
> >>         list_for_each_entry(shrinker, &shrinker_list, list) {
> >> ...
> >>
> >> The first is simply idr_alloc() and the second is
> >>
> >>         idr_for_each_entry(&shrinker_idr, shrinker, id) {
> >>
> >> I understand there's a difference between allocating the shrinker's ID and
> >> adding it to the list.  You can do this by calling idr_alloc with NULL
> >> as the pointer, and then using idr_replace() when you want to add the
> >> shrinker to the list.  idr_for_each_entry() skips over NULL entries.
> >>
> >> This will actually reduce the size of each shrinker and be more
> >> cache-efficient when calling the shrinkers.  I think we can also get
> >> rid of the shrinker_rwsem eventually, but let's leave it for now.
> >
> > Can you explain how you envision shrinker_rwsem can be removed? I am
> > very much interested in doing that.
>
> Have you tried to do some games with SRCU? It looks like we just need to
> teach count_objects() and scan_objects() to work with semi-destructed
> shrinkers. Though, this looks this will make impossible to introduce
> shrinkers, which do synchronize_srcu() in scan_objects() for example.
> Not sure, someone will actually use this, and this is possible to consider
> as limitation.
>

Hi Kirill, I tried SRCU and the discussion is at
https://lore.kernel.org/lkml/20171117173521.GA21692@infradead.org/T/#u

Paul E. McKenney suggested to enable SRCU unconditionally. So, to use
SRCU for shrinkers, we first have to push unconditional SRCU.

Tetsuo had another lockless solution which was a bit involved but does
not depend on SRCU.

thanks,
Shakeel
Kirill Tkhai July 3, 2018, 5:32 p.m. UTC | #6
On 03.07.2018 20:00, Shakeel Butt wrote:
> On Tue, Jul 3, 2018 at 9:17 AM Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
>>
>> Hi, Shakeel,
>>
>> On 03.07.2018 18:46, Shakeel Butt wrote:
>>> On Tue, Jul 3, 2018 at 8:27 AM Matthew Wilcox <willy@infradead.org> wrote:
>>>>
>>>> On Tue, Jul 03, 2018 at 06:09:05PM +0300, Kirill Tkhai wrote:
>>>>> +++ b/mm/vmscan.c
>>>>> @@ -169,6 +169,49 @@ unsigned long vm_total_pages;
>>>>>  static LIST_HEAD(shrinker_list);
>>>>>  static DECLARE_RWSEM(shrinker_rwsem);
>>>>>
>>>>> +#ifdef CONFIG_MEMCG_KMEM
>>>>> +static DEFINE_IDR(shrinker_idr);
>>>>> +static int shrinker_nr_max;
>>>>
>>>> So ... we've now got a list_head (shrinker_list) which contains all of
>>>> the shrinkers, plus a shrinker_idr which contains the memcg-aware shrinkers?
>>>>
>>>> Why not replace the shrinker_list with the shrinker_idr?  It's only used
>>>> twice in vmscan.c:
>>>>
>>>> void register_shrinker_prepared(struct shrinker *shrinker)
>>>> {
>>>>         down_write(&shrinker_rwsem);
>>>>         list_add_tail(&shrinker->list, &shrinker_list);
>>>>         up_write(&shrinker_rwsem);
>>>> }
>>>>
>>>>         list_for_each_entry(shrinker, &shrinker_list, list) {
>>>> ...
>>>>
>>>> The first is simply idr_alloc() and the second is
>>>>
>>>>         idr_for_each_entry(&shrinker_idr, shrinker, id) {
>>>>
>>>> I understand there's a difference between allocating the shrinker's ID and
>>>> adding it to the list.  You can do this by calling idr_alloc with NULL
>>>> as the pointer, and then using idr_replace() when you want to add the
>>>> shrinker to the list.  idr_for_each_entry() skips over NULL entries.
>>>>
>>>> This will actually reduce the size of each shrinker and be more
>>>> cache-efficient when calling the shrinkers.  I think we can also get
>>>> rid of the shrinker_rwsem eventually, but let's leave it for now.
>>>
>>> Can you explain how you envision shrinker_rwsem can be removed? I am
>>> very much interested in doing that.
>>
>> Have you tried to do some games with SRCU? It looks like we just need to
>> teach count_objects() and scan_objects() to work with semi-destructed
>> shrinkers. Though, this looks this will make impossible to introduce
>> shrinkers, which do synchronize_srcu() in scan_objects() for example.
>> Not sure, someone will actually use this, and this is possible to consider
>> as limitation.
>>
> 
> Hi Kirill, I tried SRCU and the discussion is at
> https://lore.kernel.org/lkml/20171117173521.GA21692@infradead.org/T/#u
> 
> Paul E. McKenney suggested to enable SRCU unconditionally. So, to use
> SRCU for shrinkers, we first have to push unconditional SRCU.

When I first read this, I thought the talk was about some new srcu_read_lock()
without an argument and a need to rework SRCU in some huge way. Thank
god, it was just a misreading :)

> Tetsuo had another lockless solution which was a bit involved but does
> not depend on SRCU.

Ok, I see the refcounting suggestion. Thanks for the link, Shakeel!

Kirill
Matthew Wilcox (Oracle) July 3, 2018, 5:47 p.m. UTC | #7
On Tue, Jul 03, 2018 at 08:46:28AM -0700, Shakeel Butt wrote:
> On Tue, Jul 3, 2018 at 8:27 AM Matthew Wilcox <willy@infradead.org> wrote:
> > This will actually reduce the size of each shrinker and be more
> > cache-efficient when calling the shrinkers.  I think we can also get
> > rid of the shrinker_rwsem eventually, but let's leave it for now.
> 
> Can you explain how you envision shrinker_rwsem can be removed? I am
> very much interested in doing that.

Sure.  Right now we have 3 uses of shrinker_rwsem -- two for adding and
removing shrinkers to the list and one for walking the list.  If we switch
to an IDR then we can use a spinlock for adding/removing shrinkers and
the RCU read lock for looking up an entry in the IDR each iteration of
the loop.

We'd need to stop the shrinker from disappearing underneath us while we
drop the RCU lock, so we'd need a refcount in the shrinker, and to free
the shrinkers using RCU.  We do similar things for other data structures,
so this is all pretty well understood.
Matthew Wilcox (Oracle) July 3, 2018, 5:58 p.m. UTC | #8
On Tue, Jul 03, 2018 at 06:46:57PM +0300, Kirill Tkhai wrote:
> shrinker_idr now contains only memcg-aware shrinkers, so all bits from memcg map
> may be potentially populated. In case of memcg-aware shrinkers and !memcg-aware
> shrinkers share the same numbers like you suggest, this will lead to increasing
> size of memcg maps, which is bad for memory consumption. So, memcg-aware shrinkers
> should to have its own IDR and its own numbers. The tricks like allocation big
> IDs for !memcg-aware shrinkers seem bad for me, since they make the code more
> complicated.

Do we really have so very many !memcg-aware shrinkers?

$ git grep -w register_shrinker |wc
     32     119    2221
$ git grep -w register_shrinker_prepared |wc
      4      13     268
(that's an overstatement; one of those is the declaration, one the definition,
and one an internal call, so we actually only have one caller of _prepared).

So it looks to me like your average system has one shrinker per
filesystem, one per graphics card, one per raid5 device, and a few
miscellaneous.  I'd be shocked if anybody had more than 100 shrinkers
registered on their laptop.

I think we should err on the side of simplicity and just have one IDR for
every shrinker instead of playing games to solve a theoretical problem.

> > This will actually reduce the size of each shrinker and be more
> > cache-efficient when calling the shrinkers.  I think we can also get
> > rid of the shrinker_rwsem eventually, but let's leave it for now.
> 
> This patchset does not make the cache-efficient bad, since without the patchset the situation
> is so bad, that it's just impossible to talk about the cache efficiently,
> so let's leave lockless iteration/etc for the future works.

The situation is that bad /for your use case/.  Not so much for others.
You're introducing additional complexity here, and it'd be nice if we
can remove some of the complexity that's already there.
Kirill Tkhai July 3, 2018, 7:12 p.m. UTC | #9
On 03.07.2018 20:58, Matthew Wilcox wrote:
> On Tue, Jul 03, 2018 at 06:46:57PM +0300, Kirill Tkhai wrote:
>> shrinker_idr now contains only memcg-aware shrinkers, so all bits from memcg map
>> may be potentially populated. In case of memcg-aware shrinkers and !memcg-aware
>> shrinkers share the same numbers like you suggest, this will lead to increasing
>> size of memcg maps, which is bad for memory consumption. So, memcg-aware shrinkers
>> should to have its own IDR and its own numbers. The tricks like allocation big
>> IDs for !memcg-aware shrinkers seem bad for me, since they make the code more
>> complicated.
> 
> Do we really have so very many !memcg-aware shrinkers?
> 
> $ git grep -w register_shrinker |wc
>      32     119    2221
> $ git grep -w register_shrinker_prepared |wc
>       4      13     268
> (that's an overstatement; one of those is the declaration, one the definition,
> and one an internal call, so we actually only have one caller of _prepared).
> 
> So it looks to me like your average system has one shrinker per
> filesystem, one per graphics card, one per raid5 device, and a few
> miscellaneous.  I'd be shocked if anybody had more than 100 shrinkers
> registered on their laptop.
> 
> I think we should err on the side of simiplicity and just have one IDR for
> every shrinker instead of playing games to solve a theoretical problem.

It's just a standard situation for systems with many containers. Every mount
introduces a new shrinker to the system, so it's easy to see a system with
100 or even 1000 shrinkers. AFAIR, Shakeel said he also has similar
configurations.

So, this problem is not theoretical; it's just a standard situation
for active consumers of Docker etc.

>>> This will actually reduce the size of each shrinker and be more
>>> cache-efficient when calling the shrinkers.  I think we can also get
>>> rid of the shrinker_rwsem eventually, but let's leave it for now.
>>
>> This patchset does not make the cache-efficient bad, since without the patchset the situation
>> is so bad, that it's just impossible to talk about the cache efficiently,
>> so let's leave lockless iteration/etc for the future works.
> 
> The situation is that bad /for your use case/.  Not so much for others.
> You're introducing additional complexity here, and it'd be nice if we
> can remove some of the complexity that's already there.

You started from cache efficiency, and now you've moved to existing complexity.
I did some cleanups in this patchset, and there is also Vladimir's patch, which
simplifies the shrinker logic. Also, there are already 17 patches.
Which existing complexity do you want to remove? I don't think there is any
existing complexity directly connected to this patchset.
Shakeel Butt July 3, 2018, 7:19 p.m. UTC | #10
On Tue, Jul 3, 2018 at 12:13 PM Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
>
> On 03.07.2018 20:58, Matthew Wilcox wrote:
> > On Tue, Jul 03, 2018 at 06:46:57PM +0300, Kirill Tkhai wrote:
> >> shrinker_idr now contains only memcg-aware shrinkers, so all bits from memcg map
> >> may be potentially populated. In case of memcg-aware shrinkers and !memcg-aware
> >> shrinkers share the same numbers like you suggest, this will lead to increasing
> >> size of memcg maps, which is bad for memory consumption. So, memcg-aware shrinkers
> >> should to have its own IDR and its own numbers. The tricks like allocation big
> >> IDs for !memcg-aware shrinkers seem bad for me, since they make the code more
> >> complicated.
> >
> > Do we really have so very many !memcg-aware shrinkers?
> >
> > $ git grep -w register_shrinker |wc
> >      32     119    2221
> > $ git grep -w register_shrinker_prepared |wc
> >       4      13     268
> > (that's an overstatement; one of those is the declaration, one the definition,
> > and one an internal call, so we actually only have one caller of _prepared).
> >
> > So it looks to me like your average system has one shrinker per
> > filesystem, one per graphics card, one per raid5 device, and a few
> > miscellaneous.  I'd be shocked if anybody had more than 100 shrinkers
> > registered on their laptop.
> >
> > I think we should err on the side of simiplicity and just have one IDR for
> > every shrinker instead of playing games to solve a theoretical problem.
>
> It just a standard situation for the systems with many containers. Every mount
> introduce a new shrinker to the system, so it's easy to see a system with
> 100 or ever 1000 shrinkers. AFAIR, Shakeel said he also has the similar
> configurations.
>

I can say on our production systems, a couple thousand shrinkers is normal.

Shakeel
Matthew Wilcox (Oracle) July 3, 2018, 7:25 p.m. UTC | #11
On Tue, Jul 03, 2018 at 12:19:35PM -0700, Shakeel Butt wrote:
> On Tue, Jul 3, 2018 at 12:13 PM Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
> > > Do we really have so very many !memcg-aware shrinkers?
> > >
> > > $ git grep -w register_shrinker |wc
> > >      32     119    2221
> > > $ git grep -w register_shrinker_prepared |wc
> > >       4      13     268
> > > (that's an overstatement; one of those is the declaration, one the definition,
> > > and one an internal call, so we actually only have one caller of _prepared).
> > >
> > > So it looks to me like your average system has one shrinker per
> > > filesystem, one per graphics card, one per raid5 device, and a few
> > > miscellaneous.  I'd be shocked if anybody had more than 100 shrinkers
> > > registered on their laptop.
> > >
> > > I think we should err on the side of simiplicity and just have one IDR for
> > > every shrinker instead of playing games to solve a theoretical problem.
> >
> > It just a standard situation for the systems with many containers. Every mount
> > introduce a new shrinker to the system, so it's easy to see a system with
> > 100 or ever 1000 shrinkers. AFAIR, Shakeel said he also has the similar
> > configurations.
> >
> 
> I can say on our production systems, a couple thousand shrinkers is normal.

But how many are !memcg-aware?  It sounds to me like almost all of the
shrinkers come through the sget_userns() caller, so the other shrinkers
are almost irrelevant.
Shakeel Butt July 3, 2018, 7:54 p.m. UTC | #12
On Tue, Jul 3, 2018 at 12:25 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Jul 03, 2018 at 12:19:35PM -0700, Shakeel Butt wrote:
> > On Tue, Jul 3, 2018 at 12:13 PM Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
> > > > Do we really have so very many !memcg-aware shrinkers?
> > > >
> > > > $ git grep -w register_shrinker |wc
> > > >      32     119    2221
> > > > $ git grep -w register_shrinker_prepared |wc
> > > >       4      13     268
> > > > (that's an overstatement; one of those is the declaration, one the definition,
> > > > and one an internal call, so we actually only have one caller of _prepared).
> > > >
> > > > So it looks to me like your average system has one shrinker per
> > > > filesystem, one per graphics card, one per raid5 device, and a few
> > > > miscellaneous.  I'd be shocked if anybody had more than 100 shrinkers
> > > > registered on their laptop.
> > > >
> > > > I think we should err on the side of simiplicity and just have one IDR for
> > > > every shrinker instead of playing games to solve a theoretical problem.
> > >
> > > It just a standard situation for the systems with many containers. Every mount
> > > introduce a new shrinker to the system, so it's easy to see a system with
> > > 100 or ever 1000 shrinkers. AFAIR, Shakeel said he also has the similar
> > > configurations.
> > >
> >
> > I can say on our production systems, a couple thousand shrinkers is normal.
>
> But how many are !memcg aware?  It sounds to me like almost all of the
> shrinkers come through the sget_userns() caller, so the other shrinkers
> are almost irrelevant.

I would say almost half. Sorry, I do not have exact numbers. Basically
we use ext4 very extensively and the majority of shrinkers are related to
ext4 (again, I do not have exact numbers). One ext4 mount typically
registers three shrinkers: one memcg-aware (sget) and two non-memcg-aware
(ext4_es_register_shrinker, ext4_xattr_create_cache).

Shakeel
Al Viro July 3, 2018, 8:39 p.m. UTC | #13
On Tue, Jul 03, 2018 at 10:47:44AM -0700, Matthew Wilcox wrote:
> On Tue, Jul 03, 2018 at 08:46:28AM -0700, Shakeel Butt wrote:
> > On Tue, Jul 3, 2018 at 8:27 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > This will actually reduce the size of each shrinker and be more
> > > cache-efficient when calling the shrinkers.  I think we can also get
> > > rid of the shrinker_rwsem eventually, but let's leave it for now.
> > 
> > Can you explain how you envision shrinker_rwsem can be removed? I am
> > very much interested in doing that.
> 
> Sure.  Right now we have 3 uses of shrinker_rwsem -- two for adding and
> removing shrinkers to the list and one for walking the list.  If we switch
> to an IDR then we can use a spinlock for adding/removing shrinkers and
> the RCU read lock for looking up an entry in the IDR each iteration of
> the loop.
> 
> We'd need to stop the shrinker from disappearing underneath us while we
> drop the RCU lock, so we'd need a refcount in the shrinker, and to free
> the shrinkers using RCU.  We do similar things for other data structures,
> so this is all pretty well understood.

<censored>
struct super_block {
...
        struct shrinker s_shrink;       /* per-sb shrinker handle */
...
}
<censored>

What was that about refcount in the shrinker and taking over the lifetime
rules of the objects it might be embedded into, again?

Patch

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 6794490f25b2..7ca9c18cf130 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -66,6 +66,10 @@  struct shrinker {
 
 	/* These are for internal use */
 	struct list_head list;
+#ifdef CONFIG_MEMCG_KMEM
+	/* ID in shrinker_idr */
+	int id;
+#endif
 	/* objs pending delete, per node */
 	atomic_long_t *nr_deferred;
 };
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8d2ffae4db28..f9ca6b57d72f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -169,6 +169,49 @@  unsigned long vm_total_pages;
 static LIST_HEAD(shrinker_list);
 static DECLARE_RWSEM(shrinker_rwsem);
 
+#ifdef CONFIG_MEMCG_KMEM
+static DEFINE_IDR(shrinker_idr);
+static int shrinker_nr_max;
+
+static int prealloc_memcg_shrinker(struct shrinker *shrinker)
+{
+	int id, ret = -ENOMEM;
+
+	down_write(&shrinker_rwsem);
+	id = idr_alloc(&shrinker_idr, shrinker, 0, 0, GFP_KERNEL);
+	if (id < 0)
+		goto unlock;
+
+	if (id >= shrinker_nr_max)
+		shrinker_nr_max = id + 1;
+	shrinker->id = id;
+	ret = 0;
+unlock:
+	up_write(&shrinker_rwsem);
+	return ret;
+}
+
+static void unregister_memcg_shrinker(struct shrinker *shrinker)
+{
+	int id = shrinker->id;
+
+	BUG_ON(id < 0);
+
+	down_write(&shrinker_rwsem);
+	idr_remove(&shrinker_idr, id);
+	up_write(&shrinker_rwsem);
+}
+#else /* CONFIG_MEMCG_KMEM */
+static int prealloc_memcg_shrinker(struct shrinker *shrinker)
+{
+	return 0;
+}
+
+static void unregister_memcg_shrinker(struct shrinker *shrinker)
+{
+}
+#endif /* CONFIG_MEMCG_KMEM */
+
 #ifdef CONFIG_MEMCG
 static bool global_reclaim(struct scan_control *sc)
 {
@@ -313,13 +356,30 @@  int prealloc_shrinker(struct shrinker *shrinker)
 	shrinker->nr_deferred = kzalloc(size, GFP_KERNEL);
 	if (!shrinker->nr_deferred)
 		return -ENOMEM;
+
+	if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
+		if (prealloc_memcg_shrinker(shrinker))
+			goto free_deferred;
+	}
+
 	return 0;
+
+free_deferred:
+	kfree(shrinker->nr_deferred);
+	shrinker->nr_deferred = NULL;
+	return -ENOMEM;
 }
 
 void free_prealloced_shrinker(struct shrinker *shrinker)
 {
+	if (!shrinker->nr_deferred)
+		return;
+
 	kfree(shrinker->nr_deferred);
 	shrinker->nr_deferred = NULL;
+
+	if (shrinker->flags & SHRINKER_MEMCG_AWARE)
+		unregister_memcg_shrinker(shrinker);
 }
 
 void register_shrinker_prepared(struct shrinker *shrinker)
@@ -347,6 +407,8 @@  void unregister_shrinker(struct shrinker *shrinker)
 {
 	if (!shrinker->nr_deferred)
 		return;
+	if (shrinker->flags & SHRINKER_MEMCG_AWARE)
+		unregister_memcg_shrinker(shrinker);
 	down_write(&shrinker_rwsem);
 	list_del(&shrinker->list);
 	up_write(&shrinker_rwsem);