diff mbox series

memcg: page_cgroup_ino() get memcg from compound_head(page)

Message ID 20230313083452.1319968-1-yosryahmed@google.com (mailing list archive)
State New
Headers show
Series memcg: page_cgroup_ino() get memcg from compound_head(page) | expand

Commit Message

Yosry Ahmed March 13, 2023, 8:34 a.m. UTC
From: Hugh Dickins <hughd@google.com>

In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we
observed a warning from page_cgroup_ino() when reading
/proc/kpagecgroup. This warning was added to catch fragile reads of
a page memcg. Make page_cgroup_ino() get memcg from compound_head(page):
that gives it the correct memcg for each subpage of a compound page,
so is the right fix.

I dithered between the right fix and the safer "fix": it's unlikely but
conceivable that some userspace has learnt that /proc/kpagecgroup gives
no memcg on tail pages, and compensates for that in some (racy) way: so
continuing to give no memcg on tails, without warning, might be safer.

But hwpoison_filter_task(), the only other user of page_cgroup_ino(),
persuaded me.  It looks as if it currently leaves out tail pages of the
selected memcg, by mistake: whereas hwpoison_inject() uses compound_head()
and expects the tails to be included.  So hwpoison testing coverage has
probably been restricted by the wrong output from page_cgroup_ino() (if
that memcg filter is used at all): in the short term, it might be safer
not to enable wider coverage there, but long term we would regret that.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---

(Yosry: Alternatively, we could modify page_memcg_check() to do
 page_folio() like its sibling page_memcg(), as page_cgroup_ino() is the
 only remaining caller other than print_page_owner_memcg(); and it already
 excludes pages that have page->memcg_data = 0)

---
 mm/memcontrol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Andrew Morton March 13, 2023, 7:44 p.m. UTC | #1
On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:

> From: Hugh Dickins <hughd@google.com>
> 
> In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we
> observed a warning from page_cgroup_ino() when reading
> /proc/kpagecgroup.

If this is the only known situation in which page_memcg_check() is
passed a tail page, why does page_memcg_check() have

	if (PageTail(page))
		return NULL;

?  Can we remove this to simplify, streamline and clarify?
Yosry Ahmed March 13, 2023, 9:08 p.m. UTC | #2
On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:
>
> > From: Hugh Dickins <hughd@google.com>
> >
> > In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we
> > observed a warning from page_cgroup_ino() when reading
> > /proc/kpagecgroup.
>
> If this is the only known situation in which page_memcg_check() is
> passed a tail page, why does page_memcg_check() have
>
>         if (PageTail(page))
>                 return NULL;
>
> ?  Can we remove this to simplify, streamline and clarify?

I guess it's a safety check so that we don't end up trying to cast a
tail page to a folio. My opinion is to go one step further and change
page_memcg_check() to do return the memcg of the head page, i.e:

static inline struct mem_cgroup *page_memcg_check(struct page *page)
{
    return folio_memcg_check(page_folio(page));
}

This makes it consistent with page_memcg(), and makes sure future
users are getting the "correct" memcg for whatever page they pass in.
I am interested to hear other folks' opinions here.

The only other user today is print_page_owner_memcg(). I am not sure
if it's doing the right thing by explicitly reading page->memcg_data,
but it is already excluding pages that have page->memcg_data == 0,
which should be the case for tail pages.

>
>
Michal Hocko March 14, 2023, 10 a.m. UTC | #3
On Mon 13-03-23 08:34:52, Yosry Ahmed wrote:
> From: Hugh Dickins <hughd@google.com>
> 
> In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we
> observed a warning from page_cgroup_ino() when reading
> /proc/kpagecgroup. This warning was added to catch fragile reads of
> a page memcg. Make page_cgroup_ino() get memcg from compound_head(page):
> that gives it the correct memcg for each subpage of a compound page,
> so is the right fix.
> 
> I dithered between the right fix and the safer "fix": it's unlikely but
> conceivable that some userspace has learnt that /proc/kpagecgroup gives
> no memcg on tail pages, and compensates for that in some (racy) way: so
> continuing to give no memcg on tails, without warning, might be safer.
> 
> But hwpoison_filter_task(), the only other user of page_cgroup_ino(),
> persuaded me.  It looks as if it currently leaves out tail pages of the
> selected memcg, by mistake: whereas hwpoison_inject() uses compound_head()
> and expects the tails to be included.  So hwpoison testing coverage has
> probably been restricted by the wrong output from page_cgroup_ino() (if
> that memcg filter is used at all): in the short term, it might be safer
> not to enable wider coverage there, but long term we would regret that.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
> 
> (Yosry: Alternatively, we could modify page_memcg_check() to do
>  page_folio() like its sibling page_memcg(), as page_cgroup_ino() is the
>  only remaining caller other than print_page_owner_memcg(); and it already
>  excludes pages that have page->memcg_data = 0)
> 
> ---
>  mm/memcontrol.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5abffe6f8389..e3a55295725e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -395,7 +395,7 @@ ino_t page_cgroup_ino(struct page *page)
>  	unsigned long ino = 0;
>  
>  	rcu_read_lock();
> -	memcg = page_memcg_check(page);
> +	memcg = page_memcg_check(compound_head(page));
>  
>  	while (memcg && !(memcg->css.flags & CSS_ONLINE))
>  		memcg = parent_mem_cgroup(memcg);
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
Michal Hocko March 14, 2023, 10:02 a.m. UTC | #4
On Mon 13-03-23 14:08:53, Yosry Ahmed wrote:
> On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton
> <akpm@linux-foundation.org> wrote:
> >
> > On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > > From: Hugh Dickins <hughd@google.com>
> > >
> > > In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we
> > > observed a warning from page_cgroup_ino() when reading
> > > /proc/kpagecgroup.
> >
> > If this is the only known situation in which page_memcg_check() is
> > passed a tail page, why does page_memcg_check() have
> >
> >         if (PageTail(page))
> >                 return NULL;
> >
> > ?  Can we remove this to simplify, streamline and clarify?
> 
> I guess it's a safety check so that we don't end up trying to cast a
> tail page to a folio. My opinion is to go one step further and change
> page_memcg_check() to do return the memcg of the head page, i.e:
> 
> static inline struct mem_cgroup *page_memcg_check(struct page *page)
> {
>     return folio_memcg_check(page_folio(page));
> }

I would just stick with the existing code and put a comment that this
function shouldn't be used in any new code and the folio counterpart
should be used instead.
Yosry Ahmed March 14, 2023, 7:45 p.m. UTC | #5
On Tue, Mar 14, 2023 at 3:02 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 13-03-23 14:08:53, Yosry Ahmed wrote:
> > On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton
> > <akpm@linux-foundation.org> wrote:
> > >
> > > On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:
> > >
> > > > From: Hugh Dickins <hughd@google.com>
> > > >
> > > > In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we
> > > > observed a warning from page_cgroup_ino() when reading
> > > > /proc/kpagecgroup.
> > >
> > > If this is the only known situation in which page_memcg_check() is
> > > passed a tail page, why does page_memcg_check() have
> > >
> > >         if (PageTail(page))
> > >                 return NULL;
> > >
> > > ?  Can we remove this to simplify, streamline and clarify?
> >
> > I guess it's a safety check so that we don't end up trying to cast a
> > tail page to a folio. My opinion is to go one step further and change
> > page_memcg_check() to do return the memcg of the head page, i.e:
> >
> > static inline struct mem_cgroup *page_memcg_check(struct page *page)
> > {
> >     return folio_memcg_check(page_folio(page));
> > }
>
> I would just stick with the existing code and put a comment that this
> function shouldn't be used in any new code and the folio counterpart
> should be used instead.

Would you mind explaining the rationale?

If existing users are not passing in tail pages and we are telling new
users not to use it, what's the point of leaving the PageTail() check?

Is page owner doing the right thing by discounting pages with
page->memcg_data = 0 in print_page_owner_memcg() ? Wouldn't this not
show the memcg of tail pages?

If page owner also needs a compound_head()/ page_folio() call before
checking the page memcg, perhaps we should convert both call sites to
page_memcg_check() to folio_memcg_check() and remove it now?

>
> --
> Michal Hocko
> SUSE Labs
Yosry Ahmed March 14, 2023, 7:46 p.m. UTC | #6
On Tue, Mar 14, 2023 at 12:45 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Tue, Mar 14, 2023 at 3:02 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Mon 13-03-23 14:08:53, Yosry Ahmed wrote:
> > > On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton
> > > <akpm@linux-foundation.org> wrote:
> > > >
> > > > On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:
> > > >
> > > > > From: Hugh Dickins <hughd@google.com>
> > > > >
> > > > > In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we
> > > > > observed a warning from page_cgroup_ino() when reading
> > > > > /proc/kpagecgroup.
> > > >
> > > > If this is the only known situation in which page_memcg_check() is
> > > > passed a tail page, why does page_memcg_check() have
> > > >
> > > >         if (PageTail(page))
> > > >                 return NULL;
> > > >
> > > > ?  Can we remove this to simplify, streamline and clarify?
> > >
> > > I guess it's a safety check so that we don't end up trying to cast a
> > > tail page to a folio. My opinion is to go one step further and change
> > > page_memcg_check() to do return the memcg of the head page, i.e:
> > >
> > > static inline struct mem_cgroup *page_memcg_check(struct page *page)
> > > {
> > >     return folio_memcg_check(page_folio(page));
> > > }
> >
> > I would just stick with the existing code and put a comment that this
> > function shouldn't be used in any new code and the folio counterpart
> > should be used instead.
>
> Would you mind explaining the rationale?
>
> If existing users are not passing in tail pages and we are telling new
> users not to use it, what's the point of leaving the PageTail() check?
>
> Is page owner doing the right thing by discounting pages with
> page->memcg_data = 0 in print_page_owner_memcg() ? Wouldn't this not
> show the memcg of tail pages?
>
> If page owner also needs a compound_head()/ page_folio() call before
> checking the page memcg, perhaps we should convert both call sites to
> page_memcg_check() to folio_memcg_check() and remove it now?

.. or add page_folio() inside page_memcg_check() as suggested initially.

>
> >
> > --
> > Michal Hocko
> > SUSE Labs
Roman Gushchin March 15, 2023, 2:34 a.m. UTC | #7
On Mon, Mar 13, 2023 at 02:08:53PM -0700, Yosry Ahmed wrote:
> On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton
> <akpm@linux-foundation.org> wrote:
> >
> > On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > > From: Hugh Dickins <hughd@google.com>
> > >
> > > In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we
> > > observed a warning from page_cgroup_ino() when reading
> > > /proc/kpagecgroup.
> >
> > If this is the only known situation in which page_memcg_check() is
> > passed a tail page, why does page_memcg_check() have
> >
> >         if (PageTail(page))
> >                 return NULL;
> >
> > ?  Can we remove this to simplify, streamline and clarify?
> 
> I guess it's a safety check so that we don't end up trying to cast a
> tail page to a folio. My opinion is to go one step further and change
> page_memcg_check() to do return the memcg of the head page, i.e:
> 
> static inline struct mem_cgroup *page_memcg_check(struct page *page)
> {
>     return folio_memcg_check(page_folio(page));
> }
> 
> This makes it consistent with page_memcg(), and makes sure future
> users are getting the "correct" memcg for whatever page they pass in.
> I am interested to hear other folks' opinions here.
> 
> The only other user today is print_page_owner_memcg(). I am not sure
> if it's doing the right thing by explicitly reading page->memcg_data,
> but it is already excluding pages that have page->memcg_data == 0,
> which should be the case for tail pages.

Yeah, I think it's a good idea. I'd do this.
If you'll master a patch like this, please, apply my
Acked-by: Roman Gushchin <roman.gushchin@linux.dev> .

I'm ok with the current approach too (the one you posted),
but the one above is preferable.

Overall it would be nice to clarify & document our expectations
from /proc/kpagecgroup (and /proc/kpageflags & /proc/kpagecount)
in the new folio "epoch".

Thanks!
Yosry Ahmed March 15, 2023, 2:37 a.m. UTC | #8
Somehow I was stupid enough to forget CC'ing Matthew :)

+Matthew Wilcox

On Mon, Mar 13, 2023 at 1:34 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> From: Hugh Dickins <hughd@google.com>
>
> In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we
> observed a warning from page_cgroup_ino() when reading
> /proc/kpagecgroup. This warning was added to catch fragile reads of
> a page memcg. Make page_cgroup_ino() get memcg from compound_head(page):
> that gives it the correct memcg for each subpage of a compound page,
> so is the right fix.
>
> I dithered between the right fix and the safer "fix": it's unlikely but
> conceivable that some userspace has learnt that /proc/kpagecgroup gives
> no memcg on tail pages, and compensates for that in some (racy) way: so
> continuing to give no memcg on tails, without warning, might be safer.
>
> But hwpoison_filter_task(), the only other user of page_cgroup_ino(),
> persuaded me.  It looks as if it currently leaves out tail pages of the
> selected memcg, by mistake: whereas hwpoison_inject() uses compound_head()
> and expects the tails to be included.  So hwpoison testing coverage has
> probably been restricted by the wrong output from page_cgroup_ino() (if
> that memcg filter is used at all): in the short term, it might be safer
> not to enable wider coverage there, but long term we would regret that.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
>
> (Yosry: Alternatively, we could modify page_memcg_check() to do
>  page_folio() like its sibling page_memcg(), as page_cgroup_ino() is the
>  only remaining caller other than print_page_owner_memcg(); and it already
>  excludes pages that have page->memcg_data = 0)
>
> ---
>  mm/memcontrol.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5abffe6f8389..e3a55295725e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -395,7 +395,7 @@ ino_t page_cgroup_ino(struct page *page)
>         unsigned long ino = 0;
>
>         rcu_read_lock();
> -       memcg = page_memcg_check(page);
> +       memcg = page_memcg_check(compound_head(page));
>
>         while (memcg && !(memcg->css.flags & CSS_ONLINE))
>                 memcg = parent_mem_cgroup(memcg);
> --
> 2.40.0.rc1.284.g88254d51c5-goog
>
Yosry Ahmed March 15, 2023, 2:39 a.m. UTC | #9
On Tue, Mar 14, 2023 at 7:34 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Mon, Mar 13, 2023 at 02:08:53PM -0700, Yosry Ahmed wrote:
> > On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton
> > <akpm@linux-foundation.org> wrote:
> > >
> > > On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:
> > >
> > > > From: Hugh Dickins <hughd@google.com>
> > > >
> > > > In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we
> > > > observed a warning from page_cgroup_ino() when reading
> > > > /proc/kpagecgroup.
> > >
> > > If this is the only known situation in which page_memcg_check() is
> > > passed a tail page, why does page_memcg_check() have
> > >
> > >         if (PageTail(page))
> > >                 return NULL;
> > >
> > > ?  Can we remove this to simplify, streamline and clarify?
> >
> > I guess it's a safety check so that we don't end up trying to cast a
> > tail page to a folio. My opinion is to go one step further and change
> > page_memcg_check() to do return the memcg of the head page, i.e:
> >
> > static inline struct mem_cgroup *page_memcg_check(struct page *page)
> > {
> >     return folio_memcg_check(page_folio(page));
> > }
> >
> > This makes it consistent with page_memcg(), and makes sure future
> > users are getting the "correct" memcg for whatever page they pass in.
> > I am interested to hear other folks' opinions here.
> >
> > The only other user today is print_page_owner_memcg(). I am not sure
> > if it's doing the right thing by explicitly reading page->memcg_data,
> > but it is already excluding pages that have page->memcg_data == 0,
> > which should be the case for tail pages.
>
> Yeah, I think it's a good idea. I'd do this.
> If you'll master a patch like this, please, apply my
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> .

Thanks, Roman. So far there have been different opinions (specifically
Michal's), so I'll wait until the discussion settles before sending
another patch.

>
> I'm ok with the current approach too (the one you posted),
> but the one above is preferable.
>
> Overall it would be nice to clarify & document our expectations
> from /proc/kpagecgroup (and /proc/kpageflags & /proc/kpagecount)
> in the new folio "epoch".

Agreed, though I don't think I am the right person for this.


>
> Thanks!
Waiman Long March 15, 2023, 3:06 a.m. UTC | #10
On 3/13/23 17:08, Yosry Ahmed wrote:
> On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton
> <akpm@linux-foundation.org> wrote:
>> On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:
>>
>>> From: Hugh Dickins <hughd@google.com>
>>>
>>> In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we
>>> observed a warning from page_cgroup_ino() when reading
>>> /proc/kpagecgroup.
>> If this is the only known situation in which page_memcg_check() is
>> passed a tail page, why does page_memcg_check() have
>>
>>          if (PageTail(page))
>>                  return NULL;
>>
>> ?  Can we remove this to simplify, streamline and clarify?
> I guess it's a safety check so that we don't end up trying to cast a
> tail page to a folio. My opinion is to go one step further and change
> page_memcg_check() to do return the memcg of the head page, i.e:
>
> static inline struct mem_cgroup *page_memcg_check(struct page *page)
> {
>      return folio_memcg_check(page_folio(page));
> }
>
> This makes it consistent with page_memcg(), and makes sure future
> users are getting the "correct" memcg for whatever page they pass in.
> I am interested to hear other folks' opinions here.
>
> The only other user today is print_page_owner_memcg(). I am not sure
> if it's doing the right thing by explicitly reading page->memcg_data,
> but it is already excluding pages that have page->memcg_data == 0,
> which should be the case for tail pages.

It is reading memcg_data directly to see if it is slab cache page. It is 
currently skipping page that does not have memcg_data set.

Cheers,
Longman
Yosry Ahmed March 15, 2023, 3:10 a.m. UTC | #11
On Tue, Mar 14, 2023 at 8:07 PM Waiman Long <longman@redhat.com> wrote:
>
> On 3/13/23 17:08, Yosry Ahmed wrote:
> > On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton
> > <akpm@linux-foundation.org> wrote:
> >> On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:
> >>
> >>> From: Hugh Dickins <hughd@google.com>
> >>>
> >>> In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we
> >>> observed a warning from page_cgroup_ino() when reading
> >>> /proc/kpagecgroup.
> >> If this is the only known situation in which page_memcg_check() is
> >> passed a tail page, why does page_memcg_check() have
> >>
> >>          if (PageTail(page))
> >>                  return NULL;
> >>
> >> ?  Can we remove this to simplify, streamline and clarify?
> > I guess it's a safety check so that we don't end up trying to cast a
> > tail page to a folio. My opinion is to go one step further and change
> > page_memcg_check() to do return the memcg of the head page, i.e:
> >
> > static inline struct mem_cgroup *page_memcg_check(struct page *page)
> > {
> >      return folio_memcg_check(page_folio(page));
> > }
> >
> > This makes it consistent with page_memcg(), and makes sure future
> > users are getting the "correct" memcg for whatever page they pass in.
> > I am interested to hear other folks' opinions here.
> >
> > The only other user today is print_page_owner_memcg(). I am not sure
> > if it's doing the right thing by explicitly reading page->memcg_data,
> > but it is already excluding pages that have page->memcg_data == 0,
> > which should be the case for tail pages.
>
> It is reading memcg_data directly to see if it is slab cache page. It is
> currently skipping page that does not have memcg_data set.

IIUC this skips tail pages, because they should always have
page->memcg_data == 0, even if they are charged to a memcg. To
correctly get their memcg we should read it from the
compound_head()/page_folio().

My 2c, we can check PageSlab() to print the extra message for slab
pages, instead of reading memcg_data directly, which kinda breaks the
abstraction created by the various helpers for reading a page memcg.
Someone can easily change something in how memcg_data is interpreted
in those helpers without realizing that page_owner is also reading it.

>
> Cheers,
> Longman
>
Waiman Long March 15, 2023, 3:33 a.m. UTC | #12
On 3/14/23 23:10, Yosry Ahmed wrote:
>>> The only other user today is print_page_owner_memcg(). I am not sure
>>> if it's doing the right thing by explicitly reading page->memcg_data,
>>> but it is already excluding pages that have page->memcg_data == 0,
>>> which should be the case for tail pages.
>> It is reading memcg_data directly to see if it is slab cache page. It is
>> currently skipping page that does not have memcg_data set.
> IIUC this skips tail pages, because they should always have
> page->memcg_data == 0, even if they are charged to a memcg. To
> correctly get their memcg we should read it from the
> compound_head()/page_folio().
The purpose of that function is mainly to report pages that have a 
reference to a memcg, especially the dead one. So by counting the 
occurrence of a particular cgroup name, we can have a rough idea of 
that. So only head page has relevance here and we can skip the tail pages.
>
> My 2c, we can check PageSlab() to print the extra message for slab
> pages, instead of reading memcg_data directly, which kinda breaks the
> abstraction created by the various helpers for reading a page memcg.
> Someone can easily change something in how memcg_data is interpreted
> in those helpers without realizing that page_owner is also reading it.

You are right. We should be using a helper if available. I will send a 
patch to fix that.

Thanks,
Longman
Yosry Ahmed March 15, 2023, 3:40 a.m. UTC | #13
On Tue, Mar 14, 2023 at 8:33 PM Waiman Long <longman@redhat.com> wrote:
>
> On 3/14/23 23:10, Yosry Ahmed wrote:
> >>> The only other user today is print_page_owner_memcg(). I am not sure
> >>> if it's doing the right thing by explicitly reading page->memcg_data,
> >>> but it is already excluding pages that have page->memcg_data == 0,
> >>> which should be the case for tail pages.
> >> It is reading memcg_data directly to see if it is slab cache page. It is
> >> currently skipping page that does not have memcg_data set.
> > IIUC this skips tail pages, because they should always have
> > page->memcg_data == 0, even if they are charged to a memcg. To
> > correctly get their memcg we should read it from the
> > compound_head()/page_folio().
> The purpose of that function is mainly to report pages that have a
> reference to a memcg, especially the dead one. So by counting the
> occurrence of a particular cgroup name, we can have a rough idea of
> that. So only head page has relevance here and we can skip the tail pages.

I see. If you update the code to not check memcg_data directly, and we
update page_memcg_check() to return memcg of the head page, this will
stop skipping tail pages. Is this okay?

If not, we can explicitly skip tail pages in print_page_owner_memcg()
if we decide to check the head's memcg in page_memcg_check().

> >
> > My 2c, we can check PageSlab() to print the extra message for slab
> > pages, instead of reading memcg_data directly, which kinda breaks the
> > abstraction created by the various helpers for reading a page memcg.
> > Someone can easily change something in how memcg_data is interpreted
> > in those helpers without realizing that page_owner is also reading it.
>
> You are right. We should be using a helper if available. I will send a
> patch to fix that.

Great, thanks!

>
> Thanks,
> Longman
>
Matthew Wilcox March 15, 2023, 4:54 a.m. UTC | #14
On Mon, Mar 13, 2023 at 02:08:53PM -0700, Yosry Ahmed wrote:
> On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton
> <akpm@linux-foundation.org> wrote:
> >
> > On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > > From: Hugh Dickins <hughd@google.com>
> > >
> > > In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we
> > > observed a warning from page_cgroup_ino() when reading
> > > /proc/kpagecgroup.
> >
> > If this is the only known situation in which page_memcg_check() is
> > passed a tail page, why does page_memcg_check() have
> >
> >         if (PageTail(page))
> >                 return NULL;
> >
> > ?  Can we remove this to simplify, streamline and clarify?
> 
> I guess it's a safety check so that we don't end up trying to cast a
> tail page to a folio. My opinion is to go one step further and change
> page_memcg_check() to do return the memcg of the head page, i.e:
> 
> static inline struct mem_cgroup *page_memcg_check(struct page *page)
> {
>     return folio_memcg_check(page_folio(page));
> }

If you look at my commit becacb04fdd4, I was preserving the existing
behaviour of page_memcg_check() when passed a tail page.  It would
previously, rightly or wrongly, read the memcg_data from the tail page
and get back NULL.

I suspect that was not the intended behaviour, but I do not think this
patch is the right fix; it simply papers over the problem and maybe
creates a new one.  Callers of page_memcg_check() should be eliminated,
precisely because of this ambiguity.  It's up to the people who understand
each of the callers who need to make the decision to always convert the
page that they have to a folio and ask about its memcg, or whether they
want to preserve the existing behaviour of returning NULL for tail pages.

So, I say NACK to this patch as it does not preserve existing behaviour,
and does not advance our understanding of what we have wrought.
Yosry Ahmed March 15, 2023, 7:04 a.m. UTC | #15
On Tue, Mar 14, 2023 at 9:54 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Mar 13, 2023 at 02:08:53PM -0700, Yosry Ahmed wrote:
> > On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton
> > <akpm@linux-foundation.org> wrote:
> > >
> > > On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:
> > >
> > > > From: Hugh Dickins <hughd@google.com>
> > > >
> > > > In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we
> > > > observed a warning from page_cgroup_ino() when reading
> > > > /proc/kpagecgroup.
> > >
> > > If this is the only known situation in which page_memcg_check() is
> > > passed a tail page, why does page_memcg_check() have
> > >
> > >         if (PageTail(page))
> > >                 return NULL;
> > >
> > > ?  Can we remove this to simplify, streamline and clarify?
> >
> > I guess it's a safety check so that we don't end up trying to cast a
> > tail page to a folio. My opinion is to go one step further and change
> > page_memcg_check() to do return the memcg of the head page, i.e:
> >
> > static inline struct mem_cgroup *page_memcg_check(struct page *page)
> > {
> >     return folio_memcg_check(page_folio(page));
> > }
>
> If you look at my commit becacb04fdd4, I was preserving the existing
> behaviour of page_memcg_check() when passed a tail page.  It would
> previously, rightly or wrongly, read the memcg_data from the tail page
> and get back NULL.

Right, I looked at that. I also looked at 1b7e4464d43a which added
folio_memcg() and changed page_memcg()'s behavior to use page_folio()
to retrieve the memcg from the head, which made me wonder why
different decisions were made for these 2 helpers.

Were the users of page_memcg() already passing in head pages only?

>
> I suspect that was not the intended behaviour, but I do not think this
> patch is the right fix; it simply papers over the problem and maybe
> creates a new one.  Callers of page_memcg_check() should be eliminated,
> precisely because of this ambiguity.  It's up to the people who understand
> each of the callers who need to make the decision to always convert the
> page that they have to a folio and ask about its memcg, or whether they
> want to preserve the existing behaviour of returning NULL for tail pages.
>
> So, I say NACK to this patch as it does not preserve existing behaviour,
> and does not advance our understanding of what we have wrought.

I am not sure which patch you are NACKing, the original patch from
Hugh (adding compound_head() to page_cgroup_ino()) or the suggested
alternative patch which changes page_memcg_check() to use
page_folio().

I am assuming the latter (based on where this reply was placed). If
that's the case, are you okay with the original patch? If yes, then
changing page_memcg_check() to use page_folio() should be a NOP as the
only 2 callers today are page_cgroup_ino() and
print_page_owner_memcg(). The latter already implicitly excludes tail
pages by checking if memcg_data == 0, and we can add an explicit
PageTail() check there instead.

Or is it the former (original patch)?
Matthew Wilcox March 15, 2023, 12:19 p.m. UTC | #16
On Wed, Mar 15, 2023 at 12:04:10AM -0700, Yosry Ahmed wrote:
> On Tue, Mar 14, 2023 at 9:54 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Mon, Mar 13, 2023 at 02:08:53PM -0700, Yosry Ahmed wrote:
> > > On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton
> > > <akpm@linux-foundation.org> wrote:
> > > >
> > > > On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:
> > > >
> > > > > From: Hugh Dickins <hughd@google.com>
> > > > >
> > > > > In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we
> > > > > observed a warning from page_cgroup_ino() when reading
> > > > > /proc/kpagecgroup.
> > > >
> > > > If this is the only known situation in which page_memcg_check() is
> > > > passed a tail page, why does page_memcg_check() have
> > > >
> > > >         if (PageTail(page))
> > > >                 return NULL;
> > > >
> > > > ?  Can we remove this to simplify, streamline and clarify?
> > >
> > > I guess it's a safety check so that we don't end up trying to cast a
> > > tail page to a folio. My opinion is to go one step further and change
> > > page_memcg_check() to do return the memcg of the head page, i.e:
> > >
> > > static inline struct mem_cgroup *page_memcg_check(struct page *page)
> > > {
> > >     return folio_memcg_check(page_folio(page));
> > > }
> >
> > If you look at my commit becacb04fdd4, I was preserving the existing
> > behaviour of page_memcg_check() when passed a tail page.  It would
> > previously, rightly or wrongly, read the memcg_data from the tail page
> > and get back NULL.
> 
> Right, I looked at that. I also looked at 1b7e4464d43a which added
> folio_memcg() and changed page_memcg()'s behavior to use page_folio()
> to retrieve the memcg from the head, which made me wonder why
> different decisions were made for these 2 helpers.
> 
> Were the users of page_memcg() already passing in head pages only?

There were 18 months between those commits ... I'd learned to be
more careful about maintaining the semantics instead of changing
them to "what they should have been".

> >
> > I suspect that was not the intended behaviour, but I do not think this
> > patch is the right fix; it simply papers over the problem and maybe
> > creates a new one.  Callers of page_memcg_check() should be eliminated,
> > precisely because of this ambiguity.  It's up to the people who understand
> > each of the callers who need to make the decision to always convert the
> > page that they have to a folio and ask about its memcg, or whether they
> > want to preserve the existing behaviour of returning NULL for tail pages.
> >
> > So, I say NACK to this patch as it does not preserve existing behaviour,
> > and does not advance our understanding of what we have wrought.
> 
> I am not sure which patch you are NACKing, the original patch from
> Hugh (adding compound_head() to page_cgroup_ino()) or the suggested
> alternative patch which changes page_memcg_check() to use
> page_folio().

Both patches are NACKed.  page_memcg_check() needs to go away
because it has the tail page ambiguity.  Both callers should be using
folio_memcg_check() directly and resolving for themselves what the
correct behaviour is when seeing a tail page.
Yosry Ahmed March 15, 2023, 9:43 p.m. UTC | #17
On Wed, Mar 15, 2023 at 5:19 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Mar 15, 2023 at 12:04:10AM -0700, Yosry Ahmed wrote:
> > On Tue, Mar 14, 2023 at 9:54 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Mon, Mar 13, 2023 at 02:08:53PM -0700, Yosry Ahmed wrote:
> > > > On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton
> > > > <akpm@linux-foundation.org> wrote:
> > > > >
> > > > > On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:
> > > > >
> > > > > > From: Hugh Dickins <hughd@google.com>
> > > > > >
> > > > > > In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we
> > > > > > observed a warning from page_cgroup_ino() when reading
> > > > > > /proc/kpagecgroup.
> > > > >
> > > > > If this is the only known situation in which page_memcg_check() is
> > > > > passed a tail page, why does page_memcg_check() have
> > > > >
> > > > >         if (PageTail(page))
> > > > >                 return NULL;
> > > > >
> > > > > ?  Can we remove this to simplify, streamline and clarify?
> > > >
> > > > I guess it's a safety check so that we don't end up trying to cast a
> > > > tail page to a folio. My opinion is to go one step further and change
> > > > page_memcg_check() to do return the memcg of the head page, i.e:
> > > >
> > > > static inline struct mem_cgroup *page_memcg_check(struct page *page)
> > > > {
> > > >     return folio_memcg_check(page_folio(page));
> > > > }
> > >
> > > If you look at my commit becacb04fdd4, I was preserving the existing
> > > behaviour of page_memcg_check() when passed a tail page.  It would
> > > previously, rightly or wrongly, read the memcg_data from the tail page
> > > and get back NULL.
> >
> > Right, I looked at that. I also looked at 1b7e4464d43a which added
> > folio_memcg() and changed page_memcg()'s behavior to use page_folio()
> > to retrieve the memcg from the head, which made me wonder why
> > different decisions were made for these 2 helpers.
> >
> > Were the users of page_memcg() already passing in head pages only?
>
> There were 18 months between those commits ... I'd learned to be
> more careful about maintaining the semantics instead of changing
> them to "what they should have been".
>
> > >
> > > I suspect that was not the intended behaviour, but I do not think this
> > > patch is the right fix; it simply papers over the problem and maybe
> > > creates a new one.  Callers of page_memcg_check() should be eliminated,
> > > precisely because of this ambiguity.  It's up to the people who understand
> > > each of the callers who need to make the decision to always convert the
> > > page that they have to a folio and ask about its memcg, or whether they
> > > want to preserve the existing behaviour of returning NULL for tail pages.
> > >
> > > So, I say NACK to this patch as it does not preserve existing behaviour,
> > > and does not advance our understanding of what we have wrought.
> >
> > I am not sure which patch you are NACKing, the original patch from
> > Hugh (adding compound_head() to page_cgroup_ino()) or the suggested
> > alternative patch which changes page_memcg_check() to use
> > page_folio().
>
> Both patches are NACKed.  page_memcg_check() needs to go away
> because it has the tail page ambiguity.  Both callers should be using
> folio_memcg_check() directly and resolving for themselves what the
> correct behaviour is when seeing a tail page.
>

I agree. I even suggested this to Michal in one of the replies.

For page_cgroup_ino() we can simply pass in
folio_memcg(page_folio(page)), which would mimic what Hugh's patch is
doing for page_cgroup_ino().

For page owner, I am not sure if we want to do something similar
(which would start printing the memcg for tail pages as well), or
explicitly excluding tail pages and THEN do
folio_memcg(page_folio(page)) to get the memcg of head pages. Waiman,
what do you think?
Waiman Long March 16, 2023, 12:09 a.m. UTC | #18
On 3/15/23 17:43, Yosry Ahmed wrote:
> On Wed, Mar 15, 2023 at 5:19 AM Matthew Wilcox <willy@infradead.org> wrote:
>> On Wed, Mar 15, 2023 at 12:04:10AM -0700, Yosry Ahmed wrote:
>>> On Tue, Mar 14, 2023 at 9:54 PM Matthew Wilcox <willy@infradead.org> wrote:
>>>> On Mon, Mar 13, 2023 at 02:08:53PM -0700, Yosry Ahmed wrote:
>>>>> On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton
>>>>> <akpm@linux-foundation.org> wrote:
>>>>>> On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:
>>>>>>
>>>>>>> From: Hugh Dickins <hughd@google.com>
>>>>>>>
>>>>>>> In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we
>>>>>>> observed a warning from page_cgroup_ino() when reading
>>>>>>> /proc/kpagecgroup.
>>>>>> If this is the only known situation in which page_memcg_check() is
>>>>>> passed a tail page, why does page_memcg_check() have
>>>>>>
>>>>>>          if (PageTail(page))
>>>>>>                  return NULL;
>>>>>>
>>>>>> ?  Can we remove this to simplify, streamline and clarify?
>>>>> I guess it's a safety check so that we don't end up trying to cast a
>>>>> tail page to a folio. My opinion is to go one step further and change
>>>>> page_memcg_check() to do return the memcg of the head page, i.e:
>>>>>
>>>>> static inline struct mem_cgroup *page_memcg_check(struct page *page)
>>>>> {
>>>>>      return folio_memcg_check(page_folio(page));
>>>>> }
>>>> If you look at my commit becacb04fdd4, I was preserving the existing
>>>> behaviour of page_memcg_check() when passed a tail page.  It would
>>>> previously, rightly or wrongly, read the memcg_data from the tail page
>>>> and get back NULL.
>>> Right, I looked at that. I also looked at 1b7e4464d43a which added
>>> folio_memcg() and changed page_memcg()'s behavior to use page_folio()
>>> to retrieve the memcg from the head, which made me wonder why
>>> different decisions were made for these 2 helpers.
>>>
>>> Were the users of page_memcg() already passing in head pages only?
>> There were 18 months between those commits ... I'd learned to be
>> more careful about maintaining the semantics instead of changing
>> them to "what they should have been".
>>
>>>> I suspect that was not the intended behaviour, but I do not think this
>>>> patch is the right fix; it simply papers over the problem and maybe
>>>> creates a new one.  Callers of page_memcg_check() should be eliminated,
>>>> precisely because of this ambiguity.  It's up to the people who understand
>>>> each of the callers who need to make the decision to always convert the
>>>> page that they have to a folio and ask about its memcg, or whether they
>>>> want to preserve the existing behaviour of returning NULL for tail pages.
>>>>
>>>> So, I say NACK to this patch as it does not preserve existing behaviour,
>>>> and does not advance our understanding of what we have wrought.
>>> I am not sure which patch you are NACKing, the original patch from
>>> Hugh (adding compound_head() to page_cgroup_ino()) or the suggested
>>> alternative patch which changes page_memcg_check() to use
>>> page_folio().
>> Both patches are NACKed.  page_memcg_check() needs to go away
>> because it has the tail page ambiguity.  Both callers should be using
>> folio_memcg_check() directly and resolving for themselves what the
>> correct behaviour is when seeing a tail page.
>>
> I agree. I even suggested this to Michal in one of the replies.
>
> For page_cgroup_ino() we can simply pass in
> folio_memcg(page_folio(page)), which would mimic what Hugh's patch is
> doing for page_cgroup_ino().
>
> For page owner, I am not sure if we want to do something similar
> (which would start printing the memcg for tail pages as well), or
> explicitly excluding tail pages and THEN do
> folio_memcg(page_folio(page)) to get the memcg of head pages. Waiman,
> what do you think?

I prefer the current behavior of printing information for the head page 
only. I am open to suggestion of the best APIs to use.

Cheers,
Longman
Yosry Ahmed March 16, 2023, 12:25 a.m. UTC | #19
On Wed, Mar 15, 2023 at 5:09 PM Waiman Long <longman@redhat.com> wrote:
>
>
> On 3/15/23 17:43, Yosry Ahmed wrote:
> > On Wed, Mar 15, 2023 at 5:19 AM Matthew Wilcox <willy@infradead.org> wrote:
> >> On Wed, Mar 15, 2023 at 12:04:10AM -0700, Yosry Ahmed wrote:
> >>> On Tue, Mar 14, 2023 at 9:54 PM Matthew Wilcox <willy@infradead.org> wrote:
> >>>> On Mon, Mar 13, 2023 at 02:08:53PM -0700, Yosry Ahmed wrote:
> >>>>> On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton
> >>>>> <akpm@linux-foundation.org> wrote:
> >>>>>> On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:
> >>>>>>
> >>>>>>> From: Hugh Dickins <hughd@google.com>
> >>>>>>>
> >>>>>>> In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we
> >>>>>>> observed a warning from page_cgroup_ino() when reading
> >>>>>>> /proc/kpagecgroup.
> >>>>>> If this is the only known situation in which page_memcg_check() is
> >>>>>> passed a tail page, why does page_memcg_check() have
> >>>>>>
> >>>>>>          if (PageTail(page))
> >>>>>>                  return NULL;
> >>>>>>
> >>>>>> ?  Can we remove this to simplify, streamline and clarify?
> >>>>> I guess it's a safety check so that we don't end up trying to cast a
> >>>>> tail page to a folio. My opinion is to go one step further and change
> >>>>> page_memcg_check() to do return the memcg of the head page, i.e:
> >>>>>
> >>>>> static inline struct mem_cgroup *page_memcg_check(struct page *page)
> >>>>> {
> >>>>>      return folio_memcg_check(page_folio(page));
> >>>>> }
> >>>> If you look at my commit becacb04fdd4, I was preserving the existing
> >>>> behaviour of page_memcg_check() when passed a tail page.  It would
> >>>> previously, rightly or wrongly, read the memcg_data from the tail page
> >>>> and get back NULL.
> >>> Right, I looked at that. I also looked at 1b7e4464d43a which added
> >>> folio_memcg() and changed page_memcg()'s behavior to use page_folio()
> >>> to retrieve the memcg from the head, which made me wonder why
> >>> different decisions were made for these 2 helpers.
> >>>
> >>> Were the users of page_memcg() already passing in head pages only?
> >> There were 18 months between those commits ... I'd learned to be
> >> more careful about maintaining the semantics instead of changing
> >> them to "what they should have been".
> >>
> >>>> I suspect that was not the intended behaviour, but I do not think this
> >>>> patch is the right fix; it simply papers over the problem and maybe
> >>>> creates a new one.  Callers of page_memcg_check() should be eliminated,
> >>>> precisely because of this ambiguity.  It's up to the people who understand
> >>>> each of the callers who need to make the decision to always convert the
> >>>> page that they have to a folio and ask about its memcg, or whether they
> >>>> want to preserve the existing behaviour of returning NULL for tail pages.
> >>>>
> >>>> So, I say NACK to this patch as it does not preserve existing behaviour,
> >>>> and does not advance our understanding of what we have wrought.
> >>> I am not sure which patch you are NACKing, the original patch from
> >>> Hugh (adding compound_head() to page_cgroup_ino()) or the suggested
> >>> alternative patch which changes page_memcg_check() to use
> >>> page_folio().
> >> Both patches are NACKed.  page_memcg_check() needs to go away
> >> because it has the tail page ambiguity.  Both callers should be using
> >> folio_memcg_check() directly and resolving for themselves what the
> >> correct behaviour is when seeing a tail page.
> >>
> > I agree. I even suggested this to Michal in one of the replies.
> >
> > For page_cgroup_ino() we can simply pass in
> > folio_memcg(page_folio(page)), which would mimic what Hugh's patch is
> > doing for page_cgroup_ino().
> >
> > For page owner, I am not sure if we want to do something similar
> > (which would start printing the memcg for tail pages as well), or
> > explicitly excluding tail pages and THEN do
> > folio_memcg(page_folio(page)) to get the memcg of head pages. Waiman,
> > what do you think?
>
> I prefer the current behavior of printing information for the head page
> only. I am open to suggestion of the best APIs to use.

I think instead of explicitly checking page->memcg_data, we can check
PageTail() and return explicitly for tail pages tails, check
PageSlab() to print the message for slab pages, then get the page's
memcg through folio_memcg_check(page_folio(page)).

Something like:

static inline int print_page_owner_memcg(char *kbuf, size_t count, int ret,
struct page *page)
{
    ...
    rcu_read_lock();

    /* Only head pages hold refs to a memcg */
    if (PageTail(page))
        goto out_unlock;

    if (PageSlab(page))
        ret += scnprintf(kbuf + ret, count - ret, "Slab cache page\n");

    memcg = folio_memcg_check(page_folio(page));
    if (!memcg)
        goto out_unlock;
    ...
}

Matthew, What do you think?

>
> Cheers,
> Longman
>
Matthew Wilcox March 16, 2023, 3:07 a.m. UTC | #20
On Wed, Mar 15, 2023 at 05:25:49PM -0700, Yosry Ahmed wrote:
[snipped 80 lines.  please learn to trim]
> I think instead of explicitly checking page->memcg_data, we can check
> PageTail() and return explicitly for tail pages tails, check
> PageSlab() to print the message for slab pages, then get the page's
> memcg through folio_memcg_check(page_folio(page)).
> 
> Something like:
> 
> static inline int print_page_owner_memcg(char *kbuf, size_t count, int ret,
> struct page *page)
> {
>     ...
>     rcu_read_lock();
> 
>     /* Only head pages hold refs to a memcg */
>     if (PageTail(page))
>         goto out_unlock;
> 
>     if (PageSlab(page))
>         ret += scnprintf(kbuf + ret, count - ret, "Slab cache page\n");
> 
>     memcg = folio_memcg_check(page_folio(page));
>     if (!memcg)
>         goto out_unlock;
>     ...
> }
> 
> Matthew, What do you think?

Brrr, this is hard.  read_page_owner() holds no locks or references,
so pages can transform between being head/tail/order-0 while we're
running.

It _tries_ to skip over tail pages in the most inefficient way possible:

                if (!IS_ALIGNED(pfn, 1 << page_owner->order))
                        goto ext_put_continue;

But any attempt to use folio APIs is going to risk tripping the
assertions in the folio code that it's not a tail.  This requires
more thought.
Yosry Ahmed March 16, 2023, 3:16 a.m. UTC | #21
On Wed, Mar 15, 2023 at 8:07 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Mar 15, 2023 at 05:25:49PM -0700, Yosry Ahmed wrote:
> [snipped 80 lines.  please learn to trim]
> > I think instead of explicitly checking page->memcg_data, we can check
> > PageTail() and return explicitly for tail pages tails, check
> > PageSlab() to print the message for slab pages, then get the page's
> > memcg through folio_memcg_check(page_folio(page)).
> >
> > Something like:
> >
> > static inline int print_page_owner_memcg(char *kbuf, size_t count, int ret,
> > struct page *page)
> > {
> >     ...
> >     rcu_read_lock();
> >
> >     /* Only head pages hold refs to a memcg */
> >     if (PageTail(page))
> >         goto out_unlock;
> >
> >     if (PageSlab(page))
> >         ret += scnprintf(kbuf + ret, count - ret, "Slab cache page\n");
> >
> >     memcg = folio_memcg_check(page_folio(page));
> >     if (!memcg)
> >         goto out_unlock;
> >     ...
> > }
> >
> > Matthew, What do you think?
>
> Brrr, this is hard.  read_page_owner() holds no locks or references,
> so pages can transform between being head/tail/order-0 while we're
> running.
>
> It _tries_ to skip over tail pages in the most inefficient way possible:
>
>                 if (!IS_ALIGNED(pfn, 1 << page_owner->order))
>                         goto ext_put_continue;
>
> But any attempt to use folio APIs is going to risk tripping the
> assertions in the folio code that it's not a tail.  This requires
> more thought.

Ugh yeah. I thought the worst that could happen is that if a page
becomes a tail page after the PageTail() check, then we will not skip
it and we will read the memcg from the head instead, which shouldn't
be the end of the world. I missed the fact that the folio returned by
page_folio() can change before folio_memcg_check() gets to read its
memcg_data.

I guess this race exists with the current implementation as well.
page_memcg_check() will check for tail pages then cast the page to a
folio. If the page becomes a tail page after the PageTail() check
inside page_memcg_check() we risk running into the same situation.

(FWIW folio_memcg_check() doesn't seem to have assertions for tail pages today)
Yosry Ahmed March 22, 2023, 6:52 a.m. UTC | #22
On Wed, Mar 15, 2023 at 8:16 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Wed, Mar 15, 2023 at 8:07 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Wed, Mar 15, 2023 at 05:25:49PM -0700, Yosry Ahmed wrote:
> > [snipped 80 lines.  please learn to trim]
> > > I think instead of explicitly checking page->memcg_data, we can check
> > > PageTail() and return explicitly for tail pages tails, check
> > > PageSlab() to print the message for slab pages, then get the page's
> > > memcg through folio_memcg_check(page_folio(page)).
> > >
> > > Something like:
> > >
> > > static inline int print_page_owner_memcg(char *kbuf, size_t count, int ret,
> > > struct page *page)
> > > {
> > >     ...
> > >     rcu_read_lock();
> > >
> > >     /* Only head pages hold refs to a memcg */
> > >     if (PageTail(page))
> > >         goto out_unlock;
> > >
> > >     if (PageSlab(page))
> > >         ret += scnprintf(kbuf + ret, count - ret, "Slab cache page\n");
> > >
> > >     memcg = folio_memcg_check(page_folio(page));
> > >     if (!memcg)
> > >         goto out_unlock;
> > >     ...
> > > }
> > >
> > > Matthew, What do you think?
> >
> > Brrr, this is hard.  read_page_owner() holds no locks or references,
> > so pages can transform between being head/tail/order-0 while we're
> > running.
> >
> > It _tries_ to skip over tail pages in the most inefficient way possible:
> >
> >                 if (!IS_ALIGNED(pfn, 1 << page_owner->order))
> >                         goto ext_put_continue;
> >
> > But any attempt to use folio APIs is going to risk tripping the
> > assertions in the folio code that it's not a tail.  This requires
> > more thought.
>
> Ugh yeah. I thought the worst that could happen is that if a page
> becomes a tail page after the PageTail() check, then we will not skip
> it and we will read the memcg from the head instead, which shouldn't
> be the end of the world. I missed the fact that the folio returned by
> page_folio() can change before folio_memcg_check() gets to read its
> memcg_data.
>
> I guess this race exists with the current implementation as well.
> page_memcg_check() will check for tail pages then cast the page to a
> folio. If the page becomes a tail page after the PageTail() check
> inside page_memcg_check() we risk running into the same situation.
>
> (FWIW folio_memcg_check() doesn't seem to have assertions for tail pages today)

Matthew, any thoughts on this?
diff mbox series

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5abffe6f8389..e3a55295725e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -395,7 +395,7 @@  ino_t page_cgroup_ino(struct page *page)
 	unsigned long ino = 0;
 
 	rcu_read_lock();
-	memcg = page_memcg_check(page);
+	memcg = page_memcg_check(compound_head(page));
 
 	while (memcg && !(memcg->css.flags & CSS_ONLINE))
 		memcg = parent_mem_cgroup(memcg);