Message ID | 20230313083452.1319968-1-yosryahmed@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | memcg: page_cgroup_ino() get memcg from compound_head(page) | expand |
On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote: > From: Hugh Dickins <hughd@google.com> > > In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we > observed a warning from page_cgroup_ino() when reading > /proc/kpagecgroup. If this is the only known situation in which page_memcg_check() is passed a tail page, why does page_memcg_check() have if (PageTail(page)) return NULL; ? Can we remove this to simplify, streamline and clarify?
On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote: > > > From: Hugh Dickins <hughd@google.com> > > > > In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we > > observed a warning from page_cgroup_ino() when reading > > /proc/kpagecgroup. > > If this is the only known situation in which page_memcg_check() is > passed a tail page, why does page_memcg_check() have > > if (PageTail(page)) > return NULL; > > ? Can we remove this to simplify, streamline and clarify? I guess it's a safety check so that we don't end up trying to cast a tail page to a folio. My opinion is to go one step further and change page_memcg_check() to do return the memcg of the head page, i.e: static inline struct mem_cgroup *page_memcg_check(struct page *page) { return folio_memcg_check(page_folio(page)); } This makes it consistent with page_memcg(), and makes sure future users are getting the "correct" memcg for whatever page they pass in. I am interested to hear other folks' opinions here. The only other user today is print_page_owner_memcg(). I am not sure if it's doing the right thing by explicitly reading page->memcg_data, but it is already excluding pages that have page->memcg_data == 0, which should be the case for tail pages. > >
On Mon 13-03-23 08:34:52, Yosry Ahmed wrote: > From: Hugh Dickins <hughd@google.com> > > In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we > observed a warning from page_cgroup_ino() when reading > /proc/kpagecgroup. This warning was added to catch fragile reads of > a page memcg. Make page_cgroup_ino() get memcg from compound_head(page): > that gives it the correct memcg for each subpage of a compound page, > so is the right fix. > > I dithered between the right fix and the safer "fix": it's unlikely but > conceivable that some userspace has learnt that /proc/kpagecgroup gives > no memcg on tail pages, and compensates for that in some (racy) way: so > continuing to give no memcg on tails, without warning, might be safer. > > But hwpoison_filter_task(), the only other user of page_cgroup_ino(), > persuaded me. It looks as if it currently leaves out tail pages of the > selected memcg, by mistake: whereas hwpoison_inject() uses compound_head() > and expects the tails to be included. So hwpoison testing coverage has > probably been restricted by the wrong output from page_cgroup_ino() (if > that memcg filter is used at all): in the short term, it might be safer > not to enable wider coverage there, but long term we would regret that. > > Signed-off-by: Hugh Dickins <hughd@google.com> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Thanks! > --- > > (Yosry: Alternatively, we could modify page_memcg_check() to do > page_folio() like its sibling page_memcg(), as page_cgroup_ino() is the > only remaining caller other than print_page_owner_memcg(); and it already > excludes pages that have page->memcg_data = 0) > > --- > mm/memcontrol.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 5abffe6f8389..e3a55295725e 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -395,7 +395,7 @@ ino_t page_cgroup_ino(struct page *page) > unsigned long ino = 0; > > rcu_read_lock(); > - memcg = page_memcg_check(page); > + memcg = page_memcg_check(compound_head(page)); > > while (memcg && !(memcg->css.flags & CSS_ONLINE)) > memcg = parent_mem_cgroup(memcg); > -- > 2.40.0.rc1.284.g88254d51c5-goog
On Mon 13-03-23 14:08:53, Yosry Ahmed wrote: > On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton > <akpm@linux-foundation.org> wrote: > > > > On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote: > > > > > From: Hugh Dickins <hughd@google.com> > > > > > > In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we > > > observed a warning from page_cgroup_ino() when reading > > > /proc/kpagecgroup. > > > > If this is the only known situation in which page_memcg_check() is > > passed a tail page, why does page_memcg_check() have > > > > if (PageTail(page)) > > return NULL; > > > > ? Can we remove this to simplify, streamline and clarify? > > I guess it's a safety check so that we don't end up trying to cast a > tail page to a folio. My opinion is to go one step further and change > page_memcg_check() to do return the memcg of the head page, i.e: > > static inline struct mem_cgroup *page_memcg_check(struct page *page) > { > return folio_memcg_check(page_folio(page)); > } I would just stick with the existing code and put a comment that this function shouldn't be used in any new code and the folio counterpart should be used instead.
On Tue, Mar 14, 2023 at 3:02 AM Michal Hocko <mhocko@suse.com> wrote: > > On Mon 13-03-23 14:08:53, Yosry Ahmed wrote: > > On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton > > <akpm@linux-foundation.org> wrote: > > > > > > On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote: > > > > > > > From: Hugh Dickins <hughd@google.com> > > > > > > > > In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we > > > > observed a warning from page_cgroup_ino() when reading > > > > /proc/kpagecgroup. > > > > > > If this is the only known situation in which page_memcg_check() is > > > passed a tail page, why does page_memcg_check() have > > > > > > if (PageTail(page)) > > > return NULL; > > > > > > ? Can we remove this to simplify, streamline and clarify? > > > > I guess it's a safety check so that we don't end up trying to cast a > > tail page to a folio. My opinion is to go one step further and change > > page_memcg_check() to do return the memcg of the head page, i.e: > > > > static inline struct mem_cgroup *page_memcg_check(struct page *page) > > { > > return folio_memcg_check(page_folio(page)); > > } > > I would just stick with the existing code and put a comment that this > function shouldn't be used in any new code and the folio counterpart > should be used instead. Would you mind explaining the rationale? If existing users are not passing in tail pages and we are telling new users not to use it, what's the point of leaving the PageTail() check? Is page owner doing the right thing by discounting pages with page->memcg_data = 0 in print_page_owner_memcg() ? Wouldn't this not show the memcg of tail pages? If page owner also needs a compound_head()/ page_folio() call before checking the page memcg, perhaps we should convert both call sites to page_memcg_check() to folio_memcg_check() and remove it now? > > -- > Michal Hocko > SUSE Labs
On Tue, Mar 14, 2023 at 12:45 PM Yosry Ahmed <yosryahmed@google.com> wrote: > > On Tue, Mar 14, 2023 at 3:02 AM Michal Hocko <mhocko@suse.com> wrote: > > > > On Mon 13-03-23 14:08:53, Yosry Ahmed wrote: > > > On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton > > > <akpm@linux-foundation.org> wrote: > > > > > > > > On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote: > > > > > > > > > From: Hugh Dickins <hughd@google.com> > > > > > > > > > > In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we > > > > > observed a warning from page_cgroup_ino() when reading > > > > > /proc/kpagecgroup. > > > > > > > > If this is the only known situation in which page_memcg_check() is > > > > passed a tail page, why does page_memcg_check() have > > > > > > > > if (PageTail(page)) > > > > return NULL; > > > > > > > > ? Can we remove this to simplify, streamline and clarify? > > > > > > I guess it's a safety check so that we don't end up trying to cast a > > > tail page to a folio. My opinion is to go one step further and change > > > page_memcg_check() to do return the memcg of the head page, i.e: > > > > > > static inline struct mem_cgroup *page_memcg_check(struct page *page) > > > { > > > return folio_memcg_check(page_folio(page)); > > > } > > > > I would just stick with the existing code and put a comment that this > > function shouldn't be used in any new code and the folio counterpart > > should be used instead. > > Would you mind explaining the rationale? > > If existing users are not passing in tail pages and we are telling new > users not to use it, what's the point of leaving the PageTail() check? > > Is page owner doing the right thing by discounting pages with > page->memcg_data = 0 in print_page_owner_memcg() ? Wouldn't this not > show the memcg of tail pages? > > If page owner also needs a compound_head()/ page_folio() call before > checking the page memcg, perhaps we should convert both call sites to > page_memcg_check() to folio_memcg_check() and remove it now? .. or add page_folio() inside page_memcg_check() as suggested initially. > > > > > -- > > Michal Hocko > > SUSE Labs
On Mon, Mar 13, 2023 at 02:08:53PM -0700, Yosry Ahmed wrote: > On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton > <akpm@linux-foundation.org> wrote: > > > > On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote: > > > > > From: Hugh Dickins <hughd@google.com> > > > > > > In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we > > > observed a warning from page_cgroup_ino() when reading > > > /proc/kpagecgroup. > > > > If this is the only known situation in which page_memcg_check() is > > passed a tail page, why does page_memcg_check() have > > > > if (PageTail(page)) > > return NULL; > > > > ? Can we remove this to simplify, streamline and clarify? > > I guess it's a safety check so that we don't end up trying to cast a > tail page to a folio. My opinion is to go one step further and change > page_memcg_check() to do return the memcg of the head page, i.e: > > static inline struct mem_cgroup *page_memcg_check(struct page *page) > { > return folio_memcg_check(page_folio(page)); > } > > This makes it consistent with page_memcg(), and makes sure future > users are getting the "correct" memcg for whatever page they pass in. > I am interested to hear other folks' opinions here. > > The only other user today is print_page_owner_memcg(). I am not sure > if it's doing the right thing by explicitly reading page->memcg_data, > but it is already excluding pages that have page->memcg_data == 0, > which should be the case for tail pages. Yeah, I think it's a good idea. I'd do this. If you'll master a patch like this, please, apply my Acked-by: Roman Gushchin <roman.gushchin@linux.dev> . I'm ok with the current approach too (the one you posted), but the one above is preferable. Overall it would be nice to clarify & document our expectations from /proc/kpagecgroup (and /proc/kpageflags & /proc/kpagecount) in the new folio "epoch". Thanks!
Somehow I was stupid enough to forget CC'ing Matthew :) +Matthew Wilcox On Mon, Mar 13, 2023 at 1:34 AM Yosry Ahmed <yosryahmed@google.com> wrote: > > From: Hugh Dickins <hughd@google.com> > > In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we > observed a warning from page_cgroup_ino() when reading > /proc/kpagecgroup. This warning was added to catch fragile reads of > a page memcg. Make page_cgroup_ino() get memcg from compound_head(page): > that gives it the correct memcg for each subpage of a compound page, > so is the right fix. > > I dithered between the right fix and the safer "fix": it's unlikely but > conceivable that some userspace has learnt that /proc/kpagecgroup gives > no memcg on tail pages, and compensates for that in some (racy) way: so > continuing to give no memcg on tails, without warning, might be safer. > > But hwpoison_filter_task(), the only other user of page_cgroup_ino(), > persuaded me. It looks as if it currently leaves out tail pages of the > selected memcg, by mistake: whereas hwpoison_inject() uses compound_head() > and expects the tails to be included. So hwpoison testing coverage has > probably been restricted by the wrong output from page_cgroup_ino() (if > that memcg filter is used at all): in the short term, it might be safer > not to enable wider coverage there, but long term we would regret that. > > Signed-off-by: Hugh Dickins <hughd@google.com> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com> > --- > > (Yosry: Alternatively, we could modify page_memcg_check() to do > page_folio() like its sibling page_memcg(), as page_cgroup_ino() is the > only remaining caller other than print_page_owner_memcg(); and it already > excludes pages that have page->memcg_data = 0) > > --- > mm/memcontrol.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 5abffe6f8389..e3a55295725e 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -395,7 +395,7 @@ ino_t page_cgroup_ino(struct page *page) > unsigned long ino = 0; > > rcu_read_lock(); > - memcg = page_memcg_check(page); > + memcg = page_memcg_check(compound_head(page)); > > while (memcg && !(memcg->css.flags & CSS_ONLINE)) > memcg = parent_mem_cgroup(memcg); > -- > 2.40.0.rc1.284.g88254d51c5-goog >
On Tue, Mar 14, 2023 at 7:34 PM Roman Gushchin <roman.gushchin@linux.dev> wrote: > > On Mon, Mar 13, 2023 at 02:08:53PM -0700, Yosry Ahmed wrote: > > On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton > > <akpm@linux-foundation.org> wrote: > > > > > > On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote: > > > > > > > From: Hugh Dickins <hughd@google.com> > > > > > > > > In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we > > > > observed a warning from page_cgroup_ino() when reading > > > > /proc/kpagecgroup. > > > > > > If this is the only known situation in which page_memcg_check() is > > > passed a tail page, why does page_memcg_check() have > > > > > > if (PageTail(page)) > > > return NULL; > > > > > > ? Can we remove this to simplify, streamline and clarify? > > > > I guess it's a safety check so that we don't end up trying to cast a > > tail page to a folio. My opinion is to go one step further and change > > page_memcg_check() to do return the memcg of the head page, i.e: > > > > static inline struct mem_cgroup *page_memcg_check(struct page *page) > > { > > return folio_memcg_check(page_folio(page)); > > } > > > > This makes it consistent with page_memcg(), and makes sure future > > users are getting the "correct" memcg for whatever page they pass in. > > I am interested to hear other folks' opinions here. > > > > The only other user today is print_page_owner_memcg(). I am not sure > > if it's doing the right thing by explicitly reading page->memcg_data, > > but it is already excluding pages that have page->memcg_data == 0, > > which should be the case for tail pages. > > Yeah, I think it's a good idea. I'd do this. > If you'll master a patch like this, please, apply my > Acked-by: Roman Gushchin <roman.gushchin@linux.dev> . Thanks, Roman. So far there have been different opinions (specifically Michal's), so I'll wait until the discussion settles before sending another patch. > > I'm ok with the current approach too (the one you posted), > but the one above is preferable. > > Overall it would be nice to clarify & document our expectations > from /proc/kpagecgroup (and /proc/kpageflags & /proc/kpagecount) > in the new folio "epoch". Agreed, though I don't think I am the right person for this. > > Thanks!
On 3/13/23 17:08, Yosry Ahmed wrote: > On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton > <akpm@linux-foundation.org> wrote: >> On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote: >> >>> From: Hugh Dickins <hughd@google.com> >>> >>> In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we >>> observed a warning from page_cgroup_ino() when reading >>> /proc/kpagecgroup. >> If this is the only known situation in which page_memcg_check() is >> passed a tail page, why does page_memcg_check() have >> >> if (PageTail(page)) >> return NULL; >> >> ? Can we remove this to simplify, streamline and clarify? > I guess it's a safety check so that we don't end up trying to cast a > tail page to a folio. My opinion is to go one step further and change > page_memcg_check() to do return the memcg of the head page, i.e: > > static inline struct mem_cgroup *page_memcg_check(struct page *page) > { > return folio_memcg_check(page_folio(page)); > } > > This makes it consistent with page_memcg(), and makes sure future > users are getting the "correct" memcg for whatever page they pass in. > I am interested to hear other folks' opinions here. > > The only other user today is print_page_owner_memcg(). I am not sure > if it's doing the right thing by explicitly reading page->memcg_data, > but it is already excluding pages that have page->memcg_data == 0, > which should be the case for tail pages. It is reading memcg_data directly to see if it is slab cache page. It is currently skipping page that does not have memcg_data set. Cheers, Longman
On Tue, Mar 14, 2023 at 8:07 PM Waiman Long <longman@redhat.com> wrote: > > On 3/13/23 17:08, Yosry Ahmed wrote: > > On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton > > <akpm@linux-foundation.org> wrote: > >> On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote: > >> > >>> From: Hugh Dickins <hughd@google.com> > >>> > >>> In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we > >>> observed a warning from page_cgroup_ino() when reading > >>> /proc/kpagecgroup. > >> If this is the only known situation in which page_memcg_check() is > >> passed a tail page, why does page_memcg_check() have > >> > >> if (PageTail(page)) > >> return NULL; > >> > >> ? Can we remove this to simplify, streamline and clarify? > > I guess it's a safety check so that we don't end up trying to cast a > > tail page to a folio. My opinion is to go one step further and change > > page_memcg_check() to do return the memcg of the head page, i.e: > > > > static inline struct mem_cgroup *page_memcg_check(struct page *page) > > { > > return folio_memcg_check(page_folio(page)); > > } > > > > This makes it consistent with page_memcg(), and makes sure future > > users are getting the "correct" memcg for whatever page they pass in. > > I am interested to hear other folks' opinions here. > > > > The only other user today is print_page_owner_memcg(). I am not sure > > if it's doing the right thing by explicitly reading page->memcg_data, > > but it is already excluding pages that have page->memcg_data == 0, > > which should be the case for tail pages. > > It is reading memcg_data directly to see if it is slab cache page. It is > currently skipping page that does not have memcg_data set. IIUC this skips tail pages, because they should always have page->memcg_data == 0, even if they are charged to a memcg. To correctly get their memcg we should read it from the compound_head()/page_folio(). My 2c, we can check PageSlab() to print the extra message for slab pages, instead of reading memcg_data directly, which kinda breaks the abstraction created by the various helpers for reading a page memcg. Someone can easily change something in how memcg_data is interpreted in those helpers without realizing that page_owner is also reading it. > > Cheers, > Longman >
On 3/14/23 23:10, Yosry Ahmed wrote: >>> The only other user today is print_page_owner_memcg(). I am not sure >>> if it's doing the right thing by explicitly reading page->memcg_data, >>> but it is already excluding pages that have page->memcg_data == 0, >>> which should be the case for tail pages. >> It is reading memcg_data directly to see if it is slab cache page. It is >> currently skipping page that does not have memcg_data set. > IIUC this skips tail pages, because they should always have > page->memcg_data == 0, even if they are charged to a memcg. To > correctly get their memcg we should read it from the > compound_head()/page_folio(). The purpose of that function is mainly to report pages that have a reference to a memcg, especially the dead one. So by counting the occurrence of a particular cgroup name, we can have a rough idea of that. So only head page has relevance here and we can skip the tail pages. > > My 2c, we can check PageSlab() to print the extra message for slab > pages, instead of reading memcg_data directly, which kinda breaks the > abstraction created by the various helpers for reading a page memcg. > Someone can easily change something in how memcg_data is interpreted > in those helpers without realizing that page_owner is also reading it. You are right. We should be using a helper if available. I will send a patch to fix that. Thanks, Longman
On Tue, Mar 14, 2023 at 8:33 PM Waiman Long <longman@redhat.com> wrote: > > On 3/14/23 23:10, Yosry Ahmed wrote: > >>> The only other user today is print_page_owner_memcg(). I am not sure > >>> if it's doing the right thing by explicitly reading page->memcg_data, > >>> but it is already excluding pages that have page->memcg_data == 0, > >>> which should be the case for tail pages. > >> It is reading memcg_data directly to see if it is slab cache page. It is > >> currently skipping page that does not have memcg_data set. > > IIUC this skips tail pages, because they should always have > > page->memcg_data == 0, even if they are charged to a memcg. To > > correctly get their memcg we should read it from the > > compound_head()/page_folio(). > The purpose of that function is mainly to report pages that have a > reference to a memcg, especially the dead one. So by counting the > occurrence of a particular cgroup name, we can have a rough idea of > that. So only head page has relevance here and we can skip the tail pages. I see. If you update the code to not check memcg_data directly, and we update page_memcg_check() to return memcg of the head page, this will stop skipping tail pages. Is this okay? If not, we can explicitly skip tail pages in print_page_owner_memcg() if we decide to check the head's memcg in page_memcg_check(). > > > > My 2c, we can check PageSlab() to print the extra message for slab > > pages, instead of reading memcg_data directly, which kinda breaks the > > abstraction created by the various helpers for reading a page memcg. > > Someone can easily change something in how memcg_data is interpreted > > in those helpers without realizing that page_owner is also reading it. > > You are right. We should be using a helper if available. I will send a > patch to fix that. Great, thanks! > > Thanks, > Longman >
On Mon, Mar 13, 2023 at 02:08:53PM -0700, Yosry Ahmed wrote: > On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton > <akpm@linux-foundation.org> wrote: > > > > On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote: > > > > > From: Hugh Dickins <hughd@google.com> > > > > > > In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we > > > observed a warning from page_cgroup_ino() when reading > > > /proc/kpagecgroup. > > > > If this is the only known situation in which page_memcg_check() is > > passed a tail page, why does page_memcg_check() have > > > > if (PageTail(page)) > > return NULL; > > > > ? Can we remove this to simplify, streamline and clarify? > > I guess it's a safety check so that we don't end up trying to cast a > tail page to a folio. My opinion is to go one step further and change > page_memcg_check() to do return the memcg of the head page, i.e: > > static inline struct mem_cgroup *page_memcg_check(struct page *page) > { > return folio_memcg_check(page_folio(page)); > } If you look at my commit becacb04fdd4, I was preserving the existing behaviour of page_memcg_check() when passed a tail page. It would previously, rightly or wrongly, read the memcg_data from the tail page and get back NULL. I suspect that was not the intended behaviour, but I do not think this patch is the right fix; it simply papers over the problem and maybe creates a new one. Callers of page_memcg_check() should be eliminated, precisely because of this ambiguity. It's up to the people who understand each of the callers who need to make the decision to always convert the page that they have to a folio and ask about its memcg, or whether they want to preserve the existing behaviour of returning NULL for tail pages. So, I say NACK to this patch as it does not preserve existing behaviour, and does not advance our understanding of what we have wrought.
On Tue, Mar 14, 2023 at 9:54 PM Matthew Wilcox <willy@infradead.org> wrote: > > On Mon, Mar 13, 2023 at 02:08:53PM -0700, Yosry Ahmed wrote: > > On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton > > <akpm@linux-foundation.org> wrote: > > > > > > On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote: > > > > > > > From: Hugh Dickins <hughd@google.com> > > > > > > > > In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we > > > > observed a warning from page_cgroup_ino() when reading > > > > /proc/kpagecgroup. > > > > > > If this is the only known situation in which page_memcg_check() is > > > passed a tail page, why does page_memcg_check() have > > > > > > if (PageTail(page)) > > > return NULL; > > > > > > ? Can we remove this to simplify, streamline and clarify? > > > > I guess it's a safety check so that we don't end up trying to cast a > > tail page to a folio. My opinion is to go one step further and change > > page_memcg_check() to do return the memcg of the head page, i.e: > > > > static inline struct mem_cgroup *page_memcg_check(struct page *page) > > { > > return folio_memcg_check(page_folio(page)); > > } > > If you look at my commit becacb04fdd4, I was preserving the existing > behaviour of page_memcg_check() when passed a tail page. It would > previously, rightly or wrongly, read the memcg_data from the tail page > and get back NULL. Right, I looked at that. I also looked at 1b7e4464d43a which added folio_memcg() and changed page_memcg()'s behavior to use page_folio() to retrieve the memcg from the head, which made me wonder why different decisions were made for these 2 helpers. Were the users of page_memcg() already passing in head pages only? > > I suspect that was not the intended behaviour, but I do not think this > patch is the right fix; it simply papers over the problem and maybe > creates a new one. Callers of page_memcg_check() should be eliminated, > precisely because of this ambiguity. It's up to the people who understand > each of the callers who need to make the decision to always convert the > page that they have to a folio and ask about its memcg, or whether they > want to preserve the existing behaviour of returning NULL for tail pages. > > So, I say NACK to this patch as it does not preserve existing behaviour, > and does not advance our understanding of what we have wrought. I am not sure which patch you are NACKing, the original patch from Hugh (adding compound_head() to page_cgroup_ino()) or the suggested alternative patch which changes page_memcg_check() to use page_folio(). I am assuming the latter (based on where this reply was placed). If that's the case, are you okay with the original patch? If yes, then changing page_memcg_check() to use page_folio() should be a NOP as the only 2 callers today are page_cgroup_ino() and print_page_owner_memcg(). The latter already implicitly excludes tail pages by checking if memcg_data == 0, and we can add an explicit PageTail() check there instead. Or is it the former (original patch)?
On Wed, Mar 15, 2023 at 12:04:10AM -0700, Yosry Ahmed wrote: > On Tue, Mar 14, 2023 at 9:54 PM Matthew Wilcox <willy@infradead.org> wrote: > > > > On Mon, Mar 13, 2023 at 02:08:53PM -0700, Yosry Ahmed wrote: > > > On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton > > > <akpm@linux-foundation.org> wrote: > > > > > > > > On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote: > > > > > > > > > From: Hugh Dickins <hughd@google.com> > > > > > > > > > > In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we > > > > > observed a warning from page_cgroup_ino() when reading > > > > > /proc/kpagecgroup. > > > > > > > > If this is the only known situation in which page_memcg_check() is > > > > passed a tail page, why does page_memcg_check() have > > > > > > > > if (PageTail(page)) > > > > return NULL; > > > > > > > > ? Can we remove this to simplify, streamline and clarify? > > > > > > I guess it's a safety check so that we don't end up trying to cast a > > > tail page to a folio. My opinion is to go one step further and change > > > page_memcg_check() to do return the memcg of the head page, i.e: > > > > > > static inline struct mem_cgroup *page_memcg_check(struct page *page) > > > { > > > return folio_memcg_check(page_folio(page)); > > > } > > > > If you look at my commit becacb04fdd4, I was preserving the existing > > behaviour of page_memcg_check() when passed a tail page. It would > > previously, rightly or wrongly, read the memcg_data from the tail page > > and get back NULL. > > Right, I looked at that. I also looked at 1b7e4464d43a which added > folio_memcg() and changed page_memcg()'s behavior to use page_folio() > to retrieve the memcg from the head, which made me wonder why > different decisions were made for these 2 helpers. > > Were the users of page_memcg() already passing in head pages only? There were 18 months between those commits ... I'd learned to be more careful about maintaining the semantics instead of changing them to "what they should have been". > > > > I suspect that was not the intended behaviour, but I do not think this > > patch is the right fix; it simply papers over the problem and maybe > > creates a new one. Callers of page_memcg_check() should be eliminated, > > precisely because of this ambiguity. It's up to the people who understand > > each of the callers who need to make the decision to always convert the > > page that they have to a folio and ask about its memcg, or whether they > > want to preserve the existing behaviour of returning NULL for tail pages. > > > > So, I say NACK to this patch as it does not preserve existing behaviour, > > and does not advance our understanding of what we have wrought. > > I am not sure which patch you are NACKing, the original patch from > Hugh (adding compound_head() to page_cgroup_ino()) or the suggested > alternative patch which changes page_memcg_check() to use > page_folio(). Both patches are NACKed. page_memcg_check() needs to go away because it has the tail page ambiguity. Both callers should be using folio_memcg_check() directly and resolving for themselves what the correct behaviour is when seeing a tail page.
On Wed, Mar 15, 2023 at 5:19 AM Matthew Wilcox <willy@infradead.org> wrote: > > On Wed, Mar 15, 2023 at 12:04:10AM -0700, Yosry Ahmed wrote: > > On Tue, Mar 14, 2023 at 9:54 PM Matthew Wilcox <willy@infradead.org> wrote: > > > > > > On Mon, Mar 13, 2023 at 02:08:53PM -0700, Yosry Ahmed wrote: > > > > On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton > > > > <akpm@linux-foundation.org> wrote: > > > > > > > > > > On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote: > > > > > > > > > > > From: Hugh Dickins <hughd@google.com> > > > > > > > > > > > > In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we > > > > > > observed a warning from page_cgroup_ino() when reading > > > > > > /proc/kpagecgroup. > > > > > > > > > > If this is the only known situation in which page_memcg_check() is > > > > > passed a tail page, why does page_memcg_check() have > > > > > > > > > > if (PageTail(page)) > > > > > return NULL; > > > > > > > > > > ? Can we remove this to simplify, streamline and clarify? > > > > > > > > I guess it's a safety check so that we don't end up trying to cast a > > > > tail page to a folio. My opinion is to go one step further and change > > > > page_memcg_check() to do return the memcg of the head page, i.e: > > > > > > > > static inline struct mem_cgroup *page_memcg_check(struct page *page) > > > > { > > > > return folio_memcg_check(page_folio(page)); > > > > } > > > > > > If you look at my commit becacb04fdd4, I was preserving the existing > > > behaviour of page_memcg_check() when passed a tail page. It would > > > previously, rightly or wrongly, read the memcg_data from the tail page > > > and get back NULL. > > > > Right, I looked at that. I also looked at 1b7e4464d43a which added > > folio_memcg() and changed page_memcg()'s behavior to use page_folio() > > to retrieve the memcg from the head, which made me wonder why > > different decisions were made for these 2 helpers. > > > > Were the users of page_memcg() already passing in head pages only? > > There were 18 months between those commits ... I'd learned to be > more careful about maintaining the semantics instead of changing > them to "what they should have been". > > > > > > > I suspect that was not the intended behaviour, but I do not think this > > > patch is the right fix; it simply papers over the problem and maybe > > > creates a new one. Callers of page_memcg_check() should be eliminated, > > > precisely because of this ambiguity. It's up to the people who understand > > > each of the callers who need to make the decision to always convert the > > > page that they have to a folio and ask about its memcg, or whether they > > > want to preserve the existing behaviour of returning NULL for tail pages. > > > > > > So, I say NACK to this patch as it does not preserve existing behaviour, > > > and does not advance our understanding of what we have wrought. > > > > I am not sure which patch you are NACKing, the original patch from > > Hugh (adding compound_head() to page_cgroup_ino()) or the suggested > > alternative patch which changes page_memcg_check() to use > > page_folio(). > > Both patches are NACKed. page_memcg_check() needs to go away > because it has the tail page ambiguity. Both callers should be using > folio_memcg_check() directly and resolving for themselves what the > correct behaviour is when seeing a tail page. > I agree. I even suggested this to Michal in one of the replies. For page_cgroup_ino() we can simply pass in folio_memcg(page_folio(page)), which would mimic what Hugh's patch is doing for page_cgroup_ino(). For page owner, I am not sure if we want to do something similar (which would start printing the memcg for tail pages as well), or explicitly excluding tail pages and THEN do folio_memcg(page_folio(page)) to get the memcg of head pages. Waiman, what do you think?
On 3/15/23 17:43, Yosry Ahmed wrote: > On Wed, Mar 15, 2023 at 5:19 AM Matthew Wilcox <willy@infradead.org> wrote: >> On Wed, Mar 15, 2023 at 12:04:10AM -0700, Yosry Ahmed wrote: >>> On Tue, Mar 14, 2023 at 9:54 PM Matthew Wilcox <willy@infradead.org> wrote: >>>> On Mon, Mar 13, 2023 at 02:08:53PM -0700, Yosry Ahmed wrote: >>>>> On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton >>>>> <akpm@linux-foundation.org> wrote: >>>>>> On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote: >>>>>> >>>>>>> From: Hugh Dickins <hughd@google.com> >>>>>>> >>>>>>> In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we >>>>>>> observed a warning from page_cgroup_ino() when reading >>>>>>> /proc/kpagecgroup. >>>>>> If this is the only known situation in which page_memcg_check() is >>>>>> passed a tail page, why does page_memcg_check() have >>>>>> >>>>>> if (PageTail(page)) >>>>>> return NULL; >>>>>> >>>>>> ? Can we remove this to simplify, streamline and clarify? >>>>> I guess it's a safety check so that we don't end up trying to cast a >>>>> tail page to a folio. My opinion is to go one step further and change >>>>> page_memcg_check() to do return the memcg of the head page, i.e: >>>>> >>>>> static inline struct mem_cgroup *page_memcg_check(struct page *page) >>>>> { >>>>> return folio_memcg_check(page_folio(page)); >>>>> } >>>> If you look at my commit becacb04fdd4, I was preserving the existing >>>> behaviour of page_memcg_check() when passed a tail page. It would >>>> previously, rightly or wrongly, read the memcg_data from the tail page >>>> and get back NULL. >>> Right, I looked at that. I also looked at 1b7e4464d43a which added >>> folio_memcg() and changed page_memcg()'s behavior to use page_folio() >>> to retrieve the memcg from the head, which made me wonder why >>> different decisions were made for these 2 helpers. >>> >>> Were the users of page_memcg() already passing in head pages only? >> There were 18 months between those commits ... I'd learned to be >> more careful about maintaining the semantics instead of changing >> them to "what they should have been". >> >>>> I suspect that was not the intended behaviour, but I do not think this >>>> patch is the right fix; it simply papers over the problem and maybe >>>> creates a new one. Callers of page_memcg_check() should be eliminated, >>>> precisely because of this ambiguity. It's up to the people who understand >>>> each of the callers who need to make the decision to always convert the >>>> page that they have to a folio and ask about its memcg, or whether they >>>> want to preserve the existing behaviour of returning NULL for tail pages. >>>> >>>> So, I say NACK to this patch as it does not preserve existing behaviour, >>>> and does not advance our understanding of what we have wrought. >>> I am not sure which patch you are NACKing, the original patch from >>> Hugh (adding compound_head() to page_cgroup_ino()) or the suggested >>> alternative patch which changes page_memcg_check() to use >>> page_folio(). >> Both patches are NACKed. page_memcg_check() needs to go away >> because it has the tail page ambiguity. Both callers should be using >> folio_memcg_check() directly and resolving for themselves what the >> correct behaviour is when seeing a tail page. >> > I agree. I even suggested this to Michal in one of the replies. > > For page_cgroup_ino() we can simply pass in > folio_memcg(page_folio(page)), which would mimic what Hugh's patch is > doing for page_cgroup_ino(). > > For page owner, I am not sure if we want to do something similar > (which would start printing the memcg for tail pages as well), or > explicitly excluding tail pages and THEN do > folio_memcg(page_folio(page)) to get the memcg of head pages. Waiman, > what do you think? I prefer the current behavior of printing information for the head page only. I am open to suggestion of the best APIs to use. Cheers, Longman
On Wed, Mar 15, 2023 at 5:09 PM Waiman Long <longman@redhat.com> wrote: > > > On 3/15/23 17:43, Yosry Ahmed wrote: > > On Wed, Mar 15, 2023 at 5:19 AM Matthew Wilcox <willy@infradead.org> wrote: > >> On Wed, Mar 15, 2023 at 12:04:10AM -0700, Yosry Ahmed wrote: > >>> On Tue, Mar 14, 2023 at 9:54 PM Matthew Wilcox <willy@infradead.org> wrote: > >>>> On Mon, Mar 13, 2023 at 02:08:53PM -0700, Yosry Ahmed wrote: > >>>>> On Mon, Mar 13, 2023 at 12:44 PM Andrew Morton > >>>>> <akpm@linux-foundation.org> wrote: > >>>>>> On Mon, 13 Mar 2023 08:34:52 +0000 Yosry Ahmed <yosryahmed@google.com> wrote: > >>>>>> > >>>>>>> From: Hugh Dickins <hughd@google.com> > >>>>>>> > >>>>>>> In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we > >>>>>>> observed a warning from page_cgroup_ino() when reading > >>>>>>> /proc/kpagecgroup. > >>>>>> If this is the only known situation in which page_memcg_check() is > >>>>>> passed a tail page, why does page_memcg_check() have > >>>>>> > >>>>>> if (PageTail(page)) > >>>>>> return NULL; > >>>>>> > >>>>>> ? Can we remove this to simplify, streamline and clarify? > >>>>> I guess it's a safety check so that we don't end up trying to cast a > >>>>> tail page to a folio. My opinion is to go one step further and change > >>>>> page_memcg_check() to do return the memcg of the head page, i.e: > >>>>> > >>>>> static inline struct mem_cgroup *page_memcg_check(struct page *page) > >>>>> { > >>>>> return folio_memcg_check(page_folio(page)); > >>>>> } > >>>> If you look at my commit becacb04fdd4, I was preserving the existing > >>>> behaviour of page_memcg_check() when passed a tail page. It would > >>>> previously, rightly or wrongly, read the memcg_data from the tail page > >>>> and get back NULL. > >>> Right, I looked at that. I also looked at 1b7e4464d43a which added > >>> folio_memcg() and changed page_memcg()'s behavior to use page_folio() > >>> to retrieve the memcg from the head, which made me wonder why > >>> different decisions were made for these 2 helpers. > >>> > >>> Were the users of page_memcg() already passing in head pages only? > >> There were 18 months between those commits ... I'd learned to be > >> more careful about maintaining the semantics instead of changing > >> them to "what they should have been". > >> > >>>> I suspect that was not the intended behaviour, but I do not think this > >>>> patch is the right fix; it simply papers over the problem and maybe > >>>> creates a new one. Callers of page_memcg_check() should be eliminated, > >>>> precisely because of this ambiguity. It's up to the people who understand > >>>> each of the callers who need to make the decision to always convert the > >>>> page that they have to a folio and ask about its memcg, or whether they > >>>> want to preserve the existing behaviour of returning NULL for tail pages. > >>>> > >>>> So, I say NACK to this patch as it does not preserve existing behaviour, > >>>> and does not advance our understanding of what we have wrought. > >>> I am not sure which patch you are NACKing, the original patch from > >>> Hugh (adding compound_head() to page_cgroup_ino()) or the suggested > >>> alternative patch which changes page_memcg_check() to use > >>> page_folio(). > >> Both patches are NACKed. page_memcg_check() needs to go away > >> because it has the tail page ambiguity. Both callers should be using > >> folio_memcg_check() directly and resolving for themselves what the > >> correct behaviour is when seeing a tail page. > >> > > I agree. I even suggested this to Michal in one of the replies. > > > > For page_cgroup_ino() we can simply pass in > > folio_memcg(page_folio(page)), which would mimic what Hugh's patch is > > doing for page_cgroup_ino(). > > > > For page owner, I am not sure if we want to do something similar > > (which would start printing the memcg for tail pages as well), or > > explicitly excluding tail pages and THEN do > > folio_memcg(page_folio(page)) to get the memcg of head pages. Waiman, > > what do you think? > > I prefer the current behavior of printing information for the head page > only. I am open to suggestion of the best APIs to use. I think instead of explicitly checking page->memcg_data, we can check PageTail() and return explicitly for tail pages tails, check PageSlab() to print the message for slab pages, then get the page's memcg through folio_memcg_check(page_folio(page)). Something like: static inline int print_page_owner_memcg(char *kbuf, size_t count, int ret, struct page *page) { ... rcu_read_lock(); /* Only head pages hold refs to a memcg */ if (PageTail(page)) goto out_unlock; if (PageSlab(page)) ret += scnprintf(kbuf + ret, count - ret, "Slab cache page\n"); memcg = folio_memcg_check(page_folio(page)); if (!memcg) goto out_unlock; ... } Matthew, What do you think? > > Cheers, > Longman >
On Wed, Mar 15, 2023 at 05:25:49PM -0700, Yosry Ahmed wrote: [snipped 80 lines. please learn to trim] > I think instead of explicitly checking page->memcg_data, we can check > PageTail() and return explicitly for tail pages tails, check > PageSlab() to print the message for slab pages, then get the page's > memcg through folio_memcg_check(page_folio(page)). > > Something like: > > static inline int print_page_owner_memcg(char *kbuf, size_t count, int ret, > struct page *page) > { > ... > rcu_read_lock(); > > /* Only head pages hold refs to a memcg */ > if (PageTail(page)) > goto out_unlock; > > if (PageSlab(page)) > ret += scnprintf(kbuf + ret, count - ret, "Slab cache page\n"); > > memcg = folio_memcg_check(page_folio(page)); > if (!memcg) > goto out_unlock; > ... > } > > Matthew, What do you think? Brrr, this is hard. read_page_owner() holds no locks or references, so pages can transform between being head/tail/order-0 while we're running. It _tries_ to skip over tail pages in the most inefficient way possible: if (!IS_ALIGNED(pfn, 1 << page_owner->order)) goto ext_put_continue; But any attempt to use folio APIs is going to risk tripping the assertions in the folio code that it's not a tail. This requires more thought.
On Wed, Mar 15, 2023 at 8:07 PM Matthew Wilcox <willy@infradead.org> wrote: > > On Wed, Mar 15, 2023 at 05:25:49PM -0700, Yosry Ahmed wrote: > [snipped 80 lines. please learn to trim] > > I think instead of explicitly checking page->memcg_data, we can check > > PageTail() and return explicitly for tail pages tails, check > > PageSlab() to print the message for slab pages, then get the page's > > memcg through folio_memcg_check(page_folio(page)). > > > > Something like: > > > > static inline int print_page_owner_memcg(char *kbuf, size_t count, int ret, > > struct page *page) > > { > > ... > > rcu_read_lock(); > > > > /* Only head pages hold refs to a memcg */ > > if (PageTail(page)) > > goto out_unlock; > > > > if (PageSlab(page)) > > ret += scnprintf(kbuf + ret, count - ret, "Slab cache page\n"); > > > > memcg = folio_memcg_check(page_folio(page)); > > if (!memcg) > > goto out_unlock; > > ... > > } > > > > Matthew, What do you think? > > Brrr, this is hard. read_page_owner() holds no locks or references, > so pages can transform between being head/tail/order-0 while we're > running. > > It _tries_ to skip over tail pages in the most inefficient way possible: > > if (!IS_ALIGNED(pfn, 1 << page_owner->order)) > goto ext_put_continue; > > But any attempt to use folio APIs is going to risk tripping the > assertions in the folio code that it's not a tail. This requires > more thought. Ugh yeah. I thought the worst that could happen is that if a page becomes a tail page after the PageTail() check, then we will not skip it and we will read the memcg from the head instead, which shouldn't be the end of the world. I missed the fact that the folio returned by page_folio() can change before folio_memcg_check() gets to read its memcg_data. I guess this race exists with the current implementation as well. page_memcg_check() will check for tail pages then cast the page to a folio. If the page becomes a tail page after the PageTail() check inside page_memcg_check() we risk running into the same situation. (FWIW folio_memcg_check() doesn't seem to have assertions for tail pages today)
On Wed, Mar 15, 2023 at 8:16 PM Yosry Ahmed <yosryahmed@google.com> wrote: > > On Wed, Mar 15, 2023 at 8:07 PM Matthew Wilcox <willy@infradead.org> wrote: > > > > On Wed, Mar 15, 2023 at 05:25:49PM -0700, Yosry Ahmed wrote: > > [snipped 80 lines. please learn to trim] > > > I think instead of explicitly checking page->memcg_data, we can check > > > PageTail() and return explicitly for tail pages tails, check > > > PageSlab() to print the message for slab pages, then get the page's > > > memcg through folio_memcg_check(page_folio(page)). > > > > > > Something like: > > > > > > static inline int print_page_owner_memcg(char *kbuf, size_t count, int ret, > > > struct page *page) > > > { > > > ... > > > rcu_read_lock(); > > > > > > /* Only head pages hold refs to a memcg */ > > > if (PageTail(page)) > > > goto out_unlock; > > > > > > if (PageSlab(page)) > > > ret += scnprintf(kbuf + ret, count - ret, "Slab cache page\n"); > > > > > > memcg = folio_memcg_check(page_folio(page)); > > > if (!memcg) > > > goto out_unlock; > > > ... > > > } > > > > > > Matthew, What do you think? > > > > Brrr, this is hard. read_page_owner() holds no locks or references, > > so pages can transform between being head/tail/order-0 while we're > > running. > > > > It _tries_ to skip over tail pages in the most inefficient way possible: > > > > if (!IS_ALIGNED(pfn, 1 << page_owner->order)) > > goto ext_put_continue; > > > > But any attempt to use folio APIs is going to risk tripping the > > assertions in the folio code that it's not a tail. This requires > > more thought. > > Ugh yeah. I thought the worst that could happen is that if a page > becomes a tail page after the PageTail() check, then we will not skip > it and we will read the memcg from the head instead, which shouldn't > be the end of the world. I missed the fact that the folio returned by > page_folio() can change before folio_memcg_check() gets to read its > memcg_data. > > I guess this race exists with the current implementation as well. > page_memcg_check() will check for tail pages then cast the page to a > folio. If the page becomes a tail page after the PageTail() check > inside page_memcg_check() we risk running into the same situation. > > (FWIW folio_memcg_check() doesn't seem to have assertions for tail pages today) Matthew, any thoughts on this?
diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 5abffe6f8389..e3a55295725e 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -395,7 +395,7 @@ ino_t page_cgroup_ino(struct page *page) unsigned long ino = 0; rcu_read_lock(); - memcg = page_memcg_check(page); + memcg = page_memcg_check(compound_head(page)); while (memcg && !(memcg->css.flags & CSS_ONLINE)) memcg = parent_mem_cgroup(memcg);