
[v2] mm, dump_page: do not crash with bad compound_mapcount()

Message ID 20200804214807.169256-1-jhubbard@nvidia.com (mailing list archive)
State New, archived

Commit Message

John Hubbard Aug. 4, 2020, 9:48 p.m. UTC
If a compound page is being split while dump_page() is being run on that
page, we can end up calling compound_mapcount() on a page that is no
longer compound. This leads to a crash (already seen at least once in
the field), due to the VM_BUG_ON_PAGE() assertion inside
compound_mapcount().

(The above is from Matthew Wilcox's analysis of Qian Cai's bug report.)

A similar problem is possible, via compound_pincount() instead of
compound_mapcount().
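
Sketched as a timeline (the interleaving shown is illustrative; the assertion
is the VM_BUG_ON_PAGE(!PageCompound(page), page) visible in the patch below):

	CPU 0: __dump_page()              CPU 1: split_huge_page()
	  head = compound_head(page);
	                                    /* split completes: "head" is no
	                                       longer a compound/head page */
	  compound_mapcount(head);
	    VM_BUG_ON_PAGE(!PageCompound(head), head);	/* fires -> crash */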

In order to avoid this kind of crash, make dump_page() slightly more
robust, by providing a pair of simpler routines that don't contain
assertions: head_mapcount() and head_pincount().

For debug tools, we don't want to go *too* far in this direction, but
this is a simple small fix, and the crash has already been seen, so it's
a good trade-off.

Reported-by: Qian Cai <cai@lca.pw>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
Hi,

I'm assuming that a fix is not required for -stable, but let me know if
others feel differently. The dump_page() code has changed a lot in that
area.

Changes since v1 [1]:

1) Rebased onto mmotm

2) Used a simpler head_*count() approach.

3) Added Matthew's Suggested-by: tag

4) Support pincount as well as mapcount.

[1] https://lore.kernel.org/linux-mm/20200804183943.1244828-1-jhubbard@nvidia.com/

thanks,
John Hubbard

 include/linux/mm.h | 14 ++++++++++++--
 mm/debug.c         |  6 +++---
 2 files changed, 15 insertions(+), 5 deletions(-)

Comments

Vlastimil Babka Aug. 6, 2020, 11:45 a.m. UTC | #1
On 8/4/20 11:48 PM, John Hubbard wrote:
> If a compound page is being split while dump_page() is being run on that
> page, we can end up calling compound_mapcount() on a page that is no
> longer compound. This leads to a crash (already seen at least once in
> the field), due to the VM_BUG_ON_PAGE() assertion inside
> compound_mapcount().
> 
> (The above is from Matthew Wilcox's analysis of Qian Cai's bug report.)
> 
> A similar problem is possible, via compound_pincount() instead of
> compound_mapcount().
> 
> In order to avoid this kind of crash, make dump_page() slightly more
> robust, by providing a pair of simpler routines that don't contain
> assertions: head_mapcount() and head_pincount().
> 
> For debug tools, we don't want to go *too* far in this direction, but
> this is a simple small fix, and the crash has already been seen, so it's
> a good trade-off.
> 
> Reported-by: Qian Cai <cai@lca.pw>
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
> Hi,
> 
> I'm assuming that a fix is not required for -stable, but let me know if
> others feel differently. The dump_page() code has changed a lot in that
> area.

I'd say only if we backport all those changes, as most were to avoid similar
kinds of crashes?
How about this additional patch now that we have head_mapcount()? (I wouldn't
go for squashing as the goal and scope are too different).

----8<----
From 3b3d5b4639eea9c0787eed510a32acdc918d569f Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Thu, 6 Aug 2020 13:33:58 +0200
Subject: [PATCH] mm, thp: use head_mapcount when we know we have a head page

Patch "mm, dump_page: do not crash with bad compound_mapcount()" has introduced
head_mapcount() to split out the part of compound_mapcount() where we already
know/assume we have a head page. We can use the new function in more places
where we already have a head page, to avoid the overhead of compound_head()
and (with DEBUG_VM) a debug check. This patch does that. There are a few more
applicable places, but they are behind DEBUG_VM, so performance is not important
there, and the extra debug check in compound_mapcount() could be useful instead.

The bloat-o-meter difference without DEBUG_VM is the following:

add/remove: 0/0 grow/shrink: 1/4 up/down: 32/-56 (-24)
Function                                     old     new   delta
__split_huge_pmd                            2867    2899     +32
shrink_page_list                            3860    3847     -13
reuse_swap_page                              762     748     -14
page_trans_huge_mapcount                     153     139     -14
total_mapcount                               187     172     -15
Total: Before=8687306, After=8687282, chg -0.00%

This just shows that the compiler wasn't able to prove on its own that we have
a head page. In __split_huge_pmd() the eliminated check probably led to
different optimization decisions, and thus code size increased.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/huge_memory.c | 6 +++---
 mm/swapfile.c    | 2 +-
 mm/vmscan.c      | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 90733cefa528..5927874b7894 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2125,7 +2125,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	 * Set PG_double_map before dropping compound_mapcount to avoid
 	 * false-negative page_mapped().
 	 */
-	if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
+	if (head_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
 		for (i = 0; i < HPAGE_PMD_NR; i++)
 			atomic_inc(&page[i]._mapcount);
 	}
@@ -2467,7 +2467,7 @@ int total_mapcount(struct page *page)
 	if (likely(!PageCompound(page)))
 		return atomic_read(&page->_mapcount) + 1;
 
-	compound = compound_mapcount(page);
+	compound = head_mapcount(page);
 	if (PageHuge(page))
 		return compound;
 	ret = compound;
@@ -2531,7 +2531,7 @@ int page_trans_huge_mapcount(struct page *page, int *total_mapcount)
 		ret -= 1;
 		_total_mapcount -= HPAGE_PMD_NR;
 	}
-	mapcount = compound_mapcount(page);
+	mapcount = head_mapcount(page);
 	ret += mapcount;
 	_total_mapcount += mapcount;
 	if (total_mapcount)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 9ee4211835c6..c5e722de38b8 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1673,7 +1673,7 @@ static int page_trans_huge_map_swapcount(struct page *page, int *total_mapcount,
 		map_swapcount -= 1;
 		_total_mapcount -= HPAGE_PMD_NR;
 	}
-	mapcount = compound_mapcount(page);
+	mapcount = head_mapcount(page);
 	map_swapcount += mapcount;
 	_total_mapcount += mapcount;
 	if (total_mapcount)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a086c104a9a6..72218cdfd902 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1248,7 +1248,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
 					 * away. Chances are some or all of the
 					 * tail pages can be freed without IO.
 					 */
-					if (!compound_mapcount(page) &&
+					if (!head_mapcount(page) &&
 					    split_huge_page_to_list(page,
 								    page_list))
 						goto activate_locked;
Matthew Wilcox Aug. 6, 2020, 1:48 p.m. UTC | #2
On Thu, Aug 06, 2020 at 01:45:11PM +0200, Vlastimil Babka wrote:
> How about this additional patch now that we have head_mapcount()? (I wouldn't
> go for squashing as the goal and scope are too different).

I like it.  It bothers me that the compiler doesn't know that
compound_head(compound_head(x)) == compound_head(x).  I updated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32911 with a request to be
able to tell the compiler that compound_head() is idempotent.
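
For reference, compound_head() currently looks roughly like this (a sketch,
not verbatim source): a head page never has bit 0 of page->compound_head set,
so a second call is a no-op in practice, but the compiler can't prove that
through the READ_ONCE():

static inline struct page *compound_head(struct page *page)
{
	unsigned long head = READ_ONCE(page->compound_head);

	/* Bit 0 set: this is a tail page, and head - 1 points to the head. */
	if (unlikely(head & 1))
		return (struct page *)(head - 1);
	return page;
}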

> The bloat-o-meter difference without DEBUG_VM is the following:
> 
> add/remove: 0/0 grow/shrink: 1/4 up/down: 32/-56 (-24)
> Function                                     old     new   delta
> __split_huge_pmd                            2867    2899     +32
> shrink_page_list                            3860    3847     -13
> reuse_swap_page                              762     748     -14
> page_trans_huge_mapcount                     153     139     -14
> total_mapcount                               187     172     -15
> Total: Before=8687306, After=8687282, chg -0.00%

That's great.  I'm expecting improvements from my thp_head() macro when
that lands (currently in Andrew's tree).  I have been reluctant to replace
current callers of compound_head() with thp_head(), but I suspect PF_HEAD
could use thp_head() and save a few bytes on a tinyconfig build.
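
The idea behind thp_head(), roughly sketched (the exact definition is the one
in Andrew's tree): with CONFIG_TRANSPARENT_HUGEPAGE=n, no THP tail pages can
reach these paths, so the head lookup can compile away entirely:

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#define thp_head(page)	compound_head(page)
#else
static inline struct page *thp_head(struct page *page)
{
	/* No THP configured: nothing to look up; catch misuse under DEBUG_VM. */
	VM_BUG_ON_PGFLAGS(PageTail(page), page);
	return page;
}
#endif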

> +++ b/mm/huge_memory.c
> @@ -2125,7 +2125,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>  	 * Set PG_double_map before dropping compound_mapcount to avoid
>  	 * false-negative page_mapped().
>  	 */
> -	if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> +	if (head_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {

I'm a little nervous about this one.  The page does actually come from
pmd_page(), and today that's guaranteed to be a head page.  But I'm
not convinced that's going to still be true in twenty years.  With the
current THP patchset, I won't allocate pages larger than PMD order, but
I can see there being interest in tracking pages in chunks larger than
2MB in the future.  And then pmd_page() might well return a tail page.
So it might be a good idea to not convert this one.

> @@ -2467,7 +2467,7 @@ int total_mapcount(struct page *page)
>  	if (likely(!PageCompound(page)))
>  		return atomic_read(&page->_mapcount) + 1;
>  
> -	compound = compound_mapcount(page);
> +	compound = head_mapcount(page);
>  	if (PageHuge(page))
>  		return compound;
>  	ret = compound;

Yes.  This function is a little confusing because it uses PageCompound()
all the way through when it really should be using PageHead ... because
the opening line of the function is: VM_BUG_ON_PAGE(PageTail(page), page);

> @@ -2531,7 +2531,7 @@ int page_trans_huge_mapcount(struct page *page, int *total_mapcount)
>  		ret -= 1;
>  		_total_mapcount -= HPAGE_PMD_NR;
>  	}
> -	mapcount = compound_mapcount(page);
> +	mapcount = head_mapcount(page);
>  	ret += mapcount;
>  	_total_mapcount += mapcount;
>  	if (total_mapcount)

Yes, we called compound_head() earlier in the function.  Safe.

> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 9ee4211835c6..c5e722de38b8 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1673,7 +1673,7 @@ static int page_trans_huge_map_swapcount(struct page *page, int *total_mapcount,
>  		map_swapcount -= 1;
>  		_total_mapcount -= HPAGE_PMD_NR;
>  	}
> -	mapcount = compound_mapcount(page);
> +	mapcount = head_mapcount(page);
>  	map_swapcount += mapcount;
>  	_total_mapcount += mapcount;
>  	if (total_mapcount)

Yes.  page is a head page at this point.

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a086c104a9a6..72218cdfd902 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1248,7 +1248,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
>  					 * away. Chances are some or all of the
>  					 * tail pages can be freed without IO.
>  					 */
> -					if (!compound_mapcount(page) &&
> +					if (!head_mapcount(page) &&
>  					    split_huge_page_to_list(page,
>  								    page_list))
>  						goto activate_locked;

Yes.  We don't put (can't put!) tail pages on the lists, so this must be a
head page.
Vlastimil Babka Aug. 6, 2020, 3:13 p.m. UTC | #3
On 8/6/20 3:48 PM, Matthew Wilcox wrote:
> On Thu, Aug 06, 2020 at 01:45:11PM +0200, Vlastimil Babka wrote:
>> How about this additional patch now that we have head_mapcount()? (I wouldn't
>> go for squashing as the goal and scope are too different).
> 
> I like it.  It bothers me that the compiler doesn't know that
> compound_head(compound_head(x)) == compound_head(x).  I updated
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32911 with a request to be
> able to tell the compiler that compound_head() is idempotent.

Yeah, it would be nice to get the benefits everywhere automatically. But I guess
the compiler would have to discard the idempotence assumption if there are
multiple consecutive compound_head(page) calls in a function (perhaps hidden
behind page flag accesses), as soon as we modify the struct page somewhere.

>> The bloat-o-meter difference without DEBUG_VM is the following:
>> 
>> add/remove: 0/0 grow/shrink: 1/4 up/down: 32/-56 (-24)
>> Function                                     old     new   delta
>> __split_huge_pmd                            2867    2899     +32
>> shrink_page_list                            3860    3847     -13
>> reuse_swap_page                              762     748     -14
>> page_trans_huge_mapcount                     153     139     -14
>> total_mapcount                               187     172     -15
>> Total: Before=8687306, After=8687282, chg -0.00%
> 
> That's great.  I'm expecting improvements from my thp_head() macro when
> that lands (currently in Andrew's tree).  I have been reluctant to replace
> current callers of compound_head() with thp_head(), but I suspect PF_HEAD
> could use thp_head() and save a few bytes on a tinyconfig build.
> 
>> +++ b/mm/huge_memory.c
>> @@ -2125,7 +2125,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>>  	 * Set PG_double_map before dropping compound_mapcount to avoid
>>  	 * false-negative page_mapped().
>>  	 */
>> -	if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
>> +	if (head_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> 
> I'm a little nervous about this one.  The page does actually come from
> pmd_page(), and today that's guaranteed to be a head page.  But I'm
> not convinced that's going to still be true in twenty years.  With the
> current THP patchset, I won't allocate pages larger than PMD order, but
> I can see there being interest in tracking pages in chunks larger than
> 2MB in the future.  And then pmd_page() might well return a tail page.
> So it might be a good idea to not convert this one.

Hmm, the function converts the compound mapcount of the whole page into
HPAGE_PMD_NR base-page mapcounts. If the compound page were suddenly bigger than
a PMD, then I guess this wouldn't work properly anymore without changes anyway?
Maybe we could stick something like VM_BUG_ON(PageTransHuge(page)) there as
"enforced documentation" for now?
Matthew Wilcox Aug. 6, 2020, 3:39 p.m. UTC | #4
On Thu, Aug 06, 2020 at 05:13:05PM +0200, Vlastimil Babka wrote:
> On 8/6/20 3:48 PM, Matthew Wilcox wrote:
> > On Thu, Aug 06, 2020 at 01:45:11PM +0200, Vlastimil Babka wrote:
> >> How about this additional patch now that we have head_mapcount()? (I wouldn't
> >> go for squashing as the goal and scope are too different).
> > 
> > I like it.  It bothers me that the compiler doesn't know that
> > compound_head(compound_head(x)) == compound_head(x).  I updated
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32911 with a request to be
> > able to tell the compiler that compound_head() is idempotent.
> 
> Yeah, it would be nice to get the benefits everywhere automatically. But I guess
> the compiler would have to discard the idempotence assumption if there are
> multiple consecutive compound_head(page) calls in a function (perhaps hidden
> behind page flag accesses), as soon as we modify the struct page somewhere.

The only place where we modify the struct page is the split code, and
I think we're pretty careful to handle the pages appropriately there.
The buddy allocator has its own way of combining pages.

> >> +++ b/mm/huge_memory.c
> >> @@ -2125,7 +2125,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> >>  	 * Set PG_double_map before dropping compound_mapcount to avoid
> >>  	 * false-negative page_mapped().
> >>  	 */
> >> -	if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> >> +	if (head_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> > 
> > I'm a little nervous about this one.  The page does actually come from
> > pmd_page(), and today that's guaranteed to be a head page.  But I'm
> > not convinced that's going to still be true in twenty years.  With the
> > current THP patchset, I won't allocate pages larger than PMD order, but
> > I can see there being interest in tracking pages in chunks larger than
> > 2MB in the future.  And then pmd_page() might well return a tail page.
> > So it might be a good idea to not convert this one.
> 
> Hmm, the function converts the compound mapcount of the whole page into
> HPAGE_PMD_NR base-page mapcounts. If the compound page were suddenly bigger than
> a PMD, then I guess this wouldn't work properly anymore without changes anyway?
> Maybe we could stick something like VM_BUG_ON(PageTransHuge(page)) there as
> "enforced documentation" for now?

I think it would work as-is.  But also I may have totally misunderstood it.
I'll write this declaratively and specifically for x86 (PMD order is 9)
... tell me when I've made a mistake ;-)

This function is for splitting the PMD.  We're leaving the underlying
page intact and just changing the page table.  So if, say, we have an
underlying 4MB page (and maybe the pages are mapped as PMDs in this
process), we might get subpage number 512 of this order-10 page.  We'd
need to check the DoubleMap bit on subpage 1, and the compound_mapcount
also stored in page 1, but we'd only want to spread the mapcount out
over the 512 subpages from 512-1023; we wouldn't want to spread it out
over 0-511 because they aren't affected by this particular PMD.
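
In code, that hypothetical arithmetic might look roughly like the following
(not current kernel behavior -- today pmd_page() always returns a head page
and anonymous THPs are exactly HPAGE_PMD_ORDER):

	/* Hypothetical: "page" from pmd_page() is a tail of a larger compound page. */
	struct page *head = compound_head(page);
	unsigned long base = page - head;	/* 0 or 512 for an order-10 page */
	int i;

	if (head_mapcount(head) > 1 && !TestSetPageDoubleMap(head)) {
		/* Spread the mapcount only over the subpages under this PMD. */
		for (i = 0; i < HPAGE_PMD_NR; i++)
			atomic_inc(&head[base + i]._mapcount);
	}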

Having to reason about stuff like this is why I limited the THP code to
stop at PMD order ... I don't want to make my life even more complicated
than I have to!
Vlastimil Babka Aug. 6, 2020, 3:53 p.m. UTC | #5
On 8/6/20 5:39 PM, Matthew Wilcox wrote:
>> >> +++ b/mm/huge_memory.c
>> >> @@ -2125,7 +2125,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>> >>  	 * Set PG_double_map before dropping compound_mapcount to avoid
>> >>  	 * false-negative page_mapped().
>> >>  	 */
>> >> -	if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
>> >> +	if (head_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
>> > 
>> > I'm a little nervous about this one.  The page does actually come from
>> > pmd_page(), and today that's guaranteed to be a head page.  But I'm
>> > not convinced that's going to still be true in twenty years.  With the
>> > current THP patchset, I won't allocate pages larger than PMD order, but
>> > I can see there being interest in tracking pages in chunks larger than
>> > 2MB in the future.  And then pmd_page() might well return a tail page.
>> > So it might be a good idea to not convert this one.
>> 
>> Hmm, the function converts the compound mapcount of the whole page into
>> HPAGE_PMD_NR base-page mapcounts. If the compound page were suddenly bigger than
>> a PMD, then I guess this wouldn't work properly anymore without changes anyway?
>> Maybe we could stick something like VM_BUG_ON(PageTransHuge(page)) there as
>> "enforced documentation" for now?
> 
> I think it would work as-is.  But also I may have totally misunderstood it.
> I'll write this declaratively and specifically for x86 (PMD order is 9)
> ... tell me when I've made a mistake ;-)
> 
> This function is for splitting the PMD.  We're leaving the underlying
> page intact and just changing the page table.  So if, say, we have an
> underlying 4MB page (and maybe the pages are mapped as PMDs in this
> process), we might get subpage number 512 of this order-10 page.  We'd
> need to check the DoubleMap bit on subpage 1, and the compound_mapcount
> also stored in page 1, but we'd only want to spread the mapcount out
> over the 512 subpages from 512-1023; we wouldn't want to spread it out
> over 0-511 because they aren't affected by this particular PMD.

Yeah, and then we decrease the compound mapcount, which is a counter of "how
many times is this compound page mapped as a whole". But we only removed (the
second) half of the compound mapping, so imho that would be wrong?

> Having to reason about stuff like this is why I limited the THP code to
> stop at PMD order ... I don't want to make my life even more complicated
> than I have to!

Kirill might correct me but I'd expect the THP code right now has baked in many
assumptions about THP pages being exactly HPAGE_PMD_ORDER large?
Matthew Wilcox Aug. 6, 2020, 5:15 p.m. UTC | #6
On Thu, Aug 06, 2020 at 05:53:10PM +0200, Vlastimil Babka wrote:
> On 8/6/20 5:39 PM, Matthew Wilcox wrote:
> >> >> +++ b/mm/huge_memory.c
> >> >> @@ -2125,7 +2125,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> >> >>  	 * Set PG_double_map before dropping compound_mapcount to avoid
> >> >>  	 * false-negative page_mapped().
> >> >>  	 */
> >> >> -	if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> >> >> +	if (head_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> >> > 
> >> > I'm a little nervous about this one.  The page does actually come from
> >> > pmd_page(), and today that's guaranteed to be a head page.  But I'm
> >> > not convinced that's going to still be true in twenty years.  With the
> >> > current THP patchset, I won't allocate pages larger than PMD order, but
> >> > I can see there being interest in tracking pages in chunks larger than
> >> > 2MB in the future.  And then pmd_page() might well return a tail page.
> >> > So it might be a good idea to not convert this one.
> >> 
> >> Hmm, the function converts the compound mapcount of the whole page into
> >> HPAGE_PMD_NR base-page mapcounts. If the compound page were suddenly bigger than
> >> a PMD, then I guess this wouldn't work properly anymore without changes anyway?
> >> Maybe we could stick something like VM_BUG_ON(PageTransHuge(page)) there as
> >> "enforced documentation" for now?
> > 
> > I think it would work as-is.  But also I may have totally misunderstood it.
> > I'll write this declaratively and specifically for x86 (PMD order is 9)
> > ... tell me when I've made a mistake ;-)
> > 
> > This function is for splitting the PMD.  We're leaving the underlying
> > page intact and just changing the page table.  So if, say, we have an
> > underlying 4MB page (and maybe the pages are mapped as PMDs in this
> > process), we might get subpage number 512 of this order-10 page.  We'd
> > need to check the DoubleMap bit on subpage 1, and the compound_mapcount
> > also stored in page 1, but we'd only want to spread the mapcount out
> > over the 512 subpages from 512-1023; we wouldn't want to spread it out
> > over 0-511 because they aren't affected by this particular PMD.
> 
> Yeah, and then we decrease the compound mapcount, which is a counter of "how
> many times is this compound page mapped as a whole". But we only removed (the
> second) half of the compound mapping, so imho that would be wrong?

I'd expect that count to be incremented by 1 for each PMD that it's
mapped to?  ie change the definition of that counter slightly.

> > Having to reason about stuff like this is why I limited the THP code to
> > stop at PMD order ... I don't want to make my life even more complicated
> > than I have to!
> 
> Kirill might correct me but I'd expect the THP code right now has baked in many
> assumptions about THP pages being exactly HPAGE_PMD_ORDER large?

There are somewhat fewer places that make that assumption after applying
the ~80 patches here ... http://git.infradead.org/users/willy/pagecache.git

I have mostly not touched the anonymous THPs (obviously some of the code
paths are shared), although both Kirill & I think there's a win to be
had there too.
Kirill A. Shutemov Aug. 7, 2020, 2:35 p.m. UTC | #7
On Tue, Aug 04, 2020 at 02:48:07PM -0700, John Hubbard wrote:
> If a compound page is being split while dump_page() is being run on that
> page, we can end up calling compound_mapcount() on a page that is no
> longer compound. This leads to a crash (already seen at least once in
> the field), due to the VM_BUG_ON_PAGE() assertion inside
> compound_mapcount().
> 
> (The above is from Matthew Wilcox's analysis of Qian Cai's bug report.)
> 
> A similar problem is possible, via compound_pincount() instead of
> compound_mapcount().
> 
> In order to avoid this kind of crash, make dump_page() slightly more
> robust, by providing a pair of simpler routines that don't contain
> assertions: head_mapcount() and head_pincount().

I find the naming misleading. head_mapcount() and head_pincount() sound like
the mapcount/pincount of the head page, but that's not what they are. They are
the mapcount and pincount of the compound page.

Maybe compound_mapcount_head() and compound_pincount_head()? Or
__compound_mapcount() and __compound_pincount().

> For debug tools, we don't want to go *too* far in this direction, but
> this is a simple small fix, and the crash has already been seen, so it's
> a good trade-off.
> 
> Reported-by: Qian Cai <cai@lca.pw>
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
> Hi,
> 
> I'm assuming that a fix is not required for -stable, but let me know if
> others feel differently. The dump_page() code has changed a lot in that
> area.
> 
> Changes since v1 [1]:
> 
> 1) Rebased onto mmotm
> 
> 2) Used a simpler head_*count() approach.
> 
> 3) Added Matthew's Suggested-by: tag
> 
> 4) Support pincount as well as mapcount.
> 
> [1] https://lore.kernel.org/linux-mm/20200804183943.1244828-1-jhubbard@nvidia.com/
> 
> thanks,
> John Hubbard
> 
>  include/linux/mm.h | 14 ++++++++++++--
>  mm/debug.c         |  6 +++---
>  2 files changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 4f12b2465e80..8ab941cf73f4 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -776,6 +776,11 @@ static inline void *kvcalloc(size_t n, size_t size, gfp_t flags)
>  extern void kvfree(const void *addr);
>  extern void kvfree_sensitive(const void *addr, size_t len);
>  
> +static inline int head_mapcount(struct page *head)
> +{

Do we want VM_BUG_ON_PAGE(!PageHead(head), head) here?

> +	return atomic_read(compound_mapcount_ptr(head)) + 1;
> +}
> +
>  /*
>   * Mapcount of compound page as a whole, does not include mapped sub-pages.
>   *
> @@ -785,7 +790,7 @@ static inline int compound_mapcount(struct page *page)
>  {
>  	VM_BUG_ON_PAGE(!PageCompound(page), page);
>  	page = compound_head(page);
> -	return atomic_read(compound_mapcount_ptr(page)) + 1;
> +	return head_mapcount(page);
>  }
>  
>  /*
> @@ -898,11 +903,16 @@ static inline bool hpage_pincount_available(struct page *page)
>  	return PageCompound(page) && compound_order(page) > 1;
>  }
>  
> +static inline int head_pincount(struct page *head)
> +{

Ditto.

> +	return atomic_read(compound_pincount_ptr(head));
> +}
> +
>  static inline int compound_pincount(struct page *page)
>  {
>  	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
>  	page = compound_head(page);
> -	return atomic_read(compound_pincount_ptr(page));
> +	return head_pincount(page);
>  }
>  
>  static inline void set_compound_order(struct page *page, unsigned int order)
> diff --git a/mm/debug.c b/mm/debug.c
> index c27fff1e3ca8..69b60637112b 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -102,12 +102,12 @@ void __dump_page(struct page *page, const char *reason)
>  		if (hpage_pincount_available(page)) {
>  			pr_warn("head:%p order:%u compound_mapcount:%d compound_pincount:%d\n",
>  					head, compound_order(head),
> -					compound_mapcount(head),
> -					compound_pincount(head));
> +					head_mapcount(head),
> +					head_pincount(head));
>  		} else {
>  			pr_warn("head:%p order:%u compound_mapcount:%d\n",
>  					head, compound_order(head),
> -					compound_mapcount(head));
> +					head_mapcount(head));
>  		}
>  	}
>  	if (PageKsm(page))
> -- 
> 2.28.0
>
Kirill A. Shutemov Aug. 7, 2020, 2:53 p.m. UTC | #8
On Thu, Aug 06, 2020 at 06:15:00PM +0100, Matthew Wilcox wrote:
> On Thu, Aug 06, 2020 at 05:53:10PM +0200, Vlastimil Babka wrote:
> > On 8/6/20 5:39 PM, Matthew Wilcox wrote:
> > >> >> +++ b/mm/huge_memory.c
> > >> >> @@ -2125,7 +2125,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> > >> >>  	 * Set PG_double_map before dropping compound_mapcount to avoid
> > >> >>  	 * false-negative page_mapped().
> > >> >>  	 */
> > >> >> -	if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> > >> >> +	if (head_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> > >> > 
> > >> > I'm a little nervous about this one.  The page does actually come from
> > >> > pmd_page(), and today that's guaranteed to be a head page.  But I'm
> > >> > not convinced that's going to still be true in twenty years.  With the
> > >> > current THP patchset, I won't allocate pages larger than PMD order, but
> > >> > I can see there being interest in tracking pages in chunks larger than
> > >> > 2MB in the future.  And then pmd_page() might well return a tail page.
> > >> > So it might be a good idea to not convert this one.
> > >> 
> > >> Hmm, the function converts the compound mapcount of the whole page into
> > >> HPAGE_PMD_NR base-page mapcounts. If the compound page were suddenly bigger than
> > >> a PMD, then I guess this wouldn't work properly anymore without changes anyway?
> > >> Maybe we could stick something like VM_BUG_ON(PageTransHuge(page)) there as
> > >> "enforced documentation" for now?
> > > 
> > > I think it would work as-is.  But also I may have totally misunderstood it.
> > > I'll write this declaratively and specifically for x86 (PMD order is 9)
> > > ... tell me when I've made a mistake ;-)
> > > 
> > > This function is for splitting the PMD.  We're leaving the underlying
> > > page intact and just changing the page table.  So if, say, we have an
> > > underlying 4MB page (and maybe the pages are mapped as PMDs in this
> > > process), we might get subpage number 512 of this order-10 page.  We'd
> > > need to check the DoubleMap bit on subpage 1, and the compound_mapcount
> > > also stored in page 1, but we'd only want to spread the mapcount out
> > > over the 512 subpages from 512-1023; we wouldn't want to spread it out
> > > over 0-511 because they aren't affected by this particular PMD.
> > 
> > Yeah, and then we decrease the compound mapcount, which is a counter of "how
> > many times is this compound page mapped as a whole". But we only removed (the
> > second) half of the compound mapping, so imho that would be wrong?
> 
> I'd expect that count to be incremented by 1 for each PMD that it's
> mapped to?  ie change the definition of that counter slightly.
> 
> > > Having to reason about stuff like this is why I limited the THP code to
> > > stop at PMD order ... I don't want to make my life even more complicated
> > > than I have to!
> > 
> > Kirill might correct me but I'd expect the THP code right now has baked in many
> > assumptions about THP pages being exactly HPAGE_PMD_ORDER large?

That will be true for PMD-mapped THP pages after applying Matthew's
patchset.

> There are somewhat fewer places that make that assumption after applying
> the ~80 patches here ... http://git.infradead.org/users/willy/pagecache.git

The patchset allows for THP to be anywhere between order-2 and
order-9 (on x86-64).

> I have mostly not touched the anonymous THPs (obviously some of the code
> paths are shared), although both Kirill & I think there's a win to be
> had there too.

Yeah. Reducing LRU handling overhead alone should be enough to justify the
effort. But we would still need numbers.
Matthew Wilcox Aug. 7, 2020, 3:10 p.m. UTC | #9
On Fri, Aug 07, 2020 at 05:35:04PM +0300, Kirill A. Shutemov wrote:
> On Tue, Aug 04, 2020 at 02:48:07PM -0700, John Hubbard wrote:
> > If a compound page is being split while dump_page() is being run on that
> > page, we can end up calling compound_mapcount() on a page that is no
> > longer compound. This leads to a crash (already seen at least once in
> > the field), due to the VM_BUG_ON_PAGE() assertion inside
> > compound_mapcount().

[...]
> > +static inline int head_mapcount(struct page *head)
> > +{
> 
> Do we want VM_BUG_ON_PAGE(!PageHead(head), head) here?

Well, no.  That was the point of the bug report -- by the time we called
compound_mapcount, the page was no longer a head page.

> > A similar problem is possible, via compound_pincount() instead of
> > compound_mapcount().
> > 
> > In order to avoid this kind of crash, make dump_page() slightly more
> > robust, by providing a pair of simpler routines that don't contain
> > assertions: head_mapcount() and head_pincount().
> 
> I find the naming misleading. head_mapcount() and head_pincount() sound like
> the mapcount/pincount of the head page, but that's not what they are. They are
> the mapcount and pincount of the compound page.

OK, point taken.  I might go for head_compound_mapcount()?  Or as I
originally suggested, just opencoding it like we do in __page_mapcount().
Kirill A. Shutemov Aug. 7, 2020, 4:48 p.m. UTC | #10
On Fri, Aug 07, 2020 at 04:10:29PM +0100, Matthew Wilcox wrote:
> On Fri, Aug 07, 2020 at 05:35:04PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Aug 04, 2020 at 02:48:07PM -0700, John Hubbard wrote:
> > > If a compound page is being split while dump_page() is being run on that
> > > page, we can end up calling compound_mapcount() on a page that is no
> > > longer compound. This leads to a crash (already seen at least once in
> > > the field), due to the VM_BUG_ON_PAGE() assertion inside
> > > compound_mapcount().
> 
> [...]
> > > +static inline int head_mapcount(struct page *head)
> > > +{
> > 
> > Do we want VM_BUG_ON_PAGE(!PageHead(head), head) here?
> 
> Well, no.  That was the point of the bug report -- by the time we called
> compound_mapcount, the page was no longer a head page.

Right. VM_BUG_ON_PAGE(PageTail(head), head)?

> > > A similar problem is possible, via compound_pincount() instead of
> > > compound_mapcount().
> > > 
> > > In order to avoid this kind of crash, make dump_page() slightly more
> > > robust, by providing a pair of simpler routines that don't contain
> > > assertions: head_mapcount() and head_pincount().
> > 
> > I find the naming misleading. head_mapcount() and head_pincount() sound like
> > the mapcount/pincount of the head page, but that's not what they are. They are
> > the mapcount and pincount of the compound page.
> 
> OK, point taken.  I might go for head_compound_mapcount()?  Or as I
> originally suggested, just opencoding it like we do in __page_mapcount().

I'm fine either way.
John Hubbard Aug. 7, 2020, 10:40 p.m. UTC | #11
On 8/7/20 9:48 AM, Kirill A. Shutemov wrote:
>> [...]
>>>> +static inline int head_mapcount(struct page *head)
>>>> +{
>>>
>>> Do we want VM_BUG_ON_PAGE(!PageHead(head), head) here?
>>
>> Well, no.  That was the point of the bug report -- by the time we called
>> compound_mapcount, the page was no longer a head page.
> 
> Right. VM_BUG_ON_PAGE(PageTail(head), head)?

Sorry for overlooking that feedback point. Looking at it now, I'd much
rather not put any assertions at all here. This is supposed to be for
implementing the failure case, and we've moved past assertions at this
point. In other words, dump_page() is part of *implementing* an
assertion, so calling assertions from within it is undesirable.

It's better to put the assertions in code that would call these inner
routines. Then, if we want to use these routines outside of mm/debug.c,
as per the other thread, we should provide a wrapper that has such
assertions.
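
A minimal sketch of such a wrapper (hypothetical; the name is made up here and
it is not part of this patch), using the PageTail() check suggested above:

static inline int checked_head_mapcount(struct page *head)
{
	/* For callers outside mm/debug.c that still want the sanity check. */
	VM_BUG_ON_PAGE(PageTail(head), head);
	return head_mapcount(head);
}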


thanks,

Patch

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 4f12b2465e80..8ab941cf73f4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -776,6 +776,11 @@  static inline void *kvcalloc(size_t n, size_t size, gfp_t flags)
 extern void kvfree(const void *addr);
 extern void kvfree_sensitive(const void *addr, size_t len);
 
+static inline int head_mapcount(struct page *head)
+{
+	return atomic_read(compound_mapcount_ptr(head)) + 1;
+}
+
 /*
  * Mapcount of compound page as a whole, does not include mapped sub-pages.
  *
@@ -785,7 +790,7 @@  static inline int compound_mapcount(struct page *page)
 {
 	VM_BUG_ON_PAGE(!PageCompound(page), page);
 	page = compound_head(page);
-	return atomic_read(compound_mapcount_ptr(page)) + 1;
+	return head_mapcount(page);
 }
 
 /*
@@ -898,11 +903,16 @@  static inline bool hpage_pincount_available(struct page *page)
 	return PageCompound(page) && compound_order(page) > 1;
 }
 
+static inline int head_pincount(struct page *head)
+{
+	return atomic_read(compound_pincount_ptr(head));
+}
+
 static inline int compound_pincount(struct page *page)
 {
 	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
 	page = compound_head(page);
-	return atomic_read(compound_pincount_ptr(page));
+	return head_pincount(page);
 }
 
 static inline void set_compound_order(struct page *page, unsigned int order)
diff --git a/mm/debug.c b/mm/debug.c
index c27fff1e3ca8..69b60637112b 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -102,12 +102,12 @@  void __dump_page(struct page *page, const char *reason)
 		if (hpage_pincount_available(page)) {
 			pr_warn("head:%p order:%u compound_mapcount:%d compound_pincount:%d\n",
 					head, compound_order(head),
-					compound_mapcount(head),
-					compound_pincount(head));
+					head_mapcount(head),
+					head_pincount(head));
 		} else {
 			pr_warn("head:%p order:%u compound_mapcount:%d\n",
 					head, compound_order(head),
-					compound_mapcount(head));
+					head_mapcount(head));
 		}
 	}
 	if (PageKsm(page))