
mm/migrate: fix page state accounting type conversion underflow

Message ID 20210722054840.501423-1-npiggin@gmail.com (mailing list archive)
State New
Series mm/migrate: fix page state accounting type conversion underflow

Commit Message

Nicholas Piggin July 22, 2021, 5:48 a.m. UTC
Similarly to commit 2da9f6305f306 ("mm/vmscan: fix NR_ISOLATED_FILE
corruption on 64-bit"), fix -ve int -> unsigned int -> long bug.

Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Fixes: c5fc5c3ae0c84 ("mm: migrate: account THP NUMA migration counters correctly")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 mm/migrate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
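The bug class being fixed can be reduced to a few lines; the following is an editorial sketch (names hypothetical, not kernel code), assuming an LP64 target where int is 32-bit and long is 64-bit:

```c
/* Sketch of the -ve int -> unsigned int -> long bug (hypothetical names).
 * Negating an unsigned int wraps to a large positive value; converting
 * that to a 64-bit long preserves the large positive value instead of
 * the intended negative page-count delta. */
static long delta_buggy(unsigned int nr_pages)
{
	return -nr_pages;	/* negation done in unsigned int: wraps */
}

static long delta_fixed(int nr_pages)
{
	return -nr_pages;	/* signed negation, converts to long correctly */
}
```

With nr_pages = 512 (one 2MB THP of 4K base pages), the buggy variant yields 4294966784 rather than -512, which is how the per-node counters get corrupted.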

Comments

David Hildenbrand July 22, 2021, 7:27 a.m. UTC | #1
On 22.07.21 07:48, Nicholas Piggin wrote:
> Similarly to commit 2da9f6305f306 ("mm/vmscan: fix NR_ISOLATED_FILE
> corruption on 64-bit"), fix -ve int -> unsigned int -> long bug.
> 
> Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Fixes: c5fc5c3ae0c84 ("mm: migrate: account THP NUMA migration counters correctly")
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   mm/migrate.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 34a9ad3e0a4f..7e240437e7d9 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2068,7 +2068,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
>   	LIST_HEAD(migratepages);
>   	new_page_t *new;
>   	bool compound;
> -	unsigned int nr_pages = thp_nr_pages(page);
> +	int nr_pages = thp_nr_pages(page);
>   
>   	/*
>   	 * PTE mapped THP or HugeTLB page can't reach here so the page could
> 

This is too fragile to silently break again IMHO. Should we similarly to 
2da9f6305f306 handle the conversion explicitly inside the 
mod_node_page_state() call?
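For reference, the explicit-conversion alternative being suggested looks roughly like the following sketch; the kernel interface is stubbed out, and all names here are hypothetical stand-ins, not the real signatures:

```c
/* Stand-ins for the kernel interfaces under discussion (hypothetical). */
typedef struct { long dummy; } pgdat_stub_t;

static long node_state;	/* stands in for a per-node vmstat counter */

static void mod_node_page_state_stub(pgdat_stub_t *pgdat, long delta)
{
	(void)pgdat;
	node_state += delta;
}

/* The explicit-conversion style of 2da9f6305f306: keep nr_pages
 * unsigned, but cast to a signed type BEFORE negating at the call
 * site, so the negation happens in signed arithmetic. */
static void account_isolated(pgdat_stub_t *pgdat, unsigned int nr_pages)
{
	mod_node_page_state_stub(pgdat, -(long)nr_pages);
}
```

Note the ordering matters: `-(long)nr_pages` casts first and then negates; writing `(long)-nr_pages` would negate in unsigned arithmetic first and reintroduce the wraparound.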
Andrew Morton July 22, 2021, 7:20 p.m. UTC | #2
On Thu, 22 Jul 2021 09:27:50 +0200 David Hildenbrand <david@redhat.com> wrote:

> On 22.07.21 07:48, Nicholas Piggin wrote:
> > Similarly to commit 2da9f6305f306 ("mm/vmscan: fix NR_ISOLATED_FILE
> > corruption on 64-bit"), fix -ve int -> unsigned int -> long bug.
> > 
> > Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > Fixes: c5fc5c3ae0c84 ("mm: migrate: account THP NUMA migration counters correctly")
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> >   mm/migrate.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index 34a9ad3e0a4f..7e240437e7d9 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -2068,7 +2068,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
> >   	LIST_HEAD(migratepages);
> >   	new_page_t *new;
> >   	bool compound;
> > -	unsigned int nr_pages = thp_nr_pages(page);
> > +	int nr_pages = thp_nr_pages(page);
> >   
> >   	/*
> >   	 * PTE mapped THP or HugeTLB page can't reach here so the page could
> > 
> 
> This is too fragile to silently break again IMHO. Should we similarly to 
> 2da9f6305f306 handle the conversion explicitly inside the 
> mod_node_page_state() call?

And please don't send us off to look at another commit to understand
this one.  A full standalone changelog, please.

Preferably with a description of user-visible effects.  2da9f6305f306
said "Symptoms include CMA allocations hanging forever holding the
cma_mutex due to alloc_contig_range->...->isolate_migratepages_block
waiting forever in "while (unlikely(too_many_isolated(pgdat)))".  Is
that also the case with this bug?
Nicholas Piggin July 26, 2021, 1:43 a.m. UTC | #3
Excerpts from David Hildenbrand's message of July 22, 2021 5:27 pm:
> On 22.07.21 07:48, Nicholas Piggin wrote:
>> Similarly to commit 2da9f6305f306 ("mm/vmscan: fix NR_ISOLATED_FILE
>> corruption on 64-bit"), fix -ve int -> unsigned int -> long bug.
>> 
>> Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> Fixes: c5fc5c3ae0c84 ("mm: migrate: account THP NUMA migration counters correctly")
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>   mm/migrate.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 34a9ad3e0a4f..7e240437e7d9 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -2068,7 +2068,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
>>   	LIST_HEAD(migratepages);
>>   	new_page_t *new;
>>   	bool compound;
>> -	unsigned int nr_pages = thp_nr_pages(page);
>> +	int nr_pages = thp_nr_pages(page);
>>   
>>   	/*
>>   	 * PTE mapped THP or HugeTLB page can't reach here so the page could
>> 
> 
> This is too fragile to silently break again IMHO. Should we similarly to 
> 2da9f6305f306 handle the conversion explicitly inside the 
> mod_node_page_state() call?

Casting to signed still has the fragility that the variable is unsigned
so negating it somewhere else would break. I was somewhat inconsistent 
in the fixes, but there is less code that uses the variable here so it's
simpler to change the type IMO.

Negating an unsigned type always gives you a non-negative number. 
Unfortunately types matter.

Thanks,
Nick
Matthew Wilcox July 27, 2021, 4:19 p.m. UTC | #4
On Thu, Jul 22, 2021 at 03:48:40PM +1000, Nicholas Piggin wrote:
> Similarly to commit 2da9f6305f306 ("mm/vmscan: fix NR_ISOLATED_FILE
> corruption on 64-bit"), fix -ve int -> unsigned int -> long bug.
> 
> Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Fixes: c5fc5c3ae0c84 ("mm: migrate: account THP NUMA migration counters correctly")
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  mm/migrate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 34a9ad3e0a4f..7e240437e7d9 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2068,7 +2068,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
>  	LIST_HEAD(migratepages);
>  	new_page_t *new;
>  	bool compound;
> -	unsigned int nr_pages = thp_nr_pages(page);
> +	int nr_pages = thp_nr_pages(page);

This has prompted me to go off and look through the folio work.  There
are a number of similar problems.  It's sad that gcc doesn't have a
warning that catches this (although Clang does!)  I've filed

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101645

In the meantime, I'm going to change:

 - folio_nr_pages() now returns 'long' instead of 'unsigned long'.
 - Everywhere that assigns it to a variable changes its type from 'int'
   or 'unsigned int' or 'unsigned long' to 'long'.

The only place this looks a little dodgy is:

+static inline bool folio_contains(struct folio *folio, pgoff_t index)
+{
+       /* HugeTLBfs indexes the page cache in units of hpage_size */
+       if (folio_test_hugetlb(folio))
+               return folio->index == index;
+       return index - folio_index(folio) < folio_nr_pages(folio);
+}

but I'm pretty sure that's OK because index & folio_index are
both unsigned long, and by my reading of the C spec, that
promotes long to unsigned long, and so we do an unsigned comparison
(meaning that index 3, folio_index() 4 will wrap around to ULONG_MAX
and the comparison will return false, as expected).
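That wraparound argument can be checked directly with a stand-in for the non-hugetlb arm of folio_contains(); plain integers replace the folio accessors here (names hypothetical):

```c
#include <stdbool.h>

/* Stand-in for the non-hugetlb arm of folio_contains(): 'start' plays
 * the role of folio_index() and 'nr' that of folio_nr_pages().  By the
 * usual arithmetic conversions, 'nr' is converted to unsigned long for
 * the comparison, so an index below 'start' wraps to a huge value and
 * correctly fails the range check. */
static bool in_folio_range(unsigned long index, unsigned long start, long nr)
{
	return index - start < nr;	/* unsigned comparison */
}
```

So an index of 3 against a folio starting at 4 wraps to ULONG_MAX and compares false, exactly as argued above.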

I also intend to change update_lru_size() to take a long.

This patch is insufficient ... mem_cgroup_move_account() has the same bug.
I'm not quite sure how to handle this patch -- I'm going to replace all
this in 5.15, but this should be backported to earlier kernels.
Aneesh Kumar K.V July 28, 2021, 2:15 p.m. UTC | #5
Matthew Wilcox <willy@infradead.org> writes:


> This patch is insufficient ... mem_cgroup_move_account() has the same bug.

Do we have a similar issue in mem_cgroup_move_account()?

void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
			int val)
{
	/* Update node */
	__mod_node_page_state(lruvec_pgdat(lruvec), idx, val);

the int val -> long val conversion happens as expected, right?

-aneesh
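The point above reduces to the parameter types involved; the following editorial sketch (hypothetical names, LP64 and two's-complement assumed) shows why an int parameter rescues the caller while a long parameter does not:

```c
/* Passing the wrapped unsigned value through an 'int' parameter (as a
 * function taking 'int val' does) truncates it back to the intended
 * negative delta; a 'long' parameter preserves the huge positive value,
 * which is where the bug class survives.  The unsigned-to-int conversion
 * is implementation-defined in C, but is modulo-2^32 two's-complement
 * wrapping on all platforms Linux supports. */
static long through_int(int val)	/* like an 'int val' parameter */
{
	return val;
}

static long through_long(long val)	/* bug class survives */
{
	return val;
}
```

With `unsigned int nr_pages = 512`, `through_int(-nr_pages)` yields -512, while `through_long(-nr_pages)` yields 4294966784 on a 64-bit target.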

Patch

diff --git a/mm/migrate.c b/mm/migrate.c
index 34a9ad3e0a4f..7e240437e7d9 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2068,7 +2068,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
 	LIST_HEAD(migratepages);
 	new_page_t *new;
 	bool compound;
-	unsigned int nr_pages = thp_nr_pages(page);
+	int nr_pages = thp_nr_pages(page);
 
 	/*
 	 * PTE mapped THP or HugeTLB page can't reach here so the page could