Message ID | 20210722054840.501423-1-npiggin@gmail.com (mailing list archive)
---|---
State | New
Series | mm/migrate: fix page state accounting type conversion underflow
On 22.07.21 07:48, Nicholas Piggin wrote:
> Similarly to commit 2da9f6305f306 ("mm/vmscan: fix NR_ISOLATED_FILE
> corruption on 64-bit"), fix -ve int -> unsigned int -> long bug.
> 
> Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Fixes: c5fc5c3ae0c84 ("mm: migrate: account THP NUMA migration counters correctly")
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  mm/migrate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 34a9ad3e0a4f..7e240437e7d9 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2068,7 +2068,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
>  	LIST_HEAD(migratepages);
>  	new_page_t *new;
>  	bool compound;
> -	unsigned int nr_pages = thp_nr_pages(page);
> +	int nr_pages = thp_nr_pages(page);
>  
>  	/*
>  	 * PTE mapped THP or HugeTLB page can't reach here so the page could
> 

This is too fragile to silently break again IMHO. Should we similarly to
2da9f6305f306 handle the conversion explicitly inside the
mod_node_page_state() call?
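[To make the suggestion concrete: a minimal sketch of an explicit in-call conversion, assuming the mod_node_page_state() decrement in migrate_misplaced_page(); it is illustrative only and not necessarily the exact form 2da9f6305f306 used.]

	/*
	 * Hypothetical variant: nr_pages stays unsigned int and the sign
	 * conversion is made explicit at the accounting call.  Casting to
	 * long *before* negating keeps the delta negative; plain -nr_pages
	 * would wrap within unsigned int first and then widen to a huge
	 * positive long on 64-bit.
	 */
	mod_node_page_state(page_pgdat(page),
			    NR_ISOLATED_ANON + page_is_file_lru(page),
			    -(long)nr_pages);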
On Thu, 22 Jul 2021 09:27:50 +0200 David Hildenbrand <david@redhat.com> wrote:

> On 22.07.21 07:48, Nicholas Piggin wrote:
> > Similarly to commit 2da9f6305f306 ("mm/vmscan: fix NR_ISOLATED_FILE
> > corruption on 64-bit"), fix -ve int -> unsigned int -> long bug.
> > 
> > Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > Fixes: c5fc5c3ae0c84 ("mm: migrate: account THP NUMA migration counters correctly")
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> >  mm/migrate.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index 34a9ad3e0a4f..7e240437e7d9 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -2068,7 +2068,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
> >  	LIST_HEAD(migratepages);
> >  	new_page_t *new;
> >  	bool compound;
> > -	unsigned int nr_pages = thp_nr_pages(page);
> > +	int nr_pages = thp_nr_pages(page);
> >  
> >  	/*
> >  	 * PTE mapped THP or HugeTLB page can't reach here so the page could
> > 
> 
> This is too fragile to silently break again IMHO. Should we similarly to
> 2da9f6305f306 handle the conversion explicitly inside the
> mod_node_page_state() call?

And please don't send us off to look at another commit to understand this
one. A full standalone changelog, please. Preferably with a description of
user-visible effects.

2da9f6305f306 said "Symptoms include CMA allocations hanging forever
holding the cma_mutex due to
alloc_contig_range->...->isolate_migratepages_block waiting forever in
"while (unlikely(too_many_isolated(pgdat)))".

Is that also the case with this bug?
Excerpts from David Hildenbrand's message of July 22, 2021 5:27 pm:
> On 22.07.21 07:48, Nicholas Piggin wrote:
>> Similarly to commit 2da9f6305f306 ("mm/vmscan: fix NR_ISOLATED_FILE
>> corruption on 64-bit"), fix -ve int -> unsigned int -> long bug.
>> 
>> Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> Fixes: c5fc5c3ae0c84 ("mm: migrate: account THP NUMA migration counters correctly")
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>  mm/migrate.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 34a9ad3e0a4f..7e240437e7d9 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -2068,7 +2068,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
>>  	LIST_HEAD(migratepages);
>>  	new_page_t *new;
>>  	bool compound;
>> -	unsigned int nr_pages = thp_nr_pages(page);
>> +	int nr_pages = thp_nr_pages(page);
>>  
>>  	/*
>>  	 * PTE mapped THP or HugeTLB page can't reach here so the page could
>> 
> 
> This is too fragile to silently break again IMHO. Should we similarly to
> 2da9f6305f306 handle the conversion explicitly inside the
> mod_node_page_state() call?

Casting to signed still has the fragility that the variable is unsigned
so negating it somewhere else would break. I was somewhat inconsistent
in the fixes, but there is less code that uses the variable here so it's
simpler to change the type IMO.

Negating an unsigned type always gives you a non-negative number.
Unfortunately types matter.

Thanks,
Nick
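[A standalone illustration of the rule Nick states ("negating an unsigned type always gives you a non-negative number"); the helper name is made up, only the arithmetic matters.]

#include <stdio.h>

/* Toy stand-in for a counter update that takes a long delta. */
static void mod_counter(long *counter, long delta)
{
	*counter += delta;
}

int main(void)
{
	unsigned int nr_unsigned = 512;	/* e.g. thp_nr_pages() for a 2MB THP */
	int nr_signed = 512;
	long counter;

	/*
	 * -nr_unsigned is computed in unsigned int: 512 becomes 4294966784
	 * (2^32 - 512), which then widens to a huge positive long on
	 * 64-bit, so the counter goes up instead of down.
	 */
	counter = 1024;
	mod_counter(&counter, -nr_unsigned);
	printf("unsigned delta: %ld\n", counter);	/* 4294967808, not 512 */

	/* -nr_signed is -512 and sign-extends to long correctly. */
	counter = 1024;
	mod_counter(&counter, -nr_signed);
	printf("signed delta:   %ld\n", counter);	/* 512 */

	return 0;
}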
On Thu, Jul 22, 2021 at 03:48:40PM +1000, Nicholas Piggin wrote:
> Similarly to commit 2da9f6305f306 ("mm/vmscan: fix NR_ISOLATED_FILE
> corruption on 64-bit"), fix -ve int -> unsigned int -> long bug.
> 
> Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Fixes: c5fc5c3ae0c84 ("mm: migrate: account THP NUMA migration counters correctly")
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  mm/migrate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 34a9ad3e0a4f..7e240437e7d9 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2068,7 +2068,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
>  	LIST_HEAD(migratepages);
>  	new_page_t *new;
>  	bool compound;
> -	unsigned int nr_pages = thp_nr_pages(page);
> +	int nr_pages = thp_nr_pages(page);

This has prompted me to go off and look through the folio work. There are
a number of similar problems. It's sad that gcc doesn't have a warning
that catches this (although Clang does!) I've filed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101645

In the meantime, I'm going to change:

 - folio_nr_pages() now returns 'long' instead of 'unsigned long'.
 - Everywhere that assigns it to a variable changes its type from 'int'
   or 'unsigned int' or 'unsigned long' to 'long'.

The only place this looks a little dodgy is:

+static inline bool folio_contains(struct folio *folio, pgoff_t index)
+{
+	/* HugeTLBfs indexes the page cache in units of hpage_size */
+	if (folio_test_hugetlb(folio))
+		return folio->index == index;
+	return index - folio_index(folio) < folio_nr_pages(folio);
+}

but I'm pretty sure that's OK because index & folio_index are both
unsigned long, and by my reading of the C spec, that promotes long to
unsigned long, and so we do an unsigned comparison (meaning that index 3,
folio_index() 4 will wrap around to ULONG_MAX and the comparison will
return false, as expected).

I also intend to change update_lru_size() to take a long.

This patch is insufficient ... mem_cgroup_move_account() has the same bug.

I'm not quite sure how to handle this patch -- I'm going to replace all
this in 5.15, but this should be backported to earlier kernels.
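[A tiny demo of the folio_contains() reasoning above, using made-up values (index 3, folio starting at 4, 8 pages), showing why the unsigned wrap-around still gives the intended answer.]

#include <stdio.h>

int main(void)
{
	unsigned long index = 3;	/* page cache index being probed */
	unsigned long folio_start = 4;	/* what folio_index() would return */
	long nr_pages = 8;		/* folio_nr_pages() as a signed long */

	/*
	 * index - folio_start underflows to ULONG_MAX in unsigned long
	 * arithmetic; nr_pages is then promoted to unsigned long for the
	 * comparison, so the test is ULONG_MAX < 8, which is false --
	 * i.e. "not contained", as Matthew argues it should be.
	 */
	if (index - folio_start < nr_pages)
		printf("contained (unexpected)\n");
	else
		printf("not contained (expected)\n");

	return 0;
}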
Matthew Wilcox <willy@infradead.org> writes:
> This patch is insufficient ... mem_cgroup_move_account() has the same bug.
Do we have a similar issue in mem_cgroup_move_account()?

void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
			int val)
{
	/* Update node */
	__mod_node_page_state(lruvec_pgdat(lruvec), idx, val);

The int val -> long val conversion happens as expected there, right?
-aneesh
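[A standalone sketch of the distinction behind that question; the helpers are toy stand-ins, not the kernel prototypes.]

#include <stdio.h>

/* One path narrows through int (as __mod_lruvec_state's 'int val' does),
 * the other widens straight to long.  Names are illustrative only. */
static void take_int(int val)   { printf("int parameter sees:  %d\n", val); }
static void take_long(long val) { printf("long parameter sees: %ld\n", val); }

int main(void)
{
	unsigned int nr_pages = 512;

	/*
	 * -nr_pages is 4294966784 in unsigned int.  Converting that
	 * out-of-range value back to int is implementation-defined, but
	 * the usual two's-complement ABIs give -512 again, so the
	 * subsequent int -> long step behaves as expected.
	 */
	take_int(-nr_pages);	/* typically prints -512 */

	/*
	 * Widening the same value straight to long (the
	 * migrate_misplaced_page case before this patch) preserves the
	 * huge positive value -- the counter corruption being fixed here.
	 */
	take_long(-nr_pages);	/* prints 4294966784 on 64-bit */

	return 0;
}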
diff --git a/mm/migrate.c b/mm/migrate.c
index 34a9ad3e0a4f..7e240437e7d9 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2068,7 +2068,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
 	LIST_HEAD(migratepages);
 	new_page_t *new;
 	bool compound;
-	unsigned int nr_pages = thp_nr_pages(page);
+	int nr_pages = thp_nr_pages(page);
 
 	/*
 	 * PTE mapped THP or HugeTLB page can't reach here so the page could
Similarly to commit 2da9f6305f306 ("mm/vmscan: fix NR_ISOLATED_FILE
corruption on 64-bit"), fix -ve int -> unsigned int -> long bug.

Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Fixes: c5fc5c3ae0c84 ("mm: migrate: account THP NUMA migration counters correctly")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 mm/migrate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)