Message ID | 20240605212146.994486-1-peterx@redhat.com (mailing list archive) |
---|---|
State | New |
Series | mm/page_table_check: Fix crash on ZONE_DEVICE |
On Wed, 5 Jun 2024 17:21:46 -0400 Peter Xu <peterx@redhat.com> wrote:

> Not all pages may apply to pgtable check.  One example is ZONE_DEVICE
> pages: they map PFNs directly, and they don't allocate page_ext at all even
> if there's struct page around.  One may reference devm_memremap_pages().
>
> When both ZONE_DEVICE and page-table-check enabled, then try to map some
> dax memories, one can trigger kernel bug constantly now when the kernel was
> trying to inject some pfn maps on the dax device:
>
>  kernel BUG at mm/page_table_check.c:55!
>
> While it's pretty legal to use set_pxx_at() for ZONE_DEVICE pages for page
> fault resolutions, skip all the checks if page_ext doesn't even exist in
> pgtable checker, which applies to ZONE_DEVICE but maybe more.

Do we have a Reported-by: for this one?

And a Fixes?  It looks like df4e817b7108?

[ add Alistair ]

Peter Xu wrote:
> Not all pages may apply to pgtable check.  One example is ZONE_DEVICE
> pages: they map PFNs directly, and they don't allocate page_ext at all even
> if there's struct page around.  One may reference devm_memremap_pages().
>
> When both ZONE_DEVICE and page-table-check enabled, then try to map some
> dax memories, one can trigger kernel bug constantly now when the kernel was
> trying to inject some pfn maps on the dax device:
>
>  kernel BUG at mm/page_table_check.c:55!
>
> While it's pretty legal to use set_pxx_at() for ZONE_DEVICE pages for page
> fault resolutions, skip all the checks if page_ext doesn't even exist in
> pgtable checker, which applies to ZONE_DEVICE but maybe more.

This looks correct to me, and needed in the near term. You can add:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

In the long term, the page_ext check may not be needed. I.e. the reason
I added Alistair was in case his work to make DAX pages behave like
typical pages for reference counting would also make them behave the
same for the presence of page_ext.

Dan Williams <dan.j.williams@intel.com> writes:

> [ add Alistair ]
>
> Peter Xu wrote:
>> Not all pages may apply to pgtable check.  One example is ZONE_DEVICE
>> pages: they map PFNs directly, and they don't allocate page_ext at all even
>> if there's struct page around.  One may reference devm_memremap_pages().
>>
>> When both ZONE_DEVICE and page-table-check enabled, then try to map some
>> dax memories, one can trigger kernel bug constantly now when the kernel was
>> trying to inject some pfn maps on the dax device:
>>
>>  kernel BUG at mm/page_table_check.c:55!
>>
>> While it's pretty legal to use set_pxx_at() for ZONE_DEVICE pages for page
>> fault resolutions, skip all the checks if page_ext doesn't even exist in
>> pgtable checker, which applies to ZONE_DEVICE but maybe more.
>
> This looks correct to me, and needed in the near term. You can add:
>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>
> In the long term, the page_ext check may not be needed. I.e. the reason
> I added Alistair was in case his work to make DAX pages behave like
> typical pages for reference counting would also make them behave the
> same for the presence of page_ext.

It doesn't currently. However I did run into this bug while I was
developing those so please add:

Reviewed-by: Alistair Popple <apopple@nvidia.com>

On Wed, Jun 5, 2024 at 5:21 PM Peter Xu <peterx@redhat.com> wrote:
>
> Not all pages may apply to pgtable check.  One example is ZONE_DEVICE
> pages: they map PFNs directly, and they don't allocate page_ext at all even
> if there's struct page around.  One may reference devm_memremap_pages().
>
> When both ZONE_DEVICE and page-table-check enabled, then try to map some
> dax memories, one can trigger kernel bug constantly now when the kernel was
> trying to inject some pfn maps on the dax device:
>
>  kernel BUG at mm/page_table_check.c:55!
>
> While it's pretty legal to use set_pxx_at() for ZONE_DEVICE pages for page
> fault resolutions, skip all the checks if page_ext doesn't even exist in
> pgtable checker, which applies to ZONE_DEVICE but maybe more.

Thank you for reporting this bug. A few comments below:

>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  mm/page_table_check.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_table_check.c b/mm/page_table_check.c
> index 4169576bed72..509c6ef8de40 100644
> --- a/mm/page_table_check.c
> +++ b/mm/page_table_check.c
> @@ -73,6 +73,9 @@ static void page_table_check_clear(unsigned long pfn, unsigned long pgcnt)
>  	page = pfn_to_page(pfn);
>  	page_ext = page_ext_get(page);
>
> +	if (!page_ext)
> +		return;

I would replace the above with the following, here and in other places:

if (!page_ext) {
	WARN_ONCE(!is_zone_device_page(page),
		  "page_ext is missing for a non-device page\n");
	return;
}

> +
>  	BUG_ON(PageSlab(page));
>  	anon = PageAnon(page);
>
> @@ -110,6 +113,9 @@ static void page_table_check_set(unsigned long pfn, unsigned long pgcnt,
>  	page = pfn_to_page(pfn);
>  	page_ext = page_ext_get(page);
>
> +	if (!page_ext)
> +		return;
> +
>  	BUG_ON(PageSlab(page));
>  	anon = PageAnon(page);
>
> @@ -140,7 +146,10 @@ void __page_table_check_zero(struct page *page, unsigned int order)
>  	BUG_ON(PageSlab(page));
>
>  	page_ext = page_ext_get(page);
> -	BUG_ON(!page_ext);
> +
> +	if (!page_ext)
> +		return;
> +
>  	for (i = 0; i < (1ul << order); i++) {
>  		struct page_table_check *ptc = get_page_table_check(page_ext);
>
> --
> 2.45.0
>

Pasha Tatashin wrote:
> On Wed, Jun 5, 2024 at 5:21 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > Not all pages may apply to pgtable check.  One example is ZONE_DEVICE
> > pages: they map PFNs directly, and they don't allocate page_ext at all even
> > if there's struct page around.  One may reference devm_memremap_pages().
> >
> > When both ZONE_DEVICE and page-table-check enabled, then try to map some
> > dax memories, one can trigger kernel bug constantly now when the kernel was
> > trying to inject some pfn maps on the dax device:
> >
> >  kernel BUG at mm/page_table_check.c:55!
> >
> > While it's pretty legal to use set_pxx_at() for ZONE_DEVICE pages for page
> > fault resolutions, skip all the checks if page_ext doesn't even exist in
> > pgtable checker, which applies to ZONE_DEVICE but maybe more.
>
> Thank you for reporting this bug. A few comments below:
>
> >
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  mm/page_table_check.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/page_table_check.c b/mm/page_table_check.c
> > index 4169576bed72..509c6ef8de40 100644
> > --- a/mm/page_table_check.c
> > +++ b/mm/page_table_check.c
> > @@ -73,6 +73,9 @@ static void page_table_check_clear(unsigned long pfn, unsigned long pgcnt)
> >  	page = pfn_to_page(pfn);
> >  	page_ext = page_ext_get(page);
> >
> > +	if (!page_ext)
> > +		return;
>
> I would replace the above with the following, here and in other places:
>
> if (!page_ext) {
> 	WARN_ONCE(!is_zone_device_page(page),
> 		  "page_ext is missing for a non-device page\n");
> 	return;
> }

Hmm, but this function is silent for the !pfn_valid(@pfn) case, and the
old code has BUG_ON(!page_ext). So we know the caller is not being
careful about @pfn, and existing code is likely avoiding the BUG_ON().

The justification for the WARN_ONCE(), or maybe VM_WARN_ONCE(), would be
if there is a high likelihood that ongoing kernel changes introduce more
pfn_valid() but not page_ext covered pages? Is that a realistic
scenario?

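To make the trade-off in this exchange concrete, here is a minimal user-space sketch of the proposed behavior: stay silent when page_ext is missing for a device page, and warn a single time when it is missing for any other page. Everything in it is a hypothetical model, not kernel code: `model_warn_once()` only approximates `WARN_ONCE()` (the real macro keeps its "already warned" state per call site and prints a backtrace), and `model_check_clear()` takes two booleans in place of the real `page_ext_get()` and `is_zone_device_page()` lookups.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counter so the effect of the warning can be observed. */
static int model_warn_count;

/*
 * Rough model of WARN_ONCE(): report the condition the first time it is
 * true, stay quiet afterwards, and always return the condition's value.
 * Assumption: one shared "warned" flag, unlike the per-call-site state
 * the kernel macro uses.
 */
static bool model_warn_once(bool cond, const char *msg)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		model_warn_count++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;
}

/*
 * Model of the suggested guard. Returns true when the checker would go
 * on to do its accounting, false when it bails out early.
 */
static bool model_check_clear(bool has_page_ext, bool is_device_page)
{
	if (!has_page_ext) {
		model_warn_once(!is_device_page,
				"page_ext is missing for a non-device page");
		return false;
	}
	return true;
}
```

In this model a device page without page_ext is skipped without any output, while repeated calls for a non-device page without page_ext produce exactly one warning. Whether that extra diagnostic is worth carrying is the question raised above.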
On Wed, Jun 5, 2024 at 8:20 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Pasha Tatashin wrote:
> > On Wed, Jun 5, 2024 at 5:21 PM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > Not all pages may apply to pgtable check.  One example is ZONE_DEVICE
> > > pages: they map PFNs directly, and they don't allocate page_ext at all even
> > > if there's struct page around.  One may reference devm_memremap_pages().
> > >
> > > When both ZONE_DEVICE and page-table-check enabled, then try to map some
> > > dax memories, one can trigger kernel bug constantly now when the kernel was
> > > trying to inject some pfn maps on the dax device:
> > >
> > >  kernel BUG at mm/page_table_check.c:55!
> > >
> > > While it's pretty legal to use set_pxx_at() for ZONE_DEVICE pages for page
> > > fault resolutions, skip all the checks if page_ext doesn't even exist in
> > > pgtable checker, which applies to ZONE_DEVICE but maybe more.
> >
> > Thank you for reporting this bug. A few comments below:
> >
> > >
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  mm/page_table_check.c | 11 ++++++++++-
> > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/mm/page_table_check.c b/mm/page_table_check.c
> > > index 4169576bed72..509c6ef8de40 100644
> > > --- a/mm/page_table_check.c
> > > +++ b/mm/page_table_check.c
> > > @@ -73,6 +73,9 @@ static void page_table_check_clear(unsigned long pfn, unsigned long pgcnt)
> > >  	page = pfn_to_page(pfn);
> > >  	page_ext = page_ext_get(page);
> > >
> > > +	if (!page_ext)
> > > +		return;
> >
> > I would replace the above with the following, here and in other places:
> >
> > if (!page_ext) {
> > 	WARN_ONCE(!is_zone_device_page(page),
> > 		  "page_ext is missing for a non-device page\n");
> > 	return;
> > }
>
> Hmm, but this function is silent for the !pfn_valid(@pfn) case, and the
> old code has BUG_ON(!page_ext). So we know the caller is not being
> careful about @pfn, and existing code is likely avoiding the BUG_ON().
>
> The justification for the WARN_ONCE(), or maybe VM_WARN_ONCE(), would
> be if there is a high likelihood that ongoing kernel changes introduce
> more pfn_valid() but not page_ext covered pages? Is that a realistic
> scenario?

Good point, it is unlikely we will have scenarios without page_ext.

Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>

On Wed, Jun 05, 2024 at 03:05:43PM -0700, Andrew Morton wrote:
> On Wed, 5 Jun 2024 17:21:46 -0400 Peter Xu <peterx@redhat.com> wrote:
>
> > Not all pages may apply to pgtable check.  One example is ZONE_DEVICE
> > pages: they map PFNs directly, and they don't allocate page_ext at all even
> > if there's struct page around.  One may reference devm_memremap_pages().
> >
> > When both ZONE_DEVICE and page-table-check enabled, then try to map some
> > dax memories, one can trigger kernel bug constantly now when the kernel was
> > trying to inject some pfn maps on the dax device:
> >
> >  kernel BUG at mm/page_table_check.c:55!
> >
> > While it's pretty legal to use set_pxx_at() for ZONE_DEVICE pages for page
> > fault resolutions, skip all the checks if page_ext doesn't even exist in
> > pgtable checker, which applies to ZONE_DEVICE but maybe more.
>
> Do we have a Reported-by: for this one?

Nope, I just hit that when I started to look at the dax issues.

>
> And a Fixes?  It looks like df4e817b7108?

Yes, that commit looks like the right one.

Thanks,

diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 4169576bed72..509c6ef8de40 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -73,6 +73,9 @@ static void page_table_check_clear(unsigned long pfn, unsigned long pgcnt)
 	page = pfn_to_page(pfn);
 	page_ext = page_ext_get(page);
 
+	if (!page_ext)
+		return;
+
 	BUG_ON(PageSlab(page));
 	anon = PageAnon(page);
 
@@ -110,6 +113,9 @@ static void page_table_check_set(unsigned long pfn, unsigned long pgcnt,
 	page = pfn_to_page(pfn);
 	page_ext = page_ext_get(page);
 
+	if (!page_ext)
+		return;
+
 	BUG_ON(PageSlab(page));
 	anon = PageAnon(page);
 
@@ -140,7 +146,10 @@ void __page_table_check_zero(struct page *page, unsigned int order)
 	BUG_ON(PageSlab(page));
 
 	page_ext = page_ext_get(page);
-	BUG_ON(!page_ext);
+
+	if (!page_ext)
+		return;
+
 	for (i = 0; i < (1ul << order); i++) {
 		struct page_table_check *ptc = get_page_table_check(page_ext);
 

Not all pages are subject to the pgtable check.  One example is ZONE_DEVICE
pages: they map PFNs directly, and they don't allocate page_ext at all even
if there's a struct page around.  See devm_memremap_pages() for reference.

When both ZONE_DEVICE and page-table-check are enabled, trying to map some
dax memory reliably triggers a kernel BUG while the kernel is trying to
inject pfn maps on the dax device:

 kernel BUG at mm/page_table_check.c:55!

It's perfectly legal to use set_pxx_at() on ZONE_DEVICE pages for page
fault resolution, so skip all the checks in the pgtable checker if page_ext
doesn't exist.  That applies to ZONE_DEVICE pages, and possibly to others.

Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/page_table_check.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)
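
For readers who want to see the shape of the fix outside the kernel tree, the following is a small user-space sketch of the early-return guard the patch adds. It is a simplified model under stated assumptions, not the kernel implementation: the `model_*` types and functions are hypothetical stand-ins for `struct page`, `struct page_ext` and `page_ext_get()`, and the two plain counters only loosely mirror the checker's per-page accounting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the checker's per-page state. */
struct model_page_ext {
	int anon_map_count;
	int file_map_count;
};

/* Hypothetical stand-in for struct page. */
struct model_page {
	bool is_device;			/* models a ZONE_DEVICE page */
	struct model_page_ext *ext;	/* NULL when no page_ext was allocated */
};

/* Models page_ext_get(): may legitimately return NULL. */
static struct model_page_ext *model_page_ext_get(struct model_page *page)
{
	return page->ext;
}

/*
 * Models the patched page_table_check_set(). Returns true when the
 * mapping was accounted, false when the page was skipped because it
 * has no page_ext. Before the patch, the equivalent path would have
 * hit BUG_ON(!page_ext) instead of returning.
 */
static bool model_check_set(struct model_page *page, bool anon)
{
	struct model_page_ext *ext = model_page_ext_get(page);

	if (!ext)
		return false;

	if (anon)
		ext->anon_map_count++;
	else
		ext->file_map_count++;
	return true;
}
```

A page with a populated `ext` pointer is accounted as usual, while a page with `ext == NULL` (the ZONE_DEVICE case in this model) is skipped and leaves every counter untouched.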