diff mbox series

[2/7] mm/rmap: map_pte() was not handling private ZONE_DEVICE page properly

Message ID 20180824192549.30844-3-jglisse@redhat.com (mailing list archive)
State New, archived
Headers show
Series HMM updates, improvements and fixes | expand

Commit Message

Jerome Glisse Aug. 24, 2018, 7:25 p.m. UTC
From: Ralph Campbell <rcampbell@nvidia.com>

Private ZONE_DEVICE pages use a special pte entry and thus are not
present. Properly handle this case in map_pte(), it is already handled
in check_pte(), the map_pte() part was lost in some rebase most probably.

Without this patch the slow migration path can not migrate back private
ZONE_DEVICE memory to regular memory. This was found after stress
testing migration back to system memory. This ultimatly can lead the
CPU to an infinite page fault loop on the special swap entry.

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: stable@vger.kernel.org
---
 mm/page_vma_mapped.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Education Directorate Aug. 30, 2018, 2:05 p.m. UTC | #1
On Fri, Aug 24, 2018 at 03:25:44PM -0400, jglisse@redhat.com wrote:
> From: Ralph Campbell <rcampbell@nvidia.com>
> 
> Private ZONE_DEVICE pages use a special pte entry and thus are not
> present. Properly handle this case in map_pte(), it is already handled
> in check_pte(), the map_pte() part was lost in some rebase most probably.
> 
> Without this patch the slow migration path can not migrate back private
> ZONE_DEVICE memory to regular memory. This was found after stress
> testing migration back to system memory. This ultimatly can lead the
> CPU to an infinite page fault loop on the special swap entry.
> 
> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: stable@vger.kernel.org
> ---
>  mm/page_vma_mapped.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index ae3c2a35d61b..1cf5b9bfb559 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -21,6 +21,15 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw)
>  			if (!is_swap_pte(*pvmw->pte))
>  				return false;
>  		} else {
> +			if (is_swap_pte(*pvmw->pte)) {
> +				swp_entry_t entry;
> +
> +				/* Handle un-addressable ZONE_DEVICE memory */
> +				entry = pte_to_swp_entry(*pvmw->pte);
> +				if (is_device_private_entry(entry))
> +					return true;
> +			}
> +

This happens just for !PVMW_SYNC && PVMW_MIGRATION? I presume this
is triggered via the remove_migration_pte() code path? Doesn't
returning true here imply that we've taken the ptl lock for the
pvmw?

Balbir
Jerome Glisse Aug. 30, 2018, 2:34 p.m. UTC | #2
On Fri, Aug 31, 2018 at 12:05:38AM +1000, Balbir Singh wrote:
> On Fri, Aug 24, 2018 at 03:25:44PM -0400, jglisse@redhat.com wrote:
> > From: Ralph Campbell <rcampbell@nvidia.com>
> > 
> > Private ZONE_DEVICE pages use a special pte entry and thus are not
> > present. Properly handle this case in map_pte(), it is already handled
> > in check_pte(), the map_pte() part was lost in some rebase most probably.
> > 
> > Without this patch the slow migration path can not migrate back private
> > ZONE_DEVICE memory to regular memory. This was found after stress
> > testing migration back to system memory. This ultimatly can lead the
> > CPU to an infinite page fault loop on the special swap entry.
> > 
> > Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
> > Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Cc: stable@vger.kernel.org
> > ---
> >  mm/page_vma_mapped.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> > index ae3c2a35d61b..1cf5b9bfb559 100644
> > --- a/mm/page_vma_mapped.c
> > +++ b/mm/page_vma_mapped.c
> > @@ -21,6 +21,15 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw)
> >  			if (!is_swap_pte(*pvmw->pte))
> >  				return false;
> >  		} else {
> > +			if (is_swap_pte(*pvmw->pte)) {
> > +				swp_entry_t entry;
> > +
> > +				/* Handle un-addressable ZONE_DEVICE memory */
> > +				entry = pte_to_swp_entry(*pvmw->pte);
> > +				if (is_device_private_entry(entry))
> > +					return true;
> > +			}
> > +
> 
> This happens just for !PVMW_SYNC && PVMW_MIGRATION? I presume this
> is triggered via the remove_migration_pte() code path? Doesn't
> returning true here imply that we've taken the ptl lock for the
> pvmw?

This happens through try_to_unmap() from migrate_vma_unmap() and thus
has !PVMW_SYNC and !PVMW_MIGRATION

But you are right about the ptl lock, so looking at code we were just
doing pte modification without holding the pte lock but the
page_vma_mapped_walk() would not try to unlock as pvmw->ptl == NULL
so this never triggered any warning.

I am gonna post a v2 shortly which address that.

Cheers,
Jérôme
diff mbox series

Patch

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index ae3c2a35d61b..1cf5b9bfb559 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -21,6 +21,15 @@  static bool map_pte(struct page_vma_mapped_walk *pvmw)
 			if (!is_swap_pte(*pvmw->pte))
 				return false;
 		} else {
+			if (is_swap_pte(*pvmw->pte)) {
+				swp_entry_t entry;
+
+				/* Handle un-addressable ZONE_DEVICE memory */
+				entry = pte_to_swp_entry(*pvmw->pte);
+				if (is_device_private_entry(entry))
+					return true;
+			}
+
 			if (!pte_present(*pvmw->pte))
 				return false;
 		}