diff mbox series

f45ec5ff16 ("userfaultfd: wp: support swap and page migration"): [ 140.777858] BUG: Bad rss-counter state mm:b278fc66 type:MM_ANONPAGES val:1

Message ID 20200410073209.11164-1-hdanton@sina.com (mailing list archive)
State New, archived
Headers show
Series f45ec5ff16 ("userfaultfd: wp: support swap and page migration"): [ 140.777858] BUG: Bad rss-counter state mm:b278fc66 type:MM_ANONPAGES val:1 | expand

Commit Message

Hillf Danton April 10, 2020, 7:32 a.m. UTC
On Fri, 10 Apr 2020 08:25:18 +0800
> Greetings,
> 
> 0day kernel testing robot got the below dmesg and the first bad commit is
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> 
> commit f45ec5ff16a75f96dac8c89862d75f1d8739efd4
> Author:     Peter Xu <peterx@redhat.com>
> AuthorDate: Mon Apr 6 20:06:01 2020 -0700
> Commit:     Linus Torvalds <torvalds@linux-foundation.org>
> CommitDate: Tue Apr 7 10:43:39 2020 -0700
> 
>     userfaultfd: wp: support swap and page migration
>     
>     For either swap and page migration, we all use the bit 2 of the entry to
>     identify whether this entry is uffd write-protected.  It plays a similar
>     role as the existing soft dirty bit in swap entries but only for keeping
>     the uffd-wp tracking for a specific PTE/PMD.
>     
>     Something special here is that when we want to recover the uffd-wp bit
>     from a swap/migration entry to the PTE bit we'll also need to take care of
>     the _PAGE_RW bit and make sure it's cleared, otherwise even with the
>     _PAGE_UFFD_WP bit we can't trap it at all.
>     
>     In change_pte_range() we do nothing for uffd if the PTE is a swap entry.
>     That can lead to data mismatch if the page that we are going to write
>     protect is swapped out when sending the UFFDIO_WRITEPROTECT.  This patch
>     also applies/removes the uffd-wp bit even for the swap entries.
>
Have trouble understanding the last sentence in the paragraph above and
particularly linking it to the first one.

> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <lkp@intel.com>
> 
> [child3:925] eventfd (323) returned ENOSYS, marking as inactive.
> [  132.014801] can: request_module (can-proto-2) failed.
> [  132.063717] can: request_module (can-proto-2) failed.
> [  137.186037] trinity-c2 (943) used greatest stack depth: 5804 bytes left
> [  140.771486] MCE: Killing trinity-c2:956 due to hardware memory corruption fault at 8bd2060
> [  140.777858] BUG: Bad rss-counter state mm:b278fc66 type:MM_ANONPAGES val:1
> [  140.778736] BUG: Bad rss-counter state mm:b278fc66 type:MM_SHMEMPAGES val:2
> [  141.589424] MCE: Killing trinity-c3:940 due to hardware memory corruption fault at 8a8c860
> [  141.590730] swap_info_get: Bad swap file entry 700b8216
> [  141.591400] BUG: Bad page map in process trinity-c3  pte:17042c3c pmd:b1809067
> [  141.592304] addr:08bcf000 vm_flags:00100073 anon_vma:f1f29528 mapping:00000000 index:8bcf
> [  141.593399] file:(null) fault:0x0 mmap:0x0 readpage:0x0
> [  141.594065] CPU: 0 PID: 940 Comm: trinity-c3 Not tainted 5.6.0-11490-gf45ec5ff16a75 #1
> [  141.595055] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> [  141.596093] Call Trace:
> [  141.596443]  dump_stack+0x16/0x18
> [  141.596868]  print_bad_pte+0x13f/0x159
> [  141.597367]  unmap_page_range+0x2a7/0x3e7
> [  141.597893]  unmap_single_vma+0x53/0x5d
> [  141.598383]  unmap_vmas+0x2c/0x3b
> [  141.598811]  exit_mmap+0x81/0xfc
> [  141.599238]  __mmput+0x25/0x8d
> [  141.599633]  mmput+0x28/0x2b
> [  141.600007]  do_exit+0x2f0/0x84a
> [  141.600449]  ? ___might_sleep+0x3f/0x11f
> [  141.600949]  do_group_exit+0x86/0x86
> [  141.601421]  __ia32_sys_exit_group+0x15/0x15
> [  141.601965]  do_fast_syscall_32+0x86/0xbf
> [  141.602481]  entry_SYSENTER_32+0xaf/0x101


Because is_swap_pte(oldpte) != IS_ENABLED(CONFIG_MIGRATION)), restore the
old behavior by modifying uffd_wp only for pte that is __not__ swap entry,
as the commit log says.

Comments

Peter Xu April 10, 2020, 3:32 p.m. UTC | #1
Hi, Hillf,

On Fri, Apr 10, 2020 at 03:32:09PM +0800, Hillf Danton wrote:
> 
> On Fri, 10 Apr 2020 08:25:18 +0800
> > Greetings,
> > 
> > 0day kernel testing robot got the below dmesg and the first bad commit is
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > 
> > commit f45ec5ff16a75f96dac8c89862d75f1d8739efd4
> > Author:     Peter Xu <peterx@redhat.com>
> > AuthorDate: Mon Apr 6 20:06:01 2020 -0700
> > Commit:     Linus Torvalds <torvalds@linux-foundation.org>
> > CommitDate: Tue Apr 7 10:43:39 2020 -0700
> > 
> >     userfaultfd: wp: support swap and page migration
> >     
> >     For either swap and page migration, we all use the bit 2 of the entry to
> >     identify whether this entry is uffd write-protected.  It plays a similar
> >     role as the existing soft dirty bit in swap entries but only for keeping
> >     the uffd-wp tracking for a specific PTE/PMD.
> >     
> >     Something special here is that when we want to recover the uffd-wp bit
> >     from a swap/migration entry to the PTE bit we'll also need to take care of
> >     the _PAGE_RW bit and make sure it's cleared, otherwise even with the
> >     _PAGE_UFFD_WP bit we can't trap it at all.
> >     
> >     In change_pte_range() we do nothing for uffd if the PTE is a swap entry.
> >     That can lead to data mismatch if the page that we are going to write
> >     protect is swapped out when sending the UFFDIO_WRITEPROTECT.  This patch
> >     also applies/removes the uffd-wp bit even for the swap entries.
> >
> Have trouble understanding the last sentence in the paragraph above and
> particularly linking it to the first one.

This patch handles the swap entry tracking for userfault-wp.  I wanted
to express the fact that we didn't do that before.  Sorry if my
wording is confusing.

> 
> > If you fix the issue, kindly add following tag
> > Reported-by: kernel test robot <lkp@intel.com>
> > 
> > [child3:925] eventfd (323) returned ENOSYS, marking as inactive.
> > [  132.014801] can: request_module (can-proto-2) failed.
> > [  132.063717] can: request_module (can-proto-2) failed.
> > [  137.186037] trinity-c2 (943) used greatest stack depth: 5804 bytes left
> > [  140.771486] MCE: Killing trinity-c2:956 due to hardware memory corruption fault at 8bd2060
> > [  140.777858] BUG: Bad rss-counter state mm:b278fc66 type:MM_ANONPAGES val:1
> > [  140.778736] BUG: Bad rss-counter state mm:b278fc66 type:MM_SHMEMPAGES val:2
> > [  141.589424] MCE: Killing trinity-c3:940 due to hardware memory corruption fault at 8a8c860
> > [  141.590730] swap_info_get: Bad swap file entry 700b8216

Should be a 32bit guest, swap type == 11100b.  I guess this also means
MAX_SWAPFILES is bigger than 11100b.

> > [  141.591400] BUG: Bad page map in process trinity-c3  pte:17042c3c pmd:b1809067
> > [  141.592304] addr:08bcf000 vm_flags:00100073 anon_vma:f1f29528 mapping:00000000 index:8bcf
> > [  141.593399] file:(null) fault:0x0 mmap:0x0 readpage:0x0
> > [  141.594065] CPU: 0 PID: 940 Comm: trinity-c3 Not tainted 5.6.0-11490-gf45ec5ff16a75 #1
> > [  141.595055] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> > [  141.596093] Call Trace:
> > [  141.596443]  dump_stack+0x16/0x18
> > [  141.596868]  print_bad_pte+0x13f/0x159
> > [  141.597367]  unmap_page_range+0x2a7/0x3e7
> > [  141.597893]  unmap_single_vma+0x53/0x5d
> > [  141.598383]  unmap_vmas+0x2c/0x3b
> > [  141.598811]  exit_mmap+0x81/0xfc
> > [  141.599238]  __mmput+0x25/0x8d
> > [  141.599633]  mmput+0x28/0x2b
> > [  141.600007]  do_exit+0x2f0/0x84a
> > [  141.600449]  ? ___might_sleep+0x3f/0x11f
> > [  141.600949]  do_group_exit+0x86/0x86
> > [  141.601421]  __ia32_sys_exit_group+0x15/0x15
> > [  141.601965]  do_fast_syscall_32+0x86/0xbf
> > [  141.602481]  entry_SYSENTER_32+0xaf/0x101
> 
> 
> Because is_swap_pte(oldpte) != IS_ENABLED(CONFIG_MIGRATION)), restore the
> old behavior by modifying uffd_wp only for pte that is __not__ swap entry,
> as the commit log says.
> 
> --- b/mm/mprotect.c
> +++ c/mm/mprotect.c
> @@ -139,7 +139,7 @@ static unsigned long change_pte_range(st
>  			}
>  			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
>  			pages++;
> -		} else if (is_swap_pte(oldpte)) {
> +		} else if (IS_ENABLED(CONFIG_MIGRATION)) {
>  			swp_entry_t entry = pte_to_swp_entry(oldpte);
>  			pte_t newpte;
>  
> @@ -154,7 +154,9 @@ static unsigned long change_pte_range(st
>  					newpte = pte_swp_mksoft_dirty(newpte);
>  				if (pte_swp_uffd_wp(oldpte))
>  					newpte = pte_swp_mkuffd_wp(newpte);
> -			} else if (is_write_device_private_entry(entry)) {
> +			}
> +
> +			if (is_write_device_private_entry(entry)) {
>  				/*
>  				 * We do not preserve soft-dirtiness. See
>  				 * copy_one_pte() for explanation.
> @@ -163,11 +165,18 @@ static unsigned long change_pte_range(st
>  				newpte = swp_entry_to_pte(entry);
>  				if (pte_swp_uffd_wp(oldpte))
>  					newpte = pte_swp_mkuffd_wp(newpte);
> -			} else {
> -				newpte = oldpte;
>  			}
>  
> -			if (uffd_wp)
> +			/*
> +			 * do nothing for changing uffd_wp if oldpte is a
> +			 * swap entry.
> +			 * That can lead to data mismatch if the page we
> +			 * are going to write protect is swapped out when
> +			 * sending the UFFDIO_WRITEPROTECT.
> +			 */
> +			if (is_swap_pte(oldpte))
> +				newpte = oldpte;
> +			else if (uffd_wp)
>  				newpte = pte_swp_mkuffd_wp(newpte);
>  			else if (uffd_wp_resolve)
>  				newpte = pte_swp_clear_uffd_wp(newpte);

I'm not sure this is correct.  As I mentioned, the commit wanted to
apply the uffd-wp bit even for the swap entries so that even the swap
entries got swapped in, the page will still be write protected.  So
IIUC think we can't remove that.

Above report happened at __swap_info_get() where we should have
triggered type >= READ_ONCE(nr_swapfiles) in swap_type_to_swap_info(),
which, iiuc, means the type is considered wrong.  The mistery to me is
how did the swap type went wrong.  E.g., iiuc the
"is_swap_pte(oldpte)" replacement to "IS_ENABLED(CONFIG_MIGRATION)"
shouldn't affect this, because even if MIGRATION is not enabled, both
is_write_migration_entry() and is_write_device_private_entry() should
return constant zero after all, so they shouldn't be thouching the
swap type.

I'm still trying to digest on what's happened... It would be good too
if more information on the test could be given, e.g., what is the
behavior of trinity-c2. A reproducer is of course even better.

Thanks,
Matthew Wilcox April 10, 2020, 3:38 p.m. UTC | #2
On Fri, Apr 10, 2020 at 11:32:34AM -0400, Peter Xu wrote:
> I'm still trying to digest on what's happened... It would be good too
> if more information on the test could be given, e.g., what is the
> behavior of trinity-c2. A reproducer is of course even better.

Trinity is a syscall fuzzer.  Don't expect what it's doing to make any
sense, it's just executing syscalls at random.
Peter Xu April 10, 2020, 6:14 p.m. UTC | #3
On Fri, Apr 10, 2020 at 08:38:05AM -0700, Matthew Wilcox wrote:
> On Fri, Apr 10, 2020 at 11:32:34AM -0400, Peter Xu wrote:
> > I'm still trying to digest on what's happened... It would be good too
> > if more information on the test could be given, e.g., what is the
> > behavior of trinity-c2. A reproducer is of course even better.
> 
> Trinity is a syscall fuzzer.  Don't expect what it's doing to make any
> sense, it's just executing syscalls at random.

OK thanks.

Though I just noticed that the original report is actually with some
attachments which I totally missed initially.  There's the config file
showing that we're with:

CONFIG_MIGRATION=y
CONFIG_MEMORY_FAILURE=y
CONFIG_DEVICE_PRIVATE=n

And even a reproducer.  However the reproducer script will fail at
wget, until I fixed it using:

initrd=openwrt-trinity-i386.cgz

to replace:

initrd=openwrt-i386-trinity.cgz

Then I can download the initrd and boot the VM with a decent QEMU.
However I didn't see any test running after the VM booted, and it will
reboot/shutdown after 100 sec without any error triggered (I believe
the rc.local tries to run something under /etc/kernel-tests/ but I'm
not sure it's running the right thing).

If there's any way to reproduce (I believe so because it can even
bisect in the original report, I just don't know how...), I'm thinking
maybe we can try to dump every swp entry change that could have been
touched in change_pte_range(), which is the only place that I thought
could be related to this in the commit, to see whether there's
anything suspecious.

8<----------------------------------------------
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 1d823b050329..1b6daf7d03aa 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -173,6 +173,8 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
                                newpte = pte_swp_clear_uffd_wp(newpte);
 
                        if (!pte_same(oldpte, newpte)) {
+                               pr_info("%s: Update swp entry, 0x%lx -> 0x%lx\n",
+                                       __func__, pte_val(oldpte), pte_val(newpte));
                                set_pte_at(vma->vm_mm, addr, pte, newpte);
                                pages++;
                        }
Hillf Danton April 11, 2020, 1:21 a.m. UTC | #4
Hi Peter

On Fri, 10 Apr 2020 11:32:34 -0400 Peter Xu wrote:
> 
> I'm not sure this is correct.  As I mentioned, the commit wanted to
> apply the uffd-wp bit even for the swap entries so that even the swap
> entries got swapped in, the page will still be write protected.  So
> IIUC think we can't remove that.
> 
Thank you for explaining that we would flip-flop uffd-wp in f45ec5ff16.

The input to test robot now looks like the diff below after restoring
CONFIG_MIGRATION.

--- b/mm/mprotect.c
+++ c/mm/mprotect.c
@@ -139,9 +139,9 @@ static unsigned long change_pte_range(st
 			}
 			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
 			pages++;
-		} else if (is_swap_pte(oldpte)) {
+		} else if (IS_ENABLED(CONFIG_MIGRATION)) {
 			swp_entry_t entry = pte_to_swp_entry(oldpte);
-			pte_t newpte;
+			pte_t newpte = oldpte;
 
 			if (is_write_migration_entry(entry)) {
 				/*
@@ -154,7 +154,9 @@ static unsigned long change_pte_range(st
 					newpte = pte_swp_mksoft_dirty(newpte);
 				if (pte_swp_uffd_wp(oldpte))
 					newpte = pte_swp_mkuffd_wp(newpte);
-			} else if (is_write_device_private_entry(entry)) {
+			}
+
+			if (is_write_device_private_entry(entry)) {
 				/*
 				 * We do not preserve soft-dirtiness. See
 				 * copy_one_pte() for explanation.
@@ -163,8 +165,6 @@ static unsigned long change_pte_range(st
 				newpte = swp_entry_to_pte(entry);
 				if (pte_swp_uffd_wp(oldpte))
 					newpte = pte_swp_mkuffd_wp(newpte);
-			} else {
-				newpte = oldpte;
 			}
 
 			if (uffd_wp)
Hillf Danton April 12, 2020, 12:54 p.m. UTC | #5
On Fri, 10 Apr 2020 11:32:34 -0400 Peter Xu wrote:
> 
> I'm not sure this is correct.  As I mentioned, the commit wanted to
> apply the uffd-wp bit even for the swap entries so that even the swap
> entries got swapped in, the page will still be write protected.  So
> IIUC think we can't remove that.

Yes you are right.

Now both CONFIG_MIGRATION and swap entry are restored after making uffd_wq
survive migrate the same way as soft_dirty.

--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -236,6 +236,8 @@ static bool remove_migration_pte(struct
 		pte = pte_mkold(mk_pte(new, READ_ONCE(vma->vm_page_prot)));
 		if (pte_swp_soft_dirty(*pvmw.pte))
 			pte = pte_mksoft_dirty(pte);
+		if (pte_swp_uffd_wp(*pvmw.pte))
+			pte = pte_mkuffd_wp(pte);
 
 		/*
 		 * Recheck VMA as permissions can change since migration started
@@ -243,15 +245,11 @@ static bool remove_migration_pte(struct
 		entry = pte_to_swp_entry(*pvmw.pte);
 		if (is_write_migration_entry(entry))
 			pte = maybe_mkwrite(pte, vma);
-		else if (pte_swp_uffd_wp(*pvmw.pte))
-			pte = pte_mkuffd_wp(pte);
 
 		if (unlikely(is_zone_device_page(new))) {
 			if (is_device_private_page(new)) {
 				entry = make_device_private_entry(new, pte_write(pte));
 				pte = swp_entry_to_pte(entry);
-				if (pte_swp_uffd_wp(*pvmw.pte))
-					pte = pte_mkuffd_wp(pte);
 			}
 		}
 
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -139,11 +139,13 @@ static unsigned long change_pte_range(st
 			}
 			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
 			pages++;
-		} else if (is_swap_pte(oldpte)) {
+		} else if (IS_ENABLED(CONFIG_MIGRATION)) {
 			swp_entry_t entry = pte_to_swp_entry(oldpte);
 			pte_t newpte;
 
-			if (is_write_migration_entry(entry)) {
+			if (!non_swap_entry(entry)) {
+				newpte = oldpte;
+			} else if (is_write_migration_entry(entry)) {
 				/*
 				 * A protection check is difficult so
 				 * just be safe and disable write
@@ -164,7 +166,7 @@ static unsigned long change_pte_range(st
 				if (pte_swp_uffd_wp(oldpte))
 					newpte = pte_swp_mkuffd_wp(newpte);
 			} else {
-				newpte = oldpte;
+				continue;
 			}
 
 			if (uffd_wp)
Peter Xu April 12, 2020, 9:45 p.m. UTC | #6
On Sun, Apr 12, 2020 at 08:54:08PM +0800, Hillf Danton wrote:
> 
> On Fri, 10 Apr 2020 11:32:34 -0400 Peter Xu wrote:
> > 
> > I'm not sure this is correct.  As I mentioned, the commit wanted to
> > apply the uffd-wp bit even for the swap entries so that even the swap
> > entries got swapped in, the page will still be write protected.  So
> > IIUC think we can't remove that.
> 
> Yes you are right.
> 
> Now both CONFIG_MIGRATION and swap entry are restored after making uffd_wq
> survive migrate the same way as soft_dirty.
> 
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -236,6 +236,8 @@ static bool remove_migration_pte(struct
>  		pte = pte_mkold(mk_pte(new, READ_ONCE(vma->vm_page_prot)));
>  		if (pte_swp_soft_dirty(*pvmw.pte))
>  			pte = pte_mksoft_dirty(pte);
> +		if (pte_swp_uffd_wp(*pvmw.pte))
> +			pte = pte_mkuffd_wp(pte);
>  
>  		/*
>  		 * Recheck VMA as permissions can change since migration started
> @@ -243,15 +245,11 @@ static bool remove_migration_pte(struct
>  		entry = pte_to_swp_entry(*pvmw.pte);
>  		if (is_write_migration_entry(entry))
>  			pte = maybe_mkwrite(pte, vma);
> -		else if (pte_swp_uffd_wp(*pvmw.pte))
> -			pte = pte_mkuffd_wp(pte);
>  
>  		if (unlikely(is_zone_device_page(new))) {
>  			if (is_device_private_page(new)) {
>  				entry = make_device_private_entry(new, pte_write(pte));
>  				pte = swp_entry_to_pte(entry);
> -				if (pte_swp_uffd_wp(*pvmw.pte))
> -					pte = pte_mkuffd_wp(pte);
>  			}
>  		}
>  
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -139,11 +139,13 @@ static unsigned long change_pte_range(st
>  			}
>  			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
>  			pages++;
> -		} else if (is_swap_pte(oldpte)) {
> +		} else if (IS_ENABLED(CONFIG_MIGRATION)) {
>  			swp_entry_t entry = pte_to_swp_entry(oldpte);
>  			pte_t newpte;
>  
> -			if (is_write_migration_entry(entry)) {
> +			if (!non_swap_entry(entry)) {
> +				newpte = oldpte;
> +			} else if (is_write_migration_entry(entry)) {
>  				/*
>  				 * A protection check is difficult so
>  				 * just be safe and disable write
> @@ -164,7 +166,7 @@ static unsigned long change_pte_range(st
>  				if (pte_swp_uffd_wp(oldpte))
>  					newpte = pte_swp_mkuffd_wp(newpte);
>  			} else {
> -				newpte = oldpte;
> +				continue;
>  			}
>  
>  			if (uffd_wp)
> 
> 

Hi, Hillf,

Feel free to have a look at another report, which I think is the same
issue of this:

https://lore.kernel.org/lkml/CA+G9fYsRGvkqtpdGv_aVr+Hn17KgYq04Q=EE=pB774qVxRqOeg@mail.gmail.com/

IMHO this bisected commit is correct itself, it's just that we
shouldn't enable uffd-wp on 32bit system.

Thanks,
diff mbox series

Patch

--- b/mm/mprotect.c
+++ c/mm/mprotect.c
@@ -139,7 +139,7 @@  static unsigned long change_pte_range(st
 			}
 			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
 			pages++;
-		} else if (is_swap_pte(oldpte)) {
+		} else if (IS_ENABLED(CONFIG_MIGRATION)) {
 			swp_entry_t entry = pte_to_swp_entry(oldpte);
 			pte_t newpte;
 
@@ -154,7 +154,9 @@  static unsigned long change_pte_range(st
 					newpte = pte_swp_mksoft_dirty(newpte);
 				if (pte_swp_uffd_wp(oldpte))
 					newpte = pte_swp_mkuffd_wp(newpte);
-			} else if (is_write_device_private_entry(entry)) {
+			}
+
+			if (is_write_device_private_entry(entry)) {
 				/*
 				 * We do not preserve soft-dirtiness. See
 				 * copy_one_pte() for explanation.
@@ -163,11 +165,18 @@  static unsigned long change_pte_range(st
 				newpte = swp_entry_to_pte(entry);
 				if (pte_swp_uffd_wp(oldpte))
 					newpte = pte_swp_mkuffd_wp(newpte);
-			} else {
-				newpte = oldpte;
 			}
 
-			if (uffd_wp)
+			/*
+			 * do nothing for changing uffd_wp if oldpte is a
+			 * swap entry.
+			 * That can lead to data mismatch if the page we
+			 * are going to write protect is swapped out when
+			 * sending the UFFDIO_WRITEPROTECT.
+			 */
+			if (is_swap_pte(oldpte))
+				newpte = oldpte;
+			else if (uffd_wp)
 				newpte = pte_swp_mkuffd_wp(newpte);
 			else if (uffd_wp_resolve)
 				newpte = pte_swp_clear_uffd_wp(newpte);