
[mm-unstable,RFC,00/26] mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs

Message ID 20221206144730.163732-1-david@redhat.com (mailing list archive)

Message

David Hildenbrand Dec. 6, 2022, 2:47 p.m. UTC
This is the follow-up on [1]:
	[PATCH v2 0/8] mm: COW fixes part 3: reliable GUP R/W FOLL_GET of
	anonymous pages

After we implemented __HAVE_ARCH_PTE_SWP_EXCLUSIVE on the most prominent
enterprise architectures, implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all
remaining architectures that support swap PTEs.
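
Each per-architecture patch essentially picks a free software bit in the
architecture's swap PTE layout and wires up three helpers along the
following lines (a minimal sketch only; _PAGE_SWP_EXCLUSIVE stands in for
whatever bit the respective architecture has to spare, and the details
differ per patch):

	#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
	static inline pte_t pte_swp_mkexclusive(pte_t pte)
	{
		/* remember "page was exclusive" in the swap PTE */
		return __pte(pte_val(pte) | _PAGE_SWP_EXCLUSIVE);
	}

	static inline int pte_swp_exclusive(pte_t pte)
	{
		return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
	}

	static inline pte_t pte_swp_clear_exclusive(pte_t pte)
	{
		return __pte(pte_val(pte) & ~_PAGE_SWP_EXCLUSIVE);
	}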

This makes sure that exclusive anonymous pages will stay exclusive, even
after they were swapped out -- for example, making GUP R/W FOLL_GET of
anonymous pages reliable. Details can be found in [1].
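
For reference, the marker is produced on swapout and consumed on swapin
roughly as follows (a simplified sketch paraphrasing the relevant
mm/rmap.c and mm/memory.c bits, not the exact code):

	/* try_to_unmap_one(): turning a present PTE into a swap PTE */
	swp_pte = swp_entry_to_pte(entry);
	if (anon_exclusive)
		/* record in the swap PTE that the page was exclusive */
		swp_pte = pte_swp_mkexclusive(swp_pte);
	set_pte_at(mm, address, pvmw.pte, swp_pte);

	/* do_swap_page(): faulting the page back in */
	if (pte_swp_exclusive(vmf->orig_pte))
		/* the page may stay anon-exclusive, e.g., be reused on write */
		exclusive = true;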

This primarily fixes remaining known O_DIRECT memory corruptions that can
happen on concurrent swapout, whereby we can lose DMA reads to a page
(modifying the user page by writing to it).

To verify, there are two test cases (requiring swap space, obviously):
(1) The O_DIRECT+swapout test case [2] from Andrea. This test case tries
    to trigger a race condition.
(2) My vmsplice() test case [3] that tries to detect whether the exclusive
    marker was lost during swapout, without relying on a race condition
    (a rough sketch of the idea follows below).
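
The rough idea behind (2), as a sketch (hypothetical code with error
handling omitted; the actual test in [3] differs in detail):

	#define _GNU_SOURCE
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>
	#include <fcntl.h>
	#include <sys/mman.h>
	#include <sys/uio.h>

	int main(void)
	{
		size_t pagesize = getpagesize();
		struct iovec iov;
		int fds[2];
		char *mem, buf;

		pipe(fds);
		mem = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		memset(mem, 0, pagesize);

		/* Take an additional (GUP) reference on the page via vmsplice(). */
		iov.iov_base = mem;
		iov.iov_len = pagesize;
		vmsplice(fds[1], &iov, 1, 0);

		/* Push the page out to swap; the PTE becomes a swap PTE. */
		madvise(mem, pagesize, MADV_PAGEOUT);

		/*
		 * Write to the page: this faults on the swap PTE.  If the
		 * exclusive marker survived swapout, the original page is
		 * reused; if it was lost, COW installs a fresh copy while
		 * the pipe still references the old page.
		 */
		mem[0] = 1;

		read(fds[0], &buf, 1);
		printf("%s: page was %sreplaced during COW\n",
		       buf == mem[0] ? "PASS" : "FAIL",
		       buf == mem[0] ? "not " : "");
		return 0;
	}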


For example, on 32bit x86 (with and without PAE), my test case fails
without these patches:
	$ ./test_swp_exclusive
	FAIL: page was replaced during COW
But succeeds with these patches:
	$ ./test_swp_exclusive 
	PASS: page was not replaced during COW


Why implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE for all architectures, even
the ones where swap support might be in a questionable state? This is the
first step towards removing "readable_exclusive" migration entries, and
instead using pte_swp_exclusive() also with (readable) migration entries
(as suggested by Peter). The only missing piece for that is
supporting pmd_swp_exclusive() on relevant architectures with THP
migration support.

As all relevant architectures now implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE,
we can drop __HAVE_ARCH_PTE_SWP_EXCLUSIVE in the last patch.
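
For context, the generic fallback in include/linux/pgtable.h that the last
patch removes looks roughly like this (paraphrased); it silently drops the
marker, which is why every architecture needs a real implementation:

	#ifndef __HAVE_ARCH_PTE_SWP_EXCLUSIVE
	static inline pte_t pte_swp_mkexclusive(pte_t pte)
	{
		return pte;		/* marker is silently lost */
	}

	static inline int pte_swp_exclusive(pte_t pte)
	{
		return false;		/* never reports exclusive */
	}

	static inline pte_t pte_swp_clear_exclusive(pte_t pte)
	{
		return pte;
	}
	#endif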


RFC because some of the swap PTE layouts are really tricky and I really
need feedback on deciphering these layouts and on "using yet unused PTE
bits in swap PTEs". I tried cross-compiling all relevant setups (phew, I
might have missed only some power/nohash variants), but have only tested
on x86 so far.

CCing arch maintainers only on this cover letter and on the respective
patch(es).


[1] https://lkml.kernel.org/r/20220329164329.208407-1-david@redhat.com
[2] https://gitlab.com/aarcange/kernel-testcases-for-v5.11/-/blob/main/page_count_do_wp_page-swap.c
[3] https://gitlab.com/davidhildenbrand/scratchspace/-/blob/main/test_swp_exclusive.c

David Hildenbrand (26):
  mm/debug_vm_pgtable: more pte_swp_exclusive() sanity checks
  alpha/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
  arc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
  arm/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
  csky/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
  hexagon/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
  ia64/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
  loongarch/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
  m68k/mm: remove dummy __swp definitions for nommu
  m68k/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
  microblaze/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
  mips/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
  nios2/mm: refactor swap PTE layout
  nios2/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
  openrisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
  parisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
  powerpc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit book3s
  powerpc/nohash/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
  riscv/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
  sh/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
  sparc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit
  sparc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 64bit
  um/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
  x86/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE also on 32bit
  xtensa/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
  mm: remove __HAVE_ARCH_PTE_SWP_EXCLUSIVE

 arch/alpha/include/asm/pgtable.h              | 40 ++++++++-
 arch/arc/include/asm/pgtable-bits-arcv2.h     | 26 +++++-
 arch/arm/include/asm/pgtable-2level.h         |  3 +
 arch/arm/include/asm/pgtable-3level.h         |  3 +
 arch/arm/include/asm/pgtable.h                | 34 ++++++--
 arch/arm64/include/asm/pgtable.h              |  1 -
 arch/csky/abiv1/inc/abi/pgtable-bits.h        | 13 ++-
 arch/csky/abiv2/inc/abi/pgtable-bits.h        | 19 ++--
 arch/csky/include/asm/pgtable.h               | 17 ++++
 arch/hexagon/include/asm/pgtable.h            | 36 ++++++--
 arch/ia64/include/asm/pgtable.h               | 31 ++++++-
 arch/loongarch/include/asm/pgtable-bits.h     |  4 +
 arch/loongarch/include/asm/pgtable.h          | 38 +++++++-
 arch/m68k/include/asm/mcf_pgtable.h           | 35 +++++++-
 arch/m68k/include/asm/motorola_pgtable.h      | 37 +++++++-
 arch/m68k/include/asm/pgtable_no.h            |  6 --
 arch/m68k/include/asm/sun3_pgtable.h          | 38 +++++++-
 arch/microblaze/include/asm/pgtable.h         | 44 +++++++---
 arch/mips/include/asm/pgtable-32.h            | 86 ++++++++++++++++---
 arch/mips/include/asm/pgtable-64.h            | 23 ++++-
 arch/mips/include/asm/pgtable.h               | 35 ++++++++
 arch/nios2/include/asm/pgtable-bits.h         |  3 +
 arch/nios2/include/asm/pgtable.h              | 37 ++++++--
 arch/openrisc/include/asm/pgtable.h           | 40 +++++++--
 arch/parisc/include/asm/pgtable.h             | 40 ++++++++-
 arch/powerpc/include/asm/book3s/32/pgtable.h  | 37 ++++++--
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  1 -
 arch/powerpc/include/asm/nohash/32/pgtable.h  | 22 +++--
 arch/powerpc/include/asm/nohash/32/pte-40x.h  |  6 +-
 arch/powerpc/include/asm/nohash/32/pte-44x.h  | 18 +---
 arch/powerpc/include/asm/nohash/32/pte-85xx.h |  4 +-
 arch/powerpc/include/asm/nohash/64/pgtable.h  | 24 +++++-
 arch/powerpc/include/asm/nohash/pgtable.h     | 15 ++++
 arch/powerpc/include/asm/nohash/pte-e500.h    |  1 -
 arch/riscv/include/asm/pgtable-bits.h         |  3 +
 arch/riscv/include/asm/pgtable.h              | 28 ++++--
 arch/s390/include/asm/pgtable.h               |  1 -
 arch/sh/include/asm/pgtable_32.h              | 53 +++++++++---
 arch/sparc/include/asm/pgtable_32.h           | 26 +++++-
 arch/sparc/include/asm/pgtable_64.h           | 37 +++++++-
 arch/sparc/include/asm/pgtsrmmu.h             | 14 +--
 arch/um/include/asm/pgtable.h                 | 36 +++++++-
 arch/x86/include/asm/pgtable-2level.h         | 26 ++++--
 arch/x86/include/asm/pgtable-3level.h         | 26 +++++-
 arch/x86/include/asm/pgtable.h                |  3 -
 arch/xtensa/include/asm/pgtable.h             | 31 +++++--
 include/linux/pgtable.h                       | 29 -------
 mm/debug_vm_pgtable.c                         | 25 +++++-
 mm/memory.c                                   |  4 -
 mm/rmap.c                                     | 11 ---
 50 files changed, 943 insertions(+), 227 deletions(-)

Comments

David Hildenbrand Dec. 14, 2022, 11:22 a.m. UTC | #1
On 06.12.22 15:47, David Hildenbrand wrote:
> RFC because some of the swap PTE layouts are really tricky and I really
> need feedback on deciphering these layouts and on "using yet unused PTE
> bits in swap PTEs". I tried cross-compiling all relevant setups (phew, I
> might have missed only some power/nohash variants), but have only tested
> on x86 so far.

As I was messing with sparc64 anyway and got Debian to boot under
QEMU, I verified that the sparc64 change also seems to work as expected
(under sun4u).
Huacai Chen Dec. 18, 2022, 3:32 a.m. UTC | #2
Hi, David,

What is the opposite of exclusive here? Shared or inclusive? I prefer
pte_swp_mkshared() or pte_swp_mkinclusive() rather than
pte_swp_clear_exclusive(). Existing examples: dirty/clean, young/old
...

Huacai

David Hildenbrand Dec. 18, 2022, 9:59 a.m. UTC | #3
On 18.12.22 04:32, Huacai Chen wrote:
> Hi, David,
> 
> What is the opposite of exclusive here? Shared or inclusive? I prefer
> pte_swp_mkshared() or pte_swp_mkinclusive() rather than
> pte_swp_clear_exclusive(). Existing examples: dirty/clean, young/old
> ...

Hi Huacai,

thanks for having a look!

Please note that this series doesn't add these primitives but merely 
implements them on all remaining architectures.

Having that said, the semantics are "exclusive" vs. "maybe shared", not 
"exclusive" vs. "shared" or sth. else. It would have to be 
pte_swp_mkmaybe_shared().


Note that this naming matches just the way we handle it for the other 
pte_swp_ flags we have, namely:

pte_swp_mksoft_dirty()
pte_swp_soft_dirty()
pte_swp_clear_soft_dirty()

and

pte_swp_mkuffd_wp()
pte_swp_uffd_wp()
pte_swp_clear_uffd_wp()


For example, we also (thankfully) didn't call it pte_mksoft_clean().
Grepping for "pte_swp.*soft_dirty" gives you the full picture.

Thanks!

David

Huacai Chen Dec. 19, 2022, 1:40 a.m. UTC | #4
On Sun, Dec 18, 2022 at 5:59 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 18.12.22 04:32, Huacai Chen wrote:
> > Hi, David,
> >
> > What is the opposite of exclusive here? Shared or inclusive? I prefer
> > pte_swp_mkshared() or pte_swp_mkinclusive() rather than
> > pte_swp_clear_exclusive(). Existing examples: dirty/clean, young/old
> > ...
>
> Hi Huacai,
>
> thanks for having a look!
>
> Please note that this series doesn't add these primitives but merely
> implements them on all remaining architectures.
>
> Having that said, the semantics are "exclusive" vs. "maybe shared", not
> "exclusive" vs. "shared" or sth. else. It would have to be
> pte_swp_mkmaybe_shared().
>
>
> Note that this naming matches just the way we handle it for the other
> pte_swp_ flags we have, namely:
>
> pte_swp_mksoft_dirty()
> pte_swp_soft_dirty()
> pte_swp_clear_soft_dirty()
>
> and
>
> pte_swp_mkuffd_wp()
> pte_swp_uffd_wp()
> pte_swp_clear_uffd_wp()
>
>
> For example, we also (thankfully) didn't call it pte_mksoft_clean().
> Grepping for "pte_swp.*soft_dirty" gives you the full picture.
>
> Thanks!
OK, got it.

Huacai