
[resend,RFC,0/9] s390: fixes, cleanups and optimizations for page table walkers

Message ID 20210909162248.14969-1-david@redhat.com (mailing list archive)

Message

David Hildenbrand Sept. 9, 2021, 4:22 p.m. UTC
Resend because I missed ccing people on the actual patches ...

RFC because the patches are essentially untested and I did not actually
try to trigger any of the things these patches are supposed to fix. It
merely matches my current understanding (and what other code does :) ). I
did compile-test as far as possible.

After learning more about the wonderful world of page tables and their
interaction with the mmap_sem and VMAs, I spotted some issues in our
page table walkers that allow user space to trigger nasty behavior when
playing dirty tricks with munmap() or mmap() of hugetlb. While some issues
should be hard to trigger, others are fairly easy because we provide
convenient interfaces (e.g., KVM_S390_GET_SKEYS and KVM_S390_SET_SKEYS).
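
Most of the fixes boil down to the same pattern: under the mmap_sem, look up
and sanity-check the VMA covering the address (in particular, reject hugetlb
VMAs where we expect ordinary page tables) before mapping and locking the pte.
A rough, untested sketch of that shape -- the function name is made up, the
helpers are the usual kernel interfaces:

#include <linux/mm.h>
#include <linux/hugetlb.h>

static int walk_one_user_pte(struct mm_struct *mm, unsigned long addr)
{
	struct vm_area_struct *vma;
	spinlock_t *ptl;
	pte_t *ptep;
	int rc = 0;

	mmap_read_lock(mm);

	/*
	 * The range may have been munmap()ed or remapped as hugetlb since we
	 * last looked; re-check the VMA under the lock before touching any
	 * page tables.
	 */
	vma = vma_lookup(mm, addr);
	if (!vma || is_vm_hugetlb_page(vma)) {
		rc = -EFAULT;
		goto out;
	}

	ptep = get_locked_pte(mm, addr, &ptl);
	if (!ptep) {
		rc = -ENOMEM;
		goto out;
	}

	/* ... inspect or modify *ptep here ... */

	pte_unmap_unlock(ptep, ptl);
out:
	mmap_read_unlock(mm);
	return rc;
}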

Future work:
- Don't use get_locked_pte() when it's not required to actually allocate
  page tables -- similar to how storage keys are now handled. Examples are
  get_pgste() and __gmap_zap.
- Don't use get_locked_pte() and instead let page fault logic allocate page
  tables when we actually do need page tables -- also, similar to how
  storage keys are now handled. Examples are set_pgste_bits() and
  pgste_perform_essa().
- Maybe switch to mm/pagewalk.c to avoid custom page table walkers. For
  __gmap_zap() that's very easy -- see the sketch below.
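
A rough and untested sketch of what such a mm/pagewalk.c based walk for
__gmap_zap() could look like -- the gmap_zap_* names are made up,
ptep_zap_unused() and the pagewalk API are real; missing page tables and
hugetlb VMAs would simply be skipped instead of page tables getting allocated:

#include <linux/pagewalk.h>
#include <asm/gmap.h>

static int gmap_zap_pte_entry(pte_t *ptep, unsigned long addr,
			      unsigned long next, struct mm_walk *walk)
{
	struct gmap *gmap = walk->private;

	/* ->pte_entry runs with the pte mapped and the pte lock held. */
	ptep_zap_unused(gmap->mm, addr, ptep, 0);
	return 0;
}

static const struct mm_walk_ops gmap_zap_walk_ops = {
	.pte_entry	= gmap_zap_pte_entry,
	/* no .hugetlb_entry: hugetlb VMAs are skipped, as they should be */
};

/* Caller holds mmap_read_lock(gmap->mm); vmaddr as computed in __gmap_zap(). */
static void gmap_zap_one_page(struct gmap *gmap, unsigned long vmaddr)
{
	walk_page_range(gmap->mm, vmaddr, vmaddr + PAGE_SIZE,
			&gmap_zap_walk_ops, gmap);
}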

Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>

David Hildenbrand (9):
  s390/gmap: validate VMA in __gmap_zap()
  s390/gmap: don't unconditionally call pte_unmap_unlock() in
    __gmap_zap()
  s390/mm: validate VMA in PGSTE manipulation functions
  s390/mm: fix VMA and page table handling code in storage key handling
    functions
  s390/uv: fully validate the VMA before calling follow_page()
  s390/pci_mmio: fully validate the VMA before calling follow_pte()
  s390/mm: no need for pte_alloc_map_lock() if we know the pmd is
    present
  s390/mm: optimize set_guest_storage_key()
  s390/mm: optimize reset_guest_reference_bit()

 arch/s390/kernel/uv.c    |   2 +-
 arch/s390/mm/gmap.c      |  11 +++-
 arch/s390/mm/pgtable.c   | 109 +++++++++++++++++++++++++++------------
 arch/s390/pci/pci_mmio.c |   4 +-
 4 files changed, 89 insertions(+), 37 deletions(-)


base-commit: 7d2a07b769330c34b4deabeed939325c77a7ec2f

Comments

Claudio Imbrenda Sept. 14, 2021, 4:50 p.m. UTC | #1
On Thu,  9 Sep 2021 18:22:39 +0200
David Hildenbrand <david@redhat.com> wrote:

> Resend because I missed ccing people on the actual patches ...
> 
> RFC because the patches are essentially untested and I did not actually
> try to trigger any of the things these patches are supposed to fix. It

this is an interesting series, and the code makes sense, but I would
really like to see some regression tests, and maybe even some
selftests to trigger (at least some of) the issues.

the follow-up question is: how did we manage to go on so long without
noticing these issues? :D

> merely matches my current understanding (and what other code does :) ). I
> did compile-test as far as possible.
> 
> After learning more about the wonderful world of page tables and their
> interaction with the mmap_sem and VMAs, I spotted some issues in our
> page table walkers that allow user space to trigger nasty behavior when
> playing dirty tricks with munmap() or mmap() of hugetlb. While some issues
> should be hard to trigger, others are fairly easy because we provide
> convenient interfaces (e.g., KVM_S390_GET_SKEYS and KVM_S390_SET_SKEYS).
> 
> Future work:
> - Don't use get_locked_pte() when it's not required to actually allocate
>   page tables -- similar to how storage keys are now handled. Examples are
>   get_pgste() and __gmap_zap.
> - Don't use get_locked_pte() and instead let page fault logic allocate page
>   tables when we actually do need page tables -- also, similar to how
>   storage keys are now handled. Examples are set_pgste_bits() and
>   pgste_perform_essa().
> - Maybe switch to mm/pagewalk.c to avoid custom page table walkers. For
>   __gmap_zap() that's very easy.
> 
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Cc: Janosch Frank <frankja@linux.ibm.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Cc: Heiko Carstens <hca@linux.ibm.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Niklas Schnelle <schnelle@linux.ibm.com>
> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> Cc: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
> 
> David Hildenbrand (9):
>   s390/gmap: validate VMA in __gmap_zap()
>   s390/gmap: don't unconditionally call pte_unmap_unlock() in
>     __gmap_zap()
>   s390/mm: validate VMA in PGSTE manipulation functions
>   s390/mm: fix VMA and page table handling code in storage key handling
>     functions
>   s390/uv: fully validate the VMA before calling follow_page()
>   s390/pci_mmio: fully validate the VMA before calling follow_pte()
>   s390/mm: no need for pte_alloc_map_lock() if we know the pmd is
>     present
>   s390/mm: optimize set_guest_storage_key()
>   s390/mm: optimize reset_guest_reference_bit()
> 
>  arch/s390/kernel/uv.c    |   2 +-
>  arch/s390/mm/gmap.c      |  11 +++-
>  arch/s390/mm/pgtable.c   | 109 +++++++++++++++++++++++++++------------
>  arch/s390/pci/pci_mmio.c |   4 +-
>  4 files changed, 89 insertions(+), 37 deletions(-)
> 
> 
> base-commit: 7d2a07b769330c34b4deabeed939325c77a7ec2f
David Hildenbrand Sept. 14, 2021, 6:06 p.m. UTC | #2
On 14.09.21 18:50, Claudio Imbrenda wrote:
> On Thu,  9 Sep 2021 18:22:39 +0200
> David Hildenbrand <david@redhat.com> wrote:
> 
>> Resend because I missed ccing people on the actual patches ...
>>
>> RFC because the patches are essentially untested and I did not actually
>> try to trigger any of the things these patches are supposed to fix. It
> 
> this is an interesting series, and the code makes sense, but I would
> really like to see some regression tests, and maybe even some
> selftests to trigger (at least some of) the issues.

Yep, it most certainly needs regression testing before picking any of 
this. selftests would be great, but I won't find time for it in the 
foreseeable future.
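
If someone wants to pick that up, here is a hypothetical, untested sketch of
the core sequence such a selftest could exercise -- register a memslot, pull
the backing away behind KVM's back, then drive the storage-key walker via
KVM_S390_GET_SKEYS. Note that the ioctl only walks the page tables once
storage keys are active for the mm, so a real selftest would additionally need
a vcpu that touches storage keys (e.g., via SSKE) first; a properly aligned
MAP_HUGETLB remap instead of the munmap() would exercise the hugetlb variant:

#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/kvm.h>

#define SLOT_SIZE (2UL << 20)	/* multiple of any s390 huge page size */

int main(void)
{
	int kvm = open("/dev/kvm", O_RDWR);
	int vm = ioctl(kvm, KVM_CREATE_VM, 0);
	uint8_t *keys;
	void *mem;

	if (kvm < 0 || vm < 0)
		return 1;

	mem = mmap(NULL, SLOT_SIZE, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	struct kvm_userspace_memory_region region = {
		.slot = 0,
		.guest_phys_addr = 0,
		.memory_size = SLOT_SIZE,
		.userspace_addr = (uintptr_t)mem,
	};
	ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region);

	/* Pull the backing away behind KVM's back. */
	munmap(mem, SLOT_SIZE);

	/* Now make the kernel walk the page tables of that range. */
	keys = calloc(SLOT_SIZE / 4096, 1);
	struct kvm_s390_skeys skeys = {
		.start_gfn = 0,
		.count = SLOT_SIZE / 4096,
		.skeydata_addr = (uintptr_t)keys,
	};
	ioctl(vm, KVM_S390_GET_SKEYS, &skeys);

	return 0;
}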

> 
> the follow-up question is: how did we manage to go on so long without
> noticing these issues? :D

Excellent question - I guess we simply weren't aware of the dos and 
don'ts when dealing with process page tables :)

> 
>> merely matches my current understanding (and what other code does :) ). I
>> did compile-test as far as possible.
>>
>> After learning more about the wonderful world of page tables and their
>> interaction with the mmap_sem and VMAs, I spotted some issues in our
>> page table walkers that allow user space to trigger nasty behavior when
>> playing dirty tricks with munmap() or mmap() of hugetlb. While some issues
>> should be hard to trigger, others are fairly easy because we provide
>> convenient interfaces (e.g., KVM_S390_GET_SKEYS and KVM_S390_SET_SKEYS).
>>
>> Future work:
>> - Don't use get_locked_pte() when it's not required to actually allocate
>>    page tables -- similar to how storage keys are now handled. Examples are
>>    get_pgste() and __gmap_zap.
>> - Don't use get_locked_pte() and instead let page fault logic allocate page
>>    tables when we actually do need page tables -- also, similar to how
>>    storage keys are now handled. Examples are set_pgste_bits() and
>>    pgste_perform_essa().
>> - Maybe switch to mm/pagewalk.c to avoid custom page table walkers. For
>>    __gmap_zap() that's very easy.
>>
>> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
>> Cc: Janosch Frank <frankja@linux.ibm.com>
>> Cc: Cornelia Huck <cohuck@redhat.com>
>> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
>> Cc: Heiko Carstens <hca@linux.ibm.com>
>> Cc: Vasily Gorbik <gor@linux.ibm.com>
>> Cc: Niklas Schnelle <schnelle@linux.ibm.com>
>> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
>> Cc: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
>>
>> David Hildenbrand (9):
>>    s390/gmap: validate VMA in __gmap_zap()
>>    s390/gmap: don't unconditionally call pte_unmap_unlock() in
>>      __gmap_zap()
>>    s390/mm: validate VMA in PGSTE manipulation functions
>>    s390/mm: fix VMA and page table handling code in storage key handling
>>      functions
>>    s390/uv: fully validate the VMA before calling follow_page()
>>    s390/pci_mmio: fully validate the VMA before calling follow_pte()
>>    s390/mm: no need for pte_alloc_map_lock() if we know the pmd is
>>      present
>>    s390/mm: optimize set_guest_storage_key()
>>    s390/mm: optimize reset_guest_reference_bit()
>>
>>   arch/s390/kernel/uv.c    |   2 +-
>>   arch/s390/mm/gmap.c      |  11 +++-
>>   arch/s390/mm/pgtable.c   | 109 +++++++++++++++++++++++++++------------
>>   arch/s390/pci/pci_mmio.c |   4 +-
>>   4 files changed, 89 insertions(+), 37 deletions(-)
>>
>>
>> base-commit: 7d2a07b769330c34b4deabeed939325c77a7ec2f
>
Heiko Carstens Sept. 28, 2021, 10:59 a.m. UTC | #3
On Thu, Sep 09, 2021 at 06:22:39PM +0200, David Hildenbrand wrote:
> Resend because I missed ccing people on the actual patches ...
> 
> RFC because the patches are essentially untested and I did not actually
> try to trigger any of the things these patches are supposed to fix. It
> merely matches my current understanding (and what other code does :) ). I
> did compile-test as far as possible.
> 
> After learning more about the wonderful world of page tables and their
> interaction with the mmap_sem and VMAs, I spotted some issues in our
> page table walkers that allow user space to trigger nasty behavior when
> playing dirty tricks with munmap() or mmap() of hugetlb. While some issues
> should be hard to trigger, others are fairly easy because we provide
> convenient interfaces (e.g., KVM_S390_GET_SKEYS and KVM_S390_SET_SKEYS).
> 
> Future work:
> - Don't use get_locked_pte() when it's not required to actually allocate
>   page tables -- similar to how storage keys are now handled. Examples are
>   get_pgste() and __gmap_zap.
> - Don't use get_locked_pte() and instead let page fault logic allocate page
>   tables when we actually do need page tables -- also, similar to how
>   storage keys are now handled. Examples are set_pgste_bits() and
>   pgste_perform_essa().
> - Maybe switch to mm/pagewalk.c to avoid custom page table walkers. For
>   __gmap_zap() that's very easy.
> 
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Cc: Janosch Frank <frankja@linux.ibm.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Cc: Heiko Carstens <hca@linux.ibm.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Niklas Schnelle <schnelle@linux.ibm.com>
> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> Cc: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>

For the whole series:
Acked-by: Heiko Carstens <hca@linux.ibm.com>

Christian, given that this is mostly about KVM I'd assume this should
go via the KVM tree. Patch 6 (pci_mmio) is already upstream.
Christian Borntraeger Sept. 28, 2021, 11:06 a.m. UTC | #4
On 28.09.21 at 12:59, Heiko Carstens wrote:
> On Thu, Sep 09, 2021 at 06:22:39PM +0200, David Hildenbrand wrote:
>> Resend because I missed ccing people on the actual patches ...
>>
>> RFC because the patches are essentially untested and I did not actually
>> try to trigger any of the things these patches are supposed to fix. It
>> merely matches my current understanding (and what other code does :) ). I
>> did compile-test as far as possible.
>>
>> After learning more about the wonderful world of page tables and their
>> interaction with the mmap_sem and VMAs, I spotted some issues in our
>> page table walkers that allow user space to trigger nasty behavior when
>> playing dirty tricks with munmap() or mmap() of hugetlb. While some issues
>> should be hard to trigger, others are fairly easy because we provide
>> convenient interfaces (e.g., KVM_S390_GET_SKEYS and KVM_S390_SET_SKEYS).
>>
>> Future work:
>> - Don't use get_locked_pte() when it's not required to actually allocate
>>    page tables -- similar to how storage keys are now handled. Examples are
>>    get_pgste() and __gmap_zap.
>> - Don't use get_locked_pte() and instead let page fault logic allocate page
>>    tables when we actually do need page tables -- also, similar to how
>>    storage keys are now handled. Examples are set_pgste_bits() and
>>    pgste_perform_essa().
>> - Maybe switch to mm/pagewalk.c to avoid custom page table walkers. For
>>    __gmap_zap() that's very easy.
>>
>> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
>> Cc: Janosch Frank <frankja@linux.ibm.com>
>> Cc: Cornelia Huck <cohuck@redhat.com>
>> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
>> Cc: Heiko Carstens <hca@linux.ibm.com>
>> Cc: Vasily Gorbik <gor@linux.ibm.com>
>> Cc: Niklas Schnelle <schnelle@linux.ibm.com>
>> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
>> Cc: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
> 
> For the whole series:
> Acked-by: Heiko Carstens <hca@linux.ibm.com>
> 
> Christian, given that this is mostly about KVM I'd assume this should
> go via the KVM tree. Patch 6 (pci_mmio) is already upstream.

Right, I think I will queue this even without testing for now.
Claudio, is patch 7 ok for you with the explanation from David?
Claudio Imbrenda Sept. 28, 2021, 2:38 p.m. UTC | #5
On Tue, 28 Sep 2021 13:06:26 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 28.09.21 at 12:59, Heiko Carstens wrote:
> > On Thu, Sep 09, 2021 at 06:22:39PM +0200, David Hildenbrand wrote:  
> >> Resend because I missed ccing people on the actual patches ...
> >>
> >> RFC because the patches are essentially untested and I did not actually
> >> try to trigger any of the things these patches are supposed to fix. It
> >> merely matches my current understanding (and what other code does :) ). I
> >> did compile-test as far as possible.
> >>
> >> After learning more about the wonderful world of page tables and their
> >> interaction with the mmap_sem and VMAs, I spotted some issues in our
> >> page table walkers that allow user space to trigger nasty behavior when
> >> playing dirty tricks with munmap() or mmap() of hugetlb. While some issues
> >> should be hard to trigger, others are fairly easy because we provide
> >> convenient interfaces (e.g., KVM_S390_GET_SKEYS and KVM_S390_SET_SKEYS).
> >>
> >> Future work:
> >> - Don't use get_locked_pte() when it's not required to actually allocate
> >>    page tables -- similar to how storage keys are now handled. Examples are
> >>    get_pgste() and __gmap_zap.
> >> - Don't use get_locked_pte() and instead let page fault logic allocate page
> >>    tables when we actually do need page tables -- also, similar to how
> >>    storage keys are now handled. Examples are set_pgste_bits() and
> >>    pgste_perform_essa().
> >> - Maybe switch to mm/pagewalk.c to avoid custom page table walkers. For
> >>    __gmap_zap() that's very easy.
> >>
> >> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> >> Cc: Janosch Frank <frankja@linux.ibm.com>
> >> Cc: Cornelia Huck <cohuck@redhat.com>
> >> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
> >> Cc: Heiko Carstens <hca@linux.ibm.com>
> >> Cc: Vasily Gorbik <gor@linux.ibm.com>
> >> Cc: Niklas Schnelle <schnelle@linux.ibm.com>
> >> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> >> Cc: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>  
> > 
> > For the whole series:
> > Acked-by: Heiko Carstens <hca@linux.ibm.com>
> > 
> > Christian, given that this is mostly about KVM I'd assume this should
> > go via the KVM tree. Patch 6 (pci_mmio) is already upstream.  
> 
> Right, I think I will queue this even without testing for now.
> Claudio, is patch 7 ok for you with the explanation from David?

yes
Christian Borntraeger Sept. 28, 2021, 4:03 p.m. UTC | #6
On 09.09.21 at 18:22, David Hildenbrand wrote:
> Resend because I missed ccing people on the actual patches ...
> 
> RFC because the patches are essentially untested and I did not actually
> try to trigger any of the things these patches are supposed to fix. It
> merely matches my current understanding (and what other code does :) ). I
> did compile-test as far as possible.
> 
> After learning more about the wonderful world of page tables and their
> interaction with the mmap_sem and VMAs, I spotted some issues in our
> page table walkers that allow user space to trigger nasty behavior when
> playing dirty tricks with munmap() or mmap() of hugetlb. While some issues
> should be hard to trigger, others are fairly easy because we provide
> convenient interfaces (e.g., KVM_S390_GET_SKEYS and KVM_S390_SET_SKEYS).
> 
> Future work:
> - Don't use get_locked_pte() when it's not required to actually allocate
>    page tables -- similar to how storage keys are now handled. Examples are
>    get_pgste() and __gmap_zap.
> - Don't use get_locked_pte() and instead let page fault logic allocate page
>    tables when we actually do need page tables -- also, similar to how
>    storage keys are now handled. Examples are set_pgste_bits() and
>    pgste_perform_essa().
> - Maybe switch to mm/pagewalk.c to avoid custom page table walkers. For
>    __gmap_zap() that's very easy.
> 
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Cc: Janosch Frank <frankja@linux.ibm.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Cc: Heiko Carstens <hca@linux.ibm.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Niklas Schnelle <schnelle@linux.ibm.com>
> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> Cc: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
> 
> David Hildenbrand (9):
>    s390/gmap: validate VMA in __gmap_zap()
>    s390/gmap: don't unconditionally call pte_unmap_unlock() in
>      __gmap_zap()
>    s390/mm: validate VMA in PGSTE manipulation functions
>    s390/mm: fix VMA and page table handling code in storage key handling
>      functions
>    s390/uv: fully validate the VMA before calling follow_page()
>    s390/pci_mmio: fully validate the VMA before calling follow_pte()
>    s390/mm: no need for pte_alloc_map_lock() if we know the pmd is
>      present
>    s390/mm: optimize set_guest_storage_key()
>    s390/mm: optimize reset_guest_reference_bit()
> 
>   arch/s390/kernel/uv.c    |   2 +-
>   arch/s390/mm/gmap.c      |  11 +++-
>   arch/s390/mm/pgtable.c   | 109 +++++++++++++++++++++++++++------------
>   arch/s390/pci/pci_mmio.c |   4 +-
>   4 files changed, 89 insertions(+), 37 deletions(-)
> 

Thanks, applied. Will run some tests on those commits, but it's already pushed
out to my next tree to give it some coverage.