Message ID | 20210909162248.14969-1-david@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | s390: fixes, cleanups and optimizations for page table walkers | expand |
On Thu, 9 Sep 2021 18:22:39 +0200 David Hildenbrand <david@redhat.com> wrote: > Resend because I missed ccing people on the actual patches ... > > RFC because the patches are essentially untested and I did not actually > try to trigger any of the things these patches are supposed to fix. It this is an interesting series, and the code makes sense, but I would really like to see some regression tests, and maybe even some selftests to trigger (at least some of) the issues. the follow-up question is: how did we manage to go on so long without noticing these issues? :D > merely matches my current understanding (and what other code does :) ). I > did compile-test as far as possible. > > After learning more about the wonderful world of page tables and their > interaction with the mmap_sem and VMAs, I spotted some issues in our > page table walkers that allow user space to trigger nasty behavior when > playing dirty tricks with munmap() or mmap() of hugetlb. While some issues > should be hard to trigger, others are fairly easy because we provide > conventient interfaces (e.g., KVM_S390_GET_SKEYS and KVM_S390_SET_SKEYS). > > Future work: > - Don't use get_locked_pte() when it's not required to actually allocate > page tables -- similar to how storage keys are now handled. Examples are > get_pgste() and __gmap_zap. > - Don't use get_locked_pte() and instead let page fault logic allocate page > tables when we actually do need page tables -- also, similar to how > storage keys are now handled. Examples are set_pgste_bits() and > pgste_perform_essa(). > - Maybe switch to mm/pagewalk.c to avoid custom page table walkers. For > __gmap_zap() that's very easy. > > Cc: Christian Borntraeger <borntraeger@de.ibm.com> > Cc: Janosch Frank <frankja@linux.ibm.com> > Cc: Cornelia Huck <cohuck@redhat.com> > Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> > Cc: Heiko Carstens <hca@linux.ibm.com> > Cc: Vasily Gorbik <gor@linux.ibm.com> > Cc: Niklas Schnelle <schnelle@linux.ibm.com> > Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> > Cc: Ulrich Weigand <Ulrich.Weigand@de.ibm.com> > > David Hildenbrand (9): > s390/gmap: validate VMA in __gmap_zap() > s390/gmap: don't unconditionally call pte_unmap_unlock() in > __gmap_zap() > s390/mm: validate VMA in PGSTE manipulation functions > s390/mm: fix VMA and page table handling code in storage key handling > functions > s390/uv: fully validate the VMA before calling follow_page() > s390/pci_mmio: fully validate the VMA before calling follow_pte() > s390/mm: no need for pte_alloc_map_lock() if we know the pmd is > present > s390/mm: optimize set_guest_storage_key() > s390/mm: optimize reset_guest_reference_bit() > > arch/s390/kernel/uv.c | 2 +- > arch/s390/mm/gmap.c | 11 +++- > arch/s390/mm/pgtable.c | 109 +++++++++++++++++++++++++++------------ > arch/s390/pci/pci_mmio.c | 4 +- > 4 files changed, 89 insertions(+), 37 deletions(-) > > > base-commit: 7d2a07b769330c34b4deabeed939325c77a7ec2f
On 14.09.21 18:50, Claudio Imbrenda wrote: > On Thu, 9 Sep 2021 18:22:39 +0200 > David Hildenbrand <david@redhat.com> wrote: > >> Resend because I missed ccing people on the actual patches ... >> >> RFC because the patches are essentially untested and I did not actually >> try to trigger any of the things these patches are supposed to fix. It > > this is an interesting series, and the code makes sense, but I would > really like to see some regression tests, and maybe even some > selftests to trigger (at least some of) the issues. Yep, it most certainly needs regression testing before picking any of this. selftests would be great, but I won't find time for it in the foreseeable future. > > the follow-up question is: how did we manage to go on so long without > noticing these issues? :D Excellent question - I guess we simply weren't aware of the dos and don'ts when dealing with process page tables :) > >> merely matches my current understanding (and what other code does :) ). I >> did compile-test as far as possible. >> >> After learning more about the wonderful world of page tables and their >> interaction with the mmap_sem and VMAs, I spotted some issues in our >> page table walkers that allow user space to trigger nasty behavior when >> playing dirty tricks with munmap() or mmap() of hugetlb. While some issues >> should be hard to trigger, others are fairly easy because we provide >> conventient interfaces (e.g., KVM_S390_GET_SKEYS and KVM_S390_SET_SKEYS). >> >> Future work: >> - Don't use get_locked_pte() when it's not required to actually allocate >> page tables -- similar to how storage keys are now handled. Examples are >> get_pgste() and __gmap_zap. >> - Don't use get_locked_pte() and instead let page fault logic allocate page >> tables when we actually do need page tables -- also, similar to how >> storage keys are now handled. Examples are set_pgste_bits() and >> pgste_perform_essa(). >> - Maybe switch to mm/pagewalk.c to avoid custom page table walkers. For >> __gmap_zap() that's very easy. >> >> Cc: Christian Borntraeger <borntraeger@de.ibm.com> >> Cc: Janosch Frank <frankja@linux.ibm.com> >> Cc: Cornelia Huck <cohuck@redhat.com> >> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> >> Cc: Heiko Carstens <hca@linux.ibm.com> >> Cc: Vasily Gorbik <gor@linux.ibm.com> >> Cc: Niklas Schnelle <schnelle@linux.ibm.com> >> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> >> Cc: Ulrich Weigand <Ulrich.Weigand@de.ibm.com> >> >> David Hildenbrand (9): >> s390/gmap: validate VMA in __gmap_zap() >> s390/gmap: don't unconditionally call pte_unmap_unlock() in >> __gmap_zap() >> s390/mm: validate VMA in PGSTE manipulation functions >> s390/mm: fix VMA and page table handling code in storage key handling >> functions >> s390/uv: fully validate the VMA before calling follow_page() >> s390/pci_mmio: fully validate the VMA before calling follow_pte() >> s390/mm: no need for pte_alloc_map_lock() if we know the pmd is >> present >> s390/mm: optimize set_guest_storage_key() >> s390/mm: optimize reset_guest_reference_bit() >> >> arch/s390/kernel/uv.c | 2 +- >> arch/s390/mm/gmap.c | 11 +++- >> arch/s390/mm/pgtable.c | 109 +++++++++++++++++++++++++++------------ >> arch/s390/pci/pci_mmio.c | 4 +- >> 4 files changed, 89 insertions(+), 37 deletions(-) >> >> >> base-commit: 7d2a07b769330c34b4deabeed939325c77a7ec2f >
On Thu, Sep 09, 2021 at 06:22:39PM +0200, David Hildenbrand wrote: > Resend because I missed ccing people on the actual patches ... > > RFC because the patches are essentially untested and I did not actually > try to trigger any of the things these patches are supposed to fix. It > merely matches my current understanding (and what other code does :) ). I > did compile-test as far as possible. > > After learning more about the wonderful world of page tables and their > interaction with the mmap_sem and VMAs, I spotted some issues in our > page table walkers that allow user space to trigger nasty behavior when > playing dirty tricks with munmap() or mmap() of hugetlb. While some issues > should be hard to trigger, others are fairly easy because we provide > conventient interfaces (e.g., KVM_S390_GET_SKEYS and KVM_S390_SET_SKEYS). > > Future work: > - Don't use get_locked_pte() when it's not required to actually allocate > page tables -- similar to how storage keys are now handled. Examples are > get_pgste() and __gmap_zap. > - Don't use get_locked_pte() and instead let page fault logic allocate page > tables when we actually do need page tables -- also, similar to how > storage keys are now handled. Examples are set_pgste_bits() and > pgste_perform_essa(). > - Maybe switch to mm/pagewalk.c to avoid custom page table walkers. For > __gmap_zap() that's very easy. > > Cc: Christian Borntraeger <borntraeger@de.ibm.com> > Cc: Janosch Frank <frankja@linux.ibm.com> > Cc: Cornelia Huck <cohuck@redhat.com> > Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> > Cc: Heiko Carstens <hca@linux.ibm.com> > Cc: Vasily Gorbik <gor@linux.ibm.com> > Cc: Niklas Schnelle <schnelle@linux.ibm.com> > Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> > Cc: Ulrich Weigand <Ulrich.Weigand@de.ibm.com> For the whole series: Acked-by: Heiko Carstens <hca@linux.ibm.com> Christian, given that this is mostly about KVM I'd assume this should go via the KVM tree. Patch 6 (pci_mmio) is already upstream.
Am 28.09.21 um 12:59 schrieb Heiko Carstens: > On Thu, Sep 09, 2021 at 06:22:39PM +0200, David Hildenbrand wrote: >> Resend because I missed ccing people on the actual patches ... >> >> RFC because the patches are essentially untested and I did not actually >> try to trigger any of the things these patches are supposed to fix. It >> merely matches my current understanding (and what other code does :) ). I >> did compile-test as far as possible. >> >> After learning more about the wonderful world of page tables and their >> interaction with the mmap_sem and VMAs, I spotted some issues in our >> page table walkers that allow user space to trigger nasty behavior when >> playing dirty tricks with munmap() or mmap() of hugetlb. While some issues >> should be hard to trigger, others are fairly easy because we provide >> conventient interfaces (e.g., KVM_S390_GET_SKEYS and KVM_S390_SET_SKEYS). >> >> Future work: >> - Don't use get_locked_pte() when it's not required to actually allocate >> page tables -- similar to how storage keys are now handled. Examples are >> get_pgste() and __gmap_zap. >> - Don't use get_locked_pte() and instead let page fault logic allocate page >> tables when we actually do need page tables -- also, similar to how >> storage keys are now handled. Examples are set_pgste_bits() and >> pgste_perform_essa(). >> - Maybe switch to mm/pagewalk.c to avoid custom page table walkers. For >> __gmap_zap() that's very easy. >> >> Cc: Christian Borntraeger <borntraeger@de.ibm.com> >> Cc: Janosch Frank <frankja@linux.ibm.com> >> Cc: Cornelia Huck <cohuck@redhat.com> >> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> >> Cc: Heiko Carstens <hca@linux.ibm.com> >> Cc: Vasily Gorbik <gor@linux.ibm.com> >> Cc: Niklas Schnelle <schnelle@linux.ibm.com> >> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> >> Cc: Ulrich Weigand <Ulrich.Weigand@de.ibm.com> > > For the whole series: > Acked-by: Heiko Carstens <hca@linux.ibm.com> > > Christian, given that this is mostly about KVM I'd assume this should > go via the KVM tree. Patch 6 (pci_mmio) is already upstream. Right, I think I will queue this even without testing for now. Claudio, is patch 7 ok for you with the explanation from David?
On Tue, 28 Sep 2021 13:06:26 +0200 Christian Borntraeger <borntraeger@de.ibm.com> wrote: > Am 28.09.21 um 12:59 schrieb Heiko Carstens: > > On Thu, Sep 09, 2021 at 06:22:39PM +0200, David Hildenbrand wrote: > >> Resend because I missed ccing people on the actual patches ... > >> > >> RFC because the patches are essentially untested and I did not actually > >> try to trigger any of the things these patches are supposed to fix. It > >> merely matches my current understanding (and what other code does :) ). I > >> did compile-test as far as possible. > >> > >> After learning more about the wonderful world of page tables and their > >> interaction with the mmap_sem and VMAs, I spotted some issues in our > >> page table walkers that allow user space to trigger nasty behavior when > >> playing dirty tricks with munmap() or mmap() of hugetlb. While some issues > >> should be hard to trigger, others are fairly easy because we provide > >> conventient interfaces (e.g., KVM_S390_GET_SKEYS and KVM_S390_SET_SKEYS). > >> > >> Future work: > >> - Don't use get_locked_pte() when it's not required to actually allocate > >> page tables -- similar to how storage keys are now handled. Examples are > >> get_pgste() and __gmap_zap. > >> - Don't use get_locked_pte() and instead let page fault logic allocate page > >> tables when we actually do need page tables -- also, similar to how > >> storage keys are now handled. Examples are set_pgste_bits() and > >> pgste_perform_essa(). > >> - Maybe switch to mm/pagewalk.c to avoid custom page table walkers. For > >> __gmap_zap() that's very easy. > >> > >> Cc: Christian Borntraeger <borntraeger@de.ibm.com> > >> Cc: Janosch Frank <frankja@linux.ibm.com> > >> Cc: Cornelia Huck <cohuck@redhat.com> > >> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> > >> Cc: Heiko Carstens <hca@linux.ibm.com> > >> Cc: Vasily Gorbik <gor@linux.ibm.com> > >> Cc: Niklas Schnelle <schnelle@linux.ibm.com> > >> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> > >> Cc: Ulrich Weigand <Ulrich.Weigand@de.ibm.com> > > > > For the whole series: > > Acked-by: Heiko Carstens <hca@linux.ibm.com> > > > > Christian, given that this is mostly about KVM I'd assume this should > > go via the KVM tree. Patch 6 (pci_mmio) is already upstream. > > Right, I think I will queue this even without testing for now. > Claudio, is patch 7 ok for you with the explanation from David? yes
Am 09.09.21 um 18:22 schrieb David Hildenbrand: > Resend because I missed ccing people on the actual patches ... > > RFC because the patches are essentially untested and I did not actually > try to trigger any of the things these patches are supposed to fix. It > merely matches my current understanding (and what other code does :) ). I > did compile-test as far as possible. > > After learning more about the wonderful world of page tables and their > interaction with the mmap_sem and VMAs, I spotted some issues in our > page table walkers that allow user space to trigger nasty behavior when > playing dirty tricks with munmap() or mmap() of hugetlb. While some issues > should be hard to trigger, others are fairly easy because we provide > conventient interfaces (e.g., KVM_S390_GET_SKEYS and KVM_S390_SET_SKEYS). > > Future work: > - Don't use get_locked_pte() when it's not required to actually allocate > page tables -- similar to how storage keys are now handled. Examples are > get_pgste() and __gmap_zap. > - Don't use get_locked_pte() and instead let page fault logic allocate page > tables when we actually do need page tables -- also, similar to how > storage keys are now handled. Examples are set_pgste_bits() and > pgste_perform_essa(). > - Maybe switch to mm/pagewalk.c to avoid custom page table walkers. For > __gmap_zap() that's very easy. > > Cc: Christian Borntraeger <borntraeger@de.ibm.com> > Cc: Janosch Frank <frankja@linux.ibm.com> > Cc: Cornelia Huck <cohuck@redhat.com> > Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> > Cc: Heiko Carstens <hca@linux.ibm.com> > Cc: Vasily Gorbik <gor@linux.ibm.com> > Cc: Niklas Schnelle <schnelle@linux.ibm.com> > Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> > Cc: Ulrich Weigand <Ulrich.Weigand@de.ibm.com> > > David Hildenbrand (9): > s390/gmap: validate VMA in __gmap_zap() > s390/gmap: don't unconditionally call pte_unmap_unlock() in > __gmap_zap() > s390/mm: validate VMA in PGSTE manipulation functions > s390/mm: fix VMA and page table handling code in storage key handling > functions > s390/uv: fully validate the VMA before calling follow_page() > s390/pci_mmio: fully validate the VMA before calling follow_pte() > s390/mm: no need for pte_alloc_map_lock() if we know the pmd is > present > s390/mm: optimize set_guest_storage_key() > s390/mm: optimize reset_guest_reference_bit() > > arch/s390/kernel/uv.c | 2 +- > arch/s390/mm/gmap.c | 11 +++- > arch/s390/mm/pgtable.c | 109 +++++++++++++++++++++++++++------------ > arch/s390/pci/pci_mmio.c | 4 +- > 4 files changed, 89 insertions(+), 37 deletions(-) > Thanks applied. Will run some test on those commits, but its already pushed out to my next tree to give it some coverage. g