Message ID: 20190123222315.1122-1-jglisse@redhat.com (mailing list archive)
Series: mmu notifier provide context informations
On Wed, Jan 23, 2019 at 2:23 PM <jglisse@redhat.com> wrote:
>
> From: Jérôme Glisse <jglisse@redhat.com>
>
> [...]
>
> Note that I also intend to use this feature further in nouveau and
> HMM down the road. I also expect that other users like KVM might be
> interested in leveraging this new information to optimize some of
> their secondary page table invalidations.

"Down the road" users should introduce the functionality they want to
consume. The common concern with preemptively including forward-looking
infrastructure is realizing later that the infrastructure is not needed,
or needs changing. If it has no current consumer, leave it out.

> Here is an explanation of the rationale for this patchset:
>
> [...]
>
> - UNMAP: munmap() or mremap()
> - CLEAR: page table is cleared (migration, compaction, reclaim, ...)
> - PROTECTION_VMA: change in access protections for the range
> - PROTECTION_PAGE: change in access protections for a page in the range
> - SOFT_DIRTY: soft dirtiness tracking
>
> Being able to distinguish munmap() and mremap() from other reasons why
> the page table is cleared is important to allow users of mmu notifiers
> to update their own internal tracking structures accordingly (on munmap
> or mremap it is no longer needed to track the range of virtual
> addresses, as it becomes invalid).

The only context information consumed in this patch set is
MMU_NOTIFY_PROTECTION_VMA.

What is the practical benefit of these "optimize out the case when a
range is updated to read only" optimizations? Any numbers to show this
is worth the code thrash?
On Wed, Jan 23, 2019 at 02:54:40PM -0800, Dan Williams wrote:
> On Wed, Jan 23, 2019 at 2:23 PM <jglisse@redhat.com> wrote:
> >
> > [...]
> >
> > Note that I also intend to use this feature further in nouveau and
> > HMM down the road. I also expect that other users like KVM might be
> > interested in leveraging this new information to optimize some of
> > their secondary page table invalidations.
>
> "Down the road" users should introduce the functionality they want to
> consume. The common concern with preemptively including
> forward-looking infrastructure is realizing later that the
> infrastructure is not needed, or needs changing. If it has no current
> consumer, leave it out.

This patchset already shows that this is useful; what more can I do?
I know I will use this information: in nouveau, for memory policy, we
allocate our own structure for every vma the GPU ever accessed or that
userspace hinted we should set a policy for. Right now, with the
existing mmu notifier, I _must_ free those structures because I do not
know whether the invalidation is an munmap or something else. So I am
losing important information and unnecessarily freeing structures that
I will have to re-allocate just a couple of jiffies later. That is one
way I am using this. The other way is to optimize GPU page table
updates, just as I am doing with all the patches to RDMA/ODP and the
various GPU drivers.

> > [...]
> >
> > Being able to distinguish munmap() and mremap() from other reasons
> > why the page table is cleared is important to allow users of mmu
> > notifiers to update their own internal tracking structures
> > accordingly (on munmap or mremap it is no longer needed to track the
> > range of virtual addresses, as it becomes invalid).
>
> The only context information consumed in this patch set is
> MMU_NOTIFY_PROTECTION_VMA.
>
> What is the practical benefit of these "optimize out the case when a
> range is updated to read only" optimizations? Any numbers to show this
> is worth the code thrash?

It depends on the workload. For instance, if you map a file read only
for RDMA export, like a log file, all the writeback-driven write
protection that would otherwise disrupt the RDMA mapping can be
optimized out.

See above for more reasons why it is beneficial (knowing when it is an
munmap/mremap versus something else).

I would not have thought that passing down information would be
something this controversial. Hope this helps you see the benefit of
this.

Cheers,
Jérôme
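To make the read-only optimization concrete, here is a minimal sketch of
the shape the radeon/amdgpu/i915/RDMA patches take, assuming a
hypothetical driver "foo": mmu_notifier_range_update_to_read_only() is
the patch 5 helper, while struct foo_notifier, foo_range_is_read_only()
and foo_invalidate_range() are made-up names for illustration only.

	static int foo_invalidate_range_start(struct mmu_notifier *mn,
				const struct mmu_notifier_range *range)
	{
		struct foo_notifier *foo =
			container_of(mn, struct foo_notifier, mn);

		/*
		 * Patch 5 helper: true when the CPU page table update only
		 * downgrades the range to read only. If the device mapping
		 * is already read only there is nothing to invalidate.
		 */
		if (mmu_notifier_range_update_to_read_only(range) &&
		    foo_range_is_read_only(foo, range->start, range->end))
			return 0;	/* device mapping stays intact */

		foo_invalidate_range(foo, range->start, range->end);
		return 0;
	}

This is what lets the writeback case above be skipped: writeback only
write protects the pages, and a read-only device mapping is unaffected.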
On Wed, Jan 23, 2019 at 3:05 PM Jerome Glisse <jglisse@redhat.com> wrote:
>
> On Wed, Jan 23, 2019 at 02:54:40PM -0800, Dan Williams wrote:
> > [...]
> >
> > "Down the road" users should introduce the functionality they want
> > to consume. [...] If it has no current consumer, leave it out.
>
> This patchset already shows that this is useful; what more can I do?
> I know I will use this information: in nouveau, for memory policy, we
> allocate our own structure for every vma the GPU ever accessed or that
> userspace hinted we should set a policy for. Right now, with the
> existing mmu notifier, I _must_ free those structures because I do not
> know whether the invalidation is an munmap or something else. So I am
> losing important information and unnecessarily freeing structures that
> I will have to re-allocate just a couple of jiffies later. That is one
> way I am using this.

Understood, but that still seems to say stage the core support when the
nouveau enabling is ready.

> The other way is to optimize GPU page table updates, just as I am
> doing with all the patches to RDMA/ODP and the various GPU drivers.

Yes.

> > [...]
> >
> > What is the practical benefit of these "optimize out the case when a
> > range is updated to read only" optimizations? Any numbers to show
> > this is worth the code thrash?
>
> It depends on the workload. For instance, if you map a file read only
> for RDMA export, like a log file, all the writeback-driven write
> protection that would otherwise disrupt the RDMA mapping can be
> optimized out.
>
> See above for more reasons why it is beneficial (knowing when it is an
> munmap/mremap versus something else).
>
> I would not have thought that passing down information would be
> something this controversial. Hope this helps you see the benefit of
> this.

I'm not asserting that it is controversial. I am asserting that
whenever a changelog says "optimize" it should also include concrete
data about the optimization scenario. Maybe the scenarios you have
optimized are clear to the driver owners; they just weren't immediately
clear to me.
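The nouveau policy-tracking case Jerome describes can be sketched the
same way; nouveau_policy_free() and nouveau_policy_unmap() are
hypothetical helper names, and range->event is the field patch 4 adds
to struct mmu_notifier_range.

	static int nouveau_notifier_invalidate_range_start(struct mmu_notifier *mn,
				const struct mmu_notifier_range *range)
	{
		struct nouveau_notifier *nn =
			container_of(mn, struct nouveau_notifier, mn);

		if (range->event == MMU_NOTIFY_UNMAP)
			/*
			 * munmap()/mremap(): the virtual address range is
			 * gone for good, so drop the per-vma policy
			 * structure instead of keeping it around.
			 */
			nouveau_policy_free(nn, range->start, range->end);
		else
			/*
			 * Any other event: the range stays valid, so keep
			 * the structure and only invalidate the device
			 * mapping.
			 */
			nouveau_policy_unmap(nn, range->start, range->end);
		return 0;
	}

Without the event information, the first branch is the only safe one,
which is exactly the unnecessary free/re-allocate cycle described above.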
Andrew, what is your plan for this? I had a discussion with Peter Xu
and Andrea about change_pte() and kvm. Today the change_pte() kvm
optimization is effectively disabled because of the invalidate_range
calls. With a minimal couple-of-lines patch on top of this patchset we
can bring back the kvm change_pte() optimization, and we can also
optimize some other cases, for instance write protecting after fork
(but I am not sure that is something qemu does often, so it might not
help for real kvm workloads).

I will be posting the extra patch as an RFC, but in the meantime I
wanted to know what the status of this was.

Jan, Christian, does your previous ACK still hold for this?

On Wed, Jan 23, 2019 at 05:23:06PM -0500, jglisse@redhat.com wrote:
> From: Jérôme Glisse <jglisse@redhat.com>
>
> Hi Andrew, I see that you still have my event patch in your queue [1].
> This patchset replaces that single patch and is broken down into
> further steps so that it is easier to review and to ascertain that no
> mistakes were made during the mechanical changes.
>
> [...]
On Thu, 31 Jan 2019 11:10:06 -0500 Jerome Glisse <jglisse@redhat.com> wrote:

> Andrew, what is your plan for this? I had a discussion with Peter Xu
> and Andrea about change_pte() and kvm.
>
> [...]
>
> I will be posting the extra patch as an RFC, but in the meantime I
> wanted to know what the status of this was.

The various drm patches appear to be headed for collisions with drm
tree development, so we'll need to figure out how to handle that and in
what order things happen.

It's quite unclear from the v4 patchset's changelogs that this has
anything to do with KVM, and "the change_pte() kvm optimization" hasn't
been described anywhere(?).

So.. I expect the thing to do here is to get everything finished, get
the changelogs completed with this new information, and do a resend.

Can we omit the drm and rdma patches for now? Feed them in via the
subsystem maintainers when the dust has settled?
On Thu, Jan 31, 2019 at 11:55:35AM -0800, Andrew Morton wrote:
> [...]
>
> Can we omit the drm and rdma patches for now? Feed them in via the
> subsystem maintainers when the dust has settled?

Yes, I should have pointed out that you can ignore the driver patches;
I will resubmit them through the appropriate trees once the mm bits are
in. I just wanted to showcase how I intend to use this. Next time I
will try to remember to clearly tag things that are only there to
showcase the feature and that will be merged later through a different
tree.

I will do a v5 with the kvm bits once we have enough testing and
confidence. So I guess this will all be delayed to 5.2, with 5.3 for
the driver bits. The kvm bits are the outcome of private emails and
previous face-to-face discussions around mmu notifiers and kvm. I
believe the context information will turn out to be useful to more
users than the ones I am doing it for.

Cheers,
Jérôme
On 31.01.19 at 17:10, Jerome Glisse wrote:
> Andrew, what is your plan for this?
>
> [...]
>
> Jan, Christian, does your previous ACK still hold for this?

At least the general approach still sounds perfectly sane to me.

Regarding how to merge these patches, I think we should just get the
general infrastructure into Linus' tree; we can then merge the DRM
patches one release later, when we are sure that it doesn't break
anything.

Christian.
On Thu 31-01-19 11:10:06, Jerome Glisse wrote:
> Andrew, what is your plan for this?
>
> [...]
>
> Jan, Christian, does your previous ACK still hold for this?

Yes, I still think the approach makes sense. Dan's concern about
in-tree users is valid, but it seems you have those, just not merged
yet, right?

								Honza
On Fri, Feb 01, 2019 at 10:02:30PM +0100, Jan Kara wrote:
> [...]
>
> Yes, I still think the approach makes sense. Dan's concern about
> in-tree users is valid, but it seems you have those, just not merged
> yet, right?

(Catching up on email.)

This version included some of the first users for this, but I do not
want to merge them through Andrew; they should go through the
individual driver project trees. Also, in the meantime I found a use
for this with kvm, and I expect a few other users of mmu notifiers will
leverage this extra information.

Cheers,
Jérôme
From: Jérôme Glisse <jglisse@redhat.com>

Hi Andrew, I see that you still have my event patch in your queue [1].
This patchset replaces that single patch and is broken down into
further steps so that it is easier to review and to ascertain that no
mistakes were made during the mechanical changes. Here are the steps:

    Patch 1 - add the enum values
    Patch 2 - coccinelle semantic patch to convert all call sites of
              mmu_notifier_range_init to the default enum value and
              also to pass down the vma when it is available
    Patch 3 - update many call sites to more accurate enum values
    Patch 4 - add the information to the mmu_notifier_range struct
    Patch 5 - helper to test if a range is updated to read only

All the remaining patches update various drivers to demonstrate how
this new information gets used by device drivers. I build tested with
make all, and with make all minus everything that enables mmu
notifiers, i.e. building with MMU_NOTIFIER=no. Also tested with some
radeon, amd gpu and intel gpu.

If there are no objections, I believe the best plan would be to merge
the first 5 patches (all mm changes) through your queue for 5.1 and
then to delay the driver updates to each individual driver tree for
5.2. This will allow each individual device driver maintainer time to
test this more thoroughly than my own testing.

Note that I also intend to use this feature further in nouveau and HMM
down the road. I also expect that other users like KVM might be
interested in leveraging this new information to optimize some of
their secondary page table invalidations.

Here is an explanation of the rationale for this patchset:

CPU page table updates can happen for many reasons, not only as a
result of a syscall (munmap(), mprotect(), mremap(), madvise(), ...)
but also as a result of kernel activities (memory compression, reclaim,
migration, ...).

Patch 1 introduces a set of enums that can be associated with each of
the events triggering a mmu notifier. Later patches take advantage of
those enum values:

    - UNMAP: munmap() or mremap()
    - CLEAR: page table is cleared (migration, compaction, reclaim, ...)
    - PROTECTION_VMA: change in access protections for the range
    - PROTECTION_PAGE: change in access protections for a page in the range
    - SOFT_DIRTY: soft dirtiness tracking

Being able to distinguish munmap() and mremap() from other reasons why
the page table is cleared is important to allow users of mmu notifiers
to update their own internal tracking structures accordingly (on munmap
or mremap it is no longer needed to track the range of virtual
addresses, as it becomes invalid).
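For reference, a sketch of the enum that patch 1 adds; the MMU_NOTIFY_
prefix is taken from the MMU_NOTIFY_PROTECTION_VMA name used elsewhere
in this thread, so treat the exact spelling of the other values as an
assumption rather than a quote of the series.

	enum mmu_notifier_event {
		MMU_NOTIFY_UNMAP = 0,		/* munmap() or mremap() */
		MMU_NOTIFY_CLEAR,		/* migration, compaction, reclaim, ... */
		MMU_NOTIFY_PROTECTION_VMA,	/* protections change for the range */
		MMU_NOTIFY_PROTECTION_PAGE,	/* protections change for a page */
		MMU_NOTIFY_SOFT_DIRTY,		/* soft dirtiness tracking */
	};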
[1] https://www.ozlabs.org/~akpm/mmotm/broken-out/mm-mmu_notifier-contextual-information-for-event-triggering-invalidation-v2.patch

Cc: Christian König <christian.koenig@amd.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: kvm@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-rdma@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>

Jérôme Glisse (9):
  mm/mmu_notifier: contextual information for event enums
  mm/mmu_notifier: contextual information for event triggering
    invalidation
  mm/mmu_notifier: use correct mmu_notifier events for each invalidation
  mm/mmu_notifier: pass down vma and reasons why mmu notifier is
    happening
  mm/mmu_notifier: mmu_notifier_range_update_to_read_only() helper
  gpu/drm/radeon: optimize out the case when a range is updated to read
    only
  gpu/drm/amdgpu: optimize out the case when a range is updated to read
    only
  gpu/drm/i915: optimize out the case when a range is updated to read
    only
  RDMA/umem_odp: optimize out the case when a range is updated to read
    only

 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c  | 13 ++++++++
 drivers/gpu/drm/i915/i915_gem_userptr.c | 16 ++++++++++
 drivers/gpu/drm/radeon/radeon_mn.c      | 13 ++++++++
 drivers/infiniband/core/umem_odp.c      | 22 +++++++++++--
 fs/proc/task_mmu.c                      |  3 +-
 include/linux/mmu_notifier.h            | 42 ++++++++++++++++++++++++-
 include/rdma/ib_umem_odp.h              |  1 +
 kernel/events/uprobes.c                 |  3 +-
 mm/huge_memory.c                        | 14 +++++----
 mm/hugetlb.c                            | 11 ++++---
 mm/khugepaged.c                         |  3 +-
 mm/ksm.c                                |  6 ++--
 mm/madvise.c                            |  3 +-
 mm/memory.c                             | 25 +++++++++------
 mm/migrate.c                            |  5 ++-
 mm/mmu_notifier.c                       | 10 ++++++
 mm/mprotect.c                           |  4 ++-
 mm/mremap.c                             |  3 +-
 mm/oom_kill.c                           |  3 +-
 mm/rmap.c                               |  6 ++--
 20 files changed, 171 insertions(+), 35 deletions(-)
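To illustrate the patch 2/patch 3 mechanics, a call-site conversion
would look roughly like the following; the pre-series signature is the
existing mmu_notifier_range_init(), while the converted argument list
(event first, then vma) and the choice of default event are assumptions
about the series, not quotes of it.

	/* Before the series: no context information at the call site. */
	mmu_notifier_range_init(&range, mm, start, end);

	/*
	 * After the mechanical patch 2 conversion: a default event plus
	 * the vma when one is available (MMU_NOTIFY_CLEAR as the default
	 * is an assumption).
	 */
	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, vma, mm, start, end);

	/* After patch 3, a write-protecting mprotect() path instead uses: */
	mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_VMA,
				vma, mm, start, end);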