Message ID: 20190123222315.1122-1-jglisse@redhat.com (mailing list archive)
Series: mmu notifier provide context informations
On Wed, Jan 23, 2019 at 2:23 PM <jglisse@redhat.com> wrote:
>
> From: Jérôme Glisse <jglisse@redhat.com>
>
> [...]
>
> Note that I also intend to use this feature further in nouveau and
> HMM down the road. I also expect that other users like KVM might be
> interested in leveraging this new information to optimize some of
> their secondary page table invalidations.

"Down the road" users should introduce the functionality they want to
consume. The common concern with preemptively including forward-looking
infrastructure is realizing later that the infrastructure is not needed,
or needs changing. If it has no current consumer, leave it out.

> Here is an explanation of the rationale for this patchset:
>
> [...]
>
> - UNMAP: munmap() or mremap()
> - CLEAR: page table is cleared (migration, compaction, reclaim, ...)
> - PROTECTION_VMA: change in access protections for the range
> - PROTECTION_PAGE: change in access protections for a page in the range
> - SOFT_DIRTY: soft dirtiness tracking
>
> Being able to distinguish munmap() and mremap() from other reasons why
> the page table is cleared is important to allow users of mmu notifiers
> to update their own internal tracking structures accordingly (on munmap
> or mremap it is no longer needed to track the range of virtual
> addresses, as it becomes invalid).

The only context information consumed in this patch set is
MMU_NOTIFY_PROTECTION_VMA.

What is the practical benefit of these "optimize out the case when a
range is updated to read only" optimizations? Any numbers to show this
is worth the code thrash?
On Wed, Jan 23, 2019 at 02:54:40PM -0800, Dan Williams wrote:
> On Wed, Jan 23, 2019 at 2:23 PM <jglisse@redhat.com> wrote:
> >
> > [...]
> >
> > Note that I also intend to use this feature further in nouveau and
> > HMM down the road. I also expect that other users like KVM might be
> > interested in leveraging this new information to optimize some of
> > their secondary page table invalidations.
>
> "Down the road" users should introduce the functionality they want to
> consume. The common concern with preemptively including
> forward-looking infrastructure is realizing later that the
> infrastructure is not needed, or needs changing. If it has no current
> consumer, leave it out.

This patchset already shows that this is useful; what more can I do?
I know I will use this information: in nouveau, for memory policy, we
allocate our own structure for every vma the GPU ever accessed or that
userspace hinted we should set a policy for. Right now, with the
existing mmu notifier, I _must_ free those structures because I do not
know whether the invalidation is an munmap or something else. So I am
losing important information and unnecessarily freeing structures that
I will have to re-allocate just a couple of jiffies later. That is one
way I am using this. The other way is to optimize GPU page table
updates, just as I am doing with all the patches to RDMA/ODP and the
various GPU drivers.

> > [...]
> >
> > Being able to distinguish munmap() and mremap() from other reasons
> > why the page table is cleared is important to allow users of mmu
> > notifiers to update their own internal tracking structures
> > accordingly (on munmap or mremap it is no longer needed to track the
> > range of virtual addresses, as it becomes invalid).
>
> The only context information consumed in this patch set is
> MMU_NOTIFY_PROTECTION_VMA.
>
> What is the practical benefit of these "optimize out the case when a
> range is updated to read only" optimizations? Any numbers to show this
> is worth the code thrash?

It depends on the workload. For instance, if you map a file read only
for RDMA export, like a log file, all the writeback-driven write
protection that would otherwise disrupt the RDMA mapping can be
optimized out.

See above for more reasons why it is beneficial (knowing when it is an
munmap/mremap versus something else).

I would not have thought that passing down information would be
something this controversial. Hope this helps you see the benefit of
this.

Cheers,
Jérôme
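To make the read-only optimization concrete, here is a minimal sketch of
the shape the radeon/amdgpu/i915/RDMA patches take, assuming a
hypothetical driver "foo": mmu_notifier_range_update_to_read_only() is
the patch 5 helper, while struct foo_notifier, foo_range_is_read_only()
and foo_invalidate_range() are made-up names for illustration only.

	static int foo_invalidate_range_start(struct mmu_notifier *mn,
				const struct mmu_notifier_range *range)
	{
		struct foo_notifier *foo =
			container_of(mn, struct foo_notifier, mn);

		/*
		 * Patch 5 helper: true when the CPU page table update only
		 * downgrades the range to read only. If the device mapping
		 * is already read only there is nothing to invalidate.
		 */
		if (mmu_notifier_range_update_to_read_only(range) &&
		    foo_range_is_read_only(foo, range->start, range->end))
			return 0;	/* device mapping stays intact */

		foo_invalidate_range(foo, range->start, range->end);
		return 0;
	}

This is what lets the writeback case above be skipped: writeback only
write protects the pages, and a read-only device mapping is unaffected.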
On Wed, Jan 23, 2019 at 3:05 PM Jerome Glisse <jglisse@redhat.com> wrote:
>
> On Wed, Jan 23, 2019 at 02:54:40PM -0800, Dan Williams wrote:
> > [...]
> >
> > "Down the road" users should introduce the functionality they want
> > to consume. [...] If it has no current consumer, leave it out.
>
> This patchset already shows that this is useful; what more can I do?
> I know I will use this information: in nouveau, for memory policy, we
> allocate our own structure for every vma the GPU ever accessed or that
> userspace hinted we should set a policy for. Right now, with the
> existing mmu notifier, I _must_ free those structures because I do not
> know whether the invalidation is an munmap or something else. So I am
> losing important information and unnecessarily freeing structures that
> I will have to re-allocate just a couple of jiffies later. That is one
> way I am using this.

Understood, but that still seems to say stage the core support when the
nouveau enabling is ready.

> The other way is to optimize GPU page table updates, just as I am
> doing with all the patches to RDMA/ODP and the various GPU drivers.

Yes.

> > [...]
> >
> > What is the practical benefit of these "optimize out the case when a
> > range is updated to read only" optimizations? Any numbers to show
> > this is worth the code thrash?
>
> It depends on the workload. For instance, if you map a file read only
> for RDMA export, like a log file, all the writeback-driven write
> protection that would otherwise disrupt the RDMA mapping can be
> optimized out.
>
> See above for more reasons why it is beneficial (knowing when it is an
> munmap/mremap versus something else).
>
> I would not have thought that passing down information would be
> something this controversial. Hope this helps you see the benefit of
> this.

I'm not asserting that it is controversial. I am asserting that
whenever a changelog says "optimize" it should also include concrete
data about the optimization scenario. Maybe the scenarios you have
optimized are clear to the driver owners; they just weren't immediately
clear to me.
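The nouveau policy-tracking case Jerome describes can be sketched the
same way; nouveau_policy_free() and nouveau_policy_unmap() are
hypothetical helper names, and range->event is the field patch 4 adds
to struct mmu_notifier_range.

	static int nouveau_notifier_invalidate_range_start(struct mmu_notifier *mn,
				const struct mmu_notifier_range *range)
	{
		struct nouveau_notifier *nn =
			container_of(mn, struct nouveau_notifier, mn);

		if (range->event == MMU_NOTIFY_UNMAP)
			/*
			 * munmap()/mremap(): the virtual address range is
			 * gone for good, so drop the per-vma policy
			 * structure instead of keeping it around.
			 */
			nouveau_policy_free(nn, range->start, range->end);
		else
			/*
			 * Any other event: the range stays valid, so keep
			 * the structure and only invalidate the device
			 * mapping.
			 */
			nouveau_policy_unmap(nn, range->start, range->end);
		return 0;
	}

Without the event information, the first branch is the only safe one,
which is exactly the unnecessary free/re-allocate cycle described above.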
Andrew, what is your plan for this? I had a discussion with Peter Xu
and Andrea about change_pte() and kvm. Today the change_pte() kvm
optimization is effectively disabled because of the invalidate_range
calls. With a minimal couple-of-lines patch on top of this patchset we
can bring back the kvm change_pte() optimization, and we can also
optimize some other cases, for instance write protecting after fork
(but I am not sure that is something qemu does often, so it might not
help for real kvm workloads).

I will be posting the extra patch as an RFC, but in the meantime I
wanted to know what the status of this was.

Jan, Christian, does your previous ACK still hold for this?

On Wed, Jan 23, 2019 at 05:23:06PM -0500, jglisse@redhat.com wrote:
> From: Jérôme Glisse <jglisse@redhat.com>
>
> Hi Andrew, I see that you still have my event patch in your queue [1].
> This patchset replaces that single patch and is broken down into
> further steps so that it is easier to review and to ascertain that no
> mistakes were made during the mechanical changes.
>
> [...]
On Thu, 31 Jan 2019 11:10:06 -0500 Jerome Glisse <jglisse@redhat.com> wrote:

> Andrew, what is your plan for this? I had a discussion with Peter Xu
> and Andrea about change_pte() and kvm.
>
> [...]
>
> I will be posting the extra patch as an RFC, but in the meantime I
> wanted to know what the status of this was.

The various drm patches appear to be headed for collisions with drm
tree development, so we'll need to figure out how to handle that and in
what order things happen.

It's quite unclear from the v4 patchset's changelogs that this has
anything to do with KVM, and "the change_pte() kvm optimization" hasn't
been described anywhere(?).

So.. I expect the thing to do here is to get everything finished, get
the changelogs completed with this new information, and do a resend.

Can we omit the drm and rdma patches for now? Feed them in via the
subsystem maintainers when the dust has settled?
On Thu, Jan 31, 2019 at 11:55:35AM -0800, Andrew Morton wrote:
> [...]
>
> Can we omit the drm and rdma patches for now? Feed them in via the
> subsystem maintainers when the dust has settled?

Yes, I should have pointed out that you can ignore the driver patches;
I will resubmit them through the appropriate trees once the mm bits are
in. I just wanted to showcase how I intend to use this. Next time I
will try to remember to clearly tag things that are only there to
showcase the feature and that will be merged later through a different
tree.

I will do a v5 with the kvm bits once we have enough testing and
confidence. So I guess this will all be delayed to 5.2, with 5.3 for
the driver bits. The kvm bits are the outcome of private emails and
previous face-to-face discussions around mmu notifiers and kvm. I
believe the context information will turn out to be useful to more
users than the ones I am doing it for.

Cheers,
Jérôme
On 31.01.19 at 17:10, Jerome Glisse wrote:
> Andrew, what is your plan for this?
>
> [...]
>
> Jan, Christian, does your previous ACK still hold for this?

At least the general approach still sounds perfectly sane to me.

Regarding how to merge these patches, I think we should just get the
general infrastructure into Linus' tree; we can then merge the DRM
patches one release later, when we are sure that it doesn't break
anything.

Christian.
On Thu 31-01-19 11:10:06, Jerome Glisse wrote:
> Andrew, what is your plan for this?
>
> [...]
>
> Jan, Christian, does your previous ACK still hold for this?

Yes, I still think the approach makes sense. Dan's concern about
in-tree users is valid, but it seems you have those, just not merged
yet, right?

								Honza
On Fri, Feb 01, 2019 at 10:02:30PM +0100, Jan Kara wrote:
> [...]
>
> Yes, I still think the approach makes sense. Dan's concern about
> in-tree users is valid, but it seems you have those, just not merged
> yet, right?

(Catching up on email.)

This version included some of the first users for this, but I do not
want to merge them through Andrew; they should go through the
individual driver project trees. Also, in the meantime I found a use
for this with kvm, and I expect a few other users of mmu notifiers will
leverage this extra information.

Cheers,
Jérôme
From: Jérôme Glisse <jglisse@redhat.com>

Hi Andrew, I see that you still have my event patch in your queue [1].
This patchset replaces that single patch and is broken down into
further steps so that it is easier to review and to ascertain that no
mistakes were made during the mechanical changes. Here are the steps:

    Patch 1 - add the enum values
    Patch 2 - coccinelle semantic patch to convert all call sites of
              mmu_notifier_range_init to the default enum value and
              also to pass down the vma when it is available
    Patch 3 - update many call sites to more accurate enum values
    Patch 4 - add the information to the mmu_notifier_range struct
    Patch 5 - helper to test if a range is updated to read only

All the remaining patches update various drivers to demonstrate how
this new information gets used by device drivers. I build tested with
make all, and with make all minus everything that enables mmu
notifiers, i.e. building with MMU_NOTIFIER=no. Also tested with some
radeon, amd gpu and intel gpu.

If there are no objections, I believe the best plan would be to merge
the first 5 patches (all mm changes) through your queue for 5.1 and
then to delay the driver updates to each individual driver tree for
5.2. This will allow each individual device driver maintainer time to
test this more thoroughly than my own testing.

Note that I also intend to use this feature further in nouveau and HMM
down the road. I also expect that other users like KVM might be
interested in leveraging this new information to optimize some of
their secondary page table invalidations.

Here is an explanation of the rationale for this patchset:

CPU page table updates can happen for many reasons, not only as a
result of a syscall (munmap(), mprotect(), mremap(), madvise(), ...)
but also as a result of kernel activities (memory compression, reclaim,
migration, ...).

Patch 1 introduces a set of enums that can be associated with each of
the events triggering a mmu notifier. Later patches take advantage of
those enum values:

    - UNMAP: munmap() or mremap()
    - CLEAR: page table is cleared (migration, compaction, reclaim, ...)
    - PROTECTION_VMA: change in access protections for the range
    - PROTECTION_PAGE: change in access protections for a page in the range
    - SOFT_DIRTY: soft dirtiness tracking

Being able to distinguish munmap() and mremap() from other reasons why
the page table is cleared is important to allow users of mmu notifiers
to update their own internal tracking structures accordingly (on munmap
or mremap it is no longer needed to track the range of virtual
addresses, as it becomes invalid).
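For reference, a sketch of the enum that patch 1 adds; the MMU_NOTIFY_
prefix is taken from the MMU_NOTIFY_PROTECTION_VMA name used elsewhere
in this thread, so treat the exact spelling of the other values as an
assumption rather than a quote of the series.

	enum mmu_notifier_event {
		MMU_NOTIFY_UNMAP = 0,		/* munmap() or mremap() */
		MMU_NOTIFY_CLEAR,		/* migration, compaction, reclaim, ... */
		MMU_NOTIFY_PROTECTION_VMA,	/* protections change for the range */
		MMU_NOTIFY_PROTECTION_PAGE,	/* protections change for a page */
		MMU_NOTIFY_SOFT_DIRTY,		/* soft dirtiness tracking */
	};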
[1] https://www.ozlabs.org/~akpm/mmotm/broken-out/mm-mmu_notifier-contextual-information-for-event-triggering-invalidation-v2.patch

Cc: Christian König <christian.koenig@amd.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: kvm@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-rdma@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>

Jérôme Glisse (9):
  mm/mmu_notifier: contextual information for event enums
  mm/mmu_notifier: contextual information for event triggering
    invalidation
  mm/mmu_notifier: use correct mmu_notifier events for each invalidation
  mm/mmu_notifier: pass down vma and reasons why mmu notifier is
    happening
  mm/mmu_notifier: mmu_notifier_range_update_to_read_only() helper
  gpu/drm/radeon: optimize out the case when a range is updated to read
    only
  gpu/drm/amdgpu: optimize out the case when a range is updated to read
    only
  gpu/drm/i915: optimize out the case when a range is updated to read
    only
  RDMA/umem_odp: optimize out the case when a range is updated to read
    only

 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c  | 13 ++++++++
 drivers/gpu/drm/i915/i915_gem_userptr.c | 16 ++++++++++
 drivers/gpu/drm/radeon/radeon_mn.c      | 13 ++++++++
 drivers/infiniband/core/umem_odp.c      | 22 +++++++++++--
 fs/proc/task_mmu.c                      |  3 +-
 include/linux/mmu_notifier.h            | 42 ++++++++++++++++++++++++-
 include/rdma/ib_umem_odp.h              |  1 +
 kernel/events/uprobes.c                 |  3 +-
 mm/huge_memory.c                        | 14 +++++----
 mm/hugetlb.c                            | 11 ++++---
 mm/khugepaged.c                         |  3 +-
 mm/ksm.c                                |  6 ++--
 mm/madvise.c                            |  3 +-
 mm/memory.c                             | 25 +++++++++------
 mm/migrate.c                            |  5 ++-
 mm/mmu_notifier.c                       | 10 ++++++
 mm/mprotect.c                           |  4 ++-
 mm/mremap.c                             |  3 +-
 mm/oom_kill.c                           |  3 +-
 mm/rmap.c                               |  6 ++--
 20 files changed, 171 insertions(+), 35 deletions(-)
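To illustrate the patch 2/patch 3 mechanics, a call-site conversion
would look roughly like the following; the pre-series signature is the
existing mmu_notifier_range_init(), while the converted argument list
(event first, then vma) and the choice of default event are assumptions
about the series, not quotes of it.

	/* Before the series: no context information at the call site. */
	mmu_notifier_range_init(&range, mm, start, end);

	/*
	 * After the mechanical patch 2 conversion: a default event plus
	 * the vma when one is available (MMU_NOTIFY_CLEAR as the default
	 * is an assumption).
	 */
	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, vma, mm, start, end);

	/* After patch 3, a write-protecting mprotect() path instead uses: */
	mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_VMA,
				vma, mm, start, end);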