
[00/10] HMM updates for 5.1

Message ID 20190129165428.3931-1-jglisse@redhat.com (mailing list archive)

Message

Jerome Glisse Jan. 29, 2019, 4:54 p.m. UTC
From: Jérôme Glisse <jglisse@redhat.com>

This patchset improves the HMM driver API and adds support for hugetlbfs
and DAX mirroring. The motivation for the improvements was to make the
ODP-to-HMM conversion easier [1]. Because we have nouveau bits scheduled
for 5.1, and to avoid any multi-tree synchronization, this patchset adds
a few lines of inline functions that wrap the existing HMM driver API to
the improved API. The nouveau driver was tested before and after this
patchset; it builds and works in both cases, so there is no merging
issue [2]. The nouveau bits are queued up for 5.1, which is why I added
those inlines.

If this gets merged in 5.1, the plan is to merge the ODP-to-HMM
conversion in 5.2, or in 5.3 if testing shows any issues (so far no
issues have been found with limited testing, but Mellanox will be
running heavier testing for a longer time).

To avoid spamming mm, I would like to not cc mm on ODP or nouveau
patches; however, if people prefer to see those on the mm mailing list,
then I can keep it cced.

This is also what I intend to use as a base for the AMD and Intel
patches (a v2, with more things, of some RFCs which were already posted
in the past).

[1] https://cgit.freedesktop.org/~glisse/linux/log/?h=odp-hmm
[2] https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-for-5.1

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Dan Williams <dan.j.williams@intel.com>

Jérôme Glisse (10):
  mm/hmm: use reference counting for HMM struct
  mm/hmm: do not erase snapshot when a range is invalidated
  mm/hmm: improve and rename hmm_vma_get_pfns() to hmm_range_snapshot()
  mm/hmm: improve and rename hmm_vma_fault() to hmm_range_fault()
  mm/hmm: improve driver API to work and wait over a range
  mm/hmm: add default fault flags to avoid the need to pre-fill pfns
    arrays.
  mm/hmm: add an helper function that fault pages and map them to a
    device
  mm/hmm: support hugetlbfs (snap shoting, faulting and DMA mapping)
  mm/hmm: allow to mirror vma of a file on a DAX backed filesystem
  mm/hmm: add helpers for driver to safely take the mmap_sem

 include/linux/hmm.h |  290 ++++++++++--
 mm/hmm.c            | 1060 +++++++++++++++++++++++++++++--------------
 2 files changed, 983 insertions(+), 367 deletions(-)

Comments

John Hubbard Feb. 20, 2019, 11:17 p.m. UTC | #1
On 1/29/19 8:54 AM, jglisse@redhat.com wrote:
> From: Jérôme Glisse <jglisse@redhat.com>
> 
> [...]

Hi Jerome,

Although Ralph has been testing and looking at this patchset, I just now
noticed that there hasn't been much public review of it, so I'm doing
a bit of that now. I don't think it's *quite* too late, because we're
still not at the 5.1 merge window...sorry for taking so long to get to
this.

Ralph, you might want to add ACKs or Tested-by's to some of these
patches (or even Reviewed-by, if you went that deep, which I suspect you
did in some cases), according to what you feel comfortable with?


thanks,
Jerome Glisse Feb. 20, 2019, 11:36 p.m. UTC | #2
On Wed, Feb 20, 2019 at 03:17:58PM -0800, John Hubbard wrote:
> On 1/29/19 8:54 AM, jglisse@redhat.com wrote:
> > [...]
> 
> Hi Jerome,
> 
> Although Ralph has been testing and looking at this patchset, I just now
> noticed that there hasn't been much public review of it, so I'm doing
> a bit of that now. I don't think it's *quite* too late, because we're
> still not at the 5.1 merge window...sorry for taking so long to get to
> this.
> 
> Ralph, you might want to add ACKs or Tested-by's to some of these
> patches (or even Reviewed-by, if you went that deep, which I suspect you
> did in some cases), according to what you feel comfortable with?

More eyes are always welcome. I tested with nouveau and with InfiniBand
mlx5. It seemed to work properly in my testing, but I might have missed
something.

Cheers,
Jérôme
Ralph Campbell Feb. 22, 2019, 11:31 p.m. UTC | #3
On 1/29/19 8:54 AM, jglisse@redhat.com wrote:
> From: Jérôme Glisse <jglisse@redhat.com>
> 
> [...]

I have been testing this patch series in addition to [1] with some
success. I wouldn't go as far as saying it is thoroughly tested
but you can add:

Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>


[1] https://marc.info/?l=linux-mm&m=155060669514459&w=2
     ("[PATCH v5 0/9] mmu notifier provide context informations")
Jerome Glisse March 13, 2019, 1:27 a.m. UTC | #4
Andrew, you will not be pushing this patchset in 5.1?

Cheers,
Jérôme
Andrew Morton March 13, 2019, 4:10 p.m. UTC | #5
On Tue, 12 Mar 2019 21:27:06 -0400 Jerome Glisse <jglisse@redhat.com> wrote:

> Andrew you will not be pushing this patchset in 5.1 ?

I'd like to.  It sounds like we're converging on a plan.

It would be good to hear more from the driver developers who will be
consuming these new features - links to patchsets, review feedback,
etc.  Which individuals should we be asking?  Felix, Christian and
Jason, perhaps?
Jason Gunthorpe March 13, 2019, 6:01 p.m. UTC | #6
On Wed, Mar 13, 2019 at 09:10:04AM -0700, Andrew Morton wrote:
> [...]
> It would be good to hear more from the driver developers who will be
> consuming these new features - links to patchsets, review feedback,
> etc.  Which individuals should we be asking?  Felix, Christian and
> Jason, perhaps?

At least the Mellanox driver patch looks like a good improvement:

https://patchwork.kernel.org/patch/10786625/
 5 files changed, 202 insertions(+), 452 deletions(-)

In fact it hollows out the 'umem_odp' driver abstraction we already
had in the RDMA core.

So, I fully expect to see this API used in mlx5 and RDMA-core after it
is merged.

We've done some testing now, and there are still some outstanding
questions on the driver parts, but I haven't seen anything
fundamentally wrong with HMM mirror come up.

Jason
Jerome Glisse March 13, 2019, 6:33 p.m. UTC | #7
On Wed, Mar 13, 2019 at 09:10:04AM -0700, Andrew Morton wrote:
> [...]
> It would be good to hear more from the driver developers who will be
> consuming these new features - links to patchsets, review feedback,
> etc.  Which individuals should we be asking?  Felix, Christian and
> Jason, perhaps?
> 

Adding Ben as nouveau maintainer. Note that this patchset only adds
2 new functions; the rest is just refactoring to allow RDMA ODP. The
2 new functions will both be used by ODP and nouveau. Ben, this is
the dma map function we discussed previously. If they get into 5.1,
then I will push their users in nouveau, at least in 5.2. I will soon
repost the ODP v2 patchset on top of the RDMA tree, so if this does
not get into 5.1, then ODP will have to be pushed to 5.3, or to
whenever this gets upstream.

Cheers,
Jérôme
Felix Kuehling March 18, 2019, 5 p.m. UTC | #8
For amdgpu I looked over the changes and they look reasonable to me. 
Philip Yang (CCed) already rebased amdgpu on top of Jerome's patches and 
is looking forward to using the new helpers and simplifying our driver code.

Feel free to add my Acked-by to the patches.

Regards,
   Felix

On 3/13/2019 12:10 PM, Andrew Morton wrote:
> [...]
Jerome Glisse March 18, 2019, 5:04 p.m. UTC | #9
On Wed, Mar 13, 2019 at 09:10:04AM -0700, Andrew Morton wrote:
> [...]

So I am guessing you will not send this to Linus? Should I repost?
This patchset has 2 sides. The first side is just reworking the HMM API
to make something better with respect to process lifetime. AMD folks
did find that helpful [1]. This rework is also necessary to ease the
conversion of ODP to HMM [2], and Jason already said that he is
interested in seeing that happen [3]. By missing 5.1, it means that I
now cannot push ODP to HMM in 5.2 and it will be postponed to 5.3,
which also postpones other work...

The second side is that it adds 2 new helpers, dma map and dma unmap;
both are going to be used by ODP, and later by nouveau (after some
other nouveau changes are done). The new functions just do dma_map, ie:
    hmm_dma_map() {
        existing_hmm_api()
        for_each_page() {
            dma_map_page()
        }
    }

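As an illustration only, the shape Jérôme sketches above can be mocked in plain userspace C. All names below (`mock_hmm_range_fault`, `mock_dma_map_page`, `mock_hmm_range_dma_map`) are invented stand-ins, not the kernel's actual signatures; the point is merely that the helper is the composition of the existing range API with a per-page mapping loop:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins only -- NOT the kernel's real APIs. */
#define NPAGES 4

/* Mock of the existing HMM mirror API: fills an array of "pages". */
static int mock_hmm_range_fault(uintptr_t *pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pages[i] = 0x1000 * (i + 1);    /* fake page frames */
    return 0;
}

/* Mock of dma_map_page(): derives a fake device address. */
static uint64_t mock_dma_map_page(uintptr_t page)
{
    return (uint64_t)page + 0x80000000ull;
}

/* The helper is just the composition of the two steps above:
 * fault/snapshot the range, then map every page for the device. */
static int mock_hmm_range_dma_map(uint64_t *daddrs, size_t n)
{
    uintptr_t pages[NPAGES];
    int ret = mock_hmm_range_fault(pages, n);

    if (ret)
        return ret;
    for (size_t i = 0; i < n; i++)
        daddrs[i] = mock_dma_map_page(pages[i]);
    return 0;
}

/* Tiny demo: returns 0 when the composition produced device addresses. */
int demo(void)
{
    uint64_t daddrs[NPAGES];

    if (mock_hmm_range_dma_map(daddrs, NPAGES))
        return -1;
    return daddrs[0] == 0x80001000ull ? 0 : -1;
}
```

In the kernel the two steps would of course be the real HMM range call and dma_map_page() with proper dma_mapping_error() checks; the mock only shows why the wrapper is thin.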
Do you want to see any more justification than that?

[1] https://www.spinics.net/lists/amd-gfx/msg31048.html
[2] https://patchwork.kernel.org/patch/10786625/
[3] https://lkml.org/lkml/2019/3/13/591

Cheers,
Jérôme
Dan Williams March 18, 2019, 6:30 p.m. UTC | #10
On Mon, Mar 18, 2019 at 10:04 AM Jerome Glisse <jglisse@redhat.com> wrote:
>
> > [...]
>
> So i am guessing you will not send this to Linus ? Should i repost ?
> This patchset has 2 sides, first side is just reworking the HMM API
> to make something better in respect to process lifetime. AMD folks
> did find that helpful [1]. This rework is also necessary to ease up
> the convertion of ODP to HMM [2] and Jason already said that he is
> interested in seing that happening [3]. By missing 5.1 it means now
> that i can not push ODP to HMM in 5.2 and it will be postpone to 5.3
> which is also postoning other work ...
>
> The second side is it adds 2 new helper dma map and dma unmap both
> are gonna be use by ODP and latter by nouveau (after some other
> nouveau changes are done). This new functions just do dma_map ie:
>     hmm_dma_map() {
>         existing_hmm_api()
>         for_each_page() {
>             dma_map_page()
>         }
>     }
>
> Do you want to see anymore justification than that ?

Yes, why does hmm need its own dma mapping APIs? It seems to
perpetuate the perception that hmm is something bolted onto the side
of the core-mm rather than a native capability.
Jerome Glisse March 18, 2019, 6:54 p.m. UTC | #11
On Mon, Mar 18, 2019 at 11:30:15AM -0700, Dan Williams wrote:
> > [...]
> > Do you want to see anymore justification than that ?
> 
> Yes, why does hmm needs its own dma mapping apis? It seems to
> perpetuate the perception that hmm is something bolted onto the side
> of the core-mm rather than a native capability.

Seriously?

The kernel is full of examples where common code patterns that are not
device specific are turned into helpers, and that is exactly what this
is: a common pattern that every device driver will need, turned into
a common helper.

Moreover, this allows the same error handling to be shared across
drivers when mapping one page fails, so it avoids the need to
duplicate the same boilerplate code across different drivers.

Is code factorization not a good thing? Should I duplicate everything
in every single driver?


If that's not enough, this will also allow handling peer to peer; I
posted patches for that [1], and again this is to avoid duplicating
common code across different drivers.


It does feel like you oppose everything with HMM in its name just
because you do not like it. It is your prerogative to not like
something, but you should propose something that achieves the same
result instead of constantly questioning every single comma.

Cheers,
Jérôme

[1] https://lwn.net/ml/linux-kernel/20190129174728.6430-1-jglisse@redhat.com/
Dan Williams March 18, 2019, 7:18 p.m. UTC | #12
On Mon, Mar 18, 2019 at 11:55 AM Jerome Glisse <jglisse@redhat.com> wrote:
> > > [...]
> > > Do you want to see anymore justification than that ?
> >
> > Yes, why does hmm needs its own dma mapping apis? It seems to
> > perpetuate the perception that hmm is something bolted onto the side
> > of the core-mm rather than a native capability.
>
> Seriously ?

Yes.

> Kernel is fill with example where common code pattern that are not
> device specific are turn into helpers and here this is exactly what
> it is. A common pattern that all device driver will do which is turn
> into a common helper.

Yes, but we also try not to introduce thin wrappers around existing
apis. If the current dma api does not understand some hmm constraint
I'm questioning why not teach the dma api that constraint and make it
a native capability rather than asking the driver developer to
understand the rules about when to use dma_map_page() vs
hmm_dma_map().

For example I don't think we want to end up with more headers like
include/linux/pci-dma-compat.h.

> Moreover this allow to share the same error code handling accross
> driver when mapping one page fails. So this avoid the needs to
> duplicate same boiler plate code accross different drivers.
>
> Is code factorization not a good thing ? Should i duplicate every-
> thing in every single driver ?

I did not ask for duplication, I asked why is it not more deeply integrated.

> If that's not enough, this will also allow to handle peer to peer
> and i posted patches for that [1] and again this is to avoid
> duplicating common code accross different drivers.

I went looking for the hmm_dma_map() patches on the list but could not
find them, so I was reacting to the "This new functions just do
dma_map", and wondered if that was the full extent of the
justification.

> It does feel that you oppose everything with HMM in its name just
> because you do not like it. It is your prerogative to not like some-
> thing but you should propose something that achieve the same result
> instead of constantly questioning every single comma.

I respect what you're trying to do, if I didn't I wouldn't bother
responding. Please don't put words in my mouth. I think it was
Churchill who said "if two people agree all the time, one of them is
redundant". You're raising questions with HMM that identify real gaps
in Linux memory management relative to new hardware capabilities, I
also think it is reasonable to question how the gaps are filled.
Jerome Glisse March 18, 2019, 7:28 p.m. UTC | #13
On Mon, Mar 18, 2019 at 12:18:38PM -0700, Dan Williams wrote:
> > > > [...]
> > > > Do you want to see anymore justification than that ?
> > >
> > > Yes, why does hmm needs its own dma mapping apis? It seems to
> > > perpetuate the perception that hmm is something bolted onto the side
> > > of the core-mm rather than a native capability.
> >
> > Seriously ?
> 
> Yes.
> 
> > Kernel is fill with example where common code pattern that are not
> > device specific are turn into helpers and here this is exactly what
> > it is. A common pattern that all device driver will do which is turn
> > into a common helper.
> 
> Yes, but we also try not to introduce thin wrappers around existing
> apis. If the current dma api does not understand some hmm constraint
> I'm questioning why not teach the dma api that constraint and make it
> a native capability rather than asking the driver developer to
> understand the rules about when to use dma_map_page() vs
> hmm_dma_map().

There is nothing special here: existing_hmm_api() returns an array of
pages, and the new helper just calls dma_map_page() for each entry in
that array. If a mapping fails, it undoes everything, so the error
handling is shared.

So I am not playing tricks with the DMA API; I am just providing a
helper for a common pattern. Maybe the name confuses you, but the
pseudocode should be self-explanatory:
    Before
        mydriver_mirror_range() {
            err = existing_hmm_mirror_api(pages)
            if (err) {...}
            for_each_page(pages) {
                pas[i]= dma_map_page()
                if (dma_error(pas[i])) { ... }
            }
            // use pas[]
        }

    After
        mydriver_mirror_range() {
            err = hmm_range_dma_map(pas)
            if (err) { ... }
            // use pas[]
        }

So there is no rule about using one or the other. In the end it is the
same code, but instead of duplicating it in multiple drivers, it is
shared.
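The "undoes everything" behavior described above can likewise be sketched as a userspace mock (again, all names here are invented for illustration, not the kernel's signatures): when mapping page i fails, the helper unmaps pages 0..i-1 itself before returning, so every driver inherits the same unwind path instead of open-coding it.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative mock only -- NOT the kernel's real APIs. */
#define NPAGES 4
#define DMA_MAPPING_ERROR_MOCK ((uint64_t)-1)

static int unmap_calls;             /* counts unwind work for the demo */

/* Mock of the existing HMM API: fills an array of "page frames". */
static int mock_existing_hmm_api(uintptr_t *pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pages[i] = 0x1000 * (i + 1);
    return 0;
}

/* Mock dma_map_page() that can be forced to fail at a given index. */
static uint64_t mock_dma_map_page(uintptr_t page, size_t i, int fail_at)
{
    return ((int)i == fail_at) ? DMA_MAPPING_ERROR_MOCK
                               : (uint64_t)page + 0x80000000ull;
}

static void mock_dma_unmap_page(uint64_t daddr)
{
    (void)daddr;
    unmap_calls++;
}

/* hmm_range_dma_map()-like helper: on a partial failure it undoes
 * every mapping it already made, so callers share one error path. */
int mock_hmm_range_dma_map(uint64_t *daddrs, size_t n, int fail_at)
{
    uintptr_t pages[NPAGES];
    int ret = mock_existing_hmm_api(pages, n);

    if (ret)
        return ret;
    for (size_t i = 0; i < n; i++) {
        daddrs[i] = mock_dma_map_page(pages[i], i, fail_at);
        if (daddrs[i] == DMA_MAPPING_ERROR_MOCK) {
            while (i--)             /* shared unwind of prior maps */
                mock_dma_unmap_page(daddrs[i]);
            return -1;
        }
    }
    return 0;
}

/* Demo: success maps all pages; failing at index 2 must unmap the
 * 2 pages mapped before it. Returns 0 when both behaviors hold. */
int demo_unwind(void)
{
    uint64_t daddrs[NPAGES];

    unmap_calls = 0;
    if (mock_hmm_range_dma_map(daddrs, NPAGES, -1) != 0 || unmap_calls != 0)
        return -1;
    if (mock_hmm_range_dma_map(daddrs, NPAGES, 2) != -1)
        return -1;
    return unmap_calls == 2 ? 0 : -1;
}
```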

> 
> For example I don't think we want to end up with more headers like
> include/linux/pci-dma-compat.h.
> 
> > Moreover this allow to share the same error code handling accross
> > driver when mapping one page fails. So this avoid the needs to
> > duplicate same boiler plate code accross different drivers.
> >
> > Is code factorization not a good thing ? Should i duplicate every-
> > thing in every single driver ?
> 
> I did not ask for duplication, I asked why is it not more deeply integrated.

Because it is a common code pattern for HMM users, not for DMA users.

> > If that's not enough, this will also allow to handle peer to peer
> > and i posted patches for that [1] and again this is to avoid
> > duplicating common code accross different drivers.
> 
> I went looking for the hmm_dma_map() patches on the list but could not
> find them, so I was reacting to the "This new functions just do
> dma_map", and wondered if that was the full extent of the
> justification.

They are here [1], patch 7 in this patch series.

Cheers,
Jérôme

[1] https://lkml.org/lkml/2019/1/29/1016
Dan Williams March 18, 2019, 7:36 p.m. UTC | #14
On Mon, Mar 18, 2019 at 12:29 PM Jerome Glisse <jglisse@redhat.com> wrote:
[..]
> > I went looking for the hmm_dma_map() patches on the list but could not
> > find them, so I was reacting to the "This new functions just do
> > dma_map", and wondered if that was the full extent of the
> > justification.
>
> They are here [1] patch 7 in this patch serie

Ah, I was missing the _range in my search. I'll take my comments over
to that patch. Thanks.

> [1] https://lkml.org/lkml/2019/1/29/1016
Ira Weiny March 19, 2019, 2:18 p.m. UTC | #15
On Tue, Mar 19, 2019 at 12:13:40PM -0700, Dan Williams wrote:
> On Tue, Mar 19, 2019 at 12:05 PM Jerome Glisse <jglisse@redhat.com> wrote:
> >
> > On Tue, Mar 19, 2019 at 11:42:00AM -0700, Dan Williams wrote:
> > > On Tue, Mar 19, 2019 at 10:45 AM Jerome Glisse <jglisse@redhat.com> wrote:
> > > >
> > > > On Tue, Mar 19, 2019 at 10:33:57AM -0700, Dan Williams wrote:
> > > > > On Tue, Mar 19, 2019 at 10:19 AM Jerome Glisse <jglisse@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, Mar 19, 2019 at 10:12:49AM -0700, Andrew Morton wrote:
> > > > > > > On Tue, 19 Mar 2019 12:58:02 -0400 Jerome Glisse <jglisse@redhat.com> wrote:

[snip]

> >
> > Right now i am trying to unify driver for device that have can support
> > the mmu notifier approach through HMM. Unify to a superset of driver
> > that can not abide by mmu notifier is on my todo list like i said but
> > it comes after. I do not want to make the big jump in just one go. So
> > i doing thing under HMM and thus in HMM namespace, but once i tackle
> > the larger set i will move to generic namespace what make sense.
> >
> > This exact approach did happen several time already in the kernel. In
> > the GPU sub-system we did it several time. First do something for couple
> > devices that are very similar then grow to a bigger set of devices and
> > generalise along the way.
> >
> > So i do not see what is the problem of me repeating that same pattern
> > here again. Do something for a smaller set before tackling it on for
> > a bigger set.
> 
> All of that is fine, but when I asked about the ultimate trajectory
> that replaces hmm_range_dma_map() with an updated / HMM-aware GUP
> implementation, the response was that hmm_range_dma_map() is here to
> stay. The issue is not with forking off a small side effort, it's the
> plan to absorb that capability into a common implementation across
> non-HMM drivers where possible.

Just to get on the record in this thread.

+1

I think having an interface which handles the MMU notifier stuff for drivers is
awesome but we need to agree that the trajectory is to help more drivers if
possible.

Ira
Andrew Morton March 19, 2019, 4:40 p.m. UTC | #16
On Mon, 18 Mar 2019 13:04:04 -0400 Jerome Glisse <jglisse@redhat.com> wrote:

> > [...]
> 
> So i am guessing you will not send this to Linus ?

I was waiting to see how the discussion proceeds.  Was also expecting
various changelog updates (at least) - more acks from driver
developers, additional pointers to client driver patchsets, description
of their readiness, etc.

Today I discover that Alex has cherrypicked "mm/hmm: use reference
counting for HMM struct" into a tree which is fed into linux-next which
rather messes things up from my end and makes it hard to feed a
(possibly modified version of) that into Linus.

So I think I'll throw up my hands, drop them all and shall await
developments :(
Jerome Glisse March 19, 2019, 4:58 p.m. UTC | #17
On Tue, Mar 19, 2019 at 09:40:07AM -0700, Andrew Morton wrote:
> On Mon, 18 Mar 2019 13:04:04 -0400 Jerome Glisse <jglisse@redhat.com> wrote:
> 
> > On Wed, Mar 13, 2019 at 09:10:04AM -0700, Andrew Morton wrote:
> > > On Tue, 12 Mar 2019 21:27:06 -0400 Jerome Glisse <jglisse@redhat.com> wrote:
> > > 
> > > > Andrew you will not be pushing this patchset in 5.1 ?
> > > 
> > > I'd like to.  It sounds like we're converging on a plan.
> > > 
> > > It would be good to hear more from the driver developers who will be
> > > consuming these new features - links to patchsets, review feedback,
> > > etc.  Which individuals should we be asking?  Felix, Christian and
> > > Jason, perhaps?
> > > 
> > 
> > So i am guessing you will not send this to Linus ?
> 
> I was waiting to see how the discussion proceeds.  Was also expecting
> various changelog updates (at least) - more acks from driver
> developers, additional pointers to client driver patchsets, description
> of their readiness, etc.

nouveau will benefit from this patchset and is already upstream in 5.1
so i am not sure what kind of pointer i can give for that, it is already
there. amdgpu will also benefit from it and is queued up AFAICT. ODP RDMA
is the third driver and i gave a link to the patch that also uses the 2
new functions that this patchset introduces. Do you want more ?

I guess i will repost with the updated acks, as Felix, Jason and a few
others told me they were fine with it.

> 
> Today I discover that Alex has cherrypicked "mm/hmm: use reference
> counting for HMM struct" into a tree which is fed into linux-next which
> rather messes things up from my end and makes it hard to feed a
> (possibly modified version of) that into Linus.

:( i did not know the tree they pulled that into was fed into next. I
will discourage them from doing so going forward.

> So I think I'll throw up my hands, drop them all and shall await
> developments :(

What more do you want to see? I can repost with the acks already given
and improved commit wording on some of the patches. But from a user point
of view nouveau is already upstream, ODP RDMA depends on this patchset
and is posted and i have given a link to it. amdgpu is queued up. What
more do i need?

Cheers,
Jérôme
Andrew Morton March 19, 2019, 5:12 p.m. UTC | #18
On Tue, 19 Mar 2019 12:58:02 -0400 Jerome Glisse <jglisse@redhat.com> wrote:

> > So I think I'll throw up my hands, drop them all and shall await
> > developments :(
> 
> What more do you want to see ? I can repost with the ack already given
> and the improve commit wording on some of the patch. But from user point
> of view nouveau is already upstream, ODP RDMA depends on this patchset
> and is posted and i have given link to it. amdgpu is queue up. What more
> do i need ?

I guess I can ignore linux-next for a few days.  

Yes, a resend against mainline with those various updates will be
helpful.  Please go through the various fixes which we had as well:


mm-hmm-use-reference-counting-for-hmm-struct.patch
mm-hmm-do-not-erase-snapshot-when-a-range-is-invalidated.patch
mm-hmm-improve-and-rename-hmm_vma_get_pfns-to-hmm_range_snapshot.patch
mm-hmm-improve-and-rename-hmm_vma_fault-to-hmm_range_fault.patch
mm-hmm-improve-driver-api-to-work-and-wait-over-a-range.patch
mm-hmm-improve-driver-api-to-work-and-wait-over-a-range-fix.patch
mm-hmm-improve-driver-api-to-work-and-wait-over-a-range-fix-fix.patch
mm-hmm-add-default-fault-flags-to-avoid-the-need-to-pre-fill-pfns-arrays.patch
mm-hmm-add-an-helper-function-that-fault-pages-and-map-them-to-a-device.patch
mm-hmm-support-hugetlbfs-snap-shoting-faulting-and-dma-mapping.patch
mm-hmm-allow-to-mirror-vma-of-a-file-on-a-dax-backed-filesystem.patch
mm-hmm-allow-to-mirror-vma-of-a-file-on-a-dax-backed-filesystem-fix.patch
mm-hmm-allow-to-mirror-vma-of-a-file-on-a-dax-backed-filesystem-fix-2.patch
mm-hmm-add-helpers-for-driver-to-safely-take-the-mmap_sem.patch

Also, the discussion regarding [07/10] is substantial and is ongoing, so
please let's push along with that.

What is the review/discussion status of "[PATCH 09/10] mm/hmm: allow to
mirror vma of a file on a DAX backed filesystem"?
Jerome Glisse March 19, 2019, 5:18 p.m. UTC | #19
On Tue, Mar 19, 2019 at 10:12:49AM -0700, Andrew Morton wrote:
> On Tue, 19 Mar 2019 12:58:02 -0400 Jerome Glisse <jglisse@redhat.com> wrote:
> 
> > > So I think I'll throw up my hands, drop them all and shall await
> > > developments :(
> > 
> > What more do you want to see ? I can repost with the ack already given
> > and the improve commit wording on some of the patch. But from user point
> > of view nouveau is already upstream, ODP RDMA depends on this patchset
> > and is posted and i have given link to it. amdgpu is queue up. What more
> > do i need ?
> 
> I guess I can ignore linux-next for a few days.  
> 
> Yes, a resend against mainline with those various updates will be
> helpful.  Please go through the various fixes which we had as well:

Yes i will not forget them and i will try to get more config builds
done to be sure there is no issue. I need to register a tree with the
rand-config builder but i lack a place where i can host an https tree
(i believe this is a requirement).

> 
> mm-hmm-use-reference-counting-for-hmm-struct.patch
> mm-hmm-do-not-erase-snapshot-when-a-range-is-invalidated.patch
> mm-hmm-improve-and-rename-hmm_vma_get_pfns-to-hmm_range_snapshot.patch
> mm-hmm-improve-and-rename-hmm_vma_fault-to-hmm_range_fault.patch
> mm-hmm-improve-driver-api-to-work-and-wait-over-a-range.patch
> mm-hmm-improve-driver-api-to-work-and-wait-over-a-range-fix.patch
> mm-hmm-improve-driver-api-to-work-and-wait-over-a-range-fix-fix.patch
> mm-hmm-add-default-fault-flags-to-avoid-the-need-to-pre-fill-pfns-arrays.patch
> mm-hmm-add-an-helper-function-that-fault-pages-and-map-them-to-a-device.patch
> mm-hmm-support-hugetlbfs-snap-shoting-faulting-and-dma-mapping.patch
> mm-hmm-allow-to-mirror-vma-of-a-file-on-a-dax-backed-filesystem.patch
> mm-hmm-allow-to-mirror-vma-of-a-file-on-a-dax-backed-filesystem-fix.patch
> mm-hmm-allow-to-mirror-vma-of-a-file-on-a-dax-backed-filesystem-fix-2.patch
> mm-hmm-add-helpers-for-driver-to-safely-take-the-mmap_sem.patch
> 
> Also, the discussion regarding [07/10] is substantial and is ongoing so
> please let's push along wth that.

I can move it as the last patch in the series but it is needed for the
ODP RDMA conversion too. Otherwise i will just move that code into the
ODP RDMA code and will have to move it again into HMM code once i am
done with the nouveau changes, and in the meantime i expect other
drivers will want to use these 2 helpers too.

> 
> What is the review/discussion status of "[PATCH 09/10] mm/hmm: allow to
> mirror vma of a file on a DAX backed filesystem"?

I explained that this is needed for the ODP RDMA conversion, as ODP RDMA
does support DAX today and thus i can not push that conversion without
that support, as otherwise i would regress ODP RDMA.

Also this is to be used by nouveau, which is upstream, and there is no
reason not to support vmas that happen to be mmaps of a file on a
filesystem that is using a DAX block device.

I do not think Dan had any comments code-wise; i think he was complaining
about the wording of the commit not being clear and i proposed an updated
wording that he seemed to like.

Cheers,
Jérôme
Dan Williams March 19, 2019, 5:33 p.m. UTC | #20
On Tue, Mar 19, 2019 at 10:19 AM Jerome Glisse <jglisse@redhat.com> wrote:
>
> On Tue, Mar 19, 2019 at 10:12:49AM -0700, Andrew Morton wrote:
> > On Tue, 19 Mar 2019 12:58:02 -0400 Jerome Glisse <jglisse@redhat.com> wrote:
[..]
> > Also, the discussion regarding [07/10] is substantial and is ongoing so
> > please let's push along wth that.
>
> I can move it as last patch in the serie but it is needed for ODP RDMA
> convertion too. Otherwise i will just move that code into the ODP RDMA
> code and will have to move it again into HMM code once i am done with
> the nouveau changes and in the meantime i expect other driver will want
> to use this 2 helpers too.

I still hold out hope that we can find a way to have productive
discussions about the implementation of this infrastructure.
Threatening to move the code elsewhere to bypass the feedback is not
productive.

>
> >
> > What is the review/discussion status of "[PATCH 09/10] mm/hmm: allow to
> > mirror vma of a file on a DAX backed filesystem"?
>
> I explained that this is needed for the ODP RDMA convertion as ODP RDMA
> does supported DAX today and thus i can not push that convertion without
> that support as otherwise i would regress RDMA ODP.
>
> Also this is to be use by nouveau which is upstream and there is no
> reasons to not support vma that happens to be mmap of a file on a file-
> system that is using a DAX block device.
>
> I do not think Dan had any comment code wise, i think he was complaining
> about the wording of the commit not being clear and i proposed an updated
> wording that he seemed to like.

Yes, please resend with the updated changelog and I'll ack.
Jerome Glisse March 19, 2019, 5:45 p.m. UTC | #21
On Tue, Mar 19, 2019 at 10:33:57AM -0700, Dan Williams wrote:
> On Tue, Mar 19, 2019 at 10:19 AM Jerome Glisse <jglisse@redhat.com> wrote:
> >
> > On Tue, Mar 19, 2019 at 10:12:49AM -0700, Andrew Morton wrote:
> > > On Tue, 19 Mar 2019 12:58:02 -0400 Jerome Glisse <jglisse@redhat.com> wrote:
> [..]
> > > Also, the discussion regarding [07/10] is substantial and is ongoing so
> > > please let's push along wth that.
> >
> > I can move it as last patch in the serie but it is needed for ODP RDMA
> > convertion too. Otherwise i will just move that code into the ODP RDMA
> > code and will have to move it again into HMM code once i am done with
> > the nouveau changes and in the meantime i expect other driver will want
> > to use this 2 helpers too.
> 
> I still hold out hope that we can find a way to have productive
> discussions about the implementation of this infrastructure.
> Threatening to move the code elsewhere to bypass the feedback is not
> productive.

I am not threatening anything; that code is in ODP _today_. With that
patchset i was factoring it out so that i could also use it in nouveau.
nouveau is built in such a way that right now i can not use it directly.
But i wanted to factor it out now in the hope that i can get the nouveau
changes in 5.2 and then convert nouveau in 5.3.

So when i said that code will be in ODP it just means that instead of
removing it from ODP i will keep it there and it will just delay more
code sharing for everyone.


> 
> >
> > >
> > > What is the review/discussion status of "[PATCH 09/10] mm/hmm: allow to
> > > mirror vma of a file on a DAX backed filesystem"?
> >
> > I explained that this is needed for the ODP RDMA convertion as ODP RDMA
> > does supported DAX today and thus i can not push that convertion without
> > that support as otherwise i would regress RDMA ODP.
> >
> > Also this is to be use by nouveau which is upstream and there is no
> > reasons to not support vma that happens to be mmap of a file on a file-
> > system that is using a DAX block device.
> >
> > I do not think Dan had any comment code wise, i think he was complaining
> > about the wording of the commit not being clear and i proposed an updated
> > wording that he seemed to like.
> 
> Yes, please resend with the updated changelog and I'll ack.
Dan Williams March 19, 2019, 6:42 p.m. UTC | #22
On Tue, Mar 19, 2019 at 10:45 AM Jerome Glisse <jglisse@redhat.com> wrote:
>
> On Tue, Mar 19, 2019 at 10:33:57AM -0700, Dan Williams wrote:
> > On Tue, Mar 19, 2019 at 10:19 AM Jerome Glisse <jglisse@redhat.com> wrote:
> > >
> > > On Tue, Mar 19, 2019 at 10:12:49AM -0700, Andrew Morton wrote:
> > > > On Tue, 19 Mar 2019 12:58:02 -0400 Jerome Glisse <jglisse@redhat.com> wrote:
> > [..]
> > > > Also, the discussion regarding [07/10] is substantial and is ongoing so
> > > > please let's push along wth that.
> > >
> > > I can move it as last patch in the serie but it is needed for ODP RDMA
> > > convertion too. Otherwise i will just move that code into the ODP RDMA
> > > code and will have to move it again into HMM code once i am done with
> > > the nouveau changes and in the meantime i expect other driver will want
> > > to use this 2 helpers too.
> >
> > I still hold out hope that we can find a way to have productive
> > discussions about the implementation of this infrastructure.
> > Threatening to move the code elsewhere to bypass the feedback is not
> > productive.
>
> I am not threatening anything that code is in ODP _today_ with that
> patchset i was factering it out so that i could also use it in nouveau.
> nouveau is built in such way that right now i can not use it directly.
> But i wanted to factor out now in hope that i can get the nouveau
> changes in 5.2 and then convert nouveau in 5.3.
>
> So when i said that code will be in ODP it just means that instead of
> removing it from ODP i will keep it there and it will just delay more
> code sharing for everyone.

The point I'm trying to make is that the code sharing for everyone is
moving the implementation closer to canonical kernel code and using
existing infrastructure. For example, I look at 'struct hmm_range' and
see nothing hmm specific in it. I think we can make that generic and
not build up more apis and data structures in the "hmm" namespace.
Alex Deucher March 19, 2019, 6:51 p.m. UTC | #23
> -----Original Message-----
> From: Jerome Glisse <jglisse@redhat.com>
> Sent: Tuesday, March 19, 2019 12:58 PM
> To: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-mm@kvack.org; linux-kernel@vger.kernel.org; Kuehling, Felix
> <Felix.Kuehling@amd.com>; Koenig, Christian
> <Christian.Koenig@amd.com>; Ralph Campbell <rcampbell@nvidia.com>;
> John Hubbard <jhubbard@nvidia.com>; Jason Gunthorpe
> <jgg@mellanox.com>; Dan Williams <dan.j.williams@intel.com>; Deucher,
> Alexander <Alexander.Deucher@amd.com>
> Subject: Re: [PATCH 00/10] HMM updates for 5.1
> 
> On Tue, Mar 19, 2019 at 09:40:07AM -0700, Andrew Morton wrote:
> > On Mon, 18 Mar 2019 13:04:04 -0400 Jerome Glisse <jglisse@redhat.com>
> wrote:
> >
> > > On Wed, Mar 13, 2019 at 09:10:04AM -0700, Andrew Morton wrote:
> > > > On Tue, 12 Mar 2019 21:27:06 -0400 Jerome Glisse
> <jglisse@redhat.com> wrote:
> > > >
> > > > > Andrew you will not be pushing this patchset in 5.1 ?
> > > >
> > > > I'd like to.  It sounds like we're converging on a plan.
> > > >
> > > > It would be good to hear more from the driver developers who will
> > > > be consuming these new features - links to patchsets, review
> > > > feedback, etc.  Which individuals should we be asking?  Felix,
> > > > Christian and Jason, perhaps?
> > > >
> > >
> > > So i am guessing you will not send this to Linus ?
> >
> > I was waiting to see how the discussion proceeds.  Was also expecting
> > various changelog updates (at least) - more acks from driver
> > developers, additional pointers to client driver patchsets,
> > description of their readiness, etc.
> 
> nouveau will benefit from this patchset and is already upstream in 5.1 so i am
> not sure what kind of pointer i can give for that, it is already there. amdgpu
> will also benefit from it and is queue up AFAICT. ODP RDMA is the third driver
> and i gave link to the patch that also use the 2 new functions that this
> patchset introduce. Do you want more ?
> 
> I guess i will repost with updated ack as Felix, Jason and few others told me
> they were fine with it.
> 
> >
> > Today I discover that Alex has cherrypicked "mm/hmm: use reference
> > counting for HMM struct" into a tree which is fed into linux-next
> > which rather messes things up from my end and makes it hard to feed a
> > (possibly modified version of) that into Linus.
> 
> :( i did not know the tree they pull that in was fed into next. I will discourage
> them from doing so going forward.
> 

I can drop it.  I included it because it fixes an issue with HMM as used by amdgpu in our current -next tree.  So users testing my drm-next branch will run into the issue without it.  I don't plan to include it in the actual -next PR.  What is the recommended way to deal with this?

Alex

> > So I think I'll throw up my hands, drop them all and shall await
> > developments :(
> 
> What more do you want to see ? I can repost with the ack already given and
> the improve commit wording on some of the patch. But from user point of
> view nouveau is already upstream, ODP RDMA depends on this patchset and
> is posted and i have given link to it. amdgpu is queue up. What more do i
> need ?
> 
> Cheers,
> Jérôme
Jerome Glisse March 19, 2019, 7:05 p.m. UTC | #24
On Tue, Mar 19, 2019 at 11:42:00AM -0700, Dan Williams wrote:
> On Tue, Mar 19, 2019 at 10:45 AM Jerome Glisse <jglisse@redhat.com> wrote:
> >
> > On Tue, Mar 19, 2019 at 10:33:57AM -0700, Dan Williams wrote:
> > > On Tue, Mar 19, 2019 at 10:19 AM Jerome Glisse <jglisse@redhat.com> wrote:
> > > >
> > > > On Tue, Mar 19, 2019 at 10:12:49AM -0700, Andrew Morton wrote:
> > > > > On Tue, 19 Mar 2019 12:58:02 -0400 Jerome Glisse <jglisse@redhat.com> wrote:
> > > [..]
> > > > > Also, the discussion regarding [07/10] is substantial and is ongoing so
> > > > > please let's push along wth that.
> > > >
> > > > I can move it as last patch in the serie but it is needed for ODP RDMA
> > > > convertion too. Otherwise i will just move that code into the ODP RDMA
> > > > code and will have to move it again into HMM code once i am done with
> > > > the nouveau changes and in the meantime i expect other driver will want
> > > > to use this 2 helpers too.
> > >
> > > I still hold out hope that we can find a way to have productive
> > > discussions about the implementation of this infrastructure.
> > > Threatening to move the code elsewhere to bypass the feedback is not
> > > productive.
> >
> > I am not threatening anything that code is in ODP _today_ with that
> > patchset i was factering it out so that i could also use it in nouveau.
> > nouveau is built in such way that right now i can not use it directly.
> > But i wanted to factor out now in hope that i can get the nouveau
> > changes in 5.2 and then convert nouveau in 5.3.
> >
> > So when i said that code will be in ODP it just means that instead of
> > removing it from ODP i will keep it there and it will just delay more
> > code sharing for everyone.
> 
> The point I'm trying to make is that the code sharing for everyone is
> moving the implementation closer to canonical kernel code and use
> existing infrastructure. For example, I look at 'struct hmm_range' and
> see nothing hmm specific in it. I think we can make that generic and
> not build up more apis and data structures in the "hmm" namespace.

Right now i am trying to unify drivers for devices that can support
the mmu notifier approach through HMM. Unifying to a superset of drivers
that can not abide by mmu notifiers is on my todo list like i said, but
it comes after. I do not want to make the big jump in just one go. So
i am doing things under HMM and thus in the HMM namespace, but once i
tackle the larger set i will move into the generic namespace whatever
makes sense.

This exact approach did happen several times already in the kernel. In
the GPU sub-system we did it several times. First do something for a
couple of devices that are very similar, then grow to a bigger set of
devices and generalise along the way.

So i do not see what the problem is with me repeating that same pattern
here again. Do something for a smaller set before tackling a bigger
set.

Cheers,
Jérôme
Dan Williams March 19, 2019, 7:13 p.m. UTC | #25
On Tue, Mar 19, 2019 at 12:05 PM Jerome Glisse <jglisse@redhat.com> wrote:
>
> On Tue, Mar 19, 2019 at 11:42:00AM -0700, Dan Williams wrote:
> > On Tue, Mar 19, 2019 at 10:45 AM Jerome Glisse <jglisse@redhat.com> wrote:
> > >
> > > On Tue, Mar 19, 2019 at 10:33:57AM -0700, Dan Williams wrote:
> > > > On Tue, Mar 19, 2019 at 10:19 AM Jerome Glisse <jglisse@redhat.com> wrote:
> > > > >
> > > > > On Tue, Mar 19, 2019 at 10:12:49AM -0700, Andrew Morton wrote:
> > > > > > On Tue, 19 Mar 2019 12:58:02 -0400 Jerome Glisse <jglisse@redhat.com> wrote:
> > > > [..]
> > > > > > Also, the discussion regarding [07/10] is substantial and is ongoing so
> > > > > > please let's push along wth that.
> > > > >
> > > > > I can move it as last patch in the serie but it is needed for ODP RDMA
> > > > > convertion too. Otherwise i will just move that code into the ODP RDMA
> > > > > code and will have to move it again into HMM code once i am done with
> > > > > the nouveau changes and in the meantime i expect other driver will want
> > > > > to use this 2 helpers too.
> > > >
> > > > I still hold out hope that we can find a way to have productive
> > > > discussions about the implementation of this infrastructure.
> > > > Threatening to move the code elsewhere to bypass the feedback is not
> > > > productive.
> > >
> > > I am not threatening anything that code is in ODP _today_ with that
> > > patchset i was factering it out so that i could also use it in nouveau.
> > > nouveau is built in such way that right now i can not use it directly.
> > > But i wanted to factor out now in hope that i can get the nouveau
> > > changes in 5.2 and then convert nouveau in 5.3.
> > >
> > > So when i said that code will be in ODP it just means that instead of
> > > removing it from ODP i will keep it there and it will just delay more
> > > code sharing for everyone.
> >
> > The point I'm trying to make is that the code sharing for everyone is
> > moving the implementation closer to canonical kernel code and use
> > existing infrastructure. For example, I look at 'struct hmm_range' and
> > see nothing hmm specific in it. I think we can make that generic and
> > not build up more apis and data structures in the "hmm" namespace.
>
> Right now i am trying to unify driver for device that have can support
> the mmu notifier approach through HMM. Unify to a superset of driver
> that can not abide by mmu notifier is on my todo list like i said but
> it comes after. I do not want to make the big jump in just one go. So
> i doing thing under HMM and thus in HMM namespace, but once i tackle
> the larger set i will move to generic namespace what make sense.
>
> This exact approach did happen several time already in the kernel. In
> the GPU sub-system we did it several time. First do something for couple
> devices that are very similar then grow to a bigger set of devices and
> generalise along the way.
>
> So i do not see what is the problem of me repeating that same pattern
> here again. Do something for a smaller set before tackling it on for
> a bigger set.

All of that is fine, but when I asked about the ultimate trajectory
that replaces hmm_range_dma_map() with an updated / HMM-aware GUP
implementation, the response was that hmm_range_dma_map() is here to
stay. The issue is not with forking off a small side effort, it's the
plan to absorb that capability into a common implementation across
non-HMM drivers where possible.
Jerome Glisse March 19, 2019, 7:18 p.m. UTC | #26
On Tue, Mar 19, 2019 at 12:13:40PM -0700, Dan Williams wrote:
> On Tue, Mar 19, 2019 at 12:05 PM Jerome Glisse <jglisse@redhat.com> wrote:
> >
> > On Tue, Mar 19, 2019 at 11:42:00AM -0700, Dan Williams wrote:
> > > On Tue, Mar 19, 2019 at 10:45 AM Jerome Glisse <jglisse@redhat.com> wrote:
> > > >
> > > > On Tue, Mar 19, 2019 at 10:33:57AM -0700, Dan Williams wrote:
> > > > > On Tue, Mar 19, 2019 at 10:19 AM Jerome Glisse <jglisse@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, Mar 19, 2019 at 10:12:49AM -0700, Andrew Morton wrote:
> > > > > > > On Tue, 19 Mar 2019 12:58:02 -0400 Jerome Glisse <jglisse@redhat.com> wrote:
> > > > > [..]
> > > > > > > Also, the discussion regarding [07/10] is substantial and is ongoing so
> > > > > > > please let's push along wth that.
> > > > > >
> > > > > > I can move it as last patch in the serie but it is needed for ODP RDMA
> > > > > > convertion too. Otherwise i will just move that code into the ODP RDMA
> > > > > > code and will have to move it again into HMM code once i am done with
> > > > > > the nouveau changes and in the meantime i expect other driver will want
> > > > > > to use this 2 helpers too.
> > > > >
> > > > > I still hold out hope that we can find a way to have productive
> > > > > discussions about the implementation of this infrastructure.
> > > > > Threatening to move the code elsewhere to bypass the feedback is not
> > > > > productive.
> > > >
> > > > I am not threatening anything that code is in ODP _today_ with that
> > > > patchset i was factering it out so that i could also use it in nouveau.
> > > > nouveau is built in such way that right now i can not use it directly.
> > > > But i wanted to factor out now in hope that i can get the nouveau
> > > > changes in 5.2 and then convert nouveau in 5.3.
> > > >
> > > > So when i said that code will be in ODP it just means that instead of
> > > > removing it from ODP i will keep it there and it will just delay more
> > > > code sharing for everyone.
> > >
> > > The point I'm trying to make is that the code sharing for everyone is
> > > moving the implementation closer to canonical kernel code and use
> > > existing infrastructure. For example, I look at 'struct hmm_range' and
> > > see nothing hmm specific in it. I think we can make that generic and
> > > not build up more apis and data structures in the "hmm" namespace.
> >
> > Right now i am trying to unify driver for device that have can support
> > the mmu notifier approach through HMM. Unify to a superset of driver
> > that can not abide by mmu notifier is on my todo list like i said but
> > it comes after. I do not want to make the big jump in just one go. So
> > i doing thing under HMM and thus in HMM namespace, but once i tackle
> > the larger set i will move to generic namespace what make sense.
> >
> > This exact approach did happen several time already in the kernel. In
> > the GPU sub-system we did it several time. First do something for couple
> > devices that are very similar then grow to a bigger set of devices and
> > generalise along the way.
> >
> > So i do not see what is the problem of me repeating that same pattern
> > here again. Do something for a smaller set before tackling it on for
> > a bigger set.
> 
> All of that is fine, but when I asked about the ultimate trajectory
> that replaces hmm_range_dma_map() with an updated / HMM-aware GUP
> implementation, the response was that hmm_range_dma_map() is here to
> stay. The issue is not with forking off a small side effort, it's the
> plan to absorb that capability into a common implementation across
> non-HMM drivers where possible.

hmm_range_dma_map() is a superset of gup_range_dma_map() because on
top of gup_range_dma_map() the hmm version deals with mmu notifiers.

But everything that is not mmu notifier related can be shared through
gup_range_dma_map(), so the plan is to end up with:
    hmm_range_dma_map(hmm_struct) {
        hmm_mmu_notifier_specific_prep_step();
        gup_range_dma_map(hmm_struct->common_base_struct);
        hmm_mmu_notifier_specific_post_step();
    }

ie share as much as possible. Does that not make sense? To get
there i will need to make non-trivial additions to GUP, and so i went
first to get the HMM bits working and then work on a common gup API.
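A compilable toy sketch of that layering (all names here are hypothetical stand-ins for illustration; gup_range_dma_map() does not exist yet, and the function bodies only pretend to do the work) could look like:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real structures live in include/linux/hmm.h */
struct range_common {
	unsigned long start, end;	/* virtual address range */
	unsigned long *pfns;		/* output array, one slot per page */
};

struct hmm_range_sketch {
	struct range_common common;	/* part shared with a plain gup user */
	bool valid;			/* cleared by a concurrent mmu notifier */
};

/* Notifier-agnostic core: what the thread calls gup_range_dma_map() */
static int gup_range_dma_map(struct range_common *r)
{
	unsigned long a;

	for (a = r->start; a < r->end; a++)
		r->pfns[a - r->start] = a;	/* stand-in for lookup + dma map */
	return 0;
}

/* HMM wrapper: mmu notifier bookkeeping around the shared core */
static int hmm_range_dma_map(struct hmm_range_sketch *hr)
{
	int ret;

	hr->valid = true;			/* notifier-specific prep step */
	ret = gup_range_dma_map(&hr->common);
	if (ret)
		return ret;
	return hr->valid ? 0 : -11;		/* post step: -EAGAIN if invalidated */
}
```

The wrapper only adds the notifier prep/post steps; everything in the middle is the shared, notifier-agnostic path.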

Cheers,
Jérôme
Jerome Glisse March 19, 2019, 8:25 p.m. UTC | #27
On Tue, Mar 19, 2019 at 03:18:49PM -0400, Jerome Glisse wrote:
> On Tue, Mar 19, 2019 at 12:13:40PM -0700, Dan Williams wrote:
> > On Tue, Mar 19, 2019 at 12:05 PM Jerome Glisse <jglisse@redhat.com> wrote:
> > >
> > > On Tue, Mar 19, 2019 at 11:42:00AM -0700, Dan Williams wrote:
> > > > On Tue, Mar 19, 2019 at 10:45 AM Jerome Glisse <jglisse@redhat.com> wrote:
> > > > >
> > > > > On Tue, Mar 19, 2019 at 10:33:57AM -0700, Dan Williams wrote:
> > > > > > On Tue, Mar 19, 2019 at 10:19 AM Jerome Glisse <jglisse@redhat.com> wrote:
> > > > > > >
> > > > > > > On Tue, Mar 19, 2019 at 10:12:49AM -0700, Andrew Morton wrote:
> > > > > > > > On Tue, 19 Mar 2019 12:58:02 -0400 Jerome Glisse <jglisse@redhat.com> wrote:
> > > > > > [..]
> > > > > > > > Also, the discussion regarding [07/10] is substantial and is ongoing so
> > > > > > > > please let's push along wth that.
> > > > > > >
> > > > > > > I can move it as last patch in the serie but it is needed for ODP RDMA
> > > > > > > convertion too. Otherwise i will just move that code into the ODP RDMA
> > > > > > > code and will have to move it again into HMM code once i am done with
> > > > > > > the nouveau changes and in the meantime i expect other driver will want
> > > > > > > to use this 2 helpers too.
> > > > > >
> > > > > > I still hold out hope that we can find a way to have productive
> > > > > > discussions about the implementation of this infrastructure.
> > > > > > Threatening to move the code elsewhere to bypass the feedback is not
> > > > > > productive.
> > > > >
> > > > > I am not threatening anything that code is in ODP _today_ with that
> > > > > patchset i was factering it out so that i could also use it in nouveau.
> > > > > nouveau is built in such way that right now i can not use it directly.
> > > > > But i wanted to factor out now in hope that i can get the nouveau
> > > > > changes in 5.2 and then convert nouveau in 5.3.
> > > > >
> > > > > So when i said that code will be in ODP it just means that instead of
> > > > > removing it from ODP i will keep it there and it will just delay more
> > > > > code sharing for everyone.
> > > >
> > > > The point I'm trying to make is that the code sharing for everyone is
> > > > moving the implementation closer to canonical kernel code and use
> > > > existing infrastructure. For example, I look at 'struct hmm_range' and
> > > > see nothing hmm specific in it. I think we can make that generic and
> > > > not build up more apis and data structures in the "hmm" namespace.
> > >
> > > Right now i am trying to unify driver for device that have can support
> > > the mmu notifier approach through HMM. Unify to a superset of driver
> > > that can not abide by mmu notifier is on my todo list like i said but
> > > it comes after. I do not want to make the big jump in just one go. So
> > > i doing thing under HMM and thus in HMM namespace, but once i tackle
> > > the larger set i will move to generic namespace what make sense.
> > >
> > > This exact approach did happen several time already in the kernel. In
> > > the GPU sub-system we did it several time. First do something for couple
> > > devices that are very similar then grow to a bigger set of devices and
> > > generalise along the way.
> > >
> > > So i do not see what is the problem of me repeating that same pattern
> > > here again. Do something for a smaller set before tackling it on for
> > > a bigger set.
> > 
> > All of that is fine, but when I asked about the ultimate trajectory
> > that replaces hmm_range_dma_map() with an updated / HMM-aware GUP
> > implementation, the response was that hmm_range_dma_map() is here to
> > stay. The issue is not with forking off a small side effort, it's the
> > plan to absorb that capability into a common implementation across
> > non-HMM drivers where possible.
> 
> hmm_range_dma_map() is a superset of gup_range_dma_map() because on
> top of gup_range_dma_map() the hmm version deals with mmu notifier.
> 
> But everything that is not mmu notifier related can be share through
> gup_range_dma_map() so plan is to end up with:
>     hmm_range_dma_map(hmm_struct) {
>         hmm_mmu_notifier_specific_prep_step();
>         gup_range_dma_map(hmm_struct->common_base_struct);
>         hmm_mmu_notifier_specific_post_step();
>     }
> 
> ie share as much as possible. Does that not make sense? To get
> there I will need to make non-trivial additions to GUP, so I went
> first to get the HMM bits working before tackling a common gup API.
> 

And more on the hmm_range struct:

struct hmm_range {
    struct vm_area_struct *vma;       // Common
    struct list_head      list;       // HMM specific: only useful to
                                      // track valid ranges if an mmu
                                      // notifier fires while we look
                                      // up the CPU page table
    unsigned long         start;      // Common
    unsigned long         end;        // Common
    uint64_t              *pfns;      // Common
    const uint64_t        *flags;     // Some flags would be HMM specific
    const uint64_t        *values;    // HMM specific
    uint8_t               pfn_shift;  // Common
    bool                  valid;      // HMM specific
};

So it is not all common; there are things that just do not make sense
outside an HMM-capable driver.

Cheers,
Jérôme
Stephen Rothwell March 19, 2019, 9:51 p.m. UTC | #28
Hi Andrew,

On Tue, 19 Mar 2019 10:12:49 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
>
> Yes, a resend against mainline with those various updates will be
> helpful.  Please go through the various fixes which we had as well:
> 
> mm-hmm-use-reference-counting-for-hmm-struct.patch
> mm-hmm-do-not-erase-snapshot-when-a-range-is-invalidated.patch
> mm-hmm-improve-and-rename-hmm_vma_get_pfns-to-hmm_range_snapshot.patch
> mm-hmm-improve-and-rename-hmm_vma_fault-to-hmm_range_fault.patch
> mm-hmm-improve-driver-api-to-work-and-wait-over-a-range.patch
> mm-hmm-improve-driver-api-to-work-and-wait-over-a-range-fix.patch
> mm-hmm-improve-driver-api-to-work-and-wait-over-a-range-fix-fix.patch
> mm-hmm-add-default-fault-flags-to-avoid-the-need-to-pre-fill-pfns-arrays.patch
> mm-hmm-add-an-helper-function-that-fault-pages-and-map-them-to-a-device.patch
> mm-hmm-support-hugetlbfs-snap-shoting-faulting-and-dma-mapping.patch
> mm-hmm-allow-to-mirror-vma-of-a-file-on-a-dax-backed-filesystem.patch
> mm-hmm-allow-to-mirror-vma-of-a-file-on-a-dax-backed-filesystem-fix.patch
> mm-hmm-allow-to-mirror-vma-of-a-file-on-a-dax-backed-filesystem-fix-2.patch
> mm-hmm-add-helpers-for-driver-to-safely-take-the-mmap_sem.patch

I have dropped all those from linux-next today (along with
mm-hmm-fix-unused-variable-warnings.patch).
Jerome Glisse March 19, 2019, 10:24 p.m. UTC | #29
On Tue, Mar 19, 2019 at 07:18:26AM -0700, Ira Weiny wrote:
> On Tue, Mar 19, 2019 at 12:13:40PM -0700, Dan Williams wrote:
> > On Tue, Mar 19, 2019 at 12:05 PM Jerome Glisse <jglisse@redhat.com> wrote:
> > >
> > > On Tue, Mar 19, 2019 at 11:42:00AM -0700, Dan Williams wrote:
> > > > On Tue, Mar 19, 2019 at 10:45 AM Jerome Glisse <jglisse@redhat.com> wrote:
> > > > >
> > > > > On Tue, Mar 19, 2019 at 10:33:57AM -0700, Dan Williams wrote:
> > > > > > On Tue, Mar 19, 2019 at 10:19 AM Jerome Glisse <jglisse@redhat.com> wrote:
> > > > > > >
> > > > > > > On Tue, Mar 19, 2019 at 10:12:49AM -0700, Andrew Morton wrote:
> > > > > > > > On Tue, 19 Mar 2019 12:58:02 -0400 Jerome Glisse <jglisse@redhat.com> wrote:
> 
> [snip]
> 
> > >
> > > Right now I am trying to unify drivers for devices that can support
> > > the mmu notifier approach through HMM. Unifying with the superset of
> > > drivers that cannot abide by mmu notifiers is on my todo list, like I
> > > said, but it comes after. I do not want to make the big jump in just
> > > one go. So I am doing things under HMM, and thus in the HMM namespace,
> > > but once I tackle the larger set I will move what makes sense to a
> > > generic namespace.
> > >
> > > This exact approach has happened several times already in the kernel.
> > > In the GPU sub-system we did it several times: first do something for
> > > a couple of devices that are very similar, then grow to a bigger set
> > > of devices and generalise along the way.
> > >
> > > So I do not see the problem with repeating that same pattern here
> > > again: do something for a smaller set before tackling a bigger set.
> > 
> > All of that is fine, but when I asked about the ultimate trajectory
> > that replaces hmm_range_dma_map() with an updated / HMM-aware GUP
> > implementation, the response was that hmm_range_dma_map() is here to
> > stay. The issue is not with forking off a small side effort, it's the
> > plan to absorb that capability into a common implementation across
> > non-HMM drivers where possible.
> 
> Just to get on the record in this thread.
> 
> +1
> 
> I think having an interface which handles the MMU notifier stuff for drivers
> is awesome, but we need to agree that the trajectory is to help more drivers
> where possible.
> 

Yes, and I want to get there step by step, not in one giant leap.
It seems Dan would like to see this all done in one step, and I believe
that is too risky and would make the patchset much bigger and harder
to review.

Cheers,
Jérôme