mbox series

[v1,0/3] udmabuf: Add support for page migration out of movable zone or CMA

Message ID 20230817064934.3424431-1-vivek.kasireddy@intel.com (mailing list archive)
Headers show
Series udmabuf: Add support for page migration out of movable zone or CMA | expand

Message

Kasireddy, Vivek Aug. 17, 2023, 6:49 a.m. UTC
This patch series adds support for migrating pages associated with
a udmabuf out of the movable zone or CMA to avoid breaking features
such as memory hotunplug.

The first patch exports check_and_migrate_movable_pages() function
out of GUP so that the udmabuf driver can leverage it for page
migration that is done as part of the second patch. The last patch
adds two new udmabuf selftests to verify data coherency after
page migration.

Cc: David Hildenbrand <david@redhat.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Junxiao Chang <junxiao.chang@intel.com>

Vivek Kasireddy (3):
  mm/gup: Export check_and_migrate_movable_pages()
  udmabuf: Add support for page migration out of movable zone or CMA
  selftests/dma-buf/udmabuf: Add tests to verify data after page
    migration

 drivers/dma-buf/udmabuf.c                     | 106 +++++++++++-
 include/linux/mm.h                            |   2 +
 mm/gup.c                                      |   9 +-
 .../selftests/drivers/dma-buf/udmabuf.c       | 151 +++++++++++++++++-
 4 files changed, 254 insertions(+), 14 deletions(-)

Comments

Jason Gunthorpe Aug. 17, 2023, 3:01 p.m. UTC | #1
On Wed, Aug 16, 2023 at 11:49:31PM -0700, Vivek Kasireddy wrote:
> This patch series adds support for migrating pages associated with
> a udmabuf out of the movable zone or CMA to avoid breaking features
> such as memory hotunplug.
> 
> The first patch exports check_and_migrate_movable_pages() function
> out of GUP so that the udmabuf driver can leverage it for page
> migration that is done as part of the second patch. The last patch
> adds two new udmabuf selftests to verify data coherency after
> page migration.

Please don't do this. If you want to do what GUP does then call
GUP. udmabuf is not so special that it needs to open code its own
weird version of it.

Jason
Kasireddy, Vivek Aug. 22, 2023, 5:36 a.m. UTC | #2
Hi Jason,

> > This patch series adds support for migrating pages associated with
> > a udmabuf out of the movable zone or CMA to avoid breaking features
> > such as memory hotunplug.
> >
> > The first patch exports check_and_migrate_movable_pages() function
> > out of GUP so that the udmabuf driver can leverage it for page
> > migration that is done as part of the second patch. The last patch
> > adds two new udmabuf selftests to verify data coherency after
> > page migration.
> 
> Please don't do this. If you want to do what GUP does then call
> GUP. udmabuf is not so special that it needs to open code its own
> weird version of it.
We can't call GUP directly as explained in the first patch of this series:
"For drivers that would like to migrate pages out of the movable
zone (or CMA) in order to pin them (longterm) for DMA, using
check_and_migrate_movable_pages() directly provides a convenient
option instead of duplicating similar checks (e.g, checking
the folios for zone, hugetlb, etc) and calling migrate_pages()
directly.

Ideally, a driver is expected to call pin_user_pages(FOLL_LONGTERM)
to migrate and pin the pages for longterm DMA but there are
situations where the GUP APIs cannot be used directly for
various reasons (e.g, when the VMA or start addr cannot be
easily determined but the relevant pages are available)."

Given the current (model and) UAPI (udmabuf_create), the userspace
only shares (memfd, offset, size) values that we use to find the 
relevant pages and pin them (by doing get_page()). Since the goal
is to also migrate these pages, I think we have the following options:
- Leverage check_and_migrate_movable_pages(); but this function
  needs to be exported from GUP.

- Iterate over all the pages (in udmabuf) to check for folio_is_longterm_pinnable()
  and call migrate_pages() eventually. This requires changes only to
  the udmabuf driver but we'd be duplicating much of the functionality
  provided by check_and_migrate_movable_pages().

- Call pin_user_pages_fast(FOLL_LONGTERM) from udmabuf driver. In
  order to do this, we have to first unpin all pages and iterate over all
  the VMAs of the VMM to identify the Guest RAM VMA and then use
  page_address_in_vma() to find the start addr of the ranges and then
  call GUP. Although this approach is feasible, it feels a bit convoluted.

- Add a new udmabuf UAPI to have userspace share (start addr, len) values
  so that the udmabuf driver can directly call GUP APIs. But this means all
  the current users of udmabuf such as Qemu, CrosVM, etc, need to be
  updated to use the new UAPI. 

- Add a new API to the backing store/allocator to longterm-pin the page.
  For example, something along the lines of shmem_pin_mapping_page_longterm()
  for shmem as suggested by Daniel. A similar one needs to be added for
  hugetlbfs as well.

Among these options, the first one seems very reasonable to me.

Thanks,
Vivek

> 
> Jason
Jason Gunthorpe Aug. 22, 2023, 12:23 p.m. UTC | #3
On Tue, Aug 22, 2023 at 05:36:56AM +0000, Kasireddy, Vivek wrote:
> Hi Jason,
> 
> > > This patch series adds support for migrating pages associated with
> > > a udmabuf out of the movable zone or CMA to avoid breaking features
> > > such as memory hotunplug.
> > >
> > > The first patch exports check_and_migrate_movable_pages() function
> > > out of GUP so that the udmabuf driver can leverage it for page
> > > migration that is done as part of the second patch. The last patch
> > > adds two new udmabuf selftests to verify data coherency after
> > > page migration.
> > 
> > Please don't do this. If you want to do what GUP does then call
> > GUP. udmabuf is not so special that it needs to open code its own
> > weird version of it.
> We can't call GUP directly as explained in the first patch of this series:
> "For drivers that would like to migrate pages out of the movable
> zone (or CMA) in order to pin them (longterm) for DMA, using
> check_and_migrate_movable_pages() directly provides a convenient
> option instead of duplicating similar checks (e.g, checking
> the folios for zone, hugetlb, etc) and calling migrate_pages()
> directly.
> 
> Ideally, a driver is expected to call pin_user_pages(FOLL_LONGTERM)
> to migrate and pin the pages for longterm DMA but there are
> situations where the GUP APIs cannot be used directly for
> various reasons (e.g, when the VMA or start addr cannot be
> easily determined but the relevant pages are available)."
> 
> Given the current (model and) UAPI (udmabuf_create), the userspace
> only shares (memfd, offset, size) values that we use to find the
> relevant pages and pin them (by doing get_page()). Since the goal
> is to also migrate these pages, I think we have the following options:

This seems like a bad choice of uAPI - we don't have any kernel
support for pinning from a memfd. If you want this then you have to
build this as generic code, not open code it into udmabuf.

> - Leverage check_and_migrate_movable_pages(); but this function
>   needs to be exported from GUP.

GUP has many behaviors, we keep adding more, these functions should
not leak out of the mm core code into drivers.
 
> - Iterate over all the pages (in udmabuf) to check for folio_is_longterm_pinnable()
>   and call migrate_pages() eventually. This requires changes only to
>   the udmabuf driver but we'd be duplicating much of the functionality
>   provided by check_and_migrate_movable_pages().

Definately not

> - Call pin_user_pages_fast(FOLL_LONGTERM) from udmabuf driver. In
>   order to do this, we have to first unpin all pages and iterate over all
>   the VMAs of the VMM to identify the Guest RAM VMA and then use
>   page_address_in_vma() to find the start addr of the ranges and then
>   call GUP. Although this approach is feasible, it feels a bit convoluted.

Userspace should have told the kernel where the memfd is mapped.
 
> - Add a new udmabuf UAPI to have userspace share (start addr, len) values
>   so that the udmabuf driver can directly call GUP APIs. But this means all
>   the current users of udmabuf such as Qemu, CrosVM, etc, need to be
>   updated to use the new UAPI. 

There you go
 
> - Add a new API to the backing store/allocator to longterm-pin the page.
>   For example, something along the lines of shmem_pin_mapping_page_longterm()
>   for shmem as suggested by Daniel. A similar one needs to be added for
>   hugetlbfs as well.

This may also be reasonable.

Jason
David Hildenbrand Aug. 23, 2023, 9:34 a.m. UTC | #4
>> - Add a new API to the backing store/allocator to longterm-pin the page.
>>    For example, something along the lines of shmem_pin_mapping_page_longterm()
>>    for shmem as suggested by Daniel. A similar one needs to be added for
>>    hugetlbfs as well.
> 
> This may also be reasonable.

Sounds reasonable to keep the old API (that we unfortunately have) working.
Kasireddy, Vivek Aug. 24, 2023, 6:31 a.m. UTC | #5
Hi David,

> 
> >> - Add a new API to the backing store/allocator to longterm-pin the page.
> >>    For example, something along the lines of
> shmem_pin_mapping_page_longterm()
> >>    for shmem as suggested by Daniel. A similar one needs to be added for
> >>    hugetlbfs as well.
> >
> > This may also be reasonable.
> 
> Sounds reasonable to keep the old API (that we unfortunately have) working.
I agree; I'd like to avoid adding new APIs unless absolutely necessary. Given this,
and considering the options I have mentioned earlier, what would be your
recommendation for how page migration needs to be done in udmabuf driver?

Thanks,
Vivek

> 
> --
> Cheers,
> 
> David / dhildenb
David Hildenbrand Aug. 24, 2023, 6:30 p.m. UTC | #6
On 24.08.23 08:31, Kasireddy, Vivek wrote:
> Hi David,
> 
>>
>>>> - Add a new API to the backing store/allocator to longterm-pin the page.
>>>>     For example, something along the lines of
>> shmem_pin_mapping_page_longterm()
>>>>     for shmem as suggested by Daniel. A similar one needs to be added for
>>>>     hugetlbfs as well.
>>>
>>> This may also be reasonable.
>>
>> Sounds reasonable to keep the old API (that we unfortunately have) working.
> I agree; I'd like to avoid adding new APIs unless absolutely necessary. Given this,
> and considering the options I have mentioned earlier, what would be your
> recommendation for how page migration needs to be done in udmabuf driver?

I guess using proper APIs for shmem and hugetlb. So, turning roughly 
what you have in patch#1 for now into common code, and only calling into 
that from udmabug.
Jason Gunthorpe Aug. 24, 2023, 6:30 p.m. UTC | #7
On Thu, Aug 24, 2023 at 08:30:17PM +0200, David Hildenbrand wrote:
> On 24.08.23 08:31, Kasireddy, Vivek wrote:
> > Hi David,
> > 
> > > 
> > > > > - Add a new API to the backing store/allocator to longterm-pin the page.
> > > > >     For example, something along the lines of
> > > shmem_pin_mapping_page_longterm()
> > > > >     for shmem as suggested by Daniel. A similar one needs to be added for
> > > > >     hugetlbfs as well.
> > > > 
> > > > This may also be reasonable.
> > > 
> > > Sounds reasonable to keep the old API (that we unfortunately have) working.
> > I agree; I'd like to avoid adding new APIs unless absolutely necessary. Given this,
> > and considering the options I have mentioned earlier, what would be your
> > recommendation for how page migration needs to be done in udmabuf driver?
> 
> I guess using proper APIs for shmem and hugetlb. So, turning roughly what
> you have in patch#1 for now into common code, and only calling into that
> from udmabug.

This is a lot of work for an obscure uapi :\

Jason
David Hildenbrand Aug. 24, 2023, 6:33 p.m. UTC | #8
On 24.08.23 20:30, Jason Gunthorpe wrote:
> On Thu, Aug 24, 2023 at 08:30:17PM +0200, David Hildenbrand wrote:
>> On 24.08.23 08:31, Kasireddy, Vivek wrote:
>>> Hi David,
>>>
>>>>
>>>>>> - Add a new API to the backing store/allocator to longterm-pin the page.
>>>>>>      For example, something along the lines of
>>>> shmem_pin_mapping_page_longterm()
>>>>>>      for shmem as suggested by Daniel. A similar one needs to be added for
>>>>>>      hugetlbfs as well.
>>>>>
>>>>> This may also be reasonable.
>>>>
>>>> Sounds reasonable to keep the old API (that we unfortunately have) working.
>>> I agree; I'd like to avoid adding new APIs unless absolutely necessary. Given this,
>>> and considering the options I have mentioned earlier, what would be your
>>> recommendation for how page migration needs to be done in udmabuf driver?
>>
>> I guess using proper APIs for shmem and hugetlb. So, turning roughly what
>> you have in patch#1 for now into common code, and only calling into that
>> from udmabug.
> 
> This is a lot of work for an obscure uapi :\

Well, what can we otherwise to to *not* break existing users? I'm not 
happy about this either.

Of course, we can come up with a new uapi, but we have to handle the old 
uapi somehow.

Sure, we can simply always fail when we detect ZONE_MOVABLE or 
MIGRATE_CMA. Maybe that keeps at least some use cases working.
Jason Gunthorpe Aug. 25, 2023, 5:29 p.m. UTC | #9
On Thu, Aug 24, 2023 at 08:33:09PM +0200, David Hildenbrand wrote:

> Sure, we can simply always fail when we detect ZONE_MOVABLE or MIGRATE_CMA.
> Maybe that keeps at least some use cases working.

That seems fairly reasonable

Jason
Kasireddy, Vivek Aug. 27, 2023, 6:49 p.m. UTC | #10
Hi Jason, David,

> 
> > Sure, we can simply always fail when we detect ZONE_MOVABLE or
> MIGRATE_CMA.
> > Maybe that keeps at least some use cases working.
> 
> That seems fairly reasonable
AFAICS, failing udmabuf_create() if we detect one or more pages are in
ZONE_MOVABLE or MIGRATE_CMA would not be a recoverable failure --
as it would result in the failure of Guest GUI (or compositor).

I think it makes sense to have a generic version of 
And, since check_and_migrate_movable_pages() is GUP-specific, would
it be ok to create a generic version of that (in mm/migrate.c) which can be
used by udmabuf and/or other drivers in the future?

Thanks,
Vivek

> 
> Jason
Kasireddy, Vivek Aug. 27, 2023, 7:05 p.m. UTC | #11
Hi Jason, David,

> > > Sure, we can simply always fail when we detect ZONE_MOVABLE or
> > MIGRATE_CMA.
> > > Maybe that keeps at least some use cases working.
> >
> > That seems fairly reasonable
> AFAICS, failing udmabuf_create() if we detect one or more pages are in
> ZONE_MOVABLE or MIGRATE_CMA would not be a recoverable failure --
> as it would result in the failure of Guest GUI (or compositor).
> 
> I think it makes sense to have a generic version of
> And, since check_and_migrate_movable_pages() is GUP-specific, would
> it be ok to create a generic version of that (in mm/migrate.c) which can be
> used by udmabuf and/or other drivers in the future?
Sorry, I accidentally sent this earlier email before finishing it. 
What I meant to say is since the same situation (inadvertently pinning pages
in movable) may probably arise in the future with another driver, I think it makes
sense to have a generic (non-GUP) version of check_and_migrate_movable_pages()
available in migration.h that drivers can use to ensure that they don't break
memory hotunplug accidentally.

Thanks,
Vivek

> 
> Thanks,
> Vivek
> 
> >
> > Jason
>
Jason Gunthorpe Aug. 30, 2023, 5:30 p.m. UTC | #12
On Sun, Aug 27, 2023 at 07:05:59PM +0000, Kasireddy, Vivek wrote:
> Hi Jason, David,
> 
> > > > Sure, we can simply always fail when we detect ZONE_MOVABLE or
> > > MIGRATE_CMA.
> > > > Maybe that keeps at least some use cases working.
> > >
> > > That seems fairly reasonable
> > AFAICS, failing udmabuf_create() if we detect one or more pages are in
> > ZONE_MOVABLE or MIGRATE_CMA would not be a recoverable failure --
> > as it would result in the failure of Guest GUI (or compositor).

Yes, you can't use whatever this driver is while enabling MOVABLE or
CMA in your kernel boot.

> > I think it makes sense to have a generic version of
> > And, since check_and_migrate_movable_pages() is GUP-specific, would
> > it be ok to create a generic version of that (in mm/migrate.c) which can be
> > used by udmabuf and/or other drivers in the future?
> Sorry, I accidentally sent this earlier email before finishing it.
> What I meant to say is since the same situation (inadvertently pinning pages
> in movable) may probably arise in the future with another driver, 

Why?

It was a big mistake to design a uAPI around taking in a FD and
extracting pages from it, we don't have kernel infrastructure for
that, and code liek that does not belong outside the MM at all.

> I think it makes sense to have a generic (non-GUP) version of
> check_and_migrate_movable_pages() available in migration.h that
> drivers can use to ensure that they don't break memory hotunplug
> accidentally.

Definately not.

Either use the VMA and pin_user_pages(), or implement
pin_user_pages_fd() in core code.

Do not open code something wonky in drivers.

Jason
David Hildenbrand Sept. 14, 2023, 1:43 p.m. UTC | #13
>> I think it makes sense to have a generic (non-GUP) version of
>> check_and_migrate_movable_pages() available in migration.h that
>> drivers can use to ensure that they don't break memory hotunplug
>> accidentally.
> 
> Definately not.
> 
> Either use the VMA and pin_user_pages(), or implement
> pin_user_pages_fd() in core code.
> 
> Do not open code something wonky in drivers.

Agreed. pin_user_pages_fd() might become relevant in the context of 
vfio/mdev + KVM gmem -- don't mmap guest memory but instead provide it 
via a special memfd to the kernel.

So there might be value in having such a core infrastructure.
Kasireddy, Vivek Sept. 16, 2023, 6:31 p.m. UTC | #14
Hi David,

> >> I think it makes sense to have a generic (non-GUP) version of
> >> check_and_migrate_movable_pages() available in migration.h that
> >> drivers can use to ensure that they don't break memory hotunplug
> >> accidentally.
> >
> > Definately not.
> >
> > Either use the VMA and pin_user_pages(), or implement
> > pin_user_pages_fd() in core code.
> >
> > Do not open code something wonky in drivers.
> 
> Agreed. pin_user_pages_fd() might become relevant in the context of
> vfio/mdev + KVM gmem -- don't mmap guest memory but instead provide it
> via a special memfd to the kernel.
> 
> So there might be value in having such a core infrastructure.
Ok, I'll work on adding pin_user_pages_fd() soon.

Thanks,
Vivek
> 
> --
> Cheers,
> 
> David / dhildenb