mbox series

[0/6] prohibit pinning pages in ZONE_MOVABLE

Message ID 20201202052330.474592-1-pasha.tatashin@soleen.com (mailing list archive)
Headers show
Series prohibit pinning pages in ZONE_MOVABLE | expand

Message

Pasha Tatashin Dec. 2, 2020, 5:23 a.m. UTC
When page is pinned it cannot be moved and its physical address stays
the same until pages is unpinned.

This is useful functionality to allows userland to implementation DMA
access. For example, it is used by vfio in vfio_pin_pages().

However, this functionality breaks memory hotplug/hotremove assumptions
that pages in ZONE_MOVABLE can always be migrated.

This patch series fixes this issue by forcing new allocations during
page pinning to omit ZONE_MOVABLE, and also to migrate any existing
pages from ZONE_MOVABLE during pinning.

It uses the same scheme logic that is currently used by CMA, and extends
the functionality for all allocations.

For more information read the discussion [1] about this problem.

[1] https://lore.kernel.org/lkml/CA+CK2bBffHBxjmb9jmSKacm0fJMinyt3Nhk8Nx6iudcQSj80_w@mail.gmail.com/

Pavel Tatashin (6):
  mm/gup: perform check_dax_vmas only when FS_DAX is enabled
  mm/gup: don't pin migrated cma pages in movable zone
  mm/gup: make __gup_longterm_locked common
  mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_NOMOVABLE
  mm: honor PF_MEMALLOC_NOMOVABLE for all allocations
  mm/gup: migrate pinned pages out of movable zone

 include/linux/migrate.h        |  1 +
 include/linux/sched.h          |  2 +-
 include/linux/sched/mm.h       | 21 +++------
 include/trace/events/migrate.h |  3 +-
 mm/gup.c                       | 82 +++++++++++++---------------------
 mm/hugetlb.c                   |  4 +-
 mm/page_alloc.c                | 26 ++++++-----
 7 files changed, 58 insertions(+), 81 deletions(-)

Comments

Joonsoo Kim Dec. 4, 2020, 4:02 a.m. UTC | #1
Hello,

On Wed, Dec 02, 2020 at 12:23:24AM -0500, Pavel Tatashin wrote:
> When page is pinned it cannot be moved and its physical address stays
> the same until pages is unpinned.
> 
> This is useful functionality to allows userland to implementation DMA
> access. For example, it is used by vfio in vfio_pin_pages().
> 
> However, this functionality breaks memory hotplug/hotremove assumptions
> that pages in ZONE_MOVABLE can always be migrated.
> 
> This patch series fixes this issue by forcing new allocations during
> page pinning to omit ZONE_MOVABLE, and also to migrate any existing
> pages from ZONE_MOVABLE during pinning.

I love what this patchset does, but, at least, it's better to consider
the side-effect of this patchset and inform it in somewhere. IIUC,
ZONE_MOVABLE exists for two purposes.

1) increasing availability of THP
2) memory hot-unplug

Potential issue would come from the case 1). They uses ZONE_MOVABLE
for THP availability and hard guarantee for migration isn't required
until now. So, there would be a system with following congifuration.

- memory layout: ZONE_NORMAL-512MB, ZONE_MOVABLE-512MB
- memory usage: unmovable-256MB, movable pinned-256MB, movable
  unpinned-512MB

With this patchset, movable pinned should be placed in ZONE_NORMAL so
512MB is required for ZONE_NORMAL. ZONE_NORMAL would be exhausted and
system performance would be highly afftect according to memory usage
pattern.

I'm not sure whether such configuration exists or not, but, at least,
it's better to write down this risk on commit message or something
else.

Thanks.
Pasha Tatashin Dec. 4, 2020, 3:55 p.m. UTC | #2
On Thu, Dec 3, 2020 at 11:03 PM Joonsoo Kim <js1304@gmail.com> wrote:
>
> Hello,
>
> On Wed, Dec 02, 2020 at 12:23:24AM -0500, Pavel Tatashin wrote:
> > When page is pinned it cannot be moved and its physical address stays
> > the same until pages is unpinned.
> >
> > This is useful functionality to allows userland to implementation DMA
> > access. For example, it is used by vfio in vfio_pin_pages().
> >
> > However, this functionality breaks memory hotplug/hotremove assumptions
> > that pages in ZONE_MOVABLE can always be migrated.
> >
> > This patch series fixes this issue by forcing new allocations during
> > page pinning to omit ZONE_MOVABLE, and also to migrate any existing
> > pages from ZONE_MOVABLE during pinning.
>
> I love what this patchset does, but, at least, it's better to consider
> the side-effect of this patchset and inform it in somewhere. IIUC,
> ZONE_MOVABLE exists for two purposes.
>
> 1) increasing availability of THP
> 2) memory hot-unplug
>
> Potential issue would come from the case 1). They uses ZONE_MOVABLE
> for THP availability and hard guarantee for migration isn't required
> until now. So, there would be a system with following congifuration.
>
> - memory layout: ZONE_NORMAL-512MB, ZONE_MOVABLE-512MB
> - memory usage: unmovable-256MB, movable pinned-256MB, movable
>   unpinned-512MB
>
> With this patchset, movable pinned should be placed in ZONE_NORMAL so
> 512MB is required for ZONE_NORMAL. ZONE_NORMAL would be exhausted and
> system performance would be highly afftect according to memory usage
> pattern.
>
> I'm not sure whether such configuration exists or not, but, at least,
> it's better to write down this risk on commit message or something
> else.

Yes, this indeed could be a problem for some configurations. I will
add your comment to the commit log of one of the patches.

Thank you,
Pasha
Jason Gunthorpe Dec. 4, 2020, 4:10 p.m. UTC | #3
On Fri, Dec 04, 2020 at 10:55:30AM -0500, Pavel Tatashin wrote:
> On Thu, Dec 3, 2020 at 11:03 PM Joonsoo Kim <js1304@gmail.com> wrote:
> >
> > Hello,
> >
> > On Wed, Dec 02, 2020 at 12:23:24AM -0500, Pavel Tatashin wrote:
> > > When page is pinned it cannot be moved and its physical address stays
> > > the same until pages is unpinned.
> > >
> > > This is useful functionality to allows userland to implementation DMA
> > > access. For example, it is used by vfio in vfio_pin_pages().
> > >
> > > However, this functionality breaks memory hotplug/hotremove assumptions
> > > that pages in ZONE_MOVABLE can always be migrated.
> > >
> > > This patch series fixes this issue by forcing new allocations during
> > > page pinning to omit ZONE_MOVABLE, and also to migrate any existing
> > > pages from ZONE_MOVABLE during pinning.
> >
> > I love what this patchset does, but, at least, it's better to consider
> > the side-effect of this patchset and inform it in somewhere. IIUC,
> > ZONE_MOVABLE exists for two purposes.
> >
> > 1) increasing availability of THP
> > 2) memory hot-unplug
> >
> > Potential issue would come from the case 1). They uses ZONE_MOVABLE
> > for THP availability and hard guarantee for migration isn't required
> > until now. So, there would be a system with following congifuration.
> >
> > - memory layout: ZONE_NORMAL-512MB, ZONE_MOVABLE-512MB
> > - memory usage: unmovable-256MB, movable pinned-256MB, movable
> >   unpinned-512MB
> >
> > With this patchset, movable pinned should be placed in ZONE_NORMAL so
> > 512MB is required for ZONE_NORMAL. ZONE_NORMAL would be exhausted and
> > system performance would be highly afftect according to memory usage
> > pattern.
> >
> > I'm not sure whether such configuration exists or not, but, at least,
> > it's better to write down this risk on commit message or something
> > else.
> 
> Yes, this indeed could be a problem for some configurations. I will
> add your comment to the commit log of one of the patches.

It sounds like there is some inherent tension here, breaking THP's
when doing pin_user_pages() is a really nasty thing to do. DMA
benefits greatly from THP.

I know nothing about ZONE_MOVABLE, is this auto-setup or an admin
option? If the result of this patch is standard systems can no longer
pin > 80% of their memory I have some regression concerns..

Jason
Pasha Tatashin Dec. 4, 2020, 5:50 p.m. UTC | #4
> > Yes, this indeed could be a problem for some configurations. I will
> > add your comment to the commit log of one of the patches.
>
> It sounds like there is some inherent tension here, breaking THP's
> when doing pin_user_pages() is a really nasty thing to do. DMA
> benefits greatly from THP.
>
> I know nothing about ZONE_MOVABLE, is this auto-setup or an admin
> option? If the result of this patch is standard systems can no longer
> pin > 80% of their memory I have some regression concerns..

ZONE_MOVABLE can be configured via kernel parameter, or when memory
nodes are onlined after hot-add; so this is something that admins
configure. ZONE_MOVABLE is designed to gurantee memory hot-plug
functionality, and not availability of THP, however, I did not know
about the use case where some admins might configure ZONE_MOVABLE to
increase availability of THP because pages are always migratable in
them. The thing is, if we fragment ZONE_MOVABLE by pinning pages in
it, the availability of THP also suffers.  We can migrate pages in
ZONE_NORMAL, just not guaranteed, so we can create THP in ZONE_NORMAL
as well, which is the usual case.
David Hildenbrand Dec. 4, 2020, 6:01 p.m. UTC | #5
On 04.12.20 18:50, Pavel Tatashin wrote:
>>> Yes, this indeed could be a problem for some configurations. I will
>>> add your comment to the commit log of one of the patches.
>>
>> It sounds like there is some inherent tension here, breaking THP's
>> when doing pin_user_pages() is a really nasty thing to do. DMA
>> benefits greatly from THP.
>>
>> I know nothing about ZONE_MOVABLE, is this auto-setup or an admin
>> option? If the result of this patch is standard systems can no longer
>> pin > 80% of their memory I have some regression concerns..
> 
> ZONE_MOVABLE can be configured via kernel parameter, or when memory
> nodes are onlined after hot-add; so this is something that admins
> configure. ZONE_MOVABLE is designed to gurantee memory hot-plug
> functionality, and not availability of THP, however, I did not know
> about the use case where some admins might configure ZONE_MOVABLE to
> increase availability of THP because pages are always migratable in
> them. The thing is, if we fragment ZONE_MOVABLE by pinning pages in
> it, the availability of THP also suffers.  We can migrate pages in
> ZONE_NORMAL, just not guaranteed, so we can create THP in ZONE_NORMAL
> as well, which is the usual case.

Right, we should document this at some place to make admins aware of
this. Something like

"Techniques that rely on long-term pinnings of memory (especially, RDMA
and vfio) are fundamentally problematic with ZONE_MOVABLE and,
therefore, memory hotunplug. Pinned pages cannot reside on ZONE_MOVABLE,
to guarantee that memory can still get hotunplugged - be aware that
pinning can fail even if there is plenty of free memory in ZONE_MOVABLE.
In addition, using ZONE_MOVABLE might make page pinning more expensive,
because pages have to be migrated off that zone first."

BTW, you might also want to update the comment for ZONE_MOVABLE in
include/linux/mmzone.h at the end of this series, removing the special
case of pinned pages (1.) and maybe adding what happens when trying to
pin pages on ZONE_MOVABLE.
Pasha Tatashin Dec. 4, 2020, 6:10 p.m. UTC | #6
> > ZONE_MOVABLE can be configured via kernel parameter, or when memory
> > nodes are onlined after hot-add; so this is something that admins
> > configure. ZONE_MOVABLE is designed to gurantee memory hot-plug
> > functionality, and not availability of THP, however, I did not know
> > about the use case where some admins might configure ZONE_MOVABLE to
> > increase availability of THP because pages are always migratable in
> > them. The thing is, if we fragment ZONE_MOVABLE by pinning pages in
> > it, the availability of THP also suffers.  We can migrate pages in
> > ZONE_NORMAL, just not guaranteed, so we can create THP in ZONE_NORMAL
> > as well, which is the usual case.
>
> Right, we should document this at some place to make admins aware of
> this. Something like
>
> "Techniques that rely on long-term pinnings of memory (especially, RDMA
> and vfio) are fundamentally problematic with ZONE_MOVABLE and,
> therefore, memory hotunplug. Pinned pages cannot reside on ZONE_MOVABLE,
> to guarantee that memory can still get hotunplugged - be aware that
> pinning can fail even if there is plenty of free memory in ZONE_MOVABLE.
> In addition, using ZONE_MOVABLE might make page pinning more expensive,
> because pages have to be migrated off that zone first."

Thanks, I will add this.

>
> BTW, you might also want to update the comment for ZONE_MOVABLE in
> include/linux/mmzone.h at the end of this series, removing the special
> case of pinned pages (1.) and maybe adding what happens when trying to
> pin pages on ZONE_MOVABLE.

Will do it.

Thank you,
Pasha
Joonsoo Kim Dec. 7, 2020, 7:12 a.m. UTC | #7
On Fri, Dec 04, 2020 at 12:50:56PM -0500, Pavel Tatashin wrote:
> > > Yes, this indeed could be a problem for some configurations. I will
> > > add your comment to the commit log of one of the patches.
> >
> > It sounds like there is some inherent tension here, breaking THP's
> > when doing pin_user_pages() is a really nasty thing to do. DMA
> > benefits greatly from THP.
> >
> > I know nothing about ZONE_MOVABLE, is this auto-setup or an admin
> > option? If the result of this patch is standard systems can no longer
> > pin > 80% of their memory I have some regression concerns..
> 
> ZONE_MOVABLE can be configured via kernel parameter, or when memory
> nodes are onlined after hot-add; so this is something that admins
> configure. ZONE_MOVABLE is designed to gurantee memory hot-plug

Just note, the origin of ZONE_MOVABLE is to provide availability of
huge page, especially, hugetlb page. AFAIK, not guarantee memory
hot-plug. See following commit that introduces the ZONE_MOVABLE.

2a1e274 Create the ZONE_MOVABLE zone

> functionality, and not availability of THP, however, I did not know
> about the use case where some admins might configure ZONE_MOVABLE to

The usecase is lightly mentioned in previous discussion.

http://lkml.kernel.org/r/alpine.DEB.2.23.453.2011221300100.2830030@chino.kir.corp.google.com

Anyway, I agree with your other arguments and this patchset.

Thanks.
Michal Hocko Dec. 7, 2020, 12:13 p.m. UTC | #8
On Mon 07-12-20 16:12:50, Joonsoo Kim wrote:
> On Fri, Dec 04, 2020 at 12:50:56PM -0500, Pavel Tatashin wrote:
> > > > Yes, this indeed could be a problem for some configurations. I will
> > > > add your comment to the commit log of one of the patches.
> > >
> > > It sounds like there is some inherent tension here, breaking THP's
> > > when doing pin_user_pages() is a really nasty thing to do. DMA
> > > benefits greatly from THP.
> > >
> > > I know nothing about ZONE_MOVABLE, is this auto-setup or an admin
> > > option? If the result of this patch is standard systems can no longer
> > > pin > 80% of their memory I have some regression concerns..
> > 
> > ZONE_MOVABLE can be configured via kernel parameter, or when memory
> > nodes are onlined after hot-add; so this is something that admins
> > configure. ZONE_MOVABLE is designed to gurantee memory hot-plug
> 
> Just note, the origin of ZONE_MOVABLE is to provide availability of
> huge page, especially, hugetlb page. AFAIK, not guarantee memory
> hot-plug. See following commit that introduces the ZONE_MOVABLE.
> 
> 2a1e274 Create the ZONE_MOVABLE zone
> 
> > functionality, and not availability of THP, however, I did not know
> > about the use case where some admins might configure ZONE_MOVABLE to
> 
> The usecase is lightly mentioned in previous discussion.
> 
> http://lkml.kernel.org/r/alpine.DEB.2.23.453.2011221300100.2830030@chino.kir.corp.google.com
> 
> Anyway, I agree with your other arguments and this patchset.

Yes, historically the original motivation for the movable zone was to
help creating large pages via compaction. I also do remember Mel
not being particularly happy about that.

The thing is that the movability constrain is just too strict for this
usecases because the movable zone, especially a lot of it, might be
causing similar to lowmem/highmem problems very well known from 32b
world. So an admin had to be always very careful when configuring to not
cause zone pressure problems.

Later on, with a higher demand on the memory hotplug - especially the
hotremove usecases - it has become clear that the only reliable way for
the memory offlining is to rule out any unmovable memory out of the way
and that is why a rather strong properly of movable zone was relied on.

In the end we are in two rather different requirements here. One for
optimization and one for correctness. In this case I would much rather
focus on the correctness aspect.