[v5,0/5] mm/hugetlb: Early cow on fork, and a few cleanups

Message ID 20210217233547.93892-1-peterx@redhat.com (mailing list archive)

Message

Peter Xu Feb. 17, 2021, 11:35 p.m. UTC
v5:
- patch 4: change "int cow" into "bool cow"
- collect r-bs for Jason

v4:
- add r-b for Mike on the last patch, expand the commit message to explain
  why we don't need the wr-protect trick
- fix one warning of unused var in copy_present_page() [Gal]

v3:
- rebase to linux-next/akpm, switch to the new HPAGE helpers [MikeK]
- correct error check for alloc_huge_page(); test it this time to make sure
  fork() fails gracefully when overcommit [MikeK]
- move page copy out of pgtable lock: this changed quite a bit of the logic in
  the last patch, prealloc is dropped since I found it easier to understand
  without looping at all [MikeK]

v2:
- pass in 1 to alloc_huge_page() last param [Mike]
- reduce comment, unify the comment in one place [Linus]
- add r-bs for Mike and Miaohe

---- original cover letter ----

As reported by Gal [1], we are still missing the code to handle early cow for
the hugetlb case.  It still feels odd to me to fork() after using a few huge
pages, especially if they're privately mapped.  However, I agree with Gal and
Jason that we should have this, since it completes the early cow on fork
effort, and it fixes issues where buffers are not well under control and it is
not easy to apply MADV_DONTFORK.
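
For buffers that are under the application's control, the MADV_DONTFORK route
mentioned above looks roughly like the sketch below.  This is a minimal
userspace illustration, not code from the series: alloc_dontfork_buffer() is a
hypothetical helper name, and it uses an ordinary anonymous mapping rather than
a hugetlb one so that it runs without any reserved huge pages.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map an anonymous private buffer and mark it
 * MADV_DONTFORK, so that a later fork() does not duplicate the range
 * into the child at all.  Returns the mapping, or NULL on failure.
 */
void *alloc_dontfork_buffer(size_t len)
{
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return NULL;

	/* Exclude the range from any child created by fork(). */
	if (madvise(buf, len, MADV_DONTFORK) != 0) {
		munmap(buf, len);
		return NULL;
	}
	return buf;
}
```

This only helps when the application knows which ranges will be pinned for DMA
and can mark them up front; the series targets the case where it cannot.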

The first two patches (1-2) are some cleanups I noticed when reading into the
hugetlb reserve map code.  I think they're good to have but they're not
necessary for fixing the fork issue.

The last two patches (3-4) are the real fix.

I tested this with a fork() after some vfio-pci assignment, so I'm pretty sure
the page copy path is triggered correctly (the page is accounted right after
the fork()), but I didn't check the data since the card I assigned is some
random nic.  Gal, please feel free to try this if you have a better way to
verify the series.

  https://github.com/xzpeter/linux/tree/fork-cow-pin-huge

Please review, thanks!

[1] https://lore.kernel.org/lkml/27564187-4a08-f187-5a84-3df50009f6ca@amazon.com/

Peter Xu (5):
  hugetlb: Dedup the code to add a new file_region
  hugetlg: Break earlier in add_reservation_in_range() when we can
  mm: Introduce page_needs_cow_for_dma() for deciding whether cow
  mm: Use is_cow_mapping() across tree where proper
  hugetlb: Do early cow when page pinned on src mm

 drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c |   4 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c   |   2 +-
 fs/proc/task_mmu.c                         |   2 -
 include/linux/mm.h                         |  21 ++++
 mm/huge_memory.c                           |   8 +-
 mm/hugetlb.c                               | 123 +++++++++++++++------
 mm/internal.h                              |   5 -
 mm/memory.c                                |   8 +-
 8 files changed, 117 insertions(+), 56 deletions(-)
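
The decision that the page_needs_cow_for_dma() and is_cow_mapping() patches
are about can be modelled in userspace as below.  This is only a sketch under
stated assumptions: struct model_vma, struct model_page and
model_page_needs_cow_for_dma() are hypothetical stand-ins, and the exact set of
conditions the real helper checks (in particular the per-mm "has pinned" flag)
is an assumption here, not something the cover letter spells out.

```c
#include <stdbool.h>

/* Flag values redefined locally for the model (as in include/linux/mm.h). */
#define VM_SHARED   0x00000008UL
#define VM_MAYWRITE 0x00000020UL

/* Hypothetical stand-ins for the kernel state such a helper would consult. */
struct model_vma {
	unsigned long vm_flags;
	bool mm_has_pinned;	/* assumption: the mm has pinned pages */
};

struct model_page {
	bool maybe_dma_pinned;	/* assumption: the page may be pinned for DMA */
};

/* A mapping is COW-able when it is private and may become writable. */
bool is_cow_mapping(unsigned long flags)
{
	return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/*
 * Model of the fork-time decision: copy the page for the child right away
 * only if the mapping is COW-able and the page may be pinned in the parent.
 */
bool model_page_needs_cow_for_dma(const struct model_vma *vma,
				  const struct model_page *page)
{
	if (!is_cow_mapping(vma->vm_flags))
		return false;
	if (!vma->mm_has_pinned)
		return false;
	return page->maybe_dma_pinned;
}
```

In this model a shared mapping never needs the early copy, and a private
writable mapping needs it only when the page may be pinned.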

Comments

Peter Xu March 1, 2021, 2:11 p.m. UTC | #1
On Wed, Feb 17, 2021 at 06:35:42PM -0500, Peter Xu wrote:
> v5:
> - patch 4: change "int cow" into "bool cow"
> - collect r-bs for Jason

Andrew,

I just noticed 5.12-rc1 has been released; can this series still make it into
5.12, or does it need to wait for 5.13?

Thanks,
Andrew Morton March 2, 2021, 12:28 a.m. UTC | #2
On Mon, 1 Mar 2021 09:11:51 -0500 Peter Xu <peterx@redhat.com> wrote:

> On Wed, Feb 17, 2021 at 06:35:42PM -0500, Peter Xu wrote:
> > v5:
> > - patch 4: change "int cow" into "bool cow"
> > - collect r-bs for Jason
> 
> Andrew,
> 
> I just noticed 5.12-rc1 has released; is this series still possible to make it
> for 5.12, or needs to wait for 5.13?
> 

It has taken a while to settle down.  What is the case for
fast-tracking it into 5.12?
Jason Gunthorpe March 2, 2021, 12:30 a.m. UTC | #3
On Mon, Mar 01, 2021 at 04:28:46PM -0800, Andrew Morton wrote:
> On Mon, 1 Mar 2021 09:11:51 -0500 Peter Xu <peterx@redhat.com> wrote:
> 
> > On Wed, Feb 17, 2021 at 06:35:42PM -0500, Peter Xu wrote:
> > > v5:
> > > - patch 4: change "int cow" into "bool cow"
> > > - collect r-bs for Jason
> > 
> > Andrew,
> > 
> > I just noticed 5.12-rc1 has released; is this series still possible to make it
> > for 5.12, or needs to wait for 5.13?
> > 
> 
> It has taken a while to settle down.  What is the case for
> fast-tracking it into 5.12?

IIRC users that combine hugetlb, fork() and DMA will get the unexpected VA
corruption that triggered all this work.

Jason
Zhang, Wei March 2, 2021, 12:59 a.m. UTC | #4
Yes, such users include libfabric (https://ofiwg.github.io/libfabric/), which uses hugetlb pages.
 
On 3/1/21, 4:30 PM, "Jason Gunthorpe" <jgg@ziepe.ca> wrote:

    On Mon, Mar 01, 2021 at 04:28:46PM -0800, Andrew Morton wrote:
    > On Mon, 1 Mar 2021 09:11:51 -0500 Peter Xu <peterx@redhat.com> wrote:
    >
    > > On Wed, Feb 17, 2021 at 06:35:42PM -0500, Peter Xu wrote:
    > > > v5:
    > > > - patch 4: change "int cow" into "bool cow"
    > > > - collect r-bs for Jason
    > >
    > > Andrew,
    > >
    > > I just noticed 5.12-rc1 has released; is this series still possible to make it
    > > for 5.12, or needs to wait for 5.13?
    > >
    >
    > It has taken a while to settle down.  What is the case for
    > fast-tracking it into 5.12?

    IIRC hugetlb users and fork and DMA will get the unexpected VA
    corruption that triggered all this work.

    Jason
Peter Xu March 3, 2021, 1:46 a.m. UTC | #5
On Mon, Mar 01, 2021 at 04:28:46PM -0800, Andrew Morton wrote:
> On Mon, 1 Mar 2021 09:11:51 -0500 Peter Xu <peterx@redhat.com> wrote:
> 
> > On Wed, Feb 17, 2021 at 06:35:42PM -0500, Peter Xu wrote:
> > > v5:
> > > - patch 4: change "int cow" into "bool cow"
> > > - collect r-bs for Jason
> > 
> > Andrew,
> > 
> > I just noticed 5.12-rc1 has released; is this series still possible to make it
> > for 5.12, or needs to wait for 5.13?
> > 
> 
> It has taken a while to settle down.  What is the case for
> fast-tracking it into 5.12?

Andrew,

As Jason and Wei pointed out, I think some userspace still gets corrupted data
without this series when using a hugetlb backend.  I don't think it's suitable
for a late RC release, but it would still be great if it could be considered
as an early rc candidate, ideally rc1 of course.  If you prefer the other way,
I can also repost it before the 5.13 merge window opens.

Thanks,
Linus Torvalds March 3, 2021, 2:45 a.m. UTC | #6
On Tue, Mar 2, 2021 at 5:47 PM Peter Xu <peterx@redhat.com> wrote:
>
> As Jason and Wei pointed out, I think some userspace still got corrupted data
> without this series when using hugetlb backend.  I don't think it'll suite for
> a late RC release but it'll still be great if it can be considered as an early
> rc candidate, ideally rc1 of course.  If you prefer the other way, I can also
> repost it before 5.13 merge window opens.

I think with my merge window delay issue, you guys have the perfect
excuse for pushing it a bit late and it still hitting 5.12

                Linus
Jason Gunthorpe March 4, 2021, 5:10 p.m. UTC | #7
On Tue, Mar 02, 2021 at 06:45:49PM -0800, Linus Torvalds wrote:
> On Tue, Mar 2, 2021 at 5:47 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > As Jason and Wei pointed out, I think some userspace still got corrupted data
> > without this series when using hugetlb backend.  I don't think it'll suite for
> > a late RC release but it'll still be great if it can be considered as an early
> > rc candidate, ideally rc1 of course.  If you prefer the other way, I can also
> > repost it before 5.13 merge window opens.
> 
> I think with my merge window delay issue, you guys have the perfect
> excuse for pushing it a bit late and it still hitting 5.12

Andrew did you need something further from Peter? I don't see it in linux-next?

Jason