[0/2] Add a 'seqcount' between gup_fast and copy_page_range

Message ID 0-v1-281e425c752f+2df-gup_fork_jgg@nvidia.com (mailing list archive)

Message

Jason Gunthorpe Oct. 24, 2020, 12:19 a.m. UTC
As discussed and suggested by Linus, use a seqcount-like thing to close the
small race between gup_fast and copy_page_range.

Unfortunately the good suggestion to just use write_seqcount_begin() blows
up lockdep immediately due to the (new?) requirement that the write side
of seqcount be in a preempt disabled region. For this application it does
not seem like a good idea, nor is it necessary as we don't spin on retry.

So I open coded a similar construct. Don't like it, will redo this in some
other way if there is a better idea. Since seqcount seems to have this
property now, it also feels wrong to be the only place to use the raw_
functions specifically to avoid the lockdep checks and other parts of
seqcount on the read side.
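
To make the shape of such an open-coded construct concrete, here is a minimal
userspace sketch in C11. It is not the code from this series: struct fork_seq,
the fork_seq_*() helpers, and fast_path() are hypothetical names, and
sequentially consistent atomics stand in for the kernel's barriers. The point
it illustrates is the one made above: the writer is a plain increment with no
preempt-disabled requirement, and the reader never spins, it just gives up and
falls back.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-mm counter; not the field added by the patch. */
struct fork_seq {
	atomic_uint seq;	/* odd while the writer (fork) is copying */
};

/* Writer: an ordinary increment, so no preempt-disabled region is needed. */
static void fork_seq_write_begin(struct fork_seq *s)
{
	atomic_fetch_add(&s->seq, 1);	/* count becomes odd */
}

static void fork_seq_write_end(struct fork_seq *s)
{
	unsigned int old = atomic_fetch_add(&s->seq, 1);

	assert(old & 1u);	/* must pair with fork_seq_write_begin() */
}

/* Reader: snapshot the count; refuse the fast path if a writer is active. */
static bool fork_seq_read_begin(struct fork_seq *s, unsigned int *snap)
{
	*snap = atomic_load(&s->seq);
	return (*snap & 1u) == 0;
}

/* Reader: true if a writer started or finished since the snapshot. */
static bool fork_seq_read_retry(struct fork_seq *s, unsigned int snap)
{
	return atomic_load(&s->seq) != snap;
}

/*
 * Lockless fast path. Returns true if its result may be kept, false if the
 * caller must undo the work and fall back to a slow, locked path. It never
 * spins. 'racing_writer' simulates a fork running in the middle of the read.
 */
static bool fast_path(struct fork_seq *s, bool racing_writer)
{
	unsigned int snap;

	if (!fork_seq_read_begin(s, &snap))
		return false;

	/* ... the lockless page-table walk would happen here ... */
	if (racing_writer) {
		fork_seq_write_begin(s);
		fork_seq_write_end(s);
	}

	return !fork_seq_read_retry(s, snap);
}
```

In this sketch a reader that overlaps a writer at any point gets false back
exactly once and is expected to take the slow path, which is why a retry loop
(and the preemption concern that comes with it) is not needed.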

This can go after the merge window. I was able to test it using two
threads, one forking and the other using ibv_reg_mr() to trigger GUP
fast. Modifying copy_page_range() to sleep made the race window large
enough to hit reliably and exercise the logic.

Jason Gunthorpe (2):
  mm: reorganize internal_get_user_pages_fast()
  mm: prevent gup_fast from racing with COW during fork

 include/linux/mm_types.h |   6 +++
 kernel/fork.c            |   1 +
 mm/gup.c                 | 107 ++++++++++++++++++++++++---------------
 mm/memory.c              |  16 +++++-
 4 files changed, 87 insertions(+), 43 deletions(-)

Comments

John Hubbard Oct. 24, 2020, 5:14 a.m. UTC | #1
On 10/23/20 5:19 PM, Jason Gunthorpe wrote:
> As discussed and suggested by Linus, use a seqcount-like thing to close the
> small race between gup_fast and copy_page_range.
> 
> Unfortunately the good suggestion to just use write_seqcount_begin() blows
> up lockdep immediately due to the (new?) requirement that the write side
> of seqcount be in a preempt disabled region. For this application it does
> not seem like a good idea, nor is it necessary as we don't spin on retry.
> 
> So I open coded a similar construct. Don't like it, will redo this in some
> other way if there is a better idea. Since seqcount seems to have this
> property now, it also feels wrong to be the only place to use the raw_
> functions specifically to avoid the lockdep checks and other parts of
> seqcount on the read side.

I really think situations like this are exactly where the "raw" functions
are appropriate. Using a locking API would be much better here, IMHO
anyway, than having to work through the various rmb(), smp_*(), and other
barriers.

> 
> This can go after the merge window. I was able to test it using two
> threads, one forking and the other using ibv_reg_mr() to trigger GUP
> fast. Modifying copy_page_range() to sleep made the race window large
> enough to hit reliably and exercise the logic.
> 
> Jason Gunthorpe (2):
>    mm: reorganize internal_get_user_pages_fast()
>    mm: prevent gup_fast from racing with COW during fork
> 
>   include/linux/mm_types.h |   6 +++
>   kernel/fork.c            |   1 +
>   mm/gup.c                 | 107 ++++++++++++++++++++++++---------------
>   mm/memory.c              |  16 +++++-
>   4 files changed, 87 insertions(+), 43 deletions(-)
> 

thanks,