
RFC: get_user_pages_locked|unlocked to leverage VM_FAULT_RETRY

Message ID 20140926172535.GC4590@redhat.com (mailing list archive)
State New, archived

Commit Message

Andrea Arcangeli Sept. 26, 2014, 5:25 p.m. UTC
On Thu, Sep 25, 2014 at 02:50:29PM -0700, Andres Lagar-Cavilla wrote:
> It's nearly impossible to name it right because 1) it indicates we can
> relinquish 2) it returns whether we still hold the mmap semaphore.
> 
> I'd prefer it to be called mmap_sem_hold, which conveys immediately
> what this is about ("nonblocking" or "locked" could be about a whole
> lot of things)

To me FOLL_NOWAIT/FAULT_FLAG_RETRY_NOWAIT is nonblocking, while
"locked"/FAULT_FLAG_ALLOW_RETRY is still very much blocking, just
without the mmap_sem, so I called it "locked"... but I'm fine with
changing the name to mmap_sem_hold. Just get_user_pages_mmap_sem_hold
seems less friendly than get_user_pages_locked(..., &locked). "locked"
as you used it is intuitive when you later do "if (locked) up_read".

Then I added an _unlocked kind which is a drop-in replacement for many
places, just to clean things up.

get_user_pages_unlocked and get_user_pages_fast are equivalent in
semantics, so any call of get_user_pages_unlocked(current,
current->mm, ...) has no reason to exist and should be replaced with
get_user_pages_fast, unless "force = 1" (gup_fast has no force param,
just to make the argument list a bit more confusing across the various
versions of gup).
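
For example, a conversion of this form (as done below for a few
drivers; variable names illustrative):

    ret = get_user_pages_unlocked(current, current->mm, start,
                                  nr_pages, write, 0 /* force */, pages);

can simply become:

    ret = get_user_pages_fast(start, nr_pages, write, pages);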

get_user_pages over time should be phased out and dropped.

> I can see that. My background for coming into this is very similar: in
> a previous life we had a file system shim that would kick up into
> userspace for servicing VM memory. KVM just wouldn't let the file
> system give up the mmap semaphore. We had /proc readers hanging up all
> over the place while userspace was servicing. Not happy.
> 
> With KVM (now) and the standard x86 fault giving you ALLOW_RETRY, what
> stands in your way? Methinks that gup_fast has no slowpath fallback
> that turns on ALLOW_RETRY. What would oppose that being the global
> behavior?

It should become the global behavior. It just doesn't need to become
the global behavior immediately for all kinds of gups (e.g.
video4linux drivers will never need to poke into the KVM guest user
memory, so it doesn't matter if they don't use gup_locked
immediately). Even then we can still support
get_user_pages_locked(...., locked=NULL) for ptrace/coredump and other
things that may not want to trigger the userfaultfd protocol and just
get an immediate VM_FAULT_SIGBUS.

Userfaults will just return VM_FAULT_SIGBUS (translated to -EFAULT by
all gup invocations) and not invoke the userfaultfd protocol if
FAULT_FLAG_ALLOW_RETRY is not set. So any gup_locked with locked ==
NULL, or gup() (without a locked parameter), will not invoke the
userfaultfd protocol.
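
In other words the check would look something like this (a sketch of
the behavior described above, not actual code from the userfaultfd
patchset; the signature is illustrative):

    static int handle_userfault(struct vm_area_struct *vma,
                                unsigned long address, unsigned int flags)
    {
            if (!(flags & FAULT_FLAG_ALLOW_RETRY))
                    /* gup() or gup_locked(..., NULL): skip the
                     * protocol; gup callers will see -EFAULT */
                    return VM_FAULT_SIGBUS;

            /* drop the mmap_sem, notify userland, wait for it to
             * resolve the fault, then make the caller retry */
            return VM_FAULT_RETRY;
    }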

But I need gup_fast to use FAULT_FLAG_ALLOW_RETRY because core places
like O_DIRECT use it.

I tried to do an RFC patch below that goes in this direction and
should be enough, for a start, to solve all my issues with the
mmap_sem holding inside handle_userfault(); comments welcome.



This isn't bisectable in this order and it's untested anyway. It needs
more patch splits.

This is just an initial RFC to know if it's ok to go in this
direction.

If it's ok I'll do some testing and submit it more properly. If your
patches go in first, that's fine, and I'll just replace the call in
KVM with get_user_pages_unlocked (or whatever we want to call that
thing).

I'd need to get this (or an equivalent solution) merged before
re-submitting the userfaultfd patchset. I think the above benefits the
kernel as a whole in terms of mmap_sem hold times regardless of
userfaultfd, so it should be good.

> Well, IIUC every code path that has ALLOW_RETRY dives in the second
> time with FAULT_TRIED or similar. In the common case, you happily
> blaze through the second time, but if someone raced in while all locks
> were given up, one pays the price of the second time being a full
> fault hogging the mmap sem. At some point you need to stop being
> polite, otherwise the task starves. Presumably the risk of an extra
> retry drops steeply with every new gup retry. Maybe just trying three
> times is good enough. It makes for ugly logic though.

I was under the impression that if one looped forever with
VM_FAULT_RETRY it'd eventually succeed, but it risks doing more work,
so I'm also sticking to the "locked != NULL" gup first, then seeking
to the first page that returned VM_FAULT_RETRY, issuing a nr_pages=1
gup with locked == NULL, and continuing with locked != NULL at the
next page. Just like you did in the KVM slow path. And if the "pages"
array is NULL I bail out at the first VM_FAULT_RETRY failure without
insisting with further gup calls of the "&locked" kind; your patch had
just 1 page, but you also bailed out.

What the code above does is basically generalize your KVM
optimization and make it global, and at the same time it saves me
trouble in handle_userfault().

While at it I also converted some obvious candidates for gup_fast
that had no reason to run slower (which I should split off into a
separate patch).

Thanks,
Andrea

Comments

Andres Lagar-Cavilla Sept. 26, 2014, 7:54 p.m. UTC | #1
On Fri, Sep 26, 2014 at 10:25 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> On Thu, Sep 25, 2014 at 02:50:29PM -0700, Andres Lagar-Cavilla wrote:
>> It's nearly impossible to name it right because 1) it indicates we can
>> relinquish 2) it returns whether we still hold the mmap semaphore.
>>
>> I'd prefer it to be called mmap_sem_hold, which conveys immediately
>> what this is about ("nonblocking" or "locked" could be about a whole
>> lot of things)
>
> To me FOLL_NOWAIT/FAULT_FLAG_RETRY_NOWAIT is nonblocking, while
> "locked"/FAULT_FLAG_ALLOW_RETRY is still very much blocking, just
> without the mmap_sem, so I called it "locked"... but I'm fine with
> changing the name to mmap_sem_hold. Just get_user_pages_mmap_sem_hold
> seems less friendly than get_user_pages_locked(..., &locked). "locked"
> as you used it is intuitive when you later do "if (locked) up_read".
>

Heh. I was previously referring to the int *locked param, not the
_(un)locked suffix. That param is all about the mmap semaphore, so why
not name it less ambiguously? It's essentially a tristate.
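
I.e., as a summary comment (my reading of the patch, for illustration):

    int *locked;
    /*
     * locked == NULL: the caller can't handle the mmap_sem being
     *                 dropped, so VM_FAULT_RETRY is never used
     * *locked == 1:   mmap_sem held on entry and still held on return
     * *locked == 0:   (on return) gup dropped the mmap_sem
     */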

My suggestion is that you just make gup behave as your proposed
gup_locked, and no need to introduce another call. But I understand if
you want to phase this out politely.

> Then I added an _unlocked kind which is a drop-in replacement for many
> places, just to clean things up.
>
> get_user_pages_unlocked and get_user_pages_fast are equivalent in
> semantics, so any call of get_user_pages_unlocked(current,
> current->mm, ...) has no reason to exist and should be replaced with
> get_user_pages_fast, unless "force = 1" (gup_fast has no force param,
> just to make the argument list a bit more confusing across the various
> versions of gup).
>
> get_user_pages over time should be phased out and dropped.

Please. Too many variants. So the end goal is
* __gup_fast
* gup_fast == __gup_fast + gup_unlocked for fallback
* gup (or gup_locked)
* gup_unlocked
(and flat __gup remains buried in the impl)?

>
>> I can see that. My background for coming into this is very similar: in
>> a previous life we had a file system shim that would kick up into
>> userspace for servicing VM memory. KVM just wouldn't let the file
>> system give up the mmap semaphore. We had /proc readers hanging up all
>> over the place while userspace was servicing. Not happy.
>>
>> With KVM (now) and the standard x86 fault giving you ALLOW_RETRY, what
>> stands in your way? Methinks that gup_fast has no slowpath fallback
>> that turns on ALLOW_RETRY. What would oppose that being the global
>> behavior?
>
> It should become the global behavior. It just doesn't need to become
> the global behavior immediately for all kinds of gups (e.g.
> video4linux drivers will never need to poke into the KVM guest user
> memory, so it doesn't matter if they don't use gup_locked
> immediately). Even then we can still support
> get_user_pages_locked(...., locked=NULL) for ptrace/coredump and other
> things that may not want to trigger the userfaultfd protocol and just
> get an immediate VM_FAULT_SIGBUS.
>
> Userfaults will just return VM_FAULT_SIGBUS (translated to -EFAULT by
> all gup invocations) and not invoke the userfaultfd protocol if
> FAULT_FLAG_ALLOW_RETRY is not set. So any gup_locked with locked ==
> NULL, or gup() (without a locked parameter), will not invoke the
> userfaultfd protocol.
>
> But I need gup_fast to use FAULT_FLAG_ALLOW_RETRY because core places
> like O_DIRECT use it.
>
> I tried to do an RFC patch below that goes in this direction and
> should be enough, for a start, to solve all my issues with the
> mmap_sem holding inside handle_userfault(); comments welcome.
>
> =======
> From 41918f7d922d1e7fc70f117db713377e7e2af6e9 Mon Sep 17 00:00:00 2001
> From: Andrea Arcangeli <aarcange@redhat.com>
> Date: Fri, 26 Sep 2014 18:36:53 +0200
> Subject: [PATCH 1/2] mm: gup: add get_user_pages_locked and
>  get_user_pages_unlocked
>
> We can leverage the VM_FAULT_RETRY functionality in the page fault
> paths better by using either get_user_pages_locked or
> get_user_pages_unlocked.
>
> The former allows conversion of get_user_pages invocations that will
> have to pass a "&locked" parameter to know if the mmap_sem was dropped
> during the call. Example from:
>
>     down_read(&mm->mmap_sem);
>     do_something()
>     get_user_pages(tsk, mm, ..., pages, NULL);
>     up_read(&mm->mmap_sem);
>
> to:
>
>     int locked = 1;
>     down_read(&mm->mmap_sem);
>     do_something()
>     get_user_pages_locked(tsk, mm, ..., pages, &locked);
>     if (locked)
>         up_read(&mm->mmap_sem);
>
> The latter is suitable only as a drop-in replacement of the form:
>
>     down_read(&mm->mmap_sem);
>     get_user_pages(tsk, mm, ..., pages, NULL);
>     up_read(&mm->mmap_sem);
>
> into:
>
>     get_user_pages_unlocked(tsk, mm, ..., pages);
>
> Where tsk, mm, the intermediate "..." parameters and "pages" can be any
> value as before. Just the last parameter of get_user_pages (vmas) must
> be NULL for get_user_pages_locked|unlocked to be usable (the latter
> original form wouldn't have been safe anyway if vmas wasn't NULL; for
> the former we just make it explicit by dropping the parameter).
>
> If vmas is not NULL these two methods cannot be used.
>
> This patch then applies the new forms in various places, in some cases
> also replacing it with get_user_pages_fast whenever tsk and mm are
> current and current->mm. get_user_pages_unlocked varies from
> get_user_pages_fast only if mm is not current->mm (like when
> get_user_pages works on some other process's mm). Whenever tsk and mm
> match current and current->mm, get_user_pages_fast must always be
> used to increase performance and get the pages locklessly (only with
> irqs disabled).

Basically all this discussion should go into the patch as comments.
Help people shortcut git blame.

>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>  arch/mips/mm/gup.c                 |   8 +-
>  arch/powerpc/mm/gup.c              |   6 +-
>  arch/s390/kvm/kvm-s390.c           |   4 +-
>  arch/s390/mm/gup.c                 |   6 +-
>  arch/sh/mm/gup.c                   |   6 +-
>  arch/sparc/mm/gup.c                |   6 +-
>  arch/x86/mm/gup.c                  |   7 +-
>  drivers/dma/iovlock.c              |  10 +--
>  drivers/iommu/amd_iommu_v2.c       |   6 +-
>  drivers/media/pci/ivtv/ivtv-udma.c |   6 +-
>  drivers/misc/sgi-gru/grufault.c    |   3 +-
>  drivers/scsi/st.c                  |  10 +--
>  drivers/video/fbdev/pvr2fb.c       |   5 +-
>  include/linux/mm.h                 |   7 ++
>  mm/gup.c                           | 147 ++++++++++++++++++++++++++++++++++---
>  mm/mempolicy.c                     |   2 +-
>  mm/nommu.c                         |  23 ++++++
>  mm/process_vm_access.c             |   7 +-
>  mm/util.c                          |  10 +--
>  net/ceph/pagevec.c                 |   9 +--
>  20 files changed, 200 insertions(+), 88 deletions(-)
>
> diff --git a/arch/mips/mm/gup.c b/arch/mips/mm/gup.c
> index 06ce17c..20884f5 100644
> --- a/arch/mips/mm/gup.c
> +++ b/arch/mips/mm/gup.c
> @@ -301,11 +301,9 @@ slow_irqon:
>         start += nr << PAGE_SHIFT;
>         pages += nr;
>
> -       down_read(&mm->mmap_sem);
> -       ret = get_user_pages(current, mm, start,
> -                               (end - start) >> PAGE_SHIFT,
> -                               write, 0, pages, NULL);
> -       up_read(&mm->mmap_sem);
> +       ret = get_user_pages_unlocked(current, mm, start,
> +                                     (end - start) >> PAGE_SHIFT,
> +                                     write, 0, pages);
>
>         /* Have to be a bit careful with return values */
>         if (nr > 0) {
> diff --git a/arch/powerpc/mm/gup.c b/arch/powerpc/mm/gup.c
> index d874668..b70c34a 100644
> --- a/arch/powerpc/mm/gup.c
> +++ b/arch/powerpc/mm/gup.c
> @@ -215,10 +215,8 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
>                 start += nr << PAGE_SHIFT;
>                 pages += nr;
>
> -               down_read(&mm->mmap_sem);
> -               ret = get_user_pages(current, mm, start,
> -                                    nr_pages - nr, write, 0, pages, NULL);
> -               up_read(&mm->mmap_sem);
> +               ret = get_user_pages_unlocked(current, mm, start,
> +                                             nr_pages - nr, write, 0, pages);
>
>                 /* Have to be a bit careful with return values */
>                 if (nr > 0) {
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 81b0e11..37ca29a 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -1092,9 +1092,7 @@ long kvm_arch_fault_in_page(struct kvm_vcpu *vcpu, gpa_t gpa, int writable)
>         hva = gmap_fault(gpa, vcpu->arch.gmap);
>         if (IS_ERR_VALUE(hva))
>                 return (long)hva;
> -       down_read(&mm->mmap_sem);
> -       rc = get_user_pages(current, mm, hva, 1, writable, 0, NULL, NULL);
> -       up_read(&mm->mmap_sem);
> +       rc = get_user_pages_unlocked(current, mm, hva, 1, writable, 0, NULL);
>
>         return rc < 0 ? rc : 0;
>  }
> diff --git a/arch/s390/mm/gup.c b/arch/s390/mm/gup.c
> index 639fce46..5c586c7 100644
> --- a/arch/s390/mm/gup.c
> +++ b/arch/s390/mm/gup.c
> @@ -235,10 +235,8 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
>         /* Try to get the remaining pages with get_user_pages */
>         start += nr << PAGE_SHIFT;
>         pages += nr;
> -       down_read(&mm->mmap_sem);
> -       ret = get_user_pages(current, mm, start,
> -                            nr_pages - nr, write, 0, pages, NULL);
> -       up_read(&mm->mmap_sem);
> +       ret = get_user_pages_unlocked(current, mm, start,
> +                            nr_pages - nr, write, 0, pages);
>         /* Have to be a bit careful with return values */
>         if (nr > 0)
>                 ret = (ret < 0) ? nr : ret + nr;
> diff --git a/arch/sh/mm/gup.c b/arch/sh/mm/gup.c
> index 37458f3..e15f52a 100644
> --- a/arch/sh/mm/gup.c
> +++ b/arch/sh/mm/gup.c
> @@ -257,10 +257,8 @@ slow_irqon:
>                 start += nr << PAGE_SHIFT;
>                 pages += nr;
>
> -               down_read(&mm->mmap_sem);
> -               ret = get_user_pages(current, mm, start,
> -                       (end - start) >> PAGE_SHIFT, write, 0, pages, NULL);
> -               up_read(&mm->mmap_sem);
> +               ret = get_user_pages_unlocked(current, mm, start,
> +                       (end - start) >> PAGE_SHIFT, write, 0, pages);
>
>                 /* Have to be a bit careful with return values */
>                 if (nr > 0) {
> diff --git a/arch/sparc/mm/gup.c b/arch/sparc/mm/gup.c
> index 1aed043..fa7de7d 100644
> --- a/arch/sparc/mm/gup.c
> +++ b/arch/sparc/mm/gup.c
> @@ -219,10 +219,8 @@ slow:
>                 start += nr << PAGE_SHIFT;
>                 pages += nr;
>
> -               down_read(&mm->mmap_sem);
> -               ret = get_user_pages(current, mm, start,
> -                       (end - start) >> PAGE_SHIFT, write, 0, pages, NULL);
> -               up_read(&mm->mmap_sem);
> +               ret = get_user_pages_unlocked(current, mm, start,
> +                       (end - start) >> PAGE_SHIFT, write, 0, pages);
>
>                 /* Have to be a bit careful with return values */
>                 if (nr > 0) {
> diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
> index 207d9aef..2ab183b 100644
> --- a/arch/x86/mm/gup.c
> +++ b/arch/x86/mm/gup.c
> @@ -388,10 +388,9 @@ slow_irqon:
>                 start += nr << PAGE_SHIFT;
>                 pages += nr;
>
> -               down_read(&mm->mmap_sem);
> -               ret = get_user_pages(current, mm, start,
> -                       (end - start) >> PAGE_SHIFT, write, 0, pages, NULL);
> -               up_read(&mm->mmap_sem);
> +               ret = get_user_pages_unlocked(current, mm, start,
> +                                             (end - start) >> PAGE_SHIFT,
> +                                             write, 0, pages);
>
>                 /* Have to be a bit careful with return values */
>                 if (nr > 0) {
> diff --git a/drivers/dma/iovlock.c b/drivers/dma/iovlock.c
> index bb48a57..12ea7c3 100644
> --- a/drivers/dma/iovlock.c
> +++ b/drivers/dma/iovlock.c
> @@ -95,17 +95,11 @@ struct dma_pinned_list *dma_pin_iovec_pages(struct iovec *iov, size_t len)
>                 pages += page_list->nr_pages;
>
>                 /* pin pages down */
> -               down_read(&current->mm->mmap_sem);
> -               ret = get_user_pages(
> -                       current,
> -                       current->mm,
> +               ret = get_user_pages_fast(
>                         (unsigned long) iov[i].iov_base,
>                         page_list->nr_pages,
>                         1,      /* write */
> -                       0,      /* force */
> -                       page_list->pages,
> -                       NULL);
> -               up_read(&current->mm->mmap_sem);
> +                       page_list->pages);
>
>                 if (ret != page_list->nr_pages)
>                         goto unpin;
> diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
> index 5f578e8..6963b73 100644
> --- a/drivers/iommu/amd_iommu_v2.c
> +++ b/drivers/iommu/amd_iommu_v2.c
> @@ -519,10 +519,8 @@ static void do_fault(struct work_struct *work)
>
>         write = !!(fault->flags & PPR_FAULT_WRITE);
>
> -       down_read(&fault->state->mm->mmap_sem);
> -       npages = get_user_pages(NULL, fault->state->mm,
> -                               fault->address, 1, write, 0, &page, NULL);
> -       up_read(&fault->state->mm->mmap_sem);
> +       npages = get_user_pages_unlocked(NULL, fault->state->mm,
> +                                        fault->address, 1, write, 0, &page);
>
>         if (npages == 1) {
>                 put_page(page);
> diff --git a/drivers/media/pci/ivtv/ivtv-udma.c b/drivers/media/pci/ivtv/ivtv-udma.c
> index 7338cb2..96d866b 100644
> --- a/drivers/media/pci/ivtv/ivtv-udma.c
> +++ b/drivers/media/pci/ivtv/ivtv-udma.c
> @@ -124,10 +124,8 @@ int ivtv_udma_setup(struct ivtv *itv, unsigned long ivtv_dest_addr,
>         }
>
>         /* Get user pages for DMA Xfer */
> -       down_read(&current->mm->mmap_sem);
> -       err = get_user_pages(current, current->mm,
> -                       user_dma.uaddr, user_dma.page_count, 0, 1, dma->map, NULL);
> -       up_read(&current->mm->mmap_sem);
> +       err = get_user_pages_unlocked(current, current->mm,
> +                       user_dma.uaddr, user_dma.page_count, 0, 1, dma->map);
>
>         if (user_dma.page_count != err) {
>                 IVTV_DEBUG_WARN("failed to map user pages, returned %d instead of %d\n",
> diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> index f74fc0c..cd20669 100644
> --- a/drivers/misc/sgi-gru/grufault.c
> +++ b/drivers/misc/sgi-gru/grufault.c
> @@ -198,8 +198,7 @@ static int non_atomic_pte_lookup(struct vm_area_struct *vma,
>  #else
>         *pageshift = PAGE_SHIFT;
>  #endif
> -       if (get_user_pages
> -           (current, current->mm, vaddr, 1, write, 0, &page, NULL) <= 0)
> +       if (get_user_pages_fast(vaddr, 1, write, &page) <= 0)
>                 return -EFAULT;
>         *paddr = page_to_phys(page);
>         put_page(page);
> diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
> index aff9689..c89dcfa 100644
> --- a/drivers/scsi/st.c
> +++ b/drivers/scsi/st.c
> @@ -4536,18 +4536,12 @@ static int sgl_map_user_pages(struct st_buffer *STbp,
>                 return -ENOMEM;
>
>          /* Try to fault in all of the necessary pages */
> -       down_read(&current->mm->mmap_sem);
>          /* rw==READ means read from drive, write into memory area */
> -       res = get_user_pages(
> -               current,
> -               current->mm,
> +       res = get_user_pages_fast(
>                 uaddr,
>                 nr_pages,
>                 rw == READ,
> -               0, /* don't force */
> -               pages,
> -               NULL);
> -       up_read(&current->mm->mmap_sem);
> +               pages);
>
>         /* Errors and no page mapped should return here */
>         if (res < nr_pages)
> diff --git a/drivers/video/fbdev/pvr2fb.c b/drivers/video/fbdev/pvr2fb.c
> index 167cfff..ff81f65 100644
> --- a/drivers/video/fbdev/pvr2fb.c
> +++ b/drivers/video/fbdev/pvr2fb.c
> @@ -686,10 +686,7 @@ static ssize_t pvr2fb_write(struct fb_info *info, const char *buf,
>         if (!pages)
>                 return -ENOMEM;
>
> -       down_read(&current->mm->mmap_sem);
> -       ret = get_user_pages(current, current->mm, (unsigned long)buf,
> -                            nr_pages, WRITE, 0, pages, NULL);
> -       up_read(&current->mm->mmap_sem);
> +       ret = get_user_pages_fast((unsigned long)buf, nr_pages, WRITE, pages);
>
>         if (ret < nr_pages) {
>                 nr_pages = ret;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 32ba786..69f692d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1197,6 +1197,13 @@ long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>                     unsigned long start, unsigned long nr_pages,
>                     int write, int force, struct page **pages,
>                     struct vm_area_struct **vmas);
> +long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm,
> +                   unsigned long start, unsigned long nr_pages,
> +                   int write, int force, struct page **pages,
> +                   int *locked);
> +long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
> +                   unsigned long start, unsigned long nr_pages,
> +                   int write, int force, struct page **pages);
>  int get_user_pages_fast(unsigned long start, int nr_pages, int write,
>                         struct page **pages);
>  struct kvec;
> diff --git a/mm/gup.c b/mm/gup.c
> index 91d044b..19e17ab 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -576,6 +576,134 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
>         return 0;
>  }
>
> +static inline long __get_user_pages_locked(struct task_struct *tsk,
> +                                          struct mm_struct *mm,
> +                                          unsigned long start,
> +                                          unsigned long nr_pages,
> +                                          int write, int force,
> +                                          struct page **pages,
> +                                          struct vm_area_struct **vmas,
> +                                          int *locked,
> +                                          bool immediate_unlock)
s/immediate_unlock/notify_drop/
> +{
> +       int flags = FOLL_TOUCH;
> +       long ret, pages_done;
> +       bool lock_dropped;
s/lock_dropped/sem_dropped/
> +
> +       if (locked) {
> +               /* if VM_FAULT_RETRY can be returned, vmas become invalid */
> +               BUG_ON(vmas);
> +               /* check caller initialized locked */
> +               BUG_ON(*locked != 1);
> +       } else {
> +               /*
> +                * Not really important, the value is irrelevant if
> +                * locked is NULL, but BUILD_BUG_ON costs nothing.
> +                */
> +               BUILD_BUG_ON(immediate_unlock);
> +       }
> +
> +       if (pages)
> +               flags |= FOLL_GET;
> +       if (write)
> +               flags |= FOLL_WRITE;
> +       if (force)
> +               flags |= FOLL_FORCE;
> +
> +       pages_done = 0;
> +       lock_dropped = false;
> +       for (;;) {
> +               ret = __get_user_pages(tsk, mm, start, nr_pages, flags, pages,
> +                                      vmas, locked);
> +               if (!locked)
> +                       /* VM_FAULT_RETRY couldn't trigger, bypass */
> +                       return ret;
> +
> +               /* VM_FAULT_RETRY cannot return errors */
> +               if (!*locked) {

Set lock_dropped = true here, in case we break out too soon (which we
do if nr_pages drops to zero a couple of lines below) and report a
stale value.
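
I.e., something like this (a sketch of the suggested placement against
the hunk above):

    /* VM_FAULT_RETRY cannot return errors */
    if (!*locked) {
            lock_dropped = true;    /* record the drop right away */
            BUG_ON(ret < 0);
            BUG_ON(nr_pages == 1 && ret);
    }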

> +                       BUG_ON(ret < 0);
> +                       BUG_ON(nr_pages == 1 && ret);
> +               }
> +
> +               if (!pages)
> +                       /* If it's a prefault don't insist harder */
> +                       return ret;
> +
> +               if (ret > 0) {
> +                       nr_pages -= ret;
> +                       pages_done += ret;
> +                       if (!nr_pages)
> +                               break;
> +               }
> +               if (*locked) {
> +                       /* VM_FAULT_RETRY didn't trigger */
> +                       if (!pages_done)
> +                               pages_done = ret;

Replace the top two lines with:
if (ret > 0)
    pages_done += ret;

> +                       break;
> +               }
> +               /* VM_FAULT_RETRY triggered, so seek to the faulting offset */
> +               pages += ret;
> +               start += ret << PAGE_SHIFT;
> +
> +               /*
> +                * Repeat on the address that fired VM_FAULT_RETRY
> +                * without FAULT_FLAG_ALLOW_RETRY but with
> +                * FAULT_FLAG_TRIED.
> +                */
> +               *locked = 1;
> +               lock_dropped = true;

Not really needed if set where previously suggested.

> +               down_read(&mm->mmap_sem);
> +               ret = __get_user_pages(tsk, mm, start, nr_pages, flags | FOLL_TRIED,
> +                                      pages, NULL, NULL);

s/nr_pages/1/, otherwise we block on everything left ahead, not just
the one that fired RETRY.

> +               if (ret != 1) {
> +                       BUG_ON(ret > 1);

Can ret ever be zero here with count == 1? (ENOENT for a stack guard
page TTBOMK, but what the heck are we doing gup'ing stacks? Suggest
fixing that one case inside the __gup impl so count == 1 never returns
zero.)

> +                       if (!pages_done)
> +                               pages_done = ret;

Don't think so. ret is -errno at this point (maybe zero). So remove.

> +                       break;
> +               }
> +               nr_pages--;
> +               pages_done++;
> +               if (!nr_pages)
> +                       break;
> +               pages++;
> +               start += PAGE_SIZE;
> +       }
> +       if (!immediate_unlock && lock_dropped && *locked) {
> +               /*
> +                * We must let the caller know we temporarily dropped the lock
> +                * and so the critical section protected by it was lost.
> +                */
> +               up_read(&mm->mmap_sem);

With my suggestion of s/immediate_unlock/notify_drop/ this gets a lot
more understandable (IMHO).

> +               *locked = 0;
> +       }
> +       return pages_done;
> +}
> +
> +long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm,
> +                          unsigned long start, unsigned long nr_pages,
> +                          int write, int force, struct page **pages,
> +                          int *locked)
> +{
> +       return __get_user_pages_locked(tsk, mm, start, nr_pages, write, force,
> +                                      pages, NULL, locked, false);
> +}
> +EXPORT_SYMBOL(get_user_pages_locked);
> +
> +long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
> +                            unsigned long start, unsigned long nr_pages,
> +                            int write, int force, struct page **pages)
> +{
> +       long ret;
> +       int locked = 1;
> +       down_read(&mm->mmap_sem);
> +       ret = __get_user_pages_locked(tsk, mm, start, nr_pages, write, force,
> +                                     pages, NULL, &locked, true);
> +       if (locked)
> +               up_read(&mm->mmap_sem);
> +       return ret;
> +}
> +EXPORT_SYMBOL(get_user_pages_unlocked);
> +
>  /*
>   * get_user_pages() - pin user pages in memory
>   * @tsk:       the task_struct to use for page fault accounting, or
> @@ -625,22 +753,19 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
>   * use the correct cache flushing APIs.
>   *
>   * See also get_user_pages_fast, for performance critical applications.
> + *
> + * get_user_pages should be gradually obsoleted in favor of
> + * get_user_pages_locked|unlocked. Nothing should use get_user_pages
> + * because it cannot pass FAULT_FLAG_ALLOW_RETRY to handle_mm_fault in
> + * turn disabling the userfaultfd feature (after that "inline" can be
> + * cleaned up from get_user_pages_locked).
>   */
>  long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>                 unsigned long start, unsigned long nr_pages, int write,
>                 int force, struct page **pages, struct vm_area_struct **vmas)
>  {
> -       int flags = FOLL_TOUCH;
> -
> -       if (pages)
> -               flags |= FOLL_GET;
> -       if (write)
> -               flags |= FOLL_WRITE;
> -       if (force)
> -               flags |= FOLL_FORCE;
> -
> -       return __get_user_pages(tsk, mm, start, nr_pages, flags, pages, vmas,
> -                               NULL);
> +       return __get_user_pages_locked(tsk, mm, start, nr_pages, write, force,
> +                                      pages, vmas, NULL, false);
>  }
>  EXPORT_SYMBOL(get_user_pages);

*Or*, forget about gup_locked and just leave gup as proposed in this
patch. Then gup_unlocked (again IMHO) becomes more meaningful ... "Ah,
that's the one I call when I have no locks taken".

>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 8f5330d..6606c10 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -881,7 +881,7 @@ static int lookup_node(struct mm_struct *mm, unsigned long addr)
>         struct page *p;
>         int err;
>
> -       err = get_user_pages(current, mm, addr & PAGE_MASK, 1, 0, 0, &p, NULL);
> +       err = get_user_pages_fast(addr & PAGE_MASK, 1, 0, &p);
>         if (err >= 0) {
>                 err = page_to_nid(p);
>                 put_page(p);
> diff --git a/mm/nommu.c b/mm/nommu.c
> index a881d96..8a06341 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -213,6 +213,29 @@ long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>  }
>  EXPORT_SYMBOL(get_user_pages);
>
> +long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm,
> +                          unsigned long start, unsigned long nr_pages,
> +                          int write, int force, struct page **pages,
> +                          int *locked)
> +{
> +       return get_user_pages(tsk, mm, start, nr_pages, write, force,
> +                             pages, NULL);
> +}
> +EXPORT_SYMBOL(get_user_pages_locked);
> +
> +long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
> +                            unsigned long start, unsigned long nr_pages,
> +                            int write, int force, struct page **pages)
> +{
> +       long ret;
> +       down_read(&mm->mmap_sem);
> +       ret = get_user_pages(tsk, mm, start, nr_pages, write, force,
> +                            pages, NULL);
> +       up_read(&mm->mmap_sem);
> +       return ret;
> +}
> +EXPORT_SYMBOL(get_user_pages_unlocked);
> +
>  /**
>   * follow_pfn - look up PFN at a user virtual address
>   * @vma: memory mapping
> diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
> index 5077afc..b159769 100644
> --- a/mm/process_vm_access.c
> +++ b/mm/process_vm_access.c
> @@ -99,11 +99,8 @@ static int process_vm_rw_single_vec(unsigned long addr,
>                 size_t bytes;
>
>                 /* Get the pages we're interested in */
> -               down_read(&mm->mmap_sem);
> -               pages = get_user_pages(task, mm, pa, pages,
> -                                     vm_write, 0, process_pages, NULL);
> -               up_read(&mm->mmap_sem);
> -
> +               pages = get_user_pages_unlocked(task, mm, pa, pages,
> +                                               vm_write, 0, process_pages);
>                 if (pages <= 0)
>                         return -EFAULT;
>
> diff --git a/mm/util.c b/mm/util.c
> index 093c973..1b93f2d 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -247,14 +247,8 @@ int __weak get_user_pages_fast(unsigned long start,
>                                 int nr_pages, int write, struct page **pages)
>  {
>         struct mm_struct *mm = current->mm;
> -       int ret;
> -
> -       down_read(&mm->mmap_sem);
> -       ret = get_user_pages(current, mm, start, nr_pages,
> -                                       write, 0, pages, NULL);
> -       up_read(&mm->mmap_sem);
> -
> -       return ret;
> +       return get_user_pages_unlocked(current, mm, start, nr_pages,
> +                                      write, 0, pages);
>  }
>  EXPORT_SYMBOL_GPL(get_user_pages_fast);
>
> diff --git a/net/ceph/pagevec.c b/net/ceph/pagevec.c
> index 5550130..5504783 100644
> --- a/net/ceph/pagevec.c
> +++ b/net/ceph/pagevec.c
> @@ -23,17 +23,16 @@ struct page **ceph_get_direct_page_vector(const void __user *data,
>         if (!pages)
>                 return ERR_PTR(-ENOMEM);
>
> -       down_read(&current->mm->mmap_sem);
>         while (got < num_pages) {
> -               rc = get_user_pages(current, current->mm,
> -                   (unsigned long)data + ((unsigned long)got * PAGE_SIZE),
> -                   num_pages - got, write_page, 0, pages + got, NULL);
> +               rc = get_user_pages_fast((unsigned long)data +
> +                                        ((unsigned long)got * PAGE_SIZE),
> +                                        num_pages - got,
> +                                        write_page, pages + got);
>                 if (rc < 0)
>                         break;
>                 BUG_ON(rc == 0);
>                 got += rc;
>         }
> -       up_read(&current->mm->mmap_sem);
>         if (rc < 0)
>                 goto fail;
>         return pages;
>
>
> Then to make an example your patch would have become:
>
> ===
> From 74d88763cde285354fb78806ffb332030d1f0739 Mon Sep 17 00:00:00 2001
> From: Andres Lagar-Cavilla <andreslc@google.com>
> Date: Fri, 26 Sep 2014 18:36:56 +0200
> Subject: [PATCH 2/2] kvm: Faults which trigger IO release the mmap_sem
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> When KVM handles a tdp fault it uses FOLL_NOWAIT. If the guest memory
> has been swapped out or is behind a filemap, this will trigger async
> readahead and return immediately. The rationale is that KVM will kick
> back the guest with an "async page fault" and allow for some other
> guest process to take over.
>
> If async PFs are enabled the fault is retried asap from an async
> workqueue. If not, it's retried immediately in the same code path. In
> either case the retry will not relinquish the mmap semaphore and will
> block on the IO. This is a bad thing, as other mmap semaphore users
> now stall as a function of swap or filemap latency.
>
> This patch ensures both the regular and async PF path re-enter the
> fault allowing for the mmap semaphore to be relinquished in the case
> of IO wait.
>
> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>  include/linux/mm.h  | 1 +
>  mm/gup.c            | 4 ++++
>  virt/kvm/async_pf.c | 4 +---
>  virt/kvm/kvm_main.c | 4 ++--
>  4 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 69f692d..71dbe03 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1997,6 +1997,7 @@ static inline struct page *follow_page(struct vm_area_struct *vma,
>  #define FOLL_HWPOISON  0x100   /* check page is hwpoisoned */
>  #define FOLL_NUMA      0x200   /* force NUMA hinting page fault */
>  #define FOLL_MIGRATION 0x400   /* wait for page to replace migration entry */
> +#define FOLL_TRIED     0x800   /* a retry, previous pass started an IO */
>
>  typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
>                         void *data);
> diff --git a/mm/gup.c b/mm/gup.c
> index 19e17ab..369b3f6 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -281,6 +281,10 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
>                 fault_flags |= FAULT_FLAG_ALLOW_RETRY;
>         if (*flags & FOLL_NOWAIT)
>                 fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;
> +       if (*flags & FOLL_TRIED) {
> +               VM_WARN_ON_ONCE(fault_flags & FAULT_FLAG_ALLOW_RETRY);
> +               fault_flags |= FAULT_FLAG_TRIED;
> +       }
>
>         ret = handle_mm_fault(mm, vma, address, fault_flags);
>         if (ret & VM_FAULT_ERROR) {
> diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
> index d6a3d09..44660ae 100644
> --- a/virt/kvm/async_pf.c
> +++ b/virt/kvm/async_pf.c
> @@ -80,9 +80,7 @@ static void async_pf_execute(struct work_struct *work)
>
>         might_sleep();
>
> -       down_read(&mm->mmap_sem);
> -       get_user_pages(NULL, mm, addr, 1, 1, 0, NULL, NULL);
> -       up_read(&mm->mmap_sem);
> +       get_user_pages_unlocked(NULL, mm, addr, 1, 1, 0, NULL);
>         kvm_async_page_present_sync(vcpu, apf);
>
>         spin_lock(&vcpu->async_pf.lock);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 95519bc..921bce7 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1170,8 +1170,8 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
>                                               addr, write_fault, page);
>                 up_read(&current->mm->mmap_sem);
>         } else
> -               npages = get_user_pages_fast(addr, 1, write_fault,
> -                                            page);
> +               npages = get_user_pages_unlocked(current, current->mm, addr, 1,
> +                                                write_fault, 0, page);
>         if (npages != 1)
>                 return npages;

Acked, for the spirit. Likely my patch will go in and then you can
just throw this one on top, removing kvm_get_user_page_io in the
process.

>
>
>
> This isn't bisectable in this order and it's untested anyway. It needs
> more patch splits.
>
> This is just an initial RFC to know if it's ok to go in this
> direction.
>
> If it's ok I'll do some testing and submit it more properly. If your
> patches go in first, that's fine, and I'll just replace the call in
> KVM with get_user_pages_unlocked (or whatever we want to call that
> thing).
>
> I'd need to get this (or an equivalent solution) merged before
> re-submitting the userfaultfd patchset. I think the above benefits the
> kernel as a whole in terms of mmap_sem hold times regardless of
> userfaultfd, so it should be good.
>
>> Well, IIUC every code path that has ALLOW_RETRY dives in the second
>> time with FAULT_TRIED or similar. In the common case, you happily
>> blaze through the second time, but if someone raced in while all locks
>> were given up, one pays the price of the second time being a full
>> fault hogging the mmap sem. At some point you need to stop being
>> polite, otherwise the task starves. Presumably the risk of an extra
>> retry drops steeply with every new gup retry. Maybe just trying three
>> times is good enough. It makes for ugly logic though.
>
> I was under the impression that if one looped forever with
> VM_FAULT_RETRY it'd eventually succeed, but it risks doing more work,
> so I'm also sticking to the "locked != NULL" gup first, then seeking
> to the first page that returned VM_FAULT_RETRY, issuing a nr_pages=1
> gup with locked == NULL, and continuing with locked != NULL at the
> next page. Just like you did in the KVM slow path. And if the "pages"
> array is NULL I bail out at the first VM_FAULT_RETRY failure without
> insisting with further gup calls of the "&locked" kind; your patch had
> just 1 page, but you also bailed out.
>
> What the code above does is basically generalize your KVM
> optimization and make it global, and at the same time it saves me
> trouble in handle_userfault().
>
> While at it I also converted some obvious candidates for gup_fast
> that had no reason to run slower (which I should split off into a
> separate patch).

Yes to all.

The part that I'm missing is how MADV_USERFAULT would handle this.
Would it be buried in faultin_page: if no RETRY is possible, raise
SIGBUS; otherwise drop the mmap semaphore, then signal and sleep on
the userfaultfd?

Thanks,
Andres

>
> Thanks,
> Andrea
Peter Zijlstra Oct. 1, 2014, 3:36 p.m. UTC | #2
On Fri, Sep 26, 2014 at 07:25:35PM +0200, Andrea Arcangeli wrote:
> diff --git a/drivers/dma/iovlock.c b/drivers/dma/iovlock.c
> index bb48a57..12ea7c3 100644
> --- a/drivers/dma/iovlock.c
> +++ b/drivers/dma/iovlock.c
> @@ -95,17 +95,11 @@ struct dma_pinned_list *dma_pin_iovec_pages(struct iovec *iov, size_t len)
>  		pages += page_list->nr_pages;
>  
>  		/* pin pages down */
> -		down_read(&current->mm->mmap_sem);
> -		ret = get_user_pages(
> -			current,
> -			current->mm,
> +		ret = get_user_pages_fast(
>  			(unsigned long) iov[i].iov_base,
>  			page_list->nr_pages,
>  			1,	/* write */
> -			0,	/* force */
> -			page_list->pages,
> -			NULL);
> -		up_read(&current->mm->mmap_sem);
> +			page_list->pages);
>  
>  		if (ret != page_list->nr_pages)
>  			goto unpin;

> --- a/drivers/misc/sgi-gru/grufault.c
> +++ b/drivers/misc/sgi-gru/grufault.c
> @@ -198,8 +198,7 @@ static int non_atomic_pte_lookup(struct vm_area_struct *vma,
>  #else
>  	*pageshift = PAGE_SHIFT;
>  #endif
> -	if (get_user_pages
> -	    (current, current->mm, vaddr, 1, write, 0, &page, NULL) <= 0)
> +	if (get_user_pages_fast(vaddr, 1, write, &page) <= 0)
>  		return -EFAULT;
>  	*paddr = page_to_phys(page);
>  	put_page(page);

> diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
> index aff9689..c89dcfa 100644
> --- a/drivers/scsi/st.c
> +++ b/drivers/scsi/st.c
> @@ -4536,18 +4536,12 @@ static int sgl_map_user_pages(struct st_buffer *STbp,
>  		return -ENOMEM;
>  
>          /* Try to fault in all of the necessary pages */
> -	down_read(&current->mm->mmap_sem);
>          /* rw==READ means read from drive, write into memory area */
> -	res = get_user_pages(
> -		current,
> -		current->mm,
> +	res = get_user_pages_fast(
>  		uaddr,
>  		nr_pages,
>  		rw == READ,
> -		0, /* don't force */
> -		pages,
> -		NULL);
> -	up_read(&current->mm->mmap_sem);
> +		pages);
>  
>  	/* Errors and no page mapped should return here */
>  	if (res < nr_pages)


For all these and the other _fast() users, is there an actual limit to
the nr_pages passed in? Because we used to have the 64-page limit from
DIO, but without that we get rather long IRQ-off latencies.
Andrea Arcangeli Oct. 2, 2014, 12:31 p.m. UTC | #3
On Wed, Oct 01, 2014 at 05:36:11PM +0200, Peter Zijlstra wrote:
> For all these and the other _fast() users, is there an actual limit to
> the nr_pages passed in? Because we used to have the 64-page limit from
> DIO, but without that we get rather long IRQ-off latencies.

Ok, I would tend to think this is an issue to solve in the gup_fast
implementation; I wouldn't blame or modify the callers for it.

I don't think there's anything that prevents gup_fast from enabling
irqs after a certain number of pages have been taken and then
disabling them again.

If the TLB flush runs in parallel with gup_fast the result is
undefined anyway, so there's no point in waiting for all pages to be
taken before letting the TLB flush go through. All that matters is
that gup_fast doesn't take pages that have been invalidated after the
TLB flush returns on the other side. So I don't see issues in
releasing irqs and being latency friendly inside the gup_fast fast
path loop.

In fact gup_fast should also cond_resched() after releasing irqs; it's
not just an irq latency matter.
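
Sketch of the idea (illustrative only: GUP_IRQ_BATCH and
gup_walk_batch() are made-up names, not from any existing patch):

    #define GUP_IRQ_BATCH 64

    local_irq_disable();
    while (nr_done < nr_pages) {
            int batch = min_t(int, nr_pages - nr_done, GUP_IRQ_BATCH);
            int nr = gup_walk_batch(start, batch, write, pages + nr_done);

            nr_done += nr;
            start += (unsigned long)nr << PAGE_SHIFT;
            if (nr < batch)
                    break;  /* fall back to the slow path */
            /* be irq- and scheduler-latency friendly between batches */
            local_irq_enable();
            cond_resched();
            local_irq_disable();
    }
    local_irq_enable();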

I could fix x86-64 for it in the same patchset unless somebody sees a
problem in releasing irqs inside the gup_fast fast path loop.

__gup_fast is an entirely different beast and that one needs the
callers to be fixed, but I didn't alter its callers.
Peter Zijlstra Oct. 2, 2014, 12:50 p.m. UTC | #4
On Thu, Oct 02, 2014 at 02:31:17PM +0200, Andrea Arcangeli wrote:
> On Wed, Oct 01, 2014 at 05:36:11PM +0200, Peter Zijlstra wrote:
> > For all these and the other _fast() users, is there an actual limit to
> > the nr_pages passed in? Because we used to have the 64-page limit from
> > DIO, but without that we get rather long IRQ-off latencies.
> 
> Ok, I would tend to think this is an issue to solve in the gup_fast
> implementation; I wouldn't blame or modify the callers for it.
> 
> I don't think there's anything that prevents gup_fast from enabling
> irqs after a certain number of pages have been taken and then
> disabling them again.
> 

Agreed. Once upon a time I had a patch set converting the two gup_fast
implementations of the day (x86 and powerpc), but somehow that never
got anywhere.

Just saying we should probably do that before we add callers with
unlimited nr_pages.
Peter Zijlstra Oct. 2, 2014, 12:56 p.m. UTC | #5
On Thu, Oct 02, 2014 at 02:50:52PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 02, 2014 at 02:31:17PM +0200, Andrea Arcangeli wrote:
> > On Wed, Oct 01, 2014 at 05:36:11PM +0200, Peter Zijlstra wrote:
> > > For all these and the other _fast() users, is there an actual limit to
> > > the nr_pages passed in? Because we used to have the 64-page limit from
> > > DIO, but without that we get rather long IRQ-off latencies.
> > 
> > Ok, I would tend to think this is an issue to solve in the gup_fast
> > implementation; I wouldn't blame or modify the callers for it.
> > 
> > I don't think there's anything that prevents gup_fast from enabling
> > irqs after a certain number of pages have been taken and then
> > disabling them again.
> > 
> 
> Agreed. Once upon a time I had a patch set converting the two gup_fast
> implementations of the day (x86 and powerpc), but somehow that never
> got anywhere.
> 
> Just saying we should probably do that before we add callers with
> unlimited nr_pages.

https://lkml.org/lkml/2009/6/24/457

Clearly there's more work these days; many more archs have grown a gup.c.

Patch

=======
From 41918f7d922d1e7fc70f117db713377e7e2af6e9 Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli <aarcange@redhat.com>
Date: Fri, 26 Sep 2014 18:36:53 +0200
Subject: [PATCH 1/2] mm: gup: add get_user_pages_locked and
 get_user_pages_unlocked

We can leverage the VM_FAULT_RETRY functionality in the page fault
paths better by using either get_user_pages_locked or
get_user_pages_unlocked.

The former allows conversion of get_user_pages invocations that will
have to pass a "&locked" parameter to know if the mmap_sem was dropped
during the call. Example from:

    down_read(&mm->mmap_sem);
    do_something()
    get_user_pages(tsk, mm, ..., pages, NULL);
    up_read(&mm->mmap_sem);

to:

    int locked = 1;
    down_read(&mm->mmap_sem);
    do_something()
    get_user_pages_locked(tsk, mm, ..., pages, &locked);
    if (locked)
        up_read(&mm->mmap_sem);

The latter is suitable only as a drop-in replacement of the form:

    down_read(&mm->mmap_sem);
    get_user_pages(tsk, mm, ..., pages, NULL);
    up_read(&mm->mmap_sem);

into:

    get_user_pages_unlocked(tsk, mm, ..., pages);

Where tsk, mm, the intermediate "..." parameters and "pages" can be any
value as before. Just the last parameter of get_user_pages (vmas) must
be NULL for get_user_pages_locked|unlocked to be usable (the latter
original form wouldn't have been safe anyway if vmas wasn't NULL; for
the former we just make it explicit by dropping the parameter).

If vmas is not NULL these two methods cannot be used.

This patch then applies the new forms in various places, in some cases
also replacing it with get_user_pages_fast whenever tsk and mm are
current and current->mm. get_user_pages_unlocked varies from
get_user_pages_fast only if mm is not current->mm (like when
get_user_pages works on some other process's mm). Whenever tsk and mm
match current and current->mm, get_user_pages_fast must always be
used to increase performance and get the pages locklessly (only with
irqs disabled).

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 arch/mips/mm/gup.c                 |   8 +-
 arch/powerpc/mm/gup.c              |   6 +-
 arch/s390/kvm/kvm-s390.c           |   4 +-
 arch/s390/mm/gup.c                 |   6 +-
 arch/sh/mm/gup.c                   |   6 +-
 arch/sparc/mm/gup.c                |   6 +-
 arch/x86/mm/gup.c                  |   7 +-
 drivers/dma/iovlock.c              |  10 +--
 drivers/iommu/amd_iommu_v2.c       |   6 +-
 drivers/media/pci/ivtv/ivtv-udma.c |   6 +-
 drivers/misc/sgi-gru/grufault.c    |   3 +-
 drivers/scsi/st.c                  |  10 +--
 drivers/video/fbdev/pvr2fb.c       |   5 +-
 include/linux/mm.h                 |   7 ++
 mm/gup.c                           | 147 ++++++++++++++++++++++++++++++++++---
 mm/mempolicy.c                     |   2 +-
 mm/nommu.c                         |  23 ++++++
 mm/process_vm_access.c             |   7 +-
 mm/util.c                          |  10 +--
 net/ceph/pagevec.c                 |   9 +--
 20 files changed, 200 insertions(+), 88 deletions(-)

diff --git a/arch/mips/mm/gup.c b/arch/mips/mm/gup.c
index 06ce17c..20884f5 100644
--- a/arch/mips/mm/gup.c
+++ b/arch/mips/mm/gup.c
@@ -301,11 +301,9 @@  slow_irqon:
 	start += nr << PAGE_SHIFT;
 	pages += nr;
 
-	down_read(&mm->mmap_sem);
-	ret = get_user_pages(current, mm, start,
-				(end - start) >> PAGE_SHIFT,
-				write, 0, pages, NULL);
-	up_read(&mm->mmap_sem);
+	ret = get_user_pages_unlocked(current, mm, start,
+				      (end - start) >> PAGE_SHIFT,
+				      write, 0, pages);
 
 	/* Have to be a bit careful with return values */
 	if (nr > 0) {
diff --git a/arch/powerpc/mm/gup.c b/arch/powerpc/mm/gup.c
index d874668..b70c34a 100644
--- a/arch/powerpc/mm/gup.c
+++ b/arch/powerpc/mm/gup.c
@@ -215,10 +215,8 @@  int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 		start += nr << PAGE_SHIFT;
 		pages += nr;
 
-		down_read(&mm->mmap_sem);
-		ret = get_user_pages(current, mm, start,
-				     nr_pages - nr, write, 0, pages, NULL);
-		up_read(&mm->mmap_sem);
+		ret = get_user_pages_unlocked(current, mm, start,
+					      nr_pages - nr, write, 0, pages);
 
 		/* Have to be a bit careful with return values */
 		if (nr > 0) {
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 81b0e11..37ca29a 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1092,9 +1092,7 @@  long kvm_arch_fault_in_page(struct kvm_vcpu *vcpu, gpa_t gpa, int writable)
 	hva = gmap_fault(gpa, vcpu->arch.gmap);
 	if (IS_ERR_VALUE(hva))
 		return (long)hva;
-	down_read(&mm->mmap_sem);
-	rc = get_user_pages(current, mm, hva, 1, writable, 0, NULL, NULL);
-	up_read(&mm->mmap_sem);
+	rc = get_user_pages_unlocked(current, mm, hva, 1, writable, 0, NULL);
 
 	return rc < 0 ? rc : 0;
 }
diff --git a/arch/s390/mm/gup.c b/arch/s390/mm/gup.c
index 639fce46..5c586c7 100644
--- a/arch/s390/mm/gup.c
+++ b/arch/s390/mm/gup.c
@@ -235,10 +235,8 @@  int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 	/* Try to get the remaining pages with get_user_pages */
 	start += nr << PAGE_SHIFT;
 	pages += nr;
-	down_read(&mm->mmap_sem);
-	ret = get_user_pages(current, mm, start,
-			     nr_pages - nr, write, 0, pages, NULL);
-	up_read(&mm->mmap_sem);
+	ret = get_user_pages_unlocked(current, mm, start,
+			     nr_pages - nr, write, 0, pages);
 	/* Have to be a bit careful with return values */
 	if (nr > 0)
 		ret = (ret < 0) ? nr : ret + nr;
diff --git a/arch/sh/mm/gup.c b/arch/sh/mm/gup.c
index 37458f3..e15f52a 100644
--- a/arch/sh/mm/gup.c
+++ b/arch/sh/mm/gup.c
@@ -257,10 +257,8 @@  slow_irqon:
 		start += nr << PAGE_SHIFT;
 		pages += nr;
 
-		down_read(&mm->mmap_sem);
-		ret = get_user_pages(current, mm, start,
-			(end - start) >> PAGE_SHIFT, write, 0, pages, NULL);
-		up_read(&mm->mmap_sem);
+		ret = get_user_pages_unlocked(current, mm, start,
+			(end - start) >> PAGE_SHIFT, write, 0, pages);
 
 		/* Have to be a bit careful with return values */
 		if (nr > 0) {
diff --git a/arch/sparc/mm/gup.c b/arch/sparc/mm/gup.c
index 1aed043..fa7de7d 100644
--- a/arch/sparc/mm/gup.c
+++ b/arch/sparc/mm/gup.c
@@ -219,10 +219,8 @@  slow:
 		start += nr << PAGE_SHIFT;
 		pages += nr;
 
-		down_read(&mm->mmap_sem);
-		ret = get_user_pages(current, mm, start,
-			(end - start) >> PAGE_SHIFT, write, 0, pages, NULL);
-		up_read(&mm->mmap_sem);
+		ret = get_user_pages_unlocked(current, mm, start,
+			(end - start) >> PAGE_SHIFT, write, 0, pages);
 
 		/* Have to be a bit careful with return values */
 		if (nr > 0) {
diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index 207d9aef..2ab183b 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -388,10 +388,9 @@  slow_irqon:
 		start += nr << PAGE_SHIFT;
 		pages += nr;
 
-		down_read(&mm->mmap_sem);
-		ret = get_user_pages(current, mm, start,
-			(end - start) >> PAGE_SHIFT, write, 0, pages, NULL);
-		up_read(&mm->mmap_sem);
+		ret = get_user_pages_unlocked(current, mm, start,
+					      (end - start) >> PAGE_SHIFT,
+					      write, 0, pages);
 
 		/* Have to be a bit careful with return values */
 		if (nr > 0) {
diff --git a/drivers/dma/iovlock.c b/drivers/dma/iovlock.c
index bb48a57..12ea7c3 100644
--- a/drivers/dma/iovlock.c
+++ b/drivers/dma/iovlock.c
@@ -95,17 +95,11 @@  struct dma_pinned_list *dma_pin_iovec_pages(struct iovec *iov, size_t len)
 		pages += page_list->nr_pages;
 
 		/* pin pages down */
-		down_read(&current->mm->mmap_sem);
-		ret = get_user_pages(
-			current,
-			current->mm,
+		ret = get_user_pages_fast(
 			(unsigned long) iov[i].iov_base,
 			page_list->nr_pages,
 			1,	/* write */
-			0,	/* force */
-			page_list->pages,
-			NULL);
-		up_read(&current->mm->mmap_sem);
+			page_list->pages);
 
 		if (ret != page_list->nr_pages)
 			goto unpin;
diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index 5f578e8..6963b73 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -519,10 +519,8 @@  static void do_fault(struct work_struct *work)
 
 	write = !!(fault->flags & PPR_FAULT_WRITE);
 
-	down_read(&fault->state->mm->mmap_sem);
-	npages = get_user_pages(NULL, fault->state->mm,
-				fault->address, 1, write, 0, &page, NULL);
-	up_read(&fault->state->mm->mmap_sem);
+	npages = get_user_pages_unlocked(NULL, fault->state->mm,
+					 fault->address, 1, write, 0, &page);
 
 	if (npages == 1) {
 		put_page(page);
diff --git a/drivers/media/pci/ivtv/ivtv-udma.c b/drivers/media/pci/ivtv/ivtv-udma.c
index 7338cb2..96d866b 100644
--- a/drivers/media/pci/ivtv/ivtv-udma.c
+++ b/drivers/media/pci/ivtv/ivtv-udma.c
@@ -124,10 +124,8 @@  int ivtv_udma_setup(struct ivtv *itv, unsigned long ivtv_dest_addr,
 	}
 
 	/* Get user pages for DMA Xfer */
-	down_read(&current->mm->mmap_sem);
-	err = get_user_pages(current, current->mm,
-			user_dma.uaddr, user_dma.page_count, 0, 1, dma->map, NULL);
-	up_read(&current->mm->mmap_sem);
+	err = get_user_pages_unlocked(current, current->mm,
+			user_dma.uaddr, user_dma.page_count, 0, 1, dma->map);
 
 	if (user_dma.page_count != err) {
 		IVTV_DEBUG_WARN("failed to map user pages, returned %d instead of %d\n",
diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index f74fc0c..cd20669 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -198,8 +198,7 @@  static int non_atomic_pte_lookup(struct vm_area_struct *vma,
 #else
 	*pageshift = PAGE_SHIFT;
 #endif
-	if (get_user_pages
-	    (current, current->mm, vaddr, 1, write, 0, &page, NULL) <= 0)
+	if (get_user_pages_fast(vaddr, 1, write, &page) <= 0)
 		return -EFAULT;
 	*paddr = page_to_phys(page);
 	put_page(page);
diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
index aff9689..c89dcfa 100644
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -4536,18 +4536,12 @@  static int sgl_map_user_pages(struct st_buffer *STbp,
 		return -ENOMEM;
 
         /* Try to fault in all of the necessary pages */
-	down_read(&current->mm->mmap_sem);
         /* rw==READ means read from drive, write into memory area */
-	res = get_user_pages(
-		current,
-		current->mm,
+	res = get_user_pages_fast(
 		uaddr,
 		nr_pages,
 		rw == READ,
-		0, /* don't force */
-		pages,
-		NULL);
-	up_read(&current->mm->mmap_sem);
+		pages);
 
 	/* Errors and no page mapped should return here */
 	if (res < nr_pages)
diff --git a/drivers/video/fbdev/pvr2fb.c b/drivers/video/fbdev/pvr2fb.c
index 167cfff..ff81f65 100644
--- a/drivers/video/fbdev/pvr2fb.c
+++ b/drivers/video/fbdev/pvr2fb.c
@@ -686,10 +686,7 @@  static ssize_t pvr2fb_write(struct fb_info *info, const char *buf,
 	if (!pages)
 		return -ENOMEM;
 
-	down_read(&current->mm->mmap_sem);
-	ret = get_user_pages(current, current->mm, (unsigned long)buf,
-			     nr_pages, WRITE, 0, pages, NULL);
-	up_read(&current->mm->mmap_sem);
+	ret = get_user_pages_fast((unsigned long)buf, nr_pages, WRITE, pages);
 
 	if (ret < nr_pages) {
 		nr_pages = ret;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 32ba786..69f692d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1197,6 +1197,13 @@  long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		    unsigned long start, unsigned long nr_pages,
 		    int write, int force, struct page **pages,
 		    struct vm_area_struct **vmas);
+long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm,
+		    unsigned long start, unsigned long nr_pages,
+		    int write, int force, struct page **pages,
+		    int *locked);
+long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
+		    unsigned long start, unsigned long nr_pages,
+		    int write, int force, struct page **pages);
 int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 			struct page **pages);
 struct kvec;
diff --git a/mm/gup.c b/mm/gup.c
index 91d044b..19e17ab 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -576,6 +576,134 @@  int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 	return 0;
 }
 
+static inline long __get_user_pages_locked(struct task_struct *tsk,
+					   struct mm_struct *mm,
+					   unsigned long start,
+					   unsigned long nr_pages,
+					   int write, int force,
+					   struct page **pages,
+					   struct vm_area_struct **vmas,
+					   int *locked,
+					   bool immediate_unlock)
+{
+	int flags = FOLL_TOUCH;
+	long ret, pages_done;
+	bool lock_dropped;
+
+	if (locked) {
+		/* if VM_FAULT_RETRY can be returned, vmas become invalid */
+		BUG_ON(vmas);
+		/* check caller initialized locked */
+		BUG_ON(*locked != 1);
+	} else {
+		/*
+		 * Not really important, the value is irrelevant if
+		 * locked is NULL, but BUILD_BUG_ON costs nothing.
+		 */
+		BUILD_BUG_ON(immediate_unlock);
+	}
+
+	if (pages)
+		flags |= FOLL_GET;
+	if (write)
+		flags |= FOLL_WRITE;
+	if (force)
+		flags |= FOLL_FORCE;
+
+	pages_done = 0;
+	lock_dropped = false;
+	for (;;) {
+		ret = __get_user_pages(tsk, mm, start, nr_pages, flags, pages,
+				       vmas, locked);
+		if (!locked)
+			/* VM_FAULT_RETRY couldn't trigger, bypass */
+			return ret;
+
+		/* VM_FAULT_RETRY cannot return errors */
+		if (!*locked) {
+			BUG_ON(ret < 0);
+			BUG_ON(nr_pages == 1 && ret);
+		}
+
+		if (!pages)
+			/* If it's a prefault don't insist harder */
+			return ret;
+
+		if (ret > 0) {
+			nr_pages -= ret;
+			pages_done += ret;
+			if (!nr_pages)
+				break;
+		}
+		if (*locked) {
+			/* VM_FAULT_RETRY didn't trigger */
+			if (!pages_done)
+				pages_done = ret;
+			break;
+		}
+		/* VM_FAULT_RETRY triggered, so seek to the faulting offset */
+		pages += ret;
+		start += ret << PAGE_SHIFT;
+
+		/*
+		 * Repeat on the address that fired VM_FAULT_RETRY
+		 * without FAULT_FLAG_ALLOW_RETRY but with
+		 * FAULT_FLAG_TRIED.
+		 */
+		*locked = 1;
+		lock_dropped = true;
+		down_read(&mm->mmap_sem);
+		ret = __get_user_pages(tsk, mm, start, nr_pages, flags | FOLL_TRIED,
+				       pages, NULL, NULL);
+		if (ret != 1) {
+			BUG_ON(ret > 1);
+			if (!pages_done)
+				pages_done = ret;
+			break;
+		}
+		nr_pages--;
+		pages_done++;
+		if (!nr_pages)
+			break;
+		pages++;
+		start += PAGE_SIZE;
+	}
+	if (!immediate_unlock && lock_dropped && *locked) {
+		/*
+		 * We must let the caller know we temporarily dropped the lock
+		 * and so the critical section protected by it was lost.
+		 */
+		up_read(&mm->mmap_sem);
+		*locked = 0;
+	}
+	return pages_done;
+}
+
+long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm,
+			   unsigned long start, unsigned long nr_pages,
+			   int write, int force, struct page **pages,
+			   int *locked)
+{
+	return __get_user_pages_locked(tsk, mm, start, nr_pages, write, force,
+				       pages, NULL, locked, false);
+}
+EXPORT_SYMBOL(get_user_pages_locked);
+
+long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
+			     unsigned long start, unsigned long nr_pages,
+			     int write, int force, struct page **pages)
+{
+	long ret;
+	int locked = 1;
+	down_read(&mm->mmap_sem);
+	ret = __get_user_pages_locked(tsk, mm, start, nr_pages, write, force,
+				      pages, NULL, &locked, true);
+	if (locked)
+		up_read(&mm->mmap_sem);
+	return ret;
+}
+EXPORT_SYMBOL(get_user_pages_unlocked);
+
 /*
  * get_user_pages() - pin user pages in memory
  * @tsk:	the task_struct to use for page fault accounting, or
@@ -625,22 +753,19 @@  int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
  * use the correct cache flushing APIs.
  *
  * See also get_user_pages_fast, for performance critical applications.
+ *
+ * get_user_pages should be gradually obsoleted in favor of
+ * get_user_pages_locked|unlocked. Nothing should use get_user_pages
+ * because it cannot pass FAULT_FLAG_ALLOW_RETRY to handle_mm_fault,
+ * which in turn disables the userfaultfd feature (after that, the
+ * "inline" can be cleaned up from __get_user_pages_locked).
  */
 long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		unsigned long start, unsigned long nr_pages, int write,
 		int force, struct page **pages, struct vm_area_struct **vmas)
 {
-	int flags = FOLL_TOUCH;
-
-	if (pages)
-		flags |= FOLL_GET;
-	if (write)
-		flags |= FOLL_WRITE;
-	if (force)
-		flags |= FOLL_FORCE;
-
-	return __get_user_pages(tsk, mm, start, nr_pages, flags, pages, vmas,
-				NULL);
+	return __get_user_pages_locked(tsk, mm, start, nr_pages, write, force,
+				       pages, vmas, NULL, false);
 }
 EXPORT_SYMBOL(get_user_pages);
 
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 8f5330d..6606c10 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -881,7 +881,7 @@  static int lookup_node(struct mm_struct *mm, unsigned long addr)
 	struct page *p;
 	int err;
 
-	err = get_user_pages(current, mm, addr & PAGE_MASK, 1, 0, 0, &p, NULL);
+	err = get_user_pages_fast(addr & PAGE_MASK, 1, 0, &p);
 	if (err >= 0) {
 		err = page_to_nid(p);
 		put_page(p);
diff --git a/mm/nommu.c b/mm/nommu.c
index a881d96..8a06341 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -213,6 +213,29 @@  long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 }
 EXPORT_SYMBOL(get_user_pages);
 
+long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm,
+			   unsigned long start, unsigned long nr_pages,
+			   int write, int force, struct page **pages,
+			   int *locked)
+{
+	return get_user_pages(tsk, mm, start, nr_pages, write, force,
+			      pages, NULL);
+}
+EXPORT_SYMBOL(get_user_pages_locked);
+
+long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
+			     unsigned long start, unsigned long nr_pages,
+			     int write, int force, struct page **pages)
+{
+	long ret;
+	down_read(&mm->mmap_sem);
+	ret = get_user_pages(tsk, mm, start, nr_pages, write, force,
+			     pages, NULL);
+	up_read(&mm->mmap_sem);
+	return ret;
+}
+EXPORT_SYMBOL(get_user_pages_unlocked);
+
 /**
  * follow_pfn - look up PFN at a user virtual address
  * @vma: memory mapping
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 5077afc..b159769 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -99,11 +99,8 @@  static int process_vm_rw_single_vec(unsigned long addr,
 		size_t bytes;
 
 		/* Get the pages we're interested in */
-		down_read(&mm->mmap_sem);
-		pages = get_user_pages(task, mm, pa, pages,
-				      vm_write, 0, process_pages, NULL);
-		up_read(&mm->mmap_sem);
-
+		pages = get_user_pages_unlocked(task, mm, pa, pages,
+						vm_write, 0, process_pages);
 		if (pages <= 0)
 			return -EFAULT;
 
diff --git a/mm/util.c b/mm/util.c
index 093c973..1b93f2d 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -247,14 +247,8 @@  int __weak get_user_pages_fast(unsigned long start,
 				int nr_pages, int write, struct page **pages)
 {
 	struct mm_struct *mm = current->mm;
-	int ret;
-
-	down_read(&mm->mmap_sem);
-	ret = get_user_pages(current, mm, start, nr_pages,
-					write, 0, pages, NULL);
-	up_read(&mm->mmap_sem);
-
-	return ret;
+	return get_user_pages_unlocked(current, mm, start, nr_pages,
+				       write, 0, pages);
 }
 EXPORT_SYMBOL_GPL(get_user_pages_fast);
 
diff --git a/net/ceph/pagevec.c b/net/ceph/pagevec.c
index 5550130..5504783 100644
--- a/net/ceph/pagevec.c
+++ b/net/ceph/pagevec.c
@@ -23,17 +23,16 @@  struct page **ceph_get_direct_page_vector(const void __user *data,
 	if (!pages)
 		return ERR_PTR(-ENOMEM);
 
-	down_read(&current->mm->mmap_sem);
 	while (got < num_pages) {
-		rc = get_user_pages(current, current->mm,
-		    (unsigned long)data + ((unsigned long)got * PAGE_SIZE),
-		    num_pages - got, write_page, 0, pages + got, NULL);
+		rc = get_user_pages_fast((unsigned long)data +
+					 ((unsigned long)got * PAGE_SIZE),
+					 num_pages - got,
+					 write_page, pages + got);
 		if (rc < 0)
 			break;
 		BUG_ON(rc == 0);
 		got += rc;
 	}
-	up_read(&current->mm->mmap_sem);
 	if (rc < 0)
 		goto fail;
 	return pages;
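
Just to illustrate the intended calling convention of the new
get_user_pages_locked (a minimal sketch, not part of the patch above;
"tsk", "mm", "start", "nr_pages", "write" and "pages" stand for
whatever the caller already has at hand):

	int locked = 1;
	long ret;

	down_read(&mm->mmap_sem);
	ret = get_user_pages_locked(tsk, mm, start, nr_pages,
				    write, 0 /* force */, pages,
				    &locked);
	if (locked) {
		/* the mmap_sem critical section was never dropped */
		up_read(&mm->mmap_sem);
	} else {
		/*
		 * The mmap_sem was dropped (and already released)
		 * while blocking on I/O: anything that was read
		 * under the lock before the call must be
		 * revalidated.
		 */
	}

No such check is needed with get_user_pages_unlocked, which takes and
releases the mmap_sem internally (see the mm/gup.c implementation
above).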


Then, to give an example, your patch would become:

===
From 74d88763cde285354fb78806ffb332030d1f0739 Mon Sep 17 00:00:00 2001
From: Andres Lagar-Cavilla <andreslc@google.com>
Date: Fri, 26 Sep 2014 18:36:56 +0200
Subject: [PATCH 2/2] kvm: Faults which trigger IO release the mmap_sem
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When KVM handles a tdp fault it uses FOLL_NOWAIT. If the guest memory
has been swapped out or is behind a filemap, this will trigger async
readahead and return immediately. The rationale is that KVM will kick
back into the guest with an "async page fault" and allow some other
guest process to take over.

If async PFs are enabled, the fault is retried as soon as possible
from an async workqueue. If not, it's retried immediately in the same
code path. In either case the retry will not relinquish the mmap
semaphore and will block on the IO. This is a bad thing, as other
mmap semaphore users now stall as a function of swap or filemap
latency.

This patch ensures both the regular and async PF paths re-enter the
fault, allowing the mmap semaphore to be relinquished in the case of
IO wait.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 include/linux/mm.h  | 1 +
 mm/gup.c            | 4 ++++
 virt/kvm/async_pf.c | 4 +---
 virt/kvm/kvm_main.c | 4 ++--
 4 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 69f692d..71dbe03 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1997,6 +1997,7 @@  static inline struct page *follow_page(struct vm_area_struct *vma,
 #define FOLL_HWPOISON	0x100	/* check page is hwpoisoned */
 #define FOLL_NUMA	0x200	/* force NUMA hinting page fault */
 #define FOLL_MIGRATION	0x400	/* wait for page to replace migration entry */
+#define FOLL_TRIED	0x800	/* a retry, previous pass started an IO */
 
 typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
 			void *data);
diff --git a/mm/gup.c b/mm/gup.c
index 19e17ab..369b3f6 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -281,6 +281,10 @@  static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY;
 	if (*flags & FOLL_NOWAIT)
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;
+	if (*flags & FOLL_TRIED) {
+		VM_WARN_ON_ONCE(fault_flags & FAULT_FLAG_ALLOW_RETRY);
+		fault_flags |= FAULT_FLAG_TRIED;
+	}
 
 	ret = handle_mm_fault(mm, vma, address, fault_flags);
 	if (ret & VM_FAULT_ERROR) {
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index d6a3d09..44660ae 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -80,9 +80,7 @@  static void async_pf_execute(struct work_struct *work)
 
 	might_sleep();
 
-	down_read(&mm->mmap_sem);
-	get_user_pages(NULL, mm, addr, 1, 1, 0, NULL, NULL);
-	up_read(&mm->mmap_sem);
+	get_user_pages_unlocked(NULL, mm, addr, 1, 1, 0, NULL);
 	kvm_async_page_present_sync(vcpu, apf);
 
 	spin_lock(&vcpu->async_pf.lock);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 95519bc..921bce7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1170,8 +1170,8 @@  static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
 					      addr, write_fault, page);
 		up_read(&current->mm->mmap_sem);
 	} else
-		npages = get_user_pages_fast(addr, 1, write_fault,
-					     page);
+		npages = get_user_pages_unlocked(current, current->mm, addr, 1,
+						 write_fault, 0, page);
 	if (npages != 1)
 		return npages;