Message ID | 20150515160426.GD19097@redhat.com (mailing list archive)
---|---
State | New, archived
On Fri, May 15, 2015 at 9:04 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
>
> To fix it I added this along a comment:

Ok, this looks good as an explanation/fix for the races (and also as an
example of my worry about waitqueue_active() use in general).

However, it now makes me suspect that the optimistic "let's check if they
are even active" may not be worth it any more. You're adding a "smp_mb()"
in order to avoid taking the real lock.

Although I guess there are two locks there (one for each wait-queue), so
maybe it's worth it.

    Linus
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index c89e96f..6be316b 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -593,6 +593,21 @@ static void __wake_userfault(struct userfaultfd_ctx *ctx,
 static __always_inline void wake_userfault(struct userfaultfd_ctx *ctx,
 					   struct userfaultfd_wake_range *range)
 {
+	/*
+	 * To be sure waitqueue_active() is not reordered by the CPU
+	 * before the pagetable update, use an explicit SMP memory
+	 * barrier here. PT lock release or up_read(mmap_sem) still
+	 * have release semantics that can allow the
+	 * waitqueue_active() to be reordered before the pte update.
+	 */
+	smp_mb();
+
+	/*
+	 * Use waitqueue_active because it's very frequent to
+	 * change the address space atomically even if there are no
+	 * userfaults yet. So we take the spinlock only when we're
+	 * sure we've userfaults to wake.
+	 */
 	if (waitqueue_active(&ctx->fault_pending_wqh) ||
 	    waitqueue_active(&ctx->fault_wqh))
 		__wake_userfault(ctx, range);