From patchwork Mon Jun 15 17:22:08 2015
From: Andrea Arcangeli <aarcange@redhat.com>
To: Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	qemu-devel@nongnu.org, kvm@vger.kernel.org
Cc: Pavel Emelyanov, Sanidhya Kashyap, zhang.zhanghailiang@huawei.com,
	Linus Torvalds, "Kirill A. Shutemov", Andres Lagar-Cavilla,
	Dave Hansen, Paolo Bonzini, Rik van Riel, Mel Gorman,
	Andy Lutomirski, Hugh Dickins, Peter Feiner,
	"Dr. David Alan Gilbert", Johannes Weiner, "Huangpeng (Peter)"
Subject: [PATCH 4/7] userfaultfd: avoid missing wakeups during refile in userfaultfd_read
Date: Mon, 15 Jun 2015 19:22:08 +0200
Message-Id: <1434388931-24487-5-git-send-email-aarcange@redhat.com>
In-Reply-To: <1434388931-24487-1-git-send-email-aarcange@redhat.com>
References: <1434388931-24487-1-git-send-email-aarcange@redhat.com>

During the refile in userfaultfd_read() both waitqueues can momentarily
look empty to the lockless wake_userfault(): the uwq has already been
removed from fault_pending_wqh but has not yet been added to fault_wqh.
Use a seqcount so wake_userfault() can detect a concurrent refile and
repeat its check, preventing a false negative that could leave a
userfault blocked. (An illustrative user-space sketch of the pattern
follows the diff.)
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 fs/userfaultfd.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 8286ec8..f9e11ec 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -45,6 +45,8 @@ struct userfaultfd_ctx {
 	wait_queue_head_t fault_wqh;
 	/* waitqueue head for the pseudo fd to wakeup poll/read */
 	wait_queue_head_t fd_wqh;
+	/* a refile sequence protected by fault_pending_wqh lock */
+	struct seqcount refile_seq;
 	/* pseudo fd refcounting */
 	atomic_t refcount;
 	/* userfaultfd syscall flags */
@@ -547,6 +549,15 @@ static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
 		uwq = find_userfault(ctx);
 		if (uwq) {
 			/*
+			 * Use a seqcount to repeat the lockless check
+			 * in wake_userfault() to avoid missing
+			 * wakeups because during the refile both
+			 * waitqueues could become empty if this is the
+			 * only userfault.
+			 */
+			write_seqcount_begin(&ctx->refile_seq);
+
+			/*
 			 * The fault_pending_wqh.lock prevents the uwq
 			 * to disappear from under us.
 			 *
@@ -570,6 +581,8 @@ static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
 			list_del(&uwq->wq.task_list);
 			__add_wait_queue(&ctx->fault_wqh, &uwq->wq);
 
+			write_seqcount_end(&ctx->refile_seq);
+
 			/* careful to always initialize msg if ret == 0 */
 			*msg = uwq->msg;
 			spin_unlock(&ctx->fault_pending_wqh.lock);
@@ -648,6 +661,9 @@ static void __wake_userfault(struct userfaultfd_ctx *ctx,
 static __always_inline void wake_userfault(struct userfaultfd_ctx *ctx,
 					   struct userfaultfd_wake_range *range)
 {
+	unsigned seq;
+	bool need_wakeup;
+
 	/*
 	 * To be sure waitqueue_active() is not reordered by the CPU
 	 * before the pagetable update, use an explicit SMP memory
@@ -663,8 +679,13 @@ static __always_inline void wake_userfault(struct userfaultfd_ctx *ctx,
 	 * userfaults yet. So we take the spinlock only when we're
 	 * sure we've userfaults to wake.
 	 */
-	if (waitqueue_active(&ctx->fault_pending_wqh) ||
-	    waitqueue_active(&ctx->fault_wqh))
+	do {
+		seq = read_seqcount_begin(&ctx->refile_seq);
+		need_wakeup = waitqueue_active(&ctx->fault_pending_wqh) ||
+			waitqueue_active(&ctx->fault_wqh);
+		cond_resched();
+	} while (read_seqcount_retry(&ctx->refile_seq, seq));
+	if (need_wakeup)
 		__wake_userfault(ctx, range);
 }
 
@@ -1223,6 +1244,7 @@ static void init_once_userfaultfd_ctx(void *mem)
 	init_waitqueue_head(&ctx->fault_pending_wqh);
 	init_waitqueue_head(&ctx->fault_wqh);
 	init_waitqueue_head(&ctx->fd_wqh);
+	seqcount_init(&ctx->refile_seq);
 }
 
 /**
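
For illustration, below is a minimal user-space sketch of the pattern the
patch relies on; it is not part of the patch. All names in it are made up,
and C11 seq_cst atomics stand in for the kernel's write_seqcount_begin/end
and read_seqcount_begin/retry, which use lighter barriers. The two booleans
model the emptiness of fault_pending_wqh and fault_wqh.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_uint refile_seq;		/* even = no refile in progress */
static atomic_bool pending_nonempty;	/* models fault_pending_wqh */
static atomic_bool done_nonempty;	/* models fault_wqh */

/* Refile side: assumed to run under the "pending" queue lock. */
static void refile_one(void)
{
	atomic_fetch_add(&refile_seq, 1);	/* begin: seq goes odd */
	atomic_store(&pending_nonempty, false);	/* list_del() empties one queue... */
	atomic_store(&done_nonempty, true);	/* ...before the other is filled */
	atomic_fetch_add(&refile_seq, 1);	/* end: seq even again */
}

/* Wake side: lockless emptiness check, repeated if a refile raced with it. */
static bool need_wakeup(void)
{
	unsigned int seq;
	bool ret;

	do {
		seq = atomic_load(&refile_seq);
		ret = atomic_load(&pending_nonempty) ||
		      atomic_load(&done_nonempty);
		/* odd seq: refile in flight; changed seq: refile completed */
	} while ((seq & 1) || atomic_load(&refile_seq) != seq);
	return ret;
}

int main(void)
{
	atomic_store(&pending_nonempty, true);	/* one queued userfault */
	refile_one();				/* move it to the other queue */
	printf("need_wakeup: %d\n", need_wakeup());	/* prints 1 */
	return 0;
}

Without the sequence count, a check running between the two stores in
refile_one() could see both queues empty and skip the wakeup; with it, such
a check observes an odd or changed count and repeats. The wake side thus
stays lockless in the common case and only loops when it actually races
with a refile; the kernel loop additionally calls cond_resched() so the
retry stays preemption-friendly.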