From patchwork Mon Jun 15 17:22:07 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrea Arcangeli X-Patchwork-Id: 6610951 Return-Path: X-Original-To: patchwork-kvm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id C3B259F326 for ; Mon, 15 Jun 2015 17:22:31 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id DA3C8205F0 for ; Mon, 15 Jun 2015 17:22:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D9DA8205EF for ; Mon, 15 Jun 2015 17:22:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751098AbbFORWZ (ORCPT ); Mon, 15 Jun 2015 13:22:25 -0400 Received: from mx1.redhat.com ([209.132.183.28]:60191 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755520AbbFORWQ (ORCPT ); Mon, 15 Jun 2015 13:22:16 -0400 Received: from int-mx13.intmail.prod.int.phx2.redhat.com (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26]) by mx1.redhat.com (Postfix) with ESMTPS id 12F0AB7A77; Mon, 15 Jun 2015 17:22:16 +0000 (UTC) Received: from mail.random (ovpn-116-88.ams2.redhat.com [10.36.116.88]) by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id t5FHMCHi016624 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Mon, 15 Jun 2015 13:22:15 -0400 From: Andrea Arcangeli To: Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, kvm@vger.kernel.org Cc: Pavel Emelyanov , Sanidhya Kashyap , zhang.zhanghailiang@huawei.com, Linus Torvalds , "Kirill A. Shutemov" , Andres Lagar-Cavilla , Dave Hansen , Paolo Bonzini , Rik van Riel , Mel Gorman , Andy Lutomirski , Hugh Dickins , Peter Feiner , "Dr. David Alan Gilbert" , Johannes Weiner , "Huangpeng (Peter)" Subject: [PATCH 3/7] userfaultfd: allow signals to interrupt a userfault Date: Mon, 15 Jun 2015 19:22:07 +0200 Message-Id: <1434388931-24487-4-git-send-email-aarcange@redhat.com> In-Reply-To: <1434388931-24487-1-git-send-email-aarcange@redhat.com> References: <1434388931-24487-1-git-send-email-aarcange@redhat.com> X-Scanned-By: MIMEDefang 2.68 on 10.5.11.26 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Spam-Status: No, score=-7.4 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This is only simple to achieve if the userfault is going to return to userland (not to the kernel) because we can avoid returning VM_FAULT_RETRY despite we temporarily released the mmap_sem. The fault would just be retried by userland then. This is safe at least on x86 and powerpc (the two archs with the syscall implemented so far). Hint to verify for which archs this is safe: after handle_mm_fault returns, no access to data structures protected by the mmap_sem must be done by the fault code in arch/*/mm/fault.c until up_read(&mm->mmap_sem) is called. This has two main benefits: signals can run with lower latency in production (signals aren't blocked by userfaults and userfaults are immediately repeated after signal processing) and gdb can then trivially debug the threads blocked in this kind of userfaults coming directly from userland. On a side note: while gdb has a need to get signal processed, coredumps always worked perfectly with userfaults, no matter if the userfault is triggered by GUP a kernel copy_user or directly from userland. Signed-off-by: Andrea Arcangeli --- fs/userfaultfd.c | 35 ++++++++++++++++++++++++++++++++--- 1 file changed, 32 insertions(+), 3 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index b69d236..8286ec8 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -262,7 +262,7 @@ int handle_userfault(struct vm_area_struct *vma, unsigned long address, struct userfaultfd_ctx *ctx; struct userfaultfd_wait_queue uwq; int ret; - bool must_wait; + bool must_wait, return_to_userland; BUG_ON(!rwsem_is_locked(&mm->mmap_sem)); @@ -327,6 +327,9 @@ int handle_userfault(struct vm_area_struct *vma, unsigned long address, uwq.msg = userfault_msg(address, flags, reason); uwq.ctx = ctx; + return_to_userland = (flags & (FAULT_FLAG_USER|FAULT_FLAG_KILLABLE)) == + (FAULT_FLAG_USER|FAULT_FLAG_KILLABLE); + spin_lock(&ctx->fault_pending_wqh.lock); /* * After the __add_wait_queue the uwq is visible to userland @@ -338,14 +341,16 @@ int handle_userfault(struct vm_area_struct *vma, unsigned long address, * following the spin_unlock to happen before the list_add in * __add_wait_queue. */ - set_current_state(TASK_KILLABLE); + set_current_state(return_to_userland ? TASK_INTERRUPTIBLE : + TASK_KILLABLE); spin_unlock(&ctx->fault_pending_wqh.lock); must_wait = userfaultfd_must_wait(ctx, address, flags, reason); up_read(&mm->mmap_sem); if (likely(must_wait && !ACCESS_ONCE(ctx->released) && - !fatal_signal_pending(current))) { + (return_to_userland ? !signal_pending(current) : + !fatal_signal_pending(current)))) { wake_up_poll(&ctx->fd_wqh, POLLIN); schedule(); ret |= VM_FAULT_MAJOR; @@ -353,6 +358,30 @@ int handle_userfault(struct vm_area_struct *vma, unsigned long address, __set_current_state(TASK_RUNNING); + if (return_to_userland) { + if (signal_pending(current) && + !fatal_signal_pending(current)) { + /* + * If we got a SIGSTOP or SIGCONT and this is + * a normal userland page fault, just let + * userland return so the signal will be + * handled and gdb debugging works. The page + * fault code immediately after we return from + * this function is going to release the + * mmap_sem and it's not depending on it + * (unlike gup would if we were not to return + * VM_FAULT_RETRY). + * + * If a fatal signal is pending we still take + * the streamlined VM_FAULT_RETRY failure path + * and there's no need to retake the mmap_sem + * in such case. + */ + down_read(&mm->mmap_sem); + ret = 0; + } + } + /* * Here we race with the list_del; list_add in * userfaultfd_ctx_read(), however because we don't ever run