From patchwork Mon Sep 23 04:25:14 2019
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 11156019
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: David Hildenbrand, Hugh Dickins, Maya Gokhale, Jerome Glisse,
    Pavel Emelyanov, Johannes Weiner, peterx@redhat.com, Martin Cracauer,
    Marty McFadden, Shaohua Li, Andrea Arcangeli, Mike Kravetz,
    Denis Plotnikov, Mike Rapoport, Linus Torvalds, Mel Gorman,
    "Kirill A. Shutemov", "Dr. David Alan Gilbert"
Subject: [PATCH v4 01/10] mm/gup: Rename "nonblocking" to "locked" where proper
Date: Mon, 23 Sep 2019 12:25:14 +0800
Message-Id: <20190923042523.10027-2-peterx@redhat.com>
In-Reply-To: <20190923042523.10027-1-peterx@redhat.com>
References: <20190923042523.10027-1-peterx@redhat.com>

There are plenty of places around __get_user_pages() that take a
parameter named "nonblocking". The name does not really mean "it won't
block" (it can still block); instead, it indicates whether the mmap_sem
was released by up_read() during the page fault handling, mostly when
VM_FAULT_RETRY is returned.

We have the correct naming in e.g. get_user_pages_locked() and
get_user_pages_remote(), which call it "locked", but many places still
use "nonblocking". Rename those to "locked" where proper, to better
suit the meaning of the variable. While at it, fix up some of the
comments accordingly.

Reviewed-by: Mike Rapoport
Reviewed-by: Jerome Glisse
Reviewed-by: David Hildenbrand
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/gup.c     | 44 +++++++++++++++++++++-----------------------
 mm/hugetlb.c |  8 ++++----
 2 files changed, 25 insertions(+), 27 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 98f13ab37bac..eddbb95dcb8f 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -622,12 +622,12 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
 }
 
 /*
- * mmap_sem must be held on entry. If @nonblocking != NULL and
- * *@flags does not include FOLL_NOWAIT, the mmap_sem may be released.
- * If it is, *@nonblocking will be set to 0 and -EBUSY returned.
+ * mmap_sem must be held on entry. If @locked != NULL and *@flags
+ * does not include FOLL_NOWAIT, the mmap_sem may be released. If it
+ * is, *@locked will be set to 0 and -EBUSY returned.
  */
 static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
-		unsigned long address, unsigned int *flags, int *nonblocking)
+		unsigned long address, unsigned int *flags, int *locked)
 {
 	unsigned int fault_flags = 0;
 	vm_fault_t ret;
@@ -639,7 +639,7 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
 		fault_flags |= FAULT_FLAG_WRITE;
 	if (*flags & FOLL_REMOTE)
 		fault_flags |= FAULT_FLAG_REMOTE;
-	if (nonblocking)
+	if (locked)
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY;
 	if (*flags & FOLL_NOWAIT)
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;
@@ -665,8 +665,8 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
 	}
 
 	if (ret & VM_FAULT_RETRY) {
-		if (nonblocking && !(fault_flags & FAULT_FLAG_RETRY_NOWAIT))
-			*nonblocking = 0;
+		if (locked && !(fault_flags & FAULT_FLAG_RETRY_NOWAIT))
+			*locked = 0;
 		return -EBUSY;
 	}
 
@@ -743,7 +743,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
  *		only intends to ensure the pages are faulted in.
  * @vmas:	array of pointers to vmas corresponding to each page.
  *		Or NULL if the caller does not require them.
- * @nonblocking: whether waiting for disk IO or mmap_sem contention
+ * @locked:	whether we're still with the mmap_sem held
 *
 * Returns number of pages pinned. This may be fewer than the number
 * requested. If nr_pages is 0 or negative, returns 0. If no pages
@@ -772,13 +772,11 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 * appropriate) must be called after the page is finished with, and
 * before put_page is called.
 *
- * If @nonblocking != NULL, __get_user_pages will not wait for disk IO
- * or mmap_sem contention, and if waiting is needed to pin all pages,
- * *@nonblocking will be set to 0. Further, if @gup_flags does not
- * include FOLL_NOWAIT, the mmap_sem will be released via up_read() in
- * this case.
+ * If @locked != NULL, *@locked will be set to 0 when mmap_sem is
+ * released by an up_read(). That can happen if @gup_flags does not
+ * have FOLL_NOWAIT.
 *
- * A caller using such a combination of @nonblocking and @gup_flags
+ * A caller using such a combination of @locked and @gup_flags
 * must therefore hold the mmap_sem for reading only, and recognize
 * when it's been released. Otherwise, it must be held for either
 * reading or writing and will not be released.
@@ -790,7 +788,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		unsigned long start, unsigned long nr_pages,
 		unsigned int gup_flags, struct page **pages,
-		struct vm_area_struct **vmas, int *nonblocking)
+		struct vm_area_struct **vmas, int *locked)
 {
 	long ret = 0, i = 0;
 	struct vm_area_struct *vma = NULL;
@@ -834,7 +832,7 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 			if (is_vm_hugetlb_page(vma)) {
 				i = follow_hugetlb_page(mm, vma, pages,
 						vmas, &start, &nr_pages, i,
-						gup_flags, nonblocking);
+						gup_flags, locked);
 				continue;
 			}
 		}
@@ -852,7 +850,7 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		page = follow_page_mask(vma, start, foll_flags, &ctx);
 		if (!page) {
 			ret = faultin_page(tsk, vma, start, &foll_flags,
-					nonblocking);
+					locked);
 			switch (ret) {
 			case 0:
 				goto retry;
@@ -1178,7 +1176,7 @@ EXPORT_SYMBOL(get_user_pages_remote);
 * @vma:   target vma
 * @start: start address
 * @end:   end address
- * @nonblocking:
+ * @locked: whether the mmap_sem is still held
 *
 * This takes care of mlocking the pages too if VM_LOCKED is set.
 *
@@ -1186,14 +1184,14 @@ EXPORT_SYMBOL(get_user_pages_remote);
 * return 0 on success, negative error code on error.
 *
 * vma->vm_mm->mmap_sem must be held.
 *
- * If @nonblocking is NULL, it may be held for read or write and will
+ * If @locked is NULL, it may be held for read or write and will
 * be unperturbed.
 *
- * If @nonblocking is non-NULL, it must held for read only and may be
- * released. If it's released, *@nonblocking will be set to 0.
+ * If @locked is non-NULL, it must held for read only and may be
+ * released. If it's released, *@locked will be set to 0.
 */
 long populate_vma_page_range(struct vm_area_struct *vma,
-		unsigned long start, unsigned long end, int *nonblocking)
+		unsigned long start, unsigned long end, int *locked)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long nr_pages = (end - start) / PAGE_SIZE;
@@ -1228,7 +1226,7 @@ long populate_vma_page_range(struct vm_area_struct *vma,
 	 * not result in a stack expansion that recurses back here.
 	 */
 	return __get_user_pages(current, mm, start, nr_pages, gup_flags,
-				NULL, NULL, nonblocking);
+				NULL, NULL, locked);
 }
 
 /*
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6d7296dd11b8..31c2a6275023 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4270,7 +4270,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			 struct page **pages, struct vm_area_struct **vmas,
 			 unsigned long *position, unsigned long *nr_pages,
-			 long i, unsigned int flags, int *nonblocking)
+			 long i, unsigned int flags, int *locked)
 {
 	unsigned long pfn_offset;
 	unsigned long vaddr = *position;
@@ -4341,7 +4341,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			spin_unlock(ptl);
 			if (flags & FOLL_WRITE)
 				fault_flags |= FAULT_FLAG_WRITE;
-			if (nonblocking)
+			if (locked)
 				fault_flags |= FAULT_FLAG_ALLOW_RETRY;
 			if (flags & FOLL_NOWAIT)
 				fault_flags |= FAULT_FLAG_ALLOW_RETRY |
@@ -4358,9 +4358,9 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 				break;
 			}
 			if (ret & VM_FAULT_RETRY) {
-				if (nonblocking &&
+				if (locked &&
 				    !(fault_flags & FAULT_FLAG_RETRY_NOWAIT))
-					*nonblocking = 0;
+					*locked = 0;
 				*nr_pages = 0;
 				/*
 				 * VM_FAULT_RETRY must not return an
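
The renamed parameter encodes a simple caller-side contract. For
illustration, a hypothetical wrapper showing that contract (a sketch
only, not part of the series; gup_with_retry() is an invented name):

    /*
     * Sketch: "locked" tracks whether we still hold mmap_sem after
     * __get_user_pages() returns. On VM_FAULT_RETRY the fault path
     * drops the lock and clears *locked before returning -EBUSY.
     */
    static long gup_with_retry(struct task_struct *tsk, struct mm_struct *mm,
                               unsigned long start, unsigned long nr_pages,
                               unsigned int gup_flags, struct page **pages)
    {
            int locked = 1;
            long ret;

            down_read(&mm->mmap_sem);
            ret = __get_user_pages(tsk, mm, start, nr_pages, gup_flags,
                                   pages, NULL, &locked);
            if (locked)
                    up_read(&mm->mmap_sem); /* only unlock if still owned */
            return ret;
    }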
From patchwork Mon Sep 23 04:25:15 2019
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 11156021
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: David Hildenbrand, Hugh Dickins, Maya Gokhale, Jerome Glisse,
    Pavel Emelyanov, Johannes Weiner, peterx@redhat.com, Martin Cracauer,
    Marty McFadden, Shaohua Li, Andrea Arcangeli, Mike Kravetz,
    Denis Plotnikov, Mike Rapoport, Linus Torvalds, Mel Gorman,
    "Kirill A. Shutemov", "Dr. David Alan Gilbert"
Subject: [PATCH v4 02/10] mm/gup: Fix __get_user_pages() on fault retry of hugetlb
Date: Mon, 23 Sep 2019 12:25:15 +0800
Message-Id: <20190923042523.10027-3-peterx@redhat.com>
In-Reply-To: <20190923042523.10027-1-peterx@redhat.com>
References: <20190923042523.10027-1-peterx@redhat.com>

When follow_hugetlb_page() returns with *locked == 0, it means we have
hit a VM_FAULT_RETRY during the faulting process and we have released
the mmap_sem. When that happens, we should stop and bail out.
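
To see why bailing out is required, consider the shape of the loop
being fixed (a simplified sketch of the __get_user_pages() vma walk
from patch 01, not a verbatim excerpt):

    /* Simplified sketch of the __get_user_pages() vma walk. */
    do {
            struct vm_area_struct *vma = find_extend_vma(mm, start);

            if (is_vm_hugetlb_page(vma)) {
                    i = follow_hugetlb_page(mm, vma, pages, vmas,
                                            &start, &nr_pages, i,
                                            gup_flags, locked);
                    /*
                     * Without the fix: fall through to "continue" even
                     * when *locked became 0, i.e. keep dereferencing
                     * vma and mm state although the mmap_sem has been
                     * released -- the vma may already be stale. The
                     * fix below bails out here instead.
                     */
                    continue;
            }
            /* ... regular page-table walk and faultin_page() ... */
    } while (nr_pages);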
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/gup.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/gup.c b/mm/gup.c
index eddbb95dcb8f..e60d32f1674d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -833,6 +833,16 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 				i = follow_hugetlb_page(mm, vma, pages,
 						vmas, &start, &nr_pages, i,
 						gup_flags, locked);
+				if (locked && *locked == 0) {
+					/*
+					 * We've got a VM_FAULT_RETRY
+					 * and we've lost mmap_sem.
+					 * We must stop here.
+					 */
+					BUG_ON(gup_flags & FOLL_NOWAIT);
+					BUG_ON(ret != 0);
+					goto out;
+				}
 				continue;
 			}
 		}

From patchwork Mon Sep 23 04:25:16 2019
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 11156023
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: David Hildenbrand, Hugh Dickins, Maya Gokhale, Jerome Glisse,
    Pavel Emelyanov, Johannes Weiner, peterx@redhat.com, Martin Cracauer,
    Marty McFadden, Shaohua Li, Andrea Arcangeli, Mike Kravetz,
    Denis Plotnikov, Mike Rapoport, Linus Torvalds, Mel Gorman,
    "Kirill A. Shutemov", "Dr. David Alan Gilbert"
Subject: [PATCH v4 03/10] mm: Introduce FAULT_FLAG_DEFAULT
Date: Mon, 23 Sep 2019 12:25:16 +0800
Message-Id: <20190923042523.10027-4-peterx@redhat.com>
In-Reply-To: <20190923042523.10027-1-peterx@redhat.com>
References: <20190923042523.10027-1-peterx@redhat.com>

Although there are tons of arch-specific page fault handlers, most of
them share the same initial value for the page fault flags: nearly all
of them allow the fault to be retried, and they also allow the fault
to respond to SIGKILL.

Let's define a default value for the fault flags to replace the
initial flags that were copied over from handler to handler. With
this, it will be far easier to introduce a new fault flag that can be
used by all the architectures, instead of touching every arch.
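
The per-arch pattern after this change looks as follows (a composite
sketch of the hunks below; "is_write" stands in for each arch's own
write detection):

    /* Composite sketch of an arch fault handler after this patch. */
    unsigned int flags = FAULT_FLAG_DEFAULT;  /* ALLOW_RETRY | KILLABLE */

    if (user_mode(regs))
            flags |= FAULT_FLAG_USER;         /* arch-specific bits    */
    if (is_write)                             /* are still ORed in     */
            flags |= FAULT_FLAG_WRITE;

    fault = handle_mm_fault(vma, address, flags);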
Reviewed-by: David Hildenbrand
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/alpha/mm/fault.c      | 2 +-
 arch/arc/mm/fault.c        | 2 +-
 arch/arm/mm/fault.c        | 2 +-
 arch/arm64/mm/fault.c      | 2 +-
 arch/hexagon/mm/vm_fault.c | 2 +-
 arch/ia64/mm/fault.c       | 2 +-
 arch/m68k/mm/fault.c       | 2 +-
 arch/microblaze/mm/fault.c | 2 +-
 arch/mips/mm/fault.c       | 2 +-
 arch/nds32/mm/fault.c      | 2 +-
 arch/nios2/mm/fault.c      | 2 +-
 arch/openrisc/mm/fault.c   | 2 +-
 arch/parisc/mm/fault.c     | 2 +-
 arch/powerpc/mm/fault.c    | 2 +-
 arch/riscv/mm/fault.c      | 2 +-
 arch/s390/mm/fault.c       | 2 +-
 arch/sh/mm/fault.c         | 2 +-
 arch/sparc/mm/fault_32.c   | 2 +-
 arch/sparc/mm/fault_64.c   | 2 +-
 arch/um/kernel/trap.c      | 2 +-
 arch/unicore32/mm/fault.c  | 2 +-
 arch/x86/mm/fault.c        | 2 +-
 arch/xtensa/mm/fault.c     | 2 +-
 include/linux/mm.h         | 7 +++++++
 24 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index 741e61ef9d3f..de4cc6936391 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -89,7 +89,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
 	const struct exception_table_entry *fixup;
 	int si_code = SEGV_MAPERR;
 	vm_fault_t fault;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+	unsigned int flags = FAULT_FLAG_DEFAULT;
 
 	/* As of EV6, a load into $31/$f31 is a prefetch, and never faults
 	   (or is suppressed by the PALcode). Support that for older CPUs
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index 3861543b66a0..61919e4e4eec 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -94,7 +94,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
 	    (regs->ecr_cause == ECR_C_PROTV_INST_FETCH))
 		exec = 1;
 
-	flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+	flags = FAULT_FLAG_DEFAULT;
 	if (user_mode(regs))
 		flags |= FAULT_FLAG_USER;
 	if (write)
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 890eeaac3cbb..2ae28ffec622 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -241,7 +241,7 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 	struct mm_struct *mm;
 	int sig, code;
 	vm_fault_t fault;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+	unsigned int flags = FAULT_FLAG_DEFAULT;
 
 	if (kprobe_page_fault(regs, fsr))
 		return 0;
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index cfd65b63f36f..613e7434c208 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -410,7 +410,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 	struct mm_struct *mm = current->mm;
 	vm_fault_t fault, major = 0;
 	unsigned long vm_flags = VM_READ | VM_WRITE;
-	unsigned int mm_flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+	unsigned int mm_flags = FAULT_FLAG_DEFAULT;
 
 	if (kprobe_page_fault(regs, esr))
 		return 0;
diff --git a/arch/hexagon/mm/vm_fault.c b/arch/hexagon/mm/vm_fault.c
index b3bc71680ae4..223787e01bdd 100644
--- a/arch/hexagon/mm/vm_fault.c
+++ b/arch/hexagon/mm/vm_fault.c
@@ -41,7 +41,7 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
 	int si_code = SEGV_MAPERR;
 	vm_fault_t fault;
 	const struct exception_table_entry *fixup;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+	unsigned int flags = FAULT_FLAG_DEFAULT;
 
 	/*
 	 * If we're in an interrupt or have no user context,
diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c
index c2f299fe9e04..d039b846f671 100644
--- a/arch/ia64/mm/fault.c
+++ b/arch/ia64/mm/fault.c
@@ -65,7 +65,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
 	struct mm_struct *mm = current->mm;
 	unsigned long mask;
 	vm_fault_t fault;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+	unsigned int flags = FAULT_FLAG_DEFAULT;
 
 	mask = ((((isr >> IA64_ISR_X_BIT) & 1UL) << VM_EXEC_BIT)
 		| (((isr >> IA64_ISR_W_BIT) & 1UL) << VM_WRITE_BIT));
diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
index e9b1d7585b43..8e734309ace9 100644
--- a/arch/m68k/mm/fault.c
+++ b/arch/m68k/mm/fault.c
@@ -71,7 +71,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct * vma;
 	vm_fault_t fault;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+	unsigned int flags = FAULT_FLAG_DEFAULT;
 
 	pr_debug("do page fault:\nregs->sr=%#x, regs->pc=%#lx, address=%#lx, %ld, %p\n",
 		regs->sr, regs->pc, address, error_code, mm ? mm->pgd : NULL);
diff --git a/arch/microblaze/mm/fault.c b/arch/microblaze/mm/fault.c
index e6a810b0c7ad..45c9f66c1dbc 100644
--- a/arch/microblaze/mm/fault.c
+++ b/arch/microblaze/mm/fault.c
@@ -91,7 +91,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
 	int code = SEGV_MAPERR;
 	int is_write = error_code & ESR_S;
 	vm_fault_t fault;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+	unsigned int flags = FAULT_FLAG_DEFAULT;
 
 	regs->ear = address;
 	regs->esr = error_code;
diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
index f589aa8f47d9..6660b77ff8f3 100644
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -44,7 +44,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
 	const int field = sizeof(unsigned long) * 2;
 	int si_code;
 	vm_fault_t fault;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+	unsigned int flags = FAULT_FLAG_DEFAULT;
 
 	static DEFINE_RATELIMIT_STATE(ratelimit_state, 5 * HZ, 10);
diff --git a/arch/nds32/mm/fault.c b/arch/nds32/mm/fault.c
index 064ae5d2159d..a40de112a23a 100644
--- a/arch/nds32/mm/fault.c
+++ b/arch/nds32/mm/fault.c
@@ -76,7 +76,7 @@ void do_page_fault(unsigned long entry, unsigned long addr,
 	int si_code;
 	vm_fault_t fault;
 	unsigned int mask = VM_READ | VM_WRITE | VM_EXEC;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+	unsigned int flags = FAULT_FLAG_DEFAULT;
 
 	error_code = error_code & (ITYPE_mskINST | ITYPE_mskETYPE);
 	tsk = current;
diff --git a/arch/nios2/mm/fault.c b/arch/nios2/mm/fault.c
index 6a2e716b959f..a401b45cae47 100644
--- a/arch/nios2/mm/fault.c
+++ b/arch/nios2/mm/fault.c
@@ -47,7 +47,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
 	struct mm_struct *mm = tsk->mm;
 	int code = SEGV_MAPERR;
 	vm_fault_t fault;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+	unsigned int flags = FAULT_FLAG_DEFAULT;
 
 	cause >>= 2;
diff --git a/arch/openrisc/mm/fault.c b/arch/openrisc/mm/fault.c
index 5d4d3a9691d0..fd1592a56238 100644
--- a/arch/openrisc/mm/fault.c
+++ b/arch/openrisc/mm/fault.c
@@ -50,7 +50,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
 	struct vm_area_struct *vma;
 	int si_code;
 	vm_fault_t fault;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+	unsigned int flags = FAULT_FLAG_DEFAULT;
 
 	tsk = current;
diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
index adbd5e2144a3..355e3e13fa72 100644
--- a/arch/parisc/mm/fault.c
+++ b/arch/parisc/mm/fault.c
@@ -274,7 +274,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
 	if (!mm)
 		goto no_context;
 
-	flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+	flags = FAULT_FLAG_DEFAULT;
 	if (user_mode(regs))
 		flags |= FAULT_FLAG_USER;
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 8432c281de92..408ee769c470 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -435,7 +435,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 {
 	struct vm_area_struct * vma;
 	struct mm_struct *mm = current->mm;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+	unsigned int flags = FAULT_FLAG_DEFAULT;
 	int is_exec = TRAP(regs) == 0x400;
 	int is_user = user_mode(regs);
 	int is_write = page_fault_is_write(error_code);
diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index 96add1427a75..deeb820bd855 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -28,7 +28,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
 	struct vm_area_struct *vma;
 	struct mm_struct *mm;
 	unsigned long addr, cause;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+	unsigned int flags = FAULT_FLAG_DEFAULT;
 	int code = SEGV_MAPERR;
 	vm_fault_t fault;
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 7b0bb475c166..74a77b2bca75 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -429,7 +429,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 	address = trans_exc_code & __FAIL_ADDR_MASK;
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
-	flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+	flags = FAULT_FLAG_DEFAULT;
 	if (user_mode(regs))
 		flags |= FAULT_FLAG_USER;
 	if (access == VM_WRITE || (trans_exc_code & store_indication) == 0x400)
diff --git a/arch/sh/mm/fault.c b/arch/sh/mm/fault.c
index 5f51456f4fc7..becf0be267bb 100644
--- a/arch/sh/mm/fault.c
+++ b/arch/sh/mm/fault.c
@@ -380,7 +380,7 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
 	struct mm_struct *mm;
 	struct vm_area_struct * vma;
 	vm_fault_t fault;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+	unsigned int flags = FAULT_FLAG_DEFAULT;
 
 	tsk = current;
 	mm = tsk->mm;
diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c
index 8d69de111470..0863f6fdd2c5 100644
--- a/arch/sparc/mm/fault_32.c
+++ b/arch/sparc/mm/fault_32.c
@@ -168,7 +168,7 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
 	int from_user = !(regs->psr & PSR_PS);
 	int code;
 	vm_fault_t fault;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+	unsigned int flags = FAULT_FLAG_DEFAULT;
 
 	if (text_fault)
 		address = regs->pc;
diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c
index 2371fb6b97e4..a1cba3eef79e 100644
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -267,7 +267,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
 	int si_code, fault_code;
 	vm_fault_t fault;
 	unsigned long address, mm_rss;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+	unsigned int flags = FAULT_FLAG_DEFAULT;
 
 	fault_code = get_thread_fault_code();
diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c
index 58fe36856182..bc2756782d64 100644
--- a/arch/um/kernel/trap.c
+++ b/arch/um/kernel/trap.c
@@ -32,7 +32,7 @@ int handle_page_fault(unsigned long address, unsigned long ip,
 	pmd_t *pmd;
 	pte_t *pte;
 	int err = -EFAULT;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+	unsigned int flags = FAULT_FLAG_DEFAULT;
 
 	*code_out = SEGV_MAPERR;
diff --git a/arch/unicore32/mm/fault.c b/arch/unicore32/mm/fault.c
index 76342de9cf8c..60453c892c51 100644
--- a/arch/unicore32/mm/fault.c
+++ b/arch/unicore32/mm/fault.c
@@ -202,7 +202,7 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 	struct mm_struct *mm;
 	int sig, code;
 	vm_fault_t fault;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+	unsigned int flags = FAULT_FLAG_DEFAULT;
 
 	tsk = current;
 	mm = tsk->mm;
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 9ceacd1156db..994c860ac2d8 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1287,7 +1287,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 	struct task_struct *tsk;
 	struct mm_struct *mm;
 	vm_fault_t fault, major = 0;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+	unsigned int flags = FAULT_FLAG_DEFAULT;
 
 	tsk = current;
 	mm = tsk->mm;
diff --git a/arch/xtensa/mm/fault.c b/arch/xtensa/mm/fault.c
index f81b1478da61..d2b082908538 100644
--- a/arch/xtensa/mm/fault.c
+++ b/arch/xtensa/mm/fault.c
@@ -43,7 +43,7 @@ void do_page_fault(struct pt_regs *regs)
 	int is_write, is_exec;
 	vm_fault_t fault;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+	unsigned int flags = FAULT_FLAG_DEFAULT;
 
 	code = SEGV_MAPERR;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0334ca97c584..57fb5c535f8e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -393,6 +393,13 @@ extern pgprot_t protection_map[16];
 #define FAULT_FLAG_REMOTE	0x80	/* faulting for non current tsk/mm */
 #define FAULT_FLAG_INSTRUCTION	0x100	/* The fault was during an instruction fetch */
 
+/*
+ * The default fault flags that should be used by most of the
+ * arch-specific page fault handlers.
+ */
+#define FAULT_FLAG_DEFAULT	(FAULT_FLAG_ALLOW_RETRY | \
+				 FAULT_FLAG_KILLABLE)
+
 #define FAULT_FLAG_TRACE \
 	{ FAULT_FLAG_WRITE,	"WRITE" }, \

From patchwork Mon Sep 23 04:25:17 2019
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 11156027
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: David Hildenbrand, Hugh Dickins, Maya Gokhale, Jerome Glisse,
    Pavel Emelyanov, Johannes Weiner, peterx@redhat.com, Martin Cracauer,
    Marty McFadden, Shaohua Li, Andrea Arcangeli, Mike Kravetz,
    Denis Plotnikov, Mike Rapoport, Linus Torvalds, Mel Gorman,
    "Kirill A. Shutemov", "Dr. David Alan Gilbert"
Subject: [PATCH v4 04/10] mm: Introduce FAULT_FLAG_INTERRUPTIBLE
Date: Mon, 23 Sep 2019 12:25:17 +0800
Message-Id: <20190923042523.10027-5-peterx@redhat.com>
In-Reply-To: <20190923042523.10027-1-peterx@redhat.com>
References: <20190923042523.10027-1-peterx@redhat.com>

handle_userfault() is currently the only place in the kernel page
fault procedures that can respond to non-fatal userspace signals. It
detected that allowance by checking against the USER & KILLABLE flag
combination, which was unofficial.

Introduce a new flag (FAULT_FLAG_INTERRUPTIBLE) to state explicitly
that the fault handler allows the fault procedure to respond even to
non-fatal signals. Meanwhile, add this new flag to the default fault
flags so that all the page fault handlers can benefit from it. With
that, replace the userfault check with the new flag.

Since the definition lines are getting even longer, clean up the fault
flags a bit too, to ease readers on 80-column TTYs.

Although we've got a new flag and applied it, there should be no
functional change with this patch so far.

Suggested-by: Linus Torvalds
Reviewed-by: David Hildenbrand
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/userfaultfd.c   |  4 +---
 include/linux/mm.h | 39 ++++++++++++++++++++++++++++-----------
 2 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index fe6d804a38dc..3c55ee64bcb1 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -462,9 +462,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	uwq.ctx = ctx;
 	uwq.waken = false;
 
-	return_to_userland =
-		(vmf->flags & (FAULT_FLAG_USER|FAULT_FLAG_KILLABLE)) ==
-		(FAULT_FLAG_USER|FAULT_FLAG_KILLABLE);
+	return_to_userland = vmf->flags & FAULT_FLAG_INTERRUPTIBLE;
 	blocking_state = return_to_userland ? TASK_INTERRUPTIBLE :
 			 TASK_KILLABLE;
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 57fb5c535f8e..53ec7abb8472 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -383,22 +383,38 @@ extern unsigned int kobjsize(const void *objp);
  */
 extern pgprot_t protection_map[16];
 
-#define FAULT_FLAG_WRITE	0x01	/* Fault was a write access */
-#define FAULT_FLAG_MKWRITE	0x02	/* Fault was mkwrite of existing pte */
-#define FAULT_FLAG_ALLOW_RETRY	0x04	/* Retry fault if blocking */
-#define FAULT_FLAG_RETRY_NOWAIT	0x08	/* Don't drop mmap_sem and wait when retrying */
-#define FAULT_FLAG_KILLABLE	0x10	/* The fault task is in SIGKILL killable region */
-#define FAULT_FLAG_TRIED	0x20	/* Second try */
-#define FAULT_FLAG_USER		0x40	/* The fault originated in userspace */
-#define FAULT_FLAG_REMOTE	0x80	/* faulting for non current tsk/mm */
-#define FAULT_FLAG_INSTRUCTION	0x100	/* The fault was during an instruction fetch */
+/**
+ * Fault flag definitions.
+ *
+ * @FAULT_FLAG_WRITE: Fault was a write fault.
+ * @FAULT_FLAG_MKWRITE: Fault was mkwrite of existing PTE.
+ * @FAULT_FLAG_ALLOW_RETRY: Allow to retry the fault if blocked.
+ * @FAULT_FLAG_RETRY_NOWAIT: Don't drop mmap_sem and wait when retrying.
+ * @FAULT_FLAG_KILLABLE: The fault task is in SIGKILL killable region.
+ * @FAULT_FLAG_TRIED: The fault has been tried once.
+ * @FAULT_FLAG_USER: The fault originated in userspace.
+ * @FAULT_FLAG_REMOTE: The fault is not for current task/mm.
+ * @FAULT_FLAG_INSTRUCTION: The fault was during an instruction fetch.
+ * @FAULT_FLAG_INTERRUPTIBLE: The fault can be interrupted by non-fatal signals.
+ */
+#define FAULT_FLAG_WRITE		0x01
+#define FAULT_FLAG_MKWRITE		0x02
+#define FAULT_FLAG_ALLOW_RETRY		0x04
+#define FAULT_FLAG_RETRY_NOWAIT		0x08
+#define FAULT_FLAG_KILLABLE		0x10
+#define FAULT_FLAG_TRIED		0x20
+#define FAULT_FLAG_USER			0x40
+#define FAULT_FLAG_REMOTE		0x80
+#define FAULT_FLAG_INSTRUCTION		0x100
+#define FAULT_FLAG_INTERRUPTIBLE	0x200
 
 /*
  * The default fault flags that should be used by most of the
  * arch-specific page fault handlers.
  */
 #define FAULT_FLAG_DEFAULT	(FAULT_FLAG_ALLOW_RETRY | \
-				 FAULT_FLAG_KILLABLE)
+				 FAULT_FLAG_KILLABLE | \
+				 FAULT_FLAG_INTERRUPTIBLE)
 
 #define FAULT_FLAG_TRACE \
 	{ FAULT_FLAG_WRITE,		"WRITE" }, \
@@ -409,7 +425,8 @@ extern pgprot_t protection_map[16];
 	{ FAULT_FLAG_TRIED,		"TRIED" }, \
 	{ FAULT_FLAG_USER,		"USER" }, \
 	{ FAULT_FLAG_REMOTE,		"REMOTE" }, \
-	{ FAULT_FLAG_INSTRUCTION,	"INSTRUCTION" }
+	{ FAULT_FLAG_INSTRUCTION,	"INSTRUCTION" }, \
+	{ FAULT_FLAG_INTERRUPTIBLE,	"INTERRUPTIBLE" }
 
 /*
  * vm_fault is filled by the the pagefault handler and passed to the vma's
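
With the new bit in place, a fault path can test interruptibility with
a single flag check; a minimal sketch of the consumer side (mirroring
the handle_userfault() change above, not a verbatim excerpt):

    /* Sketch: deciding how deeply a blocked fault may sleep. */
    bool return_to_userland = vmf->flags & FAULT_FLAG_INTERRUPTIBLE;
    unsigned int blocking_state = return_to_userland ? TASK_INTERRUPTIBLE :
                                                       TASK_KILLABLE;

    /*
     * TASK_INTERRUPTIBLE: any signal wakes the task so userspace can
     * handle it; TASK_KILLABLE: only fatal signals break the wait.
     */
    set_current_state(blocking_state);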
From patchwork Mon Sep 23 04:25:18 2019
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 11156029
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: David Hildenbrand, Hugh Dickins, Maya Gokhale, Jerome Glisse,
    Pavel Emelyanov, Johannes Weiner, peterx@redhat.com, Martin Cracauer,
    Marty McFadden, Shaohua Li, Andrea Arcangeli, Mike Kravetz,
    Denis Plotnikov, Mike Rapoport, Linus Torvalds, Mel Gorman,
    "Kirill A. Shutemov", "Dr. David Alan Gilbert"
Subject: [PATCH v4 05/10] mm: Return faster for non-fatal signals in user mode faults
Date: Mon, 23 Sep 2019 12:25:18 +0800
Message-Id: <20190923042523.10027-6-peterx@redhat.com>
In-Reply-To: <20190923042523.10027-1-peterx@redhat.com>
References: <20190923042523.10027-1-peterx@redhat.com>

The idea comes from the upstream discussion between Linus and Andrea:

  https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/

A summary of the issue: handle_userfault() used to have a special path
that returned VM_FAULT_NOPAGE when a non-fatal signal was detected
while waiting for userfault handling. It did that by reacquiring the
mmap_sem before returning. However, that brings a risk: the vmas might
have changed when we retake the mmap_sem, and we could even be holding
an invalid vma structure.
This patch prepares for removing that special path, by allowing the
page fault to return even faster if we were interrupted by a non-fatal
signal during a user-mode page fault handling routine.

Suggested-by: Linus Torvalds
Suggested-by: Andrea Arcangeli
Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: Palmer Dabbelt # RISC-V parts
---
 arch/alpha/mm/fault.c        |  3 ++-
 arch/arc/mm/fault.c          |  5 +++++
 arch/arm/mm/fault.c          |  9 +++++----
 arch/arm64/mm/fault.c        |  9 +++++----
 arch/hexagon/mm/vm_fault.c   |  3 ++-
 arch/ia64/mm/fault.c         |  3 ++-
 arch/m68k/mm/fault.c         |  5 +++--
 arch/microblaze/mm/fault.c   |  3 ++-
 arch/mips/mm/fault.c         |  3 ++-
 arch/nds32/mm/fault.c        |  9 +++++----
 arch/nios2/mm/fault.c        |  3 ++-
 arch/openrisc/mm/fault.c     |  3 ++-
 arch/parisc/mm/fault.c       |  3 ++-
 arch/powerpc/mm/fault.c      |  2 ++
 arch/riscv/mm/fault.c        |  5 +++--
 arch/s390/mm/fault.c         |  4 ++--
 arch/sh/mm/fault.c           |  4 ++++
 arch/sparc/mm/fault_32.c     |  2 +-
 arch/sparc/mm/fault_64.c     |  3 ++-
 arch/um/kernel/trap.c        |  4 +++-
 arch/unicore32/mm/fault.c    |  5 +++--
 arch/x86/mm/fault.c          |  2 ++
 arch/xtensa/mm/fault.c       |  3 ++-
 include/linux/sched/signal.h | 12 ++++++++++++
 24 files changed, 75 insertions(+), 32 deletions(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index de4cc6936391..ab1d4212d658 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -150,7 +150,8 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
 	   the fault.  */
 	fault = handle_mm_fault(vma, address, flags);
 
-	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
+	if ((fault & VM_FAULT_RETRY) &&
+	    fault_should_check_signal(user_mode(regs)))
 		return;
 
 	if (unlikely(fault & VM_FAULT_ERROR)) {
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index 61919e4e4eec..27adf4e608e4 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -142,6 +142,11 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
 			goto no_context;
 		return;
 	}
+
+	/* Allow user to handle non-fatal signals first */
+	if (signal_pending(current) && user_mode(regs))
+		return;
+
 	/*
 	 * retry state machine
 	 */
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 2ae28ffec622..f00fb4eafe54 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -291,14 +291,15 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 
 	fault = __do_page_fault(mm, addr, fsr, flags, tsk);
 
-	/* If we need to retry but a fatal signal is pending, handle the
+	/* If we need to retry but a signal is pending, try to handle the
 	 * signal first. We do not need to release the mmap_sem because
 	 * it would already be released in __lock_page_or_retry in
 	 * mm/filemap.c. */
-	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
-		if (!user_mode(regs))
+	if (unlikely(fault & VM_FAULT_RETRY && signal_pending(current))) {
+		if (fatal_signal_pending(current) && !user_mode(regs))
 			goto no_context;
-		return 0;
+		if (user_mode(regs))
+			return 0;
 	}
 
	/*
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 613e7434c208..0d3fe0ea6a70 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -479,15 +479,16 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 
 	if (fault & VM_FAULT_RETRY) {
 		/*
-		 * If we need to retry but a fatal signal is pending,
+		 * If we need to retry but a signal is pending, try to
 		 * handle the signal first. We do not need to release
 		 * the mmap_sem because it would already be released
 		 * in __lock_page_or_retry in mm/filemap.c.
 		 */
-		if (fatal_signal_pending(current)) {
-			if (!user_mode(regs))
+		if (signal_pending(current)) {
+			if (fatal_signal_pending(current) && !user_mode(regs))
 				goto no_context;
-			return 0;
+			if (user_mode(regs))
+				return 0;
 		}
 
 		/*
diff --git a/arch/hexagon/mm/vm_fault.c b/arch/hexagon/mm/vm_fault.c
index 223787e01bdd..88a2e5635bfb 100644
--- a/arch/hexagon/mm/vm_fault.c
+++ b/arch/hexagon/mm/vm_fault.c
@@ -91,7 +91,8 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
 
 	fault = handle_mm_fault(vma, address, flags);
 
-	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
+	if ((fault & VM_FAULT_RETRY) &&
+	    fault_should_check_signal(user_mode(regs)))
 		return;
 
 	/* The most common case -- we are done. */
diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c
index d039b846f671..8d47acf50fda 100644
--- a/arch/ia64/mm/fault.c
+++ b/arch/ia64/mm/fault.c
@@ -141,7 +141,8 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
 	 */
 	fault = handle_mm_fault(vma, address, flags);
 
-	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
+	if ((fault & VM_FAULT_RETRY) &&
+	    fault_should_check_signal(user_mode(regs)))
 		return;
 
 	if (unlikely(fault & VM_FAULT_ERROR)) {
diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
index 8e734309ace9..103f93ba8139 100644
--- a/arch/m68k/mm/fault.c
+++ b/arch/m68k/mm/fault.c
@@ -138,8 +138,9 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
 	fault = handle_mm_fault(vma, address, flags);
 	pr_debug("handle_mm_fault returns %x\n", fault);
 
-	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
-		return 0;
+	if ((fault & VM_FAULT_RETRY) &&
+	    fault_should_check_signal(user_mode(regs)))
+		return;
 
 	if (unlikely(fault & VM_FAULT_ERROR)) {
 		if (fault & VM_FAULT_OOM)
diff --git a/arch/microblaze/mm/fault.c b/arch/microblaze/mm/fault.c
index 45c9f66c1dbc..8b0615eab4b6 100644
--- a/arch/microblaze/mm/fault.c
+++ b/arch/microblaze/mm/fault.c
@@ -217,7 +217,8 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
 	 */
 	fault = handle_mm_fault(vma, address, flags);
 
-	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
+	if ((fault & VM_FAULT_RETRY) &&
+	    fault_should_check_signal(user_mode(regs)))
 		return;
 
 	if (unlikely(fault & VM_FAULT_ERROR)) {
diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
index 6660b77ff8f3..48aac20a1ded 100644
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -154,7 +154,8 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
 	 */
 	fault = handle_mm_fault(vma, address, flags);
 
-	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
+	if ((fault & VM_FAULT_RETRY) &&
+	    fault_should_check_signal(user_mode(regs)))
 		return;
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
diff --git a/arch/nds32/mm/fault.c b/arch/nds32/mm/fault.c
index a40de112a23a..baa44f9d0b4a 100644
--- a/arch/nds32/mm/fault.c
+++ b/arch/nds32/mm/fault.c
@@ -206,14 +206,15 @@ void do_page_fault(unsigned long entry, unsigned long addr,
 	fault = handle_mm_fault(vma, addr, flags);
 
 	/*
-	 * If we need to retry but a fatal signal is pending, handle the
+	 * If we need to retry but a signal is pending, try to handle the
 	 * signal first. We do not need to release the mmap_sem because it
 	 * would already be released in __lock_page_or_retry in mm/filemap.c.
*/ - if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) { - if (!user_mode(regs)) + if ((fault & VM_FAULT_RETRY) && signal_pending(current)) { + if (fatal_signal_pending(current) && !user_mode(regs)) goto no_context; - return; + if (user_mode(regs)) + return; } if (unlikely(fault & VM_FAULT_ERROR)) { diff --git a/arch/nios2/mm/fault.c b/arch/nios2/mm/fault.c index a401b45cae47..f9f178484184 100644 --- a/arch/nios2/mm/fault.c +++ b/arch/nios2/mm/fault.c @@ -133,7 +133,8 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause, */ fault = handle_mm_fault(vma, address, flags); - if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) + if ((fault & VM_FAULT_RETRY) && + fault_should_check_signal(user_mode(regs))) return; if (unlikely(fault & VM_FAULT_ERROR)) { diff --git a/arch/openrisc/mm/fault.c b/arch/openrisc/mm/fault.c index fd1592a56238..8ba3696dd10c 100644 --- a/arch/openrisc/mm/fault.c +++ b/arch/openrisc/mm/fault.c @@ -161,7 +161,8 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address, fault = handle_mm_fault(vma, address, flags); - if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) + if ((fault & VM_FAULT_RETRY) && + fault_should_check_signal(user_mode(regs))) return; if (unlikely(fault & VM_FAULT_ERROR)) { diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c index 355e3e13fa72..163dcb080c7b 100644 --- a/arch/parisc/mm/fault.c +++ b/arch/parisc/mm/fault.c @@ -304,7 +304,8 @@ void do_page_fault(struct pt_regs *regs, unsigned long code, fault = handle_mm_fault(vma, address, flags); - if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) + if ((fault & VM_FAULT_RETRY) && + fault_should_check_signal(user_mode(regs))) return; if (unlikely(fault & VM_FAULT_ERROR)) { diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index 408ee769c470..d321a6c5fe62 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -596,6 +596,8 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address, */ flags &= ~FAULT_FLAG_ALLOW_RETRY; flags |= FAULT_FLAG_TRIED; + if (is_user && signal_pending(current)) + return 0; if (!fatal_signal_pending(current)) goto retry; } diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c index deeb820bd855..ea8f301de65b 100644 --- a/arch/riscv/mm/fault.c +++ b/arch/riscv/mm/fault.c @@ -111,11 +111,12 @@ asmlinkage void do_page_fault(struct pt_regs *regs) fault = handle_mm_fault(vma, addr, flags); /* - * If we need to retry but a fatal signal is pending, handle the + * If we need to retry but a signal is pending, try to handle the * signal first. We do not need to release the mmap_sem because it * would already be released in __lock_page_or_retry in mm/filemap.c. */ - if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(tsk)) + if ((fault & VM_FAULT_RETRY) && + fault_should_check_signal(user_mode(regs))) return; if (unlikely(fault & VM_FAULT_ERROR)) { diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c index 74a77b2bca75..3ad77501deef 100644 --- a/arch/s390/mm/fault.c +++ b/arch/s390/mm/fault.c @@ -480,8 +480,8 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access) * the fault. */ fault = handle_mm_fault(vma, address, flags); - /* No reason to continue if interrupted by SIGKILL. 
*/ - if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) { + if ((fault & VM_FAULT_RETRY) && + fault_should_check_signal(user_mode(regs))) { fault = VM_FAULT_SIGNAL; if (flags & FAULT_FLAG_RETRY_NOWAIT) goto out_up; diff --git a/arch/sh/mm/fault.c b/arch/sh/mm/fault.c index becf0be267bb..f620282a37fd 100644 --- a/arch/sh/mm/fault.c +++ b/arch/sh/mm/fault.c @@ -489,6 +489,10 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs, * have already released it in __lock_page_or_retry * in mm/filemap.c. */ + + if (user_mode(regs) && signal_pending(tsk)) + return; + goto retry; } } diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c index 0863f6fdd2c5..9af0c3ad50d6 100644 --- a/arch/sparc/mm/fault_32.c +++ b/arch/sparc/mm/fault_32.c @@ -237,7 +237,7 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write, */ fault = handle_mm_fault(vma, address, flags); - if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) + if ((fault & VM_FAULT_RETRY) && fault_should_check_signal(from_user)) return; if (unlikely(fault & VM_FAULT_ERROR)) { diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c index a1cba3eef79e..566f05f9040b 100644 --- a/arch/sparc/mm/fault_64.c +++ b/arch/sparc/mm/fault_64.c @@ -421,7 +421,8 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs) fault = handle_mm_fault(vma, address, flags); - if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) + if ((fault & VM_FAULT_RETRY) && + fault_should_check_signal(flags & FAULT_FLAG_USER)) goto exit_exception; if (unlikely(fault & VM_FAULT_ERROR)) { diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c index bc2756782d64..3c72111f27e9 100644 --- a/arch/um/kernel/trap.c +++ b/arch/um/kernel/trap.c @@ -76,7 +76,9 @@ int handle_page_fault(unsigned long address, unsigned long ip, fault = handle_mm_fault(vma, address, flags); - if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) + + if ((fault & VM_FAULT_RETRY) && + fault_should_check_signal(is_user)) goto out_nosemaphore; if (unlikely(fault & VM_FAULT_ERROR)) { diff --git a/arch/unicore32/mm/fault.c b/arch/unicore32/mm/fault.c index 60453c892c51..04c193439c97 100644 --- a/arch/unicore32/mm/fault.c +++ b/arch/unicore32/mm/fault.c @@ -246,11 +246,12 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs) fault = __do_pf(mm, addr, fsr, flags, tsk); - /* If we need to retry but a fatal signal is pending, handle the + /* If we need to retry but a signal is pending, try to handle the * signal first. We do not need to release the mmap_sem because * it would already be released in __lock_page_or_retry in * mm/filemap.c. 
*/ - if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) + if ((fault & VM_FAULT_RETRY) && + fault_should_check_signal(user_mode(regs))) return 0; if (!(fault & VM_FAULT_ERROR) && (flags & FAULT_FLAG_ALLOW_RETRY)) { diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 994c860ac2d8..f7836472961e 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1451,6 +1451,8 @@ void do_user_addr_fault(struct pt_regs *regs, if (flags & FAULT_FLAG_ALLOW_RETRY) { flags &= ~FAULT_FLAG_ALLOW_RETRY; flags |= FAULT_FLAG_TRIED; + if ((flags & FAULT_FLAG_USER) && signal_pending(tsk)) + return; if (!fatal_signal_pending(tsk)) goto retry; } diff --git a/arch/xtensa/mm/fault.c b/arch/xtensa/mm/fault.c index d2b082908538..094606676c36 100644 --- a/arch/xtensa/mm/fault.c +++ b/arch/xtensa/mm/fault.c @@ -110,7 +110,8 @@ void do_page_fault(struct pt_regs *regs) */ fault = handle_mm_fault(vma, address, flags); - if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) + if ((fault & VM_FAULT_RETRY) && + fault_should_check_signal(user_mode(regs))) return; if (unlikely(fault & VM_FAULT_ERROR)) { diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index efd8ce7675ed..ccce63f2822d 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -377,6 +377,18 @@ static inline int signal_pending_state(long state, struct task_struct *p) return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p); } +/* + * This should only be used in fault handlers to decide whether we + * should stop the current fault routine to handle the signals + * instead. It should normally be used when a signal interrupted a + * page fault which can lead to a VM_FAULT_RETRY. + */ +static inline bool fault_should_check_signal(bool is_user) +{ + return (fatal_signal_pending(current) || + (is_user && signal_pending(current))); +} + /* * Reevaluate whether the task has signals pending delivery. * Wake the task if so. 
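For illustration only, here is a minimal sketch of how an arch fault handler is expected to consume the new helper; it is not taken verbatim from any single hunk above, and regs, vma, address and flags are the usual fault-handler locals, but fault_should_check_signal() and the flag names are the real ones from this patch:

	fault = handle_mm_fault(vma, address, flags);
	/*
	 * On VM_FAULT_RETRY, bail out early if a fatal signal is
	 * pending, or if any signal is pending and the fault came from
	 * user mode: the signal gets serviced first, and the faulting
	 * instruction is simply restarted afterwards.
	 */
	if ((fault & VM_FAULT_RETRY) &&
	    fault_should_check_signal(user_mode(regs)))
		return;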
From patchwork Mon Sep 23 04:25:19 2019
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v4 06/10] userfaultfd: Don't retake mmap_sem to emulate NOPAGE
Date: Mon, 23 Sep 2019 12:25:19 +0800
Message-Id: <20190923042523.10027-7-peterx@redhat.com>
In-Reply-To: <20190923042523.10027-1-peterx@redhat.com>

This patch removes the risky path in handle_userfault(), so that the callers of handle_mm_fault() can always know that the VMAs might have changed. Meanwhile, with the previous patch, we don't lose responsiveness either, since the core mm code can now handle non-fatal userspace signals even if we return VM_FAULT_RETRY. Suggested-by: Andrea Arcangeli Suggested-by: Linus Torvalds Reviewed-by: Jerome Glisse Signed-off-by: Peter Xu --- fs/userfaultfd.c | 24 ------------------------ 1 file changed, 24 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 3c55ee64bcb1..2b3b48e94ae4 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -522,30 +522,6 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) __set_current_state(TASK_RUNNING); - if (return_to_userland) { - if (signal_pending(current) && - !fatal_signal_pending(current)) { - /* - * If we got a SIGSTOP or SIGCONT and this is - * a normal userland page fault, just let - * userland return so the signal will be - * handled and gdb debugging works. The page - * fault code immediately after we return from - * this function is going to release the - * mmap_sem and it's not depending on it - * (unlike gup would if we were not to return - * VM_FAULT_RETRY). - * - * If a fatal signal is pending we still take - * the streamlined VM_FAULT_RETRY failure path - * and there's no need to retake the mmap_sem - * in such case.
- */ - down_read(&mm->mmap_sem); - ret = VM_FAULT_NOPAGE; - } - } - /* * Here we race with the list_del; list_add in * userfaultfd_ctx_read(), however because we don't ever run

From patchwork Mon Sep 23 04:25:20 2019
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v4 07/10] mm: Allow VM_FAULT_RETRY for multiple times
Date: Mon, 23 Sep 2019 12:25:20 +0800
Message-Id: <20190923042523.10027-8-peterx@redhat.com>
In-Reply-To: <20190923042523.10027-1-peterx@redhat.com>

The idea comes from a discussion between Linus and Andrea [1]. Before this patch we only allow a page fault to retry once. We achieved this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing handle_mm_fault() the second time. This was mainly used to avoid unexpected starvation of the system by looping forever to handle the page fault on a single page. However that should hardly happen: after all, each code path that returns VM_FAULT_RETRY will first wait for a condition to happen (during which time we should possibly yield the cpu) before VM_FAULT_RETRY is really returned. This patch removes the restriction by keeping the FAULT_FLAG_ALLOW_RETRY flag when we receive VM_FAULT_RETRY. It means that the page fault handler can now retry the page fault multiple times if necessary without the need to generate another page fault event. Meanwhile we still keep the FAULT_FLAG_TRIED flag so the page fault handler can still identify whether a page fault is the first attempt or not.
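For illustration, the first-attempt check described above boils down to the following predicate; this is a sketch only, but the patch below adds exactly this logic as fault_flag_allow_retry_first():

	/* First attempt: retry is still allowed and nothing tried yet */
	bool first_try = (fault_flags & FAULT_FLAG_ALLOW_RETRY) &&
			 !(fault_flags & FAULT_FLAG_TRIED);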
Then we'll have these combinations of fault flags (only considering ALLOW_RETRY flag and TRIED flag): - ALLOW_RETRY and !TRIED: this means the page fault allows retry, and this is the first try - ALLOW_RETRY and TRIED: this means the page fault allows retry, and this is not the first try - !ALLOW_RETRY and !TRIED: this means the page fault does not allow retry at all - !ALLOW_RETRY and TRIED: this is forbidden and should never be used In existing code we have multiple places that have taken special care of the first condition above by checking against (fault_flags & FAULT_FLAG_ALLOW_RETRY). This patch introduces a simple helper to detect the first retry of a page fault by checking against both (fault_flags & FAULT_FLAG_ALLOW_RETRY) and !(fault_flags & FAULT_FLAG_TRIED), because now even the 2nd try will have ALLOW_RETRY set, and then uses that helper in all existing special paths. One example is in __lock_page_or_retry(): now we'll drop the mmap_sem only in the first attempt of a page fault and we'll keep it in follow-up retries, so the old locking behavior will be retained. This will be a nice enhancement for the current code [2] and at the same time supporting material for the future userfaultfd-writeprotect work, since in that work there will always be an explicit userfault writeprotect retry for protected pages, and if that cannot resolve the page fault (e.g., when userfaultfd-writeprotect is used in conjunction with swapped pages) then we'll possibly need a 3rd retry of the page fault. It might also benefit other potential users who will have a similar requirement, like userfault write-protection. GUP code is not touched yet and will be covered in a follow-up patch. Please read the thread below for more information. [1] https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/ [2] https://lore.kernel.org/lkml/20181230154648.GB9832@redhat.com/ Suggested-by: Linus Torvalds Suggested-by: Andrea Arcangeli Reviewed-by: Jerome Glisse Signed-off-by: Peter Xu --- arch/alpha/mm/fault.c | 2 +- arch/arc/mm/fault.c | 1 - arch/arm/mm/fault.c | 3 --- arch/arm64/mm/fault.c | 5 ----- arch/hexagon/mm/vm_fault.c | 1 - arch/ia64/mm/fault.c | 1 - arch/m68k/mm/fault.c | 3 --- arch/microblaze/mm/fault.c | 1 - arch/mips/mm/fault.c | 1 - arch/nds32/mm/fault.c | 1 - arch/nios2/mm/fault.c | 3 --- arch/openrisc/mm/fault.c | 1 - arch/parisc/mm/fault.c | 4 +--- arch/powerpc/mm/fault.c | 6 ------ arch/riscv/mm/fault.c | 5 ----- arch/s390/mm/fault.c | 5 +---- arch/sh/mm/fault.c | 1 - arch/sparc/mm/fault_32.c | 1 - arch/sparc/mm/fault_64.c | 1 - arch/um/kernel/trap.c | 1 - arch/unicore32/mm/fault.c | 4 +--- arch/x86/mm/fault.c | 2 -- arch/xtensa/mm/fault.c | 1 - drivers/gpu/drm/ttm/ttm_bo_vm.c | 12 ++++++++--- include/linux/mm.h | 37 +++++++++++++++++++++++++++++++++ mm/filemap.c | 2 +- mm/shmem.c | 2 +- 27 files changed, 52 insertions(+), 55 deletions(-) diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c index ab1d4212d658..e032d2d03012 100644 --- a/arch/alpha/mm/fault.c +++ b/arch/alpha/mm/fault.c @@ -170,7 +170,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr, else current->min_flt++; if (fault & VM_FAULT_RETRY) { - flags &= ~FAULT_FLAG_ALLOW_RETRY; + flags |= FAULT_FLAG_TRIED; /* No need to up_read(&mm->mmap_sem) as we would * have already released it in __lock_page_or_retry diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c index 27adf4e608e4..bbcde83e010a 100644 --- a/arch/arc/mm/fault.c +++ b/arch/arc/mm/fault.c @@ -151,7 +151,6 @@ void do_page_fault(unsigned long address, struct pt_regs
*regs) * retry state machine */ if (flags & FAULT_FLAG_ALLOW_RETRY) { - flags &= ~FAULT_FLAG_ALLOW_RETRY; flags |= FAULT_FLAG_TRIED; goto retry; } diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c index f00fb4eafe54..5f1fb46a37b0 100644 --- a/arch/arm/mm/fault.c +++ b/arch/arm/mm/fault.c @@ -320,9 +320,6 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs) regs, addr); } if (fault & VM_FAULT_RETRY) { - /* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk - * of starvation. */ - flags &= ~FAULT_FLAG_ALLOW_RETRY; flags |= FAULT_FLAG_TRIED; goto retry; } diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index 0d3fe0ea6a70..8c26097bcf0d 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -491,12 +491,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr, return 0; } - /* - * Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk of - * starvation. - */ if (mm_flags & FAULT_FLAG_ALLOW_RETRY) { - mm_flags &= ~FAULT_FLAG_ALLOW_RETRY; mm_flags |= FAULT_FLAG_TRIED; goto retry; } diff --git a/arch/hexagon/mm/vm_fault.c b/arch/hexagon/mm/vm_fault.c index 88a2e5635bfb..a299d2142cbb 100644 --- a/arch/hexagon/mm/vm_fault.c +++ b/arch/hexagon/mm/vm_fault.c @@ -103,7 +103,6 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs) else current->min_flt++; if (fault & VM_FAULT_RETRY) { - flags &= ~FAULT_FLAG_ALLOW_RETRY; flags |= FAULT_FLAG_TRIED; goto retry; } diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c index 8d47acf50fda..7679e960c685 100644 --- a/arch/ia64/mm/fault.c +++ b/arch/ia64/mm/fault.c @@ -168,7 +168,6 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re else current->min_flt++; if (fault & VM_FAULT_RETRY) { - flags &= ~FAULT_FLAG_ALLOW_RETRY; flags |= FAULT_FLAG_TRIED; /* No need to up_read(&mm->mmap_sem) as we would diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c index 103f93ba8139..d4ef4fdf4de4 100644 --- a/arch/m68k/mm/fault.c +++ b/arch/m68k/mm/fault.c @@ -163,9 +163,6 @@ int do_page_fault(struct pt_regs *regs, unsigned long address, else current->min_flt++; if (fault & VM_FAULT_RETRY) { - /* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk - * of starvation. 
*/ - flags &= ~FAULT_FLAG_ALLOW_RETRY; flags |= FAULT_FLAG_TRIED; /* diff --git a/arch/microblaze/mm/fault.c b/arch/microblaze/mm/fault.c index 8b0615eab4b6..9a359568f70a 100644 --- a/arch/microblaze/mm/fault.c +++ b/arch/microblaze/mm/fault.c @@ -237,7 +237,6 @@ void do_page_fault(struct pt_regs *regs, unsigned long address, else current->min_flt++; if (fault & VM_FAULT_RETRY) { - flags &= ~FAULT_FLAG_ALLOW_RETRY; flags |= FAULT_FLAG_TRIED; /* diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c index 48aac20a1ded..5eeea572ff3f 100644 --- a/arch/mips/mm/fault.c +++ b/arch/mips/mm/fault.c @@ -179,7 +179,6 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write, tsk->min_flt++; } if (fault & VM_FAULT_RETRY) { - flags &= ~FAULT_FLAG_ALLOW_RETRY; flags |= FAULT_FLAG_TRIED; /* diff --git a/arch/nds32/mm/fault.c b/arch/nds32/mm/fault.c index baa44f9d0b4a..954ca83d3289 100644 --- a/arch/nds32/mm/fault.c +++ b/arch/nds32/mm/fault.c @@ -243,7 +243,6 @@ void do_page_fault(unsigned long entry, unsigned long addr, 1, regs, addr); } if (fault & VM_FAULT_RETRY) { - flags &= ~FAULT_FLAG_ALLOW_RETRY; flags |= FAULT_FLAG_TRIED; /* No need to up_read(&mm->mmap_sem) as we would diff --git a/arch/nios2/mm/fault.c b/arch/nios2/mm/fault.c index f9f178484184..07f467577e77 100644 --- a/arch/nios2/mm/fault.c +++ b/arch/nios2/mm/fault.c @@ -158,9 +158,6 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause, else current->min_flt++; if (fault & VM_FAULT_RETRY) { - /* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk - * of starvation. */ - flags &= ~FAULT_FLAG_ALLOW_RETRY; flags |= FAULT_FLAG_TRIED; /* diff --git a/arch/openrisc/mm/fault.c b/arch/openrisc/mm/fault.c index 8ba3696dd10c..e7dadbdb21b3 100644 --- a/arch/openrisc/mm/fault.c +++ b/arch/openrisc/mm/fault.c @@ -182,7 +182,6 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address, else tsk->min_flt++; if (fault & VM_FAULT_RETRY) { - flags &= ~FAULT_FLAG_ALLOW_RETRY; flags |= FAULT_FLAG_TRIED; /* No need to up_read(&mm->mmap_sem) as we would diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c index 163dcb080c7b..c837da780a79 100644 --- a/arch/parisc/mm/fault.c +++ b/arch/parisc/mm/fault.c @@ -329,14 +329,12 @@ void do_page_fault(struct pt_regs *regs, unsigned long code, else current->min_flt++; if (fault & VM_FAULT_RETRY) { - flags &= ~FAULT_FLAG_ALLOW_RETRY; - /* * No need to up_read(&mm->mmap_sem) as we would * have already released it in __lock_page_or_retry * in mm/filemap.c. */ - + flags |= FAULT_FLAG_TRIED; goto retry; } } diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index d321a6c5fe62..321f24d0762f 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -588,13 +588,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address, * case. */ if (unlikely(fault & VM_FAULT_RETRY)) { - /* We retry only once */ if (flags & FAULT_FLAG_ALLOW_RETRY) { - /* - * Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk - * of starvation. - */ - flags &= ~FAULT_FLAG_ALLOW_RETRY; flags |= FAULT_FLAG_TRIED; if (is_user && signal_pending(current)) return 0; diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c index ea8f301de65b..d1710ef75432 100644 --- a/arch/riscv/mm/fault.c +++ b/arch/riscv/mm/fault.c @@ -143,11 +143,6 @@ asmlinkage void do_page_fault(struct pt_regs *regs) 1, regs, addr); } if (fault & VM_FAULT_RETRY) { - /* - * Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk - * of starvation. 
- */ - flags &= ~(FAULT_FLAG_ALLOW_RETRY); flags |= FAULT_FLAG_TRIED; /* diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c index 3ad77501deef..46ef1159d146 100644 --- a/arch/s390/mm/fault.c +++ b/arch/s390/mm/fault.c @@ -514,10 +514,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access) fault = VM_FAULT_PFAULT; goto out_up; } - /* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk - * of starvation. */ - flags &= ~(FAULT_FLAG_ALLOW_RETRY | - FAULT_FLAG_RETRY_NOWAIT); + flags &= ~FAULT_FLAG_RETRY_NOWAIT; flags |= FAULT_FLAG_TRIED; down_read(&mm->mmap_sem); goto retry; diff --git a/arch/sh/mm/fault.c b/arch/sh/mm/fault.c index f620282a37fd..2e9cf3fd395f 100644 --- a/arch/sh/mm/fault.c +++ b/arch/sh/mm/fault.c @@ -481,7 +481,6 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs, regs, address); } if (fault & VM_FAULT_RETRY) { - flags &= ~FAULT_FLAG_ALLOW_RETRY; flags |= FAULT_FLAG_TRIED; /* diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c index 9af0c3ad50d6..97494086f1e5 100644 --- a/arch/sparc/mm/fault_32.c +++ b/arch/sparc/mm/fault_32.c @@ -261,7 +261,6 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write, 1, regs, address); } if (fault & VM_FAULT_RETRY) { - flags &= ~FAULT_FLAG_ALLOW_RETRY; flags |= FAULT_FLAG_TRIED; /* No need to up_read(&mm->mmap_sem) as we would diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c index 566f05f9040b..a1730c3a8f30 100644 --- a/arch/sparc/mm/fault_64.c +++ b/arch/sparc/mm/fault_64.c @@ -446,7 +446,6 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs) 1, regs, address); } if (fault & VM_FAULT_RETRY) { - flags &= ~FAULT_FLAG_ALLOW_RETRY; flags |= FAULT_FLAG_TRIED; /* No need to up_read(&mm->mmap_sem) as we would diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c index 3c72111f27e9..063da0930d31 100644 --- a/arch/um/kernel/trap.c +++ b/arch/um/kernel/trap.c @@ -98,7 +98,6 @@ int handle_page_fault(unsigned long address, unsigned long ip, else current->min_flt++; if (fault & VM_FAULT_RETRY) { - flags &= ~FAULT_FLAG_ALLOW_RETRY; flags |= FAULT_FLAG_TRIED; goto retry; diff --git a/arch/unicore32/mm/fault.c b/arch/unicore32/mm/fault.c index 04c193439c97..8b3367ec0d80 100644 --- a/arch/unicore32/mm/fault.c +++ b/arch/unicore32/mm/fault.c @@ -260,9 +260,7 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs) else tsk->min_flt++; if (fault & VM_FAULT_RETRY) { - /* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk - * of starvation. */ - flags &= ~FAULT_FLAG_ALLOW_RETRY; + flags |= FAULT_FLAG_TRIED; goto retry; } } diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index f7836472961e..7664f0f89ef6 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1447,9 +1447,7 @@ void do_user_addr_fault(struct pt_regs *regs, * that we made any progress. Handle this case first. 
*/ if (unlikely(fault & VM_FAULT_RETRY)) { - /* Retry at most once */ if (flags & FAULT_FLAG_ALLOW_RETRY) { - flags &= ~FAULT_FLAG_ALLOW_RETRY; flags |= FAULT_FLAG_TRIED; if ((flags & FAULT_FLAG_USER) && signal_pending(tsk)) return; diff --git a/arch/xtensa/mm/fault.c b/arch/xtensa/mm/fault.c index 094606676c36..1d91a23d27d3 100644 --- a/arch/xtensa/mm/fault.c +++ b/arch/xtensa/mm/fault.c @@ -129,7 +129,6 @@ void do_page_fault(struct pt_regs *regs) else current->min_flt++; if (fault & VM_FAULT_RETRY) { - flags &= ~FAULT_FLAG_ALLOW_RETRY; flags |= FAULT_FLAG_TRIED; /* No need to up_read(&mm->mmap_sem) as we would diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 6dacff49c1cc..8f2f9ee6effa 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -61,9 +61,10 @@ static vm_fault_t ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo, /* * If possible, avoid waiting for GPU with mmap_sem - * held. + * held. We only do this if the fault allows retry and this + * is the first attempt. */ - if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { + if (fault_flag_allow_retry_first(vmf->flags)) { ret = VM_FAULT_RETRY; if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT) goto out_unlock; @@ -132,7 +133,12 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) * for the buffer to become unreserved. */ if (unlikely(!reservation_object_trylock(bo->resv))) { - if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { + /* + * If the fault allows retry and this is the first + * fault attempt, we try to release the mmap_sem + * before waiting + */ + if (fault_flag_allow_retry_first(vmf->flags)) { if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { ttm_bo_get(bo); up_read(&vmf->vma->vm_mm->mmap_sem); diff --git a/include/linux/mm.h b/include/linux/mm.h index 53ec7abb8472..0fdbdcb257d6 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -396,6 +396,25 @@ extern pgprot_t protection_map[16]; * @FAULT_FLAG_REMOTE: The fault is not for current task/mm. * @FAULT_FLAG_INSTRUCTION: The fault was during an instruction fetch. * @FAULT_FLAG_INTERRUPTIBLE: The fault can be interrupted by non-fatal signals. + * + * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify + * whether we would allow page faults to retry by specifying these two + * fault flags correctly. Currently there can be three legal combinations: + * + * (a) ALLOW_RETRY and !TRIED: this means the page fault allows retry, and + * this is the first try + * + * (b) ALLOW_RETRY and TRIED: this means the page fault allows retry, and + * we've already tried at least once + * + * (c) !ALLOW_RETRY and !TRIED: this means the page fault does not allow retry + * + * The unlisted combination (!ALLOW_RETRY && TRIED) is illegal and should never + * be used. Note that page faults can be allowed to retry for multiple times, + * in which case we'll have an initial fault with flags (a) then later on + * continuous faults with flags (b). We should always try to detect pending + * signals before a retry to make sure the continuous page faults can still be + * interrupted if necessary. 
*/ #define FAULT_FLAG_WRITE 0x01 #define FAULT_FLAG_MKWRITE 0x02 @@ -416,6 +435,24 @@ extern pgprot_t protection_map[16]; FAULT_FLAG_KILLABLE | \ FAULT_FLAG_INTERRUPTIBLE) +/** + * fault_flag_allow_retry_first - check ALLOW_RETRY the first time + * + * This is mostly used for places where we want to try to avoid taking + * the mmap_sem for too long a time when waiting for another condition + * to change, in which case we can try to be polite to release the + * mmap_sem in the first round to avoid potential starvation of other + * processes that would also want the mmap_sem. + * + * Return: true if the page fault allows retry and this is the first + * attempt of the fault handling; false otherwise. + */ +static inline bool fault_flag_allow_retry_first(unsigned int flags) +{ + return (flags & FAULT_FLAG_ALLOW_RETRY) && + (!(flags & FAULT_FLAG_TRIED)); +} + #define FAULT_FLAG_TRACE \ { FAULT_FLAG_WRITE, "WRITE" }, \ { FAULT_FLAG_MKWRITE, "MKWRITE" }, \ diff --git a/mm/filemap.c b/mm/filemap.c index d0cf700bf201..543404617f5a 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1399,7 +1399,7 @@ EXPORT_SYMBOL_GPL(__lock_page_killable); int __lock_page_or_retry(struct page *page, struct mm_struct *mm, unsigned int flags) { - if (flags & FAULT_FLAG_ALLOW_RETRY) { + if (fault_flag_allow_retry_first(flags)) { /* * CAUTION! In this case, mmap_sem is not released * even though return 0. diff --git a/mm/shmem.c b/mm/shmem.c index 2bed4761f279..1af8d8e60231 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -2011,7 +2011,7 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf) DEFINE_WAIT_FUNC(shmem_fault_wait, synchronous_wake_function); ret = VM_FAULT_NOPAGE; - if ((vmf->flags & FAULT_FLAG_ALLOW_RETRY) && + if (fault_flag_allow_retry_first(vmf->flags) && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { /* It's polite to up mmap_sem if we can */ up_read(&vma->vm_mm->mmap_sem);

From patchwork Mon Sep 23 04:25:21 2019
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v4 08/10] mm/gup: Allow VM_FAULT_RETRY for multiple times
Date: Mon, 23 Sep 2019 12:25:21 +0800
Message-Id: <20190923042523.10027-9-peterx@redhat.com>
In-Reply-To: <20190923042523.10027-1-peterx@redhat.com>

This is the gup counterpart of the change that allows VM_FAULT_RETRY to happen more than once. One thing to mention is that we must check for fatal signals here before each retry, because gup can be interrupted by them; otherwise we can loop forever. Signed-off-by: Peter Xu --- mm/gup.c | 27 +++++++++++++++++++++------ mm/hugetlb.c | 6 ++++-- 2 files changed, 25 insertions(+), 8 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index e60d32f1674d..d2811bb15a25 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -644,7 +644,10 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma, if (*flags & FOLL_NOWAIT) fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT; if (*flags & FOLL_TRIED) { - VM_WARN_ON_ONCE(fault_flags & FAULT_FLAG_ALLOW_RETRY); + /* + * Note: FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_TRIED + * can co-exist + */ fault_flags |= FAULT_FLAG_TRIED; } @@ -994,7 +997,6 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm, down_read(&mm->mmap_sem); if (!(fault_flags & FAULT_FLAG_TRIED)) { *unlocked = true; - fault_flags &= ~FAULT_FLAG_ALLOW_RETRY; fault_flags |= FAULT_FLAG_TRIED; goto retry; } @@ -1069,17 +1071,30 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk, if (likely(pages)) pages += ret; start += ret << PAGE_SHIFT; + lock_dropped = true; +retry: /* * Repeat on the address that fired VM_FAULT_RETRY - * without FAULT_FLAG_ALLOW_RETRY but with - * FAULT_FLAG_TRIED. + * with both FAULT_FLAG_ALLOW_RETRY and + * FAULT_FLAG_TRIED. Note that GUP can be interrupted + * by fatal signals, so we need to check it before we + * start trying again, otherwise it can loop forever.
*/ + + if (fatal_signal_pending(current)) + break; + *locked = 1; - lock_dropped = true; down_read(&mm->mmap_sem); + ret = __get_user_pages(tsk, mm, start, 1, flags | FOLL_TRIED, - pages, NULL, NULL); + pages, NULL, locked); + if (!*locked) { + /* Continue to retry until we succeed */ + BUG_ON(ret != 0); + goto retry; + } if (ret != 1) { BUG_ON(ret > 1); if (!pages_done) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 31c2a6275023..d0c98cff5b0f 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4347,8 +4347,10 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT; if (flags & FOLL_TRIED) { - VM_WARN_ON_ONCE(fault_flags & - FAULT_FLAG_ALLOW_RETRY); + /* + * Note: FAULT_FLAG_ALLOW_RETRY and + * FAULT_FLAG_TRIED can co-exist + */ fault_flags |= FAULT_FLAG_TRIED; } ret = hugetlb_fault(mm, vma, vaddr, fault_flags);

From patchwork Mon Sep 23 04:25:22 2019
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v4 09/10] mm/gup: Allow to react to fatal signals
Date: Mon, 23 Sep 2019 12:25:22 +0800
Message-Id: <20190923042523.10027-10-peterx@redhat.com>
In-Reply-To: <20190923042523.10027-1-peterx@redhat.com>

The existing gup code does not react to fatal signals in many code paths. For example, in one retry path of gup we're still using down_read() rather than down_read_killable(). Also, when doing page faults we don't pass in FAULT_FLAG_KILLABLE either, which means that within the faulting process we'll wait in a non-killable way. These were spotted by Linus during the code review of some other patches. Let's allow the gup code to react to fatal signals to improve the responsiveness of threads that are being killed during gup.
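A minimal sketch of the resulting pattern, condensed from the mm/gup.c retry-loop hunk below (the surrounding loop and the pages_done bookkeeping are simplified):

	ret = down_read_killable(&mm->mmap_sem);
	if (ret) {
		/* ret is -EINTR: a fatal signal arrived while sleeping */
		BUG_ON(ret > 0);
		if (!pages_done)
			pages_done = ret;
		break;
	}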
Signed-off-by: Peter Xu --- mm/gup.c | 12 +++++++++--- mm/hugetlb.c | 3 ++- 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index d2811bb15a25..4c638473db83 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -640,7 +640,7 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma, if (*flags & FOLL_REMOTE) fault_flags |= FAULT_FLAG_REMOTE; if (locked) - fault_flags |= FAULT_FLAG_ALLOW_RETRY; + fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; if (*flags & FOLL_NOWAIT) fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT; if (*flags & FOLL_TRIED) { @@ -973,7 +973,7 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm, vm_fault_t ret, major = 0; if (unlocked) - fault_flags |= FAULT_FLAG_ALLOW_RETRY; + fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; retry: vma = find_extend_vma(mm, address); @@ -1086,7 +1086,13 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk, break; *locked = 1; - down_read(&mm->mmap_sem); + ret = down_read_killable(&mm->mmap_sem); + if (ret) { + BUG_ON(ret > 0); + if (!pages_done) + pages_done = ret; + break; + } ret = __get_user_pages(tsk, mm, start, 1, flags | FOLL_TRIED, pages, NULL, locked); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d0c98cff5b0f..84034154d50e 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4342,7 +4342,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, if (flags & FOLL_WRITE) fault_flags |= FAULT_FLAG_WRITE; if (locked) - fault_flags |= FAULT_FLAG_ALLOW_RETRY; + fault_flags |= FAULT_FLAG_ALLOW_RETRY | + FAULT_FLAG_KILLABLE; if (flags & FOLL_NOWAIT) fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;

From patchwork Mon Sep 23 04:25:23 2019
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v4 10/10] mm/userfaultfd: Honor FAULT_FLAG_KILLABLE in fault path
Date: Mon, 23 Sep 2019 12:25:23 +0800
Message-Id: <20190923042523.10027-11-peterx@redhat.com>
In-Reply-To: <20190923042523.10027-1-peterx@redhat.com>

The userfaultfd fault path was by default killable even if the caller does not have FAULT_FLAG_KILLABLE. That made sense before, because gup did not have FAULT_FLAG_KILLABLE properly set. Now, after the previous patch, FAULT_FLAG_KILLABLE is applied even for gup code, so it also makes sense to let userfaultfd honor FAULT_FLAG_KILLABLE. Because we're unconditionally setting FAULT_FLAG_KILLABLE in gup code right now, this patch should have no functional change. It also cleans the code up a little bit by introducing some helpers. Signed-off-by: Peter Xu --- fs/userfaultfd.c | 36 ++++++++++++++++++++++-------- 1 file changed, 28 insertions(+), 8 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 2b3b48e94ae4..8c5863ccbf0e 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -334,6 +334,30 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx, return ret; } +/* Should pair with userfaultfd_signal_pending() */ +static inline long userfaultfd_get_blocking_state(unsigned int flags) +{ + if (flags & FAULT_FLAG_INTERRUPTIBLE) + return TASK_INTERRUPTIBLE; + + if (flags & FAULT_FLAG_KILLABLE) + return TASK_KILLABLE; + + return TASK_UNINTERRUPTIBLE; +} + +/* Should pair with userfaultfd_get_blocking_state() */ +static inline bool userfaultfd_signal_pending(unsigned int flags) +{ + if (flags & FAULT_FLAG_INTERRUPTIBLE) + return signal_pending(current); + + if (flags & FAULT_FLAG_KILLABLE) + return fatal_signal_pending(current); + + return false; +} + /* * The locking rules involved in returning VM_FAULT_RETRY depending on * FAULT_FLAG_ALLOW_RETRY, FAULT_FLAG_RETRY_NOWAIT and @@ -355,7 +379,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) struct userfaultfd_ctx *ctx; struct userfaultfd_wait_queue uwq; vm_fault_t ret = VM_FAULT_SIGBUS; - bool must_wait, return_to_userland; + bool must_wait; long blocking_state; /* @@ -462,9 +486,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) uwq.ctx = ctx; uwq.waken = false; - return_to_userland = vmf->flags & FAULT_FLAG_INTERRUPTIBLE; - blocking_state = return_to_userland ? TASK_INTERRUPTIBLE : - TASK_KILLABLE; + blocking_state = userfaultfd_get_blocking_state(vmf->flags); spin_lock_irq(&ctx->fault_pending_wqh.lock); /* @@ -490,8 +512,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) up_read(&mm->mmap_sem); if (likely(must_wait && !READ_ONCE(ctx->released) && - (return_to_userland ? !signal_pending(current) : - !fatal_signal_pending(current)))) { + !userfaultfd_signal_pending(vmf->flags))) { wake_up_poll(&ctx->fd_wqh, EPOLLIN); schedule(); ret |= VM_FAULT_MAJOR; @@ -513,8 +534,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) set_current_state(blocking_state); if (READ_ONCE(uwq.waken) || READ_ONCE(ctx->released) || - (return_to_userland ? signal_pending(current) : - fatal_signal_pending(current)) + userfaultfd_signal_pending(vmf->flags)) break; schedule(); }