From patchwork Wed Nov 21 06:21:00 2018
X-Patchwork-Submitter: Peter Xu <peterx@redhat.com>
X-Patchwork-Id: 10691993
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: Keith Busch, Jerome Glisse, peterx@redhat.com, Dan Williams,
    linux-mm@kvack.org, Matthew Wilcox, Al Viro, Andrea Arcangeli,
    Huang Ying, Mike Kravetz, Mike Rapoport, Linus Torvalds,
    "Michael S. Tsirkin", "Kirill A. Shutemov", Michal Hocko,
    Vlastimil Babka, Pavel Tatashin, Andrew Morton
Subject: [PATCH RFC v3 1/4] mm: gup: rename "nonblocking" to "locked" where proper
Date: Wed, 21 Nov 2018 14:21:00 +0800
Message-Id: <20181121062103.18835-2-peterx@redhat.com>
In-Reply-To: <20181121062103.18835-1-peterx@redhat.com>
References: <20181121062103.18835-1-peterx@redhat.com>

There are plenty of places around __get_user_pages() that take a
parameter named "nonblocking" which does not really mean "it won't
block" (it can block) but instead indicates whether the mmap_sem has
been released by up_read() during page fault handling, mostly when
VM_FAULT_RETRY is returned.  We already use the correct name "locked"
in e.g. get_user_pages_locked() and get_user_pages_remote(), but many
places still say "nonblocking".  Rename those to "locked" where proper
to better suit the functionality of the variable.  While at it, fix up
some of the comments accordingly.
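For illustration, here is a minimal sketch of the convention the
"locked" name describes.  This sketch is not part of the patch;
my_pin_one_page() is a made-up caller:

/*
 * Illustrative sketch only.  The caller passes a pointer to an int set
 * to 1 while holding mmap_sem for read; if gup had to drop the lock
 * while handling a fault, *locked is cleared and the caller must not
 * unlock again.
 */
static long my_pin_one_page(struct task_struct *tsk, struct mm_struct *mm,
                            unsigned long addr, struct page **page)
{
        int locked = 1;
        long ret;

        down_read(&mm->mmap_sem);
        ret = __get_user_pages(tsk, mm, addr, 1, FOLL_TOUCH,
                               page, NULL, &locked);
        if (locked)
                up_read(&mm->mmap_sem);  /* still held: drop it ourselves */
        return ret;
}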
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/gup.c     | 44 +++++++++++++++++++++-----------------------
 mm/hugetlb.c |  8 ++++----
 2 files changed, 25 insertions(+), 27 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index aa43620a3270..a40111c742ba 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -506,12 +506,12 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
 }

 /*
- * mmap_sem must be held on entry.  If @nonblocking != NULL and
- * *@flags does not include FOLL_NOWAIT, the mmap_sem may be released.
- * If it is, *@nonblocking will be set to 0 and -EBUSY returned.
+ * mmap_sem must be held on entry.  If @locked != NULL and *@flags
+ * does not include FOLL_NOWAIT, the mmap_sem may be released.  If it
+ * is, *@locked will be set to 0 and -EBUSY returned.
  */
 static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
-		unsigned long address, unsigned int *flags, int *nonblocking)
+		unsigned long address, unsigned int *flags, int *locked)
 {
 	unsigned int fault_flags = 0;
 	vm_fault_t ret;
@@ -523,7 +523,7 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
 		fault_flags |= FAULT_FLAG_WRITE;
 	if (*flags & FOLL_REMOTE)
 		fault_flags |= FAULT_FLAG_REMOTE;
-	if (nonblocking)
+	if (locked)
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY;
 	if (*flags & FOLL_NOWAIT)
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;
@@ -549,8 +549,8 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
 	}

 	if (ret & VM_FAULT_RETRY) {
-		if (nonblocking && !(fault_flags & FAULT_FLAG_RETRY_NOWAIT))
-			*nonblocking = 0;
+		if (locked && !(fault_flags & FAULT_FLAG_RETRY_NOWAIT))
+			*locked = 0;
 		return -EBUSY;
 	}

@@ -627,7 +627,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
  *		only intends to ensure the pages are faulted in.
  * @vmas:	array of pointers to vmas corresponding to each page.
  *		Or NULL if the caller does not require them.
- * @nonblocking: whether waiting for disk IO or mmap_sem contention
+ * @locked:	whether we're still with the mmap_sem held
  *
  * Returns number of pages pinned. This may be fewer than the number
  * requested. If nr_pages is 0 or negative, returns 0. If no pages
@@ -656,13 +656,11 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
  * appropriate) must be called after the page is finished with, and
  * before put_page is called.
  *
- * If @nonblocking != NULL, __get_user_pages will not wait for disk IO
- * or mmap_sem contention, and if waiting is needed to pin all pages,
- * *@nonblocking will be set to 0.  Further, if @gup_flags does not
- * include FOLL_NOWAIT, the mmap_sem will be released via up_read() in
- * this case.
+ * If @locked != NULL, *@locked will be set to 0 when mmap_sem is
+ * released by an up_read().  That can happen if @gup_flags does not
+ * have FOLL_NOWAIT.
  *
- * A caller using such a combination of @nonblocking and @gup_flags
+ * A caller using such a combination of @locked and @gup_flags
  * must therefore hold the mmap_sem for reading only, and recognize
  * when it's been released.  Otherwise, it must be held for either
  * reading or writing and will not be released.
@@ -674,7 +672,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		unsigned long start, unsigned long nr_pages,
 		unsigned int gup_flags, struct page **pages,
-		struct vm_area_struct **vmas, int *nonblocking)
+		struct vm_area_struct **vmas, int *locked)
 {
 	long ret = 0, i = 0;
 	struct vm_area_struct *vma = NULL;
@@ -719,7 +717,7 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 			if (is_vm_hugetlb_page(vma)) {
 				i = follow_hugetlb_page(mm, vma, pages, vmas,
 						&start, &nr_pages, i,
-						gup_flags, nonblocking);
+						gup_flags, locked);
 				continue;
 			}
 		}
@@ -737,7 +735,7 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		page = follow_page_mask(vma, start, foll_flags, &ctx);
 		if (!page) {
 			ret = faultin_page(tsk, vma, start, &foll_flags,
-					nonblocking);
+					locked);
 			switch (ret) {
 			case 0:
 				goto retry;
@@ -1196,7 +1194,7 @@ EXPORT_SYMBOL(get_user_pages_longterm);
  * @vma:   target vma
  * @start: start address
  * @end:   end address
- * @nonblocking:
+ * @locked: whether the mmap_sem is still held
  *
  * This takes care of mlocking the pages too if VM_LOCKED is set.
  *
  * vma->vm_mm->mmap_sem must be held.
  *
- * If @nonblocking is NULL, it may be held for read or write and will
+ * If @locked is NULL, it may be held for read or write and will
  * be unperturbed.
  *
- * If @nonblocking is non-NULL, it must held for read only and may be
- * released.  If it's released, *@nonblocking will be set to 0.
+ * If @locked is non-NULL, it must be held for read only and may be
+ * released.  If it's released, *@locked will be set to 0.
  */
 long populate_vma_page_range(struct vm_area_struct *vma,
-		unsigned long start, unsigned long end, int *nonblocking)
+		unsigned long start, unsigned long end, int *locked)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long nr_pages = (end - start) / PAGE_SIZE;
@@ -1246,7 +1244,7 @@ long populate_vma_page_range(struct vm_area_struct *vma,
 	 * not result in a stack expansion that recurses back here.
 	 */
 	return __get_user_pages(current, mm, start, nr_pages, gup_flags,
-				NULL, NULL, nonblocking);
+				NULL, NULL, locked);
 }

 /*
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7f2a28ab46d5..a131b9e750d3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4181,7 +4181,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			 struct page **pages, struct vm_area_struct **vmas,
 			 unsigned long *position, unsigned long *nr_pages,
-			 long i, unsigned int flags, int *nonblocking)
+			 long i, unsigned int flags, int *locked)
 {
 	unsigned long pfn_offset;
 	unsigned long vaddr = *position;
@@ -4252,7 +4252,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			spin_unlock(ptl);
 			if (flags & FOLL_WRITE)
 				fault_flags |= FAULT_FLAG_WRITE;
-			if (nonblocking)
+			if (locked)
 				fault_flags |= FAULT_FLAG_ALLOW_RETRY;
 			if (flags & FOLL_NOWAIT)
 				fault_flags |= FAULT_FLAG_ALLOW_RETRY |
@@ -4269,8 +4269,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 				break;
 			}
 			if (ret & VM_FAULT_RETRY) {
-				if (nonblocking)
-					*nonblocking = 0;
+				if (locked)
+					*locked = 0;
 				*nr_pages = 0;
 				/*
 				 * VM_FAULT_RETRY must not return an

From patchwork Wed Nov 21 06:21:01 2018
X-Patchwork-Submitter: Peter Xu <peterx@redhat.com>
X-Patchwork-Id: 10691995
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: Keith Busch, Jerome Glisse, peterx@redhat.com, Dan Williams,
    linux-mm@kvack.org, Matthew Wilcox, Al Viro, Andrea Arcangeli,
    Huang Ying, Mike Kravetz, Mike Rapoport, Linus Torvalds,
    "Michael S. Tsirkin", "Kirill A. Shutemov", Michal Hocko,
    Vlastimil Babka, Pavel Tatashin, Andrew Morton
Subject: [PATCH RFC v3 2/4] mm: userfault: return VM_FAULT_RETRY on signals
Date: Wed, 21 Nov 2018 14:21:01 +0800
Message-Id: <20181121062103.18835-3-peterx@redhat.com>
In-Reply-To: <20181121062103.18835-1-peterx@redhat.com>
References: <20181121062103.18835-1-peterx@redhat.com>

handle_userfault() used to have a special path that returned
VM_FAULT_NOPAGE when a non-fatal signal was detected while waiting for
userfault handling, reacquiring the mmap_sem before returning.
However, that is risky: the vmas might have changed by the time we
retake the mmap_sem, and we could even be holding a stale vma
structure.  The problem was reported by syzbot.

This patch removes the special path; we now return VM_FAULT_RETRY
through the common path even when such signals are pending.  Then, for
every architecture that passes FAULT_FLAG_ALLOW_RETRY into
handle_mm_fault(), we check not only for SIGKILL but for all pending
userspace signals right after handle_mm_fault() returns.

The idea comes from the upstream discussion between Linus and Andrea:
https://lkml.org/lkml/2017/10/30/560

(This patch contains a potential fix for a double-free of mmap_sem on
the ARC architecture; please see https://lkml.org/lkml/2018/11/1/723
for more information)
Suggested-by: Linus Torvalds
Suggested-by: Andrea Arcangeli
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/alpha/mm/fault.c      |  2 +-
 arch/arc/mm/fault.c        | 11 +++++++----
 arch/arm/mm/fault.c        | 14 ++++++++++----
 arch/arm64/mm/fault.c      |  6 +++---
 arch/hexagon/mm/vm_fault.c |  2 +-
 arch/ia64/mm/fault.c       |  2 +-
 arch/m68k/mm/fault.c       |  2 +-
 arch/microblaze/mm/fault.c |  2 +-
 arch/mips/mm/fault.c       |  2 +-
 arch/nds32/mm/fault.c      |  6 +++---
 arch/nios2/mm/fault.c      |  2 +-
 arch/openrisc/mm/fault.c   |  2 +-
 arch/parisc/mm/fault.c     |  2 +-
 arch/powerpc/mm/fault.c    |  4 +++-
 arch/riscv/mm/fault.c      |  4 ++--
 arch/s390/mm/fault.c       |  9 ++++++---
 arch/sh/mm/fault.c         |  4 ++++
 arch/sparc/mm/fault_32.c   |  3 +++
 arch/sparc/mm/fault_64.c   |  3 +++
 arch/um/kernel/trap.c      |  5 ++++-
 arch/unicore32/mm/fault.c  |  4 ++--
 arch/x86/mm/fault.c        | 12 +++++++++++-
 arch/xtensa/mm/fault.c     |  3 +++
 fs/userfaultfd.c           | 24 ------------------------
 24 files changed, 73 insertions(+), 57 deletions(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index d73dc473fbb9..46e5e420ad2a 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -150,7 +150,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
 	   the fault.  */
 	fault = handle_mm_fault(vma, address, flags);
-	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
+	if ((fault & VM_FAULT_RETRY) && signal_pending(current))
 		return;

 	if (unlikely(fault & VM_FAULT_ERROR)) {
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index c9da6102eb4f..52b6b71b74e2 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -142,11 +142,14 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
 	fault = handle_mm_fault(vma, address, flags);

 	/* If Pagefault was interrupted by SIGKILL, exit page fault "early" */
-	if (unlikely(fatal_signal_pending(current))) {
-		if ((fault & VM_FAULT_ERROR) && !(fault & VM_FAULT_RETRY))
+	if (unlikely(fatal_signal_pending(current) && user_mode(regs))) {
+		/*
+		 * VM_FAULT_RETRY means we have released the mmap_sem,
+		 * otherwise we need to drop it before leaving
+		 */
+		if (!(fault & VM_FAULT_RETRY))
 			up_read(&mm->mmap_sem);
-		if (user_mode(regs))
-			return;
+		return;
 	}

 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index f4ea4c62c613..743077d19669 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -308,14 +308,20 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 	fault = __do_page_fault(mm, addr, fsr, flags, tsk);

-	/* If we need to retry but a fatal signal is pending, handle the
+	/* If we need to retry but a signal is pending, handle the
 	 * signal first. We do not need to release the mmap_sem because
 	 * it would already be released in __lock_page_or_retry in
 	 * mm/filemap.c. */
-	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
-		if (!user_mode(regs))
+	if (fault & VM_FAULT_RETRY) {
+		if (fatal_signal_pending(current) && !user_mode(regs))
 			goto no_context;
-		return 0;
+		else if (signal_pending(current))
+			/*
+			 * Either it's a common signal, or it's a fatal
+			 * signal for userspace; in both cases we return
+			 * immediately.
+			 */
+			return 0;
 	}

 	/*
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 7d9571f4ae3d..744d6451ea83 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -499,13 +499,13 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 	if (fault & VM_FAULT_RETRY) {
 		/*
-		 * If we need to retry but a fatal signal is pending,
+		 * If we need to retry but a signal is pending,
 		 * handle the signal first. We do not need to release
 		 * the mmap_sem because it would already be released
 		 * in __lock_page_or_retry in mm/filemap.c.
 		 */
-		if (fatal_signal_pending(current)) {
-			if (!user_mode(regs))
+		if (signal_pending(current)) {
+			if (fatal_signal_pending(current) && !user_mode(regs))
 				goto no_context;
 			return 0;
 		}
diff --git a/arch/hexagon/mm/vm_fault.c b/arch/hexagon/mm/vm_fault.c
index eb263e61daf4..be10b441d9cc 100644
--- a/arch/hexagon/mm/vm_fault.c
+++ b/arch/hexagon/mm/vm_fault.c
@@ -104,7 +104,7 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
 	fault = handle_mm_fault(vma, address, flags);

-	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
+	if ((fault & VM_FAULT_RETRY) && signal_pending(current))
 		return;

 	/* The most common case -- we are done. */
diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c
index 5baeb022f474..62c2d39d2bed 100644
--- a/arch/ia64/mm/fault.c
+++ b/arch/ia64/mm/fault.c
@@ -163,7 +163,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
 	 */
 	fault = handle_mm_fault(vma, address, flags);

-	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
+	if ((fault & VM_FAULT_RETRY) && signal_pending(current))
 		return;

 	if (unlikely(fault & VM_FAULT_ERROR)) {
diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
index 9b6163c05a75..d9808a807ab8 100644
--- a/arch/m68k/mm/fault.c
+++ b/arch/m68k/mm/fault.c
@@ -138,7 +138,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
 	fault = handle_mm_fault(vma, address, flags);
 	pr_debug("handle_mm_fault returns %x\n", fault);

-	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
+	if ((fault & VM_FAULT_RETRY) && signal_pending(current))
 		return 0;

 	if (unlikely(fault & VM_FAULT_ERROR)) {
diff --git a/arch/microblaze/mm/fault.c b/arch/microblaze/mm/fault.c
index 202ad6a494f5..4fd2dbd0c5ca 100644
--- a/arch/microblaze/mm/fault.c
+++ b/arch/microblaze/mm/fault.c
@@ -217,7 +217,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
 	 */
 	fault = handle_mm_fault(vma, address, flags);

-	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
+	if ((fault & VM_FAULT_RETRY) && signal_pending(current))
 		return;

 	if (unlikely(fault & VM_FAULT_ERROR)) {
diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
index 73d8a0f0b810..92374fd091d2 100644
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -154,7 +154,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
 	 */
 	fault = handle_mm_fault(vma, address, flags);

-	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
+	if ((fault & VM_FAULT_RETRY) && signal_pending(current))
 		return;

 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
diff --git a/arch/nds32/mm/fault.c b/arch/nds32/mm/fault.c
index b740534b152c..72461745d3e1 100644
--- a/arch/nds32/mm/fault.c
+++ b/arch/nds32/mm/fault.c
@@ -207,12 +207,12 @@ void do_page_fault(unsigned long entry, unsigned long addr,
 	fault = handle_mm_fault(vma, addr, flags);

 	/*
-	 * If we need to retry but a fatal signal is pending, handle the
+	 * If we need to retry but a signal is pending, handle the
 	 * signal first. We do not need to release the mmap_sem because it
 	 * would already be released in __lock_page_or_retry in mm/filemap.c.
 	 */
-	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
-		if (!user_mode(regs))
+	if (fault & VM_FAULT_RETRY && signal_pending(current)) {
+		if (fatal_signal_pending(current) && !user_mode(regs))
 			goto no_context;
 		return;
 	}
diff --git a/arch/nios2/mm/fault.c b/arch/nios2/mm/fault.c
index 24fd84cf6006..5939434a31ae 100644
--- a/arch/nios2/mm/fault.c
+++ b/arch/nios2/mm/fault.c
@@ -134,7 +134,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
 	 */
 	fault = handle_mm_fault(vma, address, flags);

-	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
+	if ((fault & VM_FAULT_RETRY) && signal_pending(current))
 		return;

 	if (unlikely(fault & VM_FAULT_ERROR)) {
diff --git a/arch/openrisc/mm/fault.c b/arch/openrisc/mm/fault.c
index dc4dbafc1d83..873ecb5d82d7 100644
--- a/arch/openrisc/mm/fault.c
+++ b/arch/openrisc/mm/fault.c
@@ -165,7 +165,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
 	fault = handle_mm_fault(vma, address, flags);

-	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
+	if ((fault & VM_FAULT_RETRY) && signal_pending(current))
 		return;

 	if (unlikely(fault & VM_FAULT_ERROR)) {
diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
index c8e8b7c05558..29422eec329d 100644
--- a/arch/parisc/mm/fault.c
+++ b/arch/parisc/mm/fault.c
@@ -303,7 +303,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
 	fault = handle_mm_fault(vma, address, flags);

-	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
+	if ((fault & VM_FAULT_RETRY) && signal_pending(current))
 		return;

 	if (unlikely(fault & VM_FAULT_ERROR)) {
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 1697e903bbf2..8bc0d091f13c 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -575,8 +575,10 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 			 */
 			flags &= ~FAULT_FLAG_ALLOW_RETRY;
 			flags |= FAULT_FLAG_TRIED;
-			if (!fatal_signal_pending(current))
+			if (!signal_pending(current))
 				goto retry;
+			else if (!fatal_signal_pending(current) && is_user)
+				return 0;
 		}

 	/*
diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index 88401d5125bc..4fc8d746bec3 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -123,11 +123,11 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
 	fault = handle_mm_fault(vma, addr, flags);

 	/*
-	 * If we need to retry but a fatal signal is pending, handle the
+	 * If we need to retry but a signal is pending, handle the
 	 * signal first. We do not need to release the mmap_sem because it
 	 * would already be released in __lock_page_or_retry in mm/filemap.c.
 	 */
-	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(tsk))
+	if ((fault & VM_FAULT_RETRY) && signal_pending(tsk))
 		return;

 	if (unlikely(fault & VM_FAULT_ERROR)) {
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 2b8f32f56e0c..19b4fb2fafab 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -500,9 +500,12 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 	 * the fault.
 	 */
 	fault = handle_mm_fault(vma, address, flags);
-	/* No reason to continue if interrupted by SIGKILL. */
-	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
-		fault = VM_FAULT_SIGNAL;
+	/* Do not continue if interrupted by signals. */
+	if ((fault & VM_FAULT_RETRY) && signal_pending(current)) {
+		if (fatal_signal_pending(current))
+			fault = VM_FAULT_SIGNAL;
+		else
+			fault = 0;
 		if (flags & FAULT_FLAG_RETRY_NOWAIT)
 			goto out_up;
 		goto out;
diff --git a/arch/sh/mm/fault.c b/arch/sh/mm/fault.c
index 6defd2c6d9b1..baf5d73df40c 100644
--- a/arch/sh/mm/fault.c
+++ b/arch/sh/mm/fault.c
@@ -506,6 +506,10 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
 			 * have already released it in __lock_page_or_retry
 			 * in mm/filemap.c.
 			 */
+
+			if (user_mode(regs) && signal_pending(tsk))
+				return;
+
 			goto retry;
 		}
 	}
diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c
index b0440b0edd97..a2c83104fe35 100644
--- a/arch/sparc/mm/fault_32.c
+++ b/arch/sparc/mm/fault_32.c
@@ -269,6 +269,9 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
 			 * in mm/filemap.c.
 			 */

+			if (user_mode(regs) && signal_pending(tsk))
+				return;
+
 			goto retry;
 		}
 	}
diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c
index 8f8a604c1300..cad71ec5c7b3 100644
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -467,6 +467,9 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
 			 * in mm/filemap.c.
 			 */

+			if (user_mode(regs) && signal_pending(current))
+				return;
+
 			goto retry;
 		}
 	}
diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c
index 0e8b6158f224..09baf37b65b9 100644
--- a/arch/um/kernel/trap.c
+++ b/arch/um/kernel/trap.c
@@ -76,8 +76,11 @@ int handle_page_fault(unsigned long address, unsigned long ip,
 		fault = handle_mm_fault(vma, address, flags);

-		if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
+		if (fault & VM_FAULT_RETRY && signal_pending(current)) {
+			if (is_user && !fatal_signal_pending(current))
+				err = 0;
 			goto out_nosemaphore;
+		}

 		if (unlikely(fault & VM_FAULT_ERROR)) {
 			if (fault & VM_FAULT_OOM) {
diff --git a/arch/unicore32/mm/fault.c b/arch/unicore32/mm/fault.c
index b9a3a50644c1..3611f19234a1 100644
--- a/arch/unicore32/mm/fault.c
+++ b/arch/unicore32/mm/fault.c
@@ -248,11 +248,11 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 	fault = __do_pf(mm, addr, fsr, flags, tsk);

-	/* If we need to retry but a fatal signal is pending, handle the
+	/* If we need to retry but a signal is pending, handle the
 	 * signal first. We do not need to release the mmap_sem because
 	 * it would already be released in __lock_page_or_retry in
 	 * mm/filemap.c. */
-	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
+	if ((fault & VM_FAULT_RETRY) && signal_pending(current))
 		return 0;

 	if (!(fault & VM_FAULT_ERROR) && (flags & FAULT_FLAG_ALLOW_RETRY)) {
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 71d4b9d4d43f..b94ef0c2b98c 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1433,8 +1433,18 @@ void do_user_addr_fault(struct pt_regs *regs,
 	if (flags & FAULT_FLAG_ALLOW_RETRY) {
 		flags &= ~FAULT_FLAG_ALLOW_RETRY;
 		flags |= FAULT_FLAG_TRIED;
-		if (!fatal_signal_pending(tsk))
+		if (!signal_pending(tsk))
 			goto retry;
+		else if (!fatal_signal_pending(tsk))
+			/*
+			 * There is a signal for the task but it's not
+			 * fatal; let's return directly to userspace.
+			 * This gives a chance for signals like
+			 * SIGSTOP/SIGCONT to be handled faster, e.g.,
+			 * with GDB.
+			 */
+			return;
 	}

 	/* User mode? Just return to handle the fatal exception */
diff --git a/arch/xtensa/mm/fault.c b/arch/xtensa/mm/fault.c
index 2ab0e0dcd166..792dad5e2f12 100644
--- a/arch/xtensa/mm/fault.c
+++ b/arch/xtensa/mm/fault.c
@@ -136,6 +136,9 @@ void do_page_fault(struct pt_regs *regs)
 			 * in mm/filemap.c.
 			 */

+			if (user_mode(regs) && signal_pending(current))
+				return;
+
 			goto retry;
 		}
 	}
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 356d2b8568c1..249c9211b2ed 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -515,30 +515,6 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	__set_current_state(TASK_RUNNING);

-	if (return_to_userland) {
-		if (signal_pending(current) &&
-		    !fatal_signal_pending(current)) {
-			/*
-			 * If we got a SIGSTOP or SIGCONT and this is
-			 * a normal userland page fault, just let
-			 * userland return so the signal will be
-			 * handled and gdb debugging works.  The page
-			 * fault code immediately after we return from
-			 * this function is going to release the
-			 * mmap_sem and it's not depending on it
-			 * (unlike gup would if we were not to return
-			 * VM_FAULT_RETRY).
-			 *
-			 * If a fatal signal is pending we still take
-			 * the streamlined VM_FAULT_RETRY failure path
-			 * and there's no need to retake the mmap_sem
-			 * in such case.
-			 */
-			down_read(&mm->mmap_sem);
-			ret = VM_FAULT_NOPAGE;
-		}
-	}
-
 	/*
 	 * Here we race with the list_del; list_add in
 	 * userfaultfd_ctx_read(), however because we don't ever run

From patchwork Wed Nov 21 06:21:02 2018
X-Patchwork-Submitter: Peter Xu <peterx@redhat.com>
X-Patchwork-Id: 10691997
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: Keith Busch, Jerome Glisse, peterx@redhat.com, Dan Williams,
    linux-mm@kvack.org, Matthew Wilcox, Al Viro, Andrea Arcangeli,
    Huang Ying, Mike Kravetz, Mike Rapoport, Linus Torvalds,
    "Michael S. Tsirkin", "Kirill A. Shutemov", Michal Hocko,
    Vlastimil Babka, Pavel Tatashin, Andrew Morton
Subject: [PATCH RFC v3 3/4] mm: allow VM_FAULT_RETRY for multiple times
Date: Wed, 21 Nov 2018 14:21:02 +0800
Message-Id: <20181121062103.18835-4-peterx@redhat.com>
In-Reply-To: <20181121062103.18835-1-peterx@redhat.com>
References: <20181121062103.18835-1-peterx@redhat.com>

The idea comes from a discussion between Linus and Andrea [1].

Before this patch we only allow a page fault to be retried once.  We
achieved this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing
handle_mm_fault() the second time.  This was mostly used to avoid
unexpected starvation of the system by looping forever on the page
fault of a single page.  However, that should hardly happen: for each
code path to return VM_FAULT_RETRY, we first wait for a condition
(during which we may yield the cpu) before VM_FAULT_RETRY is really
returned.

This patch removes the restriction by keeping the
FAULT_FLAG_ALLOW_RETRY flag when we receive VM_FAULT_RETRY.  It means
the page fault handler can now retry the page fault multiple times if
necessary without generating another page fault event.  Meanwhile we
still keep the FAULT_FLAG_TRIED flag so the page fault handler can
still identify whether a page fault is the first attempt or not.  GUP
code is not touched yet and will be covered in a follow-up patch.

This is a nice enhancement to the current code, and at the same time
supporting material for the future userfaultfd-writeprotect work,
since there an explicit userfault write-protect retry will always
happen for protected pages, and if that cannot resolve the page fault
(e.g., when userfaultfd-writeprotect is used in conjunction with
shared memory) we will possibly need a third retry of the page fault.
It might also benefit other potential users with similar requirements
to userfault write-protection.

Please read the thread below for more information.

[1] https://lkml.org/lkml/2017/11/2/833
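Schematically (a sketch only, not from the patch; the vma re-lookup
after retaking mmap_sem is elided), the fault-retry loop becomes:

static void sketch_retry_loop(struct mm_struct *mm,
                              struct vm_area_struct *vma,
                              unsigned long address)
{
        unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
        vm_fault_t fault;

retry:
        down_read(&mm->mmap_sem);
        fault = handle_mm_fault(vma, address, flags);
        if (fault & VM_FAULT_RETRY) {
                /* Previously: flags &= ~FAULT_FLAG_ALLOW_RETRY; (one retry max) */
                flags |= FAULT_FLAG_TRIED;  /* mark that we have retried */
                if (!signal_pending(current))
                        goto retry;  /* mmap_sem was dropped by the fault path */
                return;
        }
        up_read(&mm->mmap_sem);
}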
Suggested-by: Linus Torvalds
Suggested-by: Andrea Arcangeli
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/alpha/mm/fault.c      | 2 +-
 arch/arc/mm/fault.c        | 1 -
 arch/arm/mm/fault.c        | 3 ---
 arch/arm64/mm/fault.c      | 5 -----
 arch/hexagon/mm/vm_fault.c | 1 -
 arch/ia64/mm/fault.c       | 1 -
 arch/m68k/mm/fault.c       | 3 ---
 arch/microblaze/mm/fault.c | 1 -
 arch/mips/mm/fault.c       | 1 -
 arch/nds32/mm/fault.c      | 1 -
 arch/nios2/mm/fault.c      | 3 ---
 arch/openrisc/mm/fault.c   | 1 -
 arch/parisc/mm/fault.c     | 2 --
 arch/powerpc/mm/fault.c    | 5 -----
 arch/riscv/mm/fault.c      | 5 -----
 arch/s390/mm/fault.c       | 5 +----
 arch/sh/mm/fault.c         | 1 -
 arch/sparc/mm/fault_32.c   | 1 -
 arch/sparc/mm/fault_64.c   | 1 -
 arch/um/kernel/trap.c      | 1 -
 arch/unicore32/mm/fault.c  | 6 +-----
 arch/x86/mm/fault.c        | 1 -
 arch/xtensa/mm/fault.c     | 1 -
 23 files changed, 3 insertions(+), 49 deletions(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index 46e5e420ad2a..deae82bb83c1 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -169,7 +169,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
 	else
 		current->min_flt++;
 	if (fault & VM_FAULT_RETRY) {
-		flags &= ~FAULT_FLAG_ALLOW_RETRY;
+		flags |= FAULT_FLAG_TRIED;

 		 /* No need to up_read(&mm->mmap_sem) as we would
 		 * have already released it in __lock_page_or_retry
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index 52b6b71b74e2..a8b02db45324 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -168,7 +168,6 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
 		}

 		if (fault & VM_FAULT_RETRY) {
-			flags &= ~FAULT_FLAG_ALLOW_RETRY;
 			flags |= FAULT_FLAG_TRIED;
 			goto retry;
 		}
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 743077d19669..377781d8491a 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -342,9 +342,6 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 					regs, addr);
 		}
 		if (fault & VM_FAULT_RETRY) {
-			/* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
-			* of starvation. */
-			flags &= ~FAULT_FLAG_ALLOW_RETRY;
 			flags |= FAULT_FLAG_TRIED;
 			goto retry;
 		}
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 744d6451ea83..8a26e03fc2bf 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -510,12 +510,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 			return 0;
 		}

-		/*
-		 * Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk of
-		 * starvation.
-		 */
 		if (mm_flags & FAULT_FLAG_ALLOW_RETRY) {
-			mm_flags &= ~FAULT_FLAG_ALLOW_RETRY;
 			mm_flags |= FAULT_FLAG_TRIED;
 			goto retry;
 		}
diff --git a/arch/hexagon/mm/vm_fault.c b/arch/hexagon/mm/vm_fault.c
index be10b441d9cc..576751597e77 100644
--- a/arch/hexagon/mm/vm_fault.c
+++ b/arch/hexagon/mm/vm_fault.c
@@ -115,7 +115,6 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
 			else
 				current->min_flt++;
 			if (fault & VM_FAULT_RETRY) {
-				flags &= ~FAULT_FLAG_ALLOW_RETRY;
 				flags |= FAULT_FLAG_TRIED;
 				goto retry;
 			}
diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c
index 62c2d39d2bed..9de95d39935e 100644
--- a/arch/ia64/mm/fault.c
+++ b/arch/ia64/mm/fault.c
@@ -189,7 +189,6 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
 		else
 			current->min_flt++;
 		if (fault & VM_FAULT_RETRY) {
-			flags &= ~FAULT_FLAG_ALLOW_RETRY;
 			flags |= FAULT_FLAG_TRIED;

 			 /* No need to up_read(&mm->mmap_sem) as we would
diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
index d9808a807ab8..b1b2109e4ab4 100644
--- a/arch/m68k/mm/fault.c
+++ b/arch/m68k/mm/fault.c
@@ -162,9 +162,6 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
 		else
 			current->min_flt++;
 		if (fault & VM_FAULT_RETRY) {
-			/* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
-			 * of starvation. */
-			flags &= ~FAULT_FLAG_ALLOW_RETRY;
 			flags |= FAULT_FLAG_TRIED;

 			/*
diff --git a/arch/microblaze/mm/fault.c b/arch/microblaze/mm/fault.c
index 4fd2dbd0c5ca..05a4847ac0bf 100644
--- a/arch/microblaze/mm/fault.c
+++ b/arch/microblaze/mm/fault.c
@@ -236,7 +236,6 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
 		else
 			current->min_flt++;
 		if (fault & VM_FAULT_RETRY) {
-			flags &= ~FAULT_FLAG_ALLOW_RETRY;
 			flags |= FAULT_FLAG_TRIED;

 			/*
diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
index 92374fd091d2..9953b5b571df 100644
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -178,7 +178,6 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
 			tsk->min_flt++;
 		}
 		if (fault & VM_FAULT_RETRY) {
-			flags &= ~FAULT_FLAG_ALLOW_RETRY;
 			flags |= FAULT_FLAG_TRIED;

 			/*
diff --git a/arch/nds32/mm/fault.c b/arch/nds32/mm/fault.c
index 72461745d3e1..f0b775cb5cdf 100644
--- a/arch/nds32/mm/fault.c
+++ b/arch/nds32/mm/fault.c
@@ -237,7 +237,6 @@ void do_page_fault(unsigned long entry, unsigned long addr,
 		else
 			tsk->min_flt++;
 		if (fault & VM_FAULT_RETRY) {
-			flags &= ~FAULT_FLAG_ALLOW_RETRY;
 			flags |= FAULT_FLAG_TRIED;

 			/* No need to up_read(&mm->mmap_sem) as we would
diff --git a/arch/nios2/mm/fault.c b/arch/nios2/mm/fault.c
index 5939434a31ae..9dd1c51acc22 100644
--- a/arch/nios2/mm/fault.c
+++ b/arch/nios2/mm/fault.c
@@ -158,9 +158,6 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
 		else
 			current->min_flt++;
 		if (fault & VM_FAULT_RETRY) {
-			/* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
-			 * of starvation. */
-			flags &= ~FAULT_FLAG_ALLOW_RETRY;
 			flags |= FAULT_FLAG_TRIED;

 			/*
diff --git a/arch/openrisc/mm/fault.c b/arch/openrisc/mm/fault.c
index 873ecb5d82d7..ff92c5674781 100644
--- a/arch/openrisc/mm/fault.c
+++ b/arch/openrisc/mm/fault.c
@@ -185,7 +185,6 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
 		else
 			tsk->min_flt++;
 		if (fault & VM_FAULT_RETRY) {
-			flags &= ~FAULT_FLAG_ALLOW_RETRY;
 			flags |= FAULT_FLAG_TRIED;

 			/* No need to up_read(&mm->mmap_sem) as we would
diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
index 29422eec329d..7d3e96a9a7ab 100644
--- a/arch/parisc/mm/fault.c
+++ b/arch/parisc/mm/fault.c
@@ -327,8 +327,6 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
 		else
 			current->min_flt++;
 		if (fault & VM_FAULT_RETRY) {
-			flags &= ~FAULT_FLAG_ALLOW_RETRY;
-
 			/*
 			 * No need to up_read(&mm->mmap_sem) as we would
 			 * have already released it in __lock_page_or_retry
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 8bc0d091f13c..8bdc7e75d2e5 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -569,11 +569,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	if (unlikely(fault & VM_FAULT_RETRY)) {
 		/* We retry only once */
 		if (flags & FAULT_FLAG_ALLOW_RETRY) {
-			/*
-			 * Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
-			 * of starvation.
-			 */
-			flags &= ~FAULT_FLAG_ALLOW_RETRY;
 			flags |= FAULT_FLAG_TRIED;
 			if (!signal_pending(current))
 				goto retry;
diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index 4fc8d746bec3..aad2c0557d2f 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -154,11 +154,6 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
 				1, regs, addr);
 		}
 		if (fault & VM_FAULT_RETRY) {
-			/*
-			 * Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
-			 * of starvation.
-			 */
-			flags &= ~(FAULT_FLAG_ALLOW_RETRY);
 			flags |= FAULT_FLAG_TRIED;

 			/*
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 19b4fb2fafab..819f87169ee1 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -537,10 +537,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 				fault = VM_FAULT_PFAULT;
 				goto out_up;
 			}
-			/* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
-			 * of starvation. */
-			flags &= ~(FAULT_FLAG_ALLOW_RETRY |
-				   FAULT_FLAG_RETRY_NOWAIT);
+			flags &= ~FAULT_FLAG_RETRY_NOWAIT;
 			flags |= FAULT_FLAG_TRIED;
 			down_read(&mm->mmap_sem);
 			goto retry;
diff --git a/arch/sh/mm/fault.c b/arch/sh/mm/fault.c
index baf5d73df40c..cd710e2d7c57 100644
--- a/arch/sh/mm/fault.c
+++ b/arch/sh/mm/fault.c
@@ -498,7 +498,6 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
 				      regs, address);
 		}
 		if (fault & VM_FAULT_RETRY) {
-			flags &= ~FAULT_FLAG_ALLOW_RETRY;
 			flags |= FAULT_FLAG_TRIED;

 			/*
diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c
index a2c83104fe35..6735cd1c09b9 100644
--- a/arch/sparc/mm/fault_32.c
+++ b/arch/sparc/mm/fault_32.c
@@ -261,7 +261,6 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
 				      1, regs, address);
 		}
 		if (fault & VM_FAULT_RETRY) {
-			flags &= ~FAULT_FLAG_ALLOW_RETRY;
 			flags |= FAULT_FLAG_TRIED;

 			/* No need to up_read(&mm->mmap_sem) as we would
diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c
index cad71ec5c7b3..28d5b4d012c6 100644
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -459,7 +459,6 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
 				      1, regs, address);
 		}
 		if (fault & VM_FAULT_RETRY) {
-			flags &= ~FAULT_FLAG_ALLOW_RETRY;
 			flags |= FAULT_FLAG_TRIED;

 			/* No need to up_read(&mm->mmap_sem) as we would
diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c
index 09baf37b65b9..c63fc292aea0 100644
--- a/arch/um/kernel/trap.c
+++ b/arch/um/kernel/trap.c
@@ -99,7 +99,6 @@ int handle_page_fault(unsigned long address, unsigned long ip,
 			else
 				current->min_flt++;
 			if (fault & VM_FAULT_RETRY) {
-				flags &= ~FAULT_FLAG_ALLOW_RETRY;
 				flags |= FAULT_FLAG_TRIED;

 				goto retry;
diff --git a/arch/unicore32/mm/fault.c b/arch/unicore32/mm/fault.c
index 3611f19234a1..fdf577956f5f 100644
--- a/arch/unicore32/mm/fault.c
+++ b/arch/unicore32/mm/fault.c
@@ -260,12 +260,8 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 			tsk->maj_flt++;
 		else
 			tsk->min_flt++;
-	if (fault & VM_FAULT_RETRY) {
-		/* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
-		 * of starvation. */
-		flags &= ~FAULT_FLAG_ALLOW_RETRY;
+	if (fault & VM_FAULT_RETRY)
 		goto retry;
-	}
 	}

 	up_read(&mm->mmap_sem);
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index b94ef0c2b98c..645b1365a72d 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1431,7 +1431,6 @@ void do_user_addr_fault(struct pt_regs *regs,
 	if (unlikely(fault & VM_FAULT_RETRY)) {
 		/* Retry at most once */
 		if (flags & FAULT_FLAG_ALLOW_RETRY) {
-			flags &= ~FAULT_FLAG_ALLOW_RETRY;
 			flags |= FAULT_FLAG_TRIED;
 			if (!signal_pending(tsk))
 				goto retry;
diff --git a/arch/xtensa/mm/fault.c b/arch/xtensa/mm/fault.c
index 792dad5e2f12..7cd55f2d66c9 100644
--- a/arch/xtensa/mm/fault.c
+++ b/arch/xtensa/mm/fault.c
@@ -128,7 +128,6 @@ void do_page_fault(struct pt_regs *regs)
 		else
 			current->min_flt++;
 		if (fault & VM_FAULT_RETRY) {
-			flags &= ~FAULT_FLAG_ALLOW_RETRY;
 			flags |= FAULT_FLAG_TRIED;

 			/* No need to up_read(&mm->mmap_sem) as we would

From patchwork Wed Nov 21 06:21:03 2018
X-Patchwork-Submitter: Peter Xu <peterx@redhat.com>
X-Patchwork-Id: 10691999
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: Keith Busch, Jerome Glisse, peterx@redhat.com, Dan Williams,
    linux-mm@kvack.org, Matthew Wilcox, Al Viro, Andrea Arcangeli,
    Huang Ying, Mike Kravetz, Mike Rapoport, Linus Torvalds,
    "Michael S. Tsirkin", "Kirill A. Shutemov", Michal Hocko,
    Vlastimil Babka, Pavel Tatashin, Andrew Morton
Subject: [PATCH RFC v3 4/4] mm: gup: allow VM_FAULT_RETRY for multiple times
Date: Wed, 21 Nov 2018 14:21:03 +0800
Message-Id: <20181121062103.18835-5-peterx@redhat.com>
In-Reply-To: <20181121062103.18835-1-peterx@redhat.com>
References: <20181121062103.18835-1-peterx@redhat.com>

This is the gup counterpart of the change that allows VM_FAULT_RETRY
to happen more than once.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/gup.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index a40111c742ba..7c3b3ab6be88 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -528,7 +528,10 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
 	if (*flags & FOLL_NOWAIT)
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;
 	if (*flags & FOLL_TRIED) {
-		VM_WARN_ON_ONCE(fault_flags & FAULT_FLAG_ALLOW_RETRY);
+		/*
+		 * Note: FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_TRIED
+		 * can co-exist
+		 */
 		fault_flags |= FAULT_FLAG_TRIED;
 	}

@@ -944,17 +947,23 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
 		/* VM_FAULT_RETRY triggered, so seek to the faulting offset */
 		pages += ret;
 		start += ret << PAGE_SHIFT;
+		lock_dropped = true;

+retry:
 		/*
 		 * Repeat on the address that fired VM_FAULT_RETRY
-		 * without FAULT_FLAG_ALLOW_RETRY but with
+		 * with both FAULT_FLAG_ALLOW_RETRY and
 		 * FAULT_FLAG_TRIED.
 		 */
 		*locked = 1;
-		lock_dropped = true;
 		down_read(&mm->mmap_sem);
 		ret = __get_user_pages(tsk, mm, start, 1, flags | FOLL_TRIED,
-				       pages, NULL, NULL);
+				       pages, NULL, locked);
+		if (!*locked) {
+			/* Continue to retry until we succeed */
+			BUG_ON(ret != 0);
+			goto retry;
+		}
 		if (ret != 1) {
 			BUG_ON(ret > 1);
 			if (!pages_done