From patchwork Mon Jan 21 07:57:14 2019
X-Patchwork-Submitter: Peter Xu <peterx@redhat.com>
X-Patchwork-Id: 10772817
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Hugh Dickins, Maya Gokhale, Jerome Glisse, Johannes Weiner,
    peterx@redhat.com, Martin Cracauer, Denis Plotnikov, Shaohua Li,
    Andrea Arcangeli, Pavel Emelyanov, Mike Kravetz, Marty McFadden,
    Mike Rapoport, Mel Gorman, "Kirill A. Shutemov",
    "Dr. David Alan Gilbert"
Subject: [PATCH RFC 16/24] userfaultfd: wp: handle COW properly for uffd-wp
Date: Mon, 21 Jan 2019 15:57:14 +0800
Message-Id: <20190121075722.7945-17-peterx@redhat.com>
In-Reply-To: <20190121075722.7945-1-peterx@redhat.com>
References: <20190121075722.7945-1-peterx@redhat.com>

This allows uffd-wp to support write-protected pages for COW. For
example, a uffd write-protected PTE can at the same time be
write-protected for other reasons, such as COW or the zero page. When
that happens we can't simply set the write bit in the PTE, since that
would change the content seen by every other reference to the page.
Instead we must do the COW first if necessary, and only then resolve
the uffd-wp fault.
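To illustrate, the "do we need COW first?" decision made below can be
summarized by the following sketch (a hypothetical helper for
illustration only, not part of this patch; it only uses the existing
vm_normal_page() and page_mapcount() helpers):

static bool uffd_wp_needs_cow(struct vm_area_struct *vma,
                              unsigned long addr, pte_t oldpte)
{
        /*
         * vm_normal_page() returns NULL for special mappings such as
         * the zero page; those must always be copied before the write
         * bit can be set.  A mapcount larger than one means the page
         * is shared with someone else, so it needs COW as well.
         */
        struct page *page = vm_normal_page(vma, addr, oldpte);

        return !page || page_mapcount(page) > 1;
}

Only when no COW is needed can the write bit be restored on the
existing PTE in place.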
To correctly copy the page, we'll also need to carry over the
_PAGE_UFFD_WP bit if it was set in the original PTE.

For huge PMDs, we simply always split the huge PMD when we want to
resolve an uffd-wp page fault, which matches what we do for general
huge PMD write protection. This reduces the huge PMD copy-on-write
problem to the PTE copy-on-write one.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/memory.c   |  2 ++
 mm/mprotect.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index ef823c07f635..a3de13b728f4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2290,6 +2290,8 @@ vm_fault_t wp_page_copy(struct vm_fault *vmf)
 	}
 	flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
 	entry = mk_pte(new_page, vma->vm_page_prot);
+	if (pte_uffd_wp(vmf->orig_pte))
+		entry = pte_mkuffd_wp(entry);
 	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 	/*
 	 * Clear the pte entry and flush it first, before updating the
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 000e246c163b..c37c9aa7a54e 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -77,14 +77,13 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 		if (pte_present(oldpte)) {
 			pte_t ptent;
 			bool preserve_write = prot_numa && pte_write(oldpte);
+			struct page *page;
 
 			/*
 			 * Avoid trapping faults against the zero or KSM
 			 * pages. See similar comment in change_huge_pmd.
 			 */
 			if (prot_numa) {
-				struct page *page;
-
 				page = vm_normal_page(vma, addr, oldpte);
 				if (!page || PageKsm(page))
 					continue;
@@ -114,6 +113,46 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 					continue;
 			}
 
+			/*
+			 * Detect whether we'll need to COW before
+			 * resolving an uffd-wp fault.  Note that this
+			 * includes detection of the zero page (where
+			 * page==NULL)
+			 */
+			if (uffd_wp_resolve) {
+				/* If the fault is resolved already, skip */
+				if (!pte_uffd_wp(*pte))
+					continue;
+				page = vm_normal_page(vma, addr, oldpte);
+				if (!page || page_mapcount(page) > 1) {
+					struct vm_fault vmf = {
+						.vma = vma,
+						.address = addr & PAGE_MASK,
+						.page = page,
+						.orig_pte = oldpte,
+						.pmd = pmd,
+						/* pte and ptl not needed */
+					};
+					vm_fault_t ret;
+
+					if (page)
+						get_page(page);
+					arch_leave_lazy_mmu_mode();
+					pte_unmap_unlock(pte, ptl);
+					ret = wp_page_copy(&vmf);
+					/* PTE is changed, or OOM */
+					if (ret == 0)
+						/* It's done by others */
+						continue;
+					else if (WARN_ON(ret != VM_FAULT_WRITE))
+						return pages;
+					pte = pte_offset_map_lock(vma->vm_mm,
+								  pmd, addr,
+								  &ptl);
+					arch_enter_lazy_mmu_mode();
+				}
+			}
+
 			ptent = ptep_modify_prot_start(mm, addr, pte);
 			ptent = pte_modify(ptent, newprot);
 			if (preserve_write)
@@ -184,6 +223,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 	unsigned long pages = 0;
 	unsigned long nr_huge_updates = 0;
 	unsigned long mni_start = 0;
+	bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
 
 	pmd = pmd_offset(pud, addr);
 	do {
@@ -201,7 +241,16 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 		}
 
 		if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) {
-			if (next - addr != HPAGE_PMD_SIZE) {
+			/*
+			 * When resolving a userfaultfd write
+			 * protection fault, it's not easy to identify
+			 * whether a THP is shared with others and
+			 * whether we'll need to do copy-on-write, so
+			 * just split it always for now to simplify the
+			 * procedure.  And that's the policy too for
+			 * general THP write-protect in af9e4d5f2de2.
+			 */
+			if (next - addr != HPAGE_PMD_SIZE || uffd_wp_resolve) {
 				__split_huge_pmd(vma, pmd, addr, false, NULL);
 			} else {
 				int nr_ptes = change_huge_pmd(vma, pmd, addr,