From patchwork Tue Jul 10 23:34:07 2018
X-Patchwork-Submitter: Yang Shi
X-Patchwork-Id: 10518409
From: Yang Shi <yang.shi@linux.alibaba.com>
To: mhocko@kernel.org, willy@infradead.org, ldufour@linux.vnet.ibm.com,
    kirill@shutemov.name, akpm@linux-foundation.org
Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Subject: [RFC v4 PATCH 1/3] mm: introduce VM_DEAD flag and extend check_stable_address_space to check it
Date: Wed, 11 Jul 2018 07:34:07 +0800
Message-Id: <1531265649-93433-2-git-send-email-yang.shi@linux.alibaba.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1531265649-93433-1-git-send-email-yang.shi@linux.alibaba.com>
References: <1531265649-93433-1-git-send-email-yang.shi@linux.alibaba.com>

The VM_DEAD flag is used to mark a vma that is being unmapped, for the
later munmap large address space optimization. Before the optimization,
a page fault racing with munmap may return either the right content or
SIGSEGV, but with the optimization it may also return a zero page. Use
this flag to mark that a page fault into such an area is unstable and
will trigger SIGSEGV, in order to prevent that third state. The flag
will be set by the optimization for unmapping large address spaces
(>= 1GB) in a later patch.
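For illustration only, a minimal sketch of how the later patch is
expected to use the flag (the helper name is hypothetical, not part of
this series): every vma in the range is tagged while mmap_sem is still
held for write, so a racing page fault observes VM_DEAD before the
actual unmap begins:

	/*
	 * Hypothetical sketch: mark all vmas in [vma->vm_start, end)
	 * dead. Caller holds mmap_sem for write. A concurrent page
	 * fault that sees VM_DEAD gets SIGSEGV instead of silently
	 * faulting in a zero page once the unmap proceeds.
	 */
	static void mark_vmas_dead(struct vm_area_struct *vma,
				   unsigned long end)
	{
		for (; vma && vma->vm_start < end; vma = vma->vm_next)
			vma->vm_flags |= VM_DEAD;
	}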
It is 64 bit only at the moment, since:
  * we have used up the vm_flags bits on 32 bit
  * 32 bit machines typically will not have such large mappings

Extend check_stable_address_space() to check this flag, and call it
from the page fault paths of shmem and hugetlb as well. Since the oom
reaper doesn't tear down shmem and hugetlb, skip those two cases for
the MMF_UNSTABLE check.

Suggested-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
 include/linux/mm.h  |  8 ++++++++
 include/linux/oom.h | 20 --------------------
 mm/huge_memory.c    |  4 ++--
 mm/hugetlb.c        |  5 +++++
 mm/memory.c         | 39 +++++++++++++++++++++++++++++++++++----
 mm/shmem.c          |  9 ++++++++-
 6 files changed, 58 insertions(+), 27 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a0fbb9f..ce7b112 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -242,6 +242,12 @@ extern int overcommit_kbytes_handler(struct ctl_table *, int, void __user *,
 #endif
 #endif /* CONFIG_ARCH_HAS_PKEYS */
 
+#ifdef CONFIG_64BIT
+#define VM_DEAD		BIT(37)	/* bit only usable on 64 bit kernel */
+#else
+#define VM_DEAD		0
+#endif
+
 #if defined(CONFIG_X86)
 # define VM_PAT		VM_ARCH_1	/* PAT reserves whole VMA at once (x86) */
 #elif defined(CONFIG_PPC)
@@ -2782,5 +2788,7 @@ static inline bool page_is_guard(struct page *page)
 static inline void setup_nr_node_ids(void) {}
 #endif
 
+extern int check_stable_address_space(struct vm_area_struct *vma);
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 6adac11..0265ed5 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -75,26 +75,6 @@ static inline bool mm_is_oom_victim(struct mm_struct *mm)
 	return test_bit(MMF_OOM_VICTIM, &mm->flags);
 }
 
-/*
- * Checks whether a page fault on the given mm is still reliable.
- * This is no longer true if the oom reaper started to reap the
- * address space which is reflected by MMF_UNSTABLE flag set in
- * the mm. At that moment any !shared mapping would lose the content
- * and could cause a memory corruption (zero pages instead of the
- * original content).
- *
- * User should call this before establishing a page table entry for
- * a !shared mapping and under the proper page table lock.
- *
- * Return 0 when the PF is safe VM_FAULT_SIGBUS otherwise.
- */
-static inline int check_stable_address_space(struct mm_struct *mm)
-{
-	if (unlikely(test_bit(MMF_UNSTABLE, &mm->flags)))
-		return VM_FAULT_SIGBUS;
-	return 0;
-}
-
 void __oom_reap_task_mm(struct mm_struct *mm);
 
 extern unsigned long oom_badness(struct task_struct *p,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1cd7c1a..997bac9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -578,7 +578,7 @@ static int __do_huge_pmd_anonymous_page(struct vm_fault *vmf, struct page *page,
 	} else {
 		pmd_t entry;
 
-		ret = check_stable_address_space(vma->vm_mm);
+		ret = check_stable_address_space(vma);
 		if (ret)
 			goto unlock_release;
 
@@ -696,7 +696,7 @@ int do_huge_pmd_anonymous_page(struct vm_fault *vmf)
 		ret = 0;
 		set = false;
 		if (pmd_none(*vmf->pmd)) {
-			ret = check_stable_address_space(vma->vm_mm);
+			ret = check_stable_address_space(vma);
 			if (ret) {
 				spin_unlock(vmf->ptl);
 			} else if (userfaultfd_missing(vma)) {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3612fbb..8965d02 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3887,6 +3887,10 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	int need_wait_lock = 0;
 	unsigned long haddr = address & huge_page_mask(h);
 
+	ret = check_stable_address_space(vma);
+	if (ret)
+		goto out;
+
 	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
 	if (ptep) {
 		entry = huge_ptep_get(ptep);
@@ -4006,6 +4010,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 */
 	if (need_wait_lock)
 		wait_on_page_locked(page);
+out:
 	return ret;
 }
diff --git a/mm/memory.c b/mm/memory.c
index 7206a63..250547f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -68,7 +68,7 @@
 #include
 #include
 #include
-#include
+#include
 #include
 #include
@@ -776,6 +776,37 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
 }
 
 /*
+ * Checks whether a page fault on the given mm is still reliable.
+ * This is no longer true if the oom reaper started to reap the
+ * address space which is reflected by MMF_UNSTABLE flag set in
+ * the mm. At that moment any !shared mapping would lose the content
+ * and could cause a memory corruption (zero pages instead of the
+ * original content).
+ * The oom reaper doesn't reap hugetlb and shmem, so skip the check
+ * for such vmas.
+ *
+ * Also check if the given vma has the VM_DEAD flag set, which means
+ * the vma will be unmapped soon; a PF is not safe for such a vma.
+ *
+ * User should call this before establishing a page table entry for
+ * a !shared mapping (disk file based), or shmem mapping, or hugetlb
+ * mapping, and under the proper page table lock.
+ *
+ * Return 0 when the PF is safe, VM_FAULT_SIGBUS or VM_FAULT_SIGSEGV
+ * otherwise.
+ */
+int check_stable_address_space(struct vm_area_struct *vma)
+{
+	if (vma->vm_flags & VM_DEAD)
+		return VM_FAULT_SIGSEGV;
+	if (!is_vm_hugetlb_page(vma) && !shmem_file(vma->vm_file)) {
+		if (unlikely(test_bit(MMF_UNSTABLE, &vma->vm_mm->flags)))
+			return VM_FAULT_SIGBUS;
+	}
+	return 0;
+}
+
+/*
  * vm_normal_page -- This function gets the "struct page" associated with a pte.
* * "Special" mappings do not wish to be associated with a "struct page" (either @@ -3147,7 +3178,7 @@ static int do_anonymous_page(struct vm_fault *vmf) vmf->address, &vmf->ptl); if (!pte_none(*vmf->pte)) goto unlock; - ret = check_stable_address_space(vma->vm_mm); + ret = check_stable_address_space(vma); if (ret) goto unlock; /* Deliver the page fault to userland, check inside PT lock */ @@ -3184,7 +3215,7 @@ static int do_anonymous_page(struct vm_fault *vmf) if (!pte_none(*vmf->pte)) goto release; - ret = check_stable_address_space(vma->vm_mm); + ret = check_stable_address_space(vma); if (ret) goto release; @@ -3495,7 +3526,7 @@ int finish_fault(struct vm_fault *vmf) * page */ if (!(vmf->vma->vm_flags & VM_SHARED)) - ret = check_stable_address_space(vmf->vma->vm_mm); + ret = check_stable_address_space(vmf->vma); if (!ret) ret = alloc_set_pte(vmf, vmf->memcg, page); if (vmf->pte) diff --git a/mm/shmem.c b/mm/shmem.c index 2cab844..9f9ac7c 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1953,7 +1953,13 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf) gfp_t gfp = mapping_gfp_mask(inode->i_mapping); enum sgp_type sgp; int err; - vm_fault_t ret = VM_FAULT_LOCKED; + vm_fault_t ret = 0; + + ret = check_stable_address_space(vma); + if (ret) + goto out; + + ret = VM_FAULT_LOCKED; /* * Trinity finds that probing a hole which tmpfs is punching can @@ -2025,6 +2031,7 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf) gfp, vma, vmf, &ret); if (err) return vmf_error(err); +out: return ret; }