From patchwork Fri Sep 25 22:26:00 2020
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 11800939
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: peterx@redhat.com, Jason Gunthorpe, John Hubbard, Andrew Morton,
    Christoph Hellwig, Yang Shi, Oleg Nesterov, Kirill Tkhai,
    Kirill Shutemov, Hugh Dickins, Jann Horn, Linus Torvalds,
    Michal Hocko, Jan Kara, Andrea Arcangeli, Leon Romanovsky
Subject: [PATCH v2 4/4] mm/thp: Split huge pmds/puds if they're pinned when fork()
Date: Fri, 25 Sep 2020 18:26:00 -0400
Message-Id: <20200925222600.6832-5-peterx@redhat.com>
In-Reply-To: <20200925222600.6832-1-peterx@redhat.com>
References: <20200925222600.6832-1-peterx@redhat.com>
Pinned pages shouldn't be write-protected when fork() happens, because a
follow-up copy-on-write on these pages could cause the pinned pages to be
replaced by random newly allocated pages.

For huge PMDs, we split the huge pmd if pinning is detected, so that future
handling will be done at the PTE level (with our latest changes, each of the
small pages will be copied).  We achieve this by letting copy_huge_pmd()
return -EAGAIN for pinned pages, so that we fall through in copy_pmd_range()
and finally land in the next copy_pte_range() call.

Huge PUDs are even more special: so far they do not support anonymous pages.
But they can be handled the same way as huge PMDs, even though splitting a
huge PUD means erasing the PUD entries.  This guarantees that the follow-up
fault-ins will remap the same pages in both parent and child later.

This might not be the most efficient way, but it should be easy and clean
enough.  It should be fine, since we're tackling a very rare case, just to
make sure userspace that pinned some THPs will still work even without
MADV_DONTFORK after fork().

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/huge_memory.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index faadc449cca5..da397779a6d4 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1074,6 +1074,24 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	src_page = pmd_page(pmd);
 	VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
 
+	/*
+	 * If this page is a potentially pinned page, split and retry the fault
+	 * with smaller page size.  Normally this should not happen because the
+	 * userspace should use MADV_DONTFORK upon pinned regions.  This is a
+	 * best effort that the pinned pages won't be replaced by another
+	 * random page during the coming copy-on-write.
+	 */
+	if (unlikely(is_cow_mapping(vma->vm_flags) &&
+		     atomic_read(&src_mm->has_pinned) &&
+		     page_maybe_dma_pinned(src_page))) {
+		pte_free(dst_mm, pgtable);
+		spin_unlock(src_ptl);
+		spin_unlock(dst_ptl);
+		__split_huge_pmd(vma, src_pmd, addr, false, NULL);
+		return -EAGAIN;
+	}
+
 	get_page(src_page);
 	page_dup_rmap(src_page, true);
 	add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
@@ -1177,6 +1195,16 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		/* No huge zero pud yet */
 	}
 
+	/* Please refer to comments in copy_huge_pmd() */
+	if (unlikely(is_cow_mapping(vma->vm_flags) &&
+		     atomic_read(&src_mm->has_pinned) &&
+		     page_maybe_dma_pinned(pud_page(pud)))) {
+		spin_unlock(src_ptl);
+		spin_unlock(dst_ptl);
+		__split_huge_pud(vma, src_pud, addr);
+		return -EAGAIN;
+	}
+
 	pudp_set_wrprotect(src_mm, addr, src_pud);
 	pud = pud_mkold(pud_wrprotect(pud));
 	set_pud_at(dst_mm, addr, dst_pud, pud);