From patchwork Thu Feb 20 16:31:04 2020
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 11394563
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Brian Geffon, Pavel Emelyanov, Mike Kravetz, David Hildenbrand,
 peterx@redhat.com, Martin Cracauer, Andrea Arcangeli, Mel Gorman,
 Bobby Powers, Mike Rapoport, "Kirill A . Shutemov", Maya Gokhale,
 Johannes Weiner, Marty McFadden, Denis Plotnikov, Hugh Dickins, "Dr .
 David Alan Gilbert", Jerome Glisse
Subject: [PATCH v6 11/19] khugepaged: skip collapse if uffd-wp detected
Date: Thu, 20 Feb 2020 11:31:04 -0500
Message-Id: <20200220163112.11409-12-peterx@redhat.com>
In-Reply-To: <20200220163112.11409-1-peterx@redhat.com>
References: <20200220163112.11409-1-peterx@redhat.com>

Don't collapse the huge PMD if any of the small PTEs is userfault
write-protected. Write protection is tracked at small-page granularity,
and there is no way to keep that per-page information once the small
pages are merged into a huge PMD.

The same applies to swap entries and migration entries, so check them
as well, regardless of khugepaged_max_ptes_swap.

Reviewed-by: Jerome Glisse
Reviewed-by: Mike Rapoport
Signed-off-by: Peter Xu
---
 include/trace/events/huge_memory.h |  1 +
 mm/khugepaged.c                    | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index d82a0f4e824d..70e32ff096ec 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -13,6 +13,7 @@
 	EM( SCAN_PMD_NULL,		"pmd_null")			\
 	EM( SCAN_EXCEED_NONE_PTE,	"exceed_none_pte")		\
 	EM( SCAN_PTE_NON_PRESENT,	"pte_non_present")		\
+	EM( SCAN_PTE_UFFD_WP,		"pte_uffd_wp")			\
 	EM( SCAN_PAGE_RO,		"no_writable_page")		\
 	EM( SCAN_LACK_REFERENCED_PAGE,	"lack_referenced_page")		\
 	EM( SCAN_PAGE_NULL,		"page_null")			\
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b679908743cb..789485cc9387 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -29,6 +29,7 @@ enum scan_result {
 	SCAN_PMD_NULL,
 	SCAN_EXCEED_NONE_PTE,
 	SCAN_PTE_NON_PRESENT,
+	SCAN_PTE_UFFD_WP,
 	SCAN_PAGE_RO,
 	SCAN_LACK_REFERENCED_PAGE,
 	SCAN_PAGE_NULL,
@@ -1141,6 +1142,15 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 		pte_t pteval = *_pte;
 		if (is_swap_pte(pteval)) {
 			if (++unmapped <= khugepaged_max_ptes_swap) {
+				/*
+				 * Always be strict with uffd-wp
+				 * enabled swap entries. Please see
+				 * comment below for pte_uffd_wp().
+				 */
+				if (pte_swp_uffd_wp(pteval)) {
+					result = SCAN_PTE_UFFD_WP;
+					goto out_unmap;
+				}
 				continue;
 			} else {
 				result = SCAN_EXCEED_SWAP_PTE;
@@ -1160,6 +1170,19 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 			result = SCAN_PTE_NON_PRESENT;
 			goto out_unmap;
 		}
+		if (pte_uffd_wp(pteval)) {
+			/*
+			 * Don't collapse the page if any of the small
+			 * PTEs are armed with uffd write protection.
+			 * Here we can also mark the new huge pmd as
+			 * write protected if any of the small ones is
+			 * marked but that could bring unknown
+			 * userfault messages that fall outside of
+			 * the registered range. So, just be simple.
+			 */
+			result = SCAN_PTE_UFFD_WP;
+			goto out_unmap;
+		}
 		if (pte_write(pteval))
 			writable = true;
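
For readers who want to see the scan rule in isolation, here is a minimal
user-space sketch of the decision this patch adds. It is not kernel code:
every name prefixed with model_ or MODEL_ is hypothetical, the two bool
fields only stand in for is_swap_pte() and pte_uffd_wp() /
pte_swp_uffd_wp(), and the real khugepaged_scan_pmd() performs many more
checks than are shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only; none of these names exist in the kernel. */
#define MODEL_PTES_PER_PMD 512	/* e.g. x86-64 with 4 KiB base pages */

enum model_scan_result {
	MODEL_SCAN_SUCCEED,
	MODEL_SCAN_EXCEED_SWAP_PTE,
	MODEL_SCAN_PTE_UFFD_WP,
};

struct model_pte {
	bool swapped;	/* stands in for is_swap_pte() */
	bool uffd_wp;	/* stands in for pte_uffd_wp() / pte_swp_uffd_wp() */
};

/*
 * Walk the small PTEs under one PMD and decide whether collapsing them
 * into a huge PMD is still allowed. A single uffd-wp marked entry,
 * present or swapped out, vetoes the collapse.
 */
static enum model_scan_result
model_scan_pmd(const struct model_pte *ptes, size_t n,
	       unsigned int max_ptes_swap)
{
	unsigned int unmapped = 0;

	assert(n <= MODEL_PTES_PER_PMD);

	for (size_t i = 0; i < n; i++) {
		if (ptes[i].swapped) {
			/* Too many swapped-out entries: give up as before. */
			if (++unmapped > max_ptes_swap)
				return MODEL_SCAN_EXCEED_SWAP_PTE;
			/* Within the swap budget, but still strict on uffd-wp. */
			if (ptes[i].uffd_wp)
				return MODEL_SCAN_PTE_UFFD_WP;
			continue;
		}
		/* Present entry armed with uffd write protection. */
		if (ptes[i].uffd_wp)
			return MODEL_SCAN_PTE_UFFD_WP;
	}
	return MODEL_SCAN_SUCCEED;
}
```

Calling model_scan_pmd() on an array in which any single entry has
uffd_wp set returns MODEL_SCAN_PTE_UFFD_WP, however generous
max_ptes_swap is, as long as the swap budget has not already been
exceeded earlier in the walk. That mirrors the ordering of the two
checks in the hunk above.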