From patchwork Mon Jan 21 07:57:08 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 10772803 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F0ECB1390 for ; Mon, 21 Jan 2019 07:59:00 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DFDFD29C15 for ; Mon, 21 Jan 2019 07:59:00 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D2D5A29C18; Mon, 21 Jan 2019 07:59:00 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1DF4029C15 for ; Mon, 21 Jan 2019 07:59:00 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2D9608E0004; Mon, 21 Jan 2019 02:58:59 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 288718E0001; Mon, 21 Jan 2019 02:58:59 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 178E58E0004; Mon, 21 Jan 2019 02:58:59 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-qt1-f200.google.com (mail-qt1-f200.google.com [209.85.160.200]) by kanga.kvack.org (Postfix) with ESMTP id E25488E0001 for ; Mon, 21 Jan 2019 02:58:58 -0500 (EST) Received: by mail-qt1-f200.google.com with SMTP id w18so19924323qts.8 for ; Sun, 20 Jan 2019 23:58:58 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references; bh=KpcxPza66LjFTCrDzc03kNRql8zP/+CfIr1uzfP3XZk=; b=B1TXTFTPIthXclr1BWhnZLzcHdy13CWgPzs+D6aX0Y37e6599fSDv7qG6ItKI5GIpR lL1b5C0ICgBDL5oazH/F21t4fgjNFCyaVZgg8IrqFAFb08gTscns/Tr6hmCYMFI+bg4m lB1XL9C7liDwIJjrKeQBUNRk6x5wDIOoDHnGQcSFMs8THhdL2Hqyn4XCOSNjsMxwff+t TuSHMnbeV8j6w5TZ0JRBOt551u+aZooKmVrTS1bZhzgXV/njZFXdwLpO4c+hRE/rTDVH xA7UaykLieYSFpN840zS1w+uhCEll8c1ethVQ0/cW2ZlNgK0av9T1fuVtOVU82GH1pWr ew5w== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of peterx@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=peterx@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com X-Gm-Message-State: AJcUukcnx+J8pTySxbvrGw5XATCjgpfFPXigvfio81bvsN7/ujUXVGs0 xkP8DDA/XmdSZQ8FNogA1SMMw9J1bT6Tf6IrJpNyrbhTf3jmabsIEtrqObX7LWaf5sNM1iBdAMU amzctwAyeYIKavAGEfgiNhAfb5UCQTmzkMzEd41qxRAYr1mS5A9+WhhLYkqe9XA/GwA== X-Received: by 2002:a37:8cc5:: with SMTP id o188mr23143435qkd.62.1548057538633; Sun, 20 Jan 2019 23:58:58 -0800 (PST) X-Google-Smtp-Source: ALg8bN7CS/6PDARjWOFRri4OhHO0OpOWUaO1E2MDyLHL+t1ohAMXGkFLkxOjmLIr8qfz3418YbPE X-Received: by 2002:a37:8cc5:: with SMTP id o188mr23143412qkd.62.1548057537828; Sun, 20 Jan 2019 23:58:57 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1548057537; cv=none; d=google.com; s=arc-20160816; b=rP+Z8pDmd4V+1GugcVnfPfDV5OVfAhoAe+iVYihxLWRr3N5jDfkGpf0Vzv5MMiZDw2 IujFq3Vmvolh2vpsR3p56sLw5KmenpvV1ltzNGt/BCFnu2syssFpYamRg0pJihFTA4+w JsV3vMbokvrM7gmnjOzZWMMrtmTtbkx5ZhL2kC3SBSBFMYYm3f8wdBOPPvnwuLZ6cRxD 4OymBimDQUk4pujaU4btxQlSANUBeDeonTNLOcccqHHB4HxupDi8VUyrSZWi3Um4OQOj l1fpVO2mGnNJwNa/96SVhE7XHgU+Xj+hX6nIiGa2GEW3dveAJye51RbLNmisNOR/f61X eMFw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=references:in-reply-to:message-id:date:subject:cc:to:from; bh=KpcxPza66LjFTCrDzc03kNRql8zP/+CfIr1uzfP3XZk=; b=0JQT0kJWZ7PFp5JujOIHNCk/VjMKTXNkFqEJ1FeCyoOBvKnMCcrGpuQXxbpmF1qBaQ b7EyX0MP4OZyLcsWYGfX/g/4MevqKpBMW6kUmau/82hS2yVGubzI9VTEikLJR4HI4tLG Zj07GgMrv6aPpvJR5KEMe41BjvwkveSE56QMvcurKoNodGN+iUBYPT3Kh2259Mt7wA/z nQ8QClcg0xzVyy7OaAtIbekfwAlv4UV6o4kf6NIr0joyV9+S0hbfYuWmihcL0NXA0lzk x+MmaH7iTVpWRQosOCDdhIFc73jHUki3hySygYWzY/B1u8Sml2YLNu5dOrDsEB2y2PWP YeIg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of peterx@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=peterx@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from mx1.redhat.com (mx1.redhat.com. [209.132.183.28]) by mx.google.com with ESMTPS id b62si3235431qkf.59.2019.01.20.23.58.57 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 20 Jan 2019 23:58:57 -0800 (PST) Received-SPF: pass (google.com: domain of peterx@redhat.com designates 209.132.183.28 as permitted sender) client-ip=209.132.183.28; Authentication-Results: mx.google.com; spf=pass (google.com: domain of peterx@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=peterx@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id DA33E8046C; Mon, 21 Jan 2019 07:58:56 +0000 (UTC) Received: from xz-x1.nay.redhat.com (dhcp-14-116.nay.redhat.com [10.66.14.116]) by smtp.corp.redhat.com (Postfix) with ESMTP id C613F608C7; Mon, 21 Jan 2019 07:58:48 +0000 (UTC) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Hugh Dickins , Maya Gokhale , Jerome Glisse , Johannes Weiner , peterx@redhat.com, Martin Cracauer , Denis Plotnikov , Shaohua Li , Andrea Arcangeli , Pavel Emelyanov , Mike Kravetz , Marty McFadden , Mike Rapoport , Mel Gorman , "Kirill A . Shutemov" , "Dr . David Alan Gilbert" Subject: [PATCH RFC 10/24] userfaultfd: wp: add WP pagetable tracking to x86 Date: Mon, 21 Jan 2019 15:57:08 +0800 Message-Id: <20190121075722.7945-11-peterx@redhat.com> In-Reply-To: <20190121075722.7945-1-peterx@redhat.com> References: <20190121075722.7945-1-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Mon, 21 Jan 2019 07:58:57 +0000 (UTC) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP From: Andrea Arcangeli Accurate userfaultfd WP tracking is possible by tracking exactly which virtual memory ranges were writeprotected by userland. We can't relay only on the RW bit of the mapped pagetable because that information is destroyed by fork() or KSM or swap. If we were to relay on that, we'd need to stay on the safe side and generate false positive wp faults for every swapped out page. Signed-off-by: Andrea Arcangeli Signed-off-by: Peter Xu --- arch/x86/Kconfig | 1 + arch/x86/include/asm/pgtable.h | 52 ++++++++++++++++++++++++++++ arch/x86/include/asm/pgtable_64.h | 8 ++++- arch/x86/include/asm/pgtable_types.h | 9 +++++ include/asm-generic/pgtable.h | 1 + include/asm-generic/pgtable_uffd.h | 51 +++++++++++++++++++++++++++ init/Kconfig | 5 +++ 7 files changed, 126 insertions(+), 1 deletion(-) create mode 100644 include/asm-generic/pgtable_uffd.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 8689e794a43c..096c773452d0 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -207,6 +207,7 @@ config X86 select USER_STACKTRACE_SUPPORT select VIRT_TO_BUS select X86_FEATURE_NAMES if PROC_FS + select HAVE_ARCH_USERFAULTFD_WP if USERFAULTFD config INSTRUCTION_DECODER def_bool y diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 40616e805292..7a71158982f4 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -23,6 +23,7 @@ #ifndef __ASSEMBLY__ #include +#include extern pgd_t early_top_pgt[PTRS_PER_PGD]; int __init __early_make_pgtable(unsigned long address, pmdval_t pmd); @@ -293,6 +294,23 @@ static inline pte_t pte_clear_flags(pte_t pte, pteval_t clear) return native_make_pte(v & ~clear); } +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP +static inline int pte_uffd_wp(pte_t pte) +{ + return pte_flags(pte) & _PAGE_UFFD_WP; +} + +static inline pte_t pte_mkuffd_wp(pte_t pte) +{ + return pte_set_flags(pte, _PAGE_UFFD_WP); +} + +static inline pte_t pte_clear_uffd_wp(pte_t pte) +{ + return pte_clear_flags(pte, _PAGE_UFFD_WP); +} +#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ + static inline pte_t pte_mkclean(pte_t pte) { return pte_clear_flags(pte, _PAGE_DIRTY); @@ -372,6 +390,23 @@ static inline pmd_t pmd_clear_flags(pmd_t pmd, pmdval_t clear) return native_make_pmd(v & ~clear); } +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP +static inline int pmd_uffd_wp(pmd_t pmd) +{ + return pmd_flags(pmd) & _PAGE_UFFD_WP; +} + +static inline pmd_t pmd_mkuffd_wp(pmd_t pmd) +{ + return pmd_set_flags(pmd, _PAGE_UFFD_WP); +} + +static inline pmd_t pmd_clear_uffd_wp(pmd_t pmd) +{ + return pmd_clear_flags(pmd, _PAGE_UFFD_WP); +} +#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ + static inline pmd_t pmd_mkold(pmd_t pmd) { return pmd_clear_flags(pmd, _PAGE_ACCESSED); @@ -1351,6 +1386,23 @@ static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t pmd) #endif #endif +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP +static inline pte_t pte_swp_mkuffd_wp(pte_t pte) +{ + return pte_set_flags(pte, _PAGE_SWP_UFFD_WP); +} + +static inline int pte_swp_uffd_wp(pte_t pte) +{ + return pte_flags(pte) & _PAGE_SWP_UFFD_WP; +} + +static inline pte_t pte_swp_clear_uffd_wp(pte_t pte) +{ + return pte_clear_flags(pte, _PAGE_SWP_UFFD_WP); +} +#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ + #define PKRU_AD_BIT 0x1 #define PKRU_WD_BIT 0x2 #define PKRU_BITS_PER_PKEY 2 diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h index 9c85b54bf03c..e0c5d29b8685 100644 --- a/arch/x86/include/asm/pgtable_64.h +++ b/arch/x86/include/asm/pgtable_64.h @@ -189,7 +189,7 @@ extern void sync_global_pgds(unsigned long start, unsigned long end); * * | ... | 11| 10| 9|8|7|6|5| 4| 3|2| 1|0| <- bit number * | ... |SW3|SW2|SW1|G|L|D|A|CD|WT|U| W|P| <- bit names - * | TYPE (59-63) | ~OFFSET (9-58) |0|0|X|X| X| X|X|SD|0| <- swp entry + * | TYPE (59-63) | ~OFFSET (9-58) |0|0|X|X| X| X|F|SD|0| <- swp entry * * G (8) is aliased and used as a PROT_NONE indicator for * !present ptes. We need to start storing swap entries above @@ -197,9 +197,15 @@ extern void sync_global_pgds(unsigned long start, unsigned long end); * erratum where they can be incorrectly set by hardware on * non-present PTEs. * + * SD Bits 1-4 are not used in non-present format and available for + * special use described below: + * * SD (1) in swp entry is used to store soft dirty bit, which helps us * remember soft dirty over page migration * + * F (2) in swp entry is used to record when a pagetable is + * writeprotected by userfaultfd WP support. + * * Bit 7 in swp entry should be 0 because pmd_present checks not only P, * but also L and G. * diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 106b7d0e2dae..163043ab142d 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -32,6 +32,7 @@ #define _PAGE_BIT_SPECIAL _PAGE_BIT_SOFTW1 #define _PAGE_BIT_CPA_TEST _PAGE_BIT_SOFTW1 +#define _PAGE_BIT_UFFD_WP _PAGE_BIT_SOFTW2 /* userfaultfd wrprotected */ #define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_SOFTW3 /* software dirty tracking */ #define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4 @@ -100,6 +101,14 @@ #define _PAGE_SWP_SOFT_DIRTY (_AT(pteval_t, 0)) #endif +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP +#define _PAGE_UFFD_WP (_AT(pteval_t, 1) << _PAGE_BIT_UFFD_WP) +#define _PAGE_SWP_UFFD_WP _PAGE_USER +#else +#define _PAGE_UFFD_WP (_AT(pteval_t, 0)) +#define _PAGE_SWP_UFFD_WP (_AT(pteval_t, 0)) +#endif + #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE) #define _PAGE_NX (_AT(pteval_t, 1) << _PAGE_BIT_NX) #define _PAGE_DEVMAP (_AT(u64, 1) << _PAGE_BIT_DEVMAP) diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index 359fb935ded6..0e1470ecf7b5 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -10,6 +10,7 @@ #include #include #include +#include #if 5 - defined(__PAGETABLE_P4D_FOLDED) - defined(__PAGETABLE_PUD_FOLDED) - \ defined(__PAGETABLE_PMD_FOLDED) != CONFIG_PGTABLE_LEVELS diff --git a/include/asm-generic/pgtable_uffd.h b/include/asm-generic/pgtable_uffd.h new file mode 100644 index 000000000000..643d1bf559c2 --- /dev/null +++ b/include/asm-generic/pgtable_uffd.h @@ -0,0 +1,51 @@ +#ifndef _ASM_GENERIC_PGTABLE_UFFD_H +#define _ASM_GENERIC_PGTABLE_UFFD_H + +#ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP +static __always_inline int pte_uffd_wp(pte_t pte) +{ + return 0; +} + +static __always_inline int pmd_uffd_wp(pmd_t pmd) +{ + return 0; +} + +static __always_inline pte_t pte_mkuffd_wp(pte_t pte) +{ + return pte; +} + +static __always_inline pmd_t pmd_mkuffd_wp(pmd_t pmd) +{ + return pmd; +} + +static __always_inline pte_t pte_clear_uffd_wp(pte_t pte) +{ + return pte; +} + +static __always_inline pmd_t pmd_clear_uffd_wp(pmd_t pmd) +{ + return pmd; +} + +static __always_inline pte_t pte_swp_mkuffd_wp(pte_t pte) +{ + return pte; +} + +static __always_inline int pte_swp_uffd_wp(pte_t pte) +{ + return 0; +} + +static __always_inline pte_t pte_swp_clear_uffd_wp(pte_t pte) +{ + return pte; +} +#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ + +#endif /* _ASM_GENERIC_PGTABLE_UFFD_H */ diff --git a/init/Kconfig b/init/Kconfig index cf5b5a0dcbc2..2a02e004874e 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1418,6 +1418,11 @@ config ADVISE_SYSCALLS applications use these syscalls, you can disable this option to save space. +config HAVE_ARCH_USERFAULTFD_WP + bool + help + Arch has userfaultfd write protection support + config MEMBARRIER bool "Enable membarrier() system call" if EXPERT default y