From patchwork Wed Apr 17 21:25:49 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13633882
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Pasha Tatashin, peterx@redhat.com, Nadav Amit, Andrew Morton, Axel Rasmussen, David Hildenbrand
Subject: [PATCH v4] mm/page_table_check: Support userfault wr-protect entries
Date: Wed, 17 Apr 2024 17:25:49 -0400
Message-ID: <20240417212549.2766883-1-peterx@redhat.com>
X-Mailer: git-send-email 2.44.0

Allow page_table_check hooks to check userfaultfd wr-protect criteria
upon pgtable updates. The rule is that no writable flag may coexist with
the userfault wr-protect flag.

This should be better than c2da319c2e, where we only sanitized such issues
during a pgtable walk. When hitting such an issue there, we had little
chance of knowing where the writable bit came from [1], so even though the
pgtable walk exposes a kernel bug (which is still helpful for triaging), it
is not easy to track and debug.

Now we switch to tracking the source. That is also much easier with the
recent introduction of page table check.

There are some limitations with using the page table check here for
userfaultfd wr-protect purposes:

  - It is only enabled with explicit enablement of page table check configs
    and/or boot parameters, but that should be good enough to track at least
    syzbot issues, as syzbot should enable PAGE_TABLE_CHECK[_ENFORCED] for
    x86 [1]. We used to have DEBUG_VM, but it's now off for most distros,
    and distros normally do not enable PAGE_TABLE_CHECK[_ENFORCED] either,
    so the coverage is similar.

  - It conditionally works with the ptep_modify_prot API. It will be
    bypassed when e.g. XEN PV is enabled, but it still works for most of
    the remaining scenarios, which should be the common cases, so it should
    be good enough.

  - The hugetlb check is a bit hairy, as the page table check cannot tell a
    hugetlb pte from a normal pte by trapping at set_pte_at(), because of
    the current design where hugetlb maps every layer to pte_t. For
    example, the default set_huge_pte_at() can invoke set_pte_at() directly
    and lose the hugetlb context, treating it the same as a normal pte_t.
    So far that's fine, because huge_pte_uffd_wp() always equals
    pte_uffd_wp() wherever it is supported (x86 only). It'll be a bigger
    problem once _PAGE_UFFD_WP is defined differently at various pgtable
    levels, because then a single per-arch huge_pte_uffd_wp() will stop
    making sense first. As of now we can leave this for later too.

This patch also removes commit c2da319c2e altogether, as we have something
better now.

[1] https://lore.kernel.org/all/000000000000dce0530615c89210@google.com/

Cc: Pasha Tatashin
Signed-off-by: Peter Xu
Reviewed-by: Pasha Tatashin
---
v2:
- Rename __page_table_check_pxx() to page_table_check_pxx_flags(),
  meanwhile move the pte check out of the loop [Pasha]
- Fix build issues reported from the bot, also added SWP_DEVICE_WRITE which
  was overlooked before

v3:
- Add missing doc update [Pasha]

v4:
- Fix wordings in doc, use more elegant swap helpers [Pasha]
---
 Documentation/mm/page_table_check.rst |  9 +++++++-
 arch/x86/include/asm/pgtable.h        | 18 +---------------
 mm/page_table_check.c                 | 30 +++++++++++++++++++++++++++
 3 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/Documentation/mm/page_table_check.rst b/Documentation/mm/page_table_check.rst
index c12838ce6b8d..c59f22eb6a0f 100644
--- a/Documentation/mm/page_table_check.rst
+++ b/Documentation/mm/page_table_check.rst
@@ -14,7 +14,7 @@ Page table check performs extra verifications at the time when new pages become
 accessible from the userspace by getting their page table entries (PTEs PMDs
 etc.) added into the table.
 
-In case of detected corruption, the kernel is crashed. There is a small
+In case of most detected corruption, the kernel is crashed. There is a small
 performance and memory overhead associated with the page table check. Therefore,
 it is disabled by default, but can be optionally enabled on systems where the
 extra hardening outweighs the performance costs. Also, because page table check
@@ -22,6 +22,13 @@ is synchronous, it can help with debugging double map memory corruption issues,
 by crashing kernel at the time wrong mapping occurs instead of later which is
 often the case with memory corruptions bugs.
 
+It can also be used to do page table entry checks over various flags, dump
+warnings when illegal combinations of entry flags are detected. Currently,
+userfaultfd is the only user of such to sanity check wr-protect bit against
+any writable flags. Illegal flag combinations will not directly cause data
+corruption in this case immediately, but that will cause read-only data to
+be writable, leading to corrupt when the page content is later modified.
+
 Double mapping detection logic
 ==============================
 
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 273f7557218c..65b8e5bb902c 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -388,23 +388,7 @@ static inline pte_t pte_wrprotect(pte_t pte)
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
 static inline int pte_uffd_wp(pte_t pte)
 {
-	bool wp = pte_flags(pte) & _PAGE_UFFD_WP;
-
-#ifdef CONFIG_DEBUG_VM
-	/*
-	 * Having write bit for wr-protect-marked present ptes is fatal,
-	 * because it means the uffd-wp bit will be ignored and write will
-	 * just go through.
-	 *
-	 * Use any chance of pgtable walking to verify this (e.g., when
-	 * page swapped out or being migrated for all purposes). It means
-	 * something is already wrong. Tell the admin even before the
-	 * process crashes. We also nail it with wrong pgtable setup.
-	 */
-	WARN_ON_ONCE(wp && pte_write(pte));
-#endif
-
-	return wp;
+	return pte_flags(pte) & _PAGE_UFFD_WP;
 }
 
 static inline pte_t pte_mkuffd_wp(pte_t pte)
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index af69c3c8f7c2..4169576bed72 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -7,6 +7,8 @@
 #include
 #include
 #include
+#include
+#include
 
 #undef pr_fmt
 #define pr_fmt(fmt)	"page_table_check: " fmt
@@ -182,6 +184,22 @@ void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
 }
 EXPORT_SYMBOL(__page_table_check_pud_clear);
 
+/* Whether the swap entry cached writable information */
+static inline bool swap_cached_writable(swp_entry_t entry)
+{
+	return is_writable_device_exclusive_entry(entry) ||
+	    is_writable_device_private_entry(entry) ||
+	    is_writable_migration_entry(entry);
+}
+
+static inline void page_table_check_pte_flags(pte_t pte)
+{
+	if (pte_present(pte) && pte_uffd_wp(pte))
+		WARN_ON_ONCE(pte_write(pte));
+	else if (is_swap_pte(pte) && pte_swp_uffd_wp(pte))
+		WARN_ON_ONCE(swap_cached_writable(pte_to_swp_entry(pte)));
+}
+
 void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
 		unsigned int nr)
 {
@@ -190,6 +208,8 @@ void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
 	if (&init_mm == mm)
 		return;
 
+	page_table_check_pte_flags(pte);
+
 	for (i = 0; i < nr; i++)
 		__page_table_check_pte_clear(mm, ptep_get(ptep + i));
 	if (pte_user_accessible_page(pte))
@@ -197,11 +217,21 @@ void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
 }
 EXPORT_SYMBOL(__page_table_check_ptes_set);
 
+static inline void page_table_check_pmd_flags(pmd_t pmd)
+{
+	if (pmd_present(pmd) && pmd_uffd_wp(pmd))
+		WARN_ON_ONCE(pmd_write(pmd));
+	else if (is_swap_pmd(pmd) && pmd_swp_uffd_wp(pmd))
+		WARN_ON_ONCE(swap_cached_writable(pmd_to_swp_entry(pmd)));
+}
+
 void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd)
 {
 	if (&init_mm == mm)
 		return;
 
+	page_table_check_pmd_flags(pmd);
+
 	__page_table_check_pmd_clear(mm, *pmdp);
 	if (pmd_user_accessible_page(pmd)) {
 		page_table_check_set(pmd_pfn(pmd), PMD_SIZE >> PAGE_SHIFT,
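
For readers who want to reason about the rule outside the kernel, here is a
minimal userspace sketch of the flag check this patch describes: an entry
carrying the uffd-wp marker must not also carry writable information, whether
it is a present entry or a swap entry. This is not kernel code. The MODEL_*
bits, model_pte_t and the model_* functions are hypothetical names invented
for illustration, and the model uses one marker bit for both cases, whereas
the patch uses separate helpers (pte_uffd_wp() and pte_swp_uffd_wp()).

```c
/*
 * Userspace model of the uffd-wp vs. writable rule. NOT kernel code:
 * every MODEL_* bit and model_* name is hypothetical and exists only
 * for this illustration.
 */
#include <assert.h>
#include <stdbool.h>

typedef unsigned int model_pte_t;

#define MODEL_PRESENT      (1u << 0) /* entry maps a present page */
#define MODEL_WRITE        (1u << 1) /* present entry is writable */
#define MODEL_UFFD_WP      (1u << 2) /* userfaultfd wr-protect marker */
#define MODEL_SWP_WRITABLE (1u << 3) /* swap entry caches "writable" */

/* Return true when the entry breaks the rule (where the patch would warn). */
static inline bool model_pte_flags_bad(model_pte_t pte)
{
	/* Without the wr-protect marker there is nothing to verify. */
	if (!(pte & MODEL_UFFD_WP))
		return false;
	/* Present entry: the write bit must not be set. */
	if (pte & MODEL_PRESENT)
		return (pte & MODEL_WRITE) != 0;
	/* Non-present entry, modelled as a swap entry: no cached writable. */
	return (pte & MODEL_SWP_WRITABLE) != 0;
}

/* A few cases covering both branches of the check. */
static inline void model_selftest(void)
{
	assert(!model_pte_flags_bad(MODEL_PRESENT | MODEL_WRITE));
	assert(!model_pte_flags_bad(MODEL_PRESENT | MODEL_UFFD_WP));
	assert(model_pte_flags_bad(MODEL_PRESENT | MODEL_WRITE | MODEL_UFFD_WP));
	assert(!model_pte_flags_bad(MODEL_UFFD_WP));
	assert(model_pte_flags_bad(MODEL_UFFD_WP | MODEL_SWP_WRITABLE));
}
```

Calling model_selftest() from a main() exercises both branches. The point of
the model is that the check runs on the value about to be installed, so a
violation is reported at the update that introduces it and not at a later
pgtable walk.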