From patchwork Fri Apr 14 14:23:41 2023
X-Patchwork-Submitter: Chih-En Lin
X-Patchwork-Id: 13211643
From: Chih-En Lin
To: Andrew Morton, Qi Zheng, David Hildenbrand,
 "Matthew Wilcox (Oracle)", Christophe Leroy, John Hubbard, Nadav Amit,
 Barry Song, Pasha Tatashin
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 "H. Peter Anvin", Steven Rostedt, Masami Hiramatsu, Peter Zijlstra,
 Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa,
 Namhyung Kim, Ian Rogers, Adrian Hunter, Yu Zhao, Steven Barrett,
 Juergen Gross, Peter Xu, Kefeng Wang, Tong Tiangen, Christoph Hellwig,
 "Liam R. Howlett", Yang Shi, Vlastimil Babka, Alex Sierra,
 Vincent Whitchurch, Anshuman Khandual, Li kunyu, Liu Shixin,
 Hugh Dickins, Minchan Kim, Joey Gouly, Chih-En Lin, Michal Hocko,
 Suren Baghdasaryan, "Zach O'Keefe", Gautam Menghani, Catalin Marinas,
 Mark Brown, "Eric W. Biederman", Andrei Vagin, Shakeel Butt,
 Daniel Bristot de Oliveira, "Jason A. Donenfeld", Greg Kroah-Hartman,
 Alexey Gladkov, x86@kernel.org, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
 Dinglan Peng, Pedro Fonseca, Jim Huang, Huichun Feng
Subject: [PATCH v5 17/17] mm: Check the unexpected modification of COW-ed PTE
Date: Fri, 14 Apr 2023 22:23:41 +0800
Message-Id: <20230414142341.354556-18-shiyn.lin@gmail.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230414142341.354556-1-shiyn.lin@gmail.com>
References: <20230414142341.354556-1-shiyn.lin@gmail.com>

In most cases, we don't expect any write access to a COW-ed PTE table.
To catch unexpected modifications, add a new check to the page table
check facility. However, there are still valid reasons to modify a
COW-ed PTE table, so also add enable/disable functions for the check.

Signed-off-by: Chih-En Lin
---
 arch/x86/include/asm/pgtable.h   |  1 +
 include/linux/page_table_check.h | 62 ++++++++++++++++++++++++++++++++
 mm/memory.c                      |  4 +++
 mm/page_table_check.c            | 58 ++++++++++++++++++++++++++++++
 4 files changed, 125 insertions(+)
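The mechanism is small enough to model outside the kernel. The
standalone C sketch below (toy types and values only; nothing in it is
kernel code, and pte_t, cow_pte_count, and set_pte_at() here are
simplified stand-ins for the real kernel definitions) shows the
intended behaviour: while checking is enabled and the table is still
shared, writing a different value through set_pte_at() is a bug, and a
legitimate writer brackets its modification with the disable/enable
helpers:

/*
 * Standalone model of the COW-ed PTE modification check; toy types,
 * not kernel code. Build: cc -std=c11 toy_cow_pte_check.c
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

typedef unsigned long pte_t;		/* toy PTE value */

static atomic_int check_cowed_pte = 1;	/* mirrors the new page_ext field */
static int cow_pte_count = 2;		/* toy: table shared by two mms */

static void check_cowed_pte_table_disable(void)
{
	atomic_store(&check_cowed_pte, 0);
}

static void check_cowed_pte_table_enable(void)
{
	atomic_store(&check_cowed_pte, 1);
}

/*
 * Mirrors the idea of __cowed_pte_table_check_modify(): while checking
 * is enabled and the table is still shared, an entry must not change.
 */
static void set_pte_at(pte_t *ptep, pte_t pte)
{
	if (atomic_load(&check_cowed_pte) && cow_pte_count > 1)
		assert(*ptep == pte);	/* stands in for BUG_ON(!pte_same()) */
	*ptep = pte;
}

int main(void)
{
	pte_t entry = 0x1000;

	set_pte_at(&entry, 0x1000);		/* unchanged value: check passes */

	check_cowed_pte_table_disable();	/* legitimate-modification window */
	set_pte_at(&entry, 0x2000);		/* would trip the check otherwise */
	check_cowed_pte_table_enable();

	printf("pte = %#lx\n", entry);
	return 0;
}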
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 7425f32e5293..6b323c672e36 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1022,6 +1022,7 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t *pudp)
 static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
 			      pte_t *ptep, pte_t pte)
 {
+	cowed_pte_table_check_modify(mm, addr, ptep, pte);
 	page_table_check_pte_set(mm, addr, ptep, pte);
 	set_pte(ptep, pte);
 }
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 01e16c7696ec..4a54dc454281 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -113,6 +113,54 @@ static inline void page_table_check_pte_clear_range(struct mm_struct *mm,
 	__page_table_check_pte_clear_range(mm, addr, pmd);
 }
 
+#ifdef CONFIG_COW_PTE
+void __check_cowed_pte_table_enable(pte_t *ptep);
+void __check_cowed_pte_table_disable(pte_t *ptep);
+void __cowed_pte_table_check_modify(struct mm_struct *mm, unsigned long addr,
+				    pte_t *ptep, pte_t pte);
+
+static inline void check_cowed_pte_table_enable(pte_t *ptep)
+{
+	if (static_branch_likely(&page_table_check_disabled))
+		return;
+
+	__check_cowed_pte_table_enable(ptep);
+}
+
+static inline void check_cowed_pte_table_disable(pte_t *ptep)
+{
+	if (static_branch_likely(&page_table_check_disabled))
+		return;
+
+	__check_cowed_pte_table_disable(ptep);
+}
+
+static inline void cowed_pte_table_check_modify(struct mm_struct *mm,
+						unsigned long addr,
+						pte_t *ptep, pte_t pte)
+{
+	if (static_branch_likely(&page_table_check_disabled))
+		return;
+
+	__cowed_pte_table_check_modify(mm, addr, ptep, pte);
+}
+#else
+static inline void check_cowed_pte_table_enable(pte_t *ptep)
+{
+}
+
+static inline void check_cowed_pte_table_disable(pte_t *ptep)
+{
+}
+
+static inline void cowed_pte_table_check_modify(struct mm_struct *mm,
+						unsigned long addr,
+						pte_t *ptep, pte_t pte)
+{
+}
+#endif /* CONFIG_COW_PTE */
+
+
 #else
 
 static inline void page_table_check_alloc(struct page *page, unsigned int order)
@@ -162,5 +210,19 @@ static inline void page_table_check_pte_clear_range(struct mm_struct *mm,
 {
 }
 
+static inline void check_cowed_pte_table_enable(pte_t *ptep)
+{
+}
+
+static inline void check_cowed_pte_table_disable(pte_t *ptep)
+{
+}
+
+static inline void cowed_pte_table_check_modify(struct mm_struct *mm,
+						unsigned long addr,
+						pte_t *ptep, pte_t pte)
+{
+}
+
 #endif /* CONFIG_PAGE_TABLE_CHECK */
 #endif /* __LINUX_PAGE_TABLE_CHECK_H */
diff --git a/mm/memory.c b/mm/memory.c
index 7908e20f802a..e62487413038 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1202,10 +1202,12 @@ copy_cow_pte_range(struct vm_area_struct *dst_vma,
 			 * Although, parent's PTE is COW-ed, we should
 			 * still need to handle all the swap stuffs.
 			 */
+			check_cowed_pte_table_disable(src_pte);
 			ret = copy_nonpresent_pte(dst_mm, src_mm,
 						  src_pte, src_pte,
 						  curr, curr,
 						  addr, rss);
+			check_cowed_pte_table_enable(src_pte);
 			if (ret == -EIO) {
 				entry = pte_to_swp_entry(*src_pte);
 				break;
@@ -1223,8 +1225,10 @@ copy_cow_pte_range(struct vm_area_struct *dst_vma,
 		 * copy_present_pte() will determine the mapped page
 		 * should be COW mapping or not.
 		 */
+		check_cowed_pte_table_disable(src_pte);
 		ret = copy_present_pte(curr, curr, src_pte, src_pte,
 				       addr, rss, NULL);
+		check_cowed_pte_table_enable(src_pte);
 		/*
 		 * If we need a pre-allocated page for this pte,
 		 * drop the lock, recover all the entries, fall
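The two mm/memory.c hunks above use the same bracketing pattern: a
path with a valid reason to write a COW-ed PTE table (here, handling
swap entries and present PTEs while the table is shared) disables the
check, performs the write, and re-enables the check. A hypothetical
wrapper capturing that pattern might look as follows;
with_cow_pte_check_disabled() and modify_cb are illustrations, not
part of this series:

/* Hypothetical convenience wrapper; not part of this series. */
static int with_cow_pte_check_disabled(pte_t *ptep,
				       int (*modify_cb)(pte_t *ptep, void *priv),
				       void *priv)
{
	int ret;

	check_cowed_pte_table_disable(ptep);
	ret = modify_cb(ptep, priv);
	check_cowed_pte_table_enable(ptep);

	return ret;
}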
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 25d8610c0042..5175c7476508 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -14,6 +14,9 @@ struct page_table_check {
 	atomic_t anon_map_count;
 	atomic_t file_map_count;
+#ifdef CONFIG_COW_PTE
+	atomic_t check_cowed_pte;
+#endif
 };
 
 static bool __page_table_check_enabled __initdata =
 			IS_ENABLED(CONFIG_PAGE_TABLE_CHECK_ENFORCED);
@@ -248,3 +251,58 @@ void __page_table_check_pte_clear_range(struct mm_struct *mm,
 		pte_unmap(ptep - PTRS_PER_PTE);
 	}
 }
+
+#ifdef CONFIG_COW_PTE
+void __check_cowed_pte_table_enable(pte_t *ptep)
+{
+	struct page *page = pte_page(*ptep);
+	struct page_ext *page_ext = page_ext_get(page);
+	struct page_table_check *ptc = get_page_table_check(page_ext);
+
+	atomic_set(&ptc->check_cowed_pte, 1);
+	page_ext_put(page_ext);
+}
+
+void __check_cowed_pte_table_disable(pte_t *ptep)
+{
+	struct page *page = pte_page(*ptep);
+	struct page_ext *page_ext = page_ext_get(page);
+	struct page_table_check *ptc = get_page_table_check(page_ext);
+
+	atomic_set(&ptc->check_cowed_pte, 0);
+	page_ext_put(page_ext);
+}
+
+static int check_cowed_pte_table(pte_t *ptep)
+{
+	struct page *page = pte_page(*ptep);
+	struct page_ext *page_ext = page_ext_get(page);
+	struct page_table_check *ptc = get_page_table_check(page_ext);
+	int check = 0;
+
+	check = atomic_read(&ptc->check_cowed_pte);
+	page_ext_put(page_ext);
+
+	return check;
+}
+
+void __cowed_pte_table_check_modify(struct mm_struct *mm, unsigned long addr,
+				    pte_t *ptep, pte_t pte)
+{
+	pgd_t *pgd;
+	p4d_t *p4d;
+	pud_t *pud;
+	pmd_t *pmd;
+
+	if (!test_bit(MMF_COW_PTE, &mm->flags) || !check_cowed_pte_table(ptep))
+		return;
+
+	pgd = pgd_offset(mm, addr);
+	p4d = p4d_offset(pgd, addr);
+	pud = pud_offset(p4d, addr);
+	pmd = pmd_offset(pud, addr);
+
+	if (!pmd_none(*pmd) && !pmd_write(*pmd) && cow_pte_count(pmd) > 1)
+		BUG_ON(!pte_same(*ptep, pte));
+}
+#endif
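For reference, the per-table state that __check_cowed_pte_table_enable()
and __check_cowed_pte_table_disable() flip lives in the page_ext area of
the PTE-table page itself. The standalone sketch below models that
lookup with a plain array indexed by a toy page-frame number in place of
pte_page()/page_ext_get(); every name here is a stand-in, not kernel
code:

/*
 * Standalone model of the per-PTE-table-page checker state; the array
 * stands in for the page_ext area. Build: cc -std=c11 toy_page_ext.c
 */
#include <assert.h>
#include <stdatomic.h>

#define TOY_NR_PAGES 16

struct toy_page_table_check {
	atomic_int check_cowed_pte;
};

/* stands in for the page_ext area: one tracking struct per page */
static struct toy_page_table_check toy_page_ext[TOY_NR_PAGES];

/* stands in for pte_page()/page_ext_get()/get_page_table_check() */
static struct toy_page_table_check *toy_get_page_table_check(unsigned long pfn)
{
	assert(pfn < TOY_NR_PAGES);
	return &toy_page_ext[pfn];
}

static void toy_check_enable(unsigned long pfn)
{
	atomic_store(&toy_get_page_table_check(pfn)->check_cowed_pte, 1);
}

static void toy_check_disable(unsigned long pfn)
{
	atomic_store(&toy_get_page_table_check(pfn)->check_cowed_pte, 0);
}

static int toy_check_enabled(unsigned long pfn)
{
	return atomic_load(&toy_get_page_table_check(pfn)->check_cowed_pte);
}

int main(void)
{
	toy_check_enable(3);		/* mark table page 3 as checked */
	assert(toy_check_enabled(3));
	toy_check_disable(3);		/* open a modification window */
	assert(!toy_check_enabled(3));
	return 0;
}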