From patchwork Wed Jul 12 08:16:16 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peng Zhang
X-Patchwork-Id: 13309770
From: Peng Zhang
To: glider@google.com, elver@google.com, dvyukov@google.com,
 akpm@linux-foundation.org
Cc: kasan-dev@googlegroups.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, muchun.song@linux.dev, Peng Zhang
Subject: [PATCH v2] mm: kfence: allocate kfence_metadata at runtime
Date: Wed, 12 Jul 2023 16:16:16 +0800
Message-Id: <20230712081616.45177-1-zhangpeng.00@bytedance.com>
X-Mailer: git-send-email 2.37.0 (Apple Git-136)

kfence_metadata is currently a static array. For the purpose of
allocating scalable __kfence_pool, we first change it to runtime
allocation of metadata. Since the size of an object of kfence_metadata
is 1160 bytes, we can save at least 72 pages (with default 256 objects)
without enabling kfence.
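
As a rough check of the page count quoted above, the following userspace
sketch mirrors the PAGE_ALIGN() rounding applied to the metadata size. The
4096-byte page size and the helper names page_align() and metadata_pages()
are assumptions made for illustration only; they are not part of this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to a multiple of page_size, similar in spirit to PAGE_ALIGN().
 * page_size must be a non-zero power of two. */
static size_t page_align(size_t size, size_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return (size + page_size - 1) & ~(page_size - 1);
}

/* Pages needed to hold num_objects metadata entries of obj_size bytes each. */
static size_t metadata_pages(size_t obj_size, size_t num_objects,
			     size_t page_size)
{
	return page_align(obj_size * num_objects, page_size) / page_size;
}
```

With 1160-byte entries and 256 objects, the raw size is 296960 bytes, which
is 72.5 pages of 4096 bytes and rounds up to 73 pages. That is consistent
with the "at least 72 pages" figure.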

Below are the numbers obtained in qemu (with the default 256 objects).
before: Memory: 8134692K/8388080K available (3668K bss)
after: Memory: 8136740K/8388080K available (1620K bss)
This saves 2MB of memory, more than expected. The size of the .bss
section has changed as well, possibly because the change affects the
linker.

Signed-off-by: Peng Zhang
---
Changes since v1:
 - Fix a stupid problem of not being able to initialize kfence. The
   problem is that I slightly modified the patch before sending it out
   and did not test it again. I'm extremely sorry.
 - Drop kfence_alloc_metadata() and kfence_free_metadata() because they
   are no longer reused.
 - Allocate metadata from memblock during early initialization. This
   fixes the issue that the metadata allocation size could not exceed
   the limit of the buddy system during early initialization.
 - Fix potential UAF in kfence_shutdown_cache().

v1: https://lore.kernel.org/lkml/20230710032714.26200-1-zhangpeng.00@bytedance.com/

 include/linux/kfence.h |   5 +-
 mm/kfence/core.c       | 124 ++++++++++++++++++++++++++++-------------
 mm/kfence/kfence.h     |   5 +-
 mm/mm_init.c           |   2 +-
 4 files changed, 94 insertions(+), 42 deletions(-)

diff --git a/include/linux/kfence.h b/include/linux/kfence.h
index 726857a4b680..68e71562bfa7 100644
--- a/include/linux/kfence.h
+++ b/include/linux/kfence.h
@@ -59,9 +59,10 @@ static __always_inline bool is_kfence_address(const void *addr)
 }

 /**
- * kfence_alloc_pool() - allocate the KFENCE pool via memblock
+ * kfence_alloc_pool_and_metadata() - allocate the KFENCE pool and KFENCE
+ * metadata via memblock
  */
-void __init kfence_alloc_pool(void);
+void __init kfence_alloc_pool_and_metadata(void);

 /**
  * kfence_init() - perform KFENCE initialization at boot time
diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index dad3c0eb70a0..ed0424950cf1 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -116,7 +116,16 @@ EXPORT_SYMBOL(__kfence_pool); /* Export for test modules. */
  * backing pages (in __kfence_pool).
  */
 static_assert(CONFIG_KFENCE_NUM_OBJECTS > 0);
-struct kfence_metadata kfence_metadata[CONFIG_KFENCE_NUM_OBJECTS];
+struct kfence_metadata *kfence_metadata;
+
+/*
+ * When kfence_metadata is not NULL, it may be that kfence is being initialized
+ * at this time, and it may be used by kfence_shutdown_cache() during
+ * initialization. If the initialization fails, kfence_metadata will be released,
+ * causing UAF. So it is necessary to add kfence_metadata_init for initialization,
+ * and kfence_metadata will be visible only when initialization is successful.
+ */
+static struct kfence_metadata *kfence_metadata_init;

 /* Freelist with available objects. */
 static struct list_head kfence_freelist = LIST_HEAD_INIT(kfence_freelist);
@@ -591,7 +600,7 @@ static unsigned long kfence_init_pool(void)

 		__folio_set_slab(slab_folio(slab));
 #ifdef CONFIG_MEMCG
-		slab->memcg_data = (unsigned long)&kfence_metadata[i / 2 - 1].objcg |
+		slab->memcg_data = (unsigned long)&kfence_metadata_init[i / 2 - 1].objcg |
 				   MEMCG_DATA_OBJCGS;
 #endif
 	}
@@ -610,7 +619,7 @@ static unsigned long kfence_init_pool(void)
 	}

 	for (i = 0; i < CONFIG_KFENCE_NUM_OBJECTS; i++) {
-		struct kfence_metadata *meta = &kfence_metadata[i];
+		struct kfence_metadata *meta = &kfence_metadata_init[i];

 		/* Initialize metadata. */
 		INIT_LIST_HEAD(&meta->list);
@@ -626,6 +635,12 @@ static unsigned long kfence_init_pool(void)
 		addr += 2 * PAGE_SIZE;
 	}

+	/*
+	 * Make kfence_metadata visible only when initialization is successful.
+	 * Otherwise, if the initialization fails and kfence_metadata is
+	 * freed, it may cause UAF in kfence_shutdown_cache().
+	 */
+	kfence_metadata = kfence_metadata_init;
 	return 0;

 reset_slab:
@@ -672,26 +687,10 @@ static bool __init kfence_init_pool_early(void)
 	 */
 	memblock_free_late(__pa(addr), KFENCE_POOL_SIZE - (addr - (unsigned long)__kfence_pool));
 	__kfence_pool = NULL;
-	return false;
-}
-
-static bool kfence_init_pool_late(void)
-{
-	unsigned long addr, free_size;

-	addr = kfence_init_pool();
-
-	if (!addr)
-		return true;
+	memblock_free_late(__pa(kfence_metadata_init), KFENCE_METADATA_SIZE);
+	kfence_metadata_init = NULL;

-	/* Same as above. */
-	free_size = KFENCE_POOL_SIZE - (addr - (unsigned long)__kfence_pool);
-#ifdef CONFIG_CONTIG_ALLOC
-	free_contig_range(page_to_pfn(virt_to_page((void *)addr)), free_size / PAGE_SIZE);
-#else
-	free_pages_exact((void *)addr, free_size);
-#endif
-	__kfence_pool = NULL;
 	return false;
 }

@@ -841,19 +840,30 @@ static void toggle_allocation_gate(struct work_struct *work)

 /* === Public interface ===================================================== */

-void __init kfence_alloc_pool(void)
+void __init kfence_alloc_pool_and_metadata(void)
 {
 	if (!kfence_sample_interval)
 		return;

-	/* if the pool has already been initialized by arch, skip the below. */
-	if (__kfence_pool)
-		return;
-
-	__kfence_pool = memblock_alloc(KFENCE_POOL_SIZE, PAGE_SIZE);
-
+	/*
+	 * If the pool has already been initialized by arch, there is no need to
+	 * re-allocate the memory pool.
+	 */
 	if (!__kfence_pool)
+		__kfence_pool = memblock_alloc(KFENCE_POOL_SIZE, PAGE_SIZE);
+
+	if (!__kfence_pool) {
 		pr_err("failed to allocate pool\n");
+		return;
+	}
+
+	/* The memory allocated by memblock has been zeroed out. */
+	kfence_metadata_init = memblock_alloc(KFENCE_METADATA_SIZE, PAGE_SIZE);
+	if (!kfence_metadata_init) {
+		pr_err("failed to allocate metadata\n");
+		memblock_free(__kfence_pool, KFENCE_POOL_SIZE);
+		__kfence_pool = NULL;
+	}
 }

 static void kfence_init_enable(void)
@@ -895,33 +905,68 @@ void __init kfence_init(void)

 static int kfence_init_late(void)
 {
-	const unsigned long nr_pages = KFENCE_POOL_SIZE / PAGE_SIZE;
+	const unsigned long nr_pages_pool = KFENCE_POOL_SIZE / PAGE_SIZE;
+	const unsigned long nr_pages_meta = KFENCE_METADATA_SIZE / PAGE_SIZE;
+	unsigned long addr = (unsigned long)__kfence_pool;
+	unsigned long free_size = KFENCE_POOL_SIZE;
+	int err = -ENOMEM;
+
 #ifdef CONFIG_CONTIG_ALLOC
 	struct page *pages;
-
-	pages = alloc_contig_pages(nr_pages, GFP_KERNEL, first_online_node, NULL);
+	pages = alloc_contig_pages(nr_pages_pool, GFP_KERNEL, first_online_node,
+				   NULL);
 	if (!pages)
 		return -ENOMEM;
+
 	__kfence_pool = page_to_virt(pages);
+	pages = alloc_contig_pages(nr_pages_meta, GFP_KERNEL, first_online_node,
+				   NULL);
+	if (pages)
+		kfence_metadata_init = page_to_virt(pages);
 #else
-	if (nr_pages > MAX_ORDER_NR_PAGES) {
+	if (nr_pages_pool > MAX_ORDER_NR_PAGES ||
+	    nr_pages_meta > MAX_ORDER_NR_PAGES) {
 		pr_warn("KFENCE_NUM_OBJECTS too large for buddy allocator\n");
 		return -EINVAL;
 	}
+
 	__kfence_pool = alloc_pages_exact(KFENCE_POOL_SIZE, GFP_KERNEL);
 	if (!__kfence_pool)
 		return -ENOMEM;
+
+	kfence_metadata_init = alloc_pages_exact(KFENCE_METADATA_SIZE, GFP_KERNEL);
 #endif

-	if (!kfence_init_pool_late()) {
-		pr_err("%s failed\n", __func__);
-		return -EBUSY;
+	if (!kfence_metadata_init)
+		goto free_pool;
+
+	memzero_explicit(kfence_metadata_init, KFENCE_METADATA_SIZE);
+	addr = kfence_init_pool();
+	if (!addr) {
+		kfence_init_enable();
+		kfence_debugfs_init();
+		return 0;
 	}

-	kfence_init_enable();
-	kfence_debugfs_init();
+	pr_err("%s failed\n", __func__);
+	free_size = KFENCE_POOL_SIZE - (addr - (unsigned long)__kfence_pool);
+	err = -EBUSY;

-	return 0;
+#ifdef CONFIG_CONTIG_ALLOC
+	free_contig_range(page_to_pfn(virt_to_page((void *)kfence_metadata_init)),
+			  nr_pages_meta);
+free_pool:
+	free_contig_range(page_to_pfn(virt_to_page((void *)addr)),
+			  free_size / PAGE_SIZE);
+#else
+	free_pages_exact((void *)kfence_metadata_init, KFENCE_METADATA_SIZE);
+free_pool:
+	free_pages_exact((void *)addr, free_size);
+#endif
+
+	kfence_metadata_init = NULL;
+	__kfence_pool = NULL;
+	return err;
 }

 static int kfence_enable_late(void)
@@ -941,6 +986,9 @@ void kfence_shutdown_cache(struct kmem_cache *s)
 	struct kfence_metadata *meta;
 	int i;

+	if (!kfence_metadata)
+		return;
+
 	for (i = 0; i < CONFIG_KFENCE_NUM_OBJECTS; i++) {
 		bool in_use;

diff --git a/mm/kfence/kfence.h b/mm/kfence/kfence.h
index 392fb273e7bd..f46fbb03062b 100644
--- a/mm/kfence/kfence.h
+++ b/mm/kfence/kfence.h
@@ -102,7 +102,10 @@ struct kfence_metadata {
 #endif
 };

-extern struct kfence_metadata kfence_metadata[CONFIG_KFENCE_NUM_OBJECTS];
+#define KFENCE_METADATA_SIZE PAGE_ALIGN(sizeof(struct kfence_metadata) * \
+					CONFIG_KFENCE_NUM_OBJECTS)
+
+extern struct kfence_metadata *kfence_metadata;

 static inline struct kfence_metadata *addr_to_metadata(unsigned long addr)
 {
diff --git a/mm/mm_init.c b/mm/mm_init.c
index a1963c3322af..86b26d013f4b 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2778,7 +2778,7 @@ void __init mm_core_init(void)
 	 */
 	page_ext_init_flatmem();
 	mem_debugging_and_hardening_init();
-	kfence_alloc_pool();
+	kfence_alloc_pool_and_metadata();
 	report_meminit();
 	kmsan_init_shadow();
 	stack_depot_early_init();
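
The staging-pointer idea behind kfence_metadata_init (set up the array
through a private pointer and publish it only once initialization has
succeeded) can be sketched in plain userspace C. This is a single-threaded
illustration only: NUM_OBJECTS, struct meta, init_pool(), shutdown_cache()
and the simulate_failure flag are hypothetical stand-ins, calloc()/free()
replace the kernel allocators, and no memory-ordering concerns are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NUM_OBJECTS 4	/* hypothetical stand-in for CONFIG_KFENCE_NUM_OBJECTS */

struct meta {
	int in_use;
};

static struct meta *meta;	/* published pointer; readers check it for NULL */
static struct meta *meta_init;	/* staging pointer; only the init path uses it */

/* Returns true on success. simulate_failure stands in for a real init error. */
static bool init_pool(bool simulate_failure)
{
	assert(meta == NULL);	/* this sketch publishes at most once */

	meta_init = calloc(NUM_OBJECTS, sizeof(*meta_init));
	if (!meta_init)
		return false;

	if (simulate_failure) {
		/* Nothing was published, so no reader can see the freed array. */
		free(meta_init);
		meta_init = NULL;
		return false;
	}

	meta = meta_init;	/* publish only after initialization succeeded */
	return true;
}

/* Reader: returns -1 if nothing was published, else the number of free slots. */
static int shutdown_cache(void)
{
	int i, free_slots = 0;

	if (!meta)
		return -1;

	for (i = 0; i < NUM_OBJECTS; i++)
		if (!meta[i].in_use)
			free_slots++;
	return free_slots;
}
```

Because the reader only ever dereferences the published pointer, a failed
initialization that frees the staging array leaves the reader on the early
return path instead of touching freed memory.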