From patchwork Fri Sep 15 10:59:20 2023
X-Patchwork-Submitter: Matteo Rizzo
X-Patchwork-Id: 13386894
Date: Fri, 15 Sep 2023 10:59:20 +0000
In-Reply-To: <20230915105933.495735-1-matteorizzo@google.com>
References: <20230915105933.495735-1-matteorizzo@google.com>
Message-ID: <20230915105933.495735-2-matteorizzo@google.com>
Subject: [RFC PATCH 01/14] mm/slub: don't try to dereference invalid freepointers
From: Matteo Rizzo
To: cl@linux.com, penberg@kernel.org, rientjes@google.com,
    iamjoonsoo.kim@lge.com, akpm@linux-foundation.org, vbabka@suse.cz,
    roman.gushchin@linux.dev, 42.hyeyoo@gmail.com,
    keescook@chromium.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-mm@kvack.org, linux-hardening@vger.kernel.org, tglx@linutronix.de,
    mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
    hpa@zytor.com, corbet@lwn.net, luto@kernel.org, peterz@infradead.org
Cc: jannh@google.com, matteorizzo@google.com, evn@google.com,
    poprdi@google.com, jordyzomer@google.com

slab_free_freelist_hook tries to read a freelist pointer from the current
object even when freeing a single object. This is invalid because single
objects don't actually contain a freelist pointer when they're freed; the
memory contains other data instead. This causes problems when checking the
integrity of the freelist in get_freepointer.

Signed-off-by: Matteo Rizzo
Reviewed-by: Kees Cook
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 mm/slub.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index f7940048138c..a7dae207c2d2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1820,7 +1820,9 @@ static inline bool slab_free_freelist_hook(struct kmem_cache *s,
 
 	do {
 		object = next;
-		next = get_freepointer(s, object);
+		/* Single objects don't actually contain a freepointer */
+		if (object != old_tail)
+			next = get_freepointer(s, object);
 
 		/* If object's reuse doesn't have to be delayed */
 		if (!slab_free_hook(s, object, slab_want_init_on_free(s))) {
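
For readers who don't have SLUB's freelist layout in mind: free objects store
the "next free object" link inside themselves at a cache-specific offset, and
get_freepointer() simply reads that word back. The tail object handed to
slab_free_freelist_hook() has not had a link written into it yet, which is why
the patch skips the read for object == old_tail. The standalone sketch below is
an illustrative toy model of that layout (not the kernel implementation; names
like toy_cache are made up):

#include <stddef.h>
#include <stdio.h>

/* Toy model: free objects embed the "next free" link at a fixed offset,
 * analogous to SLUB's s->offset. */
struct toy_cache {
	size_t offset;	/* where the freelist link lives inside a free object */
};

static void *toy_get_freepointer(struct toy_cache *s, void *object)
{
	/* Only meaningful for objects that already had a link written into
	 * them; for anything else this reads leftover object data. */
	return *(void **)((char *)object + s->offset);
}

int main(void)
{
	struct toy_cache cache = { .offset = 0 };
	char obj_a[64], obj_b[64];

	/* obj_a is on the freelist: its link points at the next free object. */
	*(void **)obj_a = obj_b;
	/* obj_b was just passed to free: no link has been written into it,
	 * so reading its "freepointer" would return stale data. */

	printf("next after obj_a: %p (expected %p)\n",
	       toy_get_freepointer(&cache, obj_a), (void *)obj_b);
	return 0;
}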

From patchwork Fri Sep 15 10:59:21 2023
X-Patchwork-Submitter: Matteo Rizzo
X-Patchwork-Id: 13386895
Date: Fri, 15 Sep 2023 10:59:21 +0000
In-Reply-To: <20230915105933.495735-1-matteorizzo@google.com>
References: <20230915105933.495735-1-matteorizzo@google.com>
Message-ID: <20230915105933.495735-3-matteorizzo@google.com>
Subject: [RFC PATCH 02/14] mm/slub: add is_slab_addr/is_slab_page helpers
From: Matteo Rizzo

From: Jann Horn

This is refactoring in preparation for adding two different implementations
(for SLAB_VIRTUAL enabled and disabled).

virt_to_folio(x) expands to _compound_head(virt_to_page(x)) and
virt_to_head_page(x) also expands to _compound_head(virt_to_page(x)), so
PageSlab(virt_to_head_page(res)) should be equivalent to is_slab_addr(res).

Signed-off-by: Jann Horn
Co-developed-by: Matteo Rizzo
Signed-off-by: Matteo Rizzo
Reviewed-by: Kees Cook
---
 include/linux/slab.h | 1 +
 kernel/resource.c    | 2 +-
 mm/slab.h            | 9 +++++++++
 mm/slab_common.c     | 5 ++---
 mm/slub.c            | 6 +++---
 5 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 8228d1276a2f..a2d82010d269 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -793,4 +793,5 @@ int slab_dead_cpu(unsigned int cpu);
 #define slab_dead_cpu NULL
 #endif
 
+#define is_slab_addr(addr) folio_test_slab(virt_to_folio(addr))
 #endif	/* _LINUX_SLAB_H */
diff --git a/kernel/resource.c b/kernel/resource.c
index b1763b2fd7ef..c829e5f97292 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -158,7 +158,7 @@ static void free_resource(struct resource *res)
	 * buddy and trying to be smart and reusing them eventually in
	 * alloc_resource() overcomplicates resource handling.
	 */
-	if (res && PageSlab(virt_to_head_page(res)))
+	if (res && is_slab_addr(res))
		kfree(res);
 }
 
diff --git a/mm/slab.h b/mm/slab.h
index 799a315695c6..25e41dd6087e 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -169,6 +169,15 @@ static_assert(IS_ALIGNED(offsetof(struct slab, freelist), sizeof(freelist_aba_t)
  */
 #define slab_page(s) folio_page(slab_folio(s), 0)
 
+/**
+ * is_slab_page - Checks if a page is really a slab page
+ * @s: The slab
+ *
+ * Checks if s points to a slab page.
+ *
+ * Return: true if s points to a slab and false otherwise.
+ */
+#define is_slab_page(s) folio_test_slab(slab_folio(s))
 /*
  * If network-based swap is enabled, sl*b must keep track of whether pages
  * were allocated from pfmemalloc reserves.
diff --git a/mm/slab_common.c b/mm/slab_common.c
index e99e821065c3..79102d24f099 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1063,7 +1063,7 @@ void kfree(const void *object)
		return;
 
	folio = virt_to_folio(object);
-	if (unlikely(!folio_test_slab(folio))) {
+	if (unlikely(!is_slab_addr(object))) {
		free_large_kmalloc(folio, (void *)object);
		return;
	}
@@ -1094,8 +1094,7 @@ size_t __ksize(const void *object)
		return 0;
 
	folio = virt_to_folio(object);
-
-	if (unlikely(!folio_test_slab(folio))) {
+	if (unlikely(!is_slab_addr(object))) {
		if (WARN_ON(folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE))
			return 0;
		if (WARN_ON(object != folio_address(folio)))
diff --git a/mm/slub.c b/mm/slub.c
index a7dae207c2d2..b69916ab7aa8 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1259,7 +1259,7 @@ static int check_slab(struct kmem_cache *s, struct slab *slab)
 {
	int maxobj;
 
-	if (!folio_test_slab(slab_folio(slab))) {
+	if (!is_slab_page(slab)) {
		slab_err(s, slab, "Not a valid slab page");
		return 0;
	}
@@ -1454,7 +1454,7 @@ static noinline bool alloc_debug_processing(struct kmem_cache *s,
	return true;
 
 bad:
-	if (folio_test_slab(slab_folio(slab))) {
+	if (is_slab_page(slab)) {
		/*
		 * If this is a slab page then lets do the best we can
		 * to avoid issues in the future. Marking all objects
@@ -1484,7 +1484,7 @@ static inline int free_consistency_checks(struct kmem_cache *s,
		return 0;
 
	if (unlikely(s != slab->slab_cache)) {
-		if (!folio_test_slab(slab_folio(slab))) {
+		if (!is_slab_page(slab)) {
			slab_err(s, slab, "Attempt to free object(0x%p) outside of slab",
				 object);
		} else if (!slab->slab_cache) {
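
A quick way to see the equivalence claimed in the commit message above, written
out as a comment (this paraphrases the definitions the message cites; it is not
a verbatim quote of the headers):

/*
 * Old check in free_resource():
 *	PageSlab(virt_to_head_page(res))
 *	  == PageSlab(_compound_head(virt_to_page(res)))
 *
 * New check:
 *	is_slab_addr(res)
 *	  == folio_test_slab(virt_to_folio(res))
 *	  == folio_test_slab(_compound_head(virt_to_page(res)))
 *
 * Both variants end up testing the slab page flag on the head page of
 * whatever (possibly compound) page backs 'res', so behaviour is unchanged.
 */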

From patchwork Fri Sep 15 10:59:22 2023
X-Patchwork-Submitter: Matteo Rizzo
X-Patchwork-Id: 13386896
Date: Fri, 15 Sep 2023 10:59:22 +0000
In-Reply-To: <20230915105933.495735-1-matteorizzo@google.com>
References: <20230915105933.495735-1-matteorizzo@google.com>
Message-ID: <20230915105933.495735-4-matteorizzo@google.com>
Subject: [RFC PATCH 03/14] mm/slub: move kmem_cache_order_objects to slab.h
From: Matteo Rizzo

From: Jann Horn

This is refactoring for SLAB_VIRTUAL. The implementation needs to know the
order of the virtual memory region allocated to each slab to know how much
physical memory to allocate when the slab is reused. We reuse
kmem_cache_order_objects for this, so we have to move it before struct slab.

Signed-off-by: Jann Horn
Co-developed-by: Matteo Rizzo
Signed-off-by: Matteo Rizzo
Reviewed-by: Kees Cook
---
 include/linux/slub_def.h |  9 ---------
 mm/slab.h                | 22 ++++++++++++++++++++++
 mm/slub.c                | 12 ------------
 3 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index deb90cf4bffb..0adf5ba8241b 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -83,15 +83,6 @@ struct kmem_cache_cpu {
 #define slub_percpu_partial_read_once(c)	NULL
 #endif // CONFIG_SLUB_CPU_PARTIAL
 
-/*
- * Word size structure that can be atomically updated or read and that
- * contains both the order and the number of objects that a slab of the
- * given order would contain.
- */
-struct kmem_cache_order_objects {
-	unsigned int x;
-};
-
 /*
  * Slab cache management.
  */
diff --git a/mm/slab.h b/mm/slab.h
index 25e41dd6087e..3fe0d1e26e26 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -38,6 +38,15 @@ typedef union {
	freelist_full_t full;
 } freelist_aba_t;
 
+/*
+ * Word size structure that can be atomically updated or read and that
+ * contains both the order and the number of objects that a slab of the
+ * given order would contain.
+ */
+struct kmem_cache_order_objects {
+	unsigned int x;
+};
+
 /* Reuses the bits in struct page */
 struct slab {
	unsigned long __page_flags;
@@ -227,6 +236,19 @@ static inline struct slab *virt_to_slab(const void *addr)
	return folio_slab(folio);
 }
 
+#define OO_SHIFT	16
+#define OO_MASK		((1 << OO_SHIFT) - 1)
+
+static inline unsigned int oo_order(struct kmem_cache_order_objects x)
+{
+	return x.x >> OO_SHIFT;
+}
+
+static inline unsigned int oo_objects(struct kmem_cache_order_objects x)
+{
+	return x.x & OO_MASK;
+}
+
 static inline int slab_order(const struct slab *slab)
 {
	return folio_order((struct folio *)slab_folio(slab));
diff --git a/mm/slub.c b/mm/slub.c
index b69916ab7aa8..df2529c03bd3 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -284,8 +284,6 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
  */
 #define DEBUG_METADATA_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
 
-#define OO_SHIFT	16
-#define OO_MASK		((1 << OO_SHIFT) - 1)
 #define MAX_OBJS_PER_PAGE	32767 /* since slab.objects is u15 */
 
 /* Internal SLUB flags */
@@ -473,16 +471,6 @@ static inline struct kmem_cache_order_objects oo_make(unsigned int order,
	return x;
 }
 
-static inline unsigned int oo_order(struct kmem_cache_order_objects x)
-{
-	return x.x >> OO_SHIFT;
-}
-
-static inline unsigned int oo_objects(struct kmem_cache_order_objects x)
-{
-	return x.x & OO_MASK;
-}
-
 #ifdef CONFIG_SLUB_CPU_PARTIAL
 static void slub_set_cpu_partial(struct kmem_cache *s, unsigned int nr_objects)
 {
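
Since this patch is mostly a code move, here is a small self-contained model
(illustrative userspace code, not the kernel's; oo_make is simplified) of what
the packed kmem_cache_order_objects word holds and how oo_order()/oo_objects()
take it apart with OO_SHIFT/OO_MASK:

#include <assert.h>
#include <stdio.h>

#define OO_SHIFT 16
#define OO_MASK  ((1 << OO_SHIFT) - 1)

/* Mirrors struct kmem_cache_order_objects: page order and object count
 * packed into one word so both can be read or updated together. */
struct order_objects {
	unsigned int x;
};

static struct order_objects oo_make(unsigned int order, unsigned int objects)
{
	struct order_objects oo = { .x = (order << OO_SHIFT) + objects };
	return oo;
}

static unsigned int oo_order(struct order_objects x)   { return x.x >> OO_SHIFT; }
static unsigned int oo_objects(struct order_objects x) { return x.x & OO_MASK; }

int main(void)
{
	/* e.g. an order-3 slab (8 pages) holding 512 small objects */
	struct order_objects oo = oo_make(3, 512);

	assert(oo_order(oo) == 3);
	assert(oo_objects(oo) == 512);
	printf("raw=0x%x order=%u objects=%u\n",
	       oo.x, oo_order(oo), oo_objects(oo));
	return 0;
}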

From patchwork Fri Sep 15 10:59:23 2023
X-Patchwork-Submitter: Matteo Rizzo
X-Patchwork-Id: 13386897
Date: Fri, 15 Sep 2023 10:59:23 +0000
In-Reply-To: <20230915105933.495735-1-matteorizzo@google.com>
References: <20230915105933.495735-1-matteorizzo@google.com>
Message-ID: <20230915105933.495735-5-matteorizzo@google.com>
Subject: [RFC PATCH 04/14] mm: use virt_to_slab instead of folio_slab
From: Matteo Rizzo

From: Jann Horn

This is refactoring in preparation for the introduction of SLAB_VIRTUAL
which does not implement folio_slab.

With SLAB_VIRTUAL there is no longer a 1:1 correspondence between slabs and
pages of physical memory used by the slab allocator. There is no way to look
up the slab which corresponds to a specific page of physical memory without
iterating over all slabs or over the page tables. Instead of doing that, we
can look up the slab starting from its virtual address, which can still be
performed cheaply with both SLAB_VIRTUAL enabled and disabled.

Signed-off-by: Jann Horn
Co-developed-by: Matteo Rizzo
Signed-off-by: Matteo Rizzo
Reviewed-by: Kees Cook
---
 mm/memcontrol.c  |  2 +-
 mm/slab_common.c | 12 +++++++-----
 mm/slub.c        | 14 ++++++--------
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e8ca4bdcb03c..0ab9f5323db7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2936,7 +2936,7 @@ struct mem_cgroup *mem_cgroup_from_obj_folio(struct folio *folio, void *p)
		struct slab *slab;
		unsigned int off;
 
-		slab = folio_slab(folio);
+		slab = virt_to_slab(p);
		objcgs = slab_objcgs(slab);
		if (!objcgs)
			return NULL;
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 79102d24f099..42ceaf7e9f47 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1062,13 +1062,13 @@ void kfree(const void *object)
	if (unlikely(ZERO_OR_NULL_PTR(object)))
		return;
 
-	folio = virt_to_folio(object);
	if (unlikely(!is_slab_addr(object))) {
+		folio = virt_to_folio(object);
		free_large_kmalloc(folio, (void *)object);
		return;
	}
 
-	slab = folio_slab(folio);
+	slab = virt_to_slab(object);
	s = slab->slab_cache;
	__kmem_cache_free(s, (void *)object, _RET_IP_);
 }
@@ -1089,12 +1089,13 @@ size_t __ksize(const void *object)
 {
	struct folio *folio;
+	struct kmem_cache *s;
 
	if (unlikely(object == ZERO_SIZE_PTR))
		return 0;
 
-	folio = virt_to_folio(object);
	if (unlikely(!is_slab_addr(object))) {
+		folio = virt_to_folio(object);
		if (WARN_ON(folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE))
			return 0;
		if (WARN_ON(object != folio_address(folio)))
@@ -1102,11 +1103,12 @@ size_t __ksize(const void *object)
		return folio_size(folio);
	}
 
+	s = virt_to_slab(object)->slab_cache;
 #ifdef CONFIG_SLUB_DEBUG
-	skip_orig_size_check(folio_slab(folio)->slab_cache, object);
+	skip_orig_size_check(s, object);
 #endif
 
-	return slab_ksize(folio_slab(folio)->slab_cache);
+	return slab_ksize(s);
 }
 
 void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
diff --git a/mm/slub.c b/mm/slub.c
index df2529c03bd3..ad33d9e1601d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3848,25 +3848,23 @@ int build_detached_freelist(struct kmem_cache *s, size_t size,
 {
	int lookahead = 3;
	void *object;
-	struct folio *folio;
+	struct slab *slab;
	size_t same;
 
	object = p[--size];
-	folio = virt_to_folio(object);
+	slab = virt_to_slab(object);
	if (!s) {
		/* Handle kalloc'ed objects */
-		if (unlikely(!folio_test_slab(folio))) {
-			free_large_kmalloc(folio, object);
+		if (unlikely(slab == NULL)) {
+			free_large_kmalloc(virt_to_folio(object), object);
			df->slab = NULL;
			return size;
		}
-		/* Derive kmem_cache from object */
-		df->slab = folio_slab(folio);
-		df->s = df->slab->slab_cache;
+		df->s = slab->slab_cache;
	} else {
-		df->slab = folio_slab(folio);
		df->s = cache_from_obj(s, object); /* Support for memcg */
	}
+	df->slab = slab;
 
	/* Start new detached freelist */
	df->tail = object;
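
For context on why this stays cheap when SLAB_VIRTUAL is disabled:
virt_to_slab(), whose context is visible in the mm/slab.h hunk of the previous
patch, is itself a thin wrapper around virt_to_folio(). Its current shape is
roughly the following (a paraphrase for illustration, not a verbatim quote of
the header), which also shows why build_detached_freelist() can treat a NULL
return as "not a slab object":

static inline struct slab *virt_to_slab(const void *addr)
{
	struct folio *folio = virt_to_folio(addr);

	/* Non-slab memory (e.g. a large kmalloc allocation) has no slab. */
	if (!folio_test_slab(folio))
		return NULL;

	return folio_slab(folio);
}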

From patchwork Fri Sep 15 10:59:24 2023
X-Patchwork-Submitter: Matteo Rizzo
X-Patchwork-Id: 13386898
Date: Fri, 15 Sep 2023 10:59:24 +0000
In-Reply-To: <20230915105933.495735-1-matteorizzo@google.com>
References: <20230915105933.495735-1-matteorizzo@google.com>
Message-ID: <20230915105933.495735-6-matteorizzo@google.com>
Subject: [RFC PATCH 05/14] mm/slub: create folio_set/clear_slab helpers
From: Matteo Rizzo

From: Jann Horn

This is refactoring in preparation for SLAB_VIRTUAL.

Extract this code to separate functions so that it's not duplicated in the
code that allocates and frees pages with SLAB_VIRTUAL enabled.

Signed-off-by: Jann Horn
Co-developed-by: Matteo Rizzo
Signed-off-by: Matteo Rizzo
Reviewed-by: Kees Cook
---
 mm/slub.c | 32 ++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index ad33d9e1601d..9b87afade125 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1849,6 +1849,26 @@ static void *setup_object(struct kmem_cache *s, void *object)
 /*
  * Slab allocation and freeing
  */
+
+static void folio_set_slab(struct folio *folio, struct slab *slab)
+{
+	__folio_set_slab(folio);
+	/* Make the flag visible before any changes to folio->mapping */
+	smp_wmb();
+
+	if (folio_is_pfmemalloc(folio))
+		slab_set_pfmemalloc(slab);
+}
+
+static void folio_clear_slab(struct folio *folio, struct slab *slab)
+{
+	__slab_clear_pfmemalloc(slab);
+	folio->mapping = NULL;
+	/* Make the mapping reset visible before clearing the flag */
+	smp_wmb();
+	__folio_clear_slab(folio);
+}
+
 static inline struct slab *alloc_slab_page(gfp_t flags, int node,
		struct kmem_cache_order_objects oo)
 {
@@ -1865,11 +1885,7 @@ static inline struct slab *alloc_slab_page(gfp_t flags, int node,
		return NULL;
 
	slab = folio_slab(folio);
-	__folio_set_slab(folio);
-	/* Make the flag visible before any changes to folio->mapping */
-	smp_wmb();
-	if (folio_is_pfmemalloc(folio))
-		slab_set_pfmemalloc(slab);
+	folio_set_slab(folio, slab);
 
	return slab;
 }
@@ -2067,11 +2083,7 @@ static void __free_slab(struct kmem_cache *s, struct slab *slab)
	int order = folio_order(folio);
	int pages = 1 << order;
 
-	__slab_clear_pfmemalloc(slab);
-	folio->mapping = NULL;
-	/* Make the mapping reset visible before clearing the flag */
-	smp_wmb();
-	__folio_clear_slab(folio);
+	folio_clear_slab(folio, slab);
	mm_account_reclaimed_pages(pages);
	unaccount_slab(slab, order, s);
	__free_pages(&folio->page, order);

From patchwork Fri Sep 15 10:59:25 2023
X-Patchwork-Submitter: Matteo Rizzo
X-Patchwork-Id: 13386899
Date: Fri, 15 Sep 2023 10:59:25 +0000
In-Reply-To: <20230915105933.495735-1-matteorizzo@google.com>
References: <20230915105933.495735-1-matteorizzo@google.com>
Message-ID: <20230915105933.495735-7-matteorizzo@google.com>
Subject: [RFC PATCH 06/14] mm/slub: pass additional args to alloc_slab_page
From: Matteo Rizzo

From: Jann Horn

This is refactoring in preparation for SLAB_VIRTUAL.

The implementation of SLAB_VIRTUAL needs access to struct kmem_cache in
alloc_slab_page in order to take unused slabs from the slab freelist, which
is per-cache.

In addition to that it passes two different sets of GFP flags.
meta_gfp_flags is used for the memory backing the metadata region and page
tables, and gfp_flags for the data memory.

Signed-off-by: Jann Horn
Co-developed-by: Matteo Rizzo
Signed-off-by: Matteo Rizzo
Reviewed-by: Kees Cook
---
 mm/slub.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 9b87afade125..eaa1256aff89 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1869,7 +1869,8 @@ static void folio_clear_slab(struct folio *folio, struct slab *slab)
	__folio_clear_slab(folio);
 }
 
-static inline struct slab *alloc_slab_page(gfp_t flags, int node,
+static inline struct slab *alloc_slab_page(struct kmem_cache *s,
+		gfp_t meta_flags, gfp_t flags, int node,
		struct kmem_cache_order_objects oo)
 {
	struct folio *folio;
@@ -2020,7 +2021,7 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
	if ((alloc_gfp & __GFP_DIRECT_RECLAIM) && oo_order(oo) > oo_order(s->min))
		alloc_gfp = (alloc_gfp | __GFP_NOMEMALLOC) & ~__GFP_RECLAIM;
 
-	slab = alloc_slab_page(alloc_gfp, node, oo);
+	slab = alloc_slab_page(s, flags, alloc_gfp, node, oo);
	if (unlikely(!slab)) {
		oo = s->min;
		alloc_gfp = flags;
@@ -2028,7 +2029,7 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
		 * Allocation may have failed due to fragmentation.
		 * Try a lower order alloc if possible
		 */
-		slab = alloc_slab_page(alloc_gfp, node, oo);
+		slab = alloc_slab_page(s, flags, alloc_gfp, node, oo);
		if (unlikely(!slab))
			return NULL;
		stat(s, ORDER_FALLBACK);

From patchwork Fri Sep 15 10:59:26 2023
X-Patchwork-Submitter: Matteo Rizzo
X-Patchwork-Id: 13386900
Date: Fri, 15 Sep 2023 10:59:26 +0000
In-Reply-To: <20230915105933.495735-1-matteorizzo@google.com>
References: <20230915105933.495735-1-matteorizzo@google.com>
Message-ID: <20230915105933.495735-8-matteorizzo@google.com>
Subject: [RFC PATCH 07/14] mm/slub: pass slab pointer to the freeptr decode helper
From: Matteo Rizzo

From: Jann Horn

This is refactoring in preparation for checking freeptrs for corruption
inside freelist_ptr_decode().

Signed-off-by: Jann Horn
Co-developed-by: Matteo Rizzo
Signed-off-by: Matteo Rizzo
---
 mm/slub.c | 43 +++++++++++++++++++++++--------------------
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index eaa1256aff89..42e7cc0b4452 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -383,7 +383,8 @@ static inline freeptr_t freelist_ptr_encode(const struct kmem_cache *s,
 }
 
 static inline void *freelist_ptr_decode(const struct kmem_cache *s,
-					freeptr_t ptr, unsigned long ptr_addr)
+					freeptr_t ptr, unsigned long ptr_addr,
+					struct slab *slab)
 {
	void *decoded;
 
@@ -395,7 +396,8 @@ static inline void *freelist_ptr_decode(const struct kmem_cache *s,
	return decoded;
 }
 
-static inline void *get_freepointer(struct kmem_cache *s, void *object)
+static inline void *get_freepointer(struct kmem_cache *s, void *object,
+				    struct slab *slab)
 {
	unsigned long ptr_addr;
	freeptr_t p;
@@ -403,7 +405,7 @@ static inline void *get_freepointer(struct kmem_cache *s, void *object)
	object = kasan_reset_tag(object);
	ptr_addr = (unsigned long)object + s->offset;
	p = *(freeptr_t *)(ptr_addr);
-	return freelist_ptr_decode(s, p, ptr_addr);
+	return freelist_ptr_decode(s, p, ptr_addr, slab);
 }
 
 #ifndef CONFIG_SLUB_TINY
@@ -424,18 +426,19 @@ static void prefetch_freepointer(const struct kmem_cache *s, void *object)
  * get_freepointer_safe() returns initialized memory.
  */
 __no_kmsan_checks
-static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)
+static inline void *get_freepointer_safe(struct kmem_cache *s, void *object,
+					 struct slab *slab)
 {
	unsigned long freepointer_addr;
	freeptr_t p;
 
	if (!debug_pagealloc_enabled_static())
-		return get_freepointer(s, object);
+		return get_freepointer(s, object, slab);
 
	object = kasan_reset_tag(object);
	freepointer_addr = (unsigned long)object + s->offset;
	copy_from_kernel_nofault(&p, (freeptr_t *)freepointer_addr, sizeof(p));
-	return freelist_ptr_decode(s, p, freepointer_addr);
+	return freelist_ptr_decode(s, p, freepointer_addr, slab);
 }
 
 static inline void set_freepointer(struct kmem_cache *s, void *object, void *fp)
@@ -627,7 +630,7 @@ static void __fill_map(unsigned long *obj_map, struct kmem_cache *s,
 
	bitmap_zero(obj_map, slab->objects);
 
-	for (p = slab->freelist; p; p = get_freepointer(s, p))
+	for (p = slab->freelist; p; p = get_freepointer(s, p, slab))
		set_bit(__obj_to_index(s, addr, p), obj_map);
 }
 
@@ -937,7 +940,7 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p)
	print_slab_info(slab);
 
	pr_err("Object 0x%p @offset=%tu fp=0x%p\n\n",
-	       p, p - addr, get_freepointer(s, p));
+	       p, p - addr, get_freepointer(s, p, slab));
 
	if (s->flags & SLAB_RED_ZONE)
		print_section(KERN_ERR, "Redzone ", p - s->red_left_pad,
@@ -1230,7 +1233,7 @@ static int check_object(struct kmem_cache *s, struct slab *slab,
		return 1;
 
	/* Check free pointer validity */
-	if (!check_valid_pointer(s, slab, get_freepointer(s, p))) {
+	if (!check_valid_pointer(s, slab, get_freepointer(s, p, slab))) {
		object_err(s, slab, p, "Freepointer corrupt");
		/*
		 * No choice but to zap it and thus lose the remainder
@@ -1298,7 +1301,7 @@ static int on_freelist(struct kmem_cache *s, struct slab *slab, void *search)
			break;
		}
		object = fp;
-		fp = get_freepointer(s, object);
+		fp = get_freepointer(s, object, slab);
		nr++;
	}
 
@@ -1810,7 +1813,7 @@ static inline bool slab_free_freelist_hook(struct kmem_cache *s,
		object = next;
		/* Single objects don't actually contain a freepointer */
		if (object != old_tail)
-			next = get_freepointer(s, object);
+			next = get_freepointer(s, object, virt_to_slab(object));
 
		/* If object's reuse doesn't have to be delayed */
		if (!slab_free_hook(s, object, slab_want_init_on_free(s))) {
@@ -2161,7 +2164,7 @@ static void *alloc_single_from_partial(struct kmem_cache *s,
	lockdep_assert_held(&n->list_lock);
 
	object = slab->freelist;
-	slab->freelist = get_freepointer(s, object);
+	slab->freelist = get_freepointer(s, object, slab);
	slab->inuse++;
 
	if (!alloc_debug_processing(s, slab, object, orig_size)) {
@@ -2192,7 +2195,7 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s,
 
	object = slab->freelist;
-	slab->freelist = get_freepointer(s, object);
+	slab->freelist = get_freepointer(s, object, slab);
	slab->inuse = 1;
 
	if (!alloc_debug_processing(s, slab, object, orig_size))
@@ -2517,7 +2520,7 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
	freelist_tail = NULL;
	freelist_iter = freelist;
	while (freelist_iter) {
-		nextfree = get_freepointer(s, freelist_iter);
+		nextfree = get_freepointer(s, freelist_iter, slab);
 
		/*
		 * If 'nextfree' is invalid, it is possible that the object at
@@ -2944,7 +2947,7 @@ static inline bool free_debug_processing(struct kmem_cache *s,
 
	/* Reached end of constructed freelist yet? */
	if (object != tail) {
-		object = get_freepointer(s, object);
+		object = get_freepointer(s, object, slab);
		goto next_object;
	}
	checks_ok = true;
@@ -3173,7 +3176,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
	 * That slab must be frozen for per cpu allocations to work.
	 */
	VM_BUG_ON(!c->slab->frozen);
-	c->freelist = get_freepointer(s, freelist);
+	c->freelist = get_freepointer(s, freelist, c->slab);
	c->tid = next_tid(c->tid);
	local_unlock_irqrestore(&s->cpu_slab->lock, flags);
	return freelist;
@@ -3275,7 +3278,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
	 * For !pfmemalloc_match() case we don't load freelist so that
	 * we don't make further mismatched allocations easier.
	 */
-	deactivate_slab(s, slab, get_freepointer(s, freelist));
+	deactivate_slab(s, slab, get_freepointer(s, freelist, slab));
 
	return freelist;
 }
@@ -3377,7 +3380,7 @@ static __always_inline void *__slab_alloc_node(struct kmem_cache *s,
	    unlikely(!object || !slab || !node_match(slab, node))) {
		object = __slab_alloc(s, gfpflags, node, addr, c, orig_size);
	} else {
-		void *next_object = get_freepointer_safe(s, object);
+		void *next_object = get_freepointer_safe(s, object, slab);
 
		/*
		 * The cmpxchg will only match if there was no additional
@@ -3984,7 +3987,7 @@ static inline int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
			continue; /* goto for-loop */
		}
 
-		c->freelist = get_freepointer(s, object);
+		c->freelist = get_freepointer(s, object, c->slab);
		p[i] = object;
		maybe_wipe_obj_freeptr(s, p[i]);
	}
@@ -4275,7 +4278,7 @@ static void early_kmem_cache_node_alloc(int node)
	init_tracking(kmem_cache_node, n);
 #endif
	n = kasan_slab_alloc(kmem_cache_node, n, GFP_KERNEL, false);
-	slab->freelist = get_freepointer(kmem_cache_node, n);
+	slab->freelist = get_freepointer(kmem_cache_node, n, slab);
	slab->inuse = 1;
	kmem_cache_node->node[node] = n;
	init_kmem_cache_node(n);
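
This patch only threads the slab pointer into the decode helpers; the
corruption check itself comes later in the series. As a purely hypothetical
sketch of what such a check could look like once the slab is available (an
assumption about the follow-up, not code from this patch; the decode body
mirrors the existing CONFIG_SLAB_FREELIST_HARDENED logic), the decoder might
range-check the decoded pointer against the slab it was read from:

static inline void *freelist_ptr_decode(const struct kmem_cache *s,
					freeptr_t ptr, unsigned long ptr_addr,
					struct slab *slab)
{
	void *decoded;
	unsigned long slab_start, slab_end;

#ifdef CONFIG_SLAB_FREELIST_HARDENED
	decoded = (void *)(ptr.v ^ s->random ^ swab(ptr_addr));
#else
	decoded = (void *)ptr.v;
#endif

	/*
	 * Illustrative check only: NULL terminates a freelist; any other
	 * value must point into the slab it was read from.
	 */
	if (decoded && slab) {
		slab_start = (unsigned long)slab_address(slab);
		slab_end = slab_start + (PAGE_SIZE << slab_order(slab));
		if ((unsigned long)decoded < slab_start ||
		    (unsigned long)decoded >= slab_end)
			BUG();	/* a real version would report via slab_err() */
	}

	return decoded;
}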

From patchwork Fri Sep 15 10:59:27 2023
X-Patchwork-Submitter: Matteo Rizzo
X-Patchwork-Id: 13386901
Date: Fri, 15 Sep 2023 10:59:27 +0000
In-Reply-To: <20230915105933.495735-1-matteorizzo@google.com>
References: <20230915105933.495735-1-matteorizzo@google.com>
Message-ID: <20230915105933.495735-9-matteorizzo@google.com>
Subject: [RFC PATCH 08/14] security: introduce CONFIG_SLAB_VIRTUAL
From: Matteo Rizzo

From: Jann Horn

SLAB_VIRTUAL is a mitigation for the SLUB allocator which prevents reuse of
virtual addresses across different slab caches and therefore makes some
types of use-after-free bugs unexploitable.

SLAB_VIRTUAL is incompatible with KASAN, and we believe it's not worth
adding support for it. This is because SLAB_VIRTUAL and KASAN are aimed at
two different use cases: KASAN is meant for catching bugs as early as
possible in debug/fuzz/testing builds, and it's not meant to be used in
production. SLAB_VIRTUAL on the other hand is an exploit mitigation that
doesn't attempt to highlight bugs but instead tries to make them
unexploitable. It doesn't make sense to enable it in debugging builds or
during fuzzing; instead we expect that it will be enabled in production
kernels.

SLAB_VIRTUAL is not currently compatible with KFENCE; removing this
limitation is future work.

Signed-off-by: Jann Horn
Co-developed-by: Matteo Rizzo
Signed-off-by: Matteo Rizzo
Reviewed-by: Kees Cook
---
 security/Kconfig.hardening | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index 0f295961e773..9f4e6e38aa76 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -355,4 +355,18 @@ config GCC_PLUGIN_RANDSTRUCT
	  * https://grsecurity.net/
	  * https://pax.grsecurity.net/
 
+config SLAB_VIRTUAL
+	bool "Allocate slab objects from virtual memory"
+	depends on SLUB && !SLUB_TINY
+	# If KFENCE support is desired, it could be implemented on top of our
+	# virtual memory allocation facilities
+	depends on !KFENCE
+	# ASAN support will require that shadow memory is allocated
+	# appropriately.
+	depends on !KASAN
+	help
+	  Allocate slab objects from kernel-virtual memory, and ensure that
+	  virtual memory used as a slab cache is never reused to store
+	  objects from other slab caches or non-slab data.
+
 endmenu
Jc8AKgvZQ+Zffqed7hLzgjLY/nZuMVAHeDT+FA== X-Google-Smtp-Source: AGHT+IEcouH8JvuJElm8N08gEPWOgDsvJMW1WGDf8e1nxWq0e0tWAe1u3fFjyWd///wS5TI4Lwp37X8nvTPZK0q/lg== X-Received: from mr-cloudtop2.c.googlers.com ([fda3:e722:ac3:cc00:31:98fb:c0a8:2a6]) (user=matteorizzo job=sendgmr) by 2002:a81:ae43:0:b0:59b:f493:813d with SMTP id g3-20020a81ae43000000b0059bf493813dmr28901ywk.1.1694775601139; Fri, 15 Sep 2023 04:00:01 -0700 (PDT) Date: Fri, 15 Sep 2023 10:59:28 +0000 In-Reply-To: <20230915105933.495735-1-matteorizzo@google.com> Mime-Version: 1.0 References: <20230915105933.495735-1-matteorizzo@google.com> X-Mailer: git-send-email 2.42.0.459.ge4e396fd5e-goog Message-ID: <20230915105933.495735-10-matteorizzo@google.com> Subject: [RFC PATCH 09/14] mm/slub: add the slab freelists to kmem_cache From: Matteo Rizzo To: cl@linux.com, penberg@kernel.org, rientjes@google.com, iamjoonsoo.kim@lge.com, akpm@linux-foundation.org, vbabka@suse.cz, roman.gushchin@linux.dev, 42.hyeyoo@gmail.com, keescook@chromium.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-hardening@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com, corbet@lwn.net, luto@kernel.org, peterz@infradead.org Cc: jannh@google.com, matteorizzo@google.com, evn@google.com, poprdi@google.com, jordyzomer@google.com Precedence: bulk List-ID: X-Mailing-List: linux-hardening@vger.kernel.org From: Jann Horn With SLAB_VIRTUAL enabled, unused slabs which still have virtual memory allocated to them but no physical memory are kept in a per-cache list so that they can be reused later if the cache needs to grow again. Signed-off-by: Jann Horn Co-developed-by: Matteo Rizzo Signed-off-by: Matteo Rizzo Reviewed-by: Kees Cook --- include/linux/slub_def.h | 16 ++++++++++++++++ mm/slub.c | 23 +++++++++++++++++++++++ 2 files changed, 39 insertions(+) diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h index 0adf5ba8241b..693e9bb34edc 100644 --- a/include/linux/slub_def.h +++ b/include/linux/slub_def.h @@ -86,6 +86,20 @@ struct kmem_cache_cpu { /* * Slab cache management. */ +struct kmem_cache_virtual { +#ifdef CONFIG_SLAB_VIRTUAL + /* Protects freed_slabs and freed_slabs_min */ + spinlock_t freed_slabs_lock; + /* + * Slabs on this list have virtual memory of size oo allocated to them + * but no physical memory + */ + struct list_head freed_slabs; + /* Same as freed_slabs but with memory of size min */ + struct list_head freed_slabs_min; +#endif +}; + struct kmem_cache { #ifndef CONFIG_SLUB_TINY struct kmem_cache_cpu __percpu *cpu_slab; @@ -107,6 +121,8 @@ struct kmem_cache { /* Allocation and freeing of slabs */ struct kmem_cache_order_objects min; + struct kmem_cache_virtual virtual; + gfp_t allocflags; /* gfp flags to use on each alloc */ int refcount; /* Refcount for slab cache destroy */ void (*ctor)(void *); diff --git a/mm/slub.c b/mm/slub.c index 42e7cc0b4452..4f77e5d4fe6c 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -4510,8 +4510,20 @@ static int calculate_sizes(struct kmem_cache *s) return !!oo_objects(s->oo); } +static inline void slab_virtual_open(struct kmem_cache *s) +{ +#ifdef CONFIG_SLAB_VIRTUAL + /* WARNING: this stuff will be relocated in bootstrap()! 
*/ + spin_lock_init(&s->virtual.freed_slabs_lock); + INIT_LIST_HEAD(&s->virtual.freed_slabs); + INIT_LIST_HEAD(&s->virtual.freed_slabs_min); +#endif +} + static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags) { + slab_virtual_open(s); + s->flags = kmem_cache_flags(s->size, flags, s->name); #ifdef CONFIG_SLAB_FREELIST_HARDENED s->random = get_random_long(); @@ -4994,6 +5006,16 @@ static int slab_memory_callback(struct notifier_block *self, * that may be pointing to the wrong kmem_cache structure. */ +static inline void slab_virtual_bootstrap(struct kmem_cache *s, struct kmem_cache *static_cache) +{ + slab_virtual_open(s); + +#ifdef CONFIG_SLAB_VIRTUAL + list_splice(&static_cache->virtual.freed_slabs, &s->virtual.freed_slabs); + list_splice(&static_cache->virtual.freed_slabs_min, &s->virtual.freed_slabs_min); +#endif +} + static struct kmem_cache * __init bootstrap(struct kmem_cache *static_cache) { int node; @@ -5001,6 +5023,7 @@ static struct kmem_cache * __init bootstrap(struct kmem_cache *static_cache) struct kmem_cache_node *n; memcpy(s, static_cache, kmem_cache->object_size); + slab_virtual_bootstrap(s, static_cache); /* * This runs very early, and only the boot processor is supposed to be From patchwork Fri Sep 15 10:59:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matteo Rizzo X-Patchwork-Id: 13386903 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8F9D9EE6456 for ; Fri, 15 Sep 2023 11:00:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233843AbjIOLAq (ORCPT ); Fri, 15 Sep 2023 07:00:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55856 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233924AbjIOLAm (ORCPT ); Fri, 15 Sep 2023 07:00:42 -0400 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8D2272121 for ; Fri, 15 Sep 2023 04:00:05 -0700 (PDT) Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-d7edc01fdc9so2140411276.3 for ; Fri, 15 Sep 2023 04:00:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1694775604; x=1695380404; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=QO31JQTgjdCfbGBtgkfgkDIiB/3zYgVmpQ8kZJD1h+Y=; b=rnLhAKjLuhA4vg/nVySYKRO6L9b0ld7QHjnHHV6pyD4+JlR7b9/V9L5EaAg2xBThI6 GtgyvL5XlRdgnRTEe4GW5npsY8nQ+dyAil8lYbxZu3ngjzvyEFRTtyKQ7dPdl+Db5gc0 PuAqWNkruacXAJB8Dycg8oYIecGc9jnfoX7Psrgoq1gO/WNgMRhSEppj668vER1lN/xS s6o4W+ftGzwYabSAbrg5S6u1VRNaUrTmYilNwsPrV0Vti8NZto88GvUZjKcKWoyj9TjX 8CZLhDZC+iqED6I5UaYd96YHiSYPyxSolQ+UBvM1fV//kc0NX13RNlzk40Zt8xn3q1U4 9ROQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694775604; x=1695380404; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=QO31JQTgjdCfbGBtgkfgkDIiB/3zYgVmpQ8kZJD1h+Y=; b=cxl67i0xqIsI8HhjCWis1IgUFXnjvXvfaY/dC0JrBE3xMXiq4WVUAZXzBJujeHg6DA 9KyNnC2zhDs9gWiNnIzugzZ5QmgkAMwa74foc8PdnF4DwodpzIRyVj8L9k/zJ0DAOtoV HTWoODw+Y6MJcQjjAZt/3JXQhfUXbZNA6vLXnVzZQ99zDSt+gy/ijFN8binvNaGfc8M/ 
Dj7hBXSPmh9DEdY7D9CX84Nx/OArHRXsuBTEm9wUD71v42xQ+op4ekTdAownlyyppajK muXrX3CyTiGDLYziBpep3P4ef40T5WiovoAzCxKZe7jlIiPXuARxHAnciHn280iZXZVo naEw== X-Gm-Message-State: AOJu0YySwxWqbfcM68b5nJdRFBOXDFvdUWsCPlIEZcYohdsAL07XWaAQ /qk3dhHJAm8/QXRgUTZ7A7O1Ahedd0EB6t4LyA== X-Google-Smtp-Source: AGHT+IHEmylFdcJCHDvLHo7IywUVvUbkVuBMx1sd6LW13fQHTajbqMCrs6QknCEIuXA+dbFf620Y1iDB5FTTmGzUFQ== X-Received: from mr-cloudtop2.c.googlers.com ([fda3:e722:ac3:cc00:31:98fb:c0a8:2a6]) (user=matteorizzo job=sendgmr) by 2002:a05:6902:144d:b0:d81:503e:2824 with SMTP id a13-20020a056902144d00b00d81503e2824mr26306ybv.10.1694775603871; Fri, 15 Sep 2023 04:00:03 -0700 (PDT) Date: Fri, 15 Sep 2023 10:59:29 +0000 In-Reply-To: <20230915105933.495735-1-matteorizzo@google.com> Mime-Version: 1.0 References: <20230915105933.495735-1-matteorizzo@google.com> X-Mailer: git-send-email 2.42.0.459.ge4e396fd5e-goog Message-ID: <20230915105933.495735-11-matteorizzo@google.com> Subject: [RFC PATCH 10/14] x86: Create virtual memory region for SLUB From: Matteo Rizzo To: cl@linux.com, penberg@kernel.org, rientjes@google.com, iamjoonsoo.kim@lge.com, akpm@linux-foundation.org, vbabka@suse.cz, roman.gushchin@linux.dev, 42.hyeyoo@gmail.com, keescook@chromium.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-hardening@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com, corbet@lwn.net, luto@kernel.org, peterz@infradead.org Cc: jannh@google.com, matteorizzo@google.com, evn@google.com, poprdi@google.com, jordyzomer@google.com Precedence: bulk List-ID: X-Mailing-List: linux-hardening@vger.kernel.org From: Jann Horn SLAB_VIRTUAL reserves 512 GiB of virtual memory and uses them for both struct slab and the actual slab memory. The pointers returned by kmem_cache_alloc will point to this range of memory. Signed-off-by: Jann Horn Co-developed-by: Matteo Rizzo Signed-off-by: Matteo Rizzo Reviewed-by: Kees Cook --- Documentation/arch/x86/x86_64/mm.rst | 4 ++-- arch/x86/include/asm/pgtable_64_types.h | 16 ++++++++++++++++ arch/x86/mm/init_64.c | 19 +++++++++++++++---- arch/x86/mm/kaslr.c | 9 +++++++++ arch/x86/mm/mm_internal.h | 4 ++++ mm/slub.c | 4 ++++ security/Kconfig.hardening | 2 ++ 7 files changed, 52 insertions(+), 6 deletions(-) diff --git a/Documentation/arch/x86/x86_64/mm.rst b/Documentation/arch/x86/x86_64/mm.rst index 35e5e18c83d0..121179537175 100644 --- a/Documentation/arch/x86/x86_64/mm.rst +++ b/Documentation/arch/x86/x86_64/mm.rst @@ -57,7 +57,7 @@ Complete virtual memory map with 4-level page tables fffffc0000000000 | -4 TB | fffffdffffffffff | 2 TB | ... unused hole | | | | vaddr_end for KASLR fffffe0000000000 | -2 TB | fffffe7fffffffff | 0.5 TB | cpu_entry_area mapping - fffffe8000000000 | -1.5 TB | fffffeffffffffff | 0.5 TB | ... unused hole + fffffe8000000000 | -1.5 TB | fffffeffffffffff | 0.5 TB | SLUB virtual memory ffffff0000000000 | -1 TB | ffffff7fffffffff | 0.5 TB | %esp fixup stacks ffffff8000000000 | -512 GB | ffffffeeffffffff | 444 GB | ... unused hole ffffffef00000000 | -68 GB | fffffffeffffffff | 64 GB | EFI region mapping space @@ -116,7 +116,7 @@ Complete virtual memory map with 5-level page tables fffffc0000000000 | -4 TB | fffffdffffffffff | 2 TB | ... unused hole | | | | vaddr_end for KASLR fffffe0000000000 | -2 TB | fffffe7fffffffff | 0.5 TB | cpu_entry_area mapping - fffffe8000000000 | -1.5 TB | fffffeffffffffff | 0.5 TB | ... 
unused hole + fffffe8000000000 | -1.5 TB | fffffeffffffffff | 0.5 TB | SLUB virtual memory ffffff0000000000 | -1 TB | ffffff7fffffffff | 0.5 TB | %esp fixup stacks ffffff8000000000 | -512 GB | ffffffeeffffffff | 444 GB | ... unused hole ffffffef00000000 | -68 GB | fffffffeffffffff | 64 GB | EFI region mapping space diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h index 38b54b992f32..e1a91eb084c4 100644 --- a/arch/x86/include/asm/pgtable_64_types.h +++ b/arch/x86/include/asm/pgtable_64_types.h @@ -6,6 +6,7 @@ #ifndef __ASSEMBLY__ #include +#include #include /* @@ -199,6 +200,21 @@ extern unsigned int ptrs_per_p4d; #define ESPFIX_PGD_ENTRY _AC(-2, UL) #define ESPFIX_BASE_ADDR (ESPFIX_PGD_ENTRY << P4D_SHIFT) +#ifdef CONFIG_SLAB_VIRTUAL +#define SLAB_PGD_ENTRY _AC(-3, UL) +#define SLAB_BASE_ADDR (SLAB_PGD_ENTRY << P4D_SHIFT) +#define SLAB_END_ADDR (SLAB_BASE_ADDR + P4D_SIZE) + +/* + * We need to define this here because we need it to compute SLAB_META_SIZE + * and including slab.h causes a dependency cycle. + */ +#define STRUCT_SLAB_SIZE (32 * sizeof(void *)) +#define SLAB_VPAGES ((SLAB_END_ADDR - SLAB_BASE_ADDR) / PAGE_SIZE) +#define SLAB_META_SIZE ALIGN(SLAB_VPAGES * STRUCT_SLAB_SIZE, PAGE_SIZE) +#define SLAB_DATA_BASE_ADDR (SLAB_BASE_ADDR + SLAB_META_SIZE) +#endif /* CONFIG_SLAB_VIRTUAL */ + #define CPU_ENTRY_AREA_PGD _AC(-4, UL) #define CPU_ENTRY_AREA_BASE (CPU_ENTRY_AREA_PGD << P4D_SHIFT) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index a190aae8ceaf..d716ddfd9880 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -1279,16 +1279,19 @@ static void __init register_page_bootmem_info(void) } /* - * Pre-allocates page-table pages for the vmalloc area in the kernel page-table. + * Pre-allocates page-table pages for the vmalloc and SLUB areas in the kernel + * page-table. * Only the level which needs to be synchronized between all page-tables is * allocated because the synchronization can be expensive. */ -static void __init preallocate_vmalloc_pages(void) +static void __init preallocate_top_level_entries_range(unsigned long start, + unsigned long end) { unsigned long addr; const char *lvl; - for (addr = VMALLOC_START; addr <= VMEMORY_END; addr = ALIGN(addr + 1, PGDIR_SIZE)) { + + for (addr = start; addr <= end; addr = ALIGN(addr + 1, PGDIR_SIZE)) { pgd_t *pgd = pgd_offset_k(addr); p4d_t *p4d; pud_t *pud; @@ -1328,6 +1331,14 @@ static void __init preallocate_vmalloc_pages(void) panic("Failed to pre-allocate %s pages for vmalloc area\n", lvl); } +static void __init preallocate_top_level_entries(void) +{ + preallocate_top_level_entries_range(VMALLOC_START, VMEMORY_END); +#ifdef CONFIG_SLAB_VIRTUAL + preallocate_top_level_entries_range(SLAB_BASE_ADDR, SLAB_END_ADDR - 1); +#endif +} + void __init mem_init(void) { pci_iommu_alloc(); @@ -1351,7 +1362,7 @@ void __init mem_init(void) if (get_gate_vma(&init_mm)) kclist_add(&kcore_vsyscall, (void *)VSYSCALL_ADDR, PAGE_SIZE, KCORE_USER); - preallocate_vmalloc_pages(); + preallocate_top_level_entries(); } #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c index 37db264866b6..7b297d372a8c 100644 --- a/arch/x86/mm/kaslr.c +++ b/arch/x86/mm/kaslr.c @@ -136,6 +136,15 @@ void __init kernel_randomize_memory(void) vaddr = round_up(vaddr + 1, PUD_SIZE); remain_entropy -= entropy; } + +#ifdef CONFIG_SLAB_VIRTUAL + /* + * slub_addr_base is initialized separately from the + * kaslr_memory_regions because it comes after CPU_ENTRY_AREA_BASE. 
+ */ + prandom_bytes_state(&rand_state, &rand, sizeof(rand)); + slub_addr_base += (rand & ((1UL << 36) - PAGE_SIZE)); +#endif } void __meminit init_trampoline_kaslr(void) diff --git a/arch/x86/mm/mm_internal.h b/arch/x86/mm/mm_internal.h index 3f37b5c80bb3..fafb79b7e019 100644 --- a/arch/x86/mm/mm_internal.h +++ b/arch/x86/mm/mm_internal.h @@ -25,4 +25,8 @@ void update_cache_mode_entry(unsigned entry, enum page_cache_mode cache); extern unsigned long tlb_single_page_flush_ceiling; +#ifdef CONFIG_SLAB_VIRTUAL +extern unsigned long slub_addr_base; +#endif + #endif /* __X86_MM_INTERNAL_H */ diff --git a/mm/slub.c b/mm/slub.c index 4f77e5d4fe6c..a731fdc79bff 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -166,6 +166,10 @@ * the fast path and disables lockless freelists. */ +#ifdef CONFIG_SLAB_VIRTUAL +unsigned long slub_addr_base = SLAB_DATA_BASE_ADDR; +#endif /* CONFIG_SLAB_VIRTUAL */ + /* * We could simply use migrate_disable()/enable() but as long as it's a * function call even on !PREEMPT_RT, use inline preempt_disable() there. diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening index 9f4e6e38aa76..f4a0af424149 100644 --- a/security/Kconfig.hardening +++ b/security/Kconfig.hardening @@ -357,6 +357,8 @@ config GCC_PLUGIN_RANDSTRUCT config SLAB_VIRTUAL bool "Allocate slab objects from virtual memory" + # For virtual memory region allocation + depends on X86_64 depends on SLUB && !SLUB_TINY # If KFENCE support is desired, it could be implemented on top of our # virtual memory allocation facilities From patchwork Fri Sep 15 10:59:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matteo Rizzo X-Patchwork-Id: 13386904 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3F10EEE6455 for ; Fri, 15 Sep 2023 11:00:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234308AbjIOLA6 (ORCPT ); Fri, 15 Sep 2023 07:00:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50328 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234389AbjIOLAz (ORCPT ); Fri, 15 Sep 2023 07:00:55 -0400 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D5219C1 for ; Fri, 15 Sep 2023 04:00:07 -0700 (PDT) Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-d7ec535fe42so2177421276.1 for ; Fri, 15 Sep 2023 04:00:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1694775606; x=1695380406; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=fwMqMOgZczlkXIp265D11obkAY0m5SEDJnzDVowAfdI=; b=vih8/iziYYW1X/tOkqfJNNWkbxGwS582WfHF/U0hzSVobzgMFe6aYnD76g1QnbuGnk g1bc2qZqtx/e16WL8CopzT/rFha1+XGgPECljvJO96C/q+wpuXIqpITO5FuqNCLRWCzs WmcDjmFhdAmwnmZWPrW16Tn3t+dog07QXRL7wIP7X8JmqGYeCQO+N1ipUH2xbFWO2dGr 4PuRQ5dW4XowqGIKv1pDUpNEd+IWs4zzEvOl9/LdbRbXOfKmAROiCyGA5Z6SXuT4Neux HqVqRqwc4beAZDmvmCeYkK5VrSUTSr3WOF0B+LoW1JtQcCLN0qwY8LVeS8JvVrTL2bvT NLUA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694775606; x=1695380406; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to 
:date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=fwMqMOgZczlkXIp265D11obkAY0m5SEDJnzDVowAfdI=; b=nl0YPrcCC9zmIQoWxl2Ho1Dckgdjnt+UhiHGxqvdk6X3nMl+yQa6pZY4DqXwU/rGrS e4fUB9pObOkgJqBaHV0o4lEKLyTFPlac6RPKeqiXw8/AcJte6cRD1cSpA3Sy2Hg4xhg7 oZfVhgDQl9sslqXFSvxFBXnajEu75b4Lf2Y/x9+ouUoudbRX9yVR+ajTs+nDz4hOvSpz oaY5WDILcDeAJkSby+s2d8CE3ZpRKf2UCfr0NoEbe5ekgFBAbAJAGpp05YjMM/6UnULH syxrjhv+a1WrCNTyce1qi4mCDz1MZ2o173xeSRaLVLT7gF1EB+k0jwOtYWTewOPtdLmp +dZg== X-Gm-Message-State: AOJu0YzqqVviKZ7axpvXBDGENqBg3pZ2SF3u8tM5s8ehhRVzEIQJ8646 s8CIKq1XGGtKJ1yQSck5BlW6IBWa8m6fTnwnNQ== X-Google-Smtp-Source: AGHT+IEZscvrc96V1gn1BF986Tuf/VhyLh041S+hHsJVTGzLN4PjtlTEnpTINuRLJ+4zXNfGiWlDrchNDKphRD7cog== X-Received: from mr-cloudtop2.c.googlers.com ([fda3:e722:ac3:cc00:31:98fb:c0a8:2a6]) (user=matteorizzo job=sendgmr) by 2002:a05:6902:1341:b0:d80:cf4:7e80 with SMTP id g1-20020a056902134100b00d800cf47e80mr23456ybu.7.1694775606539; Fri, 15 Sep 2023 04:00:06 -0700 (PDT) Date: Fri, 15 Sep 2023 10:59:30 +0000 In-Reply-To: <20230915105933.495735-1-matteorizzo@google.com> Mime-Version: 1.0 References: <20230915105933.495735-1-matteorizzo@google.com> X-Mailer: git-send-email 2.42.0.459.ge4e396fd5e-goog Message-ID: <20230915105933.495735-12-matteorizzo@google.com> Subject: [RFC PATCH 11/14] mm/slub: allocate slabs from virtual memory From: Matteo Rizzo To: cl@linux.com, penberg@kernel.org, rientjes@google.com, iamjoonsoo.kim@lge.com, akpm@linux-foundation.org, vbabka@suse.cz, roman.gushchin@linux.dev, 42.hyeyoo@gmail.com, keescook@chromium.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-hardening@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com, corbet@lwn.net, luto@kernel.org, peterz@infradead.org Cc: jannh@google.com, matteorizzo@google.com, evn@google.com, poprdi@google.com, jordyzomer@google.com Precedence: bulk List-ID: X-Mailing-List: linux-hardening@vger.kernel.org From: Jann Horn This is the main implementation of SLAB_VIRTUAL. With SLAB_VIRTUAL enabled, slab memory is not allocated from the linear map but from a dedicated region of virtual memory. The code ensures that once a range of virtual addresses is assigned to a slab cache, that virtual memory is never reused again except for other slabs in that same cache. This lets us mitigate some exploits for use-after-free vulnerabilities where the attacker makes SLUB release a slab page to the page allocator and then makes it reuse that same page for a different slab cache ("cross-cache attacks"). With SLAB_VIRTUAL enabled struct slab no longer overlaps struct page but instead it is allocated from a dedicated region of virtual memory. This makes it possible to have references to slabs whose physical memory has been freed. SLAB_VIRTUAL has a small performance overhead, about 1-2% on kernel compilation time. We are using 4 KiB pages to map slab pages and slab metadata area, instead of the 2 MiB pages that the kernel uses to map the physmap. We experimented with a version of the patch that uses 2 MiB pages and we did see some performance improvement but the code also became much more complicated and ugly because we would need to allocate and free multiple slabs at once. In addition to the TLB contention, SLAB_VIRTUAL also adds new locks to the slow path of the allocator. Lock contention also contributes to the performance penalty to some extent, and this is more visible on machines with many CPUs. 
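As an illustration only (not part of this patch): the reuse rule above can be reduced to a few lines of plain user-space C. The fake_cache/fake_slab types, the helper names and the base address below are made up for the sketch; only the two ideas they model correspond to the series, namely the per-cache freed_slabs lists from the earlier patch and the forward-only slub_addr_base cursor. For reference, the constants added in the previous patch put the struct slab metadata at the start of the reserved region: 512 GiB / 4 KiB pages * 256 bytes per struct slab = 32 GiB of metadata, with object memory starting at SLAB_DATA_BASE_ADDR above it.

#include <stdio.h>
#include <stdlib.h>

struct fake_slab {
	unsigned long virt_start;	/* start of the slab's virtual range */
	struct fake_slab *next;
};

struct fake_cache {
	const char *name;
	struct fake_slab *freed_slabs;	/* ranges with no physical backing */
};

/* Stand-in for slub_addr_base: a cursor that only ever moves forward. */
static unsigned long next_unused_va = 0x1000000UL;	/* arbitrary base */

/* Get a virtual range: recycle from this cache's own list, else carve new. */
static struct fake_slab *cache_get_va_range(struct fake_cache *c,
					    unsigned long size)
{
	struct fake_slab *s = c->freed_slabs;

	if (s) {			/* reuse only within the same cache */
		c->freed_slabs = s->next;
		return s;
	}
	s = malloc(sizeof(*s));
	if (!s)
		abort();
	s->virt_start = next_unused_va;
	s->next = NULL;
	next_unused_va += size;		/* never rewinds, never shared back */
	return s;
}

/* Physical pages would be reclaimed here; the virtual range stays owned. */
static void cache_put_va_range(struct fake_cache *c, struct fake_slab *s)
{
	s->next = c->freed_slabs;
	c->freed_slabs = s;
}

int main(void)
{
	struct fake_cache a = { "kmalloc-128", NULL };
	struct fake_cache b = { "cred_jar", NULL };
	struct fake_slab *s = cache_get_va_range(&a, 0x1000);

	printf("A gets    %#lx\n", s->virt_start);
	cache_put_va_range(&a, s);	/* backing memory "freed" */
	/* A may see the same virtual range again... */
	printf("A reuses  %#lx\n", cache_get_va_range(&a, 0x1000)->virt_start);
	/* ...but B can only ever get a range A never used. */
	printf("B gets    %#lx\n", cache_get_va_range(&b, 0x1000)->virt_start);
	return 0;
}

Running the sketch prints the same range twice for cache A and a distinct, never-before-used range for B: a dangling pointer into A's old range can only ever hit unmapped memory or objects of the same cache, which is the property that defeats cross-cache attacks.
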
Signed-off-by: Jann Horn Co-developed-by: Matteo Rizzo Signed-off-by: Matteo Rizzo --- arch/x86/include/asm/page_64.h | 10 + arch/x86/include/asm/pgtable_64_types.h | 5 + arch/x86/mm/physaddr.c | 10 + include/linux/slab.h | 7 + init/main.c | 1 + mm/slab.h | 106 ++++++ mm/slab_common.c | 4 + mm/slub.c | 439 +++++++++++++++++++++++- mm/usercopy.c | 12 +- 9 files changed, 587 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h index cc6b8e087192..25fb734a2fe6 100644 --- a/arch/x86/include/asm/page_64.h +++ b/arch/x86/include/asm/page_64.h @@ -3,6 +3,7 @@ #define _ASM_X86_PAGE_64_H #include +#include #ifndef __ASSEMBLY__ #include @@ -18,10 +19,19 @@ extern unsigned long page_offset_base; extern unsigned long vmalloc_base; extern unsigned long vmemmap_base; +#ifdef CONFIG_SLAB_VIRTUAL +unsigned long slab_virt_to_phys(unsigned long x); +#endif + static __always_inline unsigned long __phys_addr_nodebug(unsigned long x) { unsigned long y = x - __START_KERNEL_map; +#ifdef CONFIG_SLAB_VIRTUAL + if (is_slab_addr(x)) + return slab_virt_to_phys(x); +#endif + /* use the carry flag to determine if x was < __START_KERNEL_map */ x = y + ((x > y) ? phys_base : (__START_KERNEL_map - PAGE_OFFSET)); diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h index e1a91eb084c4..4aae822a6a96 100644 --- a/arch/x86/include/asm/pgtable_64_types.h +++ b/arch/x86/include/asm/pgtable_64_types.h @@ -213,6 +213,11 @@ extern unsigned int ptrs_per_p4d; #define SLAB_VPAGES ((SLAB_END_ADDR - SLAB_BASE_ADDR) / PAGE_SIZE) #define SLAB_META_SIZE ALIGN(SLAB_VPAGES * STRUCT_SLAB_SIZE, PAGE_SIZE) #define SLAB_DATA_BASE_ADDR (SLAB_BASE_ADDR + SLAB_META_SIZE) + +#define is_slab_addr(ptr) ((unsigned long)(ptr) >= SLAB_DATA_BASE_ADDR && \ + (unsigned long)(ptr) < SLAB_END_ADDR) +#define is_slab_meta(ptr) ((unsigned long)(ptr) >= SLAB_BASE_ADDR && \ + (unsigned long)(ptr) < SLAB_DATA_BASE_ADDR) #endif /* CONFIG_SLAB_VIRTUAL */ #define CPU_ENTRY_AREA_PGD _AC(-4, UL) diff --git a/arch/x86/mm/physaddr.c b/arch/x86/mm/physaddr.c index fc3f3d3e2ef2..7f1b81c75e4d 100644 --- a/arch/x86/mm/physaddr.c +++ b/arch/x86/mm/physaddr.c @@ -16,6 +16,11 @@ unsigned long __phys_addr(unsigned long x) { unsigned long y = x - __START_KERNEL_map; +#ifdef CONFIG_SLAB_VIRTUAL + if (is_slab_addr(x)) + return slab_virt_to_phys(x); +#endif + /* use the carry flag to determine if x was < __START_KERNEL_map */ if (unlikely(x > y)) { x = y + phys_base; @@ -48,6 +53,11 @@ bool __virt_addr_valid(unsigned long x) { unsigned long y = x - __START_KERNEL_map; +#ifdef CONFIG_SLAB_VIRTUAL + if (is_slab_addr(x)) + return true; +#endif + /* use the carry flag to determine if x was < __START_KERNEL_map */ if (unlikely(x > y)) { x = y + phys_base; diff --git a/include/linux/slab.h b/include/linux/slab.h index a2d82010d269..2180d5170995 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -793,5 +793,12 @@ int slab_dead_cpu(unsigned int cpu); #define slab_dead_cpu NULL #endif +#ifdef CONFIG_SLAB_VIRTUAL +void __init init_slub_page_reclaim(void); +#else #define is_slab_addr(addr) folio_test_slab(virt_to_folio(addr)) +static inline void init_slub_page_reclaim(void) +{ +} +#endif /* CONFIG_SLAB_VIRTUAL */ #endif /* _LINUX_SLAB_H */ diff --git a/init/main.c b/init/main.c index ad920fac325c..72456964417e 100644 --- a/init/main.c +++ b/init/main.c @@ -1532,6 +1532,7 @@ static noinline void __init kernel_init_freeable(void) workqueue_init(); init_mm_internals(); + 
init_slub_page_reclaim(); rcu_init_tasks_generic(); do_pre_smp_initcalls(); diff --git a/mm/slab.h b/mm/slab.h index 3fe0d1e26e26..460c802924bd 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -1,6 +1,11 @@ /* SPDX-License-Identifier: GPL-2.0 */ #ifndef MM_SLAB_H #define MM_SLAB_H + +#include +#include +#include + /* * Internal slab definitions */ @@ -49,7 +54,35 @@ struct kmem_cache_order_objects { /* Reuses the bits in struct page */ struct slab { + /* + * With CONFIG_SLAB_VIRTUAL enabled instances of struct slab are not + * overlapped with struct page but instead they are allocated from + * a dedicated virtual memory area. + */ +#ifndef CONFIG_SLAB_VIRTUAL unsigned long __page_flags; +#else + /* + * Used by virt_to_slab to find the actual struct slab for a slab that + * spans multiple pages. + */ + struct slab *compound_slab_head; + + /* + * Pointer to the folio that the objects are allocated from, or NULL if + * the slab is currently unused and no physical memory is allocated to + * it. Protected by slub_kworker_lock. + */ + struct folio *backing_folio; + + struct kmem_cache_order_objects oo; + + struct list_head flush_list_elem; + + /* Replaces the page lock */ + spinlock_t slab_lock; + +#endif #if defined(CONFIG_SLAB) @@ -104,12 +137,17 @@ struct slab { #error "Unexpected slab allocator configured" #endif + /* See comment for __page_flags above. */ +#ifndef CONFIG_SLAB_VIRTUAL atomic_t __page_refcount; +#endif #ifdef CONFIG_MEMCG unsigned long memcg_data; #endif }; +/* See comment for __page_flags above. */ +#ifndef CONFIG_SLAB_VIRTUAL #define SLAB_MATCH(pg, sl) \ static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl)) SLAB_MATCH(flags, __page_flags); @@ -120,10 +158,15 @@ SLAB_MATCH(memcg_data, memcg_data); #endif #undef SLAB_MATCH static_assert(sizeof(struct slab) <= sizeof(struct page)); +#else +static_assert(sizeof(struct slab) <= STRUCT_SLAB_SIZE); +#endif + #if defined(system_has_freelist_aba) && defined(CONFIG_SLUB) static_assert(IS_ALIGNED(offsetof(struct slab, freelist), sizeof(freelist_aba_t))); #endif +#ifndef CONFIG_SLAB_VIRTUAL /** * folio_slab - Converts from folio to slab. * @folio: The folio. @@ -187,6 +230,14 @@ static_assert(IS_ALIGNED(offsetof(struct slab, freelist), sizeof(freelist_aba_t) * Return: true if s points to a slab and false otherwise. */ #define is_slab_page(s) folio_test_slab(slab_folio(s)) +#else +#define slab_folio(s) (s->backing_folio) +#define is_slab_page(s) is_slab_meta(s) +/* Needed for check_heap_object but never actually used */ +#define folio_slab(folio) NULL +static void *slab_to_virt(const struct slab *s); +#endif /* CONFIG_SLAB_VIRTUAL */ + /* * If network-based swap is enabled, sl*b must keep track of whether pages * were allocated from pfmemalloc reserves. @@ -213,7 +264,11 @@ static inline void __slab_clear_pfmemalloc(struct slab *slab) static inline void *slab_address(const struct slab *slab) { +#ifdef CONFIG_SLAB_VIRTUAL + return slab_to_virt(slab); +#else return folio_address(slab_folio(slab)); +#endif } static inline int slab_nid(const struct slab *slab) @@ -226,6 +281,52 @@ static inline pg_data_t *slab_pgdat(const struct slab *slab) return folio_pgdat(slab_folio(slab)); } +#ifdef CONFIG_SLAB_VIRTUAL +/* + * Internal helper. Returns the address of the struct slab corresponding to + * the virtual memory page containing kaddr. This does a simple arithmetic + * mapping and does *not* return the struct slab of the head page! 
+ */ +static unsigned long virt_to_slab_raw(unsigned long addr) +{ + VM_WARN_ON(!is_slab_addr(addr)); + return SLAB_BASE_ADDR + + ((addr - SLAB_BASE_ADDR) / PAGE_SIZE * sizeof(struct slab)); +} + +static struct slab *virt_to_slab(const void *addr) +{ + struct slab *slab, *slab_head; + + if (!is_slab_addr(addr)) + return NULL; + + slab = (struct slab *)virt_to_slab_raw((unsigned long)addr); + slab_head = slab->compound_slab_head; + + if (CHECK_DATA_CORRUPTION(!is_slab_meta(slab_head), + "compound slab head out of meta range: %p", slab_head)) + return NULL; + + return slab_head; +} + +static void *slab_to_virt(const struct slab *s) +{ + unsigned long slab_idx; + bool unaligned_slab = + ((unsigned long)s - SLAB_BASE_ADDR) % sizeof(*s) != 0; + + if (CHECK_DATA_CORRUPTION(!is_slab_meta(s), "slab not in meta range") || + CHECK_DATA_CORRUPTION(unaligned_slab, "unaligned slab pointer") || + CHECK_DATA_CORRUPTION(s->compound_slab_head != s, + "%s called on non-head slab", __func__)) + return NULL; + + slab_idx = ((unsigned long)s - SLAB_BASE_ADDR) / sizeof(*s); + return (void *)(SLAB_BASE_ADDR + PAGE_SIZE * slab_idx); +} +#else static inline struct slab *virt_to_slab(const void *addr) { struct folio *folio = virt_to_folio(addr); @@ -235,6 +336,7 @@ static inline struct slab *virt_to_slab(const void *addr) return folio_slab(folio); } +#endif /* CONFIG_SLAB_VIRTUAL */ #define OO_SHIFT 16 #define OO_MASK ((1 << OO_SHIFT) - 1) @@ -251,7 +353,11 @@ static inline unsigned int oo_objects(struct kmem_cache_order_objects x) static inline int slab_order(const struct slab *slab) { +#ifndef CONFIG_SLAB_VIRTUAL return folio_order((struct folio *)slab_folio(slab)); +#else + return oo_order(slab->oo); +#endif } static inline size_t slab_size(const struct slab *slab) diff --git a/mm/slab_common.c b/mm/slab_common.c index 42ceaf7e9f47..7754fdba07a0 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1064,6 +1064,10 @@ void kfree(const void *object) if (unlikely(!is_slab_addr(object))) { folio = virt_to_folio(object); + if (IS_ENABLED(CONFIG_SLAB_VIRTUAL) && + CHECK_DATA_CORRUPTION(folio_test_slab(folio), + "unexpected slab page mapped outside slab range")) + return; free_large_kmalloc(folio, (void *)object); return; } diff --git a/mm/slub.c b/mm/slub.c index a731fdc79bff..66ae60cdadaf 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -38,6 +38,10 @@ #include #include #include +#include +#include +#include +#include #include #include #include @@ -168,6 +172,8 @@ #ifdef CONFIG_SLAB_VIRTUAL unsigned long slub_addr_base = SLAB_DATA_BASE_ADDR; +/* Protects slub_addr_base */ +static DEFINE_SPINLOCK(slub_valloc_lock); #endif /* CONFIG_SLAB_VIRTUAL */ /* @@ -430,19 +436,18 @@ static void prefetch_freepointer(const struct kmem_cache *s, void *object) * get_freepointer_safe() returns initialized memory. 
*/ __no_kmsan_checks -static inline void *get_freepointer_safe(struct kmem_cache *s, void *object, +static inline freeptr_t get_freepointer_safe(struct kmem_cache *s, void *object, struct slab *slab) { - unsigned long freepointer_addr; + unsigned long freepointer_addr = (unsigned long)object + s->offset; freeptr_t p; if (!debug_pagealloc_enabled_static()) - return get_freepointer(s, object, slab); + return *(freeptr_t *)freepointer_addr; object = kasan_reset_tag(object); - freepointer_addr = (unsigned long)object + s->offset; copy_from_kernel_nofault(&p, (freeptr_t *)freepointer_addr, sizeof(p)); - return freelist_ptr_decode(s, p, freepointer_addr, slab); + return p; } static inline void set_freepointer(struct kmem_cache *s, void *object, void *fp) @@ -478,6 +483,17 @@ static inline struct kmem_cache_order_objects oo_make(unsigned int order, return x; } +#ifdef CONFIG_SLAB_VIRTUAL +unsigned long slab_virt_to_phys(unsigned long x) +{ + struct slab *slab = virt_to_slab((void *)x); + struct folio *folio = slab_folio(slab); + + return page_to_phys(folio_page(folio, 0)) + offset_in_folio(folio, x); +} +EXPORT_SYMBOL(slab_virt_to_phys); +#endif + #ifdef CONFIG_SLUB_CPU_PARTIAL static void slub_set_cpu_partial(struct kmem_cache *s, unsigned int nr_objects) { @@ -506,18 +522,26 @@ slub_set_cpu_partial(struct kmem_cache *s, unsigned int nr_objects) */ static __always_inline void slab_lock(struct slab *slab) { +#ifdef CONFIG_SLAB_VIRTUAL + spin_lock(&slab->slab_lock); +#else struct page *page = slab_page(slab); VM_BUG_ON_PAGE(PageTail(page), page); bit_spin_lock(PG_locked, &page->flags); +#endif } static __always_inline void slab_unlock(struct slab *slab) { +#ifdef CONFIG_SLAB_VIRTUAL + spin_unlock(&slab->slab_lock); +#else struct page *page = slab_page(slab); VM_BUG_ON_PAGE(PageTail(page), page); __bit_spin_unlock(PG_locked, &page->flags); +#endif } static inline bool @@ -1863,6 +1887,10 @@ static void folio_set_slab(struct folio *folio, struct slab *slab) /* Make the flag visible before any changes to folio->mapping */ smp_wmb(); +#ifdef CONFIG_SLAB_VIRTUAL + slab->backing_folio = folio; +#endif + if (folio_is_pfmemalloc(folio)) slab_set_pfmemalloc(slab); } @@ -1874,8 +1902,285 @@ static void folio_clear_slab(struct folio *folio, struct slab *slab) /* Make the mapping reset visible before clearing the flag */ smp_wmb(); __folio_clear_slab(folio); +#ifdef CONFIG_SLAB_VIRTUAL + slab->backing_folio = NULL; +#endif +} + +#ifdef CONFIG_SLAB_VIRTUAL +/* + * Make sure we have the necessary page tables for the given address. + * Returns a pointer to the PTE, or NULL on allocation failure. + * + * We're using ugly low-level code here instead of the standard + * helpers because the normal code insists on using GFP_KERNEL. + * + * If may_alloc is false, throw an error if the PTE is not already mapped. + */ +static pte_t *slub_get_ptep(unsigned long address, gfp_t gfp_flags, + bool may_alloc) +{ + pgd_t *pgd = pgd_offset_k(address); + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + unsigned long flags; + struct page *spare_page = NULL; + +retry: + spin_lock_irqsave(&slub_valloc_lock, flags); + /* + * The top-level entry should already be present - see + * preallocate_top_level_entries(). 
+ */ + BUG_ON(pgd_none(READ_ONCE(*pgd))); + p4d = p4d_offset(pgd, address); + if (p4d_none(READ_ONCE(*p4d))) { + if (!spare_page) + goto need_page; + p4d_populate(&init_mm, p4d, (pud_t *)page_to_virt(spare_page)); + goto need_page; + + } + pud = pud_offset(p4d, address); + if (pud_none(READ_ONCE(*pud))) { + if (!spare_page) + goto need_page; + pud_populate(&init_mm, pud, (pmd_t *)page_to_virt(spare_page)); + goto need_page; + } + pmd = pmd_offset(pud, address); + if (pmd_none(READ_ONCE(*pmd))) { + if (!spare_page) + goto need_page; + pmd_populate_kernel(&init_mm, pmd, + (pte_t *)page_to_virt(spare_page)); + spare_page = NULL; + } + spin_unlock_irqrestore(&slub_valloc_lock, flags); + if (spare_page) + __free_page(spare_page); + return pte_offset_kernel(pmd, address); + +need_page: + spin_unlock_irqrestore(&slub_valloc_lock, flags); + VM_WARN_ON(!may_alloc); + spare_page = alloc_page(gfp_flags); + if (unlikely(!spare_page)) + return NULL; + /* ensure ordering between page zeroing and PTE write */ + smp_wmb(); + goto retry; +} + +/* + * Reserve a range of virtual address space, ensure that we have page tables for + * it, and allocate a corresponding struct slab. + * This is cold code, we don't really have to worry about performance here. + */ +static struct slab *alloc_slab_meta(unsigned int order, gfp_t gfp_flags) +{ + unsigned long alloc_size = PAGE_SIZE << order; + unsigned long flags; + unsigned long old_base; + unsigned long data_range_start, data_range_end; + unsigned long meta_range_start, meta_range_end; + unsigned long addr; + struct slab *slab, *sp; + bool valid_start, valid_end; + + gfp_flags &= (__GFP_HIGH | __GFP_RECLAIM | __GFP_IO | + __GFP_FS | __GFP_NOWARN | __GFP_RETRY_MAYFAIL | + __GFP_NOFAIL | __GFP_NORETRY | __GFP_MEMALLOC | + __GFP_NOMEMALLOC); + /* New page tables and metadata pages should be zeroed */ + gfp_flags |= __GFP_ZERO; + + spin_lock_irqsave(&slub_valloc_lock, flags); +retry_locked: + old_base = slub_addr_base; + + /* + * We drop the lock. The following code might sleep during + * page table allocation. Any mutations we make before rechecking + * slub_addr_base are idempotent, so that's fine. + */ + spin_unlock_irqrestore(&slub_valloc_lock, flags); + + /* + * [data_range_start, data_range_end) is the virtual address range where + * this slab's objects will be mapped. + * We want alignment appropriate for the order. Note that this could be + * relaxed based on the alignment requirements of the objects being + * allocated, but for now, we behave like the page allocator would. + */ + data_range_start = ALIGN(old_base, alloc_size); + data_range_end = data_range_start + alloc_size; + + valid_start = data_range_start >= SLAB_BASE_ADDR && + IS_ALIGNED(data_range_start, PAGE_SIZE); + valid_end = data_range_end >= SLAB_BASE_ADDR && + IS_ALIGNED(data_range_end, PAGE_SIZE); + if (CHECK_DATA_CORRUPTION(!valid_start, + "invalid slab data range start") || + CHECK_DATA_CORRUPTION(!valid_end, + "invalid slab data range end")) + return NULL; + + /* We ran out of virtual memory for slabs */ + if (WARN_ON_ONCE(data_range_start >= SLAB_END_ADDR || + data_range_end >= SLAB_END_ADDR)) + return NULL; + + /* + * [meta_range_start, meta_range_end) is the range where the struct + * slabs for the current data range are mapped. The first struct slab, + * located at meta_range_start is the head slab that contains the actual + * data, all other struct slabs in the range point to the head slab. 
+ */ + meta_range_start = virt_to_slab_raw(data_range_start); + meta_range_end = virt_to_slab_raw(data_range_end); + + /* Ensure the meta range is mapped. */ + for (addr = ALIGN_DOWN(meta_range_start, PAGE_SIZE); + addr < meta_range_end; addr += PAGE_SIZE) { + pte_t *ptep = slub_get_ptep(addr, gfp_flags, true); + + if (ptep == NULL) + return NULL; + + spin_lock_irqsave(&slub_valloc_lock, flags); + if (pte_none(READ_ONCE(*ptep))) { + struct page *meta_page; + + spin_unlock_irqrestore(&slub_valloc_lock, flags); + meta_page = alloc_page(gfp_flags); + if (meta_page == NULL) + return NULL; + spin_lock_irqsave(&slub_valloc_lock, flags); + + /* Make sure that no one else has already mapped that page */ + if (pte_none(READ_ONCE(*ptep))) + set_pte_safe(ptep, + mk_pte(meta_page, PAGE_KERNEL)); + else + __free_page(meta_page); + } + spin_unlock_irqrestore(&slub_valloc_lock, flags); + } + + /* Ensure we have page tables for the data range. */ + for (addr = data_range_start; addr < data_range_end; + addr += PAGE_SIZE) { + pte_t *ptep = slub_get_ptep(addr, gfp_flags, true); + + if (ptep == NULL) + return NULL; + } + + /* Did we race with someone else who made forward progress? */ + spin_lock_irqsave(&slub_valloc_lock, flags); + if (old_base != slub_addr_base) + goto retry_locked; + + /* Success! Grab the range for ourselves. */ + slub_addr_base = data_range_end; + spin_unlock_irqrestore(&slub_valloc_lock, flags); + + slab = (struct slab *)meta_range_start; + spin_lock_init(&slab->slab_lock); + + /* Initialize basic slub metadata for virt_to_slab() */ + for (sp = slab; (unsigned long)sp < meta_range_end; sp++) + sp->compound_slab_head = slab; + + return slab; +} + +/* Get an unused slab, or allocate a new one */ +static struct slab *get_free_slab(struct kmem_cache *s, + struct kmem_cache_order_objects oo, gfp_t meta_gfp_flags, + struct list_head *freed_slabs) +{ + unsigned long flags; + struct slab *slab; + + spin_lock_irqsave(&s->virtual.freed_slabs_lock, flags); + slab = list_first_entry_or_null(freed_slabs, struct slab, slab_list); + + if (likely(slab)) { + list_del(&slab->slab_list); + + spin_unlock_irqrestore(&s->virtual.freed_slabs_lock, flags); + return slab; + } + + spin_unlock_irqrestore(&s->virtual.freed_slabs_lock, flags); + slab = alloc_slab_meta(oo_order(oo), meta_gfp_flags); + if (slab == NULL) + return NULL; + + return slab; } +static struct slab *alloc_slab_page(struct kmem_cache *s, + gfp_t meta_gfp_flags, gfp_t gfp_flags, int node, + struct kmem_cache_order_objects oo) +{ + struct folio *folio; + struct slab *slab; + unsigned int order = oo_order(oo); + unsigned long flags; + void *virt_mapping; + pte_t *ptep; + struct list_head *freed_slabs; + + if (order == oo_order(s->min)) + freed_slabs = &s->virtual.freed_slabs_min; + else + freed_slabs = &s->virtual.freed_slabs; + + slab = get_free_slab(s, oo, meta_gfp_flags, freed_slabs); + + /* + * Avoid making UAF reads easily exploitable by repopulating + * with pages containing attacker-controller data - always zero + * pages. + */ + gfp_flags |= __GFP_ZERO; + if (node == NUMA_NO_NODE) + folio = (struct folio *)alloc_pages(gfp_flags, order); + else + folio = (struct folio *)__alloc_pages_node(node, gfp_flags, + order); + + if (!folio) { + /* Rollback: put the struct slab back. 
*/ + spin_lock_irqsave(&s->virtual.freed_slabs_lock, flags); + list_add(&slab->slab_list, freed_slabs); + spin_unlock_irqrestore(&s->virtual.freed_slabs_lock, flags); + + return NULL; + } + folio_set_slab(folio, slab); + + slab->oo = oo; + + virt_mapping = slab_to_virt(slab); + + /* Wire up physical folio */ + for (unsigned long i = 0; i < (1UL << oo_order(oo)); i++) { + ptep = slub_get_ptep( + (unsigned long)virt_mapping + i * PAGE_SIZE, 0, false); + if (CHECK_DATA_CORRUPTION(pte_present(*ptep), + "slab PTE already present")) + return NULL; + set_pte_safe(ptep, mk_pte(folio_page(folio, i), PAGE_KERNEL)); + } + + return slab; +} +#else static inline struct slab *alloc_slab_page(struct kmem_cache *s, gfp_t meta_flags, gfp_t flags, int node, struct kmem_cache_order_objects oo) @@ -1897,6 +2202,7 @@ static inline struct slab *alloc_slab_page(struct kmem_cache *s, return slab; } +#endif /* CONFIG_SLAB_VIRTUAL */ #ifdef CONFIG_SLAB_FREELIST_RANDOM /* Pre-initialize the random sequence cache */ @@ -2085,6 +2391,94 @@ static struct slab *new_slab(struct kmem_cache *s, gfp_t flags, int node) flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node); } +#ifdef CONFIG_SLAB_VIRTUAL +static DEFINE_SPINLOCK(slub_kworker_lock); +static struct kthread_worker *slub_kworker; +static LIST_HEAD(slub_tlbflush_queue); + +static void slub_tlbflush_worker(struct kthread_work *work) +{ + unsigned long irq_flags; + LIST_HEAD(local_queue); + struct slab *slab, *tmp; + unsigned long addr_start = ULONG_MAX; + unsigned long addr_end = 0; + + spin_lock_irqsave(&slub_kworker_lock, irq_flags); + list_splice_init(&slub_tlbflush_queue, &local_queue); + list_for_each_entry(slab, &local_queue, flush_list_elem) { + unsigned long start = (unsigned long)slab_to_virt(slab); + unsigned long end = start + PAGE_SIZE * + (1UL << oo_order(slab->oo)); + + if (start < addr_start) + addr_start = start; + if (end > addr_end) + addr_end = end; + } + spin_unlock_irqrestore(&slub_kworker_lock, irq_flags); + + if (addr_start < addr_end) + flush_tlb_kernel_range(addr_start, addr_end); + + spin_lock_irqsave(&slub_kworker_lock, irq_flags); + list_for_each_entry_safe(slab, tmp, &local_queue, flush_list_elem) { + struct folio *folio = slab_folio(slab); + struct kmem_cache *s = slab->slab_cache; + + list_del(&slab->flush_list_elem); + folio_clear_slab(folio, slab); + __free_pages(folio_page(folio, 0), oo_order(slab->oo)); + + /* IRQs are already off */ + spin_lock(&s->virtual.freed_slabs_lock); + if (oo_order(slab->oo) == oo_order(s->oo)) { + list_add(&slab->slab_list, &s->virtual.freed_slabs); + } else { + WARN_ON(oo_order(slab->oo) != oo_order(s->min)); + list_add(&slab->slab_list, &s->virtual.freed_slabs_min); + } + spin_unlock(&s->virtual.freed_slabs_lock); + } + spin_unlock_irqrestore(&slub_kworker_lock, irq_flags); +} +static DEFINE_KTHREAD_WORK(slub_tlbflush_work, slub_tlbflush_worker); + +static void __free_slab(struct kmem_cache *s, struct slab *slab) +{ + int order = oo_order(slab->oo); + unsigned long pages = 1UL << order; + unsigned long slab_base = (unsigned long)slab_address(slab); + unsigned long irq_flags; + + /* Clear the PTEs for the slab we're freeing */ + for (unsigned long i = 0; i < pages; i++) { + unsigned long addr = slab_base + i * PAGE_SIZE; + pte_t *ptep = slub_get_ptep(addr, 0, false); + + if (CHECK_DATA_CORRUPTION(!pte_present(*ptep), + "slab PTE already clear")) + return; + + ptep_clear(&init_mm, addr, ptep); + } + + mm_account_reclaimed_pages(pages); + unaccount_slab(slab, order, s); + + /* + * We might not be able to 
a TLB flush here (e.g. hardware interrupt + * handlers) so instead we give the slab to the TLB flusher thread + * which will flush the TLB for us and only then free the physical + * memory. + */ + spin_lock_irqsave(&slub_kworker_lock, irq_flags); + list_add(&slab->flush_list_elem, &slub_tlbflush_queue); + spin_unlock_irqrestore(&slub_kworker_lock, irq_flags); + if (READ_ONCE(slub_kworker) != NULL) + kthread_queue_work(slub_kworker, &slub_tlbflush_work); +} +#else static void __free_slab(struct kmem_cache *s, struct slab *slab) { struct folio *folio = slab_folio(slab); @@ -2096,6 +2490,7 @@ static void __free_slab(struct kmem_cache *s, struct slab *slab) unaccount_slab(slab, order, s); __free_pages(&folio->page, order); } +#endif /* CONFIG_SLAB_VIRTUAL */ static void rcu_free_slab(struct rcu_head *h) { @@ -3384,7 +3779,15 @@ static __always_inline void *__slab_alloc_node(struct kmem_cache *s, unlikely(!object || !slab || !node_match(slab, node))) { object = __slab_alloc(s, gfpflags, node, addr, c, orig_size); } else { - void *next_object = get_freepointer_safe(s, object, slab); + void *next_object; + freeptr_t next_encoded = get_freepointer_safe(s, object, slab); + + if (unlikely(READ_ONCE(c->tid) != tid)) + goto redo; + + next_object = freelist_ptr_decode(s, next_encoded, + (unsigned long)kasan_reset_tag(object) + s->offset, + slab); /* * The cmpxchg will only match if there was no additional @@ -5050,6 +5453,30 @@ static struct kmem_cache * __init bootstrap(struct kmem_cache *static_cache) return s; } +#ifdef CONFIG_SLAB_VIRTUAL +/* + * Late initialization of reclaim kthread. + * This has to happen way later than kmem_cache_init() because it depends on + * having all the kthread infrastructure ready. + */ +void __init init_slub_page_reclaim(void) +{ + struct kthread_worker *w; + + w = kthread_create_worker(0, "slub-physmem-reclaim"); + if (IS_ERR(w)) + panic("unable to create slub-physmem-reclaim worker"); + + /* + * Make sure that the kworker is properly initialized before making + * the store visible to other CPUs. The free path will check that + * slub_kworker is not NULL before attempting to give the TLB flusher + * pages to free. + */ + smp_store_release(&slub_kworker, w); +} +#endif /* CONFIG_SLAB_VIRTUAL */ + void __init kmem_cache_init(void) { static __initdata struct kmem_cache boot_kmem_cache, diff --git a/mm/usercopy.c b/mm/usercopy.c index 83c164aba6e0..8b30906ca7f9 100644 --- a/mm/usercopy.c +++ b/mm/usercopy.c @@ -189,9 +189,19 @@ static inline void check_heap_object(const void *ptr, unsigned long n, if (!virt_addr_valid(ptr)) return; + /* + * We need to check this first because when CONFIG_SLAB_VIRTUAL is + * enabled a slab address might not be backed by a folio. + */ + if (IS_ENABLED(CONFIG_SLAB_VIRTUAL) && is_slab_addr(ptr)) { + /* Check slab allocator for flags and size. */ + __check_heap_object(ptr, n, virt_to_slab(ptr), to_user); + return; + } + folio = virt_to_folio(ptr); - if (folio_test_slab(folio)) { + if (!IS_ENABLED(CONFIG_SLAB_VIRTUAL) && folio_test_slab(folio)) { /* Check slab allocator for flags and size. 
*/ __check_heap_object(ptr, n, folio_slab(folio), to_user); } else if (folio_test_large(folio)) { From patchwork Fri Sep 15 10:59:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matteo Rizzo X-Patchwork-Id: 13386905 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2DCB6EE6456 for ; Fri, 15 Sep 2023 11:01:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234183AbjIOLBk (ORCPT ); Fri, 15 Sep 2023 07:01:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33510 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232796AbjIOLBf (ORCPT ); Fri, 15 Sep 2023 07:01:35 -0400 Received: from mail-ej1-x649.google.com (mail-ej1-x649.google.com [IPv6:2a00:1450:4864:20::649]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 407A12733 for ; Fri, 15 Sep 2023 04:00:11 -0700 (PDT) Received: by mail-ej1-x649.google.com with SMTP id a640c23a62f3a-9a9d7a801a3so143155166b.2 for ; Fri, 15 Sep 2023 04:00:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1694775609; x=1695380409; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=Xv8bl1gghcslEiMH0/FhgvPpwPOagI6elAD3nLRWX2k=; b=S2p93F7A0wqiuJDV2GRyCDi4nQ/GWf5rYiMWy293ffMZgTknAV9VtkFjanCFblpjeZ Vz8riOrDUxMru8/Xd+jpL+9x03lPexD3tetyF6nW+ZLbmerLobIp+mZKD17m9CNfUpsE MOqVfSqBBVEwpWiGN4oAFbfSgW2oDtnN6GHD4WjvkPcaimIqVdD4ytihCT2228jmRTAH 4jruLV64m/j0Dw4HwPpSfLEXH1kKv5/XByh5i9fjp1NAjcewXlYamwpJh3hR/KRt2IZJ WL363SQmgT7XUMl8Xni4TtoUSQoQYLnnhqDNxZLxAbv/ueK4lNbA92kVd7zi0iRu4Gm1 YX5g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694775609; x=1695380409; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Xv8bl1gghcslEiMH0/FhgvPpwPOagI6elAD3nLRWX2k=; b=L2NlU6U6Cb/xDqQoN3cuAPlE6G9zWzCHTRDADNYfbem5fERh31VPs2hrWwrRMB+2dT PF1rm6pcjpKjNsFQrB4ojSzGf+RdSAYmO18J/oclGJ5DzH2LApGaiRZhKeh1UMEIiSJZ QqforKoo1b0Sw1i/nXJRbY29iCPr4lVvPP//vQbCh26oVUJ9biWbmNoX0zzQoN/DT5kA hT4dnEEySEY2hzjS/Hbb7oClhvFbBICTGP1kdF1N3kmY+Pul7gXBKFdb1F1XiDkVEGqS XDaCwXKEV0vzDSppfI/36XgcdWgfqbgzPcwAHKf5OLlut/1UxgFByex38TC3dplIFmZO u1Ww== X-Gm-Message-State: AOJu0Yx6WUQCTHo8xFiQKPupCcUVh7qxeTISLdSX/YQbVwDWRWtijGSU QLBZ/aKrfbcZNvoINmx0IVJFcD+oA5ynjX31NQ== X-Google-Smtp-Source: AGHT+IEynAr2SnY1fMJiDuLoEg0/d2a8kjbFH3ONSC9J81629RSdKiVLHnan7ER1f/DyQsRkv3eYGMXRY5zwIxc8BQ== X-Received: from mr-cloudtop2.c.googlers.com ([fda3:e722:ac3:cc00:31:98fb:c0a8:2a6]) (user=matteorizzo job=sendgmr) by 2002:a17:907:71d7:b0:9ad:c478:586b with SMTP id zw23-20020a17090771d700b009adc478586bmr6329ejb.13.1694775609336; Fri, 15 Sep 2023 04:00:09 -0700 (PDT) Date: Fri, 15 Sep 2023 10:59:31 +0000 In-Reply-To: <20230915105933.495735-1-matteorizzo@google.com> Mime-Version: 1.0 References: <20230915105933.495735-1-matteorizzo@google.com> X-Mailer: git-send-email 2.42.0.459.ge4e396fd5e-goog Message-ID: <20230915105933.495735-13-matteorizzo@google.com> Subject: [RFC PATCH 12/14] mm/slub: introduce the deallocated_pages sysfs attribute From: Matteo Rizzo To: cl@linux.com, penberg@kernel.org, rientjes@google.com, iamjoonsoo.kim@lge.com, 
akpm@linux-foundation.org, vbabka@suse.cz, roman.gushchin@linux.dev, 42.hyeyoo@gmail.com, keescook@chromium.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-hardening@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com, corbet@lwn.net, luto@kernel.org, peterz@infradead.org Cc: jannh@google.com, matteorizzo@google.com, evn@google.com, poprdi@google.com, jordyzomer@google.com Precedence: bulk List-ID: X-Mailing-List: linux-hardening@vger.kernel.org From: Jann Horn When SLAB_VIRTUAL is enabled this new sysfs attribute tracks the number of slab pages whose physical memory has been reclaimed but whose virtual memory is still allocated to a kmem_cache. Signed-off-by: Jann Horn Co-developed-by: Matteo Rizzo Signed-off-by: Matteo Rizzo Reviewed-by: Kees Cook --- include/linux/slub_def.h | 4 +++- mm/slub.c | 18 ++++++++++++++++++ 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h index 693e9bb34edc..eea402d849da 100644 --- a/include/linux/slub_def.h +++ b/include/linux/slub_def.h @@ -88,7 +88,7 @@ struct kmem_cache_cpu { */ struct kmem_cache_virtual { #ifdef CONFIG_SLAB_VIRTUAL - /* Protects freed_slabs and freed_slabs_min */ + /* Protects freed_slabs, freed_slabs_min, and nr_free_pages */ spinlock_t freed_slabs_lock; /* * Slabs on this list have virtual memory of size oo allocated to them @@ -97,6 +97,8 @@ struct kmem_cache_virtual { struct list_head freed_slabs; /* Same as freed_slabs but with memory of size min */ struct list_head freed_slabs_min; + /* Number of slab pages which got freed */ + unsigned long nr_freed_pages; #endif }; diff --git a/mm/slub.c b/mm/slub.c index 66ae60cdadaf..0f7f5bf0b174 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -2110,6 +2110,8 @@ static struct slab *get_free_slab(struct kmem_cache *s, if (likely(slab)) { list_del(&slab->slab_list); + WRITE_ONCE(s->virtual.nr_freed_pages, + s->virtual.nr_freed_pages - (1UL << slab_order(slab))); spin_unlock_irqrestore(&s->virtual.freed_slabs_lock, flags); return slab; @@ -2158,6 +2160,8 @@ static struct slab *alloc_slab_page(struct kmem_cache *s, /* Rollback: put the struct slab back. 
*/ spin_lock_irqsave(&s->virtual.freed_slabs_lock, flags); list_add(&slab->slab_list, freed_slabs); + WRITE_ONCE(s->virtual.nr_freed_pages, + s->virtual.nr_freed_pages + (1UL << slab_order(slab))); spin_unlock_irqrestore(&s->virtual.freed_slabs_lock, flags); return NULL; @@ -2438,6 +2442,8 @@ static void slub_tlbflush_worker(struct kthread_work *work) WARN_ON(oo_order(slab->oo) != oo_order(s->min)); list_add(&slab->slab_list, &s->virtual.freed_slabs_min); } + WRITE_ONCE(s->virtual.nr_freed_pages, s->virtual.nr_freed_pages + + (1UL << slab_order(slab))); spin_unlock(&s->virtual.freed_slabs_lock); } spin_unlock_irqrestore(&slub_kworker_lock, irq_flags); @@ -4924,6 +4930,7 @@ static inline void slab_virtual_open(struct kmem_cache *s) spin_lock_init(&s->virtual.freed_slabs_lock); INIT_LIST_HEAD(&s->virtual.freed_slabs); INIT_LIST_HEAD(&s->virtual.freed_slabs_min); + s->virtual.nr_freed_pages = 0; #endif } @@ -6098,6 +6105,14 @@ static ssize_t objects_partial_show(struct kmem_cache *s, char *buf) } SLAB_ATTR_RO(objects_partial); +#ifdef CONFIG_SLAB_VIRTUAL +static ssize_t deallocated_pages_show(struct kmem_cache *s, char *buf) +{ + return sysfs_emit(buf, "%lu\n", READ_ONCE(s->virtual.nr_freed_pages)); +} +SLAB_ATTR_RO(deallocated_pages); +#endif /* CONFIG_SLAB_VIRTUAL */ + static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf) { int objects = 0; @@ -6424,6 +6439,9 @@ static struct attribute *slab_attrs[] = { &min_partial_attr.attr, &cpu_partial_attr.attr, &objects_partial_attr.attr, +#ifdef CONFIG_SLAB_VIRTUAL + &deallocated_pages_attr.attr, +#endif &partial_attr.attr, &cpu_slabs_attr.attr, &ctor_attr.attr, From patchwork Fri Sep 15 10:59:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matteo Rizzo X-Patchwork-Id: 13386906 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9DEC6EE6455 for ; Fri, 15 Sep 2023 11:01:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234305AbjIOLBl (ORCPT ); Fri, 15 Sep 2023 07:01:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33536 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234330AbjIOLBi (ORCPT ); Fri, 15 Sep 2023 07:01:38 -0400 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D42BD2D43 for ; Fri, 15 Sep 2023 04:00:12 -0700 (PDT) Received: by mail-yb1-xb49.google.com with SMTP id 3f1490d57ef6-d8191a1d5acso1962556276.1 for ; Fri, 15 Sep 2023 04:00:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1694775612; x=1695380412; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=C4pEChVXRU6sm004Zu1IJnbwIsY1yXPwkOtMip47RAc=; b=caER2+GTwQV0Eb9KjW9LgnO5MO+4bH08gvZvDYLwhRzAj6vxyz9Kx4oIWpM2gnGD2r iv1hfpc35TbryyXHFB3fNa4MQ9qx/OVRnmd5ENOT+vwdBYQ173BABw0om86QdNTmPAcV WlPYBxZpLu/EJTw1MVq2Nj5buHDgztP2ChREUy1a6eoQQT48nPtCneIDyxYu3CEv3VnV GdG/BYmTZyu9toFczoU4gQUY2H7pyZNWjZa2/QeCLrhz/mqxd+hwaHhkvpCi0vZ5HkQB JgQ7s0FDXRt2U08+93YHyKMUOAQEa8dT04K2Z7mHKYrRa0YtjndYwTXJz5JcCYbiRA/f 0asw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; 
t=1694775612; x=1695380412; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=C4pEChVXRU6sm004Zu1IJnbwIsY1yXPwkOtMip47RAc=; b=Q9amF84iGNSIx0v0lozPMkOShSozD6W4qRHsEnUNe0i+Ylsn14Zy37rNxddqMC3o5k 1B8RrpJGvKVQSAhQrCbwqKO8mvg55DfUCvfpS3oMObNSif1dyXYV2VyLP7WlVPokLjvI ldGqguG/p9OTpoaYLShg91H8aAB304JgWbhKADe/JslSCseujnEcFoQywJD4gI1JVKvz jD1ONbNJNRhXNnF13ZPh2fm78W2D613rreCeLBWH8AljcMptphw6JysFOJ78vfl+pVCH WXnGHNzpSIIjdY00aLzYx1fm6UEYQuuq0yd/NYePg9r7rTuAIExOuunv8V5+Xcw3fa9l /zFA== X-Gm-Message-State: AOJu0YzjCfmgeL2LpTIQGwFpNXSctMd19o66ZK/86zwjDTBiz1rQ2Q/H BKhMTYjoBVyzOB0GNEy6Ktyf65z79E1fhk75sw== X-Google-Smtp-Source: AGHT+IHt9z6QcITgpugvtEYL0lGwEogSA+C1s55ADVWGOnuIpG1JocMIlh+GBo6p7lHOr6nsErM2osN2OLhU91MuRQ== X-Received: from mr-cloudtop2.c.googlers.com ([fda3:e722:ac3:cc00:31:98fb:c0a8:2a6]) (user=matteorizzo job=sendgmr) by 2002:a05:6902:118a:b0:d80:183c:92b9 with SMTP id m10-20020a056902118a00b00d80183c92b9mr29633ybu.4.1694775611854; Fri, 15 Sep 2023 04:00:11 -0700 (PDT) Date: Fri, 15 Sep 2023 10:59:32 +0000 In-Reply-To: <20230915105933.495735-1-matteorizzo@google.com> Mime-Version: 1.0 References: <20230915105933.495735-1-matteorizzo@google.com> X-Mailer: git-send-email 2.42.0.459.ge4e396fd5e-goog Message-ID: <20230915105933.495735-14-matteorizzo@google.com> Subject: [RFC PATCH 13/14] mm/slub: sanity-check freepointers From: Matteo Rizzo To: cl@linux.com, penberg@kernel.org, rientjes@google.com, iamjoonsoo.kim@lge.com, akpm@linux-foundation.org, vbabka@suse.cz, roman.gushchin@linux.dev, 42.hyeyoo@gmail.com, keescook@chromium.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-hardening@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com, corbet@lwn.net, luto@kernel.org, peterz@infradead.org Cc: jannh@google.com, matteorizzo@google.com, evn@google.com, poprdi@google.com, jordyzomer@google.com Precedence: bulk List-ID: X-Mailing-List: linux-hardening@vger.kernel.org From: Jann Horn Sanity-check that: - non-NULL freepointers point into the slab - freepointers look plausibly aligned Signed-off-by: Jann Horn Co-developed-by: Matteo Rizzo Signed-off-by: Matteo Rizzo Reviewed-by: Kees Cook --- lib/slub_kunit.c | 4 ++++ mm/slab.h | 8 +++++++ mm/slub.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 69 insertions(+) diff --git a/lib/slub_kunit.c b/lib/slub_kunit.c index d4a3730b08fa..acf8600bd1fd 100644 --- a/lib/slub_kunit.c +++ b/lib/slub_kunit.c @@ -45,6 +45,10 @@ static void test_clobber_zone(struct kunit *test) #ifndef CONFIG_KASAN static void test_next_pointer(struct kunit *test) { + if (IS_ENABLED(CONFIG_SLAB_VIRTUAL)) + kunit_skip(test, + "incompatible with freepointer corruption detection in CONFIG_SLAB_VIRTUAL"); + struct kmem_cache *s = test_kmem_cache_create("TestSlub_next_ptr_free", 64, SLAB_POISON); u8 *p = kmem_cache_alloc(s, GFP_KERNEL); diff --git a/mm/slab.h b/mm/slab.h index 460c802924bd..8d10a011bdf0 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -79,6 +79,14 @@ struct slab { struct list_head flush_list_elem; + /* + * Not in kmem_cache because it depends on whether the allocation is + * normal order or fallback order. + * an alternative might be to over-allocate virtual memory for + * fallback-order pages. 
+	 */
+	unsigned long align_mask;
+
	/* Replaces the page lock */
	spinlock_t slab_lock;

diff --git a/mm/slub.c b/mm/slub.c
index 0f7f5bf0b174..57474c8a6569 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -392,6 +392,44 @@ static inline freeptr_t freelist_ptr_encode(const struct kmem_cache *s,
	return (freeptr_t){.v = encoded};
 }

+/*
+ * Does some validation of freelist pointers. Without SLAB_VIRTUAL this is
+ * currently a no-op.
+ */
+static inline bool freelist_pointer_corrupted(struct slab *slab, freeptr_t ptr,
+					      void *decoded)
+{
+#ifdef CONFIG_SLAB_VIRTUAL
+	/*
+	 * If the freepointer decodes to 0, use 0 as the slab_base so that
+	 * the check below always passes (0 & slab->align_mask == 0).
+	 */
+	unsigned long slab_base = decoded ? (unsigned long)slab_to_virt(slab)
+					  : 0;
+
+	/*
+	 * This verifies that the SLUB freepointer does not point outside the
+	 * slab. Since at that point we can basically do it for free, it also
+	 * checks that the pointer alignment looks vaguely sane.
+	 * However, we probably don't want the cost of a proper division here,
+	 * so instead we just do a cheap check whether the bottom bits that are
+	 * clear in the size are also clear in the pointer.
+	 * So for kmalloc-32, it does a perfect alignment check, but for
+	 * kmalloc-192, it just checks that the pointer is a multiple of 32.
+	 * This should probably be reconsidered - is this a good tradeoff, or
+	 * should that part be thrown out, or do we want a proper accurate
+	 * alignment check (and can we make it work with acceptable performance
+	 * cost compared to the security improvement - probably not)?
+	 */
+	return CHECK_DATA_CORRUPTION(
+		((unsigned long)decoded & slab->align_mask) != slab_base,
+		"bad freeptr (encoded %lx, ptr %p, base %lx, mask %lx)",
+		ptr.v, decoded, slab_base, slab->align_mask);
+#else
+	return false;
+#endif
+}
+
 static inline void *freelist_ptr_decode(const struct kmem_cache *s,
					 freeptr_t ptr, unsigned long ptr_addr,
					 struct slab *slab)
@@ -403,6 +441,10 @@ static inline void *freelist_ptr_decode(const struct kmem_cache *s,
 #else
	decoded = (void *)ptr.v;
 #endif
+
+	if (unlikely(freelist_pointer_corrupted(slab, ptr, decoded)))
+		return NULL;
+
	return decoded;
 }

@@ -2122,6 +2164,21 @@ static struct slab *get_free_slab(struct kmem_cache *s,
	if (slab == NULL)
		return NULL;

+	/*
+	 * Bits that must be equal to start-of-slab address for all
+	 * objects inside the slab.
+	 * For compatibility with pointer tagging (like in HWASAN), this would
+	 * need to clear the pointer tag bits from the mask.
+	 */
+	slab->align_mask = ~((PAGE_SIZE << oo_order(oo)) - 1);
+
+	/*
+	 * Object alignment bits (must be zero, which is equal to the bits in
+	 * the start-of-slab address)
+	 */
+	if (s->red_left_pad == 0)
+		slab->align_mask |= (1 << (ffs(s->size) - 1)) - 1;
+
	return slab;
 }
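For readers who want to see the plausibility check in isolation, here is a
small userspace model of the mask logic above. The page size, slab base
address, slab order, and object size are made-up values for illustration, and
the model ignores details such as red_left_pad handling; it is a sketch of the
idea, not the kernel code:

#include <stdio.h>

#define MODEL_PAGE_SIZE 4096UL

/*
 * Mask of the bits that must match the start-of-slab address: everything
 * above the slab size, plus the low bits that are clear in the object size.
 */
static unsigned long model_align_mask(unsigned long order, unsigned long size)
{
        unsigned long mask = ~((MODEL_PAGE_SIZE << order) - 1);

        mask |= (1UL << (__builtin_ffsl(size) - 1)) - 1;
        return mask;
}

/* Returns 1 if a decoded freepointer passes the cheap bounds/alignment check. */
static int model_freepointer_plausible(unsigned long slab_base,
                                       unsigned long mask, unsigned long decoded)
{
        /* NULL freepointers are accepted, as in the patch. */
        if (!decoded)
                return 1;
        return (decoded & mask) == slab_base;
}

int main(void)
{
        unsigned long base = 0xffffc90000400000UL;      /* assumed slab start */
        unsigned long mask = model_align_mask(0, 64);   /* order-0 slab, 64-byte objects */

        printf("%d\n", model_freepointer_plausible(base, mask, base + 64));   /* 1: in slab, aligned */
        printf("%d\n", model_freepointer_plausible(base, mask, base + 65));   /* 0: misaligned */
        printf("%d\n", model_freepointer_plausible(base, mask, base + 8192)); /* 0: outside the slab */
        printf("%d\n", model_freepointer_plausible(base, mask, 0));           /* 1: NULL is allowed */
        return 0;
}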
From patchwork Fri Sep 15 10:59:33 2023
Date: Fri, 15 Sep 2023 10:59:33 +0000
In-Reply-To: <20230915105933.495735-1-matteorizzo@google.com>
References: <20230915105933.495735-1-matteorizzo@google.com>
Message-ID: <20230915105933.495735-15-matteorizzo@google.com>
Subject: [RFC PATCH 14/14] security: add documentation for SLAB_VIRTUAL
From: Matteo Rizzo
To: cl@linux.com, penberg@kernel.org, rientjes@google.com, iamjoonsoo.kim@lge.com,
    akpm@linux-foundation.org, vbabka@suse.cz, roman.gushchin@linux.dev,
    42.hyeyoo@gmail.com, keescook@chromium.org, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-hardening@vger.kernel.org,
    tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
    x86@kernel.org, hpa@zytor.com, corbet@lwn.net, luto@kernel.org, peterz@infradead.org
Cc: jannh@google.com, matteorizzo@google.com, evn@google.com, poprdi@google.com,
    jordyzomer@google.com
X-Mailing-List: linux-hardening@vger.kernel.org

From: Jann Horn

Document what SLAB_VIRTUAL is trying to do, how it's implemented, and why.

Signed-off-by: Jann Horn
Co-developed-by: Matteo Rizzo
Signed-off-by: Matteo Rizzo
---
 Documentation/security/self-protection.rst | 102 +++++++++++++++++++++
 1 file changed, 102 insertions(+)

diff --git a/Documentation/security/self-protection.rst b/Documentation/security/self-protection.rst
index 910668e665cb..5a5e99e3f244 100644
--- a/Documentation/security/self-protection.rst
+++ b/Documentation/security/self-protection.rst
@@ -314,3 +314,105 @@
 To help kill classes of bugs that result in kernel addresses being written to
 userspace, the destination of writes needs to be tracked. If the buffer is
 destined for userspace (e.g. seq_file backed ``/proc`` files), it should
 automatically censor sensitive values.
+
+
+Memory Allocator Mitigations
+============================
+
+Protection against cross-cache attacks (SLAB_VIRTUAL)
+-----------------------------------------------------
+
+SLAB_VIRTUAL is a mitigation that deterministically prevents cross-cache
+attacks.
+
+Linux kernel use-after-free vulnerabilities are commonly exploited by turning
+them into an object type confusion (having two active pointers of different
+types to the same memory location) using one of the following techniques:
+
+1. Direct object reuse: make the kernel give the victim object back to the
+   slab allocator, then allocate the object again from the same slab cache as
+   a different type. This is only possible if the victim object resides in a
+   slab cache which can contain objects of different types - for example one
+   of the kmalloc caches.
+2. "Cross-cache attack": make the kernel give the victim object back to the
+   slab allocator, then make the slab allocator give the page containing the
+   object back to the page allocator, then either allocate the page directly
+   as some other type of page or make the slab allocator allocate it again for
+   a different slab cache and allocate an object from there.
+
+In either case, the important part is that the same virtual address is reused
+for two objects of different types.
+
+The first case can be addressed by separating objects of different types into
+different slab caches. If a slab cache only contains objects of the same type
+then directly turning a use-after-free into a type confusion is impossible as
+long as the slab page that contains the victim object remains assigned to that
+slab cache. This type of mitigation is easily bypassable by cross-cache
+attacks: if the attacker can make the slab allocator return the page containing
+the victim object to the page allocator and then make it use the same page for
+a different slab cache, type confusion becomes possible again.
+Addressing the first case is therefore only worthwhile if cross-cache attacks
+are also addressed. AUTOSLAB uses a combination of probabilistic mitigations
+for this. SLAB_VIRTUAL addresses the second case deterministically by changing
+the way the slab allocator allocates memory.
+
+Preventing slab virtual address reuse
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In theory there is an easy fix against cross-cache attacks: modify the slab
+allocator so that it never gives memory back to the page allocator. In
+practice this would be problematic because physical memory would remain
+permanently assigned to a slab cache even if it no longer contains any active
+objects. A viable cross-cache mitigation must allow the system to reclaim
+unused physical memory. In the current design of the slab allocator there is
+no way to keep a region of virtual memory permanently assigned to a slab cache
+without also permanently reserving physical memory, because the virtual
+addresses that the slab allocator uses come from the linear map region, where
+there is a 1:1 correspondence between virtual and physical addresses.
+
+SLAB_VIRTUAL's solution is to create a dedicated virtual memory region that is
+only used for slab memory, and to enforce that once a range of virtual
+addresses is used for a slab cache, it is never reused for any other cache.
+Using a dedicated region of virtual memory lets us reserve ranges of virtual
+addresses to prevent cross-cache attacks and at the same time release physical
+memory back to the system when it's no longer needed. This is what Chromium's
+PartitionAlloc does in userspace
+(https://chromium.googlesource.com/chromium/src/+/354da2514b31df2aa14291199a567e10a7671621/base/allocator/partition_allocator/PartitionAlloc.md).
+
+Implementation
+~~~~~~~~~~~~~~
+
+SLAB_VIRTUAL reserves a region of virtual memory for the slab allocator. All
+pointers returned by the slab allocator point into this region. The region is
+statically partitioned into two sub-regions: the metadata region and the data
+region. The data region is where the actual objects are allocated from. The
+metadata region is an array of struct slab objects, one for each PAGE_SIZE
+bytes in the data region.
+
+Without SLAB_VIRTUAL, struct slab is overlaid on top of the struct page/struct
+folio that corresponds to the physical memory page backing the slab, instead
+of using a dedicated memory region. This doesn't work for SLAB_VIRTUAL, which
+needs to store metadata for slabs even when no physical memory is allocated to
+them. Having an array of struct slab lets us implement virt_to_slab
+efficiently, purely with arithmetic. In order to support high-order slabs, the
+struct slabs corresponding to tail pages contain a pointer to the head slab,
+which corresponds to the slab's head page.
+
+TLB flushing
+~~~~~~~~~~~~
+
+Before it can release a page of physical memory back to the page allocator,
+the slab allocator must flush the TLB entries for that page on all CPUs. This
+is not only necessary for the mitigation to work reliably, it is also required
+for correctness: without a TLB flush, some CPUs might keep using the old
+mapping if the virtual address range is reused for a new slab, which would
+cause memory corruption even in the absence of other bugs. However, the slab
+allocator can release pages in contexts where TLB flushes can't be performed
+(e.g. in hardware interrupt handlers). Pages to free are therefore not freed
+directly; instead they are put on a queue and freed from a workqueue context
+which also flushes the TLB.
+
+Performance
+~~~~~~~~~~~
+
+SLAB_VIRTUAL's performance impact depends on the workload. On kernel
+compilation (kernbench) the slowdown is about 1-2% depending on the machine
+type and is slightly worse on machines with more cores.
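To make the metadata-array layout described in the Implementation section
concrete, here is a small userspace model of a purely arithmetic virt_to_slab
lookup. The region sizes, struct layout, and all names are assumptions made for
this sketch; they are not the kernel's actual data structures or constants:

#include <stdint.h>
#include <stdio.h>

#define MODEL_PAGE_SIZE 4096UL
#define MODEL_NR_PAGES  64UL

/* Stand-in for per-page slab metadata; the real struct slab holds much more. */
struct model_slab {
        struct model_slab *head;        /* tail entries of a high-order slab point here */
        unsigned long inuse;
};

/* Models of the two statically partitioned sub-regions. */
static unsigned char model_data[MODEL_NR_PAGES * MODEL_PAGE_SIZE];     /* data region */
static struct model_slab model_meta[MODEL_NR_PAGES];                   /* metadata region */

static struct model_slab *model_virt_to_slab(const void *addr)
{
        /* One metadata entry per PAGE_SIZE bytes of data region: index by page. */
        size_t index = ((uintptr_t)addr - (uintptr_t)model_data) / MODEL_PAGE_SIZE;
        struct model_slab *slab = &model_meta[index];

        /* For a high-order slab, a tail entry forwards to its head slab. */
        return slab->head ? slab->head : slab;
}

int main(void)
{
        /* An object 5 pages into the data region, at offset 0x40 within its page. */
        void *obj = model_data + 5 * MODEL_PAGE_SIZE + 0x40;

        printf("metadata index: %zu\n",
               (size_t)(model_virt_to_slab(obj) - model_meta));         /* prints 5 */
        return 0;
}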