From patchwork Fri Aug 18 06:05:23 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chris Li
X-Patchwork-Id: 13357381
From: Chris Li
Date: Thu, 17 Aug 2023 23:05:23 -0700
Subject: [PATCH RFC 1/2] mm/page_alloc: safeguard free_pcppages_bulk
Message-Id: <20230817-free_pcppages_bulk-v1-1-c14574a9f80c@kernel.org>
References: <20230817-free_pcppages_bulk-v1-0-c14574a9f80c@kernel.org>
In-Reply-To: <20230817-free_pcppages_bulk-v1-0-c14574a9f80c@kernel.org>
To: Andrew Morton, Kemeng Shi
Cc: akpm@linux-foundation.org, baolin.wang@linux.alibaba.com,
 mgorman@techsingularity.net, Michal Hocko, david@redhat.com,
 willy@infradead.org, linux-mm@kvack.org, Namhyung Kim, Greg Thelen,
 linux-kernel@vger.kernel.org, Chris Li, John Sperbeck
X-Mailer: b4 0.12.2

The current free_pcppages_bulk() can panic when pcp->count is changed
outside of this function by a BPF program injected at an ftrace function
entry. Commit c66a36af7ba3a628 fixed this on the BPF program side by not
allocating memory inside the spinlock, but the kernel can still panic
when loading a similar BPF program that lacks the fix.

Here are the steps to reproduce it:

  $ git checkout 19030564ab116757e32
  $ cd tools/perf
  $ make perf
  $ ./perf lock con -ab -- ./perf bench sched messaging

You should be able to see the kernel panic within 20 seconds.

Here is what happened in the panic:

  count = min(pcp->count, count);

free_pcppages_bulk() assumes that count and pcp->count are in sync and
that pcp->count does not change outside of this function. That
assumption breaks when the BPF lock contention code allocates memory
inside the spinlock: pcp->count ends up one less than "count". The loop
only checks against "count" and runs into an endless loop because
pcp->count drops to zero and all lists are empty. In the endless loop,
pindex_min can grow bigger than pindex_max and pindex_max can drop below
zero. The kernel panics when pindex is used to access outside of the
pcp->lists range.

Note that this is just one of the (buggy) BPF programs that can break
it. Besides the spinlock, there are other function tracepoints under
this function that can be hooked up to a BPF program which can allocate
memory and change pcp->count.
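
The index runaway described above can be reproduced in a small userspace
model. This is only a sketch, not the kernel code: NR_LISTS, struct
scan_result and scan_lists() are hypothetical names, and an int array of
page counts stands in for pcp->lists. The model assumes the round-robin
scan advances pindex, wraps at the upper bound, and shrinks the window
whenever an edge list is empty. Unlike the kernel loop, it stops as soon
as pindex leaves the valid range instead of dereferencing it.

```c
#include <stdbool.h>

#define NR_LISTS 4 /* hypothetical; stands in for the size of pcp->lists */

/* Outcome of one scan for the next non-empty list. */
struct scan_result {
	bool found;     /* true if a non-empty list was found */
	int pindex;     /* last index examined */
	int min_pindex; /* lower bound of the scan window after the scan */
	int max_pindex; /* upper bound of the scan window after the scan */
};

/*
 * Model of the round-robin scan.  pages[i] is the number of pages on
 * list i.  The only normal exit is finding a non-empty list; when all
 * lists are empty the window keeps shrinking until pindex falls outside
 * [0, NR_LISTS), which is where this model gives up.
 */
static struct scan_result scan_lists(const int pages[NR_LISTS], int pindex)
{
	struct scan_result r = {
		.found = false,
		.pindex = pindex,
		.min_pindex = 0,
		.max_pindex = NR_LISTS - 1,
	};

	for (;;) {
		if (++r.pindex > r.max_pindex)
			r.pindex = r.min_pindex;

		/* An unguarded loop would index the lists out of range here. */
		if (r.pindex < 0 || r.pindex >= NR_LISTS)
			return r;

		if (pages[r.pindex] > 0) {
			r.found = true;
			return r;
		}

		/* Empty edge list: shrink the window. */
		if (r.pindex == r.max_pindex)
			r.max_pindex--;
		if (r.pindex == r.min_pindex)
			r.min_pindex++;
	}
}
```

Calling scan_lists() with four empty lists and a starting pindex of -1
ends with found == false, pindex == 4, min_pindex == 4 and
max_pindex == 2, so the lower bound has passed the upper bound and the
index is already outside the array. With a page on list 2, the same call
returns found == true and pindex == 2.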

One argument is that BPF should not allocate memory under the spinlock.
On the other hand, the kernel can just check pcp->count inside the loop
to avoid the kernel panic.

Signed-off-by: Chris Li
Reported-by: John Sperbeck
---
 mm/page_alloc.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1eb3864e1dbc7..347cb93081a02 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1215,12 +1215,6 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 	bool isolated_pageblocks;
 	struct page *page;
 
-	/*
-	 * Ensure proper count is passed which otherwise would stuck in the
-	 * below while (list_empty(list)) loop.
-	 */
-	count = min(pcp->count, count);
-
 	/* Ensure requested pindex is drained first. */
 	pindex = pindex - 1;
 
@@ -1266,7 +1260,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 
 			__free_one_page(page, page_to_pfn(page), zone, order, mt, FPI_NONE);
 			trace_mm_page_pcpu_drain(page, order, mt);
-		} while (count > 0 && !list_empty(list));
+		} while (count > 0 && pcp->count > 0 && !list_empty(list));
 	}
 
 	spin_unlock_irqrestore(&zone->lock, flags);