[1/1] linux-5.10/rcu/kvfree: Avoid freeing new kfree_rcu() memory after old grace period

From: Ziwei Dai <ziwei.dai@unisoc.com>

commit 5da7cb193db32da783a3f3e77d8b639989321d48 upstream.

Memory passed to kvfree_rcu() that is to be freed is tracked by a
per-CPU kfree_rcu_cpu structure, which in turn contains pointers
to kvfree_rcu_bulk_data structures that contain pointers to memory
that has not yet been handed to RCU, along with a kfree_rcu_cpu_work
structure that tracks the memory that has already been handed to RCU.
These structures track three categories of memory: (1) Memory for
kfree(), (2) Memory for kvfree(), and (3) Memory for both that arrived
during an OOM episode.  The first two categories are tracked in a
cache-friendly manner involving a dynamically allocated page of pointers
(the aforementioned kvfree_rcu_bulk_data structures), while the third
uses a simple (but decidedly cache-unfriendly) linked list through the
rcu_head structures in each block of memory.
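
The bookkeeping described above can be sketched as a simplified,
single-threaded user-space C model. Every model_* name, the tiny
MODEL_BULK_CAP, and the single work structure per CPU are hypothetical
simplifications for illustration; the real definitions in
kernel/rcu/tree.c differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; not the kernel's definitions. */
enum { CAT_KFREE = 0, CAT_KVFREE = 1, N_BULK_CATS = 2 };

#define MODEL_BULK_CAP 4 /* a real bulk block is a whole page of pointers */

/* Stand-in for kvfree_rcu_bulk_data: a block of pointers (categories 1, 2). */
struct model_bulk_data {
	size_t nr_records;
	void *records[MODEL_BULK_CAP];
};

/* Stand-in for the rcu_head embedded in each object (category 3). */
struct model_rcu_head {
	struct model_rcu_head *next;
};

/* Stand-in for kfree_rcu_cpu_work: memory already handed to RCU. */
struct model_krwp {
	struct model_bulk_data *bulk_head_free[N_BULK_CATS];
	struct model_rcu_head *head_free;
};

/* Stand-in for kfree_rcu_cpu: memory not yet handed to RCU. */
struct model_krcp {
	struct model_bulk_data *bulk_head[N_BULK_CATS];
	struct model_rcu_head *head;
	struct model_krwp krw;
};

/* Categories 1 and 2: store the pointer in the cache-friendly block.
 * Returns false if there is no block or it is full; the caller would
 * then need a new block, or fall back to category 3 under OOM. */
static bool model_add_bulk(struct model_krcp *krcp, int cat, void *ptr)
{
	struct model_bulk_data *b;

	assert(cat == CAT_KFREE || cat == CAT_KVFREE);
	b = krcp->bulk_head[cat];
	if (!b || b->nr_records == MODEL_BULK_CAP)
		return false;
	b->records[b->nr_records++] = ptr;
	return true;
}

/* Category 3: link the object through its own embedded rcu_head. */
static void model_add_head(struct model_krcp *krcp, struct model_rcu_head *h)
{
	h->next = krcp->head;
	krcp->head = h;
}
```

The split mirrors the text: categories 1 and 2 append to a contiguous
array, while category 3 needs no extra allocation because the list is
threaded through the objects themselves.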

On a given CPU, these three categories are handled as a unit, with that
CPU's kfree_rcu_cpu_work structure having one pointer for each of the
three categories.  Clearly, new memory for a given category cannot be
placed in the corresponding kfree_rcu_cpu_work structure until any old
memory has had its grace period elapse and thus has been removed.  And
the kfree_rcu_monitor() function does in fact check for this.

Except that the kfree_rcu_monitor() function checks these pointers one
at a time.  This means that if the previous kfree_rcu() memory passed
to RCU had only category 1 and the current one has only category 2, the
kfree_rcu_monitor() function will send that current category-2 memory
along immediately.  This can result in memory being freed too soon,
that is, out from under unsuspecting RCU readers.
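
The failure mode can be reproduced in a small single-threaded user-space
model. This is a sketch under stated assumptions, not kernel code: the
model_* names are hypothetical, each category slot holds one object,
grace periods are simulated by an explicit model_gp_end() call, and
work_pending stands in for the pending state that queue_rcu_work()
checks. An object counts as "covered" only if the grace-period request
was issued after it was handed over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model of the pre-fix behaviour. */
#define NCAT 3 /* kfree, kvfree, rcu_head list (OOM) */

struct model_obj {
	bool covered; /* a GP was requested after this object was handed over */
	bool freed;
};

struct model_krwp {
	struct model_obj *slot_free[NCAT]; /* already handed to RCU */
	bool work_pending;                 /* GP requested, work not yet run */
};

struct model_krcp {
	struct model_obj *slot[NCAT];      /* not yet handed to RCU */
	struct model_krwp krw;
};

/* Like queue_rcu_work(): request a new GP only if nothing is pending. */
static bool model_queue_rcu_work(struct model_krwp *krwp)
{
	if (krwp->work_pending)
		return false; /* reuses the GP requested earlier */
	krwp->work_pending = true;
	for (int i = 0; i < NCAT; i++)
		if (krwp->slot_free[i])
			krwp->slot_free[i]->covered = true;
	return true;
}

/* Pre-fix monitor: each category's slot is checked on its own. */
static void model_monitor_buggy(struct model_krcp *krcp)
{
	struct model_krwp *krwp = &krcp->krw;
	bool moved = false;

	for (int i = 0; i < NCAT; i++) {
		if (krcp->slot[i] && !krwp->slot_free[i]) {
			krwp->slot_free[i] = krcp->slot[i];
			krcp->slot[i] = NULL;
			moved = true;
		}
	}
	if (moved)
		model_queue_rcu_work(krwp);
}

/* The requested GP ends and the work frees every slot of krwp.
 * Returns how many objects were freed without a full GP. */
static int model_gp_end(struct model_krwp *krwp)
{
	int premature = 0;

	if (!krwp->work_pending)
		return 0;
	for (int i = 0; i < NCAT; i++) {
		struct model_obj *o = krwp->slot_free[i];

		if (!o)
			continue;
		o->freed = true;
		if (!o->covered)
			premature++;
		krwp->slot_free[i] = NULL;
	}
	krwp->work_pending = false;
	return premature;
}
```

With a category-1 object in flight, a category-2 object lands in the
empty kvfree slot, but no new grace period is requested, so the next
model_gp_end() frees it early and reports one premature free.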

To see this, consider the following sequence of events, in which:

o	Task A on CPU 0 calls rcu_read_lock(), then uses "from_cset",
	then is preempted.

o	CPU 1 calls kfree_rcu(cset, rcu_head) in order to free "from_cset"
	after a later grace period.  Except that "from_cset" is freed
	right after the previous grace period ends rather than after a
	new one, that is, almost immediately.  Task A then resumes and
	dereferences a member of "from_cset", after which nothing good
	happens.

In full detail:

CPU 0					CPU 1
----------------------			----------------------
count_memcg_event_mm()
|rcu_read_lock()  <---
|mem_cgroup_from_task()
 |// css_set_ptr is the "from_cset" mentioned on CPU 1
 |css_set_ptr = rcu_dereference((task)->cgroups)
 |// A hard irq arrives and the current task is scheduled out.

					cgroup_attach_task()
					|cgroup_migrate()
					|cgroup_migrate_execute()
					|css_set_move_task(task, from_cset, to_cset, true)
					|cgroup_move_task(task, to_cset)
					|rcu_assign_pointer(.., to_cset)
					|...
					|cgroup_migrate_finish()
					|put_css_set_locked(from_cset)
					|from_cset->refcount drops to 0
					|kfree_rcu(cset, rcu_head) // free from_cset after new gp
					|add_ptr_to_bulk_krc_lock()
					|schedule_delayed_work(&krcp->monitor_work, ..)

					kfree_rcu_monitor()
					|krcp->bulk_head[0]'s work attached to krwp->bulk_head_free[]
					|queue_rcu_work(system_wq, &krwp->rcu_work)
					|if rwork->rcu.work is not in WORK_STRUCT_PENDING_BIT state,
					|call_rcu(&rwork->rcu, rcu_work_rcufn) <--- request new gp

					// A previous call_rcu(.., rcu_work_rcufn) is still pending;
					// its gp ends and rcu_work_rcufn() is called.
					rcu_work_rcufn()
					|__queue_work(.., rwork->wq, &rwork->work);

					|kfree_rcu_work()
					|krwp->bulk_head_free[0] bulk is freed before the new gp ends!
					|The "from_cset" is freed before the new gp ends.

// the task resumes some time later.
 |css_set_ptr->subsys[(subsys_id)] <--- Causes a kernel crash, because css_set_ptr has been freed.
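
The crux of the diagram is that queue_rcu_work() issues call_rcu() only
when it sets WORK_STRUCT_PENDING_BIT itself; if the bit is already set,
it returns false and the work rides on the grace period that was
requested earlier. The user-space sketch below models only that
decision with a C11 atomic_flag; model_rwork, model_queue_rcu_work(),
model_work_starts() and the gp_requests counter are hypothetical
stand-ins, not the workqueue implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct rcu_work: only the pending bit. */
struct model_rwork {
	atomic_flag pending; /* models WORK_STRUCT_PENDING_BIT */
	int gp_requests;     /* how many times call_rcu() would be issued */
};

/* Models queue_rcu_work(): request a GP only on a clear -> set transition. */
static bool model_queue_rcu_work(struct model_rwork *rw)
{
	if (atomic_flag_test_and_set(&rw->pending))
		return false;     /* already pending: no new grace period */
	rw->gp_requests++;    /* stands in for call_rcu(&rwork->rcu, ...) */
	return true;
}

/* Models the work function starting after the GP: pending is cleared. */
static void model_work_starts(struct model_rwork *rw)
{
	atomic_flag_clear(&rw->pending);
}
```

The first call returns true and requests a grace period; a second call
made before the work starts returns false, which is the path on which
freshly attached memory inherits the older grace-period request.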

This commit therefore causes kfree_rcu_monitor() to refrain from moving
kfree_rcu() memory to the kfree_rcu_cpu_work structure until the RCU
grace period has completed for all three categories.
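
The fixed rule can be sketched as a simplified single-threaded
user-space model: the work structure is treated as busy while any of
its three slots is still occupied, and new memory stays where it is
until a later monitor pass. This loosely mirrors the helper mentioned
in the v2 note below, but model_krwp_busy(), model_monitor_fixed() and
the other model_* names are hypothetical, and the real backport
(including need_offload_krc()) differs in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model of the post-fix rule. */
#define NCAT 3 /* kfree, kvfree, rcu_head list (OOM) */

struct model_krwp {
	void *slot_free[NCAT]; /* memory already handed to RCU */
};

struct model_krcp {
	void *slot[NCAT];      /* memory not yet handed to RCU */
	struct model_krwp krw;
};

/* The work structure may accept new memory only when all slots are empty,
 * that is, when the previous grace period has ended for every category. */
static bool model_krwp_busy(const struct model_krwp *krwp)
{
	for (int i = 0; i < NCAT; i++)
		if (krwp->slot_free[i])
			return true;
	return false;
}

/* Post-fix monitor: all-or-nothing hand-over. Returns true if memory was
 * handed to RCU (a fresh GP would then be requested); false means
 * "retry on a later monitor pass". */
static bool model_monitor_fixed(struct model_krcp *krcp)
{
	struct model_krwp *krwp = &krcp->krw;
	bool moved = false;

	if (model_krwp_busy(krwp))
		return false;
	for (int i = 0; i < NCAT; i++) {
		if (krcp->slot[i]) {
			krwp->slot_free[i] = krcp->slot[i];
			krcp->slot[i] = NULL;
			moved = true;
		}
	}
	return moved;
}
```

After a category-1 hand-over, a category-2 object stays queued until the
earlier slot drains, so it can only be covered by a grace period that is
requested after it is handed over.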

v2: Use a helper function instead of a code block inserted into kfree_rcu_monitor().

[UR: backport to 5.10-stable]
[UR: Added missing need_offload_krc() function]
Fixes: 34c881745549 ("rcu: Support kfree_bulk() interface in kfree_rcu()")
Fixes: 5f3c8d620447 ("rcu/tree: Maintain separate array for vmalloc ptrs")
Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Ziwei Dai <ziwei.dai@unisoc.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Tested-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/tree.c | 49 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 35 insertions(+), 14 deletions(-)

Message-ID: <20230418102518.5911-1-urezki@gmail.com>
Date: Tue, 18 Apr 2023 12:25:16 +0200