From patchwork Tue Dec 10 16:40:31 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Uladzislau Rezki X-Patchwork-Id: 13901757 Received: from mail-lf1-f49.google.com (mail-lf1-f49.google.com [209.85.167.49]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9EC061D5171; Tue, 10 Dec 2024 16:40:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.167.49 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733848842; cv=none; b=UPEMvYmsKBuwEDAiaNMPdhca/nhBpYteZJypJ5dHACwZ7g3AyA3ZTS1Km2IdWmFGMp7yBxvdwnjLPnCEcnEkrBKLvxI4N6cxbyn/K4HIzUuKtsB/DTeUNe3QIIEp+grJlBgtv8vGDghrv3kNrTt0xC0Q41UIolPuQJx6k7b8Uj8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733848842; c=relaxed/simple; bh=Urh+WmY3GFT00nqYOZ92UBETC1FBOevOJk0kXUGPxaw=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=WXKL+qPuOW1kpEGFVas1py3c2ElUotcSbDcFpToNtg/gQCjOIMpo4yuJ7atG5PwzHHrLdpImg5EKz8dQ1wZwYUPrsiAkt/np9AR1iQrvMnqs0AWszy/0fl5EpfUaELRygr+fYC7XFxJwfLiny/dZCAF3fc3pFE6XZ+U9L82jnCg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=IlNH9x96; arc=none smtp.client-ip=209.85.167.49 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="IlNH9x96" Received: by mail-lf1-f49.google.com with SMTP id 2adb3069b0e04-54021daa6cbso1951566e87.0; Tue, 10 Dec 2024 08:40:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1733848839; x=1734453639; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=UdPE5CaBHMhT6cavsDrFmlg9kwPywztBv9Mfz0gYFxI=; b=IlNH9x96sjb4g+9ESbXCJjr21rfg4/nXliJ072RuqkYn36sDUjt0d2k2LLPRfVCzY1 MFXaBvN7T/dmv76lnfuU9VYBZbsqHQ+L3mA/ZO9TDgEGDZEiY8Vxix2zQtK5C7T0d5eM odwi8z1ykLUnUZgvh2FRnXs8XgNGP877K80kB3RVMUm41dTOH0yao+40sMl6xe2t2V25 P4sKzDONRc6ePQ0Ro2qup8MKrzfH85l9hejS/xyvbAZqHDwBc2oF7RslhUd+dWVBFHUk KI5kYRg20lECWx1muQ0UFFlLk58zrlpCD9X6oozWwhQVJgeILX9SmbYcwzV6XqtV7Qpx /75g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1733848839; x=1734453639; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=UdPE5CaBHMhT6cavsDrFmlg9kwPywztBv9Mfz0gYFxI=; b=oS0CButkkTXbV/nW0XREZDhCOTPi6ddNLgsFzBNCU/qxPDFsaK+F7J3egwwl7uFk4/ n3GSdmItxQRlpwpLe99lPnEK60P5Mq5nCJwsKdtKXumJiwy6H/NrenHLVUO4/f+8eWV2 ElPb1FGe0x/as/nljsaEGAO1ZEEE/aln+JPP7Lnzg4NgXYp1ryTgZFVOw0IIvO0njutf Tun6edr+6Qy1dG0typpD99qUZmJNxiUhVnGDT1QAi9/R4oiIe1OfLqcbd6sEwLbcnLlE uKZeim/ct7F4umEkmlvqvvsNNrIAVULiiKhuyNop7S553qkujR8aIvndWtr99xmLwNbX 1ycA== X-Forwarded-Encrypted: i=1; AJvYcCW23tDoAM2JVFOE4+yu9Jz6hPbG68CON3ZwrafsdZZ0QsT7XO4oJd92pMJ5M9KKHxJvXIhD0CD7czUoBpw=@vger.kernel.org X-Gm-Message-State: AOJu0Yy7K4wYC0JwYi9G5a4sWPOViMLywKt2SVTTHUGK5179MtniSHMW V+X7mtz5DUpJe4Af/k0H8Sa6NPyH2JsQJ5/B239qz2VIun7cdwu2 X-Gm-Gg: ASbGncs6MhS3ECf66QQVA4dPqTFtSp3iFV7mnI+JohTzLNQmZA+kECxUARIAPKVFaPs rzRaZbKze7g7NCSvOFUn4hDTD/1Krw6KRPG/55RBkDPzseHtPOqTjrymlOUivoGA5ivDDZlvf1j VCQaMEZFDyEs23AmCGffatVXsMXwysbyeyj2CSXxgRX6UV5bp1N3p5IOubGzhwt8OeeM4UxicVW n5ifhqIpx41yavUCRutsdi6peFofAH1N7TkH8zVUOW6IXe2lg== X-Google-Smtp-Source: AGHT+IFXg03XavY3zbgVuUO+42mxcr0qW2R92VYVB7iUi/Qw3cgWDDrElu+xTCGcLsSdSOB6pE9uDg== X-Received: by 2002:a05:6512:15a2:b0:540:1abe:d6d2 with SMTP id 2adb3069b0e04-54024107498mr1918094e87.35.1733848838529; Tue, 10 Dec 2024 08:40:38 -0800 (PST) Received: from pc638.lan ([2001:9b1:d5a0:a500:2d8:61ff:fec9:d743]) by smtp.gmail.com with ESMTPSA id 2adb3069b0e04-53f93377eefsm1031875e87.67.2024.12.10.08.40.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 10 Dec 2024 08:40:37 -0800 (PST) From: "Uladzislau Rezki (Sony)" To: linux-mm@kvack.org, Andrew Morton , Vlastimil Babka Cc: RCU , LKML , Uladzislau Rezki , Oleksiy Avramchenko Subject: [RFC v1 1/5] rcu/kvfree: Temporary reclaim over call_rcu() Date: Tue, 10 Dec 2024 17:40:31 +0100 Message-Id: <20241210164035.3391747-2-urezki@gmail.com> X-Mailer: git-send-email 2.39.5 In-Reply-To: <20241210164035.3391747-1-urezki@gmail.com> References: <20241210164035.3391747-1-urezki@gmail.com> Precedence: bulk X-Mailing-List: rcu@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 This is to start a smooth process of moving a main functionality to the SLAB. Therefore this patch: - adds a support(temporary) to reclaim freed objects over call_rcu(); - disconnects a main functionality of kvfree_rcu() API by using call_rcu(); - directly reclaims an object for a single-argument variant; - adds an rcu_barrier() call to the kvfree_rcu_barrier(). Signed-off-by: Uladzislau Rezki (Sony) --- kernel/rcu/tree.c | 27 +++++++++++++++++++++++---- 1 file changed, 23 insertions(+), 4 deletions(-) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index b1f883fcd918..ab24229dfa73 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -2559,13 +2559,19 @@ static void rcu_do_batch(struct rcu_data *rdp) debug_rcu_head_unqueue(rhp); rcu_lock_acquire(&rcu_callback_map); - trace_rcu_invoke_callback(rcu_state.name, rhp); f = rhp->func; - debug_rcu_head_callback(rhp); - WRITE_ONCE(rhp->func, (rcu_callback_t)0L); - f(rhp); + /* This is temporary, it will be removed when migration is over. */ + if (__is_kvfree_rcu_offset((unsigned long) f)) { + trace_rcu_invoke_kvfree_callback("", rhp, (unsigned long) f); + kvfree((void *) rhp - (unsigned long) f); + } else { + trace_rcu_invoke_callback(rcu_state.name, rhp); + debug_rcu_head_callback(rhp); + WRITE_ONCE(rhp->func, (rcu_callback_t)0L); + f(rhp); + } rcu_lock_release(&rcu_callback_map); /* @@ -3787,6 +3793,16 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr) struct kfree_rcu_cpu *krcp; bool success; + if (head) { + call_rcu(head, (rcu_callback_t) ((void *) head - ptr)); + } else { + synchronize_rcu(); + kvfree(ptr); + } + + /* Disconnect the rest. */ + return; + /* * Please note there is a limitation for the head-less * variant, that is why there is a clear rule for such @@ -3871,6 +3887,9 @@ void kvfree_rcu_barrier(void) bool queued; int i, cpu; + /* Temporary. */ + rcu_barrier(); + /* * Firstly we detach objects and queue them over an RCU-batch * for all CPUs. Finally queued works are flushed for each CPU. From patchwork Tue Dec 10 16:40:32 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Uladzislau Rezki X-Patchwork-Id: 13901758 Received: from mail-lf1-f50.google.com (mail-lf1-f50.google.com [209.85.167.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BF8792153DC; Tue, 10 Dec 2024 16:40:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.167.50 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733848843; cv=none; b=UCO5zYa6IHN6nzbcHDYv6pnFny+r6ZNpg4mmqvKn7JEnbqzRysgU3Gn4bzkh1ZYMOGaqK4x2lx4mzePxS2i/UrIRKWhBnMpWfX+mJajpkJsyIj4HSGup1lVmDviaxvO3cZwpajOgqChcQsdte8T2lA+p4lBh3J2sHOCwQXOsgsg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733848843; c=relaxed/simple; bh=96C4Ff7noTHvUfYlu51M40HOaWBfgD2burIr67S83dc=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=nxh1kbm44sLlkd6zsZfEJGW6SGWVhJM+4WaGizX0IXfzko0Tdma23GGp4ioXcua3lr8jO4lMEfq+SL6vXvCP8n9mek38HHfN+t03JPzwtASFq55+jPypCjqcaSaIntmtne4dyaH69ZPSouCCZbvPoOvoIv23LBYur7p7mBQpanY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=QixbVDay; arc=none smtp.client-ip=209.85.167.50 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="QixbVDay" Received: by mail-lf1-f50.google.com with SMTP id 2adb3069b0e04-540215984f0so2069953e87.1; Tue, 10 Dec 2024 08:40:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1733848840; x=1734453640; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=ClETbVzs2/eIlopkwOYmZeoS18KY5TfMVLeWid2GulI=; b=QixbVDay9+thajhGScGmLN64oUI/s1BoxkH/O+lIyz8sAn6b0uLq8xnWb/BqlPSjiM knyjMzHQfv0uDIySRbmj9SBwCysrnrbLHCfoxPSO3QdMJhNJUSFA/iYhI5ny6TqOFPfG +QPhdG74py5+/ZVwYuotYCA9+pe4YMfvGbAbNzu4MS2dIJJQn3lKT1Cizm3nKjKtJCZz xfe0QUb8YTQPfZPDB3hzWggoq2n+S/zs+IMfar/MIvn6jOXQzhVb/N2+0X/iNpVbhyoK eOKlpKIZeKWSXZ461TAxLEL8/j+etKCk3uxpFqWqlBG/O4IMNVqBD3U0HrQfyYxtcRIf BBfg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1733848840; x=1734453640; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ClETbVzs2/eIlopkwOYmZeoS18KY5TfMVLeWid2GulI=; b=W1x2dlII2dC6KrUbgDxtygyN1leK89GwWqMyvi7SogZq30tbZxygih51L2JDJndANA 0PKAzloeLgbMB1GbONez5m56uiMglj6jgH9qlTxKI0jyPaVjqjTes2x3wWn6no9TV08X S3cTPPVtBZMXg2ny2XwkXHLkGcf9kBYilQEIn1T0WCaM2BT0/thkwgeKE/KI7GGcoIUr 3X/rkopA6KSglJZAvNsnIkFrPD30DB7zslZ/tDPNv8fYHrl994kyUuWgqyGJAEYxTZe1 VpbykuD7l+HSeoANYTX0Wekh5r3Klr2fEH5oS69Jq4w7EilGPx766+xcVFxSZ7Mc1m5O 2D9w== X-Forwarded-Encrypted: i=1; AJvYcCUFcltvoMJb4ZP5wCcChQspZxo8tCjIaF4cLZTn/spxqc3MxPP0jBP0tITPON6+8eIXJPmHFy7PMG5A0gs=@vger.kernel.org X-Gm-Message-State: AOJu0Yw8Y+kxl3c4KAd+/43ZezTuyBaGUkIXgAF/B5SYqnlqwuiEkIx/ ibJYP/7lxxdC96aYzMb8IcjlF5ju3hMJcQ7Dickg8UjP2GcGCmDq X-Gm-Gg: ASbGnctKjSp2QAgbxbdVYhE/LArClpTLpuAh7OzmJlZvOTAFN3DasVnGwIOCpQfjLH7 7rmv8diOli8v0KeCphZiWBYBKDiISs53iz/w5Q88XIBt3CcL0MzaN4EAaAS1n0EvEBK8yRJqq6h yMuOsnwHyNIFfvNWc/+Eme0601JjGbQb6oiRiWFYW0kQI6TnUx30afpS8odtKH5yxp9vEb1ykbv QIGuwUrKjy/olxRsVrGETa/FdNfuZrlIRrpKy6+tCt9yvP/uQ== X-Google-Smtp-Source: AGHT+IFLmxwSZlSjwaR27QpJke6rURhJNzaMl3K1lpQik83HRlwxxz03FF+x+GNaY58FSlMOxRq9gA== X-Received: by 2002:a05:6512:ba6:b0:540:1fec:f322 with SMTP id 2adb3069b0e04-5402410485emr1773977e87.39.1733848839554; Tue, 10 Dec 2024 08:40:39 -0800 (PST) Received: from pc638.lan ([2001:9b1:d5a0:a500:2d8:61ff:fec9:d743]) by smtp.gmail.com with ESMTPSA id 2adb3069b0e04-53f93377eefsm1031875e87.67.2024.12.10.08.40.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 10 Dec 2024 08:40:38 -0800 (PST) From: "Uladzislau Rezki (Sony)" To: linux-mm@kvack.org, Andrew Morton , Vlastimil Babka Cc: RCU , LKML , Uladzislau Rezki , Oleksiy Avramchenko Subject: [RFC v1 2/5] mm/slab: Copy main data structures of kvfree_rcu() Date: Tue, 10 Dec 2024 17:40:32 +0100 Message-Id: <20241210164035.3391747-3-urezki@gmail.com> X-Mailer: git-send-email 2.39.5 In-Reply-To: <20241210164035.3391747-1-urezki@gmail.com> References: <20241210164035.3391747-1-urezki@gmail.com> Precedence: bulk X-Mailing-List: rcu@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 This patch copies main data structures of kvfree_rcu() API from the kernel/rcu/tree.c into slab_common.c file. Later on, it will be removed from the tree.c. Signed-off-by: Uladzislau Rezki (Sony) --- mm/slab_common.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 95 insertions(+) diff --git a/mm/slab_common.c b/mm/slab_common.c index 893d32059915..a249fdb0d92e 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1338,3 +1338,98 @@ EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc); EXPORT_TRACEPOINT_SYMBOL(kfree); EXPORT_TRACEPOINT_SYMBOL(kmem_cache_free); +/* Maximum number of jiffies to wait before draining a batch. */ +#define KFREE_DRAIN_JIFFIES (5 * HZ) +#define KFREE_N_BATCHES 2 +#define FREE_N_CHANNELS 2 + +/** + * struct kvfree_rcu_bulk_data - single block to store kvfree_rcu() pointers + * @list: List node. All blocks are linked between each other + * @gp_snap: Snapshot of RCU state for objects placed to this bulk + * @nr_records: Number of active pointers in the array + * @records: Array of the kvfree_rcu() pointers + */ +struct kvfree_rcu_bulk_data { + struct list_head list; + struct rcu_gp_oldstate gp_snap; + unsigned long nr_records; + void *records[] __counted_by(nr_records); +}; + +/* + * This macro defines how many entries the "records" array + * will contain. It is based on the fact that the size of + * kvfree_rcu_bulk_data structure becomes exactly one page. + */ +#define KVFREE_BULK_MAX_ENTR \ + ((PAGE_SIZE - sizeof(struct kvfree_rcu_bulk_data)) / sizeof(void *)) + +/** + * struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests + * @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period + * @head_free: List of kfree_rcu() objects waiting for a grace period + * @head_free_gp_snap: Grace-period snapshot to check for attempted premature frees. + * @bulk_head_free: Bulk-List of kvfree_rcu() objects waiting for a grace period + * @krcp: Pointer to @kfree_rcu_cpu structure + */ + +struct kfree_rcu_cpu_work { + struct rcu_work rcu_work; + struct rcu_head *head_free; + struct rcu_gp_oldstate head_free_gp_snap; + struct list_head bulk_head_free[FREE_N_CHANNELS]; + struct kfree_rcu_cpu *krcp; +}; + +/** + * struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period + * @head: List of kfree_rcu() objects not yet waiting for a grace period + * @head_gp_snap: Snapshot of RCU state for objects placed to "@head" + * @bulk_head: Bulk-List of kvfree_rcu() objects not yet waiting for a grace period + * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period + * @lock: Synchronize access to this structure + * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES + * @initialized: The @rcu_work fields have been initialized + * @head_count: Number of objects in rcu_head singular list + * @bulk_count: Number of objects in bulk-list + * @bkvcache: + * A simple cache list that contains objects for reuse purpose. + * In order to save some per-cpu space the list is singular. + * Even though it is lockless an access has to be protected by the + * per-cpu lock. + * @page_cache_work: A work to refill the cache when it is empty + * @backoff_page_cache_fill: Delay cache refills + * @work_in_progress: Indicates that page_cache_work is running + * @hrtimer: A hrtimer for scheduling a page_cache_work + * @nr_bkv_objs: number of allocated objects at @bkvcache. + * + * This is a per-CPU structure. The reason that it is not included in + * the rcu_data structure is to permit this code to be extracted from + * the RCU files. Such extraction could allow further optimization of + * the interactions with the slab allocators. + */ +struct kfree_rcu_cpu { + // Objects queued on a linked list + // through their rcu_head structures. + struct rcu_head *head; + unsigned long head_gp_snap; + atomic_t head_count; + + // Objects queued on a bulk-list. + struct list_head bulk_head[FREE_N_CHANNELS]; + atomic_t bulk_count[FREE_N_CHANNELS]; + + struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES]; + raw_spinlock_t lock; + struct delayed_work monitor_work; + bool initialized; + + struct delayed_work page_cache_work; + atomic_t backoff_page_cache_fill; + atomic_t work_in_progress; + struct hrtimer hrtimer; + + struct llist_head bkvcache; + int nr_bkv_objs; +}; From patchwork Tue Dec 10 16:40:33 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Uladzislau Rezki X-Patchwork-Id: 13901759 Received: from mail-lf1-f54.google.com (mail-lf1-f54.google.com [209.85.167.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D071322FE17; Tue, 10 Dec 2024 16:40:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.167.54 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733848845; cv=none; b=CkE8zZ9Ko7y/CyBF8HCRmgjuKfx19kNMjiOdwORQpz14RIDUAvGpHzo4uxqo223yLUyPE2oycXrxzgeUJxT4L3IZy5uGqPBSWfZMB0XbJrI/1L9+s4ptPW4ACEaNu1PJq/mzluHjdexhGCGvHo4Km/uqN/X2eIR5/TnGgS6HNMk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733848845; c=relaxed/simple; bh=JcJp/kjpV8uX8ed5/eGkNqXQaPsJfJogE2GsATef6rw=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=o54AbWiSgtMn3TnxEGuB5NsSrPqTx/VqFQxyM1wcNSnU9L4QvC1JtpMZWV//uo5VH2ouqNdwkXjLZbNBpH+yz0xkt2fLPgBRnAwaSNNcJLulesIg0gfmEDBF19bB/0WgH4MRNybPmWr1m0Flexs5mTTathuuD2i94AaOQqW8GUM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=drjAHxoh; arc=none smtp.client-ip=209.85.167.54 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="drjAHxoh" Received: by mail-lf1-f54.google.com with SMTP id 2adb3069b0e04-540201cfedbso2024829e87.3; Tue, 10 Dec 2024 08:40:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1733848841; x=1734453641; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=dvLl1/wUXAtZR/OsurduIKqJhQ3HiLLZga1zIHvCWGI=; b=drjAHxohUQJ2lDAITrV5THxv2qShkOV0JhRSZ6PpDfMXbKnWGRAT6gOI1WIBrFGOZs l4hiaa7dYVeR0j/O1OCli0bi9xMIRDAQiW0hcO14ivzAgwhMqIMGRxxo65TqYM3edo8z FROV+FK88a73yYax+i/jqI+dthDZBvqdAR7QsZDXbkuEla0bngPGOKbNTTIpCNbm3TxZ 5IwRiJKZgxqceiJdRhDcJmcji1VCZrAAnGuzwQg1qowFwBaJNvS69w0xhAOq6gDBiWID hblYZv33WC4DniJnfI7mSI5cZBSnPONS2V1SjpnhLDPb8YAJlGAuFFO5J0A3kJ0FYOMj KPKA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1733848841; x=1734453641; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=dvLl1/wUXAtZR/OsurduIKqJhQ3HiLLZga1zIHvCWGI=; b=JOEFEX0rlklpMxc6Tv1xTMrYJrz6UmZhMUAWbyU3AQCI5sN7n+UZbY/nHT9LEnAVpA z3cYZEu8Wvx7DhBeAUOUtfW9x9w2GT9YxUeX9zJU9gMMUWA+h/nZaofcqWhZNz93xuhK 1ZCpr2CDzAYh92w4FVe7ImluyL8lw3ifdua9CSRniJfpSzre03jqJ4C/ogkBk3IrIHsm Zp/iOWHf4qRzW4475fBgNn5/CmXX822+7M5f3+R1REpqzVWXs1bYnqKWW0GaF1dSfo+H 5FXwLI0iBkqYOanB64VVyuL5+ChJdtR+SHPJo2c17pByy2xl+8IyNF5eznOCRpCmWCip B1gA== X-Forwarded-Encrypted: i=1; AJvYcCVu31PseQ76SF4zzIJB+NBpfM+qicXP6Wmsl/2NE1POuS1bSFbommknZ+k8AEuQL0m9GEc0rXGBLpHGdGM=@vger.kernel.org X-Gm-Message-State: AOJu0YwKDNNRtPYwFe9GhgzpIC8DfR2kATfQE5ea97+8S+6fZviGU/J/ 2nRPbLQDIcs9wLRmmeFcor4NbpRfxqYNFgk4t8LUFBykbJ9oD0SP X-Gm-Gg: ASbGncsPaNezW/YFTICyZgrc507B6Do0IULgoZBIJxuItkQlaa6k2rayU85Xv7vad1Q uxixFlDH/Kb0aFd+C4XEqOsGFt50oxyzEMRHiAAx9xb0LXff5a4tVlPUcM+3eCJHqKqthpAwTgv XwMBrtA2Sfo5wv2GxYwQqogdnsJQf2ZCUXJhFrkll/fJwR91WFCbZ2KhSrXUzejiJcNf1wcl6Ds hqEgGPjyhBJcPnfhXRiKzKosJHNzLW97TEMFx0AwoF8IxKZzA== X-Google-Smtp-Source: AGHT+IEEONV/UlxxlrdyYro3lAWzK8q69rAAkBvPwN5AeuK8TFPXVq4XWd7GAu8I2w2bCJc4ds24Ug== X-Received: by 2002:ac2:42c6:0:b0:540:2542:cba6 with SMTP id 2adb3069b0e04-5402542cde0mr971031e87.21.1733848840603; Tue, 10 Dec 2024 08:40:40 -0800 (PST) Received: from pc638.lan ([2001:9b1:d5a0:a500:2d8:61ff:fec9:d743]) by smtp.gmail.com with ESMTPSA id 2adb3069b0e04-53f93377eefsm1031875e87.67.2024.12.10.08.40.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 10 Dec 2024 08:40:39 -0800 (PST) From: "Uladzislau Rezki (Sony)" To: linux-mm@kvack.org, Andrew Morton , Vlastimil Babka Cc: RCU , LKML , Uladzislau Rezki , Oleksiy Avramchenko Subject: [RFC v1 3/5] mm/slab: Copy internal functions of kvfree_rcu() Date: Tue, 10 Dec 2024 17:40:33 +0100 Message-Id: <20241210164035.3391747-4-urezki@gmail.com> X-Mailer: git-send-email 2.39.5 In-Reply-To: <20241210164035.3391747-1-urezki@gmail.com> References: <20241210164035.3391747-1-urezki@gmail.com> Precedence: bulk X-Mailing-List: rcu@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Copy main functions of kvfree_rcu() from the kernel/rcu/tree.c to the slab_common.c file. In order to prevent a compiler warnings about defined but not used functions, below ones: run_page_cache_worker() fill_page_cache_func() kfree_rcu_monitor() kfree_rcu_work() drain_page_cache() are temporary marked as "__maybe_unused" in the slab_common.c file. Signed-off-by: Uladzislau Rezki (Sony) --- mm/slab_common.c | 507 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 507 insertions(+) diff --git a/mm/slab_common.c b/mm/slab_common.c index a249fdb0d92e..e7e1d5b5f31b 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -28,7 +28,9 @@ #include #include #include +#include +#include "../kernel/rcu/rcu.h" #include "internal.h" #include "slab.h" @@ -1433,3 +1435,508 @@ struct kfree_rcu_cpu { struct llist_head bkvcache; int nr_bkv_objs; }; + +/* + * This rcu parameter is runtime-read-only. It reflects + * a minimum allowed number of objects which can be cached + * per-CPU. Object size is equal to one page. This value + * can be changed at boot time. + */ +static int rcu_min_cached_objs = 5; +module_param(rcu_min_cached_objs, int, 0444); + +// A page shrinker can ask for pages to be freed to make them +// available for other parts of the system. This usually happens +// under low memory conditions, and in that case we should also +// defer page-cache filling for a short time period. +// +// The default value is 5 seconds, which is long enough to reduce +// interference with the shrinker while it asks other systems to +// drain their caches. +static int rcu_delay_page_cache_fill_msec = 5000; +module_param(rcu_delay_page_cache_fill_msec, int, 0444); + +static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = { + .lock = __RAW_SPIN_LOCK_UNLOCKED(krc.lock), +}; + +static __always_inline void +debug_rcu_bhead_unqueue(struct kvfree_rcu_bulk_data *bhead) +{ +#ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD + int i; + + for (i = 0; i < bhead->nr_records; i++) + debug_rcu_head_unqueue((struct rcu_head *)(bhead->records[i])); +#endif +} + +static inline struct kfree_rcu_cpu * +krc_this_cpu_lock(unsigned long *flags) +{ + struct kfree_rcu_cpu *krcp; + + local_irq_save(*flags); // For safely calling this_cpu_ptr(). + krcp = this_cpu_ptr(&krc); + raw_spin_lock(&krcp->lock); + + return krcp; +} + +static inline void +krc_this_cpu_unlock(struct kfree_rcu_cpu *krcp, unsigned long flags) +{ + raw_spin_unlock_irqrestore(&krcp->lock, flags); +} + +static inline struct kvfree_rcu_bulk_data * +get_cached_bnode(struct kfree_rcu_cpu *krcp) +{ + if (!krcp->nr_bkv_objs) + return NULL; + + WRITE_ONCE(krcp->nr_bkv_objs, krcp->nr_bkv_objs - 1); + return (struct kvfree_rcu_bulk_data *) + llist_del_first(&krcp->bkvcache); +} + +static inline bool +put_cached_bnode(struct kfree_rcu_cpu *krcp, + struct kvfree_rcu_bulk_data *bnode) +{ + // Check the limit. + if (krcp->nr_bkv_objs >= rcu_min_cached_objs) + return false; + + llist_add((struct llist_node *) bnode, &krcp->bkvcache); + WRITE_ONCE(krcp->nr_bkv_objs, krcp->nr_bkv_objs + 1); + return true; +} + +static int __maybe_unused +drain_page_cache(struct kfree_rcu_cpu *krcp) +{ + unsigned long flags; + struct llist_node *page_list, *pos, *n; + int freed = 0; + + if (!rcu_min_cached_objs) + return 0; + + raw_spin_lock_irqsave(&krcp->lock, flags); + page_list = llist_del_all(&krcp->bkvcache); + WRITE_ONCE(krcp->nr_bkv_objs, 0); + raw_spin_unlock_irqrestore(&krcp->lock, flags); + + llist_for_each_safe(pos, n, page_list) { + free_page((unsigned long)pos); + freed++; + } + + return freed; +} + +static void +kvfree_rcu_bulk(struct kfree_rcu_cpu *krcp, + struct kvfree_rcu_bulk_data *bnode, int idx) +{ + unsigned long flags; + int i; + + if (!WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&bnode->gp_snap))) { + debug_rcu_bhead_unqueue(bnode); + rcu_lock_acquire(&rcu_callback_map); + if (idx == 0) { // kmalloc() / kfree(). + trace_rcu_invoke_kfree_bulk_callback( + "slab", bnode->nr_records, + bnode->records); + + kfree_bulk(bnode->nr_records, bnode->records); + } else { // vmalloc() / vfree(). + for (i = 0; i < bnode->nr_records; i++) { + trace_rcu_invoke_kvfree_callback( + "slab", bnode->records[i], 0); + + vfree(bnode->records[i]); + } + } + rcu_lock_release(&rcu_callback_map); + } + + raw_spin_lock_irqsave(&krcp->lock, flags); + if (put_cached_bnode(krcp, bnode)) + bnode = NULL; + raw_spin_unlock_irqrestore(&krcp->lock, flags); + + if (bnode) + free_page((unsigned long) bnode); + + cond_resched_tasks_rcu_qs(); +} + +static void +kvfree_rcu_list(struct rcu_head *head) +{ + struct rcu_head *next; + + for (; head; head = next) { + void *ptr = (void *) head->func; + unsigned long offset = (void *) head - ptr; + + next = head->next; + debug_rcu_head_unqueue((struct rcu_head *)ptr); + rcu_lock_acquire(&rcu_callback_map); + trace_rcu_invoke_kvfree_callback("slab", head, offset); + + if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset))) + kvfree(ptr); + + rcu_lock_release(&rcu_callback_map); + cond_resched_tasks_rcu_qs(); + } +} + +/* + * This function is invoked in workqueue context after a grace period. + * It frees all the objects queued on ->bulk_head_free or ->head_free. + */ +static void __maybe_unused +kfree_rcu_work(struct work_struct *work) +{ + unsigned long flags; + struct kvfree_rcu_bulk_data *bnode, *n; + struct list_head bulk_head[FREE_N_CHANNELS]; + struct rcu_head *head; + struct kfree_rcu_cpu *krcp; + struct kfree_rcu_cpu_work *krwp; + struct rcu_gp_oldstate head_gp_snap; + int i; + + krwp = container_of(to_rcu_work(work), + struct kfree_rcu_cpu_work, rcu_work); + krcp = krwp->krcp; + + raw_spin_lock_irqsave(&krcp->lock, flags); + // Channels 1 and 2. + for (i = 0; i < FREE_N_CHANNELS; i++) + list_replace_init(&krwp->bulk_head_free[i], &bulk_head[i]); + + // Channel 3. + head = krwp->head_free; + krwp->head_free = NULL; + head_gp_snap = krwp->head_free_gp_snap; + raw_spin_unlock_irqrestore(&krcp->lock, flags); + + // Handle the first two channels. + for (i = 0; i < FREE_N_CHANNELS; i++) { + // Start from the tail page, so a GP is likely passed for it. + list_for_each_entry_safe(bnode, n, &bulk_head[i], list) + kvfree_rcu_bulk(krcp, bnode, i); + } + + /* + * This is used when the "bulk" path can not be used for the + * double-argument of kvfree_rcu(). This happens when the + * page-cache is empty, which means that objects are instead + * queued on a linked list through their rcu_head structures. + * This list is named "Channel 3". + */ + if (head && !WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&head_gp_snap))) + kvfree_rcu_list(head); +} + +static bool +need_offload_krc(struct kfree_rcu_cpu *krcp) +{ + int i; + + for (i = 0; i < FREE_N_CHANNELS; i++) + if (!list_empty(&krcp->bulk_head[i])) + return true; + + return !!READ_ONCE(krcp->head); +} + +static bool +need_wait_for_krwp_work(struct kfree_rcu_cpu_work *krwp) +{ + int i; + + for (i = 0; i < FREE_N_CHANNELS; i++) + if (!list_empty(&krwp->bulk_head_free[i])) + return true; + + return !!krwp->head_free; +} + +static int krc_count(struct kfree_rcu_cpu *krcp) +{ + int sum = atomic_read(&krcp->head_count); + int i; + + for (i = 0; i < FREE_N_CHANNELS; i++) + sum += atomic_read(&krcp->bulk_count[i]); + + return sum; +} + +static void +schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp) +{ + long delay, delay_left; + + delay = krc_count(krcp) >= KVFREE_BULK_MAX_ENTR ? 1:KFREE_DRAIN_JIFFIES; + if (delayed_work_pending(&krcp->monitor_work)) { + delay_left = krcp->monitor_work.timer.expires - jiffies; + if (delay < delay_left) + mod_delayed_work(system_unbound_wq, &krcp->monitor_work, delay); + return; + } + queue_delayed_work(system_unbound_wq, &krcp->monitor_work, delay); +} + +static void +kvfree_rcu_drain_ready(struct kfree_rcu_cpu *krcp) +{ + struct list_head bulk_ready[FREE_N_CHANNELS]; + struct kvfree_rcu_bulk_data *bnode, *n; + struct rcu_head *head_ready = NULL; + unsigned long flags; + int i; + + raw_spin_lock_irqsave(&krcp->lock, flags); + for (i = 0; i < FREE_N_CHANNELS; i++) { + INIT_LIST_HEAD(&bulk_ready[i]); + + list_for_each_entry_safe_reverse(bnode, n, &krcp->bulk_head[i], list) { + if (!poll_state_synchronize_rcu_full(&bnode->gp_snap)) + break; + + atomic_sub(bnode->nr_records, &krcp->bulk_count[i]); + list_move(&bnode->list, &bulk_ready[i]); + } + } + + if (krcp->head && poll_state_synchronize_rcu(krcp->head_gp_snap)) { + head_ready = krcp->head; + atomic_set(&krcp->head_count, 0); + WRITE_ONCE(krcp->head, NULL); + } + raw_spin_unlock_irqrestore(&krcp->lock, flags); + + for (i = 0; i < FREE_N_CHANNELS; i++) { + list_for_each_entry_safe(bnode, n, &bulk_ready[i], list) + kvfree_rcu_bulk(krcp, bnode, i); + } + + if (head_ready) + kvfree_rcu_list(head_ready); +} + +/* + * Return: %true if a work is queued, %false otherwise. + */ +static bool +kvfree_rcu_queue_batch(struct kfree_rcu_cpu *krcp) +{ + unsigned long flags; + bool queued = false; + int i, j; + + raw_spin_lock_irqsave(&krcp->lock, flags); + + // Attempt to start a new batch. + for (i = 0; i < KFREE_N_BATCHES; i++) { + struct kfree_rcu_cpu_work *krwp = &(krcp->krw_arr[i]); + + // Try to detach bulk_head or head and attach it, only when + // all channels are free. Any channel is not free means at krwp + // there is on-going rcu work to handle krwp's free business. + if (need_wait_for_krwp_work(krwp)) + continue; + + // kvfree_rcu_drain_ready() might handle this krcp, if so give up. + if (need_offload_krc(krcp)) { + // Channel 1 corresponds to the SLAB-pointer bulk path. + // Channel 2 corresponds to vmalloc-pointer bulk path. + for (j = 0; j < FREE_N_CHANNELS; j++) { + if (list_empty(&krwp->bulk_head_free[j])) { + atomic_set(&krcp->bulk_count[j], 0); + list_replace_init(&krcp->bulk_head[j], + &krwp->bulk_head_free[j]); + } + } + + // Channel 3 corresponds to both SLAB and vmalloc + // objects queued on the linked list. + if (!krwp->head_free) { + krwp->head_free = krcp->head; + get_state_synchronize_rcu_full(&krwp->head_free_gp_snap); + atomic_set(&krcp->head_count, 0); + WRITE_ONCE(krcp->head, NULL); + } + + // One work is per one batch, so there are three + // "free channels", the batch can handle. Break + // the loop since it is done with this CPU thus + // queuing an RCU work is _always_ success here. + queued = queue_rcu_work(system_unbound_wq, &krwp->rcu_work); + WARN_ON_ONCE(!queued); + break; + } + } + + raw_spin_unlock_irqrestore(&krcp->lock, flags); + return queued; +} + +/* + * This function is invoked after the KFREE_DRAIN_JIFFIES timeout. + */ +static void __maybe_unused +kfree_rcu_monitor(struct work_struct *work) +{ + struct kfree_rcu_cpu *krcp = container_of(work, + struct kfree_rcu_cpu, monitor_work.work); + + // Drain ready for reclaim. + kvfree_rcu_drain_ready(krcp); + + // Queue a batch for a rest. + kvfree_rcu_queue_batch(krcp); + + // If there is nothing to detach, it means that our job is + // successfully done here. In case of having at least one + // of the channels that is still busy we should rearm the + // work to repeat an attempt. Because previous batches are + // still in progress. + if (need_offload_krc(krcp)) + schedule_delayed_monitor_work(krcp); +} + +static enum hrtimer_restart +schedule_page_work_fn(struct hrtimer *t) +{ + struct kfree_rcu_cpu *krcp = + container_of(t, struct kfree_rcu_cpu, hrtimer); + + queue_delayed_work(system_highpri_wq, &krcp->page_cache_work, 0); + return HRTIMER_NORESTART; +} + +static void __maybe_unused +fill_page_cache_func(struct work_struct *work) +{ + struct kvfree_rcu_bulk_data *bnode; + struct kfree_rcu_cpu *krcp = + container_of(work, struct kfree_rcu_cpu, + page_cache_work.work); + unsigned long flags; + int nr_pages; + bool pushed; + int i; + + nr_pages = atomic_read(&krcp->backoff_page_cache_fill) ? + 1 : rcu_min_cached_objs; + + for (i = READ_ONCE(krcp->nr_bkv_objs); i < nr_pages; i++) { + bnode = (struct kvfree_rcu_bulk_data *) + __get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN); + + if (!bnode) + break; + + raw_spin_lock_irqsave(&krcp->lock, flags); + pushed = put_cached_bnode(krcp, bnode); + raw_spin_unlock_irqrestore(&krcp->lock, flags); + + if (!pushed) { + free_page((unsigned long) bnode); + break; + } + } + + atomic_set(&krcp->work_in_progress, 0); + atomic_set(&krcp->backoff_page_cache_fill, 0); +} + +static void __maybe_unused +run_page_cache_worker(struct kfree_rcu_cpu *krcp) +{ + // If cache disabled, bail out. + if (!rcu_min_cached_objs) + return; + + if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING && + !atomic_xchg(&krcp->work_in_progress, 1)) { + if (atomic_read(&krcp->backoff_page_cache_fill)) { + queue_delayed_work(system_unbound_wq, + &krcp->page_cache_work, + msecs_to_jiffies(rcu_delay_page_cache_fill_msec)); + } else { + hrtimer_init(&krcp->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); + krcp->hrtimer.function = schedule_page_work_fn; + hrtimer_start(&krcp->hrtimer, 0, HRTIMER_MODE_REL); + } + } +} + +// Record ptr in a page managed by krcp, with the pre-krc_this_cpu_lock() +// state specified by flags. If can_alloc is true, the caller must +// be schedulable and not be holding any locks or mutexes that might be +// acquired by the memory allocator or anything that it might invoke. +// Returns true if ptr was successfully recorded, else the caller must +// use a fallback. +static inline bool +add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp, + unsigned long *flags, void *ptr, bool can_alloc) +{ + struct kvfree_rcu_bulk_data *bnode; + int idx; + + *krcp = krc_this_cpu_lock(flags); + if (unlikely(!(*krcp)->initialized)) + return false; + + idx = !!is_vmalloc_addr(ptr); + bnode = list_first_entry_or_null(&(*krcp)->bulk_head[idx], + struct kvfree_rcu_bulk_data, list); + + /* Check if a new block is required. */ + if (!bnode || bnode->nr_records == KVFREE_BULK_MAX_ENTR) { + bnode = get_cached_bnode(*krcp); + if (!bnode && can_alloc) { + krc_this_cpu_unlock(*krcp, *flags); + + // __GFP_NORETRY - allows a light-weight direct reclaim + // what is OK from minimizing of fallback hitting point of + // view. Apart of that it forbids any OOM invoking what is + // also beneficial since we are about to release memory soon. + // + // __GFP_NOMEMALLOC - prevents from consuming of all the + // memory reserves. Please note we have a fallback path. + // + // __GFP_NOWARN - it is supposed that an allocation can + // be failed under low memory or high memory pressure + // scenarios. + bnode = (struct kvfree_rcu_bulk_data *) + __get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN); + raw_spin_lock_irqsave(&(*krcp)->lock, *flags); + } + + if (!bnode) + return false; + + // Initialize the new block and attach it. + bnode->nr_records = 0; + list_add(&bnode->list, &(*krcp)->bulk_head[idx]); + } + + // Finally insert and update the GP for this page. + bnode->nr_records++; + bnode->records[bnode->nr_records - 1] = ptr; + get_state_synchronize_rcu_full(&bnode->gp_snap); + atomic_inc(&(*krcp)->bulk_count[idx]); + + return true; +} From patchwork Tue Dec 10 16:40:34 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Uladzislau Rezki X-Patchwork-Id: 13901760 Received: from mail-lf1-f51.google.com (mail-lf1-f51.google.com [209.85.167.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8F62D2309B8; Tue, 10 Dec 2024 16:40:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.167.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733848845; cv=none; b=R76thtI85oqWYcf6aztUABOVETJ8BW9GZFseIpSWfxdFqcF41Tz6qZFhg14FEXPfIwUbgdKYoM6UOXN4/BtkIBSgXWxsN/MZDBQ19ujBVF4qZIKvi8Iy2Qrlsval9sPpi9NncTsgDRwlmIqjORSp80e6KCzXE2y1lhd9zzISxFo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733848845; c=relaxed/simple; bh=atxL5PqkJtOAvQd9mjOnjPLCndaPfawFysuK+qiokr0=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=WNfTqD1r7h7Wdu/ILAijponOdNGRETAYHKrLBkyKYlOo7dbv7Y7+nYbFs7Ujzogd4pAt6RcAAhyUNo0o7hdJng7JF5/A9H2JhGpIVE4p/OLxCclx1DKZjpr7Y7rYwNHqdwSXEGTj7jxz2YItUTXQSlFHDpxk5BO6PdWjHg3ulgI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=NQOPZOCv; arc=none smtp.client-ip=209.85.167.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="NQOPZOCv" Received: by mail-lf1-f51.google.com with SMTP id 2adb3069b0e04-53f757134cdso3215054e87.2; Tue, 10 Dec 2024 08:40:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1733848842; x=1734453642; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=8psq5tPidMqLngz98gskDzgVmEZF8iPxbnPcrOFnt1Y=; b=NQOPZOCveGQHYsnv4bxgUrUe3upDxo2dO7ungE1+EQGTaNgMPWJ2X0ttPNQ1R+V2T4 PAc/d/YVuGebJxx471FBWlsM7FHKkymd+DACbxNSqYKi5i+/oyE0CydcoAkPgZxGJoqA GyTyjWiyxiI0S7f2wsBQQUfw5Lhm9P3iEQM7z7STrk51/MpMgH0HmGu03KIvlmEfTsMZ RXFIS/RxR+E0CAWS4lFGoLojn/KtXIY5hMWh84kzIEGi7fzrBP5Inn3VHiAwNW7PM+mR vZHAkhmobiu7g9BjRweRhBTd0zu2ztCCpN1/Nbrk2EuTb9yUAO+ukHCetcUATLzRMHM5 exvw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1733848842; x=1734453642; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=8psq5tPidMqLngz98gskDzgVmEZF8iPxbnPcrOFnt1Y=; b=p/4G8cXc89RPKmeZPcFP4EkdfqjOY1ifkMLEoq3D9iI2USeU9ooF/hdtL+faJ6Tp04 Xo71x0mWNXfHx0VSoUmxdBSB75Q133020lcINpnEDNN9YzuZovsEmW859/0Q/FQYQkfP WMRLaoBPuZykQhfmTKKkCvuGIaZ6ficNqJuTh7NLIWu+hx8YxLaJNNVY4J5D2llpU8vN FFCEPqEsalMrTLDKDg4olBTNCEWsH87KeSgfnbrPPf3s30fD12ltdriijre9GRG4Du+/ huapOILwsXQi6ft7B4qWt/CMsECIqZSg43SeYq9dfJncbdkUGo5DNLI0dvxMCQjl9ZOY 4vHA== X-Forwarded-Encrypted: i=1; AJvYcCUftiVhAeCB1BueQIVw4zCDpWaqBUWGbKo0fj8iWhFxlkgZwP8u8FTdtokkjv8k1iXPY46lEr0IBAxFNak=@vger.kernel.org X-Gm-Message-State: AOJu0Yzto39REHWnxFb2nBWW/DX68azOksSTnJZTymMWn03rG8737QK+ /zOXQ2c7IU9gv6PhfTY5gAVuKXOSBow/cErGutEApqHAN63rUQt7 X-Gm-Gg: ASbGncuRt5g0N88SmBtE1PfArx/s/3XU2Ktf8lZa2sbjmdX1SRCpifSMK8aYqwgHFp+ aiiCQcbh1KCn5Xoh//a9vy6MLqDCwLcjchYPsfSoQlU4z7biEShTTqlvWKmh6d/PqdN4bYV3IHj m0Zbk1HnJH/NeNLZxuPFoIOgVOsxIhnfk37zXeRNNkWXUh6q3Nss3iKXqndO+qcElwNuLlUb3Mz GUNaZ7nZgcTwsEIvkTk3L/xEborK3Z+0anmnOybZY3exi3s4A== X-Google-Smtp-Source: AGHT+IH7sMXWy8vZyr5EGz+tyNTQnk0FxUAmt/2tf7nxEsgjM8bPz/qNg9Wj1lm+mcn0TAZSNwIVkg== X-Received: by 2002:a05:6512:2245:b0:53e:389d:8cdd with SMTP id 2adb3069b0e04-53e389d8dfdmr5755357e87.34.1733848841502; Tue, 10 Dec 2024 08:40:41 -0800 (PST) Received: from pc638.lan ([2001:9b1:d5a0:a500:2d8:61ff:fec9:d743]) by smtp.gmail.com with ESMTPSA id 2adb3069b0e04-53f93377eefsm1031875e87.67.2024.12.10.08.40.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 10 Dec 2024 08:40:40 -0800 (PST) From: "Uladzislau Rezki (Sony)" To: linux-mm@kvack.org, Andrew Morton , Vlastimil Babka Cc: RCU , LKML , Uladzislau Rezki , Oleksiy Avramchenko Subject: [RFC v1 4/5] mm/slab: Copy a function of kvfree_rcu() initialization Date: Tue, 10 Dec 2024 17:40:34 +0100 Message-Id: <20241210164035.3391747-5-urezki@gmail.com> X-Mailer: git-send-email 2.39.5 In-Reply-To: <20241210164035.3391747-1-urezki@gmail.com> References: <20241210164035.3391747-1-urezki@gmail.com> Precedence: bulk X-Mailing-List: rcu@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 As a final step an initialization of kvfree_rcu() functionality is copied into slab_common.c from the tree.c file as well as shrinker related code. The function is temporary marked as "__maybe_unused" to eliminate a compiler warnings. Signed-off-by: Uladzislau Rezki (Sony) --- mm/slab_common.c | 91 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 91 insertions(+) diff --git a/mm/slab_common.c b/mm/slab_common.c index e7e1d5b5f31b..cffc96bd279a 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1940,3 +1940,94 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp, return true; } + +static unsigned long +kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc) +{ + int cpu; + unsigned long count = 0; + + /* Snapshot count of all CPUs */ + for_each_possible_cpu(cpu) { + struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); + + count += krc_count(krcp); + count += READ_ONCE(krcp->nr_bkv_objs); + atomic_set(&krcp->backoff_page_cache_fill, 1); + } + + return count == 0 ? SHRINK_EMPTY : count; +} + +static unsigned long +kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) +{ + int cpu, freed = 0; + + for_each_possible_cpu(cpu) { + int count; + struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); + + count = krc_count(krcp); + count += drain_page_cache(krcp); + kfree_rcu_monitor(&krcp->monitor_work.work); + + sc->nr_to_scan -= count; + freed += count; + + if (sc->nr_to_scan <= 0) + break; + } + + return freed == 0 ? SHRINK_STOP : freed; +} + +static void __init __maybe_unused +kfree_rcu_batch_init(void) +{ + int cpu; + int i, j; + struct shrinker *kfree_rcu_shrinker; + + /* Clamp it to [0:100] seconds interval. */ + if (rcu_delay_page_cache_fill_msec < 0 || + rcu_delay_page_cache_fill_msec > 100 * MSEC_PER_SEC) { + + rcu_delay_page_cache_fill_msec = + clamp(rcu_delay_page_cache_fill_msec, 0, + (int) (100 * MSEC_PER_SEC)); + + pr_info("Adjusting rcutree.rcu_delay_page_cache_fill_msec to %d ms.\n", + rcu_delay_page_cache_fill_msec); + } + + for_each_possible_cpu(cpu) { + struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); + + for (i = 0; i < KFREE_N_BATCHES; i++) { + INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work); + krcp->krw_arr[i].krcp = krcp; + + for (j = 0; j < FREE_N_CHANNELS; j++) + INIT_LIST_HEAD(&krcp->krw_arr[i].bulk_head_free[j]); + } + + for (i = 0; i < FREE_N_CHANNELS; i++) + INIT_LIST_HEAD(&krcp->bulk_head[i]); + + INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor); + INIT_DELAYED_WORK(&krcp->page_cache_work, fill_page_cache_func); + krcp->initialized = true; + } + + kfree_rcu_shrinker = shrinker_alloc(0, "rcu-slab-kfree"); + if (!kfree_rcu_shrinker) { + pr_err("Failed to allocate kfree_rcu() shrinker!\n"); + return; + } + + kfree_rcu_shrinker->count_objects = kfree_rcu_shrink_count; + kfree_rcu_shrinker->scan_objects = kfree_rcu_shrink_scan; + + shrinker_register(kfree_rcu_shrinker); +} From patchwork Tue Dec 10 16:40:35 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Uladzislau Rezki X-Patchwork-Id: 13901761 Received: from mail-lf1-f51.google.com (mail-lf1-f51.google.com [209.85.167.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 75CF9233D81; Tue, 10 Dec 2024 16:40:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.167.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733848848; cv=none; b=rMRgo/VfOM3P7/Kovp1+M0bsqI9Ow68/Ie3r5CyXluazPWauhLgLWTe4xYA9SLB7fwP404rX+eFvNZPEx7OlpFAqyR53oy563vV3sOZAf58fq2cHefNevz6hcpUYXxLlmD48XZhT9/gc5GEy2EnstCAWVafmhaPgzCPw11s1PHc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733848848; c=relaxed/simple; bh=VKftjw03RHaWnGNolD1PpEtAWpyo9AqfiOBo2S3AN3Y=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=p+CxE6FFzYG426O+wCTv8ynEv5oI8oJ0iFybMutnsAU+j4h1zn/g5n9vM7RENjcoAGVBIiLRS4O8xEhD3C94xjh8WD8VRSQ7cxRMItVP2GEQ7lRZb2p0+zpQs8n0kU6T39thotAykeamPYL+VELZZsqzUHFcMlLY7dmOyIqPGDQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=eT2Tlqmm; arc=none smtp.client-ip=209.85.167.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="eT2Tlqmm" Received: by mail-lf1-f51.google.com with SMTP id 2adb3069b0e04-53ffaaeeb76so3087230e87.0; Tue, 10 Dec 2024 08:40:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1733848844; x=1734453644; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=gZyu20V0IT5sFAUs0VabPR/e3lkWhuOMoGZgqAXATFM=; b=eT2Tlqmm4pq98XTggghZxaYFQjukWXFu7/1IPTHi8F0PvkE5wKb5qh6FJR+mn/UzUb TDEK0wUS4yEHz3xPlFkHRXxcePVCCI80HYeUPRZnUe5B2xkhUCR0rtZ3W7CwQSkkUcbZ 6C4zjNOUJgoqi2F5i/2fdfibA+LyAdK0Ofa9BLKDtYgYkycHysL0byNYOjbOTYIwCyvY uh2M5cOAzlD8arLC+g0rUR0mEsDfLDkFg2k8yxL+NUpoGDNYzqchd3XfHBCIimVrmd6+ OfOrLzhgpgqLiaQiFgKf+QW7KepDurOZCoAb67RC1tvoE30UAb4KelGAOzuS6ttDaQ4b rkQA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1733848844; x=1734453644; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=gZyu20V0IT5sFAUs0VabPR/e3lkWhuOMoGZgqAXATFM=; b=Mj1ue6wry2cPJlFZg9P5KysGKsOc4ec1fY3djrpwN/ulVArJStcASLpxNI86EZ6BIC D+Qo/wwH0OOziva54mTuQSrQZZ78gZhU0mqnEe6+V4BZJR9TiNtJp1dZH1mSuN//rzN4 oY9PNmboFqGKDch9EZuo0S2M9eEVRSsZ59e+nbpwo1Q+jvHGFq14Zkhf5zTyMG/2eDBI VYp0UEkVJCAlGnM1Ng8maEOl5um3Ht8bxNnZbkR0Asqg5eBD/iiM+eXWGgmAVlD1K190 3ONun/wb9PDhEMqxdVvr90SQt+hHQds8AL3yNmIn0bURzxusZ55YFgdHgHYEnlp+//Ws ht7w== X-Forwarded-Encrypted: i=1; AJvYcCXGxIPeclRwcu/hNygn4gp+yVMyLUannbaJvw7JFJCDeZ5/5IFa3Act00rH+r1bZAq+EtlJY1Z88V8i89o=@vger.kernel.org X-Gm-Message-State: AOJu0YxzMfC2yoQ0L3/1YLG3OFxllq3el8RluB9mqZRNZxV2+OrOxTAM hFn0T49CiyIryIBPut6KjeJ9D4Jqbqeza4onao7UhQZS3Q0kr9Ax X-Gm-Gg: ASbGncvjVB910mW41LA3banA3h2O/idKhb8Wled82cfW5oj7cXrbgrwZz+JtQBu3py7 FPzWGP1wCPF2gbOo/QefWrQdDpcOvMJQsEKB1kjscL+6eDpFyqe9/MHtXNEMvlAOl71Ep77ovCq 0jfDyGNbYeR0G9eTI2e06BwheQite2Mzt0gNXbAZXfKQ21OiUYbikXZoanh481w/BCJcyVcg7JE VnU1yEsaHHCQYWw84siCaMPtazjxN2EkeLTtvXmwmnsy7nUsQ== X-Google-Smtp-Source: AGHT+IGC7l/kzkPm03FA64LMQLtPLRrGPZWNRGEIpVIvobJcPaCqJsqgeIavMCRlV0yqxEXaUpJUUg== X-Received: by 2002:a05:6512:6c8:b0:53f:f71:4d74 with SMTP id 2adb3069b0e04-53f0f714e3amr5537287e87.8.1733848842959; Tue, 10 Dec 2024 08:40:42 -0800 (PST) Received: from pc638.lan ([2001:9b1:d5a0:a500:2d8:61ff:fec9:d743]) by smtp.gmail.com with ESMTPSA id 2adb3069b0e04-53f93377eefsm1031875e87.67.2024.12.10.08.40.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 10 Dec 2024 08:40:41 -0800 (PST) From: "Uladzislau Rezki (Sony)" To: linux-mm@kvack.org, Andrew Morton , Vlastimil Babka Cc: RCU , LKML , Uladzislau Rezki , Oleksiy Avramchenko Subject: [RFC v1 5/5] mm/slab: Move kvfree_rcu() into SLAB Date: Tue, 10 Dec 2024 17:40:35 +0100 Message-Id: <20241210164035.3391747-6-urezki@gmail.com> X-Mailer: git-send-email 2.39.5 In-Reply-To: <20241210164035.3391747-1-urezki@gmail.com> References: <20241210164035.3391747-1-urezki@gmail.com> Precedence: bulk X-Mailing-List: rcu@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 A final move of kvfree_rcu() functionality into slab_common.c file: - Rename kfree_rcu_batch_init() to the kvfree_rcu_init(); - Invoke the kvfree_rcu_init() function from main.c after rcu_init(); - Move the rest of functionality to the slab_common.c file; - Fully remove kvfree_rcu() from the kernel/rcu/tree.c file; - Remove a temporary solution to handle freeing ptrs. after GP; - Remove "__maybe_unused" from the slab_common.c file; - Do not export main functionality for CONFIG_TINY_RCU case. Signed-off-by: Uladzislau Rezki (Sony) --- include/linux/slab.h | 1 + init/main.c | 1 + kernel/rcu/tree.c | 893 +------------------------------------------ mm/slab_common.c | 256 +++++++++++-- 4 files changed, 225 insertions(+), 926 deletions(-) diff --git a/include/linux/slab.h b/include/linux/slab.h index b35e2db7eb0e..8a2d006119f8 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -1076,5 +1076,6 @@ unsigned int kmem_cache_size(struct kmem_cache *s); size_t kmalloc_size_roundup(size_t size); void __init kmem_cache_init_late(void); +void __init kvfree_rcu_init(void); #endif /* _LINUX_SLAB_H */ diff --git a/init/main.c b/init/main.c index c4778edae797..27d177784f3a 100644 --- a/init/main.c +++ b/init/main.c @@ -995,6 +995,7 @@ void start_kernel(void) workqueue_init_early(); rcu_init(); + kvfree_rcu_init(); /* Trace events are available after this */ trace_init(); diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index ab24229dfa73..4c9c16945e3a 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -186,26 +186,6 @@ static int rcu_unlock_delay; module_param(rcu_unlock_delay, int, 0444); #endif -/* - * This rcu parameter is runtime-read-only. It reflects - * a minimum allowed number of objects which can be cached - * per-CPU. Object size is equal to one page. This value - * can be changed at boot time. - */ -static int rcu_min_cached_objs = 5; -module_param(rcu_min_cached_objs, int, 0444); - -// A page shrinker can ask for pages to be freed to make them -// available for other parts of the system. This usually happens -// under low memory conditions, and in that case we should also -// defer page-cache filling for a short time period. -// -// The default value is 5 seconds, which is long enough to reduce -// interference with the shrinker while it asks other systems to -// drain their caches. -static int rcu_delay_page_cache_fill_msec = 5000; -module_param(rcu_delay_page_cache_fill_msec, int, 0444); - /* Retrieve RCU kthreads priority for rcutorture */ int rcu_get_gp_kthreads_prio(void) { @@ -2559,19 +2539,13 @@ static void rcu_do_batch(struct rcu_data *rdp) debug_rcu_head_unqueue(rhp); rcu_lock_acquire(&rcu_callback_map); + trace_rcu_invoke_callback(rcu_state.name, rhp); f = rhp->func; + debug_rcu_head_callback(rhp); + WRITE_ONCE(rhp->func, (rcu_callback_t)0L); + f(rhp); - /* This is temporary, it will be removed when migration is over. */ - if (__is_kvfree_rcu_offset((unsigned long) f)) { - trace_rcu_invoke_kvfree_callback("", rhp, (unsigned long) f); - kvfree((void *) rhp - (unsigned long) f); - } else { - trace_rcu_invoke_callback(rcu_state.name, rhp); - debug_rcu_head_callback(rhp); - WRITE_ONCE(rhp->func, (rcu_callback_t)0L); - f(rhp); - } rcu_lock_release(&rcu_callback_map); /* @@ -3197,815 +3171,6 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func) } EXPORT_SYMBOL_GPL(call_rcu); -/* Maximum number of jiffies to wait before draining a batch. */ -#define KFREE_DRAIN_JIFFIES (5 * HZ) -#define KFREE_N_BATCHES 2 -#define FREE_N_CHANNELS 2 - -/** - * struct kvfree_rcu_bulk_data - single block to store kvfree_rcu() pointers - * @list: List node. All blocks are linked between each other - * @gp_snap: Snapshot of RCU state for objects placed to this bulk - * @nr_records: Number of active pointers in the array - * @records: Array of the kvfree_rcu() pointers - */ -struct kvfree_rcu_bulk_data { - struct list_head list; - struct rcu_gp_oldstate gp_snap; - unsigned long nr_records; - void *records[] __counted_by(nr_records); -}; - -/* - * This macro defines how many entries the "records" array - * will contain. It is based on the fact that the size of - * kvfree_rcu_bulk_data structure becomes exactly one page. - */ -#define KVFREE_BULK_MAX_ENTR \ - ((PAGE_SIZE - sizeof(struct kvfree_rcu_bulk_data)) / sizeof(void *)) - -/** - * struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests - * @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period - * @head_free: List of kfree_rcu() objects waiting for a grace period - * @head_free_gp_snap: Grace-period snapshot to check for attempted premature frees. - * @bulk_head_free: Bulk-List of kvfree_rcu() objects waiting for a grace period - * @krcp: Pointer to @kfree_rcu_cpu structure - */ - -struct kfree_rcu_cpu_work { - struct rcu_work rcu_work; - struct rcu_head *head_free; - struct rcu_gp_oldstate head_free_gp_snap; - struct list_head bulk_head_free[FREE_N_CHANNELS]; - struct kfree_rcu_cpu *krcp; -}; - -/** - * struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period - * @head: List of kfree_rcu() objects not yet waiting for a grace period - * @head_gp_snap: Snapshot of RCU state for objects placed to "@head" - * @bulk_head: Bulk-List of kvfree_rcu() objects not yet waiting for a grace period - * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period - * @lock: Synchronize access to this structure - * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES - * @initialized: The @rcu_work fields have been initialized - * @head_count: Number of objects in rcu_head singular list - * @bulk_count: Number of objects in bulk-list - * @bkvcache: - * A simple cache list that contains objects for reuse purpose. - * In order to save some per-cpu space the list is singular. - * Even though it is lockless an access has to be protected by the - * per-cpu lock. - * @page_cache_work: A work to refill the cache when it is empty - * @backoff_page_cache_fill: Delay cache refills - * @work_in_progress: Indicates that page_cache_work is running - * @hrtimer: A hrtimer for scheduling a page_cache_work - * @nr_bkv_objs: number of allocated objects at @bkvcache. - * - * This is a per-CPU structure. The reason that it is not included in - * the rcu_data structure is to permit this code to be extracted from - * the RCU files. Such extraction could allow further optimization of - * the interactions with the slab allocators. - */ -struct kfree_rcu_cpu { - // Objects queued on a linked list - // through their rcu_head structures. - struct rcu_head *head; - unsigned long head_gp_snap; - atomic_t head_count; - - // Objects queued on a bulk-list. - struct list_head bulk_head[FREE_N_CHANNELS]; - atomic_t bulk_count[FREE_N_CHANNELS]; - - struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES]; - raw_spinlock_t lock; - struct delayed_work monitor_work; - bool initialized; - - struct delayed_work page_cache_work; - atomic_t backoff_page_cache_fill; - atomic_t work_in_progress; - struct hrtimer hrtimer; - - struct llist_head bkvcache; - int nr_bkv_objs; -}; - -static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = { - .lock = __RAW_SPIN_LOCK_UNLOCKED(krc.lock), -}; - -static __always_inline void -debug_rcu_bhead_unqueue(struct kvfree_rcu_bulk_data *bhead) -{ -#ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD - int i; - - for (i = 0; i < bhead->nr_records; i++) - debug_rcu_head_unqueue((struct rcu_head *)(bhead->records[i])); -#endif -} - -static inline struct kfree_rcu_cpu * -krc_this_cpu_lock(unsigned long *flags) -{ - struct kfree_rcu_cpu *krcp; - - local_irq_save(*flags); // For safely calling this_cpu_ptr(). - krcp = this_cpu_ptr(&krc); - raw_spin_lock(&krcp->lock); - - return krcp; -} - -static inline void -krc_this_cpu_unlock(struct kfree_rcu_cpu *krcp, unsigned long flags) -{ - raw_spin_unlock_irqrestore(&krcp->lock, flags); -} - -static inline struct kvfree_rcu_bulk_data * -get_cached_bnode(struct kfree_rcu_cpu *krcp) -{ - if (!krcp->nr_bkv_objs) - return NULL; - - WRITE_ONCE(krcp->nr_bkv_objs, krcp->nr_bkv_objs - 1); - return (struct kvfree_rcu_bulk_data *) - llist_del_first(&krcp->bkvcache); -} - -static inline bool -put_cached_bnode(struct kfree_rcu_cpu *krcp, - struct kvfree_rcu_bulk_data *bnode) -{ - // Check the limit. - if (krcp->nr_bkv_objs >= rcu_min_cached_objs) - return false; - - llist_add((struct llist_node *) bnode, &krcp->bkvcache); - WRITE_ONCE(krcp->nr_bkv_objs, krcp->nr_bkv_objs + 1); - return true; -} - -static int -drain_page_cache(struct kfree_rcu_cpu *krcp) -{ - unsigned long flags; - struct llist_node *page_list, *pos, *n; - int freed = 0; - - if (!rcu_min_cached_objs) - return 0; - - raw_spin_lock_irqsave(&krcp->lock, flags); - page_list = llist_del_all(&krcp->bkvcache); - WRITE_ONCE(krcp->nr_bkv_objs, 0); - raw_spin_unlock_irqrestore(&krcp->lock, flags); - - llist_for_each_safe(pos, n, page_list) { - free_page((unsigned long)pos); - freed++; - } - - return freed; -} - -static void -kvfree_rcu_bulk(struct kfree_rcu_cpu *krcp, - struct kvfree_rcu_bulk_data *bnode, int idx) -{ - unsigned long flags; - int i; - - if (!WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&bnode->gp_snap))) { - debug_rcu_bhead_unqueue(bnode); - rcu_lock_acquire(&rcu_callback_map); - if (idx == 0) { // kmalloc() / kfree(). - trace_rcu_invoke_kfree_bulk_callback( - rcu_state.name, bnode->nr_records, - bnode->records); - - kfree_bulk(bnode->nr_records, bnode->records); - } else { // vmalloc() / vfree(). - for (i = 0; i < bnode->nr_records; i++) { - trace_rcu_invoke_kvfree_callback( - rcu_state.name, bnode->records[i], 0); - - vfree(bnode->records[i]); - } - } - rcu_lock_release(&rcu_callback_map); - } - - raw_spin_lock_irqsave(&krcp->lock, flags); - if (put_cached_bnode(krcp, bnode)) - bnode = NULL; - raw_spin_unlock_irqrestore(&krcp->lock, flags); - - if (bnode) - free_page((unsigned long) bnode); - - cond_resched_tasks_rcu_qs(); -} - -static void -kvfree_rcu_list(struct rcu_head *head) -{ - struct rcu_head *next; - - for (; head; head = next) { - void *ptr = (void *) head->func; - unsigned long offset = (void *) head - ptr; - - next = head->next; - debug_rcu_head_unqueue((struct rcu_head *)ptr); - rcu_lock_acquire(&rcu_callback_map); - trace_rcu_invoke_kvfree_callback(rcu_state.name, head, offset); - - if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset))) - kvfree(ptr); - - rcu_lock_release(&rcu_callback_map); - cond_resched_tasks_rcu_qs(); - } -} - -/* - * This function is invoked in workqueue context after a grace period. - * It frees all the objects queued on ->bulk_head_free or ->head_free. - */ -static void kfree_rcu_work(struct work_struct *work) -{ - unsigned long flags; - struct kvfree_rcu_bulk_data *bnode, *n; - struct list_head bulk_head[FREE_N_CHANNELS]; - struct rcu_head *head; - struct kfree_rcu_cpu *krcp; - struct kfree_rcu_cpu_work *krwp; - struct rcu_gp_oldstate head_gp_snap; - int i; - - krwp = container_of(to_rcu_work(work), - struct kfree_rcu_cpu_work, rcu_work); - krcp = krwp->krcp; - - raw_spin_lock_irqsave(&krcp->lock, flags); - // Channels 1 and 2. - for (i = 0; i < FREE_N_CHANNELS; i++) - list_replace_init(&krwp->bulk_head_free[i], &bulk_head[i]); - - // Channel 3. - head = krwp->head_free; - krwp->head_free = NULL; - head_gp_snap = krwp->head_free_gp_snap; - raw_spin_unlock_irqrestore(&krcp->lock, flags); - - // Handle the first two channels. - for (i = 0; i < FREE_N_CHANNELS; i++) { - // Start from the tail page, so a GP is likely passed for it. - list_for_each_entry_safe(bnode, n, &bulk_head[i], list) - kvfree_rcu_bulk(krcp, bnode, i); - } - - /* - * This is used when the "bulk" path can not be used for the - * double-argument of kvfree_rcu(). This happens when the - * page-cache is empty, which means that objects are instead - * queued on a linked list through their rcu_head structures. - * This list is named "Channel 3". - */ - if (head && !WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&head_gp_snap))) - kvfree_rcu_list(head); -} - -static bool -need_offload_krc(struct kfree_rcu_cpu *krcp) -{ - int i; - - for (i = 0; i < FREE_N_CHANNELS; i++) - if (!list_empty(&krcp->bulk_head[i])) - return true; - - return !!READ_ONCE(krcp->head); -} - -static bool -need_wait_for_krwp_work(struct kfree_rcu_cpu_work *krwp) -{ - int i; - - for (i = 0; i < FREE_N_CHANNELS; i++) - if (!list_empty(&krwp->bulk_head_free[i])) - return true; - - return !!krwp->head_free; -} - -static int krc_count(struct kfree_rcu_cpu *krcp) -{ - int sum = atomic_read(&krcp->head_count); - int i; - - for (i = 0; i < FREE_N_CHANNELS; i++) - sum += atomic_read(&krcp->bulk_count[i]); - - return sum; -} - -static void -schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp) -{ - long delay, delay_left; - - delay = krc_count(krcp) >= KVFREE_BULK_MAX_ENTR ? 1:KFREE_DRAIN_JIFFIES; - if (delayed_work_pending(&krcp->monitor_work)) { - delay_left = krcp->monitor_work.timer.expires - jiffies; - if (delay < delay_left) - mod_delayed_work(system_unbound_wq, &krcp->monitor_work, delay); - return; - } - queue_delayed_work(system_unbound_wq, &krcp->monitor_work, delay); -} - -static void -kvfree_rcu_drain_ready(struct kfree_rcu_cpu *krcp) -{ - struct list_head bulk_ready[FREE_N_CHANNELS]; - struct kvfree_rcu_bulk_data *bnode, *n; - struct rcu_head *head_ready = NULL; - unsigned long flags; - int i; - - raw_spin_lock_irqsave(&krcp->lock, flags); - for (i = 0; i < FREE_N_CHANNELS; i++) { - INIT_LIST_HEAD(&bulk_ready[i]); - - list_for_each_entry_safe_reverse(bnode, n, &krcp->bulk_head[i], list) { - if (!poll_state_synchronize_rcu_full(&bnode->gp_snap)) - break; - - atomic_sub(bnode->nr_records, &krcp->bulk_count[i]); - list_move(&bnode->list, &bulk_ready[i]); - } - } - - if (krcp->head && poll_state_synchronize_rcu(krcp->head_gp_snap)) { - head_ready = krcp->head; - atomic_set(&krcp->head_count, 0); - WRITE_ONCE(krcp->head, NULL); - } - raw_spin_unlock_irqrestore(&krcp->lock, flags); - - for (i = 0; i < FREE_N_CHANNELS; i++) { - list_for_each_entry_safe(bnode, n, &bulk_ready[i], list) - kvfree_rcu_bulk(krcp, bnode, i); - } - - if (head_ready) - kvfree_rcu_list(head_ready); -} - -/* - * Return: %true if a work is queued, %false otherwise. - */ -static bool -kvfree_rcu_queue_batch(struct kfree_rcu_cpu *krcp) -{ - unsigned long flags; - bool queued = false; - int i, j; - - raw_spin_lock_irqsave(&krcp->lock, flags); - - // Attempt to start a new batch. - for (i = 0; i < KFREE_N_BATCHES; i++) { - struct kfree_rcu_cpu_work *krwp = &(krcp->krw_arr[i]); - - // Try to detach bulk_head or head and attach it, only when - // all channels are free. Any channel is not free means at krwp - // there is on-going rcu work to handle krwp's free business. - if (need_wait_for_krwp_work(krwp)) - continue; - - // kvfree_rcu_drain_ready() might handle this krcp, if so give up. - if (need_offload_krc(krcp)) { - // Channel 1 corresponds to the SLAB-pointer bulk path. - // Channel 2 corresponds to vmalloc-pointer bulk path. - for (j = 0; j < FREE_N_CHANNELS; j++) { - if (list_empty(&krwp->bulk_head_free[j])) { - atomic_set(&krcp->bulk_count[j], 0); - list_replace_init(&krcp->bulk_head[j], - &krwp->bulk_head_free[j]); - } - } - - // Channel 3 corresponds to both SLAB and vmalloc - // objects queued on the linked list. - if (!krwp->head_free) { - krwp->head_free = krcp->head; - get_state_synchronize_rcu_full(&krwp->head_free_gp_snap); - atomic_set(&krcp->head_count, 0); - WRITE_ONCE(krcp->head, NULL); - } - - // One work is per one batch, so there are three - // "free channels", the batch can handle. Break - // the loop since it is done with this CPU thus - // queuing an RCU work is _always_ success here. - queued = queue_rcu_work(system_unbound_wq, &krwp->rcu_work); - WARN_ON_ONCE(!queued); - break; - } - } - - raw_spin_unlock_irqrestore(&krcp->lock, flags); - return queued; -} - -/* - * This function is invoked after the KFREE_DRAIN_JIFFIES timeout. - */ -static void kfree_rcu_monitor(struct work_struct *work) -{ - struct kfree_rcu_cpu *krcp = container_of(work, - struct kfree_rcu_cpu, monitor_work.work); - - // Drain ready for reclaim. - kvfree_rcu_drain_ready(krcp); - - // Queue a batch for a rest. - kvfree_rcu_queue_batch(krcp); - - // If there is nothing to detach, it means that our job is - // successfully done here. In case of having at least one - // of the channels that is still busy we should rearm the - // work to repeat an attempt. Because previous batches are - // still in progress. - if (need_offload_krc(krcp)) - schedule_delayed_monitor_work(krcp); -} - -static enum hrtimer_restart -schedule_page_work_fn(struct hrtimer *t) -{ - struct kfree_rcu_cpu *krcp = - container_of(t, struct kfree_rcu_cpu, hrtimer); - - queue_delayed_work(system_highpri_wq, &krcp->page_cache_work, 0); - return HRTIMER_NORESTART; -} - -static void fill_page_cache_func(struct work_struct *work) -{ - struct kvfree_rcu_bulk_data *bnode; - struct kfree_rcu_cpu *krcp = - container_of(work, struct kfree_rcu_cpu, - page_cache_work.work); - unsigned long flags; - int nr_pages; - bool pushed; - int i; - - nr_pages = atomic_read(&krcp->backoff_page_cache_fill) ? - 1 : rcu_min_cached_objs; - - for (i = READ_ONCE(krcp->nr_bkv_objs); i < nr_pages; i++) { - bnode = (struct kvfree_rcu_bulk_data *) - __get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN); - - if (!bnode) - break; - - raw_spin_lock_irqsave(&krcp->lock, flags); - pushed = put_cached_bnode(krcp, bnode); - raw_spin_unlock_irqrestore(&krcp->lock, flags); - - if (!pushed) { - free_page((unsigned long) bnode); - break; - } - } - - atomic_set(&krcp->work_in_progress, 0); - atomic_set(&krcp->backoff_page_cache_fill, 0); -} - -static void -run_page_cache_worker(struct kfree_rcu_cpu *krcp) -{ - // If cache disabled, bail out. - if (!rcu_min_cached_objs) - return; - - if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING && - !atomic_xchg(&krcp->work_in_progress, 1)) { - if (atomic_read(&krcp->backoff_page_cache_fill)) { - queue_delayed_work(system_unbound_wq, - &krcp->page_cache_work, - msecs_to_jiffies(rcu_delay_page_cache_fill_msec)); - } else { - hrtimer_init(&krcp->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); - krcp->hrtimer.function = schedule_page_work_fn; - hrtimer_start(&krcp->hrtimer, 0, HRTIMER_MODE_REL); - } - } -} - -// Record ptr in a page managed by krcp, with the pre-krc_this_cpu_lock() -// state specified by flags. If can_alloc is true, the caller must -// be schedulable and not be holding any locks or mutexes that might be -// acquired by the memory allocator or anything that it might invoke. -// Returns true if ptr was successfully recorded, else the caller must -// use a fallback. -static inline bool -add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp, - unsigned long *flags, void *ptr, bool can_alloc) -{ - struct kvfree_rcu_bulk_data *bnode; - int idx; - - *krcp = krc_this_cpu_lock(flags); - if (unlikely(!(*krcp)->initialized)) - return false; - - idx = !!is_vmalloc_addr(ptr); - bnode = list_first_entry_or_null(&(*krcp)->bulk_head[idx], - struct kvfree_rcu_bulk_data, list); - - /* Check if a new block is required. */ - if (!bnode || bnode->nr_records == KVFREE_BULK_MAX_ENTR) { - bnode = get_cached_bnode(*krcp); - if (!bnode && can_alloc) { - krc_this_cpu_unlock(*krcp, *flags); - - // __GFP_NORETRY - allows a light-weight direct reclaim - // what is OK from minimizing of fallback hitting point of - // view. Apart of that it forbids any OOM invoking what is - // also beneficial since we are about to release memory soon. - // - // __GFP_NOMEMALLOC - prevents from consuming of all the - // memory reserves. Please note we have a fallback path. - // - // __GFP_NOWARN - it is supposed that an allocation can - // be failed under low memory or high memory pressure - // scenarios. - bnode = (struct kvfree_rcu_bulk_data *) - __get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN); - raw_spin_lock_irqsave(&(*krcp)->lock, *flags); - } - - if (!bnode) - return false; - - // Initialize the new block and attach it. - bnode->nr_records = 0; - list_add(&bnode->list, &(*krcp)->bulk_head[idx]); - } - - // Finally insert and update the GP for this page. - bnode->nr_records++; - bnode->records[bnode->nr_records - 1] = ptr; - get_state_synchronize_rcu_full(&bnode->gp_snap); - atomic_inc(&(*krcp)->bulk_count[idx]); - - return true; -} - -/* - * Queue a request for lazy invocation of the appropriate free routine - * after a grace period. Please note that three paths are maintained, - * two for the common case using arrays of pointers and a third one that - * is used only when the main paths cannot be used, for example, due to - * memory pressure. - * - * Each kvfree_call_rcu() request is added to a batch. The batch will be drained - * every KFREE_DRAIN_JIFFIES number of jiffies. All the objects in the batch will - * be free'd in workqueue context. This allows us to: batch requests together to - * reduce the number of grace periods during heavy kfree_rcu()/kvfree_rcu() load. - */ -void kvfree_call_rcu(struct rcu_head *head, void *ptr) -{ - unsigned long flags; - struct kfree_rcu_cpu *krcp; - bool success; - - if (head) { - call_rcu(head, (rcu_callback_t) ((void *) head - ptr)); - } else { - synchronize_rcu(); - kvfree(ptr); - } - - /* Disconnect the rest. */ - return; - - /* - * Please note there is a limitation for the head-less - * variant, that is why there is a clear rule for such - * objects: it can be used from might_sleep() context - * only. For other places please embed an rcu_head to - * your data. - */ - if (!head) - might_sleep(); - - // Queue the object but don't yet schedule the batch. - if (debug_rcu_head_queue(ptr)) { - // Probable double kfree_rcu(), just leak. - WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n", - __func__, head); - - // Mark as success and leave. - return; - } - - kasan_record_aux_stack_noalloc(ptr); - success = add_ptr_to_bulk_krc_lock(&krcp, &flags, ptr, !head); - if (!success) { - run_page_cache_worker(krcp); - - if (head == NULL) - // Inline if kvfree_rcu(one_arg) call. - goto unlock_return; - - head->func = ptr; - head->next = krcp->head; - WRITE_ONCE(krcp->head, head); - atomic_inc(&krcp->head_count); - - // Take a snapshot for this krcp. - krcp->head_gp_snap = get_state_synchronize_rcu(); - success = true; - } - - /* - * The kvfree_rcu() caller considers the pointer freed at this point - * and likely removes any references to it. Since the actual slab - * freeing (and kmemleak_free()) is deferred, tell kmemleak to ignore - * this object (no scanning or false positives reporting). - */ - kmemleak_ignore(ptr); - - // Set timer to drain after KFREE_DRAIN_JIFFIES. - if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING) - schedule_delayed_monitor_work(krcp); - -unlock_return: - krc_this_cpu_unlock(krcp, flags); - - /* - * Inline kvfree() after synchronize_rcu(). We can do - * it from might_sleep() context only, so the current - * CPU can pass the QS state. - */ - if (!success) { - debug_rcu_head_unqueue((struct rcu_head *) ptr); - synchronize_rcu(); - kvfree(ptr); - } -} -EXPORT_SYMBOL_GPL(kvfree_call_rcu); - -/** - * kvfree_rcu_barrier - Wait until all in-flight kvfree_rcu() complete. - * - * Note that a single argument of kvfree_rcu() call has a slow path that - * triggers synchronize_rcu() following by freeing a pointer. It is done - * before the return from the function. Therefore for any single-argument - * call that will result in a kfree() to a cache that is to be destroyed - * during module exit, it is developer's responsibility to ensure that all - * such calls have returned before the call to kmem_cache_destroy(). - */ -void kvfree_rcu_barrier(void) -{ - struct kfree_rcu_cpu_work *krwp; - struct kfree_rcu_cpu *krcp; - bool queued; - int i, cpu; - - /* Temporary. */ - rcu_barrier(); - - /* - * Firstly we detach objects and queue them over an RCU-batch - * for all CPUs. Finally queued works are flushed for each CPU. - * - * Please note. If there are outstanding batches for a particular - * CPU, those have to be finished first following by queuing a new. - */ - for_each_possible_cpu(cpu) { - krcp = per_cpu_ptr(&krc, cpu); - - /* - * Check if this CPU has any objects which have been queued for a - * new GP completion. If not(means nothing to detach), we are done - * with it. If any batch is pending/running for this "krcp", below - * per-cpu flush_rcu_work() waits its completion(see last step). - */ - if (!need_offload_krc(krcp)) - continue; - - while (1) { - /* - * If we are not able to queue a new RCU work it means: - * - batches for this CPU are still in flight which should - * be flushed first and then repeat; - * - no objects to detach, because of concurrency. - */ - queued = kvfree_rcu_queue_batch(krcp); - - /* - * Bail out, if there is no need to offload this "krcp" - * anymore. As noted earlier it can run concurrently. - */ - if (queued || !need_offload_krc(krcp)) - break; - - /* There are ongoing batches. */ - for (i = 0; i < KFREE_N_BATCHES; i++) { - krwp = &(krcp->krw_arr[i]); - flush_rcu_work(&krwp->rcu_work); - } - } - } - - /* - * Now we guarantee that all objects are flushed. - */ - for_each_possible_cpu(cpu) { - krcp = per_cpu_ptr(&krc, cpu); - - /* - * A monitor work can drain ready to reclaim objects - * directly. Wait its completion if running or pending. - */ - cancel_delayed_work_sync(&krcp->monitor_work); - - for (i = 0; i < KFREE_N_BATCHES; i++) { - krwp = &(krcp->krw_arr[i]); - flush_rcu_work(&krwp->rcu_work); - } - } -} -EXPORT_SYMBOL_GPL(kvfree_rcu_barrier); - -static unsigned long -kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc) -{ - int cpu; - unsigned long count = 0; - - /* Snapshot count of all CPUs */ - for_each_possible_cpu(cpu) { - struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); - - count += krc_count(krcp); - count += READ_ONCE(krcp->nr_bkv_objs); - atomic_set(&krcp->backoff_page_cache_fill, 1); - } - - return count == 0 ? SHRINK_EMPTY : count; -} - -static unsigned long -kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) -{ - int cpu, freed = 0; - - for_each_possible_cpu(cpu) { - int count; - struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); - - count = krc_count(krcp); - count += drain_page_cache(krcp); - kfree_rcu_monitor(&krcp->monitor_work.work); - - sc->nr_to_scan -= count; - freed += count; - - if (sc->nr_to_scan <= 0) - break; - } - - return freed == 0 ? SHRINK_STOP : freed; -} - -void __init kfree_rcu_scheduler_running(void) -{ - int cpu; - - for_each_possible_cpu(cpu) { - struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); - - if (need_offload_krc(krcp)) - schedule_delayed_monitor_work(krcp); - } -} - /* * During early boot, any blocking grace-period wait automatically * implies a grace period. @@ -5665,62 +4830,12 @@ static void __init rcu_dump_rcu_node_tree(void) struct workqueue_struct *rcu_gp_wq; -static void __init kfree_rcu_batch_init(void) -{ - int cpu; - int i, j; - struct shrinker *kfree_rcu_shrinker; - - /* Clamp it to [0:100] seconds interval. */ - if (rcu_delay_page_cache_fill_msec < 0 || - rcu_delay_page_cache_fill_msec > 100 * MSEC_PER_SEC) { - - rcu_delay_page_cache_fill_msec = - clamp(rcu_delay_page_cache_fill_msec, 0, - (int) (100 * MSEC_PER_SEC)); - - pr_info("Adjusting rcutree.rcu_delay_page_cache_fill_msec to %d ms.\n", - rcu_delay_page_cache_fill_msec); - } - - for_each_possible_cpu(cpu) { - struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); - - for (i = 0; i < KFREE_N_BATCHES; i++) { - INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work); - krcp->krw_arr[i].krcp = krcp; - - for (j = 0; j < FREE_N_CHANNELS; j++) - INIT_LIST_HEAD(&krcp->krw_arr[i].bulk_head_free[j]); - } - - for (i = 0; i < FREE_N_CHANNELS; i++) - INIT_LIST_HEAD(&krcp->bulk_head[i]); - - INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor); - INIT_DELAYED_WORK(&krcp->page_cache_work, fill_page_cache_func); - krcp->initialized = true; - } - - kfree_rcu_shrinker = shrinker_alloc(0, "rcu-kfree"); - if (!kfree_rcu_shrinker) { - pr_err("Failed to allocate kfree_rcu() shrinker!\n"); - return; - } - - kfree_rcu_shrinker->count_objects = kfree_rcu_shrink_count; - kfree_rcu_shrinker->scan_objects = kfree_rcu_shrink_scan; - - shrinker_register(kfree_rcu_shrinker); -} - void __init rcu_init(void) { int cpu = smp_processor_id(); rcu_early_boot_tests(); - kfree_rcu_batch_init(); rcu_bootup_announce(); sanitize_kthread_prio(); rcu_init_geometry(); diff --git a/mm/slab_common.c b/mm/slab_common.c index cffc96bd279a..39de00e2cf88 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1513,7 +1513,7 @@ put_cached_bnode(struct kfree_rcu_cpu *krcp, return true; } -static int __maybe_unused +static int drain_page_cache(struct kfree_rcu_cpu *krcp) { unsigned long flags; @@ -1600,7 +1600,7 @@ kvfree_rcu_list(struct rcu_head *head) * This function is invoked in workqueue context after a grace period. * It frees all the objects queued on ->bulk_head_free or ->head_free. */ -static void __maybe_unused +static void kfree_rcu_work(struct work_struct *work) { unsigned long flags; @@ -1793,7 +1793,7 @@ kvfree_rcu_queue_batch(struct kfree_rcu_cpu *krcp) /* * This function is invoked after the KFREE_DRAIN_JIFFIES timeout. */ -static void __maybe_unused +static void kfree_rcu_monitor(struct work_struct *work) { struct kfree_rcu_cpu *krcp = container_of(work, @@ -1814,17 +1814,7 @@ kfree_rcu_monitor(struct work_struct *work) schedule_delayed_monitor_work(krcp); } -static enum hrtimer_restart -schedule_page_work_fn(struct hrtimer *t) -{ - struct kfree_rcu_cpu *krcp = - container_of(t, struct kfree_rcu_cpu, hrtimer); - - queue_delayed_work(system_highpri_wq, &krcp->page_cache_work, 0); - return HRTIMER_NORESTART; -} - -static void __maybe_unused +static void fill_page_cache_func(struct work_struct *work) { struct kvfree_rcu_bulk_data *bnode; @@ -1860,27 +1850,6 @@ fill_page_cache_func(struct work_struct *work) atomic_set(&krcp->backoff_page_cache_fill, 0); } -static void __maybe_unused -run_page_cache_worker(struct kfree_rcu_cpu *krcp) -{ - // If cache disabled, bail out. - if (!rcu_min_cached_objs) - return; - - if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING && - !atomic_xchg(&krcp->work_in_progress, 1)) { - if (atomic_read(&krcp->backoff_page_cache_fill)) { - queue_delayed_work(system_unbound_wq, - &krcp->page_cache_work, - msecs_to_jiffies(rcu_delay_page_cache_fill_msec)); - } else { - hrtimer_init(&krcp->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); - krcp->hrtimer.function = schedule_page_work_fn; - hrtimer_start(&krcp->hrtimer, 0, HRTIMER_MODE_REL); - } - } -} - // Record ptr in a page managed by krcp, with the pre-krc_this_cpu_lock() // state specified by flags. If can_alloc is true, the caller must // be schedulable and not be holding any locks or mutexes that might be @@ -1941,6 +1910,219 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp, return true; } +#if !defined(CONFIG_TINY_RCU) + +static enum hrtimer_restart +schedule_page_work_fn(struct hrtimer *t) +{ + struct kfree_rcu_cpu *krcp = + container_of(t, struct kfree_rcu_cpu, hrtimer); + + queue_delayed_work(system_highpri_wq, &krcp->page_cache_work, 0); + return HRTIMER_NORESTART; +} + +static void +run_page_cache_worker(struct kfree_rcu_cpu *krcp) +{ + // If cache disabled, bail out. + if (!rcu_min_cached_objs) + return; + + if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING && + !atomic_xchg(&krcp->work_in_progress, 1)) { + if (atomic_read(&krcp->backoff_page_cache_fill)) { + queue_delayed_work(system_unbound_wq, + &krcp->page_cache_work, + msecs_to_jiffies(rcu_delay_page_cache_fill_msec)); + } else { + hrtimer_init(&krcp->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); + krcp->hrtimer.function = schedule_page_work_fn; + hrtimer_start(&krcp->hrtimer, 0, HRTIMER_MODE_REL); + } + } +} + +/* + * Queue a request for lazy invocation of the appropriate free routine + * after a grace period. Please note that three paths are maintained, + * two for the common case using arrays of pointers and a third one that + * is used only when the main paths cannot be used, for example, due to + * memory pressure. + * + * Each kvfree_call_rcu() request is added to a batch. The batch will be drained + * every KFREE_DRAIN_JIFFIES number of jiffies. All the objects in the batch will + * be free'd in workqueue context. This allows us to: batch requests together to + * reduce the number of grace periods during heavy kfree_rcu()/kvfree_rcu() load. + */ +void kvfree_call_rcu(struct rcu_head *head, void *ptr) +{ + unsigned long flags; + struct kfree_rcu_cpu *krcp; + bool success; + + /* + * Please note there is a limitation for the head-less + * variant, that is why there is a clear rule for such + * objects: it can be used from might_sleep() context + * only. For other places please embed an rcu_head to + * your data. + */ + if (!head) + might_sleep(); + + // Queue the object but don't yet schedule the batch. + if (debug_rcu_head_queue(ptr)) { + // Probable double kfree_rcu(), just leak. + WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n", + __func__, head); + + // Mark as success and leave. + return; + } + + kasan_record_aux_stack_noalloc(ptr); + success = add_ptr_to_bulk_krc_lock(&krcp, &flags, ptr, !head); + if (!success) { + run_page_cache_worker(krcp); + + if (head == NULL) + // Inline if kvfree_rcu(one_arg) call. + goto unlock_return; + + head->func = ptr; + head->next = krcp->head; + WRITE_ONCE(krcp->head, head); + atomic_inc(&krcp->head_count); + + // Take a snapshot for this krcp. + krcp->head_gp_snap = get_state_synchronize_rcu(); + success = true; + } + + /* + * The kvfree_rcu() caller considers the pointer freed at this point + * and likely removes any references to it. Since the actual slab + * freeing (and kmemleak_free()) is deferred, tell kmemleak to ignore + * this object (no scanning or false positives reporting). + */ + kmemleak_ignore(ptr); + + // Set timer to drain after KFREE_DRAIN_JIFFIES. + if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING) + schedule_delayed_monitor_work(krcp); + +unlock_return: + krc_this_cpu_unlock(krcp, flags); + + /* + * Inline kvfree() after synchronize_rcu(). We can do + * it from might_sleep() context only, so the current + * CPU can pass the QS state. + */ + if (!success) { + debug_rcu_head_unqueue((struct rcu_head *) ptr); + synchronize_rcu(); + kvfree(ptr); + } +} +EXPORT_SYMBOL_GPL(kvfree_call_rcu); + +void __init +kfree_rcu_scheduler_running(void) +{ + int cpu; + + for_each_possible_cpu(cpu) { + struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); + + if (need_offload_krc(krcp)) + schedule_delayed_monitor_work(krcp); + } +} + +/** + * kvfree_rcu_barrier - Wait until all in-flight kvfree_rcu() complete. + * + * Note that a single argument of kvfree_rcu() call has a slow path that + * triggers synchronize_rcu() following by freeing a pointer. It is done + * before the return from the function. Therefore for any single-argument + * call that will result in a kfree() to a cache that is to be destroyed + * during module exit, it is developer's responsibility to ensure that all + * such calls have returned before the call to kmem_cache_destroy(). + */ +void kvfree_rcu_barrier(void) +{ + struct kfree_rcu_cpu_work *krwp; + struct kfree_rcu_cpu *krcp; + bool queued; + int i, cpu; + + /* + * Firstly we detach objects and queue them over an RCU-batch + * for all CPUs. Finally queued works are flushed for each CPU. + * + * Please note. If there are outstanding batches for a particular + * CPU, those have to be finished first following by queuing a new. + */ + for_each_possible_cpu(cpu) { + krcp = per_cpu_ptr(&krc, cpu); + + /* + * Check if this CPU has any objects which have been queued for a + * new GP completion. If not(means nothing to detach), we are done + * with it. If any batch is pending/running for this "krcp", below + * per-cpu flush_rcu_work() waits its completion(see last step). + */ + if (!need_offload_krc(krcp)) + continue; + + while (1) { + /* + * If we are not able to queue a new RCU work it means: + * - batches for this CPU are still in flight which should + * be flushed first and then repeat; + * - no objects to detach, because of concurrency. + */ + queued = kvfree_rcu_queue_batch(krcp); + + /* + * Bail out, if there is no need to offload this "krcp" + * anymore. As noted earlier it can run concurrently. + */ + if (queued || !need_offload_krc(krcp)) + break; + + /* There are ongoing batches. */ + for (i = 0; i < KFREE_N_BATCHES; i++) { + krwp = &(krcp->krw_arr[i]); + flush_rcu_work(&krwp->rcu_work); + } + } + } + + /* + * Now we guarantee that all objects are flushed. + */ + for_each_possible_cpu(cpu) { + krcp = per_cpu_ptr(&krc, cpu); + + /* + * A monitor work can drain ready to reclaim objects + * directly. Wait its completion if running or pending. + */ + cancel_delayed_work_sync(&krcp->monitor_work); + + for (i = 0; i < KFREE_N_BATCHES; i++) { + krwp = &(krcp->krw_arr[i]); + flush_rcu_work(&krwp->rcu_work); + } + } +} +EXPORT_SYMBOL_GPL(kvfree_rcu_barrier); + +#endif /* #if !defined(CONFIG_TINY_RCU) */ + static unsigned long kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc) { @@ -1982,8 +2164,8 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) return freed == 0 ? SHRINK_STOP : freed; } -static void __init __maybe_unused -kfree_rcu_batch_init(void) +void __init +kvfree_rcu_init(void) { int cpu; int i, j;