From patchwork Sun Nov 19 09:25:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haifeng Xu X-Patchwork-Id: 13460333 X-Patchwork-Delegate: kuba@kernel.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=shopee.com header.i=@shopee.com header.b="UpJnNI1g" Received: from mail-oi1-x236.google.com (mail-oi1-x236.google.com [IPv6:2607:f8b0:4864:20::236]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 900F8126 for ; Sun, 19 Nov 2023 01:25:46 -0800 (PST) Received: by mail-oi1-x236.google.com with SMTP id 5614622812f47-3b2e73a17a0so2363651b6e.3 for ; Sun, 19 Nov 2023 01:25:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shopee.com; s=shopee.com; t=1700385946; x=1700990746; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=FD0yKcX7kb1gSGQ8bjxMh0bsMglpu98y3UOeL+swq7s=; b=UpJnNI1gUBRN5wInuTBJwDa0M3BX3ItfR9/62g7jCoGn1Au4zpDw1NCriO4KISsfyd XLaOHrHqZ+xfI8MV60hF/1suplUD/D2ZoxVuL1QUDXfvHgLdm+SwCgTd5kcJKLxkmx5g XgKpkdG5BmSdsuPqBaaWT+DaBIzR7T80pbn+cNaPIMc19qzzZ1N0MgJlKkazmRuFfveY aFfyngqgmHoed7TyFOGj8n3RheNeLIz1sp98RyG+lEd9JC2oB1EowvsU0H8z+P1OOgqe PKhFcgWv9/3auf8Dyh3I7p0HzemFogneF+/jJgMM3VB4Zc6xNjA0l0Mtj0yDc/2oGFg6 tm2w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1700385946; x=1700990746; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=FD0yKcX7kb1gSGQ8bjxMh0bsMglpu98y3UOeL+swq7s=; b=J8nEA1E2lL2Ythm4+l4XEpUSQltHoPzYbERBkV8Y7F+GYw+ox3G0qzefmER1eZzx93 i0OAR6gAD8V8MqzeRN62GKRzG+SvgNRJquZQU9aZc+0q4VdZW8y8G7EzFsnR+MmYNfW0 vb3v9Yd5uqtXPtNutpnN5MKM4DZSqfmG3JbidrCz0JF91kZuFrq45z4iGjqCuyS0et9c WR21/5HzzH6M6SD9EaVewnpSPgI9BFERX0oAB4+wEeNIchbKb9emiatPn+mn/heUXQFs 
qsy+N+GPaS9S0YMQbiSi1Y9M53ENgfNtshyMJQU8ZX3nyqPNFpOUdTMdjCf2zxkkazMd Qikw== X-Gm-Message-State: AOJu0Yz68pMzQasVtgqE5+I73k/9kIfnH/fZjc1lUNkbs624m64lwzza 2Gln81btHlYVKTzrpgtYWENQOA== X-Google-Smtp-Source: AGHT+IFO3PUO8cjn+I99JfAVxnBNNpjCqvobLaIcZMSfHpc3ahoKwFZ3EOdta2OymcCDqTlB/F73Cg== X-Received: by 2002:a05:6808:f02:b0:3ae:3bd:d3d2 with SMTP id m2-20020a0568080f0200b003ae03bdd3d2mr6355804oiw.10.1700385945893; Sun, 19 Nov 2023 01:25:45 -0800 (PST) Received: from ubuntu-hf2.default.svc.cluster.local ([101.127.248.173]) by smtp.gmail.com with ESMTPSA id d8-20020a170903230800b001cc0e3a29a8sm4060770plh.89.2023.11.19.01.25.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 19 Nov 2023 01:25:45 -0800 (PST) From: Haifeng Xu To: edumazet@google.com Cc: andy@greyhouse.net, davem@davemloft.net, j.vosburgh@gmail.com, kuba@kernel.org, pabeni@redhat.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Haifeng Xu Subject: [PATCH v3 1/2] bonding: export devnet_rename_sem Date: Sun, 19 Nov 2023 09:25:29 +0000 Message-Id: <20231119092530.13071-1-haifeng.xu@shopee.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org

This patch exports the devnet_rename_sem variable so that it can be
accessed in the bonding module instead of being limited to
net/core/dev.c.

Signed-off-by: Haifeng Xu
Suggested-by: Eric Dumazet
Reviewed-by: Eric Dumazet
---
 include/net/bonding.h | 3 +++
 net/core/dev.c        | 3 ++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/net/bonding.h b/include/net/bonding.h
index 5b8b1b644a2d..6c16d778b615 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -780,6 +780,9 @@ extern const struct sysfs_ops slave_sysfs_ops;
 /* exported from bond_3ad.c */
 extern const u8 lacpdu_mcast_addr[];
 
+/* exported from net/core/dev.c */
+extern struct rw_semaphore devnet_rename_sem;
+
 static inline netdev_tx_t bond_tx_drop(struct net_device *dev, struct sk_buff *skb)
 {
 	dev_core_stats_tx_dropped_inc(dev);
diff --git a/net/core/dev.c b/net/core/dev.c
index af53f6d838ce..fdafab617227 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -197,7 +197,8 @@ static DEFINE_SPINLOCK(napi_hash_lock);
 static unsigned int napi_gen_id = NR_CPUS;
 static DEFINE_READ_MOSTLY_HASHTABLE(napi_hash, 8);
 
-static DECLARE_RWSEM(devnet_rename_sem);
+DECLARE_RWSEM(devnet_rename_sem);
+EXPORT_SYMBOL(devnet_rename_sem);
 
 static inline void dev_base_seq_inc(struct net *net)
 {

From patchwork Sun Nov 19 09:25:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haifeng Xu X-Patchwork-Id: 13460334 X-Patchwork-Delegate: kuba@kernel.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=shopee.com header.i=@shopee.com header.b="UDdJGanh" Received: from mail-pf1-x42b.google.com (mail-pf1-x42b.google.com [IPv6:2607:f8b0:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 97D28129 for ; Sun, 19 Nov 2023 01:25:51 -0800 (PST) Received: by mail-pf1-x42b.google.com with SMTP id d2e1a72fcca58-6c10f098a27so2764358b3a.2 for ; Sun, 19 Nov 2023 01:25:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shopee.com; s=shopee.com; t=1700385951; x=1700990751; darn=vger.kernel.org; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=cl0+5wne9atTu1deNj/aM6BhU2wZluYA+vtbr+CP8oc=; b=UDdJGanhZR2MhMHOZY+K4NfuIexN9FN6hUP9X+WdxdJgCmKHINM6cI05xcijvsVsGb 9Bg9UyxE9Nycd+nk/IMcu+tnAY8ic8cQWnTSKCpUc4DJI6Kze1mRrJAJjQZooDAchboi HZMmjpaVCyzL+6BN6c/GWGP+k0v3sawqSkExPidXTeJCgYb1brI7JKBWOaAKx3gr+dTD 5lEcRQUp+9Wm65YkV/48GV0jtv2/icg8B/sS6xRhmOyOCbPjAkXmj6jw1oQdo88MIWj1 k4n5mZ6ByeRDfp9vn5mtPp0QXJ9s4RN35hv/SajBFA7Z55nvtuCmM4O3R9jUAh2qNquX tEgw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1700385951; x=1700990751; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=cl0+5wne9atTu1deNj/aM6BhU2wZluYA+vtbr+CP8oc=; b=AFKnfNF618YheQ49h8azwY7EU+sUN41H6uzsBIWxhb77fau6fFJ5qFHNF/uil5D4Wr sCuX1rz2kiJ67N24rI59OpiOn7t/cOgbdZ8gQgcMIGEA+FmuuWioBXobkNGibdPBqSPC S6Enq9CmhP73b/5tJcF0UANScr0UWYiT/EBpgfMyFql9ySMDiI64YUhM/19nH176HfpC 4WptQrGqgI+8HCxxd6BZCqipSCh3NyjejwHhfi7rFBD353hXQKrblHq7Hq0Scq5K7J/6 9XJpfzX7JHL+GNKWii+A5ndUONfpASLywbj8n8/aAQaWOmjCjQ2lPVspWCoUsEf81Hk9 sRHA== X-Gm-Message-State: AOJu0YwRxVS8aZaMzSUoWb09V7rMGpPdo9RGjZ9yzeAA1qhLQB9JIZYh 8EdS/jzAOmn5BSK/IpUYWhC+Eg== X-Google-Smtp-Source: AGHT+IEgZIYHLvOa28MWmYQ4TUN9OQAx9cgdtnyK3TVYFd5RmO1wYor5816Txy+B6+Ukt6Pn5kFWSg== X-Received: by 2002:a17:902:ecc6:b0:1cf:530b:d007 with SMTP id a6-20020a170902ecc600b001cf530bd007mr1873339plh.53.1700385950979; Sun, 19 Nov 2023 01:25:50 -0800 (PST) Received: from ubuntu-hf2.default.svc.cluster.local ([101.127.248.173]) by smtp.gmail.com with ESMTPSA id d8-20020a170903230800b001cc0e3a29a8sm4060770plh.89.2023.11.19.01.25.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 19 Nov 2023 01:25:50 -0800 (PST) From: Haifeng Xu To: edumazet@google.com Cc: andy@greyhouse.net, davem@davemloft.net, j.vosburgh@gmail.com, 
kuba@kernel.org, pabeni@redhat.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Haifeng Xu Subject: [PATCH v3 2/2] bonding: use a read-write lock in bonding_show_bonds() Date: Sun, 19 Nov 2023 09:25:30 +0000 Message-Id: <20231119092530.13071-2-haifeng.xu@shopee.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org

Problem description:

Call stack:
......
PID: 210933  TASK: ffff92424e5ec080  CPU: 13  COMMAND: "kworker/u96:2"
[ffffa7a8e96bbac0] __schedule at ffffffffb0719898
[ffffa7a8e96bbb48] schedule at ffffffffb0719e9e
[ffffa7a8e96bbb68] rwsem_down_write_slowpath at ffffffffafb3167a
[ffffa7a8e96bbc00] down_write at ffffffffb071bfc1
[ffffa7a8e96bbc18] kernfs_remove_by_name_ns at ffffffffafe3593e
[ffffa7a8e96bbc48] sysfs_unmerge_group at ffffffffafe38922
[ffffa7a8e96bbc68] dpm_sysfs_remove at ffffffffb021c96a
[ffffa7a8e96bbc80] device_del at ffffffffb0209af8
[ffffa7a8e96bbcd0] netdev_unregister_kobject at ffffffffb04a6b0e
[ffffa7a8e96bbcf8] unregister_netdevice_many at ffffffffb046d3d9
[ffffa7a8e96bbd60] default_device_exit_batch at ffffffffb046d8d1
[ffffa7a8e96bbdd0] ops_exit_list at ffffffffb045e21d
[ffffa7a8e96bbe00] cleanup_net at ffffffffb045ea46
[ffffa7a8e96bbe60] process_one_work at ffffffffafad94bb
[ffffa7a8e96bbeb0] worker_thread at ffffffffafad96ad
[ffffa7a8e96bbf10] kthread at ffffffffafae132a
[ffffa7a8e96bbf50] ret_from_fork at ffffffffafa04b92

290858
PID: 278176  TASK: ffff925deb39a040  CPU: 32  COMMAND: "node-exporter"
[ffffa7a8d14dbb80] __schedule at ffffffffb0719898
[ffffa7a8d14dbc08] schedule at ffffffffb0719e9e
[ffffa7a8d14dbc28] schedule_preempt_disabled at ffffffffb071a24e
[ffffa7a8d14dbc38] __mutex_lock at ffffffffb071af28
[ffffa7a8d14dbcb8] __mutex_lock_slowpath at ffffffffb071b1a3
[ffffa7a8d14dbcc8] mutex_lock at ffffffffb071b1e2
[ffffa7a8d14dbce0] rtnl_lock at ffffffffb047f4b5
[ffffa7a8d14dbcf0] bonding_show_bonds at ffffffffc079b1a1 [bonding]
[ffffa7a8d14dbd20] class_attr_show at ffffffffb02117ce
[ffffa7a8d14dbd30] sysfs_kf_seq_show at ffffffffafe37ba1
[ffffa7a8d14dbd50] kernfs_seq_show at ffffffffafe35c07
[ffffa7a8d14dbd60] seq_read_iter at ffffffffafd9fce0
[ffffa7a8d14dbdc0] kernfs_fop_read_iter at ffffffffafe36a10
[ffffa7a8d14dbe00] new_sync_read at ffffffffafd6de23
[ffffa7a8d14dbe90] vfs_read at ffffffffafd6e64e
[ffffa7a8d14dbed0] ksys_read at ffffffffafd70977
[ffffa7a8d14dbf10] __x64_sys_read at ffffffffafd70a0a
[ffffa7a8d14dbf20] do_syscall_64 at ffffffffb070bf1c
[ffffa7a8d14dbf50] entry_SYSCALL_64_after_hwframe at ffffffffb080007c
......

Thread 210933 holds the rtnl_mutex and tries to acquire the
kernfs_rwsem, but many readers hold the kernfs_rwsem, so it has to
sleep for a long time waiting for the readers to release the lock.
Thread 278176 and any other threads that call bonding_show_bonds() also
have to wait because they try to acquire the rtnl_mutex.

bonding_show_bonds() uses rtnl_mutex to protect the bond_list
traversal. However, entries are only added to and removed from
bond_list in bond_init()/bond_uninit(), so we can introduce a separate
read-write lock to synchronize bond list mutation. In addition,
bonding_show_bonds() could race with dev_change_name(), so we need
devnet_rename_sem to protect the access to dev->name.

What are the benefits of this change?

1) All threads that call bonding_show_bonds() only wait when a bond
   device is being registered or unregistered, or when the name of a
   net device changes.

2) There are many other users of rtnl_mutex, so bonding_show_bonds()
   won't compete with them.

In short, this change reduces the lock contention of rtnl_mutex.

Signed-off-by: Haifeng Xu
Suggested-by: Eric Dumazet
Reviewed-by: Eric Dumazet
---
v2:
- move the call stack after the description
- fix typos in the changelog
v3:
- add devnet_rename_sem in bonding_show_bonds()
- update the changelog
---
 drivers/net/bonding/bond_main.c  | 4 ++++
 drivers/net/bonding/bond_sysfs.c | 8 ++++++--
 include/net/bonding.h            | 3 +++
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 8e6cc0e133b7..db8f1efaab78 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -5957,7 +5957,9 @@ static void bond_uninit(struct net_device *bond_dev)
 
 	bond_set_slave_arr(bond, NULL, NULL);
 
+	write_lock(&bonding_dev_lock);
 	list_del(&bond->bond_list);
+	write_unlock(&bonding_dev_lock);
 
 	bond_debug_unregister(bond);
 }
@@ -6370,7 +6372,9 @@ static int bond_init(struct net_device *bond_dev)
 	spin_lock_init(&bond->stats_lock);
 	netdev_lockdep_set_classes(bond_dev);
 
+	write_lock(&bonding_dev_lock);
 	list_add_tail(&bond->bond_list, &bn->dev_list);
+	write_unlock(&bonding_dev_lock);
 
 	bond_prepare_sysfs_group(bond);
 
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 2805135a7205..5de71af7c36f 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -28,6 +28,8 @@
 
 #define to_bond(cd)	((struct bonding *)(netdev_priv(to_net_dev(cd))))
 
+DEFINE_RWLOCK(bonding_dev_lock);
+
 /* "show" function for the bond_masters attribute.
  * The class parameter is ignored.
  */
@@ -40,7 +42,8 @@ static ssize_t bonding_show_bonds(const struct class *cls,
 	int res = 0;
 	struct bonding *bond;
 
-	rtnl_lock();
+	down_read(&devnet_rename_sem);
+	read_lock(&bonding_dev_lock);
 
 	list_for_each_entry(bond, &bn->dev_list, bond_list) {
 		if (res > (PAGE_SIZE - IFNAMSIZ)) {
@@ -55,7 +58,8 @@ static ssize_t bonding_show_bonds(const struct class *cls,
 	if (res)
 		buf[res-1] = '\n'; /* eat the leftover space */
 
-	rtnl_unlock();
+	read_unlock(&bonding_dev_lock);
+	up_read(&devnet_rename_sem);
 	return res;
 }
 
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 6c16d778b615..ede4116457e2 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -783,6 +783,9 @@ extern const u8 lacpdu_mcast_addr[];
 /* exported from net/core/dev.c */
 extern struct rw_semaphore devnet_rename_sem;
 
+/* exported from bond_sysfs.c */
+extern rwlock_t bonding_dev_lock;
+
 static inline netdev_tx_t bond_tx_drop(struct net_device *dev, struct sk_buff *skb)
 {
 	dev_core_stats_tx_dropped_inc(dev);