diff mbox series

[net] net: lapbether: Prevent racing when checking whether the netif is running

Message ID 20210311072311.2969-1-xie.he.0141@gmail.com (mailing list archive)
State Accepted
Delegated to: Netdev Maintainers
Headers show
Series [net] net: lapbether: Prevent racing when checking whether the netif is running | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success Link
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 94 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/header_inline success Link

Commit Message

Xie He March 11, 2021, 7:23 a.m. UTC
There are two "netif_running" checks in this driver. One is in
"lapbeth_xmit" and the other is in "lapbeth_rcv". They serve to make
sure that the LAPB APIs called in these functions are called before
"lapb_unregister" is called by the "ndo_stop" function.

However, these "netif_running" checks are unreliable, because it's
possible that immediately after "netif_running" returns true, "ndo_stop"
is called (which causes "lapb_unregister" to be called).

This patch adds locking to make sure "lapbeth_xmit" and "lapbeth_rcv" can
reliably check and ensure the netif is running while doing their work.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Xie He <xie.he.0141@gmail.com>
---
 drivers/net/wan/lapbether.c | 32 +++++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

Comments

Jakub Kicinski March 11, 2021, 8:43 p.m. UTC | #1
On Wed, 10 Mar 2021 23:23:09 -0800 Xie He wrote:
> There are two "netif_running" checks in this driver. One is in
> "lapbeth_xmit" and the other is in "lapbeth_rcv". They serve to make
> sure that the LAPB APIs called in these functions are called before
> "lapb_unregister" is called by the "ndo_stop" function.
> 
> However, these "netif_running" checks are unreliable, because it's
> possible that immediately after "netif_running" returns true, "ndo_stop"
> is called (which causes "lapb_unregister" to be called).
> 
> This patch adds locking to make sure "lapbeth_xmit" and "lapbeth_rcv" can
> reliably check and ensure the netif is running while doing their work.
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Xie He <xie.he.0141@gmail.com>

Is this a theoretical issues or do you see a path where it triggers?

Who are the callers sending frames to a device which went down?
Xie He March 11, 2021, 9:12 p.m. UTC | #2
On Thu, Mar 11, 2021 at 12:43 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Is this a theoretical issues or do you see a path where it triggers?
>
> Who are the callers sending frames to a device which went down?

This is a theoretical issue. I didn't see this issue in practice.

When "__dev_queue_xmit" and "sch_direct_xmit" call
"dev_hard_start_xmit", there appears to be no locking mechanism
preventing the netif from going down while "dev_hard_start_xmit" is
doing its work.

David once confirmed in an email that a driver's "ndo_stop" function
would indeed race with its "ndo_start_xmit" function:
https://lore.kernel.org/netdev/20190520.200922.2277656639346033061.davem@davemloft.net/
Jakub Kicinski March 11, 2021, 10:52 p.m. UTC | #3
On Thu, 11 Mar 2021 13:12:25 -0800 Xie He wrote:
> On Thu, Mar 11, 2021 at 12:43 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > Is this a theoretical issues or do you see a path where it triggers?
> >
> > Who are the callers sending frames to a device which went down?  
> 
> This is a theoretical issue. I didn't see this issue in practice.
> 
> When "__dev_queue_xmit" and "sch_direct_xmit" call
> "dev_hard_start_xmit", there appears to be no locking mechanism
> preventing the netif from going down while "dev_hard_start_xmit" is
> doing its work.

Normally driver's ndo_stop() calls netif_tx_disable() which takes TX
locks, so unless your driver is lockless (LLTX) there should be no xmit
calls after that point.
Xie He March 11, 2021, 11:13 p.m. UTC | #4
On Thu, Mar 11, 2021 at 2:52 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Normally driver's ndo_stop() calls netif_tx_disable() which takes TX
> locks, so unless your driver is lockless (LLTX) there should be no xmit
> calls after that point.

Do you mean I should call "netif_tx_disable" inside my "ndo_stop"
function as a fix for the racing between "ndo_stop" and
"ndo_start_xmit"?

I can't call "netif_tx_disable" inside my "ndo_stop" function because
"netif_tx_disable" will call "netif_tx_stop_queue", which causes
another racing problem. Please see my recent commit f7d9d4854519
("net: lapbether: Remove netif_start_queue / netif_stop_queue")
Jakub Kicinski March 12, 2021, 12:10 a.m. UTC | #5
On Thu, 11 Mar 2021 15:13:01 -0800 Xie He wrote:
> On Thu, Mar 11, 2021 at 2:52 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > Normally driver's ndo_stop() calls netif_tx_disable() which takes TX
> > locks, so unless your driver is lockless (LLTX) there should be no xmit
> > calls after that point.  
> 
> Do you mean I should call "netif_tx_disable" inside my "ndo_stop"
> function as a fix for the racing between "ndo_stop" and
> "ndo_start_xmit"?
> 
> I can't call "netif_tx_disable" inside my "ndo_stop" function because
> "netif_tx_disable" will call "netif_tx_stop_queue", which causes
> another racing problem. Please see my recent commit f7d9d4854519
> ("net: lapbether: Remove netif_start_queue / netif_stop_queue")

And the "noqueue" queue is there because it's on top of hdlc_fr.c
somehow or some out of tree driver? Or do you install it manually?
Xie He March 12, 2021, 12:28 a.m. UTC | #6
On Thu, Mar 11, 2021 at 4:10 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> And the "noqueue" queue is there because it's on top of hdlc_fr.c
> somehow or some out of tree driver? Or do you install it manually?

No, this driver is not related to "hdlc_fr.c" or any out-of-tree
driver. The default qdisc is "noqueue" for this driver because this
driver doesn't set "tx_queue_len". This means the value of
"tx_queue_len" would be 0. In this case, "alloc_netdev_mqs" will
automatically add the "IFF_NO_QUEUE" flag to the device, then
"attach_one_default_qdisc" in "net/sched/sch_generic.c" will attach
the "noqueue" qdisc for devices with the "IFF_NO_QUEUE" flag.
Jakub Kicinski March 12, 2021, 1:40 a.m. UTC | #7
On Thu, 11 Mar 2021 16:28:47 -0800 Xie He wrote:
> On Thu, Mar 11, 2021 at 4:10 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > And the "noqueue" queue is there because it's on top of hdlc_fr.c
> > somehow or some out of tree driver? Or do you install it manually?  
> 
> No, this driver is not related to "hdlc_fr.c" or any out-of-tree
> driver. The default qdisc is "noqueue" for this driver because this
> driver doesn't set "tx_queue_len". This means the value of
> "tx_queue_len" would be 0. In this case, "alloc_netdev_mqs" will
> automatically add the "IFF_NO_QUEUE" flag to the device, then
> "attach_one_default_qdisc" in "net/sched/sch_generic.c" will attach
> the "noqueue" qdisc for devices with the "IFF_NO_QUEUE" flag.

I see.
Xie He March 14, 2021, 1:59 a.m. UTC | #8
Hi Martin,

Could you ack? Thanks!
Martin Schiller March 16, 2021, 3:17 p.m. UTC | #9
On 2021-03-14 02:59, Xie He wrote:
> Hi Martin,
> 
> Could you ack? Thanks!

Acked-by: Martin Schiller <ms@dev.tdt.de>
Xie He March 16, 2021, 3:54 p.m. UTC | #10
On Tue, Mar 16, 2021 at 8:17 AM Martin Schiller <ms@dev.tdt.de> wrote:
>
> On 2021-03-14 02:59, Xie He wrote:
> > Hi Martin,
> >
> > Could you ack? Thanks!
>
> Acked-by: Martin Schiller <ms@dev.tdt.de>

Thank you!!
diff mbox series

Patch

diff --git a/drivers/net/wan/lapbether.c b/drivers/net/wan/lapbether.c
index c3372498f4f1..8fda0446ff71 100644
--- a/drivers/net/wan/lapbether.c
+++ b/drivers/net/wan/lapbether.c
@@ -51,6 +51,8 @@  struct lapbethdev {
 	struct list_head	node;
 	struct net_device	*ethdev;	/* link to ethernet device */
 	struct net_device	*axdev;		/* lapbeth device (lapb#) */
+	bool			up;
+	spinlock_t		up_lock;	/* Protects "up" */
 };
 
 static LIST_HEAD(lapbeth_devices);
@@ -101,8 +103,9 @@  static int lapbeth_rcv(struct sk_buff *skb, struct net_device *dev, struct packe
 	rcu_read_lock();
 	lapbeth = lapbeth_get_x25_dev(dev);
 	if (!lapbeth)
-		goto drop_unlock;
-	if (!netif_running(lapbeth->axdev))
+		goto drop_unlock_rcu;
+	spin_lock_bh(&lapbeth->up_lock);
+	if (!lapbeth->up)
 		goto drop_unlock;
 
 	len = skb->data[0] + skb->data[1] * 256;
@@ -117,11 +120,14 @@  static int lapbeth_rcv(struct sk_buff *skb, struct net_device *dev, struct packe
 		goto drop_unlock;
 	}
 out:
+	spin_unlock_bh(&lapbeth->up_lock);
 	rcu_read_unlock();
 	return 0;
 drop_unlock:
 	kfree_skb(skb);
 	goto out;
+drop_unlock_rcu:
+	rcu_read_unlock();
 drop:
 	kfree_skb(skb);
 	return 0;
@@ -151,13 +157,11 @@  static int lapbeth_data_indication(struct net_device *dev, struct sk_buff *skb)
 static netdev_tx_t lapbeth_xmit(struct sk_buff *skb,
 				      struct net_device *dev)
 {
+	struct lapbethdev *lapbeth = netdev_priv(dev);
 	int err;
 
-	/*
-	 * Just to be *really* sure not to send anything if the interface
-	 * is down, the ethernet device may have gone.
-	 */
-	if (!netif_running(dev))
+	spin_lock_bh(&lapbeth->up_lock);
+	if (!lapbeth->up)
 		goto drop;
 
 	/* There should be a pseudo header of 1 byte added by upper layers.
@@ -194,6 +198,7 @@  static netdev_tx_t lapbeth_xmit(struct sk_buff *skb,
 		goto drop;
 	}
 out:
+	spin_unlock_bh(&lapbeth->up_lock);
 	return NETDEV_TX_OK;
 drop:
 	kfree_skb(skb);
@@ -285,6 +290,7 @@  static const struct lapb_register_struct lapbeth_callbacks = {
  */
 static int lapbeth_open(struct net_device *dev)
 {
+	struct lapbethdev *lapbeth = netdev_priv(dev);
 	int err;
 
 	if ((err = lapb_register(dev, &lapbeth_callbacks)) != LAPB_OK) {
@@ -292,13 +298,22 @@  static int lapbeth_open(struct net_device *dev)
 		return -ENODEV;
 	}
 
+	spin_lock_bh(&lapbeth->up_lock);
+	lapbeth->up = true;
+	spin_unlock_bh(&lapbeth->up_lock);
+
 	return 0;
 }
 
 static int lapbeth_close(struct net_device *dev)
 {
+	struct lapbethdev *lapbeth = netdev_priv(dev);
 	int err;
 
+	spin_lock_bh(&lapbeth->up_lock);
+	lapbeth->up = false;
+	spin_unlock_bh(&lapbeth->up_lock);
+
 	if ((err = lapb_unregister(dev)) != LAPB_OK)
 		pr_err("lapb_unregister error: %d\n", err);
 
@@ -356,6 +371,9 @@  static int lapbeth_new_device(struct net_device *dev)
 	dev_hold(dev);
 	lapbeth->ethdev = dev;
 
+	lapbeth->up = false;
+	spin_lock_init(&lapbeth->up_lock);
+
 	rc = -EIO;
 	if (register_netdevice(ndev))
 		goto fail;