Message ID | 20201204175125.1444314-1-vladimir.oltean@nxp.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [v2,net] net: mscc: ocelot: install MAC addresses in .ndo_set_rx_mode from process context | expand |
Context | Check | Description |
---|---|---|
netdev/apply | fail | Patch does not apply to net |
netdev/tree_selection | success | Clearly marked for net |
On Fri, 4 Dec 2020 19:51:25 +0200 Vladimir Oltean wrote: > Currently ocelot_set_rx_mode calls ocelot_mact_learn directly, which has > a very nice ocelot_mact_wait_for_completion at the end. Introduced in > commit 639c1b2625af ("net: mscc: ocelot: Register poll timeout should be > wall time not attempts"), this function uses readx_poll_timeout which > triggers a lot of lockdep warnings and is also dangerous to use from > atomic context, leading to lockups and panics. > > Steen Hegelund added a poll timeout of 100 ms for checking the MAC > table, a duration which is clearly absurd to poll in atomic context. > So we need to defer the MAC table access to process context, which we do > via a dynamically allocated workqueue which contains all there is to > know about the MAC table operation it has to do. > > Fixes: 639c1b2625af ("net: mscc: ocelot: Register poll timeout should be wall time not attempts") > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> > Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> > --- > Changes in v2: > - Added Fixes tag (it won't backport that far, but anyway) > - Using get_device and put_device to avoid racing with unbind Does get_device really protect you from unbind? I thought it only protects you from .release being called, IOW freeing struct device memory.. More usual way of handling this would be allocating your own workqueue and flushing that wq at the right point. > drivers/net/ethernet/mscc/ocelot_net.c | 83 +++++++++++++++++++++++++- > 1 file changed, 80 insertions(+), 3 deletions(-) This is a little large for a rc7 fix :S What's the expected poll time? maybe it's not that bad to busy wait? Clearly nobody noticed the issue in 2 years (you mention lockdep so not a "real" deadlock) which makes me think the wait can't be that long? Also for a reference - there are drivers out there with busy poll timeout of seconds :/
On Fri, Dec 04, 2020 at 10:00:21AM -0800, Jakub Kicinski wrote: > On Fri, 4 Dec 2020 19:51:25 +0200 Vladimir Oltean wrote: > > Currently ocelot_set_rx_mode calls ocelot_mact_learn directly, which has > > a very nice ocelot_mact_wait_for_completion at the end. Introduced in > > commit 639c1b2625af ("net: mscc: ocelot: Register poll timeout should be > > wall time not attempts"), this function uses readx_poll_timeout which > > triggers a lot of lockdep warnings and is also dangerous to use from > > atomic context, leading to lockups and panics. > > > > Steen Hegelund added a poll timeout of 100 ms for checking the MAC > > table, a duration which is clearly absurd to poll in atomic context. > > So we need to defer the MAC table access to process context, which we do > > via a dynamically allocated workqueue which contains all there is to > > know about the MAC table operation it has to do. > > > > Fixes: 639c1b2625af ("net: mscc: ocelot: Register poll timeout should be wall time not attempts") > > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> > > Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> > > --- > > Changes in v2: > > - Added Fixes tag (it won't backport that far, but anyway) > > - Using get_device and put_device to avoid racing with unbind > > Does get_device really protect you from unbind? I thought it only > protects you from .release being called, IOW freeing struct device > memory.. Possibly. I ran a bind && unbind loop for a while, and I couldn't trigger any concurrency. > More usual way of handling this would be allocating your own workqueue > and flushing that wq at the right point. Yeah, well I more or less deliberately lose track of the workqueue as soon as ocelot_enqueue_mact_action is over, and that is by design. There is potentially more than one address to offload to the hardware in progress at the same time, and any sort of serialization in .ndo_set_rx_mode (so I could add the workqueue to a list of items to cancel on unbind) would mean (a) more complicated code (b) more busy waiting > > drivers/net/ethernet/mscc/ocelot_net.c | 83 +++++++++++++++++++++++++- > > 1 file changed, 80 insertions(+), 3 deletions(-) > > This is a little large for a rc7 fix :S Fine, net-next it is then. > What's the expected poll time? maybe it's not that bad to busy wait? > Clearly nobody noticed the issue in 2 years (you mention lockdep so > not a "real" deadlock) which makes me think the wait can't be that long? Not too much, but the sleep is there. Also, all 3 of ocelot/felix/seville are memory-mapped devices. But I heard from Alex a while ago that he intends to add support for a switch managed over a slow bus like SPI, and to use the same regmap infrastructure. That would mean that this problem would need to be resolved anyway. > Also for a reference - there are drivers out there with busy poll > timeout of seconds :/ Yeah, not sure if that tells me anything. I prefer avoiding that from atomic context, because our cyclictest numbers are not that great anyway, the last thing I want is to make them worse.
On 04/12/2020 18:12:51+0000, Vladimir Oltean wrote: > Yeah, well I more or less deliberately lose track of the workqueue as > soon as ocelot_enqueue_mact_action is over, and that is by design. There > is potentially more than one address to offload to the hardware in progress > at the same time, and any sort of serialization in .ndo_set_rx_mode (so > I could add the workqueue to a list of items to cancel on unbind) > would mean > (a) more complicated code > (b) more busy waiting > > > > drivers/net/ethernet/mscc/ocelot_net.c | 83 +++++++++++++++++++++++++- > > > 1 file changed, 80 insertions(+), 3 deletions(-) > > > > This is a little large for a rc7 fix :S > > Fine, net-next it is then. > > > What's the expected poll time? maybe it's not that bad to busy wait? > > Clearly nobody noticed the issue in 2 years (you mention lockdep so > > not a "real" deadlock) which makes me think the wait can't be that long? > > Not too much, but the sleep is there. > Also, all 3 of ocelot/felix/seville are memory-mapped devices. But I > heard from Alex a while ago that he intends to add support for a switch > managed over a slow bus like SPI, and to use the same regmap infrastructure. > That would mean that this problem would need to be resolved anyway. > This is still on the way but it will not happen this year unfortunately.
On Fri, Dec 04, 2020 at 10:00:21AM -0800, Jakub Kicinski wrote: > Does get_device really protect you from unbind? I thought it only > protects you from .release being called, IOW freeing struct device > memory.. > > More usual way of handling this would be allocating your own workqueue > and flushing that wq at the right point. > > > drivers/net/ethernet/mscc/ocelot_net.c | 83 +++++++++++++++++++++++++- > > 1 file changed, 80 insertions(+), 3 deletions(-) > > This is a little large for a rc7 fix :S If you like v1 more, you can apply v1.
On Fri, 4 Dec 2020 18:12:51 +0000 Vladimir Oltean wrote: > On Fri, Dec 04, 2020 at 10:00:21AM -0800, Jakub Kicinski wrote: > > On Fri, 4 Dec 2020 19:51:25 +0200 Vladimir Oltean wrote: > > > Currently ocelot_set_rx_mode calls ocelot_mact_learn directly, which has > > > a very nice ocelot_mact_wait_for_completion at the end. Introduced in > > > commit 639c1b2625af ("net: mscc: ocelot: Register poll timeout should be > > > wall time not attempts"), this function uses readx_poll_timeout which > > > triggers a lot of lockdep warnings and is also dangerous to use from > > > atomic context, leading to lockups and panics. > > > > > > Steen Hegelund added a poll timeout of 100 ms for checking the MAC > > > table, a duration which is clearly absurd to poll in atomic context. > > > So we need to defer the MAC table access to process context, which we do > > > via a dynamically allocated workqueue which contains all there is to > > > know about the MAC table operation it has to do. > > > > > > Fixes: 639c1b2625af ("net: mscc: ocelot: Register poll timeout should be wall time not attempts") > > > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> > > > Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> > > > --- > > > Changes in v2: > > > - Added Fixes tag (it won't backport that far, but anyway) > > > - Using get_device and put_device to avoid racing with unbind > > > > Does get_device really protect you from unbind? I thought it only > > protects you from .release being called, IOW freeing struct device > > memory.. > > Possibly. > I ran a bind && unbind loop for a while, and I couldn't trigger any > concurrency. You'd need to switch to a delayed work or add some other sleep for testing, maybe? > > More usual way of handling this would be allocating your own workqueue > > and flushing that wq at the right point. > > Yeah, well I more or less deliberately lose track of the workqueue as > soon as ocelot_enqueue_mact_action is over, and that is by design. There > is potentially more than one address to offload to the hardware in progress > at the same time, and any sort of serialization in .ndo_set_rx_mode (so > I could add the workqueue to a list of items to cancel on unbind) > would mean > (a) more complicated code > (b) more busy waiting Are you sure you're not confusing workqueue with a work entry? You can still put multiple work entries on the queue. > > > drivers/net/ethernet/mscc/ocelot_net.c | 83 +++++++++++++++++++++++++- > > > 1 file changed, 80 insertions(+), 3 deletions(-) > > > > This is a little large for a rc7 fix :S > > Fine, net-next it is then. If this really is the fix we want, it's the fix we want, and it should go into net. We'll just need to test it very well is all. > > What's the expected poll time? maybe it's not that bad to busy wait? > > Clearly nobody noticed the issue in 2 years (you mention lockdep so > > not a "real" deadlock) which makes me think the wait can't be that long? > > Not too much, but the sleep is there. > Also, all 3 of ocelot/felix/seville are memory-mapped devices. But I > heard from Alex a while ago that he intends to add support for a switch > managed over a slow bus like SPI, and to use the same regmap infrastructure. > That would mean that this problem would need to be resolved anyway. So it's MMIO but the other end is some firmware running on the device? > > Also for a reference - there are drivers out there with busy poll > > timeout of seconds :/ > > Yeah, not sure if that tells me anything. I prefer avoiding that from > atomic context, because our cyclictest numbers are not that great anyway, > the last thing I want is to make them worse. Fair.
On Fri, Dec 04, 2020 at 10:55:16AM -0800, Jakub Kicinski wrote: > On Fri, 4 Dec 2020 18:12:51 +0000 Vladimir Oltean wrote: > > On Fri, Dec 04, 2020 at 10:00:21AM -0800, Jakub Kicinski wrote: > > > On Fri, 4 Dec 2020 19:51:25 +0200 Vladimir Oltean wrote: > > > > Currently ocelot_set_rx_mode calls ocelot_mact_learn directly, which has > > > > a very nice ocelot_mact_wait_for_completion at the end. Introduced in > > > > commit 639c1b2625af ("net: mscc: ocelot: Register poll timeout should be > > > > wall time not attempts"), this function uses readx_poll_timeout which > > > > triggers a lot of lockdep warnings and is also dangerous to use from > > > > atomic context, leading to lockups and panics. > > > > > > > > Steen Hegelund added a poll timeout of 100 ms for checking the MAC > > > > table, a duration which is clearly absurd to poll in atomic context. > > > > So we need to defer the MAC table access to process context, which we do > > > > via a dynamically allocated workqueue which contains all there is to > > > > know about the MAC table operation it has to do. > > > > > > > > Fixes: 639c1b2625af ("net: mscc: ocelot: Register poll timeout should be wall time not attempts") > > > > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> > > > > Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> > > > > --- > > > > Changes in v2: > > > > - Added Fixes tag (it won't backport that far, but anyway) > > > > - Using get_device and put_device to avoid racing with unbind > > > > > > Does get_device really protect you from unbind? I thought it only > > > protects you from .release being called, IOW freeing struct device > > > memory.. > > > > Possibly. > > I ran a bind && unbind loop for a while, and I couldn't trigger any > > concurrency. > > You'd need to switch to a delayed work or add some other sleep for > testing, maybe? Ok, I'll test with a sleep in the worker task. > > > More usual way of handling this would be allocating your own workqueue > > > and flushing that wq at the right point. > > > > Yeah, well I more or less deliberately lose track of the workqueue as > > soon as ocelot_enqueue_mact_action is over, and that is by design. There > > is potentially more than one address to offload to the hardware in progress > > at the same time, and any sort of serialization in .ndo_set_rx_mode (so > > I could add the workqueue to a list of items to cancel on unbind) > > would mean > > (a) more complicated code > > (b) more busy waiting > > Are you sure you're not confusing workqueue with a work entry? > > You can still put multiple work entries on the queue. I am confused indeed. I will create an ordered_workqueue in ocelot and I will flush it after unregistering the network interfaces and before unbinding the device, when accesses to registers are still valid but there is no further NDO that gets called. > > > > drivers/net/ethernet/mscc/ocelot_net.c | 83 +++++++++++++++++++++++++- > > > > 1 file changed, 80 insertions(+), 3 deletions(-) > > > > > > This is a little large for a rc7 fix :S > > > > Fine, net-next it is then. > > If this really is the fix we want, it's the fix we want, and it should > go into net. We'll just need to test it very well is all. Well, as I said, I don't care at all how far this patch will be backported. I am not using the ocelot switchdev driver anyway (since I don't have hardware that uses it), I just have a test vehicle that I use from time to time to check that I don't introduce regressions in the various code paths. And seeing lockdep give warnings is annoying. I am perfectly fine with targeting v3 for net-next. I don't think even the AUTOSEL crew will pick it up, since it's going to conflict with some refactoring. > > > What's the expected poll time? maybe it's not that bad to busy wait? > > > Clearly nobody noticed the issue in 2 years (you mention lockdep so > > > not a "real" deadlock) which makes me think the wait can't be that long? > > > > Not too much, but the sleep is there. > > Also, all 3 of ocelot/felix/seville are memory-mapped devices. But I > > heard from Alex a while ago that he intends to add support for a switch > > managed over a slow bus like SPI, and to use the same regmap infrastructure. > > That would mean that this problem would need to be resolved anyway. > > So it's MMIO but the other end is some firmware running on the device? No. It's always register-based access, just that in some cases the registers are directly memory-mapped for Linux, while in other cases they are beyond an SPI bus. But there isn't any firmware in any case.
On Fri, 4 Dec 2020 10:55:16 -0800 Jakub Kicinski wrote: > > > > drivers/net/ethernet/mscc/ocelot_net.c | 83 +++++++++++++++++++++++++- > > > > 1 file changed, 80 insertions(+), 3 deletions(-) > > > > > > This is a little large for a rc7 fix :S > > > > Fine, net-next it is then. > > If this really is the fix we want, it's the fix we want, and it should > go into net. We'll just need to test it very well is all. And when I said "test very well" I obviously mean test against the tree it's targeting, 'cause this doesn't apply to net right now..
On 04/12/2020 10:55:16-0800, Jakub Kicinski wrote: > On Fri, 4 Dec 2020 18:12:51 +0000 Vladimir Oltean wrote: > > On Fri, Dec 04, 2020 at 10:00:21AM -0800, Jakub Kicinski wrote: > > > On Fri, 4 Dec 2020 19:51:25 +0200 Vladimir Oltean wrote: > > > > Currently ocelot_set_rx_mode calls ocelot_mact_learn directly, which has > > > > a very nice ocelot_mact_wait_for_completion at the end. Introduced in > > > > commit 639c1b2625af ("net: mscc: ocelot: Register poll timeout should be > > > > wall time not attempts"), this function uses readx_poll_timeout which > > > > triggers a lot of lockdep warnings and is also dangerous to use from > > > > atomic context, leading to lockups and panics. > > > > > > > > Steen Hegelund added a poll timeout of 100 ms for checking the MAC > > > > table, a duration which is clearly absurd to poll in atomic context. > > > > So we need to defer the MAC table access to process context, which we do > > > > via a dynamically allocated workqueue which contains all there is to > > > > know about the MAC table operation it has to do. > > > > > > > > Fixes: 639c1b2625af ("net: mscc: ocelot: Register poll timeout should be wall time not attempts") > > > > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> > > > > Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> > > > > --- > > > > Changes in v2: > > > > - Added Fixes tag (it won't backport that far, but anyway) > > > > - Using get_device and put_device to avoid racing with unbind > > > > > > Does get_device really protect you from unbind? I thought it only > > > protects you from .release being called, IOW freeing struct device > > > memory.. > > > > Possibly. > > I ran a bind && unbind loop for a while, and I couldn't trigger any > > concurrency. > > You'd need to switch to a delayed work or add some other sleep for > testing, maybe? > > > > More usual way of handling this would be allocating your own workqueue > > > and flushing that wq at the right point. > > > > Yeah, well I more or less deliberately lose track of the workqueue as > > soon as ocelot_enqueue_mact_action is over, and that is by design. There > > is potentially more than one address to offload to the hardware in progress > > at the same time, and any sort of serialization in .ndo_set_rx_mode (so > > I could add the workqueue to a list of items to cancel on unbind) > > would mean > > (a) more complicated code > > (b) more busy waiting > > Are you sure you're not confusing workqueue with a work entry? > > You can still put multiple work entries on the queue. > > > > > drivers/net/ethernet/mscc/ocelot_net.c | 83 +++++++++++++++++++++++++- > > > > 1 file changed, 80 insertions(+), 3 deletions(-) > > > > > > This is a little large for a rc7 fix :S > > > > Fine, net-next it is then. > > If this really is the fix we want, it's the fix we want, and it should > go into net. We'll just need to test it very well is all. > > > > What's the expected poll time? maybe it's not that bad to busy wait? > > > Clearly nobody noticed the issue in 2 years (you mention lockdep so > > > not a "real" deadlock) which makes me think the wait can't be that long? > > > > Not too much, but the sleep is there. > > Also, all 3 of ocelot/felix/seville are memory-mapped devices. But I > > heard from Alex a while ago that he intends to add support for a switch > > managed over a slow bus like SPI, and to use the same regmap infrastructure. > > That would mean that this problem would need to be resolved anyway. > > So it's MMIO but the other end is some firmware running on the device? > No, the SoC can also expose its registers as a pcie endpoint or a SPI device. In that case, obviously, the on board MIPS CPU is not enabled. IIRC, you can even access all the registers using MDIO. > > > Also for a reference - there are drivers out there with busy poll > > > timeout of seconds :/ > > > > Yeah, not sure if that tells me anything. I prefer avoiding that from > > atomic context, because our cyclictest numbers are not that great anyway, > > the last thing I want is to make them worse. > > Fair. On that topic, we could certainly play with regmap fast_io, I dont remember if this is enabled by default or even remove regmap locking completely when using MMIO.
On Fri, 4 Dec 2020 19:23:11 +0000 Vladimir Oltean wrote: > > > > What's the expected poll time? maybe it's not that bad to busy wait? > > > > Clearly nobody noticed the issue in 2 years (you mention lockdep so > > > > not a "real" deadlock) which makes me think the wait can't be that long? > > > > > > Not too much, but the sleep is there. > > > Also, all 3 of ocelot/felix/seville are memory-mapped devices. But I > > > heard from Alex a while ago that he intends to add support for a switch > > > managed over a slow bus like SPI, and to use the same regmap infrastructure. > > > That would mean that this problem would need to be resolved anyway. > > > > So it's MMIO but the other end is some firmware running on the device? > > No. It's always register-based access, just that in some cases the > registers are directly memory-mapped for Linux, while in other cases > they are beyond an SPI bus. But there isn't any firmware in any case. I see, so ocelot_mact_wait_for_completion() waits for some hardware engine to process the command? It looked like FW is hiding behind that register at a glance. 100ms sounds very high for hardware processing.
diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c index c65ae6f75a16..b621772de91e 100644 --- a/drivers/net/ethernet/mscc/ocelot_net.c +++ b/drivers/net/ethernet/mscc/ocelot_net.c @@ -414,13 +414,84 @@ static int ocelot_port_xmit(struct sk_buff *skb, struct net_device *dev) return NETDEV_TX_OK; } +enum ocelot_action_type { + OCELOT_MACT_LEARN, + OCELOT_MACT_FORGET, +}; + +struct ocelot_mact_work_ctx { + struct work_struct work; + struct ocelot *ocelot; + enum ocelot_action_type type; + union { + /* OCELOT_MACT_LEARN */ + struct { + int pgid; + enum macaccess_entry_type entry_type; + unsigned char addr[ETH_ALEN]; + u16 vid; + } learn; + /* OCELOT_MACT_FORGET */ + struct { + unsigned char addr[ETH_ALEN]; + u16 vid; + } forget; + }; +}; + +#define ocelot_work_to_ctx(x) \ + container_of((x), struct ocelot_mact_work_ctx, work) + +static void ocelot_mact_work(struct work_struct *work) +{ + struct ocelot_mact_work_ctx *w = ocelot_work_to_ctx(work); + struct ocelot *ocelot = w->ocelot; + + switch (w->type) { + case OCELOT_MACT_LEARN: + ocelot_mact_learn(ocelot, w->learn.pgid, w->learn.addr, + w->learn.vid, w->learn.entry_type); + break; + case OCELOT_MACT_FORGET: + ocelot_mact_forget(ocelot, w->forget.addr, w->forget.vid); + break; + default: + break; + }; + + put_device(ocelot->dev); + kfree(w); +} + +static int ocelot_enqueue_mact_action(struct ocelot *ocelot, + const struct ocelot_mact_work_ctx *ctx) +{ + struct ocelot_mact_work_ctx *w = kmalloc(sizeof(*w), GFP_ATOMIC); + + if (!w) + return -ENOMEM; + + get_device(ocelot->dev); + memcpy(w, ctx, sizeof(*w)); + w->ocelot = ocelot; + INIT_WORK(&w->work, ocelot_mact_work); + schedule_work(&w->work); + + return 0; +} + static int ocelot_mc_unsync(struct net_device *dev, const unsigned char *addr) { struct ocelot_port_private *priv = netdev_priv(dev); struct ocelot_port *ocelot_port = &priv->port; struct ocelot *ocelot = ocelot_port->ocelot; + struct ocelot_mact_work_ctx w; + + ether_addr_copy(w.forget.addr, addr); + w.forget.vid = ocelot_port->pvid_vlan.vid; + w.type = OCELOT_MACT_FORGET; - return ocelot_mact_forget(ocelot, addr, ocelot_port->pvid_vlan.vid); + return ocelot_enqueue_mact_action(ocelot, &w); } static int ocelot_mc_sync(struct net_device *dev, const unsigned char *addr) @@ -428,9 +499,15 @@ static int ocelot_mc_sync(struct net_device *dev, const unsigned char *addr) struct ocelot_port_private *priv = netdev_priv(dev); struct ocelot_port *ocelot_port = &priv->port; struct ocelot *ocelot = ocelot_port->ocelot; + struct ocelot_mact_work_ctx w; + + ether_addr_copy(w.learn.addr, addr); + w.learn.vid = ocelot_port->pvid_vlan.vid; + w.learn.pgid = PGID_CPU; + w.learn.entry_type = ENTRYTYPE_LOCKED; + w.type = OCELOT_MACT_LEARN; - return ocelot_mact_learn(ocelot, PGID_CPU, addr, - ocelot_port->pvid_vlan.vid, ENTRYTYPE_LOCKED); + return ocelot_enqueue_mact_action(ocelot, &w); } static void ocelot_set_rx_mode(struct net_device *dev)