Message ID | 20220427164659.106447-9-miquel.raynal@bootlin.com |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Series | ieee802154: Synchronous Tx support |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Guessing tree name failed - patch did not apply |
Hi,

On Wed, Apr 27, 2022 at 12:47 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> We should never start a transmission after the queue has been stopped.
>
> But because it might work we don't kill the function here but rather
> warn loudly the user that something is wrong.
>
> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> ---
>  net/mac802154/ieee802154_i.h |  8 ++++++++
>  net/mac802154/tx.c           |  2 ++
>  net/mac802154/util.c         | 18 ++++++++++++++++++
>  3 files changed, 28 insertions(+)
>
> diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
> index b55fdefb0b34..cb61a4abaf37 100644
> --- a/net/mac802154/ieee802154_i.h
> +++ b/net/mac802154/ieee802154_i.h
> @@ -178,6 +178,14 @@ bool ieee802154_queue_is_held(struct ieee802154_local *local);
>    */
>  void ieee802154_stop_queue(struct ieee802154_local *local);
>
> +/**
> + * ieee802154_queue_is_stopped - check whether ieee802154 queue was stopped
> + * @local: main mac object
> + *
> + * Goes through all the interfaces and indicates if they are all stopped or not.
> + */
> +bool ieee802154_queue_is_stopped(struct ieee802154_local *local);
> +
>  /* MIB callbacks */
>  void mac802154_dev_set_page_channel(struct net_device *dev, u8 page, u8 chan);
>
> diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
> index a8a83f0167bf..021dddfea542 100644
> --- a/net/mac802154/tx.c
> +++ b/net/mac802154/tx.c
> @@ -124,6 +124,8 @@ bool ieee802154_queue_is_held(struct ieee802154_local *local)
>  static netdev_tx_t
>  ieee802154_hot_tx(struct ieee802154_local *local, struct sk_buff *skb)
>  {
> +	WARN_ON_ONCE(ieee802154_queue_is_stopped(local));
> +
>  	return ieee802154_tx(local, skb);
>  }
>
> diff --git a/net/mac802154/util.c b/net/mac802154/util.c
> index 847e0864b575..cfd17a7db532 100644
> --- a/net/mac802154/util.c
> +++ b/net/mac802154/util.c
> @@ -44,6 +44,24 @@ void ieee802154_stop_queue(struct ieee802154_local *local)
>  	rcu_read_unlock();
>  }
>
> +bool ieee802154_queue_is_stopped(struct ieee802154_local *local)
> +{
> +	struct ieee802154_sub_if_data *sdata;
> +	bool stopped = true;
> +
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(sdata, &local->interfaces, list) {
> +		if (!sdata->dev)
> +			continue;
> +
> +		if (!netif_queue_stopped(sdata->dev))
> +			stopped = false;
> +	}
> +	rcu_read_unlock();
> +
> +	return stopped;
> +}

sorry this makes no sense, you using net core functionality to check
if a queue is stopped in a net core netif callback. Whereas the sense
here for checking if the queue is really stopped is when 802.15.4
thinks the queue is stopped vs net core netif callback running. It
means for MLME-ops there are points we want to make sure that net core
is not handling any xmit and we should check this point and not
introducing net core functionality checks.

btw: if it's hit your if branch the first time you can break?

I am not done with the review, this is just what I see now and we can
discuss that. Please be patient.

- Alex
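The "break on the first hit" remark above can be illustrated outside the kernel. The following is a minimal userspace sketch, not the code from the patch: `struct model_iface` and `model_all_queues_stopped()` are hypothetical stand-ins for the `sdata` list walk and `netif_queue_stopped()`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one interface; this is not the kernel's struct. */
struct model_iface {
	bool has_dev;        /* models sdata->dev != NULL */
	bool queue_stopped;  /* models netif_queue_stopped(sdata->dev) */
};

/*
 * Returns true only if every interface that has a device is stopped.
 * The scan ends at the first running queue, because one running queue
 * is enough to make the answer "not all stopped".
 */
bool model_all_queues_stopped(const struct model_iface *ifaces, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!ifaces[i].has_dev)
			continue;

		if (!ifaces[i].queue_stopped)
			return false;
	}

	return true;
}
```

In the function from the patch the equivalent change would be a `break` after `stopped = false;` rather than an early `return`, so that `rcu_read_unlock()` still runs on every path.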
Hi Alexander,

alex.aring@gmail.com wrote on Wed, 27 Apr 2022 14:01:25 -0400:

> Hi,
>
> On Wed, Apr 27, 2022 at 12:47 PM Miquel Raynal
> <miquel.raynal@bootlin.com> wrote:
> >
> > We should never start a transmission after the queue has been stopped.
> >
> > But because it might work we don't kill the function here but rather
> > warn loudly the user that something is wrong.
> >
> > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > ---

[...]

> > diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
> > index a8a83f0167bf..021dddfea542 100644
> > --- a/net/mac802154/tx.c
> > +++ b/net/mac802154/tx.c
> > @@ -124,6 +124,8 @@ bool ieee802154_queue_is_held(struct ieee802154_local *local)
> >  static netdev_tx_t
> >  ieee802154_hot_tx(struct ieee802154_local *local, struct sk_buff *skb)
> >  {
> > +	WARN_ON_ONCE(ieee802154_queue_is_stopped(local));
> > +
> >  	return ieee802154_tx(local, skb);
> >  }
> >
> > diff --git a/net/mac802154/util.c b/net/mac802154/util.c
> > index 847e0864b575..cfd17a7db532 100644
> > --- a/net/mac802154/util.c
> > +++ b/net/mac802154/util.c
> > @@ -44,6 +44,24 @@ void ieee802154_stop_queue(struct ieee802154_local *local)
> >  	rcu_read_unlock();
> >  }
> >
> > +bool ieee802154_queue_is_stopped(struct ieee802154_local *local)
> > +{
> > +	struct ieee802154_sub_if_data *sdata;
> > +	bool stopped = true;
> > +
> > +	rcu_read_lock();
> > +	list_for_each_entry_rcu(sdata, &local->interfaces, list) {
> > +		if (!sdata->dev)
> > +			continue;
> > +
> > +		if (!netif_queue_stopped(sdata->dev))
> > +			stopped = false;
> > +	}
> > +	rcu_read_unlock();
> > +
> > +	return stopped;
> > +}
>
> sorry this makes no sense, you using net core functionality to check
> if a queue is stopped in a net core netif callback. Whereas the sense
> here for checking if the queue is really stopped is when 802.15.4
> thinks the queue is stopped vs net core netif callback running. It
> means for MLME-ops there are points we want to make sure that net core
> is not handling any xmit and we should check this point and not
> introducing net core functionality checks.

I think I've mixed two things, your remark makes complete sense. I
should instead here just check a 802.15.4 internal variable.

> btw: if it's hit your if branch the first time you can break?

Yes, we could definitely improve a bit the logic to break earlier, but
in the end these checks won't remain I believe.

> I am not done with the review, this is just what I see now and we can
> discuss that. Please be patient.

Sure, thanks for the quick feedback anyway!

Thanks,
Miquèl
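To make "check a 802.15.4 internal variable" concrete, here is a small userspace sketch of a MAC-layer hold counter that is independent of net core queue state. Everything in it is an assumption made for illustration: `struct model_local`, the `hold_count` field and the `model_*` helpers are hypothetical, and the sketch does not reproduce the `ieee802154_queue_is_held()` function whose name appears in the hunk context.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the MAC-layer state; the field name is an assumption. */
struct model_local {
	atomic_int hold_count; /* > 0 while the MAC layer wants no data xmit */
};

/* The MAC layer asks for the data path to stay quiet (for example around MLME work). */
void model_hold_queue(struct model_local *local)
{
	atomic_fetch_add(&local->hold_count, 1);
}

/* Drops one hold; the data path may run again once the count reaches zero. */
void model_release_queue(struct model_local *local)
{
	atomic_fetch_sub(&local->hold_count, 1);
}

/* What the xmit path would test: the MAC layer's own view, not net core's. */
bool model_queue_is_held(struct model_local *local)
{
	return atomic_load(&local->hold_count) > 0;
}
```

A guard in the hot Tx path would then warn when `model_queue_is_held()` is true while net core is still calling xmit, which is the mismatch the review asks to detect.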
Hi,

On Thu, Apr 28, 2022 at 3:58 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> Hi Alexander,
>
> alex.aring@gmail.com wrote on Wed, 27 Apr 2022 14:01:25 -0400:
>
> > Hi,
> >
> > On Wed, Apr 27, 2022 at 12:47 PM Miquel Raynal
> > <miquel.raynal@bootlin.com> wrote:
> > >
> > > We should never start a transmission after the queue has been stopped.
> > >
> > > But because it might work we don't kill the function here but rather
> > > warn loudly the user that something is wrong.
> > >
> > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > > ---
>
> [...]
>
> > > diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
> > > index a8a83f0167bf..021dddfea542 100644
> > > --- a/net/mac802154/tx.c
> > > +++ b/net/mac802154/tx.c
> > > @@ -124,6 +124,8 @@ bool ieee802154_queue_is_held(struct ieee802154_local *local)
> > >  static netdev_tx_t
> > >  ieee802154_hot_tx(struct ieee802154_local *local, struct sk_buff *skb)
> > >  {
> > > +	WARN_ON_ONCE(ieee802154_queue_is_stopped(local));
> > > +
> > >  	return ieee802154_tx(local, skb);
> > >  }
> > >
> > > diff --git a/net/mac802154/util.c b/net/mac802154/util.c
> > > index 847e0864b575..cfd17a7db532 100644
> > > --- a/net/mac802154/util.c
> > > +++ b/net/mac802154/util.c
> > > @@ -44,6 +44,24 @@ void ieee802154_stop_queue(struct ieee802154_local *local)
> > >  	rcu_read_unlock();
> > >  }
> > >
> > > +bool ieee802154_queue_is_stopped(struct ieee802154_local *local)
> > > +{
> > > +	struct ieee802154_sub_if_data *sdata;
> > > +	bool stopped = true;
> > > +
> > > +	rcu_read_lock();
> > > +	list_for_each_entry_rcu(sdata, &local->interfaces, list) {
> > > +		if (!sdata->dev)
> > > +			continue;
> > > +
> > > +		if (!netif_queue_stopped(sdata->dev))
> > > +			stopped = false;
> > > +	}
> > > +	rcu_read_unlock();
> > > +
> > > +	return stopped;
> > > +}
> >
> > sorry this makes no sense, you using net core functionality to check
> > if a queue is stopped in a net core netif callback. Whereas the sense
> > here for checking if the queue is really stopped is when 802.15.4
> > thinks the queue is stopped vs net core netif callback running. It
> > means for MLME-ops there are points we want to make sure that net core
> > is not handling any xmit and we should check this point and not
> > introducing net core functionality checks.
>
> I think I've mixed two things, your remark makes complete sense. I
> should instead here just check a 802.15.4 internal variable.
>

I am thinking about this patch series... and I think it still has bugs
or at least it's easy to have bugs when the context is not right
prepared to call a synchronized transmission. We leave here the netdev
state machine world for transmit vs e.g. start/stop netif callback...
We have a warning here if there is a core netif xmit callback running
when 802.15.4 thinks it shouldn't (because we take control of it) but
I also think about a kind of the other way around. A warning if
802.15.4 transmits something but the netdev core logic "thinks" it
shouldn't.

That requires some checks (probably from netcore functionality) to
check if we call a 802.15.4 sync xmit but netif core already called
stop() callback. The last stop() callback - means the driver_ops
stop() callback was called, we have some "open_count" counter there
which MUST be incremented before doing any looping of one or several
sync transmissions. All I can say is if we call xmit() but the driver
is in stop() state... it will break things.

My concern is also here that e.g. calling netif down or device
suspend() are only two examples I have in my mind right now. I don't
know all cases which can occur, that's why we should introduce another
WARN_ON_ONCE() for the case that 802.15.4 transmits something but we
are in a state where we can't transmit something according to netif
state (driver ops called stop()).

Can you add such a check as well? And please keep in mind to increment
the open count when implementing MLME-ops (or at least handle it
somehow), otherwise I guess it's easy to hit the warning. If another
user reports warnings and tells us what they did we might know more
other "cases" to fix.

There should maybe be an option in hwsim to delay a transmission
completion and such cases can be tested...

- Alex
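The second warning requested above (802.15.4 transmits while the driver has already been stopped) could be modelled roughly as follows. This is a userspace sketch under stated assumptions, not kernel code: `struct model_local`, `model_warn_once()` and `model_sync_tx()` are hypothetical, `open_count` only mirrors the counter name mentioned in the message, and the one-shot latch imitates the behaviour of the kernel's `WARN_ON_ONCE()` without using it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the MAC state relevant to the check. */
struct model_local {
	int open_count;  /* interfaces currently up; 0 models "driver stop() was called" */
	bool warned;     /* one-shot latch, imitating WARN_ON_ONCE() */
	int frames_sent; /* frames actually handed to the (modelled) driver */
};

/* Prints the message the first time cond is true; always returns cond. */
static bool model_warn_once(struct model_local *local, bool cond, const char *msg)
{
	if (cond && !local->warned) {
		local->warned = true;
		fprintf(stderr, "WARNING: %s\n", msg);
	}

	return cond;
}

/* Synchronous Tx entry point: refuses to transmit when the driver is stopped. */
int model_sync_tx(struct model_local *local)
{
	if (model_warn_once(local, local->open_count == 0,
			    "sync Tx requested while the driver is stopped"))
		return -ENETDOWN;

	local->frames_sent++;
	return 0;
}
```

Returning an error after warning is one possible policy; the message itself only asks for the warning, so what to do after it fires is a design choice left open here.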
Hi Alexander,

alex.aring@gmail.com wrote on Sun, 1 May 2022 20:21:18 -0400:

> Hi,
>
> On Thu, Apr 28, 2022 at 3:58 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> > Hi Alexander,
> >
> > alex.aring@gmail.com wrote on Wed, 27 Apr 2022 14:01:25 -0400:
> >
> > > Hi,
> > >
> > > On Wed, Apr 27, 2022 at 12:47 PM Miquel Raynal
> > > <miquel.raynal@bootlin.com> wrote:
> > > >
> > > > We should never start a transmission after the queue has been stopped.
> > > >
> > > > But because it might work we don't kill the function here but rather
> > > > warn loudly the user that something is wrong.
> > > >
> > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > > > ---
> >
> > [...]
> >
> > > > diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
> > > > index a8a83f0167bf..021dddfea542 100644
> > > > --- a/net/mac802154/tx.c
> > > > +++ b/net/mac802154/tx.c
> > > > @@ -124,6 +124,8 @@ bool ieee802154_queue_is_held(struct ieee802154_local *local)
> > > >  static netdev_tx_t
> > > >  ieee802154_hot_tx(struct ieee802154_local *local, struct sk_buff *skb)
> > > >  {
> > > > +	WARN_ON_ONCE(ieee802154_queue_is_stopped(local));
> > > > +
> > > >  	return ieee802154_tx(local, skb);
> > > >  }
> > > >
> > > > diff --git a/net/mac802154/util.c b/net/mac802154/util.c
> > > > index 847e0864b575..cfd17a7db532 100644
> > > > --- a/net/mac802154/util.c
> > > > +++ b/net/mac802154/util.c
> > > > @@ -44,6 +44,24 @@ void ieee802154_stop_queue(struct ieee802154_local *local)
> > > >  	rcu_read_unlock();
> > > >  }
> > > >
> > > > +bool ieee802154_queue_is_stopped(struct ieee802154_local *local)
> > > > +{
> > > > +	struct ieee802154_sub_if_data *sdata;
> > > > +	bool stopped = true;
> > > > +
> > > > +	rcu_read_lock();
> > > > +	list_for_each_entry_rcu(sdata, &local->interfaces, list) {
> > > > +		if (!sdata->dev)
> > > > +			continue;
> > > > +
> > > > +		if (!netif_queue_stopped(sdata->dev))
> > > > +			stopped = false;
> > > > +	}
> > > > +	rcu_read_unlock();
> > > > +
> > > > +	return stopped;
> > > > +}
> > >
> > > sorry this makes no sense, you using net core functionality to check
> > > if a queue is stopped in a net core netif callback. Whereas the sense
> > > here for checking if the queue is really stopped is when 802.15.4
> > > thinks the queue is stopped vs net core netif callback running. It
> > > means for MLME-ops there are points we want to make sure that net core
> > > is not handling any xmit and we should check this point and not
> > > introducing net core functionality checks.
> >
> > I think I've mixed two things, your remark makes complete sense. I
> > should instead here just check a 802.15.4 internal variable.
> >
>
> I am thinking about this patch series... and I think it still has bugs
> or at least it's easy to have bugs when the context is not right
> prepared to call a synchronized transmission. We leave here the netdev
> state machine world for transmit vs e.g. start/stop netif callback...
> We have a warning here if there is a core netif xmit callback running
> when 802.15.4 thinks it shouldn't (because we take control of it) but
> I also think about a kind of the other way around. A warning if
> 802.15.4 transmits something but the netdev core logic "thinks" it
> shouldn't.
>
> That requires some checks (probably from netcore functionality) to
> check if we call a 802.15.4 sync xmit but netif core already called
> stop() callback. The last stop() callback - means the driver_ops
> stop() callback was called, we have some "open_count" counter there
> which MUST be incremented before doing any looping of one or several
> sync transmissions. All I can say is if we call xmit() but the driver
> is in stop() state... it will break things.
>
> My concern is also here that e.g. calling netif down or device
> suspend() are only two examples I have in my mind right now. I don't
> know all cases which can occur, that's why we should introduce another
> WARN_ON_ONCE() for the case that 802.15.4 transmits something but we
> are in a state where we can't transmit something according to netif
> state (driver ops called stop()).
>
> Can you add such a check as well?

That is a good idea, I have added such a check: if the interface is
supposed to be down I'll warn and return because I don't think there is
much we can do in this situation besides avoiding trying to transmit
anything.

> And please keep in mind to increment
> the open count when implementing MLME-ops (or at least handle it
> somehow), otherwise I guess it's easy to hit the warning. If another
> user reports warnings and tells us what they did we might know more
> other "cases" to fix.

I don't think incrementing the open_count counter is the right solution
here just because the stop call is not supposed to fail and has no
straightforward ways to be deferred. In particular, just keeping the
open_count incremented will just avoid the actual driver stop operation
to be executed and the core will not notice it.

I came out with another solution: acquiring the rtnl when performing a
MLME Tx operation to serialize these operations. We can easily have a
version which just checks the rtnl was acquired as well for situations
when the MLME operations are called by eg. the nl layer (and thus, with
the rtnl lock taken automatically).

Thanks,
Miquèl
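The two variants described above (one that takes the rtnl itself, one that only checks it is already held) follow a common "locked / unlocked wrapper" pattern. Below is a single-threaded userspace sketch of that pattern. It is an illustration under stated assumptions: `struct model_rtnl` merely records ownership and is not a real mutex, and all `model_*` names are hypothetical. In the kernel the corresponding primitives would be `rtnl_lock()`, `rtnl_unlock()` and an `ASSERT_RTNL()`-style check.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded stand-in for the RTNL mutex; it only records ownership. */
struct model_rtnl {
	bool held;
};

void model_rtnl_lock(struct model_rtnl *r)
{
	assert(!r->held); /* a real mutex would block here instead */
	r->held = true;
}

void model_rtnl_unlock(struct model_rtnl *r)
{
	assert(r->held);
	r->held = false;
}

struct model_local {
	struct model_rtnl rtnl;
	int frames_sent;
};

/*
 * Variant for callers that already hold the lock (for example a netlink
 * handler). It only verifies ownership and refuses to transmit otherwise.
 */
int model_mlme_tx_locked(struct model_local *local)
{
	if (!local->rtnl.held)
		return -1; /* models an ASSERT_RTNL()-style check failing */

	local->frames_sent++;
	return 0;
}

/* Variant for callers without the lock: take it, transmit, release it. */
int model_mlme_tx(struct model_local *local)
{
	int ret;

	model_rtnl_lock(&local->rtnl);
	ret = model_mlme_tx_locked(local);
	model_rtnl_unlock(&local->rtnl);

	return ret;
}
```

Because interface up/down transitions also run under the same lock in this model, an MLME transmission and a stop cannot interleave, which is the serialization the message is after.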
Hi,

On Thu, May 12, 2022 at 10:33 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> Hi Alexander,
>
> alex.aring@gmail.com wrote on Sun, 1 May 2022 20:21:18 -0400:
>
> > Hi,
> >
> > On Thu, Apr 28, 2022 at 3:58 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> > >
> > > Hi Alexander,
> > >
> > > alex.aring@gmail.com wrote on Wed, 27 Apr 2022 14:01:25 -0400:
> > >
> > > > Hi,
> > > >
> > > > On Wed, Apr 27, 2022 at 12:47 PM Miquel Raynal
> > > > <miquel.raynal@bootlin.com> wrote:
> > > > >
> > > > > We should never start a transmission after the queue has been stopped.
> > > > >
> > > > > But because it might work we don't kill the function here but rather
> > > > > warn loudly the user that something is wrong.
> > > > >
> > > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > > > > ---
> > >
> > > [...]
> > >
> > > > > diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
> > > > > index a8a83f0167bf..021dddfea542 100644
> > > > > --- a/net/mac802154/tx.c
> > > > > +++ b/net/mac802154/tx.c
> > > > > @@ -124,6 +124,8 @@ bool ieee802154_queue_is_held(struct ieee802154_local *local)
> > > > >  static netdev_tx_t
> > > > >  ieee802154_hot_tx(struct ieee802154_local *local, struct sk_buff *skb)
> > > > >  {
> > > > > +	WARN_ON_ONCE(ieee802154_queue_is_stopped(local));
> > > > > +
> > > > >  	return ieee802154_tx(local, skb);
> > > > >  }
> > > > >
> > > > > diff --git a/net/mac802154/util.c b/net/mac802154/util.c
> > > > > index 847e0864b575..cfd17a7db532 100644
> > > > > --- a/net/mac802154/util.c
> > > > > +++ b/net/mac802154/util.c
> > > > > @@ -44,6 +44,24 @@ void ieee802154_stop_queue(struct ieee802154_local *local)
> > > > >  	rcu_read_unlock();
> > > > >  }
> > > > >
> > > > > +bool ieee802154_queue_is_stopped(struct ieee802154_local *local)
> > > > > +{
> > > > > +	struct ieee802154_sub_if_data *sdata;
> > > > > +	bool stopped = true;
> > > > > +
> > > > > +	rcu_read_lock();
> > > > > +	list_for_each_entry_rcu(sdata, &local->interfaces, list) {
> > > > > +		if (!sdata->dev)
> > > > > +			continue;
> > > > > +
> > > > > +		if (!netif_queue_stopped(sdata->dev))
> > > > > +			stopped = false;
> > > > > +	}
> > > > > +	rcu_read_unlock();
> > > > > +
> > > > > +	return stopped;
> > > > > +}
> > > >
> > > > sorry this makes no sense, you using net core functionality to check
> > > > if a queue is stopped in a net core netif callback. Whereas the sense
> > > > here for checking if the queue is really stopped is when 802.15.4
> > > > thinks the queue is stopped vs net core netif callback running. It
> > > > means for MLME-ops there are points we want to make sure that net core
> > > > is not handling any xmit and we should check this point and not
> > > > introducing net core functionality checks.
> > >
> > > I think I've mixed two things, your remark makes complete sense. I
> > > should instead here just check a 802.15.4 internal variable.
> > >
> >
> > I am thinking about this patch series... and I think it still has bugs
> > or at least it's easy to have bugs when the context is not right
> > prepared to call a synchronized transmission. We leave here the netdev
> > state machine world for transmit vs e.g. start/stop netif callback...
> > We have a warning here if there is a core netif xmit callback running
> > when 802.15.4 thinks it shouldn't (because we take control of it) but
> > I also think about a kind of the other way around. A warning if
> > 802.15.4 transmits something but the netdev core logic "thinks" it
> > shouldn't.
> >
> > That requires some checks (probably from netcore functionality) to
> > check if we call a 802.15.4 sync xmit but netif core already called
> > stop() callback. The last stop() callback - means the driver_ops
> > stop() callback was called, we have some "open_count" counter there
> > which MUST be incremented before doing any looping of one or several
> > sync transmissions. All I can say is if we call xmit() but the driver
> > is in stop() state... it will break things.
> >
> > My concern is also here that e.g. calling netif down or device
> > suspend() are only two examples I have in my mind right now. I don't
> > know all cases which can occur, that's why we should introduce another
> > WARN_ON_ONCE() for the case that 802.15.4 transmits something but we
> > are in a state where we can't transmit something according to netif
> > state (driver ops called stop()).
> >
> > Can you add such a check as well?
>
> That is a good idea, I have added such a check: if the interface is
> supposed to be down I'll warn and return because I don't think there is
> much we can do in this situation besides avoiding trying to transmit
> anything.
>

ok...

> > And please keep in mind to increment
> > the open count when implementing MLME-ops (or at least handle it
> > somehow), otherwise I guess it's easy to hit the warning. If another
> > user reports warnings and tells us what they did we might know more
> > other "cases" to fix.
>
> I don't think incrementing the open_count counter is the right solution
> here just because the stop call is not supposed to fail and has no
> straightforward ways to be deferred. In particular, just keeping the
> open_count incremented will just avoid the actual driver stop operation
> to be executed and the core will not notice it.
>

the stop callback can sleep, it's the job of the driver to synchronize
it somehow with the transceiver state.

> I came out with another solution: acquiring the rtnl when performing a
> MLME Tx operation to serialize these operations. We can easily have a
> version which just checks the rtnl was acquired as well for situations
> when the MLME operations are called by eg. the nl layer (and thus, with
> the rtnl lock taken automatically).

The rtnl lock needs definitely to be held during such operation.

- Alex
diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
index b55fdefb0b34..cb61a4abaf37 100644
--- a/net/mac802154/ieee802154_i.h
+++ b/net/mac802154/ieee802154_i.h
@@ -178,6 +178,14 @@ bool ieee802154_queue_is_held(struct ieee802154_local *local);
   */
 void ieee802154_stop_queue(struct ieee802154_local *local);

+/**
+ * ieee802154_queue_is_stopped - check whether ieee802154 queue was stopped
+ * @local: main mac object
+ *
+ * Goes through all the interfaces and indicates if they are all stopped or not.
+ */
+bool ieee802154_queue_is_stopped(struct ieee802154_local *local);
+
 /* MIB callbacks */
 void mac802154_dev_set_page_channel(struct net_device *dev, u8 page, u8 chan);

diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index a8a83f0167bf..021dddfea542 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -124,6 +124,8 @@ bool ieee802154_queue_is_held(struct ieee802154_local *local)
 static netdev_tx_t
 ieee802154_hot_tx(struct ieee802154_local *local, struct sk_buff *skb)
 {
+	WARN_ON_ONCE(ieee802154_queue_is_stopped(local));
+
 	return ieee802154_tx(local, skb);
 }

diff --git a/net/mac802154/util.c b/net/mac802154/util.c
index 847e0864b575..cfd17a7db532 100644
--- a/net/mac802154/util.c
+++ b/net/mac802154/util.c
@@ -44,6 +44,24 @@ void ieee802154_stop_queue(struct ieee802154_local *local)
 	rcu_read_unlock();
 }

+bool ieee802154_queue_is_stopped(struct ieee802154_local *local)
+{
+	struct ieee802154_sub_if_data *sdata;
+	bool stopped = true;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(sdata, &local->interfaces, list) {
+		if (!sdata->dev)
+			continue;
+
+		if (!netif_queue_stopped(sdata->dev))
+			stopped = false;
+	}
+	rcu_read_unlock();
+
+	return stopped;
+}
+
 enum hrtimer_restart ieee802154_xmit_ifs_timer(struct hrtimer *timer)
 {
 	struct ieee802154_local *local =
We should never start a transmission after the queue has been stopped.

But because it might work we don't kill the function here but rather
warn loudly the user that something is wrong.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 net/mac802154/ieee802154_i.h |  8 ++++++++
 net/mac802154/tx.c           |  2 ++
 net/mac802154/util.c         | 18 ++++++++++++++++++
 3 files changed, 28 insertions(+)