diff mbox series

[wpan-next,v2,09/11] net: mac802154: Introduce a synchronous API for MLME commands

Message ID 20220512143314.235604-10-miquel.raynal@bootlin.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series ieee802154: Synchronous Tx support | expand

Checks

Context Check Description
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 4 this patch: 4
netdev/cc_maintainers warning 1 maintainers not CCed: edumazet@google.com
netdev/build_clang success Errors and warnings before: 4 this patch: 4
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 38 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

Miquel Raynal May 12, 2022, 2:33 p.m. UTC
This is the slow path, we need to wait for each command to be processed
before continuing so let's introduce an helper which does the
transmission and blocks until it gets notified of its asynchronous
completion. This helper is going to be used when introducing scan
support.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 net/mac802154/ieee802154_i.h |  1 +
 net/mac802154/tx.c           | 25 +++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

Comments

Alexander Aring May 15, 2022, 10:28 p.m. UTC | #1
Hi,

On Thu, May 12, 2022 at 10:34 AM Miquel Raynal
<miquel.raynal@bootlin.com> wrote:
>
> This is the slow path, we need to wait for each command to be processed
> before continuing so let's introduce an helper which does the
> transmission and blocks until it gets notified of its asynchronous
> completion. This helper is going to be used when introducing scan
> support.
>
> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> ---
>  net/mac802154/ieee802154_i.h |  1 +
>  net/mac802154/tx.c           | 25 +++++++++++++++++++++++++
>  2 files changed, 26 insertions(+)
>
> diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
> index a057827fc48a..f8b374810a11 100644
> --- a/net/mac802154/ieee802154_i.h
> +++ b/net/mac802154/ieee802154_i.h
> @@ -125,6 +125,7 @@ extern struct ieee802154_mlme_ops mac802154_mlme_wpan;
>  void ieee802154_rx(struct ieee802154_local *local, struct sk_buff *skb);
>  void ieee802154_xmit_sync_worker(struct work_struct *work);
>  int ieee802154_sync_and_hold_queue(struct ieee802154_local *local);
> +int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb);
>  netdev_tx_t
>  ieee802154_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev);
>  netdev_tx_t
> diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
> index 38f74b8b6740..ec8d872143ee 100644
> --- a/net/mac802154/tx.c
> +++ b/net/mac802154/tx.c
> @@ -128,6 +128,31 @@ int ieee802154_sync_and_hold_queue(struct ieee802154_local *local)
>         return ieee802154_sync_queue(local);
>  }
>
> +int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb)
> +{
> +       int ret;
> +
> +       /* Avoid possible calls to ->ndo_stop() when we asynchronously perform
> +        * MLME transmissions.
> +        */
> +       rtnl_lock();

I think we should make an ASSERT_RTNL() here, the lock needs to be
earlier than that over the whole MLME op. MLME can trigger more than
one message, the whole sync_hold/release queue should be earlier than
that... in my opinion is it not right to allow other messages so far
an MLME op is going on? I am not sure what the standard says to this,
but I think it should be stopped the whole time? All those sequence
diagrams show only some specific frames, also remember that on the
receive side we drop all other frames if MLME op (e.g. scan) is going
on?

- Alex
Alexander Aring May 15, 2022, 10:56 p.m. UTC | #2
Hi,

On Sun, May 15, 2022 at 6:28 PM Alexander Aring <aahringo@redhat.com> wrote:
>
> Hi,
>
> On Thu, May 12, 2022 at 10:34 AM Miquel Raynal
> <miquel.raynal@bootlin.com> wrote:
> >
> > This is the slow path, we need to wait for each command to be processed
> > before continuing so let's introduce an helper which does the
> > transmission and blocks until it gets notified of its asynchronous
> > completion. This helper is going to be used when introducing scan
> > support.
> >
> > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > ---
> >  net/mac802154/ieee802154_i.h |  1 +
> >  net/mac802154/tx.c           | 25 +++++++++++++++++++++++++
> >  2 files changed, 26 insertions(+)
> >
> > diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
> > index a057827fc48a..f8b374810a11 100644
> > --- a/net/mac802154/ieee802154_i.h
> > +++ b/net/mac802154/ieee802154_i.h
> > @@ -125,6 +125,7 @@ extern struct ieee802154_mlme_ops mac802154_mlme_wpan;
> >  void ieee802154_rx(struct ieee802154_local *local, struct sk_buff *skb);
> >  void ieee802154_xmit_sync_worker(struct work_struct *work);
> >  int ieee802154_sync_and_hold_queue(struct ieee802154_local *local);
> > +int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb);
> >  netdev_tx_t
> >  ieee802154_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev);
> >  netdev_tx_t
> > diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
> > index 38f74b8b6740..ec8d872143ee 100644
> > --- a/net/mac802154/tx.c
> > +++ b/net/mac802154/tx.c
> > @@ -128,6 +128,31 @@ int ieee802154_sync_and_hold_queue(struct ieee802154_local *local)
> >         return ieee802154_sync_queue(local);
> >  }
> >
> > +int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb)
> > +{
> > +       int ret;
> > +
> > +       /* Avoid possible calls to ->ndo_stop() when we asynchronously perform
> > +        * MLME transmissions.
> > +        */
> > +       rtnl_lock();
>
> I think we should make an ASSERT_RTNL() here, the lock needs to be
> earlier than that over the whole MLME op. MLME can trigger more than
> one message, the whole sync_hold/release queue should be earlier than
> that... in my opinion is it not right to allow other messages so far
> an MLME op is going on? I am not sure what the standard says to this,
> but I think it should be stopped the whole time? All those sequence
> diagrams show only some specific frames, also remember that on the
> receive side we drop all other frames if MLME op (e.g. scan) is going
> on?

Maybe some mlme_op_pre(), ... mlme_tx(), ..., mlme_tx(), ...,
mlme_op_post() handling?

- Alex
Alexander Aring May 15, 2022, 11:03 p.m. UTC | #3
Hi,

On Sun, May 15, 2022 at 6:28 PM Alexander Aring <aahringo@redhat.com> wrote:
>
> Hi,
>
> On Thu, May 12, 2022 at 10:34 AM Miquel Raynal
> <miquel.raynal@bootlin.com> wrote:
> >
> > This is the slow path, we need to wait for each command to be processed
> > before continuing so let's introduce an helper which does the
> > transmission and blocks until it gets notified of its asynchronous
> > completion. This helper is going to be used when introducing scan
> > support.
> >
> > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > ---
> >  net/mac802154/ieee802154_i.h |  1 +
> >  net/mac802154/tx.c           | 25 +++++++++++++++++++++++++
> >  2 files changed, 26 insertions(+)
> >
> > diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
> > index a057827fc48a..f8b374810a11 100644
> > --- a/net/mac802154/ieee802154_i.h
> > +++ b/net/mac802154/ieee802154_i.h
> > @@ -125,6 +125,7 @@ extern struct ieee802154_mlme_ops mac802154_mlme_wpan;
> >  void ieee802154_rx(struct ieee802154_local *local, struct sk_buff *skb);
> >  void ieee802154_xmit_sync_worker(struct work_struct *work);
> >  int ieee802154_sync_and_hold_queue(struct ieee802154_local *local);
> > +int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb);
> >  netdev_tx_t
> >  ieee802154_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev);
> >  netdev_tx_t
> > diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
> > index 38f74b8b6740..ec8d872143ee 100644
> > --- a/net/mac802154/tx.c
> > +++ b/net/mac802154/tx.c
> > @@ -128,6 +128,31 @@ int ieee802154_sync_and_hold_queue(struct ieee802154_local *local)
> >         return ieee802154_sync_queue(local);
> >  }
> >
> > +int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb)
> > +{
> > +       int ret;
> > +
> > +       /* Avoid possible calls to ->ndo_stop() when we asynchronously perform
> > +        * MLME transmissions.
> > +        */
> > +       rtnl_lock();
>
> I think we should make an ASSERT_RTNL() here, the lock needs to be
> earlier than that over the whole MLME op. MLME can trigger more than

not over the whole MLME_op, that's terrible to hold the rtnl lock so
long... so I think this is fine that some netdev call will interfere
with this transmission.
So forget about the ASSERT_RTNL() here, it's fine (I hope).

> one message, the whole sync_hold/release queue should be earlier than
> that... in my opinion is it not right to allow other messages so far
> an MLME op is going on? I am not sure what the standard says to this,
> but I think it should be stopped the whole time? All those sequence

Whereas the stop of the netdev queue makes sense for the whole mlme-op
(in my opinion).

- Alex
Miquel Raynal May 17, 2022, 1:30 p.m. UTC | #4
aahringo@redhat.com wrote on Sun, 15 May 2022 19:03:53 -0400:

> Hi,
> 
> On Sun, May 15, 2022 at 6:28 PM Alexander Aring <aahringo@redhat.com> wrote:
> >
> > Hi,
> >
> > On Thu, May 12, 2022 at 10:34 AM Miquel Raynal
> > <miquel.raynal@bootlin.com> wrote:  
> > >
> > > This is the slow path, we need to wait for each command to be processed
> > > before continuing so let's introduce an helper which does the
> > > transmission and blocks until it gets notified of its asynchronous
> > > completion. This helper is going to be used when introducing scan
> > > support.
> > >
> > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > > ---
> > >  net/mac802154/ieee802154_i.h |  1 +
> > >  net/mac802154/tx.c           | 25 +++++++++++++++++++++++++
> > >  2 files changed, 26 insertions(+)
> > >
> > > diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
> > > index a057827fc48a..f8b374810a11 100644
> > > --- a/net/mac802154/ieee802154_i.h
> > > +++ b/net/mac802154/ieee802154_i.h
> > > @@ -125,6 +125,7 @@ extern struct ieee802154_mlme_ops mac802154_mlme_wpan;
> > >  void ieee802154_rx(struct ieee802154_local *local, struct sk_buff *skb);
> > >  void ieee802154_xmit_sync_worker(struct work_struct *work);
> > >  int ieee802154_sync_and_hold_queue(struct ieee802154_local *local);
> > > +int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb);
> > >  netdev_tx_t
> > >  ieee802154_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev);
> > >  netdev_tx_t
> > > diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
> > > index 38f74b8b6740..ec8d872143ee 100644
> > > --- a/net/mac802154/tx.c
> > > +++ b/net/mac802154/tx.c
> > > @@ -128,6 +128,31 @@ int ieee802154_sync_and_hold_queue(struct ieee802154_local *local)
> > >         return ieee802154_sync_queue(local);
> > >  }
> > >
> > > +int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb)
> > > +{
> > > +       int ret;
> > > +
> > > +       /* Avoid possible calls to ->ndo_stop() when we asynchronously perform
> > > +        * MLME transmissions.
> > > +        */
> > > +       rtnl_lock();  
> >
> > I think we should make an ASSERT_RTNL() here, the lock needs to be
> > earlier than that over the whole MLME op. MLME can trigger more than  
> 
> not over the whole MLME_op, that's terrible to hold the rtnl lock so
> long... so I think this is fine that some netdev call will interfere
> with this transmission.
> So forget about the ASSERT_RTNL() here, it's fine (I hope).
> 
> > one message, the whole sync_hold/release queue should be earlier than
> > that... in my opinion is it not right to allow other messages so far
> > an MLME op is going on? I am not sure what the standard says to this,
> > but I think it should be stopped the whole time? All those sequence  
> 
> Whereas the stop of the netdev queue makes sense for the whole mlme-op
> (in my opinion).

I might still implement an MLME pre/post helper and do the queue
hold/release calls there, while only taking the rtnl from the _tx.

And I might create an mlme_tx_one() which does the pre/post calls as
well.

Would something like this fit?

Thanks,
Miquèl
Alexander Aring May 18, 2022, 1:14 a.m. UTC | #5
Hi,

On Tue, May 17, 2022 at 9:30 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
>
> aahringo@redhat.com wrote on Sun, 15 May 2022 19:03:53 -0400:
>
> > Hi,
> >
> > On Sun, May 15, 2022 at 6:28 PM Alexander Aring <aahringo@redhat.com> wrote:
> > >
> > > Hi,
> > >
> > > On Thu, May 12, 2022 at 10:34 AM Miquel Raynal
> > > <miquel.raynal@bootlin.com> wrote:
> > > >
> > > > This is the slow path, we need to wait for each command to be processed
> > > > before continuing so let's introduce an helper which does the
> > > > transmission and blocks until it gets notified of its asynchronous
> > > > completion. This helper is going to be used when introducing scan
> > > > support.
> > > >
> > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > > > ---
> > > >  net/mac802154/ieee802154_i.h |  1 +
> > > >  net/mac802154/tx.c           | 25 +++++++++++++++++++++++++
> > > >  2 files changed, 26 insertions(+)
> > > >
> > > > diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
> > > > index a057827fc48a..f8b374810a11 100644
> > > > --- a/net/mac802154/ieee802154_i.h
> > > > +++ b/net/mac802154/ieee802154_i.h
> > > > @@ -125,6 +125,7 @@ extern struct ieee802154_mlme_ops mac802154_mlme_wpan;
> > > >  void ieee802154_rx(struct ieee802154_local *local, struct sk_buff *skb);
> > > >  void ieee802154_xmit_sync_worker(struct work_struct *work);
> > > >  int ieee802154_sync_and_hold_queue(struct ieee802154_local *local);
> > > > +int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb);
> > > >  netdev_tx_t
> > > >  ieee802154_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev);
> > > >  netdev_tx_t
> > > > diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
> > > > index 38f74b8b6740..ec8d872143ee 100644
> > > > --- a/net/mac802154/tx.c
> > > > +++ b/net/mac802154/tx.c
> > > > @@ -128,6 +128,31 @@ int ieee802154_sync_and_hold_queue(struct ieee802154_local *local)
> > > >         return ieee802154_sync_queue(local);
> > > >  }
> > > >
> > > > +int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb)
> > > > +{
> > > > +       int ret;
> > > > +
> > > > +       /* Avoid possible calls to ->ndo_stop() when we asynchronously perform
> > > > +        * MLME transmissions.
> > > > +        */
> > > > +       rtnl_lock();
> > >
> > > I think we should make an ASSERT_RTNL() here, the lock needs to be
> > > earlier than that over the whole MLME op. MLME can trigger more than
> >
> > not over the whole MLME_op, that's terrible to hold the rtnl lock so
> > long... so I think this is fine that some netdev call will interfere
> > with this transmission.
> > So forget about the ASSERT_RTNL() here, it's fine (I hope).
> >
> > > one message, the whole sync_hold/release queue should be earlier than
> > > that... in my opinion is it not right to allow other messages so far
> > > an MLME op is going on? I am not sure what the standard says to this,
> > > but I think it should be stopped the whole time? All those sequence
> >
> > Whereas the stop of the netdev queue makes sense for the whole mlme-op
> > (in my opinion).
>
> I might still implement an MLME pre/post helper and do the queue
> hold/release calls there, while only taking the rtnl from the _tx.
>
> And I might create an mlme_tx_one() which does the pre/post calls as
> well.
>
> Would something like this fit?

I think so, I've heard for some transceiver types a scan operation can
take hours... but I guess whoever triggers that scan in such an
environment knows that it has some "side-effects"...

- Alex
Miquel Raynal May 18, 2022, 10:12 a.m. UTC | #6
aahringo@redhat.com wrote on Tue, 17 May 2022 21:14:03 -0400:

> Hi,
> 
> On Tue, May 17, 2022 at 9:30 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> >
> > aahringo@redhat.com wrote on Sun, 15 May 2022 19:03:53 -0400:
> >  
> > > Hi,
> > >
> > > On Sun, May 15, 2022 at 6:28 PM Alexander Aring <aahringo@redhat.com> wrote:  
> > > >
> > > > Hi,
> > > >
> > > > On Thu, May 12, 2022 at 10:34 AM Miquel Raynal
> > > > <miquel.raynal@bootlin.com> wrote:  
> > > > >
> > > > > This is the slow path, we need to wait for each command to be processed
> > > > > before continuing so let's introduce an helper which does the
> > > > > transmission and blocks until it gets notified of its asynchronous
> > > > > completion. This helper is going to be used when introducing scan
> > > > > support.
> > > > >
> > > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > > > > ---
> > > > >  net/mac802154/ieee802154_i.h |  1 +
> > > > >  net/mac802154/tx.c           | 25 +++++++++++++++++++++++++
> > > > >  2 files changed, 26 insertions(+)
> > > > >
> > > > > diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
> > > > > index a057827fc48a..f8b374810a11 100644
> > > > > --- a/net/mac802154/ieee802154_i.h
> > > > > +++ b/net/mac802154/ieee802154_i.h
> > > > > @@ -125,6 +125,7 @@ extern struct ieee802154_mlme_ops mac802154_mlme_wpan;
> > > > >  void ieee802154_rx(struct ieee802154_local *local, struct sk_buff *skb);
> > > > >  void ieee802154_xmit_sync_worker(struct work_struct *work);
> > > > >  int ieee802154_sync_and_hold_queue(struct ieee802154_local *local);
> > > > > +int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb);
> > > > >  netdev_tx_t
> > > > >  ieee802154_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev);
> > > > >  netdev_tx_t
> > > > > diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
> > > > > index 38f74b8b6740..ec8d872143ee 100644
> > > > > --- a/net/mac802154/tx.c
> > > > > +++ b/net/mac802154/tx.c
> > > > > @@ -128,6 +128,31 @@ int ieee802154_sync_and_hold_queue(struct ieee802154_local *local)
> > > > >         return ieee802154_sync_queue(local);
> > > > >  }
> > > > >
> > > > > +int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb)
> > > > > +{
> > > > > +       int ret;
> > > > > +
> > > > > +       /* Avoid possible calls to ->ndo_stop() when we asynchronously perform
> > > > > +        * MLME transmissions.
> > > > > +        */
> > > > > +       rtnl_lock();  
> > > >
> > > > I think we should make an ASSERT_RTNL() here, the lock needs to be
> > > > earlier than that over the whole MLME op. MLME can trigger more than  
> > >
> > > not over the whole MLME_op, that's terrible to hold the rtnl lock so
> > > long... so I think this is fine that some netdev call will interfere
> > > with this transmission.
> > > So forget about the ASSERT_RTNL() here, it's fine (I hope).
> > >  
> > > > one message, the whole sync_hold/release queue should be earlier than
> > > > that... in my opinion is it not right to allow other messages so far
> > > > an MLME op is going on? I am not sure what the standard says to this,
> > > > but I think it should be stopped the whole time? All those sequence  
> > >
> > > Whereas the stop of the netdev queue makes sense for the whole mlme-op
> > > (in my opinion).  
> >
> > I might still implement an MLME pre/post helper and do the queue
> > hold/release calls there, while only taking the rtnl from the _tx.
> >
> > And I might create an mlme_tx_one() which does the pre/post calls as
> > well.
> >
> > Would something like this fit?  
> 
> I think so, I've heard for some transceiver types a scan operation can
> take hours... but I guess whoever triggers that scan in such an
> environment knows that it has some "side-effects"...

Yeah, a scan requires the data queue to be stopped and all incoming
packets to be dropped (others than beacons, ofc), so users must be
aware of this limitation.
Alexander Aring May 18, 2022, 12:05 p.m. UTC | #7
Hi,

On Wed, May 18, 2022 at 6:12 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
>
> aahringo@redhat.com wrote on Tue, 17 May 2022 21:14:03 -0400:
>
> > Hi,
> >
> > On Tue, May 17, 2022 at 9:30 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> > >
> > >
> > > aahringo@redhat.com wrote on Sun, 15 May 2022 19:03:53 -0400:
> > >
> > > > Hi,
> > > >
> > > > On Sun, May 15, 2022 at 6:28 PM Alexander Aring <aahringo@redhat.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Thu, May 12, 2022 at 10:34 AM Miquel Raynal
> > > > > <miquel.raynal@bootlin.com> wrote:
> > > > > >
> > > > > > This is the slow path, we need to wait for each command to be processed
> > > > > > before continuing so let's introduce an helper which does the
> > > > > > transmission and blocks until it gets notified of its asynchronous
> > > > > > completion. This helper is going to be used when introducing scan
> > > > > > support.
> > > > > >
> > > > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > > > > > ---
> > > > > >  net/mac802154/ieee802154_i.h |  1 +
> > > > > >  net/mac802154/tx.c           | 25 +++++++++++++++++++++++++
> > > > > >  2 files changed, 26 insertions(+)
> > > > > >
> > > > > > diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
> > > > > > index a057827fc48a..f8b374810a11 100644
> > > > > > --- a/net/mac802154/ieee802154_i.h
> > > > > > +++ b/net/mac802154/ieee802154_i.h
> > > > > > @@ -125,6 +125,7 @@ extern struct ieee802154_mlme_ops mac802154_mlme_wpan;
> > > > > >  void ieee802154_rx(struct ieee802154_local *local, struct sk_buff *skb);
> > > > > >  void ieee802154_xmit_sync_worker(struct work_struct *work);
> > > > > >  int ieee802154_sync_and_hold_queue(struct ieee802154_local *local);
> > > > > > +int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb);
> > > > > >  netdev_tx_t
> > > > > >  ieee802154_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev);
> > > > > >  netdev_tx_t
> > > > > > diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
> > > > > > index 38f74b8b6740..ec8d872143ee 100644
> > > > > > --- a/net/mac802154/tx.c
> > > > > > +++ b/net/mac802154/tx.c
> > > > > > @@ -128,6 +128,31 @@ int ieee802154_sync_and_hold_queue(struct ieee802154_local *local)
> > > > > >         return ieee802154_sync_queue(local);
> > > > > >  }
> > > > > >
> > > > > > +int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb)
> > > > > > +{
> > > > > > +       int ret;
> > > > > > +
> > > > > > +       /* Avoid possible calls to ->ndo_stop() when we asynchronously perform
> > > > > > +        * MLME transmissions.
> > > > > > +        */
> > > > > > +       rtnl_lock();
> > > > >
> > > > > I think we should make an ASSERT_RTNL() here, the lock needs to be
> > > > > earlier than that over the whole MLME op. MLME can trigger more than
> > > >
> > > > not over the whole MLME_op, that's terrible to hold the rtnl lock so
> > > > long... so I think this is fine that some netdev call will interfere
> > > > with this transmission.
> > > > So forget about the ASSERT_RTNL() here, it's fine (I hope).
> > > >
> > > > > one message, the whole sync_hold/release queue should be earlier than
> > > > > that... in my opinion is it not right to allow other messages so far
> > > > > an MLME op is going on? I am not sure what the standard says to this,
> > > > > but I think it should be stopped the whole time? All those sequence
> > > >
> > > > Whereas the stop of the netdev queue makes sense for the whole mlme-op
> > > > (in my opinion).
> > >
> > > I might still implement an MLME pre/post helper and do the queue
> > > hold/release calls there, while only taking the rtnl from the _tx.
> > >
> > > And I might create an mlme_tx_one() which does the pre/post calls as
> > > well.
> > >
> > > Would something like this fit?
> >
> > I think so, I've heard for some transceiver types a scan operation can
> > take hours... but I guess whoever triggers that scan in such an
> > environment knows that it has some "side-effects"...
>
> Yeah, a scan requires the data queue to be stopped and all incoming
> packets to be dropped (others than beacons, ofc), so users must be
> aware of this limitation.

I think there is a real problem about how the user can synchronize the
start of a scan and be sure that at this point everything was
transmitted, we might need to real "flush" the queue. Your naming
"flush" is also wrong, It will flush the framebuffer(s) of the
transceivers but not the netdev queue... and we probably should flush
the netdev queue before starting mlme-op... this is something to add
in the mlme_op_pre() function.

- Alex
Miquel Raynal May 18, 2022, 12:37 p.m. UTC | #8
alex.aring@gmail.com wrote on Wed, 18 May 2022 08:05:46 -0400:

> Hi,
> 
> On Wed, May 18, 2022 at 6:12 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> >
> > aahringo@redhat.com wrote on Tue, 17 May 2022 21:14:03 -0400:
> >  
> > > Hi,
> > >
> > > On Tue, May 17, 2022 at 9:30 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:  
> > > >
> > > >
> > > > aahringo@redhat.com wrote on Sun, 15 May 2022 19:03:53 -0400:
> > > >  
> > > > > Hi,
> > > > >
> > > > > On Sun, May 15, 2022 at 6:28 PM Alexander Aring <aahringo@redhat.com> wrote:  
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > On Thu, May 12, 2022 at 10:34 AM Miquel Raynal
> > > > > > <miquel.raynal@bootlin.com> wrote:  
> > > > > > >
> > > > > > > This is the slow path, we need to wait for each command to be processed
> > > > > > > before continuing so let's introduce an helper which does the
> > > > > > > transmission and blocks until it gets notified of its asynchronous
> > > > > > > completion. This helper is going to be used when introducing scan
> > > > > > > support.
> > > > > > >
> > > > > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > > > > > > ---
> > > > > > >  net/mac802154/ieee802154_i.h |  1 +
> > > > > > >  net/mac802154/tx.c           | 25 +++++++++++++++++++++++++
> > > > > > >  2 files changed, 26 insertions(+)
> > > > > > >
> > > > > > > diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
> > > > > > > index a057827fc48a..f8b374810a11 100644
> > > > > > > --- a/net/mac802154/ieee802154_i.h
> > > > > > > +++ b/net/mac802154/ieee802154_i.h
> > > > > > > @@ -125,6 +125,7 @@ extern struct ieee802154_mlme_ops mac802154_mlme_wpan;
> > > > > > >  void ieee802154_rx(struct ieee802154_local *local, struct sk_buff *skb);
> > > > > > >  void ieee802154_xmit_sync_worker(struct work_struct *work);
> > > > > > >  int ieee802154_sync_and_hold_queue(struct ieee802154_local *local);
> > > > > > > +int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb);
> > > > > > >  netdev_tx_t
> > > > > > >  ieee802154_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev);
> > > > > > >  netdev_tx_t
> > > > > > > diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
> > > > > > > index 38f74b8b6740..ec8d872143ee 100644
> > > > > > > --- a/net/mac802154/tx.c
> > > > > > > +++ b/net/mac802154/tx.c
> > > > > > > @@ -128,6 +128,31 @@ int ieee802154_sync_and_hold_queue(struct ieee802154_local *local)
> > > > > > >         return ieee802154_sync_queue(local);
> > > > > > >  }
> > > > > > >
> > > > > > > +int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb)
> > > > > > > +{
> > > > > > > +       int ret;
> > > > > > > +
> > > > > > > +       /* Avoid possible calls to ->ndo_stop() when we asynchronously perform
> > > > > > > +        * MLME transmissions.
> > > > > > > +        */
> > > > > > > +       rtnl_lock();  
> > > > > >
> > > > > > I think we should make an ASSERT_RTNL() here, the lock needs to be
> > > > > > earlier than that over the whole MLME op. MLME can trigger more than  
> > > > >
> > > > > not over the whole MLME_op, that's terrible to hold the rtnl lock so
> > > > > long... so I think this is fine that some netdev call will interfere
> > > > > with this transmission.
> > > > > So forget about the ASSERT_RTNL() here, it's fine (I hope).
> > > > >  
> > > > > > one message, the whole sync_hold/release queue should be earlier than
> > > > > > that... in my opinion is it not right to allow other messages so far
> > > > > > an MLME op is going on? I am not sure what the standard says to this,
> > > > > > but I think it should be stopped the whole time? All those sequence  
> > > > >
> > > > > Whereas the stop of the netdev queue makes sense for the whole mlme-op
> > > > > (in my opinion).  
> > > >
> > > > I might still implement an MLME pre/post helper and do the queue
> > > > hold/release calls there, while only taking the rtnl from the _tx.
> > > >
> > > > And I might create an mlme_tx_one() which does the pre/post calls as
> > > > well.
> > > >
> > > > Would something like this fit?  
> > >
> > > I think so, I've heard for some transceiver types a scan operation can
> > > take hours... but I guess whoever triggers that scan in such an
> > > environment knows that it has some "side-effects"...  
> >
> > Yeah, a scan requires the data queue to be stopped and all incoming
> > packets to be dropped (others than beacons, ofc), so users must be
> > aware of this limitation.  
> 
> I think there is a real problem about how the user can synchronize the
> start of a scan and be sure that at this point everything was
> transmitted, we might need to real "flush" the queue. Your naming
> "flush" is also wrong, It will flush the framebuffer(s) of the
> transceivers but not the netdev queue... and we probably should flush
> the netdev queue before starting mlme-op... this is something to add
> in the mlme_op_pre() function.

Is it even possible? This requires waiting for the netdev queue to be
empty before stopping it, but if users constantly flood the transceiver
with data packets this might "never" happen.

And event thought we might accept this situation, I don't know how to
check the emptiness of the netif queue. Any inputs?

Thanks,
Miquèl
Alexander Aring May 18, 2022, 1:08 p.m. UTC | #9
Hi,

On Wed, May 18, 2022 at 8:37 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
>
> alex.aring@gmail.com wrote on Wed, 18 May 2022 08:05:46 -0400:
>
> > Hi,
> >
> > On Wed, May 18, 2022 at 6:12 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> > >
> > >
> > > aahringo@redhat.com wrote on Tue, 17 May 2022 21:14:03 -0400:
> > >
> > > > Hi,
> > > >
> > > > On Tue, May 17, 2022 at 9:30 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> > > > >
> > > > >
> > > > > aahringo@redhat.com wrote on Sun, 15 May 2022 19:03:53 -0400:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > On Sun, May 15, 2022 at 6:28 PM Alexander Aring <aahringo@redhat.com> wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > On Thu, May 12, 2022 at 10:34 AM Miquel Raynal
> > > > > > > <miquel.raynal@bootlin.com> wrote:
> > > > > > > >
> > > > > > > > This is the slow path, we need to wait for each command to be processed
> > > > > > > > before continuing so let's introduce an helper which does the
> > > > > > > > transmission and blocks until it gets notified of its asynchronous
> > > > > > > > completion. This helper is going to be used when introducing scan
> > > > > > > > support.
> > > > > > > >
> > > > > > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > > > > > > > ---
> > > > > > > >  net/mac802154/ieee802154_i.h |  1 +
> > > > > > > >  net/mac802154/tx.c           | 25 +++++++++++++++++++++++++
> > > > > > > >  2 files changed, 26 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
> > > > > > > > index a057827fc48a..f8b374810a11 100644
> > > > > > > > --- a/net/mac802154/ieee802154_i.h
> > > > > > > > +++ b/net/mac802154/ieee802154_i.h
> > > > > > > > @@ -125,6 +125,7 @@ extern struct ieee802154_mlme_ops mac802154_mlme_wpan;
> > > > > > > >  void ieee802154_rx(struct ieee802154_local *local, struct sk_buff *skb);
> > > > > > > >  void ieee802154_xmit_sync_worker(struct work_struct *work);
> > > > > > > >  int ieee802154_sync_and_hold_queue(struct ieee802154_local *local);
> > > > > > > > +int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb);
> > > > > > > >  netdev_tx_t
> > > > > > > >  ieee802154_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev);
> > > > > > > >  netdev_tx_t
> > > > > > > > diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
> > > > > > > > index 38f74b8b6740..ec8d872143ee 100644
> > > > > > > > --- a/net/mac802154/tx.c
> > > > > > > > +++ b/net/mac802154/tx.c
> > > > > > > > @@ -128,6 +128,31 @@ int ieee802154_sync_and_hold_queue(struct ieee802154_local *local)
> > > > > > > >         return ieee802154_sync_queue(local);
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb)
> > > > > > > > +{
> > > > > > > > +       int ret;
> > > > > > > > +
> > > > > > > > +       /* Avoid possible calls to ->ndo_stop() when we asynchronously perform
> > > > > > > > +        * MLME transmissions.
> > > > > > > > +        */
> > > > > > > > +       rtnl_lock();
> > > > > > >
> > > > > > > I think we should make an ASSERT_RTNL() here, the lock needs to be
> > > > > > > earlier than that over the whole MLME op. MLME can trigger more than
> > > > > >
> > > > > > not over the whole MLME_op, that's terrible to hold the rtnl lock so
> > > > > > long... so I think this is fine that some netdev call will interfere
> > > > > > with this transmission.
> > > > > > So forget about the ASSERT_RTNL() here, it's fine (I hope).
> > > > > >
> > > > > > > one message, the whole sync_hold/release queue should be earlier than
> > > > > > > that... in my opinion is it not right to allow other messages so far
> > > > > > > an MLME op is going on? I am not sure what the standard says to this,
> > > > > > > but I think it should be stopped the whole time? All those sequence
> > > > > >
> > > > > > Whereas the stop of the netdev queue makes sense for the whole mlme-op
> > > > > > (in my opinion).
> > > > >
> > > > > I might still implement an MLME pre/post helper and do the queue
> > > > > hold/release calls there, while only taking the rtnl from the _tx.
> > > > >
> > > > > And I might create an mlme_tx_one() which does the pre/post calls as
> > > > > well.
> > > > >
> > > > > Would something like this fit?
> > > >
> > > > I think so, I've heard for some transceiver types a scan operation can
> > > > take hours... but I guess whoever triggers that scan in such an
> > > > environment knows that it has some "side-effects"...
> > >
> > > Yeah, a scan requires the data queue to be stopped and all incoming
> > > packets to be dropped (others than beacons, ofc), so users must be
> > > aware of this limitation.
> >
> > I think there is a real problem about how the user can synchronize the
> > start of a scan and be sure that at this point everything was
> > transmitted, we might need to real "flush" the queue. Your naming
> > "flush" is also wrong, It will flush the framebuffer(s) of the
> > transceivers but not the netdev queue... and we probably should flush
> > the netdev queue before starting mlme-op... this is something to add
> > in the mlme_op_pre() function.
>
> Is it even possible? This requires waiting for the netdev queue to be
> empty before stopping it, but if users constantly flood the transceiver
> with data packets this might "never" happen.
>

Nothing is impossible, just maybe nobody thought about that. Sure
putting more into the queue should be forbidden but what's inside
should be "flushed". Currently we make a hard cut, there is no way
that the user knows what's sent or not BUT that is the case for
xmit_do() anyway, it's not reliable... people need to have the right
upper layer protocol. However I think we could run into problems if we
especially have features like waiting for the socket error queue to
know if e.g. an ack was received or not.

> And event thought we might accept this situation, I don't know how to
> check the emptiness of the netif queue. Any inputs?

Don't think about it, I see a practical issue here which I keep in my mind.

- Alex
Miquel Raynal May 18, 2022, 4:12 p.m. UTC | #10
Hi Alexander,

> > > > > > > > > +int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb)
> > > > > > > > > +{
> > > > > > > > > +       int ret;
> > > > > > > > > +
> > > > > > > > > +       /* Avoid possible calls to ->ndo_stop() when we asynchronously perform
> > > > > > > > > +        * MLME transmissions.
> > > > > > > > > +        */
> > > > > > > > > +       rtnl_lock();  
> > > > > > > >
> > > > > > > > I think we should make an ASSERT_RTNL() here, the lock needs to be
> > > > > > > > earlier than that over the whole MLME op. MLME can trigger more than  
> > > > > > >
> > > > > > > not over the whole MLME_op, that's terrible to hold the rtnl lock so
> > > > > > > long... so I think this is fine that some netdev call will interfere
> > > > > > > with this transmission.
> > > > > > > So forget about the ASSERT_RTNL() here, it's fine (I hope).
> > > > > > >  
> > > > > > > > one message, the whole sync_hold/release queue should be earlier than
> > > > > > > > that... in my opinion is it not right to allow other messages so far
> > > > > > > > an MLME op is going on? I am not sure what the standard says to this,
> > > > > > > > but I think it should be stopped the whole time? All those sequence  
> > > > > > >
> > > > > > > Whereas the stop of the netdev queue makes sense for the whole mlme-op
> > > > > > > (in my opinion).  
> > > > > >
> > > > > > I might still implement an MLME pre/post helper and do the queue
> > > > > > hold/release calls there, while only taking the rtnl from the _tx.
> > > > > >
> > > > > > And I might create an mlme_tx_one() which does the pre/post calls as
> > > > > > well.
> > > > > >
> > > > > > Would something like this fit?  
> > > > >
> > > > > I think so, I've heard for some transceiver types a scan operation can
> > > > > take hours... but I guess whoever triggers that scan in such an
> > > > > environment knows that it has some "side-effects"...  
> > > >
> > > > Yeah, a scan requires the data queue to be stopped and all incoming
> > > > packets to be dropped (others than beacons, ofc), so users must be
> > > > aware of this limitation.  
> > >
> > > I think there is a real problem about how the user can synchronize the
> > > start of a scan and be sure that at this point everything was
> > > transmitted, we might need to real "flush" the queue. Your naming
> > > "flush" is also wrong, It will flush the framebuffer(s) of the
> > > transceivers but not the netdev queue... and we probably should flush
> > > the netdev queue before starting mlme-op... this is something to add
> > > in the mlme_op_pre() function.  
> >
> > Is it even possible? This requires waiting for the netdev queue to be
> > empty before stopping it, but if users constantly flood the transceiver
> > with data packets this might "never" happen.
> >  
> 
> Nothing is impossible, just maybe nobody thought about that. Sure
> putting more into the queue should be forbidden but what's inside
> should be "flushed". Currently we make a hard cut, there is no way
> that the user knows what's sent or not BUT that is the case for
> xmit_do() anyway, it's not reliable... people need to have the right
> upper layer protocol. However I think we could run into problems if we
> especially have features like waiting for the socket error queue to
> know if e.g. an ack was received or not.

Looking at net/core/dev.c I don't see the issue anymore, let me try to
explain: as far as I understand the net device queue is a very
conceptual "queue" which only has a reality if the underlying layer
really implements the concept of a queue. To be more precise, at the
netdev level itself, there is a HARD_TX_LOCK() call which serializes
the ->ndo_start_xmit() calls, but whatever entered the
->ndo_start_xmit() hook _will_ be handled by the lower layer and is not
in any "waiting" state at the net core level.

In practice, the IEEE 802.15.4 core treats all packets immediately and
do not really bother "queuing" them like if there was a "waiting"
state. So all messages that the userspace expected to be send (which
did not return NETDEV_TX_BUSY) at the moment where we decide to stop
data transmissions will be processed.

If several frames had to be transmitted to the IEEE 802.15.4 core and
they all passed the netdev "queuing" mechanism, then they will be
forwarded to the tranceivers thanks to the wait_event(!ongoing_txs) and
only after we declare the queue sync'ed.

For me there is no hard cut.

> > And event thought we might accept this situation, I don't know how to
> > check the emptiness of the netif queue. Any inputs?  
> 
> Don't think about it, I see a practical issue here which I keep in my mind.
> 
> - Alex
> 


Thanks,
Miquèl
Alexander Aring May 19, 2022, 1:51 a.m. UTC | #11
Hi,

On Wed, May 18, 2022 at 12:12 PM Miquel Raynal
<miquel.raynal@bootlin.com> wrote:
>
> Hi Alexander,
>
> > > > > > > > > > +int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb)
> > > > > > > > > > +{
> > > > > > > > > > +       int ret;
> > > > > > > > > > +
> > > > > > > > > > +       /* Avoid possible calls to ->ndo_stop() when we asynchronously perform
> > > > > > > > > > +        * MLME transmissions.
> > > > > > > > > > +        */
> > > > > > > > > > +       rtnl_lock();
> > > > > > > > >
> > > > > > > > > I think we should make an ASSERT_RTNL() here, the lock needs to be
> > > > > > > > > earlier than that over the whole MLME op. MLME can trigger more than
> > > > > > > >
> > > > > > > > not over the whole MLME_op, that's terrible to hold the rtnl lock so
> > > > > > > > long... so I think this is fine that some netdev call will interfere
> > > > > > > > with this transmission.
> > > > > > > > So forget about the ASSERT_RTNL() here, it's fine (I hope).
> > > > > > > >
> > > > > > > > > one message, the whole sync_hold/release queue should be earlier than
> > > > > > > > > that... in my opinion is it not right to allow other messages so far
> > > > > > > > > an MLME op is going on? I am not sure what the standard says to this,
> > > > > > > > > but I think it should be stopped the whole time? All those sequence
> > > > > > > >
> > > > > > > > Whereas the stop of the netdev queue makes sense for the whole mlme-op
> > > > > > > > (in my opinion).
> > > > > > >
> > > > > > > I might still implement an MLME pre/post helper and do the queue
> > > > > > > hold/release calls there, while only taking the rtnl from the _tx.
> > > > > > >
> > > > > > > And I might create an mlme_tx_one() which does the pre/post calls as
> > > > > > > well.
> > > > > > >
> > > > > > > Would something like this fit?
> > > > > >
> > > > > > I think so, I've heard for some transceiver types a scan operation can
> > > > > > take hours... but I guess whoever triggers that scan in such an
> > > > > > environment knows that it has some "side-effects"...
> > > > >
> > > > > Yeah, a scan requires the data queue to be stopped and all incoming
> > > > > packets to be dropped (others than beacons, ofc), so users must be
> > > > > aware of this limitation.
> > > >
> > > > I think there is a real problem about how the user can synchronize the
> > > > start of a scan and be sure that at this point everything was
> > > > transmitted, we might need to real "flush" the queue. Your naming
> > > > "flush" is also wrong, It will flush the framebuffer(s) of the
> > > > transceivers but not the netdev queue... and we probably should flush
> > > > the netdev queue before starting mlme-op... this is something to add
> > > > in the mlme_op_pre() function.
> > >
> > > Is it even possible? This requires waiting for the netdev queue to be
> > > empty before stopping it, but if users constantly flood the transceiver
> > > with data packets this might "never" happen.
> > >
> >
> > Nothing is impossible, just maybe nobody thought about that. Sure
> > putting more into the queue should be forbidden but what's inside
> > should be "flushed". Currently we make a hard cut, there is no way
> > that the user knows what's sent or not BUT that is the case for
> > xmit_do() anyway, it's not reliable... people need to have the right
> > upper layer protocol. However I think we could run into problems if we
> > especially have features like waiting for the socket error queue to
> > know if e.g. an ack was received or not.
>
> Looking at net/core/dev.c I don't see the issue anymore, let me try to
> explain: as far as I understand the net device queue is a very
> conceptual "queue" which only has a reality if the underlying layer
> really implements the concept of a queue. To be more precise, at the
> netdev level itself, there is a HARD_TX_LOCK() call which serializes
> the ->ndo_start_xmit() calls, but whatever entered the
> ->ndo_start_xmit() hook _will_ be handled by the lower layer and is not
> in any "waiting" state at the net core level.
>
> In practice, the IEEE 802.15.4 core treats all packets immediately and
> do not really bother "queuing" them like if there was a "waiting"
> state. So all messages that the userspace expected to be send (which
> did not return NETDEV_TX_BUSY) at the moment where we decide to stop
> data transmissions will be processed.
>
> If several frames had to be transmitted to the IEEE 802.15.4 core and
> they all passed the netdev "queuing" mechanism, then they will be
> forwarded to the tranceivers thanks to the wait_event(!ongoing_txs) and
> only after we declare the queue sync'ed.
>
> For me there is no hard cut.

In my opinion there is definitely in case of a wpan interface a queue
handling right above xmit_do() which is in a "works for now" state.
Your queue flush function will not flush any queue, as I said it's
flushing the transceivers framebuffer at the starting point of
xmit_do() call and you should change your comments/function names to
describe this behaviour.

- Alex
Miquel Raynal May 19, 2022, 2:20 p.m. UTC | #12
Hi Alexander,

aahringo@redhat.com wrote on Wed, 18 May 2022 21:51:36 -0400:

> Hi,
> 
> On Wed, May 18, 2022 at 12:12 PM Miquel Raynal
> <miquel.raynal@bootlin.com> wrote:
> >
> > Hi Alexander,
> >  
> > > > > > > > > > > +int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb)
> > > > > > > > > > > +{
> > > > > > > > > > > +       int ret;
> > > > > > > > > > > +
> > > > > > > > > > > +       /* Avoid possible calls to ->ndo_stop() when we asynchronously perform
> > > > > > > > > > > +        * MLME transmissions.
> > > > > > > > > > > +        */
> > > > > > > > > > > +       rtnl_lock();  
> > > > > > > > > >
> > > > > > > > > > I think we should make an ASSERT_RTNL() here, the lock needs to be
> > > > > > > > > > earlier than that over the whole MLME op. MLME can trigger more than  
> > > > > > > > >
> > > > > > > > > not over the whole MLME_op, that's terrible to hold the rtnl lock so
> > > > > > > > > long... so I think this is fine that some netdev call will interfere
> > > > > > > > > with this transmission.
> > > > > > > > > So forget about the ASSERT_RTNL() here, it's fine (I hope).
> > > > > > > > >  
> > > > > > > > > > one message, the whole sync_hold/release queue should be earlier than
> > > > > > > > > > that... in my opinion is it not right to allow other messages so far
> > > > > > > > > > an MLME op is going on? I am not sure what the standard says to this,
> > > > > > > > > > but I think it should be stopped the whole time? All those sequence  
> > > > > > > > >
> > > > > > > > > Whereas the stop of the netdev queue makes sense for the whole mlme-op
> > > > > > > > > (in my opinion).  
> > > > > > > >
> > > > > > > > I might still implement an MLME pre/post helper and do the queue
> > > > > > > > hold/release calls there, while only taking the rtnl from the _tx.
> > > > > > > >
> > > > > > > > And I might create an mlme_tx_one() which does the pre/post calls as
> > > > > > > > well.
> > > > > > > >
> > > > > > > > Would something like this fit?  
> > > > > > >
> > > > > > > I think so, I've heard for some transceiver types a scan operation can
> > > > > > > take hours... but I guess whoever triggers that scan in such an
> > > > > > > environment knows that it has some "side-effects"...  
> > > > > >
> > > > > > Yeah, a scan requires the data queue to be stopped and all incoming
> > > > > > packets to be dropped (others than beacons, ofc), so users must be
> > > > > > aware of this limitation.  
> > > > >
> > > > > I think there is a real problem about how the user can synchronize the
> > > > > start of a scan and be sure that at this point everything was
> > > > > transmitted, we might need to real "flush" the queue. Your naming
> > > > > "flush" is also wrong, It will flush the framebuffer(s) of the
> > > > > transceivers but not the netdev queue... and we probably should flush
> > > > > the netdev queue before starting mlme-op... this is something to add
> > > > > in the mlme_op_pre() function.  
> > > >
> > > > Is it even possible? This requires waiting for the netdev queue to be
> > > > empty before stopping it, but if users constantly flood the transceiver
> > > > with data packets this might "never" happen.
> > > >  
> > >
> > > Nothing is impossible, just maybe nobody thought about that. Sure
> > > putting more into the queue should be forbidden but what's inside
> > > should be "flushed". Currently we make a hard cut, there is no way
> > > that the user knows what's sent or not BUT that is the case for
> > > xmit_do() anyway, it's not reliable... people need to have the right
> > > upper layer protocol. However I think we could run into problems if we
> > > especially have features like waiting for the socket error queue to
> > > know if e.g. an ack was received or not.  
> >
> > Looking at net/core/dev.c I don't see the issue anymore, let me try to
> > explain: as far as I understand the net device queue is a very
> > conceptual "queue" which only has a reality if the underlying layer
> > really implements the concept of a queue. To be more precise, at the
> > netdev level itself, there is a HARD_TX_LOCK() call which serializes
> > the ->ndo_start_xmit() calls, but whatever entered the  
> > ->ndo_start_xmit() hook _will_ be handled by the lower layer and is not  
> > in any "waiting" state at the net core level.
> >
> > In practice, the IEEE 802.15.4 core treats all packets immediately and
> > do not really bother "queuing" them like if there was a "waiting"
> > state. So all messages that the userspace expected to be send (which
> > did not return NETDEV_TX_BUSY) at the moment where we decide to stop
> > data transmissions will be processed.
> >
> > If several frames had to be transmitted to the IEEE 802.15.4 core and
> > they all passed the netdev "queuing" mechanism, then they will be
> > forwarded to the tranceivers thanks to the wait_event(!ongoing_txs) and
> > only after we declare the queue sync'ed.
> >
> > For me there is no hard cut.  
> 
> In my opinion there is definitely in case of a wpan interface a queue
> handling right above xmit_do() which is in a "works for now" state.
> Your queue flush function will not flush any queue, as I said it's
> flushing the transceivers framebuffer at the starting point of
> xmit_do() call and you should change your comments/function names to
> describe this behaviour.

I see we are still discussing the v2 here but I assume the v3 naming
might already be better, I am not flushing anymore but I "{sync,sync and
hold} the ieee802154 queue", which is hopefully closer to the reality.

The fact is that we have no way of knowing what happens at an higher
level (for the best) and if the user ever decides to perform a scan,
any data left in the upper layers will be frozen there for a moment, I
would say it is the responsibility of the user to act on the upper
layers to sync there first.

I'll send the v4 with all your other comments addressed, let's
continue discussing there if needed.

Thanks,
Miquèl
diff mbox series

Patch

diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
index a057827fc48a..f8b374810a11 100644
--- a/net/mac802154/ieee802154_i.h
+++ b/net/mac802154/ieee802154_i.h
@@ -125,6 +125,7 @@  extern struct ieee802154_mlme_ops mac802154_mlme_wpan;
 void ieee802154_rx(struct ieee802154_local *local, struct sk_buff *skb);
 void ieee802154_xmit_sync_worker(struct work_struct *work);
 int ieee802154_sync_and_hold_queue(struct ieee802154_local *local);
+int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb);
 netdev_tx_t
 ieee802154_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev);
 netdev_tx_t
diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index 38f74b8b6740..ec8d872143ee 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -128,6 +128,31 @@  int ieee802154_sync_and_hold_queue(struct ieee802154_local *local)
 	return ieee802154_sync_queue(local);
 }
 
+int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb)
+{
+	int ret;
+
+	/* Avoid possible calls to ->ndo_stop() when we asynchronously perform
+	 * MLME transmissions.
+	 */
+	rtnl_lock();
+
+	/* Ensure the device was not stopped, otherwise error out */
+	if (!local->open_count)
+		return -EBUSY;
+
+	ieee802154_sync_and_hold_queue(local);
+
+	ieee802154_tx(local, skb);
+	ret = ieee802154_sync_queue(local);
+
+	ieee802154_release_queue(local);
+
+	rtnl_unlock();
+
+	return ret;
+}
+
 static netdev_tx_t
 ieee802154_hot_tx(struct ieee802154_local *local, struct sk_buff *skb)
 {