
[RFC] mt76: usb: reduce locking in mt76u_tx_tasklet

Message ID 1ee5ce7818f9d45c9713ce99e810cb84f50dcf03.1552907276.git.lorenzo@kernel.org (mailing list archive)
State RFC
Delegated to: Felix Fietkau

Commit Message

Lorenzo Bianconi March 18, 2019, 11:09 a.m. UTC
Similar to the PCI counterpart, reduce locking in mt76u_tx_tasklet, since
q->head is managed only in mt76u_tx_tasklet and q->queued is updated
while holding q->lock.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/wireless/mediatek/mt76/usb.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)
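
For illustration only, the accounting change described in the commit message
can be modelled in plain userspace C. This is a minimal single-threaded
sketch, not driver code: struct model_queue, model_enqueue and model_reap are
hypothetical names, and the "lock" is just a counter of acquisitions so the
number of lock round-trips can be observed. It provides no mutual exclusion
and says nothing about whether the unlocked reads are safe on SMP.

```c
#include <stdbool.h>
#include <string.h>

#define NDESC 8 /* hypothetical ring size, not taken from the driver */

struct model_queue {
	bool done[NDESC];        /* per-entry completion flag (models buf->done) */
	unsigned int head;       /* advanced only by the reaper */
	unsigned int tail;       /* advanced only by the enqueue path */
	unsigned int queued;     /* modified only between lock/unlock */
	unsigned int lock_count; /* counts acquisitions; NOT a real lock */
};

static void model_lock(struct model_queue *q)   { q->lock_count++; }
static void model_unlock(struct model_queue *q) { (void)q; }

static void model_init(struct model_queue *q)
{
	memset(q, 0, sizeof(*q));
}

/* Enqueue path: clears the flag and bumps queued under the lock. */
static bool model_enqueue(struct model_queue *q)
{
	bool ok = false;

	model_lock(q);
	if (q->queued < NDESC) {
		q->done[q->tail] = false;
		q->tail = (q->tail + 1) % NDESC;
		q->queued++;
		ok = true;
	}
	model_unlock(q);
	return ok;
}

/* Reaper: walk completed entries without the lock, then update once. */
static unsigned int model_reap(struct model_queue *q)
{
	unsigned int n_queued = 0;

	while (q->queued > n_queued) {
		if (!q->done[q->head])
			break;
		q->head = (q->head + 1) % NDESC;
		n_queued++;
	}

	model_lock(q);
	q->queued -= n_queued; /* single batched update */
	model_unlock(q);
	return n_queued;
}
```

With three entries enqueued and the first two marked done, model_reap()
returns 2 and takes the lock once, where the pre-patch loop would drop and
retake it around every completed entry.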

Comments

Stanislaw Gruszka March 19, 2019, 11:07 a.m. UTC | #1
On Mon, Mar 18, 2019 at 12:09:32PM +0100, Lorenzo Bianconi wrote:
> Similar to pci counterpart, reduce locking in mt76u_tx_tasklet since
> q->head is managed just in mt76u_tx_tasklet and q->queued is updated
> holding q->lock
> 
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  drivers/net/wireless/mediatek/mt76/usb.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
> index ac03acdae279..8cd70c32d77a 100644
> --- a/drivers/net/wireless/mediatek/mt76/usb.c
> +++ b/drivers/net/wireless/mediatek/mt76/usb.c
> @@ -634,29 +634,33 @@ static void mt76u_tx_tasklet(unsigned long data)
>  	int i;
>  
>  	for (i = 0; i < IEEE80211_NUM_ACS; i++) {
> +		u32 n_queued = 0, n_sw_queued = 0;
> +
>  		sq = &dev->q_tx[i];
>  		q = sq->q;
>  
> -		spin_lock_bh(&q->lock);
> -		while (true) {
> +		while (q->queued > n_queued) {
>  			buf = &q->entry[q->head].ubuf;
> -			if (!buf->done || !q->queued)
> +			if (!buf->done)
>  				break;

I'm still not sure whether this is safe. Reading a variable outside the
lock is tricky, because there is then no guarantee of when a value
written on one CPU becomes visible on a different CPU. And for USB this
applies not only to q->queued but also to buf->done.
 
Stanislaw
Lorenzo Bianconi March 19, 2019, 12:58 p.m. UTC | #2
> On Mon, Mar 18, 2019 at 12:09:32PM +0100, Lorenzo Bianconi wrote:
> > Similar to pci counterpart, reduce locking in mt76u_tx_tasklet since
> > q->head is managed just in mt76u_tx_tasklet and q->queued is updated
> > holding q->lock
> > 
> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > ---
> >  drivers/net/wireless/mediatek/mt76/usb.c | 18 +++++++++++-------
> >  1 file changed, 11 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
> > index ac03acdae279..8cd70c32d77a 100644
> > --- a/drivers/net/wireless/mediatek/mt76/usb.c
> > +++ b/drivers/net/wireless/mediatek/mt76/usb.c
> > @@ -634,29 +634,33 @@ static void mt76u_tx_tasklet(unsigned long data)
> >  	int i;
> >  
> >  	for (i = 0; i < IEEE80211_NUM_ACS; i++) {
> > +		u32 n_queued = 0, n_sw_queued = 0;
> > +
> >  		sq = &dev->q_tx[i];
> >  		q = sq->q;
> >  
> > -		spin_lock_bh(&q->lock);
> > -		while (true) {
> > +		while (q->queued > n_queued) {
> >  			buf = &q->entry[q->head].ubuf;
> > -			if (!buf->done || !q->queued)
> > +			if (!buf->done)
> >  				break;
> 
> I'm still thinking if this is safe or not. Is somewhat tricky to
> read variable outside the lock because in such case there is no time
> guarantee when variable written on one CPU gets updated value on
> different CPU. And for USB is not only q->queued but also buf->done.

Hi Stanislaw,

I was wondering whether this is safe as well, but q->queued is updated while
holding q->lock, and I guess that ensures the tx and status code paths do not overlap.
Regarding buf->done, it is already updated without holding the lock in mt76u_complete_tx.

Regards,
Lorenzo

>  
> Stanislaw
>
Stanislaw Gruszka March 19, 2019, 4:04 p.m. UTC | #3
On Tue, Mar 19, 2019 at 01:58:13PM +0100, Lorenzo Bianconi wrote:
> > On Mon, Mar 18, 2019 at 12:09:32PM +0100, Lorenzo Bianconi wrote:
> > > Similar to pci counterpart, reduce locking in mt76u_tx_tasklet since
> > > q->head is managed just in mt76u_tx_tasklet and q->queued is updated
> > > holding q->lock
> > > 
> > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > > ---
> > >  drivers/net/wireless/mediatek/mt76/usb.c | 18 +++++++++++-------
> > >  1 file changed, 11 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
> > > index ac03acdae279..8cd70c32d77a 100644
> > > --- a/drivers/net/wireless/mediatek/mt76/usb.c
> > > +++ b/drivers/net/wireless/mediatek/mt76/usb.c
> > > @@ -634,29 +634,33 @@ static void mt76u_tx_tasklet(unsigned long data)
> > >  	int i;
> > >  
> > >  	for (i = 0; i < IEEE80211_NUM_ACS; i++) {
> > > +		u32 n_queued = 0, n_sw_queued = 0;
> > > +
> > >  		sq = &dev->q_tx[i];
> > >  		q = sq->q;
> > >  
> > > -		spin_lock_bh(&q->lock);
> > > -		while (true) {
> > > +		while (q->queued > n_queued) {
> > >  			buf = &q->entry[q->head].ubuf;
> > > -			if (!buf->done || !q->queued)
> > > +			if (!buf->done)
> > >  				break;
> > 
> > I'm still thinking if this is safe or not. Is somewhat tricky to
> > read variable outside the lock because in such case there is no time
> > guarantee when variable written on one CPU gets updated value on
> > different CPU. And for USB is not only q->queued but also buf->done.
> 
> Hi Stanislaw,
> 
> I was wondering if this is safe as well, but q->queued is updated holding q->lock
> and I guess it will ensure to not overlap tx and status code path.

Overlap will not happen; at worst, q->queued as seen by tx_tasklet will be
smaller than the value seen by tx_queue_skb.

> Regarding buf->done, it is already updated without holding the lock in mt76u_complete_tx

That's actually a bug, but not an important one: if tx_tasklet does not
see the buf->done = true update made by mt76u_complete_tx on a different
CPU, it will not complete the skb. That will be done on the next tx_tasklet
iteration. The opposite situation would be worse.

Stanislaw
Lorenzo Bianconi March 19, 2019, 4:23 p.m. UTC | #4
> On Tue, Mar 19, 2019 at 01:58:13PM +0100, Lorenzo Bianconi wrote:
> > > On Mon, Mar 18, 2019 at 12:09:32PM +0100, Lorenzo Bianconi wrote:
> > > > Similar to pci counterpart, reduce locking in mt76u_tx_tasklet since
> > > > q->head is managed just in mt76u_tx_tasklet and q->queued is updated
> > > > holding q->lock
> > > > 
> > > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > > > ---
> > > >  drivers/net/wireless/mediatek/mt76/usb.c | 18 +++++++++++-------
> > > >  1 file changed, 11 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
> > > > index ac03acdae279..8cd70c32d77a 100644
> > > > --- a/drivers/net/wireless/mediatek/mt76/usb.c
> > > > +++ b/drivers/net/wireless/mediatek/mt76/usb.c
> > > > @@ -634,29 +634,33 @@ static void mt76u_tx_tasklet(unsigned long data)
> > > >  	int i;
> > > >  
> > > >  	for (i = 0; i < IEEE80211_NUM_ACS; i++) {
> > > > +		u32 n_queued = 0, n_sw_queued = 0;
> > > > +
> > > >  		sq = &dev->q_tx[i];
> > > >  		q = sq->q;
> > > >  
> > > > -		spin_lock_bh(&q->lock);
> > > > -		while (true) {
> > > > +		while (q->queued > n_queued) {
> > > >  			buf = &q->entry[q->head].ubuf;
> > > > -			if (!buf->done || !q->queued)
> > > > +			if (!buf->done)
> > > >  				break;
> > > 
> > > I'm still thinking if this is safe or not. Is somewhat tricky to
> > > read variable outside the lock because in such case there is no time
> > > guarantee when variable written on one CPU gets updated value on
> > > different CPU. And for USB is not only q->queued but also buf->done.
> > 
> > Hi Stanislaw,
> > 
> > I was wondering if this is safe as well, but q->queued is updated holding q->lock
> > and I guess it will ensure to not overlap tx and status code path.
> 
> Overlap will not happen, at worst what can happen is q->queued will be
> smaller on tx_tasklet than on tx_queue_skb.

Yes, that is the point :)

> 
> > Regarding buf->done, it is already updated without holding the lock in mt76u_complete_tx
> 
> That's actually a bug, but it's not important, if tx_tasklet will not
> see updated buf->done <- true value by mt76u_complete_tx on different
> cpu, it will not complete skb. It will be done on next tx_tasklet iteration.
> Worse thing would be opposite situation.

Can this really occur? (since queued is updated while holding the lock)

> 
> Stanislaw
Stanislaw Gruszka March 20, 2019, 8:11 a.m. UTC | #5
On Tue, Mar 19, 2019 at 05:23:25PM +0100, Lorenzo Bianconi wrote:
> > On Tue, Mar 19, 2019 at 01:58:13PM +0100, Lorenzo Bianconi wrote:
> > > > On Mon, Mar 18, 2019 at 12:09:32PM +0100, Lorenzo Bianconi wrote:
> > > > > Similar to pci counterpart, reduce locking in mt76u_tx_tasklet since
> > > > > q->head is managed just in mt76u_tx_tasklet and q->queued is updated
> > > > > holding q->lock
> > > > > 
> > > > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > > > > ---
> > > > >  drivers/net/wireless/mediatek/mt76/usb.c | 18 +++++++++++-------
> > > > >  1 file changed, 11 insertions(+), 7 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > index ac03acdae279..8cd70c32d77a 100644
> > > > > --- a/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > +++ b/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > @@ -634,29 +634,33 @@ static void mt76u_tx_tasklet(unsigned long data)
> > > > >  	int i;
> > > > >  
> > > > >  	for (i = 0; i < IEEE80211_NUM_ACS; i++) {
> > > > > +		u32 n_queued = 0, n_sw_queued = 0;
> > > > > +
> > > > >  		sq = &dev->q_tx[i];
> > > > >  		q = sq->q;
> > > > >  
> > > > > -		spin_lock_bh(&q->lock);
> > > > > -		while (true) {
> > > > > +		while (q->queued > n_queued) {
> > > > >  			buf = &q->entry[q->head].ubuf;
> > > > > -			if (!buf->done || !q->queued)
> > > > > +			if (!buf->done)
> > > > >  				break;
> > > > 
> > > > I'm still thinking if this is safe or not. Is somewhat tricky to
> > > > read variable outside the lock because in such case there is no time
> > > > guarantee when variable written on one CPU gets updated value on
> > > > different CPU. And for USB is not only q->queued but also buf->done.
> > > 
> > > Hi Stanislaw,
> > > 
> > > I was wondering if this is safe as well, but q->queued is updated holding q->lock
> > > and I guess it will ensure to not overlap tx and status code path.
> > 
> > Overlap will not happen, at worst what can happen is q->queued will be
> > smaller on tx_tasklet than on tx_queue_skb.
> 
> Yes, that is the point :)
> 
> > 
> > > Regarding buf->done, it is already updated without holding the lock in mt76u_complete_tx
> > 
> > That's actually a bug, but it's not important, if tx_tasklet will not
> > see updated buf->done <- true value by mt76u_complete_tx on different
> > cpu, it will not complete skb. It will be done on next tx_tasklet iteration.
> > Worse thing would be opposite situation.
> 
> Can this really occur?
I was thinking about that and yes, it can occur if the writes/reads of
q->queued and buf->done are reordered by the CPUs. To prevent that you
would need an smp_wmb/smp_rmb pair, but it's simpler and more
convenient to use the lock.

> (since queued is update holding the lock)
Holding the lock in one thread without holding it in the concurrent thread
is irrelevant; it's the same as not holding any lock at all.

Stanislaw
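
To make the barrier pairing mentioned above concrete, here is a rough
userspace analogue. It is a sketch under stated assumptions, not kernel code:
C11 release/acquire atomics stand in for the smp_wmb/smp_rmb pair (they are
not the same primitives), queued is an atomic counter rather than a field
protected by q->lock, and struct ring_model and the ring_* helpers are
hypothetical names. The point it shows is the ordering: the producer resets
done before publishing the new count, and the consumer reads the count before
it reads done.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SLOTS 4 /* hypothetical ring size */

struct ring_model {
	atomic_bool done[RING_SLOTS]; /* models buf->done */
	atomic_uint queued;           /* models q->queued */
	unsigned int head;            /* consumer only */
	unsigned int tail;            /* producer only */
};

static void ring_init(struct ring_model *r)
{
	for (unsigned int i = 0; i < RING_SLOTS; i++)
		atomic_init(&r->done[i], false);
	atomic_init(&r->queued, 0);
	r->head = 0;
	r->tail = 0;
}

/* Producer: reset the flag first, then publish the slot. */
static void ring_publish(struct ring_model *r)
{
	atomic_store_explicit(&r->done[r->tail], false, memory_order_relaxed);
	r->tail = (r->tail + 1) % RING_SLOTS;
	/* release: the done = false store is ordered before the new count */
	atomic_fetch_add_explicit(&r->queued, 1, memory_order_release);
}

/* Completion handler: mark one slot as finished. */
static void ring_complete(struct ring_model *r, unsigned int idx)
{
	atomic_store_explicit(&r->done[idx], true, memory_order_release);
}

/* Consumer: returns true if the head slot was reaped. */
static bool ring_reap_one(struct ring_model *r)
{
	/* acquire: pairs with the release in ring_publish() */
	if (atomic_load_explicit(&r->queued, memory_order_acquire) == 0)
		return false;
	if (!atomic_load_explicit(&r->done[r->head], memory_order_acquire))
		return false;
	r->head = (r->head + 1) % RING_SLOTS;
	atomic_fetch_sub_explicit(&r->queued, 1, memory_order_relaxed);
	return true;
}
```

Without that pairing, a consumer could observe the new count together with a
stale done == true left over from the slot's previous use, which is the
"opposite situation" discussed earlier in the thread. Taking the lock on both
sides gives the same ordering with less to reason about.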
Lorenzo Bianconi March 21, 2019, 9:02 a.m. UTC | #6
> On Tue, Mar 19, 2019 at 05:23:25PM +0100, Lorenzo Bianconi wrote:
> > > On Tue, Mar 19, 2019 at 01:58:13PM +0100, Lorenzo Bianconi wrote:
> > > > > On Mon, Mar 18, 2019 at 12:09:32PM +0100, Lorenzo Bianconi wrote:
> > > > > > Similar to pci counterpart, reduce locking in mt76u_tx_tasklet since
> > > > > > q->head is managed just in mt76u_tx_tasklet and q->queued is updated
> > > > > > holding q->lock
> > > > > > 
> > > > > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > > > > > ---
> > > > > >  drivers/net/wireless/mediatek/mt76/usb.c | 18 +++++++++++-------
> > > > > >  1 file changed, 11 insertions(+), 7 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > > index ac03acdae279..8cd70c32d77a 100644
> > > > > > --- a/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > > +++ b/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > > @@ -634,29 +634,33 @@ static void mt76u_tx_tasklet(unsigned long data)
> > > > > >  	int i;
> > > > > >  
> > > > > >  	for (i = 0; i < IEEE80211_NUM_ACS; i++) {
> > > > > > +		u32 n_queued = 0, n_sw_queued = 0;
> > > > > > +
> > > > > >  		sq = &dev->q_tx[i];
> > > > > >  		q = sq->q;
> > > > > >  
> > > > > > -		spin_lock_bh(&q->lock);
> > > > > > -		while (true) {
> > > > > > +		while (q->queued > n_queued) {
> > > > > >  			buf = &q->entry[q->head].ubuf;
> > > > > > -			if (!buf->done || !q->queued)
> > > > > > +			if (!buf->done)
> > > > > >  				break;
> > > > > 
> > > > > I'm still thinking if this is safe or not. Is somewhat tricky to
> > > > > read variable outside the lock because in such case there is no time
> > > > > guarantee when variable written on one CPU gets updated value on
> > > > > different CPU. And for USB is not only q->queued but also buf->done.
> > > > 
> > > > Hi Stanislaw,
> > > > 
> > > > I was wondering if this is safe as well, but q->queued is updated holding q->lock
> > > > and I guess it will ensure to not overlap tx and status code path.
> > > 
> > > Overlap will not happen, at worst what can happen is q->queued will be
> > > smaller on tx_tasklet than on tx_queue_skb.
> > 
> > Yes, that is the point :)
> > 
> > > 
> > > > Regarding buf->done, it is already updated without holding the lock in mt76u_complete_tx
> > > 
> > > That's actually a bug, but it's not important, if tx_tasklet will not
> > > see updated buf->done <- true value by mt76u_complete_tx on different
> > > cpu, it will not complete skb. It will be done on next tx_tasklet iteration.
> > > Worse thing would be opposite situation.
> > 
> > Can this really occur?
> I was thinking about that and yes it can occur. If q->queued and
> buf->done writes/read will be reordered by CPUs. To prevent that you 
> will need to use smp_wmb/smp_rmb pair, but it's just simpler and more
> convenient to use lock.

Good point, I will look into it.

Regards,
Lorenzo

> 
> > (since queued is update holding the lock)
> Holding the lock on one thread without holding it on concurrent thread
> is irrelevant, it's the same as not holding any lock at all.
> 
> Stanislaw
Felix Fietkau March 21, 2019, 9:10 a.m. UTC | #7
On 2019-03-21 10:02, Lorenzo Bianconi wrote:
>> On Tue, Mar 19, 2019 at 05:23:25PM +0100, Lorenzo Bianconi wrote:
>> > > On Tue, Mar 19, 2019 at 01:58:13PM +0100, Lorenzo Bianconi wrote:
>> > > > > On Mon, Mar 18, 2019 at 12:09:32PM +0100, Lorenzo Bianconi wrote:
>> > > > > > Similar to pci counterpart, reduce locking in mt76u_tx_tasklet since
>> > > > > > q->head is managed just in mt76u_tx_tasklet and q->queued is updated
>> > > > > > holding q->lock
>> > > > > > 
>> > > > > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
>> > > > > > ---
>> > > > > >  drivers/net/wireless/mediatek/mt76/usb.c | 18 +++++++++++-------
>> > > > > >  1 file changed, 11 insertions(+), 7 deletions(-)
>> > > > > > 
>> > > > > > diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
>> > > > > > index ac03acdae279..8cd70c32d77a 100644
>> > > > > > --- a/drivers/net/wireless/mediatek/mt76/usb.c
>> > > > > > +++ b/drivers/net/wireless/mediatek/mt76/usb.c
>> > > > > > @@ -634,29 +634,33 @@ static void mt76u_tx_tasklet(unsigned long data)
>> > > > > >  	int i;
>> > > > > >  
>> > > > > >  	for (i = 0; i < IEEE80211_NUM_ACS; i++) {
>> > > > > > +		u32 n_queued = 0, n_sw_queued = 0;
>> > > > > > +
>> > > > > >  		sq = &dev->q_tx[i];
>> > > > > >  		q = sq->q;
>> > > > > >  
>> > > > > > -		spin_lock_bh(&q->lock);
>> > > > > > -		while (true) {
>> > > > > > +		while (q->queued > n_queued) {
>> > > > > >  			buf = &q->entry[q->head].ubuf;
>> > > > > > -			if (!buf->done || !q->queued)
>> > > > > > +			if (!buf->done)
>> > > > > >  				break;
>> > > > > 
>> > > > > I'm still thinking if this is safe or not. Is somewhat tricky to
>> > > > > read variable outside the lock because in such case there is no time
>> > > > > guarantee when variable written on one CPU gets updated value on
>> > > > > different CPU. And for USB is not only q->queued but also buf->done.
>> > > > 
>> > > > Hi Stanislaw,
>> > > > 
>> > > > I was wondering if this is safe as well, but q->queued is updated holding q->lock
>> > > > and I guess it will ensure to not overlap tx and status code path.
>> > > 
>> > > Overlap will not happen, at worst what can happen is q->queued will be
>> > > smaller on tx_tasklet than on tx_queue_skb.
>> > 
>> > Yes, that is the point :)
>> > 
>> > > 
>> > > > Regarding buf->done, it is already updated without holding the lock in mt76u_complete_tx
>> > > 
>> > > That's actually a bug, but it's not important, if tx_tasklet will not
>> > > see updated buf->done <- true value by mt76u_complete_tx on different
>> > > cpu, it will not complete skb. It will be done on next tx_tasklet iteration.
>> > > Worse thing would be opposite situation.
>> > 
>> > Can this really occur?
>> I was thinking about that and yes it can occur. If q->queued and
>> buf->done writes/read will be reordered by CPUs. To prevent that you 
>> will need to use smp_wmb/smp_rmb pair, but it's just simpler and more
>> convenient to use lock.
> 
> good point, I will go through it.
Another simple solution would be to set buf->done = false in
mt76u_tx_tasklet after tx_complete_skb instead of doing it at enqueue time.

- Felix
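
A small userspace model of the idea above, as a sketch only: the flag is
cleared by the reaper after the completion step instead of by the enqueue
path, so a slot that gets reused never carries a stale done == true. The names
struct tx_ring, tx_enqueue, tx_complete and tx_reap are hypothetical, the
model is single-threaded, and locking is left out entirely.

```c
#include <stdbool.h>
#include <string.h>

#define RING_SIZE 4 /* hypothetical ring size */

struct tx_ring {
	bool done[RING_SIZE]; /* models buf->done */
	unsigned int head;
	unsigned int tail;
	unsigned int queued;
};

static void tx_ring_init(struct tx_ring *r)
{
	memset(r, 0, sizeof(*r));
}

/* Enqueue: does NOT touch done; the reaper already left it false. */
static bool tx_enqueue(struct tx_ring *r)
{
	if (r->queued == RING_SIZE)
		return false;
	r->tail = (r->tail + 1) % RING_SIZE;
	r->queued++;
	return true;
}

/* Completion handler: mark one slot as finished. */
static void tx_complete(struct tx_ring *r, unsigned int idx)
{
	r->done[idx] = true;
}

/* Reaper: complete finished slots and reset their flag afterwards. */
static unsigned int tx_reap(struct tx_ring *r)
{
	unsigned int n = 0;

	while (r->queued > n && r->done[r->head]) {
		/* the driver would call tx_complete_skb() at this point */
		r->done[r->head] = false; /* reset after completion */
		r->head = (r->head + 1) % RING_SIZE;
		n++;
	}
	r->queued -= n;
	return n;
}
```

Once the ring wraps and slot 0 is reused, tx_reap() stops at it until
tx_complete() runs for it again, because the flag was reset when the slot was
last reaped rather than when it was re-queued.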

Patch

diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
index ac03acdae279..8cd70c32d77a 100644
--- a/drivers/net/wireless/mediatek/mt76/usb.c
+++ b/drivers/net/wireless/mediatek/mt76/usb.c
@@ -634,29 +634,33 @@  static void mt76u_tx_tasklet(unsigned long data)
 	int i;
 
 	for (i = 0; i < IEEE80211_NUM_ACS; i++) {
+		u32 n_queued = 0, n_sw_queued = 0;
+
 		sq = &dev->q_tx[i];
 		q = sq->q;
 
-		spin_lock_bh(&q->lock);
-		while (true) {
+		while (q->queued > n_queued) {
 			buf = &q->entry[q->head].ubuf;
-			if (!buf->done || !q->queued)
+			if (!buf->done)
 				break;
 
 			if (q->entry[q->head].schedule) {
 				q->entry[q->head].schedule = false;
-				sq->swq_queued--;
+				n_sw_queued++;
 			}
 
 			entry = q->entry[q->head];
 			q->head = (q->head + 1) % q->ndesc;
-			q->queued--;
+			n_queued++;
 
-			spin_unlock_bh(&q->lock);
 			dev->drv->tx_complete_skb(dev, i, &entry);
-			spin_lock_bh(&q->lock);
 		}
 
+		spin_lock_bh(&q->lock);
+
+		sq->swq_queued -= n_sw_queued;
+		q->queued -= n_queued;
+
 		wake = q->stopped && q->queued < q->ndesc - 8;
 		if (wake)
 			q->stopped = false;