
[v3,wireless-drivers,3/3] mt76: usb: do not always copy the first part of received frames

Message ID 1a9566c0a41ad0d940487a9d3f0008993c075ef2.1560461404.git.lorenzo@kernel.org
State Superseded
Delegated to: Felix Fietkau
Series mt76: usb: fix A-MSDU support

Commit Message

Lorenzo Bianconi June 13, 2019, 9:43 p.m. UTC
Set the USB buffer size taking skb_shared_info into account, so that the
first part of received frames is not always copied when A-MSDU is enabled
on SG capable devices. Moreover, align the USB buffer size to max_ep
boundaries and set buf_size to PAGE_SIZE even in the SG case.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/wireless/mediatek/mt76/usb.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

Comments

Stanislaw Gruszka June 14, 2019, 7:53 a.m. UTC | #1
On Thu, Jun 13, 2019 at 11:43:13PM +0200, Lorenzo Bianconi wrote:
> Set the USB buffer size taking skb_shared_info into account, so that the
> first part of received frames is not always copied when A-MSDU is enabled
> on SG capable devices. Moreover, align the USB buffer size to max_ep
> boundaries and set buf_size to PAGE_SIZE even in the SG case.

I think this should not be applied to wireless-drivers; only the first patch,
which fixes the bug, should go there, and the optimizations should be done in -next.

> +	int i, data_size;
>  
> +	data_size = rounddown(SKB_WITH_OVERHEAD(q->buf_size),
> +			      dev->usb.in_ep[MT_EP_IN_PKT_RX].max_packet);
>  	for (i = 0; i < nsgs; i++) {
>  		struct page *page;
>  		void *data;
> @@ -302,7 +304,7 @@ mt76u_fill_rx_sg(struct mt76_dev *dev, struct mt76_queue *q, struct urb *urb,
>  
>  		page = virt_to_head_page(data);
>  		offset = data - page_address(page);
> -		sg_set_page(&urb->sg[i], page, q->buf_size, offset);
> +		sg_set_page(&urb->sg[i], page, data_size, offset);
<snip>
> -	q->buf_size = dev->usb.sg_en ? MT_RX_BUF_SIZE : PAGE_SIZE;
>  	q->ndesc = MT_NUM_RX_ENTRIES;
> +	q->buf_size = PAGE_SIZE;
> +

This should be accompanied by decreasing MT_SG_MAX_SIZE to the value that
is actually needed, which is currently 2 for 4k A-MSDU.

However, I don't think allocating 2 pages to avoid copying the ieee80211 header
and SNAP is worth doing. For me the best approach would be to allocate 1 page for
4k A-MSDU, 2 for 8k and 3 for 12k (still using sg, but without the data_size
change made to avoid the 32B copy).

Stanislaw
Lorenzo Bianconi June 14, 2019, 10:22 a.m. UTC | #2
> On Thu, Jun 13, 2019 at 11:43:13PM +0200, Lorenzo Bianconi wrote:
> > Set the USB buffer size taking skb_shared_info into account, so that the
> > first part of received frames is not always copied when A-MSDU is enabled
> > on SG capable devices. Moreover, align the USB buffer size to max_ep
> > boundaries and set buf_size to PAGE_SIZE even in the SG case.
> 
> I think this should not be applied to wireless-drivers; only the first patch,
> which fixes the bug, should go there, and the optimizations should be done in -next.

Ack, right. I think patches 2/3 and 3/3 can go directly into Felix's tree.

> 
> > +	int i, data_size;
> >  
> > +	data_size = rounddown(SKB_WITH_OVERHEAD(q->buf_size),
> > +			      dev->usb.in_ep[MT_EP_IN_PKT_RX].max_packet);
> >  	for (i = 0; i < nsgs; i++) {
> >  		struct page *page;
> >  		void *data;
> > @@ -302,7 +304,7 @@ mt76u_fill_rx_sg(struct mt76_dev *dev, struct mt76_queue *q, struct urb *urb,
> >  
> >  		page = virt_to_head_page(data);
> >  		offset = data - page_address(page);
> > -		sg_set_page(&urb->sg[i], page, q->buf_size, offset);
> > +		sg_set_page(&urb->sg[i], page, data_size, offset);
> <snip>
> > -	q->buf_size = dev->usb.sg_en ? MT_RX_BUF_SIZE : PAGE_SIZE;
> >  	q->ndesc = MT_NUM_RX_ENTRIES;
> > +	q->buf_size = PAGE_SIZE;
> > +
> 
> This should be accompanied by decreasing MT_SG_MAX_SIZE to the value that
> is actually needed, which is currently 2 for 4k A-MSDU.

MT_SG_MAX_SIZE is also used on the tx side, and I do not think we will end up
with a huge difference here.

> 
> However, I don't think allocating 2 pages to avoid copying the ieee80211 header
> and SNAP is worth doing. For me the best approach would be to allocate 1 page for
> 4k A-MSDU, 2 for 8k and 3 for 12k (still using sg, but without the data_size
> change made to avoid the 32B copy).

From my point of view it is better to avoid copying if it is possible. Are you
sure there is no difference?

Regards,
Lorenzo

> 
> Stanislaw
Stanislaw Gruszka June 14, 2019, 11:04 a.m. UTC | #3
On Fri, Jun 14, 2019 at 12:22:48PM +0200, Lorenzo Bianconi wrote:
> > On Thu, Jun 13, 2019 at 11:43:13PM +0200, Lorenzo Bianconi wrote:
> > > Set the USB buffer size taking skb_shared_info into account, so that the
> > > first part of received frames is not always copied when A-MSDU is enabled
> > > on SG capable devices. Moreover, align the USB buffer size to max_ep
> > > boundaries and set buf_size to PAGE_SIZE even in the SG case.
> > 
> > I think this should not be applied to wireless-drivers; only the first patch,
> > which fixes the bug, should go there, and the optimizations should be done in -next.
> 
> Ack, right. I think patches 2/3 and 3/3 can go directly into Felix's tree.
> 
> > 
> > > +	int i, data_size;
> > >  
> > > +	data_size = rounddown(SKB_WITH_OVERHEAD(q->buf_size),
> > > +			      dev->usb.in_ep[MT_EP_IN_PKT_RX].max_packet);
> > >  	for (i = 0; i < nsgs; i++) {
> > >  		struct page *page;
> > >  		void *data;
> > > @@ -302,7 +304,7 @@ mt76u_fill_rx_sg(struct mt76_dev *dev, struct mt76_queue *q, struct urb *urb,
> > >  
> > >  		page = virt_to_head_page(data);
> > >  		offset = data - page_address(page);
> > > -		sg_set_page(&urb->sg[i], page, q->buf_size, offset);
> > > +		sg_set_page(&urb->sg[i], page, data_size, offset);
> > <snip>
> > > -	q->buf_size = dev->usb.sg_en ? MT_RX_BUF_SIZE : PAGE_SIZE;
> > >  	q->ndesc = MT_NUM_RX_ENTRIES;
> > > +	q->buf_size = PAGE_SIZE;
> > > +
> > 
> > This should be accompanied by decreasing MT_SG_MAX_SIZE to the value that
> > is actually needed, which is currently 2 for 4k A-MSDU.
> 
> MT_SG_MAX_SIZE is also used on the tx side, and I do not think we will end up
> with a huge difference here.

So use a different value as the argument for mt76u_fill_rx_sg() in
mt76u_rx_urb_alloc(). After changing buf_size to PAGE_SIZE we will
allocate 8 pages per rx queue entry, but only 2 pages will be used
(with the data_size change; 1 without it). Or am I wrong?

> > However, I don't think allocating 2 pages to avoid copying the ieee80211 header
> > and SNAP is worth doing. For me the best approach would be to allocate 1 page for
> > 4k A-MSDU, 2 for 8k and 3 for 12k (still using sg, but without the data_size
> > change made to avoid the 32B copy).
> 
> From my point of view it is better to avoid copying if it is possible. Are you
> sure there is no difference?

I do not understand what you mean by difference here.

Stanislaw
Lorenzo Bianconi June 14, 2019, 12:46 p.m. UTC | #4
> On Fri, Jun 14, 2019 at 12:22:48PM +0200, Lorenzo Bianconi wrote:
> > > On Thu, Jun 13, 2019 at 11:43:13PM +0200, Lorenzo Bianconi wrote:
> > > > Set the USB buffer size taking skb_shared_info into account, so that the
> > > > first part of received frames is not always copied when A-MSDU is enabled
> > > > on SG capable devices. Moreover, align the USB buffer size to max_ep
> > > > boundaries and set buf_size to PAGE_SIZE even in the SG case.
> > > 
> > > I think this should not be applied to wireless-drivers; only the first patch,
> > > which fixes the bug, should go there, and the optimizations should be done in -next.
> > 
> > Ack, right. I think patches 2/3 and 3/3 can go directly into Felix's tree.
> > 
> > > 
> > > > +	int i, data_size;
> > > >  
> > > > +	data_size = rounddown(SKB_WITH_OVERHEAD(q->buf_size),
> > > > +			      dev->usb.in_ep[MT_EP_IN_PKT_RX].max_packet);
> > > >  	for (i = 0; i < nsgs; i++) {
> > > >  		struct page *page;
> > > >  		void *data;
> > > > @@ -302,7 +304,7 @@ mt76u_fill_rx_sg(struct mt76_dev *dev, struct mt76_queue *q, struct urb *urb,
> > > >  
> > > >  		page = virt_to_head_page(data);
> > > >  		offset = data - page_address(page);
> > > > -		sg_set_page(&urb->sg[i], page, q->buf_size, offset);
> > > > +		sg_set_page(&urb->sg[i], page, data_size, offset);
> > > <snip>
> > > > -	q->buf_size = dev->usb.sg_en ? MT_RX_BUF_SIZE : PAGE_SIZE;
> > > >  	q->ndesc = MT_NUM_RX_ENTRIES;
> > > > +	q->buf_size = PAGE_SIZE;
> > > > +
> > > 
> > > This should be accompanied by decreasing MT_SG_MAX_SIZE to the value that
> > > is actually needed, which is currently 2 for 4k A-MSDU.
> > 
> > MT_SG_MAX_SIZE is also used on the tx side, and I do not think we will end up
> > with a huge difference here.
> 
> So use a different value as the argument for mt76u_fill_rx_sg() in
> mt76u_rx_urb_alloc(). After changing buf_size to PAGE_SIZE we will
> allocate 8 pages per rx queue entry, but only 2 pages will be used
> (with the data_size change; 1 without it). Or am I wrong?

Yes, that is right (we will use two pages with the data_size change). Maybe it is
better to use 4 pages for each rx queue entry? (Otherwise we will probably have to
change it in the future.)

> 
> > > However, I don't think allocating 2 pages to avoid copying the ieee80211 header
> > > and SNAP is worth doing. For me the best approach would be to allocate 1 page for
> > > 4k A-MSDU, 2 for 8k and 3 for 12k (still using sg, but without the data_size
> > > change made to avoid the 32B copy).
> > 
> > From my point of view it is better to avoid copying if it is possible. Are you
> > sure there is no difference?
> 
> I do not understand what you mean by difference here.

Throughput (tpt) differences; I am not sure if there are any.

Regards,
Lorenzo

> 
> Stanislaw
Stanislaw Gruszka June 15, 2019, 9:40 a.m. UTC | #5
On Fri, Jun 14, 2019 at 02:46:36PM +0200, Lorenzo Bianconi wrote:
> > > 
> > > Ack, right. I think patches 2/3 and 3/3 can go directly into Felix's tree.
> > > 
> > > > 
> > > > > +	int i, data_size;
> > > > >  
> > > > > +	data_size = rounddown(SKB_WITH_OVERHEAD(q->buf_size),
> > > > > +			      dev->usb.in_ep[MT_EP_IN_PKT_RX].max_packet);
> > > > >  	for (i = 0; i < nsgs; i++) {
> > > > >  		struct page *page;
> > > > >  		void *data;
> > > > > @@ -302,7 +304,7 @@ mt76u_fill_rx_sg(struct mt76_dev *dev, struct mt76_queue *q, struct urb *urb,
> > > > >  
> > > > >  		page = virt_to_head_page(data);
> > > > >  		offset = data - page_address(page);
> > > > > -		sg_set_page(&urb->sg[i], page, q->buf_size, offset);
> > > > > +		sg_set_page(&urb->sg[i], page, data_size, offset);
> > > > <snip>
> > > > > -	q->buf_size = dev->usb.sg_en ? MT_RX_BUF_SIZE : PAGE_SIZE;
> > > > >  	q->ndesc = MT_NUM_RX_ENTRIES;
> > > > > +	q->buf_size = PAGE_SIZE;
> > > > > +
> > > > 
> > > > This should be accompanied by decreasing MT_SG_MAX_SIZE to the value that
> > > > is actually needed, which is currently 2 for 4k A-MSDU.
> > > 
> > > MT_SG_MAX_SIZE is also used on the tx side, and I do not think we will end up
> > > with a huge difference here.
> > 
> > So use a different value as the argument for mt76u_fill_rx_sg() in
> > mt76u_rx_urb_alloc(). After changing buf_size to PAGE_SIZE we will
> > allocate 8 pages per rx queue entry, but only 2 pages will be used
> > (with the data_size change; 1 without it). Or am I wrong?
> 
> Yes, that is right (we will use two pages with the data_size change). Maybe it is
> better to use 4 pages for each rx queue entry? (Otherwise we will probably have to
> change it in the future.)

We should not allocate more than is required. If support for bigger
rx A-MSDUs is added and announced to remote stations in the HT/VHT
capabilities, then the number of segments will need to be increased.

> > > > However, I don't think allocating 2 pages to avoid copying the ieee80211 header
> > > > and SNAP is worth doing. For me the best approach would be to allocate 1 page for
> > > > 4k A-MSDU, 2 for 8k and 3 for 12k (still using sg, but without the data_size
> > > > change made to avoid the 32B copy).
> > > 
> > > From my point of view it is better to avoid copying if it is possible. Are you
> > > sure there is no difference?
> > 
> > I do not understand what you mean by difference here.
> 
> Throughput (tpt) differences; I am not sure if there are any.

I would not expect any measurable difference in tpt or in CPU usage
either way.

But I think that if an A-MSDU subframe is split into two fragments, the
data will most likely need to be linearised/copied at some point before
being passed to the application, which would outweigh any benefit of not
copying the 802.11 header. That said, I don't think any of this will be
visible in benchmarks.

Stanislaw
Lorenzo Bianconi June 19, 2019, 8:09 p.m. UTC | #6
> On Fri, Jun 14, 2019 at 02:46:36PM +0200, Lorenzo Bianconi wrote:
> > > > 
> > > > Ack, right. I think patches 2/3 and 3/3 can go directly into Felix's tree.
> > > > 
> > > > > 
> > > > > > +	int i, data_size;
> > > > > >  
> > > > > > +	data_size = rounddown(SKB_WITH_OVERHEAD(q->buf_size),
> > > > > > +			      dev->usb.in_ep[MT_EP_IN_PKT_RX].max_packet);
> > > > > >  	for (i = 0; i < nsgs; i++) {
> > > > > >  		struct page *page;
> > > > > >  		void *data;
> > > > > > @@ -302,7 +304,7 @@ mt76u_fill_rx_sg(struct mt76_dev *dev, struct mt76_queue *q, struct urb *urb,
> > > > > >  
> > > > > >  		page = virt_to_head_page(data);
> > > > > >  		offset = data - page_address(page);
> > > > > > -		sg_set_page(&urb->sg[i], page, q->buf_size, offset);
> > > > > > +		sg_set_page(&urb->sg[i], page, data_size, offset);
> > > > > <snip>
> > > > > > -	q->buf_size = dev->usb.sg_en ? MT_RX_BUF_SIZE : PAGE_SIZE;
> > > > > >  	q->ndesc = MT_NUM_RX_ENTRIES;
> > > > > > +	q->buf_size = PAGE_SIZE;
> > > > > > +
> > > > > 
> > > > > This should be accompanied by decreasing MT_SG_MAX_SIZE to the value that
> > > > > is actually needed, which is currently 2 for 4k A-MSDU.
> > > > 
> > > > MT_SG_MAX_SIZE is also used on the tx side, and I do not think we will end up
> > > > with a huge difference here.
> > > 
> > > So use a different value as the argument for mt76u_fill_rx_sg() in
> > > mt76u_rx_urb_alloc(). After changing buf_size to PAGE_SIZE we will
> > > allocate 8 pages per rx queue entry, but only 2 pages will be used
> > > (with the data_size change; 1 without it). Or am I wrong?
> > 
> > Yes, that is right (we will use two pages with the data_size change). Maybe it is
> > better to use 4 pages for each rx queue entry? (Otherwise we will probably have to
> > change it in the future.)
> 
> We should not allocate more than is required. If support for bigger
> rx A-MSDUs is added and announced to remote stations in the HT/VHT
> capabilities, then the number of segments will need to be increased.
> 
> > > > > However, I don't think allocating 2 pages to avoid copying the ieee80211 header
> > > > > and SNAP is worth doing. For me the best approach would be to allocate 1 page for
> > > > > 4k A-MSDU, 2 for 8k and 3 for 12k (still using sg, but without the data_size
> > > > > change made to avoid the 32B copy).
> > > > 
> > > > From my point of view it is better to avoid copying if it is possible. Are you
> > > > sure there is no difference?
> > > 
> > > I do not understand what you mean by difference here.
> > 
> > Throughput (tpt) differences; I am not sure if there are any.
> 
> I would not expect any measurable difference in tpt or in CPU usage
> either way.
> 
> But I think that if an A-MSDU subframe is split into two fragments, the
> data will most likely need to be linearised/copied at some point before
> being passed to the application, which would outweigh any benefit of not
> copying the 802.11 header. That said, I don't think any of this will be
> visible in benchmarks.

Sorry for the late reply. I think so.
I will post a v4 soon.

Regards,
Lorenzo

> 
> Stanislaw

Patch

diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
index 1ee54a9b302e..2ee3f8fa1483 100644
--- a/drivers/net/wireless/mediatek/mt76/usb.c
+++ b/drivers/net/wireless/mediatek/mt76/usb.c
@@ -289,8 +289,10 @@  static int
 mt76u_fill_rx_sg(struct mt76_dev *dev, struct mt76_queue *q, struct urb *urb,
 		 int nsgs, gfp_t gfp)
 {
-	int i;
+	int i, data_size;
 
+	data_size = rounddown(SKB_WITH_OVERHEAD(q->buf_size),
+			      dev->usb.in_ep[MT_EP_IN_PKT_RX].max_packet);
 	for (i = 0; i < nsgs; i++) {
 		struct page *page;
 		void *data;
@@ -302,7 +304,7 @@  mt76u_fill_rx_sg(struct mt76_dev *dev, struct mt76_queue *q, struct urb *urb,
 
 		page = virt_to_head_page(data);
 		offset = data - page_address(page);
-		sg_set_page(&urb->sg[i], page, q->buf_size, offset);
+		sg_set_page(&urb->sg[i], page, data_size, offset);
 	}
 
 	if (i < nsgs) {
@@ -314,7 +316,7 @@  mt76u_fill_rx_sg(struct mt76_dev *dev, struct mt76_queue *q, struct urb *urb,
 	}
 
 	urb->num_sgs = max_t(int, i, urb->num_sgs);
-	urb->transfer_buffer_length = urb->num_sgs * q->buf_size,
+	urb->transfer_buffer_length = urb->num_sgs * data_size;
 	sg_init_marker(urb->sg, urb->num_sgs);
 
 	return i ? : -ENOMEM;
@@ -611,8 +613,9 @@  static int mt76u_alloc_rx(struct mt76_dev *dev)
 	if (!q->entry)
 		return -ENOMEM;
 
-	q->buf_size = dev->usb.sg_en ? MT_RX_BUF_SIZE : PAGE_SIZE;
 	q->ndesc = MT_NUM_RX_ENTRIES;
+	q->buf_size = PAGE_SIZE;
+
 	for (i = 0; i < q->ndesc; i++) {
 		err = mt76u_rx_urb_alloc(dev, &q->entry[i]);
 		if (err < 0)