diff mbox series

[net-next] qmi_wwan: Increase headroom for QMAP SKBs

Message ID 20210106122403.1321180-1-kristian.evensen@gmail.com (mailing list archive)
State Accepted
Delegated to: Netdev Maintainers
Series [net-next] qmi_wwan: Increase headroom for QMAP SKBs

Checks

Context                         Check    Description
netdev/cover_letter             success  Link
netdev/fixes_present            success  Link
netdev/patch_count              success  Link
netdev/tree_selection           success  Clearly marked for net-next
netdev/subject_prefix           success  Link
netdev/cc_maintainers           warning  3 maintainers not CCed: kuba@kernel.org linux-usb@vger.kernel.org davem@davemloft.net
netdev/source_inline            success  Was 0 now: 0
netdev/verify_signedoff         success  Link
netdev/module_param             success  Was 0 now: 0
netdev/build_32bit              success  Errors and warnings before: 0 this patch: 0
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/verify_fixes             success  Link
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 15 lines checked
netdev/build_allmodconfig_warn  success  Errors and warnings before: 0 this patch: 0
netdev/header_inline            success  Link
netdev/stable                   success  Stable not CCed

Commit Message

Kristian Evensen Jan. 6, 2021, 12:24 p.m. UTC
When measuring the throughput (iperf3 + TCP) while routing on a
not-so-powerful device (Mediatek MT7621, 880MHz CPU), I noticed that I
achieved significantly lower speeds with QMI-based modems than for
example a USB LAN dongle. The CPU was saturated in all of my tests.

With the dongle I got ~300 Mbit/s, while I only measured ~200 Mbit/s
with the modems. All offloads, etc. were switched off for the dongle,
and I configured the modems to use QMAP (16k aggregation). The tests
with the dongle were performed in my local (gigabit) network, while the
LTE network the modems were connected to delivers 700-800 Mbit/s.

Profiling the kernel revealed the cause of the performance difference.
In qmimux_rx_fixup(), an SKB is allocated for each packet contained in
the URB. This SKB has too little headroom, causing the check in
skb_cow() (called from ip_forward()) to fail. pskb_expand_head() is then
called and the SKB is reallocated. In the output from perf, I see that a
significant amount of time is spent in pskb_expand_head() + support
functions.

In order to ensure that the SKB has enough headroom, this commit
increases the amount of memory allocated in qmimux_rx_fixup() by
LL_MAX_HEADER. LL_MAX_HEADER is used instead of a more
accurate value because we do not know the type of the outgoing network
interface. After making this change, I achieve the same throughput with
the modems as with the dongle.

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
---
 drivers/net/usb/qmi_wwan.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
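
To illustrate the check described in the commit message, here is a minimal
userspace sketch of the decision skb_cow() makes before it falls back to
pskb_expand_head(). It is not kernel code: struct toy_skb and
toy_needs_expand() are hypothetical names invented for this example, and the
byte counts used with it are illustrative only, not measurements from the
MT7621 setup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two sk_buff properties that matter here. */
struct toy_skb {
	size_t headroom;	/* free bytes in front of the packet data */
	bool cloned;		/* true if the data buffer is shared */
};

/*
 * Models the decision in skb_cow(): the buffer has to be reallocated when
 * it is cloned, or when the caller asks for more headroom than is
 * available. The number of missing bytes is stored in *delta.
 */
static bool toy_needs_expand(const struct toy_skb *skb, size_t needed,
			     size_t *delta)
{
	assert(skb && delta);

	*delta = needed > skb->headroom ? needed - skb->headroom : 0;
	return skb->cloned || *delta > 0;
}
```

With this model, a buffer whose headroom is smaller than what the forwarding
path requests always takes the reallocation path, once per packet, which is
consistent with the time spent in pskb_expand_head() in the perf output.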

Comments

Bjørn Mork Jan. 6, 2021, 2:31 p.m. UTC | #1
Kristian Evensen <kristian.evensen@gmail.com> writes:

> When measuring the throughput (iperf3 + TCP) while routing on a
> not-so-powerful device (Mediatek MT7621, 880MHz CPU), I noticed that I
> achieved significantly lower speeds with QMI-based modems than for
> example a USB LAN dongle. The CPU was saturated in all of my tests.
>
> With the dongle I got ~300 Mbit/s, while I only measured ~200 Mbit/s
> with the modems. All offloads, etc.  were switched off for the dongle,
> and I configured the modems to use QMAP (16k aggregation). The tests
> with the dongle were performed in my local (gigabit) network, while the
> LTE network the modems were connected to delivers 700-800 Mbit/s.
>
> Profiling the kernel revealed the cause of the performance difference.
> In qmimux_rx_fixup(), an SKB is allocated for each packet contained in
> the URB. This SKB has too little headroom, causing the check in
> skb_cow() (called from ip_forward()) to fail. pskb_expand_head() is then
> called and the SKB is reallocated. In the output from perf, I see that a
> significant amount of time is spent in pskb_expand_head() + support
> functions.
>
> In order to ensure that the SKB has enough headroom, this commit
> increases the amount of memory allocated in qmimux_rx_fixup() by
> LL_MAX_HEADER. The reason for using LL_MAX_HEADER and not a more
> accurate value, is that we do not know the type of the outgoing network
> interface. After making this change, I achieve the same throughput with
> the modems as with the dongle.
>
> Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>

Nice work!

Just wondering: Will the same problem affect the usbnet allocated skbs
as well in case of raw-ip? They will obviously be large enough, but the
reserved headroom probably isn't when we put an IP packet there without
any L2 header?

In any case:

Acked-by: Bjørn Mork <bjorn@mork.no>

Kristian Evensen Jan. 6, 2021, 3:18 p.m. UTC | #2
Hi Bjørn,

On Wed, Jan 6, 2021 at 3:31 PM Bjørn Mork <bjorn@mork.no> wrote:
> Nice work!

Thanks a lot!

> Just wondering: Will the same problem affect the usbnet allocated skbs
> as well in case of raw-ip? They will obviously be large enough, but the
> reserved headroom probably isn't when we put an IP packet there without
> any L2 header?

You are right, I completely forgot about those SKBs. I will try to
find some time to investigate the non-QMAP performance, check whether a
similar fix (I guess an skb_reserve after the case-statement is enough)
has an effect, and submit a follow-up patch if it does. Thanks for
reminding me; I have switched to using only QMAP :)

BR,
Kristian

Jakub Kicinski Jan. 7, 2021, 8:07 p.m. UTC | #3
On Wed, 06 Jan 2021 15:31:10 +0100 Bjørn Mork wrote:
> Kristian Evensen <kristian.evensen@gmail.com> writes:
> 
> > When measuring the throughput (iperf3 + TCP) while routing on a
> > not-so-powerful device (Mediatek MT7621, 880MHz CPU), I noticed that I
> > achieved significantly lower speeds with QMI-based modems than for
> > example a USB LAN dongle. The CPU was saturated in all of my tests.
> >
> > With the dongle I got ~300 Mbit/s, while I only measured ~200 Mbit/s
> > with the modems. All offloads, etc.  were switched off for the dongle,
> > and I configured the modems to use QMAP (16k aggregation). The tests
> > with the dongle were performed in my local (gigabit) network, while the
> > LTE network the modems were connected to delivers 700-800 Mbit/s.
> >
> > Profiling the kernel revealed the cause of the performance difference.
> > In qmimux_rx_fixup(), an SKB is allocated for each packet contained in
> > the URB. This SKB has too little headroom, causing the check in
> > skb_cow() (called from ip_forward()) to fail. pskb_expand_head() is then
> > called and the SKB is reallocated. In the output from perf, I see that a
> > significant amount of time is spent in pskb_expand_head() + support
> > functions.
> >
> > In order to ensure that the SKB has enough headroom, this commit
> > increases the amount of memory allocated in qmimux_rx_fixup() by
> > LL_MAX_HEADER. The reason for using LL_MAX_HEADER and not a more
> > accurate value, is that we do not know the type of the outgoing network
> > interface. After making this change, I achieve the same throughput with
> > the modems as with the dongle.
> >
> > Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>  
> 
> Nice work!
> 
> Just wondering: Will the same problem affect the usbnet allocated skbs
> as well in case of raw-ip? They will obviously be large enough, but the
> reserved headroom probably isn't when we put an IP packet there without
> any L2 header?
> 
> In any case:
> 
> Acked-by: Bjørn Mork <bjorn@mork.no>

Applied, thanks!

Patch

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index af19513a9..7ea113f51 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -186,7 +186,7 @@  static int qmimux_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 		net = qmimux_find_dev(dev, hdr->mux_id);
 		if (!net)
 			goto skip;
-		skbn = netdev_alloc_skb(net, pkt_len);
+		skbn = netdev_alloc_skb(net, pkt_len + LL_MAX_HEADER);
 		if (!skbn)
 			return 0;
 		skbn->dev = net;
@@ -203,6 +203,7 @@  static int qmimux_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 			goto skip;
 		}
 
+		skb_reserve(skbn, LL_MAX_HEADER);
 		skb_put_data(skbn, skb->data + offset + qmimux_hdr_sz, pkt_len);
 		if (netif_rx(skbn) != NET_RX_SUCCESS) {
 			net->stats.rx_errors++;
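
The pattern the two hunks introduce (allocate extra bytes, reserve them as
headroom, then copy the payload) can be sketched in userspace as follows.
This is only a model under stated assumptions: struct toy_buf and all toy_*
helpers are hypothetical, TOY_MAX_HEADER stands in for LL_MAX_HEADER (whose
real value depends on the kernel configuration), and unlike a real sk_buff
the model tracks just a data pointer and a length.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_HEADER 32	/* assumption: placeholder for LL_MAX_HEADER */

struct toy_buf {
	unsigned char *head;	/* start of the allocation */
	unsigned char *data;	/* start of the packet data */
	size_t len;		/* bytes of packet data */
	size_t size;		/* total bytes allocated */
};

static size_t toy_headroom(const struct toy_buf *b)
{
	return (size_t)(b->data - b->head);
}

static struct toy_buf *toy_alloc(size_t size)
{
	struct toy_buf *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	b->head = malloc(size ? size : 1);
	if (!b->head) {
		free(b);
		return NULL;
	}
	b->data = b->head;
	b->len = 0;
	b->size = size;
	return b;
}

static void toy_free(struct toy_buf *b)
{
	free(b->head);
	free(b);
}

/* Like skb_reserve(): only valid while the buffer is still empty. */
static void toy_reserve(struct toy_buf *b, size_t n)
{
	assert(b->len == 0 && toy_headroom(b) + n <= b->size);
	b->data += n;
}

/* Like skb_put_data(): append n bytes after the current data. */
static void toy_put_data(struct toy_buf *b, const void *src, size_t n)
{
	assert(toy_headroom(b) + b->len + n <= b->size);
	memcpy(b->data + b->len, src, n);
	b->len += n;
}

/* The patched sequence: over-allocate, reserve headroom, copy payload. */
static struct toy_buf *toy_rx_copy(const void *pkt, size_t pkt_len)
{
	struct toy_buf *b = toy_alloc(pkt_len + TOY_MAX_HEADER);

	if (!b)
		return NULL;
	toy_reserve(b, TOY_MAX_HEADER);
	toy_put_data(b, pkt, pkt_len);
	return b;
}
```

The ordering matters in the same way as in the patch: the reserve step has to
come before any payload is copied, and the allocation has to grow by the same
amount, otherwise the copy would run past the end of the buffer.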