| Message ID | 20221130124616.1500643-1-dnlplm@gmail.com (mailing list archive) |
|---|---|
| Series | add tx packets aggregation to ethtool and rmnet |
On Wed, Nov 30, 2022 at 5:15 AM Daniele Palmas <dnlplm@gmail.com> wrote:
>
> Hello maintainers and all,
>
> this patchset implements tx qmap packets aggregation in rmnet and generic
> ethtool support for that.
>
> Some low-cat ThreadX-based modems are not capable of properly reaching the
> maximum allowed throughput both in tx and rx during a bidirectional test if
> tx packets aggregation is not enabled.
>
> I verified this problem with rmnet + qmi_wwan by using an MDM9207 Cat. 4
> based modem (50Mbps/150Mbps max throughput). What is actually happening is
> pictured at
> https://drive.google.com/file/d/1gSbozrtd9h0X63i6vdkNpN68d-9sg8f9/view

Thank you for documenting which device this is. Is it still handing in
150ms of bufferbloat in good conditions, and 25 seconds or so in bad?

> Testing with iperf TCP, when rx and tx flows are tested individually
> there's no issue in tx and minor issues in rx (not able to reach max
> throughput). When there are concurrent tx and rx flows, tx throughput has
> a huge drop; rx a minor one, but still present.
>
> The same scenario with tx aggregation enabled is pictured at
> https://drive.google.com/file/d/1jcVIKNZD7K3lHtwKE5W02mpaloudYYih/view
> showing a regular graph.
>
> This issue does not happen with high-cat modems (e.g. SDX20), or at least
> it does not happen at the throughputs I'm able to test currently: maybe
> the same could happen when moving close to the maximum rates supported by
> those modems. Anyway, having tx aggregation enabled should not hurt.
>
> The first attempt to solve this issue was in the qmi_wwan qmap
> implementation, see the discussion at
> https://lore.kernel.org/netdev/20221019132503.6783-1-dnlplm@gmail.com/
>
> However, it turned out that rmnet was a better candidate for the
> implementation.
>
> Moreover, Greg and Jakub suggested also using ethtool for the
> configuration: not sure if I got their advice right, but this patchset
> also adds generic ethtool support for tx aggregation.
>
> The patches have been tested mainly against an MDM9207 based modem through
> USB and an SDX55 through PCI (MHI).
>
> v2 should address the comments highlighted in the review: the
> implementation is still in rmnet, due to Subash's request of keeping tx
> aggregation there.
>
> Thanks,
> Daniele
>
> Daniele Palmas (3):
>   ethtool: add tx aggregation parameters
>   net: qualcomm: rmnet: add tx packets aggregation
>   net: qualcomm: rmnet: add ethtool support for configuring tx aggregation
>
>  Documentation/networking/ethtool-netlink.rst  |  17 ++
>  .../ethernet/qualcomm/rmnet/rmnet_config.c    |   5 +
>  .../ethernet/qualcomm/rmnet/rmnet_config.h    |  20 ++
>  .../ethernet/qualcomm/rmnet/rmnet_handlers.c  |  18 +-
>  .../net/ethernet/qualcomm/rmnet/rmnet_map.h   |   6 +
>  .../ethernet/qualcomm/rmnet/rmnet_map_data.c  | 191 ++++++++++++++++++
>  .../net/ethernet/qualcomm/rmnet/rmnet_vnd.c   |  54 ++++-
>  .../net/ethernet/qualcomm/rmnet/rmnet_vnd.h   |   1 +
>  include/linux/ethtool.h                       |  12 +-
>  include/uapi/linux/ethtool_netlink.h          |   3 +
>  net/ethtool/coalesce.c                        |  22 +-
>  11 files changed, 342 insertions(+), 7 deletions(-)
>
> --
> 2.37.1
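[Editor's sketch: given the patch titles and the net/ethtool/coalesce.c
diffstat above, configuration would go through ethtool's coalesce
interface. The parameter spellings, interface name, and values below are
assumptions for illustration; check the merged
Documentation/networking/ethtool-netlink.rst for the final names.]

  # Hypothetical example: aggregate up to 32 packets or 16 KB per qmap
  # frame, flushing after at most 500 us; rmnet0 is a placeholder name.
  ethtool -C rmnet0 tx-aggr-max-frames 32 tx-aggr-max-bytes 16384 tx-aggr-time-usecs 500

  # Read back the coalesce settings, including the aggregation fields:
  ethtool -c rmnet0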
Hello Dave,

On Wed, Nov 30, 2022 at 4:04 PM Dave Taht <dave.taht@gmail.com> wrote:
>
> On Wed, Nov 30, 2022 at 5:15 AM Daniele Palmas <dnlplm@gmail.com> wrote:
> >
> > Hello maintainers and all,
> >
> > this patchset implements tx qmap packets aggregation in rmnet and
> > generic ethtool support for that.
> >
> > Some low-cat ThreadX-based modems are not capable of properly reaching
> > the maximum allowed throughput both in tx and rx during a bidirectional
> > test if tx packets aggregation is not enabled.
> >
> > I verified this problem with rmnet + qmi_wwan by using an MDM9207 Cat. 4
> > based modem (50Mbps/150Mbps max throughput). What is actually happening
> > is pictured at
> > https://drive.google.com/file/d/1gSbozrtd9h0X63i6vdkNpN68d-9sg8f9/view
>
> Thank you for documenting which device this is. Is it still handing in
> 150ms of bufferbloat in good conditions, and 25 seconds or so in bad?
>

New Flent test results are available at
https://drive.google.com/drive/folders/1-rpeuM2Dg9rVdYCP0M84K4Ook5kcZTWc?usp=share_link

From what I can understand, it seems a bit better to me, but I'm not
completely sure how much of that is directly related to the changes in v2.

Regards,
Daniele
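[Editor's note: the linked plots come from Flent; a typical invocation for
this kind of bidirectional latency-under-load run is sketched below. The
server host and test title are placeholders, not the ones used for the
linked results.]

  # rrul = Realtime Response Under Load: 4 up + 4 down bulk flows plus
  # latency probes, plotted together so bufferbloat shows up directly.
  flent rrul -p all_scaled -l 60 -H netperf.example.org \
      -t "mdm9207-rmnet-txaggr-v2" -o rrul-v2.png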
On Thu, Dec 1, 2022 at 2:55 AM Daniele Palmas <dnlplm@gmail.com> wrote:
>
> Hello Dave,
>
> On Wed, Nov 30, 2022 at 4:04 PM Dave Taht <dave.taht@gmail.com> wrote:
> >
> > On Wed, Nov 30, 2022 at 5:15 AM Daniele Palmas <dnlplm@gmail.com> wrote:
> > >
> > > Hello maintainers and all,
> > >
> > > this patchset implements tx qmap packets aggregation in rmnet and
> > > generic ethtool support for that.
> > >
> > > Some low-cat ThreadX-based modems are not capable of properly reaching
> > > the maximum allowed throughput both in tx and rx during a
> > > bidirectional test if tx packets aggregation is not enabled.
> > >
> > > I verified this problem with rmnet + qmi_wwan by using an MDM9207
> > > Cat. 4 based modem (50Mbps/150Mbps max throughput). What is actually
> > > happening is pictured at
> > > https://drive.google.com/file/d/1gSbozrtd9h0X63i6vdkNpN68d-9sg8f9/view
> >
> > Thank you for documenting which device this is. Is it still handing in
> > 150ms of bufferbloat in good conditions, and 25 seconds or so in bad?
> >
>
> New Flent test results are available at
> https://drive.google.com/drive/folders/1-rpeuM2Dg9rVdYCP0M84K4Ook5kcZTWc?usp=share_link
>
> From what I can understand, it seems a bit better to me, but I'm not
> completely sure how much of that is directly related to the changes in v2.

Anything that can shorten the round trips experienced by flows such as
yours, and wedge more data packets in, would be a goodness.

A switch to using the "BBR" congestion controller might be a vast
improvement over what I figure is your default of cubic. On both server
and client...

modprobe tcp_bbr
sysctl -w net.ipv4.tcp_congestion_control=bbr

And rerun your tests.

Over the years we've come up with multiple mechanisms for fixing this on
other network subsystems (bql, aql, tsq, etc.), but something that could
track a "completion" - where we knew the packet had finally got "in the
air" and out of the device - would be best. It's kind of hard to convince
everyone to replace their congestion controller.

Any chance you could share these results with the maker of the test tool
you are primarily using? I'd like to think that they would embrace adding
some sort of simultaneous latency measurement to it, and I would hope that
that would help bring more minds to figuring out how to solve the 25!!
seconds worth of delay that can accumulate on this path through the kernel.

https://github.com/Zoxc/crusader is a nice, emerging tool that runs on all
OSes (it's in Rust), does a network test more right, and is simpler than
flent by far. I especially like the staggered start feature in it, as it
would clearly show a second flow, trying to start with this kind of
latency already in the system, failing miserably.

My comments on your patchset are not a blocker to it being accepted!

If you want to get a grip on how much better things "could be", slap an
instance of cake bandwidth 40mbit on the up and 80mbit on the down on your
"good" network setup, watch the latencies fall to nearly zero, watch a
second flow starting late grab its fair share of the bandwidth almost
immediately, and dream of the metaverse actually working right...

> Regards,
> Daniele
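[Editor's sketch of the cake suggestion above. wwan0 is an assumed
interface name; since Linux only shapes egress directly, the download
side is redirected through an ifb device and shaped there.]

  # Uplink: shape below the modem's rated 50 Mbit/s
  tc qdisc replace dev wwan0 root cake bandwidth 40mbit

  # Downlink: redirect ingress to an ifb device and shape it there
  modprobe ifb numifbs=1
  ip link set ifb0 up
  tc qdisc replace dev wwan0 handle ffff: ingress
  tc filter add dev wwan0 parent ffff: protocol all matchall \
      action mirred egress redirect dev ifb0
  tc qdisc replace dev ifb0 root cake bandwidth 80mbit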
On Thu, Dec 1, 2022 at 4:22 PM Dave Taht <dave.taht@gmail.com> wrote:
>
> A switch to using the "BBR" congestion controller might be a vast
> improvement over what I figure is your default of cubic. On both server
> and client...
>
> modprobe tcp_bbr
> sysctl -w net.ipv4.tcp_congestion_control=bbr
>
> And rerun your tests.
>

Done, results available at
https://drive.google.com/drive/folders/14CRs3ugyX_ANpUhMRJCH3ORy-jR3mbIS?usp=sharing

> Any chance you could share these results with the maker of the test tool
> you are primarily using?

Mmmhh... Not easy, since I don't have a direct point of contact.

> My comments on your patchset are not a blocker to it being accepted!
>

Thanks. I must admit that I don't have much knowledge on this topic; I
will try to educate myself.

Regards,
Daniele
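[Editor's note: the sysctl only affects connections originated by the host
it is set on, so both endpoints need checking before rerunning. A hedged
verification, assuming a kernel with the tcp_bbr module available:]

  # Confirm the active congestion controller on each end:
  sysctl net.ipv4.tcp_congestion_control

  # List the controllers currently loaded and permitted:
  sysctl net.ipv4.tcp_available_congestion_control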
On Fri, Dec 2, 2022 at 9:03 AM Daniele Palmas <dnlplm@gmail.com> wrote:
>
> On Thu, Dec 1, 2022 at 4:22 PM Dave Taht <dave.taht@gmail.com> wrote:
> >
> > A switch to using the "BBR" congestion controller might be a vast
> > improvement over what I figure is your default of cubic. On both server
> > and client...
> >
> > modprobe tcp_bbr
> > sysctl -w net.ipv4.tcp_congestion_control=bbr
> >
> > And rerun your tests.
> >
>
> Done, results available at
> https://drive.google.com/drive/folders/14CRs3ugyX_ANpUhMRJCH3ORy-jR3mbIS?usp=sharing

Mildly better on the up with bbr; it still cracks 24 seconds on the rrul
up + down test. bbr on the server you are using might actually work out
(which of course involves convincing the world to switch to bbr for cell
transfers) - could you try bbr on the server also?

I keep hoping that somewhere in the 4g or 5g usb stack there is some way
to tag packets and tell the OS that a given packet had actually been sent
out over the air. Getting the max packets in the card, at any rate good or
bad, down below 20ms would be marvelous. The 5G folk keep boasting about
1ms mac "latency" and never seem to have noticed how bad it can get if you
actually send real data over a usb bus to it.

> > Any chance you could share these results with the maker of the test
> > tool you are primarily using?
>
> Mmmhh... Not easy, since I don't have a direct point of contact.

Mind if I rant about this somewhere?

> > My comments on your patchset are not a blocker to it being accepted!
>
> Thanks. I must admit that I don't have much knowledge on this topic; I
> will try to educate myself.

thx! While I have a lot of entertaining videos about this kind of stuff,
the starting point I recommend most is Van Jacobson's IETF talk, with a
fountain analogy:

video: https://archive.org/details/video1_20191129
slides: https://pollere.net/Pdfdocs/codel.ietf-tsv.jul12.pdf

The abstraction I keep hoping we find (somehow) for lte/4g/5g would, to
me, look a lot like BQL: https://lwn.net/Articles/469652/

> Regards,
> Daniele
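[Editor's note: for a sense of what the BQL abstraction referenced above
already provides on drivers that implement it (rmnet and the wwan buses
discussed here currently do not), the per-queue state lives in sysfs.
eth0 here is just an illustrative device with BQL support.]

  # Bytes currently in flight to the device, and the auto-tuned limit:
  cat /sys/class/net/eth0/queues/tx-0/byte_queue_limits/inflight
  cat /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit

  # Cap the limit to bound how many bytes may sit in the device (30 KB):
  echo 30000 > /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit_max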