Message ID | 20190523210651.80902-5-fklassen@appneta.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Allow TX timestamp with UDP GSO | expand |
On Thu, May 23, 2019 at 5:11 PM Fred Klassen <fklassen@appneta.com> wrote: > > This enhancement adds the '-a' option, which will count all CMSG > messages on the error queue and print a summary report. > > Fixes: 3a687bef148d ("selftests: udp gso benchmark") Also not a fix, but an extension. > > Example: > > # ./udpgso_bench_tx -4uT -a -l5 -S 1472 -D 172.16.120.189 > udp tx: 492 MB/s 8354 calls/s 8354 msg/s > udp tx: 477 MB/s 8106 calls/s 8106 msg/s > udp tx: 488 MB/s 8288 calls/s 8288 msg/s > udp tx: 882 MB/s 14975 calls/s 14975 msg/s > Summary over 5.000 seconds ... > sum udp tx: 696 MB/s 57696 calls (11539/s) 57696 msgs (11539/s) > Tx Timestamps: received: 57696 errors: 0 > > This can be useful in tracking loss of messages when under load. For example, > adding the '-z' option results in loss of TX timestamp messages: > > # ./udpgso_bench_tx -4ucT -a -l5 -S 1472 -D 172.16.120.189 -p 3239 -z > udp tx: 490 MB/s 8325 calls/s 8325 msg/s > udp tx: 500 MB/s 8492 calls/s 8492 msg/s > udp tx: 883 MB/s 14985 calls/s 14985 msg/s > udp tx: 756 MB/s 12823 calls/s 12823 msg/s > Summary over 5.000 seconds ... > sum udp tx: 657 MB/s 54429 calls (10885/s) 54429 msgs (10885/s) > Tx Timestamps: received: 34046 errors: 0 > Zerocopy acks: received: 54422 errors: 0 This would probably also be more useful as regression test if it is in the form of a pass/fail test: if timestamps are requested and total count is zero, then the feature is broken and the process should exit with an error. > > Fixes: 3a687bef148d ("selftests: udp gso benchmark") Repeated > Signed-off-by: Fred Klassen <fklassen@appneta.com> > --- > tools/testing/selftests/net/udpgso_bench_tx.c | 152 +++++++++++++++++++------- > 1 file changed, 113 insertions(+), 39 deletions(-) > > diff --git a/tools/testing/selftests/net/udpgso_bench_tx.c b/tools/testing/selftests/net/udpgso_bench_tx.c > index 56e0d890b066..9924342a0b03 100644 > --- a/tools/testing/selftests/net/udpgso_bench_tx.c > +++ b/tools/testing/selftests/net/udpgso_bench_tx.c > @@ -62,10 +62,19 @@ static bool cfg_tcp; > static uint32_t cfg_tx_ts = SOF_TIMESTAMPING_TX_SOFTWARE; > static bool cfg_tx_tstamp; > static uint32_t cfg_tos; > +static bool cfg_audit; > static bool cfg_verbose; > static bool cfg_zerocopy; > static int cfg_msg_nr; > static uint16_t cfg_gso_size; > +static unsigned long total_num_msgs; > +static unsigned long total_num_sends; > +static unsigned long stat_tx_ts; > +static unsigned long stat_tx_ts_errors; > +static unsigned long tstart; > +static unsigned long tend; > +static unsigned long stat_zcopies; > +static unsigned long stat_zcopy_errors; > > static socklen_t cfg_alen; > static struct sockaddr_storage cfg_dst_addr; > @@ -137,8 +146,11 @@ static void flush_cmsg(struct cmsghdr *cmsg) > struct my_scm_timestamping *tss; > > tss = (struct my_scm_timestamping *)CMSG_DATA(cmsg); > - fprintf(stderr, "tx timestamp = %lu.%09lu\n", > - tss->ts[i].tv_sec, tss->ts[i].tv_nsec); > + if (tss->ts[i].tv_sec == 0) > + stat_tx_ts_errors++; > + if (cfg_verbose) > + fprintf(stderr, "tx timestamp = %lu.%09lu\n", > + tss->ts[i].tv_sec, tss->ts[i].tv_nsec); changes unrelated to this feature?
Willem, this is only my 2nd patch, and my last one was a one liner. I’ll try to work through this, but let me know if I am doing a rookie mistake (learning curve and all). > On May 23, 2019, at 2:56 PM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > On Thu, May 23, 2019 at 5:11 PM Fred Klassen <fklassen@appneta.com> wrote: >> >> This enhancement adds the '-a' option, which will count all CMSG >> messages on the error queue and print a summary report. >> >> Fixes: 3a687bef148d ("selftests: udp gso benchmark") > > Also not a fix, but an extension. I’ll make a v2 patch and remove “Fixes:". > >> >> Example: >> >> # ./udpgso_bench_tx -4uT -a -l5 -S 1472 -D 172.16.120.189 >> udp tx: 492 MB/s 8354 calls/s 8354 msg/s >> udp tx: 477 MB/s 8106 calls/s 8106 msg/s >> udp tx: 488 MB/s 8288 calls/s 8288 msg/s >> udp tx: 882 MB/s 14975 calls/s 14975 msg/s >> Summary over 5.000 seconds ... >> sum udp tx: 696 MB/s 57696 calls (11539/s) 57696 msgs (11539/s) >> Tx Timestamps: received: 57696 errors: 0 >> >> This can be useful in tracking loss of messages when under load. For example, >> adding the '-z' option results in loss of TX timestamp messages: >> >> # ./udpgso_bench_tx -4ucT -a -l5 -S 1472 -D 172.16.120.189 -p 3239 -z >> udp tx: 490 MB/s 8325 calls/s 8325 msg/s >> udp tx: 500 MB/s 8492 calls/s 8492 msg/s >> udp tx: 883 MB/s 14985 calls/s 14985 msg/s >> udp tx: 756 MB/s 12823 calls/s 12823 msg/s >> Summary over 5.000 seconds ... >> sum udp tx: 657 MB/s 54429 calls (10885/s) 54429 msgs (10885/s) >> Tx Timestamps: received: 34046 errors: 0 >> Zerocopy acks: received: 54422 errors: 0 > > This would probably also be more useful as regression test if it is in > the form of a pass/fail test: if timestamps are requested and total > count is zero, then the feature is broken and the process should exit > with an error. > I’ll add a hard failure for zero response for TX Timestamps or Zerocopy, or if any errors occur. >> >> Fixes: 3a687bef148d ("selftests: udp gso benchmark") > > Repeated Will fix. > >> Signed-off-by: Fred Klassen <fklassen@appneta.com> >> --- >> tools/testing/selftests/net/udpgso_bench_tx.c | 152 +++++++++++++++++++------- >> 1 file changed, 113 insertions(+), 39 deletions(-) >> >> diff --git a/tools/testing/selftests/net/udpgso_bench_tx.c b/tools/testing/selftests/net/udpgso_bench_tx.c >> index 56e0d890b066..9924342a0b03 100644 >> --- a/tools/testing/selftests/net/udpgso_bench_tx.c >> +++ b/tools/testing/selftests/net/udpgso_bench_tx.c >> @@ -62,10 +62,19 @@ static bool cfg_tcp; >> static uint32_t cfg_tx_ts = SOF_TIMESTAMPING_TX_SOFTWARE; >> static bool cfg_tx_tstamp; >> static uint32_t cfg_tos; >> +static bool cfg_audit; >> static bool cfg_verbose; >> static bool cfg_zerocopy; >> static int cfg_msg_nr; >> static uint16_t cfg_gso_size; >> +static unsigned long total_num_msgs; >> +static unsigned long total_num_sends; >> +static unsigned long stat_tx_ts; >> +static unsigned long stat_tx_ts_errors; >> +static unsigned long tstart; >> +static unsigned long tend; >> +static unsigned long stat_zcopies; >> +static unsigned long stat_zcopy_errors; >> >> static socklen_t cfg_alen; >> static struct sockaddr_storage cfg_dst_addr; >> @@ -137,8 +146,11 @@ static void flush_cmsg(struct cmsghdr *cmsg) >> struct my_scm_timestamping *tss; >> >> tss = (struct my_scm_timestamping *)CMSG_DATA(cmsg); >> - fprintf(stderr, "tx timestamp = %lu.%09lu\n", >> - tss->ts[i].tv_sec, tss->ts[i].tv_nsec); >> + if (tss->ts[i].tv_sec == 0) >> + stat_tx_ts_errors++; >> + if (cfg_verbose) >> + fprintf(stderr, "tx timestamp = %lu.%09lu\n", >> + tss->ts[i].tv_sec, tss->ts[i].tv_nsec); > > changes unrelated to this feature? I’ll remove. Do you think that I should pull out any messages related to “cfg_verbose”?
On Thu, May 23, 2019 at 9:27 PM Fred Klassen <fklassen@appneta.com> wrote: > > Willem, this is only my 2nd patch, and my last one was a one liner. > I’ll try to work through this, but let me know if I am doing a rookie > mistake (learning curve and all). Not at all. The fix makes perfect sense. The test patches 2 and 4 are not fixes, so are better suited to to net-next. Perhaps the changes to the test can also be more concise, just the minimal changes needed to demonstrate the bug and fix. > >> tss = (struct my_scm_timestamping *)CMSG_DATA(cmsg); > >> - fprintf(stderr, "tx timestamp = %lu.%09lu\n", > >> - tss->ts[i].tv_sec, tss->ts[i].tv_nsec); > >> + if (tss->ts[i].tv_sec == 0) > >> + stat_tx_ts_errors++; > >> + if (cfg_verbose) > >> + fprintf(stderr, "tx timestamp = %lu.%09lu\n", > >> + tss->ts[i].tv_sec, tss->ts[i].tv_nsec); > > > > changes unrelated to this feature? > > I’ll remove. Do you think that I should pull out any messages related > to “cfg_verbose”? This change did not seem relevant to the main feature of the patch.
Willem, Thanks for spending so much time with me on this patch. I’m pretty new to this, so I know I am making lots of mistakes. I have been working on a v2 of the selftests in net-next, but want to review the layout of the report before I submit (see below). Also, I my v2 fix in net is still up for debate. In its current state, it meets my application’s requirements, but may not meet all of yours. I am still open to suggestions, but so far I don’t have an alternate solution that doesn’t break what I need working. I also have a question about submitting a fix in net and a related enhancement in net-next. I feel I should be referencing the fix in net in my net-next commit, but I don’t know how it should be done. Any suggestions? > On May 23, 2019, at 2:56 PM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: >> >> Example: >> >> # ./udpgso_bench_tx -4uT -a -l5 -S 1472 -D 172.16.120.189 >> udp tx: 492 MB/s 8354 calls/s 8354 msg/s >> udp tx: 477 MB/s 8106 calls/s 8106 msg/s >> udp tx: 488 MB/s 8288 calls/s 8288 msg/s >> udp tx: 882 MB/s 14975 calls/s 14975 msg/s >> Summary over 5.000 seconds ... >> sum udp tx: 696 MB/s 57696 calls (11539/s) 57696 msgs (11539/s) >> Tx Timestamps: received: 57696 errors: 0 >> >> This can be useful in tracking loss of messages when under load. For example, >> adding the '-z' option results in loss of TX timestamp messages: >> >> # ./udpgso_bench_tx -4ucT -a -l5 -S 1472 -D 172.16.120.189 -p 3239 -z >> udp tx: 490 MB/s 8325 calls/s 8325 msg/s >> udp tx: 500 MB/s 8492 calls/s 8492 msg/s >> udp tx: 883 MB/s 14985 calls/s 14985 msg/s >> udp tx: 756 MB/s 12823 calls/s 12823 msg/s >> Summary over 5.000 seconds ... >> sum udp tx: 657 MB/s 54429 calls (10885/s) 54429 msgs (10885/s) >> Tx Timestamps: received: 34046 errors: 0 >> Zerocopy acks: received: 54422 errors: 0 > > This would probably also be more useful as regression test if it is in > the form of a pass/fail test: if timestamps are requested and total > count is zero, then the feature is broken and the process should exit > with an error. I have it set up as a pass/fail test. Below is a sample output of the test, including a failure on IPv6 TCP Zerocopy audit (a failure that may lead to a memory leak). I wanted to review the report with you before I push up the v2 patch into net-next. Are these extra tests what you were expecting? Is it OK that it doesn’t flow well? Also, there is a failure about every 3rd time I run it, indicating that some TX or Zerocopy messages are lost. Is that OK? ipv4 tcp tcp rx: 11665 MB/s 189014 calls/s tcp tx: 11665 MB/s 197859 calls/s 197859 msg/s tcp rx: 11706 MB/s 190749 calls/s tcp tx: 11706 MB/s 198545 calls/s 198545 msg/s tcp tx: 11653 MB/s 197645 calls/s 197645 msg/s tcp rx: 11653 MB/s 189292 calls/s tcp zerocopy tcp rx: 6658 MB/s 111849 calls/s tcp tx: 6658 MB/s 112940 calls/s 112940 msg/s tcp tx: 6824 MB/s 115742 calls/s 115742 msg/s tcp rx: 6824 MB/s 115711 calls/s tcp rx: 6865 MB/s 116402 calls/s tcp tx: 6865 MB/s 116440 calls/s 116440 msg/s tcp zerocopy audit tcp tx: 6646 MB/s 112729 calls/s 112729 msg/s tcp rx: 6646 MB/s 111445 calls/s tcp rx: 6672 MB/s 112833 calls/s tcp tx: 6672 MB/s 113164 calls/s 113164 msg/s tcp tx: 6624 MB/s 112355 calls/s 112355 msg/s tcp rx: 6624 MB/s 110877 calls/s Summary over 4.000 seconds... sum tcp tx: 6813 MB/s 451402 calls (112850/s) 451402 msgs (112850/s) Zerocopy acks: 451402 received 0 errors udp udp rx: 914 MB/s 651407 calls/s udp tx: 925 MB/s 659274 calls/s 15697 msg/s udp rx: 919 MB/s 654859 calls/s udp tx: 919 MB/s 654864 calls/s 15592 msg/s udp rx: 914 MB/s 651668 calls/s udp tx: 914 MB/s 651630 calls/s 15515 msg/s udp rx: 918 MB/s 654642 calls/s udp gso udp rx: 2271 MB/s 1618319 calls/s udp tx: 2515 MB/s 42658 calls/s 42658 msg/s udp rx: 2337 MB/s 1665280 calls/s udp tx: 2551 MB/s 43276 calls/s 43276 msg/s udp rx: 2338 MB/s 1665792 calls/s udp tx: 2557 MB/s 43384 calls/s 43384 msg/s udp rx: 2339 MB/s 1666560 calls/s udp gso zerocopy udp rx: 1725 MB/s 1229087 calls/s udp tx: 1840 MB/s 31219 calls/s 31219 msg/s udp rx: 1850 MB/s 1318460 calls/s udp tx: 1850 MB/s 31388 calls/s 31388 msg/s udp rx: 1845 MB/s 1314428 calls/s udp tx: 1845 MB/s 31299 calls/s 31299 msg/s udp gso timestamp udp rx: 2286 MB/s 1628928 calls/s udp tx: 2446 MB/s 41499 calls/s 41499 msg/s udp rx: 2354 MB/s 1677312 calls/s udp tx: 2473 MB/s 41952 calls/s 41952 msg/s udp rx: 2342 MB/s 1668864 calls/s udp tx: 2471 MB/s 41925 calls/s 41925 msg/s udp rx: 2333 MB/s 1662464 calls/s udp gso zerocopy audit udp rx: 1787 MB/s 1273481 calls/s udp tx: 1832 MB/s 31082 calls/s 31082 msg/s udp rx: 1881 MB/s 1340476 calls/s udp tx: 1881 MB/s 31916 calls/s 31916 msg/s udp rx: 1880 MB/s 1339880 calls/s udp tx: 1881 MB/s 31904 calls/s 31904 msg/s udp rx: 1881 MB/s 1340010 calls/s Summary over 4.000 seconds... sum udp tx: 1912 MB/s 126694 calls (31673/s) 126694 msgs (31673/s) Zerocopy acks: 126694 received 0 errors udp gso timestamp audit udp rx: 2259 MB/s 1609327 calls/s udp tx: 2410 MB/s 40879 calls/s 40879 msg/s udp rx: 2346 MB/s 1671168 calls/s udp tx: 2446 MB/s 41491 calls/s 41491 msg/s udp rx: 2358 MB/s 1680128 calls/s udp tx: 2448 MB/s 41522 calls/s 41522 msg/s udp rx: 2356 MB/s 1678848 calls/s Summary over 4.000 seconds... sum udp tx: 2494 MB/s 165276 calls (41319/s) 165276 msgs (41319/s) Tx Timestamps: 165276 received 0 errors udp gso zerocopy timestamp audit udp rx: 1658 MB/s 1181647 calls/s udp tx: 1678 MB/s 28466 calls/s 28466 msg/s udp rx: 1718 MB/s 1224010 calls/s udp tx: 1718 MB/s 29146 calls/s 29146 msg/s udp rx: 1718 MB/s 1224086 calls/s udp tx: 1718 MB/s 29144 calls/s 29144 msg/s udp rx: 1717 MB/s 1223464 calls/s Summary over 4.000 seconds... sum udp tx: 1747 MB/s 115771 calls (28942/s) 115771 msgs (28942/s) Tx Timestamps: 115771 received 0 errors Zerocopy acks: 115771 received 0 errors ipv6 tcp tcp tx: 11470 MB/s 194547 calls/s 194547 msg/s tcp rx: 11470 MB/s 188442 calls/s tcp tx: 11803 MB/s 200193 calls/s 200193 msg/s tcp rx: 11803 MB/s 194526 calls/s tcp tx: 11832 MB/s 200681 calls/s 200681 msg/s tcp rx: 11832 MB/s 194459 calls/s tcp zerocopy tcp rx: 7482 MB/s 80176 calls/s tcp tx: 7482 MB/s 126908 calls/s 126908 msg/s tcp rx: 6641 MB/s 112609 calls/s tcp tx: 6641 MB/s 112649 calls/s 112649 msg/s tcp tx: 6584 MB/s 111683 calls/s 111683 msg/s tcp rx: 6584 MB/s 111617 calls/s tcp zerocopy audit tcp tx: 6719 MB/s 113968 calls/s 113968 msg/s tcp rx: 6719 MB/s 113893 calls/s tcp rx: 6772 MB/s 114831 calls/s tcp tx: 6772 MB/s 114872 calls/s 114872 msg/s tcp tx: 6793 MB/s 115226 calls/s 115226 msg/s tcp rx: 7075 MB/s 116595 calls/s Summary over 4.000 seconds... sum tcp tx: 6921 MB/s 458580 calls (114645/s) 458580 msgs (114645/s) ./udpgso_bench_tx: Unexpected number of Zerocopy completions: 458580 expected 458578 received udp udp rx: 833 MB/s 607639 calls/s udp tx: 851 MB/s 621264 calls/s 14448 msg/s udp rx: 860 MB/s 627246 calls/s udp tx: 860 MB/s 627284 calls/s 14588 msg/s udp rx: 862 MB/s 628718 calls/s udp tx: 861 MB/s 628574 calls/s 14618 msg/s udp rx: 863 MB/s 630058 calls/s udp gso udp rx: 2310 MB/s 1683314 calls/s udp tx: 2472 MB/s 41931 calls/s 41931 msg/s udp rx: 2343 MB/s 1708032 calls/s udp tx: 2493 MB/s 42298 calls/s 42298 msg/s udp rx: 2322 MB/s 1692160 calls/s udp tx: 2496 MB/s 42347 calls/s 42347 msg/s udp gso zerocopy udp rx: 1752 MB/s 1278423 calls/s udp tx: 1778 MB/s 30162 calls/s 30162 msg/s udp rx: 1804 MB/s 1316058 calls/s udp tx: 1804 MB/s 30605 calls/s 30605 msg/s udp rx: 1807 MB/s 1318120 calls/s udp tx: 1808 MB/s 30681 calls/s 30681 msg/s udp rx: 1730 MB/s 1261819 calls/s udp gso timestamp udp rx: 2296 MB/s 1673728 calls/s udp tx: 2375 MB/s 40284 calls/s 40284 msg/s udp rx: 2334 MB/s 1701632 calls/s udp tx: 2377 MB/s 40319 calls/s 40319 msg/s udp rx: 2335 MB/s 1702093 calls/s udp tx: 2377 MB/s 40319 calls/s 40319 msg/s udp rx: 2336 MB/s 1702656 calls/s udp gso zerocopy audit udp rx: 1717 MB/s 1252554 calls/s udp tx: 1759 MB/s 29839 calls/s 29839 msg/s udp rx: 1811 MB/s 1321003 calls/s udp tx: 1811 MB/s 30722 calls/s 30722 msg/s udp rx: 1811 MB/s 1320917 calls/s udp tx: 1811 MB/s 30719 calls/s 30719 msg/s udp rx: 1810 MB/s 1320606 calls/s Summary over 4.000 seconds... sum udp tx: 1839 MB/s 121868 calls (30467/s) 121868 msgs (30467/s) Zerocopy acks: 121868 received 0 errors udp gso timestamp audit udp rx: 2231 MB/s 1626112 calls/s udp tx: 2337 MB/s 39646 calls/s 39646 msg/s udp rx: 2292 MB/s 1670400 calls/s udp tx: 2365 MB/s 40122 calls/s 40122 msg/s udp rx: 2287 MB/s 1666816 calls/s udp tx: 2363 MB/s 40084 calls/s 40084 msg/s udp rx: 2288 MB/s 1667840 calls/s Summary over 4.000 seconds... sum udp tx: 2412 MB/s 159818 calls (39954/s) 159818 msgs (39954/s) Tx Timestamps: 159818 received 0 errors udp gso zerocopy timestamp audit udp rx: 1592 MB/s 1161479 calls/s udp tx: 1624 MB/s 27560 calls/s 27560 msg/s udp rx: 1657 MB/s 1208472 calls/s udp tx: 1656 MB/s 28103 calls/s 28103 msg/s udp rx: 1653 MB/s 1206064 calls/s udp tx: 1653 MB/s 28047 calls/s 28047 msg/s udp rx: 1648 MB/s 1202151 calls/s Summary over 4.000 seconds... sum udp tx: 1683 MB/s 111556 calls (27889/s) 111556 msgs (27889/s) Tx Timestamps: 111556 received 0 errors Zerocopy acks: 111556 received 0 errors
On Mon, May 27, 2019 at 5:30 PM Fred Klassen <fklassen@appneta.com> wrote: > > Willem, > > Thanks for spending so much time with me on this patch. I’m pretty > new to this, so I know I am making lots of mistakes. I have been > working on a v2 of the selftests in net-next, but want to review the > layout of the report before I submit (see below). > > Also, I my v2 fix in net is still up for debate. In its current state, it > meets my application’s requirements, but may not meet all of yours. I gave more specific feedback on issues with it (referencing zerocopy and IP_TOS, say). Also, it is safer to update only the relevant timestamp bits in tx_flags, rather that blanket overwrite, given that some bits are already set in skb_segment. I have not checked whether this is absolutely necessary. > I am still open to suggestions, but so far I don’t have an alternate > solution that doesn’t break what I need working. Did you see my response yesterday? I can live with the first segment. Even if I don't think that it buys much in practice given xmit_more (and it does cost something, e.g., during requeueing). > I also have a question about submitting a fix in net and a related > enhancement in net-next. I feel I should be referencing the fix > in net in my net-next commit, but I don’t know how it should be > done. Any suggestions? It is not strictly necessary, but indeed often a nice to have. We generally reference by SHA1, so wait with submitting the test until the fix is merged. See also the ipv6 flowlabel test that I just sent for one example. > > > > This would probably also be more useful as regression test if it is in > > the form of a pass/fail test: if timestamps are requested and total > > count is zero, then the feature is broken and the process should exit > > with an error. > > I have it set up as a pass/fail test. Great, thanks. > Below is a sample output of the > test, including a failure on IPv6 TCP Zerocopy audit (a failure that may > lead to a memory leak). Can you elaborate on this suspected memory leak? > I wanted to review the report with you before > I push up the v2 patch into net-next. > > Are these extra tests what you were expecting? Is it OK that it doesn’t > flow well? Do you mean how the output looks? That seems fine. > Also, there is a failure about every 3rd time I run it, > indicating that some TX or Zerocopy messages are lost. Is that OK? No that is not. These tests are run in a continuous test infrastructure. We should try hard to avoid flakiness. If this intermittent failure is due to a real kernel bug, please move that part to a flag (or just comment out) to temporarily exclude it from continuous testing. More commonly it is an issue with the test itself. My SO_TXTIME test from last week depends on timing, which has me somewhat worried when run across a wide variety of (likely virtualized) platforms. I purposely chose large timescales to minimize the risk. On a related note, tests run as part of continuous testing should run as briefly as possible. Perhaps we need to reduce the time per run to accommodate for the new variants you are adding. > > ipv4 > tcp > tcp rx: 11665 MB/s 189014 calls/s > tcp tx: 11665 MB/s 197859 calls/s 197859 msg/s > tcp rx: 11706 MB/s 190749 calls/s > tcp tx: 11706 MB/s 198545 calls/s 198545 msg/s > tcp tx: 11653 MB/s 197645 calls/s 197645 msg/s > tcp rx: 11653 MB/s 189292 calls/s > tcp zerocopy > tcp rx: 6658 MB/s 111849 calls/s > tcp tx: 6658 MB/s 112940 calls/s 112940 msg/s > tcp tx: 6824 MB/s 115742 calls/s 115742 msg/s > tcp rx: 6824 MB/s 115711 calls/s > tcp rx: 6865 MB/s 116402 calls/s > tcp tx: 6865 MB/s 116440 calls/s 116440 msg/s > tcp zerocopy audit > tcp tx: 6646 MB/s 112729 calls/s 112729 msg/s > tcp rx: 6646 MB/s 111445 calls/s > tcp rx: 6672 MB/s 112833 calls/s > tcp tx: 6672 MB/s 113164 calls/s 113164 msg/s > tcp tx: 6624 MB/s 112355 calls/s 112355 msg/s > tcp rx: 6624 MB/s 110877 calls/s > Summary over 4.000 seconds... > sum tcp tx: 6813 MB/s 451402 calls (112850/s) 451402 msgs (112850/s) > Zerocopy acks: 451402 received 0 errors > udp > udp rx: 914 MB/s 651407 calls/s > udp tx: 925 MB/s 659274 calls/s 15697 msg/s > udp rx: 919 MB/s 654859 calls/s > udp tx: 919 MB/s 654864 calls/s 15592 msg/s > udp rx: 914 MB/s 651668 calls/s > udp tx: 914 MB/s 651630 calls/s 15515 msg/s > udp rx: 918 MB/s 654642 calls/s > udp gso > udp rx: 2271 MB/s 1618319 calls/s > udp tx: 2515 MB/s 42658 calls/s 42658 msg/s > udp rx: 2337 MB/s 1665280 calls/s > udp tx: 2551 MB/s 43276 calls/s 43276 msg/s > udp rx: 2338 MB/s 1665792 calls/s > udp tx: 2557 MB/s 43384 calls/s 43384 msg/s > udp rx: 2339 MB/s 1666560 calls/s > udp gso zerocopy > udp rx: 1725 MB/s 1229087 calls/s > udp tx: 1840 MB/s 31219 calls/s 31219 msg/s > udp rx: 1850 MB/s 1318460 calls/s > udp tx: 1850 MB/s 31388 calls/s 31388 msg/s > udp rx: 1845 MB/s 1314428 calls/s > udp tx: 1845 MB/s 31299 calls/s 31299 msg/s > udp gso timestamp > udp rx: 2286 MB/s 1628928 calls/s > udp tx: 2446 MB/s 41499 calls/s 41499 msg/s > udp rx: 2354 MB/s 1677312 calls/s > udp tx: 2473 MB/s 41952 calls/s 41952 msg/s > udp rx: 2342 MB/s 1668864 calls/s > udp tx: 2471 MB/s 41925 calls/s 41925 msg/s > udp rx: 2333 MB/s 1662464 calls/s > udp gso zerocopy audit > udp rx: 1787 MB/s 1273481 calls/s > udp tx: 1832 MB/s 31082 calls/s 31082 msg/s > udp rx: 1881 MB/s 1340476 calls/s > udp tx: 1881 MB/s 31916 calls/s 31916 msg/s > udp rx: 1880 MB/s 1339880 calls/s > udp tx: 1881 MB/s 31904 calls/s 31904 msg/s > udp rx: 1881 MB/s 1340010 calls/s > Summary over 4.000 seconds... > sum udp tx: 1912 MB/s 126694 calls (31673/s) 126694 msgs (31673/s) > Zerocopy acks: 126694 received 0 errors > udp gso timestamp audit > udp rx: 2259 MB/s 1609327 calls/s > udp tx: 2410 MB/s 40879 calls/s 40879 msg/s > udp rx: 2346 MB/s 1671168 calls/s > udp tx: 2446 MB/s 41491 calls/s 41491 msg/s > udp rx: 2358 MB/s 1680128 calls/s > udp tx: 2448 MB/s 41522 calls/s 41522 msg/s > udp rx: 2356 MB/s 1678848 calls/s > Summary over 4.000 seconds... > sum udp tx: 2494 MB/s 165276 calls (41319/s) 165276 msgs (41319/s) > Tx Timestamps: 165276 received 0 errors > udp gso zerocopy timestamp audit > udp rx: 1658 MB/s 1181647 calls/s > udp tx: 1678 MB/s 28466 calls/s 28466 msg/s > udp rx: 1718 MB/s 1224010 calls/s > udp tx: 1718 MB/s 29146 calls/s 29146 msg/s > udp rx: 1718 MB/s 1224086 calls/s > udp tx: 1718 MB/s 29144 calls/s 29144 msg/s > udp rx: 1717 MB/s 1223464 calls/s > Summary over 4.000 seconds... > sum udp tx: 1747 MB/s 115771 calls (28942/s) 115771 msgs (28942/s) > Tx Timestamps: 115771 received 0 errors > Zerocopy acks: 115771 received 0 errors > ipv6 > tcp > tcp tx: 11470 MB/s 194547 calls/s 194547 msg/s > tcp rx: 11470 MB/s 188442 calls/s > tcp tx: 11803 MB/s 200193 calls/s 200193 msg/s > tcp rx: 11803 MB/s 194526 calls/s > tcp tx: 11832 MB/s 200681 calls/s 200681 msg/s > tcp rx: 11832 MB/s 194459 calls/s > tcp zerocopy > tcp rx: 7482 MB/s 80176 calls/s > tcp tx: 7482 MB/s 126908 calls/s 126908 msg/s > tcp rx: 6641 MB/s 112609 calls/s > tcp tx: 6641 MB/s 112649 calls/s 112649 msg/s > tcp tx: 6584 MB/s 111683 calls/s 111683 msg/s > tcp rx: 6584 MB/s 111617 calls/s > tcp zerocopy audit > tcp tx: 6719 MB/s 113968 calls/s 113968 msg/s > tcp rx: 6719 MB/s 113893 calls/s > tcp rx: 6772 MB/s 114831 calls/s > tcp tx: 6772 MB/s 114872 calls/s 114872 msg/s > tcp tx: 6793 MB/s 115226 calls/s 115226 msg/s > tcp rx: 7075 MB/s 116595 calls/s > Summary over 4.000 seconds... > sum tcp tx: 6921 MB/s 458580 calls (114645/s) 458580 msgs (114645/s) > ./udpgso_bench_tx: Unexpected number of Zerocopy completions: 458580 expected 458578 received Is this the issue you're referring to? Good catch. Clearly this is a good test to have :) That is likely due to some timing issue in the test, e.g., no waiting long enough to harvest all completions. That is something I can look into after the code is merged.
> On May 27, 2019, at 2:46 PM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: >> Also, I my v2 fix in net is still up for debate. In its current state, it >> meets my application’s requirements, but may not meet all of yours. > I gave more specific feedback on issues with it (referencing zerocopy > and IP_TOS, say). > Unfortunately I don’t have a very good email setup, and I found a bunch of your comments in my junk folder. That was on Saturday, and on Sunday I spent some time implementing your suggestions. I have not pushed the changes up yet. I wanted to discuss whether or not to attach a buffer to the recvmsg(fd, &msg, MSG_ERRQUEUE). Without it, I have MSG_TRUNC errors in my msg_flags. Either I have to add a buffer, or ignore that error flag. > Also, it is safer to update only the relevant timestamp bits in > tx_flags, rather that blanket overwrite, given that some bits are > already set in skb_segment. I have not checked whether this is > absolutely necessary. > I agree. See tcp_fragment_tstamp(). I think this should work. skb_shinfo(seg)->tx_flags |= (skb_shinfo(gso_skb)->tx_flags & SKBTX_ANY_TSTAMP); >> I am still open to suggestions, but so far I don’t have an alternate >> solution that doesn’t break what I need working. > > Did you see my response yesterday? I can live with the first segment. > Even if I don't think that it buys much in practice given xmit_more > (and it does cost something, e.g., during requeueing). > I’m sorry, I didn’t receive a response. Once again, I am struggling with crappy email setup. Hopefully as of today my junk mail filters are set up properly. I’d like to see that comment. I have been wondering about xmit_more myself. I don’t think it changes anything for software timestamps, but it may with hardware timestamps. I have service contracts with Intel and Mellanox. I can open up a ticket with them to see exactly when the timestamp is taken. I believe you mentioned before that this is vendor specific. > It is not strictly necessary, but indeed often a nice to have. We > generally reference by SHA1, so wait with submitting the test until > the fix is merged. See also the ipv6 flowlabel test that I just sent > for one example. Thanks. I will hold off with the test until I get a final fix in net, and I’ll use your example. >> Below is a sample output of the >> test, including a failure on IPv6 TCP Zerocopy audit (a failure that may >> lead to a memory leak). > > Can you elaborate on this suspected memory leak? A user program cannot free a zerocopy buffer until it is reported as free. If zerocopy events are not reported, that could be a memory leak. I may have a fix. I have added a -P option when I am running an audit. It doesn’t appear to affect performance, and since implementing it I have received all error messages expected for both timestamp and zerocopy. I am still testing. >> I wanted to review the report with you before >> I push up the v2 patch into net-next. >> >> Are these extra tests what you were expecting? Is it OK that it doesn’t >> flow well? > > Do you mean how the output looks? That seems fine. > Good. Thanks. >> Also, there is a failure about every 3rd time I run it, >> indicating that some TX or Zerocopy messages are lost. Is that OK? > > No that is not. These tests are run in a continuous test > infrastructure. We should try hard to avoid flakiness. > As per above comment, I think I removed the flakiness. I will run overnight to confirm. > If this intermittent failure is due to a real kernel bug, please move > that part to a flag (or just comment out) to temporarily exclude it > from continuous testing. > > More commonly it is an issue with the test itself. My SO_TXTIME test > from last week depends on timing, which has me somewhat worried when > run across a wide variety of (likely virtualized) platforms. I > purposely chose large timescales to minimize the risk. > > On a related note, tests run as part of continuous testing should run > as briefly as possible. Perhaps we need to reduce the time per run to > accommodate for the new variants you are adding. > I could reduce testing from 4 to 2 seconds. Anything below that and I miss some reports. When I found flakey results, I found I could reproduce them in as little as 1 second. >> Summary over 4.000 seconds... >> sum tcp tx: 6921 MB/s 458580 calls (114645/s) 458580 msgs (114645/s) >> ./udpgso_bench_tx: Unexpected number of Zerocopy completions: 458580 expected 458578 received > > Is this the issue you're referring to? Good catch. Clearly this is a > good test to have :) That is likely due to some timing issue in the > test, e.g., no waiting long enough to harvest all completions. That is > something I can look into after the code is merged. Thanks. Should the test have failed at this point? I did return an error(), but the script kept running. As stated, I don’t want to push up until I have tested more fully, and the fix is accepted (which requires a v3). If you want to review what I have, I can push it up now with the understanding that I may still fine tune things.
On Mon, May 27, 2019 at 6:56 PM Fred Klassen <fklassen@appneta.com> wrote: > > > > > On May 27, 2019, at 2:46 PM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > >> Also, I my v2 fix in net is still up for debate. In its current state, it > >> meets my application’s requirements, but may not meet all of yours. > > > I gave more specific feedback on issues with it (referencing zerocopy > > and IP_TOS, say). > > > > Unfortunately I don’t have a very good email setup, and I found a > bunch of your comments in my junk folder. That was on Saturday, > and on Sunday I spent some time implementing your suggestions. > I have not pushed the changes up yet. > > I wanted to discuss whether or not to attach a buffer to the > recvmsg(fd, &msg, MSG_ERRQUEUE). Without it, I have > MSG_TRUNC errors in my msg_flags. Either I have to add > a buffer, or ignore that error flag. Either sounds reasonable. It is an expected and well understood message if underprovisioning the receive data buffer. > > Also, it is safer to update only the relevant timestamp bits in > > tx_flags, rather that blanket overwrite, given that some bits are > > already set in skb_segment. I have not checked whether this is > > absolutely necessary. > > > I agree. See tcp_fragment_tstamp(). > > I think this should work. > > skb_shinfo(seg)->tx_flags |= > (skb_shinfo(gso_skb)->tx_flags & SKBTX_ANY_TSTAMP); Agreed. It is more obviously correct. Only drawback is that the RMW is more expensive than a straight assignment. > >> I am still open to suggestions, but so far I don’t have an alternate > >> solution that doesn’t break what I need working. > > > > Did you see my response yesterday? I can live with the first segment. > > Even if I don't think that it buys much in practice given xmit_more > > (and it does cost something, e.g., during requeueing). > > > > I’m sorry, I didn’t receive a response. Once again, I am struggling > with crappy email setup. Hopefully as of today my junk mail filters are > set up properly. > > I’d like to see that comment. The netdev list is archived and available through various websites, like lore.kernel.org/netdev . As well as the patches with comments at patchwork.ozlabs.org/project/netdev/list > I have been wondering about xmit_more > myself. I don’t think it changes anything for software timestamps, > but it may with hardware timestamps. It arguably makes the software timestamp too early if taken on the first segment, as the NIC is only informed of all the new descriptors when the last segment is written and the doorbell is rung. > > Can you elaborate on this suspected memory leak? > > A user program cannot free a zerocopy buffer until it is reported as free. > If zerocopy events are not reported, that could be a memory leak. > > I may have a fix. I have added a -P option when I am running an audit. > It doesn’t appear to affect performance, and since implementing it I have > received all error messages expected for both timestamp and zerocopy. > > I am still testing. I see, a userspace leak from lack of completion notification. If the issue is a few missing notifications at the end of the run, then perhaps cfg_waittime_ms is too short. > > On a related note, tests run as part of continuous testing should run > > as briefly as possible. Perhaps we need to reduce the time per run to > > accommodate for the new variants you are adding. > > > > I could reduce testing from 4 to 2 seconds. Anything below that and I > miss some reports. When I found flakey results, I found I could reproduce > them in as little as 1 second. > >> Summary over 4.000 seconds... > >> sum tcp tx: 6921 MB/s 458580 calls (114645/s) 458580 msgs (114645/s) > >> ./udpgso_bench_tx: Unexpected number of Zerocopy completions: 458580 expected 458578 received > > > > Is this the issue you're referring to? Good catch. Clearly this is a > > good test to have :) That is likely due to some timing issue in the > > test, e.g., no waiting long enough to harvest all completions. That is > > something I can look into after the code is merged. > > Thanks. > > Should the test have failed at this point? I did return an error(), but > the script kept running. This should normally be cause for test failure, I think yes. Though it's fine to send the code for review and possibly even merge, so that I can take a look. > As stated, I don’t want to push up until I have tested more fully, and > the fix is accepted (which requires a v3). If you want to review what > I have, I can push it up now with the understanding that I may still > fine tune things. Sounds good, thanks.
> On May 27, 2019, at 6:15 PM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: >> I wanted to discuss whether or not to attach a buffer to the >> recvmsg(fd, &msg, MSG_ERRQUEUE). Without it, I have >> MSG_TRUNC errors in my msg_flags. Either I have to add >> a buffer, or ignore that error flag. > > Either sounds reasonable. It is an expected and well understood > message if underprovisioning the receive data buffer. > I’ll stick with setting up buffers. It will fail if there are any bugs introduced in buffer copy routines. > > The netdev list is archived and available through various websites, > like lore.kernel.org/netdev . As well as the patches with comments at > patchwork.ozlabs.org/project/netdev/list > Much better. Sure beats hunting down lost emails. >> I have been wondering about xmit_more >> myself. I don’t think it changes anything for software timestamps, >> but it may with hardware timestamps. > > It arguably makes the software timestamp too early if taken on the > first segment, as the NIC is only informed of all the new descriptors > when the last segment is written and the doorbell is rung. > Totally makes sense. Possibly this can be improved software TX timestamps by delaying until just before ring buffer is advanced. It would have to be updated in each driver. I may have a look at this once I am complete this patch. Hopefully that one will be a bit smoother. >>> Can you elaborate on this suspected memory leak? >> >> A user program cannot free a zerocopy buffer until it is reported as free. >> If zerocopy events are not reported, that could be a memory leak. >> >> I may have a fix. I have added a -P option when I am running an audit. >> It doesn’t appear to affect performance, and since implementing it I have >> received all error messages expected for both timestamp and zerocopy. >> >> I am still testing. > > I see, a userspace leak from lack of completion notification. > > If the issue is a few missing notifications at the end of the run, > then perhaps cfg_waittime_ms is too short. > I’ll get back to you when I have tested this more thoroughly. Early results suggest that adding the -P poll() option has fixed it without any appreciable performance hit. I’ll share raw results with you, and we can make a final decision together. >> Should the test have failed at this point? I did return an error(), but >> the script kept running. > > This should normally be cause for test failure, I think yes. Though > it's fine to send the code for review and possibly even merge, so that > I can take a look. > Sounds like udpgso_bench.sh needs a ’set -e’ to ensure it stops on first error.
> >> I have been wondering about xmit_more > >> myself. I don’t think it changes anything for software timestamps, > >> but it may with hardware timestamps. > > > > It arguably makes the software timestamp too early if taken on the > > first segment, as the NIC is only informed of all the new descriptors > > when the last segment is written and the doorbell is rung. > > > > Totally makes sense. Possibly this can be improved software TX > timestamps by delaying until just before ring buffer is advanced. > It would have to be updated in each driver. I may have a look at > this once I am complete this patch. Hopefully that one will be a bit > smoother. How do you see that? The skb_tstamp_tx call currently is already the last action before ringing the doorbell, after setting up the descriptor. It cannot be set later. The only issue specific to GSO is that xmit_more can forego this doorbell until the last segment. We want to complicate this logic with a special case based on tx_flags. A process that cares should either not use GSO, or the timestamp should be associated with the last segment as I've been arguing so far. > >>> Can you elaborate on this suspected memory leak? > >> > >> A user program cannot free a zerocopy buffer until it is reported as free. > >> If zerocopy events are not reported, that could be a memory leak. > >> > >> I may have a fix. I have added a -P option when I am running an audit. > >> It doesn’t appear to affect performance, and since implementing it I have > >> received all error messages expected for both timestamp and zerocopy. > >> > >> I am still testing. > > > > I see, a userspace leak from lack of completion notification. > > > > If the issue is a few missing notifications at the end of the run, > > then perhaps cfg_waittime_ms is too short. > > > > I’ll get back to you when I have tested this more thoroughly. Early results > suggest that adding the -P poll() option has fixed it without any appreciable > performance hit. I’ll share raw results with you, and we can make a final > decision together. In the main loop? It still is peculiar that notifications appear to go missing unless the process blocks waiting for them. Nothing in sock_zerocopy_callback or the queueing onto the error queue should cause drops, as far as I know. > > >> Should the test have failed at this point? I did return an error(), but > >> the script kept running. > > > > This should normally be cause for test failure, I think yes. Though > > it's fine to send the code for review and possibly even merge, so that > > I can take a look. > > > > Sounds like udpgso_bench.sh needs a ’set -e’ to ensure it stops on > first error. Indeed. Ideally even run all tests, but return error if any failed, like this recent patch selftests/bpf: fail test_tunnel.sh if subtests fail https://patchwork.ozlabs.org/patch/1105221/ but that may be a lot of code churn and better left to a separate patch.
> On May 28, 2019, at 8:08 AM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > I will push up latest patches soon. I did some testing and discovered that only TCP audit tests failed. They failed much less often when enabling poll. Once in about 20 runs still failed. Therefore I commented out the TCP audit tests. As for the other tests, this is what I got with poll() disabled… udp gso zerocopy timestamp audit udp rx: 1611 MB/s 1148129 calls/s udp tx: 1659 MB/s 28146 calls/s 28146 msg/s udp rx: 1686 MB/s 1201494 calls/s udp tx: 1685 MB/s 28579 calls/s 28579 msg/s udp rx: 1685 MB/s 1200402 calls/s udp tx: 1683 MB/s 28552 calls/s 28552 msg/s Summary over 3.000 seconds... sum udp tx: 1716 MB/s 85277 calls (28425/s) 85277 msgs (28425/s) Tx Timestamps: 85277 received 0 errors Zerocopy acks: 85277 received 0 errors Here you see that with poll() enabled, it is a bit slower, so I don’t have it enabled in udpgso_bench.sh … udp gso zerocopy timestamp audit udp rx: 1591 MB/s 1133945 calls/s udp tx: 1613 MB/s 27358 calls/s 27358 msg/s udp rx: 1644 MB/s 1171674 calls/s udp tx: 1643 MB/s 27869 calls/s 27869 msg/s udp rx: 1643 MB/s 1170666 calls/s udp tx: 1641 MB/s 27845 calls/s 27845 msg/s Summary over 3.000 seconds... sum udp tx: 1671 MB/s 83072 calls (27690/s) 83072 msgs (27690/s) Tx Timestamps: 83072 received 0 errors Zerocopy acks: 83072 received 0 errors You may be interested that I reduced test lengths from 4 to 3 seconds, but I am still getting 3 reports per test. I picked up the extra report by changing 'if (tnow > treport)’ to 'if (tnow >= treport)’ > The only issue specific to GSO is that xmit_more can forego this > doorbell until the last segment. We want to complicate this logic with > a special case based on tx_flags. A process that cares should either > not use GSO, or the timestamp should be associated with the last > segment as I've been arguing so far. This is the area I was thinking of looking into. I’m not sure it will work or that it will be too messy. It may be worth a little bit of digging to see if there is anything there. That will be down the road a bu >> >> I’ll get back to you when I have tested this more thoroughly. Early results >> suggest that adding the -P poll() option has fixed it without any appreciable >> performance hit. I’ll share raw results with you, and we can make a final >> decision together. > > In the main loop? It still is peculiar that notifications appear to go > missing unless the process blocks waiting for them. Nothing in > sock_zerocopy_callback or the queueing onto the error queue should > cause drops, as far as I know. > Now that I know the issue is only in TCP, I can speculate that all bytes are being reported, but done with fewer messages. It may warrant some investigation in case there is some kind of bug. > Indeed. Ideally even run all tests, but return error if any failed, > like this recent patch > > selftests/bpf: fail test_tunnel.sh if subtests fail > https://patchwork.ozlabs.org/patch/1105221/ > > but that may be a lot of code churn and better left to a separate patch. I like it. I have it coded up, and it seems to work well. I’ll make a separate commit in the patch set so we can yank it out if you feel it is too much
On Tue, May 28, 2019 at 12:57 PM Fred Klassen <fklassen@appneta.com> wrote: > > > > > On May 28, 2019, at 8:08 AM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > > > I will push up latest patches soon. > > I did some testing and discovered that only TCP audit tests failed. They > failed much less often when enabling poll. Once in about 20 runs > still failed. Therefore I commented out the TCP audit tests. Sounds good, thanks. > You may be interested that I reduced test lengths from 4 to 3 seconds, > but I am still getting 3 reports per test. I picked up the extra report by > changing 'if (tnow > treport)’ to 'if (tnow >= treport)’ Nice! > > The only issue specific to GSO is that xmit_more can forego this > > doorbell until the last segment. We want to complicate this logic with > > a special case based on tx_flags. A process that cares should either > > not use GSO, or the timestamp should be associated with the last > > segment as I've been arguing so far. > > This is the area I was thinking of looking into. I’m not sure it will work > or that it will be too messy. It may be worth a little bit of digging to > see if there is anything there. That will be down the road a bu Sorry, I meant we [do not (!)] want to complicate this logic. And definitely don't want to read skb_shinfo where that cacheline isn't accessed already. Given xmit_more, do you still find the first segment the right one to move the timestamp tx_flags to in __udp_gso_segment? > > >> > >> I’ll get back to you when I have tested this more thoroughly. Early results > >> suggest that adding the -P poll() option has fixed it without any appreciable > >> performance hit. I’ll share raw results with you, and we can make a final > >> decision together. > > > > In the main loop? It still is peculiar that notifications appear to go > > missing unless the process blocks waiting for them. Nothing in > > sock_zerocopy_callback or the queueing onto the error queue should > > cause drops, as far as I know. > > > > Now that I know the issue is only in TCP, I can speculate that all bytes are > being reported, but done with fewer messages. It may warrant some > investigation in case there is some kind of bug. This would definitely still be a bug and should not happen. We have quite a bit of experience with TCP zerocopy and I have not run into this in practice, so I do think that it is somehow a test artifact. > > Indeed. Ideally even run all tests, but return error if any failed, > > like this recent patch > > > > selftests/bpf: fail test_tunnel.sh if subtests fail > > https://patchwork.ozlabs.org/patch/1105221/ > > > > but that may be a lot of code churn and better left to a separate patch. > > I like it. I have it coded up, and it seems to work well. I’ll make a > separate commit in the patch set so we can yank it out if you feel > it is too much Great. Yes, it sounds like an independent improvement, in which case it is easier to review as a separate patch. If you already have it, by all means send it as part of the larger patchset.
> Now that I know the issue is only in TCP, I can speculate that all bytes are > > being reported, but done with fewer messages. It may warrant some > > investigation in case there is some kind of bug. > > This would definitely still be a bug and should not happen. We have > quite a bit of experience with TCP zerocopy and I have not run into > this in practice, so I do think that it is somehow a test artifact. To be clear, I'm not saying that it is an artifact of your extensions. It's quite likely that the zerocopy benchmark was a bit flaky in that regard all along. No need to spend further time on that for this patchset.
diff --git a/tools/testing/selftests/net/udpgso_bench_tx.c b/tools/testing/selftests/net/udpgso_bench_tx.c index 56e0d890b066..9924342a0b03 100644 --- a/tools/testing/selftests/net/udpgso_bench_tx.c +++ b/tools/testing/selftests/net/udpgso_bench_tx.c @@ -62,10 +62,19 @@ static bool cfg_tcp; static uint32_t cfg_tx_ts = SOF_TIMESTAMPING_TX_SOFTWARE; static bool cfg_tx_tstamp; static uint32_t cfg_tos; +static bool cfg_audit; static bool cfg_verbose; static bool cfg_zerocopy; static int cfg_msg_nr; static uint16_t cfg_gso_size; +static unsigned long total_num_msgs; +static unsigned long total_num_sends; +static unsigned long stat_tx_ts; +static unsigned long stat_tx_ts_errors; +static unsigned long tstart; +static unsigned long tend; +static unsigned long stat_zcopies; +static unsigned long stat_zcopy_errors; static socklen_t cfg_alen; static struct sockaddr_storage cfg_dst_addr; @@ -137,8 +146,11 @@ static void flush_cmsg(struct cmsghdr *cmsg) struct my_scm_timestamping *tss; tss = (struct my_scm_timestamping *)CMSG_DATA(cmsg); - fprintf(stderr, "tx timestamp = %lu.%09lu\n", - tss->ts[i].tv_sec, tss->ts[i].tv_nsec); + if (tss->ts[i].tv_sec == 0) + stat_tx_ts_errors++; + if (cfg_verbose) + fprintf(stderr, "tx timestamp = %lu.%09lu\n", + tss->ts[i].tv_sec, tss->ts[i].tv_nsec); } else { error(1, 0, "unknown SOL_SOCKET cmsg type=%u level=%u\n", @@ -157,41 +169,52 @@ static void flush_cmsg(struct cmsghdr *cmsg) switch (err->ee_origin) { case SO_EE_ORIGIN_TIMESTAMPING: // Got a TX timestamp from error queue - fprintf(stderr, - "got SO_EE_ORIGIN_TIMESTAMPING\n"); + stat_tx_ts++; + if (cfg_verbose) + fprintf(stderr, + "got SO_EE_ORIGIN_TIMESTAMPING\n"); break; case SO_EE_ORIGIN_ICMP: case SO_EE_ORIGIN_ICMP6: - fprintf(stderr, - "received ICMP error: type=%u, code=%u\n", - err->ee_type, err->ee_code); + if (cfg_verbose) + fprintf(stderr, + "received ICMP error: type=%u, code=%u\n", + err->ee_type, err->ee_code); break; case SO_EE_ORIGIN_ZEROCOPY: { __u32 lo = err->ee_info; __u32 hi = err->ee_data; - if (hi == lo - 1) + if (hi == lo - 1) { // TX was aborted - fprintf(stderr, - "Zerocopy TX aborted: lo=%u hi=%u\n", - lo, hi); - if (hi == lo) + stat_zcopy_errors++; + if (cfg_verbose) + fprintf(stderr, + "Zerocopy TX aborted: lo=%u hi=%u\n", + lo, hi); + } else if (hi == lo) { // single ID acknowledged - fprintf(stderr, - "Zerocopy TX ack ID: %u\n", - lo); - else + stat_zcopies++; + if (cfg_verbose) + fprintf(stderr, + "Zerocopy TX ack ID: %u\n", + lo); + } else { // range of IDs acknowledged - fprintf(stderr, - "Zerocopy TX ack %u IDs %u to %u\n", - hi - lo + 1, lo, hi); + stat_zcopies += hi - lo + 1; + if (cfg_verbose) + fprintf(stderr, + "Zerocopy TX ack %u IDs %u to %u\n", + hi - lo + 1, lo, hi); + } break; } case SO_EE_ORIGIN_LOCAL: - fprintf(stderr, - "received packet with local origin: %u\n", - err->ee_origin); + if (cfg_verbose) + fprintf(stderr, + "received packet with local origin: %u\n", + err->ee_origin); break; default: error(0, 1, @@ -236,7 +259,7 @@ static void flush_errqueue_recv(int fd) if (ret == -1) error(1, errno, "errqueue"); msg.msg_flags = 0; - if (cfg_verbose) { + if (cfg_audit || cfg_verbose) { for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) @@ -245,19 +268,21 @@ static void flush_errqueue_recv(int fd) } } -static void flush_errqueue(int fd) +static void flush_errqueue(int fd, const bool do_poll) { - if (cfg_poll) { + if (do_poll) { struct pollfd fds = { 0 }; int ret; fds.fd = fd; fds.events = POLLERR; - ret = poll(&fds, 1, 1000); - if (ret == 0) - error(1, 0, "poll timeout"); - else if (ret < 0) + ret = poll(&fds, 1, 500); + if (ret == 0) { + if (cfg_verbose) + fprintf(stderr, "poll timeout\n"); + } else if (ret < 0) { error(1, errno, "poll"); + } } flush_errqueue_recv(fd); @@ -458,7 +483,7 @@ static int send_udp_segment(int fd, char *data) static void usage(const char *filepath) { error(1, 0, - "Usage: %s [-46cmHPtTuvz] [-C cpu] [-D dst ip] [-l secs] [-M messagenr] [-p port] [-q tos] [-s sendsize] [-S gsosize]", + "Usage: %s [-46acmHPtTuvz] [-C cpu] [-D dst ip] [-l secs] [-M messagenr] [-p port] [-q tos] [-s sendsize] [-S gsosize]", filepath); } @@ -467,7 +492,7 @@ static void parse_opts(int argc, char **argv) int max_len, hdrlen; int c; - while ((c = getopt(argc, argv, "46cC:D:Hl:mM:p:s:q:PS:tTuvz")) != -1) { + while ((c = getopt(argc, argv, "46acC:D:Hl:mM:p:s:q:PS:tTuvz")) != -1) { switch (c) { case '4': if (cfg_family != PF_UNSPEC) @@ -481,6 +506,9 @@ static void parse_opts(int argc, char **argv) cfg_family = PF_INET6; cfg_alen = sizeof(struct sockaddr_in6); break; + case 'a': + cfg_audit = true; + break; case 'c': cfg_cache_trash = true; break; @@ -599,6 +627,51 @@ static void set_tx_timestamping(int fd) error(1, errno, "setsockopt tx timestamping"); } +static void print_final_report(unsigned long num_msgs, unsigned long num_sends) +{ + unsigned long tdelta; + + tdelta = tend - tstart; + if (!tdelta) + return; + + fprintf(stderr, "Summary over %lu.%03lu seconds ...\n", tdelta / 1000, + tdelta % 1000); + fprintf(stderr, + "sum %s tx: %6lu MB/s %10lu calls (%lu/s) %10lu msgs (%lu/s)\n", + cfg_tcp ? "tcp" : "udp", + ((num_msgs * cfg_payload_len) >> 10) / tdelta, + num_sends, num_sends * 1000 / tdelta, + num_msgs, num_msgs * 1000 / tdelta); + if (cfg_tx_tstamp) + fprintf(stderr, + "Tx Timestamps: received: %9lu errors: %lu\n", + stat_tx_ts, stat_tx_ts_errors); + + if (cfg_zerocopy) + fprintf(stderr, + "Zerocopy acks: received: %9lu errors: %lu\n", + stat_zcopies, stat_zcopy_errors); +} + +static void print_report(unsigned long num_msgs, unsigned long num_sends, + const bool final) +{ + if (!final) + fprintf(stderr, + "%s tx: %6lu MB/s %8lu calls/s %6lu msg/s\n", + cfg_tcp ? "tcp" : "udp", + (num_msgs * cfg_payload_len) >> 20, + num_sends, num_msgs); + + if (cfg_audit) { + total_num_msgs += num_msgs; + total_num_sends += num_sends; + if (final) + print_final_report(total_num_msgs, total_num_sends); + } +} + int main(int argc, char **argv) { unsigned long num_msgs, num_sends; @@ -640,6 +713,8 @@ int main(int argc, char **argv) num_msgs = 0; num_sends = 0; tnow = gettimeofday_ms(); + tstart = tnow; + tend = tnow; tstop = tnow + cfg_runtime_ms; treport = tnow + 1000; @@ -654,19 +729,15 @@ int main(int argc, char **argv) else num_sends += send_udp(fd, buf[i]); num_msgs++; - if ((cfg_zerocopy && (num_msgs & 0xF) == 0) || cfg_tx_tstamp) - flush_errqueue(fd); + if (cfg_tx_tstamp || (cfg_zerocopy && (num_msgs & 0xF) == 0)) + flush_errqueue(fd, cfg_poll); if (cfg_msg_nr && num_msgs >= cfg_msg_nr) break; tnow = gettimeofday_ms(); if (tnow > treport) { - fprintf(stderr, - "%s tx: %6lu MB/s %8lu calls/s %6lu msg/s\n", - cfg_tcp ? "tcp" : "udp", - (num_msgs * cfg_payload_len) >> 20, - num_sends, num_msgs); + print_report(num_msgs, num_sends, false); num_msgs = 0; num_sends = 0; treport = tnow + 1000; @@ -679,10 +750,13 @@ int main(int argc, char **argv) } while (!interrupted && (cfg_runtime_ms == -1 || tnow < tstop)); if (cfg_zerocopy || cfg_tx_tstamp) - flush_errqueue(fd); + flush_errqueue(fd, true); if (close(fd)) error(1, errno, "close"); + tend = tnow; + print_report(num_msgs, num_sends, true); + return 0; }
This enhancement adds the '-a' option, which will count all CMSG messages on the error queue and print a summary report. Fixes: 3a687bef148d ("selftests: udp gso benchmark") Example: # ./udpgso_bench_tx -4uT -a -l5 -S 1472 -D 172.16.120.189 udp tx: 492 MB/s 8354 calls/s 8354 msg/s udp tx: 477 MB/s 8106 calls/s 8106 msg/s udp tx: 488 MB/s 8288 calls/s 8288 msg/s udp tx: 882 MB/s 14975 calls/s 14975 msg/s Summary over 5.000 seconds ... sum udp tx: 696 MB/s 57696 calls (11539/s) 57696 msgs (11539/s) Tx Timestamps: received: 57696 errors: 0 This can be useful in tracking loss of messages when under load. For example, adding the '-z' option results in loss of TX timestamp messages: # ./udpgso_bench_tx -4ucT -a -l5 -S 1472 -D 172.16.120.189 -p 3239 -z udp tx: 490 MB/s 8325 calls/s 8325 msg/s udp tx: 500 MB/s 8492 calls/s 8492 msg/s udp tx: 883 MB/s 14985 calls/s 14985 msg/s udp tx: 756 MB/s 12823 calls/s 12823 msg/s Summary over 5.000 seconds ... sum udp tx: 657 MB/s 54429 calls (10885/s) 54429 msgs (10885/s) Tx Timestamps: received: 34046 errors: 0 Zerocopy acks: received: 54422 errors: 0 Fixes: 3a687bef148d ("selftests: udp gso benchmark") Signed-off-by: Fred Klassen <fklassen@appneta.com> --- tools/testing/selftests/net/udpgso_bench_tx.c | 152 +++++++++++++++++++------- 1 file changed, 113 insertions(+), 39 deletions(-)