Message ID | 20240713055552.2482367-1-thinker.li@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | monitor network traffic for flaky test cases | expand |
It fails on CI. Looks like we don't have tcpdump installed on CI! On 7/12/24 22:55, Kui-Feng Lee wrote: > Run tcpdump in the background for flaky test cases related to network > features. > > We have some flaky test cases that are difficult to debug without > knowing what the traffic looks like. With the log printed by tcpdump, > the CI log may help developers to fix these flaky test cases. > > This patch set monitors a few test cases. Recently, they have been > showing flaky behavior. If these test cases fail, they will report a > traffic log. > > At the beginning and the end of a traffic log, there are additional > traffic packets used for synchronization between the test cases and > the tcpdump process. These packets consist of UDP packets sent to > 127.0.0.241:4321 and ICMP unreachable messages for this > destination. For instance, the first two and the last two packets > serve as synchronization packets in the following log. > > 15:04:08.586368 lo In IP 127.0.0.1.58904 > 127.0.0.241.4321: UDP, length 5 > 15:04:08.586435 lo In IP 127.0.0.241 > 127.0.0.1: ICMP 127.0.0.241 udp port 4321 unreachable, length 41 > 15:04:08.704526 lo In IP6 ::1.52053 > ::1.45070: UDP, length 8 > 15:04:08.722785 lo In IP 127.0.0.1.51863 > 127.0.0.241.4321: UDP, length 15 > 15:04:08.722856 lo In IP 127.0.0.241 > 127.0.0.1: ICMP 127.0.0.241 udp port 4321 unreachable, length 51 > > The IP address 127.0.0.241 is used for synchronization, so the > loopback interface "lo" should be up in the network namespace where > the test is being conducted. While not ideal, this should suffice for > testing purposes. > > The following block is an example that monitors the network traffic of > a test case. This test is running in the network namespace > "testns". You can pass NULL to traffic_monitor_start() if the entire > test, from traffic_monitor_start() to traffic_monitor_stop(), is > running in the same namespace. > > struct tmonitor_ctx *tmon; > > ... > tmon = traffic_monitor_start("testns"); > ASSERT_TRUE(tmon, "traffic_monitor_start"); > > ... test ... > > /* Report the traffic log only if there is one or more errors. */ > if (env.subtest_state->error_cnt) > traffic_monitor_report(tmon); > traffic_monitor_stop(tmon); > > traffic_monitor_start() may fail, but we just ignore it since the > failure doesn't affect the following test. This tracking feature > takes another 60ms for each test with qemu on my test environment. > > Kui-Feng Lee (4): > selftests/bpf: Add traffic monitor functions. > selftests/bpf: Monitor traffic for tc_redirect/tc_redirect_dtime. > selftests/bpf: Monitor traffic for sockmap_listen. > selftests/bpf: Monitor traffic for select_reuseport. > > tools/testing/selftests/bpf/network_helpers.c | 244 ++++++++++++++++++ > tools/testing/selftests/bpf/network_helpers.h | 5 + > .../bpf/prog_tests/select_reuseport.c | 9 + > .../selftests/bpf/prog_tests/sockmap_listen.c | 10 + > .../selftests/bpf/prog_tests/tc_redirect.c | 7 + > 5 files changed, 275 insertions(+) >
On 07/12, Kui-Feng Lee wrote: > Run tcpdump in the background for flaky test cases related to network > features. Have you considered linking against libpcap instead of shelling out to tcpdump? As long as we have this lib installed on the runners (likely?) that should be a bit cleaner than doing tcpdump.. WDYT?
On 7/15/24 14:33, Stanislav Fomichev wrote: > On 07/12, Kui-Feng Lee wrote: >> Run tcpdump in the background for flaky test cases related to network >> features. > > Have you considered linking against libpcap instead of shelling out > to tcpdump? As long as we have this lib installed on the runners > (likely?) that should be a bit cleaner than doing tcpdump.. WDYT? I just checked the script building the root image for vmtest. [1] It doesn't install libpcap. If our approach is to capture the packets in a file, and let developers download the file, it would be a simple and straight forward solution. If we want a log in text, it would be more complicated to parse packets. Martin & Stanislay, WDYT about capture packets in a file and using libpcap directly? Developers can download the file and parse it with tcpdump locally. [1] https://github.com/libbpf/ci/blob/main/rootfs/mkrootfs_debian.sh
On 7/15/24 3:07 PM, Kui-Feng Lee wrote: > > > On 7/15/24 14:33, Stanislav Fomichev wrote: >> On 07/12, Kui-Feng Lee wrote: >>> Run tcpdump in the background for flaky test cases related to network >>> features. >> >> Have you considered linking against libpcap instead of shelling out >> to tcpdump? As long as we have this lib installed on the runners >> (likely?) that should be a bit cleaner than doing tcpdump.. WDYT? > > I just checked the script building the root image for vmtest. [1] > It doesn't install libpcap. > > If our approach is to capture the packets in a file, and let developers > download the file, it would be a simple and straight forward solution. > If we want a log in text, it would be more complicated to parse > packets. > > Martin & Stanislay, > > WDYT about capture packets in a file and using libpcap directly? > Developers can download the file and parse it with tcpdump locally. thinking out loud... Re: libpcap (instead of tcpdump) part. I am not very experienced in libpcap. I don't have a strong preference. I do hope patch 1 could be more straight forward that no need to use loops and artificial udp packets to ensure the tcpdump is fully ready to capture. I assume using libpcap can make this sync part easier/cleaner (pthread_cond?) and not too much code is needed to use libpcap? Re: dump to file and download by developer. If I read patch 1 correctly, it only dumps everything at the end of the test (by calling traffic_monitor_report). imo, it lost the chronological ordering with other ASSERT_* logs and does not make a big difference (vs downloading as a file). The developer needs to go through another exercise to figure out (e.g.) a captured packet may be related to a ASSERT_* failure by connecting the timestamp between the ASSERT_* log and the captured packet (afaik, there is a timestamp in the CI raw log). Ideally, the packet can be logged to stderr/out as soon as it is captured such that the developer can still sort of relate the packet with the other ASSERT_*() log around it.
On 7/15/24 16:56, Martin KaFai Lau wrote: > On 7/15/24 3:07 PM, Kui-Feng Lee wrote: >> >> >> On 7/15/24 14:33, Stanislav Fomichev wrote: >>> On 07/12, Kui-Feng Lee wrote: >>>> Run tcpdump in the background for flaky test cases related to network >>>> features. >>> >>> Have you considered linking against libpcap instead of shelling out >>> to tcpdump? As long as we have this lib installed on the runners >>> (likely?) that should be a bit cleaner than doing tcpdump.. WDYT? >> >> I just checked the script building the root image for vmtest. [1] >> It doesn't install libpcap. >> >> If our approach is to capture the packets in a file, and let developers >> download the file, it would be a simple and straight forward solution. >> If we want a log in text, it would be more complicated to parse >> packets. >> >> Martin & Stanislay, >> >> WDYT about capture packets in a file and using libpcap directly? >> Developers can download the file and parse it with tcpdump locally. > > thinking out loud... > > Re: libpcap (instead of tcpdump) part. I am not very experienced in > libpcap. I don't have a strong preference. I do hope patch 1 could be > more straight forward that no need to use loops and artificial udp > packets to ensure the tcpdump is fully ready to capture. I assume using > libpcap can make this sync part easier/cleaner (pthread_cond?) and not > too much code is needed to use libpcap? Yes, it would be easier and cleaner if we don't parse the payload of packets. > > Re: dump to file and download by developer. If I read patch 1 correctly, > it only dumps everything at the end of the test (by calling > traffic_monitor_report). imo, it lost the chronological ordering with > other ASSERT_* logs and does not make a big difference (vs downloading > as a file). > > The developer needs to go through another exercise to figure out (e.g.) > a captured packet may be related to a ASSERT_* failure by connecting the > timestamp between the ASSERT_* log and the captured packet (afaik, there > is a timestamp in the CI raw log). Ideally, the packet can be logged to > stderr/out as soon as it is captured such that the developer can still > sort of relate the packet with the other ASSERT_*() log around it. > We can print an ordered number for each received packets to stderr asap and use libpcap without parsing packets. Developers use tcpdump or wireshark to parse packets. Or, we can just run tcpdump on background and let it write to stderr directly. This is convenient for developers. However, we still need to wait for tcpdump ready. Another hybrid solution is to have a libpcap thread to capture packets and feed packets to tcpdump through a pipe to parse packets. With this, we don't need to wait for tcpdump anymore. However, I am not sure how long tcpdump gets ready. It may make all these efforts of keeping order useless. But, I can try it.
On 07/15, Kui-Feng Lee wrote: > > > On 7/15/24 16:56, Martin KaFai Lau wrote: > > On 7/15/24 3:07 PM, Kui-Feng Lee wrote: > > > > > > > > > On 7/15/24 14:33, Stanislav Fomichev wrote: > > > > On 07/12, Kui-Feng Lee wrote: > > > > > Run tcpdump in the background for flaky test cases related to network > > > > > features. > > > > > > > > Have you considered linking against libpcap instead of shelling out > > > > to tcpdump? As long as we have this lib installed on the runners > > > > (likely?) that should be a bit cleaner than doing tcpdump.. WDYT? > > > > > > I just checked the script building the root image for vmtest. [1] > > > It doesn't install libpcap. > > > > > > If our approach is to capture the packets in a file, and let developers > > > download the file, it would be a simple and straight forward solution. > > > If we want a log in text, it would be more complicated to parse > > > packets. > > > > > > Martin & Stanislay, > > > > > > WDYT about capture packets in a file and using libpcap directly? > > > Developers can download the file and parse it with tcpdump locally. > > > > thinking out loud... > > > > Re: libpcap (instead of tcpdump) part. I am not very experienced in > > libpcap. I don't have a strong preference. I do hope patch 1 could be > > more straight forward that no need to use loops and artificial udp > > packets to ensure the tcpdump is fully ready to capture. I assume using > > libpcap can make this sync part easier/cleaner (pthread_cond?) and not > > too much code is needed to use libpcap? > > Yes, it would be easier and cleaner if we don't parse the payload > of packets. Yeah, same, no strong preference; was just wondering whether you've made a conscious choice of not using it because it definitely makes things a bit easier wrt to the part where you try to sync with tcpdump.. Also +1 on saving the raw file (via libpcap or tcpdump -w).
On 7/15/24 20:25, Stanislav Fomichev wrote: > On 07/15, Kui-Feng Lee wrote: >> >> >> On 7/15/24 16:56, Martin KaFai Lau wrote: >>> On 7/15/24 3:07 PM, Kui-Feng Lee wrote: >>>> >>>> >>>> On 7/15/24 14:33, Stanislav Fomichev wrote: >>>>> On 07/12, Kui-Feng Lee wrote: >>>>>> Run tcpdump in the background for flaky test cases related to network >>>>>> features. >>>>> >>>>> Have you considered linking against libpcap instead of shelling out >>>>> to tcpdump? As long as we have this lib installed on the runners >>>>> (likely?) that should be a bit cleaner than doing tcpdump.. WDYT? >>>> >>>> I just checked the script building the root image for vmtest. [1] >>>> It doesn't install libpcap. >>>> >>>> If our approach is to capture the packets in a file, and let developers >>>> download the file, it would be a simple and straight forward solution. >>>> If we want a log in text, it would be more complicated to parse >>>> packets. >>>> >>>> Martin & Stanislay, >>>> >>>> WDYT about capture packets in a file and using libpcap directly? >>>> Developers can download the file and parse it with tcpdump locally. >>> >>> thinking out loud... >>> >>> Re: libpcap (instead of tcpdump) part. I am not very experienced in >>> libpcap. I don't have a strong preference. I do hope patch 1 could be >>> more straight forward that no need to use loops and artificial udp >>> packets to ensure the tcpdump is fully ready to capture. I assume using >>> libpcap can make this sync part easier/cleaner (pthread_cond?) and not >>> too much code is needed to use libpcap? >> >> Yes, it would be easier and cleaner if we don't parse the payload >> of packets. > > Yeah, same, no strong preference; was just wondering whether you've > made a conscious choice of not using it because it definitely makes things > a bit easier wrt to the part where you try to sync with tcpdump.. > > Also +1 on saving the raw file (via libpcap or tcpdump -w). I agree to do it with libpcap. And, will print a number to stdout/stderr as an index to the packets in the raw file. However, I need to figure out how to make the raw file available to developers at first.