[bpf-next,v8,0/6] monitor network traffic for flaky test cases

Message ID 20240815053254.470944-1-thinker.li@gmail.com (mailing list archive)

Message

Kui-Feng Lee Aug. 15, 2024, 5:32 a.m. UTC
Capture packets in the background for flaky test cases related to
network features.

We have some flaky test cases that are difficult to debug without
knowing what the traffic looks like. With packets captured, the CI log
and the packet files may help developers fix these flaky test cases.

This patch set monitors a few test cases that have recently been
showing flaky behavior.

    lo      In  IPv4 127.0.0.1:40265 > 127.0.0.1:55907: TCP, length 68, SYN
    lo      In  IPv4 127.0.0.1:55907 > 127.0.0.1:40265: TCP, length 60, SYN, ACK
    lo      In  IPv4 127.0.0.1:40265 > 127.0.0.1:55907: TCP, length 60, ACK
    lo      In  IPv4 127.0.0.1:55907 > 127.0.0.1:40265: TCP, length 52, ACK
    lo      In  IPv4 127.0.0.1:40265 > 127.0.0.1:55907: TCP, length 52, FIN, ACK
    lo      In  IPv4 127.0.0.1:55907 > 127.0.0.1:40265: TCP, length 52, RST, ACK
    Packet file: packets-2173-86-select_reuseport:sockhash_IPv4_TCP_LOOPBACK_test_detach_bpf-test.log
    #280/87 select_reuseport/sockhash IPv4/TCP LOOPBACK test_detach_bpf:OK

The above block is the log of a test case. It shows every packet of a
connection. The captured packets are stored in the file called
packets-2173-86-select_reuseport:sockhash_IPv4_TCP_LOOPBACK_test_detach_bpf-test.log.

We have a set of high-level helpers and a test_progs option to
simplify the process of enabling the traffic monitor. netns_new() and
netns_free() are helpers used to create and delete namespaces while
also enabling the traffic monitor for the namespace based on the
patterns provided by the "-m" option of test_progs. The value of the
"-m" option is a list of patterns used to enable the traffic monitor
for a group of tests or a file containing patterns. CI can utilize
this option to enable monitoring.

traffic_monitor_start() and traffic_monitor_stop() are low-level
functions to start and stop monitoring explicitly. They give you more
control, but the high-level helpers are preferred.

The following block is an example that monitors the network traffic of
a test case in a network namespace.

    struct netns_obj *netns;
    
    ...
    netns = netns_new("test", true);
    if (!ASSERT_TRUE(netns, "netns_new"))
        goto err;
    
    ... test ...
    
    netns_free(netns);

netns_new() will create a network namespace named "test" and bring up
"lo" in the namespace. By passing "true" as the 2nd argument, it will
set the network namespace of the current process to
"test". netns_free() will destroy the namespace, and the process will
leave the "test" namespace if the struct netns_obj returned by
netns_new() is created with "true" as the 2nd argument. If the name of
the test matches the patterns given by the "-m" option, the traffic
monitor will be enabled for the "test" namespace as well.

The packet files are located in the directory "/tmp/tmon_pcap/". The
directory is intended to be compressed as a file so that developers
can download it from the CI.

This feature is enabled only if libpcap is available when building
selftests.

---

Changes from v7:

 - Replace ":" with "__" in the file names of traffic logs. ':' would
   cause an error in the upload-artifact action of GitHub.

 - Move remove_netns() to avoid a forward declaration.
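
The substitution of "__" for ":" in file names could look roughly like
the sketch below. tmon_sanitize_name() is a hypothetical helper written
for illustration; it is not the code in the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the patch): copy src to dst, writing
 * "__" in place of every ':'. Returns 0 on success, or -1 if dst
 * (dst_sz bytes, including the terminating NUL) is too small.
 */
static int tmon_sanitize_name(const char *src, char *dst, size_t dst_sz)
{
	size_t o = 0;

	for (; *src; src++) {
		if (*src == ':') {
			/* Need two bytes plus room for the NUL. */
			if (o + 2 >= dst_sz)
				return -1;
			memcpy(dst + o, "__", 2);
			o += 2;
		} else {
			/* Need one byte plus room for the NUL. */
			if (o + 1 >= dst_sz)
				return -1;
			dst[o++] = *src;
		}
	}

	if (o >= dst_sz)
		return -1;
	dst[o] = '\0';
	return 0;
}
```

For example, "select_reuseport:sockhash" would become
"select_reuseport__sockhash".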

Changes from v6:

 - Remove unnecessary memcpy for addresses.

 - Make packet messages similar to what tcpdump prints.

 - Check return value of inet_ntop().

 - Remove duplicated errno in messages.

 - Print arphdr_type for unhandled packets.

 - Set dev "lo" in make_netns().

 - Avoid stacking netns by moving traffic_monitor_start() to earlier
   position.

 - Remove the word "packet" from packet messages.

 - Replace pipe with eventfd (wake_fd) to synchronize background threads.
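
For readers unfamiliar with the eventfd approach, the following is a
minimal sketch of waking a background thread through an eventfd. The
names tmon_worker() and tmon_wake_demo() are hypothetical, and the
real monitor thread presumably also waits on the capture descriptor;
this only shows the wake-up mechanism.

```c
#include <poll.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical worker: blocks in poll() until wake_fd is readable. */
static void *tmon_worker(void *arg)
{
	int wake_fd = *(int *)arg;
	struct pollfd pfd = { .fd = wake_fd, .events = POLLIN };
	uint64_t val;

	/* A real monitor would poll its capture fd here as well. */
	if (poll(&pfd, 1, -1) != 1)
		return (void *)-1;
	/* Reading an eventfd always transfers exactly 8 bytes. */
	if (read(wake_fd, &val, sizeof(val)) != sizeof(val))
		return (void *)-1;
	return NULL;
}

/* Hypothetical demo: start the worker, wake it, and join it.
 * Returns 0 if the worker saw the wake-up, -1 otherwise.
 */
static int tmon_wake_demo(void)
{
	pthread_t thread;
	uint64_t one = 1;
	void *ret = NULL;
	int wake_fd, err = -1;

	wake_fd = eventfd(0, 0);
	if (wake_fd < 0)
		return -1;

	if (pthread_create(&thread, NULL, tmon_worker, &wake_fd))
		goto out;

	/* Adding a non-zero value to the counter makes the fd readable. */
	if (write(wake_fd, &one, sizeof(one)) != sizeof(one))
		pthread_cancel(thread);	/* poll() is a cancellation point */

	if (pthread_join(thread, &ret) == 0 && ret == NULL)
		err = 0;
out:
	close(wake_fd);
	return err;
}
```

Compared with a pipe, an eventfd needs only one descriptor and carries
a counter instead of a byte stream, which is enough for a simple
"please stop" signal.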

Changes from v5:

 - Remove "-m" completely if traffic monitor is not enabled.

Changes from v4:

 - Use pkg-config to detect libpcap, and enable traffic monitor if
   there is libpcap.

 - Move traffic monitor functions back to network_helpers.c, and pass
   extra parameters to traffic_monitor_start().

 - Use flockfile() & funlockfile() to avoid log interleaving.

 - Show "In", "Out", "M" ... for captured packets.

 - Print a warning message if the user passes "-m" when libpcap is not
   available.

 - Bring up dev lo in netns_new().
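
The flockfile()/funlockfile() item above can be illustrated with a
small sketch. tmon_log_packet() and its column widths are assumptions
made for this example, not the function in the patch; the point is
that several stdio calls inside one lock are emitted as a unit even
when other threads write to the same stream.

```c
#include <stdio.h>

/* Hypothetical helper (not from the patch): print one packet line
 * using two fprintf() calls without letting other threads interleave
 * their output on the same stream. Returns the number of characters
 * written, or -1 on error.
 */
static int tmon_log_packet(FILE *out, const char *ifname, const char *dir,
			   const char *summary)
{
	int n1, n2;

	/* Hold the stream lock across both calls. */
	flockfile(out);
	n1 = fprintf(out, "%-7s %-3s ", ifname, dir);
	n2 = fprintf(out, "%s\n", summary);
	funlockfile(out);

	return (n1 < 0 || n2 < 0) ? -1 : n1 + n2;
}
```

Each individual stdio call is already thread-safe; the explicit lock
is only needed because one log line is built from more than one call.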

Changes from v3:

 - Rebase to the latest tip of bpf-next/for-next

 - Change verb back to C string.

Changes from v2:

 - Include pcap header files conditionally.

 - Move the implementation of traffic monitor to test_progs.c.

 - Include test name and namespace as a part of names of packet files.

 - Parse and print ICMP(v4|v6) packets.

 - Add netns_new() and netns_free() to create and delete network
   namespaces.

   - Make tc_redirect, sockmap_listen and select_reuseport test in a
     network namespace.

 - Add the "-m" option to test_progs to enable traffic monitor for the
   tests matching the pattern. CI may use this option to enable
   monitoring for a given set of tests.

Changes from v1:

 - Move to calling libpcap directly to capture packets in a background
   thread.

 - Print parsed packet information for TCP and UDP packets.

v1: https://lore.kernel.org/all/20240713055552.2482367-5-thinker.li@gmail.com/
v2: https://lore.kernel.org/all/20240723182439.1434795-1-thinker.li@gmail.com/
v3: https://lore.kernel.org/all/20240730002745.1484204-1-thinker.li@gmail.com/
v4: https://lore.kernel.org/all/20240731193140.758210-1-thinker.li@gmail.com/
v5: https://lore.kernel.org/all/20240806221243.1806879-1-thinker.li@gmail.com/
v6: https://lore.kernel.org/all/20240807183149.764711-1-thinker.li@gmail.com/
v7: https://lore.kernel.org/all/20240810023534.2458227-2-thinker.li@gmail.com/

Kui-Feng Lee (6):
  selftests/bpf: Add traffic monitor functions.
  selftests/bpf: Add the traffic monitor option to test_progs.
  selftests/bpf: netns_new() and netns_free() helpers.
  selftests/bpf: Monitor traffic for tc_redirect.
  selftests/bpf: Monitor traffic for sockmap_listen.
  selftests/bpf: Monitor traffic for select_reuseport.

 tools/testing/selftests/bpf/Makefile          |   4 +
 tools/testing/selftests/bpf/network_helpers.c | 504 ++++++++++++++++++
 tools/testing/selftests/bpf/network_helpers.h |  20 +
 .../bpf/prog_tests/select_reuseport.c         |  37 +-
 .../selftests/bpf/prog_tests/sockmap_listen.c |   8 +
 .../selftests/bpf/prog_tests/tc_redirect.c    |  33 +-
 tools/testing/selftests/bpf/test_progs.c      | 174 +++++-
 tools/testing/selftests/bpf/test_progs.h      |   6 +
 8 files changed, 731 insertions(+), 55 deletions(-)

Comments

Kui-Feng Lee Aug. 15, 2024, 5 p.m. UTC | #1
The following link [1] is the collection of test results from a PR
that enables the traffic monitor on CI. Since "-m" is a new option, we
cannot enable it on CI before this patchset has landed, or it will
break all tests.

At the very end of the "Run selftests" section, you will see lines like

'''
Artifact tmon-logs-x86_64-gcc-test_progs has been successfully uploaded! 
Final size is 125359 bytes. Artifact ID is 1816543551
Artifact download URL: 
https://github.com/kernel-patches/vmtest/actions/runs/10407067642/artifacts/1816543551
'''

Developers can download the packet files by following the link and
analyze the packets with tcpdump or Wireshark.

[1] https://github.com/kernel-patches/vmtest/actions/runs/10407067642/job/28821826062?pr=280

On 8/14/24 22:32, Kui-Feng Lee wrote:
> [...]

patchwork-bot+netdevbpf@kernel.org Aug. 15, 2024, 7:50 p.m. UTC | #2
Hello:

This series was applied to bpf/bpf-next.git (master)
by Martin KaFai Lau <martin.lau@kernel.org>:

On Wed, 14 Aug 2024 22:32:48 -0700 you wrote:
> Capture packets in the background for flaky test cases related to
> network features.
> 
> We have some flaky test cases that are difficult to debug without
> knowing what the traffic looks like. Capturing packets, the CI log and
> packet files may help developers to fix these flaky test cases.
> 
> [...]

Here is the summary with links:
  - [bpf-next,v8,1/6] selftests/bpf: Add traffic monitor functions.
    https://git.kernel.org/bpf/bpf-next/c/f52403b6bfea
  - [bpf-next,v8,2/6] selftests/bpf: Add the traffic monitor option to test_progs.
    https://git.kernel.org/bpf/bpf-next/c/f5281aacec85
  - [bpf-next,v8,3/6] selftests/bpf: netns_new() and netns_free() helpers.
    https://git.kernel.org/bpf/bpf-next/c/1e115a58be0f
  - [bpf-next,v8,4/6] selftests/bpf: Monitor traffic for tc_redirect.
    https://git.kernel.org/bpf/bpf-next/c/52a5b8a30fa8
  - [bpf-next,v8,5/6] selftests/bpf: Monitor traffic for sockmap_listen.
    https://git.kernel.org/bpf/bpf-next/c/b407b52b1850
  - [bpf-next,v8,6/6] selftests/bpf: Monitor traffic for select_reuseport.
    https://git.kernel.org/bpf/bpf-next/c/69354085975a

You are awesome, thank you!