Message ID | 20220816222920.1952936-1-vladimir.oltean@nxp.com (mailing list archive) |
---|---|
Headers | show |
Series | 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool | expand |
On Wed, 17 Aug 2022 01:29:13 +0300 Vladimir Oltean wrote: > What also exists but is not exported here are PAUSE stats for the > pMAC. Since those are also protocol-specific stats, I'm not sure how > to mix the 2 (MAC Merge layer + PAUSE). Maybe just extend > ETHTOOL_A_PAUSE_STAT_TX_FRAMES and ETHTOOL_A_PAUSE_STAT_RX_FRAMES with > the pMAC variants? I have a couple of general questions. The mm and fp are related but fp can be implemented without mm or they must always come together? (I'd still split patch 2 for ease of review, tho.) When we have separate set of stats for pMAC the normal stats are sum of all traffic, right? So normal - pMAC == eMAC, everything that's not preemptible is express? Did you consider adding an attribute for switching between MAC and pMAC for stats rather than duplicating things?
On Tue, Aug 16, 2022 at 08:34:17PM -0700, Jakub Kicinski wrote: > On Wed, 17 Aug 2022 01:29:13 +0300 Vladimir Oltean wrote: > > What also exists but is not exported here are PAUSE stats for the > > pMAC. Since those are also protocol-specific stats, I'm not sure how > > to mix the 2 (MAC Merge layer + PAUSE). Maybe just extend > > ETHTOOL_A_PAUSE_STAT_TX_FRAMES and ETHTOOL_A_PAUSE_STAT_RX_FRAMES with > > the pMAC variants? > > I have a couple of general questions. The mm and fp are related but fp > can be implemented without mm or they must always come together? (I'd > still split patch 2 for ease of review, tho.) FP cannot be implemented without MM and MM makes limited (but some) sense without FP. Since FP just decides which packets you TX via the pMAC and which via the eMAC, you can configure just the MM layer such that you interoperate with a FP-capable switch, but you don't actually generate any preemptable traffic yourself. In fact, the reasons why I decided to split these are: - because they are part of different specs, which call for different managed objects - because in an SoC where IPs are mixed and matched from different vendors, it makes perfect sense to me that the FP portion (more related to the queue/classification system) is provided by one vendor, and the MM portion is provided by another. In the future, we may find enough commonalities to justify introducing the concept of a dedicated MAC driver, independent/reusable between Ethernet controller ("net_device") drivers. We have this today already with the PCS layer in phylink. So if there is a physical split between the layers, I think keeping a split in terms of callbacks makes some sense too. > When we have separate set of stats for pMAC the normal stats are sum of > all traffic, right? So normal - pMAC == eMAC, everything that's not > preemptible is express? Actually not quite, or at least not for the LS1028A ENETC and Felix switch. The normal counters report just what the eMAC sees, and the pMAC counters just what the pMAC sees. After all, only the eMAC was enabled up until now. Nobody does the addition currently. > Did you consider adding an attribute for switching between MAC and pMAC > for stats rather than duplicating things? No. Could you expand on that idea a little? Add a netlink attribute where, and this helps reduce duplication where, and how?
On Wed, 17 Aug 2022 11:50:09 +0000 Vladimir Oltean wrote: > On Tue, Aug 16, 2022 at 08:34:17PM -0700, Jakub Kicinski wrote: > > I have a couple of general questions. The mm and fp are related but fp > > can be implemented without mm or they must always come together? (I'd > > still split patch 2 for ease of review, tho.) > > FP cannot be implemented without MM and MM makes limited (but some) > sense without FP. Since FP just decides which packets you TX via the > pMAC and which via the eMAC, you can configure just the MM layer such > that you interoperate with a FP-capable switch, but you don't actually > generate any preemptable traffic yourself. > > In fact, the reasons why I decided to split these are: > - because they are part of different specs, which call for different > managed objects > - because in an SoC where IPs are mixed and matched from different > vendors, it makes perfect sense to me that the FP portion (more > related to the queue/classification system) is provided by one vendor, > and the MM portion is provided by another. In the future, we may find > enough commonalities to justify introducing the concept of a dedicated > MAC driver, independent/reusable between Ethernet controller ("net_device") > drivers. We have this today already with the PCS layer in phylink. > So if there is a physical split between the layers, I think keeping a > split in terms of callbacks makes some sense too. Hah, interesting. I was under the impression that FP can be done without MM, if frame is preempted it just gets scrambled (bad FCS gets injected or a special symbol) and dropped by the receiver. I had it completely backwards, then. > > When we have separate set of stats for pMAC the normal stats are sum of > > all traffic, right? So normal - pMAC == eMAC, everything that's not > > preemptible is express? > > Actually not quite, or at least not for the LS1028A ENETC and Felix switch. > The normal counters report just what the eMAC sees, and the pMAC counters > just what the pMAC sees. After all, only the eMAC was enabled up until now. > Nobody does the addition currently. I see. And the netdev stats are the total? > > Did you consider adding an attribute for switching between MAC and pMAC > > for stats rather than duplicating things? > > No. Could you expand on that idea a little? Add a netlink attribute > where, and this helps reduce duplication where, and how? Add a attribute to ETHTOOL_MSG_STATS_GET, let's call it ETHTOOL_A_STATS_EXPRESS, a flag. Plumb thru to all the stats callback an extra argument (a structure for future extensibility) with a bool pMAC; Add a capability field to ethtool_ops to announce that driver will pay attention to the bool pMAC / has support. We can then use the existing callbacks. Am I making sense?
Hi Vladimir, Vladimir Oltean <vladimir.oltean@nxp.com> writes: > Vinicius' progress on upstreaming frame preemption support for Intel I226 > seemed to stall, so I decided to give it a go using my own view as well. > https://patchwork.kernel.org/project/netdevbpf/cover/20220520011538.1098888-1-vinicius.gomes@intel.com/ I was stuck with some internal projects (and some other things) for some time which left me with very little energy/time to follow up with that series. Just let's say that your timing was very good, a few more days I would have sent another version. I am kind of glad that you decided to take this torch. > > Please don't take this patch set too seriously; I spent only a few > days working on this, and I'm only posting it as RFC to inform others > I've started doing this, before I spend too much time to risk colliding > with someone else's active work. > > Compared to Vinicius' previous patches, this is basically a new > implementation, with the following differences: > > - The MAC Merge (mm) and Frame Preemption (fp) settings are split like > they were in Vinicius' proposal to have fp as part of tc-taprio. But > in my proposal, the fp portion is still part of ethtool, like mm. > I have some questions/comments about this part. Mostly related to "prios" in this context. Will make them in the UAPI patches. > - We have statistics, actually 2 kinds. First we have MAC merge layer > stats, which are exposed as protocol-specific stats: > > ethtool --json --include-statistics --show-mm eno2 > [ { > "ifname": "eno2", > "verify-status": "SUCCEEDED", > "verify-time": 10, > "supported": true, > "enabled": true, > "active": true, > "add-frag-size": 0, > "statistics": { > "MACMergeFrameAssErrorCount": 0, > "MACMergeFrameSmdErrorCount": 0, > "MACMergeFrameAssOkCount": 0, > "MACMergeFragCountRx": 0, > "MACMergeFragCountTx": 0, > "MACMergeHoldCount": 0 > } > } ] > > and then we also have the usual standardized statistics counters, but > replicated for the pMAC: > > ethtool -S eno0 --groups pmac-rmon > Standard stats for eno0: > pmac-rmon-etherStatsUndersizePkts: 0 > pmac-rmon-etherStatsOversizePkts: 0 > pmac-rmon-etherStatsFragments: 0 > pmac-rmon-etherStatsJabbers: 0 > rx-pmac-rmon-etherStatsPkts64to64Octets: 0 > rx-pmac-rmon-etherStatsPkts65to127Octets: 0 > rx-pmac-rmon-etherStatsPkts128to255Octets: 0 > rx-pmac-rmon-etherStatsPkts256to511Octets: 0 > rx-pmac-rmon-etherStatsPkts512to1023Octets: 0 > rx-pmac-rmon-etherStatsPkts1024to1522Octets: 0 > rx-pmac-rmon-etherStatsPkts1523to9000Octets: 0 > tx-pmac-rmon-etherStatsPkts64to64Octets: 0 > tx-pmac-rmon-etherStatsPkts65to127Octets: 0 > tx-pmac-rmon-etherStatsPkts128to255Octets: 0 > tx-pmac-rmon-etherStatsPkts256to511Octets: 0 > tx-pmac-rmon-etherStatsPkts512to1023Octets: 0 > tx-pmac-rmon-etherStatsPkts1024to1522Octets: 0 > tx-pmac-rmon-etherStatsPkts1523to9000Octets: 0 > > ethtool -S eno0 --groups eth-pmac-mac > Standard stats for eno0: > eth-pmac-mac-FramesTransmittedOK: 0 > eth-pmac-mac-SingleCollisionFrames: 0 > eth-pmac-mac-MultipleCollisionFrames: 0 > eth-pmac-mac-FramesReceivedOK: 0 > eth-pmac-mac-FrameCheckSequenceErrors: 0 > eth-pmac-mac-AlignmentErrors: 0 > eth-pmac-mac-OctetsTransmittedOK: 0 > eth-pmac-mac-FramesWithDeferredXmissions: 0 > eth-pmac-mac-LateCollisions: 0 > eth-pmac-mac-FramesAbortedDueToXSColls: 0 > eth-pmac-mac-FramesLostDueToIntMACXmitError: 0 > eth-pmac-mac-CarrierSenseErrors: 0 > eth-pmac-mac-OctetsReceivedOK: 0 > eth-pmac-mac-FramesLostDueToIntMACRcvError: 0 > eth-pmac-mac-MulticastFramesXmittedOK: 0 > eth-pmac-mac-BroadcastFramesXmittedOK: 0 > eth-pmac-mac-MulticastFramesReceivedOK: 0 > eth-pmac-mac-BroadcastFramesReceivedOK: 0 > > ethtool -S eno0 --groups eth-pmac-ctrl > Standard stats for eno0: > eth-pmac-ctrl-MACControlFramesTransmitted: 0 > eth-pmac-ctrl-MACControlFramesReceived: 0 > > What also exists but is not exported here are PAUSE stats for the > pMAC. Since those are also protocol-specific stats, I'm not sure how > to mix the 2 (MAC Merge layer + PAUSE). Maybe just extend > ETHTOOL_A_PAUSE_STAT_TX_FRAMES and ETHTOOL_A_PAUSE_STAT_RX_FRAMES with > the pMAC variants? > > - Finally, the hardware I'm working with (here, the test vehicle is the > NXP ENETC from LS1028A, although I have patches for the Felix switch > as well, but those need a bit of a revolution in the driver to go in > first). This hardware is not without its flaws, but at least allows me > to concentrate on the UAPI portions for this series. > > I also have a kselftest written, but it's for the Felix switch (covers > forwarding latency) and so it's not included here. > > Are there objections in exposing the UAPI for this new feature in this way? > I really liked the statistics part, even though the hardware I am working right now with doesn't provide all of them. > Also, there is no documentation associated with this patch set, other > than the code. Life is too short to write documentation for an RFC, sorry. > I may get kdoc related kernel bot warnings because I copy-pasted ethtool > structure definitions from here and there, but I didn't fill in the > descriptions of all their fields. All those fields are as truthful to > the standards as possible rather than my own variables or names, so > please refer to those specs for now. > > Vladimir Oltean (7): > net: ethtool: netlink: introduce ethnl_update_bool() > net: ethtool: add support for Frame Preemption and MAC Merge layer > net: ethtool: stats: make stats_put_stats() take input from multiple > sources > net: ethtool: stats: replicate standardized counters for the pMAC > net: enetc: parameterize port MAC stats to also cover the pMAC > net: enetc: expose some standardized ethtool counters > net: enetc: add support for Frame Preemption and MAC Merge layer > > .../ethernet/freescale/enetc/enetc_ethtool.c | 399 +++++++++++++++--- > .../net/ethernet/freescale/enetc/enetc_hw.h | 132 +++--- > include/linux/ethtool.h | 68 +++ > include/uapi/linux/ethtool.h | 15 + > include/uapi/linux/ethtool_netlink.h | 86 ++++ > net/ethtool/Makefile | 3 +- > net/ethtool/fp.c | 295 +++++++++++++ > net/ethtool/mm.c | 228 ++++++++++ > net/ethtool/netlink.c | 38 ++ > net/ethtool/netlink.h | 34 ++ > net/ethtool/stats.c | 218 +++++++--- > 11 files changed, 1338 insertions(+), 178 deletions(-) > create mode 100644 net/ethtool/fp.c > create mode 100644 net/ethtool/mm.c > > -- > 2.34.1 > Cheers,
On Wed Aug 17 2022, Vladimir Oltean wrote: > Vinicius' progress on upstreaming frame preemption support for Intel I226 > seemed to stall, so I decided to give it a go using my own view as well. > https://patchwork.kernel.org/project/netdevbpf/cover/20220520011538.1098888-1-vinicius.gomes@intel.com/ Great to see progress on FPE :-). [snip] > - Finally, the hardware I'm working with (here, the test vehicle is the > NXP ENETC from LS1028A, although I have patches for the Felix switch > as well, but those need a bit of a revolution in the driver to go in > first). This hardware is not without its flaws, but at least allows me > to concentrate on the UAPI portions for this series. > > I also have a kselftest written, but it's for the Felix switch (covers > forwarding latency) and so it's not included here. What kind of selftest did you implement? So far I've been doing this: Using a cyclic real time application to create high priority frames and running iperf3 in parallel to simulate low priority traffic constantly. Afterwards, checking the NIC statistics for fragments and so on. Also checking the latency of the RT frames with FPE on/off. BTW, if you guys need help with testing of patches, i do have access to i225 and stmmacs which both support FPE. Also the Hirschmann switches should support it. Thanks, Kurt
Hi Kurt, On Fri, Aug 19, 2022 at 10:16:20AM +0200, Kurt Kanzenbach wrote: > On Wed Aug 17 2022, Vladimir Oltean wrote: > > Vinicius' progress on upstreaming frame preemption support for Intel I226 > > seemed to stall, so I decided to give it a go using my own view as well. > > https://patchwork.kernel.org/project/netdevbpf/cover/20220520011538.1098888-1-vinicius.gomes@intel.com/ > > Great to see progress on FPE :-). Let's hope it lasts ;) > > - Finally, the hardware I'm working with (here, the test vehicle is the > > NXP ENETC from LS1028A, although I have patches for the Felix switch > > as well, but those need a bit of a revolution in the driver to go in > > first). This hardware is not without its flaws, but at least allows me > > to concentrate on the UAPI portions for this series. > > > > I also have a kselftest written, but it's for the Felix switch (covers > > forwarding latency) and so it's not included here. > > What kind of selftest did you implement? So far I've been doing this: > Using a cyclic real time application to create high priority frames and > running iperf3 in parallel to simulate low priority traffic > constantly. Afterwards, checking the NIC statistics for fragments and so > on. Also checking the latency of the RT frames with FPE on/off. > > BTW, if you guys need help with testing of patches, i do have access to > i225 and stmmacs which both support FPE. Also the Hirschmann switches > should support it. Blah, I didn't want to spoil the surprise just yet. I am orchestrating 2 isochron senders at specific times, one of PT traffic and one of ET. There are actually 2 variants of this: one is for endpoint FP and the other is for bridge FP. I only had time to convert the bridge FP to kselftest format; not the endpoint one (for enetc). In the endpoint case, interference is created on the sender interface. I compare HW TX timestamps to the expected TX times to calculate how long it took until PT was preempted. I repeat the test millions of times until I can plot the latencies having the PT <-> ET base time offset on the X axis. It looks very cool. The bridge case is similar, except for the fact that interference is created on a bridge port going to a common receiver of 2 isochron senders. What I plot is the path delay, and again, this shows actual preemption times with a nanosecond resolution. Here's a small snapshot of the main script. Not shown are the supporting scripts for this: #!/bin/bash # SPDX-License-Identifier: GPL-2.0 # Copyright 2022 NXP # # Selftest for Frame Preemption on a switch. Creates controlled packet # collisions between a priority configured as preemptable traffic (PT) and a # priority configured as express traffic (ET). # # +-----------------------------+ # | fp.sh | # | br0 | # | +--------+--------+ | # | | | | | # | $swp1 $swp2 $swp3 | # +-----------------------------+ # +--------------------+ ^ ^ | +----------------+ # | | | | | | | # | fp_node_et_send.sh |--------+ | +------>| fp_node_rcv.sh | # | $h1 | | | $h3 | # +--------------------+ | +----------------+ # | # +--------------------+ | # | | | # | fp_node_pt_send.sh |-----------------+ # | $h2 | # +--------------------+ # # Normally, a packet collision plot with no frame preemption looks as follows. # The first packet starts being transmitted by the MAC, and the second packet # sees a latency proportional to how much transmission time of the previous # packet there is left. # # ^ # ET | # frame | ---. # path | / --. # delay | | --. # | / --. # | | --. # | / --. # | | ---. # |-----/ ----------------------- # | # ---+-----------------------------------------------------------> # | ET base time offset relative to PT # # Depending on the preemption point (where the express packet hits the # preemptable packet that is undergoing transmission), the express packet will # have a higher or lower path delay (latency). # # The packet collision can be broken into 3 distinct intervals: # (a) when the packets start interfering, but not enough of the PT packet has # been transmitted yet so as to preempt it (MAC merge fragments must still # obey the minimum Ethernet frame size rule). There are no frame # preemptions when packets collide here. # (b) the preemptable area, where the latency should be equal to the # transmission time of a MAC merge fragment rather than a full packet # (preemption should happen right away). # (c) where the packets still interfere, but there are not enough bytes of the # PT frame left to transmit so as to construct a valid fragment out of the # remainder. No preemptions are expected in this region. # # ^ # ET | a b c # frame | <----><---------------><----> # path | # delay | # | # | /| # | / -----------------. # | | ---. # |-----/ ----------------------- # | # ---+-----------------------------------------------------------> # | ET base time offset relative to PT # # The selftest evaluates the preemption performance of a given switch by # plotting the ET latency as a function of the ET base time offset. WAIT_TIME=1 NUM_NETIFS=3 NETIF_CREATE=no REQUIRE_MZ=no lib_dir=$(dirname $0)/../../../net/forwarding source $lib_dir/lib.sh source $lib_dir/tsn_lib.sh source $(dirname $0)/fp_topology.sh require_command gnuplot swp1=${NETIFS[p1]} swp2=${NETIFS[p2]} swp3=${NETIFS[p3]} signal_received=false ethtool_save_fp_admin_status() { admin_status=$(ethtool --json --show-fp $1 | \ jq "(.[].\"parameter-table\"[] | select(.prio == $2)).\"admin-status\"") } ethtool_restore_fp_admin_status() { case $admin_status in express) ethtool --set-fp $1 admin-status $2:E ;; preemptable) ethtool --set-fp $1 admin-status $2:P ;; esac } avg_pdelay() { local file=$1 local tmp=$(mktemp) printf "print((" > $tmp isochron report \ --input-file "${file}" \ --printf-format "print(\"{} + \".format(%d - %d), end='')\n" \ --printf-args "RT" | python3 - >> $tmp printf "0) / $NUM_FRAMES)" >> $tmp cat $tmp | python3 - rm -f $tmp } validate_coordination() { local et=$1; shift local pt=$1; shift local et_start local pt_start et_start=$(($(isochron report --input-file $et \ --printf-format "%u - %u" --printf-args "SB" --stop 1))) pt_start=$(($(isochron report --input-file $pt \ --printf-format "%u - %u" --printf-args "SB" --stop 1))) if ! [ $et_start = $pt_start ]; then printf "ET test %s / PT test %s: ET starts at %s, PT starts at %s\n" \ "$et" "$pt" \ "$(isochron report --input-file $et \ --printf-format "%T" --printf-args "S" --stop 1)" \ "$(isochron report --input-file $pt \ --printf-format "%T" --printf-args "S" --stop 1)" exit 1 fi } plot() { local pt_frame_size=$1; shift local num_tests=$1; shift local fp_label=$1; shift local pt_svg="plots/pt-${pt_frame_size}-${fp_label}-pt.svg" local pt_plot="plots/pt-${pt_frame_size}-${fp_label}-pt.plot" local et_svg="plots/pt-${pt_frame_size}-${fp_label}-et.svg" local et_plot="plots/pt-${pt_frame_size}-${fp_label}-et.plot" local pt_avg_pdelay local et_avg_pdelay local base_time local et local pt local i printf "Plotting charts for collision between PT@%d and ET@%d\n" \ "${pt_frame_size}" "${ET_FRAME_SIZE}" mkdir -p plots rm -f $et_plot $pt_plot for ((i = 0; i < num_tests; i++)); do et="reports/pt-${pt_frame_size}-test-${i}-${fp_label}-et.dat" pt="reports/pt-${pt_frame_size}-test-${i}-${fp_label}-pt.dat" base_time=$(isochron report --input-file "${et}" \ --printf-format "%u" --printf-args "B" \ --stop 1) pt_avg_pdelay=$(avg_pdelay $pt) et_avg_pdelay=$(avg_pdelay $et) echo "$base_time $pt_avg_pdelay" >> $pt_plot echo "$base_time $et_avg_pdelay" >> $et_plot if [ "$signal_received" = "true" ]; then exit 1 fi done gnuplot --persist <<- EOF set xlabel "Base time offset (ns)" set ylabel "Average PT path delay (ns)" set term svg set terminal svg enhanced background rgb "white" set output "${pt_svg}" plot "${pt_plot}" using 1:2 with lines title "PT latency" EOF gnuplot --persist <<- EOF set xlabel "Base time offset (ns)" set ylabel "Average ET path delay (ns)" set term svg set terminal svg enhanced background rgb "white" set output "${et_svg}" plot "${et_plot}" using 1:2 with lines title "ET latency" EOF rm -f ${et_plot} ${pt_plot} } test_collision() { local test_num=$1; shift local pt_frame_size=$1; shift local base_time_offset=$1; shift local fp_label=$1; shift local vrf_name=$(master_name_get br0) local et="reports/pt-${pt_frame_size}-test-${test_num}-${fp_label}-et.dat" local pt="reports/pt-${pt_frame_size}-test-${test_num}-${fp_label}-pt.dat" local orchestration=$(mktemp) local et_extra_args local pt_extra_args if [ "${H1_REQUIRE_PTP4L}" = "no" ]; then et_extra_args="${et_extra_args} --omit-sync" fi if [ "${H2_REQUIRE_PTP4L}" = "no" ]; then pt_extra_args="${pt_extra_args} --omit-sync" fi if [ "${H3_REQUIRE_PTP4L}" = "no" ]; then et_extra_args="${et_extra_args} --omit-remote-sync" pt_extra_args="${pt_extra_args} --omit-remote-sync" fi mkdir -p reports printf "Collision between PT packets of size %d and ET packets of size %d at %s, test %d (base time offset %d)\n" \ ${pt_frame_size} ${ET_FRAME_SIZE} ${fp_label} ${test_num} ${base_time_offset} cat <<- EOF > ${orchestration} [ET] host = $H1_IPV4%$vrf_name port = $H1_PORT exec = isochron send \\ --client $H3_IPV4%$vrf_name \\ --stats-port $H3_ET_PORT \\ --interface $ET_IF_NAME \\ --unix-domain-socket $UDS_ADDRESS_H1 \\ --num-frames $NUM_FRAMES \\ --priority $ET_PRIO \\ --vid 0 \\ --base-time ${base_time_offset} \\ --cycle-time ${CYCLE_TIME_NS} \\ --frame-size $ET_FRAME_SIZE \\ --sync-threshold $SYNC_THRESHOLD_NS \\ --cpu-mask $((1 << ${ISOCHRON_CPU})) \\ --sched-fifo \\ --sched-priority 98 \\ --etype 0xdead \\ --txtime \\ ${et_extra_args} \\ --output-file ${et} [PT] host = $H2_IPV4%$vrf_name port = $H2_PORT exec = isochron send \\ --client $H3_IPV4%$vrf_name \\ --stats-port $H3_PT_PORT \\ --interface $PT_IF_NAME \\ --unix-domain-socket $UDS_ADDRESS_H2 \\ --num-frames $NUM_FRAMES \\ --priority $PT_PRIO \\ --vid 0 \\ --base-time 0.000000000 \\ --cycle-time ${CYCLE_TIME_NS} \\ --frame-size $pt_frame_size \\ --sync-threshold $SYNC_THRESHOLD_NS \\ --cpu-mask $((1 << ${ISOCHRON_CPU})) \\ --sched-fifo \\ --sched-priority 98 \\ --etype 0xdeaf \\ --txtime \\ ${pt_extra_args} \\ --output-file ${pt} EOF isochron orchestrate --input-file ${orchestration} rm -rf ${orchestration} validate_coordination $et $pt } test_collision_sweep_base_time() { local pt_frame_size=$1; shift local base_time_start=$1; shift local base_time_stop=$1; shift local base_time_increment=$1; shift local fp_label=$1; shift local num_tests local i num_tests=$(((base_time_stop - base_time_start + 1) / base_time_increment)) for ((i = 0; i < num_tests; i++)); do base_time=$((base_time_start + i * base_time_increment)) test_collision "$i" "$pt_frame_size" "$base_time" "$fp_label" if [ "$signal_received" = "true" ]; then exit 1 fi done plot $pt_frame_size $num_tests $fp_label } test_pt_124() { local start=$1; shift local stop=$1; shift local fp_label=$1; shift test_collision_sweep_base_time 124 $start $stop 40 $fp_label } test_pt_300() { local start=$1; shift local stop=$1; shift local fp_label=$1; shift test_collision_sweep_base_time 300 $start $stop 40 $fp_label } test_pt_600() { local start=$1; shift local stop=$1; shift local fp_label=$1; shift test_collision_sweep_base_time 600 $start $stop 40 $fp_label } test_pt_1000() { local start=$1; shift local stop=$1; shift local fp_label=$1; shift test_collision_sweep_base_time 1000 $start $stop 40 $fp_label } test_pt_1500() { local start=$1; shift local stop=$1; shift local fp_label=$1 test_collision_sweep_base_time 1500 $start $stop 40 $fp_label } test_no_fp_cut_through() { local fp_label="no-fp" test_pt_124 0 1800 $fp_label test_pt_300 0 3400 $fp_label test_pt_600 0 7000 $fp_label test_pt_1000 1500 12000 $fp_label test_pt_1500 3000 17000 $fp_label } test_fp_quick() { ethtool_save_fp_admin_status $swp3 $PT_PRIO ethtool --set-mm $swp3 verify-disable off enabled on add-frag-size 0 ethtool --set-fp $swp3 admin-status $PT_PRIO:P test_pt_1500 12000 26000 "quick" ethtool_restore_fp_admin_status $swp3 $PT_PRIO } test_fp() { local fp_label=$1; shift local add_frag_size=$1; shift ethtool_save_fp_admin_status $swp3 $PT_PRIO ethtool --set-mm $swp3 verify-disable off enabled on add-frag-size $add_frag_size ethtool --set-fp $swp3 admin-status $PT_PRIO:P test_pt_124 200 2000 $fp_label test_pt_300 1500 5000 $fp_label test_pt_600 4500 10000 $fp_label test_pt_1000 8800 17000 $fp_label test_pt_1500 12000 26000 $fp_label ethtool_restore_fp_admin_status $swp3 $PT_PRIO } test_fp_add_frag_size_0() { test_fp "add-frag-size-0" 0 } test_fp_add_frag_size_1() { test_fp "add-frag-size-1" 1 } test_fp_add_frag_size_2() { test_fp "add-frag-size-2" 2 } test_fp_add_frag_size_3() { test_fp "add-frag-size-2" 3 } setup_prepare() { vrf_prepare switch_create ptp4l_start "$swp1 $swp2 $swp3" false ${UDS_ADDRESS_SWITCH} phc2sys_start ${UDS_ADDRESS_SWITCH} } cleanup() { pre_cleanup ptp4l_stop "$swp1 $swp2 $swp3" phc2sys_stop switch_destroy vrf_cleanup } signal() { signal_received=true } trap cleanup EXIT trap signal SIGTERM trap signal SIGINT # test_no_fp_cut_through # test_fp_add_frag_size_0 # test_fp_add_frag_size_1 # test_fp_add_frag_size_2 # test_fp_add_frag_size_3 ALL_TESTS=" test_fp_quick " setup_prepare setup_wait tests_run exit $EXIT_STATUS
On Fri Aug 19 2022, Vladimir Oltean wrote: > Hi Kurt, > > On Fri, Aug 19, 2022 at 10:16:20AM +0200, Kurt Kanzenbach wrote: >> On Wed Aug 17 2022, Vladimir Oltean wrote: >> > Vinicius' progress on upstreaming frame preemption support for Intel I226 >> > seemed to stall, so I decided to give it a go using my own view as well. >> > https://patchwork.kernel.org/project/netdevbpf/cover/20220520011538.1098888-1-vinicius.gomes@intel.com/ >> >> Great to see progress on FPE :-). > > Let's hope it lasts ;) > >> > - Finally, the hardware I'm working with (here, the test vehicle is the >> > NXP ENETC from LS1028A, although I have patches for the Felix switch >> > as well, but those need a bit of a revolution in the driver to go in >> > first). This hardware is not without its flaws, but at least allows me >> > to concentrate on the UAPI portions for this series. >> > >> > I also have a kselftest written, but it's for the Felix switch (covers >> > forwarding latency) and so it's not included here. >> >> What kind of selftest did you implement? So far I've been doing this: >> Using a cyclic real time application to create high priority frames and >> running iperf3 in parallel to simulate low priority traffic >> constantly. Afterwards, checking the NIC statistics for fragments and so >> on. Also checking the latency of the RT frames with FPE on/off. >> >> BTW, if you guys need help with testing of patches, i do have access to >> i225 and stmmacs which both support FPE. Also the Hirschmann switches >> should support it. > > Blah, I didn't want to spoil the surprise just yet. I am orchestrating 2 > isochron senders at specific times, one of PT traffic and one of ET. > > There are actually 2 variants of this: one is for endpoint FP and the > other is for bridge FP. I only had time to convert the bridge FP to > kselftest format; not the endpoint one (for enetc). > > In the endpoint case, interference is created on the sender interface. > I compare HW TX timestamps to the expected TX times to calculate how > long it took until PT was preempted. I repeat the test millions of times > until I can plot the latencies having the PT <-> ET base time offset on > the X axis. It looks very cool. > > The bridge case is similar, except for the fact that interference is > created on a bridge port going to a common receiver of 2 isochron > senders. What I plot is the path delay, and again, this shows actual > preemption times with a nanosecond resolution. That makes a lot of sense and this kind of test scenario should work for other endpoints implementations such as igc and stmmac too. Thanks, Kurt
On Wed, Aug 17, 2022 at 11:46:42AM -0700, Jakub Kicinski wrote: > > > When we have separate set of stats for pMAC the normal stats are sum of > > > all traffic, right? So normal - pMAC == eMAC, everything that's not > > > preemptible is express? > > > > Actually not quite, or at least not for the LS1028A ENETC and Felix switch. > > The normal counters report just what the eMAC sees, and the pMAC counters > > just what the pMAC sees. After all, only the eMAC was enabled up until now. > > Nobody does the addition currently. > > I see. And the netdev stats are the total? dev->stats reports the aggregate of express and preemptable packets seen by software, yes. I got the hint though, I should also report the aggregate. The summation seems like a generic problem which ethtool should be able to do internally, yet a generic implementation is riddled with problems that must be dealt with (RMON histograms reported by eMAC and pMAC can be different; some counters could be implemented by the eMAC but not the pMAC or vice versa, and that needs to be handled, etc). Additionally, the summation of counters must also be done for ndo_get_stats64(), when those come from hardware as well. So I'd incline to do it in the driver rn. > > > Did you consider adding an attribute for switching between MAC and pMAC > > > for stats rather than duplicating things? > > > > No. Could you expand on that idea a little? Add a netlink attribute > > where, and this helps reduce duplication where, and how? > > Add a attribute to ETHTOOL_MSG_STATS_GET, let's call it > ETHTOOL_A_STATS_EXPRESS, a flag. I'll add this to the UAPI and to internal data structures, ok? enum ethtool_stats_src { ETHTOOL_STATS_SRC_AGGREGATE = 0, ETHTOOL_STATS_SRC_EMAC, ETHTOOL_STATS_SRC_PMAC, }; > Plumb thru to all the stats callback an extra argument > (a structure for future extensibility) with a bool pMAC; > > Add a capability field to ethtool_ops to announce that > driver will pay attention to the bool pMAC / has support. You mean capability field as in ethtool_ops::supported_coalesce_params, right? (we discussed about this separately). This won't fit the enetc driver very well. Some enetc ports on the NXP LS1028A support the MM layer (port 0, port 2) and some don't (port 1, port 3). Yet they share the same PF driver. So populating mm_supported = true in the const struct enetc_pf_ethtool_ops isn't going to cover both. I can, however, key on my ethtool_ops :: get_mm_state() function which lets the driver report a "bool supported". Is this ok? > We can then use the existing callbacks. > > Am I making sense? Yes, thanks.
On Sat, 1 Oct 2022 15:53:38 +0000 Vladimir Oltean wrote: > > Add a attribute to ETHTOOL_MSG_STATS_GET, let's call it > > ETHTOOL_A_STATS_EXPRESS, a flag. > > I'll add this to the UAPI and to internal data structures, ok? > > enum ethtool_stats_src { > ETHTOOL_STATS_SRC_AGGREGATE = 0, > ETHTOOL_STATS_SRC_EMAC, > ETHTOOL_STATS_SRC_PMAC, > }; Yup! > > Plumb thru to all the stats callback an extra argument > > (a structure for future extensibility) with a bool pMAC; > > > > Add a capability field to ethtool_ops to announce that > > driver will pay attention to the bool pMAC / has support. > > You mean capability field as in ethtool_ops::supported_coalesce_params, > right? (we discussed about this separately). > This won't fit the enetc driver very well. Some enetc ports on the NXP > LS1028A support the MM layer (port 0, port 2) and some don't (port 1, > port 3). Yet they share the same PF driver. So populating mm_supported = > true in the const struct enetc_pf_ethtool_ops isn't going to cover both. > I can, however, key on my ethtool_ops :: get_mm_state() function which > lets the driver report a "bool supported". Is this ok? That happens, I think about the capability in the ops as driver caps rather than HW caps. The driver can still return -EOPNOTSUPP, but it guarantees to check the field's value. Most (all but one) datacenter NIC vendors have uber-drivers for all their HW generations these days, static per-driver caps can't map to HW caps in my world. So weak preference for sticking to that model to avoid confusion about the semantics of existing caps vs caps which should use a function call.
On Mon, Oct 03, 2022 at 07:36:03AM -0700, Jakub Kicinski wrote: > On Sat, 1 Oct 2022 15:53:38 +0000 Vladimir Oltean wrote: > > > Add a attribute to ETHTOOL_MSG_STATS_GET, let's call it > > > ETHTOOL_A_STATS_EXPRESS, a flag. > > > > I'll add this to the UAPI and to internal data structures, ok? > > > > enum ethtool_stats_src { > > ETHTOOL_STATS_SRC_AGGREGATE = 0, > > ETHTOOL_STATS_SRC_EMAC, > > ETHTOOL_STATS_SRC_PMAC, > > }; > > Yup! Ok. I've also added enum ethtool_stats_src as the first member of struct ethtool_eth_mac_stats, ethtool_eth_phy_stats, ethtool_eth_ctrl_stats, ethtool_pause_stats, ethtool_rmon_stats. So I am not adding an extra argument (another "structure for future extensibility" as you wrote below). Hope that's ok. > > > Plumb thru to all the stats callback an extra argument > > > (a structure for future extensibility) with a bool pMAC; > > > > > > Add a capability field to ethtool_ops to announce that > > > driver will pay attention to the bool pMAC / has support. > > > > You mean capability field as in ethtool_ops::supported_coalesce_params, > > right? (we discussed about this separately). > > This won't fit the enetc driver very well. Some enetc ports on the NXP > > LS1028A support the MM layer (port 0, port 2) and some don't (port 1, > > port 3). Yet they share the same PF driver. So populating mm_supported = > > true in the const struct enetc_pf_ethtool_ops isn't going to cover both. > > I can, however, key on my ethtool_ops :: get_mm_state() function which > > lets the driver report a "bool supported". Is this ok? > > That happens, I think about the capability in the ops as driver caps > rather than HW caps. The driver can still return -EOPNOTSUPP, but it > guarantees to check the field's value. The stats callbacks return void. We'd be relying on the ETHTOOL_STAT_NOT_SET value. > > Most (all but one) datacenter NIC vendors have uber-drivers for all > their HW generations these days, static per-driver caps can't map to > HW caps in my world. > > So weak preference for sticking to that model to avoid confusion about > the semantics of existing caps vs caps which should use a function call. An even bigger uber-driver is DSA, with its own dsa_slave_ethtool_ops. If I put "supported_mm" in ethtool_ops, and set it to true in DSA, I become responsible for rejecting everything except ETHTOOL_STATS_SRC_AGGREGATE for all DSA drivers, which I'd rather not do. Alternatively, I put it to false in DSA and I won't have pMAC stats callbacks getting called even if I do support a pMAC. Maybe DSA isn't even the only one in this situation.