Message ID | 20250204004907.789330-6-yoong.siang.song@intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | xsk: TX metadata Launch Time support | expand |
On 4/2/2025 8:49 am, Song Yoong Siang wrote: > Enable Launch Time Control (LTC) support for XDP zero copy via XDP Tx > metadata framework. > > This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata > on Intel I225-LM Ethernet controller. Below are the test steps and result. > > Test 1: Send a single packet with the launch time set to 1 s in the future. > > Test steps: > 1. On the DUT, start the xdp_hw_metadata selftest application: > $ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1 > > 2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091 > of the DUT. > > Result: > When the launch time is set to 1 s in the future, the delta between the > launch time and the transmit hardware timestamp is 0.016 us, as shown in > printout of the xdp_hw_metadata application below. > 0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP > rx_hash: 0xE343384 with RSS type:0x1 > HW RX-time: 1734578015467548904 (sec:1734578015.4675) > delta to User RX-time sec:0.0002 (183.103 usec) > XDP RX-time: 1734578015467651698 (sec:1734578015.4677) > delta to User RX-time sec:0.0001 (80.309 usec) > No rx_vlan_tci or rx_vlan_proto, err=-95 > 0x562ff5dc8880: ping-pong with csum=561c (want c7dd) > csum_start=34 csum_offset=6 > HW RX-time: 1734578015467548904 (sec:1734578015.4675) > delta to HW Launch-time sec:1.0000 (1000000.000 usec) > 0x562ff5dc8880: complete tx idx=4 addr=4018 > HW Launch-time: 1734578016467548904 (sec:1734578016.4675) > delta to HW TX-complete-time sec:0.0000 (0.016 usec) > HW TX-complete-time: 1734578016467548920 (sec:1734578016.4675) > delta to User TX-complete-time sec:0.0000 > (32.546 usec) > XDP RX-time: 1734578015467651698 (sec:1734578015.4677) > delta to User TX-complete-time sec:0.9999 > (999929.768 usec) > HW RX-time: 1734578015467548904 (sec:1734578015.4675) > delta to HW TX-complete-time sec:1.0000 (1000000.016 usec) > 0x562ff5dc8880: complete rx idx=132 addr=84110 > > Test 2: Send 1000 packets with a 10 ms interval and the launch time set to > 500 us in the future. > > Test steps: > 1. On the DUT, start the xdp_hw_metadata selftest application: > $ sudo chrt -f 99 ./xdp_hw_metadata enp2s0 -l 500000 -L 1 > \ > /dev/shm/result.log > > 2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and > VLAN priority 1 to port 9091 of the DUT. > > Result: > When the launch time is set to 500 us in the future, the average delta > between the launch time and the transmit hardware timestamp is 0.016 us, > as shown in the analysis of /dev/shm/result.log below. The XDP launch time > works correctly in sending 1000 packets continuously. > Min delta: 0.005 us > Avr delta: 0.016 us > Max delta: 0.031 us > Total packets forwarded: 1000 > > Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> > --- > drivers/net/ethernet/intel/igc/igc_main.c | 42 +++++++++++++++++++++-- > 1 file changed, 40 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c > index c3edd8bcf633..535d340c71c9 100644 > --- a/drivers/net/ethernet/intel/igc/igc_main.c > +++ b/drivers/net/ethernet/intel/igc/igc_main.c > @@ -2951,9 +2951,33 @@ static u64 igc_xsk_fill_timestamp(void *_priv) > return *(u64 *)_priv; > } > > +static void igc_xsk_request_launch_time(u64 launch_time, void *_priv) > +{ > + struct igc_metadata_request *meta_req = _priv; > + struct igc_ring *tx_ring = meta_req->tx_ring; > + __le32 launch_time_offset; > + bool insert_empty = false; > + bool first_flag = false; > + > + if (!tx_ring->launchtime_enable) > + return; > + > + launch_time_offset = igc_tx_launchtime(tx_ring, > + ns_to_ktime(launch_time), > + &first_flag, &insert_empty); > + if (insert_empty) { > + igc_insert_empty_packet(tx_ring); > + meta_req->tx_buffer = > + &tx_ring->tx_buffer_info[tx_ring->next_to_use]; > + } > + > + igc_tx_ctxtdesc(tx_ring, launch_time_offset, first_flag, 0, 0, 0); > +} > + > const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = { > .tmo_request_timestamp = igc_xsk_request_timestamp, > .tmo_fill_timestamp = igc_xsk_fill_timestamp, > + .tmo_request_launch_time = igc_xsk_request_launch_time, > }; > > static void igc_xdp_xmit_zc(struct igc_ring *ring) > @@ -2976,7 +3000,13 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) > ntu = ring->next_to_use; > budget = igc_desc_unused(ring); > > - while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) { > + /* Packets with launch time require one data descriptor and one context > + * descriptor. When the launch time falls into the next Qbv cycle, we > + * may need to insert an empty packet, which requires two more > + * descriptors. Therefore, to be safe, we always ensure we have at least > + * 4 descriptors available. > + */ > + while (xsk_tx_peek_desc(pool, &xdp_desc) && budget >= 4) { > struct igc_metadata_request meta_req; > struct xsk_tx_metadata *meta = NULL; > struct igc_tx_buffer *bi; > @@ -3000,6 +3030,12 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) > xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops, > &meta_req); > > + /* xsk_tx_metadata_request() may have updated next_to_use */ > + ntu = ring->next_to_use; > + > + /* xsk_tx_metadata_request() may have updated Tx buffer info */ > + bi = meta_req.tx_buffer; > + > tx_desc = IGC_TX_DESC(ring, ntu); > tx_desc->read.cmd_type_len = cpu_to_le32(meta_req.cmd_type); > tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status); > @@ -3017,9 +3053,11 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) > ntu++; > if (ntu == ring->count) > ntu = 0; > + > + ring->next_to_use = ntu; > + budget = igc_desc_unused(ring); > } > > - ring->next_to_use = ntu; > if (tx_desc) { > igc_flush_tx_descriptors(ring); > xsk_tx_release(pool); Reviewed-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
On Tue, Feb 04, 2025 at 08:49:07AM +0800, Song Yoong Siang wrote: > Enable Launch Time Control (LTC) support for XDP zero copy via XDP Tx > metadata framework. > > This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata > on Intel I225-LM Ethernet controller. Below are the test steps and result. > > Test 1: Send a single packet with the launch time set to 1 s in the future. > > Test steps: > 1. On the DUT, start the xdp_hw_metadata selftest application: > $ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1 > > 2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091 > of the DUT. > > Result: > When the launch time is set to 1 s in the future, the delta between the > launch time and the transmit hardware timestamp is 0.016 us, as shown in > printout of the xdp_hw_metadata application below. > 0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP > rx_hash: 0xE343384 with RSS type:0x1 > HW RX-time: 1734578015467548904 (sec:1734578015.4675) > delta to User RX-time sec:0.0002 (183.103 usec) > XDP RX-time: 1734578015467651698 (sec:1734578015.4677) > delta to User RX-time sec:0.0001 (80.309 usec) > No rx_vlan_tci or rx_vlan_proto, err=-95 > 0x562ff5dc8880: ping-pong with csum=561c (want c7dd) > csum_start=34 csum_offset=6 > HW RX-time: 1734578015467548904 (sec:1734578015.4675) > delta to HW Launch-time sec:1.0000 (1000000.000 usec) > 0x562ff5dc8880: complete tx idx=4 addr=4018 > HW Launch-time: 1734578016467548904 (sec:1734578016.4675) > delta to HW TX-complete-time sec:0.0000 (0.016 usec) > HW TX-complete-time: 1734578016467548920 (sec:1734578016.4675) > delta to User TX-complete-time sec:0.0000 > (32.546 usec) > XDP RX-time: 1734578015467651698 (sec:1734578015.4677) > delta to User TX-complete-time sec:0.9999 > (999929.768 usec) > HW RX-time: 1734578015467548904 (sec:1734578015.4675) > delta to HW TX-complete-time sec:1.0000 (1000000.016 usec) > 0x562ff5dc8880: complete rx idx=132 addr=84110 > > Test 2: Send 1000 packets with a 10 ms interval and the launch time set to > 500 us in the future. > > Test steps: > 1. On the DUT, start the xdp_hw_metadata selftest application: > $ sudo chrt -f 99 ./xdp_hw_metadata enp2s0 -l 500000 -L 1 > \ > /dev/shm/result.log > > 2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and > VLAN priority 1 to port 9091 of the DUT. > > Result: > When the launch time is set to 500 us in the future, the average delta > between the launch time and the transmit hardware timestamp is 0.016 us, > as shown in the analysis of /dev/shm/result.log below. The XDP launch time > works correctly in sending 1000 packets continuously. > Min delta: 0.005 us > Avr delta: 0.016 us > Max delta: 0.031 us > Total packets forwarded: 1000 > > Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> > --- > drivers/net/ethernet/intel/igc/igc_main.c | 42 +++++++++++++++++++++-- > 1 file changed, 40 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c > index c3edd8bcf633..535d340c71c9 100644 > --- a/drivers/net/ethernet/intel/igc/igc_main.c > +++ b/drivers/net/ethernet/intel/igc/igc_main.c > @@ -2951,9 +2951,33 @@ static u64 igc_xsk_fill_timestamp(void *_priv) > return *(u64 *)_priv; > } > > +static void igc_xsk_request_launch_time(u64 launch_time, void *_priv) > +{ > + struct igc_metadata_request *meta_req = _priv; > + struct igc_ring *tx_ring = meta_req->tx_ring; > + __le32 launch_time_offset; > + bool insert_empty = false; > + bool first_flag = false; > + > + if (!tx_ring->launchtime_enable) > + return; > + > + launch_time_offset = igc_tx_launchtime(tx_ring, > + ns_to_ktime(launch_time), > + &first_flag, &insert_empty); > + if (insert_empty) { > + igc_insert_empty_packet(tx_ring); > + meta_req->tx_buffer = > + &tx_ring->tx_buffer_info[tx_ring->next_to_use]; in this case I think you currently are leaking the skbs and dma mappings that igc_init_empty_frame() did. you're going to mix IGC_TX_BUFFER_TYPE_XSK with IGC_TX_BUFFER_TYPE_SKB and the latter is not explicitly initialized. Even though IGC_TX_BUFFER_TYPE_SKB happens to be equal to 0, igc_tx_buffer::type is never cleared in the tx clean desc routine. > + } > + > + igc_tx_ctxtdesc(tx_ring, launch_time_offset, first_flag, 0, 0, 0); > +} > + > const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = { > .tmo_request_timestamp = igc_xsk_request_timestamp, > .tmo_fill_timestamp = igc_xsk_fill_timestamp, > + .tmo_request_launch_time = igc_xsk_request_launch_time, > }; > > static void igc_xdp_xmit_zc(struct igc_ring *ring) > @@ -2976,7 +3000,13 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) > ntu = ring->next_to_use; > budget = igc_desc_unused(ring); > > - while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) { > + /* Packets with launch time require one data descriptor and one context > + * descriptor. When the launch time falls into the next Qbv cycle, we > + * may need to insert an empty packet, which requires two more > + * descriptors. Therefore, to be safe, we always ensure we have at least > + * 4 descriptors available. > + */ > + while (xsk_tx_peek_desc(pool, &xdp_desc) && budget >= 4) { > struct igc_metadata_request meta_req; > struct xsk_tx_metadata *meta = NULL; > struct igc_tx_buffer *bi; > @@ -3000,6 +3030,12 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) > xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops, > &meta_req); > > + /* xsk_tx_metadata_request() may have updated next_to_use */ > + ntu = ring->next_to_use; > + > + /* xsk_tx_metadata_request() may have updated Tx buffer info */ > + bi = meta_req.tx_buffer; > + > tx_desc = IGC_TX_DESC(ring, ntu); > tx_desc->read.cmd_type_len = cpu_to_le32(meta_req.cmd_type); > tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status); > @@ -3017,9 +3053,11 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) > ntu++; > if (ntu == ring->count) > ntu = 0; > + > + ring->next_to_use = ntu; > + budget = igc_desc_unused(ring); why count the remaining space in loop? couldn't you decrement it accordingly to the count of descriptors you have produced? writing ntu back and forth between local var and ring struct performance-wise does not look fine. > } > > - ring->next_to_use = ntu; > if (tx_desc) { > igc_flush_tx_descriptors(ring); > xsk_tx_release(pool); > -- > 2.34.1 >
On Tuesday, February 4, 2025 6:10 PM, Fijalkowski, Maciej <maciej.fijalkowski@intel.com> wrote: >On Tue, Feb 04, 2025 at 08:49:07AM +0800, Song Yoong Siang wrote: > >> Enable Launch Time Control (LTC) support for XDP zero copy via XDP Tx >> metadata framework. >> >> This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata >> on Intel I225-LM Ethernet controller. Below are the test steps and result. >> >> Test 1: Send a single packet with the launch time set to 1 s in the future. >> >> Test steps: >> 1. On the DUT, start the xdp_hw_metadata selftest application: >> $ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1 >> >> 2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091 >> of the DUT. >> >> Result: >> When the launch time is set to 1 s in the future, the delta between the >> launch time and the transmit hardware timestamp is 0.016 us, as shown in >> printout of the xdp_hw_metadata application below. >> 0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP >> rx_hash: 0xE343384 with RSS type:0x1 >> HW RX-time: 1734578015467548904 (sec:1734578015.4675) >> delta to User RX-time sec:0.0002 (183.103 usec) >> XDP RX-time: 1734578015467651698 (sec:1734578015.4677) >> delta to User RX-time sec:0.0001 (80.309 usec) >> No rx_vlan_tci or rx_vlan_proto, err=-95 >> 0x562ff5dc8880: ping-pong with csum=561c (want c7dd) >> csum_start=34 csum_offset=6 >> HW RX-time: 1734578015467548904 (sec:1734578015.4675) >> delta to HW Launch-time sec:1.0000 (1000000.000 usec) >> 0x562ff5dc8880: complete tx idx=4 addr=4018 >> HW Launch-time: 1734578016467548904 (sec:1734578016.4675) >> delta to HW TX-complete-time sec:0.0000 (0.016 usec) >> HW TX-complete-time: 1734578016467548920 (sec:1734578016.4675) >> delta to User TX-complete-time sec:0.0000 >> (32.546 usec) >> XDP RX-time: 1734578015467651698 (sec:1734578015.4677) >> delta to User TX-complete-time sec:0.9999 >> (999929.768 usec) >> HW RX-time: 1734578015467548904 (sec:1734578015.4675) >> delta to HW TX-complete-time sec:1.0000 (1000000.016 usec) >> 0x562ff5dc8880: complete rx idx=132 addr=84110 >> >> Test 2: Send 1000 packets with a 10 ms interval and the launch time set to >> 500 us in the future. >> >> Test steps: >> 1. On the DUT, start the xdp_hw_metadata selftest application: >> $ sudo chrt -f 99 ./xdp_hw_metadata enp2s0 -l 500000 -L 1 > \ >> /dev/shm/result.log >> >> 2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and >> VLAN priority 1 to port 9091 of the DUT. >> >> Result: >> When the launch time is set to 500 us in the future, the average delta >> between the launch time and the transmit hardware timestamp is 0.016 us, >> as shown in the analysis of /dev/shm/result.log below. The XDP launch time >> works correctly in sending 1000 packets continuously. >> Min delta: 0.005 us >> Avr delta: 0.016 us >> Max delta: 0.031 us >> Total packets forwarded: 1000 >> >> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> >> --- >> drivers/net/ethernet/intel/igc/igc_main.c | 42 +++++++++++++++++++++-- >> 1 file changed, 40 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c >b/drivers/net/ethernet/intel/igc/igc_main.c >> index c3edd8bcf633..535d340c71c9 100644 >> --- a/drivers/net/ethernet/intel/igc/igc_main.c >> +++ b/drivers/net/ethernet/intel/igc/igc_main.c >> @@ -2951,9 +2951,33 @@ static u64 igc_xsk_fill_timestamp(void *_priv) >> return *(u64 *)_priv; >> } >> >> +static void igc_xsk_request_launch_time(u64 launch_time, void *_priv) >> +{ >> + struct igc_metadata_request *meta_req = _priv; >> + struct igc_ring *tx_ring = meta_req->tx_ring; >> + __le32 launch_time_offset; >> + bool insert_empty = false; >> + bool first_flag = false; >> + >> + if (!tx_ring->launchtime_enable) >> + return; >> + >> + launch_time_offset = igc_tx_launchtime(tx_ring, >> + ns_to_ktime(launch_time), >> + &first_flag, &insert_empty); >> + if (insert_empty) { >> + igc_insert_empty_packet(tx_ring); >> + meta_req->tx_buffer = >> + &tx_ring->tx_buffer_info[tx_ring->next_to_use]; > >in this case I think you currently are leaking the skbs and dma mappings >that igc_init_empty_frame() did. you're going to mix >IGC_TX_BUFFER_TYPE_XSK with IGC_TX_BUFFER_TYPE_SKB and the latter is not >explicitly initialized. Even though IGC_TX_BUFFER_TYPE_SKB happens to be >equal to 0, igc_tx_buffer::type is never cleared in the tx clean desc >routine. > Hi Fijalkowski Maciej, Thanks for your inputs. Yes, you are right, IGC_TX_BUFFER_TYPE_SKB is mixed together with IGC_TX_BUFFER_TYPE_XSK. Regarding the skb and dma map, following code in igc_clean_tx_irq() will free the skb and unmap the dma, Do these answer your concern on leaking? igc_main.c:3133: case IGC_TX_BUFFER_TYPE_SKB: igc_main.c-3134- napi_consume_skb(tx_buffer->skb, napi_budget); igc_main.c-3135- igc_unmap_tx_buffer(tx_ring->dev, tx_buffer); igc_main.c-3136- break; Regarding the igc_tx_buffer::type never cleared, I think the important thing is making the igc_tx_buffer::next_to_watch NULL to indicate no remaining packet. Since transmit function will always set the igc_tx_buffer::type to a proper type, I think it is optional for us to clear it. Is that make sense to you? >> + } >> + >> + igc_tx_ctxtdesc(tx_ring, launch_time_offset, first_flag, 0, 0, 0); >> +} >> + >> const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = { >> .tmo_request_timestamp = igc_xsk_request_timestamp, >> .tmo_fill_timestamp = igc_xsk_fill_timestamp, >> + .tmo_request_launch_time = igc_xsk_request_launch_time, >> }; >> >> static void igc_xdp_xmit_zc(struct igc_ring *ring) >> @@ -2976,7 +3000,13 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) >> ntu = ring->next_to_use; >> budget = igc_desc_unused(ring); >> >> - while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) { >> + /* Packets with launch time require one data descriptor and one context >> + * descriptor. When the launch time falls into the next Qbv cycle, we >> + * may need to insert an empty packet, which requires two more >> + * descriptors. Therefore, to be safe, we always ensure we have at least >> + * 4 descriptors available. >> + */ >> + while (xsk_tx_peek_desc(pool, &xdp_desc) && budget >= 4) { >> struct igc_metadata_request meta_req; >> struct xsk_tx_metadata *meta = NULL; >> struct igc_tx_buffer *bi; >> @@ -3000,6 +3030,12 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) >> xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops, >> &meta_req); >> >> + /* xsk_tx_metadata_request() may have updated next_to_use */ >> + ntu = ring->next_to_use; >> + >> + /* xsk_tx_metadata_request() may have updated Tx buffer info */ >> + bi = meta_req.tx_buffer; >> + >> tx_desc = IGC_TX_DESC(ring, ntu); >> tx_desc->read.cmd_type_len = cpu_to_le32(meta_req.cmd_type); >> tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status); >> @@ -3017,9 +3053,11 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) >> ntu++; >> if (ntu == ring->count) >> ntu = 0; >> + >> + ring->next_to_use = ntu; >> + budget = igc_desc_unused(ring); > >why count the remaining space in loop? couldn't you decrement it >accordingly to the count of descriptors you have produced? writing ntu >back and forth between local var and ring struct performance-wise does not >look fine. > Yes, I can check the number of used descriptor in xsk_tx_metadata_request() by introducing a new field named used_desc in struct igc_metadata_request, and then decreases the budget with it. Do this way looked good to you? Thanks & Regards Siang >> } >> >> - ring->next_to_use = ntu; >> if (tx_desc) { >> igc_flush_tx_descriptors(ring); >> xsk_tx_release(pool); >> -- >> 2.34.1 >>
On Tue, Feb 04, 2025 at 02:14:00PM +0100, Song, Yoong Siang wrote: > On Tuesday, February 4, 2025 6:10 PM, Fijalkowski, Maciej <maciej.fijalkowski@intel.com> wrote: > >On Tue, Feb 04, 2025 at 08:49:07AM +0800, Song Yoong Siang wrote: > > > >> Enable Launch Time Control (LTC) support for XDP zero copy via XDP Tx > >> metadata framework. > >> > >> This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata > >> on Intel I225-LM Ethernet controller. Below are the test steps and result. > >> > >> Test 1: Send a single packet with the launch time set to 1 s in the future. > >> > >> Test steps: > >> 1. On the DUT, start the xdp_hw_metadata selftest application: > >> $ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1 > >> > >> 2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091 > >> of the DUT. > >> > >> Result: > >> When the launch time is set to 1 s in the future, the delta between the > >> launch time and the transmit hardware timestamp is 0.016 us, as shown in > >> printout of the xdp_hw_metadata application below. > >> 0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP > >> rx_hash: 0xE343384 with RSS type:0x1 > >> HW RX-time: 1734578015467548904 (sec:1734578015.4675) > >> delta to User RX-time sec:0.0002 (183.103 usec) > >> XDP RX-time: 1734578015467651698 (sec:1734578015.4677) > >> delta to User RX-time sec:0.0001 (80.309 usec) > >> No rx_vlan_tci or rx_vlan_proto, err=-95 > >> 0x562ff5dc8880: ping-pong with csum=561c (want c7dd) > >> csum_start=34 csum_offset=6 > >> HW RX-time: 1734578015467548904 (sec:1734578015.4675) > >> delta to HW Launch-time sec:1.0000 (1000000.000 usec) > >> 0x562ff5dc8880: complete tx idx=4 addr=4018 > >> HW Launch-time: 1734578016467548904 (sec:1734578016.4675) > >> delta to HW TX-complete-time sec:0.0000 (0.016 usec) > >> HW TX-complete-time: 1734578016467548920 (sec:1734578016.4675) > >> delta to User TX-complete-time sec:0.0000 > >> (32.546 usec) > >> XDP RX-time: 1734578015467651698 (sec:1734578015.4677) > >> delta to User TX-complete-time sec:0.9999 > >> (999929.768 usec) > >> HW RX-time: 1734578015467548904 (sec:1734578015.4675) > >> delta to HW TX-complete-time sec:1.0000 (1000000.016 usec) > >> 0x562ff5dc8880: complete rx idx=132 addr=84110 > >> > >> Test 2: Send 1000 packets with a 10 ms interval and the launch time set to > >> 500 us in the future. > >> > >> Test steps: > >> 1. On the DUT, start the xdp_hw_metadata selftest application: > >> $ sudo chrt -f 99 ./xdp_hw_metadata enp2s0 -l 500000 -L 1 > \ > >> /dev/shm/result.log > >> > >> 2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and > >> VLAN priority 1 to port 9091 of the DUT. > >> > >> Result: > >> When the launch time is set to 500 us in the future, the average delta > >> between the launch time and the transmit hardware timestamp is 0.016 us, > >> as shown in the analysis of /dev/shm/result.log below. The XDP launch time > >> works correctly in sending 1000 packets continuously. > >> Min delta: 0.005 us > >> Avr delta: 0.016 us > >> Max delta: 0.031 us > >> Total packets forwarded: 1000 > >> > >> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> > >> --- > >> drivers/net/ethernet/intel/igc/igc_main.c | 42 +++++++++++++++++++++-- > >> 1 file changed, 40 insertions(+), 2 deletions(-) > >> > >> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c > >b/drivers/net/ethernet/intel/igc/igc_main.c > >> index c3edd8bcf633..535d340c71c9 100644 > >> --- a/drivers/net/ethernet/intel/igc/igc_main.c > >> +++ b/drivers/net/ethernet/intel/igc/igc_main.c > >> @@ -2951,9 +2951,33 @@ static u64 igc_xsk_fill_timestamp(void *_priv) > >> return *(u64 *)_priv; > >> } > >> > >> +static void igc_xsk_request_launch_time(u64 launch_time, void *_priv) > >> +{ > >> + struct igc_metadata_request *meta_req = _priv; > >> + struct igc_ring *tx_ring = meta_req->tx_ring; > >> + __le32 launch_time_offset; > >> + bool insert_empty = false; > >> + bool first_flag = false; > >> + > >> + if (!tx_ring->launchtime_enable) > >> + return; > >> + > >> + launch_time_offset = igc_tx_launchtime(tx_ring, > >> + ns_to_ktime(launch_time), > >> + &first_flag, &insert_empty); > >> + if (insert_empty) { > >> + igc_insert_empty_packet(tx_ring); > >> + meta_req->tx_buffer = > >> + &tx_ring->tx_buffer_info[tx_ring->next_to_use]; > > > >in this case I think you currently are leaking the skbs and dma mappings > >that igc_init_empty_frame() did. you're going to mix > >IGC_TX_BUFFER_TYPE_XSK with IGC_TX_BUFFER_TYPE_SKB and the latter is not > >explicitly initialized. Even though IGC_TX_BUFFER_TYPE_SKB happens to be > >equal to 0, igc_tx_buffer::type is never cleared in the tx clean desc > >routine. > > > > Hi Fijalkowski Maciej, > > Thanks for your inputs. > > Yes, you are right, IGC_TX_BUFFER_TYPE_SKB is mixed together with > IGC_TX_BUFFER_TYPE_XSK. Regarding the skb and dma map, > following code in igc_clean_tx_irq() will free the skb and unmap the dma, > Do these answer your concern on leaking? > > igc_main.c:3133: case IGC_TX_BUFFER_TYPE_SKB: > igc_main.c-3134- napi_consume_skb(tx_buffer->skb, napi_budget); > igc_main.c-3135- igc_unmap_tx_buffer(tx_ring->dev, tx_buffer); > igc_main.c-3136- break; > > Regarding the igc_tx_buffer::type never cleared, I think the > important thing is making the igc_tx_buffer::next_to_watch NULL > to indicate no remaining packet. Since transmit function will > always set the igc_tx_buffer::type to a proper type, igc_init_tx_empty_descriptor() does not set ::type, that was my point. So these empty descs might be treated as IGC_TX_BUFFER_TYPE_XSK, which is wrong and your qouted code above will never get called. You will then leak skb and dma map plus you will confuse XSK code due to xsk_frames miscount. > I think it is optional for us to clear it. > Is that make sense to you? > > >> + } > >> + > >> + igc_tx_ctxtdesc(tx_ring, launch_time_offset, first_flag, 0, 0, 0); > >> +} > >> + > >> const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = { > >> .tmo_request_timestamp = igc_xsk_request_timestamp, > >> .tmo_fill_timestamp = igc_xsk_fill_timestamp, > >> + .tmo_request_launch_time = igc_xsk_request_launch_time, > >> }; > >> > >> static void igc_xdp_xmit_zc(struct igc_ring *ring) > >> @@ -2976,7 +3000,13 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) > >> ntu = ring->next_to_use; > >> budget = igc_desc_unused(ring); > >> > >> - while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) { > >> + /* Packets with launch time require one data descriptor and one context > >> + * descriptor. When the launch time falls into the next Qbv cycle, we > >> + * may need to insert an empty packet, which requires two more > >> + * descriptors. Therefore, to be safe, we always ensure we have at least > >> + * 4 descriptors available. > >> + */ > >> + while (xsk_tx_peek_desc(pool, &xdp_desc) && budget >= 4) { > >> struct igc_metadata_request meta_req; > >> struct xsk_tx_metadata *meta = NULL; > >> struct igc_tx_buffer *bi; > >> @@ -3000,6 +3030,12 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) > >> xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops, > >> &meta_req); > >> > >> + /* xsk_tx_metadata_request() may have updated next_to_use */ > >> + ntu = ring->next_to_use; > >> + > >> + /* xsk_tx_metadata_request() may have updated Tx buffer info */ > >> + bi = meta_req.tx_buffer; > >> + > >> tx_desc = IGC_TX_DESC(ring, ntu); > >> tx_desc->read.cmd_type_len = cpu_to_le32(meta_req.cmd_type); > >> tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status); > >> @@ -3017,9 +3053,11 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) > >> ntu++; > >> if (ntu == ring->count) > >> ntu = 0; > >> + > >> + ring->next_to_use = ntu; > >> + budget = igc_desc_unused(ring); > > > >why count the remaining space in loop? couldn't you decrement it > >accordingly to the count of descriptors you have produced? writing ntu > >back and forth between local var and ring struct performance-wise does not > >look fine. > > > > Yes, I can check the number of used descriptor in xsk_tx_metadata_request() > by introducing a new field named used_desc in struct igc_metadata_request, > and then decreases the budget with it. > > Do this way looked good to you? Yes this makes sense to me, thanks! > > Thanks & Regards > Siang > > >> } > >> > >> - ring->next_to_use = ntu; > >> if (tx_desc) { > >> igc_flush_tx_descriptors(ring); > >> xsk_tx_release(pool); > >> -- > >> 2.34.1 > >>
On Tuesday, February 4, 2025 10:04 PM, Fijalkowski, Maciej <maciej.fijalkowski@intel.com> wrote: >On Tue, Feb 04, 2025 at 02:14:00PM +0100, Song, Yoong Siang wrote: >> On Tuesday, February 4, 2025 6:10 PM, Fijalkowski, Maciej<maciej.fijalkowski@intel.com> wrote: >> >On Tue, Feb 04, 2025 at 08:49:07AM +0800, Song Yoong Siang wrote: >> > >> >> Enable Launch Time Control (LTC) support for XDP zero copy via XDP Tx >> >> metadata framework. >> >> >> >> This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata >> >> on Intel I225-LM Ethernet controller. Below are the test steps and result. >> >> >> >> Test 1: Send a single packet with the launch time set to 1 s in the future. >> >> >> >> Test steps: >> >> 1. On the DUT, start the xdp_hw_metadata selftest application: >> >> $ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1 >> >> >> >> 2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091 >> >> of the DUT. >> >> >> >> Result: >> >> When the launch time is set to 1 s in the future, the delta between the >> >> launch time and the transmit hardware timestamp is 0.016 us, as shown in >> >> printout of the xdp_hw_metadata application below. >> >> 0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP >> >> rx_hash: 0xE343384 with RSS type:0x1 >> >> HW RX-time: 1734578015467548904 (sec:1734578015.4675) >> >> delta to User RX-time sec:0.0002 (183.103 usec) >> >> XDP RX-time: 1734578015467651698 (sec:1734578015.4677) >> >> delta to User RX-time sec:0.0001 (80.309 usec) >> >> No rx_vlan_tci or rx_vlan_proto, err=-95 >> >> 0x562ff5dc8880: ping-pong with csum=561c (want c7dd) >> >> csum_start=34 csum_offset=6 >> >> HW RX-time: 1734578015467548904 (sec:1734578015.4675) >> >> delta to HW Launch-time sec:1.0000 (1000000.000 usec) >> >> 0x562ff5dc8880: complete tx idx=4 addr=4018 >> >> HW Launch-time: 1734578016467548904 (sec:1734578016.4675) >> >> delta to HW TX-complete-time sec:0.0000 (0.016 usec) >> >> HW TX-complete-time: 1734578016467548920 (sec:1734578016.4675) >> >> delta to User TX-complete-time sec:0.0000 >> >> (32.546 usec) >> >> XDP RX-time: 1734578015467651698 (sec:1734578015.4677) >> >> delta to User TX-complete-time sec:0.9999 >> >> (999929.768 usec) >> >> HW RX-time: 1734578015467548904 (sec:1734578015.4675) >> >> delta to HW TX-complete-time sec:1.0000 (1000000.016 usec) >> >> 0x562ff5dc8880: complete rx idx=132 addr=84110 >> >> >> >> Test 2: Send 1000 packets with a 10 ms interval and the launch time set to >> >> 500 us in the future. >> >> >> >> Test steps: >> >> 1. On the DUT, start the xdp_hw_metadata selftest application: >> >> $ sudo chrt -f 99 ./xdp_hw_metadata enp2s0 -l 500000 -L 1 > \ >> >> /dev/shm/result.log >> >> >> >> 2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and >> >> VLAN priority 1 to port 9091 of the DUT. >> >> >> >> Result: >> >> When the launch time is set to 500 us in the future, the average delta >> >> between the launch time and the transmit hardware timestamp is 0.016 us, >> >> as shown in the analysis of /dev/shm/result.log below. The XDP launch time >> >> works correctly in sending 1000 packets continuously. >> >> Min delta: 0.005 us >> >> Avr delta: 0.016 us >> >> Max delta: 0.031 us >> >> Total packets forwarded: 1000 >> >> >> >> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> >> >> --- >> >> drivers/net/ethernet/intel/igc/igc_main.c | 42 +++++++++++++++++++++-- >> >> 1 file changed, 40 insertions(+), 2 deletions(-) >> >> >> >> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c >> >b/drivers/net/ethernet/intel/igc/igc_main.c >> >> index c3edd8bcf633..535d340c71c9 100644 >> >> --- a/drivers/net/ethernet/intel/igc/igc_main.c >> >> +++ b/drivers/net/ethernet/intel/igc/igc_main.c >> >> @@ -2951,9 +2951,33 @@ static u64 igc_xsk_fill_timestamp(void *_priv) >> >> return *(u64 *)_priv; >> >> } >> >> >> >> +static void igc_xsk_request_launch_time(u64 launch_time, void *_priv) >> >> +{ >> >> + struct igc_metadata_request *meta_req = _priv; >> >> + struct igc_ring *tx_ring = meta_req->tx_ring; >> >> + __le32 launch_time_offset; >> >> + bool insert_empty = false; >> >> + bool first_flag = false; >> >> + >> >> + if (!tx_ring->launchtime_enable) >> >> + return; >> >> + >> >> + launch_time_offset = igc_tx_launchtime(tx_ring, >> >> + ns_to_ktime(launch_time), >> >> + &first_flag, &insert_empty); >> >> + if (insert_empty) { >> >> + igc_insert_empty_packet(tx_ring); >> >> + meta_req->tx_buffer = >> >> + &tx_ring->tx_buffer_info[tx_ring->next_to_use]; >> > >> >in this case I think you currently are leaking the skbs and dma mappings >> >that igc_init_empty_frame() did. you're going to mix >> >IGC_TX_BUFFER_TYPE_XSK with IGC_TX_BUFFER_TYPE_SKB and the latter is not >> >explicitly initialized. Even though IGC_TX_BUFFER_TYPE_SKB happens to be >> >equal to 0, igc_tx_buffer::type is never cleared in the tx clean desc >> >routine. >> > >> >> Hi Fijalkowski Maciej, >> >> Thanks for your inputs. >> >> Yes, you are right, IGC_TX_BUFFER_TYPE_SKB is mixed together with >> IGC_TX_BUFFER_TYPE_XSK. Regarding the skb and dma map, >> following code in igc_clean_tx_irq() will free the skb and unmap the dma, >> Do these answer your concern on leaking? >> >> igc_main.c:3133: case IGC_TX_BUFFER_TYPE_SKB: >> igc_main.c-3134- napi_consume_skb(tx_buffer->skb, napi_budget); >> igc_main.c-3135- igc_unmap_tx_buffer(tx_ring->dev, tx_buffer); >> igc_main.c-3136- break; >> >> Regarding the igc_tx_buffer::type never cleared, I think the >> important thing is making the igc_tx_buffer::next_to_watch NULL >> to indicate no remaining packet. Since transmit function will >> always set the igc_tx_buffer::type to a proper type, > >igc_init_tx_empty_descriptor() does not set ::type, that was my point. So >these empty descs might be treated as IGC_TX_BUFFER_TYPE_XSK, which is >wrong and your qouted code above will never get called. You will then leak >skb and dma map plus you will confuse XSK code due to xsk_frames miscount. > Now I understand what you mean. Thanks for the explanation. I will add the missing IGC_TX_BUFFER_TYPE_SKB initialization in igc_init_empty_frame(), as below. --- a/drivers/net/ethernet/intel/igc/igc_main.c +++ b/drivers/net/ethernet/intel/igc/igc_main.c @@ -1096,6 +1096,7 @@ static int igc_init_empty_frame(struct igc_ring *ring, return -ENOMEM; } + buffer->type = IGC_TX_BUFFER_TYPE_SKB; buffer->skb = skb; buffer->protocol = 0; buffer->bytecount = skb->len; With above, IMHO, we no need to clear igc_tx_buffer::type, Are you agree? >> I think it is optional for us to clear it. >> Is that make sense to you? >> >> >> + } >> >> + >> >> + igc_tx_ctxtdesc(tx_ring, launch_time_offset, first_flag, 0, 0, 0); >> >> +} >> >> + >> >> const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = { >> >> .tmo_request_timestamp = igc_xsk_request_timestamp, >> >> .tmo_fill_timestamp = igc_xsk_fill_timestamp, >> >> + .tmo_request_launch_time = igc_xsk_request_launch_time, >> >> }; >> >> >> >> static void igc_xdp_xmit_zc(struct igc_ring *ring) >> >> @@ -2976,7 +3000,13 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) >> >> ntu = ring->next_to_use; >> >> budget = igc_desc_unused(ring); >> >> >> >> - while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) { >> >> + /* Packets with launch time require one data descriptor and one context >> >> + * descriptor. When the launch time falls into the next Qbv cycle, we >> >> + * may need to insert an empty packet, which requires two more >> >> + * descriptors. Therefore, to be safe, we always ensure we have at least >> >> + * 4 descriptors available. >> >> + */ >> >> + while (xsk_tx_peek_desc(pool, &xdp_desc) && budget >= 4) { >> >> struct igc_metadata_request meta_req; >> >> struct xsk_tx_metadata *meta = NULL; >> >> struct igc_tx_buffer *bi; >> >> @@ -3000,6 +3030,12 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) >> >> xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops, >> >> &meta_req); >> >> >> >> + /* xsk_tx_metadata_request() may have updated next_to_use */ >> >> + ntu = ring->next_to_use; >> >> + >> >> + /* xsk_tx_metadata_request() may have updated Tx buffer info */ >> >> + bi = meta_req.tx_buffer; >> >> + >> >> tx_desc = IGC_TX_DESC(ring, ntu); >> >> tx_desc->read.cmd_type_len = cpu_to_le32(meta_req.cmd_type); >> >> tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status); >> >> @@ -3017,9 +3053,11 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) >> >> ntu++; >> >> if (ntu == ring->count) >> >> ntu = 0; >> >> + >> >> + ring->next_to_use = ntu; >> >> + budget = igc_desc_unused(ring); >> > >> >why count the remaining space in loop? couldn't you decrement it >> >accordingly to the count of descriptors you have produced? writing ntu >> >back and forth between local var and ring struct performance-wise does not >> >look fine. >> > >> >> Yes, I can check the number of used descriptor in xsk_tx_metadata_request() >> by introducing a new field named used_desc in struct igc_metadata_request, >> and then decreases the budget with it. >> >> Do this way looked good to you? > >Yes this makes sense to me, thanks! > Thanks, I will rework and submit next version. >> >> Thanks & Regards >> Siang >> >> >> } >> >> >> >> - ring->next_to_use = ntu; >> >> if (tx_desc) { >> >> igc_flush_tx_descriptors(ring); >> >> xsk_tx_release(pool); >> >> -- >> >> 2.34.1 >> >> Thanks & Regards Siang
On Tue, Feb 04, 2025 at 03:49:32PM +0100, Song, Yoong Siang wrote: > On Tuesday, February 4, 2025 10:04 PM, Fijalkowski, Maciej <maciej.fijalkowski@intel.com> wrote: > >On Tue, Feb 04, 2025 at 02:14:00PM +0100, Song, Yoong Siang wrote: > >> On Tuesday, February 4, 2025 6:10 PM, Fijalkowski, Maciej<maciej.fijalkowski@intel.com> wrote: > >> >On Tue, Feb 04, 2025 at 08:49:07AM +0800, Song Yoong Siang wrote: > >> > > >> >> Enable Launch Time Control (LTC) support for XDP zero copy via XDP Tx > >> >> metadata framework. > >> >> > >> >> This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata > >> >> on Intel I225-LM Ethernet controller. Below are the test steps and result. > >> >> > >> >> Test 1: Send a single packet with the launch time set to 1 s in the future. > >> >> > >> >> Test steps: > >> >> 1. On the DUT, start the xdp_hw_metadata selftest application: > >> >> $ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1 > >> >> > >> >> 2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091 > >> >> of the DUT. > >> >> > >> >> Result: > >> >> When the launch time is set to 1 s in the future, the delta between the > >> >> launch time and the transmit hardware timestamp is 0.016 us, as shown in > >> >> printout of the xdp_hw_metadata application below. > >> >> 0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP > >> >> rx_hash: 0xE343384 with RSS type:0x1 > >> >> HW RX-time: 1734578015467548904 (sec:1734578015.4675) > >> >> delta to User RX-time sec:0.0002 (183.103 usec) > >> >> XDP RX-time: 1734578015467651698 (sec:1734578015.4677) > >> >> delta to User RX-time sec:0.0001 (80.309 usec) > >> >> No rx_vlan_tci or rx_vlan_proto, err=-95 > >> >> 0x562ff5dc8880: ping-pong with csum=561c (want c7dd) > >> >> csum_start=34 csum_offset=6 > >> >> HW RX-time: 1734578015467548904 (sec:1734578015.4675) > >> >> delta to HW Launch-time sec:1.0000 (1000000.000 usec) > >> >> 0x562ff5dc8880: complete tx idx=4 addr=4018 > >> >> HW Launch-time: 1734578016467548904 (sec:1734578016.4675) > >> >> delta to HW TX-complete-time sec:0.0000 (0.016 usec) > >> >> HW TX-complete-time: 1734578016467548920 (sec:1734578016.4675) > >> >> delta to User TX-complete-time sec:0.0000 > >> >> (32.546 usec) > >> >> XDP RX-time: 1734578015467651698 (sec:1734578015.4677) > >> >> delta to User TX-complete-time sec:0.9999 > >> >> (999929.768 usec) > >> >> HW RX-time: 1734578015467548904 (sec:1734578015.4675) > >> >> delta to HW TX-complete-time sec:1.0000 (1000000.016 usec) > >> >> 0x562ff5dc8880: complete rx idx=132 addr=84110 > >> >> > >> >> Test 2: Send 1000 packets with a 10 ms interval and the launch time set to > >> >> 500 us in the future. > >> >> > >> >> Test steps: > >> >> 1. On the DUT, start the xdp_hw_metadata selftest application: > >> >> $ sudo chrt -f 99 ./xdp_hw_metadata enp2s0 -l 500000 -L 1 > \ > >> >> /dev/shm/result.log > >> >> > >> >> 2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and > >> >> VLAN priority 1 to port 9091 of the DUT. > >> >> > >> >> Result: > >> >> When the launch time is set to 500 us in the future, the average delta > >> >> between the launch time and the transmit hardware timestamp is 0.016 us, > >> >> as shown in the analysis of /dev/shm/result.log below. The XDP launch time > >> >> works correctly in sending 1000 packets continuously. > >> >> Min delta: 0.005 us > >> >> Avr delta: 0.016 us > >> >> Max delta: 0.031 us > >> >> Total packets forwarded: 1000 > >> >> > >> >> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> > >> >> --- > >> >> drivers/net/ethernet/intel/igc/igc_main.c | 42 +++++++++++++++++++++-- > >> >> 1 file changed, 40 insertions(+), 2 deletions(-) > >> >> > >> >> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c > >> >b/drivers/net/ethernet/intel/igc/igc_main.c > >> >> index c3edd8bcf633..535d340c71c9 100644 > >> >> --- a/drivers/net/ethernet/intel/igc/igc_main.c > >> >> +++ b/drivers/net/ethernet/intel/igc/igc_main.c > >> >> @@ -2951,9 +2951,33 @@ static u64 igc_xsk_fill_timestamp(void *_priv) > >> >> return *(u64 *)_priv; > >> >> } > >> >> > >> >> +static void igc_xsk_request_launch_time(u64 launch_time, void *_priv) > >> >> +{ > >> >> + struct igc_metadata_request *meta_req = _priv; > >> >> + struct igc_ring *tx_ring = meta_req->tx_ring; > >> >> + __le32 launch_time_offset; > >> >> + bool insert_empty = false; > >> >> + bool first_flag = false; > >> >> + > >> >> + if (!tx_ring->launchtime_enable) > >> >> + return; > >> >> + > >> >> + launch_time_offset = igc_tx_launchtime(tx_ring, > >> >> + ns_to_ktime(launch_time), > >> >> + &first_flag, &insert_empty); > >> >> + if (insert_empty) { > >> >> + igc_insert_empty_packet(tx_ring); > >> >> + meta_req->tx_buffer = > >> >> + &tx_ring->tx_buffer_info[tx_ring->next_to_use]; > >> > > >> >in this case I think you currently are leaking the skbs and dma mappings > >> >that igc_init_empty_frame() did. you're going to mix > >> >IGC_TX_BUFFER_TYPE_XSK with IGC_TX_BUFFER_TYPE_SKB and the latter is not > >> >explicitly initialized. Even though IGC_TX_BUFFER_TYPE_SKB happens to be > >> >equal to 0, igc_tx_buffer::type is never cleared in the tx clean desc > >> >routine. > >> > > >> > >> Hi Fijalkowski Maciej, > >> > >> Thanks for your inputs. > >> > >> Yes, you are right, IGC_TX_BUFFER_TYPE_SKB is mixed together with > >> IGC_TX_BUFFER_TYPE_XSK. Regarding the skb and dma map, > >> following code in igc_clean_tx_irq() will free the skb and unmap the dma, > >> Do these answer your concern on leaking? > >> > >> igc_main.c:3133: case IGC_TX_BUFFER_TYPE_SKB: > >> igc_main.c-3134- napi_consume_skb(tx_buffer->skb, napi_budget); > >> igc_main.c-3135- igc_unmap_tx_buffer(tx_ring->dev, tx_buffer); > >> igc_main.c-3136- break; > >> > >> Regarding the igc_tx_buffer::type never cleared, I think the > >> important thing is making the igc_tx_buffer::next_to_watch NULL > >> to indicate no remaining packet. Since transmit function will > >> always set the igc_tx_buffer::type to a proper type, > > > >igc_init_tx_empty_descriptor() does not set ::type, that was my point. So > >these empty descs might be treated as IGC_TX_BUFFER_TYPE_XSK, which is > >wrong and your qouted code above will never get called. You will then leak > >skb and dma map plus you will confuse XSK code due to xsk_frames miscount. > > > > Now I understand what you mean. Thanks for the explanation. I will add the > missing IGC_TX_BUFFER_TYPE_SKB initialization in igc_init_empty_frame(), > as below. > > --- a/drivers/net/ethernet/intel/igc/igc_main.c > +++ b/drivers/net/ethernet/intel/igc/igc_main.c > @@ -1096,6 +1096,7 @@ static int igc_init_empty_frame(struct igc_ring *ring, > return -ENOMEM; > } > > + buffer->type = IGC_TX_BUFFER_TYPE_SKB; > buffer->skb = skb; > buffer->protocol = 0; > buffer->bytecount = skb->len; > > With above, IMHO, we no need to clear igc_tx_buffer::type, > Are you agree? Yes, the contract should be that every routine that produces descriptors should be setting the ::type explicitly. Sorry for not being clear from beginning but I'm glad we're on the same page now. I'm afraid this single line should be send as a fix, though :< even without your patch set empty descs could be mixed IGC_TX_BUFFER_TYPE_XDP type. > > >> I think it is optional for us to clear it. > >> Is that make sense to you? > >> > >> >> + } > >> >> + > >> >> + igc_tx_ctxtdesc(tx_ring, launch_time_offset, first_flag, 0, 0, 0); > >> >> +} > >> >> + > >> >> const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = { > >> >> .tmo_request_timestamp = igc_xsk_request_timestamp, > >> >> .tmo_fill_timestamp = igc_xsk_fill_timestamp, > >> >> + .tmo_request_launch_time = igc_xsk_request_launch_time, > >> >> }; > >> >> > >> >> static void igc_xdp_xmit_zc(struct igc_ring *ring) > >> >> @@ -2976,7 +3000,13 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) > >> >> ntu = ring->next_to_use; > >> >> budget = igc_desc_unused(ring); > >> >> > >> >> - while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) { > >> >> + /* Packets with launch time require one data descriptor and one context > >> >> + * descriptor. When the launch time falls into the next Qbv cycle, we > >> >> + * may need to insert an empty packet, which requires two more > >> >> + * descriptors. Therefore, to be safe, we always ensure we have at least > >> >> + * 4 descriptors available. > >> >> + */ > >> >> + while (xsk_tx_peek_desc(pool, &xdp_desc) && budget >= 4) { > >> >> struct igc_metadata_request meta_req; > >> >> struct xsk_tx_metadata *meta = NULL; > >> >> struct igc_tx_buffer *bi; > >> >> @@ -3000,6 +3030,12 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) > >> >> xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops, > >> >> &meta_req); > >> >> > >> >> + /* xsk_tx_metadata_request() may have updated next_to_use */ > >> >> + ntu = ring->next_to_use; > >> >> + > >> >> + /* xsk_tx_metadata_request() may have updated Tx buffer info */ > >> >> + bi = meta_req.tx_buffer; > >> >> + > >> >> tx_desc = IGC_TX_DESC(ring, ntu); > >> >> tx_desc->read.cmd_type_len = cpu_to_le32(meta_req.cmd_type); > >> >> tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status); > >> >> @@ -3017,9 +3053,11 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) > >> >> ntu++; > >> >> if (ntu == ring->count) > >> >> ntu = 0; > >> >> + > >> >> + ring->next_to_use = ntu; > >> >> + budget = igc_desc_unused(ring); > >> > > >> >why count the remaining space in loop? couldn't you decrement it > >> >accordingly to the count of descriptors you have produced? writing ntu > >> >back and forth between local var and ring struct performance-wise does not > >> >look fine. > >> > > >> > >> Yes, I can check the number of used descriptor in xsk_tx_metadata_request() > >> by introducing a new field named used_desc in struct igc_metadata_request, > >> and then decreases the budget with it. > >> > >> Do this way looked good to you? > > > >Yes this makes sense to me, thanks! > > > > Thanks, I will rework and submit next version. > > >> > >> Thanks & Regards > >> Siang > >> > >> >> } > >> >> > >> >> - ring->next_to_use = ntu; > >> >> if (tx_desc) { > >> >> igc_flush_tx_descriptors(ring); > >> >> xsk_tx_release(pool); > >> >> -- > >> >> 2.34.1 > >> >> > > Thanks & Regards > Siang >
On Tuesday, February 4, 2025 11:18 PM, Fijalkowski, Maciej <maciej.fijalkowski@intel.com> wrote: >On Tue, Feb 04, 2025 at 03:49:32PM +0100, Song, Yoong Siang wrote: [...] >> With above, IMHO, we no need to clear igc_tx_buffer::type, >> Are you agree? > >Yes, the contract should be that every routine that produces descriptors >should be setting the ::type explicitly. Sorry for not being clear from >beginning but I'm glad we're on the same page now. > >I'm afraid this single line should be send as a fix, though :< even >without your patch set empty descs could be mixed IGC_TX_BUFFER_TYPE_XDP >type. > Agree with you. I will send this single line as a fix patch to iwl-net Thanks & Regards Siang
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c index c3edd8bcf633..535d340c71c9 100644 --- a/drivers/net/ethernet/intel/igc/igc_main.c +++ b/drivers/net/ethernet/intel/igc/igc_main.c @@ -2951,9 +2951,33 @@ static u64 igc_xsk_fill_timestamp(void *_priv) return *(u64 *)_priv; } +static void igc_xsk_request_launch_time(u64 launch_time, void *_priv) +{ + struct igc_metadata_request *meta_req = _priv; + struct igc_ring *tx_ring = meta_req->tx_ring; + __le32 launch_time_offset; + bool insert_empty = false; + bool first_flag = false; + + if (!tx_ring->launchtime_enable) + return; + + launch_time_offset = igc_tx_launchtime(tx_ring, + ns_to_ktime(launch_time), + &first_flag, &insert_empty); + if (insert_empty) { + igc_insert_empty_packet(tx_ring); + meta_req->tx_buffer = + &tx_ring->tx_buffer_info[tx_ring->next_to_use]; + } + + igc_tx_ctxtdesc(tx_ring, launch_time_offset, first_flag, 0, 0, 0); +} + const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = { .tmo_request_timestamp = igc_xsk_request_timestamp, .tmo_fill_timestamp = igc_xsk_fill_timestamp, + .tmo_request_launch_time = igc_xsk_request_launch_time, }; static void igc_xdp_xmit_zc(struct igc_ring *ring) @@ -2976,7 +3000,13 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) ntu = ring->next_to_use; budget = igc_desc_unused(ring); - while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) { + /* Packets with launch time require one data descriptor and one context + * descriptor. When the launch time falls into the next Qbv cycle, we + * may need to insert an empty packet, which requires two more + * descriptors. Therefore, to be safe, we always ensure we have at least + * 4 descriptors available. + */ + while (xsk_tx_peek_desc(pool, &xdp_desc) && budget >= 4) { struct igc_metadata_request meta_req; struct xsk_tx_metadata *meta = NULL; struct igc_tx_buffer *bi; @@ -3000,6 +3030,12 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops, &meta_req); + /* xsk_tx_metadata_request() may have updated next_to_use */ + ntu = ring->next_to_use; + + /* xsk_tx_metadata_request() may have updated Tx buffer info */ + bi = meta_req.tx_buffer; + tx_desc = IGC_TX_DESC(ring, ntu); tx_desc->read.cmd_type_len = cpu_to_le32(meta_req.cmd_type); tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status); @@ -3017,9 +3053,11 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) ntu++; if (ntu == ring->count) ntu = 0; + + ring->next_to_use = ntu; + budget = igc_desc_unused(ring); } - ring->next_to_use = ntu; if (tx_desc) { igc_flush_tx_descriptors(ring); xsk_tx_release(pool);
Enable Launch Time Control (LTC) support for XDP zero copy via XDP Tx metadata framework. This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata on Intel I225-LM Ethernet controller. Below are the test steps and result. Test 1: Send a single packet with the launch time set to 1 s in the future. Test steps: 1. On the DUT, start the xdp_hw_metadata selftest application: $ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1 2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091 of the DUT. Result: When the launch time is set to 1 s in the future, the delta between the launch time and the transmit hardware timestamp is 0.016 us, as shown in printout of the xdp_hw_metadata application below. 0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP rx_hash: 0xE343384 with RSS type:0x1 HW RX-time: 1734578015467548904 (sec:1734578015.4675) delta to User RX-time sec:0.0002 (183.103 usec) XDP RX-time: 1734578015467651698 (sec:1734578015.4677) delta to User RX-time sec:0.0001 (80.309 usec) No rx_vlan_tci or rx_vlan_proto, err=-95 0x562ff5dc8880: ping-pong with csum=561c (want c7dd) csum_start=34 csum_offset=6 HW RX-time: 1734578015467548904 (sec:1734578015.4675) delta to HW Launch-time sec:1.0000 (1000000.000 usec) 0x562ff5dc8880: complete tx idx=4 addr=4018 HW Launch-time: 1734578016467548904 (sec:1734578016.4675) delta to HW TX-complete-time sec:0.0000 (0.016 usec) HW TX-complete-time: 1734578016467548920 (sec:1734578016.4675) delta to User TX-complete-time sec:0.0000 (32.546 usec) XDP RX-time: 1734578015467651698 (sec:1734578015.4677) delta to User TX-complete-time sec:0.9999 (999929.768 usec) HW RX-time: 1734578015467548904 (sec:1734578015.4675) delta to HW TX-complete-time sec:1.0000 (1000000.016 usec) 0x562ff5dc8880: complete rx idx=132 addr=84110 Test 2: Send 1000 packets with a 10 ms interval and the launch time set to 500 us in the future. Test steps: 1. On the DUT, start the xdp_hw_metadata selftest application: $ sudo chrt -f 99 ./xdp_hw_metadata enp2s0 -l 500000 -L 1 > \ /dev/shm/result.log 2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and VLAN priority 1 to port 9091 of the DUT. Result: When the launch time is set to 500 us in the future, the average delta between the launch time and the transmit hardware timestamp is 0.016 us, as shown in the analysis of /dev/shm/result.log below. The XDP launch time works correctly in sending 1000 packets continuously. Min delta: 0.005 us Avr delta: 0.016 us Max delta: 0.031 us Total packets forwarded: 1000 Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> --- drivers/net/ethernet/intel/igc/igc_main.c | 42 +++++++++++++++++++++-- 1 file changed, 40 insertions(+), 2 deletions(-)