diff mbox series

[bpf-next,v7,5/5] igc: Add launch time support to XDP ZC

Message ID 20250204004907.789330-6-yoong.siang.song@intel.com (mailing list archive)
State New
Headers show
Series xsk: TX metadata Launch Time support | expand

Commit Message

Song Yoong Siang Feb. 4, 2025, 12:49 a.m. UTC
Enable Launch Time Control (LTC) support for XDP zero copy via XDP Tx
metadata framework.

This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata
on Intel I225-LM Ethernet controller. Below are the test steps and result.

Test 1: Send a single packet with the launch time set to 1 s in the future.

Test steps:
1. On the DUT, start the xdp_hw_metadata selftest application:
   $ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1

2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091
   of the DUT.

Result:
When the launch time is set to 1 s in the future, the delta between the
launch time and the transmit hardware timestamp is 0.016 us, as shown in
printout of the xdp_hw_metadata application below.
  0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP
  rx_hash: 0xE343384 with RSS type:0x1
  HW RX-time:   1734578015467548904 (sec:1734578015.4675)
                delta to User RX-time sec:0.0002 (183.103 usec)
  XDP RX-time:   1734578015467651698 (sec:1734578015.4677)
                 delta to User RX-time sec:0.0001 (80.309 usec)
  No rx_vlan_tci or rx_vlan_proto, err=-95
  0x562ff5dc8880: ping-pong with csum=561c (want c7dd)
                  csum_start=34 csum_offset=6
  HW RX-time:   1734578015467548904 (sec:1734578015.4675)
                delta to HW Launch-time sec:1.0000 (1000000.000 usec)
  0x562ff5dc8880: complete tx idx=4 addr=4018
  HW Launch-time:   1734578016467548904 (sec:1734578016.4675)
                    delta to HW TX-complete-time sec:0.0000 (0.016 usec)
  HW TX-complete-time:   1734578016467548920 (sec:1734578016.4675)
                         delta to User TX-complete-time sec:0.0000
                         (32.546 usec)
  XDP RX-time:   1734578015467651698 (sec:1734578015.4677)
                 delta to User TX-complete-time sec:0.9999
                 (999929.768 usec)
  HW RX-time:   1734578015467548904 (sec:1734578015.4675)
                delta to HW TX-complete-time sec:1.0000 (1000000.016 usec)
  0x562ff5dc8880: complete rx idx=132 addr=84110

Test 2: Send 1000 packets with a 10 ms interval and the launch time set to
        500 us in the future.

Test steps:
1. On the DUT, start the xdp_hw_metadata selftest application:
   $ sudo chrt -f 99 ./xdp_hw_metadata enp2s0 -l 500000 -L 1 > \
     /dev/shm/result.log

2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and
   VLAN priority 1 to port 9091 of the DUT.

Result:
When the launch time is set to 500 us in the future, the average delta
between the launch time and the transmit hardware timestamp is 0.016 us,
as shown in the analysis of /dev/shm/result.log below. The XDP launch time
works correctly in sending 1000 packets continuously.
  Min delta: 0.005 us
  Avr delta: 0.016 us
  Max delta: 0.031 us
  Total packets forwarded: 1000

Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
---
 drivers/net/ethernet/intel/igc/igc_main.c | 42 +++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)

Comments

Faizal Rahim Feb. 4, 2025, 2:43 a.m. UTC | #1
On 4/2/2025 8:49 am, Song Yoong Siang wrote:
> Enable Launch Time Control (LTC) support for XDP zero copy via XDP Tx
> metadata framework.
> 
> This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata
> on Intel I225-LM Ethernet controller. Below are the test steps and result.
> 
> Test 1: Send a single packet with the launch time set to 1 s in the future.
> 
> Test steps:
> 1. On the DUT, start the xdp_hw_metadata selftest application:
>     $ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1
> 
> 2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091
>     of the DUT.
> 
> Result:
> When the launch time is set to 1 s in the future, the delta between the
> launch time and the transmit hardware timestamp is 0.016 us, as shown in
> printout of the xdp_hw_metadata application below.
>    0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP
>    rx_hash: 0xE343384 with RSS type:0x1
>    HW RX-time:   1734578015467548904 (sec:1734578015.4675)
>                  delta to User RX-time sec:0.0002 (183.103 usec)
>    XDP RX-time:   1734578015467651698 (sec:1734578015.4677)
>                   delta to User RX-time sec:0.0001 (80.309 usec)
>    No rx_vlan_tci or rx_vlan_proto, err=-95
>    0x562ff5dc8880: ping-pong with csum=561c (want c7dd)
>                    csum_start=34 csum_offset=6
>    HW RX-time:   1734578015467548904 (sec:1734578015.4675)
>                  delta to HW Launch-time sec:1.0000 (1000000.000 usec)
>    0x562ff5dc8880: complete tx idx=4 addr=4018
>    HW Launch-time:   1734578016467548904 (sec:1734578016.4675)
>                      delta to HW TX-complete-time sec:0.0000 (0.016 usec)
>    HW TX-complete-time:   1734578016467548920 (sec:1734578016.4675)
>                           delta to User TX-complete-time sec:0.0000
>                           (32.546 usec)
>    XDP RX-time:   1734578015467651698 (sec:1734578015.4677)
>                   delta to User TX-complete-time sec:0.9999
>                   (999929.768 usec)
>    HW RX-time:   1734578015467548904 (sec:1734578015.4675)
>                  delta to HW TX-complete-time sec:1.0000 (1000000.016 usec)
>    0x562ff5dc8880: complete rx idx=132 addr=84110
> 
> Test 2: Send 1000 packets with a 10 ms interval and the launch time set to
>          500 us in the future.
> 
> Test steps:
> 1. On the DUT, start the xdp_hw_metadata selftest application:
>     $ sudo chrt -f 99 ./xdp_hw_metadata enp2s0 -l 500000 -L 1 > \
>       /dev/shm/result.log
> 
> 2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and
>     VLAN priority 1 to port 9091 of the DUT.
> 
> Result:
> When the launch time is set to 500 us in the future, the average delta
> between the launch time and the transmit hardware timestamp is 0.016 us,
> as shown in the analysis of /dev/shm/result.log below. The XDP launch time
> works correctly in sending 1000 packets continuously.
>    Min delta: 0.005 us
>    Avr delta: 0.016 us
>    Max delta: 0.031 us
>    Total packets forwarded: 1000
> 
> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
> ---
>   drivers/net/ethernet/intel/igc/igc_main.c | 42 +++++++++++++++++++++--
>   1 file changed, 40 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index c3edd8bcf633..535d340c71c9 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -2951,9 +2951,33 @@ static u64 igc_xsk_fill_timestamp(void *_priv)
>   	return *(u64 *)_priv;
>   }
>   
> +static void igc_xsk_request_launch_time(u64 launch_time, void *_priv)
> +{
> +	struct igc_metadata_request *meta_req = _priv;
> +	struct igc_ring *tx_ring = meta_req->tx_ring;
> +	__le32 launch_time_offset;
> +	bool insert_empty = false;
> +	bool first_flag = false;
> +
> +	if (!tx_ring->launchtime_enable)
> +		return;
> +
> +	launch_time_offset = igc_tx_launchtime(tx_ring,
> +					       ns_to_ktime(launch_time),
> +					       &first_flag, &insert_empty);
> +	if (insert_empty) {
> +		igc_insert_empty_packet(tx_ring);
> +		meta_req->tx_buffer =
> +			&tx_ring->tx_buffer_info[tx_ring->next_to_use];
> +	}
> +
> +	igc_tx_ctxtdesc(tx_ring, launch_time_offset, first_flag, 0, 0, 0);
> +}
> +
>   const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = {
>   	.tmo_request_timestamp		= igc_xsk_request_timestamp,
>   	.tmo_fill_timestamp		= igc_xsk_fill_timestamp,
> +	.tmo_request_launch_time	= igc_xsk_request_launch_time,
>   };
>   
>   static void igc_xdp_xmit_zc(struct igc_ring *ring)
> @@ -2976,7 +3000,13 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
>   	ntu = ring->next_to_use;
>   	budget = igc_desc_unused(ring);
>   
> -	while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
> +	/* Packets with launch time require one data descriptor and one context
> +	 * descriptor. When the launch time falls into the next Qbv cycle, we
> +	 * may need to insert an empty packet, which requires two more
> +	 * descriptors. Therefore, to be safe, we always ensure we have at least
> +	 * 4 descriptors available.
> +	 */
> +	while (xsk_tx_peek_desc(pool, &xdp_desc) && budget >= 4) {
>   		struct igc_metadata_request meta_req;
>   		struct xsk_tx_metadata *meta = NULL;
>   		struct igc_tx_buffer *bi;
> @@ -3000,6 +3030,12 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
>   		xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops,
>   					&meta_req);
>   
> +		/* xsk_tx_metadata_request() may have updated next_to_use */
> +		ntu = ring->next_to_use;
> +
> +		/* xsk_tx_metadata_request() may have updated Tx buffer info */
> +		bi = meta_req.tx_buffer;
> +
>   		tx_desc = IGC_TX_DESC(ring, ntu);
>   		tx_desc->read.cmd_type_len = cpu_to_le32(meta_req.cmd_type);
>   		tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
> @@ -3017,9 +3053,11 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
>   		ntu++;
>   		if (ntu == ring->count)
>   			ntu = 0;
> +
> +		ring->next_to_use = ntu;
> +		budget = igc_desc_unused(ring);
>   	}
>   
> -	ring->next_to_use = ntu;
>   	if (tx_desc) {
>   		igc_flush_tx_descriptors(ring);
>   		xsk_tx_release(pool);

Reviewed-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Maciej Fijalkowski Feb. 4, 2025, 10:09 a.m. UTC | #2
On Tue, Feb 04, 2025 at 08:49:07AM +0800, Song Yoong Siang wrote:

> Enable Launch Time Control (LTC) support for XDP zero copy via XDP Tx
> metadata framework.
> 
> This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata
> on Intel I225-LM Ethernet controller. Below are the test steps and result.
> 
> Test 1: Send a single packet with the launch time set to 1 s in the future.
> 
> Test steps:
> 1. On the DUT, start the xdp_hw_metadata selftest application:
>    $ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1
> 
> 2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091
>    of the DUT.
> 
> Result:
> When the launch time is set to 1 s in the future, the delta between the
> launch time and the transmit hardware timestamp is 0.016 us, as shown in
> printout of the xdp_hw_metadata application below.
>   0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP
>   rx_hash: 0xE343384 with RSS type:0x1
>   HW RX-time:   1734578015467548904 (sec:1734578015.4675)
>                 delta to User RX-time sec:0.0002 (183.103 usec)
>   XDP RX-time:   1734578015467651698 (sec:1734578015.4677)
>                  delta to User RX-time sec:0.0001 (80.309 usec)
>   No rx_vlan_tci or rx_vlan_proto, err=-95
>   0x562ff5dc8880: ping-pong with csum=561c (want c7dd)
>                   csum_start=34 csum_offset=6
>   HW RX-time:   1734578015467548904 (sec:1734578015.4675)
>                 delta to HW Launch-time sec:1.0000 (1000000.000 usec)
>   0x562ff5dc8880: complete tx idx=4 addr=4018
>   HW Launch-time:   1734578016467548904 (sec:1734578016.4675)
>                     delta to HW TX-complete-time sec:0.0000 (0.016 usec)
>   HW TX-complete-time:   1734578016467548920 (sec:1734578016.4675)
>                          delta to User TX-complete-time sec:0.0000
>                          (32.546 usec)
>   XDP RX-time:   1734578015467651698 (sec:1734578015.4677)
>                  delta to User TX-complete-time sec:0.9999
>                  (999929.768 usec)
>   HW RX-time:   1734578015467548904 (sec:1734578015.4675)
>                 delta to HW TX-complete-time sec:1.0000 (1000000.016 usec)
>   0x562ff5dc8880: complete rx idx=132 addr=84110
> 
> Test 2: Send 1000 packets with a 10 ms interval and the launch time set to
>         500 us in the future.
> 
> Test steps:
> 1. On the DUT, start the xdp_hw_metadata selftest application:
>    $ sudo chrt -f 99 ./xdp_hw_metadata enp2s0 -l 500000 -L 1 > \
>      /dev/shm/result.log
> 
> 2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and
>    VLAN priority 1 to port 9091 of the DUT.
> 
> Result:
> When the launch time is set to 500 us in the future, the average delta
> between the launch time and the transmit hardware timestamp is 0.016 us,
> as shown in the analysis of /dev/shm/result.log below. The XDP launch time
> works correctly in sending 1000 packets continuously.
>   Min delta: 0.005 us
>   Avr delta: 0.016 us
>   Max delta: 0.031 us
>   Total packets forwarded: 1000
> 
> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
> ---
>  drivers/net/ethernet/intel/igc/igc_main.c | 42 +++++++++++++++++++++--
>  1 file changed, 40 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index c3edd8bcf633..535d340c71c9 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -2951,9 +2951,33 @@ static u64 igc_xsk_fill_timestamp(void *_priv)
>  	return *(u64 *)_priv;
>  }
>  
> +static void igc_xsk_request_launch_time(u64 launch_time, void *_priv)
> +{
> +	struct igc_metadata_request *meta_req = _priv;
> +	struct igc_ring *tx_ring = meta_req->tx_ring;
> +	__le32 launch_time_offset;
> +	bool insert_empty = false;
> +	bool first_flag = false;
> +
> +	if (!tx_ring->launchtime_enable)
> +		return;
> +
> +	launch_time_offset = igc_tx_launchtime(tx_ring,
> +					       ns_to_ktime(launch_time),
> +					       &first_flag, &insert_empty);
> +	if (insert_empty) {
> +		igc_insert_empty_packet(tx_ring);
> +		meta_req->tx_buffer =
> +			&tx_ring->tx_buffer_info[tx_ring->next_to_use];

in this case I think you currently are leaking the skbs and dma mappings
that igc_init_empty_frame() did. you're going to mix
IGC_TX_BUFFER_TYPE_XSK with IGC_TX_BUFFER_TYPE_SKB and the latter is not
explicitly initialized. Even though IGC_TX_BUFFER_TYPE_SKB happens to be
equal to 0, igc_tx_buffer::type is never cleared in the tx clean desc
routine.

> +	}
> +
> +	igc_tx_ctxtdesc(tx_ring, launch_time_offset, first_flag, 0, 0, 0);
> +}
> +
>  const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = {
>  	.tmo_request_timestamp		= igc_xsk_request_timestamp,
>  	.tmo_fill_timestamp		= igc_xsk_fill_timestamp,
> +	.tmo_request_launch_time	= igc_xsk_request_launch_time,
>  };
>  
>  static void igc_xdp_xmit_zc(struct igc_ring *ring)
> @@ -2976,7 +3000,13 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
>  	ntu = ring->next_to_use;
>  	budget = igc_desc_unused(ring);
>  
> -	while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
> +	/* Packets with launch time require one data descriptor and one context
> +	 * descriptor. When the launch time falls into the next Qbv cycle, we
> +	 * may need to insert an empty packet, which requires two more
> +	 * descriptors. Therefore, to be safe, we always ensure we have at least
> +	 * 4 descriptors available.
> +	 */
> +	while (xsk_tx_peek_desc(pool, &xdp_desc) && budget >= 4) {
>  		struct igc_metadata_request meta_req;
>  		struct xsk_tx_metadata *meta = NULL;
>  		struct igc_tx_buffer *bi;
> @@ -3000,6 +3030,12 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
>  		xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops,
>  					&meta_req);
>  
> +		/* xsk_tx_metadata_request() may have updated next_to_use */
> +		ntu = ring->next_to_use;
> +
> +		/* xsk_tx_metadata_request() may have updated Tx buffer info */
> +		bi = meta_req.tx_buffer;
> +
>  		tx_desc = IGC_TX_DESC(ring, ntu);
>  		tx_desc->read.cmd_type_len = cpu_to_le32(meta_req.cmd_type);
>  		tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
> @@ -3017,9 +3053,11 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
>  		ntu++;
>  		if (ntu == ring->count)
>  			ntu = 0;
> +
> +		ring->next_to_use = ntu;
> +		budget = igc_desc_unused(ring);

why count the remaining space in loop? couldn't you decrement it
accordingly to the count of descriptors you have produced? writing ntu
back and forth between local var and ring struct performance-wise does not
look fine.

>  	}
>  
> -	ring->next_to_use = ntu;
>  	if (tx_desc) {
>  		igc_flush_tx_descriptors(ring);
>  		xsk_tx_release(pool);
> -- 
> 2.34.1
>
Song Yoong Siang Feb. 4, 2025, 1:14 p.m. UTC | #3
On Tuesday, February 4, 2025 6:10 PM, Fijalkowski, Maciej <maciej.fijalkowski@intel.com> wrote:
>On Tue, Feb 04, 2025 at 08:49:07AM +0800, Song Yoong Siang wrote:
>
>> Enable Launch Time Control (LTC) support for XDP zero copy via XDP Tx
>> metadata framework.
>>
>> This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata
>> on Intel I225-LM Ethernet controller. Below are the test steps and result.
>>
>> Test 1: Send a single packet with the launch time set to 1 s in the future.
>>
>> Test steps:
>> 1. On the DUT, start the xdp_hw_metadata selftest application:
>>    $ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1
>>
>> 2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091
>>    of the DUT.
>>
>> Result:
>> When the launch time is set to 1 s in the future, the delta between the
>> launch time and the transmit hardware timestamp is 0.016 us, as shown in
>> printout of the xdp_hw_metadata application below.
>>   0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP
>>   rx_hash: 0xE343384 with RSS type:0x1
>>   HW RX-time:   1734578015467548904 (sec:1734578015.4675)
>>                 delta to User RX-time sec:0.0002 (183.103 usec)
>>   XDP RX-time:   1734578015467651698 (sec:1734578015.4677)
>>                  delta to User RX-time sec:0.0001 (80.309 usec)
>>   No rx_vlan_tci or rx_vlan_proto, err=-95
>>   0x562ff5dc8880: ping-pong with csum=561c (want c7dd)
>>                   csum_start=34 csum_offset=6
>>   HW RX-time:   1734578015467548904 (sec:1734578015.4675)
>>                 delta to HW Launch-time sec:1.0000 (1000000.000 usec)
>>   0x562ff5dc8880: complete tx idx=4 addr=4018
>>   HW Launch-time:   1734578016467548904 (sec:1734578016.4675)
>>                     delta to HW TX-complete-time sec:0.0000 (0.016 usec)
>>   HW TX-complete-time:   1734578016467548920 (sec:1734578016.4675)
>>                          delta to User TX-complete-time sec:0.0000
>>                          (32.546 usec)
>>   XDP RX-time:   1734578015467651698 (sec:1734578015.4677)
>>                  delta to User TX-complete-time sec:0.9999
>>                  (999929.768 usec)
>>   HW RX-time:   1734578015467548904 (sec:1734578015.4675)
>>                 delta to HW TX-complete-time sec:1.0000 (1000000.016 usec)
>>   0x562ff5dc8880: complete rx idx=132 addr=84110
>>
>> Test 2: Send 1000 packets with a 10 ms interval and the launch time set to
>>         500 us in the future.
>>
>> Test steps:
>> 1. On the DUT, start the xdp_hw_metadata selftest application:
>>    $ sudo chrt -f 99 ./xdp_hw_metadata enp2s0 -l 500000 -L 1 > \
>>      /dev/shm/result.log
>>
>> 2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and
>>    VLAN priority 1 to port 9091 of the DUT.
>>
>> Result:
>> When the launch time is set to 500 us in the future, the average delta
>> between the launch time and the transmit hardware timestamp is 0.016 us,
>> as shown in the analysis of /dev/shm/result.log below. The XDP launch time
>> works correctly in sending 1000 packets continuously.
>>   Min delta: 0.005 us
>>   Avr delta: 0.016 us
>>   Max delta: 0.031 us
>>   Total packets forwarded: 1000
>>
>> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
>> ---
>>  drivers/net/ethernet/intel/igc/igc_main.c | 42 +++++++++++++++++++++--
>>  1 file changed, 40 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
>b/drivers/net/ethernet/intel/igc/igc_main.c
>> index c3edd8bcf633..535d340c71c9 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>> @@ -2951,9 +2951,33 @@ static u64 igc_xsk_fill_timestamp(void *_priv)
>>  	return *(u64 *)_priv;
>>  }
>>
>> +static void igc_xsk_request_launch_time(u64 launch_time, void *_priv)
>> +{
>> +	struct igc_metadata_request *meta_req = _priv;
>> +	struct igc_ring *tx_ring = meta_req->tx_ring;
>> +	__le32 launch_time_offset;
>> +	bool insert_empty = false;
>> +	bool first_flag = false;
>> +
>> +	if (!tx_ring->launchtime_enable)
>> +		return;
>> +
>> +	launch_time_offset = igc_tx_launchtime(tx_ring,
>> +					       ns_to_ktime(launch_time),
>> +					       &first_flag, &insert_empty);
>> +	if (insert_empty) {
>> +		igc_insert_empty_packet(tx_ring);
>> +		meta_req->tx_buffer =
>> +			&tx_ring->tx_buffer_info[tx_ring->next_to_use];
>
>in this case I think you currently are leaking the skbs and dma mappings
>that igc_init_empty_frame() did. you're going to mix
>IGC_TX_BUFFER_TYPE_XSK with IGC_TX_BUFFER_TYPE_SKB and the latter is not
>explicitly initialized. Even though IGC_TX_BUFFER_TYPE_SKB happens to be
>equal to 0, igc_tx_buffer::type is never cleared in the tx clean desc
>routine.
>

Hi Fijalkowski Maciej,

Thanks for your inputs.

Yes, you are right, IGC_TX_BUFFER_TYPE_SKB is mixed together with
IGC_TX_BUFFER_TYPE_XSK. Regarding the skb and dma map, 
following code in igc_clean_tx_irq() will free the skb and unmap the dma,
Do these answer your concern on leaking?

igc_main.c:3133:                case IGC_TX_BUFFER_TYPE_SKB:
igc_main.c-3134-                        napi_consume_skb(tx_buffer->skb, napi_budget);
igc_main.c-3135-                        igc_unmap_tx_buffer(tx_ring->dev, tx_buffer);
igc_main.c-3136-                        break;

Regarding the igc_tx_buffer::type never cleared, I think the
important thing is making the igc_tx_buffer::next_to_watch NULL
to indicate no remaining packet. Since transmit function will
always set the igc_tx_buffer::type to a proper type,
I think it is optional for us to clear it.
Is that make sense to you?

>> +	}
>> +
>> +	igc_tx_ctxtdesc(tx_ring, launch_time_offset, first_flag, 0, 0, 0);
>> +}
>> +
>>  const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = {
>>  	.tmo_request_timestamp		= igc_xsk_request_timestamp,
>>  	.tmo_fill_timestamp		= igc_xsk_fill_timestamp,
>> +	.tmo_request_launch_time	= igc_xsk_request_launch_time,
>>  };
>>
>>  static void igc_xdp_xmit_zc(struct igc_ring *ring)
>> @@ -2976,7 +3000,13 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
>>  	ntu = ring->next_to_use;
>>  	budget = igc_desc_unused(ring);
>>
>> -	while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
>> +	/* Packets with launch time require one data descriptor and one context
>> +	 * descriptor. When the launch time falls into the next Qbv cycle, we
>> +	 * may need to insert an empty packet, which requires two more
>> +	 * descriptors. Therefore, to be safe, we always ensure we have at least
>> +	 * 4 descriptors available.
>> +	 */
>> +	while (xsk_tx_peek_desc(pool, &xdp_desc) && budget >= 4) {
>>  		struct igc_metadata_request meta_req;
>>  		struct xsk_tx_metadata *meta = NULL;
>>  		struct igc_tx_buffer *bi;
>> @@ -3000,6 +3030,12 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
>>  		xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops,
>>  					&meta_req);
>>
>> +		/* xsk_tx_metadata_request() may have updated next_to_use */
>> +		ntu = ring->next_to_use;
>> +
>> +		/* xsk_tx_metadata_request() may have updated Tx buffer info */
>> +		bi = meta_req.tx_buffer;
>> +
>>  		tx_desc = IGC_TX_DESC(ring, ntu);
>>  		tx_desc->read.cmd_type_len = cpu_to_le32(meta_req.cmd_type);
>>  		tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
>> @@ -3017,9 +3053,11 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
>>  		ntu++;
>>  		if (ntu == ring->count)
>>  			ntu = 0;
>> +
>> +		ring->next_to_use = ntu;
>> +		budget = igc_desc_unused(ring);
>
>why count the remaining space in loop? couldn't you decrement it
>accordingly to the count of descriptors you have produced? writing ntu
>back and forth between local var and ring struct performance-wise does not
>look fine.
>

Yes, I can check the number of used descriptor in xsk_tx_metadata_request()
by introducing a new field named used_desc in struct igc_metadata_request,
and then decreases the budget with it.

Do this way looked good to you?
 
Thanks & Regards
Siang

>>  	}
>>
>> -	ring->next_to_use = ntu;
>>  	if (tx_desc) {
>>  		igc_flush_tx_descriptors(ring);
>>  		xsk_tx_release(pool);
>> --
>> 2.34.1
>>
Maciej Fijalkowski Feb. 4, 2025, 2:03 p.m. UTC | #4
On Tue, Feb 04, 2025 at 02:14:00PM +0100, Song, Yoong Siang wrote:
> On Tuesday, February 4, 2025 6:10 PM, Fijalkowski, Maciej <maciej.fijalkowski@intel.com> wrote:
> >On Tue, Feb 04, 2025 at 08:49:07AM +0800, Song Yoong Siang wrote:
> >
> >> Enable Launch Time Control (LTC) support for XDP zero copy via XDP Tx
> >> metadata framework.
> >>
> >> This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata
> >> on Intel I225-LM Ethernet controller. Below are the test steps and result.
> >>
> >> Test 1: Send a single packet with the launch time set to 1 s in the future.
> >>
> >> Test steps:
> >> 1. On the DUT, start the xdp_hw_metadata selftest application:
> >>    $ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1
> >>
> >> 2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091
> >>    of the DUT.
> >>
> >> Result:
> >> When the launch time is set to 1 s in the future, the delta between the
> >> launch time and the transmit hardware timestamp is 0.016 us, as shown in
> >> printout of the xdp_hw_metadata application below.
> >>   0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP
> >>   rx_hash: 0xE343384 with RSS type:0x1
> >>   HW RX-time:   1734578015467548904 (sec:1734578015.4675)
> >>                 delta to User RX-time sec:0.0002 (183.103 usec)
> >>   XDP RX-time:   1734578015467651698 (sec:1734578015.4677)
> >>                  delta to User RX-time sec:0.0001 (80.309 usec)
> >>   No rx_vlan_tci or rx_vlan_proto, err=-95
> >>   0x562ff5dc8880: ping-pong with csum=561c (want c7dd)
> >>                   csum_start=34 csum_offset=6
> >>   HW RX-time:   1734578015467548904 (sec:1734578015.4675)
> >>                 delta to HW Launch-time sec:1.0000 (1000000.000 usec)
> >>   0x562ff5dc8880: complete tx idx=4 addr=4018
> >>   HW Launch-time:   1734578016467548904 (sec:1734578016.4675)
> >>                     delta to HW TX-complete-time sec:0.0000 (0.016 usec)
> >>   HW TX-complete-time:   1734578016467548920 (sec:1734578016.4675)
> >>                          delta to User TX-complete-time sec:0.0000
> >>                          (32.546 usec)
> >>   XDP RX-time:   1734578015467651698 (sec:1734578015.4677)
> >>                  delta to User TX-complete-time sec:0.9999
> >>                  (999929.768 usec)
> >>   HW RX-time:   1734578015467548904 (sec:1734578015.4675)
> >>                 delta to HW TX-complete-time sec:1.0000 (1000000.016 usec)
> >>   0x562ff5dc8880: complete rx idx=132 addr=84110
> >>
> >> Test 2: Send 1000 packets with a 10 ms interval and the launch time set to
> >>         500 us in the future.
> >>
> >> Test steps:
> >> 1. On the DUT, start the xdp_hw_metadata selftest application:
> >>    $ sudo chrt -f 99 ./xdp_hw_metadata enp2s0 -l 500000 -L 1 > \
> >>      /dev/shm/result.log
> >>
> >> 2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and
> >>    VLAN priority 1 to port 9091 of the DUT.
> >>
> >> Result:
> >> When the launch time is set to 500 us in the future, the average delta
> >> between the launch time and the transmit hardware timestamp is 0.016 us,
> >> as shown in the analysis of /dev/shm/result.log below. The XDP launch time
> >> works correctly in sending 1000 packets continuously.
> >>   Min delta: 0.005 us
> >>   Avr delta: 0.016 us
> >>   Max delta: 0.031 us
> >>   Total packets forwarded: 1000
> >>
> >> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
> >> ---
> >>  drivers/net/ethernet/intel/igc/igc_main.c | 42 +++++++++++++++++++++--
> >>  1 file changed, 40 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
> >b/drivers/net/ethernet/intel/igc/igc_main.c
> >> index c3edd8bcf633..535d340c71c9 100644
> >> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> >> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> >> @@ -2951,9 +2951,33 @@ static u64 igc_xsk_fill_timestamp(void *_priv)
> >>  	return *(u64 *)_priv;
> >>  }
> >>
> >> +static void igc_xsk_request_launch_time(u64 launch_time, void *_priv)
> >> +{
> >> +	struct igc_metadata_request *meta_req = _priv;
> >> +	struct igc_ring *tx_ring = meta_req->tx_ring;
> >> +	__le32 launch_time_offset;
> >> +	bool insert_empty = false;
> >> +	bool first_flag = false;
> >> +
> >> +	if (!tx_ring->launchtime_enable)
> >> +		return;
> >> +
> >> +	launch_time_offset = igc_tx_launchtime(tx_ring,
> >> +					       ns_to_ktime(launch_time),
> >> +					       &first_flag, &insert_empty);
> >> +	if (insert_empty) {
> >> +		igc_insert_empty_packet(tx_ring);
> >> +		meta_req->tx_buffer =
> >> +			&tx_ring->tx_buffer_info[tx_ring->next_to_use];
> >
> >in this case I think you currently are leaking the skbs and dma mappings
> >that igc_init_empty_frame() did. you're going to mix
> >IGC_TX_BUFFER_TYPE_XSK with IGC_TX_BUFFER_TYPE_SKB and the latter is not
> >explicitly initialized. Even though IGC_TX_BUFFER_TYPE_SKB happens to be
> >equal to 0, igc_tx_buffer::type is never cleared in the tx clean desc
> >routine.
> >
> 
> Hi Fijalkowski Maciej,
> 
> Thanks for your inputs.
> 
> Yes, you are right, IGC_TX_BUFFER_TYPE_SKB is mixed together with
> IGC_TX_BUFFER_TYPE_XSK. Regarding the skb and dma map, 
> following code in igc_clean_tx_irq() will free the skb and unmap the dma,
> Do these answer your concern on leaking?
> 
> igc_main.c:3133:                case IGC_TX_BUFFER_TYPE_SKB:
> igc_main.c-3134-                        napi_consume_skb(tx_buffer->skb, napi_budget);
> igc_main.c-3135-                        igc_unmap_tx_buffer(tx_ring->dev, tx_buffer);
> igc_main.c-3136-                        break;
> 
> Regarding the igc_tx_buffer::type never cleared, I think the
> important thing is making the igc_tx_buffer::next_to_watch NULL
> to indicate no remaining packet. Since transmit function will
> always set the igc_tx_buffer::type to a proper type,

igc_init_tx_empty_descriptor() does not set ::type, that was my point. So
these empty descs might be treated as IGC_TX_BUFFER_TYPE_XSK, which is
wrong and your qouted code above will never get called. You will then leak
skb and dma map plus you will confuse XSK code due to xsk_frames miscount.

> I think it is optional for us to clear it.
> Is that make sense to you?
> 
> >> +	}
> >> +
> >> +	igc_tx_ctxtdesc(tx_ring, launch_time_offset, first_flag, 0, 0, 0);
> >> +}
> >> +
> >>  const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = {
> >>  	.tmo_request_timestamp		= igc_xsk_request_timestamp,
> >>  	.tmo_fill_timestamp		= igc_xsk_fill_timestamp,
> >> +	.tmo_request_launch_time	= igc_xsk_request_launch_time,
> >>  };
> >>
> >>  static void igc_xdp_xmit_zc(struct igc_ring *ring)
> >> @@ -2976,7 +3000,13 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
> >>  	ntu = ring->next_to_use;
> >>  	budget = igc_desc_unused(ring);
> >>
> >> -	while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
> >> +	/* Packets with launch time require one data descriptor and one context
> >> +	 * descriptor. When the launch time falls into the next Qbv cycle, we
> >> +	 * may need to insert an empty packet, which requires two more
> >> +	 * descriptors. Therefore, to be safe, we always ensure we have at least
> >> +	 * 4 descriptors available.
> >> +	 */
> >> +	while (xsk_tx_peek_desc(pool, &xdp_desc) && budget >= 4) {
> >>  		struct igc_metadata_request meta_req;
> >>  		struct xsk_tx_metadata *meta = NULL;
> >>  		struct igc_tx_buffer *bi;
> >> @@ -3000,6 +3030,12 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
> >>  		xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops,
> >>  					&meta_req);
> >>
> >> +		/* xsk_tx_metadata_request() may have updated next_to_use */
> >> +		ntu = ring->next_to_use;
> >> +
> >> +		/* xsk_tx_metadata_request() may have updated Tx buffer info */
> >> +		bi = meta_req.tx_buffer;
> >> +
> >>  		tx_desc = IGC_TX_DESC(ring, ntu);
> >>  		tx_desc->read.cmd_type_len = cpu_to_le32(meta_req.cmd_type);
> >>  		tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
> >> @@ -3017,9 +3053,11 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
> >>  		ntu++;
> >>  		if (ntu == ring->count)
> >>  			ntu = 0;
> >> +
> >> +		ring->next_to_use = ntu;
> >> +		budget = igc_desc_unused(ring);
> >
> >why count the remaining space in loop? couldn't you decrement it
> >accordingly to the count of descriptors you have produced? writing ntu
> >back and forth between local var and ring struct performance-wise does not
> >look fine.
> >
> 
> Yes, I can check the number of used descriptor in xsk_tx_metadata_request()
> by introducing a new field named used_desc in struct igc_metadata_request,
> and then decreases the budget with it.
> 
> Do this way looked good to you?

Yes this makes sense to me, thanks!

>  
> Thanks & Regards
> Siang
> 
> >>  	}
> >>
> >> -	ring->next_to_use = ntu;
> >>  	if (tx_desc) {
> >>  		igc_flush_tx_descriptors(ring);
> >>  		xsk_tx_release(pool);
> >> --
> >> 2.34.1
> >>
Song Yoong Siang Feb. 4, 2025, 2:49 p.m. UTC | #5
On Tuesday, February 4, 2025 10:04 PM, Fijalkowski, Maciej <maciej.fijalkowski@intel.com> wrote:
>On Tue, Feb 04, 2025 at 02:14:00PM +0100, Song, Yoong Siang wrote:
>> On Tuesday, February 4, 2025 6:10 PM, Fijalkowski, Maciej<maciej.fijalkowski@intel.com> wrote:
>> >On Tue, Feb 04, 2025 at 08:49:07AM +0800, Song Yoong Siang wrote:
>> >
>> >> Enable Launch Time Control (LTC) support for XDP zero copy via XDP Tx
>> >> metadata framework.
>> >>
>> >> This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata
>> >> on Intel I225-LM Ethernet controller. Below are the test steps and result.
>> >>
>> >> Test 1: Send a single packet with the launch time set to 1 s in the future.
>> >>
>> >> Test steps:
>> >> 1. On the DUT, start the xdp_hw_metadata selftest application:
>> >>    $ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1
>> >>
>> >> 2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091
>> >>    of the DUT.
>> >>
>> >> Result:
>> >> When the launch time is set to 1 s in the future, the delta between the
>> >> launch time and the transmit hardware timestamp is 0.016 us, as shown in
>> >> printout of the xdp_hw_metadata application below.
>> >>   0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP
>> >>   rx_hash: 0xE343384 with RSS type:0x1
>> >>   HW RX-time:   1734578015467548904 (sec:1734578015.4675)
>> >>                 delta to User RX-time sec:0.0002 (183.103 usec)
>> >>   XDP RX-time:   1734578015467651698 (sec:1734578015.4677)
>> >>                  delta to User RX-time sec:0.0001 (80.309 usec)
>> >>   No rx_vlan_tci or rx_vlan_proto, err=-95
>> >>   0x562ff5dc8880: ping-pong with csum=561c (want c7dd)
>> >>                   csum_start=34 csum_offset=6
>> >>   HW RX-time:   1734578015467548904 (sec:1734578015.4675)
>> >>                 delta to HW Launch-time sec:1.0000 (1000000.000 usec)
>> >>   0x562ff5dc8880: complete tx idx=4 addr=4018
>> >>   HW Launch-time:   1734578016467548904 (sec:1734578016.4675)
>> >>                     delta to HW TX-complete-time sec:0.0000 (0.016 usec)
>> >>   HW TX-complete-time:   1734578016467548920 (sec:1734578016.4675)
>> >>                          delta to User TX-complete-time sec:0.0000
>> >>                          (32.546 usec)
>> >>   XDP RX-time:   1734578015467651698 (sec:1734578015.4677)
>> >>                  delta to User TX-complete-time sec:0.9999
>> >>                  (999929.768 usec)
>> >>   HW RX-time:   1734578015467548904 (sec:1734578015.4675)
>> >>                 delta to HW TX-complete-time sec:1.0000 (1000000.016 usec)
>> >>   0x562ff5dc8880: complete rx idx=132 addr=84110
>> >>
>> >> Test 2: Send 1000 packets with a 10 ms interval and the launch time set to
>> >>         500 us in the future.
>> >>
>> >> Test steps:
>> >> 1. On the DUT, start the xdp_hw_metadata selftest application:
>> >>    $ sudo chrt -f 99 ./xdp_hw_metadata enp2s0 -l 500000 -L 1 > \
>> >>      /dev/shm/result.log
>> >>
>> >> 2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and
>> >>    VLAN priority 1 to port 9091 of the DUT.
>> >>
>> >> Result:
>> >> When the launch time is set to 500 us in the future, the average delta
>> >> between the launch time and the transmit hardware timestamp is 0.016 us,
>> >> as shown in the analysis of /dev/shm/result.log below. The XDP launch time
>> >> works correctly in sending 1000 packets continuously.
>> >>   Min delta: 0.005 us
>> >>   Avr delta: 0.016 us
>> >>   Max delta: 0.031 us
>> >>   Total packets forwarded: 1000
>> >>
>> >> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
>> >> ---
>> >>  drivers/net/ethernet/intel/igc/igc_main.c | 42 +++++++++++++++++++++--
>> >>  1 file changed, 40 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
>> >b/drivers/net/ethernet/intel/igc/igc_main.c
>> >> index c3edd8bcf633..535d340c71c9 100644
>> >> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>> >> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>> >> @@ -2951,9 +2951,33 @@ static u64 igc_xsk_fill_timestamp(void *_priv)
>> >>  	return *(u64 *)_priv;
>> >>  }
>> >>
>> >> +static void igc_xsk_request_launch_time(u64 launch_time, void *_priv)
>> >> +{
>> >> +	struct igc_metadata_request *meta_req = _priv;
>> >> +	struct igc_ring *tx_ring = meta_req->tx_ring;
>> >> +	__le32 launch_time_offset;
>> >> +	bool insert_empty = false;
>> >> +	bool first_flag = false;
>> >> +
>> >> +	if (!tx_ring->launchtime_enable)
>> >> +		return;
>> >> +
>> >> +	launch_time_offset = igc_tx_launchtime(tx_ring,
>> >> +					       ns_to_ktime(launch_time),
>> >> +					       &first_flag, &insert_empty);
>> >> +	if (insert_empty) {
>> >> +		igc_insert_empty_packet(tx_ring);
>> >> +		meta_req->tx_buffer =
>> >> +			&tx_ring->tx_buffer_info[tx_ring->next_to_use];
>> >
>> >in this case I think you currently are leaking the skbs and dma mappings
>> >that igc_init_empty_frame() did. you're going to mix
>> >IGC_TX_BUFFER_TYPE_XSK with IGC_TX_BUFFER_TYPE_SKB and the latter is not
>> >explicitly initialized. Even though IGC_TX_BUFFER_TYPE_SKB happens to be
>> >equal to 0, igc_tx_buffer::type is never cleared in the tx clean desc
>> >routine.
>> >
>>
>> Hi Fijalkowski Maciej,
>>
>> Thanks for your inputs.
>>
>> Yes, you are right, IGC_TX_BUFFER_TYPE_SKB is mixed together with
>> IGC_TX_BUFFER_TYPE_XSK. Regarding the skb and dma map,
>> following code in igc_clean_tx_irq() will free the skb and unmap the dma,
>> Do these answer your concern on leaking?
>>
>> igc_main.c:3133:                case IGC_TX_BUFFER_TYPE_SKB:
>> igc_main.c-3134-                        napi_consume_skb(tx_buffer->skb, napi_budget);
>> igc_main.c-3135-                        igc_unmap_tx_buffer(tx_ring->dev, tx_buffer);
>> igc_main.c-3136-                        break;
>>
>> Regarding the igc_tx_buffer::type never cleared, I think the
>> important thing is making the igc_tx_buffer::next_to_watch NULL
>> to indicate no remaining packet. Since transmit function will
>> always set the igc_tx_buffer::type to a proper type,
>
>igc_init_tx_empty_descriptor() does not set ::type, that was my point. So
>these empty descs might be treated as IGC_TX_BUFFER_TYPE_XSK, which is
>wrong and your qouted code above will never get called. You will then leak
>skb and dma map plus you will confuse XSK code due to xsk_frames miscount.
>

Now I understand what you mean. Thanks for the explanation. I will add the
missing IGC_TX_BUFFER_TYPE_SKB initialization in igc_init_empty_frame(),
as below.

--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -1096,6 +1096,7 @@ static int igc_init_empty_frame(struct igc_ring *ring,
                return -ENOMEM;
        }

+       buffer->type = IGC_TX_BUFFER_TYPE_SKB;
        buffer->skb = skb;
        buffer->protocol = 0;
        buffer->bytecount = skb->len;

With above, IMHO, we no need to clear igc_tx_buffer::type,
Are you agree?

>> I think it is optional for us to clear it.
>> Is that make sense to you?
>>
>> >> +	}
>> >> +
>> >> +	igc_tx_ctxtdesc(tx_ring, launch_time_offset, first_flag, 0, 0, 0);
>> >> +}
>> >> +
>> >>  const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = {
>> >>  	.tmo_request_timestamp		= igc_xsk_request_timestamp,
>> >>  	.tmo_fill_timestamp		= igc_xsk_fill_timestamp,
>> >> +	.tmo_request_launch_time	= igc_xsk_request_launch_time,
>> >>  };
>> >>
>> >>  static void igc_xdp_xmit_zc(struct igc_ring *ring)
>> >> @@ -2976,7 +3000,13 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
>> >>  	ntu = ring->next_to_use;
>> >>  	budget = igc_desc_unused(ring);
>> >>
>> >> -	while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
>> >> +	/* Packets with launch time require one data descriptor and one context
>> >> +	 * descriptor. When the launch time falls into the next Qbv cycle, we
>> >> +	 * may need to insert an empty packet, which requires two more
>> >> +	 * descriptors. Therefore, to be safe, we always ensure we have at least
>> >> +	 * 4 descriptors available.
>> >> +	 */
>> >> +	while (xsk_tx_peek_desc(pool, &xdp_desc) && budget >= 4) {
>> >>  		struct igc_metadata_request meta_req;
>> >>  		struct xsk_tx_metadata *meta = NULL;
>> >>  		struct igc_tx_buffer *bi;
>> >> @@ -3000,6 +3030,12 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
>> >>  		xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops,
>> >>  					&meta_req);
>> >>
>> >> +		/* xsk_tx_metadata_request() may have updated next_to_use */
>> >> +		ntu = ring->next_to_use;
>> >> +
>> >> +		/* xsk_tx_metadata_request() may have updated Tx buffer info */
>> >> +		bi = meta_req.tx_buffer;
>> >> +
>> >>  		tx_desc = IGC_TX_DESC(ring, ntu);
>> >>  		tx_desc->read.cmd_type_len = cpu_to_le32(meta_req.cmd_type);
>> >>  		tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
>> >> @@ -3017,9 +3053,11 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
>> >>  		ntu++;
>> >>  		if (ntu == ring->count)
>> >>  			ntu = 0;
>> >> +
>> >> +		ring->next_to_use = ntu;
>> >> +		budget = igc_desc_unused(ring);
>> >
>> >why count the remaining space in loop? couldn't you decrement it
>> >accordingly to the count of descriptors you have produced? writing ntu
>> >back and forth between local var and ring struct performance-wise does not
>> >look fine.
>> >
>>
>> Yes, I can check the number of used descriptor in xsk_tx_metadata_request()
>> by introducing a new field named used_desc in struct igc_metadata_request,
>> and then decreases the budget with it.
>>
>> Do this way looked good to you?
>
>Yes this makes sense to me, thanks!
>

Thanks, I will rework and submit next version.

>>
>> Thanks & Regards
>> Siang
>>
>> >>  	}
>> >>
>> >> -	ring->next_to_use = ntu;
>> >>  	if (tx_desc) {
>> >>  		igc_flush_tx_descriptors(ring);
>> >>  		xsk_tx_release(pool);
>> >> --
>> >> 2.34.1
>> >>

Thanks & Regards
Siang
Maciej Fijalkowski Feb. 4, 2025, 3:17 p.m. UTC | #6
On Tue, Feb 04, 2025 at 03:49:32PM +0100, Song, Yoong Siang wrote:
> On Tuesday, February 4, 2025 10:04 PM, Fijalkowski, Maciej <maciej.fijalkowski@intel.com> wrote:
> >On Tue, Feb 04, 2025 at 02:14:00PM +0100, Song, Yoong Siang wrote:
> >> On Tuesday, February 4, 2025 6:10 PM, Fijalkowski, Maciej<maciej.fijalkowski@intel.com> wrote:
> >> >On Tue, Feb 04, 2025 at 08:49:07AM +0800, Song Yoong Siang wrote:
> >> >
> >> >> Enable Launch Time Control (LTC) support for XDP zero copy via XDP Tx
> >> >> metadata framework.
> >> >>
> >> >> This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata
> >> >> on Intel I225-LM Ethernet controller. Below are the test steps and result.
> >> >>
> >> >> Test 1: Send a single packet with the launch time set to 1 s in the future.
> >> >>
> >> >> Test steps:
> >> >> 1. On the DUT, start the xdp_hw_metadata selftest application:
> >> >>    $ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1
> >> >>
> >> >> 2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091
> >> >>    of the DUT.
> >> >>
> >> >> Result:
> >> >> When the launch time is set to 1 s in the future, the delta between the
> >> >> launch time and the transmit hardware timestamp is 0.016 us, as shown in
> >> >> printout of the xdp_hw_metadata application below.
> >> >>   0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP
> >> >>   rx_hash: 0xE343384 with RSS type:0x1
> >> >>   HW RX-time:   1734578015467548904 (sec:1734578015.4675)
> >> >>                 delta to User RX-time sec:0.0002 (183.103 usec)
> >> >>   XDP RX-time:   1734578015467651698 (sec:1734578015.4677)
> >> >>                  delta to User RX-time sec:0.0001 (80.309 usec)
> >> >>   No rx_vlan_tci or rx_vlan_proto, err=-95
> >> >>   0x562ff5dc8880: ping-pong with csum=561c (want c7dd)
> >> >>                   csum_start=34 csum_offset=6
> >> >>   HW RX-time:   1734578015467548904 (sec:1734578015.4675)
> >> >>                 delta to HW Launch-time sec:1.0000 (1000000.000 usec)
> >> >>   0x562ff5dc8880: complete tx idx=4 addr=4018
> >> >>   HW Launch-time:   1734578016467548904 (sec:1734578016.4675)
> >> >>                     delta to HW TX-complete-time sec:0.0000 (0.016 usec)
> >> >>   HW TX-complete-time:   1734578016467548920 (sec:1734578016.4675)
> >> >>                          delta to User TX-complete-time sec:0.0000
> >> >>                          (32.546 usec)
> >> >>   XDP RX-time:   1734578015467651698 (sec:1734578015.4677)
> >> >>                  delta to User TX-complete-time sec:0.9999
> >> >>                  (999929.768 usec)
> >> >>   HW RX-time:   1734578015467548904 (sec:1734578015.4675)
> >> >>                 delta to HW TX-complete-time sec:1.0000 (1000000.016 usec)
> >> >>   0x562ff5dc8880: complete rx idx=132 addr=84110
> >> >>
> >> >> Test 2: Send 1000 packets with a 10 ms interval and the launch time set to
> >> >>         500 us in the future.
> >> >>
> >> >> Test steps:
> >> >> 1. On the DUT, start the xdp_hw_metadata selftest application:
> >> >>    $ sudo chrt -f 99 ./xdp_hw_metadata enp2s0 -l 500000 -L 1 > \
> >> >>      /dev/shm/result.log
> >> >>
> >> >> 2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and
> >> >>    VLAN priority 1 to port 9091 of the DUT.
> >> >>
> >> >> Result:
> >> >> When the launch time is set to 500 us in the future, the average delta
> >> >> between the launch time and the transmit hardware timestamp is 0.016 us,
> >> >> as shown in the analysis of /dev/shm/result.log below. The XDP launch time
> >> >> works correctly in sending 1000 packets continuously.
> >> >>   Min delta: 0.005 us
> >> >>   Avr delta: 0.016 us
> >> >>   Max delta: 0.031 us
> >> >>   Total packets forwarded: 1000
> >> >>
> >> >> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
> >> >> ---
> >> >>  drivers/net/ethernet/intel/igc/igc_main.c | 42 +++++++++++++++++++++--
> >> >>  1 file changed, 40 insertions(+), 2 deletions(-)
> >> >>
> >> >> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
> >> >b/drivers/net/ethernet/intel/igc/igc_main.c
> >> >> index c3edd8bcf633..535d340c71c9 100644
> >> >> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> >> >> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> >> >> @@ -2951,9 +2951,33 @@ static u64 igc_xsk_fill_timestamp(void *_priv)
> >> >>  	return *(u64 *)_priv;
> >> >>  }
> >> >>
> >> >> +static void igc_xsk_request_launch_time(u64 launch_time, void *_priv)
> >> >> +{
> >> >> +	struct igc_metadata_request *meta_req = _priv;
> >> >> +	struct igc_ring *tx_ring = meta_req->tx_ring;
> >> >> +	__le32 launch_time_offset;
> >> >> +	bool insert_empty = false;
> >> >> +	bool first_flag = false;
> >> >> +
> >> >> +	if (!tx_ring->launchtime_enable)
> >> >> +		return;
> >> >> +
> >> >> +	launch_time_offset = igc_tx_launchtime(tx_ring,
> >> >> +					       ns_to_ktime(launch_time),
> >> >> +					       &first_flag, &insert_empty);
> >> >> +	if (insert_empty) {
> >> >> +		igc_insert_empty_packet(tx_ring);
> >> >> +		meta_req->tx_buffer =
> >> >> +			&tx_ring->tx_buffer_info[tx_ring->next_to_use];
> >> >
> >> >in this case I think you currently are leaking the skbs and dma mappings
> >> >that igc_init_empty_frame() did. you're going to mix
> >> >IGC_TX_BUFFER_TYPE_XSK with IGC_TX_BUFFER_TYPE_SKB and the latter is not
> >> >explicitly initialized. Even though IGC_TX_BUFFER_TYPE_SKB happens to be
> >> >equal to 0, igc_tx_buffer::type is never cleared in the tx clean desc
> >> >routine.
> >> >
> >>
> >> Hi Fijalkowski Maciej,
> >>
> >> Thanks for your inputs.
> >>
> >> Yes, you are right, IGC_TX_BUFFER_TYPE_SKB is mixed together with
> >> IGC_TX_BUFFER_TYPE_XSK. Regarding the skb and dma map,
> >> following code in igc_clean_tx_irq() will free the skb and unmap the dma,
> >> Do these answer your concern on leaking?
> >>
> >> igc_main.c:3133:                case IGC_TX_BUFFER_TYPE_SKB:
> >> igc_main.c-3134-                        napi_consume_skb(tx_buffer->skb, napi_budget);
> >> igc_main.c-3135-                        igc_unmap_tx_buffer(tx_ring->dev, tx_buffer);
> >> igc_main.c-3136-                        break;
> >>
> >> Regarding the igc_tx_buffer::type never cleared, I think the
> >> important thing is making the igc_tx_buffer::next_to_watch NULL
> >> to indicate no remaining packet. Since transmit function will
> >> always set the igc_tx_buffer::type to a proper type,
> >
> >igc_init_tx_empty_descriptor() does not set ::type, that was my point. So
> >these empty descs might be treated as IGC_TX_BUFFER_TYPE_XSK, which is
> >wrong and your qouted code above will never get called. You will then leak
> >skb and dma map plus you will confuse XSK code due to xsk_frames miscount.
> >
> 
> Now I understand what you mean. Thanks for the explanation. I will add the
> missing IGC_TX_BUFFER_TYPE_SKB initialization in igc_init_empty_frame(),
> as below.
> 
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -1096,6 +1096,7 @@ static int igc_init_empty_frame(struct igc_ring *ring,
>                 return -ENOMEM;
>         }
> 
> +       buffer->type = IGC_TX_BUFFER_TYPE_SKB;
>         buffer->skb = skb;
>         buffer->protocol = 0;
>         buffer->bytecount = skb->len;
> 
> With above, IMHO, we no need to clear igc_tx_buffer::type,
> Are you agree?

Yes, the contract should be that every routine that produces descriptors
should be setting the ::type explicitly. Sorry for not being clear from
beginning but I'm glad we're on the same page now.

I'm afraid this single line should be send as a fix, though :< even
without your patch set empty descs could be mixed IGC_TX_BUFFER_TYPE_XDP
type.

> 
> >> I think it is optional for us to clear it.
> >> Is that make sense to you?
> >>
> >> >> +	}
> >> >> +
> >> >> +	igc_tx_ctxtdesc(tx_ring, launch_time_offset, first_flag, 0, 0, 0);
> >> >> +}
> >> >> +
> >> >>  const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = {
> >> >>  	.tmo_request_timestamp		= igc_xsk_request_timestamp,
> >> >>  	.tmo_fill_timestamp		= igc_xsk_fill_timestamp,
> >> >> +	.tmo_request_launch_time	= igc_xsk_request_launch_time,
> >> >>  };
> >> >>
> >> >>  static void igc_xdp_xmit_zc(struct igc_ring *ring)
> >> >> @@ -2976,7 +3000,13 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
> >> >>  	ntu = ring->next_to_use;
> >> >>  	budget = igc_desc_unused(ring);
> >> >>
> >> >> -	while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
> >> >> +	/* Packets with launch time require one data descriptor and one context
> >> >> +	 * descriptor. When the launch time falls into the next Qbv cycle, we
> >> >> +	 * may need to insert an empty packet, which requires two more
> >> >> +	 * descriptors. Therefore, to be safe, we always ensure we have at least
> >> >> +	 * 4 descriptors available.
> >> >> +	 */
> >> >> +	while (xsk_tx_peek_desc(pool, &xdp_desc) && budget >= 4) {
> >> >>  		struct igc_metadata_request meta_req;
> >> >>  		struct xsk_tx_metadata *meta = NULL;
> >> >>  		struct igc_tx_buffer *bi;
> >> >> @@ -3000,6 +3030,12 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
> >> >>  		xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops,
> >> >>  					&meta_req);
> >> >>
> >> >> +		/* xsk_tx_metadata_request() may have updated next_to_use */
> >> >> +		ntu = ring->next_to_use;
> >> >> +
> >> >> +		/* xsk_tx_metadata_request() may have updated Tx buffer info */
> >> >> +		bi = meta_req.tx_buffer;
> >> >> +
> >> >>  		tx_desc = IGC_TX_DESC(ring, ntu);
> >> >>  		tx_desc->read.cmd_type_len = cpu_to_le32(meta_req.cmd_type);
> >> >>  		tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
> >> >> @@ -3017,9 +3053,11 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
> >> >>  		ntu++;
> >> >>  		if (ntu == ring->count)
> >> >>  			ntu = 0;
> >> >> +
> >> >> +		ring->next_to_use = ntu;
> >> >> +		budget = igc_desc_unused(ring);
> >> >
> >> >why count the remaining space in loop? couldn't you decrement it
> >> >accordingly to the count of descriptors you have produced? writing ntu
> >> >back and forth between local var and ring struct performance-wise does not
> >> >look fine.
> >> >
> >>
> >> Yes, I can check the number of used descriptor in xsk_tx_metadata_request()
> >> by introducing a new field named used_desc in struct igc_metadata_request,
> >> and then decreases the budget with it.
> >>
> >> Do this way looked good to you?
> >
> >Yes this makes sense to me, thanks!
> >
> 
> Thanks, I will rework and submit next version.
> 
> >>
> >> Thanks & Regards
> >> Siang
> >>
> >> >>  	}
> >> >>
> >> >> -	ring->next_to_use = ntu;
> >> >>  	if (tx_desc) {
> >> >>  		igc_flush_tx_descriptors(ring);
> >> >>  		xsk_tx_release(pool);
> >> >> --
> >> >> 2.34.1
> >> >>
> 
> Thanks & Regards
> Siang
>
Song Yoong Siang Feb. 4, 2025, 3:29 p.m. UTC | #7
On Tuesday, February 4, 2025 11:18 PM, Fijalkowski, Maciej <maciej.fijalkowski@intel.com> wrote:
>On Tue, Feb 04, 2025 at 03:49:32PM +0100, Song, Yoong Siang wrote:

[...]

>> With above, IMHO, we no need to clear igc_tx_buffer::type,
>> Are you agree?
>
>Yes, the contract should be that every routine that produces descriptors
>should be setting the ::type explicitly. Sorry for not being clear from
>beginning but I'm glad we're on the same page now.
>
>I'm afraid this single line should be send as a fix, though :< even
>without your patch set empty descs could be mixed IGC_TX_BUFFER_TYPE_XDP
>type.
>

Agree with you. I will send this single line as a fix patch to iwl-net

Thanks & Regards
Siang
diff mbox series

Patch

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index c3edd8bcf633..535d340c71c9 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -2951,9 +2951,33 @@  static u64 igc_xsk_fill_timestamp(void *_priv)
 	return *(u64 *)_priv;
 }
 
+static void igc_xsk_request_launch_time(u64 launch_time, void *_priv)
+{
+	struct igc_metadata_request *meta_req = _priv;
+	struct igc_ring *tx_ring = meta_req->tx_ring;
+	__le32 launch_time_offset;
+	bool insert_empty = false;
+	bool first_flag = false;
+
+	if (!tx_ring->launchtime_enable)
+		return;
+
+	launch_time_offset = igc_tx_launchtime(tx_ring,
+					       ns_to_ktime(launch_time),
+					       &first_flag, &insert_empty);
+	if (insert_empty) {
+		igc_insert_empty_packet(tx_ring);
+		meta_req->tx_buffer =
+			&tx_ring->tx_buffer_info[tx_ring->next_to_use];
+	}
+
+	igc_tx_ctxtdesc(tx_ring, launch_time_offset, first_flag, 0, 0, 0);
+}
+
 const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = {
 	.tmo_request_timestamp		= igc_xsk_request_timestamp,
 	.tmo_fill_timestamp		= igc_xsk_fill_timestamp,
+	.tmo_request_launch_time	= igc_xsk_request_launch_time,
 };
 
 static void igc_xdp_xmit_zc(struct igc_ring *ring)
@@ -2976,7 +3000,13 @@  static void igc_xdp_xmit_zc(struct igc_ring *ring)
 	ntu = ring->next_to_use;
 	budget = igc_desc_unused(ring);
 
-	while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
+	/* Packets with launch time require one data descriptor and one context
+	 * descriptor. When the launch time falls into the next Qbv cycle, we
+	 * may need to insert an empty packet, which requires two more
+	 * descriptors. Therefore, to be safe, we always ensure we have at least
+	 * 4 descriptors available.
+	 */
+	while (xsk_tx_peek_desc(pool, &xdp_desc) && budget >= 4) {
 		struct igc_metadata_request meta_req;
 		struct xsk_tx_metadata *meta = NULL;
 		struct igc_tx_buffer *bi;
@@ -3000,6 +3030,12 @@  static void igc_xdp_xmit_zc(struct igc_ring *ring)
 		xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops,
 					&meta_req);
 
+		/* xsk_tx_metadata_request() may have updated next_to_use */
+		ntu = ring->next_to_use;
+
+		/* xsk_tx_metadata_request() may have updated Tx buffer info */
+		bi = meta_req.tx_buffer;
+
 		tx_desc = IGC_TX_DESC(ring, ntu);
 		tx_desc->read.cmd_type_len = cpu_to_le32(meta_req.cmd_type);
 		tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
@@ -3017,9 +3053,11 @@  static void igc_xdp_xmit_zc(struct igc_ring *ring)
 		ntu++;
 		if (ntu == ring->count)
 			ntu = 0;
+
+		ring->next_to_use = ntu;
+		budget = igc_desc_unused(ring);
 	}
 
-	ring->next_to_use = ntu;
 	if (tx_desc) {
 		igc_flush_tx_descriptors(ring);
 		xsk_tx_release(pool);