
[v4,bpf-next,09/10] selftests: xsk: rely on pkts_in_flight in wait_for_tx_completion()

Message ID 20220616180609.905015-10-maciej.fijalkowski@intel.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Series AF_XDP ZC selftests

Checks

Context Check Description
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 10 maintainers not CCed: songliubraving@fb.com hawk@kernel.org linux-kselftest@vger.kernel.org davem@davemloft.net yhs@fb.com john.fastabend@gmail.com kafai@fb.com shuah@kernel.org andrii@kernel.org kpsingh@kernel.org
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-1 success Logs for Kernel LATEST on ubuntu-latest with gcc
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Kernel LATEST on ubuntu-latest with llvm-15
bpf/vmtest-bpf-next-VM_Test-3 success Logs for Kernel LATEST on z15 with gcc

Commit Message

Fijalkowski, Maciej June 16, 2022, 6:06 p.m. UTC
Some of the drivers that implement support for AF_XDP Zero Copy (like
ice) can take a lazy approach to cleaning Tx descriptors. For ZC, when a
descriptor is cleaned, it is placed onto the AF_XDP completion queue. This
means that the current implementation of wait_for_tx_completion() in
xdpxceiver can end up in an infinite loop, as some of the descriptors may
never reach the CQ.

This function can be changed to rely on pkts_in_flight instead.

Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 tools/testing/selftests/bpf/xdpxceiver.c | 3 ++-
 tools/testing/selftests/bpf/xdpxceiver.h | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

Comments

John Fastabend June 18, 2022, 2:56 a.m. UTC | #1
Maciej Fijalkowski wrote:
> Some of the drivers that implement support for AF_XDP Zero Copy (like
> ice) can take a lazy approach to cleaning Tx descriptors. For ZC, when a
> descriptor is cleaned, it is placed onto the AF_XDP completion queue. This
> means that the current implementation of wait_for_tx_completion() in
> xdpxceiver can end up in an infinite loop, as some of the descriptors may
> never reach the CQ.
> 
> This function can be changed to rely on pkts_in_flight instead.
> 
> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---

Sorry, I'm going to need more details to follow what's going on here.

In send_pkts() we do the expected thing: send all the pkts and
then call wait_for_tx_completion().

The wait for completion is obvious:

 static void wait_for_tx_completion(struct xsk_socket_info *xsk)
 {
	while (xsk->outstanding_tx)
		complete_pkts(xsk, BATCH_SIZE);
 }

The 'outstanding_tx' counter appears to be decremented in complete_pkts(),
which is done by looking at xsk_ring_cons__peek(). That makes sense to me:
until a descriptor shows up there, we don't know that the pkt has been
completely sent and that we can release the resources.
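
For reference, complete_pkts() in xdpxceiver.c does roughly the following
(a paraphrased sketch, not the verbatim selftest code; the kick_tx/wakeup
handling is omitted):

 static void complete_pkts(struct xsk_socket_info *xsk, int batch_size)
 {
	unsigned int rcvd;
	u32 idx;

	/* Entries only show up on the completion queue once the driver
	 * has actually cleaned the corresponding Tx descriptors.
	 */
	rcvd = xsk_ring_cons__peek(&xsk->umem->cq, batch_size, &idx);
	if (rcvd) {
		xsk_ring_cons__release(&xsk->umem->cq, rcvd);
		xsk->outstanding_tx -= rcvd;
	}
 }

So a descriptor the driver never writes back to the CQ is a descriptor
this loop will wait on forever.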

Now, if you just zero it on exit and call it good, how do you know the
resources are safe to clean up? Or that you don't have a real bug
in the driver that isn't correctly releasing the resource?

How are users expected to deal with a lazy approach to Tx descriptor
cleaning in a case like this, e.g. on exit? It seems we need to
fix the root cause of ice not putting things on the completion queue,
or I misunderstood the patch.


>  tools/testing/selftests/bpf/xdpxceiver.c | 3 ++-
>  tools/testing/selftests/bpf/xdpxceiver.h | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/xdpxceiver.c b/tools/testing/selftests/bpf/xdpxceiver.c
> index de4cf0432243..13a3b2ac2399 100644
> --- a/tools/testing/selftests/bpf/xdpxceiver.c
> +++ b/tools/testing/selftests/bpf/xdpxceiver.c
> @@ -965,7 +965,7 @@ static int __send_pkts(struct ifobject *ifobject, u32 *pkt_nb)
>  
>  static void wait_for_tx_completion(struct xsk_socket_info *xsk)
>  {
> -	while (xsk->outstanding_tx)
> +	while (pkts_in_flight)
>  		complete_pkts(xsk, BATCH_SIZE);
>  }
>  
> @@ -1269,6 +1269,7 @@ static void *worker_testapp_validate_rx(void *arg)
>  		pthread_mutex_unlock(&pacing_mutex);
>  	}
>  
> +	pkts_in_flight = 0;
>  	pthread_exit(NULL);
>  }
Fijalkowski, Maciej June 20, 2022, 12:02 p.m. UTC | #2
On Fri, Jun 17, 2022 at 07:56:17PM -0700, John Fastabend wrote:
> Maciej Fijalkowski wrote:
> > Some of the drivers that implement support for AF_XDP Zero Copy (like
> > ice) can take a lazy approach to cleaning Tx descriptors. For ZC, when a
> > descriptor is cleaned, it is placed onto the AF_XDP completion queue. This
> > means that the current implementation of wait_for_tx_completion() in
> > xdpxceiver can end up in an infinite loop, as some of the descriptors may
> > never reach the CQ.
> > 
> > This function can be changed to rely on pkts_in_flight instead.
> > 
> > Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > ---
> 
> Sorry, I'm going to need more details to follow what's going on here.
> 
> In send_pkts() we do the expected thing: send all the pkts and
> then call wait_for_tx_completion().
> 
> The wait for completion is obvious:
> 
>  static void wait_for_tx_completion(struct xsk_socket_info *xsk)
>  {
> 	while (xsk->outstanding_tx)
> 		complete_pkts(xsk, BATCH_SIZE);
>  }
> 
> The 'outstanding_tx' counter appears to be decremented in complete_pkts(),
> which is done by looking at xsk_ring_cons__peek(). That makes sense to me:
> until a descriptor shows up there, we don't know that the pkt has been
> completely sent and that we can release the resources.

This is necessary for scenarios like l2fwd in xdpsock, where you take
entries from the CQ back to the FQ to refill the Rx HW queue and keep
the flow going.
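
For context, that recycling step looks roughly like this (a sketch based
on xdpsock's l2fwd logic; the struct xsk_umem_info layout is assumed from
xdpsock, and error handling/wakeup calls are omitted):

 /* Move completed Tx entries from the completion queue back to the
  * fill queue so that the Rx HW queue can be refilled.
  */
 static void recycle_completions(struct xsk_umem_info *umem, u32 batch)
 {
	u32 idx_cq = 0, idx_fq = 0;
	unsigned int rcvd, i;

	rcvd = xsk_ring_cons__peek(&umem->cq, batch, &idx_cq);
	if (!rcvd)
		return;

	/* Reserve the same number of slots on the fill queue; real code
	 * would kick/poll the socket instead of spinning here.
	 */
	while (xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq) != rcvd)
		;

	for (i = 0; i < rcvd; i++)
		*xsk_ring_prod__fill_addr(&umem->fq, idx_fq++) =
			*xsk_ring_cons__comp_addr(&umem->cq, idx_cq++);

	xsk_ring_prod__submit(&umem->fq, rcvd);
	xsk_ring_cons__release(&umem->cq, rcvd);
 }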

> 
> Now, if you just zero it on exit and call it good, how do you know the
> resources are safe to clean up? Or that you don't have a real bug
> in the driver that isn't correctly releasing the resource?

xdpxceiver spawns two threads, one for Tx and one for Rx. From the Rx
thread's POV, if receive_pkts() finished its job, then this implies that
the Tx thread transmitted all of the frames that the Rx thread expected
to receive. This zeroing is then only there to terminate the Tx thread
and finish the current test case, so that further cases under the current
mode can be executed.

> 
> How are users expected to deal with a lazy approach to Tx descriptor
> cleaning in a case like this, e.g. on exit? It seems we need to
> fix the root cause of ice not putting things on the completion queue,
> or I misunderstood the patch.

ice puts things on the CQ lazily on purpose, as we added batching to the
Tx side where we clean descs only when it's needed.

We need to exit the spawned threads before we detach the socket from the
interface. Socket detach is done from the main thread, and in that case
the driver goes through the Tx ring and places the descriptors that are
left onto the completion queue.
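
In other words, the intended teardown order is roughly this (a sketch
with assumed names, not verbatim xdpxceiver code):

 /* Join the worker threads first, then detach the socket. The socket
  * detach in xsk_socket__delete() is where the driver walks the Tx
  * ring and flushes leftover descriptors onto the completion queue.
  */
 static void teardown_sketch(pthread_t tx_thread, pthread_t rx_thread,
			     struct xsk_socket *xsk)
 {
	pthread_join(tx_thread, NULL);
	pthread_join(rx_thread, NULL);
	xsk_socket__delete(xsk);
 }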

> 
> 
> >  tools/testing/selftests/bpf/xdpxceiver.c | 3 ++-
> >  tools/testing/selftests/bpf/xdpxceiver.h | 2 +-
> >  2 files changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/tools/testing/selftests/bpf/xdpxceiver.c b/tools/testing/selftests/bpf/xdpxceiver.c
> > index de4cf0432243..13a3b2ac2399 100644
> > --- a/tools/testing/selftests/bpf/xdpxceiver.c
> > +++ b/tools/testing/selftests/bpf/xdpxceiver.c
> > @@ -965,7 +965,7 @@ static int __send_pkts(struct ifobject *ifobject, u32 *pkt_nb)
> >  
> >  static void wait_for_tx_completion(struct xsk_socket_info *xsk)
> >  {
> > -	while (xsk->outstanding_tx)
> > +	while (pkts_in_flight)
> >  		complete_pkts(xsk, BATCH_SIZE);
> >  }
> >  
> > @@ -1269,6 +1269,7 @@ static void *worker_testapp_validate_rx(void *arg)
> >  		pthread_mutex_unlock(&pacing_mutex);
> >  	}
> >  
> > +	pkts_in_flight = 0;
> >  	pthread_exit(NULL);
> >  }
John Fastabend June 20, 2022, 4:36 p.m. UTC | #3
Maciej Fijalkowski wrote:
> On Fri, Jun 17, 2022 at 07:56:17PM -0700, John Fastabend wrote:
> > Maciej Fijalkowski wrote:
> > > Some of the drivers that implement support for AF_XDP Zero Copy (like
> > > ice) can take a lazy approach to cleaning Tx descriptors. For ZC, when a
> > > descriptor is cleaned, it is placed onto the AF_XDP completion queue. This
> > > means that the current implementation of wait_for_tx_completion() in
> > > xdpxceiver can end up in an infinite loop, as some of the descriptors may
> > > never reach the CQ.
> > > 
> > > This function can be changed to rely on pkts_in_flight instead.
> > > 
> > > Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
> > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > > ---
> > 
> > Sorry, I'm going to need more details to follow what's going on here.
> > 
> > In send_pkts() we do the expected thing: send all the pkts and
> > then call wait_for_tx_completion().
> > 
> > The wait for completion is obvious:
> > 
> >  static void wait_for_tx_completion(struct xsk_socket_info *xsk)
> >  {
> > 	while (xsk->outstanding_tx)
> > 		complete_pkts(xsk, BATCH_SIZE);
> >  }
> > 
> > The 'outstanding_tx' counter appears to be decremented in complete_pkts(),
> > which is done by looking at xsk_ring_cons__peek(). That makes sense to me:
> > until a descriptor shows up there, we don't know that the pkt has been
> > completely sent and that we can release the resources.
> 
> This is necessary for scenarios like l2fwd in xdpsock, where you take
> entries from the CQ back to the FQ to refill the Rx HW queue and keep
> the flow going.
> 
> > 
> > Now, if you just zero it on exit and call it good, how do you know the
> > resources are safe to clean up? Or that you don't have a real bug
> > in the driver that isn't correctly releasing the resource?
> 
> xdpxceiver spawns two threads, one for Tx and one for Rx. From the Rx
> thread's POV, if receive_pkts() finished its job, then this implies that
> the Tx thread transmitted all of the frames that the Rx thread expected
> to receive. This zeroing is then only there to terminate the Tx thread
> and finish the current test case, so that further cases under the current
> mode can be executed.
> 
> > 
> > How are users expected to deal with a lazy approach to Tx descriptor
> > cleaning in a case like this, e.g. on exit? It seems we need to
> > fix the root cause of ice not putting things on the completion queue,
> > or I misunderstood the patch.
> 
> ice puts things on the CQ lazily on purpose, as we added batching to the
> Tx side where we clean descs only when it's needed.
> 
> We need to exit the spawned threads before we detach the socket from the
> interface. Socket detach is done from the main thread, and in that case
> the driver goes through the Tx ring and places the descriptors that are
> left onto the completion queue.

But, in general (not in this specific xdpxceiver case), how does an
application that is Tx-only know when it's OK to tear things down if the
ice driver can get stuck and never puts those descriptors on the
completion queue? Should there be some timer that fires and writes these
back regardless of whether more descriptors are seen? Did I understand
the driver correctly?

Thanks,
John

Patch

diff --git a/tools/testing/selftests/bpf/xdpxceiver.c b/tools/testing/selftests/bpf/xdpxceiver.c
index de4cf0432243..13a3b2ac2399 100644
--- a/tools/testing/selftests/bpf/xdpxceiver.c
+++ b/tools/testing/selftests/bpf/xdpxceiver.c
@@ -965,7 +965,7 @@  static int __send_pkts(struct ifobject *ifobject, u32 *pkt_nb)
 
 static void wait_for_tx_completion(struct xsk_socket_info *xsk)
 {
-	while (xsk->outstanding_tx)
+	while (pkts_in_flight)
 		complete_pkts(xsk, BATCH_SIZE);
 }
 
@@ -1269,6 +1269,7 @@  static void *worker_testapp_validate_rx(void *arg)
 		pthread_mutex_unlock(&pacing_mutex);
 	}
 
+	pkts_in_flight = 0;
 	pthread_exit(NULL);
 }
 
diff --git a/tools/testing/selftests/bpf/xdpxceiver.h b/tools/testing/selftests/bpf/xdpxceiver.h
index b7aa6c7cf2be..f364a92675f8 100644
--- a/tools/testing/selftests/bpf/xdpxceiver.h
+++ b/tools/testing/selftests/bpf/xdpxceiver.h
@@ -170,6 +170,6 @@  pthread_barrier_t barr;
 pthread_mutex_t pacing_mutex = PTHREAD_MUTEX_INITIALIZER;
 pthread_cond_t pacing_cond = PTHREAD_COND_INITIALIZER;
 
-int pkts_in_flight;
+volatile int pkts_in_flight;
 
 #endif				/* XDPXCEIVER_H */