diff mbox series

[v3] RDMA/rxe: Generate a completion on error after getting a wqe

Message ID 20220331120245.314614-1-yangx.jy@fujitsu.com (mailing list archive)
State Changes Requested
Headers show
Series [v3] RDMA/rxe: Generate a completion on error after getting a wqe | expand

Commit Message

Xiao Yang March 31, 2022, 12:02 p.m. UTC
Current rxe_requester() doesn't generate a completion on error after
getting a wqe. Fix the issue by calling rxe_completer() on error.

Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe_req.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

Comments

Leon Romanovsky March 31, 2022, 5:19 p.m. UTC | #1
On Thu, Mar 31, 2022 at 08:02:45PM +0800, Xiao Yang wrote:
> Current rxe_requester() doesn't generate a completion on error after
> getting a wqe. Fix the issue by calling rxe_completer() on error.
> 
> Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_req.c | 24 ++++++++++++------------
>  1 file changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
> index ae5fbc79dd5c..01ae400e5481 100644
> --- a/drivers/infiniband/sw/rxe/rxe_req.c
> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
> @@ -648,26 +648,24 @@ int rxe_requester(void *arg)
>  		psn_compare(qp->req.psn, (qp->comp.psn +
>  				RXE_MAX_UNACKED_PSNS)) > 0)) {
>  		qp->req.wait_psn = 1;
> -		goto exit;
> +		goto qp_op_err;
>  	}
>  
>  	/* Limit the number of inflight SKBs per QP */
>  	if (unlikely(atomic_read(&qp->skb_out) >
>  		     RXE_INFLIGHT_SKBS_PER_QP_HIGH)) {
>  		qp->need_req_skb = 1;
> -		goto exit;
> +		goto qp_op_err;
>  	}
>  
>  	opcode = next_opcode(qp, wqe, wqe->wr.opcode);
> -	if (unlikely(opcode < 0)) {
> -		wqe->status = IB_WC_LOC_QP_OP_ERR;
> -		goto exit;
> -	}
> +	if (unlikely(opcode < 0))
> +		goto qp_op_err;
>  
>  	mask = rxe_opcode[opcode].mask;
>  	if (unlikely(mask & RXE_READ_OR_ATOMIC_MASK)) {
>  		if (check_init_depth(qp, wqe))
> -			goto exit;
> +			goto qp_op_err;
>  	}
>  
>  	mtu = get_mtu(qp);
> @@ -706,26 +704,26 @@ int rxe_requester(void *arg)
>  	av = rxe_get_av(&pkt, &ah);
>  	if (unlikely(!av)) {
>  		pr_err("qp#%d Failed no address vector\n", qp_num(qp));
> -		wqe->status = IB_WC_LOC_QP_OP_ERR;
>  		goto err_drop_ah;
>  	}
>  
>  	skb = init_req_packet(qp, av, wqe, opcode, payload, &pkt);
>  	if (unlikely(!skb)) {
>  		pr_err("qp#%d Failed allocating skb\n", qp_num(qp));
> -		wqe->status = IB_WC_LOC_QP_OP_ERR;
>  		goto err_drop_ah;
>  	}
>  
>  	ret = finish_packet(qp, av, wqe, &pkt, skb, payload);
>  	if (unlikely(ret)) {
>  		pr_debug("qp#%d Error during finish packet\n", qp_num(qp));
> +		if (ah)

No, ah can't be NULL. This is why I proposed to clean rxe_get_av() too.
**ahp is not NULL, as an input to that function and in all flows it is
updated with new ah pointer. If it can't update, the NULL will be
returned and rxe_requester() will exit.

Thanks

> +			rxe_put(ah);
>  		if (ret == -EFAULT)
>  			wqe->status = IB_WC_LOC_PROT_ERR;
>  		else
>  			wqe->status = IB_WC_LOC_QP_OP_ERR;
>  		kfree_skb(skb);
> -		goto err_drop_ah;
> +		goto err;
>  	}
>  
>  	if (ah)
> @@ -751,8 +749,7 @@ int rxe_requester(void *arg)
>  			goto exit;
>  		}
>  
> -		wqe->status = IB_WC_LOC_QP_OP_ERR;
> -		goto err;
> +		goto qp_op_err;
>  	}
>  
>  	update_state(qp, wqe, &pkt);
> @@ -762,6 +759,9 @@ int rxe_requester(void *arg)
>  err_drop_ah:
>  	if (ah)
>  		rxe_put(ah);
> +
> +qp_op_err:
> +	wqe->status = IB_WC_LOC_QP_OP_ERR;
>  err:
>  	wqe->state = wqe_state_error;
>  	__rxe_do_task(&qp->comp.task);
> -- 
> 2.25.4
> 
> 
>
Leon Romanovsky April 1, 2022, 4:50 a.m. UTC | #2
On Fri, Apr 01, 2022 at 03:52:37AM +0000, yangx.jy@fujitsu.com wrote:
> On 2022/4/1 1:19, Leon Romanovsky wrote:
> 
>         ret = finish_packet(qp, av, wqe, &pkt, skb, payload);
>         if (unlikely(ret)) {
>                 pr_debug("qp#%d Error during finish packet\n", qp_num(qp));
> +               if (ah)
> 
> 
> No, ah can't be NULL. This is why I proposed to clean rxe_get_av() too.
> **ahp is not NULL, as an input to that function and in all flows it is
> updated with new ah pointer. If it can't update, the NULL will be
> returned and rxe_requester() will exit.

Please use plain-text friendly email client.

> 
> Hi Leon,
> 
> If ah_num is 0 in rxe_get_av(), rxe_get_av() returns not NULL (i.e. av is not NULL) and ah is NULL.
> In this case, I think we can reach here and ah is NULL.
> BTW, I don't want to mix the fix and the change of rxe_get_av()'s logic.

ok, no problem.

Thanks

> 
> Best Regards,
> 
> Xiao Yang
Xiao Yang April 1, 2022, 5:09 a.m. UTC | #3
On 2022/4/1 12:50, Leon Romanovsky wrote:
> On Fri, Apr 01, 2022 at 03:52:37AM +0000, yangx.jy@fujitsu.com wrote:
>> On 2022/4/1 1:19, Leon Romanovsky wrote:
>>
>>          ret = finish_packet(qp, av, wqe, &pkt, skb, payload);
>>          if (unlikely(ret)) {
>>                  pr_debug("qp#%d Error during finish packet\n", qp_num(qp));
>> +               if (ah)
>>
>>
>> No, ah can't be NULL. This is why I proposed to clean rxe_get_av() too.
>> **ahp is not NULL, as an input to that function and in all flows it is
>> updated with new ah pointer. If it can't update, the NULL will be
>> returned and rxe_requester() will exit.
> 
> Please use plain-text friendly email client.
Hi Leon,

Thanks for your reminder.


> 
>>
>> Hi Leon,
>>
>> If ah_num is 0 in rxe_get_av(), rxe_get_av() returns not NULL (i.e. av is not NULL) and ah is NULL.
>> In this case, I think we can reach here and ah is NULL.
>> BTW, I don't want to mix the fix and the change of rxe_get_av()'s logic.
> 
> ok, no problem.

^_^

Best Regards,
Xiao Yang

> 
> Thanks
> 
>>
>> Best Regards,
>>
>> Xiao Yang
Bob Pearson April 7, 2022, 9:39 p.m. UTC | #4
On 3/31/22 07:02, Xiao Yang wrote:
> Current rxe_requester() doesn't generate a completion on error after
> getting a wqe. Fix the issue by calling rxe_completer() on error.
> 
> Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_req.c | 24 ++++++++++++------------
>  1 file changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
> index ae5fbc79dd5c..01ae400e5481 100644
> --- a/drivers/infiniband/sw/rxe/rxe_req.c
> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
> @@ -648,26 +648,24 @@ int rxe_requester(void *arg)
>  		psn_compare(qp->req.psn, (qp->comp.psn +
>  				RXE_MAX_UNACKED_PSNS)) > 0)) {
>  		qp->req.wait_psn = 1;
> -		goto exit;
> +		goto qp_op_err;
>  	}
This isn't an error. What is happening is the requester has advanced too far into the
work request queue compared to what has been completed. So it sets a flag and exits the
loop. When the completer finishes another completion it will reschedule the requester.
RXE_MAX_UNACKED_PSNS is 128 so only 128 packets are allowed in the network without
seeing an ack.
>  
>  	/* Limit the number of inflight SKBs per QP */
>  	if (unlikely(atomic_read(&qp->skb_out) >
>  		     RXE_INFLIGHT_SKBS_PER_QP_HIGH)) {
>  		qp->need_req_skb = 1;
> -		goto exit;
> +		goto qp_op_err;
>  	}
This also is not an error. Here there is a limit on the number SKBs in flight.
The SKBs are consumed by the ethernet driver when the packet is put on the wire.
This prevents the driver from using too much memory when the NIC is paused.
>  
>  	opcode = next_opcode(qp, wqe, wqe->wr.opcode);
> -	if (unlikely(opcode < 0)) {
> -		wqe->status = IB_WC_LOC_QP_OP_ERR;
> -		goto exit;
> -	}
This really is an error, but one that shouldn't happen under normal
circumstances since it implies an illegal operation. Like attempting
a write on a UD QP. Or causing an invalid opcode sequence.
> +	if (unlikely(opcode < 0))
> +		goto qp_op_err;
>  
>  	mask = rxe_opcode[opcode].mask;
>  	if (unlikely(mask & RXE_READ_OR_ATOMIC_MASK)) {
>  		if (check_init_depth(qp, wqe))
> -			goto exit;
> +			goto qp_op_err;
>  	}
This isn't an error. It means that someone posted a read/atomic operation
and there are too many pending read/atomic operations pending. You just need
to wait for one of them to complete so it is another pause.
>  
>  	mtu = get_mtu(qp);
> @@ -706,26 +704,26 @@ int rxe_requester(void *arg)
>  	av = rxe_get_av(&pkt, &ah);
>  	if (unlikely(!av)) {
>  		pr_err("qp#%d Failed no address vector\n", qp_num(qp));
> -		wqe->status = IB_WC_LOC_QP_OP_ERR;
>  		goto err_drop_ah;
>  	}
This is an error. It could happen if the address handle referred to by
the WR is not longer valid. There should always be an address vector but
there may or may not be an address handle if the AV comes from a connected
QP or not.
>  
>  	skb = init_req_packet(qp, av, wqe, opcode, payload, &pkt);
>  	if (unlikely(!skb)) {
>  		pr_err("qp#%d Failed allocating skb\n", qp_num(qp));
> -		wqe->status = IB_WC_LOC_QP_OP_ERR;
>  		goto err_drop_ah;
>  	}
>  
>  	ret = finish_packet(qp, av, wqe, &pkt, skb, payload);
>  	if (unlikely(ret)) {
>  		pr_debug("qp#%d Error during finish packet\n", qp_num(qp));
> +		if (ah)
> +			rxe_put(ah);
>  		if (ret == -EFAULT)
>  			wqe->status = IB_WC_LOC_PROT_ERR;
>  		else
>  			wqe->status = IB_WC_LOC_QP_OP_ERR;
>  		kfree_skb(skb);
> -		goto err_drop_ah;
> +		goto err;
>  	}
Not sure the point of this. 
>  
>  	if (ah)
> @@ -751,8 +749,7 @@ int rxe_requester(void *arg)
>  			goto exit;
>  		}
>  
> -		wqe->status = IB_WC_LOC_QP_OP_ERR;
> -		goto err;
> +		goto qp_op_err;
>  	}
>  
>  	update_state(qp, wqe, &pkt);
> @@ -762,6 +759,9 @@ int rxe_requester(void *arg)
>  err_drop_ah:
>  	if (ah)
>  		rxe_put(ah);
> +
> +qp_op_err:
> +	wqe->status = IB_WC_LOC_QP_OP_ERR;
>  err:
>  	wqe->state = wqe_state_error;
>  	__rxe_do_task(&qp->comp.task);

Near as I can tell all you did was turn a few pauses into errors which is wrong
and otherwise move things around. What were you trying to solve here?

Bob
Xiao Yang April 8, 2022, 3:14 a.m. UTC | #5
Hi Bob,

Thanks a lot for your detailed comments.

On 2022/4/8 5:39, Bob Pearson wrote:
> On 3/31/22 07:02, Xiao Yang wrote:
>> Current rxe_requester() doesn't generate a completion on error after
>> getting a wqe. Fix the issue by calling rxe_completer() on error.
>>
>> Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
>> ---
>>   drivers/infiniband/sw/rxe/rxe_req.c | 24 ++++++++++++------------
>>   1 file changed, 12 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>> index ae5fbc79dd5c..01ae400e5481 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>> @@ -648,26 +648,24 @@ int rxe_requester(void *arg)
>>   		psn_compare(qp->req.psn, (qp->comp.psn +
>>   				RXE_MAX_UNACKED_PSNS)) > 0)) {
>>   		qp->req.wait_psn = 1;
>> -		goto exit;
>> +		goto qp_op_err;
>>   	}
> This isn't an error. What is happening is the requester has advanced too far into the
> work request queue compared to what has been completed. So it sets a flag and exits the
> loop. When the completer finishes another completion it will reschedule the requester.
> RXE_MAX_UNACKED_PSNS is 128 so only 128 packets are allowed in the network without
> seeing an ack.

Got it.

>>   
>>   	/* Limit the number of inflight SKBs per QP */
>>   	if (unlikely(atomic_read(&qp->skb_out) >
>>   		     RXE_INFLIGHT_SKBS_PER_QP_HIGH)) {
>>   		qp->need_req_skb = 1;
>> -		goto exit;
>> +		goto qp_op_err;
>>   	}
> This also is not an error. Here there is a limit on the number SKBs in flight.
> The SKBs are consumed by the ethernet driver when the packet is put on the wire.
> This prevents the driver from using too much memory when the NIC is paused.

Got it.

>>   
>>   	opcode = next_opcode(qp, wqe, wqe->wr.opcode);
>> -	if (unlikely(opcode < 0)) {
>> -		wqe->status = IB_WC_LOC_QP_OP_ERR;
>> -		goto exit;
>> -	}
> This really is an error, but one that shouldn't happen under normal
> circumstances since it implies an illegal operation. Like attempting
> a write on a UD QP. Or causing an invalid opcode sequence.

Yes, this is the issue I try to resolve. Current rxe_requester() doesn't 
generate a completion when processing an unsupported/invalid opcode.
I am trying to introduce new RDMA Atomic Write opcode recently. If rxe 
driver doesn't support a new opcode (e.g. RDMA Atomic Write) and RDMA 
library supports it, an application using the new opcode can reproduce 
the issue.

>> +	if (unlikely(opcode < 0))
>> +		goto qp_op_err;
>>   
>>   	mask = rxe_opcode[opcode].mask;
>>   	if (unlikely(mask & RXE_READ_OR_ATOMIC_MASK)) {
>>   		if (check_init_depth(qp, wqe))
>> -			goto exit;
>> +			goto qp_op_err;
>>   	}
> This isn't an error. It means that someone posted a read/atomic operation
> and there are too many pending read/atomic operations pending. You just need
> to wait for one of them to complete so it is another pause.

Got it.

>>   
>>   	mtu = get_mtu(qp);
>> @@ -706,26 +704,26 @@ int rxe_requester(void *arg)
>>   	av = rxe_get_av(&pkt, &ah);
>>   	if (unlikely(!av)) {
>>   		pr_err("qp#%d Failed no address vector\n", qp_num(qp));
>> -		wqe->status = IB_WC_LOC_QP_OP_ERR;
>>   		goto err_drop_ah;
>>   	}
> This is an error. It could happen if the address handle referred to by
> the WR is not longer valid. There should always be an address vector but
> there may or may not be an address handle if the AV comes from a connected
> QP or not.

Yes, I think the original logic is enough.

>>   
>>   	skb = init_req_packet(qp, av, wqe, opcode, payload, &pkt);
>>   	if (unlikely(!skb)) {
>>   		pr_err("qp#%d Failed allocating skb\n", qp_num(qp));
>> -		wqe->status = IB_WC_LOC_QP_OP_ERR;
>>   		goto err_drop_ah;
>>   	}
>>   
>>   	ret = finish_packet(qp, av, wqe, &pkt, skb, payload);
>>   	if (unlikely(ret)) {
>>   		pr_debug("qp#%d Error during finish packet\n", qp_num(qp));
>> +		if (ah)
>> +			rxe_put(ah);
>>   		if (ret == -EFAULT)
>>   			wqe->status = IB_WC_LOC_PROT_ERR;
>>   		else
>>   			wqe->status = IB_WC_LOC_QP_OP_ERR;
>>   		kfree_skb(skb);
>> -		goto err_drop_ah;
>> +		goto err;
>>   	}
> Not sure the point of this.

Just keep the original logic.

>>   
>>   	if (ah)
>> @@ -751,8 +749,7 @@ int rxe_requester(void *arg)
>>   			goto exit;
>>   		}
>>   
>> -		wqe->status = IB_WC_LOC_QP_OP_ERR;
>> -		goto err;
>> +		goto qp_op_err;
>>   	}
>>   
>>   	update_state(qp, wqe, &pkt);
>> @@ -762,6 +759,9 @@ int rxe_requester(void *arg)
>>   err_drop_ah:
>>   	if (ah)
>>   		rxe_put(ah);
>> +
>> +qp_op_err:
>> +	wqe->status = IB_WC_LOC_QP_OP_ERR;
>>   err:
>>   	wqe->state = wqe_state_error;
>>   	__rxe_do_task(&qp->comp.task);
> 
> Near as I can tell all you did was turn a few pauses into errors which is wrong
> and otherwise move things around. What were you trying to solve here?

I will resend a new patch to only fix the unsupported/invalid issue.

Best Regards,
Xiao Yang

> 
> Bob
diff mbox series

Patch

diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index ae5fbc79dd5c..01ae400e5481 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -648,26 +648,24 @@  int rxe_requester(void *arg)
 		psn_compare(qp->req.psn, (qp->comp.psn +
 				RXE_MAX_UNACKED_PSNS)) > 0)) {
 		qp->req.wait_psn = 1;
-		goto exit;
+		goto qp_op_err;
 	}
 
 	/* Limit the number of inflight SKBs per QP */
 	if (unlikely(atomic_read(&qp->skb_out) >
 		     RXE_INFLIGHT_SKBS_PER_QP_HIGH)) {
 		qp->need_req_skb = 1;
-		goto exit;
+		goto qp_op_err;
 	}
 
 	opcode = next_opcode(qp, wqe, wqe->wr.opcode);
-	if (unlikely(opcode < 0)) {
-		wqe->status = IB_WC_LOC_QP_OP_ERR;
-		goto exit;
-	}
+	if (unlikely(opcode < 0))
+		goto qp_op_err;
 
 	mask = rxe_opcode[opcode].mask;
 	if (unlikely(mask & RXE_READ_OR_ATOMIC_MASK)) {
 		if (check_init_depth(qp, wqe))
-			goto exit;
+			goto qp_op_err;
 	}
 
 	mtu = get_mtu(qp);
@@ -706,26 +704,26 @@  int rxe_requester(void *arg)
 	av = rxe_get_av(&pkt, &ah);
 	if (unlikely(!av)) {
 		pr_err("qp#%d Failed no address vector\n", qp_num(qp));
-		wqe->status = IB_WC_LOC_QP_OP_ERR;
 		goto err_drop_ah;
 	}
 
 	skb = init_req_packet(qp, av, wqe, opcode, payload, &pkt);
 	if (unlikely(!skb)) {
 		pr_err("qp#%d Failed allocating skb\n", qp_num(qp));
-		wqe->status = IB_WC_LOC_QP_OP_ERR;
 		goto err_drop_ah;
 	}
 
 	ret = finish_packet(qp, av, wqe, &pkt, skb, payload);
 	if (unlikely(ret)) {
 		pr_debug("qp#%d Error during finish packet\n", qp_num(qp));
+		if (ah)
+			rxe_put(ah);
 		if (ret == -EFAULT)
 			wqe->status = IB_WC_LOC_PROT_ERR;
 		else
 			wqe->status = IB_WC_LOC_QP_OP_ERR;
 		kfree_skb(skb);
-		goto err_drop_ah;
+		goto err;
 	}
 
 	if (ah)
@@ -751,8 +749,7 @@  int rxe_requester(void *arg)
 			goto exit;
 		}
 
-		wqe->status = IB_WC_LOC_QP_OP_ERR;
-		goto err;
+		goto qp_op_err;
 	}
 
 	update_state(qp, wqe, &pkt);
@@ -762,6 +759,9 @@  int rxe_requester(void *arg)
 err_drop_ah:
 	if (ah)
 		rxe_put(ah);
+
+qp_op_err:
+	wqe->status = IB_WC_LOC_QP_OP_ERR;
 err:
 	wqe->state = wqe_state_error;
 	__rxe_do_task(&qp->comp.task);