diff mbox

[v1,01/11] IB/rxe: Move refcounting earlier in rxe_send()

Message ID 1503687956-7110-2-git-send-email-andrew.boyer@dell.com (mailing list archive)
State Superseded
Headers show

Commit Message

Andrew Boyer Aug. 25, 2017, 7:05 p.m. UTC
The network stack will call nskb's destructor, rxe_skb_tx_dtor(), if the
packet gets dropped by ip_local_out()/ip6_local_out(). Thus we need to add
the QP ref before output to avoid extra dereferences during network
congestion. This could lead to unwanted destruction of the QP.

Fix up the skb_out accounting, too.

Fixes: fda85ce91240 ("IB/rxe: Fix kernel panic from skb destructor")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Acked-by: Moni Shoua <monis@mellanox.com>
---
 drivers/infiniband/sw/rxe/rxe_net.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comments

Yuval Shaia Aug. 27, 2017, 12:01 p.m. UTC | #1
On Fri, Aug 25, 2017 at 03:05:46PM -0400, Andrew Boyer wrote:
> The network stack will call nskb's destructor, rxe_skb_tx_dtor(), if the
> packet gets dropped by ip_local_out()/ip6_local_out(). Thus we need to add
> the QP ref before output to avoid extra dereferences during network
> congestion. This could lead to unwanted destruction of the QP.
> 
> Fix up the skb_out accounting, too.
> 
> Fixes: fda85ce91240 ("IB/rxe: Fix kernel panic from skb destructor")
> Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
> Acked-by: Moni Shoua <monis@mellanox.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_net.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index 08f3f90..0810f38 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -460,12 +460,17 @@ int rxe_send(struct rxe_dev *rxe, struct rxe_pkt_info *pkt, struct sk_buff *skb)
>  	nskb->destructor = rxe_skb_tx_dtor;
>  	nskb->sk = pkt->qp->sk->sk;
>  
> +	rxe_add_ref(pkt->qp);
> +	atomic_inc(&pkt->qp->skb_out);
> +
>  	if (av->network_type == RDMA_NETWORK_IPV4) {
>  		err = ip_local_out(dev_net(skb_dst(skb)->dev), nskb->sk, nskb);
>  	} else if (av->network_type == RDMA_NETWORK_IPV6) {
>  		err = ip6_local_out(dev_net(skb_dst(skb)->dev), nskb->sk, nskb);
>  	} else {
>  		pr_err("Unknown layer 3 protocol: %d\n", av->network_type);
> +		atomic_dec(&pkt->qp->skb_out);
> +		rxe_drop_ref(pkt->qp);
>  		kfree_skb(nskb);
>  		return -EINVAL;
>  	}
> @@ -475,10 +480,7 @@ int rxe_send(struct rxe_dev *rxe, struct rxe_pkt_info *pkt, struct sk_buff *skb)
>  		return -EAGAIN;
>  	}

Shouldn't we also dec and drop ref in case that net_xmit_eval returns
error?

>  
> -	rxe_add_ref(pkt->qp);
> -	atomic_inc(&pkt->qp->skb_out);
>  	kfree_skb(skb);
> -
>  	return 0;
>  }
>  
> -- 
> 1.8.3.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Boyer Aug. 28, 2017, 1:05 p.m. UTC | #2
On 8/27/17, 8:01 AM, "linux-rdma-owner@vger.kernel.org on behalf of Yuval
Shaia" <linux-rdma-owner@vger.kernel.org on behalf of
yuval.shaia@oracle.com> wrote:

>On Fri, Aug 25, 2017 at 03:05:46PM -0400, Andrew Boyer wrote:
>> The network stack will call nskb's destructor, rxe_skb_tx_dtor(), if the
>> packet gets dropped by ip_local_out()/ip6_local_out(). Thus we need to
>>add
>> the QP ref before output to avoid extra dereferences during network
>> congestion. This could lead to unwanted destruction of the QP.
>> 
>> Fix up the skb_out accounting, too.
>> 
>> Fixes: fda85ce91240 ("IB/rxe: Fix kernel panic from skb destructor")
>> Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
>> Acked-by: Moni Shoua <monis@mellanox.com>
>> ---
>>  drivers/infiniband/sw/rxe/rxe_net.c | 8 +++++---
>>  1 file changed, 5 insertions(+), 3 deletions(-)
>> 
>> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c
>>b/drivers/infiniband/sw/rxe/rxe_net.c
>> index 08f3f90..0810f38 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_net.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
>> @@ -460,12 +460,17 @@ int rxe_send(struct rxe_dev *rxe, struct
>>rxe_pkt_info *pkt, struct sk_buff *skb)
>>  	nskb->destructor = rxe_skb_tx_dtor;
>>  	nskb->sk = pkt->qp->sk->sk;
>>  
>> +	rxe_add_ref(pkt->qp);
>> +	atomic_inc(&pkt->qp->skb_out);
>> +
>>  	if (av->network_type == RDMA_NETWORK_IPV4) {
>>  		err = ip_local_out(dev_net(skb_dst(skb)->dev), nskb->sk, nskb);
>>  	} else if (av->network_type == RDMA_NETWORK_IPV6) {
>>  		err = ip6_local_out(dev_net(skb_dst(skb)->dev), nskb->sk, nskb);
>>  	} else {
>>  		pr_err("Unknown layer 3 protocol: %d\n", av->network_type);
>> +		atomic_dec(&pkt->qp->skb_out);
>> +		rxe_drop_ref(pkt->qp);
>>  		kfree_skb(nskb);
>>  		return -EINVAL;
>>  	}
>> @@ -475,10 +480,7 @@ int rxe_send(struct rxe_dev *rxe, struct
>>rxe_pkt_info *pkt, struct sk_buff *skb)
>>  		return -EAGAIN;
>>  	}
>
>Shouldn't we also dec and drop ref in case that net_xmit_eval returns
>error?

That¹s the catch. net_xmit_eval() is just looking for error codes; it¹s
not doing the actual transmit. The transmit already started back in
ip_local_out()/ip6_local_out(), and if those end up failing the destructor
will be called. Any cleanup we do in this error path would be duplicated
in the destructor.

-Andrew

>
>>  
>> -	rxe_add_ref(pkt->qp);
>> -	atomic_inc(&pkt->qp->skb_out);
>>  	kfree_skb(skb);
>> -
>>  	return 0;
>>  }
>>  
>> -- 
>> 1.8.3.1
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>--
>To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 08f3f90..0810f38 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -460,12 +460,17 @@  int rxe_send(struct rxe_dev *rxe, struct rxe_pkt_info *pkt, struct sk_buff *skb)
 	nskb->destructor = rxe_skb_tx_dtor;
 	nskb->sk = pkt->qp->sk->sk;
 
+	rxe_add_ref(pkt->qp);
+	atomic_inc(&pkt->qp->skb_out);
+
 	if (av->network_type == RDMA_NETWORK_IPV4) {
 		err = ip_local_out(dev_net(skb_dst(skb)->dev), nskb->sk, nskb);
 	} else if (av->network_type == RDMA_NETWORK_IPV6) {
 		err = ip6_local_out(dev_net(skb_dst(skb)->dev), nskb->sk, nskb);
 	} else {
 		pr_err("Unknown layer 3 protocol: %d\n", av->network_type);
+		atomic_dec(&pkt->qp->skb_out);
+		rxe_drop_ref(pkt->qp);
 		kfree_skb(nskb);
 		return -EINVAL;
 	}
@@ -475,10 +480,7 @@  int rxe_send(struct rxe_dev *rxe, struct rxe_pkt_info *pkt, struct sk_buff *skb)
 		return -EAGAIN;
 	}
 
-	rxe_add_ref(pkt->qp);
-	atomic_inc(&pkt->qp->skb_out);
 	kfree_skb(skb);
-
 	return 0;
 }