Message ID | 1518508786-3204-1-git-send-email-yanjun.zhu@oracle.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
On Tue, Feb 13, 2018 at 02:59:46AM -0500, Zhu Yanjun wrote: > In send_atomic_ack function, it is not necessary make a s/make/to make > skb_clone. To gain better performance(high throughput and s/e(h/e (h > low latency), this skb_clone is removed. > > The following tests are made. > > server client > --------- --------- > |1.1.1.1|<----rxe-channel--->|1.1.1.2| > --------- --------- > > On server: rping -s -a 1.1.1.1 -v -C 1000 -S 512 > On client: rping -c -a 1.1.1.1 -v -C 1000 -S 512 > > The kernel config CONFIG_DEBUG_KMEMLEAK is enabled on both server > and client. > > This test runs for several hours. There is no memory leak and the whole > system can work well. > > As the above network, the following tests are made. > > Server: ibv_rc_pingpong -d rxe0 -g 1 > Client: ibv_rc_pingpong -d rxe0 -g 1 1.1.1.1 > > The result on Server(10 tests are made). > Before: > Throughput is 137.07 Mbit/sec > Latency is 517.76 usec/iter > > After: > Throughput is 148.85 Mbit/sec > Latency is 476.64 usec/iter > > The throughput is enhanced and the latency is reduced. > > CC: Srinivas Eeda <srinivas.eeda@oracle.com> > CC: Junxiao Bi <junxiao.bi@oracle.com> > Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> > --- > V1-->V2: 10 tests are made. From throughput and latency, the performance is better. > --- > drivers/infiniband/sw/rxe/rxe_resp.c | 15 ++------------- > 1 file changed, 2 insertions(+), 13 deletions(-) > > diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c > index d37bb9b..6d01d16 100644 > --- a/drivers/infiniband/sw/rxe/rxe_resp.c > +++ b/drivers/infiniband/sw/rxe/rxe_resp.c > @@ -969,7 +969,6 @@ static int send_atomic_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt, > int rc = 0; > struct rxe_pkt_info ack_pkt; > struct sk_buff *skb; > - struct sk_buff *skb_copy; > struct rxe_dev *rxe = to_rdev(qp->ibqp.device); > struct resp_res *res; > > @@ -981,15 +980,6 @@ static int send_atomic_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt, > goto out; > } > > - skb_copy = skb_clone(skb, GFP_ATOMIC); > - if (skb_copy) > - rxe_add_ref(qp); /* for the new SKB */ Are you sure we don't need this? > - else { > - pr_warn("Could not clone atomic response\n"); > - rc = -ENOMEM; > - goto out; > - } > - > res = &qp->resp.resources[qp->resp.res_head]; > free_rd_atomic_resource(qp, res); > rxe_advance_resp_resource(qp); > @@ -998,17 +988,16 @@ static int send_atomic_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt, > memset((unsigned char *)SKB_TO_PKT(skb) + sizeof(ack_pkt), 0, > sizeof(skb->cb) - sizeof(ack_pkt)); > > + refcount_inc(&skb->users); > res->type = RXE_ATOMIC_MASK; > res->atomic.skb = skb; > res->first_psn = ack_pkt.psn; > res->last_psn = ack_pkt.psn; > res->cur_psn = ack_pkt.psn; > > - rc = rxe_xmit_packet(rxe, qp, &ack_pkt, skb_copy); > + rc = rxe_xmit_packet(rxe, qp, &ack_pkt, skb); > if (rc) { > pr_err_ratelimited("Failed sending ack\n"); > - rxe_drop_ref(qp); > - kfree_skb(skb_copy); > } With this modification suggesting to remove the curly braces. > > out: > -- > 2.7.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2018/2/13 19:21, Yuval Shaia wrote: > On Tue, Feb 13, 2018 at 02:59:46AM -0500, Zhu Yanjun wrote: >> In send_atomic_ack function, it is not necessary make a > s/make/to make > >> skb_clone. To gain better performance(high throughput and > s/e(h/e (h > >> low latency), this skb_clone is removed. >> >> The following tests are made. >> >> server client >> --------- --------- >> |1.1.1.1|<----rxe-channel--->|1.1.1.2| >> --------- --------- >> >> On server: rping -s -a 1.1.1.1 -v -C 1000 -S 512 >> On client: rping -c -a 1.1.1.1 -v -C 1000 -S 512 >> >> The kernel config CONFIG_DEBUG_KMEMLEAK is enabled on both server >> and client. >> >> This test runs for several hours. There is no memory leak and the whole >> system can work well. >> >> As the above network, the following tests are made. >> >> Server: ibv_rc_pingpong -d rxe0 -g 1 >> Client: ibv_rc_pingpong -d rxe0 -g 1 1.1.1.1 >> >> The result on Server(10 tests are made). >> Before: >> Throughput is 137.07 Mbit/sec >> Latency is 517.76 usec/iter >> >> After: >> Throughput is 148.85 Mbit/sec >> Latency is 476.64 usec/iter >> >> The throughput is enhanced and the latency is reduced. >> >> CC: Srinivas Eeda <srinivas.eeda@oracle.com> >> CC: Junxiao Bi <junxiao.bi@oracle.com> >> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> >> --- >> V1-->V2: 10 tests are made. From throughput and latency, the performance is better. >> --- >> drivers/infiniband/sw/rxe/rxe_resp.c | 15 ++------------- >> 1 file changed, 2 insertions(+), 13 deletions(-) >> >> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c >> index d37bb9b..6d01d16 100644 >> --- a/drivers/infiniband/sw/rxe/rxe_resp.c >> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c >> @@ -969,7 +969,6 @@ static int send_atomic_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt, >> int rc = 0; >> struct rxe_pkt_info ack_pkt; >> struct sk_buff *skb; >> - struct sk_buff *skb_copy; >> struct rxe_dev *rxe = to_rdev(qp->ibqp.device); >> struct resp_res *res; >> >> @@ -981,15 +980,6 @@ static int send_atomic_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt, >> goto out; >> } >> >> - skb_copy = skb_clone(skb, GFP_ATOMIC); >> - if (skb_copy) >> - rxe_add_ref(qp); /* for the new SKB */ > Are you sure we don't need this? From my stress tests and performance tests, it will get better performance to remove skb_clone. And there is no memory leak. The whole soft RoCE can work well. So I think removing this function is a good choice. Zhu Yanjun > >> - else { >> - pr_warn("Could not clone atomic response\n"); >> - rc = -ENOMEM; >> - goto out; >> - } >> - >> res = &qp->resp.resources[qp->resp.res_head]; >> free_rd_atomic_resource(qp, res); >> rxe_advance_resp_resource(qp); >> @@ -998,17 +988,16 @@ static int send_atomic_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt, >> memset((unsigned char *)SKB_TO_PKT(skb) + sizeof(ack_pkt), 0, >> sizeof(skb->cb) - sizeof(ack_pkt)); >> >> + refcount_inc(&skb->users); >> res->type = RXE_ATOMIC_MASK; >> res->atomic.skb = skb; >> res->first_psn = ack_pkt.psn; >> res->last_psn = ack_pkt.psn; >> res->cur_psn = ack_pkt.psn; >> >> - rc = rxe_xmit_packet(rxe, qp, &ack_pkt, skb_copy); >> + rc = rxe_xmit_packet(rxe, qp, &ack_pkt, skb); >> if (rc) { >> pr_err_ratelimited("Failed sending ack\n"); >> - rxe_drop_ref(qp); >> - kfree_skb(skb_copy); >> } > With this modification suggesting to remove the curly braces. > >> >> out: >> -- >> 2.7.4 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> > > } > > > - skb_copy = skb_clone(skb, GFP_ATOMIC); > > > - if (skb_copy) > > > - rxe_add_ref(qp); /* for the new SKB */ > > Are you sure we don't need this? > From my stress tests and performance tests, it will get better performance > to remove skb_clone. My concern is only with the above ref count. > And there is no memory leak. The whole soft RoCE can work well. > So I think removing this function is a good choice. > > Zhu Yanjun > > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2018/2/14 18:36, Yuval Shaia wrote: >>>> } >>>> - skb_copy = skb_clone(skb, GFP_ATOMIC); >>>> - if (skb_copy) >>>> - rxe_add_ref(qp); /* for the new SKB */ >>> Are you sure we don't need this? >> From my stress tests and performance tests, it will get better performance >> to remove skb_clone. > My concern is only with the above ref count. Hi, Yuval Thanks for your code review. From the function send_atomic_ack, the variable skb is used in the following 2 places. " ... res->type = RXE_ATOMIC_MASK; res->atomic.skb = skb; <----One is here. res->first_psn = ack_pkt.psn; res->last_psn = ack_pkt.psn; res->cur_psn = ack_pkt.psn; rc = rxe_xmit_packet(rxe, qp, &ack_pkt, skb); <---The other is here. ... " The 2 will call kfree_skb after skb is used. void kfree_skb(struct sk_buff *skb) { if (!skb_unref(skb)) return; trace_kfree_skb(skb, __builtin_return_address(0)); __kfree_skb(skb); } In the function kfree_skb, user ref is decreased and checked. To make sure in the 2 places, skb can be used. I increased user ref. Best Regards, Zhu Yanjun > >> And there is no memory leak. The whole soft RoCE can work well. >> So I think removing this function is a good choice. >> >> Zhu Yanjun -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2/14/2018 12:36 PM, Yuval Shaia wrote: >>>> } >>>> - skb_copy = skb_clone(skb, GFP_ATOMIC); >>>> - if (skb_copy) >>>> - rxe_add_ref(qp); /* for the new SKB */ >>> Are you sure we don't need this? >> From my stress tests and performance tests, it will get better performance >> to remove skb_clone. > > My concern is only with the above ref count. I agree with yuval in case xmit fails. send_atomic_ack() { rxe_xmit_packet() { rxe_send() { rxe_add_ref(pkt->qp) <--- add qp ref err = ip_local_out() <---- fail here } if (rxe_xmit_packet() fails) { rxe_drop_ref(qp) <--- you must deref here } } > >> And there is no memory leak. The whole soft RoCE can work well. >> So I think removing this function is a good choice. >> >> Zhu Yanjun >>> > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2018/2/18 16:12, Yonatan Cohen wrote: > On 2/14/2018 12:36 PM, Yuval Shaia wrote: >>>>> } >>>>> - skb_copy = skb_clone(skb, GFP_ATOMIC); >>>>> - if (skb_copy) >>>>> - rxe_add_ref(qp); /* for the new SKB */ >>>> Are you sure we don't need this? >>> From my stress tests and performance tests, it will get better >>> performance >>> to remove skb_clone. >> >> My concern is only with the above ref count. > I agree with yuval in case xmit fails. > send_atomic_ack() > { > rxe_xmit_packet() > { > rxe_send() > { > rxe_add_ref(pkt->qp) <--- add qp ref > err = ip_local_out() <---- fail here > } > > if (rxe_xmit_packet() fails) { > rxe_drop_ref(qp) <--- you must deref here > } > } Thank you very much. Very nice. I will make a new patch with it. Zhu Yanjun >> >>> And there is no memory leak. The whole soft RoCE can work well. >>> So I think removing this function is a good choice. >>> >>> Zhu Yanjun >>>> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2018/2/18 16:12, Yonatan Cohen wrote: > On 2/14/2018 12:36 PM, Yuval Shaia wrote: >>>>> } >>>>> - skb_copy = skb_clone(skb, GFP_ATOMIC); >>>>> - if (skb_copy) >>>>> - rxe_add_ref(qp); /* for the new SKB */ >>>> Are you sure we don't need this? >>> From my stress tests and performance tests, it will get better >>> performance >>> to remove skb_clone. >> >> My concern is only with the above ref count. > I agree with yuval in case xmit fails. > send_atomic_ack() > { > rxe_xmit_packet() > { > rxe_send() > { > rxe_add_ref(pkt->qp) <--- add qp ref > err = ip_local_out() <---- fail here > } > > if (rxe_xmit_packet() fails) { > rxe_drop_ref(qp) <--- you must deref here > } > } Hi, Thanks for your review. I checked the source code. When error occurs in the function ip_local_out/ip6_local_out, the network stack will call skb's destructor, rxe_skb_tx_dtor. In this function, rxe_drop_ref is called. So it is not necessary to call rxe_drop_ref again. static void rxe_skb_tx_dtor(struct sk_buff *skb) { struct sock *sk = skb->sk; struct rxe_qp *qp = sk->sk_user_data; int skb_out = atomic_dec_return(&qp->skb_out); if (unlikely(qp->need_req_skb && skb_out < RXE_INFLIGHT_SKBS_PER_QP_LOW)) rxe_run_task(&qp->req.task, 1); rxe_drop_ref(qp); <----here } Best Regards, Zhu Yanjun >> >>> And there is no memory leak. The whole soft RoCE can work well. >>> So I think removing this function is a good choice. >>> >>> Zhu Yanjun >>>> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2018/2/13 19:21, Yuval Shaia wrote: > On Tue, Feb 13, 2018 at 02:59:46AM -0500, Zhu Yanjun wrote: >> In send_atomic_ack function, it is not necessary make a > s/make/to make > >> skb_clone. To gain better performance(high throughput and > s/e(h/e (h > >> low latency), this skb_clone is removed. >> >> The following tests are made. >> >> server client >> --------- --------- >> |1.1.1.1|<----rxe-channel--->|1.1.1.2| >> --------- --------- >> >> On server: rping -s -a 1.1.1.1 -v -C 1000 -S 512 >> On client: rping -c -a 1.1.1.1 -v -C 1000 -S 512 >> >> The kernel config CONFIG_DEBUG_KMEMLEAK is enabled on both server >> and client. >> >> This test runs for several hours. There is no memory leak and the whole >> system can work well. >> >> As the above network, the following tests are made. >> >> Server: ibv_rc_pingpong -d rxe0 -g 1 >> Client: ibv_rc_pingpong -d rxe0 -g 1 1.1.1.1 >> >> The result on Server(10 tests are made). >> Before: >> Throughput is 137.07 Mbit/sec >> Latency is 517.76 usec/iter >> >> After: >> Throughput is 148.85 Mbit/sec >> Latency is 476.64 usec/iter >> >> The throughput is enhanced and the latency is reduced. >> >> CC: Srinivas Eeda <srinivas.eeda@oracle.com> >> CC: Junxiao Bi <junxiao.bi@oracle.com> >> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> >> --- >> V1-->V2: 10 tests are made. From throughput and latency, the performance is better. >> --- >> drivers/infiniband/sw/rxe/rxe_resp.c | 15 ++------------- >> 1 file changed, 2 insertions(+), 13 deletions(-) >> >> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c >> index d37bb9b..6d01d16 100644 >> --- a/drivers/infiniband/sw/rxe/rxe_resp.c >> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c >> @@ -969,7 +969,6 @@ static int send_atomic_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt, >> int rc = 0; >> struct rxe_pkt_info ack_pkt; >> struct sk_buff *skb; >> - struct sk_buff *skb_copy; >> struct rxe_dev *rxe = to_rdev(qp->ibqp.device); >> struct resp_res *res; >> >> @@ -981,15 +980,6 @@ static int send_atomic_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt, >> goto out; >> } >> >> - skb_copy = skb_clone(skb, GFP_ATOMIC); >> - if (skb_copy) >> - rxe_add_ref(qp); /* for the new SKB */ > Are you sure we don't need this? Hi, Yuval I checked the source code again. In rxe_send, rxe_add_ref is called. So IMHO, this rxe_add_ref is not needed here. ... send_atomic_ack rxe_xmit_packet rxe_send ... int rxe_send(struct rxe_pkt_info *pkt, struct sk_buff *skb) { struct rxe_av *av; int err; av = rxe_get_av(pkt); skb->destructor = rxe_skb_tx_dtor; skb->sk = pkt->qp->sk->sk; rxe_add_ref(pkt->qp); <----here ... If I am wrong, please correct me. Thanks a lot. Zhu Yanjun > >> - else { >> - pr_warn("Could not clone atomic response\n"); >> - rc = -ENOMEM; >> - goto out; >> - } >> - >> res = &qp->resp.resources[qp->resp.res_head]; >> free_rd_atomic_resource(qp, res); >> rxe_advance_resp_resource(qp); >> @@ -998,17 +988,16 @@ static int send_atomic_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt, >> memset((unsigned char *)SKB_TO_PKT(skb) + sizeof(ack_pkt), 0, >> sizeof(skb->cb) - sizeof(ack_pkt)); >> >> + refcount_inc(&skb->users); >> res->type = RXE_ATOMIC_MASK; >> res->atomic.skb = skb; >> res->first_psn = ack_pkt.psn; >> res->last_psn = ack_pkt.psn; >> res->cur_psn = ack_pkt.psn; >> >> - rc = rxe_xmit_packet(rxe, qp, &ack_pkt, skb_copy); >> + rc = rxe_xmit_packet(rxe, qp, &ack_pkt, skb); >> if (rc) { >> pr_err_ratelimited("Failed sending ack\n"); >> - rxe_drop_ref(qp); >> - kfree_skb(skb_copy); >> } > With this modification suggesting to remove the curly braces. > >> >> out: >> -- >> 2.7.4 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c index d37bb9b..6d01d16 100644 --- a/drivers/infiniband/sw/rxe/rxe_resp.c +++ b/drivers/infiniband/sw/rxe/rxe_resp.c @@ -969,7 +969,6 @@ static int send_atomic_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt, int rc = 0; struct rxe_pkt_info ack_pkt; struct sk_buff *skb; - struct sk_buff *skb_copy; struct rxe_dev *rxe = to_rdev(qp->ibqp.device); struct resp_res *res; @@ -981,15 +980,6 @@ static int send_atomic_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt, goto out; } - skb_copy = skb_clone(skb, GFP_ATOMIC); - if (skb_copy) - rxe_add_ref(qp); /* for the new SKB */ - else { - pr_warn("Could not clone atomic response\n"); - rc = -ENOMEM; - goto out; - } - res = &qp->resp.resources[qp->resp.res_head]; free_rd_atomic_resource(qp, res); rxe_advance_resp_resource(qp); @@ -998,17 +988,16 @@ static int send_atomic_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt, memset((unsigned char *)SKB_TO_PKT(skb) + sizeof(ack_pkt), 0, sizeof(skb->cb) - sizeof(ack_pkt)); + refcount_inc(&skb->users); res->type = RXE_ATOMIC_MASK; res->atomic.skb = skb; res->first_psn = ack_pkt.psn; res->last_psn = ack_pkt.psn; res->cur_psn = ack_pkt.psn; - rc = rxe_xmit_packet(rxe, qp, &ack_pkt, skb_copy); + rc = rxe_xmit_packet(rxe, qp, &ack_pkt, skb); if (rc) { pr_err_ratelimited("Failed sending ack\n"); - rxe_drop_ref(qp); - kfree_skb(skb_copy); } out:
In send_atomic_ack function, it is not necessary make a skb_clone. To gain better performance(high throughput and low latency), this skb_clone is removed. The following tests are made. server client --------- --------- |1.1.1.1|<----rxe-channel--->|1.1.1.2| --------- --------- On server: rping -s -a 1.1.1.1 -v -C 1000 -S 512 On client: rping -c -a 1.1.1.1 -v -C 1000 -S 512 The kernel config CONFIG_DEBUG_KMEMLEAK is enabled on both server and client. This test runs for several hours. There is no memory leak and the whole system can work well. As the above network, the following tests are made. Server: ibv_rc_pingpong -d rxe0 -g 1 Client: ibv_rc_pingpong -d rxe0 -g 1 1.1.1.1 The result on Server(10 tests are made). Before: Throughput is 137.07 Mbit/sec Latency is 517.76 usec/iter After: Throughput is 148.85 Mbit/sec Latency is 476.64 usec/iter The throughput is enhanced and the latency is reduced. CC: Srinivas Eeda <srinivas.eeda@oracle.com> CC: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> --- V1-->V2: 10 tests are made. From throughput and latency, the performance is better. --- drivers/infiniband/sw/rxe/rxe_resp.c | 15 ++------------- 1 file changed, 2 insertions(+), 13 deletions(-)