Message ID: 20220114054852.38058-1-tonylu@linux.alibaba.com (mailing list archive)
Series: net/smc: Spread workload over multiple cores
On Fri, Jan 14, 2022 at 01:48:46PM +0800, Tony Lu wrote:
> Currently, SMC creates one CQ per IB device and shares this CQ among
> all the QPs of its links. This CQ is always bound to the first
> completion vector, and the IRQ affinity of that vector ties it to one
> CPU core.
>
>   [diagram: the single CQ inside the SMC IB device serves the QPs of
>    all SMC links, and therefore of all SMC link groups]
>
> In this model, when the number of connections exceeds
> SMC_RMBS_PER_LGR_MAX, SMC creates multiple link groups and
> corresponding QPs. All the connections share a limited number of QPs
> and one CQ (for both the send and receive sides). Since a completion
> vector is generally bound to a fixed CPU core, performance is limited
> by that single core in large-scale scenarios such as many threads and
> many connections.
>
> Running an nginx and wrk test with 8 threads and 800 connections on an
> 8-core host, the softirq load on CPU 0 limits scalability:
>
> 04:18:54 PM  CPU   %usr  %nice   %sys  %iowait   %irq  %soft  %steal  %guest  %gnice  %idle
> 04:18:55 PM  all   5.81   0.00  19.42     0.00   2.94  10.21    0.00    0.00    0.00  61.63
> 04:18:55 PM    0   0.00   0.00   0.00     0.00  16.80  82.78    0.00    0.00    0.00   0.41
> <snip>
>
> Nowadays RDMA devices have more than one completion vector, e.g. mlx5
> has 8 and eRDMA has 4 by default. This removes the limitation of a
> single vector and a single CPU core.
>
> To enhance scalability and take advantage of multi-core resources, we
> can spread the CQs over different CPU cores and introduce a more
> flexible mapping. The main difference in the new model is that multiple
> CQs are created per IB device, up to the limit of the device's
> capability (num_comp_vectors). When there are multiple link groups,
> each link group's QP can bind to the least used CQ, and the CQs are
> bound to different completion vectors and CPU cores, so the softirq
> handlers (the wr tx/rx tasklets) are spread over different cores.
>
>   [diagram: multiple CQs (CQ 0 .. CQ n) inside the SMC IB device; each
>    SMC link's QP, and therefore each SMC link group, binds to one CQ]
>
> After spreading one CQ (4 link groups) over four CPU cores, the softirq
> load is spread over different cores:
>
> 04:26:25 PM  CPU   %usr  %nice   %sys  %iowait   %irq  %soft  %steal  %guest  %gnice  %idle
> 04:26:26 PM  all  10.70   0.00  35.80     0.00   7.64  26.62    0.00    0.00    0.00  19.24
> 04:26:26 PM    0   0.00   0.00   0.00     0.00  16.33  50.00    0.00    0.00    0.00  33.67
> 04:26:26 PM    1   0.00   0.00   0.00     0.00  15.46  69.07    0.00    0.00    0.00  15.46
> 04:26:26 PM    2   0.00   0.00   0.00     0.00  13.13  39.39    0.00    0.00    0.00  47.47
> 04:26:26 PM    3   0.00   0.00   0.00     0.00  13.27  55.10    0.00    0.00    0.00  31.63
> <snip>
>
> Here is the benchmark with this patch (a prototype of the new model):
>
> Test environment:
> - CPU: Intel Xeon Platinum, 8 cores; memory: 32 GiB; NIC: Mellanox CX4.
> - nginx + wrk HTTP benchmark.
> - nginx: access_log disabled, keepalive_timeout and keepalive_requests
>   increased, long-lived connections.
> - wrk: 8 threads and 100, 200, 400 connections.
>
> Benchmark result:
>
> Conns/QPS        100        200        400
> w/o patch  338502.49  359216.66  398167.16
> w/ patch   677247.40  694193.70  812502.69
> Ratio       +100.07%    +93.25%   +104.06%
>
> These prototype patches show nearly a 1x increase in QPS.
>
> The benchmarks with 100, 200 and 400 connections use 1, 1 and 2 link
> groups respectively. With one link group, send/recv is spread over two
> cores; with more than one link group, the load spreads over more cores.
>
> If an application's connection count does not exceed the link group
> limit (SMC_RMBS_PER_LGR_MAX, 255) and CPU resources are restricted,
> this series also introduces a tunable way to lower the hard limit on
> the number of connections per link group. That eases the restriction of
> having fewer CQs (cores) and reduces contention, for example on
> link-level CDC slots. Since the right trade-off depends on the
> application, the series provides a userspace knob: users can choose to
> share link groups to save resources, or to create more link groups for
> fewer limitations.
>
> Patches 1-4 introduce support for multiple CQs.
> - Patch 1 spreads the CQ over two cores; it helps when there are fewer
>   connections.
> - Patches 2, 3 and 4 introduce multiple CQ support, a new medium to tie
>   a link to its ibdev, and load balancing between the different
>   completion vectors and CQs.
> - The policy for spreading the CQs is still being evaluated and tested
>   for the best performance, e.g. splitting recv/send CQs, joining them
>   together, or binding recv/recv (send/send) CQs to the same vector.
>   Glad to hear your advice.
>
> Patch 5 is a medium for a userspace control knob.
> - It mainly provides two knobs to adjust the buffer size of an SMC
>   socket. We found that buffers that are too small make SMC wait a long
>   time for buffer space and limit performance.
> - It introduces a sysctl framework, just for tuning; netlink would also
>   work. Sysctl is easy to post as a patch and needs no userspace
>   example. I would be glad to hear your advice on the control plane for
>   userspace.
>
> Patch 6 introduces a tunable knob to decrease the number of connections
> per link group, which increases parallelism as mentioned above.
>
> These patches are still being improved; I am very glad to hear your
> advice.

Please CC the RDMA mailing list next time.

Why didn't you use the already existing APIs in
drivers/infiniband/core/cq.c? ib_cq_pool_get() will do most if not all
of your open-coded CQ spreading logic.

Thanks
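For context, the open-coded CQ spreading described in the cover letter
amounts to something like the sketch below. It is not the actual patch:
smc_ib_choose_cq() and the cqs[]/cq_cnt[] fields (and their locking) are
hypothetical illustration; only ib_create_cq(), struct ib_cq_init_attr,
num_comp_vectors, SMC_MAX_CQE and smc_wr_tx_cq_handler() come from the
existing kernel and SMC code. Only the send side is shown.

/*
 * Sketch only: pick the least-used completion vector of the device,
 * create a CQ bound to it on first use, and hand that CQ to the new
 * link.  Binding each CQ to its own vector spreads the tx/rx tasklets
 * (softirq work) over different CPU cores.  The recv side would be
 * analogous, using the recv completion handler.
 */
static struct ib_cq *smc_ib_choose_cq(struct smc_ib_device *smcibdev)
{
        struct ib_cq_init_attr cqattr = { .cqe = SMC_MAX_CQE };
        int i, best = 0;

        for (i = 1; i < smcibdev->ibdev->num_comp_vectors; i++)
                if (smcibdev->cq_cnt[i] < smcibdev->cq_cnt[best])
                        best = i;               /* least-used vector */

        if (!smcibdev->cqs[best]) {
                struct ib_cq *cq;

                cqattr.comp_vector = best;      /* ties the CQ to one core */
                cq = ib_create_cq(smcibdev->ibdev, smc_wr_tx_cq_handler,
                                  NULL, smcibdev, &cqattr);
                if (IS_ERR(cq))
                        return cq;
                smcibdev->cqs[best] = cq;
        }
        smcibdev->cq_cnt[best]++;
        return smcibdev->cqs[best];
}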
On Sun, Jan 16, 2022 at 11:00:33AM +0200, Leon Romanovsky wrote:
> On Fri, Jan 14, 2022 at 01:48:46PM +0800, Tony Lu wrote:
> >
> > <snip>
> >
> > These patches are still being improved; I am very glad to hear your
> > advice.
>
> Please CC the RDMA mailing list next time.

I will do that in the next version.

> Why didn't you use the already existing APIs in
> drivers/infiniband/core/cq.c? ib_cq_pool_get() will do most if not all
> of your open-coded CQ spreading logic.

Thanks for your advice. I have looked into this shared CQ pool API, and
after a brief test it appears to suit this scenario. I will replace the
least-used CQ logic with the CQ pool API in the next revision.

Thank you.
Tony Lu
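For reference, a minimal sketch (not taken from the series) of what
replacing the least-used CQ logic with the shared CQ pool could look
like. The smc_wr_* wrapper names are hypothetical; ib_cq_pool_get() and
ib_cq_pool_put() are the in-kernel API from drivers/infiniband/core/cq.c.

#include <rdma/ib_verbs.h>

/*
 * Hypothetical wrappers: let the core CQ pool pick (and, on demand,
 * create) a CQ instead of open-coding a least-used-vector search.
 * The comp_vector hint only steers the choice of vector; IB_POLL_SOFTIRQ
 * keeps completion processing in softirq context, as SMC does today.
 */
static struct ib_cq *smc_wr_get_cq(struct ib_device *ibdev,
                                   unsigned int nr_cqe, int comp_vector_hint)
{
        return ib_cq_pool_get(ibdev, nr_cqe, comp_vector_hint,
                              IB_POLL_SOFTIRQ); /* ERR_PTR() on failure */
}

static void smc_wr_put_cq(struct ib_cq *cq, unsigned int nr_cqe)
{
        ib_cq_pool_put(cq, nr_cqe);     /* return the CQEs to the pool */
}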
On Sun, Jan 16, 2022 at 11:00:33AM +0200, Leon Romanovsky wrote:
>
> Please CC the RDMA mailing list next time.
>
> Why didn't you use the already existing APIs in
> drivers/infiniband/core/cq.c? ib_cq_pool_get() will do most if not all
> of your open-coded CQ spreading logic.

I am working on replacing the open-coded logic with ib_cq_pool_get().
It needs an ib_poll_context to choose the poller, and the polling is
then done by the core's ib_poll_handler(). That part is fine, but there
is a problem with the completion callback: when the poller gets an
ib_wc, it calls wc->wr_cqe->done(cq, wc), and wr_cqe is in a union with
wr_id. The wr_id is used heavily in SMC.

In this patch set I am not going to change any logic outside of CQ
allocation, so I have to use the original interface to allocate the CQs
this time.

I am glad to hear your advice in case I missed something or
misunderstood.

Thanks,
Tony Lu
On Wed, Jan 26, 2022 at 03:23:22PM +0800, Tony Lu wrote:
> On Sun, Jan 16, 2022 at 11:00:33AM +0200, Leon Romanovsky wrote:
> >
> > Please CC the RDMA mailing list next time.
> >
> > Why didn't you use the already existing APIs in
> > drivers/infiniband/core/cq.c? ib_cq_pool_get() will do most if not
> > all of your open-coded CQ spreading logic.
>
> I am working on replacing the open-coded logic with ib_cq_pool_get().
> It needs an ib_poll_context to choose the poller, and the polling is
> then done by the core's ib_poll_handler(). That part is fine, but there
> is a problem with the completion callback: when the poller gets an
> ib_wc, it calls wc->wr_cqe->done(cq, wc), and wr_cqe is in a union with
> wr_id. The wr_id is used heavily in SMC.

Part of using the new interface is converting to use wr_cqe; you should
just do that work instead of trying to duplicate a core API in a driver.

Jason
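To make the wr_cqe point concrete, the conversion usually looks like the
sketch below: each work request embeds a struct ib_cqe whose ->done()
callback recovers the per-request context via container_of(), instead of
stashing an index in wr_id. The smc_wr_ctx structure and the function
names are hypothetical; only the ib_* types and calls are the real verbs
API.

#include <rdma/ib_verbs.h>

struct smc_link;                        /* real SMC type, used by pointer */

struct smc_wr_ctx {                     /* hypothetical per-WR context  */
        struct ib_cqe   cqe;            /* replaces the old wr_id index */
        struct smc_link *link;
};

static void smc_wr_tx_done(struct ib_cq *cq, struct ib_wc *wc)
{
        struct smc_wr_ctx *ctx = container_of(wc->wr_cqe,
                                              struct smc_wr_ctx, cqe);

        /* handle the send completion for ctx->link here */
}

static int smc_wr_post(struct ib_qp *qp, struct smc_wr_ctx *ctx,
                       struct ib_send_wr *wr)
{
        ctx->cqe.done = smc_wr_tx_done;
        wr->wr_cqe = &ctx->cqe;         /* union with wr_id: use only one */
        return ib_post_send(qp, wr, NULL);
}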
On Wed, Jan 26, 2022 at 11:28:06AM -0400, Jason Gunthorpe wrote:
> On Wed, Jan 26, 2022 at 03:23:22PM +0800, Tony Lu wrote:
> > On Sun, Jan 16, 2022 at 11:00:33AM +0200, Leon Romanovsky wrote:
> > > <snip>
> >
> > I am working on replacing the open-coded logic with ib_cq_pool_get().
> > It needs an ib_poll_context to choose the poller, and the polling is
> > then done by the core's ib_poll_handler(). That part is fine, but
> > there is a problem with the completion callback: when the poller gets
> > an ib_wc, it calls wc->wr_cqe->done(cq, wc), and wr_cqe is in a union
> > with wr_id. The wr_id is used heavily in SMC.
>
> Part of using the new interface is converting to use wr_cqe; you should
> just do that work instead of trying to duplicate a core API in a
> driver.

Thanks for your advice. This patch set aims to improve performance with
the current API in the SMC protocol, which is more urgent. I will do the
conversion to the new API in a separate follow-up patch. That work may
require a lot of revisions, so I will post those patches after a full
discussion with Karsten and after the solution is finalized.

Thanks,
Tony Lu
On Thu, Jan 27, 2022 at 11:14:37AM +0800, Tony Lu wrote:
> On Wed, Jan 26, 2022 at 11:28:06AM -0400, Jason Gunthorpe wrote:
> >
> > <snip>
> >
> > Part of using the new interface is converting to use wr_cqe; you
> > should just do that work instead of trying to duplicate a core API in
> > a driver.
>
> Thanks for your advice. This patch set aims to improve performance with
> the current API in the SMC protocol, which is more urgent.

This code has existed since 2017; it is hard to agree with the "urgent"
claim.

Thanks
On Thu, Jan 27, 2022 at 08:21:07AM +0200, Leon Romanovsky wrote:
> On Thu, Jan 27, 2022 at 11:14:37AM +0800, Tony Lu wrote:
> >
> > <snip>
> >
> > Thanks for your advice. This patch set aims to improve performance
> > with the current API in the SMC protocol, which is more urgent.
>
> This code has existed since 2017; it is hard to agree with the "urgent"
> claim.

Yes, I agree with you that the code is old. I think there are two
separate problems: the performance issue and the API refactoring.

We are running into the performance issues described in these patches in
our cloud environment, so I consider it more urgent as a real-world
issue.

The current modification is less intrusive to the code, which keeps the
changes simpler, and the current implementation works for now. That is
why I put the refactoring second.

Thank you,
Tony Lu
On Thu, Jan 27, 2022 at 03:59:36PM +0800, Tony Lu wrote:
> On Thu, Jan 27, 2022 at 08:21:07AM +0200, Leon Romanovsky wrote:
> >
> > <snip>
> >
> > This code has existed since 2017; it is hard to agree with the
> > "urgent" claim.
>
> Yes, I agree with you that the code is old. I think there are two
> separate problems: the performance issue and the API refactoring.
>
> We are running into the performance issues described in these patches
> in our cloud environment, so I consider it more urgent as a real-world
> issue.
>
> The current modification is less intrusive to the code, which keeps the
> changes simpler, and the current implementation works for now. That is
> why I put the refactoring second.

We are not requesting that you refactor the code, but that you properly
use the existing in-kernel API while implementing a new feature ("Spread
workload over multiple cores").

Thanks
On Thu, Jan 27, 2022 at 10:47:09AM +0200, Leon Romanovsky wrote:
> On Thu, Jan 27, 2022 at 03:59:36PM +0800, Tony Lu wrote:
> >
> > <snip>
> >
> > The current modification is less intrusive to the code, which keeps
> > the changes simpler, and the current implementation works for now.
> > That is why I put the refactoring second.
>
> We are not requesting that you refactor the code, but that you properly
> use the existing in-kernel API while implementing a new feature
> ("Spread workload over multiple cores").

Sorry if I missed something about properly using the existing in-kernel
API. Is the proper API ib_cq_pool_get() and ib_cq_pool_put()?

If so, these APIs do not suit SMC's current usage; I would have to
refactor the tasklet and wr_id logic in SMC. I think that is a large
piece of work and should be done after a full discussion.

Thanks,
Tony Lu
On Thu, Jan 27, 2022 at 05:14:35PM +0800, Tony Lu wrote:
> On Thu, Jan 27, 2022 at 10:47:09AM +0200, Leon Romanovsky wrote:
> >
> > <snip>
> >
> > We are not requesting that you refactor the code, but that you
> > properly use the existing in-kernel API while implementing a new
> > feature ("Spread workload over multiple cores").
>
> Sorry if I missed something about properly using the existing in-kernel
> API. Is the proper API ib_cq_pool_get() and ib_cq_pool_put()?
>
> If so, these APIs do not suit SMC's current usage; I would have to
> refactor the tasklet and wr_id logic in SMC. I think that is a large
> piece of work and should be done after a full discussion.

This discussion is not going anywhere. Just to summarize, we (Jason and
I) have been asking you to use the existing API from the beginning.

You can try to convince the netdev maintainers to merge the code despite
our request.

Thanks
On Thu, Jan 27, 2022 at 11:25:41AM +0200, Leon Romanovsky wrote:
> On Thu, Jan 27, 2022 at 05:14:35PM +0800, Tony Lu wrote:
> >
> > <snip>
> >
> > If so, these APIs do not suit SMC's current usage; I would have to
> > refactor the tasklet and wr_id logic in SMC. I think that is a large
> > piece of work and should be done after a full discussion.
>
> This discussion is not going anywhere. Just to summarize, we (Jason and
> I) have been asking you to use the existing API from the beginning.

Yes, I couldn't agree more about using the existing API, and I tried it
earlier. The existing APIs would be easy to use if I were writing new
logic, and I do not want to duplicate code either.

The main obstacle is that the packet and WR processing in SMC is tightly
bound to the old API and is not easy to convert to the existing API.

To solve a real issue, I have to fix it based on the old API. Using the
existing API in this patch would require refactoring the SMC logic,
which needs more time. Our production tree is synced with smc-next, so I
chose to fix this issue first and then refactor the logic to fit the
existing API once and for all.

> You can try to convince the netdev maintainers to merge the code
> despite our request.

It is not my intention to recklessly merge this patch. I appreciate your
pointing out that the existing APIs are available, so I listened to
everyone and decided on a compromise: fix it first, then refactor.

Thanks for your advice.
Tony Lu
On 27/01/2022 10:50, Tony Lu wrote:
> On Thu, Jan 27, 2022 at 11:25:41AM +0200, Leon Romanovsky wrote:
>>
>> <snip>
>>
>> This discussion is not going anywhere. Just to summarize, we (Jason
>> and I) have been asking you to use the existing API from the
>> beginning.
>
> Yes, I couldn't agree more about using the existing API, and I tried it
> earlier. The existing APIs would be easy to use if I were writing new
> logic, and I do not want to duplicate code either.
>
> The main obstacle is that the packet and WR processing in SMC is
> tightly bound to the old API and is not easy to convert to the existing
> API.
>
> To solve a real issue, I have to fix it based on the old API. Using the
> existing API in this patch would require refactoring the SMC logic,
> which needs more time. Our production tree is synced with smc-next, so
> I chose to fix this issue first and then refactor the logic to fit the
> existing API once and for all.

While I understand your approach of fixing the issue first, I need to
say that such interim fixes create a significant amount of review and
test effort for others, and there is an increased risk of introducing
new bugs with such an only-for-now fix.

Given that right now you are the only one affected by this problem, I
recommend keeping your fix in your own environment for now and coming
back with the final version. In the meantime I can use the saved time to
review the bunch of other patches that we have received.

Thank you!
On Thu, Jan 27, 2022 at 03:52:36PM +0100, Karsten Graul wrote:
> On 27/01/2022 10:50, Tony Lu wrote:
> >
> > <snip>
> >
> > To solve a real issue, I have to fix it based on the old API. Using
> > the existing API in this patch would require refactoring the SMC
> > logic, which needs more time. Our production tree is synced with
> > smc-next, so I chose to fix this issue first and then refactor the
> > logic to fit the existing API once and for all.
>
> While I understand your approach of fixing the issue first, I need to
> say that such interim fixes create a significant amount of review and
> test effort for others, and there is an increased risk of introducing
> new bugs with such an only-for-now fix.

Let's get back to the patch itself. This approach spreads the CQs over
different completion vectors; it tries to solve the issue within the
current design and avoids introducing more changes, so that it stays
easier to review and test. The current limitation severely hurts the
performance of SMC when it replaces TCP, and this patch tries to reduce
the gap between SMC and TCP.

Using the newer API would require a lot of work on the WR processing
logic, for example removing the tasklet handlers and refactoring the
wr_id logic. I am not sure whether we should do this. If it is okay and
has your approval, I will do it in the next patch.

> Given that right now you are the only one affected by this problem, I
> recommend keeping your fix in your own environment for now and coming
> back with the final version. In the meantime I can use the saved time
> to review the bunch of other patches that we have received.

I really appreciate the time you have spent reviewing our patches.
Recently our team has submitted a lot of patches and received your
detailed suggestions, covering panics (link group, CDC), performance and
so on. We are using SMC in our public cloud environment, so we maintain
an internal tree, try to contribute these changes upstream, and will
continue to invest in improving stability, performance and
compatibility, and to focus on SMC for a long time.

We are willing to commit time and resources to help out with reviewing
and testing patches on the mailing list and in -next, as reviewer or
tester.

We have built up CI/CD and nightly tests for SMC, and we intend to send
test reports for each patch on the mailing list, help with review, and
catch panics and performance regressions.

I am not sure whether this proposal would help save your time for
reviewing other patches? Glad to hear your advice.

Thank you,
Tony Lu
On 28/01/2022 07:55, Tony Lu wrote:
> On Thu, Jan 27, 2022 at 03:52:36PM +0100, Karsten Graul wrote:
>>
>> <snip>
>>
>> While I understand your approach of fixing the issue first, I need to
>> say that such interim fixes create a significant amount of review and
>> test effort for others, and there is an increased risk of introducing
>> new bugs with such an only-for-now fix.
>
> Let's get back to the patch itself. This approach spreads the CQs over
> different completion vectors; it tries to solve the issue within the
> current design and avoids introducing more changes, so that it stays
> easier to review and test. The current limitation severely hurts the
> performance of SMC when it replaces TCP, and this patch tries to reduce
> the gap between SMC and TCP.
>
> Using the newer API would require a lot of work on the WR processing
> logic, for example removing the tasklet handlers and refactoring the
> wr_id logic. I am not sure whether we should do this. If it is okay and
> has your approval, I will do it in the next patch.

Hi Tony,

I think there has been quite a discussion about this patch series now,
and the conclusion from the RDMA list and from my side is that if this
code is changed, it should be done using the new API. The current
version re-implements code that is already available there.

I agree that using the new API is the way to go, and I am in for any
early discussions about the changes that are needed.

>> Given that right now you are the only one affected by this problem, I
>> recommend keeping your fix in your own environment for now and coming
>> back with the final version. In the meantime I can use the saved time
>> to review the bunch of other patches that we have received.
>
> I really appreciate the time you have spent reviewing our patches.
> Recently our team has submitted a lot of patches and received your
> detailed suggestions, covering panics (link group, CDC), performance
> and so on. We are using SMC in our public cloud environment, so we
> maintain an internal tree, try to contribute these changes upstream,
> and will continue to invest in improving stability, performance and
> compatibility, and to focus on SMC for a long time.
>
> We are willing to commit time and resources to help out with reviewing
> and testing patches on the mailing list and in -next, as reviewer or
> tester.
>
> We have built up CI/CD and nightly tests for SMC, and we intend to send
> test reports for each patch on the mailing list, help with review, and
> catch panics and performance regressions.
>
> I am not sure whether this proposal would help save your time for
> reviewing other patches? Glad to hear your advice.

Thanks for all the time and work you are spending to bring SMC into a
cloud environment! We really appreciate it a lot.

Karsten
On Tue, Feb 01, 2022 at 05:50:48PM +0100, Karsten Graul wrote:
> On 28/01/2022 07:55, Tony Lu wrote:
> >
> > <snip>
> >
> > Using the newer API would require a lot of work on the WR processing
> > logic, for example removing the tasklet handlers and refactoring the
> > wr_id logic. I am not sure whether we should do this. If it is okay
> > and has your approval, I will do it in the next patch.
>
> Hi Tony,
>
> I think there has been quite a discussion about this patch series now,
> and the conclusion from the RDMA list and from my side is that if this
> code is changed, it should be done using the new API. The current
> version re-implements code that is already available there.
>
> I agree that using the new API is the way to go, and I am in for any
> early discussions about the changes that are needed.

Thank you for pointing me in the right direction. I am working on this
and will send the complete refactored version based on the new API
later.

Best regards,
Tony Lu