mbox series

[MAINLINE,0/2] Enable DIM for legacy ULPs and use it in RDS

Message ID 20240918083552.77531-1-haakon.bugge@oracle.com (mailing list archive)
Headers show
Series Enable DIM for legacy ULPs and use it in RDS | expand

Message

Haakon Bugge Sept. 18, 2024, 8:35 a.m. UTC
The Dynamic Interrupt Moderation mechanism can only be used by ULPs
using ib_alloc_cq() and family. We extend DIM to also cover legacy
ULPs using ib_create_cq(). The last commit takes advantage of this end
uses DIM in RDS.

Håkon Bugge (2):
  RDMA/core: Enable legacy ULPs to use RDMA DIM
  rds: ib: Add Dynamic Interrupt Moderation to CQs

 drivers/infiniband/core/cq.c    |  7 +++++--
 drivers/infiniband/core/cq.h    | 16 ++++++++++++++++
 drivers/infiniband/core/verbs.c |  6 ++++++
 net/rds/ib_cm.c                 | 10 ++++++++++
 4 files changed, 37 insertions(+), 2 deletions(-)
 create mode 100644 drivers/infiniband/core/cq.h

--
2.43.5

Comments

Christoph Hellwig Sept. 19, 2024, 2:17 p.m. UTC | #1
On Wed, Sep 18, 2024 at 10:35:50AM +0200, Håkon Bugge wrote:
> The Dynamic Interrupt Moderation mechanism can only be used by ULPs
> using ib_alloc_cq() and family. We extend DIM to also cover legacy
> ULPs using ib_create_cq(). The last commit takes advantage of this end
> uses DIM in RDS.

I would much prefer if you could move RDS off that horrible API finally
instead of investing more effort into it and making it more complicated.
Haakon Bugge Sept. 20, 2024, 9:46 a.m. UTC | #2
Hi Christoph,

> On 19 Sep 2024, at 16:17, Christoph Hellwig <hch@infradead.org> wrote:
> 
> On Wed, Sep 18, 2024 at 10:35:50AM +0200, Håkon Bugge wrote:
>> The Dynamic Interrupt Moderation mechanism can only be used by ULPs
>> using ib_alloc_cq() and family. We extend DIM to also cover legacy
>> ULPs using ib_create_cq(). The last commit takes advantage of this end
>> uses DIM in RDS.
> 
> I would much prefer if you could move RDS off that horrible API finally
> instead of investing more effort into it and making it more complicated.

ib_alloc_cq() and family does not support arming the CQ with the IB_CQ_SOLICITED flag, which RDS uses.


Thxs, Håkon
Christoph Hellwig Sept. 20, 2024, 1:51 p.m. UTC | #3
On Fri, Sep 20, 2024 at 09:46:06AM +0000, Haakon Bugge wrote:
> > I would much prefer if you could move RDS off that horrible API finally
> > instead of investing more effort into it and making it more complicated.
> 
> ib_alloc_cq() and family does not support arming the CQ with the IB_CQ_SOLICITED flag, which RDS uses.

Then work on supporting it.  RDS and SMC are the only users, so one
of the maintainers needs to drive it.
Zhu Yanjun Sept. 21, 2024, 2:28 a.m. UTC | #4
在 2024/9/20 21:51, Christoph Hellwig 写道:
> On Fri, Sep 20, 2024 at 09:46:06AM +0000, Haakon Bugge wrote:
>>> I would much prefer if you could move RDS off that horrible API finally
>>> instead of investing more effort into it and making it more complicated.
>>
>> ib_alloc_cq() and family does not support arming the CQ with the IB_CQ_SOLICITED flag, which RDS uses.
> 
> Then work on supporting it.  RDS and SMC are the only users, so one

Some other open source projects are also the users.

Zhu Yanjun

> of the maintainers needs to drive it.
>
Leon Romanovsky Sept. 22, 2024, 2:11 p.m. UTC | #5
On Sat, Sep 21, 2024 at 10:28:48AM +0800, Zhu Yanjun wrote:
> 在 2024/9/20 21:51, Christoph Hellwig 写道:
> > On Fri, Sep 20, 2024 at 09:46:06AM +0000, Haakon Bugge wrote:
> > > > I would much prefer if you could move RDS off that horrible API finally
> > > > instead of investing more effort into it and making it more complicated.
> > > 
> > > ib_alloc_cq() and family does not support arming the CQ with the IB_CQ_SOLICITED flag, which RDS uses.
> > 
> > Then work on supporting it.  RDS and SMC are the only users, so one
> 
> Some other open source projects are also the users.

What are these projects? I see only RDS and SMC.

Thanks

> 
> Zhu Yanjun
> 
> > of the maintainers needs to drive it.
> > 
> 
>
Christoph Hellwig Sept. 23, 2024, 12:03 p.m. UTC | #6
On Sat, Sep 21, 2024 at 10:28:48AM +0800, Zhu Yanjun wrote:
> 在 2024/9/20 21:51, Christoph Hellwig 写道:
> > On Fri, Sep 20, 2024 at 09:46:06AM +0000, Haakon Bugge wrote:
> > > > I would much prefer if you could move RDS off that horrible API finally
> > > > instead of investing more effort into it and making it more complicated.
> > > 
> > > ib_alloc_cq() and family does not support arming the CQ with the IB_CQ_SOLICITED flag, which RDS uses.
> > 
> > Then work on supporting it.  RDS and SMC are the only users, so one
> 
> Some other open source projects are also the users.

I'm not sure what you mean with "open source projects", but the only
thing that matters is users in the kernel tree.
Zhu Yanjun Sept. 24, 2024, 1:58 a.m. UTC | #7
在 2024/9/23 20:03, Christoph Hellwig 写道:
> On Sat, Sep 21, 2024 at 10:28:48AM +0800, Zhu Yanjun wrote:
>> 在 2024/9/20 21:51, Christoph Hellwig 写道:
>>> On Fri, Sep 20, 2024 at 09:46:06AM +0000, Haakon Bugge wrote:
>>>>> I would much prefer if you could move RDS off that horrible API finally
>>>>> instead of investing more effort into it and making it more complicated.
>>>>
>>>> ib_alloc_cq() and family does not support arming the CQ with the IB_CQ_SOLICITED flag, which RDS uses.
>>>
>>> Then work on supporting it.  RDS and SMC are the only users, so one
>>
>> Some other open source projects are also the users.
> 
> I'm not sure what you mean with "open source projects", but the only
> thing that matters is users in the kernel tree.

The users that I mentioned is not in the kernel tree.

Best Regards,
Zhu Yanjun
>
Christoph Hellwig Sept. 24, 2024, 6:54 a.m. UTC | #8
On Tue, Sep 24, 2024 at 09:58:24AM +0800, Zhu Yanjun wrote:
> The users that I mentioned is not in the kernel tree.

And why do you think that would matter the slightest?
Christoph Hellwig Sept. 24, 2024, 7:01 a.m. UTC | #9
On Fri, Sep 20, 2024 at 06:51:18AM -0700, Christoph Hellwig wrote:
> On Fri, Sep 20, 2024 at 09:46:06AM +0000, Haakon Bugge wrote:
> > > I would much prefer if you could move RDS off that horrible API finally
> > > instead of investing more effort into it and making it more complicated.
> > 
> > ib_alloc_cq() and family does not support arming the CQ with the IB_CQ_SOLICITED flag, which RDS uses.
> 
> Then work on supporting it.  RDS and SMC are the only users, so one
> of the maintainers needs to drive it.

I took a quick look at what it would take, and adding IB_CQ_SOLICITED
support to the cq API looks pretty trivial, you'll just need to pass
it to ib_cq_pool_get by adding a new argument and the pass it
down to __ib_alloc_cq.  So yes, please get started at moving RDS out
of the stone age.
Haakon Bugge Sept. 24, 2024, 8:59 a.m. UTC | #10
> On 24 Sep 2024, at 09:01, Christoph Hellwig <hch@infradead.org> wrote:
> 
> On Fri, Sep 20, 2024 at 06:51:18AM -0700, Christoph Hellwig wrote:
>> On Fri, Sep 20, 2024 at 09:46:06AM +0000, Haakon Bugge wrote:
>>>> I would much prefer if you could move RDS off that horrible API finally
>>>> instead of investing more effort into it and making it more complicated.
>>> 
>>> ib_alloc_cq() and family does not support arming the CQ with the IB_CQ_SOLICITED flag, which RDS uses.
>> 
>> Then work on supporting it.  RDS and SMC are the only users, so one
>> of the maintainers needs to drive it.
> 
> I took a quick look at what it would take, and adding IB_CQ_SOLICITED
> support to the cq API looks pretty trivial, you'll just need to pass
> it to ib_cq_pool_get by adding a new argument and the pass it
> down to __ib_alloc_cq.  So yes, please get started at moving RDS out
> of the stone age.

Agree. I'll work on that. I am also inclined to improve it by having designated CPUs where the worker threads polling the CQs execute, as that often improves cache hit rates. But that will come as another commit.


Thxs, Håkon
Zhu Yanjun Sept. 24, 2024, 1:59 p.m. UTC | #11
在 2024/9/24 14:54, Christoph Hellwig 写道:
> On Tue, Sep 24, 2024 at 09:58:24AM +0800, Zhu Yanjun wrote:
>> The users that I mentioned is not in the kernel tree.
> 
> And why do you think that would matter the slightest?

I noticed that the same cq functions are used. And I also made tests 
with this patch series. Without this patch series, dim mechanism will 
not be invoked.

Zhu Yanjun

>
Haakon Bugge Sept. 24, 2024, 3:16 p.m. UTC | #12
> On 24 Sep 2024, at 15:59, Zhu Yanjun <yanjun.zhu@linux.dev> wrote:
> 
> 在 2024/9/24 14:54, Christoph Hellwig 写道:
>> On Tue, Sep 24, 2024 at 09:58:24AM +0800, Zhu Yanjun wrote:
>>> The users that I mentioned is not in the kernel tree.
>> And why do you think that would matter the slightest?
> 
> I noticed that the same cq functions are used. And I also made tests with this patch series. Without this patch series, dim mechanism will not be invoked.

Christoph alluded to say: Do not modify the old cq_create_cq() code in order to support DIM, it is better to change the ULP to use ib_alloc_cq(), and get DIM enabled that way.


Thxs, Håkon
Allison Henderson Sept. 24, 2024, 3:20 p.m. UTC | #13
On Tue, 2024-09-24 at 08:59 +0000, Haakon Bugge wrote:
> 
> 
> > On 24 Sep 2024, at 09:01, Christoph Hellwig <hch@infradead.org>
> > wrote:
> > 
> > On Fri, Sep 20, 2024 at 06:51:18AM -0700, Christoph Hellwig wrote:
> > > On Fri, Sep 20, 2024 at 09:46:06AM +0000, Haakon Bugge wrote:
> > > > > I would much prefer if you could move RDS off that horrible
> > > > > API finally
> > > > > instead of investing more effort into it and making it more
> > > > > complicated.
> > > > 
> > > > ib_alloc_cq() and family does not support arming the CQ with
> > > > the IB_CQ_SOLICITED flag, which RDS uses.
> > > 
> > > Then work on supporting it.  RDS and SMC are the only users, so
> > > one
> > > of the maintainers needs to drive it.
> > 
> > I took a quick look at what it would take, and adding
> > IB_CQ_SOLICITED
> > support to the cq API looks pretty trivial, you'll just need to
> > pass
> > it to ib_cq_pool_get by adding a new argument and the pass it
> > down to __ib_alloc_cq.  So yes, please get started at moving RDS
> > out
> > of the stone age.
> 
> Agree. I'll work on that. I am also inclined to improve it by having
> designated CPUs where the worker threads polling the CQs execute, as
> that often improves cache hit rates. But that will come as another
> commit.
> 
> 
> Thxs, Håkon
> 

Ok, sounds like a good plan then.  Thanks for working on this Haakon!

Allison
Zhu Yanjun Sept. 25, 2024, 2:04 a.m. UTC | #14
在 2024/9/24 23:16, Haakon Bugge 写道:
> 
> 
>> On 24 Sep 2024, at 15:59, Zhu Yanjun <yanjun.zhu@linux.dev> wrote:
>>
>> 在 2024/9/24 14:54, Christoph Hellwig 写道:
>>> On Tue, Sep 24, 2024 at 09:58:24AM +0800, Zhu Yanjun wrote:
>>>> The users that I mentioned is not in the kernel tree.
>>> And why do you think that would matter the slightest?
>>
>> I noticed that the same cq functions are used. And I also made tests with this patch series. Without this patch series, dim mechanism will not be invoked.
> 
> Christoph alluded to say: Do not modify the old cq_create_cq() code in order to support DIM, it is better to change the ULP to use ib_alloc_cq(), and get DIM enabled that way.

Hi, Haakon

To be honest, I like your original commit that enable DIM for legacy 
ULPs because this can fix this problem once for all and improve the old 
ib_create_cq function.

The idea from Christoph will cause a lot of changes in ULPs. I am not 
very sure if these changes cause risks or not.

Thus, I prefer to your original commit. But I will follow the advice 
from Leon and Jason.

Zhu Yanjun

> 
> 
> Thxs, Håkon
>
Leon Romanovsky Sept. 25, 2024, 9:26 a.m. UTC | #15
On Wed, Sep 25, 2024 at 10:04:21AM +0800, Zhu Yanjun wrote:
> 在 2024/9/24 23:16, Haakon Bugge 写道:
> > 
> > 
> > > On 24 Sep 2024, at 15:59, Zhu Yanjun <yanjun.zhu@linux.dev> wrote:
> > > 
> > > 在 2024/9/24 14:54, Christoph Hellwig 写道:
> > > > On Tue, Sep 24, 2024 at 09:58:24AM +0800, Zhu Yanjun wrote:
> > > > > The users that I mentioned is not in the kernel tree.
> > > > And why do you think that would matter the slightest?
> > > 
> > > I noticed that the same cq functions are used. And I also made tests with this patch series. Without this patch series, dim mechanism will not be invoked.
> > 
> > Christoph alluded to say: Do not modify the old cq_create_cq() code in order to support DIM, it is better to change the ULP to use ib_alloc_cq(), and get DIM enabled that way.
> 
> Hi, Haakon
> 
> To be honest, I like your original commit that enable DIM for legacy ULPs
> because this can fix this problem once for all and improve the old
> ib_create_cq function.
> 
> The idea from Christoph will cause a lot of changes in ULPs. I am not very
> sure if these changes cause risks or not.
> 
> Thus, I prefer to your original commit. But I will follow the advice from
> Leon and Jason.

Christoph was very clear and he summarized our position very well. We
said similar thing to SMC folks in 2022 [1] and RDS is no different here.

So no, "old ib_create_cq" shouldn't be used by ULPs.

Thanks

[1] https://lore.kernel.org/netdev/YePesYRnrKCh1vFy@unreal/

> 
> Zhu Yanjun
> 
> > 
> > 
> > Thxs, Håkon
> > 
> 
>
Zhu Yanjun Sept. 26, 2024, 3:25 p.m. UTC | #16
在 2024/9/25 17:26, Leon Romanovsky 写道:
> On Wed, Sep 25, 2024 at 10:04:21AM +0800, Zhu Yanjun wrote:
>> 在 2024/9/24 23:16, Haakon Bugge 写道:
>>>
>>>
>>>> On 24 Sep 2024, at 15:59, Zhu Yanjun <yanjun.zhu@linux.dev> wrote:
>>>>
>>>> 在 2024/9/24 14:54, Christoph Hellwig 写道:
>>>>> On Tue, Sep 24, 2024 at 09:58:24AM +0800, Zhu Yanjun wrote:
>>>>>> The users that I mentioned is not in the kernel tree.
>>>>> And why do you think that would matter the slightest?
>>>>
>>>> I noticed that the same cq functions are used. And I also made tests with this patch series. Without this patch series, dim mechanism will not be invoked.
>>>
>>> Christoph alluded to say: Do not modify the old cq_create_cq() code in order to support DIM, it is better to change the ULP to use ib_alloc_cq(), and get DIM enabled that way.
>>
>> Hi, Haakon
>>
>> To be honest, I like your original commit that enable DIM for legacy ULPs
>> because this can fix this problem once for all and improve the old
>> ib_create_cq function.
>>
>> The idea from Christoph will cause a lot of changes in ULPs. I am not very
>> sure if these changes cause risks or not.
>>
>> Thus, I prefer to your original commit. But I will follow the advice from
>> Leon and Jason.
> 
> Christoph was very clear and he summarized our position very well. We
> said similar thing to SMC folks in 2022 [1] and RDS is no different here.

Thanks, Leon. I will read this link carefully.

> 
> So no, "old ib_create_cq" shouldn't be used by ULPs.

Got it. I have replaced ib_create_cq with ib cq pool APIs. Perhaps 
drivers/infiniband/ulp/srpt/ib_srpt.c is a good example to use ib cq 
pool APIs.

Best Regards,
Zhu Yanjun

> 
> Thanks
> 
> [1] https://lore.kernel.org/netdev/YePesYRnrKCh1vFy@unreal/
> 
>>
>> Zhu Yanjun
>>
>>>
>>>
>>> Thxs, Håkon
>>>
>>
>>