mbox series

[for-next,0/2] Host information userspace version

Message ID 20210105104326.67895-1-galpress@amazon.com (mailing list archive)
Headers show
Series Host information userspace version | expand

Message

Gal Pressman Jan. 5, 2021, 10:43 a.m. UTC
The following two patches add the userspace version to the host
information struct reported to the device, used for debugging and
troubleshooting purposes.

PR was sent:
https://github.com/linux-rdma/rdma-core/pull/918

Thanks,
Gal

Gal Pressman (2):
  RDMA/efa: Move host info set to first ucontext allocation
  RDMA/efa: Report userspace version in host info

 drivers/infiniband/hw/efa/efa.h                 | 7 +++++++
 drivers/infiniband/hw/efa/efa_admin_cmds_defs.h | 3 +++
 drivers/infiniband/hw/efa/efa_main.c            | 6 +++---
 drivers/infiniband/hw/efa/efa_verbs.c           | 3 +++
 include/uapi/rdma/efa-abi.h                     | 3 ++-
 5 files changed, 18 insertions(+), 4 deletions(-)


base-commit: e246b7c035d74abfb3507fa10082d0c42cc016c3

Comments

Gal Pressman Jan. 19, 2021, 7:17 a.m. UTC | #1
On 05/01/2021 12:43, Gal Pressman wrote:
> The following two patches add the userspace version to the host
> information struct reported to the device, used for debugging and
> troubleshooting purposes.
> 
> PR was sent:
> https://github.com/linux-rdma/rdma-core/pull/918
> 
> Thanks,
> Gal

Anything stopping this series from being merged?
Leon Romanovsky Jan. 19, 2021, 8:46 a.m. UTC | #2
On Tue, Jan 19, 2021 at 09:17:14AM +0200, Gal Pressman wrote:
> On 05/01/2021 12:43, Gal Pressman wrote:
> > The following two patches add the userspace version to the host
> > information struct reported to the device, used for debugging and
> > troubleshooting purposes.
> >
> > PR was sent:
> > https://github.com/linux-rdma/rdma-core/pull/918
> >
> > Thanks,
> > Gal
>
> Anything stopping this series from being merged?

It is unclear when this forwarding of non-verbs data to the FW will stop.

Thanks
Gal Pressman Jan. 19, 2021, 9:10 a.m. UTC | #3
On 19/01/2021 10:46, Leon Romanovsky wrote:
> On Tue, Jan 19, 2021 at 09:17:14AM +0200, Gal Pressman wrote:
>> On 05/01/2021 12:43, Gal Pressman wrote:
>>> The following two patches add the userspace version to the host
>>> information struct reported to the device, used for debugging and
>>> troubleshooting purposes.
>>>
>>> PR was sent:
>>> https://github.com/linux-rdma/rdma-core/pull/918
>>>
>>> Thanks,
>>> Gal
>>
>> Anything stopping this series from being merged?
> 
> It is unclear when this forwarding of non-verbs data to the FW will stop.

This was already discussed in the PR. Not everything should be passed through
this interface, there should be a limit and it should be examined per case.
rdma-core version is clearly related to an RDMA device.

BTW, if you have any concerns about a patch you can state them, you don't have
to ignore it and wait for the submitter to ask what's wrong..
Leon Romanovsky Jan. 19, 2021, 11:58 a.m. UTC | #4
On Tue, Jan 19, 2021 at 11:10:59AM +0200, Gal Pressman wrote:
> On 19/01/2021 10:46, Leon Romanovsky wrote:
> > On Tue, Jan 19, 2021 at 09:17:14AM +0200, Gal Pressman wrote:
> >> On 05/01/2021 12:43, Gal Pressman wrote:
> >>> The following two patches add the userspace version to the host
> >>> information struct reported to the device, used for debugging and
> >>> troubleshooting purposes.
> >>>
> >>> PR was sent:
> >>> https://github.com/linux-rdma/rdma-core/pull/918
> >>>
> >>> Thanks,
> >>> Gal
> >>
> >> Anything stopping this series from being merged?
> >
> > It is unclear when this forwarding of non-verbs data to the FW will stop.
>
> This was already discussed in the PR. Not everything should be passed through
> this interface, there should be a limit and it should be examined per case.
> rdma-core version is clearly related to an RDMA device.

"Clearly or not" - it depends on the observer.

>
> BTW, if you have any concerns about a patch you can state them, you don't have
> to ignore it and wait for the submitter to ask what's wrong..

Didn't you mistake me with anyone else?

I'm reviewer in the kernel exactly like you and it gives me nice thing - ignore patches.

Thanks
Gal Pressman Jan. 19, 2021, 1:19 p.m. UTC | #5
On 19/01/2021 13:58, Leon Romanovsky wrote:
> On Tue, Jan 19, 2021 at 11:10:59AM +0200, Gal Pressman wrote:
>> On 19/01/2021 10:46, Leon Romanovsky wrote:
>>> On Tue, Jan 19, 2021 at 09:17:14AM +0200, Gal Pressman wrote:
>>>> On 05/01/2021 12:43, Gal Pressman wrote:
>>>>> The following two patches add the userspace version to the host
>>>>> information struct reported to the device, used for debugging and
>>>>> troubleshooting purposes.
>>>>>
>>>>> PR was sent:
>>>>> https://github.com/linux-rdma/rdma-core/pull/918
>>>>>
>>>>> Thanks,
>>>>> Gal
>>>>
>>>> Anything stopping this series from being merged?
>>>
>>> It is unclear when this forwarding of non-verbs data to the FW will stop.
>>
>> This was already discussed in the PR. Not everything should be passed through
>> this interface, there should be a limit and it should be examined per case.
>> rdma-core version is clearly related to an RDMA device.
> 
> "Clearly or not" - it depends on the observer.
> 
>>
>> BTW, if you have any concerns about a patch you can state them, you don't have
>> to ignore it and wait for the submitter to ask what's wrong..
> 
> Didn't you mistake me with anyone else?

No, you decided to answer my original question :).

> I'm reviewer in the kernel exactly like you and it gives me nice thing - ignore patches.

Don't get me wrong, your review is very appreciated, but this series is 20 LOC
which you already reviewed two weeks ago, and I replied to all comments.
Ignoring patches is fine, but please don't review, ignore and wait for the last
minute to say they shouldn't be merged.
Leon Romanovsky Jan. 19, 2021, 1:34 p.m. UTC | #6
On Tue, Jan 19, 2021 at 03:19:15PM +0200, Gal Pressman wrote:
> On 19/01/2021 13:58, Leon Romanovsky wrote:
> > On Tue, Jan 19, 2021 at 11:10:59AM +0200, Gal Pressman wrote:
> >> On 19/01/2021 10:46, Leon Romanovsky wrote:
> >>> On Tue, Jan 19, 2021 at 09:17:14AM +0200, Gal Pressman wrote:
> >>>> On 05/01/2021 12:43, Gal Pressman wrote:
> >>>>> The following two patches add the userspace version to the host
> >>>>> information struct reported to the device, used for debugging and
> >>>>> troubleshooting purposes.
> >>>>>
> >>>>> PR was sent:
> >>>>> https://github.com/linux-rdma/rdma-core/pull/918
> >>>>>
> >>>>> Thanks,
> >>>>> Gal
> >>>>
> >>>> Anything stopping this series from being merged?
> >>>
> >>> It is unclear when this forwarding of non-verbs data to the FW will stop.
> >>
> >> This was already discussed in the PR. Not everything should be passed through
> >> this interface, there should be a limit and it should be examined per case.
> >> rdma-core version is clearly related to an RDMA device.
> >
> > "Clearly or not" - it depends on the observer.
> >
> >>
> >> BTW, if you have any concerns about a patch you can state them, you don't have
> >> to ignore it and wait for the submitter to ask what's wrong..
> >
> > Didn't you mistake me with anyone else?
>
> No, you decided to answer my original question :).
>
> > I'm reviewer in the kernel exactly like you and it gives me nice thing - ignore patches.
>
> Don't get me wrong, your review is very appreciated, but this series is 20 LOC
> which you already reviewed two weeks ago, and I replied to all comments.
> Ignoring patches is fine, but please don't review, ignore and wait for the last
> minute to say they shouldn't be merged.

Sorry, but it is impossible to get it right.

I gave you hint WHY it takes so long, it is not review/ack/nack or
anything like this.

Thanks
Jason Gunthorpe Jan. 21, 2021, 6:35 p.m. UTC | #7
On Tue, Jan 19, 2021 at 09:17:14AM +0200, Gal Pressman wrote:
> On 05/01/2021 12:43, Gal Pressman wrote:
> > The following two patches add the userspace version to the host
> > information struct reported to the device, used for debugging and
> > troubleshooting purposes.
> > 
> > PR was sent:
> > https://github.com/linux-rdma/rdma-core/pull/918
> > 
> > Thanks,
> > Gal
> 
> Anything stopping this series from being merged?

Honestly, I'm not very keen on this

Why does this have to go through a kernel driver, can't you collect
OS telemetry some other way?

Jason
Gal Pressman Jan. 21, 2021, 7:40 p.m. UTC | #8
On 21/01/2021 20:35, Jason Gunthorpe wrote:
> On Tue, Jan 19, 2021 at 09:17:14AM +0200, Gal Pressman wrote:
>> On 05/01/2021 12:43, Gal Pressman wrote:
>>> The following two patches add the userspace version to the host
>>> information struct reported to the device, used for debugging and
>>> troubleshooting purposes.
>>>
>>> PR was sent:
>>> https://github.com/linux-rdma/rdma-core/pull/918
>>>
>>> Thanks,
>>> Gal
>>
>> Anything stopping this series from being merged?
> 
> Honestly, I'm not very keen on this
> 
> Why does this have to go through a kernel driver, can't you collect
> OS telemetry some other way?

Hmm, it has to go through rdma-core somehow, what sort of component can
rdma-core interact with to pass such data? The only one I could think of is the
RDMA driver :).

As I said, I get your concern, I was going on and off about this as well, but
the userspace version is a very useful piece of information in the context of a
kernel bypass device. It's just as important as the kernel version.
I agree that this is not the place to pass things like gcc version, but I don't
think that's the case here :).

Do you absolutely hate the idea of passing the userspace version, or are you
worried about what's next to come?
If it's the latter, we don't really have plans to push anything similar anytime
soon, and even if we did, I don't think it should block this series.
Jason Gunthorpe Jan. 27, 2021, 4:57 p.m. UTC | #9
On Thu, Jan 21, 2021 at 09:40:49PM +0200, Gal Pressman wrote:
> On 21/01/2021 20:35, Jason Gunthorpe wrote:
> > On Tue, Jan 19, 2021 at 09:17:14AM +0200, Gal Pressman wrote:
> >> On 05/01/2021 12:43, Gal Pressman wrote:
> >>> The following two patches add the userspace version to the host
> >>> information struct reported to the device, used for debugging and
> >>> troubleshooting purposes.
> >>>
> >>> PR was sent:
> >>> https://github.com/linux-rdma/rdma-core/pull/918
> >>>
> >>> Thanks,
> >>> Gal
> >>
> >> Anything stopping this series from being merged?
> > 
> > Honestly, I'm not very keen on this
> > 
> > Why does this have to go through a kernel driver, can't you collect
> > OS telemetry some other way?
> 
> Hmm, it has to go through rdma-core somehow, what sort of component can
> rdma-core interact with to pass such data? The only one I could think of is the
> RDMA driver :).
> 
> As I said, I get your concern, I was going on and off about this as well, but
> the userspace version is a very useful piece of information in the context of a
> kernel bypass device. It's just as important as the kernel version.
> I agree that this is not the place to pass things like gcc version, but I don't
> think that's the case here :).

Well, if we were to do this for mlx5 we'd want to pass UCX and maybe
other stuff, it seems like it gets quickly out of hand.

I think telemetry is better done as some telemetry subsystem, not
integrated all over the place

Jason
Gal Pressman Jan. 27, 2021, 5:53 p.m. UTC | #10
On 27/01/2021 18:57, Jason Gunthorpe wrote:
> On Thu, Jan 21, 2021 at 09:40:49PM +0200, Gal Pressman wrote:
>> On 21/01/2021 20:35, Jason Gunthorpe wrote:
>>> On Tue, Jan 19, 2021 at 09:17:14AM +0200, Gal Pressman wrote:
>>>> On 05/01/2021 12:43, Gal Pressman wrote:
>>>>> The following two patches add the userspace version to the host
>>>>> information struct reported to the device, used for debugging and
>>>>> troubleshooting purposes.
>>>>>
>>>>> PR was sent:
>>>>> https://github.com/linux-rdma/rdma-core/pull/918
>>>>>
>>>>> Thanks,
>>>>> Gal
>>>>
>>>> Anything stopping this series from being merged?
>>>
>>> Honestly, I'm not very keen on this
>>>
>>> Why does this have to go through a kernel driver, can't you collect
>>> OS telemetry some other way?
>>
>> Hmm, it has to go through rdma-core somehow, what sort of component can
>> rdma-core interact with to pass such data? The only one I could think of is the
>> RDMA driver :).
>>
>> As I said, I get your concern, I was going on and off about this as well, but
>> the userspace version is a very useful piece of information in the context of a
>> kernel bypass device. It's just as important as the kernel version.
>> I agree that this is not the place to pass things like gcc version, but I don't
>> think that's the case here :).
> 
> Well, if we were to do this for mlx5 we'd want to pass UCX and maybe
> other stuff, it seems like it gets quickly out of hand.

Agree, that's why I think this should be limited to things in rdma-core's reach,
sounds like a reasonable limit to me.

> I think telemetry is better done as some telemetry subsystem, not
> integrated all over the place

Interesting, but that would still be all over the place as each package would
have to report its version to that telemetry driver.

And since this currently doesn't exist, should we stay without a solution?
Specifically talking about rdma-core version, do you think it could be merged?
Gal Pressman Feb. 28, 2021, 9:56 a.m. UTC | #11
On 27/01/2021 19:53, Gal Pressman wrote:
> On 27/01/2021 18:57, Jason Gunthorpe wrote:
>> On Thu, Jan 21, 2021 at 09:40:49PM +0200, Gal Pressman wrote:
>>> On 21/01/2021 20:35, Jason Gunthorpe wrote:
>>>> On Tue, Jan 19, 2021 at 09:17:14AM +0200, Gal Pressman wrote:
>>>>> On 05/01/2021 12:43, Gal Pressman wrote:
>>>>>> The following two patches add the userspace version to the host
>>>>>> information struct reported to the device, used for debugging and
>>>>>> troubleshooting purposes.
>>>>>>
>>>>>> PR was sent:
>>>>>> https://github.com/linux-rdma/rdma-core/pull/918
>>>>>>
>>>>>> Thanks,
>>>>>> Gal
>>>>>
>>>>> Anything stopping this series from being merged?
>>>>
>>>> Honestly, I'm not very keen on this
>>>>
>>>> Why does this have to go through a kernel driver, can't you collect
>>>> OS telemetry some other way?
>>>
>>> Hmm, it has to go through rdma-core somehow, what sort of component can
>>> rdma-core interact with to pass such data? The only one I could think of is the
>>> RDMA driver :).
>>>
>>> As I said, I get your concern, I was going on and off about this as well, but
>>> the userspace version is a very useful piece of information in the context of a
>>> kernel bypass device. It's just as important as the kernel version.
>>> I agree that this is not the place to pass things like gcc version, but I don't
>>> think that's the case here :).
>>
>> Well, if we were to do this for mlx5 we'd want to pass UCX and maybe
>> other stuff, it seems like it gets quickly out of hand.
> 
> Agree, that's why I think this should be limited to things in rdma-core's reach,
> sounds like a reasonable limit to me.
> 
>> I think telemetry is better done as some telemetry subsystem, not
>> integrated all over the place
> 
> Interesting, but that would still be all over the place as each package would
> have to report its version to that telemetry driver.
> 
> And since this currently doesn't exist, should we stay without a solution?
> Specifically talking about rdma-core version, do you think it could be merged?
> 

Jason?
Jason Gunthorpe March 2, 2021, 12:04 a.m. UTC | #12
On Sun, Feb 28, 2021 at 11:56:01AM +0200, Gal Pressman wrote:
> On 27/01/2021 19:53, Gal Pressman wrote:
> > On 27/01/2021 18:57, Jason Gunthorpe wrote:
> >> On Thu, Jan 21, 2021 at 09:40:49PM +0200, Gal Pressman wrote:
> >>> On 21/01/2021 20:35, Jason Gunthorpe wrote:
> >>>> On Tue, Jan 19, 2021 at 09:17:14AM +0200, Gal Pressman wrote:
> >>>>> On 05/01/2021 12:43, Gal Pressman wrote:
> >>>>>> The following two patches add the userspace version to the host
> >>>>>> information struct reported to the device, used for debugging and
> >>>>>> troubleshooting purposes.
> >>>>>>
> >>>>>> PR was sent:
> >>>>>> https://github.com/linux-rdma/rdma-core/pull/918
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Gal
> >>>>>
> >>>>> Anything stopping this series from being merged?
> >>>>
> >>>> Honestly, I'm not very keen on this
> >>>>
> >>>> Why does this have to go through a kernel driver, can't you collect
> >>>> OS telemetry some other way?
> >>>
> >>> Hmm, it has to go through rdma-core somehow, what sort of component can
> >>> rdma-core interact with to pass such data? The only one I could think of is the
> >>> RDMA driver :).
> >>>
> >>> As I said, I get your concern, I was going on and off about this as well, but
> >>> the userspace version is a very useful piece of information in the context of a
> >>> kernel bypass device. It's just as important as the kernel version.
> >>> I agree that this is not the place to pass things like gcc version, but I don't
> >>> think that's the case here :).
> >>
> >> Well, if we were to do this for mlx5 we'd want to pass UCX and maybe
> >> other stuff, it seems like it gets quickly out of hand.
> > 
> > Agree, that's why I think this should be limited to things in rdma-core's reach,
> > sounds like a reasonable limit to me.
> > 
> >> I think telemetry is better done as some telemetry subsystem, not
> >> integrated all over the place
> > 
> > Interesting, but that would still be all over the place as each package would
> > have to report its version to that telemetry driver.
> > 
> > And since this currently doesn't exist, should we stay without a solution?
> > Specifically talking about rdma-core version, do you think it could be merged?
> > 
> 
> Jason?

I'm not keen on it, it doesn't work well for other use-cases, it seems
too hacky.

Jason