mbox series

[ibacm,v2,0/2] ibacm: acme supports only one port

Message ID cover.1546850468.git.haakon.bugge@oracle.com (mailing list archive)
Headers show
Series ibacm: acme supports only one port | expand

Message

Haakon Bugge Jan. 7, 2019, 8:45 a.m. UTC
Some end-of-line spaces forced this into a commit series.


Håkon Bugge (2):
  ibacm: Remove trailing blanks in acm.h
  ibacm: acme supports only one port

 ibacm/include/infiniband/acm.h |  8 +++--
 ibacm/src/acm.c                | 11 +++++--
 ibacm/src/acme.c               | 36 ++++++++++++++---------
 ibacm/src/libacm.c             | 53 +++++++++++++++++++++++-----------
 ibacm/src/libacm.h             |  2 +-
 5 files changed, 72 insertions(+), 38 deletions(-)

--
2.19.2

Comments

Haakon Bugge Jan. 16, 2019, 8:17 p.m. UTC | #1
> On 7 Jan 2019, at 09:45, Håkon Bugge <haakon.bugge@oracle.com> wrote:
> 
> Some end-of-line spaces forced this into a commit series.
> 
> 
> Håkon Bugge (2):
>  ibacm: Remove trailing blanks in acm.h
>  ibacm: acme supports only one port
> 
> ibacm/include/infiniband/acm.h |  8 +++--
> ibacm/src/acm.c                | 11 +++++--
> ibacm/src/acme.c               | 36 ++++++++++++++---------
> ibacm/src/libacm.c             | 53 +++++++++++++++++++++++-----------
> ibacm/src/libacm.h             |  2 +-
> 5 files changed, 72 insertions(+), 38 deletions(-)

Ping?


> 
> --
> 2.19.2
>
Jason Gunthorpe Jan. 16, 2019, 8:34 p.m. UTC | #2
On Wed, Jan 16, 2019 at 09:17:24PM +0100, Håkon Bugge wrote:
> 
> 
> > On 7 Jan 2019, at 09:45, Håkon Bugge <haakon.bugge@oracle.com> wrote:
> > 
> > Some end-of-line spaces forced this into a commit series.
> > 
> > 
> > Håkon Bugge (2):
> >  ibacm: Remove trailing blanks in acm.h
> >  ibacm: acme supports only one port
> > 
> > ibacm/include/infiniband/acm.h |  8 +++--
> > ibacm/src/acm.c                | 11 +++++--
> > ibacm/src/acme.c               | 36 ++++++++++++++---------
> > ibacm/src/libacm.c             | 53 +++++++++++++++++++++++-----------
> > ibacm/src/libacm.h             |  2 +-
> > 5 files changed, 72 insertions(+), 38 deletions(-)
> 
> Ping?

Does anyone want to review ACM code?

I'm inclined to just apply Hakon's patches otherwise.

Sean/Hal? I have you listed as maintaining ACM, do you want to stop?

Jason
Hal Rosenstock Jan. 23, 2019, 5:17 p.m. UTC | #3
On 1/16/2019 3:34 PM, Jason Gunthorpe wrote:
> Hal? I have you listed as maintaining ACM, do you want to stop?

Yes; I don't really have the time for this anymore.

-- Hal

> Jason
Haakon Bugge Jan. 29, 2019, 8:46 a.m. UTC | #4
> On 23 Jan 2019, at 18:17, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
> 
> On 1/16/2019 3:34 PM, Jason Gunthorpe wrote:
>> Hal? I have you listed as maintaining ACM, do you want to stop?
> 
> Yes; I don't really have the time for this anymore.

Anyone else?


Thanks, Håkon
Jason Gunthorpe Jan. 29, 2019, 11:43 p.m. UTC | #5
On Tue, Jan 29, 2019 at 09:46:58AM +0100, Håkon Bugge wrote:
> 
> 
> > On 23 Jan 2019, at 18:17, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
> > 
> > On 1/16/2019 3:34 PM, Jason Gunthorpe wrote:
> >> Hal? I have you listed as maintaining ACM, do you want to stop?
> > 
> > Yes; I don't really have the time for this anymore.
> 
> Anyone else?

Hey Hakon, I think you just got nominated as the new ACM maintainer :) :)

Jason
Mark Bloch Jan. 29, 2019, 11:52 p.m. UTC | #6
On 1/29/19 3:43 PM, Jason Gunthorpe wrote:
> On Tue, Jan 29, 2019 at 09:46:58AM +0100, Håkon Bugge wrote:
>>
>>
>>> On 23 Jan 2019, at 18:17, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
>>>
>>> On 1/16/2019 3:34 PM, Jason Gunthorpe wrote:
>>>> Hal? I have you listed as maintaining ACM, do you want to stop?
>>>
>>> Yes; I don't really have the time for this anymore.
>>
>> Anyone else?
> 
> Hey Hakon, I think you just got nominated as the new ACM maintainer :) :)

Congratulation!, on a semi-related note I've just found two fixes from 2016 I had for ibacm
Hakon can you please have a look and see if they are still needed?

commit 966120a5b3c0f8302a4a2d8a1d74207bd35c97e6
Author: Mark Bloch <markb@mellanox.com>
Date:   Thu Jun 2 14:40:59 2016 +0300

    Fix invalidation of a path record
    
    When a timeout has expired for a path record, we need to initialize
    the DLID, otherwise the DLID will be included in the request
    for a new path record which is invalid because the DLID may have
    changed.
    
    Signed-off-by: Mark Bloch <markb@mellanox.com>

diff --git a/prov/acmp/src/acmp.c b/prov/acmp/src/acmp.c
index 717bb83..ad7be19 100644
--- a/prov/acmp/src/acmp.c
+++ b/prov/acmp/src/acmp.c
@@ -1734,11 +1734,12 @@ static int acmp_dest_timeout(struct acmp_dest *dest)
 {
        uint64_t timestamp = time_stamp_min();
 
-       if (timestamp > dest->addr_timeout) {
+       if (timestamp >= dest->addr_timeout) {
                acm_log(2, "%s address timed out\n", dest->name);
                dest->state = ACMP_INIT;
                return 1;
-       } else if (timestamp > dest->route_timeout) {
+       } else if (timestamp >= dest->route_timeout) {
+               dest->path.dlid = 0;
                acm_log(2, "%s route timed out\n", dest->name);
                dest->state = ACMP_ADDR_RESOLVED;
                return 1;
and:
commit ce409bff62a1c9d6d0f3c1a7d0f93241558b8d79
Author: Mark Bloch <markb@mellanox.com>
Date:   Thu Sep 15 10:56:19 2016 +0300

    Update port info on reregister event
    
    On SM handover/failover an IBV_EVENT_CLIENT_REREGISTER
    event is triggered. ibacm didn't update the port info (which includes
    SA info) which resulted that mads sent by ibacm were sent with stale
    SA parameters.
    
    The fix forces ibacm to update the port info on IBV_EVENT_CLIENT_REREGISTER.
    
    Signed-off-by: Mark Bloch <markb@mellanox.com>

diff --git a/src/acm.c b/src/acm.c
index b75f089..99c06c4 100644
--- a/src/acm.c
+++ b/src/acm.c
@@ -2599,6 +2599,7 @@ static void acm_event_handler(struct acmc_device *dev)
                        acm_port_down(&dev->port[i]);
                break;
        case IBV_EVENT_CLIENT_REREGISTER:
+               acm_port_change(&dev->port[i]);
                if ((dev->port[i].state == IBV_PORT_ACTIVE) &&
                    dev->port[i].prov_port_context) {
                        dev->port[i].prov->handle_event(dev->port[i].prov_port_context,

I don't have an a setup/env to test this (just don't some git cleaning and found those)
and it seems you care and testing this :)

Mark
> 
> Jason
>
Haakon Bugge Jan. 30, 2019, 1:13 p.m. UTC | #7
> On 30 Jan 2019, at 00:52, Mark Bloch <markb@mellanox.com> wrote:
> On 1/29/19 3:43 PM, Jason Gunthorpe wrote:
>> []
>> Hey Hakon, I think you just got nominated as the new ACM maintainer :) :)


Thanks! Appreciated to being thought of :-) We are actually working some grounds here in orcl to come up with a candidate as well. Just need to make sure the chain of commands here understands that it takes time and requires certain resources for testing/building etc. So, will shortly come back on this point. 

But, an orcl maintainer isn't really expected to review orcl contributions, is she?

> Congratulation!,

Thanks!

> on a semi-related note I've just found two fixes from 2016 I had for ibacm
> Hakon can you please have a look and see if they are still needed?
> 
> commit 966120a5b3c0f8302a4a2d8a1d74207bd35c97e6
> Author: Mark Bloch <markb@mellanox.com>
> Date:   Thu Jun 2 14:40:59 2016 +0300
> 
>    Fix invalidation of a path record
> 
>    When a timeout has expired for a path record, we need to initialize
>    the DLID, otherwise the DLID will be included in the request
>    for a new path record which is invalid because the DLID may have
>    changed.

I looked briefly into this, and IIRC, ACM uses both gid and lid when both are non-zero. But why not skip the volatile LID always and rely on the gid, which doesn't change?


> 
>    Signed-off-by: Mark Bloch <markb@mellanox.com>
> 
> diff --git a/prov/acmp/src/acmp.c b/prov/acmp/src/acmp.c
> index 717bb83..ad7be19 100644
> --- a/prov/acmp/src/acmp.c
> +++ b/prov/acmp/src/acmp.c
> @@ -1734,11 +1734,12 @@ static int acmp_dest_timeout(struct acmp_dest *dest)
> {
>        uint64_t timestamp = time_stamp_min();
> 
> -       if (timestamp > dest->addr_timeout) {
> +       if (timestamp >= dest->addr_timeout) {
>                acm_log(2, "%s address timed out\n", dest->name);
>                dest->state = ACMP_INIT;
>                return 1;
> -       } else if (timestamp > dest->route_timeout) {
> +       } else if (timestamp >= dest->route_timeout) {
> +               dest->path.dlid = 0;
>                acm_log(2, "%s route timed out\n", dest->name);
>                dest->state = ACMP_ADDR_RESOLVED;
>                return 1;
> and:
> commit ce409bff62a1c9d6d0f3c1a7d0f93241558b8d79
> Author: Mark Bloch <markb@mellanox.com>
> Date:   Thu Sep 15 10:56:19 2016 +0300
> 
>    Update port info on reregister event
> 
>    On SM handover/failover an IBV_EVENT_CLIENT_REREGISTER
>    event is triggered. ibacm didn't update the port info (which includes
>    SA info) which resulted that mads sent by ibacm were sent with stale
>    SA parameters.
> 
>    The fix forces ibacm to update the port info on IBV_EVENT_CLIENT_REREGISTER.

If the case here is that the port is active, this sounds correct. Need to see if the issue can be reproed by using ibportstate to change the lid, and get opensm to restore it. Would that be the scenario this commit aims at fixing?



Thxs, Håkon


> 
>    Signed-off-by: Mark Bloch <markb@mellanox.com>
> 
> diff --git a/src/acm.c b/src/acm.c
> index b75f089..99c06c4 100644
> --- a/src/acm.c
> +++ b/src/acm.c
> @@ -2599,6 +2599,7 @@ static void acm_event_handler(struct acmc_device *dev)
>                        acm_port_down(&dev->port[i]);
>                break;
>        case IBV_EVENT_CLIENT_REREGISTER:
> +               acm_port_change(&dev->port[i]);
>                if ((dev->port[i].state == IBV_PORT_ACTIVE) &&
>                    dev->port[i].prov_port_context) {
>                        dev->port[i].prov->handle_event(dev->port[i].prov_port_context,
> 
> I don't have an a setup/env to test this (just don't some git cleaning and found those)
> and it seems you care and testing this :)
> 
> Mark
>> 
>> Jason
Mark Bloch Jan. 31, 2019, 5:42 p.m. UTC | #8
On 1/30/19 5:13 AM, Håkon Bugge wrote:
> 
> 
>> On 30 Jan 2019, at 00:52, Mark Bloch <markb@mellanox.com> wrote:
>> On 1/29/19 3:43 PM, Jason Gunthorpe wrote:
>>> []
>>> Hey Hakon, I think you just got nominated as the new ACM maintainer :) :)
> 
> 
> Thanks! Appreciated to being thought of :-) We are actually working some grounds here in orcl to come up with a candidate as well. Just need to make sure the chain of commands here understands that it takes time and requires certain resources for testing/building etc. So, will shortly come back on this point. 
> 
> But, an orcl maintainer isn't really expected to review orcl contributions, is she?
> 
>> Congratulation!,
> 
> Thanks!
> 
>> on a semi-related note I've just found two fixes from 2016 I had for ibacm
>> Hakon can you please have a look and see if they are still needed?
>>
>> commit 966120a5b3c0f8302a4a2d8a1d74207bd35c97e6
>> Author: Mark Bloch <markb@mellanox.com>
>> Date:   Thu Jun 2 14:40:59 2016 +0300
>>
>>    Fix invalidation of a path record
>>
>>    When a timeout has expired for a path record, we need to initialize
>>    the DLID, otherwise the DLID will be included in the request
>>    for a new path record which is invalid because the DLID may have
>>    changed.
> 
> I looked briefly into this, and IIRC, ACM uses both gid and lid when both are non-zero. But why not skip the volatile LID always and rely on the gid, which doesn't change?

Don't we use the LID in other places?

> 
> 
>>
>>    Signed-off-by: Mark Bloch <markb@mellanox.com>
>>
>> diff --git a/prov/acmp/src/acmp.c b/prov/acmp/src/acmp.c
>> index 717bb83..ad7be19 100644
>> --- a/prov/acmp/src/acmp.c
>> +++ b/prov/acmp/src/acmp.c
>> @@ -1734,11 +1734,12 @@ static int acmp_dest_timeout(struct acmp_dest *dest)
>> {
>>        uint64_t timestamp = time_stamp_min();
>>
>> -       if (timestamp > dest->addr_timeout) {
>> +       if (timestamp >= dest->addr_timeout) {
>>                acm_log(2, "%s address timed out\n", dest->name);
>>                dest->state = ACMP_INIT;
>>                return 1;
>> -       } else if (timestamp > dest->route_timeout) {
>> +       } else if (timestamp >= dest->route_timeout) {
>> +               dest->path.dlid = 0;
>>                acm_log(2, "%s route timed out\n", dest->name);
>>                dest->state = ACMP_ADDR_RESOLVED;
>>                return 1;
>> and:
>> commit ce409bff62a1c9d6d0f3c1a7d0f93241558b8d79
>> Author: Mark Bloch <markb@mellanox.com>
>> Date:   Thu Sep 15 10:56:19 2016 +0300
>>
>>    Update port info on reregister event
>>
>>    On SM handover/failover an IBV_EVENT_CLIENT_REREGISTER
>>    event is triggered. ibacm didn't update the port info (which includes
>>    SA info) which resulted that mads sent by ibacm were sent with stale
>>    SA parameters.
>>
>>    The fix forces ibacm to update the port info on IBV_EVENT_CLIENT_REREGISTER.
> 
> If the case here is that the port is active, this sounds correct. Need to see if the issue can be reproed by using ibportstate to change the lid, and get opensm to restore it. Would that be the scenario this commit aims at fixing?

From what I remember (and it was a few years back :)), it fixes the issue where a new SM takes over but ibacm still tries to use
the old one.

Mark
> 
> 
> 
> Thxs, Håkon
> 
> 
>>
>>    Signed-off-by: Mark Bloch <markb@mellanox.com>
>>
>> diff --git a/src/acm.c b/src/acm.c
>> index b75f089..99c06c4 100644
>> --- a/src/acm.c
>> +++ b/src/acm.c
>> @@ -2599,6 +2599,7 @@ static void acm_event_handler(struct acmc_device *dev)
>>                        acm_port_down(&dev->port[i]);
>>                break;
>>        case IBV_EVENT_CLIENT_REREGISTER:
>> +               acm_port_change(&dev->port[i]);
>>                if ((dev->port[i].state == IBV_PORT_ACTIVE) &&
>>                    dev->port[i].prov_port_context) {
>>                        dev->port[i].prov->handle_event(dev->port[i].prov_port_context,
>>
>> I don't have an a setup/env to test this (just don't some git cleaning and found those)
>> and it seems you care and testing this :)
>>
>> Mark
>>>
>>> Jason
>
Jason Gunthorpe Feb. 1, 2019, 7:02 p.m. UTC | #9
On Wed, Jan 30, 2019 at 02:13:40PM +0100, Håkon Bugge wrote:
> 
> 
> > On 30 Jan 2019, at 00:52, Mark Bloch <markb@mellanox.com> wrote:
> > On 1/29/19 3:43 PM, Jason Gunthorpe wrote:
> >> []
> >> Hey Hakon, I think you just got nominated as the new ACM maintainer :) :)
> 
> 
> Thanks! Appreciated to being thought of :-) We are actually working
> some grounds here in orcl to come up with a candidate as well. Just
> need to make sure the chain of commands here understands that it
> takes time and requires certain resources for testing/building
> etc. So, will shortly come back on this point.

Very good, let me know
 
> But, an orcl maintainer isn't really expected to review orcl
> contributions, is she?

Sure, I review Mellanox stuff all the time.

You just have to be trusted to keep the community responsibility hat
distinct from the business responsibility hat.

Jason
Haakon Bugge Feb. 12, 2019, 9:14 a.m. UTC | #10
> On 1 Feb 2019, at 20:02, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
> On Wed, Jan 30, 2019 at 02:13:40PM +0100, Håkon Bugge wrote:
>> 
>> 
>>> On 30 Jan 2019, at 00:52, Mark Bloch <markb@mellanox.com> wrote:
>>> On 1/29/19 3:43 PM, Jason Gunthorpe wrote:
>>>> []
>>>> Hey Hakon, I think you just got nominated as the new ACM maintainer :) :)
>> 
>> 
>> Thanks! Appreciated to being thought of :-) We are actually working
>> some grounds here in orcl to come up with a candidate as well. Just
>> need to make sure the chain of commands here understands that it
>> takes time and requires certain resources for testing/building
>> etc. So, will shortly come back on this point.
> 
> Very good, let me know


I got a "go" from my management for this. Would also like to nominate my colleague Mark Haywood, CCed, so we as a minimum can avoid reviewing our own commits.

I assume election happens by me sending a commit on MAINTAINERS, and then counting NAKs and ACKs?

Hal, did I get it right that you want to be excused?


Thxs, Håkon
Jason Gunthorpe Feb. 12, 2019, 5:39 p.m. UTC | #11
On Tue, Feb 12, 2019 at 10:14:29AM +0100, Håkon Bugge wrote:
> 
> 
> > On 1 Feb 2019, at 20:02, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > 
> > On Wed, Jan 30, 2019 at 02:13:40PM +0100, Håkon Bugge wrote:
> >> 
> >> 
> >>> On 30 Jan 2019, at 00:52, Mark Bloch <markb@mellanox.com> wrote:
> >>> On 1/29/19 3:43 PM, Jason Gunthorpe wrote:
> >>>> []
> >>>> Hey Hakon, I think you just got nominated as the new ACM maintainer :) :)
> >> 
> >> 
> >> Thanks! Appreciated to being thought of :-) We are actually working
> >> some grounds here in orcl to come up with a candidate as well. Just
> >> need to make sure the chain of commands here understands that it
> >> takes time and requires certain resources for testing/building
> >> etc. So, will shortly come back on this point.
> > 
> > Very good, let me know
> 
> 
> I got a "go" from my management for this. Would also like to
> nominate my colleague Mark Haywood, CCed, so we as a minimum can
> avoid reviewing our own commits.
> 
> I assume election happens by me sending a commit on MAINTAINERS, and
> then counting NAKs and ACKs?

Yes, please do. Remove all the old entries MAINTAINERS and cc those
people when you post your patch.

Regards,
Jason
Leon Romanovsky Feb. 12, 2019, 5:39 p.m. UTC | #12
On Tue, Feb 12, 2019 at 10:14:29AM +0100, Håkon Bugge wrote:
>
>
> > On 1 Feb 2019, at 20:02, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Wed, Jan 30, 2019 at 02:13:40PM +0100, Håkon Bugge wrote:
> >>
> >>
> >>> On 30 Jan 2019, at 00:52, Mark Bloch <markb@mellanox.com> wrote:
> >>> On 1/29/19 3:43 PM, Jason Gunthorpe wrote:
> >>>> []
> >>>> Hey Hakon, I think you just got nominated as the new ACM maintainer :) :)
> >>
> >>
> >> Thanks! Appreciated to being thought of :-) We are actually working
> >> some grounds here in orcl to come up with a candidate as well. Just
> >> need to make sure the chain of commands here understands that it
> >> takes time and requires certain resources for testing/building
> >> etc. So, will shortly come back on this point.
> >
> > Very good, let me know
>
>
> I got a "go" from my management for this. Would also like to nominate my colleague Mark Haywood, CCed, so we as a minimum can avoid reviewing our own commits.
>
> I assume election happens by me sending a commit on MAINTAINERS, and then counting NAKs and ACKs?

Go for it.

>
> Hal, did I get it right that you want to be excused?
>
>
> Thxs, Håkon
>
>
>