Message ID | cover.1546850468.git.haakon.bugge@oracle.com (mailing list archive) |
---|---|
Headers | show |
Series | ibacm: acme supports only one port | expand |
> On 7 Jan 2019, at 09:45, Håkon Bugge <haakon.bugge@oracle.com> wrote: > > Some end-of-line spaces forced this into a commit series. > > > Håkon Bugge (2): > ibacm: Remove trailing blanks in acm.h > ibacm: acme supports only one port > > ibacm/include/infiniband/acm.h | 8 +++-- > ibacm/src/acm.c | 11 +++++-- > ibacm/src/acme.c | 36 ++++++++++++++--------- > ibacm/src/libacm.c | 53 +++++++++++++++++++++++----------- > ibacm/src/libacm.h | 2 +- > 5 files changed, 72 insertions(+), 38 deletions(-) Ping? > > -- > 2.19.2 >
On Wed, Jan 16, 2019 at 09:17:24PM +0100, Håkon Bugge wrote: > > > > On 7 Jan 2019, at 09:45, Håkon Bugge <haakon.bugge@oracle.com> wrote: > > > > Some end-of-line spaces forced this into a commit series. > > > > > > Håkon Bugge (2): > > ibacm: Remove trailing blanks in acm.h > > ibacm: acme supports only one port > > > > ibacm/include/infiniband/acm.h | 8 +++-- > > ibacm/src/acm.c | 11 +++++-- > > ibacm/src/acme.c | 36 ++++++++++++++--------- > > ibacm/src/libacm.c | 53 +++++++++++++++++++++++----------- > > ibacm/src/libacm.h | 2 +- > > 5 files changed, 72 insertions(+), 38 deletions(-) > > Ping? Does anyone want to review ACM code? I'm inclined to just apply Hakon's patches otherwise. Sean/Hal? I have you listed as maintaining ACM, do you want to stop? Jason
On 1/16/2019 3:34 PM, Jason Gunthorpe wrote: > Hal? I have you listed as maintaining ACM, do you want to stop? Yes; I don't really have the time for this anymore. -- Hal > Jason
> On 23 Jan 2019, at 18:17, Hal Rosenstock <hal@dev.mellanox.co.il> wrote: > > On 1/16/2019 3:34 PM, Jason Gunthorpe wrote: >> Hal? I have you listed as maintaining ACM, do you want to stop? > > Yes; I don't really have the time for this anymore. Anyone else? Thanks, Håkon
On Tue, Jan 29, 2019 at 09:46:58AM +0100, Håkon Bugge wrote: > > > > On 23 Jan 2019, at 18:17, Hal Rosenstock <hal@dev.mellanox.co.il> wrote: > > > > On 1/16/2019 3:34 PM, Jason Gunthorpe wrote: > >> Hal? I have you listed as maintaining ACM, do you want to stop? > > > > Yes; I don't really have the time for this anymore. > > Anyone else? Hey Hakon, I think you just got nominated as the new ACM maintainer :) :) Jason
On 1/29/19 3:43 PM, Jason Gunthorpe wrote: > On Tue, Jan 29, 2019 at 09:46:58AM +0100, Håkon Bugge wrote: >> >> >>> On 23 Jan 2019, at 18:17, Hal Rosenstock <hal@dev.mellanox.co.il> wrote: >>> >>> On 1/16/2019 3:34 PM, Jason Gunthorpe wrote: >>>> Hal? I have you listed as maintaining ACM, do you want to stop? >>> >>> Yes; I don't really have the time for this anymore. >> >> Anyone else? > > Hey Hakon, I think you just got nominated as the new ACM maintainer :) :) Congratulation!, on a semi-related note I've just found two fixes from 2016 I had for ibacm Hakon can you please have a look and see if they are still needed? commit 966120a5b3c0f8302a4a2d8a1d74207bd35c97e6 Author: Mark Bloch <markb@mellanox.com> Date: Thu Jun 2 14:40:59 2016 +0300 Fix invalidation of a path record When a timeout has expired for a path record, we need to initialize the DLID, otherwise the DLID will be included in the request for a new path record which is invalid because the DLID may have changed. Signed-off-by: Mark Bloch <markb@mellanox.com> diff --git a/prov/acmp/src/acmp.c b/prov/acmp/src/acmp.c index 717bb83..ad7be19 100644 --- a/prov/acmp/src/acmp.c +++ b/prov/acmp/src/acmp.c @@ -1734,11 +1734,12 @@ static int acmp_dest_timeout(struct acmp_dest *dest) { uint64_t timestamp = time_stamp_min(); - if (timestamp > dest->addr_timeout) { + if (timestamp >= dest->addr_timeout) { acm_log(2, "%s address timed out\n", dest->name); dest->state = ACMP_INIT; return 1; - } else if (timestamp > dest->route_timeout) { + } else if (timestamp >= dest->route_timeout) { + dest->path.dlid = 0; acm_log(2, "%s route timed out\n", dest->name); dest->state = ACMP_ADDR_RESOLVED; return 1; and: commit ce409bff62a1c9d6d0f3c1a7d0f93241558b8d79 Author: Mark Bloch <markb@mellanox.com> Date: Thu Sep 15 10:56:19 2016 +0300 Update port info on reregister event On SM handover/failover an IBV_EVENT_CLIENT_REREGISTER event is triggered. ibacm didn't update the port info (which includes SA info) which resulted that mads sent by ibacm were sent with stale SA parameters. The fix forces ibacm to update the port info on IBV_EVENT_CLIENT_REREGISTER. Signed-off-by: Mark Bloch <markb@mellanox.com> diff --git a/src/acm.c b/src/acm.c index b75f089..99c06c4 100644 --- a/src/acm.c +++ b/src/acm.c @@ -2599,6 +2599,7 @@ static void acm_event_handler(struct acmc_device *dev) acm_port_down(&dev->port[i]); break; case IBV_EVENT_CLIENT_REREGISTER: + acm_port_change(&dev->port[i]); if ((dev->port[i].state == IBV_PORT_ACTIVE) && dev->port[i].prov_port_context) { dev->port[i].prov->handle_event(dev->port[i].prov_port_context, I don't have an a setup/env to test this (just don't some git cleaning and found those) and it seems you care and testing this :) Mark > > Jason >
> On 30 Jan 2019, at 00:52, Mark Bloch <markb@mellanox.com> wrote: > On 1/29/19 3:43 PM, Jason Gunthorpe wrote: >> [] >> Hey Hakon, I think you just got nominated as the new ACM maintainer :) :) Thanks! Appreciated to being thought of :-) We are actually working some grounds here in orcl to come up with a candidate as well. Just need to make sure the chain of commands here understands that it takes time and requires certain resources for testing/building etc. So, will shortly come back on this point. But, an orcl maintainer isn't really expected to review orcl contributions, is she? > Congratulation!, Thanks! > on a semi-related note I've just found two fixes from 2016 I had for ibacm > Hakon can you please have a look and see if they are still needed? > > commit 966120a5b3c0f8302a4a2d8a1d74207bd35c97e6 > Author: Mark Bloch <markb@mellanox.com> > Date: Thu Jun 2 14:40:59 2016 +0300 > > Fix invalidation of a path record > > When a timeout has expired for a path record, we need to initialize > the DLID, otherwise the DLID will be included in the request > for a new path record which is invalid because the DLID may have > changed. I looked briefly into this, and IIRC, ACM uses both gid and lid when both are non-zero. But why not skip the volatile LID always and rely on the gid, which doesn't change? > > Signed-off-by: Mark Bloch <markb@mellanox.com> > > diff --git a/prov/acmp/src/acmp.c b/prov/acmp/src/acmp.c > index 717bb83..ad7be19 100644 > --- a/prov/acmp/src/acmp.c > +++ b/prov/acmp/src/acmp.c > @@ -1734,11 +1734,12 @@ static int acmp_dest_timeout(struct acmp_dest *dest) > { > uint64_t timestamp = time_stamp_min(); > > - if (timestamp > dest->addr_timeout) { > + if (timestamp >= dest->addr_timeout) { > acm_log(2, "%s address timed out\n", dest->name); > dest->state = ACMP_INIT; > return 1; > - } else if (timestamp > dest->route_timeout) { > + } else if (timestamp >= dest->route_timeout) { > + dest->path.dlid = 0; > acm_log(2, "%s route timed out\n", dest->name); > dest->state = ACMP_ADDR_RESOLVED; > return 1; > and: > commit ce409bff62a1c9d6d0f3c1a7d0f93241558b8d79 > Author: Mark Bloch <markb@mellanox.com> > Date: Thu Sep 15 10:56:19 2016 +0300 > > Update port info on reregister event > > On SM handover/failover an IBV_EVENT_CLIENT_REREGISTER > event is triggered. ibacm didn't update the port info (which includes > SA info) which resulted that mads sent by ibacm were sent with stale > SA parameters. > > The fix forces ibacm to update the port info on IBV_EVENT_CLIENT_REREGISTER. If the case here is that the port is active, this sounds correct. Need to see if the issue can be reproed by using ibportstate to change the lid, and get opensm to restore it. Would that be the scenario this commit aims at fixing? Thxs, Håkon > > Signed-off-by: Mark Bloch <markb@mellanox.com> > > diff --git a/src/acm.c b/src/acm.c > index b75f089..99c06c4 100644 > --- a/src/acm.c > +++ b/src/acm.c > @@ -2599,6 +2599,7 @@ static void acm_event_handler(struct acmc_device *dev) > acm_port_down(&dev->port[i]); > break; > case IBV_EVENT_CLIENT_REREGISTER: > + acm_port_change(&dev->port[i]); > if ((dev->port[i].state == IBV_PORT_ACTIVE) && > dev->port[i].prov_port_context) { > dev->port[i].prov->handle_event(dev->port[i].prov_port_context, > > I don't have an a setup/env to test this (just don't some git cleaning and found those) > and it seems you care and testing this :) > > Mark >> >> Jason
On 1/30/19 5:13 AM, Håkon Bugge wrote: > > >> On 30 Jan 2019, at 00:52, Mark Bloch <markb@mellanox.com> wrote: >> On 1/29/19 3:43 PM, Jason Gunthorpe wrote: >>> [] >>> Hey Hakon, I think you just got nominated as the new ACM maintainer :) :) > > > Thanks! Appreciated to being thought of :-) We are actually working some grounds here in orcl to come up with a candidate as well. Just need to make sure the chain of commands here understands that it takes time and requires certain resources for testing/building etc. So, will shortly come back on this point. > > But, an orcl maintainer isn't really expected to review orcl contributions, is she? > >> Congratulation!, > > Thanks! > >> on a semi-related note I've just found two fixes from 2016 I had for ibacm >> Hakon can you please have a look and see if they are still needed? >> >> commit 966120a5b3c0f8302a4a2d8a1d74207bd35c97e6 >> Author: Mark Bloch <markb@mellanox.com> >> Date: Thu Jun 2 14:40:59 2016 +0300 >> >> Fix invalidation of a path record >> >> When a timeout has expired for a path record, we need to initialize >> the DLID, otherwise the DLID will be included in the request >> for a new path record which is invalid because the DLID may have >> changed. > > I looked briefly into this, and IIRC, ACM uses both gid and lid when both are non-zero. But why not skip the volatile LID always and rely on the gid, which doesn't change? Don't we use the LID in other places? > > >> >> Signed-off-by: Mark Bloch <markb@mellanox.com> >> >> diff --git a/prov/acmp/src/acmp.c b/prov/acmp/src/acmp.c >> index 717bb83..ad7be19 100644 >> --- a/prov/acmp/src/acmp.c >> +++ b/prov/acmp/src/acmp.c >> @@ -1734,11 +1734,12 @@ static int acmp_dest_timeout(struct acmp_dest *dest) >> { >> uint64_t timestamp = time_stamp_min(); >> >> - if (timestamp > dest->addr_timeout) { >> + if (timestamp >= dest->addr_timeout) { >> acm_log(2, "%s address timed out\n", dest->name); >> dest->state = ACMP_INIT; >> return 1; >> - } else if (timestamp > dest->route_timeout) { >> + } else if (timestamp >= dest->route_timeout) { >> + dest->path.dlid = 0; >> acm_log(2, "%s route timed out\n", dest->name); >> dest->state = ACMP_ADDR_RESOLVED; >> return 1; >> and: >> commit ce409bff62a1c9d6d0f3c1a7d0f93241558b8d79 >> Author: Mark Bloch <markb@mellanox.com> >> Date: Thu Sep 15 10:56:19 2016 +0300 >> >> Update port info on reregister event >> >> On SM handover/failover an IBV_EVENT_CLIENT_REREGISTER >> event is triggered. ibacm didn't update the port info (which includes >> SA info) which resulted that mads sent by ibacm were sent with stale >> SA parameters. >> >> The fix forces ibacm to update the port info on IBV_EVENT_CLIENT_REREGISTER. > > If the case here is that the port is active, this sounds correct. Need to see if the issue can be reproed by using ibportstate to change the lid, and get opensm to restore it. Would that be the scenario this commit aims at fixing? From what I remember (and it was a few years back :)), it fixes the issue where a new SM takes over but ibacm still tries to use the old one. Mark > > > > Thxs, Håkon > > >> >> Signed-off-by: Mark Bloch <markb@mellanox.com> >> >> diff --git a/src/acm.c b/src/acm.c >> index b75f089..99c06c4 100644 >> --- a/src/acm.c >> +++ b/src/acm.c >> @@ -2599,6 +2599,7 @@ static void acm_event_handler(struct acmc_device *dev) >> acm_port_down(&dev->port[i]); >> break; >> case IBV_EVENT_CLIENT_REREGISTER: >> + acm_port_change(&dev->port[i]); >> if ((dev->port[i].state == IBV_PORT_ACTIVE) && >> dev->port[i].prov_port_context) { >> dev->port[i].prov->handle_event(dev->port[i].prov_port_context, >> >> I don't have an a setup/env to test this (just don't some git cleaning and found those) >> and it seems you care and testing this :) >> >> Mark >>> >>> Jason >
On Wed, Jan 30, 2019 at 02:13:40PM +0100, Håkon Bugge wrote: > > > > On 30 Jan 2019, at 00:52, Mark Bloch <markb@mellanox.com> wrote: > > On 1/29/19 3:43 PM, Jason Gunthorpe wrote: > >> [] > >> Hey Hakon, I think you just got nominated as the new ACM maintainer :) :) > > > Thanks! Appreciated to being thought of :-) We are actually working > some grounds here in orcl to come up with a candidate as well. Just > need to make sure the chain of commands here understands that it > takes time and requires certain resources for testing/building > etc. So, will shortly come back on this point. Very good, let me know > But, an orcl maintainer isn't really expected to review orcl > contributions, is she? Sure, I review Mellanox stuff all the time. You just have to be trusted to keep the community responsibility hat distinct from the business responsibility hat. Jason
> On 1 Feb 2019, at 20:02, Jason Gunthorpe <jgg@ziepe.ca> wrote: > > On Wed, Jan 30, 2019 at 02:13:40PM +0100, Håkon Bugge wrote: >> >> >>> On 30 Jan 2019, at 00:52, Mark Bloch <markb@mellanox.com> wrote: >>> On 1/29/19 3:43 PM, Jason Gunthorpe wrote: >>>> [] >>>> Hey Hakon, I think you just got nominated as the new ACM maintainer :) :) >> >> >> Thanks! Appreciated to being thought of :-) We are actually working >> some grounds here in orcl to come up with a candidate as well. Just >> need to make sure the chain of commands here understands that it >> takes time and requires certain resources for testing/building >> etc. So, will shortly come back on this point. > > Very good, let me know I got a "go" from my management for this. Would also like to nominate my colleague Mark Haywood, CCed, so we as a minimum can avoid reviewing our own commits. I assume election happens by me sending a commit on MAINTAINERS, and then counting NAKs and ACKs? Hal, did I get it right that you want to be excused? Thxs, Håkon
On Tue, Feb 12, 2019 at 10:14:29AM +0100, Håkon Bugge wrote: > > > > On 1 Feb 2019, at 20:02, Jason Gunthorpe <jgg@ziepe.ca> wrote: > > > > On Wed, Jan 30, 2019 at 02:13:40PM +0100, Håkon Bugge wrote: > >> > >> > >>> On 30 Jan 2019, at 00:52, Mark Bloch <markb@mellanox.com> wrote: > >>> On 1/29/19 3:43 PM, Jason Gunthorpe wrote: > >>>> [] > >>>> Hey Hakon, I think you just got nominated as the new ACM maintainer :) :) > >> > >> > >> Thanks! Appreciated to being thought of :-) We are actually working > >> some grounds here in orcl to come up with a candidate as well. Just > >> need to make sure the chain of commands here understands that it > >> takes time and requires certain resources for testing/building > >> etc. So, will shortly come back on this point. > > > > Very good, let me know > > > I got a "go" from my management for this. Would also like to > nominate my colleague Mark Haywood, CCed, so we as a minimum can > avoid reviewing our own commits. > > I assume election happens by me sending a commit on MAINTAINERS, and > then counting NAKs and ACKs? Yes, please do. Remove all the old entries MAINTAINERS and cc those people when you post your patch. Regards, Jason
On Tue, Feb 12, 2019 at 10:14:29AM +0100, Håkon Bugge wrote: > > > > On 1 Feb 2019, at 20:02, Jason Gunthorpe <jgg@ziepe.ca> wrote: > > > > On Wed, Jan 30, 2019 at 02:13:40PM +0100, Håkon Bugge wrote: > >> > >> > >>> On 30 Jan 2019, at 00:52, Mark Bloch <markb@mellanox.com> wrote: > >>> On 1/29/19 3:43 PM, Jason Gunthorpe wrote: > >>>> [] > >>>> Hey Hakon, I think you just got nominated as the new ACM maintainer :) :) > >> > >> > >> Thanks! Appreciated to being thought of :-) We are actually working > >> some grounds here in orcl to come up with a candidate as well. Just > >> need to make sure the chain of commands here understands that it > >> takes time and requires certain resources for testing/building > >> etc. So, will shortly come back on this point. > > > > Very good, let me know > > > I got a "go" from my management for this. Would also like to nominate my colleague Mark Haywood, CCed, so we as a minimum can avoid reviewing our own commits. > > I assume election happens by me sending a commit on MAINTAINERS, and then counting NAKs and ACKs? Go for it. > > Hal, did I get it right that you want to be excused? > > > Thxs, Håkon > > >