Message ID | 50BCC497.2030207@dev.mellanox.co.il (mailing list archive) |
---|---|
State | Deferred |
Delegated to: | Alex Netes |
On 12/03/2012 08:26 AM, Hal Rosenstock wrote:
>
> Signed-off-by: Alex Netes<alexne@mellanox.com>
> Signed-off-by: Hal Rosenstock<hal@mellanox.com>
> ---
>  opensm/osm_torus.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/opensm/osm_torus.c b/opensm/osm_torus.c
> index c06f8d4..075f84a 100644
> --- a/opensm/osm_torus.c
> +++ b/opensm/osm_torus.c
> @@ -8089,7 +8089,7 @@ void torus_update_osm_vlarb(void *context, osm_physp_t *osm_phys_port,
>  	 * So, leave VL 0 alone, remap VL 4 to VL 1, zero out the rest,
>  	 * and compress out the zero entries to the end.
>  	 */
> -	if (!sw || !port_num ||
> +	if (!sw || !port_num || sw->port[port_num] ||
>  	    sw->port[port_num]->pgrp->port_grp != 2 * TORUS_MAX_DIM)
>  		return;
>

With the patch as-is, if torus_update_osm_vlarb() returns early
for any non-NULL switch port, it will never do any updates.

If the crash was that sw->port[port_num] was NULL,
shouldn't the check be !sw->port[port_num] ?

Can you tell me more about the test case that leads to the crash?

Is it that there's a switch with a port that's not connected
to anything, and torus_update_osm_vlarb() was called for it?

Testing for a non-NULL sw->port[port_num] is definitely the right
thing to do to handle that case, and I'm sorry I missed it earlier.

If not, then something else is likely broken, and we need to find
and fix that.

-- Jim

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 12/3/2012 4:09 PM, Jim Schutt wrote:
> On 12/03/2012 08:26 AM, Hal Rosenstock wrote:
>>
>> Signed-off-by: Alex Netes<alexne@mellanox.com>
>> Signed-off-by: Hal Rosenstock<hal@mellanox.com>
>> ---
>>  opensm/osm_torus.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/opensm/osm_torus.c b/opensm/osm_torus.c
>> index c06f8d4..075f84a 100644
>> --- a/opensm/osm_torus.c
>> +++ b/opensm/osm_torus.c
>> @@ -8089,7 +8089,7 @@ void torus_update_osm_vlarb(void *context, osm_physp_t *osm_phys_port,
>>  	 * So, leave VL 0 alone, remap VL 4 to VL 1, zero out the rest,
>>  	 * and compress out the zero entries to the end.
>>  	 */
>> -	if (!sw || !port_num ||
>> +	if (!sw || !port_num || sw->port[port_num] ||
>>  	    sw->port[port_num]->pgrp->port_grp != 2 * TORUS_MAX_DIM)
>>  		return;
>>
>
> With the patch as-is, if torus_update_osm_vlarb() returns early
> for any non-NULL switch port, it will never do any updates.
>
> If the crash was that sw->port[port_num] was NULL,
> shouldn't the check be !sw->port[port_num] ?
>
> Can you tell me more about the test case that leads to the crash?
>
> Is it that there's a switch with a port that's not connected
> to anything, and torus_update_osm_vlarb() was called for it?
>
> Testing for a non-NULL sw->port[port_num] is definitely the right
> thing to do to handle that case, and I'm sorry I missed it earlier.
>
> If not, then something else is likely broken, and we need to find
> and fix that.

Yes, it was meant to be a NULL pointer check. v2 of the patch is coming soon.

-- Hal

>
> -- Jim
>
diff --git a/opensm/osm_torus.c b/opensm/osm_torus.c
index c06f8d4..075f84a 100644
--- a/opensm/osm_torus.c
+++ b/opensm/osm_torus.c
@@ -8089,7 +8089,7 @@ void torus_update_osm_vlarb(void *context, osm_physp_t *osm_phys_port,
 	 * So, leave VL 0 alone, remap VL 4 to VL 1, zero out the rest,
 	 * and compress out the zero entries to the end.
 	 */
-	if (!sw || !port_num ||
+	if (!sw || !port_num || sw->port[port_num] ||
 	    sw->port[port_num]->pgrp->port_grp != 2 * TORUS_MAX_DIM)
 		return;
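
For reference, here is a minimal sketch of the guard with the NULL check negated, as suggested in the review above. It is not the v2 patch, which had not been posted at this point in the thread. The struct layouts, the helper name vlarb_update_should_skip(), and the value chosen for TORUS_MAX_DIM are assumptions made so the example compiles on its own; only the member names sw->port[], pgrp and port_grp are taken from the diff. The sketch relies on the left-to-right short-circuit evaluation of || so that a NULL port entry is caught before it is dereferenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TORUS_MAX_DIM 3	/* assumed value, for this sketch only */

/* Hypothetical, pared-down stand-ins for the structures in osm_torus.c. */
struct port_grp {
	unsigned port_grp;
};

struct endpoint {
	struct port_grp *pgrp;
};

struct t_switch {
	struct endpoint **port;	/* an entry is NULL when nothing is attached */
};

/*
 * Returns true when the VL arbitration update should be skipped.
 * Because || short-circuits, !sw->port[port_num] is evaluated before
 * the pgrp dereference, so a NULL port entry returns true here
 * instead of being dereferenced.
 */
bool vlarb_update_should_skip(const struct t_switch *sw, unsigned port_num)
{
	if (!sw || !port_num || !sw->port[port_num])
		return true;

	/* Sketch-only sanity check: a non-NULL endpoint is assumed to have a group. */
	assert(sw->port[port_num]->pgrp != NULL);

	return sw->port[port_num]->pgrp->port_grp != 2 * TORUS_MAX_DIM;
}
```

With the check written this way, a non-NULL port whose group index equals 2 * TORUS_MAX_DIM still reaches the update path, which the guard as posted would never allow.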