[1/1] Initiate heavy sweep if MFTSubnSet fails during idle time processing

Message ID 20110512161443.GC22389@calypso.voltaire.com (mailing list archive)
State New, archived
Delegated to: Alex Netes

Commit Message

Alex Netes May 12, 2011, 4:14 p.m. UTC
Failed MFTSubnSet MADs may leave temporary multicast (MC) loops in the fabric.
To eliminate this faulty state as quickly as possible, initiate a heavy
sweep immediately rather than waiting for the next light sweep.

Signed-off-by: Alex Netes <alexne@mellanox.com>
---
 opensm/osm_state_mgr.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

Comments

Hal Rosenstock May 14, 2011, 2:43 p.m. UTC | #1
Hi Alex,

On 5/12/2011 12:14 PM, Alex Netes wrote:
> MFTSubnSet failed MADs may leave temporary MC loops in the fabric.
> In order to eliminate this faulty state as quick as possible it's a good
> thing to initiate a heavy sweep immediately and to wait for the next light
> sweep.
> 
> Signed-off-by: Alex Netes <alexne@mellanox.com>
> ---
>  opensm/osm_state_mgr.c |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
> 
> diff --git a/opensm/osm_state_mgr.c b/opensm/osm_state_mgr.c
> index dd308f2..aa71b03 100644
> --- a/opensm/osm_state_mgr.c
> +++ b/opensm/osm_state_mgr.c
> @@ -1434,6 +1434,13 @@ static void do_process_mgrp_queue(osm_sm_t * sm)
>  		osm_mcast_mgr_process(sm);
>  		wait_for_pending_transactions(&sm->p_subn->p_osm->stats);
>  	}
> +
> +	/* if one or more MFTSubnSet MADs fails
> +	 * during idle process time initiate heavy sweep */
> +	if (sm->p_subn->force_heavy_sweep
> +	    || sm->p_subn->subnet_initialization_error)
> +		osm_sm_signal(sm, OSM_SIGNAL_SWEEP);

subnet_initialization_error is more than just set MFT failures. Should
it be narrowed down to just those failures ?

Also, while this looks like it would fix the scenario you mention,
couldn't this change cause a continual heavy sweep ?

-- Hal

> +
>  }
>  
>  void osm_state_mgr_process(IN osm_sm_t * sm, IN osm_signal_t signal)

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Alex Netes May 15, 2011, 7:14 a.m. UTC | #2
Hi Hal,

On 10:43 Sat 14 May     , Hal Rosenstock wrote:
> Hi Alex,
> 
> On 5/12/2011 12:14 PM, Alex Netes wrote:
> > MFTSubnSet failed MADs may leave temporary MC loops in the fabric.
> > In order to eliminate this faulty state as quick as possible it's a good
> > thing to initiate a heavy sweep immediately and to wait for the next light
> > sweep.
> > 
> > Signed-off-by: Alex Netes <alexne@mellanox.com>
> > ---
> >  opensm/osm_state_mgr.c |    7 +++++++
> >  1 files changed, 7 insertions(+), 0 deletions(-)
> > 
> > diff --git a/opensm/osm_state_mgr.c b/opensm/osm_state_mgr.c
> > index dd308f2..aa71b03 100644
> > --- a/opensm/osm_state_mgr.c
> > +++ b/opensm/osm_state_mgr.c
> > @@ -1434,6 +1434,13 @@ static void do_process_mgrp_queue(osm_sm_t * sm)
> >  		osm_mcast_mgr_process(sm);
> >  		wait_for_pending_transactions(&sm->p_subn->p_osm->stats);
> >  	}
> > +
> > +	/* if one or more MFTSubnSet MADs fails
> > +	 * during idle process time initiate heavy sweep */
> > +	if (sm->p_subn->force_heavy_sweep
> > +	    || sm->p_subn->subnet_initialization_error)
> > +		osm_sm_signal(sm, OSM_SIGNAL_SWEEP);
> 
> subnet_initialization_error is more than just set MFT failures. Should
> it be narrowed down to just those failures ?
> 

Do you mean just resending the MFTs without triggering a heavy sweep?

> Also, while this looks like it would fix the scenario you mention,
> couldn't this change cause a continual heavy sweep ?
> 

Yes, this can cause a continual heavy sweep, but that would happen anyway. This
patch initiates the heavy sweep immediately; without it, the heavy sweep would
start on the next light sweep. In both cases you can end up in a heavy sweep
loop.

> -- Hal
> 
> > +
> >  }
> >  
> >  void osm_state_mgr_process(IN osm_sm_t * sm, IN osm_signal_t signal)
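
The two concerns raised in the review (the trigger is broader than MFT set failures, and an immediate heavy sweep can repeat indefinitely) can be illustrated with a small standalone sketch. Nothing below is OpenSM code or something proposed in the thread: the `mft_set_failed` flag, the `MAX_IMMEDIATE_RESWEEPS` cap, and `should_resweep_now()` are hypothetical names, assumed here only to show one way a narrower, bounded trigger could look.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap on back-to-back immediate heavy sweeps. */
#define MAX_IMMEDIATE_RESWEEPS 3

/* Hypothetical state; neither field exists in OpenSM as shown in this thread. */
struct mft_sweep_state {
	bool mft_set_failed;          /* set only when an MFT set MAD fails */
	unsigned immediate_resweeps;  /* consecutive immediate sweeps requested */
};

/*
 * Returns true when an immediate heavy sweep should be requested.
 * Once the cap is reached, the caller is expected to fall back to
 * waiting for the next periodic light sweep.
 */
bool should_resweep_now(struct mft_sweep_state *s)
{
	assert(s != NULL);

	if (!s->mft_set_failed) {
		/* Clean pass: reset the consecutive-failure counter. */
		s->immediate_resweeps = 0;
		return false;
	}

	if (s->immediate_resweeps >= MAX_IMMEDIATE_RESWEEPS)
		return false;

	s->immediate_resweeps++;
	return true;
}
```

With this shape, a persistently failing switch would cause at most `MAX_IMMEDIATE_RESWEEPS` immediate heavy sweeps before the behavior reverts to what happens without the patch, and unrelated initialization errors would not trigger the immediate path at all.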
Patch

diff --git a/opensm/osm_state_mgr.c b/opensm/osm_state_mgr.c
index dd308f2..aa71b03 100644
--- a/opensm/osm_state_mgr.c
+++ b/opensm/osm_state_mgr.c
@@ -1434,6 +1434,13 @@  static void do_process_mgrp_queue(osm_sm_t * sm)
 		osm_mcast_mgr_process(sm);
 		wait_for_pending_transactions(&sm->p_subn->p_osm->stats);
 	}
+
+	/* If one or more MFTSubnSet MADs fail during idle-time
+	 * processing, initiate a heavy sweep. */
+	if (sm->p_subn->force_heavy_sweep
+	    || sm->p_subn->subnet_initialization_error)
+		osm_sm_signal(sm, OSM_SIGNAL_SWEEP);
+
 }
 
 void osm_state_mgr_process(IN osm_sm_t * sm, IN osm_signal_t signal)
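
The condition this hunk adds can be modeled outside OpenSM to make its behavior easy to check. The sketch below is a simplified stand-in, not the real code: `struct subn_flags`, `enum sweep_signal`, and `signal_after_mgrp_queue()` are hypothetical names, and only the two flag names are taken from the patch. In the actual change the sweep is requested by calling `osm_sm_signal(sm, OSM_SIGNAL_SWEEP)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two subnet flags the patch reads. */
struct subn_flags {
	bool force_heavy_sweep;
	bool subnet_initialization_error;
};

/* Hypothetical result type standing in for the signal sent to the SM. */
enum sweep_signal { SIGNAL_NONE, SIGNAL_SWEEP };

/*
 * Mirrors the check added at the end of do_process_mgrp_queue():
 * request a sweep as soon as either flag is set, instead of leaving
 * the flags to be noticed by the next periodic light sweep.
 */
enum sweep_signal signal_after_mgrp_queue(const struct subn_flags *f)
{
	assert(f != NULL);

	if (f->force_heavy_sweep || f->subnet_initialization_error)
		return SIGNAL_SWEEP;

	return SIGNAL_NONE;
}
```

Because `subnet_initialization_error` is part of the condition, any initialization error left set at this point requests a sweep, not only a failed MFT set, which is the breadth Hal asks about in comment #1.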