diff mbox

[3/3] opensm/osm_perfmgr.c: Fix perfmgr sweep_state race

Message ID 1403891774.30332.421.camel@auk59.llnl.gov (mailing list archive)
State Superseded
Delegated to: Hal Rosenstock
Headers show

Commit Message

Al Chu June 27, 2014, 5:56 p.m. UTC
Introduce new sweep state PERFMGR_SWEEP_POST_PROCESSING to fix
race in perfmgr.

Race occurs as follows:

Under typical conditions, osm_perfmgr_process() is entered
with sweep_state set to PERFMGR_SWEEP_SLEEP.  osm_perfmgr_process()
sets sweep_state to PERFMGR_SWEEP_ACTIVE when it begins to sweep.

osm_perfmgr_process() will eventually call perfmgr_send_mad() by
way of perfmgr_query_counters() and several other functions.

Responses to performance counter MADs may initiate the sending
of more MADs via perfmgr_send_mad(), such as through redirection
or the desire to clear counters.

If too many MADs have been put on the wire, perfmgr_send_mad()
will throttle sending out MADS and temporarily change sweep_state
between PERFMGR_SWEEP_SUSPENDED and PERFMGR_SWEEP_ACTIVE as it
throttles.  The sweep_state is set to PERFMGR_SWEEP_ACTIVE
when all performance counter MADs have been sent out by the sweeper.

osm_perfmgr_process() eventually completes its sweep and puts
sweep_state back into PERFMGR_SWEEP_SLEEP.

At this point, some MADs may still be on the wire.  New MADs may be
put back on the wire if responses necessitate it (redirection or
clearing counters).  If enough MADs are put back onto the wire,
perfmgr_send_mad() will throttle as normal, temporarily moving
between PERFMGR_SWEEP_SUSPENDED and PERFMGR_SWEEP_ACTIVE.  After
the throttling is complete, sweep_state is put into
PERFMGR_SWEEP_ACTIVE state.

This is the key problem, the sweep_state is changed from
PERFMGR_SWEEP_SLEEP to PERFMGR_SWEEP_ACTIVE outside of
osm_perfmgr_process().

Now that the perfmgr is in ACTIVE state, any future sweep call to
osm_perfmgr_process() will not sweep b/c the sweep_state is set
to PERFMGR_SWEEP_ACTIVE.

The introduction of a new sweep_state PERFMGR_SWEEP_POST_PROCESSING
fixes this problem.

If perfmgr_send_mad() throttles mads while in PERFMGR_SWEEP_SLEEP.
sweep_state will be moved into the PERFMGR_SWEEP_POST_PROCESSING
state instead of PERFMGR_SWEEP_SUSPENDED/PERFMGR_SWEEP_ACTIVE.

When all post-SLEEP state MAD processing is complete, the sweep_state
will move from PERFMGR_SWEEP_POST_PROCESSING back to PERFMGR_SWEEP_SLEEP,
so that future sweeps can operate as normal.

Signed-off-by: Albert L. Chu <chu11@llnl.gov>
---
 include/opensm/osm_perfmgr.h |    6 +++++-
 opensm/osm_perfmgr.c         |   21 +++++++++++++++++++--
 2 files changed, 24 insertions(+), 3 deletions(-)

Comments

Hal Rosenstock June 27, 2014, 9:36 p.m. UTC | #1
On 6/27/2014 1:56 PM, Albert Chu wrote:
> Introduce new sweep state PERFMGR_SWEEP_POST_PROCESSING to fix
> race in perfmgr.
> 
> Race occurs as follows:

Nice find!

> 
> Under typical conditions, osm_perfmgr_process() is entered
> with sweep_state set to PERFMGR_SWEEP_SLEEP.  osm_perfmgr_process()
> sets sweep_state to PERFMGR_SWEEP_ACTIVE when it begins to sweep.
> 
> osm_perfmgr_process() will eventually call perfmgr_send_mad() by
> way of perfmgr_query_counters() and several other functions.
> 
> Responses to performance counter MADs may initiate the sending
> of more MADs via perfmgr_send_mad(), such as through redirection
> or the desire to clear counters.
> 
> If too many MADs have been put on the wire, perfmgr_send_mad()
> will throttle sending out MADS and temporarily change sweep_state
> between PERFMGR_SWEEP_SUSPENDED and PERFMGR_SWEEP_ACTIVE as it
> throttles.  The sweep_state is set to PERFMGR_SWEEP_ACTIVE
> when all performance counter MADs have been sent out by the sweeper.
> 
> osm_perfmgr_process() eventually completes its sweep and puts
> sweep_state back into PERFMGR_SWEEP_SLEEP.
> 
> At this point, some MADs may still be on the wire.  New MADs may be
> put back on the wire if responses necessitate it (redirection or
> clearing counters).  If enough MADs are put back onto the wire,
> perfmgr_send_mad() will throttle as normal, temporarily moving
> between PERFMGR_SWEEP_SUSPENDED and PERFMGR_SWEEP_ACTIVE.  After
> the throttling is complete, sweep_state is put into
> PERFMGR_SWEEP_ACTIVE state.
> 
> This is the key problem, the sweep_state is changed from
> PERFMGR_SWEEP_SLEEP to PERFMGR_SWEEP_ACTIVE outside of
> osm_perfmgr_process().
> 
> Now that the perfmgr is in ACTIVE state, any future sweep call to
> osm_perfmgr_process() will not sweep b/c the sweep_state is set
> to PERFMGR_SWEEP_ACTIVE.
> 
> The introduction of a new sweep_state PERFMGR_SWEEP_POST_PROCESSING
> fixes this problem.
> 
> If perfmgr_send_mad() throttles mads while in PERFMGR_SWEEP_SLEEP.
> sweep_state will be moved into the PERFMGR_SWEEP_POST_PROCESSING
> state instead of PERFMGR_SWEEP_SUSPENDED/PERFMGR_SWEEP_ACTIVE.
> 
> When all post-SLEEP state MAD processing is complete, the sweep_state
> will move from PERFMGR_SWEEP_POST_PROCESSING back to PERFMGR_SWEEP_SLEEP,
> so that future sweeps can operate as normal.
> 
> Signed-off-by: Albert L. Chu <chu11@llnl.gov>
> ---
>  include/opensm/osm_perfmgr.h |    6 +++++-
>  opensm/osm_perfmgr.c         |   21 +++++++++++++++++++--
>  2 files changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/include/opensm/osm_perfmgr.h b/include/opensm/osm_perfmgr.h
> index ab02c26..44a278d 100644
> --- a/include/opensm/osm_perfmgr.h
> +++ b/include/opensm/osm_perfmgr.h
> @@ -88,7 +88,8 @@ typedef enum {
>  typedef enum {
>  	PERFMGR_SWEEP_SLEEP,
>  	PERFMGR_SWEEP_ACTIVE,
> -	PERFMGR_SWEEP_SUSPENDED
> +	PERFMGR_SWEEP_SUSPENDED,
> +	PERFMGR_SWEEP_POST_PROCESSING
>  } osm_perfmgr_sweep_state_t;
>  
>  typedef struct monitored_port {
> @@ -239,6 +240,9 @@ inline static const char *osm_perfmgr_get_sweep_state_str(osm_perfmgr_t * perfmg
>  	case PERFMGR_SWEEP_SUSPENDED:
>  		return "Suspended";
>  		break;
> +	case PERFMGR_SWEEP_POST_PROCESSING:
> +		return "PostProcessing";
> +		break;
>  	}
>  	return "UNKNOWN";
>  }
> diff --git a/opensm/osm_perfmgr.c b/opensm/osm_perfmgr.c
> index bfa63dc..863e9bf 100644
> --- a/opensm/osm_perfmgr.c
> +++ b/opensm/osm_perfmgr.c
> @@ -164,6 +164,16 @@ static inline void decrement_outstanding_queries(osm_perfmgr_t * pm)
>  {
>  	cl_atomic_dec(&pm->outstanding_queries);
>  	cl_event_signal(&pm->sig_query);
> +
> +	if (!pm->outstanding_queries) {
> +		cl_spinlock_acquire(&pm->lock);
> +		if (pm->sweep_state == PERFMGR_SWEEP_POST_PROCESSING) {
> +			pm->sweep_state = PERFMGR_SWEEP_SLEEP;
> +			OSM_LOG(pm->log, OSM_LOG_INFO,
> +				"PM sweep state exiting Post Processing\n");
> +		}
> +		cl_spinlock_release(&pm->lock);
> +	}

Shouldn't this potential sweep state update occur prior to the event
being signaled ?

-- Hal

>  }
>  
>  /**********************************************************************
> @@ -439,7 +449,13 @@ static ib_api_status_t perfmgr_send_mad(osm_perfmgr_t *perfmgr,
>  		while (perfmgr->outstanding_queries >
>  		       (int32_t)perfmgr->max_outstanding_queries) {
>  			cl_spinlock_acquire(&perfmgr->lock);
> -			perfmgr->sweep_state = PERFMGR_SWEEP_SUSPENDED;
> +			if (perfmgr->sweep_state == PERFMGR_SWEEP_SLEEP) {
> +				perfmgr->sweep_state = PERFMGR_SWEEP_POST_PROCESSING;
> +				OSM_LOG(perfmgr->log, OSM_LOG_INFO,
> +					"PM sweep state going into Post Processing\n");
> +			}
> +			else if (perfmgr->sweep_state == PERFMGR_SWEEP_ACTIVE)
> +				perfmgr->sweep_state = PERFMGR_SWEEP_SUSPENDED;
>  			cl_spinlock_release(&perfmgr->lock);
>  wait:
>  			sts = cl_event_wait_on(&perfmgr->sig_query,
> @@ -1015,7 +1031,8 @@ void osm_perfmgr_process(osm_perfmgr_t * pm)
>  
>  	cl_spinlock_acquire(&pm->lock);
>  	if (pm->sweep_state == PERFMGR_SWEEP_ACTIVE ||
> -	    pm->sweep_state == PERFMGR_SWEEP_SUSPENDED) {
> +	    pm->sweep_state == PERFMGR_SWEEP_SUSPENDED ||
> +	    pm->sweep_state == PERFMGR_SWEEP_POST_PROCESSING) {
>  		cl_spinlock_release(&pm->lock);
>  		OSM_LOG(pm->log, OSM_LOG_INFO,
>  			"PM sweep state %d, skipping sweep\n",

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Al Chu June 30, 2014, 5:05 p.m. UTC | #2
Hey Hal,

Thanks.  You're right about the tweak below.  I'll resubmit the patch.

Al

On Fri, 2014-06-27 at 17:36 -0400, Hal Rosenstock wrote:
> On 6/27/2014 1:56 PM, Albert Chu wrote:
> > Introduce new sweep state PERFMGR_SWEEP_POST_PROCESSING to fix
> > race in perfmgr.
> > 
> > Race occurs as follows:
> 
> Nice find!
> 
> > 
> > Under typical conditions, osm_perfmgr_process() is entered
> > with sweep_state set to PERFMGR_SWEEP_SLEEP.  osm_perfmgr_process()
> > sets sweep_state to PERFMGR_SWEEP_ACTIVE when it begins to sweep.
> > 
> > osm_perfmgr_process() will eventually call perfmgr_send_mad() by
> > way of perfmgr_query_counters() and several other functions.
> > 
> > Responses to performance counter MADs may initiate the sending
> > of more MADs via perfmgr_send_mad(), such as through redirection
> > or the desire to clear counters.
> > 
> > If too many MADs have been put on the wire, perfmgr_send_mad()
> > will throttle sending out MADS and temporarily change sweep_state
> > between PERFMGR_SWEEP_SUSPENDED and PERFMGR_SWEEP_ACTIVE as it
> > throttles.  The sweep_state is set to PERFMGR_SWEEP_ACTIVE
> > when all performance counter MADs have been sent out by the sweeper.
> > 
> > osm_perfmgr_process() eventually completes its sweep and puts
> > sweep_state back into PERFMGR_SWEEP_SLEEP.
> > 
> > At this point, some MADs may still be on the wire.  New MADs may be
> > put back on the wire if responses necessitate it (redirection or
> > clearing counters).  If enough MADs are put back onto the wire,
> > perfmgr_send_mad() will throttle as normal, temporarily moving
> > between PERFMGR_SWEEP_SUSPENDED and PERFMGR_SWEEP_ACTIVE.  After
> > the throttling is complete, sweep_state is put into
> > PERFMGR_SWEEP_ACTIVE state.
> > 
> > This is the key problem, the sweep_state is changed from
> > PERFMGR_SWEEP_SLEEP to PERFMGR_SWEEP_ACTIVE outside of
> > osm_perfmgr_process().
> > 
> > Now that the perfmgr is in ACTIVE state, any future sweep call to
> > osm_perfmgr_process() will not sweep b/c the sweep_state is set
> > to PERFMGR_SWEEP_ACTIVE.
> > 
> > The introduction of a new sweep_state PERFMGR_SWEEP_POST_PROCESSING
> > fixes this problem.
> > 
> > If perfmgr_send_mad() throttles mads while in PERFMGR_SWEEP_SLEEP.
> > sweep_state will be moved into the PERFMGR_SWEEP_POST_PROCESSING
> > state instead of PERFMGR_SWEEP_SUSPENDED/PERFMGR_SWEEP_ACTIVE.
> > 
> > When all post-SLEEP state MAD processing is complete, the sweep_state
> > will move from PERFMGR_SWEEP_POST_PROCESSING back to PERFMGR_SWEEP_SLEEP,
> > so that future sweeps can operate as normal.
> > 
> > Signed-off-by: Albert L. Chu <chu11@llnl.gov>
> > ---
> >  include/opensm/osm_perfmgr.h |    6 +++++-
> >  opensm/osm_perfmgr.c         |   21 +++++++++++++++++++--
> >  2 files changed, 24 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/opensm/osm_perfmgr.h b/include/opensm/osm_perfmgr.h
> > index ab02c26..44a278d 100644
> > --- a/include/opensm/osm_perfmgr.h
> > +++ b/include/opensm/osm_perfmgr.h
> > @@ -88,7 +88,8 @@ typedef enum {
> >  typedef enum {
> >  	PERFMGR_SWEEP_SLEEP,
> >  	PERFMGR_SWEEP_ACTIVE,
> > -	PERFMGR_SWEEP_SUSPENDED
> > +	PERFMGR_SWEEP_SUSPENDED,
> > +	PERFMGR_SWEEP_POST_PROCESSING
> >  } osm_perfmgr_sweep_state_t;
> >  
> >  typedef struct monitored_port {
> > @@ -239,6 +240,9 @@ inline static const char *osm_perfmgr_get_sweep_state_str(osm_perfmgr_t * perfmg
> >  	case PERFMGR_SWEEP_SUSPENDED:
> >  		return "Suspended";
> >  		break;
> > +	case PERFMGR_SWEEP_POST_PROCESSING:
> > +		return "PostProcessing";
> > +		break;
> >  	}
> >  	return "UNKNOWN";
> >  }
> > diff --git a/opensm/osm_perfmgr.c b/opensm/osm_perfmgr.c
> > index bfa63dc..863e9bf 100644
> > --- a/opensm/osm_perfmgr.c
> > +++ b/opensm/osm_perfmgr.c
> > @@ -164,6 +164,16 @@ static inline void decrement_outstanding_queries(osm_perfmgr_t * pm)
> >  {
> >  	cl_atomic_dec(&pm->outstanding_queries);
> >  	cl_event_signal(&pm->sig_query);
> > +
> > +	if (!pm->outstanding_queries) {
> > +		cl_spinlock_acquire(&pm->lock);
> > +		if (pm->sweep_state == PERFMGR_SWEEP_POST_PROCESSING) {
> > +			pm->sweep_state = PERFMGR_SWEEP_SLEEP;
> > +			OSM_LOG(pm->log, OSM_LOG_INFO,
> > +				"PM sweep state exiting Post Processing\n");
> > +		}
> > +		cl_spinlock_release(&pm->lock);
> > +	}
> 
> Shouldn't this potential sweep state update occur prior to the event
> being signaled ?
> 
> -- Hal
> 
> >  }
> >  
> >  /**********************************************************************
> > @@ -439,7 +449,13 @@ static ib_api_status_t perfmgr_send_mad(osm_perfmgr_t *perfmgr,
> >  		while (perfmgr->outstanding_queries >
> >  		       (int32_t)perfmgr->max_outstanding_queries) {
> >  			cl_spinlock_acquire(&perfmgr->lock);
> > -			perfmgr->sweep_state = PERFMGR_SWEEP_SUSPENDED;
> > +			if (perfmgr->sweep_state == PERFMGR_SWEEP_SLEEP) {
> > +				perfmgr->sweep_state = PERFMGR_SWEEP_POST_PROCESSING;
> > +				OSM_LOG(perfmgr->log, OSM_LOG_INFO,
> > +					"PM sweep state going into Post Processing\n");
> > +			}
> > +			else if (perfmgr->sweep_state == PERFMGR_SWEEP_ACTIVE)
> > +				perfmgr->sweep_state = PERFMGR_SWEEP_SUSPENDED;
> >  			cl_spinlock_release(&perfmgr->lock);
> >  wait:
> >  			sts = cl_event_wait_on(&perfmgr->sig_query,
> > @@ -1015,7 +1031,8 @@ void osm_perfmgr_process(osm_perfmgr_t * pm)
> >  
> >  	cl_spinlock_acquire(&pm->lock);
> >  	if (pm->sweep_state == PERFMGR_SWEEP_ACTIVE ||
> > -	    pm->sweep_state == PERFMGR_SWEEP_SUSPENDED) {
> > +	    pm->sweep_state == PERFMGR_SWEEP_SUSPENDED ||
> > +	    pm->sweep_state == PERFMGR_SWEEP_POST_PROCESSING) {
> >  		cl_spinlock_release(&pm->lock);
> >  		OSM_LOG(pm->log, OSM_LOG_INFO,
> >  			"PM sweep state %d, skipping sweep\n",
>
diff mbox

Patch

diff --git a/include/opensm/osm_perfmgr.h b/include/opensm/osm_perfmgr.h
index ab02c26..44a278d 100644
--- a/include/opensm/osm_perfmgr.h
+++ b/include/opensm/osm_perfmgr.h
@@ -88,7 +88,8 @@  typedef enum {
 typedef enum {
 	PERFMGR_SWEEP_SLEEP,
 	PERFMGR_SWEEP_ACTIVE,
-	PERFMGR_SWEEP_SUSPENDED
+	PERFMGR_SWEEP_SUSPENDED,
+	PERFMGR_SWEEP_POST_PROCESSING
 } osm_perfmgr_sweep_state_t;
 
 typedef struct monitored_port {
@@ -239,6 +240,9 @@  inline static const char *osm_perfmgr_get_sweep_state_str(osm_perfmgr_t * perfmg
 	case PERFMGR_SWEEP_SUSPENDED:
 		return "Suspended";
 		break;
+	case PERFMGR_SWEEP_POST_PROCESSING:
+		return "PostProcessing";
+		break;
 	}
 	return "UNKNOWN";
 }
diff --git a/opensm/osm_perfmgr.c b/opensm/osm_perfmgr.c
index bfa63dc..863e9bf 100644
--- a/opensm/osm_perfmgr.c
+++ b/opensm/osm_perfmgr.c
@@ -164,6 +164,16 @@  static inline void decrement_outstanding_queries(osm_perfmgr_t * pm)
 {
 	cl_atomic_dec(&pm->outstanding_queries);
 	cl_event_signal(&pm->sig_query);
+
+	if (!pm->outstanding_queries) {
+		cl_spinlock_acquire(&pm->lock);
+		if (pm->sweep_state == PERFMGR_SWEEP_POST_PROCESSING) {
+			pm->sweep_state = PERFMGR_SWEEP_SLEEP;
+			OSM_LOG(pm->log, OSM_LOG_INFO,
+				"PM sweep state exiting Post Processing\n");
+		}
+		cl_spinlock_release(&pm->lock);
+	}
 }
 
 /**********************************************************************
@@ -439,7 +449,13 @@  static ib_api_status_t perfmgr_send_mad(osm_perfmgr_t *perfmgr,
 		while (perfmgr->outstanding_queries >
 		       (int32_t)perfmgr->max_outstanding_queries) {
 			cl_spinlock_acquire(&perfmgr->lock);
-			perfmgr->sweep_state = PERFMGR_SWEEP_SUSPENDED;
+			if (perfmgr->sweep_state == PERFMGR_SWEEP_SLEEP) {
+				perfmgr->sweep_state = PERFMGR_SWEEP_POST_PROCESSING;
+				OSM_LOG(perfmgr->log, OSM_LOG_INFO,
+					"PM sweep state going into Post Processing\n");
+			}
+			else if (perfmgr->sweep_state == PERFMGR_SWEEP_ACTIVE)
+				perfmgr->sweep_state = PERFMGR_SWEEP_SUSPENDED;
 			cl_spinlock_release(&perfmgr->lock);
 wait:
 			sts = cl_event_wait_on(&perfmgr->sig_query,
@@ -1015,7 +1031,8 @@  void osm_perfmgr_process(osm_perfmgr_t * pm)
 
 	cl_spinlock_acquire(&pm->lock);
 	if (pm->sweep_state == PERFMGR_SWEEP_ACTIVE ||
-	    pm->sweep_state == PERFMGR_SWEEP_SUSPENDED) {
+	    pm->sweep_state == PERFMGR_SWEEP_SUSPENDED ||
+	    pm->sweep_state == PERFMGR_SWEEP_POST_PROCESSING) {
 		cl_spinlock_release(&pm->lock);
 		OSM_LOG(pm->log, OSM_LOG_INFO,
 			"PM sweep state %d, skipping sweep\n",