lustre: lu_object: fix possible hang waiting for LCS_LEAVING
diff mbox series

Message ID 87d0s070nq.fsf@notabene.neil.brown.name
State New
Headers show
Series
  • lustre: lu_object: fix possible hang waiting for LCS_LEAVING
Related show

Commit Message

NeilBrown Oct. 23, 2018, 10:34 p.m. UTC
As lu_context_key_quiesce() spins waiting for LCS_LEAVING to
change, it is important the we set and then clear in within a
non-preemptible region.  If the thread that spins pre-empty the
thread that sets-and-clears the state while the state is LCS_LEAVING,
then it can spin indefinitely, particularly on a single-CPU machine.

Also update the comment to explain this dependency.

Fixes: ac3f8fd6e61b ("staging: lustre: remove locking from lu_context_exit()")
---

This is the cause of the "something" that went wrong in my recent
testing that I mentioned.  I wonder if preempt_enable() has recently
been enhanced to encourage a preempt, to make this sort of bug easier to
see.


 drivers/staging/lustre/lustre/obdclass/lu_object.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

Comments

James Simmons Oct. 29, 2018, 3:31 a.m. UTC | #1
> As lu_context_key_quiesce() spins waiting for LCS_LEAVING to
> change, it is important the we set and then clear in within a
> non-preemptible region.  If the thread that spins pre-empty the
> thread that sets-and-clears the state while the state is LCS_LEAVING,
> then it can spin indefinitely, particularly on a single-CPU machine.
> 
> Also update the comment to explain this dependency.
> 
> Fixes: ac3f8fd6e61b ("staging: lustre: remove locking from lu_context_exit()")
> ---
> 
> This is the cause of the "something" that went wrong in my recent
> testing that I mentioned.  I wonder if preempt_enable() has recently
> been enhanced to encourage a preempt, to make this sort of bug easier to
> see.
> 

Reduced my cpu load :-)

Reviewed-by: James Simmons <jsimmons@infradead.org>
 
>  drivers/staging/lustre/lustre/obdclass/lu_object.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
> index cb57abf03644..51497c144dd6 100644
> --- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
> +++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
> @@ -1654,17 +1654,20 @@ void lu_context_exit(struct lu_context *ctx)
>  	unsigned int i;
>  
>  	LINVRNT(ctx->lc_state == LCS_ENTERED);
> -	/*
> -	 * Ensure lu_context_key_quiesce() sees LCS_LEAVING
> -	 * or we see LCT_QUIESCENT
> -	 */
> -	smp_store_mb(ctx->lc_state, LCS_LEAVING);
>  	/*
>  	 * Disable preempt to ensure we get a warning if
>  	 * any lct_exit ever tries to sleep.  That would hurt
>  	 * lu_context_key_quiesce() which spins waiting for us.
> +	 * This also ensure we aren't preempted while the state
> +	 * is LCS_LEAVING, as that too would cause problems for
> +	 * lu_context_key_quiesce().
>  	 */
>  	preempt_disable();
> +	/*
> +	 * Ensure lu_context_key_quiesce() sees LCS_LEAVING
> +	 * or we see LCT_QUIESCENT
> +	 */
> +	smp_store_mb(ctx->lc_state, LCS_LEAVING);
>  	if (ctx->lc_tags & LCT_HAS_EXIT && ctx->lc_value) {
>  		for (i = 0; i < ARRAY_SIZE(lu_keys); ++i) {
>  			struct lu_context_key *key;
> @@ -1677,8 +1680,8 @@ void lu_context_exit(struct lu_context *ctx)
>  		}
>  	}
>  
> -	preempt_enable();
>  	smp_store_release(&ctx->lc_state, LCS_LEFT);
> +	preempt_enable();
>  }
>  EXPORT_SYMBOL(lu_context_exit);
>  
> -- 
> 2.14.0.rc0.dirty
> 
>
NeilBrown Oct. 29, 2018, 4:31 a.m. UTC | #2
On Mon, Oct 29 2018, James Simmons wrote:

>> As lu_context_key_quiesce() spins waiting for LCS_LEAVING to
>> change, it is important the we set and then clear in within a
>> non-preemptible region.  If the thread that spins pre-empty the
>> thread that sets-and-clears the state while the state is LCS_LEAVING,
>> then it can spin indefinitely, particularly on a single-CPU machine.
>> 
>> Also update the comment to explain this dependency.
>> 
>> Fixes: ac3f8fd6e61b ("staging: lustre: remove locking from lu_context_exit()")
>> ---
>> 
>> This is the cause of the "something" that went wrong in my recent
>> testing that I mentioned.  I wonder if preempt_enable() has recently
>> been enhanced to encourage a preempt, to make this sort of bug easier to
>> see.
>> 
>
> Reduced my cpu load :-)
>
> Reviewed-by: James Simmons <jsimmons@infradead.org>

Thanks,
NeilBrown


>  
>>  drivers/staging/lustre/lustre/obdclass/lu_object.c | 15 +++++++++------
>>  1 file changed, 9 insertions(+), 6 deletions(-)
>> 
>> diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
>> index cb57abf03644..51497c144dd6 100644
>> --- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
>> +++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
>> @@ -1654,17 +1654,20 @@ void lu_context_exit(struct lu_context *ctx)
>>  	unsigned int i;
>>  
>>  	LINVRNT(ctx->lc_state == LCS_ENTERED);
>> -	/*
>> -	 * Ensure lu_context_key_quiesce() sees LCS_LEAVING
>> -	 * or we see LCT_QUIESCENT
>> -	 */
>> -	smp_store_mb(ctx->lc_state, LCS_LEAVING);
>>  	/*
>>  	 * Disable preempt to ensure we get a warning if
>>  	 * any lct_exit ever tries to sleep.  That would hurt
>>  	 * lu_context_key_quiesce() which spins waiting for us.
>> +	 * This also ensure we aren't preempted while the state
>> +	 * is LCS_LEAVING, as that too would cause problems for
>> +	 * lu_context_key_quiesce().
>>  	 */
>>  	preempt_disable();
>> +	/*
>> +	 * Ensure lu_context_key_quiesce() sees LCS_LEAVING
>> +	 * or we see LCT_QUIESCENT
>> +	 */
>> +	smp_store_mb(ctx->lc_state, LCS_LEAVING);
>>  	if (ctx->lc_tags & LCT_HAS_EXIT && ctx->lc_value) {
>>  		for (i = 0; i < ARRAY_SIZE(lu_keys); ++i) {
>>  			struct lu_context_key *key;
>> @@ -1677,8 +1680,8 @@ void lu_context_exit(struct lu_context *ctx)
>>  		}
>>  	}
>>  
>> -	preempt_enable();
>>  	smp_store_release(&ctx->lc_state, LCS_LEFT);
>> +	preempt_enable();
>>  }
>>  EXPORT_SYMBOL(lu_context_exit);
>>  
>> -- 
>> 2.14.0.rc0.dirty
>> 
>>

Patch
diff mbox series

diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
index cb57abf03644..51497c144dd6 100644
--- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
+++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
@@ -1654,17 +1654,20 @@  void lu_context_exit(struct lu_context *ctx)
 	unsigned int i;
 
 	LINVRNT(ctx->lc_state == LCS_ENTERED);
-	/*
-	 * Ensure lu_context_key_quiesce() sees LCS_LEAVING
-	 * or we see LCT_QUIESCENT
-	 */
-	smp_store_mb(ctx->lc_state, LCS_LEAVING);
 	/*
 	 * Disable preempt to ensure we get a warning if
 	 * any lct_exit ever tries to sleep.  That would hurt
 	 * lu_context_key_quiesce() which spins waiting for us.
+	 * This also ensure we aren't preempted while the state
+	 * is LCS_LEAVING, as that too would cause problems for
+	 * lu_context_key_quiesce().
 	 */
 	preempt_disable();
+	/*
+	 * Ensure lu_context_key_quiesce() sees LCS_LEAVING
+	 * or we see LCT_QUIESCENT
+	 */
+	smp_store_mb(ctx->lc_state, LCS_LEAVING);
 	if (ctx->lc_tags & LCT_HAS_EXIT && ctx->lc_value) {
 		for (i = 0; i < ARRAY_SIZE(lu_keys); ++i) {
 			struct lu_context_key *key;
@@ -1677,8 +1680,8 @@  void lu_context_exit(struct lu_context *ctx)
 		}
 	}
 
-	preempt_enable();
 	smp_store_release(&ctx->lc_state, LCS_LEFT);
+	preempt_enable();
 }
 EXPORT_SYMBOL(lu_context_exit);