diff mbox series

[16/28] lustre: statahead: missing barrier before wake_up

Message ID 1539543498-29105-17-git-send-email-jsimmons@infradead.org (mailing list archive)
State New, archived
Headers show
Series lustre: more assorted fixes for lustre 2.10 | expand

Commit Message

James Simmons Oct. 14, 2018, 6:58 p.m. UTC
From: Lai Siyao <lai.siyao@whamcloud.com>

A barrier is missing before wake_up() in ll_statahead_interpret(),
which may cause 'ls' hang. Under the right conditions a basic 'ls'
can fail. The debug logs show:

statahead.c:683:ll_statahead_interpret()) sa_entry software rc -13
statahead.c:1666:ll_statahead()) revalidate statahead software: -11.

Obviously statahead failure didn't notify 'ls' process in time.
The mi_cbdata can be stale so add a barrier before calling
wake_up().

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: Bob Glossman <bob.glossman@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-9210
Reviewed-on: https://review.whamcloud.com/27330
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/llite/statahead.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

NeilBrown Oct. 18, 2018, 2 a.m. UTC | #1
On Sun, Oct 14 2018, James Simmons wrote:

> From: Lai Siyao <lai.siyao@whamcloud.com>
>
> A barrier is missing before wake_up() in ll_statahead_interpret(),
> which may cause 'ls' hang. Under the right conditions a basic 'ls'
> can fail. The debug logs show:
>
> statahead.c:683:ll_statahead_interpret()) sa_entry software rc -13
> statahead.c:1666:ll_statahead()) revalidate statahead software: -11.
>
> Obviously statahead failure didn't notify 'ls' process in time.
> The mi_cbdata can be stale so add a barrier before calling
> wake_up().
>
> Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
> Signed-off-by: Bob Glossman <bob.glossman@intel.com>
> WC-bug-id: https://jira.whamcloud.com/browse/LU-9210
> Reviewed-on: https://review.whamcloud.com/27330
> Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
> Reviewed-by: Oleg Drokin <green@whamcloud.com>
> Signed-off-by: James Simmons <jsimmons@infradead.org>
> ---
>  drivers/staging/lustre/lustre/llite/statahead.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
> index 1ad308c..0174a4c 100644
> --- a/drivers/staging/lustre/lustre/llite/statahead.c
> +++ b/drivers/staging/lustre/lustre/llite/statahead.c
> @@ -680,8 +680,14 @@ static int ll_statahead_interpret(struct ptlrpc_request *req,
>  
>  	spin_lock(&lli->lli_sa_lock);
>  	if (rc) {
> -		if (__sa_make_ready(sai, entry, rc))
> +		if (__sa_make_ready(sai, entry, rc)) {
> +			/* LU-9210 : Under the right conditions even 'ls'
> +			 * can cause the statahead to fail. Using a memory
> +			 * barrier resolves this issue.
> +			 */
> +			smp_mb();
>  			wake_up(&sai->sai_waitq);
> +		}
>  	} else {
>  		int first = 0;
>  		entry->se_minfo = minfo;
> -- 
> 1.8.3.1

Again, this is a fairly lame comment to justify the smp_mb().
It appears to me that the issue is most likely the value of
entry->se_state.
__sa_make_ready() sets this and revalidate_statahead_dentry tests it
after waiting on sai_waitq.
So I think it would be best if we changed __sa_make_ready() to

	smp_store_release(&entry->se_state, ret < 0 ? SA_ENTRY_INVA : SA_ENTRY_SUCC)

and in ll_statahead_interpret() have

	if (smp_load_acquire(&entry->se_state) == SA_ENTRY_SUCC &&
            entry->se_inode) {

This would make it obvious which variable was important, and would show
the paired synchronization points.

NeilBrown
James Simmons Oct. 21, 2018, 10:52 p.m. UTC | #2
> On Sun, Oct 14 2018, James Simmons wrote:
> 
> > From: Lai Siyao <lai.siyao@whamcloud.com>
> >
> > A barrier is missing before wake_up() in ll_statahead_interpret(),
> > which may cause 'ls' hang. Under the right conditions a basic 'ls'
> > can fail. The debug logs show:
> >
> > statahead.c:683:ll_statahead_interpret()) sa_entry software rc -13
> > statahead.c:1666:ll_statahead()) revalidate statahead software: -11.
> >
> > Obviously statahead failure didn't notify 'ls' process in time.
> > The mi_cbdata can be stale so add a barrier before calling
> > wake_up().
> >
> > Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
> > Signed-off-by: Bob Glossman <bob.glossman@intel.com>
> > WC-bug-id: https://jira.whamcloud.com/browse/LU-9210
> > Reviewed-on: https://review.whamcloud.com/27330
> > Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
> > Reviewed-by: Oleg Drokin <green@whamcloud.com>
> > Signed-off-by: James Simmons <jsimmons@infradead.org>
> > ---
> >  drivers/staging/lustre/lustre/llite/statahead.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
> > index 1ad308c..0174a4c 100644
> > --- a/drivers/staging/lustre/lustre/llite/statahead.c
> > +++ b/drivers/staging/lustre/lustre/llite/statahead.c
> > @@ -680,8 +680,14 @@ static int ll_statahead_interpret(struct ptlrpc_request *req,
> >  
> >  	spin_lock(&lli->lli_sa_lock);
> >  	if (rc) {
> > -		if (__sa_make_ready(sai, entry, rc))
> > +		if (__sa_make_ready(sai, entry, rc)) {
> > +			/* LU-9210 : Under the right conditions even 'ls'
> > +			 * can cause the statahead to fail. Using a memory
> > +			 * barrier resolves this issue.
> > +			 */
> > +			smp_mb();
> >  			wake_up(&sai->sai_waitq);
> > +		}
> >  	} else {
> >  		int first = 0;
> >  		entry->se_minfo = minfo;
> > -- 
> > 1.8.3.1
> 
> Again, this is a fairly lame comment to justify the smp_mb().
> It appears to me that the issue is most likely the value of
> entry->se_state.
> __sa_make_ready() sets this and revalidate_statahead_dentry tests it
> after waiting on sai_waitq.
> So I think it would be best if we changed __sa_make_ready() to
> 
> 	smp_store_release(&entry->se_state, ret < 0 ? SA_ENTRY_INVA : SA_ENTRY_SUCC)
> 
> and in ll_statahead_interpret() have
> 
> 	if (smp_load_acquire(&entry->se_state) == SA_ENTRY_SUCC &&
>             entry->se_inode) {
> 
> This would make it obvious which variable was important, and would show
> the paired synchronization points.

If you think this is lame you should be the JIRA ticket and the original 
patch. It had zero info so I attempted to extract what I could out of the
ticket. Hopefully Lai can fill in the details. I have no problems fixing 
this another way. I don't see a way in the ticket to easily reproduce this
problem to see the new approach would fix it :-(
NeilBrown Oct. 22, 2018, 4:04 a.m. UTC | #3
On Sun, Oct 21 2018, James Simmons wrote:

>> On Sun, Oct 14 2018, James Simmons wrote:
>> 
>> > From: Lai Siyao <lai.siyao@whamcloud.com>
>> >
>> > A barrier is missing before wake_up() in ll_statahead_interpret(),
>> > which may cause 'ls' hang. Under the right conditions a basic 'ls'
>> > can fail. The debug logs show:
>> >
>> > statahead.c:683:ll_statahead_interpret()) sa_entry software rc -13
>> > statahead.c:1666:ll_statahead()) revalidate statahead software: -11.
>> >
>> > Obviously statahead failure didn't notify 'ls' process in time.
>> > The mi_cbdata can be stale so add a barrier before calling
>> > wake_up().
>> >
>> > Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
>> > Signed-off-by: Bob Glossman <bob.glossman@intel.com>
>> > WC-bug-id: https://jira.whamcloud.com/browse/LU-9210
>> > Reviewed-on: https://review.whamcloud.com/27330
>> > Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
>> > Reviewed-by: Oleg Drokin <green@whamcloud.com>
>> > Signed-off-by: James Simmons <jsimmons@infradead.org>
>> > ---
>> >  drivers/staging/lustre/lustre/llite/statahead.c | 8 +++++++-
>> >  1 file changed, 7 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
>> > index 1ad308c..0174a4c 100644
>> > --- a/drivers/staging/lustre/lustre/llite/statahead.c
>> > +++ b/drivers/staging/lustre/lustre/llite/statahead.c
>> > @@ -680,8 +680,14 @@ static int ll_statahead_interpret(struct ptlrpc_request *req,
>> >  
>> >  	spin_lock(&lli->lli_sa_lock);
>> >  	if (rc) {
>> > -		if (__sa_make_ready(sai, entry, rc))
>> > +		if (__sa_make_ready(sai, entry, rc)) {
>> > +			/* LU-9210 : Under the right conditions even 'ls'
>> > +			 * can cause the statahead to fail. Using a memory
>> > +			 * barrier resolves this issue.
>> > +			 */
>> > +			smp_mb();
>> >  			wake_up(&sai->sai_waitq);
>> > +		}
>> >  	} else {
>> >  		int first = 0;
>> >  		entry->se_minfo = minfo;
>> > -- 
>> > 1.8.3.1
>> 
>> Again, this is a fairly lame comment to justify the smp_mb().
>> It appears to me that the issue is most likely the value of
>> entry->se_state.
>> __sa_make_ready() sets this and revalidate_statahead_dentry tests it
>> after waiting on sai_waitq.
>> So I think it would be best if we changed __sa_make_ready() to
>> 
>> 	smp_store_release(&entry->se_state, ret < 0 ? SA_ENTRY_INVA : SA_ENTRY_SUCC)
>> 
>> and in ll_statahead_interpret() have
>> 
>> 	if (smp_load_acquire(&entry->se_state) == SA_ENTRY_SUCC &&
>>             entry->se_inode) {
>> 
>> This would make it obvious which variable was important, and would show
>> the paired synchronization points.
>
> If you think this is lame you should be the JIRA ticket and the original 
> patch. It had zero info so I attempted to extract what I could out of the
> ticket. Hopefully Lai can fill in the details. I have no problems fixing 
> this another way. I don't see a way in the ticket to easily reproduce this
> problem to see the new approach would fix it :-(

I have imposed the version that I think is correct.  See below.

Thanks,
NeilBrown

--- a/drivers/staging/lustre/lustre/llite/statahead.c
+++ b/drivers/staging/lustre/lustre/llite/statahead.c
@@ -322,7 +322,11 @@ __sa_make_ready(struct ll_statahead_info *sai, struct sa_entry *entry, int ret)
 		}
 	}
 	list_add(&entry->se_list, pos);
-	entry->se_state = ret < 0 ? SA_ENTRY_INVA : SA_ENTRY_SUCC;
+	/*
+	 * LU-9210: ll_statahead_interpet must be able to see this before
+	 * we wake it up
+	 */
+	smp_store_release(&entry->se_state, ret < 0 ? SA_ENTRY_INVA : SA_ENTRY_SUCC);
 
 	return (index == sai->sai_index_wait);
 }
@@ -1390,7 +1394,12 @@ static int revalidate_statahead_dentry(struct inode *dir,
 		}
 	}
 
-	if (entry->se_state == SA_ENTRY_SUCC && entry->se_inode) {
+	/*
+	 * We need to see the value that was set immediately before we
+	 * were woken up.
+	 */
+	if (smp_load_acquire(&entry->se_state) == SA_ENTRY_SUCC &&
+	    entry->se_inode) {
 		struct inode *inode = entry->se_inode;
 		struct lookup_intent it = { .it_op = IT_GETATTR,
 					    .it_lock_handle = entry->se_handle };
James Simmons Nov. 4, 2018, 8:52 p.m. UTC | #4
> On Sun, Oct 21 2018, James Simmons wrote:
> 
> >> On Sun, Oct 14 2018, James Simmons wrote:
> >> 
> >> > From: Lai Siyao <lai.siyao@whamcloud.com>
> >> >
> >> > A barrier is missing before wake_up() in ll_statahead_interpret(),
> >> > which may cause 'ls' hang. Under the right conditions a basic 'ls'
> >> > can fail. The debug logs show:
> >> >
> >> > statahead.c:683:ll_statahead_interpret()) sa_entry software rc -13
> >> > statahead.c:1666:ll_statahead()) revalidate statahead software: -11.
> >> >
> >> > Obviously statahead failure didn't notify 'ls' process in time.
> >> > The mi_cbdata can be stale so add a barrier before calling
> >> > wake_up().
> >> >
> >> > Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
> >> > Signed-off-by: Bob Glossman <bob.glossman@intel.com>
> >> > WC-bug-id: https://jira.whamcloud.com/browse/LU-9210
> >> > Reviewed-on: https://review.whamcloud.com/27330
> >> > Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
> >> > Reviewed-by: Oleg Drokin <green@whamcloud.com>
> >> > Signed-off-by: James Simmons <jsimmons@infradead.org>
> >> > ---
> >> >  drivers/staging/lustre/lustre/llite/statahead.c | 8 +++++++-
> >> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
> >> > index 1ad308c..0174a4c 100644
> >> > --- a/drivers/staging/lustre/lustre/llite/statahead.c
> >> > +++ b/drivers/staging/lustre/lustre/llite/statahead.c
> >> > @@ -680,8 +680,14 @@ static int ll_statahead_interpret(struct ptlrpc_request *req,
> >> >  
> >> >  	spin_lock(&lli->lli_sa_lock);
> >> >  	if (rc) {
> >> > -		if (__sa_make_ready(sai, entry, rc))
> >> > +		if (__sa_make_ready(sai, entry, rc)) {
> >> > +			/* LU-9210 : Under the right conditions even 'ls'
> >> > +			 * can cause the statahead to fail. Using a memory
> >> > +			 * barrier resolves this issue.
> >> > +			 */
> >> > +			smp_mb();
> >> >  			wake_up(&sai->sai_waitq);
> >> > +		}
> >> >  	} else {
> >> >  		int first = 0;
> >> >  		entry->se_minfo = minfo;
> >> > -- 
> >> > 1.8.3.1
> >> 
> >> Again, this is a fairly lame comment to justify the smp_mb().
> >> It appears to me that the issue is most likely the value of
> >> entry->se_state.
> >> __sa_make_ready() sets this and revalidate_statahead_dentry tests it
> >> after waiting on sai_waitq.
> >> So I think it would be best if we changed __sa_make_ready() to
> >> 
> >> 	smp_store_release(&entry->se_state, ret < 0 ? SA_ENTRY_INVA : SA_ENTRY_SUCC)
> >> 
> >> and in ll_statahead_interpret() have
> >> 
> >> 	if (smp_load_acquire(&entry->se_state) == SA_ENTRY_SUCC &&
> >>             entry->se_inode) {
> >> 
> >> This would make it obvious which variable was important, and would show
> >> the paired synchronization points.
> >
> > If you think this is lame you should be the JIRA ticket and the original 
> > patch. It had zero info so I attempted to extract what I could out of the
> > ticket. Hopefully Lai can fill in the details. I have no problems fixing 
> > this another way. I don't see a way in the ticket to easily reproduce this
> > problem to see the new approach would fix it :-(
> 
> I have imposed the version that I think is correct.  See below.

I opened a ticket - https://jira.whamcloud.com/browse/LU-11616 and pushed 
this to the OpenSFS branch for full testing. The patch is at:

https://review.whamcloud.com/#/c/33571

BTW I did not know about (total store order) TSO platforms and how some 
architectures don't support this property. An audit of the smp barriers
might be a good idea for Lustre. For people not aware of TSO a good
article on this is at:

https://lwn.net/Articles/576486
 
> Thanks,
> NeilBrown
> 
> --- a/drivers/staging/lustre/lustre/llite/statahead.c
> +++ b/drivers/staging/lustre/lustre/llite/statahead.c
> @@ -322,7 +322,11 @@ __sa_make_ready(struct ll_statahead_info *sai, struct sa_entry *entry, int ret)
>  		}
>  	}
>  	list_add(&entry->se_list, pos);
> -	entry->se_state = ret < 0 ? SA_ENTRY_INVA : SA_ENTRY_SUCC;
> +	/*
> +	 * LU-9210: ll_statahead_interpet must be able to see this before
> +	 * we wake it up
> +	 */
> +	smp_store_release(&entry->se_state, ret < 0 ? SA_ENTRY_INVA : SA_ENTRY_SUCC);
>  
>  	return (index == sai->sai_index_wait);
>  }
> @@ -1390,7 +1394,12 @@ static int revalidate_statahead_dentry(struct inode *dir,
>  		}
>  	}
>  
> -	if (entry->se_state == SA_ENTRY_SUCC && entry->se_inode) {
> +	/*
> +	 * We need to see the value that was set immediately before we
> +	 * were woken up.
> +	 */
> +	if (smp_load_acquire(&entry->se_state) == SA_ENTRY_SUCC &&
> +	    entry->se_inode) {
>  		struct inode *inode = entry->se_inode;
>  		struct lookup_intent it = { .it_op = IT_GETATTR,
>  					    .it_lock_handle = entry->se_handle };
>
diff mbox series

Patch

diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
index 1ad308c..0174a4c 100644
--- a/drivers/staging/lustre/lustre/llite/statahead.c
+++ b/drivers/staging/lustre/lustre/llite/statahead.c
@@ -680,8 +680,14 @@  static int ll_statahead_interpret(struct ptlrpc_request *req,
 
 	spin_lock(&lli->lli_sa_lock);
 	if (rc) {
-		if (__sa_make_ready(sai, entry, rc))
+		if (__sa_make_ready(sai, entry, rc)) {
+			/* LU-9210 : Under the right conditions even 'ls'
+			 * can cause the statahead to fail. Using a memory
+			 * barrier resolves this issue.
+			 */
+			smp_mb();
 			wake_up(&sai->sai_waitq);
+		}
 	} else {
 		int first = 0;
 		entry->se_minfo = minfo;