diff mbox

A special case in write_flush which cause the umount busy

Message ID 87inb0r7cd.fsf@notabene.neil.brown.name (mailing list archive)
State New, archived
Headers show

Commit Message

NeilBrown Feb. 13, 2018, 7:57 p.m. UTC
On Sun, Feb 11 2018, Wang, Alan 1. (NSB - CN/Hangzhou) wrote:

> Hi,
>
> I have a test case on mount/umount on a partition from nfs server side. And encounter a problem of umount busy in a low probability.
>
> The Linux version is 3.10.64 with the patch "sunrpc/cache: make cache flushing more reliable".
> https://patchwork.kernel.org/patch/7410021/
>
> After some analysis and test in many times, I find that when it failed to mount, the time "then" and "now" are different, which caused the last_refresh is far beyond the flush_time. So this cache is not expired and won't be clean at once. 
> Then the ref in cache_head won't be released, and mntput_no_expire didn't be called to decrease the count. That caused the umount busy.
>
> Below are logs in my test.
>
> kernel: [  292.767801] write_flush 1480 then = 249, now = 250
> kernel: [  292.767817] cache_clean 451, cd name nfsd.fh expiry_time 790485253, cd flush_time 249, last_refresh 369, seconds_since_boot 250
> kernel: [  292.767913] write_flush 1480 then = 249, now = 250
> kernel: [  292.767928] cache_clean 451, cd name nfsd.export expiry_time 2049, cd flush_time 249, last_refresh 369, seconds_since_boot 250
> kernel: [  292.773229] do_refcount_check 283 mycount 4
> kernel: [  292.773245] do_umount 1344 retval -16
>
> I think this happens in such case that the exportfs writes the flush with current time, the time of "then". But when seconds_since_boot being called in function write_flush, the time is on the next second, so the "now" is one second after "then".
> Because "then" is less than "now", the flush_time is set directly to original "then", rather than "cd->flush_time + 1".
>
>
> And I want to change the condition as below. I'm not sure it's OK and has no effects to other part.

I think this change is safe.  It is a bit of a hack, but the "flush"
interface was actually very poorly designed and I don't think it *can*
be implemented cleanly.
Maybe we should just ignore the number written in and *always* flush
the cache completely.  That is was it always wanted.

See below.

Bruce: do you have any thoughts on this?

Thanks,
NeilBrown



>
> --------------------------------------------------------------------------
>        then = get_expiry(&bp);
>        now = seconds_since_boot();
>        cd->nextcheck = now;
>        /* Can only set flush_time to 1 second beyond "now", or
>        * possibly 1 second beyond flushtime.  This is because
>        * flush_time never goes backwards so it mustn't get too far
>        * ahead of time.
>        */
>        if (then != now)
>               printk("%s %d then = %d, now = %d\n", __func__, __LINE__, then, now);
> -      if (then >= now) {
> +     if (then >= now - 1) {
>               /* Want to flush everything, so behave like cache_purge() */
>               if (cd->flush_time >= now)
>                      now = cd->flush_time + 1;
>               then = now;
>        }
>
>        cd->flush_time = then;
> --------------------------------------------------------------------------
>
> Best Regards,
> Alan Wang

Comments

J. Bruce Fields Feb. 13, 2018, 9:03 p.m. UTC | #1
On Wed, Feb 14, 2018 at 06:57:06AM +1100, NeilBrown wrote:
> I think this change is safe.  It is a bit of a hack, but the "flush"
> interface was actually very poorly designed and I don't think it *can*
> be implemented cleanly.
> Maybe we should just ignore the number written in and *always* flush
> the cache completely.  That is was it always wanted.
> 
> See below.
> 
> Bruce: do you have any thoughts on this?

I'm fine with ignoring the number--I agree that the chance anyone is
doing anything fancier is vanishingly small.

--b.

> 
> Thanks,
> NeilBrown
> 
> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> index aa36dad32db1..43f117d1547e 100644
> --- a/net/sunrpc/cache.c
> +++ b/net/sunrpc/cache.c
> @@ -1450,7 +1450,7 @@ static ssize_t write_flush(struct file *file, const char __user *buf,
>  			   struct cache_detail *cd)
>  {
>  	char tbuf[20];
> -	char *bp, *ep;
> +	char *ep;
>  	time_t then, now;
>  
>  	if (*ppos || count > sizeof(tbuf)-1)
> @@ -1461,24 +1461,24 @@ static ssize_t write_flush(struct file *file, const char __user *buf,
>  	simple_strtoul(tbuf, &ep, 0);
>  	if (*ep && *ep != '\n')
>  		return -EINVAL;
> +	/* Note that while we check that 'buf' holds a valid number,
> +	 * we always ignore the value and just flush everything.
> +	 * Making use of the number leads to races.
> +	 */
>  
> -	bp = tbuf;
> -	then = get_expiry(&bp);
>  	now = seconds_since_boot();
> -	cd->nextcheck = now;
> -	/* Can only set flush_time to 1 second beyond "now", or
> +	/* Always flush everything, so behave like cache_purge()
> +	 * Can only set flush_time to 1 second beyond "now", or
>  	 * possibly 1 second beyond flushtime.  This is because
>  	 * flush_time never goes backwards so it mustn't get too far
>  	 * ahead of time.
>  	 */
> -	if (then >= now) {
> -		/* Want to flush everything, so behave like cache_purge() */
> -		if (cd->flush_time >= now)
> -			now = cd->flush_time + 1;
> -		then = now;
> -	}
>  
> -	cd->flush_time = then;
> +	if (cd->flush_time >= now)
> +		now = cd->flush_time + 1;
> +
> +	cd->flush_time = now;
> +	cd->nextcheck = now;
>  	cache_flush();
>  
>  	*ppos += count;
> 
> 
> >
> > --------------------------------------------------------------------------
> >        then = get_expiry(&bp);
> >        now = seconds_since_boot();
> >        cd->nextcheck = now;
> >        /* Can only set flush_time to 1 second beyond "now", or
> >        * possibly 1 second beyond flushtime.  This is because
> >        * flush_time never goes backwards so it mustn't get too far
> >        * ahead of time.
> >        */
> >        if (then != now)
> >               printk("%s %d then = %d, now = %d\n", __func__, __LINE__, then, now);
> > -      if (then >= now) {
> > +     if (then >= now - 1) {
> >               /* Want to flush eve


--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index aa36dad32db1..43f117d1547e 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -1450,7 +1450,7 @@  static ssize_t write_flush(struct file *file, const char __user *buf,
 			   struct cache_detail *cd)
 {
 	char tbuf[20];
-	char *bp, *ep;
+	char *ep;
 	time_t then, now;
 
 	if (*ppos || count > sizeof(tbuf)-1)
@@ -1461,24 +1461,24 @@  static ssize_t write_flush(struct file *file, const char __user *buf,
 	simple_strtoul(tbuf, &ep, 0);
 	if (*ep && *ep != '\n')
 		return -EINVAL;
+	/* Note that while we check that 'buf' holds a valid number,
+	 * we always ignore the value and just flush everything.
+	 * Making use of the number leads to races.
+	 */
 
-	bp = tbuf;
-	then = get_expiry(&bp);
 	now = seconds_since_boot();
-	cd->nextcheck = now;
-	/* Can only set flush_time to 1 second beyond "now", or
+	/* Always flush everything, so behave like cache_purge()
+	 * Can only set flush_time to 1 second beyond "now", or
 	 * possibly 1 second beyond flushtime.  This is because
 	 * flush_time never goes backwards so it mustn't get too far
 	 * ahead of time.
 	 */
-	if (then >= now) {
-		/* Want to flush everything, so behave like cache_purge() */
-		if (cd->flush_time >= now)
-			now = cd->flush_time + 1;
-		then = now;
-	}
 
-	cd->flush_time = then;
+	if (cd->flush_time >= now)
+		now = cd->flush_time + 1;
+
+	cd->flush_time = now;
+	cd->nextcheck = now;
 	cache_flush();
 
 	*ppos += count;