From patchwork Tue Feb 13 19:57:06 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10217441 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 2496460467 for ; Tue, 13 Feb 2018 19:57:16 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 08EC328D9D for ; Tue, 13 Feb 2018 19:57:16 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F022D28DB2; Tue, 13 Feb 2018 19:57:15 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, T_TVD_MIME_EPI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7897128D9D for ; Tue, 13 Feb 2018 19:57:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965611AbeBMT5O (ORCPT ); Tue, 13 Feb 2018 14:57:14 -0500 Received: from mx2.suse.de ([195.135.220.15]:53352 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965589AbeBMT5N (ORCPT ); Tue, 13 Feb 2018 14:57:13 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de X-Amavis-Alert: BAD HEADER SECTION, Duplicate header field: "To" Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 8A133AF37; Tue, 13 Feb 2018 19:57:12 +0000 (UTC) From: NeilBrown To: "Wang\, Alan 1. \(NSB - CN\/Hangzhou\)" , "linux-nfs\@vger.kernel.org" to: "J. Bruce Fields" Date: Wed, 14 Feb 2018 06:57:06 +1100 Subject: Re: A special case in write_flush which cause the umount busy In-Reply-To: References: Message-ID: <87inb0r7cd.fsf@notabene.neil.brown.name> MIME-Version: 1.0 Sender: linux-nfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Sun, Feb 11 2018, Wang, Alan 1. (NSB - CN/Hangzhou) wrote: > Hi, > > I have a test case on mount/umount on a partition from nfs server side. And encounter a problem of umount busy in a low probability. > > The Linux version is 3.10.64 with the patch "sunrpc/cache: make cache flushing more reliable". > https://patchwork.kernel.org/patch/7410021/ > > After some analysis and test in many times, I find that when it failed to mount, the time "then" and "now" are different, which caused the last_refresh is far beyond the flush_time. So this cache is not expired and won't be clean at once. > Then the ref in cache_head won't be released, and mntput_no_expire didn't be called to decrease the count. That caused the umount busy. > > Below are logs in my test. > > kernel: [ 292.767801] write_flush 1480 then = 249, now = 250 > kernel: [ 292.767817] cache_clean 451, cd name nfsd.fh expiry_time 790485253, cd flush_time 249, last_refresh 369, seconds_since_boot 250 > kernel: [ 292.767913] write_flush 1480 then = 249, now = 250 > kernel: [ 292.767928] cache_clean 451, cd name nfsd.export expiry_time 2049, cd flush_time 249, last_refresh 369, seconds_since_boot 250 > kernel: [ 292.773229] do_refcount_check 283 mycount 4 > kernel: [ 292.773245] do_umount 1344 retval -16 > > I think this happens in such case that the exportfs writes the flush with current time, the time of "then". But when seconds_since_boot being called in function write_flush, the time is on the next second, so the "now" is one second after "then". > Because "then" is less than "now", the flush_time is set directly to original "then", rather than "cd->flush_time + 1". > > > And I want to change the condition as below. I'm not sure it's OK and has no effects to other part. I think this change is safe. It is a bit of a hack, but the "flush" interface was actually very poorly designed and I don't think it *can* be implemented cleanly. Maybe we should just ignore the number written in and *always* flush the cache completely. That is was it always wanted. See below. Bruce: do you have any thoughts on this? Thanks, NeilBrown > > -------------------------------------------------------------------------- > then = get_expiry(&bp); > now = seconds_since_boot(); > cd->nextcheck = now; > /* Can only set flush_time to 1 second beyond "now", or > * possibly 1 second beyond flushtime. This is because > * flush_time never goes backwards so it mustn't get too far > * ahead of time. > */ > if (then != now) > printk("%s %d then = %d, now = %d\n", __func__, __LINE__, then, now); > - if (then >= now) { > + if (then >= now - 1) { > /* Want to flush everything, so behave like cache_purge() */ > if (cd->flush_time >= now) > now = cd->flush_time + 1; > then = now; > } > > cd->flush_time = then; > -------------------------------------------------------------------------- > > Best Regards, > Alan Wang diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c index aa36dad32db1..43f117d1547e 100644 --- a/net/sunrpc/cache.c +++ b/net/sunrpc/cache.c @@ -1450,7 +1450,7 @@ static ssize_t write_flush(struct file *file, const char __user *buf, struct cache_detail *cd) { char tbuf[20]; - char *bp, *ep; + char *ep; time_t then, now; if (*ppos || count > sizeof(tbuf)-1) @@ -1461,24 +1461,24 @@ static ssize_t write_flush(struct file *file, const char __user *buf, simple_strtoul(tbuf, &ep, 0); if (*ep && *ep != '\n') return -EINVAL; + /* Note that while we check that 'buf' holds a valid number, + * we always ignore the value and just flush everything. + * Making use of the number leads to races. + */ - bp = tbuf; - then = get_expiry(&bp); now = seconds_since_boot(); - cd->nextcheck = now; - /* Can only set flush_time to 1 second beyond "now", or + /* Always flush everything, so behave like cache_purge() + * Can only set flush_time to 1 second beyond "now", or * possibly 1 second beyond flushtime. This is because * flush_time never goes backwards so it mustn't get too far * ahead of time. */ - if (then >= now) { - /* Want to flush everything, so behave like cache_purge() */ - if (cd->flush_time >= now) - now = cd->flush_time + 1; - then = now; - } - cd->flush_time = then; + if (cd->flush_time >= now) + now = cd->flush_time + 1; + + cd->flush_time = now; + cd->nextcheck = now; cache_flush(); *ppos += count;