From patchwork Fri Jul 21 13:43:10 2017
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 9856965
From: Waiman Long <longman@redhat.com>
To: Alexander Viro, Jonathan Corbet
Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org, "Paul E. McKenney", Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox, Larry Woodman, Waiman Long
Subject: [PATCH v2 4/4] fs/dcache: Protect negative dentry pruning from racing with umount
Date: Fri, 21 Jul 2017 09:43:10 -0400
Message-Id: <1500644590-6599-5-git-send-email-longman@redhat.com>
In-Reply-To: <1500644590-6599-1-git-send-email-longman@redhat.com>
References: <1500644590-6599-1-git-send-email-longman@redhat.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

Negative dentry pruning is done on the specific super_block stored in the ndblk.prune_sb variable. If that super_block is concurrently being unmounted, its contents may no longer be valid. To protect against such a race condition, a new lock is added to the ndblk structure to synchronize negative dentry pruning with the umount operation. This is a regular spinlock, as the pruning operation can be quite time consuming.
Signed-off-by: Waiman Long
---
 fs/dcache.c | 42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index c2ea876..a3159f3 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -139,11 +139,13 @@ struct dentry_stat_t dentry_stat = {
 static long neg_dentry_nfree_init __read_mostly; /* Free pool initial value */
 static struct {
 	raw_spinlock_t nfree_lock;
+	spinlock_t prune_lock;		/* Lock for protecting pruning */
 	long nfree;			/* Negative dentry free pool */
 	struct super_block *prune_sb;	/* Super_block for pruning */
 	int neg_count, prune_count;	/* Pruning counts */
 } ndblk ____cacheline_aligned_in_smp;

+static void clear_prune_sb_for_umount(struct super_block *sb);
 static void prune_negative_dentry(struct work_struct *work);
 static DECLARE_DELAYED_WORK(prune_neg_dentry_work, prune_negative_dentry);

@@ -1323,6 +1325,7 @@ void shrink_dcache_sb(struct super_block *sb)
 {
 	long freed;

+	clear_prune_sb_for_umount(sb);
 	do {
 		LIST_HEAD(dispose);

@@ -1353,7 +1356,8 @@ static enum lru_status dentry_negative_lru_isolate(struct list_head *item,
 	 * list.
 	 */
 	if ((ndblk.neg_count >= NEG_DENTRY_BATCH) ||
-	    (ndblk.prune_count >= NEG_DENTRY_BATCH)) {
+	    (ndblk.prune_count >= NEG_DENTRY_BATCH) ||
+	    !READ_ONCE(ndblk.prune_sb)) {
 		ndblk.prune_count = 0;
 		return LRU_STOP;
 	}
@@ -1408,15 +1412,24 @@ static enum lru_status dentry_negative_lru_isolate(struct list_head *item,
 static void prune_negative_dentry(struct work_struct *work)
 {
 	int freed;
-	struct super_block *sb = READ_ONCE(ndblk.prune_sb);
+	struct super_block *sb;
 	LIST_HEAD(dispose);

-	if (!sb)
+	/*
+	 * The prune_lock is used to protect negative dentry pruning from
+	 * racing with concurrent umount operation.
+	 */
+	spin_lock(&ndblk.prune_lock);
+	sb = READ_ONCE(ndblk.prune_sb);
+	if (!sb) {
+		spin_unlock(&ndblk.prune_lock);
 		return;
+	}

 	ndblk.neg_count = ndblk.prune_count = 0;
 	freed = list_lru_walk(&sb->s_dentry_lru, dentry_negative_lru_isolate,
 			      &dispose, NEG_DENTRY_BATCH);
+	spin_unlock(&ndblk.prune_lock);

 	if (freed)
 		shrink_dentry_list(&dispose);
@@ -1433,6 +1446,27 @@ static void prune_negative_dentry(struct work_struct *work)
 	WRITE_ONCE(ndblk.prune_sb, NULL);
 }

+/*
+ * This is called before an umount to clear ndblk.prune_sb if it
+ * matches the given super_block.
+ */
+static void clear_prune_sb_for_umount(struct super_block *sb)
+{
+	if (likely(READ_ONCE(ndblk.prune_sb) != sb))
+		return;
+	WRITE_ONCE(ndblk.prune_sb, NULL);
+	/*
+	 * Need to wait until an ongoing pruning operation, if present,
+	 * is completed.
+	 *
+	 * Clearing ndblk.prune_sb will hasten the completion of pruning.
+	 * In the unlikely event that ndblk.prune_sb is set to another
+	 * super_block, the waiting will last the complete pruning operation
+	 * which shouldn't be that long either.
+	 */
+	spin_unlock_wait(&ndblk.prune_lock);
+}
+
 /**
  * enum d_walk_ret - action to take during tree walk
  * @D_WALK_CONTINUE:	continue walk
@@ -1755,6 +1789,7 @@ void shrink_dcache_for_umount(struct super_block *sb)
 	WARN(down_read_trylock(&sb->s_umount),
 	     "s_umount should've been locked");

+	clear_prune_sb_for_umount(sb);
 	dentry = sb->s_root;
 	sb->s_root = NULL;
 	do_one_tree(dentry);
@@ -3857,6 +3892,7 @@ static void __init neg_dentry_init(void)
 	unsigned long cnt;

 	raw_spin_lock_init(&ndblk.nfree_lock);
+	spin_lock_init(&ndblk.prune_lock);

 	/* 20% in global pool & 80% in percpu free */
 	ndblk.nfree = neg_dentry_nfree_init