From patchwork Wed Mar 9 00:53:20 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Drokin, Oleg" X-Patchwork-Id: 8540251 Return-Path: X-Original-To: patchwork-linux-fsdevel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 380DFC0553 for ; Wed, 9 Mar 2016 00:53:36 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id F2FB1201FE for ; Wed, 9 Mar 2016 00:53:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 86A0320121 for ; Wed, 9 Mar 2016 00:53:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753220AbcCIAxa (ORCPT ); Tue, 8 Mar 2016 19:53:30 -0500 Received: from mga02.intel.com ([134.134.136.20]:30967 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753011AbcCIAxW convert rfc822-to-8bit (ORCPT ); Tue, 8 Mar 2016 19:53:22 -0500 Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga101.jf.intel.com with ESMTP; 08 Mar 2016 16:53:21 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.22,559,1449561600"; d="scan'208";a="760610778" Received: from fmsmsx108.amr.corp.intel.com ([10.18.124.206]) by orsmga003.jf.intel.com with ESMTP; 08 Mar 2016 16:53:21 -0800 Received: from fmsmsx154.amr.corp.intel.com (10.18.116.70) by FMSMSX108.amr.corp.intel.com (10.18.124.206) with Microsoft SMTP Server (TLS) id 14.3.248.2; Tue, 8 Mar 2016 16:53:20 -0800 Received: from FMSMSX109.amr.corp.intel.com ([169.254.15.144]) by FMSMSX154.amr.corp.intel.com ([169.254.6.95]) with mapi id 14.03.0248.002; Tue, 8 Mar 2016 16:53:20 -0800 From: "Drokin, Oleg" To: Al Viro CC: "Dilger, Andreas" , Linus Torvalds , "" Subject: Re: races in ll_splice_alias() Thread-Topic: races in ll_splice_alias() Thread-Index: AQHReVTMjuKWH+rs0kiLdpBaNswnup9QC6WwgAA4ncyAAItWAA== Date: Wed, 9 Mar 2016 00:53:20 +0000 Message-ID: <7C3EBB6F-54AC-4744-BEC1-33EA82216F85@intel.com> References: <20160308160537.GV17997@ZenIV.linux.org.uk> <498D5A19-9E55-48D7-B5CF-34CA5769FF7F@intel.com> <20160308211148.GX17997@ZenIV.linux.org.uk> <20160309003416.GY17997@ZenIV.linux.org.uk> In-Reply-To: <20160309003416.GY17997@ZenIV.linux.org.uk> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.254.53.151] Content-ID: <7A0ACBB67BF0D34D9372455896CDB2A0@intel.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Mar 8, 2016, at 7:34 PM, Al Viro wrote: > On Tue, Mar 08, 2016 at 11:18:09PM +0000, Drokin, Oleg wrote: >> Rename on server cannot get us to see the same directory in two places, or what's >> the scenario? >> In Lustre: >> thread1: lookup a directory on the server. >> get a lock back >> instantiate a dentry (guarded by the lock) >> make a lock revocable (ll_lookup_finish_locks() in ll_lookup_it()) >> thread2: rename the directory moving it somewhere else >> server attempts to revoke the lock from thread1 >> node that runs thread1 drops the lock and marks all dentries for that >> inode invalid >> server completes rename and releases the lock it holds >> thread3: lookup the directory under new name on the server >> this can only return from server when the rename has completed and the >> dentry from thread1 marked invalid. > > thread2 might run on another client; might or might not make any difference, > but in any case server going nuts shouldn't corrupt data structures on client… It does not matter indeed if thread2 is run on any particular client due to the lustre locking. >> Ok, let me try my hand at that. Hopefully whatever complications are there would >> show themselves right away too. >> >>> would always either inserted inode reference into a new dentry or dropped it. >>> I'm still trying to trace what does iput() in case of error in your current >>> code; I understand the one in do_statahead_enter(), but what does it in >>> ll_lookup_it_finish()? >> >> You mean when ll_d_init() fails in ll_splice_alias()? >> Hm… It appears that we are indeed leaking the inode in that case, thanks. >> I'll try to address that too. >> Hm, in fact this was almost noticed, but I guess nobody retested it after >> fixing the initial crash we had with 7486bc06ab2c46d6957f0211d09bc549aaf9cc87 > > If that's the case, I'd try this (on top of the patch from upthread): I have not tried this exact one yet (will try in a moment). But it appears we do need ll_d_init() call if we got a hit in d_exact_alias(). A particular test case fails due to lack of the d_fsdata being null. So far I hit it in case of unix sockets with a trace like this: [ 291.019261] Lustre: DEBUG MARKER: == sanity test 54a: unix domain socket test ========================================================== 19:31:19 (1457483479) [ 291.127120] LustreError: 4984:0:(llite_internal.h:1462:d_lustre_revalidate()) ASSERTION( ((struct ll_dentry_data *)((dentry)->d_fsdata)) ) failed: [ 291.127900] LustreError: 4984:0:(llite_internal.h:1462:d_lustre_revalidate()) LBUG [ 291.128626] CPU: 5 PID: 4984 Comm: socketclient Tainted: G C 4.5.0-rc6-vm-nfs+ #74 [ 291.129340] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 291.129343] 0000000000000000 ffff8800d7a379a0 ffffffff81413801 ffffffffa0464200 [ 291.129343] ffff8800d3b05f50 ffff8800d7a379c0 ffffffffa016ca36 0000000000000000 [ 291.129344] ffff8800d3b05ed0 ffff8800d7a37a60 ffffffffa0438680 ffff8800c22d3000 [ 291.129345] Call Trace: [ 291.129349] [] dump_stack+0x63/0x82 [ 291.129356] [] lbug_with_loc+0x46/0xb0 [libcfs] [ 291.129368] [] ll_lookup_it_finish+0x610/0x970 [lustre] [ 291.129374] [] ? ll_finish_md_op_data+0xe/0x10 [lustre] [ 291.129380] [] ll_lookup_it+0x28b/0x680 [lustre] [ 291.129386] [] ? ll_rmdir+0x390/0x390 [lustre] [ 291.129392] [] ll_lookup_nd+0x73/0x120 [lustre] [ 291.129394] [] lookup_real+0x1d/0x60 [ 291.129395] [] __lookup_hash+0x33/0x40 [ 291.129396] [] walk_component+0x1bc/0x510 [ 291.129397] [] ? link_path_walk+0x2e0/0x500 [ 291.129398] [] ? path_init+0x3ca/0x5a0 [ 291.129400] [] path_lookupat+0x5d/0x110 [ 291.129401] [] filename_lookup+0x9e/0x150 [ 291.129403] [] ? __kmalloc_node_track_caller+0x31/0x40 [ 291.129404] [] ? __kmalloc_node_track_caller+0x31/0x40 [ 291.129405] [] ? getname_kernel+0x34/0x120 [ 291.129406] [] ? getname_kernel+0x8d/0x120 [ 291.129407] [] kern_path+0x2b/0x30 [ 291.129408] [] unix_find_other+0x38/0x1f0 [ 291.129409] [] unix_stream_connect+0xf8/0x480 [ 291.129411] [] SYSC_connect+0xb7/0xf0 [ 291.129412] [] SyS_connect+0xe/0x10 [ 291.129413] [] entry_SYSCALL_64_fastpath+0x16/0x7a Sadly it seems crash tool does not know how to work with modern kernels yet again so it is time to update and see what was going on in more details. The test itself is simple: test_54a() { $SOCKETSERVER $DIR/socket || error "$SOCKETSERVER $DIR/socket failed: $?" $SOCKETCLIENT $DIR/socket || error "$SOCKETCLIENT $DIR/socket failed: $?" $MUNLINK $DIR/socket || error "$MUNLINK $DIR/socket failed: $?" } run_test 54a "unix domain socket test ==========================" with the socket server and socket client doing what you would think. The patch I had at hand was: Overall the rate of hits in d_exact_alias was very low, need to check if pre-patch it was any better. Also any particular reason to prefer d_splice_alias over d_add? Did not we already determine there were no other aliases. > > diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c > index da5f443..bcc9841 100644 > --- a/drivers/staging/lustre/lustre/llite/namei.c > +++ b/drivers/staging/lustre/lustre/llite/namei.c > @@ -320,81 +320,37 @@ void ll_i2gids(__u32 *suppgids, struct inode *i1, struct inode *i2) > } > > /* > - * try to reuse three types of dentry: > - * 1. unhashed alias, this one is unhashed by d_invalidate (but it may be valid > - * by concurrent .revalidate). > - * 2. INVALID alias (common case for no valid ldlm lock held, but this flag may > - * be cleared by others calling d_lustre_revalidate). > - * 3. DISCONNECTED alias. > - */ > -static struct dentry *ll_find_alias(struct inode *inode, struct dentry *dentry) > -{ > - struct dentry *alias, *discon_alias, *invalid_alias; > - > - if (hlist_empty(&inode->i_dentry)) > - return NULL; > - > - discon_alias = invalid_alias = NULL; > - > - ll_lock_dcache(inode); > - hlist_for_each_entry(alias, &inode->i_dentry, d_u.d_alias) { > - LASSERT(alias != dentry); > - > - spin_lock(&alias->d_lock); > - if (alias->d_flags & DCACHE_DISCONNECTED) > - /* LASSERT(last_discon == NULL); LU-405, bz 20055 */ > - discon_alias = alias; > - else if (alias->d_parent == dentry->d_parent && > - alias->d_name.hash == dentry->d_name.hash && > - alias->d_name.len == dentry->d_name.len && > - memcmp(alias->d_name.name, dentry->d_name.name, > - dentry->d_name.len) == 0) > - invalid_alias = alias; > - spin_unlock(&alias->d_lock); > - > - if (invalid_alias) > - break; > - } > - alias = invalid_alias ?: discon_alias ?: NULL; > - if (alias) { > - spin_lock(&alias->d_lock); > - dget_dlock(alias); > - spin_unlock(&alias->d_lock); > - } > - ll_unlock_dcache(inode); > - > - return alias; > -} > - > -/* > * Similar to d_splice_alias(), but lustre treats invalid alias > * similar to DCACHE_DISCONNECTED, and tries to use it anyway. > */ > struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de) > { > - struct dentry *new; > + struct dentry *alias = NULL; > int rc; > > if (inode) { > - new = ll_find_alias(inode, de); > - if (new) { > - rc = ll_d_init(new); > - if (rc < 0) { > - dput(new); > - return ERR_PTR(rc); > - } > - d_move(new, de); > + alias = d_exact_alias(de, inode); > + if (alias) > + iput(inode); > + } > + > + if (!alias) { > + rc = ll_d_init(de); > + if (rc < 0) { > iput(inode); > - CDEBUG(D_DENTRY, > - "Reuse dentry %p inode %p refc %d flags %#x\n", > - new, d_inode(new), d_count(new), new->d_flags); > - return new; > + return ERR_PTR(rc); > } > + alias = d_splice_alias(inode, de); > + if (IS_ERR(alias)) > + return alias; > + } > + > + if (alias) { > + CDEBUG(D_DENTRY, > + "Reuse dentry %p inode %p refc %d flags %#x\n", > + alias, d_inode(alias), d_count(alias), alias->d_flags); > + return alias; > } > - rc = ll_d_init(de); > - if (rc < 0) > - return ERR_PTR(rc); > - d_add(de, inode); > CDEBUG(D_DENTRY, "Add dentry %p inode %p refc %d flags %#x\n", > de, d_inode(de), d_count(de), de->d_flags); > return de; > diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c > index 88ffd8e..6c64de0 100644 > --- a/drivers/staging/lustre/lustre/llite/statahead.c > +++ b/drivers/staging/lustre/lustre/llite/statahead.c > @@ -1576,6 +1576,7 @@ int do_statahead_enter(struct inode *dir, struct dentry **dentryp, > alias = ll_splice_alias(inode, > *dentryp); > if (IS_ERR(alias)) { > + entry->se_inode = NULL; > ll_sai_unplug(sai, entry); > return PTR_ERR(alias); > } --- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html --- a/drivers/staging/lustre/lustre/llite/namei.c +++ b/drivers/staging/lustre/lustre/llite/namei.c @@ -358,14 +358,8 @@ struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de) int rc; if (inode) { - new = ll_find_alias(inode, de); + new = d_exact_alias(de, inode); if (new) { - rc = ll_d_init(new); - if (rc < 0) { - dput(new); - return ERR_PTR(rc); - } - d_move(new, de); iput(inode); CDEBUG(D_DENTRY, "Reuse dentry %p inode %p refc %d flags %#x\n", @@ -374,9 +368,9 @@ struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de) } } rc = ll_d_init(de); + d_add(de, inode); if (rc < 0) return ERR_PTR(rc); - d_add(de, inode); CDEBUG(D_DENTRY, "Add dentry %p inode %p refc %d flags %#x\n", de, d_inode(de), d_count(de), de->d_flags);