From patchwork Tue Nov  6 08:22:10 2012
X-Patchwork-Submitter: "Yan, Zheng"
X-Patchwork-Id: 1703531
Message-ID: <5098C8B2.5060404@intel.com>
Date: Tue, 06 Nov 2012 16:22:10 +0800
From: "Yan, Zheng"
To: Sage Weil
CC: ceph-devel@vger.kernel.org
Subject: Re: [PATCH 1/2] mds: Don't acquire replica object's versionlock
References: <1351760618-19874-1-git-send-email-zheng.z.yan@intel.com>
 <1351760618-19874-2-git-send-email-zheng.z.yan@intel.com>
List-ID: X-Mailing-List: ceph-devel@vger.kernel.org

On 11/06/2012 02:52 AM, Sage Weil wrote:
> On Thu, 1 Nov 2012, Yan, Zheng wrote:
>> From: "Yan, Zheng"
>>
>> Both CInode and CDentry's versionlocks are of type LocalLock.
>> Acquiring a LocalLock on a replica object is useless and problematic.
>> For example, if two requests try to acquire a replica object's
>> versionlock, the first request succeeds and the second is added to
>> the wait queue. Later, when the first request finishes,
>> MDCache::request_drop_foreign_locks() finds that the lock's parent is
>> non-auth and skips waking the requests in the wait queue, so the
>> second request hangs.
>
> I don't remember the details, but the iversion locking on replicas came up
> while testing renaming and export thrashing.  i.e., running with
>
>  mds thrash exports = 1
>
> and doing some rename workload (fsstress maybe?).

I saw the assertion in Server::handle_slave_rename_prep() trigger when I
applied a wrong fix, but I have never seen it again with this patch.

> Maybe the fix is just to wake the requests in the queue?
>

If I'm not wrong, the version locks are used to wait for all rename
operations on the paths to srcdn/destdn to complete, so the witnesses see
consistent paths for srcdn and destdn no matter how many times the
OP_RENAMEPREP slave request is dispatched.

I think the main reason we need the version lock is this: for an auth
dentry, we may rdlock it even if it is already xlocked. But a non-auth
dentry can only be rdlocked when the lock is in the sync state, which
guarantees the dentry is not xlocked.

I found a bug in the previous patch: 'assert(dn->is_auth())' in
Locker::acquire_locks should be 'if (!dn->is_auth()) continue;'.

Regards
Yan, Zheng

---
From 6683482d9f9517b990d3e4bae18af275f32491e4 Mon Sep 17 00:00:00 2001
From: "Yan, Zheng"
Date: Thu, 1 Nov 2012 13:21:21 +0800
Subject: [PATCH] mds: Don't acquire replica object's versionlock

Both CInode and CDentry's versionlocks are of type LocalLock.
Acquiring a LocalLock on a replica object is useless and problematic.
For example, if two requests try to acquire a replica object's
versionlock, the first request succeeds and the second is added to the
wait queue.
Later, when the first request finishes,
MDCache::request_drop_foreign_locks() finds that the lock's parent is
non-auth and skips waking the requests in the wait queue, so the second
request hangs.

Signed-off-by: Yan, Zheng
---
 src/mds/Locker.cc |  7 +++++++
 src/mds/Server.cc | 25 ++++++++++---------------
 2 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/src/mds/Locker.cc b/src/mds/Locker.cc
index 7b6d449..a1f957a 100644
--- a/src/mds/Locker.cc
+++ b/src/mds/Locker.cc
@@ -196,6 +196,8 @@ bool Locker::acquire_locks(MDRequest *mdr,
     // augment xlock with a versionlock?
     if ((*p)->get_type() == CEPH_LOCK_DN) {
       CDentry *dn = (CDentry*)(*p)->get_parent();
+      if (!dn->is_auth())
+        continue;
       if (xlocks.count(&dn->versionlock))
         continue;  // we're xlocking the versionlock too; don't wrlock it!
@@ -213,6 +215,8 @@ bool Locker::acquire_locks(MDRequest *mdr,
     if ((*p)->get_type() > CEPH_LOCK_IVERSION) {
       // inode version lock?
       CInode *in = (CInode*)(*p)->get_parent();
+      if (!in->is_auth())
+        continue;
       if (mdr->is_master()) {
         // master.  wrlock versionlock so we can pipeline inode updates to journal.
         wrlocks.insert(&in->versionlock);
@@ -3899,6 +3903,7 @@ void Locker::local_wrlock_grab(LocalLock *lock, Mutation *mut)
   dout(7) << "local_wrlock_grab  on " << *lock
          << " on " << *lock->get_parent() << dendl;
+  assert(lock->get_parent()->is_auth());
   assert(lock->can_wrlock());
   assert(!mut->wrlocks.count(lock));
   lock->get_wrlock(mut->get_client());
@@ -3911,6 +3916,7 @@ bool Locker::local_wrlock_start(LocalLock *lock, MDRequest *mut)
   dout(7) << "local_wrlock_start  on " << *lock
          << " on " << *lock->get_parent() << dendl;
+  assert(lock->get_parent()->is_auth());
   if (lock->can_wrlock()) {
     assert(!mut->wrlocks.count(lock));
     lock->get_wrlock(mut->get_client());
@@ -3942,6 +3948,7 @@ bool Locker::local_xlock_start(LocalLock *lock, MDRequest *mut)
   dout(7) << "local_xlock_start  on " << *lock
          << " on " << *lock->get_parent() << dendl;
+  assert(lock->get_parent()->is_auth());
   if (!lock->can_xlock_local()) {
     lock->add_waiter(SimpleLock::WAIT_WR|SimpleLock::WAIT_STABLE, new C_MDS_RetryRequest(mdcache, mut));
     return false;
diff --git a/src/mds/Server.cc b/src/mds/Server.cc
index 4642a13..45c890a 100644
--- a/src/mds/Server.cc
+++ b/src/mds/Server.cc
@@ -5204,25 +5204,20 @@ void Server::handle_client_rename(MDRequest *mdr)
     wrlocks.insert(&straydn->get_dir()->inode->nestlock);
   }

-  // xlock versionlock on srci if remote?
-  //  this ensures it gets safely remotely auth_pinned, avoiding deadlock;
-  //  strictly speaking, having the slave node freeze the inode is
-  //  otherwise sufficient for avoiding conflicts with inode locks, etc.
-  if (!srcdn->is_auth() && srcdnl->is_primary()) // xlock versionlock on srci if there are any witnesses
-    xlocks.insert(&srci->versionlock);
-
   // xlock versionlock on dentries if there are witnesses.
   //  replicas can't see projected dentry linkages, and will get
   //  confused if we try to pipeline things.
   if (!witnesses.empty()) {
-    if (srcdn->is_projected())
-      xlocks.insert(&srcdn->versionlock);
-    if (destdn->is_projected())
-      xlocks.insert(&destdn->versionlock);
-    // also take rdlock on all ancestor dentries for destdn. this ensures that the
-    // destdn can be traversed to by the witnesses.
-    for (int i=0; i<(int)desttrace.size(); i++)
-      xlocks.insert(&desttrace[i]->versionlock);
+    // take xlock on all projected dentries for srcdn and destdn. this ensures
+    // that the srcdn and destdn can be traversed to by the witnesses.
+    for (int i=0; i<(int)srctrace.size(); i++) {
+      if (srctrace[i]->is_auth() && srctrace[i]->is_projected())
+        xlocks.insert(&srctrace[i]->versionlock);
+    }
+    for (int i=0; i<(int)desttrace.size(); i++) {
+      if (desttrace[i]->is_auth() && desttrace[i]->is_projected())
+        xlocks.insert(&desttrace[i]->versionlock);
+    }
   }

   // we need to update srci's ctime.  xlock its least contended lock to do that...
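P.S. The hang described in the commit message can be sketched with a small
standalone model. This is NOT Ceph code: ToyObject, ToyLock, and
acquire_patched are invented names that only mimic the shape of the
LocalLock wait queue, the skipped wakeup in
MDCache::request_drop_foreign_locks(), and the patch's "skip non-auth
parents" rule.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Toy model (not Ceph code): an object is either auth or a replica;
// its version lock keeps a wait queue of requests that failed to get it.
struct ToyObject { bool auth; };

struct ToyLock {
  ToyObject *parent;
  bool held = false;
  std::deque<std::string> waiters;

  // First caller gets the lock; later callers park in the wait queue.
  bool try_acquire(const std::string &req) {
    if (!held) { held = true; return true; }
    waiters.push_back(req);
    return false;
  }

  // Mirrors the buggy drop path: waiters whose lock lives on a
  // non-auth (replica) object are never woken, so they hang forever.
  std::vector<std::string> release_and_wake() {
    held = false;
    if (!parent->auth)
      return {};                       // wakeup skipped -> waiters hang
    std::vector<std::string> woken(waiters.begin(), waiters.end());
    waiters.clear();
    return woken;
  }
};

// With the patch, the lock on a replica is simply never taken
// (the "if (!dn->is_auth()) continue;" rule), so nothing can
// queue up behind it in the first place.
bool acquire_patched(ToyLock &l, const std::string &req) {
  if (!l.parent->auth)
    return true;                       // skip replica versionlocks
  return l.try_acquire(req);
}
```

Before the patch, the second request parks on the replica's lock and
release_and_wake() returns an empty list, so it is never retried; with
acquire_patched(), neither request ever touches the replica's lock.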