From patchwork Sat Jun 15 02:05:10 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Goldwyn Rodrigues
X-Patchwork-Id: 2725771
Date: Fri, 14 Jun 2013 21:05:10 -0500
From: Goldwyn Rodrigues
To: ocfs2-devel@oss.oracle.com
Message-ID: <20130615020510.GA4487@shrek.cartoons>
Subject: [Ocfs2-devel] [PATCH] unlink performance: wait for open lock in case of dirs
Reply-To: rgoldwyn@suse.com

This patch improves unlink performance. Here is the scenario:

On node A, create multiple directories, say d1-d8, each containing
three files: f1, f2 and f3.

On node B, delete all the directories using rm -Rf d*

The FS first unlinks f1, f2 and f3. However, when it performs
ocfs2_evict_inode() -> ocfs2_delete_inode() ->
ocfs2_query_inode_wipe() -> ocfs2_try_open_lock() on d1, the call fails
with -EAGAIN. The open lock fails because on the remote node a PR->EX
convert takes longer than a simple EX grant. This starts a checkpoint,
because the OCFS2_INODE_DELETED flag is not set on the directory inode.
The checkpoint then interferes with the journaling of the inodes
deleted by the following unlinks, in our case directories d2-d8 and the
files contained in them.

With this patch, we wait on a directory EX lock only if we already hold
the open_lock in PR mode. This way we avoid the ABBA locking.

By waiting for the open_lock on the directory, I am getting an unlink
performance improvement of 50-60% for rm -Rf in the usual case.

Also, folded ocfs2_open_lock and ocfs2_try_open_lock into one.

Let me know if you would like to see the test case.

Signed-off-by: Goldwyn Rodrigues

Index: linux-3.0-SLE11-SP3/fs/ocfs2/dlmglue.c
===================================================================
--- linux-3.0-SLE11-SP3.orig/fs/ocfs2/dlmglue.c	2013-06-14 06:47:29.322506695 -0500
+++ linux-3.0-SLE11-SP3/fs/ocfs2/dlmglue.c	2013-06-14 09:19:48.651924037 -0500
@@ -1681,9 +1681,9 @@ void ocfs2_rw_unlock(struct inode *inode
 /*
  * ocfs2_open_lock always get PR mode lock.
  */
-int ocfs2_open_lock(struct inode *inode)
+int ocfs2_open_lock(struct inode *inode, int write, int wait)
 {
-	int status = 0;
+	int status = 0, level, flags;
 	struct ocfs2_lock_res *lockres;
 	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
 
@@ -1696,43 +1696,25 @@ int ocfs2_open_lock(struct inode *inode)
 		goto out;
 
 	lockres = &OCFS2_I(inode)->ip_open_lockres;
-
-	status = ocfs2_cluster_lock(OCFS2_SB(inode->i_sb), lockres,
-				    DLM_LOCK_PR, 0, 0);
-	if (status < 0)
-		mlog_errno(status);
-
-out:
-	return status;
-}
-
-int ocfs2_try_open_lock(struct inode *inode, int write)
-{
-	int status = 0, level;
-	struct ocfs2_lock_res *lockres;
-	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
-
-	BUG_ON(!inode);
-
-	mlog(0, "inode %llu try to take %s open lock\n",
-	     (unsigned long long)OCFS2_I(inode)->ip_blkno,
-	     write ? "EXMODE" : "PRMODE");
-
-	if (ocfs2_mount_local(osb))
-		goto out;
-
-	lockres = &OCFS2_I(inode)->ip_open_lockres;
-
 	level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
+	if (wait) {
+		flags = 0;
+		/* If we don't already have the lock in PR mode,
+		 * don't wait.
+		 *
+		 * This should avoid ABBA locking.
+		 */
+		if ((lockres->l_level != DLM_LOCK_PR) && write)
+			flags = DLM_LKF_NOQUEUE;
+
+	} else
+		flags = DLM_LKF_NOQUEUE;
 
-	/*
-	 * The file system may already holding a PRMODE/EXMODE open lock.
-	 * Since we pass DLM_LKF_NOQUEUE, the request won't block waiting on
-	 * other nodes and the -EAGAIN will indicate to the caller that
-	 * this inode is still in use.
-	 */
 	status = ocfs2_cluster_lock(OCFS2_SB(inode->i_sb), lockres,
-				    level, DLM_LKF_NOQUEUE, 0);
+				    level, flags, 0);
+
+	if ((status < 0) && (flags && (status != -EAGAIN)))
+		mlog_errno(status);
 
 out:
 	return status;
Index: linux-3.0-SLE11-SP3/fs/ocfs2/inode.c
===================================================================
--- linux-3.0-SLE11-SP3.orig/fs/ocfs2/inode.c	2013-06-13 22:54:32.527606012 -0500
+++ linux-3.0-SLE11-SP3/fs/ocfs2/inode.c	2013-06-14 07:33:53.960196648 -0500
@@ -455,7 +455,7 @@ static int ocfs2_read_locked_inode(struc
 						  0, inode);
 
 	if (can_lock) {
-		status = ocfs2_open_lock(inode);
+		status = ocfs2_open_lock(inode, 0, 1);
 		if (status) {
 			make_bad_inode(inode);
 			mlog_errno(status);
@@ -470,7 +470,7 @@ static int ocfs2_read_locked_inode(struc
 	}
 
 	if (args->fi_flags & OCFS2_FI_FLAG_ORPHAN_RECOVERY) {
-		status = ocfs2_try_open_lock(inode, 0);
+		status = ocfs2_open_lock(inode, 0, 0);
 		if (status) {
 			make_bad_inode(inode);
 			return status;
@@ -923,7 +923,8 @@ static int ocfs2_query_inode_wipe(struct
 	 * Though we call this with the meta data lock held, the
 	 * trylock keeps us from ABBA deadlock.
 	 */
-	status = ocfs2_try_open_lock(inode, 1);
+	status = ocfs2_open_lock(inode, 1, S_ISDIR(inode->i_mode));
+
 	if (status == -EAGAIN) {
 		status = 0;
 		reason = 3;
Index: linux-3.0-SLE11-SP3/fs/ocfs2/dlmglue.h
===================================================================
--- linux-3.0-SLE11-SP3.orig/fs/ocfs2/dlmglue.h	2013-06-10 09:45:20.787386504 -0500
+++ linux-3.0-SLE11-SP3/fs/ocfs2/dlmglue.h	2013-06-14 07:38:49.861576515 -0500
@@ -110,8 +110,7 @@ int ocfs2_create_new_inode_locks(struct
 int ocfs2_drop_inode_locks(struct inode *inode);
 int ocfs2_rw_lock(struct inode *inode, int write);
 void ocfs2_rw_unlock(struct inode *inode, int write);
-int ocfs2_open_lock(struct inode *inode);
-int ocfs2_try_open_lock(struct inode *inode, int write);
+int ocfs2_open_lock(struct inode *inode, int write, int wait);
 void ocfs2_open_unlock(struct inode *inode);
 int ocfs2_inode_lock_atime(struct inode *inode,
 			  struct vfsmount *vfsmnt,
Index: linux-3.0-SLE11-SP3/fs/ocfs2/namei.c
===================================================================
--- linux-3.0-SLE11-SP3.orig/fs/ocfs2/namei.c	2013-06-13 22:54:32.527606012 -0500
+++ linux-3.0-SLE11-SP3/fs/ocfs2/namei.c	2013-06-14 07:40:06.623785914 -0500
@@ -2302,7 +2302,7 @@ int ocfs2_create_inode_in_orphan(struct
 	}
 
 	/* get open lock so that only nodes can't remove it from orphan dir. */
-	status = ocfs2_open_lock(inode);
+	status = ocfs2_open_lock(inode, 0, 1);
 	if (status < 0)
 		mlog_errno(status);
 
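
For reviewers, the flag selection in the merged ocfs2_open_lock() can be
read in isolation. The following is only a userspace sketch of that
decision, not kernel code: the SKETCH_* constants and the function name
sketch_open_lock_flags() are hypothetical stand-ins for DLM_LOCK_PR,
DLM_LKF_NOQUEUE and the logic in the dlmglue.c hunk, and the numeric
values are placeholders rather than the real DLM values.

```c
/* Placeholder values; the real constants live in the kernel DLM headers. */
enum sketch_level {
	SKETCH_LOCK_NONE = 0,	/* hypothetical: no open lock held yet */
	SKETCH_LOCK_PR = 1,	/* stands in for DLM_LOCK_PR */
	SKETCH_LOCK_EX = 2	/* stands in for DLM_LOCK_EX */
};

#define SKETCH_LKF_NOQUEUE 0x1	/* stands in for DLM_LKF_NOQUEUE */

/*
 * Mirror of the flag choice in the patch:
 *  - wait == 0: always a trylock (NOQUEUE).
 *  - wait != 0 and write != 0: block only if the open lock is already
 *    held in PR mode, otherwise fall back to a trylock.
 *  - wait != 0 and write == 0: block.
 */
int sketch_open_lock_flags(int held_level, int write, int wait)
{
	if (!wait)
		return SKETCH_LKF_NOQUEUE;
	if (write && held_level != SKETCH_LOCK_PR)
		return SKETCH_LKF_NOQUEUE;
	return 0;
}
```

With these stand-ins, the ocfs2_query_inode_wipe() call on a directory
corresponds to sketch_open_lock_flags(level, 1, 1): it blocks when level
is SKETCH_LOCK_PR and remains a trylock otherwise, while regular files
(wait == 0) keep the old trylock behaviour.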