From patchwork Sat Jul 23 00:02:47 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sage Weil
X-Patchwork-Id: 1001382
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by demeter1.kernel.org (8.14.4/8.14.4) with ESMTP id p6N08HNo032329
	for ; Sat, 23 Jul 2011 00:16:08 GMT
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751416Ab1GWAGA (ORCPT );
	Fri, 22 Jul 2011 20:06:00 -0400
Received: from cobra.newdream.net ([66.33.216.30]:44317
	"EHLO cobra.newdream.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1754009Ab1GWAFy (ORCPT );
	Fri, 22 Jul 2011 20:05:54 -0400
Received: from localhost.localdomain
	(ip-64-111-111-107.dreamhost.com [64.111.111.107])
	by cobra.newdream.net (Postfix) with ESMTPA id 58220BC654;
	Fri, 22 Jul 2011 17:09:37 -0700 (PDT)
From: Sage Weil
To: ceph-devel@vger.kernel.org
Cc: Sage Weil
Subject: [PATCH 08/23] ceph: avoid carrying Fw cap during write into page cache
Date: Fri, 22 Jul 2011 17:02:47 -0700
Message-Id: <1311379382-9218-9-git-send-email-sage@newdream.net>
X-Mailer: git-send-email 1.7.0
In-Reply-To: <1311379382-9218-1-git-send-email-sage@newdream.net>
References: <1311379382-9218-1-git-send-email-sage@newdream.net>
Sender: ceph-devel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: ceph-devel@vger.kernel.org
X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by
	milter-greylist-4.2.6 (demeter1.kernel.org [140.211.167.41]);
	Sat, 23 Jul 2011 00:16:09 +0000 (UTC)

The generic_file_aio_write call may block on balance_dirty_pages while we
flush data to the OSDs.  If we hold a reference to the FILE_WR cap during
that interval, revocation by the MDS (e.g., to do a stat(2)) may be very
slow.

Signed-off-by: Sage Weil
---
 fs/ceph/file.c |   22 +++++++++++++++++++---
 1 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 44e4fe9..6c90cf0 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -713,7 +713,7 @@ retry_snap:
 		want = CEPH_CAP_FILE_BUFFER;
 	ret = ceph_get_caps(ci, CEPH_CAP_FILE_WR, want, &got, endoff);
 	if (ret < 0)
-		goto out;
+		goto out_put;
 
 	dout("aio_write %p %llx.%llx %llu~%u got cap refs on %s\n",
 	     inode, ceph_vinop(inode), pos, (unsigned)iov->iov_len,
@@ -726,8 +726,18 @@ retry_snap:
 		ret = ceph_sync_write(file, iov->iov_base, iov->iov_len,
 			&iocb->ki_pos);
 	} else {
-		ret = generic_file_aio_write(iocb, iov, nr_segs, pos);
+		/*
+		 * buffered write; drop Fw early to avoid slow
+		 * revocation if we get stuck on balance_dirty_pages
+		 */
+		int dirty;
+
+		spin_lock(&inode->i_lock);
+		dirty = __ceph_mark_dirty_caps(ci, CEPH_CAP_FILE_WR);
+		spin_unlock(&inode->i_lock);
+		ceph_put_cap_refs(ci, got);
 
+		ret = generic_file_aio_write(iocb, iov, nr_segs, pos);
 		if ((ret >= 0 || ret == -EIOCBQUEUED) &&
 		    ((file->f_flags & O_SYNC) || IS_SYNC(file->f_mapping->host)
 		     || ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_NEARFULL))) {
@@ -735,7 +745,12 @@ retry_snap:
 			if (err < 0)
 				ret = err;
 		}
+
+		if (dirty)
+			__mark_inode_dirty(inode, dirty);
+		goto out;
 	}
+
 	if (ret >= 0) {
 		int dirty;
 		spin_lock(&inode->i_lock);
@@ -745,12 +760,13 @@ retry_snap:
 			__mark_inode_dirty(inode, dirty);
 	}
 
-out:
+out_put:
 	dout("aio_write %p %llx.%llx %llu~%u dropping cap refs on %s\n",
 	     inode, ceph_vinop(inode), pos, (unsigned)iov->iov_len,
 	     ceph_cap_string(got));
 	ceph_put_cap_refs(ci, got);
 
+out:
 	if (ret == -EOLDSNAPC) {
 		dout("aio_write %p %llx.%llx %llu~%u got EOLDSNAPC, retrying\n",
 		     inode, ceph_vinop(inode), pos, (unsigned)iov->iov_len);