From patchwork Tue Jan 29 21:03:23 2013
X-Patchwork-Id: 2063871
From: Sage Weil
Date: Tue, 29 Jan 2013 13:03:23 -0800 (PST)
To: Al Viro
cc: linux-fsdevel@vger.kernel.org, ceph-devel@vger.kernel.org
Subject: Re: [ceph] locking fun with d_materialise_unique()
In-Reply-To: <20130129080036.GI4503@ZenIV.linux.org.uk>
References: <20130129045213.GH4503@ZenIV.linux.org.uk>
 <20130129080036.GI4503@ZenIV.linux.org.uk>

Hi Al,

On Tue, 29 Jan 2013, Al Viro wrote:
> On Mon, Jan 28, 2013 at 09:20:34PM -0800, Sage Weil wrote:
>
> > Yep, that is indeed a problem.  I think we just need to do the r_aborted
> > and/or r_locked_dir check in the else if condition...
> >
> > > I'm not sure if we are guaranteed that ceph_readdir_prepopulate() won't
> > > get to its splice_dentry() and d_delete() calls in similar situations -
> > > I hadn't checked that one yet.  If it isn't guaranteed, we have a problem
> > > there as well.
> >
> > ...and the condition guarding readdir_prepopulate().  :)
>
> > I think you're reading it correctly.  The main thing to keep in mind here
> > is that we *do* need to call fill_inode() for the inode metadata on these
> > requests to keep the mds and client state in sync.  The dentry state is
> > safe to ignore.
>
> You mean the parts under
> 	if (rinfo->head->is_dentry) {
> and
> 	if (rinfo->head->is_target) {
> in there?  Because there's fill_inode() called from readdir_prepopulate()
> and it's a lot more problematic than those two...

Yep, patch below.

> > It would be great to have the dir i_mutex rules summarized somewhere, even
> > if it is just a copy of the below.  It took a fair bit of trial and error
> > to infer what was going on when writing this code. :)
>
> Directory ->i_mutex rules are in part documented - "what VFS guarantees
> to hold" side is in Documentation/filesystems/directory-locking.  It's
> the other side ("what locks are expected to be held by callers of dcache.c
> functions") that is badly missing...
>
> > Ping me when you've pushed that branch and I'll take a look...
> To gitolite@ra.kernel.org:/pub/scm/linux/kernel/git/viro/vfs.git
>    01a88fa..4056362  master -> master
>
> with tentative ceph patch in the very end.  Should be on git.kernel.org
> shortly...

We should drop the mds_client.c hunk from your patch, and then do
something like the below.  I'll put it in the ceph tree so we can do some
basic testing.  Unfortunately, the abort case is a bit annoying to
trigger... we'll need to write a new test for it.

Thanks-
sage


From 80d70ef02d5277af8e1355db74fffe71f7217e76 Mon Sep 17 00:00:00 2001
From: Sage Weil
Date: Tue, 29 Jan 2013 13:01:15 -0800
Subject: [PATCH] ceph: prepopulate inodes only when request is aborted

If r_aborted is true, we do not hold the dir i_mutex, and cannot touch
the dcache.  However, we still need to update the inodes with the state
returned by the MDS.

Reported-by: Al Viro
Signed-off-by: Sage Weil
---
 fs/ceph/inode.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 77b1b18..b51653e 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1195,6 +1195,39 @@ done:
 /*
  * Prepopulate our cache with readdir results, leases, etc.
  */
+static int readdir_prepopulate_inodes_only(struct ceph_mds_request *req,
+					   struct ceph_mds_session *session)
+{
+	struct ceph_mds_reply_info_parsed *rinfo = &req->r_reply_info;
+	int i, err = 0;
+
+	for (i = 0; i < rinfo->dir_nr; i++) {
+		struct ceph_vino vino;
+		struct inode *in;
+		int rc;
+
+		vino.ino = le64_to_cpu(rinfo->dir_in[i].in->ino);
+		vino.snap = le64_to_cpu(rinfo->dir_in[i].in->snapid);
+
+		in = ceph_get_inode(req->r_dentry->d_sb, vino);
+		if (IS_ERR(in)) {
+			err = PTR_ERR(in);
+			dout("new_inode badness got %d\n", err);
+			continue;
+		}
+		rc = fill_inode(in, &rinfo->dir_in[i], NULL, session,
+				req->r_request_started, -1,
+				&req->r_caps_reservation);
+		if (rc < 0) {
+			pr_err("fill_inode badness on %p got %d\n", in, rc);
+			err = rc;
+			continue;
+		}
+	}
+
+	return err;
+}
+
 int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 			     struct ceph_mds_session *session)
 {
@@ -1209,6 +1242,9 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 	u64 frag = le32_to_cpu(rhead->args.readdir.frag);
 	struct ceph_dentry_info *di;
 
+	if (req->r_aborted)
+		return readdir_prepopulate_inodes_only(req, session);
+
 	if (le32_to_cpu(rinfo->head->op) == CEPH_MDS_OP_LSSNAP) {
 		snapdir = ceph_get_snapdir(parent->d_inode);
 		parent = d_find_alias(snapdir);
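
For reference, the invariant the patch encodes is: r_aborted set means
the dir i_mutex is not held, so we may apply the inode state from the
MDS reply but must not touch the dcache.  The toy program below is only
a userspace sketch of that dispatch (struct mds_request is a stand-in,
not the real struct ceph_mds_request), not the kernel code:

	#include <stdbool.h>
	#include <stdio.h>

	/* Stand-in for struct ceph_mds_request; illustration only. */
	struct mds_request {
		bool r_aborted;		/* reply raced with an abort */
	};

	static void prepopulate(struct mds_request *req)
	{
		if (req->r_aborted) {
			/* no dir i_mutex held: fill inodes from the
			 * reply only, leave the dcache alone */
			puts("aborted: update inodes, skip dentries");
			return;
		}
		/* dir i_mutex held by the VFS: splice_dentry() and
		 * d_delete() are safe here */
		puts("normal: update inodes and prepopulate dcache");
	}

	int main(void)
	{
		struct mds_request aborted = { .r_aborted = true };
		struct mds_request normal = { .r_aborted = false };

		prepopulate(&aborted);
		prepopulate(&normal);
		return 0;
	}

Keeping the check inside ceph_readdir_prepopulate() (rather than in the
mds_client.c caller, per the hunk dropped above) keeps the locking rule
next to the dcache code it protects.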