[RPC] parallel directory operations for mainline Linux

From: NeilBrown <neilb@suse.com>

One of the remaining features of ldiskfs which is not in ext4fs is
parallel directory ops.
It would not be possible to get this upstream without VFS support
for parallel directory ops.  Lustre doesn't use the VFS interfaces
so this lack is not an immediate problem for Lustre, but it is a
real problem for upstreaming.

This patch (which seems to work in my testing so far, but is probably
still buggy) adds VFS support for parallel dir ops - create and remove.
I haven't attempted rename - it would be complex for various reasons and
while I'm sure it is possible, I'm not sure it is worth the effort.

With this patch a filesystem can indicate that it supports parallel ops
by setting a flag on a directory.  The VFS will then get exclusive
access to the dentry - instead of the whole directory - when
performing the operation.

A filesystem which supports this must have its own locking to ensure
that lookup, readdir, create and unlink can all happen in parallel.
For NFS this is easy as the server takes care of those details, so
this patch also adds parallel-ops support for NFS.
For a filesystem like ext4 it would mean adding some locking to
the internal data structures.

I've had a bit of a look at the parallel-ops patch for ldiskfs and I
think it is over-engineered.  We don't need a new locking primitive.

I suspect I would start by adding a seqlock to each htree node.
This allows reads to proceed locklessly when no changes are happening
(if they are careful not to get confused by an inconsistent node).
A modification would normally find the relevant leaf with a similar
lockless walk, then lock the leaf, verify the seq-lock on the parent
hasn't changed, and perform the update.
In the rarer case when a leaf needs to split or merge, something more
heavy-handed would be needed: possibly locking the whole tree, possibly
just locking a higher node.
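To make the seqlock idea above concrete, here is a minimal userspace sketch in C11. It assumes a single parent node and a single leaf that stores name hashes rather than real directory entries; `struct node`, `struct leaf`, `leaf_lookup()` and `leaf_insert()` are hypothetical names, not ldiskfs or ext4 code, and the split/merge path is only signalled, not implemented. Readers retry until they see the same even sequence number before and after reading; a writer takes only the leaf's mutex and gives up if the parent's sequence moved since its lockless walk.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define LEAF_SLOTS 8

/* Hypothetical interior node: only its seqcount matters for this sketch. */
struct node {
	atomic_uint seq;              /* even: stable, odd: being changed */
};

/* Hypothetical leaf holding name hashes (a real htree leaf holds dirents). */
struct leaf {
	atomic_uint seq;              /* seqcount guarding the contents below */
	pthread_mutex_t lock;         /* serializes writers of this leaf only */
	atomic_int nr;
	atomic_ulong hash[LEAF_SLOTS];
};

static void node_init(struct node *p)
{
	atomic_init(&p->seq, 0);
}

static void leaf_init(struct leaf *l)
{
	atomic_init(&l->seq, 0);
	atomic_init(&l->nr, 0);
	for (int i = 0; i < LEAF_SLOTS; i++)
		atomic_init(&l->hash[i], 0);
	pthread_mutex_init(&l->lock, NULL);
}

/* Start of a lockless walk: wait for a stable (even) parent sequence. */
static unsigned node_read_begin(struct node *p)
{
	unsigned s;

	while ((s = atomic_load_explicit(&p->seq, memory_order_acquire)) & 1)
		;
	return s;
}

/* Lockless lookup: retry if a writer was active or changed the leaf. */
static int leaf_lookup(struct leaf *l, unsigned long hash)
{
	unsigned s1, s2;
	int found;

	do {
		s1 = atomic_load_explicit(&l->seq, memory_order_acquire);
		found = 0;
		int n = atomic_load_explicit(&l->nr, memory_order_relaxed);
		for (int i = 0; i < n && i < LEAF_SLOTS; i++)
			if (atomic_load_explicit(&l->hash[i], memory_order_relaxed) == hash)
				found = 1;
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&l->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);
	return found;
}

/*
 * Insert under the leaf lock after re-checking the parent's sequence.
 * Returns 0 on success, -1 if the leaf is full (the rarer split path,
 * not shown), or -2 if the parent changed since parent_seq was read.
 */
static int leaf_insert(struct node *parent, unsigned parent_seq,
		       struct leaf *l, unsigned long hash)
{
	int ret = 0;

	pthread_mutex_lock(&l->lock);
	int n = atomic_load_explicit(&l->nr, memory_order_relaxed);
	assert(n <= LEAF_SLOTS);
	if (atomic_load_explicit(&parent->seq, memory_order_acquire) != parent_seq) {
		ret = -2;
	} else if (n == LEAF_SLOTS) {
		ret = -1;
	} else {
		unsigned s = atomic_load_explicit(&l->seq, memory_order_relaxed);

		/* Odd sequence tells concurrent readers to retry. */
		atomic_store_explicit(&l->seq, s + 1, memory_order_relaxed);
		atomic_thread_fence(memory_order_release);
		atomic_store_explicit(&l->hash[n], hash, memory_order_relaxed);
		atomic_store_explicit(&l->nr, n + 1, memory_order_relaxed);
		/* Even again: the leaf is consistent. */
		atomic_store_explicit(&l->seq, s + 2, memory_order_release);
	}
	pthread_mutex_unlock(&l->lock);
	return ret;
}
```

A caller that gets -2 would simply redo the lockless walk; one that gets -1 would fall back to the heavier locking described above.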

I don't expect to look at ext4 parallel ops in more detail in the
immediate future, and I don't plan to post this upstream until we have
credible support in ext4.  So I'm just posting it here now in case
anyone else wants to explore how to make ext4 work with this.

NeilBrown

From 827c01aee1cb74b72e5dbb2f40c01666b914bc15 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.com>
Date: Fri, 16 Nov 2018 19:58:53 +1100
Subject: [PATCH] VFS: support parallel updates in the one directory.

Some filesystems can support parallel modifications to a directory,
either because the modifications happen on a remote server which does
its own locking (e.g. NFS) or because they can internally lock just
a part of a directory (e.g. many local filesystems, with a bit of
work).

To support these filesystems, we introduce parallel modification for
unlink (including rmdir) and create.

If a filesystem supports parallel modification in a given directory,
it sets S_PAR_UNLINK on the inode for that directory.
lookup_open() and the new lookup_hash_modify() (similar to
__lookup_hash()) notice the flag and take a shared
lock on the directory.

Once a dentry for the target name has been obtained,
DCACHE_PAR_UPDATE is set on it, waiting if necessary.
Once this is set, the thread has exclusive access to the
name and can call into the filesystem to perform
the required action.
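Exclusive ownership of a single name can be sketched the same way. In this hypothetical model a mutex and condition variable play the part of the dentry's `d_lock` and a kernel wait queue; `DCACHE_PAR_UPDATE_SKETCH` and the `d_*_update()` helpers are illustrative names, not the functions the patch adds to fs/dcache.c.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical bit standing in for DCACHE_PAR_UPDATE. */
#define DCACHE_PAR_UPDATE_SKETCH (1u << 0)

struct dentry_sketch {
	unsigned int d_flags;
	pthread_mutex_t d_lock;       /* protects d_flags */
	pthread_cond_t d_wait;        /* threads waiting for the bit to clear */
};

static void dentry_sketch_init(struct dentry_sketch *d)
{
	d->d_flags = 0;
	pthread_mutex_init(&d->d_lock, NULL);
	pthread_cond_init(&d->d_wait, NULL);
}

/* Wait until no other thread is updating this name, then claim it. */
static void d_lock_update(struct dentry_sketch *d)
{
	pthread_mutex_lock(&d->d_lock);
	while (d->d_flags & DCACHE_PAR_UPDATE_SKETCH)
		pthread_cond_wait(&d->d_wait, &d->d_lock);
	d->d_flags |= DCACHE_PAR_UPDATE_SKETCH;
	pthread_mutex_unlock(&d->d_lock);
}

/* Non-blocking variant: returns 1 if claimed, 0 if another thread owns it. */
static int d_trylock_update(struct dentry_sketch *d)
{
	int ok;

	pthread_mutex_lock(&d->d_lock);
	ok = !(d->d_flags & DCACHE_PAR_UPDATE_SKETCH);
	if (ok)
		d->d_flags |= DCACHE_PAR_UPDATE_SKETCH;
	pthread_mutex_unlock(&d->d_lock);
	return ok;
}

/* Release the name and wake every waiter so one of them can claim it. */
static void d_unlock_update(struct dentry_sketch *d)
{
	pthread_mutex_lock(&d->d_lock);
	d->d_flags &= ~DCACHE_PAR_UPDATE_SKETCH;
	pthread_cond_broadcast(&d->d_wait);
	pthread_mutex_unlock(&d->d_lock);
}
```

Broadcasting rather than signalling keeps the sketch simple: every waiter re-checks the bit under the lock, and only one wins.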

Some filesystems do *not* complete the lookup that precedes
a create, but leave the dentry d_in_lookup() and unhashed,
so a dentry will often have both DCACHE_PAR_LOOKUP and
DCACHE_PAR_UPDATE set at the same time.  To allow
for this, the 'wq' that is used when DCACHE_PAR_LOOKUP is
cleared must continue to exist until the creation is complete.  We also
need to re-initialize it if it might get re-used.
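The wait-queue lifetime rule can be illustrated with a single-threaded model. Everything here is hypothetical: the two bits stand in for DCACHE_PAR_LOOKUP and DCACHE_PAR_UPDATE, and `struct waitq_sketch` is a placeholder for a caller-owned wait queue head. The point is only that the pointer stays valid while either bit is set, and that the queue is re-initialized before reuse.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bits standing in for DCACHE_PAR_LOOKUP and DCACHE_PAR_UPDATE. */
#define PAR_LOOKUP_SKETCH (1u << 0)
#define PAR_UPDATE_SKETCH (1u << 1)

/* Placeholder for a wait queue head owned by the caller (e.g. on its stack). */
struct waitq_sketch {
	int wakeups;
};

struct dentry_sketch {
	unsigned int d_flags;
	struct waitq_sketch *d_wait;  /* must stay valid while either bit is set */
};

static void waitq_init(struct waitq_sketch *q)
{
	q->wakeups = 0;
}

static void begin_lookup(struct dentry_sketch *d, struct waitq_sketch *q)
{
	d->d_flags |= PAR_LOOKUP_SKETCH;
	d->d_wait = q;
}

static void begin_update(struct dentry_sketch *d)
{
	d->d_flags |= PAR_UPDATE_SKETCH;
}

/* Forget the wait queue only when neither bit can still need it. */
static void maybe_drop_wait(struct dentry_sketch *d)
{
	if (!(d->d_flags & (PAR_LOOKUP_SKETCH | PAR_UPDATE_SKETCH)))
		d->d_wait = NULL;
}

/* Clearing the lookup bit wakes waiters, so d_wait must still be valid. */
static void end_lookup(struct dentry_sketch *d)
{
	d->d_flags &= ~PAR_LOOKUP_SKETCH;
	assert(d->d_wait != NULL);
	d->d_wait->wakeups++;
	maybe_drop_wait(d);
}

static void end_update(struct dentry_sketch *d)
{
	d->d_flags &= ~PAR_UPDATE_SKETCH;
	if (d->d_wait)
		d->d_wait->wakeups++;
	maybe_drop_wait(d);
}
```

If the queue were torn down as soon as the update finished, the later `end_lookup()` in the create path would touch freed memory; tying its lifetime to both bits avoids that.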

As NFS trivially supports parallel unlinks, this patch also adds the
flag to all NFS directories.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/dcache.c            |  37 ++++++++++
 fs/namei.c             | 189 ++++++++++++++++++++++++++++++++++++++++++-------
 fs/nfs/dir.c           |   2 +-
 fs/nfs/inode.c         |   2 +
 fs/nfs/unlink.c        |   4 +-
 include/linux/dcache.h |  43 +++++++++++
 include/linux/fs.h     |   1 +
 7 files changed, 249 insertions(+), 29 deletions(-)

Message-ID: <8736rsbdx1.fsf@notabene.neil.brown.name>
Date: Fri, 23 Nov 2018 15:44:58 +1100
List: lustre-devel@lists.lustre.org