[alternative] xfs: per-cpu deferred inode inactivation queues

Message ID 20210803083403.GI2757197@dread.disaster.area (mailing list archive)
State Superseded
Series [alternative] xfs: per-cpu deferred inode inactivation queues

Commit Message

Dave Chinner Aug. 3, 2021, 8:34 a.m. UTC
From: Dave Chinner <dchinner@redhat.com>

Move inode inactivation to background work contexts so that it no
longer runs in the context that releases the final reference to an
inode. This allows processes that would otherwise block on
inactivation to continue doing work while the filesystem processes
the inactivation in the background.

A typical demonstration of this is unlinking an inode with lots of
extents. The extents are removed during inactivation, so this blocks
the process that unlinked the inode from the directory structure. By
moving the inactivation to the background, the userspace
application can keep working (e.g. unlinking the next inode in the
directory) while the inactivation work on the previous inode is
done by a different CPU.

The implementation of the queue is relatively simple. We use a
per-cpu lockless linked list (llist) to queue inodes for
inactivation without requiring serialisation mechanisms, and a work
item to allow the queue to be processed by a CPU bound worker
thread. We also keep a count of the queue depth so that we can
trigger work after a number of deferred inactivations have been
queued.
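
In condensed form the mechanism looks like this; a sketch distilled
from the patch below, where the real structure lives in xfs_mount.h
and the queueing code in xfs_icache.c:

	struct xfs_inodegc {
		struct llist_head	list;	/* lockless list of queued inodes */
		struct work_struct	work;	/* per-cpu worker for this list */
		int			items;	/* approximate queue depth */
	};

	/* Queueing is lock free: push onto this CPU's llist, bump the count. */
	gc = get_cpu_ptr(mp->m_inodegc);
	llist_add(&ip->i_gclist, &gc->list);
	items = READ_ONCE(gc->items);
	WRITE_ONCE(gc->items, items + 1);
	put_cpu_ptr(gc);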

The use of a CPU-bound workqueue with a work depth of one allows the
workqueue to run one work item per CPU. We queue the work item on
the CPU we are currently running on, and so this essentially gives
us affine per-cpu worker threads for the per-cpu queues. This
maintains the effective CPU affinity that occurs within XFS at the
AG level due to all objects in a directory being local to an AG.
Hence inactivation work tends to run on the same CPU that last
accessed all the objects that inactivation accesses and this
maintains hot CPU caches for unlink workloads.
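
For reference, the workqueue is allocated with a max_active of 1 and
without WQ_UNBOUND, so queueing work from the releasing context runs
the worker on the local CPU. From the patch:

	mp->m_inodegc_wq = alloc_workqueue("xfs-inodegc/%s",
			XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM),
			1, mp->m_super->s_id);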

A depth of 32 inodes was chosen to match the number of inodes in an
inode cluster buffer. This hopefully allows sequential
allocation/unlink behaviours to defer inactivation of all the
inodes in a single cluster buffer at a time, further helping
maintain hot CPU and buffer cache accesses while running
inactivations.

A hard per-cpu queue throttle of 256 inodes has been set to avoid
runaway queuing when inodes that take a long time to inactivate are
being processed, for example when unlinking inodes with large
numbers of extents that take a lot of processing to free.
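
Both thresholds appear directly in the queueing path, condensed here
from xfs_inodegc_queue() in the patch:

	if (items > 32) {
		/* a cluster buffer's worth of inodes: kick the worker */
		queue_work(mp->m_inodegc_wq, &gc->work);
	}
	if (items > 256) {
		/* hard throttle: wait for the worker to drain the queue */
		flush_work(&gc->work);
	}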

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---

Hi Darrick,

This is the current version of the per-cpu deferred queues updated
to replace patch 3 in this series. There are no performance
regressions that I've measured with this, and most of fstests is
passing. There are some failures that I haven't looked at yet -
g/055, g/102, g/219, g/226, g/233, and so on. These tests did not
fail with my original "hack the queue onto the end of the series"
patch - there were zero regressions from that patch so clearly some
of the fixes later in this patch series are still necessary. Or I
screwed up/missed a flush location that those tests would have
otherwise triggered. I suspect patch 19(?) that triggers an inodegc
flush from the blockgc flush at ENOSPC might be one of the missing
pieces...

Hence I don't think these failures have to do with the relative lack
of throttling, low space management or memory pressure detection.
More tests are failing on my 16GB test VM than the 512MB test VM,
and in general I haven't seen memory pressure have any impact on
this queuing mechanism at all.

I suspect that means most of the rest of the patchset is not
necessary for inodegc management. I haven't yet gone through them
to see which ones address the failures I'm seeing, so that's the
next step here.

It would be good if you could run this through your test setups for
this patchset to see if it behaves well in those situations. If it
reproduces the same failures as I'm seeing, then maybe by the time
I'm awake again you've worked out which remaining bits of the
patchset are still required....

Cheers,

Dave.

 fs/xfs/scrub/common.c    |   7 +
 fs/xfs/xfs_icache.c      | 338 +++++++++++++++++++++++++++++++++++------------
 fs/xfs/xfs_icache.h      |   5 +
 fs/xfs/xfs_inode.h       |  20 ++-
 fs/xfs/xfs_log_recover.c |   7 +
 fs/xfs/xfs_mount.c       |  26 +++-
 fs/xfs/xfs_mount.h       |  34 ++++-
 fs/xfs/xfs_super.c       | 111 +++++++++++++++-
 fs/xfs/xfs_trace.h       |  50 ++++++-
 9 files changed, 505 insertions(+), 93 deletions(-)

Comments

Darrick J. Wong Aug. 3, 2021, 8:20 p.m. UTC | #1
On Tue, Aug 03, 2021 at 06:34:03PM +1000, Dave Chinner wrote:
> 
> From: Dave Chinner <dchinner@redhat.com>
> 
> Move inode inactivation to background work contexts so that it no
> longer runs in the context that releases the final reference to an
> inode. This allows processes that would otherwise block on
> inactivation to continue doing work while the filesystem processes
> the inactivation in the background.
> 
> A typical demonstration of this is unlinking an inode with lots of
> extents. The extents are removed during inactivation, so this blocks
> the process that unlinked the inode from the directory structure. By
> moving the inactivation to the background, the userspace
> application can keep working (e.g. unlinking the next inode in the
> directory) while the inactivation work on the previous inode is
> done by a different CPU.
> 
> The implementation of the queue is relatively simple. We use a
> per-cpu lockless linked list (llist) to queue inodes for
> inactivation without requiring serialisation mechanisms, and a work
> item to allow the queue to be processed by a CPU bound worker
> thread. We also keep a count of the queue depth so that we can
> trigger work after a number of deferred inactivations have been
> queued.
> 
> The use of a CPU-bound workqueue with a work depth of one allows the
> workqueue to run one work item per CPU. We queue the work item on
> the CPU we are currently running on, and so this essentially gives
> us affine per-cpu worker threads for the per-cpu queues. This
> maintains the effective CPU affinity that occurs within XFS at the
> AG level due to all objects in a directory being local to an AG.
> Hence inactivation work tends to run on the same CPU that last
> accessed all the objects that inactivation accesses and this
> maintains hot CPU caches for unlink workloads.
> 
> A depth of 32 inodes was chosen to match the number of inodes in an
> inode cluster buffer. This hopefully allows sequential
> allocation/unlink behaviours to defer inactivation of all the
> inodes in a single cluster buffer at a time, further helping
> maintain hot CPU and buffer cache accesses while running
> inactivations.
> 
> A hard per-cpu queue throttle of 256 inodes has been set to avoid
> runaway queuing when inodes that take a long time to inactivate are
> being processed, for example when unlinking inodes with large
> numbers of extents that take a lot of processing to free.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> 
> Hi Darrick,
> 
> This is the current version of the per-cpu deferred queues updated
> to replace patch 3 in this series. There are no performance
> regressions that I've measured with this, and most of fstests is
> passing. There are some failures that I haven't looked at yet -
> g/055, g/102, g/219, g/226, g/233, and so on. These tests did not

Yeah, I saw all of those emitting various problems that all trace back
to ENOSPC or EDQUOT.  The various "bang on inodegc sooner than later"
patches in this series fix those problems; I think this rework will
simplify the code changes a lot, since all we need to do now is force
the queue_work and the flush_work when space/quota/memory are low.
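
Roughly speaking, I'd expect the remaining changes to collapse into
something like this in the queueing path (a hypothetical sketch only;
xfs_inodegc_want_flush() is a made-up predicate standing in for the
low space/quota/memory checks):

	/* Hypothetical: push and drain the queue when resources run short. */
	if (xfs_inodegc_want_flush(mp)) {
		queue_work(mp->m_inodegc_wq, &gc->work);
		flush_work(&gc->work);
	}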

generic/219     - output mismatch (see /var/tmp/fstests/generic/219.out.bad)
    --- tests/generic/219.out   2021-05-13 11:47:55.683860312 -0700
    +++ /var/tmp/fstests/generic/219.out.bad    2021-08-03 10:49:26.554855113 -0700
    @@ -34,4 +34,4 @@
       Size: 49152        Filetype: Regular File
       File: "SCRATCH_MNT/mmap"
       Size: 49152        Filetype: Regular File
    -Usage OK (type=g)
    +Too many blocks used (type=g)
    ...
    (Run 'diff -u /tmp/fstests/tests/generic/219.out /var/tmp/fstests/generic/219.out.bad'  to see the entire diff)
generic/371     - output mismatch (see /var/tmp/fstests/generic/371.out.bad)
    --- tests/generic/371.out   2021-05-13 11:47:55.712860228 -0700
    +++ /var/tmp/fstests/generic/371.out.bad    2021-08-03 10:58:10.161397704 -0700
    @@ -1,2 +1,198 @@
     QA output created by 371
     Silence is golden
    +fallocate: No space left on device
    +fallocate: No space left on device
    +fallocate: No space left on device
    +fallocate: No space left on device
    +fallocate: No space left on device
    ...
    (Run 'diff -u /tmp/fstests/tests/generic/371.out /var/tmp/fstests/generic/371.out.bad'  to see the entire diff)
generic/427     [failed, exit status 1]- output mismatch (see /var/tmp/fstests/generic/427.out.bad)
    --- tests/generic/427.out   2021-05-13 11:47:55.723860196 -0700
    +++ /var/tmp/fstests/generic/427.out.bad    2021-08-03 11:00:27.076995641 -0700
    @@ -1,2 +1,2 @@
     QA output created by 427
    -Success, all done.
    +open: No space left on device
    ...
    (Run 'diff -u /tmp/fstests/tests/generic/427.out /var/tmp/fstests/generic/427.out.bad'  to see the entire diff)
generic/511     - output mismatch (see /var/tmp/fstests/generic/511.out.bad)
    --- tests/generic/511.out   2021-05-13 11:47:55.738860153 -0700
    +++ /var/tmp/fstests/generic/511.out.bad    2021-08-03 11:03:29.524335265 -0700
    @@ -1,2 +1,5 @@
     QA output created by 511
    +touch: cannot touch '/opt/a': No space left on device
    +Seed set to 1
    +/opt/a: No space left on device
     Silence is golden
    ...
    (Run 'diff -u /tmp/fstests/tests/generic/511.out /var/tmp/fstests/generic/511.out.bad'  to see the entire diff)
generic/531     _check_dmesg: something found in dmesg (see /var/tmp/fstests/generic/531.dmesg)
- output mismatch (see /var/tmp/fstests/generic/531.out.bad)
    --- tests/generic/531.out   2021-05-13 11:47:55.741860145 -0700
    +++ /var/tmp/fstests/generic/531.out.bad    2021-08-03 11:04:51.329004913 -0700
    @@ -1,2 +1,9 @@
     QA output created by 531
    +open?: Input/output error
    +open?: Input/output error
    +open?: Input/output error
    +open?: No such file or directory
    +open?: Input/output error
    +open?: Input/output error
    ...
    (Run 'diff -u /tmp/fstests/tests/generic/531.out /var/tmp/fstests/generic/531.out.bad'  to see the entire diff)
generic/536     - output mismatch (see /var/tmp/fstests/generic/536.out.bad)
    --- tests/generic/536.out   2021-05-13 11:47:55.742860142 -0700
    +++ /var/tmp/fstests/generic/536.out.bad    2021-08-03 11:04:58.832711203 -0700
    @@ -1,3 +1,5 @@
     QA output created by 536
     file.1
    +hexdump: /opt/file.1: No such file or directory
     file.2
    +hexdump: /opt/file.2: No such file or directory
    ...
    (Run 'diff -u /tmp/fstests/tests/generic/536.out /var/tmp/fstests/generic/536.out.bad'  to see the entire diff)
generic/603     - output mismatch (see /var/tmp/fstests/generic/603.out.bad)
    --- tests/generic/603.out   2021-05-13 11:47:55.755860104 -0700
    +++ /var/tmp/fstests/generic/603.out.bad    2021-08-03 11:07:23.155347585 -0700
    @@ -18,11 +18,15 @@
     ### Initialize files, and their mode and ownership
     --- Test block quota ---
     Write 225 blocks...
    +pwrite: Disk quota exceeded
     Rewrite 250 blocks plus 1 byte, over the block softlimit...
    +pwrite: Disk quota exceeded
     Try to write 1 one more block after grace...
    ...
    (Run 'diff -u /tmp/fstests/tests/generic/603.out /var/tmp/fstests/generic/603.out.bad'  to see the entire diff)

I noticed that xfs/264 seems to get hung up on xfs_buftarg_wait when the
dmsetup suspend freezes the fs and we try to quiesce the log.
Unfortunately, that hung my -g quick test. :/

> fail with my original "hack the queue onto the end of the series"
> patch - there were zero regressions from that patch so clearly some
> of the fixes later in this patch series are still necessary. Or I
> screwed up/missed a flush location that those tests would have
> otherwise triggered. I suspect patch 19(?) that triggers an inodegc
> flush from the blockgc flush at ENOSPC might be one of the missing
> pieces...
> 
> Hence I don't think these failures have to do with the relative lack
> of throttling, low space management or memory pressure detection.
> More tests are failing on my 16GB test VM than the 512MB test VM,
> and in general I haven't seen memory pressure have any impact on
> this queuing mechanism at all.

No surprises there, more RAM enables more laziness.

> I suspect that means most of the rest of the patchset is not
> necessary for inodegc management. I haven't yet gone through them
> to see which ones address the failures I'm seeing, so that's the
> next step here.
> 
> It would be good if you could run this through your test setups for
> this patchset to see if it behaves well in those situations. If it
> reproduces the same failures as I'm seeing, then maybe by the time
> I'm awake again you've worked out which remaining bits of the
> patchset are still required....

Will do.

--D

> Cheers,
> 
> Dave.
> 
>  fs/xfs/scrub/common.c    |   7 +
>  fs/xfs/xfs_icache.c      | 338 +++++++++++++++++++++++++++++++++++------------
>  fs/xfs/xfs_icache.h      |   5 +
>  fs/xfs/xfs_inode.h       |  20 ++-
>  fs/xfs/xfs_log_recover.c |   7 +
>  fs/xfs/xfs_mount.c       |  26 +++-
>  fs/xfs/xfs_mount.h       |  34 ++++-
>  fs/xfs/xfs_super.c       | 111 +++++++++++++++-
>  fs/xfs/xfs_trace.h       |  50 ++++++-
>  9 files changed, 505 insertions(+), 93 deletions(-)
> 
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 8558ca05e11d..06b697f72f23 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -884,6 +884,7 @@ xchk_stop_reaping(
>  {
>  	sc->flags |= XCHK_REAPING_DISABLED;
>  	xfs_blockgc_stop(sc->mp);
> +	xfs_inodegc_stop(sc->mp);
>  }
>  
>  /* Restart background reaping of resources. */
> @@ -891,6 +892,12 @@ void
>  xchk_start_reaping(
>  	struct xfs_scrub	*sc)
>  {
> +	/*
> +	 * Readonly filesystems do not perform inactivation, so there's no
> +	 * need to restart the worker.
> +	 */
> +	if (!(sc->mp->m_flags & XFS_MOUNT_RDONLY))
> +		xfs_inodegc_start(sc->mp);
>  	xfs_blockgc_start(sc->mp);
>  	sc->flags &= ~XCHK_REAPING_DISABLED;
>  }
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 709507cc83ae..b1c2cab3c690 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -213,7 +213,7 @@ xfs_blockgc_queue(
>  {
>  	rcu_read_lock();
>  	if (radix_tree_tagged(&pag->pag_ici_root, XFS_ICI_BLOCKGC_TAG))
> -		queue_delayed_work(pag->pag_mount->m_gc_workqueue,
> +		queue_delayed_work(pag->pag_mount->m_blockgc_wq,
>  				   &pag->pag_blockgc_work,
>  				   msecs_to_jiffies(xfs_blockgc_secs * 1000));
>  	rcu_read_unlock();
> @@ -292,86 +292,6 @@ xfs_perag_clear_inode_tag(
>  	trace_xfs_perag_clear_inode_tag(mp, pag->pag_agno, tag, _RET_IP_);
>  }
>  
> -#ifdef DEBUG
> -static void
> -xfs_check_delalloc(
> -	struct xfs_inode	*ip,
> -	int			whichfork)
> -{
> -	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
> -	struct xfs_bmbt_irec	got;
> -	struct xfs_iext_cursor	icur;
> -
> -	if (!ifp || !xfs_iext_lookup_extent(ip, ifp, 0, &icur, &got))
> -		return;
> -	do {
> -		if (isnullstartblock(got.br_startblock)) {
> -			xfs_warn(ip->i_mount,
> -	"ino %llx %s fork has delalloc extent at [0x%llx:0x%llx]",
> -				ip->i_ino,
> -				whichfork == XFS_DATA_FORK ? "data" : "cow",
> -				got.br_startoff, got.br_blockcount);
> -		}
> -	} while (xfs_iext_next_extent(ifp, &icur, &got));
> -}
> -#else
> -#define xfs_check_delalloc(ip, whichfork)	do { } while (0)
> -#endif
> -
> -/*
> - * We set the inode flag atomically with the radix tree tag.
> - * Once we get tag lookups on the radix tree, this inode flag
> - * can go away.
> - */
> -void
> -xfs_inode_mark_reclaimable(
> -	struct xfs_inode	*ip)
> -{
> -	struct xfs_mount	*mp = ip->i_mount;
> -	struct xfs_perag	*pag;
> -	bool			need_inactive = xfs_inode_needs_inactive(ip);
> -
> -	if (!need_inactive) {
> -		/* Going straight to reclaim, so drop the dquots. */
> -		xfs_qm_dqdetach(ip);
> -	} else {
> -		xfs_inactive(ip);
> -	}
> -
> -	if (!XFS_FORCED_SHUTDOWN(mp) && ip->i_delayed_blks) {
> -		xfs_check_delalloc(ip, XFS_DATA_FORK);
> -		xfs_check_delalloc(ip, XFS_COW_FORK);
> -		ASSERT(0);
> -	}
> -
> -	XFS_STATS_INC(mp, vn_reclaim);
> -
> -	/*
> -	 * We should never get here with one of the reclaim flags already set.
> -	 */
> -	ASSERT_ALWAYS(!xfs_iflags_test(ip, XFS_IRECLAIMABLE));
> -	ASSERT_ALWAYS(!xfs_iflags_test(ip, XFS_IRECLAIM));
> -
> -	/*
> -	 * We always use background reclaim here because even if the inode is
> -	 * clean, it still may be under IO and hence we have wait for IO
> -	 * completion to occur before we can reclaim the inode. The background
> -	 * reclaim path handles this more efficiently than we can here, so
> -	 * simply let background reclaim tear down all inodes.
> -	 */
> -	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
> -	spin_lock(&pag->pag_ici_lock);
> -	spin_lock(&ip->i_flags_lock);
> -
> -	xfs_perag_set_inode_tag(pag, XFS_INO_TO_AGINO(mp, ip->i_ino),
> -			XFS_ICI_RECLAIM_TAG);
> -	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
> -
> -	spin_unlock(&ip->i_flags_lock);
> -	spin_unlock(&pag->pag_ici_lock);
> -	xfs_perag_put(pag);
> -}
> -
>  static inline void
>  xfs_inew_wait(
>  	struct xfs_inode	*ip)
> @@ -569,6 +489,15 @@ xfs_iget_cache_hit(
>  	if (ip->i_flags & (XFS_INEW | XFS_IRECLAIM))
>  		goto out_skip;
>  
> +	if (ip->i_flags & XFS_NEED_INACTIVE) {
> +		/* Unlinked inodes cannot be re-grabbed. */
> +		if (VFS_I(ip)->i_nlink == 0) {
> +			error = -ENOENT;
> +			goto out_error;
> +		}
> +		goto out_inodegc_flush;
> +	}
> +
>  	/*
>  	 * Check the inode free state is valid. This also detects lookup
>  	 * racing with unlinks.
> @@ -616,6 +545,12 @@ xfs_iget_cache_hit(
>  	spin_unlock(&ip->i_flags_lock);
>  	rcu_read_unlock();
>  	return error;
> +
> +out_inodegc_flush:
> +	spin_unlock(&ip->i_flags_lock);
> +	rcu_read_unlock();
> +	xfs_inodegc_flush(mp);
> +	return -EAGAIN;
>  }
>  
>  static int
> @@ -943,6 +878,7 @@ xfs_reclaim_inode(
>  
>  	xfs_iflags_clear(ip, XFS_IFLUSHING);
>  reclaim:
> +	trace_xfs_inode_reclaiming(ip);
>  
>  	/*
>  	 * Because we use RCU freeing we need to ensure the inode always appears
> @@ -1420,6 +1356,8 @@ xfs_blockgc_start(
>  
>  /* Don't try to run block gc on an inode that's in any of these states. */
>  #define XFS_BLOCKGC_NOGRAB_IFLAGS	(XFS_INEW | \
> +					 XFS_NEED_INACTIVE | \
> +					 XFS_INACTIVATING | \
>  					 XFS_IRECLAIMABLE | \
>  					 XFS_IRECLAIM)
>  /*
> @@ -1794,3 +1732,241 @@ xfs_icwalk(
>  	return last_error;
>  	BUILD_BUG_ON(XFS_ICWALK_PRIVATE_FLAGS & XFS_ICWALK_FLAGS_VALID);
>  }
> +
> +#ifdef DEBUG
> +static void
> +xfs_check_delalloc(
> +	struct xfs_inode	*ip,
> +	int			whichfork)
> +{
> +	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
> +	struct xfs_bmbt_irec	got;
> +	struct xfs_iext_cursor	icur;
> +
> +	if (!ifp || !xfs_iext_lookup_extent(ip, ifp, 0, &icur, &got))
> +		return;
> +	do {
> +		if (isnullstartblock(got.br_startblock)) {
> +			xfs_warn(ip->i_mount,
> +	"ino %llx %s fork has delalloc extent at [0x%llx:0x%llx]",
> +				ip->i_ino,
> +				whichfork == XFS_DATA_FORK ? "data" : "cow",
> +				got.br_startoff, got.br_blockcount);
> +		}
> +	} while (xfs_iext_next_extent(ifp, &icur, &got));
> +}
> +#else
> +#define xfs_check_delalloc(ip, whichfork)	do { } while (0)
> +#endif
> +
> +/* Schedule the inode for reclaim. */
> +static void
> +xfs_inodegc_set_reclaimable(
> +	struct xfs_inode	*ip)
> +{
> +	struct xfs_mount        *mp = ip->i_mount;
> +	struct xfs_perag	*pag;
> +
> +	if (!XFS_FORCED_SHUTDOWN(mp) && ip->i_delayed_blks) {
> +		xfs_check_delalloc(ip, XFS_DATA_FORK);
> +		xfs_check_delalloc(ip, XFS_COW_FORK);
> +		ASSERT(0);
> +	}
> +
> +	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
> +	spin_lock(&pag->pag_ici_lock);
> +	spin_lock(&ip->i_flags_lock);
> +
> +	trace_xfs_inode_set_reclaimable(ip);
> +	ip->i_flags &= ~(XFS_NEED_INACTIVE | XFS_INACTIVATING);
> +	ip->i_flags |= XFS_IRECLAIMABLE;
> +	xfs_perag_set_inode_tag(pag, XFS_INO_TO_AGINO(mp, ip->i_ino),
> +				XFS_ICI_RECLAIM_TAG);
> +
> +	spin_unlock(&ip->i_flags_lock);
> +	spin_unlock(&pag->pag_ici_lock);
> +	xfs_perag_put(pag);
> +}
> +
> +/*
> + * Free all speculative preallocations and possibly even the inode itself.
> + * This is the last chance to make changes to an otherwise unreferenced file
> + * before incore reclamation happens.
> + */
> +static void
> +xfs_inodegc_inactivate(
> +	struct xfs_inode	*ip)
> +{
> +	struct xfs_mount        *mp = ip->i_mount;
> +
> +	/*
> +	 * Inactivation isn't supposed to run when the fs is frozen because
> +	 * we don't want kernel threads to block on transaction allocation.
> +	 */
> +	ASSERT(mp->m_super->s_writers.frozen < SB_FREEZE_FS);
> +
> +	trace_xfs_inode_inactivating(ip);
> +	xfs_inactive(ip);
> +	xfs_inodegc_set_reclaimable(ip);
> +}
> +
> +void
> +xfs_inodegc_worker(
> +	struct work_struct	*work)
> +{
> +	struct xfs_inodegc	*gc = container_of(work, struct xfs_inodegc,
> +							work);
> +	struct llist_node	*node = llist_del_all(&gc->list);
> +	struct xfs_inode	*ip, *n;
> +
> +	trace_xfs_inodegc_worker(NULL, __return_address);
> +
> +	WRITE_ONCE(gc->items, 0);
> +	llist_for_each_entry_safe(ip, n, node, i_gclist) {
> +		xfs_iflags_set(ip, XFS_INACTIVATING);
> +		xfs_inodegc_inactivate(ip);
> +	}
> +}
> +
> +/*
> + * Force all currently queued inode inactivation work to run immediately, and
> + * wait for the work to finish. Two passes: queue all the work in the first
> + * pass, then wait for it in the second.
> + */
> +void
> +xfs_inodegc_flush(
> +	struct xfs_mount	*mp)
> +{
> +	struct xfs_inodegc	*gc;
> +	int			cpu;
> +
> +	trace_xfs_inodegc_flush(mp, __return_address);
> +
> +	for_each_online_cpu(cpu) {
> +		gc = per_cpu_ptr(mp->m_inodegc, cpu);
> +		queue_work_on(cpu, mp->m_inodegc_wq, &gc->work);
> +	}
> +
> +	for_each_online_cpu(cpu) {
> +		gc = per_cpu_ptr(mp->m_inodegc, cpu);
> +		flush_work(&gc->work);
> +	}
> +}
> +
> +/*
> + * Flush all the pending work and then disable the inode inactivation background
> + * workers and wait for them to stop.
> + */
> +void
> +xfs_inodegc_stop(
> +	struct xfs_mount	*mp)
> +{
> +	struct xfs_inodegc	*gc;
> +	int			cpu;
> +
> +	if (!test_and_clear_bit(XFS_OPFLAG_INODEGC_RUNNING_BIT, &mp->m_opflags))
> +		return;
> +
> +	xfs_inodegc_flush(mp);
> +
> +	for_each_online_cpu(cpu) {
> +		gc = per_cpu_ptr(mp->m_inodegc, cpu);
> +		cancel_work_sync(&gc->work);
> +	}
> +	trace_xfs_inodegc_stop(mp, __return_address);
> +}
> +
> +/*
> + * Enable the inode inactivation background workers and schedule deferred inode
> + * inactivation work if there is any.
> + */
> +void
> +xfs_inodegc_start(
> +	struct xfs_mount	*mp)
> +{
> +	struct xfs_inodegc	*gc;
> +	int			cpu;
> +
> +	if (test_and_set_bit(XFS_OPFLAG_INODEGC_RUNNING_BIT, &mp->m_opflags))
> +		return;
> +
> +	trace_xfs_inodegc_start(mp, __return_address);
> +	for_each_online_cpu(cpu) {
> +		gc = per_cpu_ptr(mp->m_inodegc, cpu);
> +		if (!llist_empty(&gc->list))
> +			queue_work_on(cpu, mp->m_inodegc_wq, &gc->work);
> +	}
> +}
> +
> +/*
> + * Queue a background inactivation worker if there are inodes that need to be
> + * inactivated and higher level xfs code hasn't disabled the background
> + * workers.
> + */
> +static void
> +xfs_inodegc_queue(
> +	struct xfs_inode	*ip)
> +{
> +	struct xfs_mount	*mp = ip->i_mount;
> +	struct xfs_inodegc	*gc;
> +	int			items;
> +
> +	trace_xfs_inode_set_need_inactive(ip);
> +	spin_lock(&ip->i_flags_lock);
> +	ip->i_flags |= XFS_NEED_INACTIVE;
> +	spin_unlock(&ip->i_flags_lock);
> +
> +	gc = get_cpu_ptr(mp->m_inodegc);
> +	llist_add(&ip->i_gclist, &gc->list);
> +	items = READ_ONCE(gc->items);
> +	WRITE_ONCE(gc->items, items + 1);
> +	put_cpu_ptr(gc);
> +
> +	if (!test_bit(XFS_OPFLAG_INODEGC_RUNNING_BIT, &mp->m_opflags))
> +		return;
> +	if (items > 32) {
> +		trace_xfs_inodegc_queue(mp, __return_address);
> +		queue_work(mp->m_inodegc_wq, &gc->work);
> +	}
> +	/* throttle */
> +	if (items > 256) {
> +		trace_xfs_inodegc_throttle(mp, __return_address);
> +		flush_work(&gc->work);
> +	}
> +}
> +
> +/*
> + * We set the inode flag atomically with the radix tree tag.  Once we get tag
> + * lookups on the radix tree, this inode flag can go away.
> + *
> + * We always use background reclaim here because even if the inode is clean, it
> + * still may be under IO and hence we have to wait for IO completion to occur
> + * before we can reclaim the inode. The background reclaim path handles this
> + * more efficiently than we can here, so simply let background reclaim tear down
> + * all inodes.
> + */
> +void
> +xfs_inode_mark_reclaimable(
> +	struct xfs_inode	*ip)
> +{
> +	struct xfs_mount	*mp = ip->i_mount;
> +	bool			need_inactive;
> +
> +	XFS_STATS_INC(mp, vn_reclaim);
> +
> +	/*
> +	 * We should never get here with any of the reclaim flags already set.
> +	 */
> +	ASSERT_ALWAYS(!xfs_iflags_test(ip, XFS_ALL_IRECLAIM_FLAGS));
> +
> +	need_inactive = xfs_inode_needs_inactive(ip);
> +	if (need_inactive) {
> +		xfs_inodegc_queue(ip);
> +		return;
> +	}
> +
> +	/* Going straight to reclaim, so drop the dquots. */
> +	xfs_qm_dqdetach(ip);
> +	xfs_inodegc_set_reclaimable(ip);
> +}
> +
> diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
> index d0062ebb3f7a..c1dfc909a5b0 100644
> --- a/fs/xfs/xfs_icache.h
> +++ b/fs/xfs/xfs_icache.h
> @@ -74,4 +74,9 @@ int xfs_icache_inode_is_allocated(struct xfs_mount *mp, struct xfs_trans *tp,
>  void xfs_blockgc_stop(struct xfs_mount *mp);
>  void xfs_blockgc_start(struct xfs_mount *mp);
>  
> +void xfs_inodegc_worker(struct work_struct *work);
> +void xfs_inodegc_flush(struct xfs_mount *mp);
> +void xfs_inodegc_stop(struct xfs_mount *mp);
> +void xfs_inodegc_start(struct xfs_mount *mp);
> +
>  #endif
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index e3137bbc7b14..1f62b481d8c5 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -42,6 +42,7 @@ typedef struct xfs_inode {
>  	mrlock_t		i_lock;		/* inode lock */
>  	mrlock_t		i_mmaplock;	/* inode mmap IO lock */
>  	atomic_t		i_pincount;	/* inode pin count */
> +	struct llist_node	i_gclist;	/* deferred inactivation list */
>  
>  	/*
>  	 * Bitsets of inode metadata that have been checked and/or are sick.
> @@ -240,6 +241,7 @@ static inline bool xfs_inode_has_bigtime(struct xfs_inode *ip)
>  #define __XFS_IPINNED_BIT	8	 /* wakeup key for zero pin count */
>  #define XFS_IPINNED		(1 << __XFS_IPINNED_BIT)
>  #define XFS_IEOFBLOCKS		(1 << 9) /* has the preallocblocks tag set */
> +#define XFS_NEED_INACTIVE	(1 << 10) /* see XFS_INACTIVATING below */
>  /*
>   * If this unlinked inode is in the middle of recovery, don't let drop_inode
>   * truncate and free the inode.  This can happen if we iget the inode during
> @@ -248,6 +250,21 @@ static inline bool xfs_inode_has_bigtime(struct xfs_inode *ip)
>  #define XFS_IRECOVERY		(1 << 11)
>  #define XFS_ICOWBLOCKS		(1 << 12)/* has the cowblocks tag set */
>  
> +/*
> + * If we need to update on-disk metadata before this IRECLAIMABLE inode can be
> + * freed, then NEED_INACTIVE will be set.  Once we start the updates, the
> + * INACTIVATING bit will be set to keep iget away from this inode.  After the
> + * inactivation completes, both flags will be cleared and the inode is a
> + * plain old IRECLAIMABLE inode.
> + */
> +#define XFS_INACTIVATING	(1 << 13)
> +
> +/* All inode state flags related to inode reclaim. */
> +#define XFS_ALL_IRECLAIM_FLAGS	(XFS_IRECLAIMABLE | \
> +				 XFS_IRECLAIM | \
> +				 XFS_NEED_INACTIVE | \
> +				 XFS_INACTIVATING)
> +
>  /*
>   * Per-lifetime flags need to be reset when re-using a reclaimable inode during
>   * inode lookup. This prevents unintended behaviour on the new inode from
> @@ -255,7 +272,8 @@ static inline bool xfs_inode_has_bigtime(struct xfs_inode *ip)
>   */
>  #define XFS_IRECLAIM_RESET_FLAGS	\
>  	(XFS_IRECLAIMABLE | XFS_IRECLAIM | \
> -	 XFS_IDIRTY_RELEASE | XFS_ITRUNCATED)
> +	 XFS_IDIRTY_RELEASE | XFS_ITRUNCATED | XFS_NEED_INACTIVE | \
> +	 XFS_INACTIVATING)
>  
>  /*
>   * Flags for inode locking.
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 1721fce2ec94..a98d2429d795 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -2786,6 +2786,13 @@ xlog_recover_process_iunlinks(
>  		}
>  		xfs_buf_rele(agibp);
>  	}
> +
> +	/*
> +	 * Flush the pending unlinked inodes to ensure that the inactivations
> +	 * are fully completed on disk and the incore inodes can be reclaimed
> +	 * before we signal that recovery is complete.
> +	 */
> +	xfs_inodegc_flush(mp);
>  }
>  
>  STATIC void
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index baf7b323cb15..1f7e9a608f38 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -514,7 +514,8 @@ xfs_check_summary_counts(
>   * Flush and reclaim dirty inodes in preparation for unmount. Inodes and
>   * internal inode structures can be sitting in the CIL and AIL at this point,
>   * so we need to unpin them, write them back and/or reclaim them before unmount
> - * can proceed.
> + * can proceed.  In other words, callers are required to have inactivated all
> + * inodes.
>   *
>   * An inode cluster that has been freed can have its buffer still pinned in
>   * memory because the transaction is still sitting in a iclog. The stale inodes
> @@ -546,6 +547,7 @@ xfs_unmount_flush_inodes(
>  	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
>  
>  	xfs_ail_push_all_sync(mp->m_ail);
> +	xfs_inodegc_stop(mp);
>  	cancel_delayed_work_sync(&mp->m_reclaim_work);
>  	xfs_reclaim_inodes(mp);
>  	xfs_health_unmount(mp);
> @@ -782,6 +784,9 @@ xfs_mountfs(
>  	if (error)
>  		goto out_log_dealloc;
>  
> +	/* Enable background inode inactivation workers. */
> +	xfs_inodegc_start(mp);
> +
>  	/*
>  	 * Get and sanity-check the root inode.
>  	 * Save the pointer to it in the mount structure.
> @@ -942,6 +947,15 @@ xfs_mountfs(
>  	xfs_irele(rip);
>  	/* Clean out dquots that might be in memory after quotacheck. */
>  	xfs_qm_unmount(mp);
> +
> +	/*
> +	 * Inactivate all inodes that might still be in memory after a log
> +	 * intent recovery failure so that reclaim can free them.  Metadata
> +	 * inodes and the root directory shouldn't need inactivation, but the
> +	 * mount failed for some reason, so pull down all the state and flee.
> +	 */
> +	xfs_inodegc_flush(mp);
> +
>  	/*
>  	 * Flush all inode reclamation work and flush the log.
>  	 * We have to do this /after/ rtunmount and qm_unmount because those
> @@ -989,6 +1003,16 @@ xfs_unmountfs(
>  	uint64_t		resblks;
>  	int			error;
>  
> +	/*
> +	 * Perform all on-disk metadata updates required to inactivate inodes
> +	 * that the VFS evicted earlier in the unmount process.  Freeing inodes
> +	 * and discarding CoW fork preallocations can cause shape changes to
> +	 * the free inode and refcount btrees, respectively, so we must finish
> +	 * this before we discard the metadata space reservations.  Metadata
> +	 * inodes and the root directory do not require inactivation.
> +	 */
> +	xfs_inodegc_flush(mp);
> +
>  	xfs_blockgc_stop(mp);
>  	xfs_fs_unreserve_ag_blocks(mp);
>  	xfs_qm_unmount_quotas(mp);
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index c78b63fe779a..470013a48c17 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -56,6 +56,15 @@ struct xfs_error_cfg {
>  	long		retry_timeout;	/* in jiffies, -1 = infinite */
>  };
>  
> +/*
> + * Per-cpu deferred inode inactivation GC lists.
> + */
> +struct xfs_inodegc {
> +	struct llist_head	list;
> +	struct work_struct	work;
> +	int			items;
> +};
> +
>  /*
>   * The struct xfsmount layout is optimised to separate read-mostly variables
>   * from variables that are frequently modified. We put the read-mostly variables
> @@ -82,6 +91,8 @@ typedef struct xfs_mount {
>  	xfs_buftarg_t		*m_ddev_targp;	/* saves taking the address */
>  	xfs_buftarg_t		*m_logdev_targp;/* ptr to log device */
>  	xfs_buftarg_t		*m_rtdev_targp;	/* ptr to rt device */
> +	void __percpu		*m_inodegc;	/* percpu inodegc structures */
> +
>  	/*
>  	 * Optional cache of rt summary level per bitmap block with the
>  	 * invariant that m_rsum_cache[bbno] <= the minimum i for which
> @@ -94,8 +105,9 @@ typedef struct xfs_mount {
>  	struct workqueue_struct	*m_unwritten_workqueue;
>  	struct workqueue_struct	*m_cil_workqueue;
>  	struct workqueue_struct	*m_reclaim_workqueue;
> -	struct workqueue_struct *m_gc_workqueue;
>  	struct workqueue_struct	*m_sync_workqueue;
> +	struct workqueue_struct *m_blockgc_wq;
> +	struct workqueue_struct *m_inodegc_wq;
>  
>  	int			m_bsize;	/* fs logical block size */
>  	uint8_t			m_blkbit_log;	/* blocklog + NBBY */
> @@ -154,6 +166,13 @@ typedef struct xfs_mount {
>  	uint8_t			m_rt_checked;
>  	uint8_t			m_rt_sick;
>  
> +	/*
> +	 * This atomic bitset controls flags that alter the behavior of the
> +	 * filesystem.  Use only the atomic bit helper functions here; see
> +	 * XFS_OPFLAG_* for information about the actual flags.
> +	 */
> +	unsigned long		m_opflags;
> +
>  	/*
>  	 * End of read-mostly variables. Frequently written variables and locks
>  	 * should be placed below this comment from now on. The first variable
> @@ -258,6 +277,19 @@ typedef struct xfs_mount {
>  #define XFS_MOUNT_DAX_ALWAYS	(1ULL << 26)
>  #define XFS_MOUNT_DAX_NEVER	(1ULL << 27)
>  
> +/*
> + * Operation flags -- each entry here is a bit index into m_opflags and is
> + * not itself a flag value.  Use the atomic bit functions to access.
> + */
> +enum xfs_opflag_bits {
> +	/*
> +	 * If set, background inactivation worker threads will be scheduled to
> +	 * process queued inodegc work.  If not, queued inodes remain in memory
> +	 * waiting to be processed.
> +	 */
> +	XFS_OPFLAG_INODEGC_RUNNING_BIT	= 0,
> +};
> +
>  /*
>   * Max and min values for mount-option defined I/O
>   * preallocation sizes.
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index ef89a9a3ba9e..913d54eb4929 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -508,21 +508,29 @@ xfs_init_mount_workqueues(
>  	if (!mp->m_reclaim_workqueue)
>  		goto out_destroy_cil;
>  
> -	mp->m_gc_workqueue = alloc_workqueue("xfs-gc/%s",
> +	mp->m_blockgc_wq = alloc_workqueue("xfs-blockgc/%s",
>  			WQ_SYSFS | WQ_UNBOUND | WQ_FREEZABLE | WQ_MEM_RECLAIM,
>  			0, mp->m_super->s_id);
> -	if (!mp->m_gc_workqueue)
> +	if (!mp->m_blockgc_wq)
>  		goto out_destroy_reclaim;
>  
> +	mp->m_inodegc_wq = alloc_workqueue("xfs-inodegc/%s",
> +			XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM),
> +			1, mp->m_super->s_id);
> +	if (!mp->m_inodegc_wq)
> +		goto out_destroy_blockgc;
> +
>  	mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s",
>  			XFS_WQFLAGS(WQ_FREEZABLE), 0, mp->m_super->s_id);
>  	if (!mp->m_sync_workqueue)
> -		goto out_destroy_eofb;
> +		goto out_destroy_inodegc;
>  
>  	return 0;
>  
> -out_destroy_eofb:
> -	destroy_workqueue(mp->m_gc_workqueue);
> +out_destroy_inodegc:
> +	destroy_workqueue(mp->m_inodegc_wq);
> +out_destroy_blockgc:
> +	destroy_workqueue(mp->m_blockgc_wq);
>  out_destroy_reclaim:
>  	destroy_workqueue(mp->m_reclaim_workqueue);
>  out_destroy_cil:
> @@ -540,7 +548,8 @@ xfs_destroy_mount_workqueues(
>  	struct xfs_mount	*mp)
>  {
>  	destroy_workqueue(mp->m_sync_workqueue);
> -	destroy_workqueue(mp->m_gc_workqueue);
> +	destroy_workqueue(mp->m_blockgc_wq);
> +	destroy_workqueue(mp->m_inodegc_wq);
>  	destroy_workqueue(mp->m_reclaim_workqueue);
>  	destroy_workqueue(mp->m_cil_workqueue);
>  	destroy_workqueue(mp->m_unwritten_workqueue);
> @@ -702,6 +711,8 @@ xfs_fs_sync_fs(
>  {
>  	struct xfs_mount	*mp = XFS_M(sb);
>  
> +	trace_xfs_fs_sync_fs(mp, __return_address);
> +
>  	/*
>  	 * Doing anything during the async pass would be counterproductive.
>  	 */
> @@ -718,6 +729,25 @@ xfs_fs_sync_fs(
>  		flush_delayed_work(&mp->m_log->l_work);
>  	}
>  
> +	/*
> +	 * Flush all deferred inode inactivation work so that the free space
> +	 * counters will reflect recent deletions.  Do not force the log again
> +	 * because log recovery can restart the inactivation from the info that
> +	 * we just wrote into the ondisk log.
> +	 *
> +	 * For regular operation this isn't strictly necessary since we aren't
> +	 * required to guarantee that unlinking frees space immediately, but
> +	 * that is how XFS historically behaved.
> +	 *
> +	 * If, however, the filesystem is at FREEZE_PAGEFAULTS, this is our
> +	 * last chance to complete the inactivation work before the filesystem
> +	 * freezes and the log is quiesced.  The background worker will not
> +	 * activate again until the fs is thawed because the VFS won't evict
> +	 * any more inodes until freeze_super drops s_umount and we disable the
> +	 * worker in xfs_fs_freeze.
> +	 */
> +	xfs_inodegc_flush(mp);
> +
>  	return 0;
>  }
>  
> @@ -832,6 +862,17 @@ xfs_fs_freeze(
>  	 */
>  	flags = memalloc_nofs_save();
>  	xfs_blockgc_stop(mp);
> +
> +	/*
> +	 * Stop the inodegc background worker.  freeze_super already flushed
> +	 * all pending inodegc work when it sync'd the filesystem after setting
> +	 * SB_FREEZE_PAGEFAULTS, and it holds s_umount, so we know that inodes
> +	 * cannot enter xfs_fs_destroy_inode until the freeze is complete.
> +	 * If the filesystem is read-write, inactivated inodes will queue but
> +	 * the worker will not run until the filesystem thaws or unmounts.
> +	 */
> +	xfs_inodegc_stop(mp);
> +
>  	xfs_save_resvblks(mp);
>  	ret = xfs_log_quiesce(mp);
>  	memalloc_nofs_restore(flags);
> @@ -847,6 +888,14 @@ xfs_fs_unfreeze(
>  	xfs_restore_resvblks(mp);
>  	xfs_log_work_queue(mp);
>  	xfs_blockgc_start(mp);
> +
> +	/*
> +	 * Don't reactivate the inodegc worker on a readonly filesystem because
> +	 * inodes are sent directly to reclaim.
> +	 */
> +	if (!(mp->m_flags & XFS_MOUNT_RDONLY))
> +		xfs_inodegc_start(mp);
> +
>  	return 0;
>  }
>  
> @@ -972,6 +1021,35 @@ xfs_destroy_percpu_counters(
>  	percpu_counter_destroy(&mp->m_delalloc_blks);
>  }
>  
> +static int
> +xfs_inodegc_init_percpu(
> +	struct xfs_mount	*mp)
> +{
> +	struct xfs_inodegc	*gc;
> +	int			cpu;
> +
> +	mp->m_inodegc = alloc_percpu(struct xfs_inodegc);
> +	if (!mp->m_inodegc)
> +		return -ENOMEM;
> +
> +	for_each_possible_cpu(cpu) {
> +		gc = per_cpu_ptr(mp->m_inodegc, cpu);
> +		init_llist_head(&gc->list);
> +		gc->items = 0;
> +		INIT_WORK(&gc->work, xfs_inodegc_worker);
> +	}
> +	return 0;
> +}
> +
> +static void
> +xfs_inodegc_free_percpu(
> +	struct xfs_mount	*mp)
> +{
> +	if (!mp->m_inodegc)
> +		return;
> +	free_percpu(mp->m_inodegc);
> +}
> +
>  static void
>  xfs_fs_put_super(
>  	struct super_block	*sb)
> @@ -988,6 +1066,7 @@ xfs_fs_put_super(
>  
>  	xfs_freesb(mp);
>  	free_percpu(mp->m_stats.xs_stats);
> +	xfs_inodegc_free_percpu(mp);
>  	xfs_destroy_percpu_counters(mp);
>  	xfs_destroy_mount_workqueues(mp);
>  	xfs_close_devices(mp);
> @@ -1359,11 +1438,15 @@ xfs_fs_fill_super(
>  	if (error)
>  		goto out_destroy_workqueues;
>  
> +	error = xfs_inodegc_init_percpu(mp);
> +	if (error)
> +		goto out_destroy_counters;
> +
>  	/* Allocate stats memory before we do operations that might use it */
>  	mp->m_stats.xs_stats = alloc_percpu(struct xfsstats);
>  	if (!mp->m_stats.xs_stats) {
>  		error = -ENOMEM;
> -		goto out_destroy_counters;
> +		goto out_destroy_inodegc;
>  	}
>  
>  	error = xfs_readsb(mp, flags);
> @@ -1566,6 +1649,8 @@ xfs_fs_fill_super(
>  	xfs_freesb(mp);
>   out_free_stats:
>  	free_percpu(mp->m_stats.xs_stats);
> + out_destroy_inodegc:
> +	xfs_inodegc_free_percpu(mp);
>   out_destroy_counters:
>  	xfs_destroy_percpu_counters(mp);
>   out_destroy_workqueues:
> @@ -1649,6 +1734,9 @@ xfs_remount_rw(
>  	if (error && error != -ENOSPC)
>  		return error;
>  
> +	/* Re-enable the background inode inactivation worker. */
> +	xfs_inodegc_start(mp);
> +
>  	return 0;
>  }
>  
> @@ -1671,6 +1759,15 @@ xfs_remount_ro(
>  		return error;
>  	}
>  
> +	/*
> +	 * Stop the inodegc background worker.  xfs_fs_reconfigure already
> +	 * flushed all pending inodegc work when it sync'd the filesystem.
> +	 * The VFS holds s_umount, so we know that inodes cannot enter
> +	 * xfs_fs_destroy_inode during a remount operation.  In readonly mode
> +	 * we send inodes straight to reclaim, so no inodes will be queued.
> +	 */
> +	xfs_inodegc_stop(mp);
> +
>  	/* Free the per-AG metadata reservation pool. */
>  	error = xfs_fs_unreserve_ag_blocks(mp);
>  	if (error) {
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 19260291ff8b..c2fac46a029b 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -157,6 +157,45 @@ DEFINE_PERAG_REF_EVENT(xfs_perag_put);
>  DEFINE_PERAG_REF_EVENT(xfs_perag_set_inode_tag);
>  DEFINE_PERAG_REF_EVENT(xfs_perag_clear_inode_tag);
>  
> +DECLARE_EVENT_CLASS(xfs_fs_class,
> +	TP_PROTO(struct xfs_mount *mp, void *caller_ip),
> +	TP_ARGS(mp, caller_ip),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(unsigned long long, mflags)
> +		__field(unsigned long, opflags)
> +		__field(unsigned long, sbflags)
> +		__field(void *, caller_ip)
> +	),
> +	TP_fast_assign(
> +		if (mp) {
> +			__entry->dev = mp->m_super->s_dev;
> +			__entry->mflags = mp->m_flags;
> +			__entry->opflags = mp->m_opflags;
> +			__entry->sbflags = mp->m_super->s_flags;
> +		}
> +		__entry->caller_ip = caller_ip;
> +	),
> +	TP_printk("dev %d:%d m_flags 0x%llx m_opflags 0x%lx s_flags 0x%lx caller %pS",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __entry->mflags,
> +		  __entry->opflags,
> +		  __entry->sbflags,
> +		  __entry->caller_ip)
> +);
> +
> +#define DEFINE_FS_EVENT(name)	\
> +DEFINE_EVENT(xfs_fs_class, name,					\
> +	TP_PROTO(struct xfs_mount *mp, void *caller_ip), \
> +	TP_ARGS(mp, caller_ip))
> +DEFINE_FS_EVENT(xfs_inodegc_flush);
> +DEFINE_FS_EVENT(xfs_inodegc_start);
> +DEFINE_FS_EVENT(xfs_inodegc_stop);
> +DEFINE_FS_EVENT(xfs_inodegc_worker);
> +DEFINE_FS_EVENT(xfs_inodegc_queue);
> +DEFINE_FS_EVENT(xfs_inodegc_throttle);
> +DEFINE_FS_EVENT(xfs_fs_sync_fs);
> +
>  DECLARE_EVENT_CLASS(xfs_ag_class,
>  	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno),
>  	TP_ARGS(mp, agno),
> @@ -616,14 +655,17 @@ DECLARE_EVENT_CLASS(xfs_inode_class,
>  	TP_STRUCT__entry(
>  		__field(dev_t, dev)
>  		__field(xfs_ino_t, ino)
> +		__field(unsigned long, iflags)
>  	),
>  	TP_fast_assign(
>  		__entry->dev = VFS_I(ip)->i_sb->s_dev;
>  		__entry->ino = ip->i_ino;
> +		__entry->iflags = ip->i_flags;
>  	),
> -	TP_printk("dev %d:%d ino 0x%llx",
> +	TP_printk("dev %d:%d ino 0x%llx iflags 0x%lx",
>  		  MAJOR(__entry->dev), MINOR(__entry->dev),
> -		  __entry->ino)
> +		  __entry->ino,
> +		  __entry->iflags)
>  )
>  
>  #define DEFINE_INODE_EVENT(name) \
> @@ -667,6 +709,10 @@ DEFINE_INODE_EVENT(xfs_inode_free_eofblocks_invalid);
>  DEFINE_INODE_EVENT(xfs_inode_set_cowblocks_tag);
>  DEFINE_INODE_EVENT(xfs_inode_clear_cowblocks_tag);
>  DEFINE_INODE_EVENT(xfs_inode_free_cowblocks_invalid);
> +DEFINE_INODE_EVENT(xfs_inode_set_reclaimable);
> +DEFINE_INODE_EVENT(xfs_inode_reclaiming);
> +DEFINE_INODE_EVENT(xfs_inode_set_need_inactive);
> +DEFINE_INODE_EVENT(xfs_inode_inactivating);
>  
>  /*
>   * ftrace's __print_symbolic requires that all enum values be wrapped in the

Patch

diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 8558ca05e11d..06b697f72f23 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -884,6 +884,7 @@  xchk_stop_reaping(
 {
 	sc->flags |= XCHK_REAPING_DISABLED;
 	xfs_blockgc_stop(sc->mp);
+	xfs_inodegc_stop(sc->mp);
 }
 
 /* Restart background reaping of resources. */
@@ -891,6 +892,12 @@  void
 xchk_start_reaping(
 	struct xfs_scrub	*sc)
 {
+	/*
+	 * Readonly filesystems do not perform inactivation, so there's no
+	 * need to restart the worker.
+	 */
+	if (!(sc->mp->m_flags & XFS_MOUNT_RDONLY))
+		xfs_inodegc_start(sc->mp);
 	xfs_blockgc_start(sc->mp);
 	sc->flags &= ~XCHK_REAPING_DISABLED;
 }
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 709507cc83ae..b1c2cab3c690 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -213,7 +213,7 @@  xfs_blockgc_queue(
 {
 	rcu_read_lock();
 	if (radix_tree_tagged(&pag->pag_ici_root, XFS_ICI_BLOCKGC_TAG))
-		queue_delayed_work(pag->pag_mount->m_gc_workqueue,
+		queue_delayed_work(pag->pag_mount->m_blockgc_wq,
 				   &pag->pag_blockgc_work,
 				   msecs_to_jiffies(xfs_blockgc_secs * 1000));
 	rcu_read_unlock();
@@ -292,86 +292,6 @@  xfs_perag_clear_inode_tag(
 	trace_xfs_perag_clear_inode_tag(mp, pag->pag_agno, tag, _RET_IP_);
 }
 
-#ifdef DEBUG
-static void
-xfs_check_delalloc(
-	struct xfs_inode	*ip,
-	int			whichfork)
-{
-	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
-	struct xfs_bmbt_irec	got;
-	struct xfs_iext_cursor	icur;
-
-	if (!ifp || !xfs_iext_lookup_extent(ip, ifp, 0, &icur, &got))
-		return;
-	do {
-		if (isnullstartblock(got.br_startblock)) {
-			xfs_warn(ip->i_mount,
-	"ino %llx %s fork has delalloc extent at [0x%llx:0x%llx]",
-				ip->i_ino,
-				whichfork == XFS_DATA_FORK ? "data" : "cow",
-				got.br_startoff, got.br_blockcount);
-		}
-	} while (xfs_iext_next_extent(ifp, &icur, &got));
-}
-#else
-#define xfs_check_delalloc(ip, whichfork)	do { } while (0)
-#endif
-
-/*
- * We set the inode flag atomically with the radix tree tag.
- * Once we get tag lookups on the radix tree, this inode flag
- * can go away.
- */
-void
-xfs_inode_mark_reclaimable(
-	struct xfs_inode	*ip)
-{
-	struct xfs_mount	*mp = ip->i_mount;
-	struct xfs_perag	*pag;
-	bool			need_inactive = xfs_inode_needs_inactive(ip);
-
-	if (!need_inactive) {
-		/* Going straight to reclaim, so drop the dquots. */
-		xfs_qm_dqdetach(ip);
-	} else {
-		xfs_inactive(ip);
-	}
-
-	if (!XFS_FORCED_SHUTDOWN(mp) && ip->i_delayed_blks) {
-		xfs_check_delalloc(ip, XFS_DATA_FORK);
-		xfs_check_delalloc(ip, XFS_COW_FORK);
-		ASSERT(0);
-	}
-
-	XFS_STATS_INC(mp, vn_reclaim);
-
-	/*
-	 * We should never get here with one of the reclaim flags already set.
-	 */
-	ASSERT_ALWAYS(!xfs_iflags_test(ip, XFS_IRECLAIMABLE));
-	ASSERT_ALWAYS(!xfs_iflags_test(ip, XFS_IRECLAIM));
-
-	/*
-	 * We always use background reclaim here because even if the inode is
-	 * clean, it still may be under IO and hence we have wait for IO
-	 * completion to occur before we can reclaim the inode. The background
-	 * reclaim path handles this more efficiently than we can here, so
-	 * simply let background reclaim tear down all inodes.
-	 */
-	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
-	spin_lock(&pag->pag_ici_lock);
-	spin_lock(&ip->i_flags_lock);
-
-	xfs_perag_set_inode_tag(pag, XFS_INO_TO_AGINO(mp, ip->i_ino),
-			XFS_ICI_RECLAIM_TAG);
-	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
-
-	spin_unlock(&ip->i_flags_lock);
-	spin_unlock(&pag->pag_ici_lock);
-	xfs_perag_put(pag);
-}
-
 static inline void
 xfs_inew_wait(
 	struct xfs_inode	*ip)
@@ -569,6 +489,15 @@  xfs_iget_cache_hit(
 	if (ip->i_flags & (XFS_INEW | XFS_IRECLAIM))
 		goto out_skip;
 
+	if (ip->i_flags & XFS_NEED_INACTIVE) {
+		/* Unlinked inodes cannot be re-grabbed. */
+		if (VFS_I(ip)->i_nlink == 0) {
+			error = -ENOENT;
+			goto out_error;
+		}
+		goto out_inodegc_flush;
+	}
+
 	/*
 	 * Check the inode free state is valid. This also detects lookup
 	 * racing with unlinks.
@@ -616,6 +545,12 @@  xfs_iget_cache_hit(
 	spin_unlock(&ip->i_flags_lock);
 	rcu_read_unlock();
 	return error;
+
+out_inodegc_flush:
+	spin_unlock(&ip->i_flags_lock);
+	rcu_read_unlock();
+	xfs_inodegc_flush(mp);
+	return -EAGAIN;
 }
 
 static int
@@ -943,6 +878,7 @@  xfs_reclaim_inode(
 
 	xfs_iflags_clear(ip, XFS_IFLUSHING);
 reclaim:
+	trace_xfs_inode_reclaiming(ip);
 
 	/*
 	 * Because we use RCU freeing we need to ensure the inode always appears
@@ -1420,6 +1356,8 @@  xfs_blockgc_start(
 
 /* Don't try to run block gc on an inode that's in any of these states. */
 #define XFS_BLOCKGC_NOGRAB_IFLAGS	(XFS_INEW | \
+					 XFS_NEED_INACTIVE | \
+					 XFS_INACTIVATING | \
 					 XFS_IRECLAIMABLE | \
 					 XFS_IRECLAIM)
 /*
@@ -1794,3 +1732,241 @@  xfs_icwalk(
 	return last_error;
 	BUILD_BUG_ON(XFS_ICWALK_PRIVATE_FLAGS & XFS_ICWALK_FLAGS_VALID);
 }
+
+#ifdef DEBUG
+static void
+xfs_check_delalloc(
+	struct xfs_inode	*ip,
+	int			whichfork)
+{
+	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
+	struct xfs_bmbt_irec	got;
+	struct xfs_iext_cursor	icur;
+
+	if (!ifp || !xfs_iext_lookup_extent(ip, ifp, 0, &icur, &got))
+		return;
+	do {
+		if (isnullstartblock(got.br_startblock)) {
+			xfs_warn(ip->i_mount,
+	"ino %llx %s fork has delalloc extent at [0x%llx:0x%llx]",
+				ip->i_ino,
+				whichfork == XFS_DATA_FORK ? "data" : "cow",
+				got.br_startoff, got.br_blockcount);
+		}
+	} while (xfs_iext_next_extent(ifp, &icur, &got));
+}
+#else
+#define xfs_check_delalloc(ip, whichfork)	do { } while (0)
+#endif
+
+/* Schedule the inode for reclaim. */
+static void
+xfs_inodegc_set_reclaimable(
+	struct xfs_inode	*ip)
+{
+	struct xfs_mount        *mp = ip->i_mount;
+	struct xfs_perag	*pag;
+
+	if (!XFS_FORCED_SHUTDOWN(mp) && ip->i_delayed_blks) {
+		xfs_check_delalloc(ip, XFS_DATA_FORK);
+		xfs_check_delalloc(ip, XFS_COW_FORK);
+		ASSERT(0);
+	}
+
+	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
+	spin_lock(&pag->pag_ici_lock);
+	spin_lock(&ip->i_flags_lock);
+
+	trace_xfs_inode_set_reclaimable(ip);
+	ip->i_flags &= ~(XFS_NEED_INACTIVE | XFS_INACTIVATING);
+	ip->i_flags |= XFS_IRECLAIMABLE;
+	xfs_perag_set_inode_tag(pag, XFS_INO_TO_AGINO(mp, ip->i_ino),
+				XFS_ICI_RECLAIM_TAG);
+
+	spin_unlock(&ip->i_flags_lock);
+	spin_unlock(&pag->pag_ici_lock);
+	xfs_perag_put(pag);
+}
+
+/*
+ * Free all speculative preallocations and possibly even the inode itself.
+ * This is the last chance to make changes to an otherwise unreferenced file
+ * before incore reclamation happens.
+ */
+static void
+xfs_inodegc_inactivate(
+	struct xfs_inode	*ip)
+{
+	struct xfs_mount        *mp = ip->i_mount;
+
+	/*
+	 * Inactivation isn't supposed to run when the fs is frozen because
+	 * we don't want kernel threads to block on transaction allocation.
+	 */
+	ASSERT(mp->m_super->s_writers.frozen < SB_FREEZE_FS);
+
+	trace_xfs_inode_inactivating(ip);
+	xfs_inactive(ip);
+	xfs_inodegc_set_reclaimable(ip);
+}
+
+void
+xfs_inodegc_worker(
+	struct work_struct	*work)
+{
+	struct xfs_inodegc	*gc = container_of(work, struct xfs_inodegc,
+							work);
+	struct llist_node	*node = llist_del_all(&gc->list);
+	struct xfs_inode	*ip, *n;
+
+	trace_xfs_inodegc_worker(NULL, __return_address);
+
+	WRITE_ONCE(gc->items, 0);
+	llist_for_each_entry_safe(ip, n, node, i_gclist) {
+		xfs_iflags_set(ip, XFS_INACTIVATING);
+		xfs_inodegc_inactivate(ip);
+	}
+}
+
+/*
+ * Force all currently queued inode inactivation work to run immediately, and
+ * wait for the work to finish. Two passes: queue all the work in the first
+ * pass, then wait for it in the second.
+ */
+void
+xfs_inodegc_flush(
+	struct xfs_mount	*mp)
+{
+	struct xfs_inodegc	*gc;
+	int			cpu;
+
+	trace_xfs_inodegc_flush(mp, __return_address);
+
+	for_each_online_cpu(cpu) {
+		gc = per_cpu_ptr(mp->m_inodegc, cpu);
+		queue_work_on(cpu, mp->m_inodegc_wq, &gc->work);
+	}
+
+	for_each_online_cpu(cpu) {
+		gc = per_cpu_ptr(mp->m_inodegc, cpu);
+		flush_work(&gc->work);
+	}
+}
+
+/*
+ * Flush all the pending work and then disable the inode inactivation background
+ * workers and wait for them to stop.
+ */
+void
+xfs_inodegc_stop(
+	struct xfs_mount	*mp)
+{
+	struct xfs_inodegc	*gc;
+	int			cpu;
+
+	if (!test_and_clear_bit(XFS_OPFLAG_INODEGC_RUNNING_BIT, &mp->m_opflags))
+		return;
+
+	xfs_inodegc_flush(mp);
+
+	for_each_online_cpu(cpu) {
+		gc = per_cpu_ptr(mp->m_inodegc, cpu);
+		cancel_work_sync(&gc->work);
+	}
+	trace_xfs_inodegc_stop(mp, __return_address);
+}
+
+/*
+ * Enable the inode inactivation background workers and schedule deferred inode
+ * inactivation work if there is any.
+ */
+void
+xfs_inodegc_start(
+	struct xfs_mount	*mp)
+{
+	struct xfs_inodegc	*gc;
+	int			cpu;
+
+	if (test_and_set_bit(XFS_OPFLAG_INODEGC_RUNNING_BIT, &mp->m_opflags))
+		return;
+
+	trace_xfs_inodegc_start(mp, __return_address);
+	for_each_online_cpu(cpu) {
+		gc = per_cpu_ptr(mp->m_inodegc, cpu);
+		if (!llist_empty(&gc->list))
+			queue_work_on(cpu, mp->m_inodegc_wq, &gc->work);
+	}
+}
+
+/*
+ * Queue a background inactivation worker if there are inodes that need to be
+ * inactivated and higher level xfs code hasn't disabled the background
+ * workers.
+ */
+static void
+xfs_inodegc_queue(
+	struct xfs_inode	*ip)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_inodegc	*gc;
+	int			items;
+
+	trace_xfs_inode_set_need_inactive(ip);
+	spin_lock(&ip->i_flags_lock);
+	ip->i_flags |= XFS_NEED_INACTIVE;
+	spin_unlock(&ip->i_flags_lock);
+
+	gc = get_cpu_ptr(mp->m_inodegc);
+	llist_add(&ip->i_gclist, &gc->list);
+	items = READ_ONCE(gc->items);
+	WRITE_ONCE(gc->items, items + 1);
+	put_cpu_ptr(gc);
+
+	if (!test_bit(XFS_OPFLAG_INODEGC_RUNNING_BIT, &mp->m_opflags))
+		return;
+	if (items > 32) {
+		trace_xfs_inodegc_queue(mp, __return_address);
+		queue_work(mp->m_inodegc_wq, &gc->work);
+	}
+	/* throttle */
+	if (items > 256) {
+		trace_xfs_inodegc_throttle(mp, __return_address);
+		flush_work(&gc->work);
+	}
+}
+
+/*
+ * We set the inode flag atomically with the radix tree tag.  Once we get tag
+ * lookups on the radix tree, this inode flag can go away.
+ *
+ * We always use background reclaim here because even if the inode is clean, it
+ * still may be under IO and hence we have to wait for IO completion to occur
+ * before we can reclaim the inode. The background reclaim path handles this
+ * more efficiently than we can here, so simply let background reclaim tear down
+ * all inodes.
+ */
+void
+xfs_inode_mark_reclaimable(
+	struct xfs_inode	*ip)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	bool			need_inactive;
+
+	XFS_STATS_INC(mp, vn_reclaim);
+
+	/*
+	 * We should never get here with any of the reclaim flags already set.
+	 */
+	ASSERT_ALWAYS(!xfs_iflags_test(ip, XFS_ALL_IRECLAIM_FLAGS));
+
+	need_inactive = xfs_inode_needs_inactive(ip);
+	if (need_inactive) {
+		xfs_inodegc_queue(ip);
+		return;
+	}
+
+	/* Going straight to reclaim, so drop the dquots. */
+	xfs_qm_dqdetach(ip);
+	xfs_inodegc_set_reclaimable(ip);
+}
+
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index d0062ebb3f7a..c1dfc909a5b0 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -74,4 +74,9 @@  int xfs_icache_inode_is_allocated(struct xfs_mount *mp, struct xfs_trans *tp,
 void xfs_blockgc_stop(struct xfs_mount *mp);
 void xfs_blockgc_start(struct xfs_mount *mp);
 
+void xfs_inodegc_worker(struct work_struct *work);
+void xfs_inodegc_flush(struct xfs_mount *mp);
+void xfs_inodegc_stop(struct xfs_mount *mp);
+void xfs_inodegc_start(struct xfs_mount *mp);
+
 #endif
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index e3137bbc7b14..1f62b481d8c5 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -42,6 +42,7 @@  typedef struct xfs_inode {
 	mrlock_t		i_lock;		/* inode lock */
 	mrlock_t		i_mmaplock;	/* inode mmap IO lock */
 	atomic_t		i_pincount;	/* inode pin count */
+	struct llist_node	i_gclist;	/* deferred inactivation list */
 
 	/*
 	 * Bitsets of inode metadata that have been checked and/or are sick.
@@ -240,6 +241,7 @@  static inline bool xfs_inode_has_bigtime(struct xfs_inode *ip)
 #define __XFS_IPINNED_BIT	8	 /* wakeup key for zero pin count */
 #define XFS_IPINNED		(1 << __XFS_IPINNED_BIT)
 #define XFS_IEOFBLOCKS		(1 << 9) /* has the preallocblocks tag set */
+#define XFS_NEED_INACTIVE	(1 << 10) /* see XFS_INACTIVATING below */
 /*
  * If this unlinked inode is in the middle of recovery, don't let drop_inode
  * truncate and free the inode.  This can happen if we iget the inode during
@@ -248,6 +250,21 @@  static inline bool xfs_inode_has_bigtime(struct xfs_inode *ip)
 #define XFS_IRECOVERY		(1 << 11)
 #define XFS_ICOWBLOCKS		(1 << 12)/* has the cowblocks tag set */
 
+/*
+ * If we need to update on-disk metadata before this IRECLAIMABLE inode can be
+ * freed, then NEED_INACTIVE will be set.  Once we start the updates, the
+ * INACTIVATING bit will be set to keep iget away from this inode.  After the
+ * inactivation completes, both flags will be cleared and the inode is a
+ * plain old IRECLAIMABLE inode.
+ */
+#define XFS_INACTIVATING	(1 << 13)
+
+/* All inode state flags related to inode reclaim. */
+#define XFS_ALL_IRECLAIM_FLAGS	(XFS_IRECLAIMABLE | \
+				 XFS_IRECLAIM | \
+				 XFS_NEED_INACTIVE | \
+				 XFS_INACTIVATING)
+
 /*
  * Per-lifetime flags need to be reset when re-using a reclaimable inode during
  * inode lookup. This prevents unintended behaviour on the new inode from
@@ -255,7 +272,8 @@  static inline bool xfs_inode_has_bigtime(struct xfs_inode *ip)
  */
 #define XFS_IRECLAIM_RESET_FLAGS	\
 	(XFS_IRECLAIMABLE | XFS_IRECLAIM | \
-	 XFS_IDIRTY_RELEASE | XFS_ITRUNCATED)
+	 XFS_IDIRTY_RELEASE | XFS_ITRUNCATED | XFS_NEED_INACTIVE | \
+	 XFS_INACTIVATING)
 
 /*
  * Flags for inode locking.
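Aside: the NEED_INACTIVE/INACTIVATING comment above compresses to this
hypothetical state walk. The real transitions happen at separate call
sites under ip->i_flags_lock; this only shows the intended order:

static void demo_inactivation_states(unsigned long *i_flags)
{
	/* VFS drops the last reference: queue for background work. */
	*i_flags |= XFS_NEED_INACTIVE;

	/* Worker begins the on-disk updates: keep iget away. */
	*i_flags |= XFS_INACTIVATING;

	/* Both flags clear: a plain old IRECLAIMABLE inode. */
	*i_flags &= ~(XFS_NEED_INACTIVE | XFS_INACTIVATING);
	*i_flags |= XFS_IRECLAIMABLE;
}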
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 1721fce2ec94..a98d2429d795 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2786,6 +2786,13 @@  xlog_recover_process_iunlinks(
 		}
 		xfs_buf_rele(agibp);
 	}
+
+	/*
+	 * Flush the pending unlinked inodes to ensure that the inactivations
+	 * are fully completed on disk and the incore inodes can be reclaimed
+	 * before we signal that recovery is complete.
+	 */
+	xfs_inodegc_flush(mp);
 }
 
 STATIC void
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index baf7b323cb15..1f7e9a608f38 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -514,7 +514,8 @@  xfs_check_summary_counts(
  * Flush and reclaim dirty inodes in preparation for unmount. Inodes and
  * internal inode structures can be sitting in the CIL and AIL at this point,
  * so we need to unpin them, write them back and/or reclaim them before unmount
- * can proceed.
+ * can proceed.  In other words, callers are required to have inactivated all
+ * inodes.
  *
  * An inode cluster that has been freed can have its buffer still pinned in
 * memory because the transaction is still sitting in an iclog. The stale inodes
@@ -546,6 +547,7 @@  xfs_unmount_flush_inodes(
 	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
 
 	xfs_ail_push_all_sync(mp->m_ail);
+	xfs_inodegc_stop(mp);
 	cancel_delayed_work_sync(&mp->m_reclaim_work);
 	xfs_reclaim_inodes(mp);
 	xfs_health_unmount(mp);
@@ -782,6 +784,9 @@  xfs_mountfs(
 	if (error)
 		goto out_log_dealloc;
 
+	/* Enable background inode inactivation workers. */
+	xfs_inodegc_start(mp);
+
 	/*
 	 * Get and sanity-check the root inode.
 	 * Save the pointer to it in the mount structure.
@@ -942,6 +947,15 @@  xfs_mountfs(
 	xfs_irele(rip);
 	/* Clean out dquots that might be in memory after quotacheck. */
 	xfs_qm_unmount(mp);
+
+	/*
+	 * Inactivate all inodes that might still be in memory after a log
+	 * intent recovery failure so that reclaim can free them.  Metadata
+	 * inodes and the root directory shouldn't need inactivation, but the
+	 * mount failed for some reason, so pull down all the state and flee.
+	 */
+	xfs_inodegc_flush(mp);
+
 	/*
 	 * Flush all inode reclamation work and flush the log.
 	 * We have to do this /after/ rtunmount and qm_unmount because those
@@ -989,6 +1003,16 @@  xfs_unmountfs(
 	uint64_t		resblks;
 	int			error;
 
+	/*
+	 * Perform all on-disk metadata updates required to inactivate inodes
+	 * that the VFS evicted earlier in the unmount process.  Freeing inodes
+	 * and discarding CoW fork preallocations can cause shape changes to
+	 * the free inode and refcount btrees, respectively, so we must finish
+	 * this before we discard the metadata space reservations.  Metadata
+	 * inodes and the root directory do not require inactivation.
+	 */
+	xfs_inodegc_flush(mp);
+
 	xfs_blockgc_stop(mp);
 	xfs_fs_unreserve_ag_blocks(mp);
 	xfs_qm_unmount_quotas(mp);
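Aside: the ordering constraint spelled out above condenses to a short,
hypothetical sketch (the real xfs_unmountfs interleaves more teardown):

static void demo_unmount_order(struct xfs_mount *mp)
{
	xfs_inodegc_flush(mp);		/* freeing inodes reshapes btrees */
	xfs_blockgc_stop(mp);
	xfs_fs_unreserve_ag_blocks(mp);	/* so drop reservations only now */
	xfs_qm_unmount_quotas(mp);
}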
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index c78b63fe779a..470013a48c17 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -56,6 +56,15 @@  struct xfs_error_cfg {
 	long		retry_timeout;	/* in jiffies, -1 = infinite */
 };
 
+/*
+ * Per-cpu deferred inode inactivation GC lists.
+ */
+struct xfs_inodegc {
+	struct llist_head	list;
+	struct work_struct	work;
+	int			items;
+};
+
 /*
  * The struct xfsmount layout is optimised to separate read-mostly variables
  * from variables that are frequently modified. We put the read-mostly variables
@@ -82,6 +91,8 @@  typedef struct xfs_mount {
 	xfs_buftarg_t		*m_ddev_targp;	/* saves taking the address */
 	xfs_buftarg_t		*m_logdev_targp;/* ptr to log device */
 	xfs_buftarg_t		*m_rtdev_targp;	/* ptr to rt device */
+	void __percpu		*m_inodegc;	/* percpu inodegc structures */
+
 	/*
 	 * Optional cache of rt summary level per bitmap block with the
 	 * invariant that m_rsum_cache[bbno] <= the minimum i for which
@@ -94,8 +105,9 @@  typedef struct xfs_mount {
 	struct workqueue_struct	*m_unwritten_workqueue;
 	struct workqueue_struct	*m_cil_workqueue;
 	struct workqueue_struct	*m_reclaim_workqueue;
-	struct workqueue_struct *m_gc_workqueue;
 	struct workqueue_struct	*m_sync_workqueue;
+	struct workqueue_struct *m_blockgc_wq;
+	struct workqueue_struct *m_inodegc_wq;
 
 	int			m_bsize;	/* fs logical block size */
 	uint8_t			m_blkbit_log;	/* blocklog + NBBY */
@@ -154,6 +166,13 @@  typedef struct xfs_mount {
 	uint8_t			m_rt_checked;
 	uint8_t			m_rt_sick;
 
+	/*
+	 * This atomic bitset controls flags that alter the behavior of the
+	 * filesystem.  Use only the atomic bit helper functions here; see
+	 * XFS_OPFLAG_* for information about the actual flags.
+	 */
+	unsigned long		m_opflags;
+
 	/*
 	 * End of read-mostly variables. Frequently written variables and locks
 	 * should be placed below this comment from now on. The first variable
@@ -258,6 +277,19 @@  typedef struct xfs_mount {
 #define XFS_MOUNT_DAX_ALWAYS	(1ULL << 26)
 #define XFS_MOUNT_DAX_NEVER	(1ULL << 27)
 
+/*
+ * Operation flags -- each entry here is a bit index into m_opflags and is
+ * not itself a flag value.  Use the atomic bit functions to access.
+ */
+enum xfs_opflag_bits {
+	/*
+	 * If set, background inactivation worker threads will be scheduled to
+	 * process queued inodegc work.  If not, queued inodes remain in memory
+	 * waiting to be processed.
+	 */
+	XFS_OPFLAG_INODEGC_RUNNING_BIT	= 0,
+};
+
 /*
  * Max and min values for mount-option defined I/O
  * preallocation sizes.
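Aside: m_opflags is only ever touched with the atomic bit helpers, using
the enum value as a bit index rather than a mask. Hypothetical demo_*
helpers showing the intended access pattern:

static void demo_inodegc_enable(struct xfs_mount *mp)
{
	set_bit(XFS_OPFLAG_INODEGC_RUNNING_BIT, &mp->m_opflags);
}

static void demo_inodegc_disable(struct xfs_mount *mp)
{
	clear_bit(XFS_OPFLAG_INODEGC_RUNNING_BIT, &mp->m_opflags);
}

static bool demo_inodegc_enabled(struct xfs_mount *mp)
{
	return test_bit(XFS_OPFLAG_INODEGC_RUNNING_BIT, &mp->m_opflags);
}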
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index ef89a9a3ba9e..913d54eb4929 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -508,21 +508,29 @@  xfs_init_mount_workqueues(
 	if (!mp->m_reclaim_workqueue)
 		goto out_destroy_cil;
 
-	mp->m_gc_workqueue = alloc_workqueue("xfs-gc/%s",
+	mp->m_blockgc_wq = alloc_workqueue("xfs-blockgc/%s",
 			WQ_SYSFS | WQ_UNBOUND | WQ_FREEZABLE | WQ_MEM_RECLAIM,
 			0, mp->m_super->s_id);
-	if (!mp->m_gc_workqueue)
+	if (!mp->m_blockgc_wq)
 		goto out_destroy_reclaim;
 
+	mp->m_inodegc_wq = alloc_workqueue("xfs-inodegc/%s",
+			XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM),
+			1, mp->m_super->s_id);
+	if (!mp->m_inodegc_wq)
+		goto out_destroy_blockgc;
+
 	mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s",
 			XFS_WQFLAGS(WQ_FREEZABLE), 0, mp->m_super->s_id);
 	if (!mp->m_sync_workqueue)
-		goto out_destroy_eofb;
+		goto out_destroy_inodegc;
 
 	return 0;
 
-out_destroy_eofb:
-	destroy_workqueue(mp->m_gc_workqueue);
+out_destroy_inodegc:
+	destroy_workqueue(mp->m_inodegc_wq);
+out_destroy_blockgc:
+	destroy_workqueue(mp->m_blockgc_wq);
 out_destroy_reclaim:
 	destroy_workqueue(mp->m_reclaim_workqueue);
 out_destroy_cil:
@@ -540,7 +548,8 @@  xfs_destroy_mount_workqueues(
 	struct xfs_mount	*mp)
 {
 	destroy_workqueue(mp->m_sync_workqueue);
-	destroy_workqueue(mp->m_gc_workqueue);
+	destroy_workqueue(mp->m_blockgc_wq);
+	destroy_workqueue(mp->m_inodegc_wq);
 	destroy_workqueue(mp->m_reclaim_workqueue);
 	destroy_workqueue(mp->m_cil_workqueue);
 	destroy_workqueue(mp->m_unwritten_workqueue);
@@ -702,6 +711,8 @@  xfs_fs_sync_fs(
 {
 	struct xfs_mount	*mp = XFS_M(sb);
 
+	trace_xfs_fs_sync_fs(mp, __return_address);
+
 	/*
 	 * Doing anything during the async pass would be counterproductive.
 	 */
@@ -718,6 +729,25 @@  xfs_fs_sync_fs(
 		flush_delayed_work(&mp->m_log->l_work);
 	}
 
+	/*
+	 * Flush all deferred inode inactivation work so that the free space
+	 * counters will reflect recent deletions.  Do not force the log again
+	 * because log recovery can restart the inactivation from the info that
+	 * we just wrote into the ondisk log.
+	 *
+	 * For regular operation this isn't strictly necessary since we aren't
+	 * required to guarantee that unlinking frees space immediately, but
+	 * that is how XFS historically behaved.
+	 *
+	 * If, however, the filesystem is at SB_FREEZE_PAGEFAULTS, this is
+	 * our last chance to complete the inactivation work before the
+	 * filesystem freezes and the log is quiesced.  The background worker
+	 * will not activate again until the fs is thawed: the VFS won't
+	 * evict any more inodes while freeze_super holds s_umount, and
+	 * xfs_fs_freeze disables the worker before the freeze completes.
+	 */
+	xfs_inodegc_flush(mp);
+
 	return 0;
 }
 
@@ -832,6 +862,17 @@  xfs_fs_freeze(
 	 */
 	flags = memalloc_nofs_save();
 	xfs_blockgc_stop(mp);
+
+	/*
+	 * Stop the inodegc background worker.  freeze_super already flushed
+	 * all pending inodegc work when it sync'd the filesystem after setting
+	 * SB_FREEZE_PAGEFAULTS, and it holds s_umount, so we know that inodes
+	 * cannot enter xfs_fs_destroy_inode until the freeze is complete.
+	 * If the filesystem is read-write, inactivated inodes will queue but
+	 * the worker will not run until the filesystem thaws or unmounts.
+	 */
+	xfs_inodegc_stop(mp);
+
 	xfs_save_resvblks(mp);
 	ret = xfs_log_quiesce(mp);
 	memalloc_nofs_restore(flags);
@@ -847,6 +888,14 @@  xfs_fs_unfreeze(
 	xfs_restore_resvblks(mp);
 	xfs_log_work_queue(mp);
 	xfs_blockgc_start(mp);
+
+	/*
+	 * Don't reactivate the inodegc worker on a readonly filesystem because
+	 * inodes are sent directly to reclaim.
+	 */
+	if (!(mp->m_flags & XFS_MOUNT_RDONLY))
+		xfs_inodegc_start(mp);
+
 	return 0;
 }
 
@@ -972,6 +1021,35 @@  xfs_destroy_percpu_counters(
 	percpu_counter_destroy(&mp->m_delalloc_blks);
 }
 
+static int
+xfs_inodegc_init_percpu(
+	struct xfs_mount	*mp)
+{
+	struct xfs_inodegc	*gc;
+	int			cpu;
+
+	mp->m_inodegc = alloc_percpu(struct xfs_inodegc);
+	if (!mp->m_inodegc)
+		return -ENOMEM;
+
+	for_each_possible_cpu(cpu) {
+		gc = per_cpu_ptr(mp->m_inodegc, cpu);
+		init_llist_head(&gc->list);
+		gc->items = 0;
+		INIT_WORK(&gc->work, xfs_inodegc_worker);
+	}
+	return 0;
+}
+
+static void
+xfs_inodegc_free_percpu(
+	struct xfs_mount	*mp)
+{
+	if (!mp->m_inodegc)
+		return;
+	free_percpu(mp->m_inodegc);
+}
+
 static void
 xfs_fs_put_super(
 	struct super_block	*sb)
@@ -988,6 +1066,7 @@  xfs_fs_put_super(
 
 	xfs_freesb(mp);
 	free_percpu(mp->m_stats.xs_stats);
+	xfs_inodegc_free_percpu(mp);
 	xfs_destroy_percpu_counters(mp);
 	xfs_destroy_mount_workqueues(mp);
 	xfs_close_devices(mp);
@@ -1359,11 +1438,15 @@  xfs_fs_fill_super(
 	if (error)
 		goto out_destroy_workqueues;
 
+	error = xfs_inodegc_init_percpu(mp);
+	if (error)
+		goto out_destroy_counters;
+
 	/* Allocate stats memory before we do operations that might use it */
 	mp->m_stats.xs_stats = alloc_percpu(struct xfsstats);
 	if (!mp->m_stats.xs_stats) {
 		error = -ENOMEM;
-		goto out_destroy_counters;
+		goto out_destroy_inodegc;
 	}
 
 	error = xfs_readsb(mp, flags);
@@ -1566,6 +1649,8 @@  xfs_fs_fill_super(
 	xfs_freesb(mp);
  out_free_stats:
 	free_percpu(mp->m_stats.xs_stats);
+ out_destroy_inodegc:
+	xfs_inodegc_free_percpu(mp);
  out_destroy_counters:
 	xfs_destroy_percpu_counters(mp);
  out_destroy_workqueues:
@@ -1649,6 +1734,9 @@  xfs_remount_rw(
 	if (error && error != -ENOSPC)
 		return error;
 
+	/* Re-enable the background inode inactivation worker. */
+	xfs_inodegc_start(mp);
+
 	return 0;
 }
 
@@ -1671,6 +1759,15 @@  xfs_remount_ro(
 		return error;
 	}
 
+	/*
+	 * Stop the inodegc background worker.  xfs_fs_reconfigure already
+	 * flushed all pending inodegc work when it sync'd the filesystem.
+	 * The VFS holds s_umount, so we know that inodes cannot enter
+	 * xfs_fs_destroy_inode during a remount operation.  In readonly mode
+	 * we send inodes straight to reclaim, so no inodes will be queued.
+	 */
+	xfs_inodegc_stop(mp);
+
 	/* Free the per-AG metadata reservation pool. */
 	error = xfs_fs_unreserve_ag_blocks(mp);
 	if (error) {
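Aside: the freeze/thaw interlock described in the comments above reduces
to this hypothetical sequence (error handling and log quiescing elided):

static void demo_freeze(struct xfs_mount *mp)
{
	/* freeze_super() already ran xfs_fs_sync_fs(), which called
	 * xfs_inodegc_flush() at SB_FREEZE_PAGEFAULTS to drain the queues. */
	xfs_inodegc_stop(mp);	/* no worker runs while frozen */
}

static void demo_unfreeze(struct xfs_mount *mp)
{
	/* readonly mounts send inodes straight to reclaim: nothing queues */
	if (!(mp->m_flags & XFS_MOUNT_RDONLY))
		xfs_inodegc_start(mp);
}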
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 19260291ff8b..c2fac46a029b 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -157,6 +157,45 @@  DEFINE_PERAG_REF_EVENT(xfs_perag_put);
 DEFINE_PERAG_REF_EVENT(xfs_perag_set_inode_tag);
 DEFINE_PERAG_REF_EVENT(xfs_perag_clear_inode_tag);
 
+DECLARE_EVENT_CLASS(xfs_fs_class,
+	TP_PROTO(struct xfs_mount *mp, void *caller_ip),
+	TP_ARGS(mp, caller_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned long long, mflags)
+		__field(unsigned long, opflags)
+		__field(unsigned long, sbflags)
+		__field(void *, caller_ip)
+	),
+	TP_fast_assign(
+		if (mp) {
+			__entry->dev = mp->m_super->s_dev;
+			__entry->mflags = mp->m_flags;
+			__entry->opflags = mp->m_opflags;
+			__entry->sbflags = mp->m_super->s_flags;
+		}
+		__entry->caller_ip = caller_ip;
+	),
+	TP_printk("dev %d:%d m_flags 0x%llx m_opflags 0x%lx s_flags 0x%lx caller %pS",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->mflags,
+		  __entry->opflags,
+		  __entry->sbflags,
+		  __entry->caller_ip)
+);
+
+#define DEFINE_FS_EVENT(name)	\
+DEFINE_EVENT(xfs_fs_class, name,					\
+	TP_PROTO(struct xfs_mount *mp, void *caller_ip), \
+	TP_ARGS(mp, caller_ip))
+DEFINE_FS_EVENT(xfs_inodegc_flush);
+DEFINE_FS_EVENT(xfs_inodegc_start);
+DEFINE_FS_EVENT(xfs_inodegc_stop);
+DEFINE_FS_EVENT(xfs_inodegc_worker);
+DEFINE_FS_EVENT(xfs_inodegc_queue);
+DEFINE_FS_EVENT(xfs_inodegc_throttle);
+DEFINE_FS_EVENT(xfs_fs_sync_fs);
+
 DECLARE_EVENT_CLASS(xfs_ag_class,
 	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno),
 	TP_ARGS(mp, agno),
@@ -616,14 +655,17 @@  DECLARE_EVENT_CLASS(xfs_inode_class,
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(xfs_ino_t, ino)
+		__field(unsigned long, iflags)
 	),
 	TP_fast_assign(
 		__entry->dev = VFS_I(ip)->i_sb->s_dev;
 		__entry->ino = ip->i_ino;
+		__entry->iflags = ip->i_flags;
 	),
-	TP_printk("dev %d:%d ino 0x%llx",
+	TP_printk("dev %d:%d ino 0x%llx iflags 0x%lx",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->ino)
+		  __entry->ino,
+		  __entry->iflags)
 )
 
 #define DEFINE_INODE_EVENT(name) \
@@ -667,6 +709,10 @@  DEFINE_INODE_EVENT(xfs_inode_free_eofblocks_invalid);
 DEFINE_INODE_EVENT(xfs_inode_set_cowblocks_tag);
 DEFINE_INODE_EVENT(xfs_inode_clear_cowblocks_tag);
 DEFINE_INODE_EVENT(xfs_inode_free_cowblocks_invalid);
+DEFINE_INODE_EVENT(xfs_inode_set_reclaimable);
+DEFINE_INODE_EVENT(xfs_inode_reclaiming);
+DEFINE_INODE_EVENT(xfs_inode_set_need_inactive);
+DEFINE_INODE_EVENT(xfs_inode_inactivating);
 
 /*
  * ftrace's __print_symbolic requires that all enum values be wrapped in the