[3/5,RFC] xfs: use percpu counters for CIL context counters

From: Dave Chinner <dchinner@redhat.com>

With the m_active_trans atomic bottleneck out of the way, the CIL
xc_cil_lock is the next bottleneck that causes cacheline contention.
This protects several things, the first of which is the CIL context
reservation ticket and space usage counters.

We can lift them out of the xc_cil_lock by converting them to
percpu counters. This involves two things, the first of which is
lifting calculations and samples that don't actually need protecting
from races outside the xc_cil_lock.

The second is converting the counters to percpu counters and lifting
them outside the lock. This requires a couple of tricky things to
minimise initial state races and to ensure we take into account
split reservations. We do this by erring on the "take the
reservation just in case" side, which is largely lost in the noise of
many frequent large transactions.

We use a trick with percpu_counter_add_batch() to ensure the global
sum is updated immediately on first reservation, hence allowing us
to use fast counter reads everywhere to determine if the CIL is
empty or not, rather than using the list itself. This is important
for later patches where the CIL is moved to percpu lists
and hence cannot use list_empty() to detect an empty CIL. The fast
counter read gives us a low overhead, lockless mechanism for
determining if the CIL is empty or not. All other percpu counter
updates use a large batch count so they aggregate on the local CPU
and minimise global sum updates.
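
As an illustration only, the batching trick can be modelled in
userspace as below. This is a single-threaded sketch, not the patch
code: pcp_counter, pcp_add_batch(), cil_model_reserve(), BIG_BATCH and
the batch value of 1 used for the first reservation are all
hypothetical names and values chosen for the example, and the real
percpu_counter_add_batch() additionally handles locking and
preemption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS   4           /* hypothetical CPU count for the model */
#define BIG_BATCH (1 << 20)   /* hypothetical "large batch" value */

/* Userspace stand-in for a percpu counter: one global sum plus per-CPU deltas. */
struct pcp_counter {
	int64_t count;            /* global sum, cheap to read */
	int64_t local[NR_CPUS];   /* per-CPU deltas not yet folded into count */
};

/*
 * Add 'amount' on 'cpu'. The local delta is folded into the global sum
 * only once its magnitude reaches 'batch'; otherwise it stays local.
 */
static void pcp_add_batch(struct pcp_counter *c, int cpu,
			  int64_t amount, int64_t batch)
{
	int64_t v;

	assert(cpu >= 0 && cpu < NR_CPUS);
	v = c->local[cpu] + amount;
	if (v >= batch || v <= -batch) {
		c->count += v;
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = v;
	}
}

/* Fast read: looks at the global sum only and ignores per-CPU deltas. */
static int64_t pcp_read_fast(const struct pcp_counter *c)
{
	return c->count;
}

/* Model of the "is the CIL empty?" check: no list walk, no lock. */
static bool cil_model_empty(const struct pcp_counter *c)
{
	return pcp_read_fast(c) == 0;
}

/*
 * Model of a commit-side reservation. The first reservation uses a
 * batch of 1 (an assumption for this sketch) so any positive amount
 * reaches the global sum immediately; later ones aggregate locally.
 */
static void cil_model_reserve(struct pcp_counter *c, int cpu,
			      int64_t bytes, bool first)
{
	pcp_add_batch(c, cpu, bytes, first ? 1 : BIG_BATCH);
}
```

In this model, a first reservation of 512 bytes makes the fast read
return 512 straight away, while a later 4096 byte reservation on
another CPU stays in that CPU's local delta and leaves the global sum
unchanged. The emptiness check therefore only depends on the first
update being visible globally.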

The xc_ctx_lock rwsem protects draining the percpu counters to the
context's ticket, similar to the way it allows access to the CIL
without using the xc_cil_lock. i.e. the CIL push has exclusive
access to the CIL, the context and the percpu counters while holding
the xc_ctx_lock. This ensures that we can sum and zero the counters
atomically from the perspective of the transaction commit side of
the push. i.e. they reset to zero atomically with the CIL context
swap and hence we don't need to have the percpu counters attached to
the CIL context.
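
The locking rule above can be sketched as a small single-threaded
model. It is an assumption-laden illustration, not the patch code:
cil_model, commit_reserve(), push_drain() and the lock_mode enum are
hypothetical, and the rwsem is represented only by a state field that
the functions assert on. In the kernel the two sides would hold the
rwsem via down_read() and down_write() respectively.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4   /* hypothetical CPU count for the model */

/* Stand-in for the state of the xc_ctx_lock rwsem. */
enum lock_mode { UNLOCKED, SHARED, EXCLUSIVE };

struct cil_model {
	enum lock_mode ctx_lock;   /* models who currently holds xc_ctx_lock */
	int64_t global;            /* folded part of the percpu counter */
	int64_t local[NR_CPUS];    /* per-CPU deltas not yet folded */
};

/* Commit side: many committers may run at once, each holding the lock shared. */
static void commit_reserve(struct cil_model *m, int cpu, int64_t bytes)
{
	assert(m->ctx_lock == SHARED);
	assert(cpu >= 0 && cpu < NR_CPUS);
	m->local[cpu] += bytes;
}

/*
 * Push side: holds the lock exclusively, so no committer can add to the
 * counter between the sum and the reset. The returned total is what
 * would be moved onto the context's ticket.
 */
static int64_t push_drain(struct cil_model *m)
{
	int64_t total;
	int cpu;

	assert(m->ctx_lock == EXCLUSIVE);
	total = m->global;
	m->global = 0;
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		total += m->local[cpu];
		m->local[cpu] = 0;
	}
	return total;
}
```

Because the sum and the reset both happen under the exclusive hold,
committers in this model only ever see the counter either before the
drain or after it has been zeroed, which is why one counter can be
shared across successive contexts.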

Performance wise, this increases the transaction rate from
~620,000/s to around 750,000/second. Using a 32-way concurrent
create instead of 16-way on a 32p/16GB virtual machine:

		create time	rate			unlink time
unpatched	2m03s		472k/s +/- 9k/s		3m06s
patched		1m56s		533k/s +/- 28k/s	2m34s

Notably, the system time for the create went from 44m20s down to
38m37s, whilst going faster. There is more variance, but I think
that is from the cacheline contention having inconsistent overhead.

XXX: probably should split into two patches

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log_cil.c  | 99 ++++++++++++++++++++++++++++++-------------
 fs/xfs/xfs_log_priv.h |  2 +
 2 files changed, 72 insertions(+), 29 deletions(-)

Message ID	20200512092811.1846252-4-david@fromorbit.com (mailing list archive)
State	Deferred, archived
From	Dave Chinner <david@fromorbit.com>
To	linux-xfs@vger.kernel.org
Date	Tue, 12 May 2020 19:28:09 +1000
Series	xfs: fix a couple of performance issues
	[0/5,v2] xfs: fix a couple of performance issues
	[1/5] xfs: separate read-only variables in struct xfs_mount
	[2/5] xfs: convert m_active_trans counter to per-cpu
	[3/5,RFC] xfs: use percpu counters for CIL context counters
	[4/5,RFC] xfs: per-cpu CIL lists
	[5/5,RFC] xfs: make CIl busy extent lists per-cpu
