
[10/45] xfs: reduce buffer log item shadow allocations

Message ID 20210305051143.182133-11-david@fromorbit.com (mailing list archive)
State Superseded
Series xfs: consolidated log and optimisation changes

Commit Message

Dave Chinner March 5, 2021, 5:11 a.m. UTC
From: Dave Chinner <dchinner@redhat.com>

When we modify btrees repeatedly, we regularly increase the size of
the logged region by a single chunk at a time (per transaction
commit). This results in the CIL formatting code having to
reallocate the log vector buffer every time the buffer dirty region
grows. Hence over a typical 4kB btree buffer, we might grow the log
vector 4096/128 = 32x over a short period where we repeatedly add
or remove records to/from the buffer over a series of running
transactions. This means we are doing 32 memory allocations and frees
over this time during a performance-critical path in the journal.
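
For reference, the repeated allocations come from the CIL shadow
buffer handling: the existing shadow log vector buffer is only reused
when it is already large enough for the size reported by ->iop_size().
A simplified sketch of that decision (paraphrasing
xlog_cil_alloc_shadow_bufs(); field and helper names are from memory
and may differ slightly from the current tree):

	/*
	 * If the required buffer size grows past the previously
	 * allocated shadow buffer, free it and allocate a larger one.
	 * Otherwise the existing shadow buffer is reused.
	 */
	if (!lip->li_lv_shadow || buf_size > lip->li_lv_shadow->lv_size) {
		kmem_free(lip->li_lv_shadow);
		lv = kmem_alloc_large(buf_size, KM_NOFS);
		lv->lv_item = lip;
		lv->lv_size = buf_size;
		lip->li_lv_shadow = lv;
	} else {
		/* same size or smaller - reuse the existing shadow buffer */
		lv = lip->li_lv_shadow;
	}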

The amount of space tracked in the CIL for the object is calculated
during the ->iop_format() call for the buffer log item, but the
buffer memory allocated for it is calculated by the ->iop_size()
call. The size callout determines the size of the buffer; the format
callout determines the space used in the buffer.
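
To make that split concrete, these are the two callouts in struct
xfs_item_ops, roughly as declared in fs/xfs/xfs_trans.h (other members
omitted; exact prototypes may vary between kernel versions):

struct xfs_item_ops {
	...
	/* worst case estimate used to size the log vector buffer */
	void	(*iop_size)(struct xfs_log_item *lip, int *nvecs, int *nbytes);
	/* formats the dirty regions, accounting the space actually used */
	void	(*iop_format)(struct xfs_log_item *lip, struct xfs_log_vec *lv);
	...
};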

Hence we can oversize the buffer space required in the size
calculation without impacting the amount of space used and accounted
to the CIL for the changes being logged. This allows us to reduce
the number of allocations by rounding up the buffer size to allow
for future growth. This can save a substantial amount of CPU time in
this path:

-   46.52%     2.02%  [kernel]                  [k] xfs_log_commit_cil
   - 44.49% xfs_log_commit_cil
      - 30.78% _raw_spin_lock
         - 30.75% do_raw_spin_lock
              30.27% __pv_queued_spin_lock_slowpath

(oh, ouch!)
....
      - 1.05% kmem_alloc_large
         - 1.02% kmem_alloc
              0.94% __kmalloc

This overhead is what this patch is aimed at. After:

      - 0.76% kmem_alloc_large
         - 0.75% kmem_alloc
              0.70% __kmalloc

The size of 512 bytes is based on the bitmap chunk size being 128
bytes and the observation that random directory entry updates almost
never require more than 3-4 128-byte regions to be logged in the
directory block.
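
As a back-of-the-envelope check of what this rounding buys, here is a
standalone userspace demonstration (not kernel code; ROUND_UP stands
in for the kernel's round_up() macro) that counts how many times the
buffer would have to grow as a 4kB buffer is dirtied one 128 byte
chunk at a time:

#include <stdio.h>

#define ROUND_UP(x, y)	((((x) + (y) - 1) / (y)) * (y))

int main(void)
{
	int raw_allocs = 0, raw_size = 0;
	int rounded_allocs = 0, rounded_size = 0;

	for (int dirty = 128; dirty <= 4096; dirty += 128) {
		/* exact sizing: buffer grows on every new 128 byte chunk */
		if (dirty > raw_size) {
			raw_size = dirty;
			raw_allocs++;
		}
		/* rounded sizing: buffer only grows past each 512 byte step */
		if (ROUND_UP(dirty, 512) > rounded_size) {
			rounded_size = ROUND_UP(dirty, 512);
			rounded_allocs++;
		}
	}
	/* prints "exact: 32 allocations, rounded: 8 allocations" */
	printf("exact: %d allocations, rounded: %d allocations\n",
	       raw_allocs, rounded_allocs);
	return 0;
}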

The other observation is for per-ag btrees. When we are inserting
into a new btree block, we'll pack it from the front. Hence the
first few records land in the first 128 bytes so we log only 128
bytes, the next 8-16 records land in the second region so now we log
256 bytes. And so on.  If we are doing random updates, we only
reallocate the buffer once for every 4 random 128-byte regions that
are dirtied instead of once for every single one.

Anything larger than 512 bytes and I noticed an increase in memory
footprint in my scalability workloads. Anything smaller than this and
I didn't really see any significant benefit to CPU usage.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_buf_item.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

Comments

Brian Foster March 15, 2021, 2:52 p.m. UTC | #1
On Fri, Mar 05, 2021 at 04:11:08PM +1100, Dave Chinner wrote:
It looks like I have posted feedback (mostly reviewed-by tags, fwiw) on
previous posts of this patch and the next three that appears to have
been either ignored or lost.

Brian


Patch

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 17960b1ce5ef..0628a65d9c55 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -142,6 +142,7 @@  xfs_buf_item_size(
 {
 	struct xfs_buf_log_item	*bip = BUF_ITEM(lip);
 	int			i;
+	int			bytes;
 
 	ASSERT(atomic_read(&bip->bli_refcount) > 0);
 	if (bip->bli_flags & XFS_BLI_STALE) {
@@ -173,7 +174,7 @@  xfs_buf_item_size(
 	}
 
 	/*
-	 * the vector count is based on the number of buffer vectors we have
+	 * The vector count is based on the number of buffer vectors we have
 	 * dirty bits in. This will only be greater than one when we have a
 	 * compound buffer with more than one segment dirty. Hence for compound
 	 * buffers we need to track which segment the dirty bits correspond to,
@@ -181,10 +182,18 @@  xfs_buf_item_size(
 	 * count for the extra buf log format structure that will need to be
 	 * written.
 	 */
+	bytes = 0;
 	for (i = 0; i < bip->bli_format_count; i++) {
 		xfs_buf_item_size_segment(bip, &bip->bli_formats[i],
-					  nvecs, nbytes);
+					  nvecs, &bytes);
 	}
+
+	/*
+	 * Round up the buffer size required to minimise the number of memory
+	 * allocations that need to be done as this item grows when relogged by
+	 * repeated modifications.
+	 */
+	*nbytes = round_up(bytes, 512);
 	trace_xfs_buf_item_size(bip);
 }