From patchwork Wed Sep 15 23:42:06 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 12497879
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-16.4 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH,
 DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
 MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham
 autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
 by smtp.lore.kernel.org (Postfix) with ESMTP id CCCB0C433EF
 for ; Wed, 15 Sep 2021 23:42:07 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by mail.kernel.org (Postfix) with ESMTP id B134C61108
 for ; Wed, 15 Sep 2021 23:42:07 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S233067AbhIOXn0 (ORCPT );
 Wed, 15 Sep 2021 19:43:26 -0400
Received: from mail.kernel.org ([198.145.29.99]:45348 "EHLO mail.kernel.org"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S233162AbhIOXnZ (ORCPT );
 Wed, 15 Sep 2021 19:43:25 -0400
Received: by mail.kernel.org (Postfix) with ESMTPSA id 3A0BD60F25;
 Wed, 15 Sep 2021 23:42:06 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
 s=k20201202; t=1631749326;
 bh=J6afXhYUJhfjvhm0Oj2J5P5yH5fOBL3R9W15pHLq2Lg=;
 h=Subject:From:To:Cc:Date:In-Reply-To:References:From;
 b=E3n+Zp84s+IH+rZrj35kM78Ny/4SQylyOb+EdqPYID9OQ/O3/p4agepAr+KpFDIhs
 pDqOTqt7Bq7Wmh8utzJWAm6TzbVrapncfI4MGkJV4Bu6sWZtq09KGxpzMr4itTyIJ7
 wr/vquWyeKUR9toFBs6OAWY9c9EW3x9VdO/p4plekeSrGyX7jFll2DYyRgWzcZLPke
 JmiaEsFSmoEKZESDD82HlMfdJSxJ7+GQq06tOaofaWWU30qH15+QdeBWNUwNcc9LN/
 g4+AF0i4lg1XcNvo/xxrFif6cR9CfaBiWGueOAHBctwCfqBt98N63l3sTDIqFz4ixH
 DQElebDKeJ7yQ==
Subject: [PATCH 1/1] common/rc: re-fix detection of
 device-mapper/persistent memory incompatibility
From: "Darrick J.
Wong"
To: djwong@kernel.org, guaneryu@gmail.com
Cc: linux-xfs@vger.kernel.org, fstests@vger.kernel.org, guan@eryu.me
Date: Wed, 15 Sep 2021 16:42:06 -0700
Message-ID: <163174932597.379383.18426474248994143835.stgit@magnolia>
In-Reply-To: <163174932046.379383.10637812567210248503.stgit@magnolia>
References: <163174932046.379383.10637812567210248503.stgit@magnolia>
User-Agent: StGit/0.19
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: fstests@vger.kernel.org

From: Darrick J. Wong

In commit e05491b3, I tried to resolve false test failures that were a
result of device mapper refusing to change access modes on a block
device that supports the FSDAX access mode.  Unfortunately, I did not
realize that there are two ways that fsdax support can be detected via
sysfs: /sys/block/XXX/queue/dax and /sys/block/XXX/dax/, so I only added
a test for the latter.

As of 5.15-rc1 this doesn't seem to work anymore for some reason.  I
don't know enough about the byzantine world of pmem device driver
initialization, but fsdax mode actually does work even though the
/sys/block/XXX/dax/ path went away.  So clearly we have to detect it via
the other sysfs path.

Fixes: e05491b3 ("common/rc: fix detection of device-mapper/persistent memory incompatibility")
Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 common/rc |   18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/common/rc b/common/rc
index 154bc2dd..275b1f24 100644
--- a/common/rc
+++ b/common/rc
@@ -1964,6 +1964,20 @@ _require_sane_bdev_flush()
 	fi
 }
 
+# Decide if the scratch filesystem is likely to be mounted in fsdax mode.
+# If there's a dax clause in the mount options we assume the test runner
+# wants us to test DAX; or if the scratch device itself advertises dax mode
+# in sysfs.
+__detect_scratch_fsdax()
+{
+	_normalize_mount_options | egrep -q "dax(=always| |$)" && return 0
+
+	local sysfs="/sys/block/$(_short_dev $SCRATCH_DEV)"
+	test -e "${sysfs}/dax" && return 0
+	test "$(cat "${sysfs}/queue/dax" 2>/dev/null)" = "1" && return 0
+	return 1
+}
+
 # this test requires a specific device mapper target
 _require_dm_target()
 {
@@ -1975,9 +1989,7 @@ _require_dm_target()
 	_require_sane_bdev_flush $SCRATCH_DEV
 	_require_command "$DMSETUP_PROG" dmsetup
 
-	_normalize_mount_options | egrep -q "dax(=always| |$)" || \
-		test -e "/sys/block/$(_short_dev $SCRATCH_DEV)/dax"
-	if [ $? -eq 0 ]; then
+	if __detect_scratch_fsdax; then
 		case $target in
 		stripe|linear|log-writes)
 			;;

From patchwork Fri Sep 17 00:48:29 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 12500709
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-16.4 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH,
 DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
 MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable
 autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
 by smtp.lore.kernel.org (Postfix) with ESMTP id 8EE53C433EF
 for ; Fri, 17 Sep 2021 00:48:31 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by mail.kernel.org (Postfix) with ESMTP id 6C39E6120F
 for ; Fri, 17 Sep 2021 00:48:31 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S241949AbhIQAtv (ORCPT );
 Thu, 16 Sep 2021 20:49:51 -0400
Received: from mail.kernel.org ([198.145.29.99]:36624 "EHLO mail.kernel.org"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S232111AbhIQAtv (ORCPT );
 Thu, 16 Sep 2021 20:49:51 -0400
Received: by mail.kernel.org (Postfix) with ESMTPSA id 31CF7611C8;
 Fri, 17
Sep 2021 00:48:30 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
 s=k20201202; t=1631839710;
 bh=HxJw6vOW5k2FFFL10OyVeCgu2mw6w08nOXbqpgFz3FM=;
 h=Date:From:To:Cc:Subject:References:In-Reply-To:From;
 b=ZQv99HZT0ikpMoP+A5Ly8TWvHxpY3gTL046g5NPj0zgX+ozy/s9gR4HAtr/HcqXCW
 gzXFV5ntSy/Nmdw9iXXpNbgRMrdTazAjv4JRfRASujpalHyRiSIC8CFsAgfHHJSSPS
 fJxD18zuQ4sScUDHpcBRze5Q0/SkyrhaOb1waRngflaxeeYdV+5dwrHeE3MYNeibn4
 YeO4PdXPmsVbviBoRLkGOG0L1zHSTFE3MgiwkvNfyYPfKsDTY4Lk5C28iLUf47KomO
 GWU5VWrZe83epT0M8FGdkqi+MM9wABmSvzsr/EMYWpHB4ruAHzt38x7qtcNuHnRWgq
 z+7kcdKiKzHYw==
Date: Thu, 16 Sep 2021 17:48:29 -0700
From: "Darrick J. Wong"
To: guaneryu@gmail.com
Cc: linux-xfs@vger.kernel.org, fstests@vger.kernel.org, guan@eryu.me,
 osandov@fb.com
Subject: [PATCH 2/1] common/rc: use directio mode for the loop device when
 possible
Message-ID: <20210917004829.GD34874@magnolia>
References: <163174932046.379383.10637812567210248503.stgit@magnolia>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <163174932046.379383.10637812567210248503.stgit@magnolia>
Precedence: bulk
List-ID:
X-Mailing-List: fstests@vger.kernel.org

From: Darrick J. Wong

Recently, I've been observing very high runtimes of tests that format a
filesystem atop a loop device and write enough data to fill memory, such
as generic/590 and generic/361.  Logging into the test VMs, I noticed
that the writes to the file on the upper filesystem started fast, but
soon slowed down to about 500KB/s and stayed that way for nearly 20
minutes.
Looking through the D-state processes on the system revealed:

/proc/4350/comm = xfs_io
/proc/4350/stack : [<0>] balance_dirty_pages+0x332/0xda0
[<0>] balance_dirty_pages_ratelimited+0x304/0x400
[<0>] iomap_file_buffered_write+0x1ab/0x260
[<0>] xfs_file_buffered_write+0xba/0x330 [xfs]
[<0>] new_sync_write+0x119/0x1a0
[<0>] vfs_write+0x274/0x310
[<0>] __x64_sys_pwrite64+0x89/0xc0
[<0>] do_syscall_64+0x35/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae

Here's the xfs_io process performing a buffered write to the file on the
upper filesystem, which at this point has dirtied enough pages to be
ratelimited.

/proc/28/comm = u10:0+flush-8:80
/proc/28/stack : [<0>] blk_mq_get_tag+0x11c/0x280
[<0>] __blk_mq_alloc_request+0xce/0xf0
[<0>] blk_mq_submit_bio+0x139/0x5b0
[<0>] submit_bio_noacct+0x3ba/0x430
[<0>] iomap_submit_ioend+0x4b/0x70
[<0>] xfs_vm_writepages+0x86/0x170 [xfs]
[<0>] do_writepages+0xcc/0x200
[<0>] __writeback_single_inode+0x3d/0x300
[<0>] writeback_sb_inodes+0x207/0x4a0
[<0>] __writeback_inodes_wb+0x4c/0xe0
[<0>] wb_writeback+0x1da/0x2c0
[<0>] wb_workfn+0x2ad/0x4f0
[<0>] process_one_work+0x1e2/0x3d0
[<0>] worker_thread+0x53/0x3c0
[<0>] kthread+0x149/0x170
[<0>] ret_from_fork+0x1f/0x30

This is a flusher thread that has invoked writeback on the upper
filesystem to try to clean memory pages.

/proc/89/comm = u10:7+loop0
/proc/89/stack : [<0>] balance_dirty_pages+0x332/0xda0
[<0>] balance_dirty_pages_ratelimited+0x304/0x400
[<0>] iomap_file_buffered_write+0x1ab/0x260
[<0>] xfs_file_buffered_write+0xba/0x330 [xfs]
[<0>] do_iter_readv_writev+0x14f/0x1a0
[<0>] do_iter_write+0x7b/0x1c0
[<0>] lo_write_bvec+0x62/0x1c0
[<0>] loop_process_work+0x3a4/0xba0
[<0>] process_one_work+0x1e2/0x3d0
[<0>] worker_thread+0x53/0x3c0
[<0>] kthread+0x149/0x170
[<0>] ret_from_fork+0x1f/0x30

Here's the loop device worker handling the writeback IO submitted by the
flusher thread.
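
For readers who want to collect similar data, here is a rough sketch of how the comm/stack pairs for D-state tasks shown above might be gathered. The helper name dump_dstate_stacks and its optional procfs-root argument are made up for illustration and are not part of fstests; it only walks the top-level /proc/PID entries, and reading /proc/PID/stack on a live system normally requires root.

```shell
#!/bin/bash
# Hypothetical helper (not part of fstests): print the comm and kernel stack
# of every task in uninterruptible sleep (state "D").  The procfs root is a
# parameter so the function can be pointed at a fake tree for testing.
dump_dstate_stacks() {
	local proc="${1:-/proc}" dir pid state

	for dir in "$proc"/[0-9]*; do
		test -r "$dir/status" || continue
		pid="${dir##*/}"
		# The status file has a line such as "State:  D (disk sleep)";
		# the second whitespace-separated field is the state letter.
		state="$(awk '$1 == "State:" { print $2; exit }' "$dir/status")"
		test "$state" = "D" || continue
		echo "/proc/$pid/comm = $(cat "$dir/comm" 2>/dev/null)"
		echo "/proc/$pid/stack :"
		cat "$dir/stack" 2>/dev/null
	done
}
```

Run as root with no argument, it scans the real /proc; tasks that exit between the glob and the reads are skipped silently.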
Unfortunately, the loop device is using buffered write mode, which means
that /writeback/ is dirtying pages and being throttled for that.  This
is stupid.

Fix this by trying to enable "directio" mode on the loop device, which
delivers two performance benefits: setting directio mode also enables
async io mode, which will allow multiple IOs at once; and using directio
nearly eliminates the chance that writeback will get throttled.

On the author's system with fast storage, this reduces the runtime of
g/590 from 20 minutes to 12 seconds, and g/361 from ~30s to ~3s.

Signed-off-by: Darrick J. Wong
---
 common/rc |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/common/rc b/common/rc
index 275b1f24..a174b695 100644
--- a/common/rc
+++ b/common/rc
@@ -3849,6 +3849,14 @@ _create_loop_device()
 {
 	local file=$1 dev
 	dev=`losetup -f --show $file` || _fail "Cannot assign $file to a loop device"
+
+	# Try to enable asynchronous directio mode on the loopback device so
+	# that writeback started by a filesystem mounted on the loop device
+	# won't be throttled by buffered writes to the lower filesystem.  This
+	# is a performance optimization for tests that want to write a lot of
+	# data, so it isn't required to work.
+	test -b "$dev" && losetup --direct-io=on $dev 2> /dev/null
+
 	echo $dev
 }
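
Because the losetup --direct-io=on call in the hunk above is best-effort and its errors are discarded, a tester may want to confirm whether direct I/O actually took effect. The following is a minimal sketch, assuming the loop driver exposes a "dio" attribute under /sys/block/loopN/loop/ that reads "1" when direct I/O is on. The helper name loop_dio_enabled and its optional sysfs-root argument are hypothetical and not part of fstests.

```shell
#!/bin/bash
# Hypothetical helper (not part of fstests): succeed if the given loop
# device reports direct I/O mode.  Assumes a "dio" attribute exists under
# /sys/block/loopN/loop/; the sysfs root is a parameter so the check can be
# exercised against a fake tree.
loop_dio_enabled() {
	local dev="$1" sysfs="${2:-/sys}"
	local attr="$sysfs/block/${dev##*/}/loop/dio"

	# A missing attribute or any value other than "1" counts as "not enabled".
	test "$(cat "$attr" 2>/dev/null)" = "1"
}

# Possible use after _create_loop_device (from common/rc):
#   dev=$(_create_loop_device "$file")
#   loop_dio_enabled "$dev" || echo "note: $dev is using buffered I/O"
```

Treating a missing attribute as "not enabled" matches the patch's stance that directio is only an optimization and is not required to work.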