From patchwork Fri Dec 30 22:17:41 2022
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13085068
Subject: [PATCH 1/9] libxfs: clean up xfs_da_unmount usage
From: "Darrick J. Wong"
To: cem@kernel.org, djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Date: Fri, 30 Dec 2022 14:17:41 -0800
Message-ID: <167243866170.711834.15916685233545164856.stgit@magnolia>
In-Reply-To: <167243866153.711834.17585439086893346840.stgit@magnolia>
References: <167243866153.711834.17585439086893346840.stgit@magnolia>
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

Replace the open-coded xfs_da_unmount usage in libxfs_umount and teach
libxfs_mount not to leak the dir/attr geometry structures when the
mount attempt fails.

Signed-off-by: Darrick J.
Wong --- libxfs/init.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/libxfs/init.c b/libxfs/init.c index 93dc1f1c599..f21dbc6732b 100644 --- a/libxfs/init.c +++ b/libxfs/init.c @@ -842,7 +842,7 @@ libxfs_mount( if (error) { fprintf(stderr, _("%s: data size check failed\n"), progname); if (!xfs_is_debugger(mp)) - return NULL; + goto out_da; } else libxfs_buf_relse(bp); @@ -856,7 +856,7 @@ libxfs_mount( fprintf(stderr, _("%s: log size checks failed\n"), progname); if (!xfs_is_debugger(mp)) - return NULL; + goto out_da; } if (bp) libxfs_buf_relse(bp); @@ -865,8 +865,8 @@ libxfs_mount( /* Initialize realtime fields in the mount structure */ if (rtmount_init(mp)) { fprintf(stderr, _("%s: realtime device init failed\n"), - progname); - return NULL; + progname); + goto out_da; } /* @@ -884,7 +884,7 @@ libxfs_mount( fprintf(stderr, _("%s: read of AG %u failed\n"), progname, sbp->sb_agcount); if (!xfs_is_debugger(mp)) - return NULL; + goto out_da; fprintf(stderr, _("%s: limiting reads to AG 0\n"), progname); sbp->sb_agcount = 1; @@ -902,6 +902,9 @@ libxfs_mount( xfs_set_perag_data_loaded(mp); return mp; +out_da: + xfs_da_unmount(mp); + return NULL; } void @@ -1024,9 +1027,7 @@ libxfs_umount( if (xfs_is_perag_data_loaded(mp)) libxfs_free_perag(mp); - kmem_free(mp->m_attr_geo); - kmem_free(mp->m_dir_geo); - + xfs_da_unmount(mp); kmem_free(mp->m_rtdev_targp); if (mp->m_logdev_targp != mp->m_ddev_targp) kmem_free(mp->m_logdev_targp); From patchwork Fri Dec 30 22:17:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13085069 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 625ADC4332F for ; Sat, 31 Dec 2022 00:11:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235871AbiLaALo (ORCPT ); Fri, 30 Dec 2022 19:11:44 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54708 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229758AbiLaALn (ORCPT ); Fri, 30 Dec 2022 19:11:43 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EB61D5F5E for ; Fri, 30 Dec 2022 16:11:41 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 7B98E61CE1 for ; Sat, 31 Dec 2022 00:11:41 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id C6FC0C433D2; Sat, 31 Dec 2022 00:11:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672445500; bh=AqkH6y6TAKOH0/PszLxvfuO2B526VzI9qyebTk30fr4=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=nSecbn3XhG1Sc+9tuDYFvKEz2KadENArVOadzeXFdwBtg6cmgAVUGgvCmFBQQhgUs 33GkhE1FZAsN9pSkuwnnHnJvRxodGCEveFgKv+eLezWkPttzLVAs61EZC/KWIotf9I 7pFepOHWUKX5Q0HizPNdQtB6umoChor9XzgH7xol9f05IyfYIlmID0gsKXphvyMmRB Vn1BxqzR6TfPnUx9DNSOURUrYbLUv41FM0fVK0Kp8/cVQ72Pxj1496gHPTOkL1B9In as+4gh6ubM/Udn7dPxgjGZ6maOY13pWMT4sFE/bsotoQ9mVRTtOJf22+Lg0DNl3Hip tTN7H67rL1GAQ== Subject: [PATCH 2/9] libxfs: teach buftargs to maintain their own buffer hashtable From: "Darrick J. 
Wong" To: cem@kernel.org, djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:17:41 -0800 Message-ID: <167243866184.711834.9648436964411711614.stgit@magnolia> In-Reply-To: <167243866153.711834.17585439086893346840.stgit@magnolia> References: <167243866153.711834.17585439086893346840.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong Currently, cached buffers are indexed with a single global bcache structure. This works ok for the limited use case where we only support reading from the data device, but will fail badly when we want to support buffers from in-memory btrees. Move the bcache structure into the buftarg. Signed-off-by: Darrick J. Wong --- copy/xfs_copy.c | 2 + db/init.c | 7 +++- db/sb.c | 3 +- include/libxfs.h | 3 -- include/xfs_mount.h | 2 + libxfs/init.c | 81 +++++++++++++++++++++++++++++---------------------- libxfs/libxfs_io.h | 14 +++++---- libxfs/rdwr.c | 40 +++++++++++++++++-------- logprint/logprint.c | 2 + mkfs/xfs_mkfs.c | 4 +-- repair/prefetch.c | 12 +++++--- repair/prefetch.h | 1 + repair/progress.c | 14 ++++++--- repair/progress.h | 2 + repair/scan.c | 2 + repair/xfs_repair.c | 32 +++++++++++--------- 16 files changed, 130 insertions(+), 91 deletions(-) diff --git a/copy/xfs_copy.c b/copy/xfs_copy.c index 79f65946709..45f8485799e 100644 --- a/copy/xfs_copy.c +++ b/copy/xfs_copy.c @@ -733,7 +733,7 @@ main(int argc, char **argv) memset(&mbuf, 0, sizeof(xfs_mount_t)); /* We don't yet know the sector size, so read maximal size */ - libxfs_buftarg_init(&mbuf, xargs.ddev, xargs.logdev, xargs.rtdev); + libxfs_buftarg_init(&mbuf, xargs.ddev, xargs.logdev, xargs.rtdev, 0); error = -libxfs_buf_read_uncached(mbuf.m_ddev_targp, XFS_SB_DADDR, 1 << (XFS_MAX_SECTORSIZE_LOG - BBSHIFT), 0, &sbp, NULL); if (error) { diff --git a/db/init.c b/db/init.c index eec65d0884d..9f045d27076 100644 --- a/db/init.c +++ b/db/init.c @@ -97,7 +97,6 @@ init( else x.dname = fsdevice; - x.bcache_flags = CACHE_MISCOMPARE_PURGE; if (!libxfs_init(&x)) { fputs(_("\nfatal error -- couldn't initialize XFS library\n"), stderr); @@ -109,7 +108,8 @@ init( * tool and so need to be able to mount busted filesystems. 
*/ memset(&xmount, 0, sizeof(struct xfs_mount)); - libxfs_buftarg_init(&xmount, x.ddev, x.logdev, x.rtdev); + libxfs_buftarg_init(&xmount, x.ddev, x.logdev, x.rtdev, + XFS_BUFTARG_MISCOMPARE_PURGE); error = -libxfs_buf_read_uncached(xmount.m_ddev_targp, XFS_SB_DADDR, 1 << (XFS_MAX_SECTORSIZE_LOG - BBSHIFT), 0, &bp, NULL); if (error) { @@ -134,7 +134,8 @@ init( agcount = sbp->sb_agcount; mp = libxfs_mount(&xmount, sbp, x.ddev, x.logdev, x.rtdev, - LIBXFS_MOUNT_DEBUGGER); + LIBXFS_MOUNT_DEBUGGER | + LIBXFS_MOUNT_CACHE_MISCOMPARE_PURGE); if (!mp) { fprintf(stderr, _("%s: device %s unusable (not an XFS filesystem?)\n"), diff --git a/db/sb.c b/db/sb.c index 2d508c26a3b..fd81286cd60 100644 --- a/db/sb.c +++ b/db/sb.c @@ -233,7 +233,8 @@ sb_logcheck(void) } } - libxfs_buftarg_init(mp, x.ddev, x.logdev, x.rtdev); + libxfs_buftarg_init(mp, x.ddev, x.logdev, x.rtdev, + XFS_BUFTARG_MISCOMPARE_PURGE); dirty = xlog_is_dirty(mp, mp->m_log, &x, 0); if (dirty == -1) { diff --git a/include/libxfs.h b/include/libxfs.h index 915bf511313..b07da6c03ee 100644 --- a/include/libxfs.h +++ b/include/libxfs.h @@ -123,8 +123,6 @@ typedef struct libxfs_xinit { int dfd; /* data subvolume file descriptor */ int logfd; /* log subvolume file descriptor */ int rtfd; /* realtime subvolume file descriptor */ - int icache_flags; /* cache init flags */ - int bcache_flags; /* cache init flags */ } libxfs_init_t; #define LIBXFS_ISREADONLY 0x0002 /* disallow all mounted filesystems */ @@ -141,7 +139,6 @@ extern int libxfs_device_to_fd (dev_t); extern dev_t libxfs_device_open (char *, int, int, int); extern void libxfs_device_close (dev_t); extern int libxfs_device_alignment (void); -extern void libxfs_report(FILE *); /* check or write log footer: specify device, log size in blocks & uuid */ typedef char *(libxfs_get_block_t)(char *, int, void *); diff --git a/include/xfs_mount.h b/include/xfs_mount.h index acd9214da3a..6be85bf21d2 100644 --- a/include/xfs_mount.h +++ b/include/xfs_mount.h @@ -256,6 +256,8 @@ __XFS_UNSUPP_OPSTATE(shutdown) #define LIBXFS_MOUNT_DEBUGGER (1U << 0) /* report metadata corruption to stdout */ #define LIBXFS_MOUNT_REPORT_CORRUPTION (1U << 1) +/* purge buffer cache on miscompares */ +#define LIBXFS_MOUNT_CACHE_MISCOMPARE_PURGE (1U << 2) #define LIBXFS_BHASHSIZE(sbp) (1<<10) diff --git a/libxfs/init.c b/libxfs/init.c index f21dbc6732b..5e90bf733b7 100644 --- a/libxfs/init.c +++ b/libxfs/init.c @@ -31,7 +31,6 @@ pthread_mutex_t atomic64_lock = PTHREAD_MUTEX_INITIALIZER; char *progname = "libxfs"; /* default, changed by each tool */ -struct cache *libxfs_bcache; /* global buffer cache */ int libxfs_bhash_size; /* #buckets in bcache */ int use_xfs_buf_lock; /* global flag: use xfs_buf locks for MT */ @@ -407,8 +406,6 @@ libxfs_init(libxfs_init_t *a) } if (!libxfs_bhash_size) libxfs_bhash_size = LIBXFS_BHASHSIZE(sbp); - libxfs_bcache = cache_init(a->bcache_flags, libxfs_bhash_size, - &libxfs_bcache_operations); use_xfs_buf_lock = a->usebuflock; xfs_dir_startup(); init_caches(); @@ -592,9 +589,14 @@ static struct xfs_buftarg * libxfs_buftarg_alloc( struct xfs_mount *mp, dev_t dev, - unsigned long write_fails) + unsigned long write_fails, + unsigned int buftarg_flags) { struct xfs_buftarg *btp; + unsigned int bcache_flags = 0; + + if (!write_fails) + buftarg_flags &= ~XFS_BUFTARG_INJECT_WRITE_FAIL; btp = malloc(sizeof(*btp)); if (!btp) { @@ -604,13 +606,15 @@ libxfs_buftarg_alloc( } btp->bt_mount = mp; btp->bt_bdev = dev; - btp->flags = 0; - if (write_fails) { - btp->writes_left = write_fails; - btp->flags |= 
XFS_BUFTARG_INJECT_WRITE_FAIL; - } + btp->flags = buftarg_flags; + btp->writes_left = write_fails; + if (btp->flags & XFS_BUFTARG_MISCOMPARE_PURGE) + bcache_flags |= CACHE_MISCOMPARE_PURGE; pthread_mutex_init(&btp->lock, NULL); + btp->bcache = cache_init(bcache_flags, libxfs_bhash_size, + &libxfs_bcache_operations); + return btp; } @@ -633,10 +637,12 @@ libxfs_buftarg_init( struct xfs_mount *mp, dev_t dev, dev_t logdev, - dev_t rtdev) + dev_t rtdev, + unsigned int btflags) { char *p = getenv("LIBXFS_DEBUG_WRITE_CRASH"); unsigned long dfail = 0, lfail = 0, rfail = 0; + unsigned int dflags = 0, lflags = 0, rflags = 0; /* Simulate utility crash after a certain number of writes. */ while (p && *p) { @@ -650,6 +656,8 @@ libxfs_buftarg_init( exit(1); } dfail = strtoul(val, NULL, 0); + if (dfail) + dflags |= XFS_BUFTARG_INJECT_WRITE_FAIL; break; case WF_LOG: if (!val) { @@ -658,6 +666,8 @@ libxfs_buftarg_init( exit(1); } lfail = strtoul(val, NULL, 0); + if (lfail) + lflags |= XFS_BUFTARG_INJECT_WRITE_FAIL; break; case WF_RT: if (!val) { @@ -666,6 +676,8 @@ libxfs_buftarg_init( exit(1); } rfail = strtoul(val, NULL, 0); + if (rfail) + rflags |= XFS_BUFTARG_INJECT_WRITE_FAIL; break; default: fprintf(stderr, _("unknown write fail type %s\n"), @@ -708,12 +720,15 @@ libxfs_buftarg_init( return; } - mp->m_ddev_targp = libxfs_buftarg_alloc(mp, dev, dfail); + mp->m_ddev_targp = libxfs_buftarg_alloc(mp, dev, dfail, + dflags | btflags); if (!logdev || logdev == dev) mp->m_logdev_targp = mp->m_ddev_targp; else - mp->m_logdev_targp = libxfs_buftarg_alloc(mp, logdev, lfail); - mp->m_rtdev_targp = libxfs_buftarg_alloc(mp, rtdev, rfail); + mp->m_logdev_targp = libxfs_buftarg_alloc(mp, logdev, lfail, + lflags | btflags); + mp->m_rtdev_targp = libxfs_buftarg_alloc(mp, rtdev, rfail, + rflags | btflags); } /* Compute maximum possible height for per-AG btree types for this fs. */ @@ -760,14 +775,18 @@ libxfs_mount( struct xfs_buf *bp; struct xfs_sb *sbp; xfs_daddr_t d; + unsigned int btflags = 0; int error; + mp->m_features = xfs_sb_version_to_features(sb); if (flags & LIBXFS_MOUNT_DEBUGGER) xfs_set_debugger(mp); if (flags & LIBXFS_MOUNT_REPORT_CORRUPTION) xfs_set_reporting_corruption(mp); - libxfs_buftarg_init(mp, dev, logdev, rtdev); + if (flags & LIBXFS_MOUNT_CACHE_MISCOMPARE_PURGE) + btflags |= XFS_BUFTARG_MISCOMPARE_PURGE; + libxfs_buftarg_init(mp, dev, logdev, rtdev, btflags); mp->m_finobt_nores = true; xfs_set_inode32(mp); @@ -975,7 +994,7 @@ libxfs_flush_mount( * LOST_WRITE flag to be set in the buftarg. Once that's done, * instruct the disks to persist their write caches. */ - libxfs_bcache_flush(); + libxfs_bcache_flush(mp); /* Flush all kernel and disk write caches, and report failures. */ if (mp->m_ddev_targp) { @@ -1001,6 +1020,14 @@ libxfs_flush_mount( return error; } +static void +libxfs_buftarg_free( + struct xfs_buftarg *btp) +{ + cache_destroy(btp->bcache); + kmem_free(btp); +} + /* * Release any resource obtained during a mount. */ @@ -1017,7 +1044,7 @@ libxfs_umount( * all incore buffers, then pick up the outcome when we tell the disks * to persist their write caches. 
*/ - libxfs_bcache_purge(); + libxfs_bcache_purge(mp); error = libxfs_flush_mount(mp); /* @@ -1028,10 +1055,10 @@ libxfs_umount( libxfs_free_perag(mp); xfs_da_unmount(mp); - kmem_free(mp->m_rtdev_targp); + libxfs_buftarg_free(mp->m_rtdev_targp); if (mp->m_logdev_targp != mp->m_ddev_targp) - kmem_free(mp->m_logdev_targp); - kmem_free(mp->m_ddev_targp); + libxfs_buftarg_free(mp->m_logdev_targp); + libxfs_buftarg_free(mp->m_ddev_targp); return error; } @@ -1047,10 +1074,7 @@ libxfs_destroy( libxfs_close_devices(li); - /* Free everything from the buffer cache before freeing buffer cache */ - libxfs_bcache_purge(); libxfs_bcache_free(); - cache_destroy(libxfs_bcache); leaked = destroy_caches(); rcu_unregister_thread(); if (getenv("LIBXFS_LEAK_CHECK") && leaked) @@ -1062,16 +1086,3 @@ libxfs_device_alignment(void) { return platform_align_blockdev(); } - -void -libxfs_report(FILE *fp) -{ - time_t t; - char *c; - - cache_report(fp, "libxfs_bcache", libxfs_bcache); - - t = time(NULL); - c = asctime(localtime(&t)); - fprintf(fp, "%s", c); -} diff --git a/libxfs/libxfs_io.h b/libxfs/libxfs_io.h index 4ffe788d446..3fa9e75dcaa 100644 --- a/libxfs/libxfs_io.h +++ b/libxfs/libxfs_io.h @@ -26,6 +26,7 @@ struct xfs_buftarg { unsigned long writes_left; dev_t bt_bdev; unsigned int flags; + struct cache *bcache; /* global buffer cache */ }; /* We purged a dirty buffer and lost a write. */ @@ -34,6 +35,8 @@ struct xfs_buftarg { #define XFS_BUFTARG_CORRUPT_WRITE (1 << 1) /* Simulate failure after a certain number of writes. */ #define XFS_BUFTARG_INJECT_WRITE_FAIL (1 << 2) +/* purge buffers when lookups find a size mismatch */ +#define XFS_BUFTARG_MISCOMPARE_PURGE (1 << 3) /* Simulate the system crashing after a certain number of writes. */ static inline void @@ -50,8 +53,8 @@ xfs_buftarg_trip_write( pthread_mutex_unlock(&btp->lock); } -extern void libxfs_buftarg_init(struct xfs_mount *mp, dev_t ddev, - dev_t logdev, dev_t rtdev); +void libxfs_buftarg_init(struct xfs_mount *mp, dev_t ddev, dev_t logdev, + dev_t rtdev, unsigned int btflags); int libxfs_blkdev_issue_flush(struct xfs_buftarg *btp); #define LIBXFS_BBTOOFF64(bbs) (((xfs_off_t)(bbs)) << BBSHIFT) @@ -139,7 +142,6 @@ int libxfs_buf_priority(struct xfs_buf *bp); /* Buffer Cache Interfaces */ -extern struct cache *libxfs_bcache; extern struct cache_operations libxfs_bcache_operations; #define LIBXFS_GETBUF_TRYLOCK (1 << 0) @@ -183,10 +185,10 @@ libxfs_buf_read( int libxfs_readbuf_verify(struct xfs_buf *bp, const struct xfs_buf_ops *ops); struct xfs_buf *libxfs_getsb(struct xfs_mount *mp); -extern void libxfs_bcache_purge(void); +extern void libxfs_bcache_purge(struct xfs_mount *mp); extern void libxfs_bcache_free(void); -extern void libxfs_bcache_flush(void); -extern int libxfs_bcache_overflowed(void); +extern void libxfs_bcache_flush(struct xfs_mount *mp); +extern int libxfs_bcache_overflowed(struct xfs_mount *mp); /* Buffer (Raw) Interfaces */ int libxfs_bwrite(struct xfs_buf *bp); diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c index d5aad3ea210..5d63ec4f6de 100644 --- a/libxfs/rdwr.c +++ b/libxfs/rdwr.c @@ -198,18 +198,21 @@ libxfs_bhash(cache_key_t key, unsigned int hashsize, unsigned int hashshift) } static int -libxfs_bcompare(struct cache_node *node, cache_key_t key) +libxfs_bcompare( + struct cache_node *node, + cache_key_t key) { struct xfs_buf *bp = container_of(node, struct xfs_buf, b_node); struct xfs_bufkey *bkey = (struct xfs_bufkey *)key; + struct cache *bcache = bkey->buftarg->bcache; if (bp->b_target->bt_bdev == bkey->buftarg->bt_bdev && 
bp->b_cache_key == bkey->blkno) { if (bp->b_length == bkey->bblen) return CACHE_HIT; #ifdef IO_BCOMPARE_CHECK - if (!(libxfs_bcache->c_flags & CACHE_MISCOMPARE_PURGE)) { + if (!(bcache->c_flags & CACHE_MISCOMPARE_PURGE)) { fprintf(stderr, "%lx: Badness in key lookup (length)\n" "bp=(bno 0x%llx, len %u bytes) key=(bno 0x%llx, len %u bytes)\n", @@ -399,11 +402,12 @@ __cache_lookup( struct xfs_buf **bpp) { struct cache_node *cn = NULL; + struct cache *bcache = key->buftarg->bcache; struct xfs_buf *bp; *bpp = NULL; - cache_node_get(libxfs_bcache, key, &cn); + cache_node_get(bcache, key, &cn); if (!cn) return -ENOMEM; bp = container_of(cn, struct xfs_buf, b_node); @@ -415,7 +419,7 @@ __cache_lookup( if (ret) { ASSERT(ret == EAGAIN); if (flags & LIBXFS_GETBUF_TRYLOCK) { - cache_node_put(libxfs_bcache, cn); + cache_node_put(bcache, cn); return -EAGAIN; } @@ -434,7 +438,7 @@ __cache_lookup( bp->b_holder = pthread_self(); } - cache_node_set_priority(libxfs_bcache, cn, + cache_node_set_priority(bcache, cn, cache_node_get_priority(cn) - CACHE_PREFETCH_PRIORITY); *bpp = bp; return 0; @@ -550,7 +554,7 @@ libxfs_buf_relse( } if (!list_empty(&bp->b_node.cn_hash)) - cache_node_put(libxfs_bcache, &bp->b_node); + cache_node_put(bp->b_target->bcache, &bp->b_node); else if (--bp->b_node.cn_count == 0) { if (bp->b_flags & LIBXFS_B_DIRTY) libxfs_bwrite(bp); @@ -1004,21 +1008,31 @@ libxfs_bflush( } void -libxfs_bcache_purge(void) +libxfs_bcache_purge(struct xfs_mount *mp) { - cache_purge(libxfs_bcache); + if (!mp) + return; + cache_purge(mp->m_ddev_targp->bcache); + cache_purge(mp->m_logdev_targp->bcache); + cache_purge(mp->m_rtdev_targp->bcache); } void -libxfs_bcache_flush(void) +libxfs_bcache_flush(struct xfs_mount *mp) { - cache_flush(libxfs_bcache); + if (!mp) + return; + cache_flush(mp->m_ddev_targp->bcache); + cache_flush(mp->m_logdev_targp->bcache); + cache_flush(mp->m_rtdev_targp->bcache); } int -libxfs_bcache_overflowed(void) +libxfs_bcache_overflowed(struct xfs_mount *mp) { - return cache_overflowed(libxfs_bcache); + return cache_overflowed(mp->m_ddev_targp->bcache) || + cache_overflowed(mp->m_logdev_targp->bcache) || + cache_overflowed(mp->m_rtdev_targp->bcache); } struct cache_operations libxfs_bcache_operations = { @@ -1460,7 +1474,7 @@ libxfs_buf_set_priority( struct xfs_buf *bp, int priority) { - cache_node_set_priority(libxfs_bcache, &bp->b_node, priority); + cache_node_set_priority(bp->b_target->bcache, &bp->b_node, priority); } int diff --git a/logprint/logprint.c b/logprint/logprint.c index 9a8811f467c..df70553543b 100644 --- a/logprint/logprint.c +++ b/logprint/logprint.c @@ -213,7 +213,7 @@ main(int argc, char **argv) exit(1); logstat(&mount); - libxfs_buftarg_init(&mount, x.ddev, x.logdev, x.rtdev); + libxfs_buftarg_init(&mount, x.ddev, x.logdev, x.rtdev, 0); logfd = (x.logfd < 0) ? x.dfd : x.logfd; diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c index 31861a2eb3c..638e7ce6ea4 100644 --- a/mkfs/xfs_mkfs.c +++ b/mkfs/xfs_mkfs.c @@ -4518,7 +4518,7 @@ main( /* * we need the libxfs buffer cache from here on in. */ - libxfs_buftarg_init(mp, xi.ddev, xi.logdev, xi.rtdev); + libxfs_buftarg_init(mp, xi.ddev, xi.logdev, xi.rtdev, 0); /* * Before we mount the filesystem we need to make sure the devices have @@ -4587,7 +4587,7 @@ main( * Need to drop references to inodes we still hold, first. */ libxfs_rtmount_destroy(mp); - libxfs_bcache_purge(); + libxfs_bcache_purge(mp); /* * Mark the filesystem ok. 
diff --git a/repair/prefetch.c b/repair/prefetch.c index 017750e9a92..5665e0a224c 100644 --- a/repair/prefetch.c +++ b/repair/prefetch.c @@ -886,10 +886,12 @@ init_prefetch( prefetch_args_t * start_inode_prefetch( + struct xfs_mount *mp, xfs_agnumber_t agno, int dirs_only, prefetch_args_t *prev_args) { + struct cache *bcache = mp->m_ddev_targp->bcache; prefetch_args_t *args; long max_queue; struct xfs_ino_geometry *igeo = M_IGEO(mp); @@ -914,7 +916,7 @@ start_inode_prefetch( * and not any other associated metadata like directories */ - max_queue = libxfs_bcache->c_maxcount / thread_count / 8; + max_queue = bcache->c_maxcount / thread_count / 8; if (igeo->inode_cluster_size > mp->m_sb.sb_blocksize) max_queue = max_queue * igeo->blocks_per_cluster / igeo->ialloc_blks; @@ -970,14 +972,16 @@ prefetch_ag_range( void (*func)(struct workqueue *, xfs_agnumber_t, void *)) { + struct xfs_mount *mp = work->wq_ctx; int i; struct prefetch_args *pf_args[2]; - pf_args[start_ag & 1] = start_inode_prefetch(start_ag, dirs_only, NULL); + pf_args[start_ag & 1] = start_inode_prefetch(mp, start_ag, dirs_only, + NULL); for (i = start_ag; i < end_ag; i++) { /* Don't prefetch end_ag */ if (i + 1 < end_ag) - pf_args[(~i) & 1] = start_inode_prefetch(i + 1, + pf_args[(~i) & 1] = start_inode_prefetch(mp, i + 1, dirs_only, pf_args[i & 1]); func(work, i, pf_args[i & 1]); } @@ -1027,7 +1031,7 @@ do_inode_prefetch( * filesystem - it's all in the cache. In that case, run a thread per * CPU to maximise parallelism of the queue to be processed. */ - if (check_cache && !libxfs_bcache_overflowed()) { + if (check_cache && !libxfs_bcache_overflowed(mp)) { queue.wq_ctx = mp; create_work_queue(&queue, mp, platform_nproc()); for (i = 0; i < mp->m_sb.sb_agcount; i++) diff --git a/repair/prefetch.h b/repair/prefetch.h index 54ece48ad22..a8c52a1195b 100644 --- a/repair/prefetch.h +++ b/repair/prefetch.h @@ -39,6 +39,7 @@ init_prefetch( prefetch_args_t * start_inode_prefetch( + struct xfs_mount *mp, xfs_agnumber_t agno, int dirs_only, prefetch_args_t *prev_args); diff --git a/repair/progress.c b/repair/progress.c index f6c4d988444..625dc41c289 100644 --- a/repair/progress.c +++ b/repair/progress.c @@ -383,14 +383,18 @@ timediff(int phase) ** array. 
*/ char * -timestamp(int end, int phase, char *buf) +timestamp( + struct xfs_mount *mp, + int end, + int phase, + char *buf) { - time_t now; - struct tm *tmp; + time_t now; + struct tm *tmp; - if (verbose > 1) - cache_report(stderr, "libxfs_bcache", libxfs_bcache); + if (verbose > 1 && mp && mp->m_ddev_targp) + cache_report(stderr, "libxfs_bcache", mp->m_ddev_targp->bcache); now = time(NULL); diff --git a/repair/progress.h b/repair/progress.h index 2c1690db1b1..75b751b783b 100644 --- a/repair/progress.h +++ b/repair/progress.h @@ -37,7 +37,7 @@ extern void stop_progress_rpt(void); extern void summary_report(void); extern int set_progress_msg(int report, uint64_t total); extern uint64_t print_final_rpt(void); -extern char *timestamp(int end, int phase, char *buf); +extern char *timestamp(struct xfs_mount *mp, int end, int phase, char *buf); extern char *duration(int val, char *buf); extern int do_parallel; diff --git a/repair/scan.c b/repair/scan.c index 008ef65ac75..ac2233b93b7 100644 --- a/repair/scan.c +++ b/repair/scan.c @@ -42,7 +42,7 @@ struct aghdr_cnts { void set_mp(xfs_mount_t *mpp) { - libxfs_bcache_purge(); + libxfs_bcache_purge(mp); mp = mpp; } diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c index ff29bea9743..e49d4292ad4 100644 --- a/repair/xfs_repair.c +++ b/repair/xfs_repair.c @@ -944,9 +944,11 @@ repair_capture_writeback( } static inline void -phase_end(int phase) +phase_end( + struct xfs_mount *mp, + int phase) { - timestamp(PHASE_END, phase, NULL); + timestamp(mp, PHASE_END, phase, NULL); /* Fail if someone injected an post-phase error. */ if (fail_after_phase && phase == fail_after_phase) @@ -981,8 +983,8 @@ main(int argc, char **argv) msgbuf = malloc(DURATION_BUF_SIZE); - timestamp(PHASE_START, 0, NULL); - phase_end(0); + timestamp(temp_mp, PHASE_START, 0, NULL); + phase_end(temp_mp, 0); /* -f forces this, but let's be nice and autodetect it, as well. */ if (!isa_file) { @@ -1005,7 +1007,7 @@ main(int argc, char **argv) /* do phase1 to make sure we have a superblock */ phase1(temp_mp); - phase_end(1); + phase_end(temp_mp, 1); if (no_modify && primary_sb_modified) { do_warn(_("Primary superblock would have been modified.\n" @@ -1142,8 +1144,8 @@ main(int argc, char **argv) unsigned long max_mem; struct rlimit rlim; - libxfs_bcache_purge(); - cache_destroy(libxfs_bcache); + libxfs_bcache_purge(mp); + cache_destroy(mp->m_ddev_targp->bcache); mem_used = (mp->m_sb.sb_icount >> (10 - 2)) + (mp->m_sb.sb_dblocks >> (10 + 1)) + @@ -1203,7 +1205,7 @@ main(int argc, char **argv) do_log(_(" - block cache size set to %d entries\n"), libxfs_bhash_size * HASH_CACHE_RATIO); - libxfs_bcache = cache_init(0, libxfs_bhash_size, + mp->m_ddev_targp->bcache = cache_init(0, libxfs_bhash_size, &libxfs_bcache_operations); } @@ -1231,16 +1233,16 @@ main(int argc, char **argv) /* make sure the per-ag freespace maps are ok so we can mount the fs */ phase2(mp, phase2_threads); - phase_end(2); + phase_end(mp, 2); if (do_prefetch) init_prefetch(mp); phase3(mp, phase2_threads); - phase_end(3); + phase_end(mp, 3); phase4(mp); - phase_end(4); + phase_end(mp, 4); if (no_modify) { printf(_("No modify flag set, skipping phase 5\n")); @@ -1250,7 +1252,7 @@ main(int argc, char **argv) } else { phase5(mp); } - phase_end(5); + phase_end(mp, 5); /* * Done with the block usage maps, toss them... 
@@ -1260,10 +1262,10 @@ main(int argc, char **argv) if (!bad_ino_btree) { phase6(mp); - phase_end(6); + phase_end(mp, 6); phase7(mp, phase2_threads); - phase_end(7); + phase_end(mp, 7); } else { do_warn( _("Inode allocation btrees are too corrupted, skipping phases 6 and 7\n")); @@ -1388,7 +1390,7 @@ _("Note - stripe unit (%d) and width (%d) were copied from a backup superblock.\ * verifiers are run (where we discover the max metadata LSN), reformat * the log if necessary and unmount. */ - libxfs_bcache_flush(); + libxfs_bcache_flush(mp); format_log_max_lsn(mp); if (xfs_sb_version_needsrepair(&mp->m_sb)) From patchwork Fri Dec 30 22:17:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13085070 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3E245C4332F for ; Sat, 31 Dec 2022 00:12:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235906AbiLaAMC (ORCPT ); Fri, 30 Dec 2022 19:12:02 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54842 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235901AbiLaAMA (ORCPT ); Fri, 30 Dec 2022 19:12:00 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 234D0B4A8 for ; Fri, 30 Dec 2022 16:11:59 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id C386BB81E00 for ; Sat, 31 Dec 2022 00:11:57 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7F51FC433EF; Sat, 31 Dec 2022 00:11:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672445516; bh=2ovO5lGVpiTfloWLhpoN+1rS5yJYoSjSR3SKKNclDUs=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=hhf46kqyDyb2v5hDgTM2YG90JqHX0C0Cgyghun81Fo23a8pspaHqQbN2YOmDhed54 5BxpQQgXytK8NWVEjku2+/nsWV2yQlUxqlPXOu5CkG8hihJ6KWGyB+kexgMXCInWI9 2JIrBeLh+4g2pvzTFdOskGxf0XfLpVtZl76bXFOMq0YpowpvELlEFvoUdoZu7RCavJ RLWSbFb+bUL42CtnAR0vv9R2BTDdA+23hTNyodQhOrkEB4QESflhskqO5Rss7wnPuP CSy3qTvYyAFmTdC4Fh54TDkWr932+vErfpTATqIjBvs2BbiqpTqLJM4aMuT1yc8D3H PW5sb1NtIUU5A== Subject: [PATCH 3/9] libxfs: add xfile support From: "Darrick J. Wong" To: cem@kernel.org, djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:17:41 -0800 Message-ID: <167243866197.711834.3914718577863787117.stgit@magnolia> In-Reply-To: <167243866153.711834.17585439086893346840.stgit@magnolia> References: <167243866153.711834.17585439086893346840.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong Port the xfile functionality (anonymous pageable file-index memory) from the kernel. Signed-off-by: Darrick J. 
Wong --- configure.ac | 3 + include/builddefs.in | 3 + libxfs/Makefile | 12 +++ libxfs/xfile.c | 224 +++++++++++++++++++++++++++++++++++++++++++++++++ libxfs/xfile.h | 56 ++++++++++++ m4/package_libcdev.m4 | 50 +++++++++++ 6 files changed, 348 insertions(+) create mode 100644 libxfs/xfile.c create mode 100644 libxfs/xfile.h diff --git a/configure.ac b/configure.ac index 20445a98d84..6c704464061 100644 --- a/configure.ac +++ b/configure.ac @@ -251,6 +251,9 @@ AC_CHECK_SIZEOF([char *]) AC_TYPE_UMODE_T AC_MANUAL_FORMAT AC_HAVE_LIBURCU_ATOMIC64 +AC_HAVE_MEMFD_CLOEXEC +AC_HAVE_O_TMPFILE +AC_HAVE_MKOSTEMP_CLOEXEC AC_CONFIG_FILES([include/builddefs]) AC_OUTPUT diff --git a/include/builddefs.in b/include/builddefs.in index e0a2f3cbc95..60c1320af37 100644 --- a/include/builddefs.in +++ b/include/builddefs.in @@ -127,6 +127,9 @@ SYSTEMD_SYSTEM_UNIT_DIR = @systemd_system_unit_dir@ HAVE_CROND = @have_crond@ CROND_DIR = @crond_dir@ HAVE_LIBURCU_ATOMIC64 = @have_liburcu_atomic64@ +HAVE_MEMFD_CLOEXEC = @have_memfd_cloexec@ +HAVE_O_TMPFILE = @have_o_tmpfile@ +HAVE_MKOSTEMP_CLOEXEC = @have_mkostemp_cloexec@ GCCFLAGS = -funsigned-char -fno-strict-aliasing -Wall # -Wbitwise -Wno-transparent-union -Wno-old-initializer -Wno-decl diff --git a/libxfs/Makefile b/libxfs/Makefile index 010ee68e229..2007be570ed 100644 --- a/libxfs/Makefile +++ b/libxfs/Makefile @@ -26,6 +26,7 @@ HFILES = \ libxfs_priv.h \ linux-err.h \ topology.h \ + xfile.h \ xfs_ag_resv.h \ xfs_alloc.h \ xfs_alloc_btree.h \ @@ -65,6 +66,7 @@ CFILES = cache.c \ topology.c \ trans.c \ util.c \ + xfile.c \ xfs_ag.c \ xfs_ag_resv.c \ xfs_alloc.c \ @@ -111,6 +113,16 @@ CFILES = cache.c \ # #LCFLAGS += +ifeq ($(HAVE_MEMFD_CLOEXEC),yes) + LCFLAGS += -DHAVE_MEMFD_CLOEXEC +endif +ifeq ($(HAVE_O_TMPFILE),yes) + LCFLAGS += -DHAVE_O_TMPFILE +endif +ifeq ($(HAVE_MKOSTEMP_CLOEXEC),yes) + LCFLAGS += -DHAVE_MKOSTEMP_CLOEXEC +endif + FCFLAGS = -I. LTLIBS = $(LIBPTHREAD) $(LIBRT) diff --git a/libxfs/xfile.c b/libxfs/xfile.c new file mode 100644 index 00000000000..357ffb0077d --- /dev/null +++ b/libxfs/xfile.c @@ -0,0 +1,224 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "libxfs_priv.h" +#include "libxfs.h" +#include "libxfs/xfile.h" +#include +#include +#include + +/* + * Swappable Temporary Memory + * ========================== + * + * Offline checking sometimes needs to be able to stage a large amount of data + * in memory. This information might not fit in the available memory and it + * doesn't all need to be accessible at all times. In other words, we want an + * indexed data buffer to store data that can be paged out. + * + * memfd files meet those requirements. Therefore, the xfile mechanism uses + * one to store our staging data. The xfile must be freed with xfile_destroy. + * + * xfiles assume that the caller will handle all required concurrency + * management; file locks are not taken. + */ + +/* + * Open a memory-backed fd to back an xfile. We require close-on-exec here, + * because these memfd files function as windowed RAM and hence should never + * be shared with other processes. + */ +static int +xfile_create_fd( + const char *description) +{ + int fd = -1; + +#ifdef HAVE_MEMFD_CLOEXEC + /* memfd_create exists in kernel 3.17 (2014) and glibc 2.27 (2018). 
*/ + fd = memfd_create(description, MFD_CLOEXEC); + if (fd >= 0) + return fd; +#endif + +#ifdef HAVE_O_TMPFILE + /* + * O_TMPFILE exists as of kernel 3.11 (2013), which means that if we + * find it, we're pretty safe in assuming O_CLOEXEC exists too. + */ + fd = open("/dev/shm", O_TMPFILE | O_CLOEXEC | O_RDWR, 0600); + if (fd >= 0) + return fd; + + fd = open("/tmp", O_TMPFILE | O_CLOEXEC | O_RDWR, 0600); + if (fd >= 0) + return fd; +#endif + +#ifdef HAVE_MKOSTEMP_CLOEXEC + /* + * mkostemp exists as of glibc 2.7 (2007) and O_CLOEXEC exists as of + * kernel 2.6.23 (2007). + */ + fd = mkostemp("libxfsXXXXXX", O_CLOEXEC); + if (fd >= 0) + return fd; +#endif + +#if !defined(HAVE_MEMFD_CLOEXEC) && \ + !defined(HAVE_O_TMPFILE) && \ + !defined(HAVE_MKOSTEMP_CLOEXEC) +# error System needs memfd_create, O_TMPFILE, or O_CLOEXEC to build! +#endif + + return fd; +} + +/* + * Create an xfile of the given size. The description will be used in the + * trace output. + */ +int +xfile_create( + struct xfs_mount *mp, + const char *description, + struct xfile **xfilep) +{ + struct xfile *xf; + char fname[MAXNAMELEN]; + int error; + + snprintf(fname, MAXNAMELEN - 1, "XFS (%s): %s", mp->m_fsname, + description); + fname[MAXNAMELEN - 1] = 0; + + xf = kmem_alloc(sizeof(struct xfile), KM_MAYFAIL); + if (!xf) + return -ENOMEM; + + xf->fd = xfile_create_fd(fname); + if (xf->fd < 0) { + error = -errno; + kmem_free(xf); + return error; + } + + *xfilep = xf; + return 0; +} + +/* Close the file and release all resources. */ +void +xfile_destroy( + struct xfile *xf) +{ + close(xf->fd); + kmem_free(xf); +} + +static inline loff_t +xfile_maxbytes( + struct xfile *xf) +{ + if (sizeof(loff_t) == 8) + return LLONG_MAX; + return LONG_MAX; +} + +/* + * Read a memory object directly from the xfile's page cache. Unlike regular + * pread, we return -E2BIG and -EFBIG for reads that are too large or at too + * high an offset, instead of truncating the read. Otherwise, we return + * bytes read or an error code, like regular pread. + */ +ssize_t +xfile_pread( + struct xfile *xf, + void *buf, + size_t count, + loff_t pos) +{ + ssize_t ret; + + if (count > INT_MAX) + return -E2BIG; + if (xfile_maxbytes(xf) - pos < count) + return -EFBIG; + + ret = pread(xf->fd, buf, count, pos); + if (ret >= 0) + return ret; + return -errno; +} + +/* + * Write a memory object directly to the xfile's page cache. Unlike regular + * pwrite, we return -E2BIG and -EFBIG for writes that are too large or at too + * high an offset, instead of truncating the write. Otherwise, we return + * bytes written or an error code, like regular pwrite. + */ +ssize_t +xfile_pwrite( + struct xfile *xf, + void *buf, + size_t count, + loff_t pos) +{ + ssize_t ret; + + if (count > INT_MAX) + return -E2BIG; + if (xfile_maxbytes(xf) - pos < count) + return -EFBIG; + + ret = pwrite(xf->fd, buf, count, pos); + if (ret >= 0) + return ret; + return -errno; +} + +/* Query stat information for an xfile. */ +int +xfile_stat( + struct xfile *xf, + struct xfile_stat *statbuf) +{ + struct stat ks; + int error; + + error = fstat(xf->fd, &ks); + if (error) + return -errno; + + statbuf->size = ks.st_size; + statbuf->bytes = (unsigned long long)ks.st_blocks << 9; + return 0; +} + +/* Dump an xfile to stdout. */ +int +xfile_dump( + struct xfile *xf) +{ + char *argv[] = {"od", "-tx1", "-Ad", "-c", NULL}; + pid_t child; + int i; + + child = fork(); + if (child != 0) { + int wstatus; + + wait(&wstatus); + return wstatus == 0 ? 
0 : -EIO; + } + + /* reroute our xfile to stdin and shut everything else */ + dup2(xf->fd, 0); + for (i = 3; i < 1024; i++) + close(i); + + return execvp("od", argv); +} diff --git a/libxfs/xfile.h b/libxfs/xfile.h new file mode 100644 index 00000000000..ad13f62ee0f --- /dev/null +++ b/libxfs/xfile.h @@ -0,0 +1,56 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#ifndef __LIBXFS_XFILE_H__ +#define __LIBXFS_XFILE_H__ + +struct xfile { + int fd; +}; + +int xfile_create(struct xfs_mount *mp, const char *description, + struct xfile **xfilep); +void xfile_destroy(struct xfile *xf); + +ssize_t xfile_pread(struct xfile *xf, void *buf, size_t count, loff_t pos); +ssize_t xfile_pwrite(struct xfile *xf, void *buf, size_t count, loff_t pos); + +/* + * Load an object. Since we're treating this file as "memory", any error or + * short IO is treated as a failure to allocate memory. + */ +static inline int +xfile_obj_load(struct xfile *xf, void *buf, size_t count, loff_t pos) +{ + ssize_t ret = xfile_pread(xf, buf, count, pos); + + if (ret < 0 || ret != count) + return -ENOMEM; + return 0; +} + +/* + * Store an object. Since we're treating this file as "memory", any error or + * short IO is treated as a failure to allocate memory. + */ +static inline int +xfile_obj_store(struct xfile *xf, void *buf, size_t count, loff_t pos) +{ + ssize_t ret = xfile_pwrite(xf, buf, count, pos); + + if (ret < 0 || ret != count) + return -ENOMEM; + return 0; +} + +struct xfile_stat { + loff_t size; + unsigned long long bytes; +}; + +int xfile_stat(struct xfile *xf, struct xfile_stat *statbuf); +int xfile_dump(struct xfile *xf); + +#endif /* __LIBXFS_XFILE_H__ */ diff --git a/m4/package_libcdev.m4 b/m4/package_libcdev.m4 index bb1ab49c11e..119d1bda74d 100644 --- a/m4/package_libcdev.m4 +++ b/m4/package_libcdev.m4 @@ -507,3 +507,53 @@ AC_DEFUN([AC_PACKAGE_CHECK_LTO], AC_SUBST(lto_cflags) AC_SUBST(lto_ldflags) ]) + +# +# Check if we have a memfd_create syscall with a MFD_CLOEXEC flag +# +AC_DEFUN([AC_HAVE_MEMFD_CLOEXEC], + [ AC_MSG_CHECKING([for memfd_fd and MFD_CLOEXEC]) + AC_LINK_IFELSE([AC_LANG_PROGRAM([[ +#define _GNU_SOURCE +#include + ]], [[ + return memfd_create("xfs", MFD_CLOEXEC); + ]])],[have_memfd_cloexec=yes + AC_MSG_RESULT(yes)],[AC_MSG_RESULT(no)]) + AC_SUBST(have_memfd_cloexec) + ]) + +# +# Check if we have the O_TMPFILE flag +# +AC_DEFUN([AC_HAVE_O_TMPFILE], + [ AC_MSG_CHECKING([for O_TMPFILE]) + AC_LINK_IFELSE([AC_LANG_PROGRAM([[ +#define _GNU_SOURCE +#include +#include +#include + ]], [[ + return open("nowhere", O_TMPFILE, 0600); + ]])],[have_o_tmpfile=yes + AC_MSG_RESULT(yes)],[AC_MSG_RESULT(no)]) + AC_SUBST(have_o_tmpfile) + ]) + +# +# Check if we have mkostemp with the O_CLOEXEC flag +# +AC_DEFUN([AC_HAVE_MKOSTEMP_CLOEXEC], + [ AC_MSG_CHECKING([for mkostemp and O_CLOEXEC]) + AC_LINK_IFELSE([AC_LANG_PROGRAM([[ +#define _GNU_SOURCE +#include +#include +#include +#include + ]], [[ + return mkostemp("nowhere", O_TMPFILE); + ]])],[have_mkostemp_cloexec=yes + AC_MSG_RESULT(yes)],[AC_MSG_RESULT(no)]) + AC_SUBST(have_mkostemp_cloexec) + ]) From patchwork Fri Dec 30 22:17:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13085071 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 819C6C4332F for ; Sat, 31 Dec 2022 00:12:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229758AbiLaAMQ (ORCPT ); Fri, 30 Dec 2022 19:12:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55056 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235655AbiLaAMP (ORCPT ); Fri, 30 Dec 2022 19:12:15 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 400AECE00 for ; Fri, 30 Dec 2022 16:12:13 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 9341361CE5 for ; Sat, 31 Dec 2022 00:12:12 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id EE572C433EF; Sat, 31 Dec 2022 00:12:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672445532; bh=7fp1bwsJ5DMIExHg+2ao4tL1B+EBVFR+AQUbYRcF5RU=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=EQp2Jyu/gE1M6Xo35zpOi1eD1lDA+n1ZqTFppYcnQasXvdeKSEe0Jdj5HdRqqXrhp iJGA7jjqnE1EblFa0p0FBmwgvAXC3KYP7N/h4i0RqkJBsP/sIDlEITEw3mBNsZdamD LOSjYZCzsfR6Zh8RiKeWCBYMoClQG3bNMLoIb2h4l8LGIsvG6+ZA2LegQQp2+roVRC BTwMBkiNmckfNbH5XJVHJnU5p5gznGodbSij9y89whLiztpgvt+2RWc6V2y7yMQGIl whh22dgUOYlhoGnMI9FavRy6k7NZt1jGqOa2d1SCLe8+MnojHTGDvwZrjFIh0izjZt JD6lU5hikeB7Q== Subject: [PATCH 4/9] libxfs: support in-memory buffer cache targets From: "Darrick J. Wong" To: cem@kernel.org, djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:17:42 -0800 Message-ID: <167243866211.711834.5799622851053030124.stgit@magnolia> In-Reply-To: <167243866153.711834.17585439086893346840.stgit@magnolia> References: <167243866153.711834.17585439086893346840.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong Allow the buffer cache to target in-memory files by connecting it to xfiles. Signed-off-by: Darrick J. Wong --- libxfs/libxfs_io.h | 14 +++++++++++++- libxfs/rdwr.c | 47 +++++++++++++++++++++++++++++++++++++++++++---- 2 files changed, 56 insertions(+), 5 deletions(-) diff --git a/libxfs/libxfs_io.h b/libxfs/libxfs_io.h index 3fa9e75dcaa..c002ef058ec 100644 --- a/libxfs/libxfs_io.h +++ b/libxfs/libxfs_io.h @@ -24,7 +24,10 @@ struct xfs_buftarg { struct xfs_mount *bt_mount; pthread_mutex_t lock; unsigned long writes_left; - dev_t bt_bdev; + union { + struct xfile *bt_xfile; + dev_t bt_bdev; + }; unsigned int flags; struct cache *bcache; /* global buffer cache */ }; @@ -37,6 +40,15 @@ struct xfs_buftarg { #define XFS_BUFTARG_INJECT_WRITE_FAIL (1 << 2) /* purge buffers when lookups find a size mismatch */ #define XFS_BUFTARG_MISCOMPARE_PURGE (1 << 3) +/* use xfile for */ +#define XFS_BUFTARG_IN_MEMORY (1 << 4) + +static inline bool +xfs_buftarg_in_memory( + struct xfs_buftarg *btp) +{ + return btp->flags & XFS_BUFTARG_IN_MEMORY; +} /* Simulate the system crashing after a certain number of writes. 
*/ static inline void diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c index 5d63ec4f6de..9d36698bb5c 100644 --- a/libxfs/rdwr.c +++ b/libxfs/rdwr.c @@ -18,7 +18,7 @@ #include "xfs_inode.h" #include "xfs_trans.h" #include "libfrog/platform.h" - +#include "libxfs/xfile.h" #include "libxfs.h" static void libxfs_brelse(struct cache_node *node); @@ -68,6 +68,9 @@ libxfs_device_zero(struct xfs_buftarg *btp, xfs_daddr_t start, uint len) char *z; int error, fd; + if (btp->flags & XFS_BUFTARG_IN_MEMORY) + return -EOPNOTSUPP; + fd = libxfs_device_to_fd(btp->bt_bdev); start_offset = LIBXFS_BBTOOFF64(start); @@ -578,6 +581,31 @@ libxfs_balloc( return &bp->b_node; } +static inline int +libxfs_buf_ioapply_in_memory( + struct xfs_buf *bp, + bool is_write) +{ + struct xfile *xfile = bp->b_target->bt_xfile; + loff_t pos = BBTOB(xfs_buf_daddr(bp)); + size_t size = BBTOB(bp->b_length); + int error; + + if (bp->b_nmaps > 1) { + /* We don't need or support multi-map buffers. */ + ASSERT(0); + error = -EIO; + } else if (is_write) { + error = xfile_obj_store(xfile, bp->b_addr, size, pos); + } else { + error = xfile_obj_load(xfile, bp->b_addr, size, pos); + } + if (error) + bp->b_error = error; + else if (!is_write) + bp->b_flags |= LIBXFS_B_UPTODATE; + return error; +} static int __read_buf(int fd, void *buf, int len, off64_t offset, int flags) @@ -602,12 +630,16 @@ int libxfs_readbufr(struct xfs_buftarg *btp, xfs_daddr_t blkno, struct xfs_buf *bp, int len, int flags) { - int fd = libxfs_device_to_fd(btp->bt_bdev); + int fd; int bytes = BBTOB(len); int error; ASSERT(len <= bp->b_length); + if (bp->b_target->flags & XFS_BUFTARG_IN_MEMORY) + return libxfs_buf_ioapply_in_memory(bp, false); + + fd = libxfs_device_to_fd(btp->bt_bdev); error = __read_buf(fd, bp->b_addr, bytes, LIBXFS_BBTOOFF64(blkno), flags); if (!error && bp->b_target->bt_bdev == btp->bt_bdev && @@ -640,6 +672,9 @@ libxfs_readbufr_map(struct xfs_buftarg *btp, struct xfs_buf *bp, int flags) void *buf; int i; + if (bp->b_target->flags & XFS_BUFTARG_IN_MEMORY) + return libxfs_buf_ioapply_in_memory(bp, false); + fd = libxfs_device_to_fd(btp->bt_bdev); buf = bp->b_addr; for (i = 0; i < bp->b_nmaps; i++) { @@ -824,7 +859,7 @@ int libxfs_bwrite( struct xfs_buf *bp) { - int fd = libxfs_device_to_fd(bp->b_target->bt_bdev); + int fd; /* * we never write buffers that are marked stale. This indicates they @@ -859,7 +894,10 @@ libxfs_bwrite( } } - if (!(bp->b_flags & LIBXFS_B_DISCONTIG)) { + if (bp->b_target->flags & XFS_BUFTARG_IN_MEMORY) { + libxfs_buf_ioapply_in_memory(bp, true); + } else if (!(bp->b_flags & LIBXFS_B_DISCONTIG)) { + fd = libxfs_device_to_fd(bp->b_target->bt_bdev); bp->b_error = __write_buf(fd, bp->b_addr, BBTOB(bp->b_length), LIBXFS_BBTOOFF64(xfs_buf_daddr(bp)), bp->b_flags); @@ -867,6 +905,7 @@ libxfs_bwrite( int i; void *buf = bp->b_addr; + fd = libxfs_device_to_fd(bp->b_target->bt_bdev); for (i = 0; i < bp->b_nmaps; i++) { off64_t offset = LIBXFS_BBTOOFF64(bp->b_maps[i].bm_bn); int len = BBTOB(bp->b_maps[i].bm_len); From patchwork Fri Dec 30 22:17:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13085072 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 57C6FC4332F for ; Sat, 31 Dec 2022 00:12:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235398AbiLaAMc (ORCPT ); Fri, 30 Dec 2022 19:12:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55164 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229994AbiLaAMb (ORCPT ); Fri, 30 Dec 2022 19:12:31 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3E3D9102E for ; Fri, 30 Dec 2022 16:12:30 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 00DC9B81E07 for ; Sat, 31 Dec 2022 00:12:29 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id ABCF9C433D2; Sat, 31 Dec 2022 00:12:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672445547; bh=xetoTG19L9IIBcjn1e0GZ9FHC/r7WxpFxy2COu9FuL0=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=hyadrQeg3lvtvghts8SMP8nuKflPao9Af8aSKdyfOPdA9mvzxsViNy3WKUeU0KZzn HOlSWR+k8uHmp0l/IR5rqR/AeC3b8yRxKjsYI5HjPYar5GatAW82RoYnD3G9zXzn9b k95o9eiVKk9aENftqhGwQhajch0RAhsC/PW5JuXlAa1ZcvNIaVg1SUE6txEIrZLgUk 1NR6pYeNDniWIlDwiI/ElFnZQtpa7BnHugkuGH/uvOZjCjxf4mmUoGA7Vt2LjIDxqz xiXsIul++K9gdz3WLw/tph4sBOo1p52QSvJ89O7pREL5o+/vpTWW0vgohNNknEMuBt QFXIl6YJ0YwAA== Subject: [PATCH 5/9] xfs: consolidate btree block freeing tracepoints From: "Darrick J. Wong" To: cem@kernel.org, djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:17:42 -0800 Message-ID: <167243866224.711834.5671254620893854013.stgit@magnolia> In-Reply-To: <167243866153.711834.17585439086893346840.stgit@magnolia> References: <167243866153.711834.17585439086893346840.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong Don't waste tracepoint segment memory on per-btree block freeing tracepoints when we can do it from the generic btree code. Signed-off-by: Darrick J. Wong --- include/xfs_trace.h | 3 +-- libxfs/xfs_btree.c | 2 ++ libxfs/xfs_refcount_btree.c | 2 -- libxfs/xfs_rmap_btree.c | 2 -- 4 files changed, 3 insertions(+), 6 deletions(-) diff --git a/include/xfs_trace.h b/include/xfs_trace.h index 19b05f6e25e..0a7581b5794 100644 --- a/include/xfs_trace.h +++ b/include/xfs_trace.h @@ -61,6 +61,7 @@ #define trace_xfs_btree_commit_ifakeroot(a) ((void) 0) #define trace_xfs_btree_bload_level_geometry(a,b,c,d,e,f,g) ((void) 0) #define trace_xfs_btree_bload_block(a,b,c,d,e,f) ((void) 0) +#define trace_xfs_btree_free_block(...) ((void) 0) #define trace_xfs_free_extent(a,b,c,d,e,f,g) ((void) 0) #define trace_xfs_agf(a,b,c,d) ((void) 0) @@ -243,7 +244,6 @@ #define trace_xfs_rmap_find_left_neighbor_result(...) ((void) 0) #define trace_xfs_rmap_lookup_le_range_result(...) ((void) 0) -#define trace_xfs_rmapbt_free_block(...) ((void) 0) #define trace_xfs_rmapbt_alloc_block(...) ((void) 0) #define trace_xfs_ag_resv_critical(...) ((void) 0) @@ -263,7 +263,6 @@ #define trace_xfs_refcount_insert_error(...) 
((void) 0) #define trace_xfs_refcount_delete(...) ((void) 0) #define trace_xfs_refcount_delete_error(...) ((void) 0) -#define trace_xfs_refcountbt_free_block(...) ((void) 0) #define trace_xfs_refcountbt_alloc_block(...) ((void) 0) #define trace_xfs_refcount_rec_order_error(...) ((void) 0) diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c index e0b9f075015..d7501da87ce 100644 --- a/libxfs/xfs_btree.c +++ b/libxfs/xfs_btree.c @@ -411,6 +411,8 @@ xfs_btree_free_block( { int error; + trace_xfs_btree_free_block(cur, bp); + error = cur->bc_ops->free_block(cur, bp); if (!error) { xfs_trans_binval(cur->bc_tp, bp); diff --git a/libxfs/xfs_refcount_btree.c b/libxfs/xfs_refcount_btree.c index 0a8e80e705f..c1dd2fe8d37 100644 --- a/libxfs/xfs_refcount_btree.c +++ b/libxfs/xfs_refcount_btree.c @@ -107,8 +107,6 @@ xfs_refcountbt_free_block( xfs_fsblock_t fsbno = XFS_DADDR_TO_FSB(mp, xfs_buf_daddr(bp)); int error; - trace_xfs_refcountbt_free_block(cur->bc_mp, cur->bc_ag.pag->pag_agno, - XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno), 1); be32_add_cpu(&agf->agf_refcount_blocks, -1); xfs_alloc_log_agf(cur->bc_tp, agbp, XFS_AGF_REFCOUNT_BLOCKS); error = xfs_free_extent(cur->bc_tp, cur->bc_ag.pag, diff --git a/libxfs/xfs_rmap_btree.c b/libxfs/xfs_rmap_btree.c index e8ffb23be42..36f6714ed3f 100644 --- a/libxfs/xfs_rmap_btree.c +++ b/libxfs/xfs_rmap_btree.c @@ -123,8 +123,6 @@ xfs_rmapbt_free_block( int error; bno = xfs_daddr_to_agbno(cur->bc_mp, xfs_buf_daddr(bp)); - trace_xfs_rmapbt_free_block(cur->bc_mp, pag->pag_agno, - bno, 1); be32_add_cpu(&agf->agf_rmap_blocks, -1); xfs_alloc_log_agf(cur->bc_tp, agbp, XFS_AGF_RMAP_BLOCKS); error = xfs_alloc_put_freelist(pag, cur->bc_tp, agbp, NULL, bno, 1); From patchwork Fri Dec 30 22:17:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13085073 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 713D0C4332F for ; Sat, 31 Dec 2022 00:12:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229994AbiLaAMq (ORCPT ); Fri, 30 Dec 2022 19:12:46 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55182 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235655AbiLaAMp (ORCPT ); Fri, 30 Dec 2022 19:12:45 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 666F760EF for ; Fri, 30 Dec 2022 16:12:44 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 021BC61CE3 for ; Sat, 31 Dec 2022 00:12:44 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 62D8EC433D2; Sat, 31 Dec 2022 00:12:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672445563; bh=cgz9gijUdM5wAlvrjoqIXTZS9ZyLOGqAJiH75qW4eps=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=lz+xyrG3HPaD70UqueqVKg91e4zUklE20+N3zJ0vAgJu/7Sq7YC6b42QpQPgf0b8z vko6eBqcWX2Dttx/7eOmpce11sZGQaFNV01Xoin/gFpgylAOLl45mpQiRMJMw4nJw1 JkBIDyouhS1F2PM/q7NrwIfgreNgnnCzE1dMYf3WiOnBZ2aji6/C/0op3jlDpwTZ19 CfZ1ejPB3bn2ZsFn47W+XDMQX25xVdA3ALvW3gAx4oVg/bjI2FPZgc298PIhYsIx04 t+JpvmNqMG0CFgcwhFQI4twzqkiItG4r8gMOwWdIDAo+uQCq82FHtzQcuQ8G2jXw// m6HOTdhl6o2Xg== Subject: [PATCH 6/9] xfs: consolidate btree block allocation tracepoints From: "Darrick J. Wong" To: cem@kernel.org, djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:17:42 -0800 Message-ID: <167243866237.711834.18080247135876736641.stgit@magnolia> In-Reply-To: <167243866153.711834.17585439086893346840.stgit@magnolia> References: <167243866153.711834.17585439086893346840.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong Don't waste tracepoint segment memory on per-btree block allocation tracepoints when we can do it from the generic btree code. Signed-off-by: Darrick J. Wong --- include/xfs_trace.h | 4 +--- libxfs/xfs_btree.c | 20 +++++++++++++++++--- libxfs/xfs_refcount_btree.c | 2 -- libxfs/xfs_rmap_btree.c | 2 -- 4 files changed, 18 insertions(+), 10 deletions(-) diff --git a/include/xfs_trace.h b/include/xfs_trace.h index 0a7581b5794..3ca6cda253c 100644 --- a/include/xfs_trace.h +++ b/include/xfs_trace.h @@ -62,6 +62,7 @@ #define trace_xfs_btree_bload_level_geometry(a,b,c,d,e,f,g) ((void) 0) #define trace_xfs_btree_bload_block(a,b,c,d,e,f) ((void) 0) #define trace_xfs_btree_free_block(...) ((void) 0) +#define trace_xfs_btree_alloc_block(...) ((void) 0) #define trace_xfs_free_extent(a,b,c,d,e,f,g) ((void) 0) #define trace_xfs_agf(a,b,c,d) ((void) 0) @@ -244,8 +245,6 @@ #define trace_xfs_rmap_find_left_neighbor_result(...) ((void) 0) #define trace_xfs_rmap_lookup_le_range_result(...) ((void) 0) -#define trace_xfs_rmapbt_alloc_block(...) ((void) 0) - #define trace_xfs_ag_resv_critical(...) ((void) 0) #define trace_xfs_ag_resv_needed(...) ((void) 0) #define trace_xfs_ag_resv_free(...) 
((void) 0) @@ -263,7 +262,6 @@ #define trace_xfs_refcount_insert_error(...) ((void) 0) #define trace_xfs_refcount_delete(...) ((void) 0) #define trace_xfs_refcount_delete_error(...) ((void) 0) -#define trace_xfs_refcountbt_alloc_block(...) ((void) 0) #define trace_xfs_refcount_rec_order_error(...) ((void) 0) #define trace_xfs_refcount_lookup(...) ((void) 0) diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c index d7501da87ce..cd722d1c830 100644 --- a/libxfs/xfs_btree.c +++ b/libxfs/xfs_btree.c @@ -2690,6 +2690,20 @@ xfs_btree_rshift( return error; } +static inline int +xfs_btree_alloc_block( + struct xfs_btree_cur *cur, + const union xfs_btree_ptr *hint_block, + union xfs_btree_ptr *new_block, + int *stat) +{ + int error; + + error = cur->bc_ops->alloc_block(cur, hint_block, new_block, stat); + trace_xfs_btree_alloc_block(cur, new_block, *stat, error); + return error; +} + /* * Split cur/level block in half. * Return new block number and the key to its first @@ -2733,7 +2747,7 @@ __xfs_btree_split( xfs_btree_buf_to_ptr(cur, lbp, &lptr); /* Allocate the new block. If we can't do it, we're toast. Give up. */ - error = cur->bc_ops->alloc_block(cur, &lptr, &rptr, stat); + error = xfs_btree_alloc_block(cur, &lptr, &rptr, stat); if (error) goto error0; if (*stat == 0) @@ -2999,7 +3013,7 @@ xfs_btree_new_iroot( pp = xfs_btree_ptr_addr(cur, 1, block); /* Allocate the new block. If we can't do it, we're toast. Give up. */ - error = cur->bc_ops->alloc_block(cur, pp, &nptr, stat); + error = xfs_btree_alloc_block(cur, pp, &nptr, stat); if (error) goto error0; if (*stat == 0) @@ -3099,7 +3113,7 @@ xfs_btree_new_root( cur->bc_ops->init_ptr_from_cur(cur, &rptr); /* Allocate the new block. If we can't do it, we're toast. Give up. */ - error = cur->bc_ops->alloc_block(cur, &rptr, &lptr, stat); + error = xfs_btree_alloc_block(cur, &rptr, &lptr, stat); if (error) goto error0; if (*stat == 0) diff --git a/libxfs/xfs_refcount_btree.c b/libxfs/xfs_refcount_btree.c index c1dd2fe8d37..ec30077bd49 100644 --- a/libxfs/xfs_refcount_btree.c +++ b/libxfs/xfs_refcount_btree.c @@ -76,8 +76,6 @@ xfs_refcountbt_alloc_block( error = xfs_alloc_vextent(&args); if (error) goto out_error; - trace_xfs_refcountbt_alloc_block(cur->bc_mp, cur->bc_ag.pag->pag_agno, - args.agbno, 1); if (args.fsbno == NULLFSBLOCK) { *stat = 0; return 0; diff --git a/libxfs/xfs_rmap_btree.c b/libxfs/xfs_rmap_btree.c index 36f6714ed3f..928f61053b0 100644 --- a/libxfs/xfs_rmap_btree.c +++ b/libxfs/xfs_rmap_btree.c @@ -92,8 +92,6 @@ xfs_rmapbt_alloc_block( &bno, 1); if (error) return error; - - trace_xfs_rmapbt_alloc_block(cur->bc_mp, pag->pag_agno, bno, 1); if (bno == NULLAGBLOCK) { *stat = 0; return 0; From patchwork Fri Dec 30 22:17:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13085074 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9F08AC4332F for ; Sat, 31 Dec 2022 00:13:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235693AbiLaAND (ORCPT ); Fri, 30 Dec 2022 19:13:03 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55218 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235655AbiLaANC (ORCPT ); Fri, 30 Dec 2022 19:13:02 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 74BA2B48F for ; Fri, 30 Dec 2022 16:13:00 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 011C061CE8 for ; Sat, 31 Dec 2022 00:13:00 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1C15BC433D2; Sat, 31 Dec 2022 00:12:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672445579; bh=FrPCngJLgut+729qY7WY89dy4W86avaXIUQyCqav7Og=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=Y2ArWrRhMHeunsW4/J4YzSyn48rMhvk6htLLSJQnc5VNciuA4XN4mxUPboyKXw/sE eXJ5Y3YH+EJXXyrUyMWLkU+Dqmodbh7eLHZ82BAcwjWVud0qbvuo0Es5BqwX92xSQL hVisnItH/caFIY36dqgFLtHdDOygi5vQcnXNMEnu6NlbCTBZEgl9j9JRaMRvj2plDh p3kkK1H5Rfu5Ne1vVoXGtCByN68eeh2UmBmHUcfw+BKIaE/GSSXeMnHue6hcT1vHX9 /hHqfeOjtoDLYCXF/sCcckB24n20+SDN4LIlJQX/Gv33IDIqNBIeGMNtoXkZoSC3pc GKbN+tJsXsNGg== Subject: [PATCH 7/9] xfs: support in-memory btrees From: "Darrick J. Wong" To: cem@kernel.org, djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:17:42 -0800 Message-ID: <167243866250.711834.16040353400926127990.stgit@magnolia> In-Reply-To: <167243866153.711834.17585439086893346840.stgit@magnolia> References: <167243866153.711834.17585439086893346840.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong Adapt the generic btree cursor code to be able to create a btree whose buffers come from a (presumably in-memory) buftarg with a header block that's specific to in-memory btrees. We'll connect this to other parts of online scrub in the next patches. Note that in-memory btrees always have a block size matching the system memory page size for efficiency reasons. Signed-off-by: Darrick J. 
Wong --- include/libxfs.h | 2 libxfs/Makefile | 3 libxfs/init.c | 3 libxfs/libxfs_io.h | 10 + libxfs/libxfs_priv.h | 2 libxfs/rdwr.c | 30 ++++ libxfs/xfbtree.c | 343 ++++++++++++++++++++++++++++++++++++++++++++++++ libxfs/xfbtree.h | 36 +++++ libxfs/xfile.c | 18 +++ libxfs/xfile.h | 50 +++++++ libxfs/xfs_btree.c | 151 +++++++++++++++++---- libxfs/xfs_btree.h | 17 ++ libxfs/xfs_btree_mem.h | 87 ++++++++++++ 13 files changed, 724 insertions(+), 28 deletions(-) create mode 100644 libxfs/xfbtree.c create mode 100644 libxfs/xfbtree.h create mode 100644 libxfs/xfs_btree_mem.h diff --git a/include/libxfs.h b/include/libxfs.h index b07da6c03ee..887f57b6171 100644 --- a/include/libxfs.h +++ b/include/libxfs.h @@ -7,6 +7,8 @@ #ifndef __LIBXFS_H__ #define __LIBXFS_H__ +#define CONFIG_XFS_IN_MEMORY_BTREE + #include "libxfs_api_defs.h" #include "platform_defs.h" #include "xfs.h" diff --git a/libxfs/Makefile b/libxfs/Makefile index 2007be570ed..b4aa9706aaa 100644 --- a/libxfs/Makefile +++ b/libxfs/Makefile @@ -26,6 +26,7 @@ HFILES = \ libxfs_priv.h \ linux-err.h \ topology.h \ + xfbtree.h \ xfile.h \ xfs_ag_resv.h \ xfs_alloc.h \ @@ -36,6 +37,7 @@ HFILES = \ xfs_bmap.h \ xfs_bmap_btree.h \ xfs_btree.h \ + xfs_btree_mem.h \ xfs_btree_staging.h \ xfs_attr_remote.h \ xfs_cksum.h \ @@ -66,6 +68,7 @@ CFILES = cache.c \ topology.c \ trans.c \ util.c \ + xfbtree.c \ xfile.c \ xfs_ag.c \ xfs_ag_resv.c \ diff --git a/libxfs/init.c b/libxfs/init.c index 5e90bf733b7..676c6fbd6d2 100644 --- a/libxfs/init.c +++ b/libxfs/init.c @@ -22,6 +22,7 @@ #include "xfs_rmap_btree.h" #include "xfs_refcount_btree.h" #include "libfrog/platform.h" +#include "xfile.h" #include "libxfs.h" /* for now */ @@ -321,6 +322,8 @@ libxfs_init(libxfs_init_t *a) a->dsize = a->lbsize = a->rtbsize = 0; a->dbsize = a->logBBsize = a->logBBstart = a->rtsize = 0; + xfile_libinit(); + fd = -1; flags = (a->isreadonly | a->isdirect); diff --git a/libxfs/libxfs_io.h b/libxfs/libxfs_io.h index c002ef058ec..fb536c1c3c9 100644 --- a/libxfs/libxfs_io.h +++ b/libxfs/libxfs_io.h @@ -271,4 +271,14 @@ xfs_buf_delwri_queue_here(struct xfs_buf *bp, struct list_head *buffer_list) int xfs_buf_delwri_submit(struct list_head *buffer_list); void xfs_buf_delwri_cancel(struct list_head *list); +xfs_daddr_t xfs_buftarg_nr_sectors(struct xfs_buftarg *btp); + +static inline bool +xfs_buftarg_verify_daddr( + struct xfs_buftarg *btp, + xfs_daddr_t daddr) +{ + return daddr < xfs_buftarg_nr_sectors(btp); +} + #endif /* __LIBXFS_IO_H__ */ diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h index 8cfdc3e295a..f205d31a305 100644 --- a/libxfs/libxfs_priv.h +++ b/libxfs/libxfs_priv.h @@ -37,6 +37,8 @@ #ifndef __LIBXFS_INTERNAL_XFS_H__ #define __LIBXFS_INTERNAL_XFS_H__ +#define CONFIG_XFS_IN_MEMORY_BTREE + #include "libxfs_api_defs.h" #include "platform_defs.h" #include "xfs.h" diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c index 9d36698bb5c..c2dbc51f3f2 100644 --- a/libxfs/rdwr.c +++ b/libxfs/rdwr.c @@ -1544,3 +1544,33 @@ __xfs_buf_mark_corrupt( xfs_buf_corruption_error(bp, fa); xfs_buf_stale(bp); } + +/* Return the number of sectors for a buffer target. 
*/ +xfs_daddr_t +xfs_buftarg_nr_sectors( + struct xfs_buftarg *btp) +{ + struct stat sb; + int fd; + int ret; + + if (btp->flags & XFS_BUFTARG_IN_MEMORY) + return xfile_size(btp->bt_xfile) >> BBSHIFT; + + fd = libxfs_device_to_fd(btp->bt_bdev); + ret = fstat(fd, &sb); + if (ret) + return 0; + + if (S_ISBLK(sb.st_mode)) { + uint64_t sz; + + ret = ioctl(fd, BLKGETSIZE64, &sz); + if (ret) + return 0; + + return sz >> BBSHIFT; + } + + return sb.st_size >> BBSHIFT; +} diff --git a/libxfs/xfbtree.c b/libxfs/xfbtree.c new file mode 100644 index 00000000000..0481e9ed9f4 --- /dev/null +++ b/libxfs/xfbtree.c @@ -0,0 +1,343 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "libxfs_priv.h" +#include "libxfs.h" +#include "xfile.h" +#include "xfbtree.h" +#include "xfs_btree_mem.h" + +/* btree ops functions for in-memory btrees. */ + +static xfs_failaddr_t +xfs_btree_mem_head_verify( + struct xfs_buf *bp) +{ + struct xfs_btree_mem_head *mhead = bp->b_addr; + struct xfs_mount *mp = bp->b_mount; + + if (!xfs_verify_magic(bp, mhead->mh_magic)) + return __this_address; + if (be32_to_cpu(mhead->mh_nlevels) == 0) + return __this_address; + if (!uuid_equal(&mhead->mh_uuid, &mp->m_sb.sb_meta_uuid)) + return __this_address; + + return NULL; +} + +static void +xfs_btree_mem_head_read_verify( + struct xfs_buf *bp) +{ + xfs_failaddr_t fa = xfs_btree_mem_head_verify(bp); + + if (fa) + xfs_verifier_error(bp, -EFSCORRUPTED, fa); +} + +static void +xfs_btree_mem_head_write_verify( + struct xfs_buf *bp) +{ + xfs_failaddr_t fa = xfs_btree_mem_head_verify(bp); + + if (fa) + xfs_verifier_error(bp, -EFSCORRUPTED, fa); +} + +static const struct xfs_buf_ops xfs_btree_mem_head_buf_ops = { + .name = "xfs_btree_mem_head", + .magic = { cpu_to_be32(XFS_BTREE_MEM_HEAD_MAGIC), + cpu_to_be32(XFS_BTREE_MEM_HEAD_MAGIC) }, + .verify_read = xfs_btree_mem_head_read_verify, + .verify_write = xfs_btree_mem_head_write_verify, + .verify_struct = xfs_btree_mem_head_verify, +}; + +/* Initialize the header block for an in-memory btree. */ +static inline void +xfs_btree_mem_head_init( + struct xfs_buf *head_bp, + unsigned long long owner, + xfileoff_t leaf_xfoff) +{ + struct xfs_btree_mem_head *mhead = head_bp->b_addr; + struct xfs_mount *mp = head_bp->b_mount; + + mhead->mh_magic = cpu_to_be32(XFS_BTREE_MEM_HEAD_MAGIC); + mhead->mh_nlevels = cpu_to_be32(1); + mhead->mh_owner = cpu_to_be64(owner); + mhead->mh_root = cpu_to_be64(leaf_xfoff); + uuid_copy(&mhead->mh_uuid, &mp->m_sb.sb_meta_uuid); + + head_bp->b_ops = &xfs_btree_mem_head_buf_ops; +} + +/* Return tree height from the in-memory btree head. */ +unsigned int +xfs_btree_mem_head_nlevels( + struct xfs_buf *head_bp) +{ + struct xfs_btree_mem_head *mhead = head_bp->b_addr; + + return be32_to_cpu(mhead->mh_nlevels); +} + +/* Extract the buftarg target for this xfile btree. */ +struct xfs_buftarg * +xfbtree_target(struct xfbtree *xfbtree) +{ + return xfbtree->target; +} + +/* Is this daddr (sector offset) contained within the buffer target? */ +static inline bool +xfbtree_verify_buftarg_xfileoff( + struct xfs_buftarg *btp, + xfileoff_t xfoff) +{ + xfs_daddr_t xfoff_daddr = xfo_to_daddr(xfoff); + + return xfs_buftarg_verify_daddr(btp, xfoff_daddr); +} + +/* Is this btree xfile offset contained within the xfile? 
*/ +bool +xfbtree_verify_xfileoff( + struct xfs_btree_cur *cur, + unsigned long long xfoff) +{ + struct xfs_buftarg *btp = xfbtree_target(cur->bc_mem.xfbtree); + + return xfbtree_verify_buftarg_xfileoff(btp, xfoff); +} + +/* Check if a btree pointer is reasonable. */ +int +xfbtree_check_ptr( + struct xfs_btree_cur *cur, + const union xfs_btree_ptr *ptr, + int index, + int level) +{ + xfileoff_t bt_xfoff; + xfs_failaddr_t fa = NULL; + + ASSERT(cur->bc_flags & XFS_BTREE_IN_MEMORY); + + if (cur->bc_flags & XFS_BTREE_LONG_PTRS) + bt_xfoff = be64_to_cpu(ptr->l); + else + bt_xfoff = be32_to_cpu(ptr->s); + + if (!xfbtree_verify_xfileoff(cur, bt_xfoff)) + fa = __this_address; + + if (fa) { + xfs_err(cur->bc_mp, +"In-memory: Corrupt btree %d flags 0x%x pointer at level %d index %d fa %pS.", + cur->bc_btnum, cur->bc_flags, level, index, + fa); + return -EFSCORRUPTED; + } + return 0; +} + +/* Convert a btree pointer to a daddr */ +xfs_daddr_t +xfbtree_ptr_to_daddr( + struct xfs_btree_cur *cur, + const union xfs_btree_ptr *ptr) +{ + xfileoff_t bt_xfoff; + + if (cur->bc_flags & XFS_BTREE_LONG_PTRS) + bt_xfoff = be64_to_cpu(ptr->l); + else + bt_xfoff = be32_to_cpu(ptr->s); + return xfo_to_daddr(bt_xfoff); +} + +/* Set the pointer to point to this buffer. */ +void +xfbtree_buf_to_ptr( + struct xfs_btree_cur *cur, + struct xfs_buf *bp, + union xfs_btree_ptr *ptr) +{ + xfileoff_t xfoff = xfs_daddr_to_xfo(xfs_buf_daddr(bp)); + + if (cur->bc_flags & XFS_BTREE_LONG_PTRS) + ptr->l = cpu_to_be64(xfoff); + else + ptr->s = cpu_to_be32(xfoff); +} + +/* Return the in-memory btree block size, in units of 512 bytes. */ +unsigned int xfbtree_bbsize(void) +{ + return xfo_to_daddr(1); +} + +/* Set the root of an in-memory btree. */ +void +xfbtree_set_root( + struct xfs_btree_cur *cur, + const union xfs_btree_ptr *ptr, + int inc) +{ + struct xfs_buf *head_bp = cur->bc_mem.head_bp; + struct xfs_btree_mem_head *mhead = head_bp->b_addr; + + ASSERT(cur->bc_flags & XFS_BTREE_IN_MEMORY); + + if (cur->bc_flags & XFS_BTREE_LONG_PTRS) { + mhead->mh_root = ptr->l; + } else { + uint32_t root = be32_to_cpu(ptr->s); + + mhead->mh_root = cpu_to_be64(root); + } + be32_add_cpu(&mhead->mh_nlevels, inc); + xfs_trans_log_buf(cur->bc_tp, head_bp, 0, sizeof(*mhead) - 1); +} + +/* Initialize a pointer from the in-memory btree header. */ +void +xfbtree_init_ptr_from_cur( + struct xfs_btree_cur *cur, + union xfs_btree_ptr *ptr) +{ + struct xfs_buf *head_bp = cur->bc_mem.head_bp; + struct xfs_btree_mem_head *mhead = head_bp->b_addr; + + ASSERT(cur->bc_flags & XFS_BTREE_IN_MEMORY); + + if (cur->bc_flags & XFS_BTREE_LONG_PTRS) { + ptr->l = mhead->mh_root; + } else { + uint64_t root = be64_to_cpu(mhead->mh_root); + + ptr->s = cpu_to_be32(root); + } +} + +/* Duplicate an in-memory btree cursor. */ +struct xfs_btree_cur * +xfbtree_dup_cursor( + struct xfs_btree_cur *cur) +{ + struct xfs_btree_cur *ncur; + + ASSERT(cur->bc_flags & XFS_BTREE_IN_MEMORY); + + ncur = xfs_btree_alloc_cursor(cur->bc_mp, cur->bc_tp, cur->bc_btnum, + cur->bc_maxlevels, cur->bc_cache); + ncur->bc_flags = cur->bc_flags; + ncur->bc_nlevels = cur->bc_nlevels; + ncur->bc_statoff = cur->bc_statoff; + ncur->bc_ops = cur->bc_ops; + memcpy(&ncur->bc_mem, &cur->bc_mem, sizeof(cur->bc_mem)); + + if (cur->bc_mem.pag) + ncur->bc_mem.pag = xfs_perag_bump(cur->bc_mem.pag); + + return ncur; +} + +/* Check the owner of an in-memory btree block. 
*/ +xfs_failaddr_t +xfbtree_check_block_owner( + struct xfs_btree_cur *cur, + struct xfs_btree_block *block) +{ + struct xfbtree *xfbt = cur->bc_mem.xfbtree; + + if (cur->bc_flags & XFS_BTREE_LONG_PTRS) { + if (be64_to_cpu(block->bb_u.l.bb_owner) != xfbt->owner) + return __this_address; + + return NULL; + } + + if (be32_to_cpu(block->bb_u.s.bb_owner) != xfbt->owner) + return __this_address; + + return NULL; +} + +/* Return the owner of this in-memory btree. */ +unsigned long long +xfbtree_owner( + struct xfs_btree_cur *cur) +{ + return cur->bc_mem.xfbtree->owner; +} + +/* Return the xfile offset (in blocks) of a btree buffer. */ +unsigned long long +xfbtree_buf_to_xfoff( + struct xfs_btree_cur *cur, + struct xfs_buf *bp) +{ + ASSERT(cur->bc_flags & XFS_BTREE_IN_MEMORY); + + return xfs_daddr_to_xfo(xfs_buf_daddr(bp)); +} + +/* Verify a long-format btree block. */ +xfs_failaddr_t +xfbtree_lblock_verify( + struct xfs_buf *bp, + unsigned int max_recs) +{ + struct xfs_btree_block *block = XFS_BUF_TO_BLOCK(bp); + struct xfs_buftarg *btp = bp->b_target; + + /* numrecs verification */ + if (be16_to_cpu(block->bb_numrecs) > max_recs) + return __this_address; + + /* sibling pointer verification */ + if (block->bb_u.l.bb_leftsib != cpu_to_be64(NULLFSBLOCK) && + !xfbtree_verify_buftarg_xfileoff(btp, + be64_to_cpu(block->bb_u.l.bb_leftsib))) + return __this_address; + + if (block->bb_u.l.bb_rightsib != cpu_to_be64(NULLFSBLOCK) && + !xfbtree_verify_buftarg_xfileoff(btp, + be64_to_cpu(block->bb_u.l.bb_rightsib))) + return __this_address; + + return NULL; +} + +/* Verify a short-format btree block. */ +xfs_failaddr_t +xfbtree_sblock_verify( + struct xfs_buf *bp, + unsigned int max_recs) +{ + struct xfs_btree_block *block = XFS_BUF_TO_BLOCK(bp); + struct xfs_buftarg *btp = bp->b_target; + + /* numrecs verification */ + if (be16_to_cpu(block->bb_numrecs) > max_recs) + return __this_address; + + /* sibling pointer verification */ + if (block->bb_u.s.bb_leftsib != cpu_to_be32(NULLAGBLOCK) && + !xfbtree_verify_buftarg_xfileoff(btp, + be32_to_cpu(block->bb_u.s.bb_leftsib))) + return __this_address; + + if (block->bb_u.s.bb_rightsib != cpu_to_be32(NULLAGBLOCK) && + !xfbtree_verify_buftarg_xfileoff(btp, + be32_to_cpu(block->bb_u.s.bb_rightsib))) + return __this_address; + + return NULL; +} diff --git a/libxfs/xfbtree.h b/libxfs/xfbtree.h new file mode 100644 index 00000000000..e378b771637 --- /dev/null +++ b/libxfs/xfbtree.h @@ -0,0 +1,36 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#ifndef __LIBXFS_XFBTREE_H__ +#define __LIBXFS_XFBTREE_H__ + +#ifdef CONFIG_XFS_IN_MEMORY_BTREE + +/* Root block for an in-memory btree. */ +struct xfs_btree_mem_head { + __be32 mh_magic; + __be32 mh_nlevels; + __be64 mh_owner; + __be64 mh_root; + uuid_t mh_uuid; +}; + +#define XFS_BTREE_MEM_HEAD_MAGIC 0x4341544D /* "CATM" */ + +/* in-memory btree header is always block 0 in the backing store */ +#define XFS_BTREE_MEM_HEAD_DADDR 0 + +/* xfile-backed in-memory btrees */ + +struct xfbtree { + struct xfs_buftarg *target; + + /* Owner of this btree. 
*/ + unsigned long long owner; +}; + +#endif /* CONFIG_XFS_IN_MEMORY_BTREE */ + +#endif /* __LIBXFS_XFBTREE_H__ */ diff --git a/libxfs/xfile.c b/libxfs/xfile.c index 357ffb0077d..5985433749d 100644 --- a/libxfs/xfile.c +++ b/libxfs/xfile.c @@ -6,6 +6,7 @@ #include "libxfs_priv.h" #include "libxfs.h" #include "libxfs/xfile.h" +#include "libfrog/util.h" #include #include #include @@ -26,6 +27,23 @@ * management; file locks are not taken. */ +/* Figure out the xfile block size here */ +unsigned int XFB_BLOCKSIZE; +unsigned int XFB_BSHIFT; + +void +xfile_libinit(void) +{ + long ret = sysconf(_SC_PAGESIZE); + + /* If we don't find a power-of-two page size, go with 4k. */ + if (ret < 0 || !is_power_of_2(ret)) + ret = 4096; + + XFB_BLOCKSIZE = ret; + XFB_BSHIFT = libxfs_highbit32(XFB_BLOCKSIZE); +} + /* * Open a memory-backed fd to back an xfile. We require close-on-exec here, * because these memfd files function as windowed RAM and hence should never diff --git a/libxfs/xfile.h b/libxfs/xfile.h index ad13f62ee0f..5a1d0104808 100644 --- a/libxfs/xfile.h +++ b/libxfs/xfile.h @@ -10,6 +10,8 @@ struct xfile { int fd; }; +void xfile_libinit(void); + int xfile_create(struct xfs_mount *mp, const char *description, struct xfile **xfilep); void xfile_destroy(struct xfile *xf); @@ -53,4 +55,52 @@ struct xfile_stat { int xfile_stat(struct xfile *xf, struct xfile_stat *statbuf); int xfile_dump(struct xfile *xf); +static inline loff_t xfile_size(struct xfile *xf) +{ + struct xfile_stat xs; + int ret; + + ret = xfile_stat(xf, &xs); + if (ret) + return 0; + + return xs.size; +} + +/* file block (aka system page size) to basic block conversions. */ +typedef unsigned long long xfileoff_t; +extern unsigned int XFB_BLOCKSIZE; +extern unsigned int XFB_BSHIFT; +#define XFB_SHIFT (XFB_BSHIFT - BBSHIFT) + +static inline loff_t xfo_to_b(xfileoff_t xfoff) +{ + return xfoff << XFB_BSHIFT; +} + +static inline xfileoff_t b_to_xfo(loff_t pos) +{ + return (pos + (XFB_BLOCKSIZE - 1)) >> XFB_BSHIFT; +} + +static inline xfileoff_t b_to_xfot(loff_t pos) +{ + return pos >> XFB_BSHIFT; +} + +static inline xfs_daddr_t xfo_to_daddr(xfileoff_t xfoff) +{ + return xfoff << XFB_SHIFT; +} + +static inline xfileoff_t xfs_daddr_to_xfo(xfs_daddr_t bb) +{ + return (bb + (xfo_to_daddr(1) - 1)) >> XFB_SHIFT; +} + +static inline xfileoff_t xfs_daddr_to_xfot(xfs_daddr_t bb) +{ + return bb >> XFB_SHIFT; +} + #endif /* __LIBXFS_XFILE_H__ */ diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c index cd722d1c830..dd189a8baf2 100644 --- a/libxfs/xfs_btree.c +++ b/libxfs/xfs_btree.c @@ -25,6 +25,9 @@ #include "xfs_rmap_btree.h" #include "xfs_refcount_btree.h" #include "xfs_health.h" +#include "xfile.h" +#include "xfbtree.h" +#include "xfs_btree_mem.h" /* * Btree magic numbers. 
@@ -79,6 +82,9 @@ xfs_btree_check_lblock_siblings( if (level >= 0) { if (!xfs_btree_check_lptr(cur, sibling, level + 1)) return __this_address; + } else if (cur && (cur->bc_flags & XFS_BTREE_IN_MEMORY)) { + if (!xfbtree_verify_xfileoff(cur, sibling)) + return __this_address; } else { if (!xfs_verify_fsbno(mp, sibling)) return __this_address; @@ -106,6 +112,9 @@ xfs_btree_check_sblock_siblings( if (level >= 0) { if (!xfs_btree_check_sptr(cur, sibling, level + 1)) return __this_address; + } else if (cur && (cur->bc_flags & XFS_BTREE_IN_MEMORY)) { + if (!xfbtree_verify_xfileoff(cur, sibling)) + return __this_address; } else { if (!xfs_verify_agbno(pag, sibling)) return __this_address; @@ -148,7 +157,9 @@ __xfs_btree_check_lblock( cur->bc_ops->get_maxrecs(cur, level)) return __this_address; - if (bp) + if ((cur->bc_flags & XFS_BTREE_IN_MEMORY) && bp) + fsb = xfbtree_buf_to_xfoff(cur, bp); + else if (bp) fsb = XFS_DADDR_TO_FSB(mp, xfs_buf_daddr(bp)); fa = xfs_btree_check_lblock_siblings(mp, cur, level, fsb, @@ -215,8 +226,12 @@ __xfs_btree_check_sblock( cur->bc_ops->get_maxrecs(cur, level)) return __this_address; - if (bp) + if ((cur->bc_flags & XFS_BTREE_IN_MEMORY) && bp) { + pag = NULL; + agbno = xfbtree_buf_to_xfoff(cur, bp); + } else if (bp) { agbno = xfs_daddr_to_agbno(mp, xfs_buf_daddr(bp)); + } fa = xfs_btree_check_sblock_siblings(pag, cur, level, agbno, block->bb_u.s.bb_leftsib); @@ -273,6 +288,8 @@ xfs_btree_check_lptr( { if (level <= 0) return false; + if (cur->bc_flags & XFS_BTREE_IN_MEMORY) + return xfbtree_verify_xfileoff(cur, fsbno); return xfs_verify_fsbno(cur->bc_mp, fsbno); } @@ -285,6 +302,8 @@ xfs_btree_check_sptr( { if (level <= 0) return false; + if (cur->bc_flags & XFS_BTREE_IN_MEMORY) + return xfbtree_verify_xfileoff(cur, agbno); return xfs_verify_agbno(cur->bc_ag.pag, agbno); } @@ -299,6 +318,9 @@ xfs_btree_check_ptr( int index, int level) { + if (cur->bc_flags & XFS_BTREE_IN_MEMORY) + return xfbtree_check_ptr(cur, ptr, index, level); + if (cur->bc_flags & XFS_BTREE_LONG_PTRS) { if (xfs_btree_check_lptr(cur, be64_to_cpu((&ptr->l)[index]), level)) @@ -455,11 +477,36 @@ xfs_btree_del_cursor( xfs_is_shutdown(cur->bc_mp) || error != 0); if (unlikely(cur->bc_flags & XFS_BTREE_STAGING)) kmem_free(cur->bc_ops); - if (!(cur->bc_flags & XFS_BTREE_LONG_PTRS) && cur->bc_ag.pag) + if (!(cur->bc_flags & XFS_BTREE_LONG_PTRS) && + !(cur->bc_flags & XFS_BTREE_IN_MEMORY) && cur->bc_ag.pag) xfs_perag_put(cur->bc_ag.pag); + if (cur->bc_flags & XFS_BTREE_IN_MEMORY) { + if (cur->bc_mem.pag) + xfs_perag_put(cur->bc_mem.pag); + } kmem_cache_free(cur->bc_cache, cur); } +/* Return the buffer target for this btree's buffer. */ +static inline struct xfs_buftarg * +xfs_btree_buftarg( + struct xfs_btree_cur *cur) +{ + if (cur->bc_flags & XFS_BTREE_IN_MEMORY) + return xfbtree_target(cur->bc_mem.xfbtree); + return cur->bc_mp->m_ddev_targp; +} + +/* Return the block size (in units of 512b sectors) for this btree. */ +static inline unsigned int +xfs_btree_bbsize( + struct xfs_btree_cur *cur) +{ + if (cur->bc_flags & XFS_BTREE_IN_MEMORY) + return xfbtree_bbsize(); + return cur->bc_mp->m_bsize; +} + /* * Duplicate the btree cursor. * Allocate a new one, copy the record, re-get the buffers. 
@@ -497,10 +544,11 @@ xfs_btree_dup_cursor( new->bc_levels[i].ra = cur->bc_levels[i].ra; bp = cur->bc_levels[i].bp; if (bp) { - error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp, - xfs_buf_daddr(bp), mp->m_bsize, - 0, &bp, - cur->bc_ops->buf_ops); + error = xfs_trans_read_buf(mp, tp, + xfs_btree_buftarg(cur), + xfs_buf_daddr(bp), + xfs_btree_bbsize(cur), 0, &bp, + cur->bc_ops->buf_ops); if (xfs_metadata_is_sick(error)) xfs_btree_mark_sick(new); if (error) { @@ -941,6 +989,9 @@ xfs_btree_readahead_lblock( xfs_fsblock_t left = be64_to_cpu(block->bb_u.l.bb_leftsib); xfs_fsblock_t right = be64_to_cpu(block->bb_u.l.bb_rightsib); + if (cur->bc_flags & XFS_BTREE_IN_MEMORY) + return 0; + if ((lr & XFS_BTCUR_LEFTRA) && left != NULLFSBLOCK) { xfs_btree_reada_bufl(cur->bc_mp, left, 1, cur->bc_ops->buf_ops); @@ -966,6 +1017,8 @@ xfs_btree_readahead_sblock( xfs_agblock_t left = be32_to_cpu(block->bb_u.s.bb_leftsib); xfs_agblock_t right = be32_to_cpu(block->bb_u.s.bb_rightsib); + if (cur->bc_flags & XFS_BTREE_IN_MEMORY) + return 0; if ((lr & XFS_BTCUR_LEFTRA) && left != NULLAGBLOCK) { xfs_btree_reada_bufs(cur->bc_mp, cur->bc_ag.pag->pag_agno, @@ -1027,6 +1080,11 @@ xfs_btree_ptr_to_daddr( if (error) return error; + if (cur->bc_flags & XFS_BTREE_IN_MEMORY) { + *daddr = xfbtree_ptr_to_daddr(cur, ptr); + return 0; + } + if (cur->bc_flags & XFS_BTREE_LONG_PTRS) { fsbno = be64_to_cpu(ptr->l); *daddr = XFS_FSB_TO_DADDR(cur->bc_mp, fsbno); @@ -1055,8 +1113,9 @@ xfs_btree_readahead_ptr( if (xfs_btree_ptr_to_daddr(cur, ptr, &daddr)) return; - xfs_buf_readahead(cur->bc_mp->m_ddev_targp, daddr, - cur->bc_mp->m_bsize * count, cur->bc_ops->buf_ops); + xfs_buf_readahead(xfs_btree_buftarg(cur), daddr, + xfs_btree_bbsize(cur) * count, + cur->bc_ops->buf_ops); } /* @@ -1230,7 +1289,9 @@ xfs_btree_init_block_cur( * change in future, but is safe for current users of the generic btree * code. */ - if (cur->bc_flags & XFS_BTREE_LONG_PTRS) + if (cur->bc_flags & XFS_BTREE_IN_MEMORY) + owner = xfbtree_owner(cur); + else if (cur->bc_flags & XFS_BTREE_LONG_PTRS) owner = cur->bc_ino.ip->i_ino; else owner = cur->bc_ag.pag->pag_agno; @@ -1270,6 +1331,11 @@ xfs_btree_buf_to_ptr( struct xfs_buf *bp, union xfs_btree_ptr *ptr) { + if (cur->bc_flags & XFS_BTREE_IN_MEMORY) { + xfbtree_buf_to_ptr(cur, bp, ptr); + return; + } + if (cur->bc_flags & XFS_BTREE_LONG_PTRS) ptr->l = cpu_to_be64(XFS_DADDR_TO_FSB(cur->bc_mp, xfs_buf_daddr(bp))); @@ -1314,15 +1380,14 @@ xfs_btree_get_buf_block( struct xfs_btree_block **block, struct xfs_buf **bpp) { - struct xfs_mount *mp = cur->bc_mp; - xfs_daddr_t d; - int error; + xfs_daddr_t d; + int error; error = xfs_btree_ptr_to_daddr(cur, ptr, &d); if (error) return error; - error = xfs_trans_get_buf(cur->bc_tp, mp->m_ddev_targp, d, mp->m_bsize, - 0, bpp); + error = xfs_trans_get_buf(cur->bc_tp, xfs_btree_buftarg(cur), d, + xfs_btree_bbsize(cur), 0, bpp); if (error) return error; @@ -1353,9 +1418,9 @@ xfs_btree_read_buf_block( error = xfs_btree_ptr_to_daddr(cur, ptr, &d); if (error) return error; - error = xfs_trans_read_buf(mp, cur->bc_tp, mp->m_ddev_targp, d, - mp->m_bsize, flags, bpp, - cur->bc_ops->buf_ops); + error = xfs_trans_read_buf(mp, cur->bc_tp, xfs_btree_buftarg(cur), d, + xfs_btree_bbsize(cur), flags, bpp, + cur->bc_ops->buf_ops); if (xfs_metadata_is_sick(error)) xfs_btree_mark_sick(cur); if (error) @@ -1795,6 +1860,37 @@ xfs_btree_decrement( return error; } +/* + * Check the btree block owner now that we have the context to know who the + * real owner is. 
+ */ +static inline xfs_failaddr_t +xfs_btree_check_block_owner( + struct xfs_btree_cur *cur, + struct xfs_btree_block *block) +{ + if (!xfs_has_crc(cur->bc_mp)) + return NULL; + + if (cur->bc_flags & XFS_BTREE_IN_MEMORY) + return xfbtree_check_block_owner(cur, block); + + if (!(cur->bc_flags & XFS_BTREE_LONG_PTRS)) { + if (be32_to_cpu(block->bb_u.s.bb_owner) != + cur->bc_ag.pag->pag_agno) + return __this_address; + return NULL; + } + + if (cur->bc_ino.flags & XFS_BTCUR_BMBT_INVALID_OWNER) + return NULL; + + if (be64_to_cpu(block->bb_u.l.bb_owner) != cur->bc_ino.ip->i_ino) + return __this_address; + + return NULL; +} + int xfs_btree_lookup_get_block( struct xfs_btree_cur *cur, /* btree cursor */ @@ -1833,11 +1929,7 @@ xfs_btree_lookup_get_block( return error; /* Check the inode owner since the verifiers don't. */ - if (xfs_has_crc(cur->bc_mp) && - !(cur->bc_ino.flags & XFS_BTCUR_BMBT_INVALID_OWNER) && - (cur->bc_flags & XFS_BTREE_LONG_PTRS) && - be64_to_cpu((*blkp)->bb_u.l.bb_owner) != - cur->bc_ino.ip->i_ino) + if (xfs_btree_check_block_owner(cur, *blkp) != NULL) goto out_bad; /* Did we get the level we were looking for? */ @@ -4369,7 +4461,7 @@ xfs_btree_visit_block( { struct xfs_btree_block *block; struct xfs_buf *bp; - union xfs_btree_ptr rptr; + union xfs_btree_ptr rptr, bufptr; int error; /* do right sibling readahead */ @@ -4392,15 +4484,14 @@ xfs_btree_visit_block( * return the same block without checking if the right sibling points * back to us and creates a cyclic reference in the btree. */ + xfs_btree_buf_to_ptr(cur, bp, &bufptr); if (cur->bc_flags & XFS_BTREE_LONG_PTRS) { - if (be64_to_cpu(rptr.l) == XFS_DADDR_TO_FSB(cur->bc_mp, - xfs_buf_daddr(bp))) { + if (rptr.l == bufptr.l) { xfs_btree_mark_sick(cur); return -EFSCORRUPTED; } } else { - if (be32_to_cpu(rptr.s) == xfs_daddr_to_agbno(cur->bc_mp, - xfs_buf_daddr(bp))) { + if (rptr.s == bufptr.s) { xfs_btree_mark_sick(cur); return -EFSCORRUPTED; } @@ -4582,6 +4673,8 @@ xfs_btree_lblock_verify( xfs_fsblock_t fsb; xfs_failaddr_t fa; + ASSERT(!xfs_buftarg_in_memory(bp->b_target)); + /* numrecs verification */ if (be16_to_cpu(block->bb_numrecs) > max_recs) return __this_address; @@ -4637,6 +4730,8 @@ xfs_btree_sblock_verify( xfs_agblock_t agbno; xfs_failaddr_t fa; + ASSERT(!xfs_buftarg_in_memory(bp->b_target)); + /* numrecs verification */ if (be16_to_cpu(block->bb_numrecs) > max_recs) return __this_address; diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h index 2fa7a09cab3..6c81fceab74 100644 --- a/libxfs/xfs_btree.h +++ b/libxfs/xfs_btree.h @@ -248,6 +248,15 @@ struct xfs_btree_cur_ino { #define XFS_BTCUR_BMBT_INVALID_OWNER (1 << 1) }; +/* In-memory btree information */ +struct xfbtree; + +struct xfs_btree_cur_mem { + struct xfbtree *xfbtree; + struct xfs_buf *head_bp; + struct xfs_perag *pag; +}; + struct xfs_btree_level { /* buffer pointer */ struct xfs_buf *bp; @@ -287,6 +296,7 @@ struct xfs_btree_cur union { struct xfs_btree_cur_ag bc_ag; struct xfs_btree_cur_ino bc_ino; + struct xfs_btree_cur_mem bc_mem; }; /* Must be at the end of the struct! 
*/ @@ -317,6 +327,13 @@ xfs_btree_cur_sizeof(unsigned int nlevels) */ #define XFS_BTREE_STAGING (1<<5) +/* btree stored in memory; not compatible with ROOT_IN_INODE */ +#ifdef CONFIG_XFS_IN_MEMORY_BTREE +# define XFS_BTREE_IN_MEMORY (1<<7) +#else +# define XFS_BTREE_IN_MEMORY (0) +#endif + #define XFS_BTREE_NOERROR 0 #define XFS_BTREE_ERROR 1 diff --git a/libxfs/xfs_btree_mem.h b/libxfs/xfs_btree_mem.h new file mode 100644 index 00000000000..6ca9ea64a9a --- /dev/null +++ b/libxfs/xfs_btree_mem.h @@ -0,0 +1,87 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#ifndef __XFS_BTREE_MEM_H__ +#define __XFS_BTREE_MEM_H__ + +struct xfbtree; + +#ifdef CONFIG_XFS_IN_MEMORY_BTREE +unsigned int xfs_btree_mem_head_nlevels(struct xfs_buf *head_bp); + +struct xfs_buftarg *xfbtree_target(struct xfbtree *xfbtree); +int xfbtree_check_ptr(struct xfs_btree_cur *cur, + const union xfs_btree_ptr *ptr, int index, int level); +xfs_daddr_t xfbtree_ptr_to_daddr(struct xfs_btree_cur *cur, + const union xfs_btree_ptr *ptr); +void xfbtree_buf_to_ptr(struct xfs_btree_cur *cur, struct xfs_buf *bp, + union xfs_btree_ptr *ptr); + +unsigned int xfbtree_bbsize(void); + +void xfbtree_set_root(struct xfs_btree_cur *cur, + const union xfs_btree_ptr *ptr, int inc); +void xfbtree_init_ptr_from_cur(struct xfs_btree_cur *cur, + union xfs_btree_ptr *ptr); +struct xfs_btree_cur *xfbtree_dup_cursor(struct xfs_btree_cur *cur); +bool xfbtree_verify_xfileoff(struct xfs_btree_cur *cur, + unsigned long long xfoff); +xfs_failaddr_t xfbtree_check_block_owner(struct xfs_btree_cur *cur, + struct xfs_btree_block *block); +unsigned long long xfbtree_owner(struct xfs_btree_cur *cur); +xfs_failaddr_t xfbtree_lblock_verify(struct xfs_buf *bp, unsigned int max_recs); +xfs_failaddr_t xfbtree_sblock_verify(struct xfs_buf *bp, unsigned int max_recs); +unsigned long long xfbtree_buf_to_xfoff(struct xfs_btree_cur *cur, + struct xfs_buf *bp); +#else +static inline unsigned int xfs_btree_mem_head_nlevels(struct xfs_buf *head_bp) +{ + return 0; +} + +static inline struct xfs_buftarg * +xfbtree_target(struct xfbtree *xfbtree) +{ + return NULL; +} + +static inline int +xfbtree_check_ptr(struct xfs_btree_cur *cur, const union xfs_btree_ptr *ptr, + int index, int level) +{ + return 0; +} + +static inline xfs_daddr_t +xfbtree_ptr_to_daddr(struct xfs_btree_cur *cur, const union xfs_btree_ptr *ptr) +{ + return 0; +} + +static inline void +xfbtree_buf_to_ptr( + struct xfs_btree_cur *cur, + struct xfs_buf *bp, + union xfs_btree_ptr *ptr) +{ + memset(ptr, 0xFF, sizeof(*ptr)); +} + +static inline unsigned int xfbtree_bbsize(void) +{ + return 0; +} + +#define xfbtree_set_root NULL +#define xfbtree_init_ptr_from_cur NULL +#define xfbtree_dup_cursor NULL +#define xfbtree_verify_xfileoff(cur, xfoff) (false) +#define xfbtree_check_block_owner(cur, block) NULL +#define xfbtree_owner(cur) (0ULL) +#define xfbtree_buf_to_xfoff(cur, bp) (-1) + +#endif /* CONFIG_XFS_IN_MEMORY_BTREE */ + +#endif /* __XFS_BTREE_MEM_H__ */ From patchwork Fri Dec 30 22:17:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13085075 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 52E76C4332F for ; Sat, 31 Dec 2022 00:13:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235655AbiLaANT (ORCPT ); Fri, 30 Dec 2022 19:13:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55246 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235595AbiLaANR (ORCPT ); Fri, 30 Dec 2022 19:13:17 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1462FB48F for ; Fri, 30 Dec 2022 16:13:16 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 95C0061CE8 for ; Sat, 31 Dec 2022 00:13:15 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id EBDE3C433D2; Sat, 31 Dec 2022 00:13:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672445595; bh=SylaC+trNLdlABIvka1N+FGo69dykpsOk56DjzmWRI4=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=Z6QRrKZPCXB5Oca03PRMFdwcB7sOA8DRullEmDVgvSpiyBOGscQsRsaAUussWxErl gF+uMA9CoUdEbGV0CQ/+mYhzhVi/DZTv0hlg3lX9ILZ0+lAkCtsLunAoz6mJueOCsC TjTq07MQLZILtv3+MnHFDl2atF8QQtzhBEPtIEhm9j6lfgiwsSatLSJnYp+8RNIY+t SO7W/4W/80vQgoSx4by4huTZmLQiz5s5zWT0H4MidFibrpTgC6bLp4hRasNMhEeMTl E9MtmE2T+d5po15dmVoHUTSff/r++oz/ijEiRlqBx/3H+NGzu1SFOZWk3+bfQfAkRl j65dLa1kIWD2g== Subject: [PATCH 8/9] xfs: connect in-memory btrees to xfiles From: "Darrick J. Wong" To: cem@kernel.org, djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:17:42 -0800 Message-ID: <167243866264.711834.49594932006935260.stgit@magnolia> In-Reply-To: <167243866153.711834.17585439086893346840.stgit@magnolia> References: <167243866153.711834.17585439086893346840.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong Add to our stubbed-out in-memory btrees the ability to connect them with an actual in-memory backing file (aka xfiles) and the necessary pieces to track free space in the xfile and flush dirty xfbtree buffers on demand, which we'll need for online repair. Signed-off-by: Darrick J. 
Wong --- include/xfs_mount.h | 8 + include/xfs_trace.h | 8 + include/xfs_trans.h | 1 libfrog/bitmap.c | 64 ++++++- libfrog/bitmap.h | 3 libxfs/init.c | 32 +++ libxfs/trans.c | 40 ++++ libxfs/xfbtree.c | 459 ++++++++++++++++++++++++++++++++++++++++++++++++ libxfs/xfbtree.h | 27 +++ libxfs/xfile.c | 16 ++ libxfs/xfile.h | 2 libxfs/xfs_btree_mem.h | 41 ++++ 12 files changed, 690 insertions(+), 11 deletions(-) diff --git a/include/xfs_mount.h b/include/xfs_mount.h index 6be85bf21d2..906b0573c0b 100644 --- a/include/xfs_mount.h +++ b/include/xfs_mount.h @@ -282,4 +282,12 @@ static inline void xfs_perag_drop_intents(struct xfs_perag *pag) { } #define xfs_drain_free(dr) ((void)0) #define xfs_drain_init(dr) ((void)0) +static inline void libxfs_buftarg_drain(struct xfs_buftarg *btp) +{ + cache_purge(btp->bcache); +} +void libxfs_buftarg_free(struct xfs_buftarg *btp); +int libxfs_alloc_memory_buftarg(struct xfs_mount *mp, struct xfile *xfile, + struct xfs_buftarg **btpp); + #endif /* __XFS_MOUNT_H__ */ diff --git a/include/xfs_trace.h b/include/xfs_trace.h index 3ca6cda253c..3c6bd32d4ca 100644 --- a/include/xfs_trace.h +++ b/include/xfs_trace.h @@ -6,6 +6,13 @@ #ifndef __TRACE_H__ #define __TRACE_H__ +#define trace_xfbtree_create(...) ((void) 0) +#define trace_xfbtree_create_root_buf(...) ((void) 0) +#define trace_xfbtree_alloc_block(...) ((void) 0) +#define trace_xfbtree_free_block(...) ((void) 0) +#define trace_xfbtree_trans_cancel_buf(...) ((void) 0) +#define trace_xfbtree_trans_commit_buf(...) ((void) 0) + #define trace_xfs_agfl_reset(a,b,c,d) ((void) 0) #define trace_xfs_agfl_free_defer(a,b,c,d,e) ((void) 0) #define trace_xfs_alloc_cur_check(a,b,c,d,e,f) ((void) 0) @@ -194,6 +201,7 @@ #define trace_xfs_trans_cancel(a,b) ((void) 0) #define trace_xfs_trans_brelse(a) ((void) 0) #define trace_xfs_trans_binval(a) ((void) 0) +#define trace_xfs_trans_bdetach(a) ((void) 0) #define trace_xfs_trans_bjoin(a) ((void) 0) #define trace_xfs_trans_bhold(a) ((void) 0) #define trace_xfs_trans_bhold_release(a) ((void) 0) diff --git a/include/xfs_trans.h b/include/xfs_trans.h index ae339df1195..bfaee7e8fed 100644 --- a/include/xfs_trans.h +++ b/include/xfs_trans.h @@ -107,6 +107,7 @@ int libxfs_trans_roll_inode (struct xfs_trans **, struct xfs_inode *); void libxfs_trans_brelse(struct xfs_trans *, struct xfs_buf *); void libxfs_trans_binval(struct xfs_trans *, struct xfs_buf *); void libxfs_trans_bjoin(struct xfs_trans *, struct xfs_buf *); +void libxfs_trans_bdetach(struct xfs_trans *tp, struct xfs_buf *bp); void libxfs_trans_bhold(struct xfs_trans *, struct xfs_buf *); void libxfs_trans_bhold_release(struct xfs_trans *, struct xfs_buf *); void libxfs_trans_dirty_buf(struct xfs_trans *, struct xfs_buf *); diff --git a/libfrog/bitmap.c b/libfrog/bitmap.c index 5af5ab8dd6b..e1f3a5e1c84 100644 --- a/libfrog/bitmap.c +++ b/libfrog/bitmap.c @@ -233,10 +233,9 @@ bitmap_set( return res; } -#if 0 /* Unused, provided for completeness. */ /* Clear a region of bits. */ -int -bitmap_clear( +static int +__bitmap_clear( struct bitmap *bmap, uint64_t start, uint64_t len) @@ -251,8 +250,8 @@ bitmap_clear( uint64_t new_length; struct avl64node *node; int stat; + int ret = 0; - pthread_mutex_lock(&bmap->bt_lock); /* Find any existing nodes over that range. */ avl64_findranges(bmap->bt_tree, start, start + len, &firstn, &lastn); @@ -312,10 +311,24 @@ bitmap_clear( } out: - pthread_mutex_unlock(&bmap->bt_lock); return ret; } -#endif + +/* Clear a region of bits. 
*/ +int +bitmap_clear( + struct bitmap *bmap, + uint64_t start, + uint64_t length) +{ + int res; + + pthread_mutex_lock(&bmap->bt_lock); + res = __bitmap_clear(bmap, start, length); + pthread_mutex_unlock(&bmap->bt_lock); + + return res; +} /* Iterate the set regions of this bitmap. */ int @@ -438,3 +451,42 @@ bitmap_dump( printf("BITMAP DUMP DONE\n"); } #endif + +/* + * Find the first set bit in this bitmap, clear it, and return the index of + * that bit in @valp. Returns -ENODATA if no bits were set, or the usual + * negative errno. + */ +int +bitmap_take_first_set( + struct bitmap *bmap, + uint64_t start, + uint64_t last, + uint64_t *valp) +{ + struct avl64node *firstn; + struct avl64node *lastn; + struct bitmap_node *ext; + uint64_t val; + int error; + + pthread_mutex_lock(&bmap->bt_lock); + + avl64_findranges(bmap->bt_tree, start, last + 1, &firstn, &lastn); + + if (firstn == NULL && lastn == NULL) { + error = -ENODATA; + goto out; + } + + ext = container_of(firstn, struct bitmap_node, btn_node); + val = ext->btn_start; + error = __bitmap_clear(bmap, val, 1); + if (error) + goto out; + + *valp = val; +out: + pthread_mutex_unlock(&bmap->bt_lock); + return error; +} diff --git a/libfrog/bitmap.h b/libfrog/bitmap.h index 043b77eece6..896ae01f8f4 100644 --- a/libfrog/bitmap.h +++ b/libfrog/bitmap.h @@ -14,6 +14,7 @@ struct bitmap { int bitmap_alloc(struct bitmap **bmap); void bitmap_free(struct bitmap **bmap); int bitmap_set(struct bitmap *bmap, uint64_t start, uint64_t length); +int bitmap_clear(struct bitmap *bmap, uint64_t start, uint64_t length); int bitmap_iterate(struct bitmap *bmap, int (*fn)(uint64_t, uint64_t, void *), void *arg); int bitmap_iterate_range(struct bitmap *bmap, uint64_t start, uint64_t length, @@ -22,5 +23,7 @@ bool bitmap_test(struct bitmap *bmap, uint64_t start, uint64_t len); bool bitmap_empty(struct bitmap *bmap); void bitmap_dump(struct bitmap *bmap); +int bitmap_take_first_set(struct bitmap *bmap, uint64_t start, uint64_t last, + uint64_t *valp); #endif /* __LIBFROG_BITMAP_H__ */ diff --git a/libxfs/init.c b/libxfs/init.c index 676c6fbd6d2..b80f6bfd8fc 100644 --- a/libxfs/init.c +++ b/libxfs/init.c @@ -621,6 +621,36 @@ libxfs_buftarg_alloc( return btp; } +int +libxfs_alloc_memory_buftarg( + struct xfs_mount *mp, + struct xfile *xfile, + struct xfs_buftarg **btpp) +{ + struct xfs_buftarg *btp; + unsigned int bcache_flags = 0; + + btp = malloc(sizeof(*btp)); + if (!btp) + return -ENOMEM; + + btp->bt_mount = mp; + btp->bt_xfile = xfile; + btp->flags = XFS_BUFTARG_IN_MEMORY; + btp->writes_left = 0; + pthread_mutex_init(&btp->lock, NULL); + + /* + * Keep the bucket count small because the only anticipated caller is + * per-AG in-memory btrees, for which we don't need to scale to handle + * an entire filesystem. + */ + btp->bcache = cache_init(bcache_flags, 63, &libxfs_bcache_operations); + + *btpp = btp; + return 0; +} + enum libxfs_write_failure_nums { WF_DATA = 0, WF_LOG, @@ -1023,7 +1053,7 @@ libxfs_flush_mount( return error; } -static void +void libxfs_buftarg_free( struct xfs_buftarg *btp) { diff --git a/libxfs/trans.c b/libxfs/trans.c index e9430c61562..3120d8b1dea 100644 --- a/libxfs/trans.c +++ b/libxfs/trans.c @@ -613,6 +613,46 @@ libxfs_trans_brelse( libxfs_buf_relse(bp); } +/* + * Forcibly detach a buffer previously joined to the transaction. The caller + * will retain its locked reference to the buffer after this function returns. + * The buffer must be completely clean and must not be held to the transaction. 
+ */ +void +libxfs_trans_bdetach( + struct xfs_trans *tp, + struct xfs_buf *bp) +{ + struct xfs_buf_log_item *bip = bp->b_log_item; + + ASSERT(tp != NULL); + ASSERT(bp->b_transp == tp); + ASSERT(bip->bli_item.li_type == XFS_LI_BUF); + + trace_xfs_trans_bdetach(bip); + + /* + * Erase all recursion count, since we're removing this buffer from the + * transaction. + */ + bip->bli_recur = 0; + + /* + * The buffer must be completely clean. Specifically, it had better + * not be dirty, stale, logged, ordered, or held to the transaction. + */ + ASSERT(!test_bit(XFS_LI_DIRTY, &bip->bli_item.li_flags)); + ASSERT(!(bip->bli_flags & XFS_BLI_DIRTY)); + ASSERT(!(bip->bli_flags & XFS_BLI_HOLD)); + ASSERT(!(bip->bli_flags & XFS_BLI_ORDERED)); + ASSERT(!(bip->bli_flags & XFS_BLI_STALE)); + + /* Unlink the log item from the transaction and drop the log item. */ + xfs_trans_del_item(&bip->bli_item); + xfs_buf_item_put(bip); + bp->b_transp = NULL; +} + /* * Mark the buffer as not needing to be unlocked when the buf item's * iop_unlock() routine is called. The buffer must already be locked diff --git a/libxfs/xfbtree.c b/libxfs/xfbtree.c index 0481e9ed9f4..41851c3b9ae 100644 --- a/libxfs/xfbtree.c +++ b/libxfs/xfbtree.c @@ -8,6 +8,7 @@ #include "xfile.h" #include "xfbtree.h" #include "xfs_btree_mem.h" +#include "libfrog/bitmap.h" /* btree ops functions for in-memory btrees. */ @@ -133,9 +134,18 @@ xfbtree_check_ptr( else bt_xfoff = be32_to_cpu(ptr->s); - if (!xfbtree_verify_xfileoff(cur, bt_xfoff)) + if (!xfbtree_verify_xfileoff(cur, bt_xfoff)) { fa = __this_address; + goto done; + } + /* Can't point to the head or anything before it */ + if (bt_xfoff < XFBTREE_INIT_LEAF_BLOCK) { + fa = __this_address; + goto done; + } + +done: if (fa) { xfs_err(cur->bc_mp, "In-memory: Corrupt btree %d flags 0x%x pointer at level %d index %d fa %pS.", @@ -341,3 +351,450 @@ xfbtree_sblock_verify( return NULL; } + +/* Close the btree xfile and release all resources. */ +void +xfbtree_destroy( + struct xfbtree *xfbt) +{ + bitmap_free(&xfbt->freespace); + kmem_free(xfbt->freespace); + libxfs_buftarg_drain(xfbt->target); + kmem_free(xfbt); +} + +/* Compute the number of bytes available for records. */ +static inline unsigned int +xfbtree_rec_bytes( + struct xfs_mount *mp, + const struct xfbtree_config *cfg) +{ + unsigned int blocklen = xfo_to_b(1); + + if (cfg->flags & XFBTREE_CREATE_LONG_PTRS) { + if (xfs_has_crc(mp)) + return blocklen - XFS_BTREE_LBLOCK_CRC_LEN; + + return blocklen - XFS_BTREE_LBLOCK_LEN; + } + + if (xfs_has_crc(mp)) + return blocklen - XFS_BTREE_SBLOCK_CRC_LEN; + + return blocklen - XFS_BTREE_SBLOCK_LEN; +} + +/* Initialize an empty leaf block as the btree root. */ +STATIC int +xfbtree_init_leaf_block( + struct xfs_mount *mp, + struct xfbtree *xfbt, + const struct xfbtree_config *cfg) +{ + struct xfs_buf *bp; + xfs_daddr_t daddr; + int error; + unsigned int bc_flags = 0; + + if (cfg->flags & XFBTREE_CREATE_LONG_PTRS) + bc_flags |= XFS_BTREE_LONG_PTRS; + + daddr = xfo_to_daddr(XFBTREE_INIT_LEAF_BLOCK); + error = xfs_buf_get(xfbt->target, daddr, xfbtree_bbsize(), &bp); + if (error) + return error; + + trace_xfbtree_create_root_buf(xfbt, bp); + + bp->b_ops = cfg->btree_ops->buf_ops; + xfs_btree_init_block_int(mp, bp->b_addr, daddr, cfg->btnum, 0, 0, + cfg->owner, bc_flags); + error = xfs_bwrite(bp); + xfs_buf_relse(bp); + if (error) + return error; + + xfbt->xf_used++; + return 0; +} + +/* Initialize the in-memory btree header block. 
*/ +STATIC int +xfbtree_init_head( + struct xfbtree *xfbt) +{ + struct xfs_buf *bp; + xfs_daddr_t daddr; + int error; + + daddr = xfo_to_daddr(XFBTREE_HEAD_BLOCK); + error = xfs_buf_get(xfbt->target, daddr, xfbtree_bbsize(), &bp); + if (error) + return error; + + xfs_btree_mem_head_init(bp, xfbt->owner, XFBTREE_INIT_LEAF_BLOCK); + error = xfs_bwrite(bp); + xfs_buf_relse(bp); + if (error) + return error; + + xfbt->xf_used++; + return 0; +} + +/* Create an xfile btree backing thing that can be used for in-memory btrees. */ +int +xfbtree_create( + struct xfs_mount *mp, + const struct xfbtree_config *cfg, + struct xfbtree **xfbtreep) +{ + struct xfbtree *xfbt; + unsigned int blocklen = xfbtree_rec_bytes(mp, cfg); + unsigned int keyptr_len = cfg->btree_ops->key_len; + int error; + + /* Requires an xfile-backed buftarg. */ + if (!(cfg->target->flags & XFS_BUFTARG_IN_MEMORY)) { + ASSERT(cfg->target->flags & XFS_BUFTARG_IN_MEMORY); + return -EINVAL; + } + + xfbt = kmem_zalloc(sizeof(struct xfbtree), KM_NOFS | KM_MAYFAIL); + if (!xfbt) + return -ENOMEM; + + /* Assign our memory file and the free space bitmap. */ + xfbt->target = cfg->target; + error = bitmap_alloc(&xfbt->freespace); + if (error) + goto err_buftarg; + + /* Set up min/maxrecs for this btree. */ + if (cfg->flags & XFBTREE_CREATE_LONG_PTRS) + keyptr_len += sizeof(__be64); + else + keyptr_len += sizeof(__be32); + xfbt->maxrecs[0] = blocklen / cfg->btree_ops->rec_len; + xfbt->maxrecs[1] = blocklen / keyptr_len; + xfbt->minrecs[0] = xfbt->maxrecs[0] / 2; + xfbt->minrecs[1] = xfbt->maxrecs[1] / 2; + xfbt->owner = cfg->owner; + + /* Initialize the empty btree. */ + error = xfbtree_init_leaf_block(mp, xfbt, cfg); + if (error) + goto err_freesp; + + error = xfbtree_init_head(xfbt); + if (error) + goto err_freesp; + + trace_xfbtree_create(mp, cfg, xfbt); + + *xfbtreep = xfbt; + return 0; + +err_freesp: + bitmap_free(&xfbt->freespace); + kmem_free(xfbt->freespace); +err_buftarg: + libxfs_buftarg_drain(xfbt->target); + kmem_free(xfbt); + return error; +} + +/* Read the in-memory btree head. */ +int +xfbtree_head_read_buf( + struct xfbtree *xfbt, + struct xfs_trans *tp, + struct xfs_buf **bpp) +{ + struct xfs_buftarg *btp = xfbt->target; + struct xfs_mount *mp = btp->bt_mount; + struct xfs_btree_mem_head *mhead; + struct xfs_buf *bp; + xfs_daddr_t daddr; + int error; + + daddr = xfo_to_daddr(XFBTREE_HEAD_BLOCK); + error = xfs_trans_read_buf(mp, tp, btp, daddr, xfbtree_bbsize(), 0, + &bp, &xfs_btree_mem_head_buf_ops); + if (error) + return error; + + mhead = bp->b_addr; + if (be64_to_cpu(mhead->mh_owner) != xfbt->owner) { + xfs_verifier_error(bp, -EFSCORRUPTED, __this_address); + xfs_trans_brelse(tp, bp); + return -EFSCORRUPTED; + } + + *bpp = bp; + return 0; +} + +static inline struct xfile *xfbtree_xfile(struct xfbtree *xfbt) +{ + return xfbt->target->bt_xfile; +} + +/* Allocate a block to our in-memory btree. */ +int +xfbtree_alloc_block( + struct xfs_btree_cur *cur, + const union xfs_btree_ptr *start, + union xfs_btree_ptr *new, + int *stat) +{ + struct xfbtree *xfbt = cur->bc_mem.xfbtree; + uint64_t bt_xfoff; + loff_t pos; + int error; + + ASSERT(cur->bc_flags & XFS_BTREE_IN_MEMORY); + + /* + * Find the first free block in the free space bitmap and take it. If + * none are found, seek to end of the file. 
+ */ + error = bitmap_take_first_set(xfbt->freespace, 0, -1ULL, &bt_xfoff); + if (error == -ENODATA) { + bt_xfoff = xfbt->xf_used; + xfbt->xf_used++; + } else if (error) { + return error; + } + + trace_xfbtree_alloc_block(xfbt, cur, bt_xfoff); + + /* Fail if the block address exceeds the maximum for short pointers. */ + if (!(cur->bc_flags & XFS_BTREE_LONG_PTRS) && bt_xfoff >= INT_MAX) { + *stat = 0; + return 0; + } + + /* Make sure we actually can write to the block before we return it. */ + pos = xfo_to_b(bt_xfoff); + error = xfile_prealloc(xfbtree_xfile(xfbt), pos, xfo_to_b(1)); + if (error) + return error; + + if (cur->bc_flags & XFS_BTREE_LONG_PTRS) + new->l = cpu_to_be64(bt_xfoff); + else + new->s = cpu_to_be32(bt_xfoff); + + *stat = 1; + return 0; +} + +/* Free a block from our in-memory btree. */ +int +xfbtree_free_block( + struct xfs_btree_cur *cur, + struct xfs_buf *bp) +{ + struct xfbtree *xfbt = cur->bc_mem.xfbtree; + xfileoff_t bt_xfoff, bt_xflen; + + ASSERT(cur->bc_flags & XFS_BTREE_IN_MEMORY); + + bt_xfoff = xfs_daddr_to_xfot(xfs_buf_daddr(bp)); + bt_xflen = xfs_daddr_to_xfot(bp->b_length); + + trace_xfbtree_free_block(xfbt, cur, bt_xfoff); + + return bitmap_set(xfbt->freespace, bt_xfoff, bt_xflen); +} + +/* Return the minimum number of records for a btree block. */ +int +xfbtree_get_minrecs( + struct xfs_btree_cur *cur, + int level) +{ + struct xfbtree *xfbt = cur->bc_mem.xfbtree; + + return xfbt->minrecs[level != 0]; +} + +/* Return the maximum number of records for a btree block. */ +int +xfbtree_get_maxrecs( + struct xfs_btree_cur *cur, + int level) +{ + struct xfbtree *xfbt = cur->bc_mem.xfbtree; + + return xfbt->maxrecs[level != 0]; +} + +/* If this log item is a buffer item that came from the xfbtree, return it. */ +static inline struct xfs_buf * +xfbtree_buf_match( + struct xfbtree *xfbt, + const struct xfs_log_item *lip) +{ + const struct xfs_buf_log_item *bli; + struct xfs_buf *bp; + + if (lip->li_type != XFS_LI_BUF) + return NULL; + + bli = container_of(lip, struct xfs_buf_log_item, bli_item); + bp = bli->bli_buf; + if (bp->b_target != xfbt->target) + return NULL; + + return bp; +} + +/* + * Detach this (probably dirty) xfbtree buffer from the transaction by any + * means necessary. Returns true if the buffer needs to be written. + */ +STATIC bool +xfbtree_trans_bdetach( + struct xfs_trans *tp, + struct xfs_buf *bp) +{ + struct xfs_buf_log_item *bli = bp->b_log_item; + bool dirty; + + ASSERT(bli != NULL); + + dirty = bli->bli_flags & (XFS_BLI_DIRTY | XFS_BLI_ORDERED); + + bli->bli_flags &= ~(XFS_BLI_DIRTY | XFS_BLI_ORDERED | + XFS_BLI_STALE); + clear_bit(XFS_LI_DIRTY, &bli->bli_item.li_flags); + + while (bp->b_log_item != NULL) + libxfs_trans_bdetach(tp, bp); + + return dirty; +} + +/* + * Commit changes to the incore btree immediately by writing all dirty xfbtree + * buffers to the backing xfile. This detaches all xfbtree buffers from the + * transaction, even on failure. The buffer locks are dropped between the + * delwri queue and submit, so the caller must synchronize btree access. + * + * Normally we'd let the buffers commit with the transaction and get written to + * the xfile via the log, but online repair stages ephemeral btrees in memory + * and uses the btree_staging functions to write new btrees to disk atomically. + * The in-memory btree (and its backing store) are discarded at the end of the + * repair phase, which means that xfbtree buffers cannot commit with the rest + * of a transaction. 
+ * + * In other words, online repair only needs the transaction to collect buffer + * pointers and to avoid buffer deadlocks, not to guarantee consistency of + * updates. + */ +int +xfbtree_trans_commit( + struct xfbtree *xfbt, + struct xfs_trans *tp) +{ + LIST_HEAD(buffer_list); + struct xfs_log_item *lip, *n; + bool corrupt = false; + bool tp_dirty = false; + + /* + * For each xfbtree buffer attached to the transaction, write the dirty + * buffers to the xfile and release them. + */ + list_for_each_entry_safe(lip, n, &tp->t_items, li_trans) { + struct xfs_buf *bp = xfbtree_buf_match(xfbt, lip); + bool dirty; + + if (!bp) { + if (test_bit(XFS_LI_DIRTY, &lip->li_flags)) + tp_dirty |= true; + continue; + } + + trace_xfbtree_trans_commit_buf(xfbt, bp); + + dirty = xfbtree_trans_bdetach(tp, bp); + if (dirty && !corrupt) { + xfs_failaddr_t fa = bp->b_ops->verify_struct(bp); + + /* + * Because this btree is ephemeral, validate the buffer + * structure before delwri_submit so that we can return + * corruption errors to the caller without shutting + * down the filesystem. + * + * If the buffer fails verification, log the failure + * but continue walking the transaction items so that + * we remove all ephemeral btree buffers. + */ + if (fa) { + corrupt = true; + xfs_verifier_error(bp, -EFSCORRUPTED, fa); + } else { + xfs_buf_delwri_queue_here(bp, &buffer_list); + } + } + + xfs_buf_relse(bp); + } + + /* + * Reset the transaction's dirty flag to reflect the dirty state of the + * log items that are still attached. + */ + tp->t_flags = (tp->t_flags & ~XFS_TRANS_DIRTY) | + (tp_dirty ? XFS_TRANS_DIRTY : 0); + + if (corrupt) { + xfs_buf_delwri_cancel(&buffer_list); + return -EFSCORRUPTED; + } + + if (list_empty(&buffer_list)) + return 0; + + return xfs_buf_delwri_submit(&buffer_list); +} + +/* + * Cancel changes to the incore btree by detaching all the xfbtree buffers. + * Changes are not written to the backing store. This is needed for online + * repair btrees, which are by nature ephemeral. + */ +void +xfbtree_trans_cancel( + struct xfbtree *xfbt, + struct xfs_trans *tp) +{ + struct xfs_log_item *lip, *n; + bool tp_dirty = false; + + list_for_each_entry_safe(lip, n, &tp->t_items, li_trans) { + struct xfs_buf *bp = xfbtree_buf_match(xfbt, lip); + + if (!bp) { + if (test_bit(XFS_LI_DIRTY, &lip->li_flags)) + tp_dirty |= true; + continue; + } + + trace_xfbtree_trans_cancel_buf(xfbt, bp); + + xfbtree_trans_bdetach(tp, bp); + xfs_buf_relse(bp); + } + + /* + * Reset the transaction's dirty flag to reflect the dirty state of the + * log items that are still attached. + */ + tp->t_flags = (tp->t_flags & ~XFS_TRANS_DIRTY) | + (tp_dirty ? XFS_TRANS_DIRTY : 0); +} diff --git a/libxfs/xfbtree.h b/libxfs/xfbtree.h index e378b771637..72f56c69157 100644 --- a/libxfs/xfbtree.h +++ b/libxfs/xfbtree.h @@ -19,18 +19,39 @@ struct xfs_btree_mem_head { #define XFS_BTREE_MEM_HEAD_MAGIC 0x4341544D /* "CATM" */ -/* in-memory btree header is always block 0 in the backing store */ -#define XFS_BTREE_MEM_HEAD_DADDR 0 - /* xfile-backed in-memory btrees */ struct xfbtree { + /* buffer cache target for the xfile backing this in-memory btree */ struct xfs_buftarg *target; + /* Bitmap of free space from pos to used */ + struct bitmap *freespace; + + /* Number of xfile blocks actually used by this xfbtree. */ + xfileoff_t xf_used; + /* Owner of this btree. */ unsigned long long owner; + + /* Minimum and maximum records per block. 
*/ + unsigned int maxrecs[2]; + unsigned int minrecs[2]; }; +/* The head of the in-memory btree is always at block 0 */ +#define XFBTREE_HEAD_BLOCK 0 + +/* in-memory btrees are always created with an empty leaf block at block 1 */ +#define XFBTREE_INIT_LEAF_BLOCK 1 + +int xfbtree_head_read_buf(struct xfbtree *xfbt, struct xfs_trans *tp, + struct xfs_buf **bpp); + +void xfbtree_destroy(struct xfbtree *xfbt); +int xfbtree_trans_commit(struct xfbtree *xfbt, struct xfs_trans *tp); +void xfbtree_trans_cancel(struct xfbtree *xfbt, struct xfs_trans *tp); + #endif /* CONFIG_XFS_IN_MEMORY_BTREE */ #endif /* __LIBXFS_XFBTREE_H__ */ diff --git a/libxfs/xfile.c b/libxfs/xfile.c index 5985433749d..c1b8b1c5928 100644 --- a/libxfs/xfile.c +++ b/libxfs/xfile.c @@ -240,3 +240,19 @@ xfile_dump( return execvp("od", argv); } + +/* Ensure that there is storage backing the given range. */ +int +xfile_prealloc( + struct xfile *xf, + loff_t pos, + uint64_t count) +{ + int error; + + count = min(count, xfile_maxbytes(xf) - pos); + error = fallocate(xf->fd, 0, pos, count); + if (error) + return -errno; + return 0; +} diff --git a/libxfs/xfile.h b/libxfs/xfile.h index 5a1d0104808..9580de32864 100644 --- a/libxfs/xfile.h +++ b/libxfs/xfile.h @@ -47,6 +47,8 @@ xfile_obj_store(struct xfile *xf, void *buf, size_t count, loff_t pos) return 0; } +int xfile_prealloc(struct xfile *xf, loff_t pos, uint64_t count); + struct xfile_stat { loff_t size; unsigned long long bytes; diff --git a/libxfs/xfs_btree_mem.h b/libxfs/xfs_btree_mem.h index 6ca9ea64a9a..5e7b1f20fb5 100644 --- a/libxfs/xfs_btree_mem.h +++ b/libxfs/xfs_btree_mem.h @@ -8,6 +8,26 @@ struct xfbtree; +struct xfbtree_config { + /* Buffer ops for the btree root block */ + const struct xfs_btree_ops *btree_ops; + + /* Buffer target for the xfile backing this btree. */ + struct xfs_buftarg *target; + + /* Owner of this btree. 
*/ + unsigned long long owner; + + /* Btree type number */ + xfs_btnum_t btnum; + + /* XFBTREE_CREATE_* flags */ + unsigned int flags; +}; + +/* btree has long pointers */ +#define XFBTREE_CREATE_LONG_PTRS (1U << 0) + #ifdef CONFIG_XFS_IN_MEMORY_BTREE unsigned int xfs_btree_mem_head_nlevels(struct xfs_buf *head_bp); @@ -35,6 +55,16 @@ xfs_failaddr_t xfbtree_lblock_verify(struct xfs_buf *bp, unsigned int max_recs); xfs_failaddr_t xfbtree_sblock_verify(struct xfs_buf *bp, unsigned int max_recs); unsigned long long xfbtree_buf_to_xfoff(struct xfs_btree_cur *cur, struct xfs_buf *bp); + +int xfbtree_get_minrecs(struct xfs_btree_cur *cur, int level); +int xfbtree_get_maxrecs(struct xfs_btree_cur *cur, int level); + +int xfbtree_create(struct xfs_mount *mp, const struct xfbtree_config *cfg, + struct xfbtree **xfbtreep); +int xfbtree_alloc_block(struct xfs_btree_cur *cur, + const union xfs_btree_ptr *start, union xfs_btree_ptr *ptr, + int *stat); +int xfbtree_free_block(struct xfs_btree_cur *cur, struct xfs_buf *bp); #else static inline unsigned int xfs_btree_mem_head_nlevels(struct xfs_buf *head_bp) { @@ -77,11 +107,22 @@ static inline unsigned int xfbtree_bbsize(void) #define xfbtree_set_root NULL #define xfbtree_init_ptr_from_cur NULL #define xfbtree_dup_cursor NULL +#define xfbtree_get_minrecs NULL +#define xfbtree_get_maxrecs NULL +#define xfbtree_alloc_block NULL +#define xfbtree_free_block NULL #define xfbtree_verify_xfileoff(cur, xfoff) (false) #define xfbtree_check_block_owner(cur, block) NULL #define xfbtree_owner(cur) (0ULL) #define xfbtree_buf_to_xfoff(cur, bp) (-1) +static inline int +xfbtree_create(struct xfs_mount *mp, const struct xfbtree_config *cfg, + struct xfbtree **xfbtreep) +{ + return -EOPNOTSUPP; +} + #endif /* CONFIG_XFS_IN_MEMORY_BTREE */ #endif /* __XFS_BTREE_MEM_H__ */ From patchwork Fri Dec 30 22:17:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13085076 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 87BD3C4332F for ; Sat, 31 Dec 2022 00:13:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235595AbiLaANe (ORCPT ); Fri, 30 Dec 2022 19:13:34 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55276 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235580AbiLaANd (ORCPT ); Fri, 30 Dec 2022 19:13:33 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0D247B48F for ; Fri, 30 Dec 2022 16:13:33 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id C5936B81DD9 for ; Sat, 31 Dec 2022 00:13:31 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8D6CCC433D2; Sat, 31 Dec 2022 00:13:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672445610; bh=BNAtpZACAl2l4KtbQxYnLgyzxcb+vb1e+k0MuWij8sc=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=RMCE+7Mnpca+xykyw0nZCtB1k9UN4lBtDDqej89M3uhsrZseSCMEy9adCAZZcLbJn QHbQ3gWRSC2yr0kpjXY6KBdDgBPEyQtMvia3WZScHxPCquhaJJ5ogmj+gAynrK5duC RsgiNfgWGIIshCMqvl1ZewVhOglhZ6sd5EmqJDwjABNhutQ4Vuq9wzh9QlzwWVAeoO AFUwzx/NyBAyZv2CX2lpv+7KvirPEnXQmcduNcpigU2lrc5TZaGgMuGAZi0mpvRx7O 6XcG1m0InQQr8eKvh/Xo/0qHMXkzH3gUQNWdW5NB9syvlj1e6cfiXGqa7ck4FE6kUk gxS7z8evH90iw== Subject: [PATCH 9/9] xfbtree: let the buffer cache flush dirty buffers to the xfile From: "Darrick J. Wong" To: cem@kernel.org, djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:17:42 -0800 Message-ID: <167243866277.711834.8918707899481865492.stgit@magnolia> In-Reply-To: <167243866153.711834.17585439086893346840.stgit@magnolia> References: <167243866153.711834.17585439086893346840.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong As a performance optimization, when we're committing xfbtree updates, let the buffer cache flush the dirty buffers to disk when it's ready instead of writing everything at every transaction commit. This is a bit sketchy but it's an ephemeral tree so we can play fast and loose. Signed-off-by: Darrick J. Wong --- libxfs/xfbtree.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/libxfs/xfbtree.c b/libxfs/xfbtree.c index 41851c3b9ae..65d6baea856 100644 --- a/libxfs/xfbtree.c +++ b/libxfs/xfbtree.c @@ -699,7 +699,6 @@ xfbtree_trans_commit( struct xfbtree *xfbt, struct xfs_trans *tp) { - LIST_HEAD(buffer_list); struct xfs_log_item *lip, *n; bool corrupt = false; bool tp_dirty = false; @@ -733,12 +732,16 @@ xfbtree_trans_commit( * If the buffer fails verification, log the failure * but continue walking the transaction items so that * we remove all ephemeral btree buffers. + * + * Since the userspace buffer cache supports marking + * buffers dirty and flushing them later, use this to + * reduce the number of writes to the xfile. 
 		 */
 		if (fa) {
 			corrupt = true;
 			xfs_verifier_error(bp, -EFSCORRUPTED, fa);
 		} else {
-			xfs_buf_delwri_queue_here(bp, &buffer_list);
+			libxfs_buf_mark_dirty(bp);
 		}
 	}
 
@@ -752,15 +755,9 @@ xfbtree_trans_commit(
 	tp->t_flags = (tp->t_flags & ~XFS_TRANS_DIRTY) |
 			(tp_dirty ? XFS_TRANS_DIRTY : 0);
 
-	if (corrupt) {
-		xfs_buf_delwri_cancel(&buffer_list);
+	if (corrupt)
 		return -EFSCORRUPTED;
-	}
-
-	if (list_empty(&buffer_list))
-		return 0;
-
-	return xfs_buf_delwri_submit(&buffer_list);
+	return 0;
 }
 
 /*