From patchwork Thu Feb 6 22:30:54 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13963872
X-Mailing-List: linux-xfs@vger.kernel.org
Date: Thu, 06 Feb 2025 14:30:54 -0800
Subject: [PATCH 01/17] libxfs: unmap xmbuf pages to avoid disaster
From: "Darrick J. Wong"
To: djwong@kernel.org, aalbersh@kernel.org
Cc: hch@lst.de, linux-xfs@vger.kernel.org
Message-ID: <173888086075.2738568.9520704150703509751.stgit@frogsfrogsfrogs>
In-Reply-To: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>
References: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

It turns out that there's a maximum mapping count per process, so we
need to be smart about not overflowing it with too many xmbuf buffers.
The counter needs to be a global value because high-agcount filesystems
will create a large number of xmbuf caches, but the mapping limit is
process-global.

Cc:  # v6.9.0
Fixes: 124b388dac17f5 ("libxfs: support in-memory buffer cache targets")
Signed-off-by: "Darrick J. Wong"
Reviewed-by: Christoph Hellwig
---
 include/cache.h  |    6 +++
 libxfs/buf_mem.c |  102 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 libxfs/cache.c   |   11 ++++++
 3 files changed, 115 insertions(+), 4 deletions(-)

diff --git a/include/cache.h b/include/cache.h
index 334ad26309e26d..279bf717ba335f 100644
--- a/include/cache.h
+++ b/include/cache.h
@@ -64,6 +64,8 @@ typedef unsigned int (*cache_node_hash_t)(cache_key_t, unsigned int,
 						unsigned int);
 typedef int (*cache_node_compare_t)(struct cache_node *, cache_key_t);
 typedef unsigned int (*cache_bulk_relse_t)(struct cache *, struct list_head *);
+typedef int (*cache_node_get_t)(struct cache_node *);
+typedef void (*cache_node_put_t)(struct cache_node *);
 
 struct cache_operations {
 	cache_node_hash_t	hash;
@@ -72,6 +74,8 @@ struct cache_operations {
 	cache_node_relse_t	relse;
 	cache_node_compare_t	compare;
 	cache_bulk_relse_t	bulkrelse;	/* optional */
+	cache_node_get_t	get;		/* optional */
+	cache_node_put_t	put;		/* optional */
 };
 
 struct cache_hash {
@@ -107,6 +111,8 @@ struct cache {
 	cache_node_relse_t	relse;		/* memory free function */
 	cache_node_compare_t	compare;	/* comparison routine */
 	cache_bulk_relse_t	bulkrelse;	/* bulk release routine */
+	cache_node_get_t	get;		/* prepare cache node after get */
+	cache_node_put_t	put;		/* prepare to put cache node */
 	unsigned int		c_hashsize;	/* hash bucket count */
 	unsigned int		c_hashshift;	/* hash key shift */
 	struct cache_hash	*c_hash;	/* hash table buckets */
diff --git a/libxfs/buf_mem.c b/libxfs/buf_mem.c
index e5b91d3cfe0486..16cb038ba10e2a 100644
--- a/libxfs/buf_mem.c
+++ b/libxfs/buf_mem.c
@@ -34,6 +34,36 @@
 unsigned int	XMBUF_BLOCKSIZE;
 unsigned int	XMBUF_BLOCKSHIFT;
 
+long		xmbuf_max_mappings;
+static atomic_t	xmbuf_mappings;
+bool		xmbuf_unmap_early = false;
+
+static long
+get_max_mmap_count(void)
+{
+	char		buffer[64];
+	char		*p = NULL;
+	long		ret = -1;
+	FILE		*file;
+
+	file = fopen("/proc/sys/vm/max_map_count", "r");
+	if (!file)
+		return -1;
+
+	while (fgets(buffer, sizeof(buffer), file)) {
+		errno = 0;
+		ret = strtol(buffer, &p, 0);
+		if (errno || p == buffer)
+			continue;
+
+		/* only take half the maximum mmap count so others can use it */
+		ret /= 2;
+		break;
+	}
+	fclose(file);
+	return ret;
+}
+
 void
 xmbuf_libinit(void)
 {
@@ -45,6 +75,14 @@ xmbuf_libinit(void)
 
 	XMBUF_BLOCKSIZE = ret;
 	XMBUF_BLOCKSHIFT = libxfs_highbit32(XMBUF_BLOCKSIZE);
+
+	/*
+	 * Figure out how many mmaps we will use simultaneously.  Pick a low
+	 * default if we can't query procfs.
+	 */
+	xmbuf_max_mappings = get_max_mmap_count();
+	if (xmbuf_max_mappings < 0)
+		xmbuf_max_mappings = 1024;
 }
 
 /* Allocate a new cache node (aka a xfs_buf) */
@@ -105,7 +143,8 @@ xmbuf_cache_relse(
 	struct xfs_buf		*bp;
 
 	bp = container_of(node, struct xfs_buf, b_node);
-	xmbuf_unmap_page(bp);
+	if (bp->b_addr)
+		xmbuf_unmap_page(bp);
 	kmem_cache_free(xfs_buf_cache, bp);
 }
 
@@ -129,13 +168,50 @@ xmbuf_cache_bulkrelse(
 	return count;
 }
 
+static int
+xmbuf_cache_node_get(
+	struct cache_node	*node)
+{
+	struct xfs_buf		*bp =
+		container_of(node, struct xfs_buf, b_node);
+	int			error;
+
+	if (bp->b_addr != NULL)
+		return 0;
+
+	error = xmbuf_map_page(bp);
+	if (error) {
+		fprintf(stderr,
+	_("%s: %s can't mmap %u bytes at xfile offset %llu: %s\n"),
+			progname, __FUNCTION__, BBTOB(bp->b_length),
+			(unsigned long long)xfs_buf_daddr(bp),
+			strerror(error));
+		return error;
+	}
+
+	return 0;
+}
+
+static void
+xmbuf_cache_node_put(
+	struct cache_node	*node)
+{
+	struct xfs_buf		*bp =
+		container_of(node, struct xfs_buf, b_node);
+
+	if (xmbuf_unmap_early)
+		xmbuf_unmap_page(bp);
+}
+
 static struct cache_operations xmbuf_bcache_operations = {
 	.hash		= libxfs_bhash,
 	.alloc		= xmbuf_cache_alloc,
 	.flush		= xmbuf_cache_flush,
 	.relse		= xmbuf_cache_relse,
 	.compare	= libxfs_bcompare,
-	.bulkrelse	= xmbuf_cache_bulkrelse
+	.bulkrelse	= xmbuf_cache_bulkrelse,
+	.get		= xmbuf_cache_node_get,
+	.put		= xmbuf_cache_node_put,
 };
 
 /*
@@ -216,8 +292,24 @@ xmbuf_map_page(
 	pos = xfile->partition_pos + BBTOB(xfs_buf_daddr(bp));
 	p = mmap(NULL, BBTOB(bp->b_length), PROT_READ | PROT_WRITE, MAP_SHARED,
 			xfile->fcb->fd, pos);
-	if (p == MAP_FAILED)
-		return -errno;
+	if (p == MAP_FAILED) {
+		if (errno == ENOMEM && !xmbuf_unmap_early) {
+#ifdef DEBUG
+			fprintf(stderr, "xmbuf could not make mappings!\n");
+#endif
+			xmbuf_unmap_early = true;
+		}
+		return errno;
+	}
+
+	if (!xmbuf_unmap_early &&
+	    atomic_inc_return(&xmbuf_mappings) > xmbuf_max_mappings) {
+#ifdef DEBUG
+		fprintf(stderr, _("xmbuf hit too many mappings (%ld)!\n"),
+				xmbuf_max_mappings);
+#endif
+		xmbuf_unmap_early = true;
+	}
 
 	bp->b_addr = p;
 	bp->b_flags |= LIBXFS_B_UPTODATE | LIBXFS_B_UNCHECKED;
@@ -230,6 +322,8 @@ void
 xmbuf_unmap_page(
 	struct xfs_buf		*bp)
 {
+	if (!xmbuf_unmap_early)
+		atomic_dec(&xmbuf_mappings);
 	munmap(bp->b_addr, BBTOB(bp->b_length));
 	bp->b_addr = NULL;
 }
diff --git a/libxfs/cache.c b/libxfs/cache.c
index 139c7c1b9e715e..af20f3854df93e 100644
--- a/libxfs/cache.c
+++ b/libxfs/cache.c
@@ -61,6 +61,8 @@ cache_init(
 	cache->compare = cache_operations->compare;
 	cache->bulkrelse = cache_operations->bulkrelse ?
 		cache_operations->bulkrelse : cache_generic_bulkrelse;
+	cache->get = cache_operations->get;
+	cache->put = cache_operations->put;
 	pthread_mutex_init(&cache->c_mutex, NULL);
 
 	for (i = 0; i < hashsize; i++) {
@@ -415,6 +417,13 @@ cache_node_get(
 			 */
 			pthread_mutex_lock(&node->cn_mutex);
 
+			if (node->cn_count == 0 && cache->get) {
+				int err = cache->get(node);
+				if (err) {
+					pthread_mutex_unlock(&node->cn_mutex);
+					goto next_object;
+				}
+			}
 			if (node->cn_count == 0) {
 				ASSERT(node->cn_priority >= 0);
 				ASSERT(!list_empty(&node->cn_mru));
@@ -503,6 +512,8 @@ cache_node_put(
 #endif
 	node->cn_count--;
 
+	if (node->cn_count == 0 && cache->put)
+		cache->put(node);
 	if (node->cn_count == 0) {
 		/* add unreferenced node to appropriate MRU for shaker */
 		mru = &cache->c_mrus[node->cn_priority];
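[Editor's note: a standalone sketch of the probe this patch adds, so the policy is easy to try outside of xfsprogs. `probe_xmbuf_cap()` is a hypothetical name; it mirrors get_max_mmap_count() plus the 1024 fallback from xmbuf_libinit() above, but is not the patch's actual code.]

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Read the kernel's per-process mapping limit from procfs and keep only
 * half of it for buffer-cache mappings, leaving the rest for everyone
 * else.  Falls back to a low default (1024) if procfs is unavailable,
 * matching the behavior described in the patch.
 */
static long
probe_xmbuf_cap(void)
{
	char	buffer[64];
	char	*p = NULL;
	long	ret = -1;
	FILE	*file;

	file = fopen("/proc/sys/vm/max_map_count", "r");
	if (file) {
		while (fgets(buffer, sizeof(buffer), file)) {
			errno = 0;
			ret = strtol(buffer, &p, 0);
			if (errno || p == buffer)
				continue;

			/* take half the limit so others can use the rest */
			ret /= 2;
			break;
		}
		fclose(file);
	}

	/* low default if we couldn't query procfs */
	if (ret < 0)
		ret = 1024;
	return ret;
}
```

On a stock Linux vm.max_map_count of 65530 this yields 32765 mappings for xmbuf use.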
From patchwork Thu Feb 6 22:31:10 2025
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13963873
Date: Thu, 06 Feb 2025 14:31:10 -0800
Subject: [PATCH 02/17] libxfs: mark xmbuf_{un,}map_page static
From: "Darrick J. Wong"
To: djwong@kernel.org, aalbersh@kernel.org
Cc: hch@lst.de, linux-xfs@vger.kernel.org
Message-ID: <173888086090.2738568.18394123905353895033.stgit@frogsfrogsfrogs>
In-Reply-To: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>
References: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>

From: Christoph Hellwig

These functions are not used outside of buf_mem.c.

Signed-off-by: Christoph Hellwig
Reviewed-by: "Darrick J. Wong"
Signed-off-by: "Darrick J. Wong"
---
 libxfs/buf_mem.c |   97 +++++++++++++++++++++++++++---------------------------
 libxfs/buf_mem.h |    3 --
 2 files changed, 49 insertions(+), 51 deletions(-)

diff --git a/libxfs/buf_mem.c b/libxfs/buf_mem.c
index 16cb038ba10e2a..77396fa95b4138 100644
--- a/libxfs/buf_mem.c
+++ b/libxfs/buf_mem.c
@@ -85,6 +85,55 @@ xmbuf_libinit(void)
 		xmbuf_max_mappings = 1024;
 }
 
+/* Directly map a memfd page into the buffer cache. */
+static int
+xmbuf_map_page(
+	struct xfs_buf		*bp)
+{
+	struct xfile		*xfile = bp->b_target->bt_xfile;
+	void			*p;
+	loff_t			pos;
+
+	pos = xfile->partition_pos + BBTOB(xfs_buf_daddr(bp));
+	p = mmap(NULL, BBTOB(bp->b_length), PROT_READ | PROT_WRITE, MAP_SHARED,
+			xfile->fcb->fd, pos);
+	if (p == MAP_FAILED) {
+		if (errno == ENOMEM && !xmbuf_unmap_early) {
+#ifdef DEBUG
+			fprintf(stderr, "xmbuf could not make mappings!\n");
+#endif
+			xmbuf_unmap_early = true;
+		}
+		return errno;
+	}
+
+	if (!xmbuf_unmap_early &&
+	    atomic_inc_return(&xmbuf_mappings) > xmbuf_max_mappings) {
+#ifdef DEBUG
+		fprintf(stderr, _("xmbuf hit too many mappings (%ld)!\n"),
+				xmbuf_max_mappings);
+#endif
+		xmbuf_unmap_early = true;
+	}
+
+	bp->b_addr = p;
+	bp->b_flags |= LIBXFS_B_UPTODATE | LIBXFS_B_UNCHECKED;
+	bp->b_error = 0;
+	return 0;
+}
+
+/* Unmap a memfd page that was mapped into the buffer cache. */
+static void
+xmbuf_unmap_page(
+	struct xfs_buf		*bp)
+{
+	if (!xmbuf_unmap_early)
+		atomic_dec(&xmbuf_mappings);
+	munmap(bp->b_addr, BBTOB(bp->b_length));
+	bp->b_addr = NULL;
+}
+
+
 /* Allocate a new cache node (aka a xfs_buf) */
 static struct cache_node *
 xmbuf_cache_alloc(
@@ -280,54 +329,6 @@ xmbuf_free(
 	kfree(btp);
 }
 
-/* Directly map a memfd page into the buffer cache. */
-int
-xmbuf_map_page(
-	struct xfs_buf		*bp)
-{
-	struct xfile		*xfile = bp->b_target->bt_xfile;
-	void			*p;
-	loff_t			pos;
-
-	pos = xfile->partition_pos + BBTOB(xfs_buf_daddr(bp));
-	p = mmap(NULL, BBTOB(bp->b_length), PROT_READ | PROT_WRITE, MAP_SHARED,
-			xfile->fcb->fd, pos);
-	if (p == MAP_FAILED) {
-		if (errno == ENOMEM && !xmbuf_unmap_early) {
-#ifdef DEBUG
-			fprintf(stderr, "xmbuf could not make mappings!\n");
-#endif
-			xmbuf_unmap_early = true;
-		}
-		return errno;
-	}
-
-	if (!xmbuf_unmap_early &&
-	    atomic_inc_return(&xmbuf_mappings) > xmbuf_max_mappings) {
-#ifdef DEBUG
-		fprintf(stderr, _("xmbuf hit too many mappings (%ld)!\n"),
-				xmbuf_max_mappings);
-#endif
-		xmbuf_unmap_early = true;
-	}
-
-	bp->b_addr = p;
-	bp->b_flags |= LIBXFS_B_UPTODATE | LIBXFS_B_UNCHECKED;
-	bp->b_error = 0;
-	return 0;
-}
-
-/* Unmap a memfd page that was mapped into the buffer cache. */
-void
-xmbuf_unmap_page(
-	struct xfs_buf		*bp)
-{
-	if (!xmbuf_unmap_early)
-		atomic_dec(&xmbuf_mappings);
-	munmap(bp->b_addr, BBTOB(bp->b_length));
-	bp->b_addr = NULL;
-}
-
 /* Is this a valid daddr within the buftarg? */
 bool
 xmbuf_verify_daddr(
diff --git a/libxfs/buf_mem.h b/libxfs/buf_mem.h
index f19bc6fd700b9a..6e4b2d3503b853 100644
--- a/libxfs/buf_mem.h
+++ b/libxfs/buf_mem.h
@@ -20,9 +20,6 @@ int xmbuf_alloc(struct xfs_mount *mp, const char *descr,
 		unsigned long long maxpos, struct xfs_buftarg **btpp);
 void xmbuf_free(struct xfs_buftarg *btp);
 
-int xmbuf_map_page(struct xfs_buf *bp);
-void xmbuf_unmap_page(struct xfs_buf *bp);
-
 bool xmbuf_verify_daddr(struct xfs_buftarg *btp, xfs_daddr_t daddr);
 void xmbuf_trans_bdetach(struct xfs_trans *tp, struct xfs_buf *bp);
 int xmbuf_finalize(struct xfs_buf *bp);

From patchwork Thu Feb 6 22:31:25 2025
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13963874
Date: Thu, 06 Feb 2025 14:31:25 -0800
Subject: [PATCH 03/17] man: document new XFS_BULK_IREQ_METADIR flag to bulkstat
From: "Darrick J. Wong"
To: djwong@kernel.org, aalbersh@kernel.org
Cc: hch@lst.de, linux-xfs@vger.kernel.org
Message-ID: <173888086105.2738568.7689923306499344386.stgit@frogsfrogsfrogs>
In-Reply-To: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>
References: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

Document this new flag.

Signed-off-by: "Darrick J. Wong"
Reviewed-by: Christoph Hellwig
---
 man/man2/ioctl_xfs_bulkstat.2 |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/man/man2/ioctl_xfs_bulkstat.2 b/man/man2/ioctl_xfs_bulkstat.2
index b6d51aa438111d..0dba16a6b0e2df 100644
--- a/man/man2/ioctl_xfs_bulkstat.2
+++ b/man/man2/ioctl_xfs_bulkstat.2
@@ -101,6 +101,14 @@ .SH DESCRIPTION
 via bs_extents field and bs_extents64 is assigned a value of 0. In the second
 case, bs_extents is set to (2^31 - 1) if data fork extent count is larger than
 2^31. This flag may be set independently of whether other flags have been set.
+.TP
+.B XFS_BULK_IREQ_METADIR
+Return metadata directory tree inodes in the stat output as well.
+The only fields that will be populated are
+.IR bs_ino ", " bs_gen ", " bs_mode ", " bs_sick ", and " bs_checked .
+All other fields, notably
+.IR bs_blksize ,
+will be zero.
 .RE
 .PP
 .I hdr.icount

From patchwork Thu Feb 6 22:31:41 2025
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13963875
Date: Thu, 06 Feb 2025 14:31:41 -0800
Subject: [PATCH 04/17] libfrog: wrap handle construction code
From: "Darrick J. Wong"
To: djwong@kernel.org, aalbersh@kernel.org
Cc: hch@lst.de, linux-xfs@vger.kernel.org
Message-ID: <173888086121.2738568.17449625667584946105.stgit@frogsfrogsfrogs>
In-Reply-To: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>
References: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

Clean up all the open-coded logic to construct a file handle from a
fshandle and some bulkstat/parent pointer information.  The new
functions are stashed in a private header file to avoid leaking the
details of xfs_handle construction into the public libhandle headers.

Signed-off-by: "Darrick J. Wong"
---
 io/parent.c           |    9 +++-----
 libfrog/Makefile      |    1 +
 libfrog/handle_priv.h |   55 +++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/common.c        |    9 +++-----
 scrub/inodes.c        |   13 ++++--------
 scrub/phase5.c        |   12 ++++-------
 spaceman/health.c     |    9 +++-----
 7 files changed, 73 insertions(+), 35 deletions(-)
 create mode 100644 libfrog/handle_priv.h

diff --git a/io/parent.c b/io/parent.c
index 8db93d98755289..3ba3aef48cb9be 100644
--- a/io/parent.c
+++ b/io/parent.c
@@ -11,6 +11,7 @@
 #include "handle.h"
 #include "init.h"
 #include "io.h"
+#include "libfrog/handle_priv.h"
 
 static cmdinfo_t parent_cmd;
 static char *mntpt;
@@ -205,12 +206,8 @@ parent_f(
 			return 0;
 		}
 
-		memcpy(&handle, hanp, sizeof(handle));
-		handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
-				sizeof(handle.ha_fid.fid_len);
-		handle.ha_fid.fid_pad = 0;
-		handle.ha_fid.fid_ino = ino;
-		handle.ha_fid.fid_gen = gen;
+		handle_from_fshandle(&handle, hanp, hlen);
+		handle_from_inogen(&handle, ino, gen);
 	} else if (optind != argc) {
 		return command_usage(&parent_cmd);
 	}
diff --git a/libfrog/Makefile b/libfrog/Makefile
index 4da427789411a6..fc7e506d96bbad 100644
--- a/libfrog/Makefile
+++ b/libfrog/Makefile
@@ -53,6 +53,7 @@ fsgeom.h \
 fsproperties.h \
 fsprops.h \
 getparents.h \
+handle_priv.h \
 histogram.h \
 logging.h \
 paths.h \
diff --git a/libfrog/handle_priv.h b/libfrog/handle_priv.h
new file mode 100644
index 00000000000000..8c3634c40de1c8
--- /dev/null
+++ b/libfrog/handle_priv.h
@@ -0,0 +1,55 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2025 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong
+ */
+#ifndef __LIBFROG_HANDLE_PRIV_H__
+#define __LIBFROG_HANDLE_PRIV_H__
+
+/*
+ * Private helpers to construct an xfs_handle without publishing those details
+ * in the public libhandle header files.
+ */
+
+/*
+ * Fills out the fsid part of a handle.  This does not initialize the fid part
+ * of the handle; use either of the two functions below.
+ */
+static inline void
+handle_from_fshandle(
+	struct xfs_handle	*handle,
+	const void		*fshandle,
+	size_t			fshandle_len)
+{
+	ASSERT(fshandle_len == sizeof(xfs_fsid_t));
+
+	memcpy(&handle->ha_fsid, fshandle, sizeof(handle->ha_fsid));
+	handle->ha_fid.fid_len = sizeof(xfs_fid_t) -
+			sizeof(handle->ha_fid.fid_len);
+	handle->ha_fid.fid_pad = 0;
+	handle->ha_fid.fid_ino = 0;
+	handle->ha_fid.fid_gen = 0;
+}
+
+/* Fill out the fid part of a handle from raw components. */
+static inline void
+handle_from_inogen(
+	struct xfs_handle	*handle,
+	uint64_t		ino,
+	uint32_t		gen)
+{
+	handle->ha_fid.fid_ino = ino;
+	handle->ha_fid.fid_gen = gen;
+}
+
+/* Fill out the fid part of a handle. */
+static inline void
+handle_from_bulkstat(
+	struct xfs_handle	*handle,
+	const struct xfs_bulkstat *bstat)
+{
+	handle->ha_fid.fid_ino = bstat->bs_ino;
+	handle->ha_fid.fid_gen = bstat->bs_gen;
+}
+
+#endif /* __LIBFROG_HANDLE_PRIV_H__ */
diff --git a/scrub/common.c b/scrub/common.c
index f86546556f46dd..6eb3c026dc5ac9 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -10,6 +10,7 @@
 #include "platform_defs.h"
 #include "libfrog/paths.h"
 #include "libfrog/getparents.h"
+#include "libfrog/handle_priv.h"
 #include "xfs_scrub.h"
 #include "common.h"
 #include "progress.h"
@@ -414,12 +415,8 @@ scrub_render_ino_descr(
 	if (ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_PARENT) {
 		struct xfs_handle handle;
 
-		memcpy(&handle.ha_fsid, ctx->fshandle, sizeof(handle.ha_fsid));
-		handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
-				sizeof(handle.ha_fid.fid_len);
-		handle.ha_fid.fid_pad = 0;
-		handle.ha_fid.fid_ino = ino;
-		handle.ha_fid.fid_gen = gen;
+		handle_from_fshandle(&handle, ctx->fshandle, ctx->fshandle_len);
+		handle_from_inogen(&handle, ino, gen);
 
 		ret = handle_to_path(&handle, sizeof(struct xfs_handle), 4096,
 				buf, buflen);
diff --git a/scrub/inodes.c b/scrub/inodes.c
index 3fe759e8f4867d..2b492a634ea3b2 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -19,6 +19,7 @@
 #include "descr.h"
 #include "libfrog/fsgeom.h"
 #include "libfrog/bulkstat.h"
+#include "libfrog/handle_priv.h"
 
 /*
  * Iterate a range of inodes.
@@ -209,7 +210,7 @@ scan_ag_bulkstat(
 	xfs_agnumber_t		agno,
 	void			*arg)
 {
-	struct xfs_handle	handle = { };
+	struct xfs_handle	handle;
 	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
 	struct scan_ichunk	*ichunk = arg;
 	struct xfs_inumbers_req	*ireq = ichunk_to_inumbers(ichunk);
@@ -225,12 +226,7 @@ scan_ag_bulkstat(
 	DEFINE_DESCR(dsc_inumbers, ctx, render_inumbers_from_agno);
 
 	descr_set(&dsc_inumbers, &agno);
-
-	memcpy(&handle.ha_fsid, ctx->fshandle, sizeof(handle.ha_fsid));
-	handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
-			sizeof(handle.ha_fid.fid_len);
-	handle.ha_fid.fid_pad = 0;
-
+	handle_from_fshandle(&handle, ctx->fshandle, ctx->fshandle_len);
 retry:
 	bulkstat_for_inumbers(ctx, &dsc_inumbers, inumbers, breq);
 
@@ -244,8 +240,7 @@ scan_ag_bulkstat(
 			continue;
 
 		descr_set(&dsc_bulkstat, bs);
-		handle.ha_fid.fid_ino = scan_ino;
-		handle.ha_fid.fid_gen = bs->bs_gen;
+		handle_from_bulkstat(&handle, bs);
 		error = si->fn(ctx, &handle, bs, si->arg);
 		switch (error) {
 		case 0:
diff --git a/scrub/phase5.c b/scrub/phase5.c
index 22a22915dbc68d..6460d00f30f4bd 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -18,6 +18,7 @@
 #include "libfrog/bitmap.h"
 #include "libfrog/bulkstat.h"
 #include "libfrog/fakelibattr.h"
+#include "libfrog/handle_priv.h"
 #include "xfs_scrub.h"
 #include "common.h"
 #include "inodes.h"
@@ -474,9 +475,7 @@ retry_deferred_inode(
 	if (error)
 		return error;
 
-	handle->ha_fid.fid_ino = bstat.bs_ino;
-	handle->ha_fid.fid_gen = bstat.bs_gen;
-
+	handle_from_bulkstat(handle, &bstat);
 	return check_inode_names(ncs->ctx, handle, &bstat, ncs);
 }
 
@@ -487,16 +486,13 @@ retry_deferred_inode_range(
 	uint64_t		len,
 	void			*arg)
 {
-	struct xfs_handle	handle = { };
+	struct xfs_handle	handle;
 	struct ncheck_state	*ncs = arg;
 	struct scrub_ctx	*ctx = ncs->ctx;
 	uint64_t		i;
 	int			error;
 
-	memcpy(&handle.ha_fsid, ctx->fshandle, sizeof(handle.ha_fsid));
-	handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
-			sizeof(handle.ha_fid.fid_len);
-	handle.ha_fid.fid_pad = 0;
+	handle_from_fshandle(&handle, ctx->fshandle, ctx->fshandle_len);
 
 	for (i = 0; i < len; i++) {
 		error = retry_deferred_inode(ncs, &handle, ino + i);
diff --git a/spaceman/health.c b/spaceman/health.c
index 4281589324cd44..0d2767df424f27 100644
--- a/spaceman/health.c
+++ b/spaceman/health.c
@@ -14,6 +14,7 @@
 #include "libfrog/bulkstat.h"
 #include "space.h"
 #include "libfrog/getparents.h"
+#include "libfrog/handle_priv.h"
 
 static cmdinfo_t health_cmd;
 static unsigned long long reported;
@@ -317,12 +318,8 @@ report_inode(
 	    (file->xfd.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_PARENT)) {
 		struct xfs_handle handle;
 
-		memcpy(&handle.ha_fsid, file->fshandle, sizeof(handle.ha_fsid));
-		handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
-				sizeof(handle.ha_fid.fid_len);
-		handle.ha_fid.fid_pad = 0;
-		handle.ha_fid.fid_ino = bs->bs_ino;
-		handle.ha_fid.fid_gen = bs->bs_gen;
+		handle_from_fshandle(&handle, file->fshandle, file->fshandle_len);
+		handle_from_bulkstat(&handle, bs);
 
 		ret = handle_to_path(&handle, sizeof(struct xfs_handle), 0,
 				descr, sizeof(descr) - 1);
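[Editor's note: the helpers in handle_priv.h centralize a subtle detail: fid_len counts the fid bytes that follow the length field itself. Below is a self-contained illustration using local stand-in structs (`fake_fid`/`fake_handle` are hypothetical mirrors of the layout, not the real definitions from the XFS headers), so the computation can be exercised without xfsprogs.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins for the xfs_fid/xfs_handle layout. */
struct fake_fid {
	uint16_t	fid_len;	/* length of the remainder */
	uint16_t	fid_pad;
	uint32_t	fid_gen;	/* generation number */
	uint64_t	fid_ino;	/* inode number */
};

struct fake_handle {
	uint8_t		ha_fsid[8];	/* opaque fs identifier */
	struct fake_fid	ha_fid;
};

/* Mirror of handle_from_fshandle: copy the fsid, zero the fid part. */
static void
fake_handle_from_fshandle(struct fake_handle *h, const void *fsh, size_t len)
{
	assert(len == sizeof(h->ha_fsid));
	memcpy(h->ha_fsid, fsh, sizeof(h->ha_fsid));
	/* fid_len excludes the length field itself */
	h->ha_fid.fid_len = sizeof(struct fake_fid) -
			sizeof(h->ha_fid.fid_len);
	h->ha_fid.fid_pad = 0;
	h->ha_fid.fid_ino = 0;
	h->ha_fid.fid_gen = 0;
}

/* Mirror of handle_from_inogen: fill in the inode/generation pair. */
static void
fake_handle_from_inogen(struct fake_handle *h, uint64_t ino, uint32_t gen)
{
	h->ha_fid.fid_ino = ino;
	h->ha_fid.fid_gen = gen;
}
```

Callers first fill the fsid part, then the fid part, exactly as parent_f() does after this patch.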
Wong" X-Patchwork-Id: 13963876 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DC6751A9B3F for ; Thu, 6 Feb 2025 22:31:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738881118; cv=none; b=meU0ha8MHk5h854chuuEAgP9wb++yhLgyGHsGGwoa9ubNEAkwKykZKrM4CYXv/jtINLy5t5SMxKe7A/UQ9dgkEe9Hp85X5LXlmlK8ZdcuugD+AZHjf5HzjyRJqNeNs3WRVmbf45288zKbPTEYDPgpbjviXaL2IY9d04cDjPkJJ4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738881118; c=relaxed/simple; bh=zYmmiPT5QI6o2Mi+rYGvf2DB2RWXoDFri3rtt13psLU=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=jZHD4+ePICHpSt7mVWOURUDvDCO/ndt1Zq9wyMll4Tt8z43KmY+7c8ewQWc3L7rFgi6Jeuy1DMAiEgKN+HE1pUJiGy9KV6aGGXxaCAEnKqHstXhVKaFAoDieUL4hi2c3XT8oq46On+qvq8PSBTjkmV2w4C+wrqjrZx5ukbVeaOE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=QWvOn45B; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="QWvOn45B" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B47E3C4CEDD; Thu, 6 Feb 2025 22:31:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1738881118; bh=zYmmiPT5QI6o2Mi+rYGvf2DB2RWXoDFri3rtt13psLU=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=QWvOn45BdtJ993mwH+ldVflJruFCydPAAGzmgniM18hxKwmTnf0how2z2Utj+K36s qBPo97oSwKBn6TcsaWegElBkO+edhOoED81HqmE74dPrtTUWlw7cGo0NVKL8pIQjjD 9pFjb8lsak55Q1fGQrtCqZ8nVNK8JpJoMAph0l/0HECwpD5dN/WjZYrl+fKL2Cw7ud 
7N04akksKgQkfgDYYU52Fp5sW9e8DefwaUjy290ZcMmQddCUATxYcQuXhtNSgVgtSz Wn0RncPLXv5kWV3Y2722ozl9Og+vpNKr2Gp8wNhQi1aQrhJRinXYX+3Iq816T37gWY Gp/X1KC9ft0dw== Date: Thu, 06 Feb 2025 14:31:57 -0800 Subject: [PATCH 05/17] xfs_scrub: don't report data loss in unlinked inodes twice From: "Darrick J. Wong" To: djwong@kernel.org, aalbersh@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173888086136.2738568.12499263697186080933.stgit@frogsfrogsfrogs> In-Reply-To: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs> References: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong If parent pointers are enabled, report_ioerr_fsmap will report lost file data and xattrs for all files, having used the parent pointer ioctls to generate the path of the lost file. For unlinked files, the path lookup will fail, but we'll report the inumber of the file that lost data. Therefore, we don't need to do a separate scan of the unlinked inodes in report_all_media_errors after doing the fsmap scan. Cc: # v6.10.0 Fixes: 9b5d1349ca5fb1 ("xfs_scrub: use parent pointers to report lost file data") Signed-off-by: "Darrick J. Wong" Reviewed-by: Christoph Hellwig --- scrub/phase6.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/scrub/phase6.c b/scrub/phase6.c index fc63f5aad0bd7b..2695e645004bf1 100644 --- a/scrub/phase6.c +++ b/scrub/phase6.c @@ -569,12 +569,12 @@ report_all_media_errors( * Scan the directory tree to get file paths if we didn't already use * directory parent pointers to report the loss. */ - if (!can_use_pptrs(ctx)) { - ret = scan_fs_tree(ctx, report_dir_loss, report_dirent_loss, - vs); - if (ret) - return ret; - } + if (can_use_pptrs(ctx)) + return 0; + + ret = scan_fs_tree(ctx, report_dir_loss, report_dirent_loss, vs); + if (ret) + return ret; /* Scan for unlinked files. 
 */
	return scrub_scan_all_inodes(ctx, report_inode_loss, 0, vs);

From patchwork Thu Feb 6 22:32:13 2025
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13963877
Date: Thu, 06 Feb 2025 14:32:13 -0800
Subject: [PATCH 06/17] xfs_scrub: call bulkstat directly if we're only scanning user files
From: "Darrick J. Wong"
To: djwong@kernel.org, aalbersh@kernel.org
Cc: hch@lst.de, linux-xfs@vger.kernel.org
Message-ID: <173888086151.2738568.86305255846191106.stgit@frogsfrogsfrogs>
In-Reply-To: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>
References: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

Christoph observed xfs_scrub phase 5 consuming a lot of CPU time on a
filesystem with a very large number of rtgroups.  He traced this to
bulkstat_for_inumbers spending a lot of time trying to single-step
through inodes that were marked allocated in the inumbers record but
didn't show up in the bulkstat data.  These correspond to files in the
metadata directory tree that are not returned by the regular bulkstat.

This complex machinery isn't necessary for the inode walk that occurs
during phase 5, because phase 5 only wants to open user files and check
the dirent/xattr names associated with each file.  It's not needed for
phase 6 because we're only using it to report data loss in unlinked
files when parent pointers aren't enabled.  Furthermore, we don't need
to do this inumbers -> bulkstat dance at all, because phases 3 and 4
supposedly fixed any inode that was too corrupt to be igettable and
hence reported on by bulkstat.

Fix this by creating a simpler user file iterator that walks bulkstat
across the filesystem without using inumbers.
While we're at it, fix the obviously incorrect comments in inodes.h.

Cc: # v4.15.0
Fixes: 372d4ba99155b2 ("xfs_scrub: add inode iteration functions")
Signed-off-by: "Darrick J. Wong"
Reviewed-by: Christoph Hellwig
---
 scrub/inodes.c |  151 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/inodes.h |    6 ++
 scrub/phase5.c |    2 -
 scrub/phase6.c |    2 -
 4 files changed, 158 insertions(+), 3 deletions(-)

diff --git a/scrub/inodes.c b/scrub/inodes.c
index 2b492a634ea3b2..58969131628f8f 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -445,6 +445,157 @@ scrub_scan_all_inodes(
 	return si.aborted ? -1 : 0;
 }
 
+struct user_bulkstat {
+	struct scan_inodes	*si;
+
+	/* vla, must be last */
+	struct xfs_bulkstat_req	breq;
+};
+
+/* Iterate all the user files returned by a bulkstat. */
+static void
+scan_user_files(
+	struct workqueue	*wq,
+	xfs_agnumber_t		agno,
+	void			*arg)
+{
+	struct xfs_handle	handle;
+	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
+	struct user_bulkstat	*ureq = arg;
+	struct xfs_bulkstat	*bs = &ureq->breq.bulkstat[0];
+	struct scan_inodes	*si = ureq->si;
+	int			i;
+	int			error = 0;
+	DEFINE_DESCR(dsc_bulkstat, ctx, render_ino_from_bulkstat);
+
+	handle_from_fshandle(&handle, ctx->fshandle, ctx->fshandle_len);
+
+	for (i = 0; !si->aborted && i < ureq->breq.hdr.ocount; i++, bs++) {
+		descr_set(&dsc_bulkstat, bs);
+		handle_from_bulkstat(&handle, bs);
+		error = si->fn(ctx, &handle, bs, si->arg);
+		switch (error) {
+		case 0:
+			break;
+		case ESTALE:
+		case ECANCELED:
+			error = 0;
+			fallthrough;
+		default:
+			goto err;
+		}
+		if (scrub_excessive_errors(ctx)) {
+			si->aborted = true;
+			goto out;
+		}
+	}
+
+err:
+	if (error) {
+		str_liberror(ctx, error, descr_render(&dsc_bulkstat));
+		si->aborted = true;
+	}
+out:
+	free(ureq);
+}
+
+/*
+ * Run one step of the user files bulkstat scan and schedule background
+ * processing of the stat data returned.  Returns 1 to keep going, or 0 to
+ * stop.
+ */
+static int
+scan_user_bulkstat(
+	struct scrub_ctx	*ctx,
+	struct scan_inodes	*si,
+	uint64_t		*cursor)
+{
+	struct user_bulkstat	*ureq;
+	const char		*what = NULL;
+	int			ret;
+
+	ureq = calloc(1, sizeof(struct user_bulkstat) +
+			XFS_BULKSTAT_REQ_SIZE(LIBFROG_BULKSTAT_CHUNKSIZE));
+	if (!ureq) {
+		ret = ENOMEM;
+		what = _("creating bulkstat work item");
+		goto err;
+	}
+	ureq->si = si;
+	ureq->breq.hdr.icount = LIBFROG_BULKSTAT_CHUNKSIZE;
+	ureq->breq.hdr.ino = *cursor;
+
+	ret = -xfrog_bulkstat(&ctx->mnt, &ureq->breq);
+	if (ret) {
+		what = _("user files bulkstat");
+		goto err_ureq;
+	}
+	if (ureq->breq.hdr.ocount == 0) {
+		*cursor = NULLFSINO;
+		free(ureq);
+		return 0;
+	}
+
+	*cursor = ureq->breq.hdr.ino;
+
+	/* scan_user_files frees ureq; do not access it */
+	ret = -workqueue_add(&si->wq_bulkstat, scan_user_files, 0, ureq);
+	if (ret) {
+		what = _("queueing bulkstat work");
+		goto err_ureq;
+	}
+	ureq = NULL;
+
+	return 1;
+
+err_ureq:
+	free(ureq);
+err:
+	si->aborted = true;
+	str_liberror(ctx, ret, what);
+	return 0;
+}
+
+/*
+ * Scan all the user files in a filesystem in inumber order.  On error, this
+ * function will log an error message and return -1.
+ */
+int
+scrub_scan_user_files(
+	struct scrub_ctx	*ctx,
+	scrub_inode_iter_fn	fn,
+	void			*arg)
+{
+	struct scan_inodes	si = {
+		.fn		= fn,
+		.arg		= arg,
+		.nr_threads	= scrub_nproc_workqueue(ctx),
+	};
+	uint64_t		ino = 0;
+	int			ret;
+
+	/* Queue up to four bulkstat result sets per thread. */
+	ret = -workqueue_create_bound(&si.wq_bulkstat, (struct xfs_mount *)ctx,
+			si.nr_threads, si.nr_threads * 4);
+	if (ret) {
+		str_liberror(ctx, ret, _("creating bulkstat workqueue"));
+		return -1;
+	}
+
+	while ((ret = scan_user_bulkstat(ctx, &si, &ino)) == 1) {
+		/* empty */
+	}
+
+	ret = -workqueue_terminate(&si.wq_bulkstat);
+	if (ret) {
+		si.aborted = true;
+		str_liberror(ctx, ret, _("finishing bulkstat work"));
+	}
+	workqueue_destroy(&si.wq_bulkstat);
+
+	return si.aborted ? -1 : 0;
+}
+
 /* Open a file by handle, returning either the fd or -1 on error. */
 int
 scrub_open_handle(

diff --git a/scrub/inodes.h b/scrub/inodes.h
index 7a0b275e575ead..99b78fa1f76515 100644
--- a/scrub/inodes.h
+++ b/scrub/inodes.h
@@ -7,7 +7,7 @@
 #define XFS_SCRUB_INODES_H_
 
 /*
- * Visit each space mapping of an inode fork.  Return 0 to continue iteration
+ * Callback for each inode in a filesystem.  Return 0 to continue iteration
  * or a positive error code to interrupt iteraton.  If ESTALE is returned,
  * iteration will be restarted from the beginning of the inode allocation
  * group.  Any other non zero value will stop iteration.  The special return
@@ -23,6 +23,10 @@ typedef int (*scrub_inode_iter_fn)(struct scrub_ctx *ctx,
 int scrub_scan_all_inodes(struct scrub_ctx *ctx, scrub_inode_iter_fn fn,
 		unsigned int flags, void *arg);
 
+/* Scan all user-created files in the filesystem. */
+int scrub_scan_user_files(struct scrub_ctx *ctx, scrub_inode_iter_fn fn,
+		void *arg);
+
 int scrub_open_handle(struct xfs_handle *handle);
 
 #endif /* XFS_SCRUB_INODES_H_ */

diff --git a/scrub/phase5.c b/scrub/phase5.c
index 6460d00f30f4bd..577dda8064c3a8 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -882,7 +882,7 @@
 _("Filesystem has errors, skipping connectivity checks."));
 
 	pthread_mutex_init(&ncs.lock, NULL);
-	ret = scrub_scan_all_inodes(ctx, check_inode_names, 0, &ncs);
+	ret = scrub_scan_user_files(ctx, check_inode_names, &ncs);
 	if (ret)
 		goto out_lock;
 	if (ncs.aborted) {

diff --git a/scrub/phase6.c b/scrub/phase6.c
index 2695e645004bf1..9858b932f20de5 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -577,7 +577,7 @@ report_all_media_errors(
 		return ret;
 
 	/* Scan for unlinked files. */
-	return scrub_scan_all_inodes(ctx, report_inode_loss, 0, vs);
+	return scrub_scan_user_files(ctx, report_inode_loss, vs);
 }
 
 /* Schedule a read-verify of a (data block) extent.
 */

From patchwork Thu Feb 6 22:32:30 2025
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13963878
Date: Thu, 06 Feb 2025 14:32:30 -0800
Subject: [PATCH 07/17] xfs_scrub: remove flags argument from scrub_scan_all_inodes
From: "Darrick J. Wong"
To: djwong@kernel.org, aalbersh@kernel.org
Cc: hch@lst.de, linux-xfs@vger.kernel.org
Message-ID: <173888086167.2738568.69850505985022498.stgit@frogsfrogsfrogs>
In-Reply-To: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>
References: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

Now that there's only one caller of scrub_scan_all_inodes, remove the
single defined flag; the function itself can set the METADIR bulkstat
flag if needed.  Clarify in the documentation that this is a special
purpose inode iterator that picks up files an ordinary scan would not
see.

Signed-off-by: "Darrick J. Wong"
Reviewed-by: Christoph Hellwig
---
 scrub/inodes.c |   23 ++++++++++-------------
 scrub/inodes.h |    6 ++----
 scrub/phase3.c |    7 +------
 3 files changed, 13 insertions(+), 23 deletions(-)

diff --git a/scrub/inodes.c b/scrub/inodes.c
index 58969131628f8f..c32dfb624e3e95 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -57,7 +57,6 @@ bulkstat_for_inumbers(
 {
 	struct xfs_bulkstat	*bstat = breq->bulkstat;
 	struct xfs_bulkstat	*bs;
-	unsigned int		flags = 0;
 	int			i;
 	int			error;
 
@@ -72,9 +71,6 @@ bulkstat_for_inumbers(
 				strerror_r(error, errbuf, DESCR_BUFSZ));
 	}
 
-	if (breq->hdr.flags & XFS_BULK_IREQ_METADIR)
-		flags |= XFS_BULK_IREQ_METADIR;
-
 	/*
 	 * Check each of the stats we got back to make sure we got the inodes
 	 * we asked for.
@@ -89,7 +85,7 @@ bulkstat_for_inumbers(
 		/* Load the one inode.
 */
		error = -xfrog_bulkstat_single(&ctx->mnt,
-				inumbers->xi_startino + i, flags, bs);
+				inumbers->xi_startino + i, breq->hdr.flags, bs);
 		if (error || bs->bs_ino != inumbers->xi_startino + i) {
 			memset(bs, 0, sizeof(struct xfs_bulkstat));
 			bs->bs_ino = inumbers->xi_startino + i;
@@ -105,7 +101,6 @@ struct scan_inodes {
 	scrub_inode_iter_fn	fn;
 	void			*arg;
 	unsigned int		nr_threads;
-	unsigned int		flags;
 	bool			aborted;
 };
 
@@ -139,6 +134,7 @@ ichunk_to_bulkstat(
 
 static inline int
 alloc_ichunk(
+	struct scrub_ctx	*ctx,
 	struct scan_inodes	*si,
 	uint32_t		agno,
 	uint64_t		startino,
@@ -164,7 +160,9 @@ alloc_ichunk(
 
 	breq = ichunk_to_bulkstat(ichunk);
 	breq->hdr.icount = LIBFROG_BULKSTAT_CHUNKSIZE;
-	if (si->flags & SCRUB_SCAN_METADIR)
+
+	/* Scan the metadata directory tree too. */
+	if (ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_METADIR)
 		breq->hdr.flags |= XFS_BULK_IREQ_METADIR;
 
 	*ichunkp = ichunk;
@@ -302,7 +300,7 @@ scan_ag_inumbers(
 
 	descr_set(&dsc, &agno);
 
-	error = alloc_ichunk(si, agno, 0, &ichunk);
+	error = alloc_ichunk(ctx, si, agno, 0, &ichunk);
 	if (error)
 		goto err;
 	ireq = ichunk_to_inumbers(ichunk);
@@ -355,7 +353,7 @@ scan_ag_inumbers(
 	}
 
 	if (!ichunk) {
-		error = alloc_ichunk(si, agno, nextino, &ichunk);
+		error = alloc_ichunk(ctx, si, agno, nextino, &ichunk);
 		if (error)
 			goto err;
 	}
@@ -375,19 +373,18 @@ scan_ag_inumbers(
 }
 
 /*
- * Scan all the inodes in a filesystem.  On error, this function will log
- * an error message and return -1.
+ * Scan all the inodes in a filesystem, including metadata directory files and
+ * broken files.  On error, this function will log an error message and return
+ * -1.
 */
 int
 scrub_scan_all_inodes(
 	struct scrub_ctx	*ctx,
 	scrub_inode_iter_fn	fn,
-	unsigned int		flags,
 	void			*arg)
 {
 	struct scan_inodes	si = {
 		.fn		= fn,
-		.flags		= flags,
 		.arg		= arg,
 		.nr_threads	= scrub_nproc_workqueue(ctx),
 	};

diff --git a/scrub/inodes.h b/scrub/inodes.h
index 99b78fa1f76515..d68e94eb216895 100644
--- a/scrub/inodes.h
+++ b/scrub/inodes.h
@@ -17,11 +17,9 @@ typedef int (*scrub_inode_iter_fn)(struct scrub_ctx *ctx,
 		struct xfs_handle *handle, struct xfs_bulkstat *bs,
 		void *arg);
 
-/* Return metadata directories too. */
-#define SCRUB_SCAN_METADIR	(1 << 0)
-
+/* Scan every file in the filesystem, including metadir and corrupt ones. */
 int scrub_scan_all_inodes(struct scrub_ctx *ctx, scrub_inode_iter_fn fn,
-		unsigned int flags, void *arg);
+		void *arg);
 
 /* Scan all user-created files in the filesystem. */
 int scrub_scan_user_files(struct scrub_ctx *ctx, scrub_inode_iter_fn fn,

diff --git a/scrub/phase3.c b/scrub/phase3.c
index c90da78439425a..046a42c1da8beb 100644
--- a/scrub/phase3.c
+++ b/scrub/phase3.c
@@ -312,7 +312,6 @@ phase3_func(
 	struct scrub_inode_ctx	ictx = { .ctx = ctx };
 	uint64_t		val;
 	xfs_agnumber_t		agno;
-	unsigned int		scan_flags = 0;
 	int			err;
 
 	err = -ptvar_alloc(scrub_nproc(ctx), sizeof(struct action_list),
@@ -329,10 +328,6 @@ phase3_func(
 		goto out_ptvar;
 	}
 
-	/* Scan the metadata directory tree too. */
-	if (ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_METADIR)
-		scan_flags |= SCRUB_SCAN_METADIR;
-
 	/*
 	 * If we already have ag/fs metadata to repair from previous phases,
 	 * we would rather not try to repair file metadata until we've tried
@@ -343,7 +338,7 @@ phase3_func(
 		ictx.always_defer_repairs = true;
 	}
 
-	err = scrub_scan_all_inodes(ctx, scrub_inode, scan_flags, &ictx);
+	err = scrub_scan_all_inodes(ctx, scrub_inode, &ictx);
 	if (!err && ictx.aborted)
 		err = ECANCELED;
 	if (err)

From patchwork Thu Feb 6 22:32:46 2025
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13963879
Date: Thu, 06 Feb 2025 14:32:46 -0800
Subject: [PATCH 08/17] xfs_scrub: selectively re-run bulkstat after re-running inumbers
From: "Darrick J. Wong"
To: djwong@kernel.org, aalbersh@kernel.org
Cc: hch@lst.de, linux-xfs@vger.kernel.org
Message-ID: <173888086183.2738568.5501883032377295543.stgit@frogsfrogsfrogs>
In-Reply-To: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>
References: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

In the phase 3 inode scan, don't bother retrying the inumbers ->
bulkstat conversion unless inumbers returns the same startino and there
are allocated inodes.  If inumbers returns data for a totally different
inobt record, that means the whole inode chunk was freed.

Cc: # v5.18.0
Fixes: 245c72a6eeb720 ("xfs_scrub: balance inode chunk scan across CPUs")
Signed-off-by: "Darrick J. Wong"
Reviewed-by: Christoph Hellwig
---
 scrub/inodes.c |   19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/scrub/inodes.c b/scrub/inodes.c
index c32dfb624e3e95..8bdfa0b35d6172 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -60,6 +60,8 @@ bulkstat_for_inumbers(
 	int			i;
 	int			error;
 
+	assert(inumbers->xi_allocmask != 0);
+
 	/* First we try regular bulkstat, for speed.
 */
	breq->hdr.ino = inumbers->xi_startino;
 	breq->hdr.icount = inumbers->xi_alloccount;
@@ -246,11 +248,24 @@ scan_ag_bulkstat(
 	case ESTALE: {
 		stale_count++;
 		if (stale_count < 30) {
-			ireq->hdr.ino = inumbers->xi_startino;
+			uint64_t	old_startino;
+
+			ireq->hdr.ino = old_startino =
+				inumbers->xi_startino;
 			error = -xfrog_inumbers(&ctx->mnt, ireq);
 			if (error)
 				goto err;
-			goto retry;
+			/*
+			 * Retry only if inumbers returns the same
+			 * inobt record as the previous record and
+			 * there are allocated inodes in it.
+			 */
+			if (!si->aborted &&
+			    ireq->hdr.ocount > 0 &&
+			    inumbers->xi_alloccount > 0 &&
+			    inumbers->xi_startino == old_startino)
+				goto retry;
+			goto out;
 		}
 		str_info(ctx, descr_render(&dsc_bulkstat),
_("Changed too many times during scan; giving up."));

From patchwork Thu Feb 6 22:33:01 2025
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13963880
Date: Thu, 06 Feb 2025 14:33:01 -0800
Subject: [PATCH 09/17] xfs_scrub: actually iterate all the bulkstat records
From: "Darrick J. Wong"
To: djwong@kernel.org, aalbersh@kernel.org
Cc: hch@lst.de, linux-xfs@vger.kernel.org
Message-ID: <173888086198.2738568.10758609523199339681.stgit@frogsfrogsfrogs>
In-Reply-To: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>
References: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

In scan_ag_bulkstat, we have a for loop that iterates all the
xfs_bulkstat records in breq->bulkstat.  The loop condition test should
test against the array length, not the number of bits set in an
unrelated data structure.
If ocount > xi_alloccount then we miss some inodes; if ocount <
xi_alloccount then we've walked off the end of the array.

Cc: # v5.18.0
Fixes: 245c72a6eeb720 ("xfs_scrub: balance inode chunk scan across CPUs")
Signed-off-by: "Darrick J. Wong"
Reviewed-by: Christoph Hellwig
---
 scrub/inodes.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/scrub/inodes.c b/scrub/inodes.c
index 8bdfa0b35d6172..4d8b137a698004 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -216,7 +216,7 @@ scan_ag_bulkstat(
 	struct xfs_inumbers_req	*ireq = ichunk_to_inumbers(ichunk);
 	struct xfs_bulkstat_req	*breq = ichunk_to_bulkstat(ichunk);
 	struct scan_inodes	*si = ichunk->si;
-	struct xfs_bulkstat	*bs;
+	struct xfs_bulkstat	*bs = &breq->bulkstat[0];
 	struct xfs_inumbers	*inumbers = &ireq->inumbers[0];
 	uint64_t		last_ino = 0;
 	int			i;
@@ -231,8 +231,7 @@ scan_ag_bulkstat(
 	bulkstat_for_inumbers(ctx, &dsc_inumbers, inumbers, breq);
 
 	/* Iterate all the inodes. */
-	bs = &breq->bulkstat[0];
-	for (i = 0; !si->aborted && i < inumbers->xi_alloccount; i++, bs++) {
+	for (i = 0; !si->aborted && i < breq->hdr.ocount; i++, bs++) {
 		uint64_t	scan_ino = bs->bs_ino;
 
 		/* ensure forward progress if we retried */

From patchwork Thu Feb 6 22:33:17 2025
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13963881
Date: Thu, 06 Feb 2025 14:33:17 -0800
Subject: [PATCH 10/17] xfs_scrub: don't double-scan inodes during phase 3
From: "Darrick J. Wong"
To: djwong@kernel.org, aalbersh@kernel.org
Cc: hch@lst.de, linux-xfs@vger.kernel.org
Message-ID: <173888086213.2738568.8939791256440476361.stgit@frogsfrogsfrogs>
In-Reply-To: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>
References: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

The bulkstat ioctl only allows us to specify the starting inode number
and the length of the bulkstat array.  It is possible that a bulkstat
request for {startino = 30, icount = 10} will return stat data for inode
50.  For most bulkstat users this is ok because they're marching
linearly across all inodes in the filesystem.

Unfortunately for scrub phase 3 this is undesirable, because we only
want the inodes that belong to a specific inobt record; we need to know
about inodes that are marked as allocated but are too corrupt to appear
in the bulkstat output.  Another worker will process the inobt record(s)
that correspond to the extra inodes, which means we can double-scan some
inodes.

Therefore, bulkstat_for_inumbers should trim out inodes that don't
correspond to the inumbers record that it is given.

Cc: # v5.3.0
Fixes: e3724c8b82a320 ("xfs_scrub: refactor xfs_iterate_inodes_range_check")
Signed-off-by: "Darrick J.
Wong" Reviewed-by: Christoph Hellwig --- scrub/inodes.c | 28 ++++++++++++++++++++-------- 1 file changed, 20 insertions(+), 8 deletions(-) diff --git a/scrub/inodes.c b/scrub/inodes.c index 4d8b137a698004..a7ea24615e9255 100644 --- a/scrub/inodes.c +++ b/scrub/inodes.c @@ -50,15 +50,17 @@ */ static void bulkstat_for_inumbers( - struct scrub_ctx *ctx, - struct descr *dsc, - const struct xfs_inumbers *inumbers, - struct xfs_bulkstat_req *breq) + struct scrub_ctx *ctx, + struct descr *dsc, + const struct xfs_inumbers *inumbers, + struct xfs_bulkstat_req *breq) { - struct xfs_bulkstat *bstat = breq->bulkstat; - struct xfs_bulkstat *bs; - int i; - int error; + struct xfs_bulkstat *bstat = breq->bulkstat; + struct xfs_bulkstat *bs; + const uint64_t limit_ino = + inumbers->xi_startino + LIBFROG_BULKSTAT_CHUNKSIZE; + int i; + int error; assert(inumbers->xi_allocmask != 0); @@ -73,6 +75,16 @@ bulkstat_for_inumbers( strerror_r(error, errbuf, DESCR_BUFSZ)); } + /* + * Bulkstat might return inodes beyond xi_startino + CHUNKSIZE. Reduce + * ocount to ignore inodes not described by the inumbers record. + */ + for (i = breq->hdr.ocount - 1; i >= 0; i--) { + if (breq->bulkstat[i].bs_ino < limit_ino) + break; + breq->hdr.ocount--; + } + /* * Check each of the stats we got back to make sure we got the inodes * we asked for. From patchwork Thu Feb 6 22:33:33 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13963882 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9F5F123C380 for ; Thu, 6 Feb 2025 22:33:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738881213; cv=none; b=OEqjDyHQeuw9S2ZE6hu8ME6cvCOq/PtrJfE6kjDcofkmI2soZUQoZ7ouM+cBPjpSTkcHJ664o/Wtld8e6GLViK0kPBAvPyCVZwMxtZGt22fpYNLjW2YC3nalOotD4RSeFSV/UoiYKaUUXdZyeDRvvZ05vXsfmd8WU8tbNHzxavk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738881213; c=relaxed/simple; bh=P92t2ZOWqqm/hYmNlwwKrEbbLTc7CzU5KXRl09Kvwg4=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=F/9Fwf63SlKJ2157OCKJL0zDVItDZpOaZBwRasee9IFFffW94pyT08lpFX+uHZDUelnFjOxax4/GUjLXnKiKhaLRE8CeTNTEwQzH7zuDq0i2jkl3SWt8yGUoqetjUCRW05IpLj6wXvQGDAOwQL8VSI+sGtFVuRbzR67rTPH73KE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=jhI+um56; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="jhI+um56" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 73CD5C4CEDD; Thu, 6 Feb 2025 22:33:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1738881213; bh=P92t2ZOWqqm/hYmNlwwKrEbbLTc7CzU5KXRl09Kvwg4=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=jhI+um56zYi8QXK0xFnYqceI9fOA1FM2VX4A0m7Fn+lY/Qh+eXMwlHYXRpO6z34g3 qsa0jnI28B3QOTnx45rQN8sicpJg5Z62oZbsASDxTJR17Py52wuTXOktOTDGDrfjQX jrcPasqaC4Vc+l4/mfA3KxmPgMlIjOM682J9fUe2vOFItzRVx8a4KSqIrR9bdd/j09 
Date: Thu, 06 Feb 2025 14:33:33 -0800
Subject: [PATCH 11/17] xfs_scrub: don't (re)set the bulkstat request icount incorrectly
From: "Darrick J. Wong"
To: djwong@kernel.org, aalbersh@kernel.org
Cc: hch@lst.de, linux-xfs@vger.kernel.org
Message-ID: <173888086229.2738568.17046030028284704437.stgit@frogsfrogsfrogs>
In-Reply-To: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>
References: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

Don't change the bulkstat request icount in bulkstat_for_inumbers because
alloc_ichunk already set it to LIBFROG_BULKSTAT_CHUNKSIZE.  Lowering it to
xi_alloccount here means that we can miss inodes at the end of the inumbers
chunk if any are allocated to the same inobt record after the inumbers call
but before the bulkstat call.

Cc: # v5.3.0
Fixes: e3724c8b82a320 ("xfs_scrub: refactor xfs_iterate_inodes_range_check")
Signed-off-by: "Darrick J. Wong"
Reviewed-by: Christoph Hellwig
---
 scrub/inodes.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/scrub/inodes.c b/scrub/inodes.c
index a7ea24615e9255..4e4408f9ff2256 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -66,7 +66,6 @@ bulkstat_for_inumbers(

 	/* First we try regular bulkstat, for speed. */
 	breq->hdr.ino = inumbers->xi_startino;
-	breq->hdr.icount = inumbers->xi_alloccount;
 	error = -xfrog_bulkstat(&ctx->mnt, breq);
 	if (error) {
 		char errbuf[DESCR_BUFSZ];

From patchwork Thu Feb 6 22:33:48 2025
X-Patchwork-Submitter: "Darrick J.
Wong" X-Patchwork-Id: 13963883 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3FB591A9B3F for ; Thu, 6 Feb 2025 22:33:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738881229; cv=none; b=FWJdf7LvY60mz4ZIZNrERuGHFlAh1Ndpg9S48O7U8KLhcKdBmhz7yaVWkQ6xJ3cAgyBBYuceiUcFM+c1BbQZoMV5Nf4H0idPo9L8RgtFf/4RPOS+qNY87XosRxCsCZecQNYEeYDcYs7sOJ9uMEL29RgFyKQ3zkTfYLgkYRMDqwo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738881229; c=relaxed/simple; bh=usi1lkw+EEgJnxDq/uYMS83/hq4V9riqP0zRXIlkt1E=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=sYMljl8iDwp4l34IKTkICs9n5mN5O/ooQFc3viJG+KHzg6pwj65uMQsHPbELPw0G7GOF23o2HIqWPiKDt33eywdXk7mhHxafHvOgD2yHDuWL8z95/9SquOoEeWbQXM1loNwaF96GAC/WB2h5ULtlqcPOjQqkbTkZQQRLmUPsavY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=dII+C/lY; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="dII+C/lY" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 179B9C4CEDD; Thu, 6 Feb 2025 22:33:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1738881229; bh=usi1lkw+EEgJnxDq/uYMS83/hq4V9riqP0zRXIlkt1E=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=dII+C/lYGvQ7yl+zbY5xH1pa5fNLiwaCpYzpZCSduZXsFFfS4MOwxKS+V+PTXvVB8 kAGHPqa58jWmvfBsFCWPNmfLlMshiKixnNH42Jabkq6nanDe17QOfPetImWNTeFBUZ TSgYXatzkXA20wyK0llE3dOkXx/p1MG9FBD9zRj2Exboba3oJ5PDk5KUK8MVaye0eE 
Date: Thu, 06 Feb 2025 14:33:48 -0800
Subject: [PATCH 12/17] xfs_scrub: don't complain if bulkstat fails
From: "Darrick J. Wong"
To: djwong@kernel.org, aalbersh@kernel.org
Cc: hch@lst.de, linux-xfs@vger.kernel.org
Message-ID: <173888086244.2738568.15432642060089262298.stgit@frogsfrogsfrogs>
In-Reply-To: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>
References: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

If bulkstat fails, we fall back to loading the bulkstat array one element at
a time.  There's no reason to log errors.

Signed-off-by: "Darrick J. Wong"
Reviewed-by: Christoph Hellwig
---
 scrub/inodes.c |   11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/scrub/inodes.c b/scrub/inodes.c
index 4e4408f9ff2256..4d3ec07b2d9862 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -51,7 +51,6 @@
 static void
 bulkstat_for_inumbers(
 	struct scrub_ctx	*ctx,
-	struct descr		*dsc,
 	const struct xfs_inumbers *inumbers,
 	struct xfs_bulkstat_req	*breq)
 {
@@ -66,13 +65,7 @@ bulkstat_for_inumbers(

 	/* First we try regular bulkstat, for speed. */
 	breq->hdr.ino = inumbers->xi_startino;
-	error = -xfrog_bulkstat(&ctx->mnt, breq);
-	if (error) {
-		char errbuf[DESCR_BUFSZ];
-
-		str_info(ctx, descr_render(dsc), "%s",
-				strerror_r(error, errbuf, DESCR_BUFSZ));
-	}
+	xfrog_bulkstat(&ctx->mnt, breq);

 	/*
 	 * Bulkstat might return inodes beyond xi_startino + CHUNKSIZE.
@@ -239,7 +232,7 @@ scan_ag_bulkstat(
 	descr_set(&dsc_inumbers, &agno);
 	handle_from_fshandle(&handle, ctx->fshandle, ctx->fshandle_len);
retry:
-	bulkstat_for_inumbers(ctx, &dsc_inumbers, inumbers, breq);
+	bulkstat_for_inumbers(ctx, inumbers, breq);

 	/* Iterate all the inodes. */
 	for (i = 0; !si->aborted && i < breq->hdr.ocount; i++, bs++) {

From patchwork Thu Feb 6 22:34:04 2025
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13963884
Date: Thu, 06 Feb 2025 14:34:04 -0800
Subject: [PATCH 13/17] xfs_scrub: return early from bulkstat_for_inumbers if no bulkstat data
From: "Darrick J. Wong"
To: djwong@kernel.org, aalbersh@kernel.org
Cc: hch@lst.de, linux-xfs@vger.kernel.org
Message-ID: <173888086259.2738568.15642483253868951064.stgit@frogsfrogsfrogs>
In-Reply-To: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>
References: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

If bulkstat doesn't return an error code or any bulkstat records, we've hit
the end of the filesystem, so return early.  This can happen if the inumbers
data came from the very last inobt record in the filesystem and every inode
in that inobt record is freed immediately after INUMBERS.  There's no bug
here, it's just a minor optimization.

Signed-off-by: "Darrick J. Wong"
Reviewed-by: Christoph Hellwig
---
 scrub/inodes.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/scrub/inodes.c b/scrub/inodes.c
index 4d3ec07b2d9862..3b9026ce8fa2f4 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -65,7 +65,9 @@ bulkstat_for_inumbers(

 	/* First we try regular bulkstat, for speed. */
 	breq->hdr.ino = inumbers->xi_startino;
-	xfrog_bulkstat(&ctx->mnt, breq);
+	error = -xfrog_bulkstat(&ctx->mnt, breq);
+	if (!error && !breq->hdr.ocount)
+		return;

 	/*
 	 * Bulkstat might return inodes beyond xi_startino + CHUNKSIZE.  Reduce

From patchwork Thu Feb 6 22:34:19 2025
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13963885
Date: Thu, 06 Feb 2025 14:34:19 -0800
Subject: [PATCH 14/17] xfs_scrub: don't blow away new inodes in bulkstat_single_step
From: "Darrick J. Wong"
To: djwong@kernel.org, aalbersh@kernel.org
Cc: hch@lst.de, linux-xfs@vger.kernel.org
Message-ID: <173888086274.2738568.5398591109789938783.stgit@frogsfrogsfrogs>
In-Reply-To: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>
References: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

bulkstat_single_step has an ugly misfeature -- given the inumbers record, it
expects to find bulkstat data for those inodes, in the exact order that they
were specified in inumbers.  If a new inode is created after inumbers but
before bulkstat, bulkstat will return stat data for that inode, only to have
bulkstat_single_step obliterate it.  Then we fail to scan that inode.

Instead, we should use the returned bulkstat array to compute a bitmask of
inodes that bulkstat had to have seen while it was walking the inobt.  An
important detail is that any inode between the @ino parameter passed to
bulkstat and the last bulkstat record it returns was seen, even if no bstat
record was produced.  Any inode set in xi_allocmask but not set in the
seen_mask is missing and needs to be loaded.  Load bstat data for those
inodes into the /end/ of the array so that we don't obliterate bstat data
for a newly created inode, then re-sort the array so we always scan in
ascending inumber order.

Cc: # v5.18.0
Fixes: 245c72a6eeb720 ("xfs_scrub: balance inode chunk scan across CPUs")
Signed-off-by: "Darrick J.
Wong" Reviewed-by: Christoph Hellwig --- scrub/inodes.c | 144 ++++++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 123 insertions(+), 21 deletions(-) diff --git a/scrub/inodes.c b/scrub/inodes.c index 3b9026ce8fa2f4..ffdf0f2ae42c17 100644 --- a/scrub/inodes.c +++ b/scrub/inodes.c @@ -26,16 +26,41 @@ * * This is a little more involved than repeatedly asking BULKSTAT for a * buffer's worth of stat data for some number of inodes. We want to scan as - * many of the inodes that the inobt thinks there are, including the ones that - * are broken, but if we ask for n inodes starting at x, it'll skip the bad - * ones and fill from beyond the range (x + n). - * - * Therefore, we ask INUMBERS to return one inobt chunk's worth of inode - * bitmap information. Then we try to BULKSTAT only the inodes that were - * present in that chunk, and compare what we got against what INUMBERS said - * was there. If there's a mismatch, we know that we have an inode that fails - * the verifiers but we can inject the bulkstat information to force the scrub - * code to deal with the broken inodes. + * many of the inodes that the inobt thinks there are, so we use the INUMBERS + * ioctl to walk all the inobt records in the filesystem and spawn a worker to + * bulkstat and iterate. The worker starts with an inumbers record that can + * look like this: + * + * {startino = S, allocmask = 0b11011} + * + * Given a starting inumber S and count C=64, bulkstat will return a sorted + * array of stat information. The bs_ino of those array elements can look like + * any of the following: + * + * 0. [S, S+1, S+3, S+4] + * 1. [S+e, S+e+1, S+e+3, S+e+4, S+e+C+1...], where e >= 0 + * 2. [S+e+n], where n >= 0 + * 3. [] + * 4. [], errno == EFSCORRUPTED + * + * We know that bulkstat scanned the entire inode range between S and bs_ino of + * the last array element, even though it only fills out an array element for + * allocated inodes. 
Therefore, we can say in cases 0-2 that S was filled, + * even if there is no bstat[] record for S. In turn, we can create a bitmask + * of inodes that we have seen, and set bits 0 through (bstat[-1].bs_ino - S), + * being careful not to set any bits past S+C. + * + * In case (0) we find that seen mask matches the inumber record + * exactly, so the caller can walk the stat records and move on. In case (1) + * this is also true, but we must be careful to reduce the array length to + * avoid scanning inodes that are not in the inumber chunk. In case (3) we + * conclude that there were no inodes left to scan and terminate. + * + * Inodes that are set in the allocmask but not set in the seen mask are the + * corrupt inodes. For each of these cases, we try to populate the bulkstat + * array one inode at a time. If the kernel returns a matching record we can + * use it; if instead we receive an error, we synthesize enough of a record + * to be able to run online scrub by handle. * * If the iteration function returns ESTALE, that means that the inode has * been deleted and possibly recreated since the BULKSTAT call. We wil @@ -43,6 +68,57 @@ * the staleness as an error. */ +/* + * Return the inumber of the highest inode in the bulkstat data, assuming the + * records are sorted in inumber order. + */ +static inline uint64_t last_bstat_ino(const struct xfs_bulkstat_req *b) +{ + return b->hdr.ocount ? b->bulkstat[b->hdr.ocount - 1].bs_ino : 0; +} + +/* + * Deduce the bitmask of the inodes in inums that were seen by bulkstat. If + * the inode is present in the bstat array this is trivially true; or if it is + * not in the array but higher inumbers are present, then it was freed. 
+ */
+static __u64
+seen_mask_from_bulkstat(
+	const struct xfs_inumbers	*inums,
+	__u64				breq_startino,
+	const struct xfs_bulkstat_req	*breq)
+{
+	const __u64	limit_ino =
+		inums->xi_startino + LIBFROG_BULKSTAT_CHUNKSIZE;
+	const __u64	last = last_bstat_ino(breq);
+	__u64		ret = 0;
+	int		i, maxi;
+
+	/* Ignore the bulkstat results if they don't cover inumbers */
+	if (breq_startino > limit_ino || last < inums->xi_startino)
+		return 0;
+
+	maxi = min(LIBFROG_BULKSTAT_CHUNKSIZE, last - inums->xi_startino + 1);
+	for (i = breq_startino - inums->xi_startino; i < maxi; i++)
+		ret |= 1ULL << i;
+
+	return ret;
+}
+
+#define cmp_int(l, r)	((l > r) - (l < r))
+
+/* Compare two bulkstat records by inumber. */
+static int
+compare_bstat(
+	const void		*a,
+	const void		*b)
+{
+	const struct xfs_bulkstat	*ba = a;
+	const struct xfs_bulkstat	*bb = b;
+
+	return cmp_int(ba->bs_ino, bb->bs_ino);
+}
+
 /*
  * Run bulkstat on an entire inode allocation group, then check that we got
  * exactly the inodes we expected.  If not, load them one at a time (or fake
  * it) into the bulkstat data.
@@ -54,10 +130,10 @@ bulkstat_for_inumbers(
 	const struct xfs_inumbers *inumbers,
 	struct xfs_bulkstat_req	*breq)
 {
-	struct xfs_bulkstat	*bstat = breq->bulkstat;
-	struct xfs_bulkstat	*bs;
+	struct xfs_bulkstat	*bs = NULL;
 	const uint64_t		limit_ino =
 		inumbers->xi_startino + LIBFROG_BULKSTAT_CHUNKSIZE;
+	uint64_t		seen_mask = 0;
 	int			i;
 	int			error;
@@ -66,8 +142,12 @@ bulkstat_for_inumbers(
 	/* First we try regular bulkstat, for speed. */
 	breq->hdr.ino = inumbers->xi_startino;
 	error = -xfrog_bulkstat(&ctx->mnt, breq);
-	if (!error && !breq->hdr.ocount)
-		return;
+	if (!error) {
+		if (!breq->hdr.ocount)
+			return;
+		seen_mask |= seen_mask_from_bulkstat(inumbers,
+				inumbers->xi_startino, breq);
+	}

 	/*
 	 * Bulkstat might return inodes beyond xi_startino + CHUNKSIZE.  Reduce
@@ -80,18 +160,33 @@
 	}

 	/*
-	 * Check each of the stats we got back to make sure we got the inodes
-	 * we asked for.
+	 * Walk the xi_allocmask looking for set bits that aren't present in
+	 * the fill mask.  For each such inode, fill the entries at the end of
+	 * the array with stat information one at a time, synthesizing them if
+	 * necessary.  At this point, (xi_allocmask & ~seen_mask) should be the
+	 * corrupt inodes.
 	 */
-	for (i = 0, bs = bstat; i < LIBFROG_BULKSTAT_CHUNKSIZE; i++) {
+	for (i = 0; i < LIBFROG_BULKSTAT_CHUNKSIZE; i++) {
+		/*
+		 * Don't single-step if inumbers said it wasn't allocated or
+		 * bulkstat actually filled it.
+		 */
 		if (!(inumbers->xi_allocmask & (1ULL << i)))
 			continue;
-		if (bs->bs_ino == inumbers->xi_startino + i) {
-			bs++;
+		if (seen_mask & (1ULL << i))
 			continue;
-		}

-		/* Load the one inode. */
+		assert(breq->hdr.ocount < LIBFROG_BULKSTAT_CHUNKSIZE);
+
+		if (!bs)
+			bs = &breq->bulkstat[breq->hdr.ocount];
+
+		/*
+		 * Didn't get desired stat data and we've hit the end of the
+		 * returned data.  We can't distinguish between the inode being
+		 * freed vs. the inode being to corrupt to load, so try a
+		 * bulkstat single to see if we can load the inode.
+		 */
 		error = -xfrog_bulkstat_single(&ctx->mnt,
 				inumbers->xi_startino + i, breq->hdr.flags, bs);
 		if (error || bs->bs_ino != inumbers->xi_startino + i) {
@@ -99,8 +194,15 @@
 			bs->bs_ino = inumbers->xi_startino + i;
 			bs->bs_blksize = ctx->mnt_sv.f_frsize;
 		}
+
+		breq->hdr.ocount++;
 		bs++;
 	}
+
+	/* If we added any entries, re-sort the array. */
+	if (bs)
+		qsort(breq->bulkstat, breq->hdr.ocount,
+				sizeof(struct xfs_bulkstat), compare_bstat);
 }

 /* BULKSTAT wrapper routines. */

From patchwork Thu Feb 6 22:34:35 2025
X-Patchwork-Submitter: "Darrick J.
Wong" X-Patchwork-Id: 13963887 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8C6CC23C380 for ; Thu, 6 Feb 2025 22:34:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738881276; cv=none; b=aTVs/SAJKvI+0HFpH2f5D7clhulFipoj2O7/TWXci78da3tb5s5vJFCFm5nFmiv1CgtRhR0skuE+PkX77+ohYmdM5tzfy1rDScZLRgbFc0icckZSktUaPAJoeM4FXm+1yljpPWy/CeCfcH5IAimh/POYqUARN2RdeqLmJXzKsH4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738881276; c=relaxed/simple; bh=K+QiUzO24GNyuAZCg55P8Vwd8q2Ao1Mw+PKyDl27gCM=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=ut/bK8ARZvw0OHz1EEbfllR2pT2ZmpmfEBMDdHTLqm9gZaaP3VJ/9VHiyWOMxoyYPP238TUsPG5fp6xd5mOOrXxla3+X/R2mSdpiTOyKVm2AkeZhRu9D/rEbOheYQUMPa5w3mTPgo0LIFcjeOoaHUrIb62ZhjqhgsEXQMFh9t0I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=kgmFP6Vq; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="kgmFP6Vq" Received: by smtp.kernel.org (Postfix) with ESMTPSA id F2D60C4CEDD; Thu, 6 Feb 2025 22:34:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1738881276; bh=K+QiUzO24GNyuAZCg55P8Vwd8q2Ao1Mw+PKyDl27gCM=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=kgmFP6VqMY7QURbtbOUv7/uqZa3zAaKBjbIKg1R1ooRlhfQIiVKv6NMiKRCVdcLq2 T8CCuUVopVljCwsvSHB3FdJhU+aB5XurT+3XE57vh5vMcMOxmYznFaz33i4QsZV8gD HlrM63ftLfkA+LtEEF8EMzcHVdN63ljEMraxiMW9r0HyglgbyhEk5OAs5Ye0KFgGL3 
Date: Thu, 06 Feb 2025 14:34:35 -0800
Subject: [PATCH 15/17] xfs_scrub: hoist the phase3 bulkstat single stepping code
From: "Darrick J. Wong"
To: djwong@kernel.org, aalbersh@kernel.org
Cc: hch@lst.de, linux-xfs@vger.kernel.org
Message-ID: <173888086287.2738568.12350824518838304954.stgit@frogsfrogsfrogs>
In-Reply-To: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>
References: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

We're about to make the bulkstat single step loading code more complex, so
hoist it into a separate function.

Signed-off-by: "Darrick J. Wong"
Reviewed-by: Christoph Hellwig
---
 scrub/inodes.c |   89 +++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 53 insertions(+), 36 deletions(-)

diff --git a/scrub/inodes.c b/scrub/inodes.c
index ffdf0f2ae42c17..84696a5bcda7d1 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -120,52 +120,23 @@ compare_bstat(
 }

 /*
- * Run bulkstat on an entire inode allocation group, then check that we got
- * exactly the inodes we expected.  If not, load them one at a time (or fake
- * it) into the bulkstat data.
+ * Walk the xi_allocmask looking for set bits that aren't present in
+ * the fill mask.  For each such inode, fill the entries at the end of
+ * the array with stat information one at a time, synthesizing them if
+ * necessary.  At this point, (xi_allocmask & ~seen_mask) should be the
+ * corrupt inodes.
  */
 static void
-bulkstat_for_inumbers(
+bulkstat_single_step(
 	struct scrub_ctx	*ctx,
 	const struct xfs_inumbers *inumbers,
+	uint64_t		seen_mask,
 	struct xfs_bulkstat_req	*breq)
 {
 	struct xfs_bulkstat	*bs = NULL;
-	const uint64_t		limit_ino =
-		inumbers->xi_startino + LIBFROG_BULKSTAT_CHUNKSIZE;
-	uint64_t		seen_mask = 0;
 	int			i;
 	int			error;

-	assert(inumbers->xi_allocmask != 0);
-
-	/* First we try regular bulkstat, for speed. */
-	breq->hdr.ino = inumbers->xi_startino;
-	error = -xfrog_bulkstat(&ctx->mnt, breq);
-	if (!error) {
-		if (!breq->hdr.ocount)
-			return;
-		seen_mask |= seen_mask_from_bulkstat(inumbers,
-				inumbers->xi_startino, breq);
-	}
-
-	/*
-	 * Bulkstat might return inodes beyond xi_startino + CHUNKSIZE.  Reduce
-	 * ocount to ignore inodes not described by the inumbers record.
-	 */
-	for (i = breq->hdr.ocount - 1; i >= 0; i--) {
-		if (breq->bulkstat[i].bs_ino < limit_ino)
-			break;
-		breq->hdr.ocount--;
-	}
-
-	/*
-	 * Walk the xi_allocmask looking for set bits that aren't present in
-	 * the fill mask.  For each such inode, fill the entries at the end of
-	 * the array with stat information one at a time, synthesizing them if
-	 * necessary.  At this point, (xi_allocmask & ~seen_mask) should be the
-	 * corrupt inodes.
-	 */
 	for (i = 0; i < LIBFROG_BULKSTAT_CHUNKSIZE; i++) {
 		/*
 		 * Don't single-step if inumbers said it wasn't allocated or
@@ -205,6 +176,52 @@
 				sizeof(struct xfs_bulkstat), compare_bstat);
 }

+/*
+ * Run bulkstat on an entire inode allocation group, then check that we got
+ * exactly the inodes we expected.  If not, load them one at a time (or fake
+ * it) into the bulkstat data.
+ */
+static void
+bulkstat_for_inumbers(
+	struct scrub_ctx	*ctx,
+	const struct xfs_inumbers *inumbers,
+	struct xfs_bulkstat_req	*breq)
+{
+	const uint64_t		limit_ino =
+		inumbers->xi_startino + LIBFROG_BULKSTAT_CHUNKSIZE;
+	uint64_t		seen_mask = 0;
+	int			i;
+	int			error;
+
+	assert(inumbers->xi_allocmask != 0);
+
+	/* First we try regular bulkstat, for speed. */
+	breq->hdr.ino = inumbers->xi_startino;
+	error = -xfrog_bulkstat(&ctx->mnt, breq);
+	if (!error) {
+		if (!breq->hdr.ocount)
+			return;
+		seen_mask |= seen_mask_from_bulkstat(inumbers,
+				inumbers->xi_startino, breq);
+	}
+
+	/*
+	 * Bulkstat might return inodes beyond xi_startino + CHUNKSIZE.  Reduce
+	 * ocount to ignore inodes not described by the inumbers record.
+	 */
+	for (i = breq->hdr.ocount - 1; i >= 0; i--) {
+		if (breq->bulkstat[i].bs_ino < limit_ino)
+			break;
+		breq->hdr.ocount--;
+	}
+
+	/*
+	 * Fill in any missing inodes that are mentioned in the alloc mask but
+	 * weren't previously seen by bulkstat.
+	 */
+	bulkstat_single_step(ctx, inumbers, seen_mask, breq);
+}
+
 /* BULKSTAT wrapper routines. */
 struct scan_inodes {
 	struct workqueue	wq_bulkstat;

From patchwork Thu Feb 6 22:34:51 2025
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13963888
Date: Thu, 06 Feb 2025 14:34:51 -0800
Subject: [PATCH 16/17] xfs_scrub: ignore freed inodes when single-stepping during phase 3
From: "Darrick J. Wong"
To: djwong@kernel.org, aalbersh@kernel.org
Cc: hch@lst.de, linux-xfs@vger.kernel.org
Message-ID: <173888086302.2738568.11012690239317955502.stgit@frogsfrogsfrogs>
In-Reply-To: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>
References: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

For inodes that inumbers told us were allocated but weren't loaded by the
bulkstat call, we fall back to loading bulkstat data one inode at a time to
try to find the inodes that are too corrupt to load.  However, there are a
couple of outcomes of the single bulkstat call that clearly indicate that
the inode is free, not corrupt.  In this case, the phase 3 inode scan will
try to scrub the inode, only to be told ENOENT because it doesn't exist.
As an optimization here, don't increment ocount, just move on to the next
inode in the mask.

Signed-off-by: "Darrick J. Wong"
Reviewed-by: Christoph Hellwig
---
 scrub/inodes.c |   26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/scrub/inodes.c b/scrub/inodes.c
index 84696a5bcda7d1..24a1dcab94c22d 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -160,10 +160,34 @@ bulkstat_single_step(
 		 */
 		error = -xfrog_bulkstat_single(&ctx->mnt,
 				inumbers->xi_startino + i, breq->hdr.flags, bs);
-		if (error || bs->bs_ino != inumbers->xi_startino + i) {
+		switch (error) {
+		case ENOENT:
+			/*
+			 * This inode wasn't found, and no results were
+			 * returned.  We've likely hit the end of the
+			 * filesystem, but we'll move on to the next inode in
+			 * the mask for the sake of caution.
+			 */
+			continue;
+		case 0:
+			/*
+			 * If a result was returned but it wasn't the inode
+			 * we were looking for, then the missing inode was
+			 * freed.  Move on to the next inode in the mask.
+			 */
+			if (bs->bs_ino != inumbers->xi_startino + i)
+				continue;
+			break;
+		default:
+			/*
+			 * Some error happened.  Synthesize a bulkstat record
+			 * so that phase3 can try to see if there's a corrupt
+			 * inode that needs repairing.
+			 */
 			memset(bs, 0, sizeof(struct xfs_bulkstat));
 			bs->bs_ino = inumbers->xi_startino + i;
 			bs->bs_blksize = ctx->mnt_sv.f_frsize;
+			break;
 		}

 		breq->hdr.ocount++;

From patchwork Thu Feb 6 22:35:06 2025
X-Patchwork-Submitter: "Darrick J.
Wong" X-Patchwork-Id: 13963889 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C176C23C380 for ; Thu, 6 Feb 2025 22:35:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738881307; cv=none; b=LakXTLKC71oO7MCLDabFs5G1U2VTJoP0jJMFkMLu6jsyqIEex912ogi1oe0PsP4+GJyoCpgRTYxRDo8PjBfLv1FD2jwreq2f/NajNT4lwuKGKwD2IEh3Q5YQ3Nq5AWt4NVRsDgyIR0vEZBo0KulzGLak90QP3CghTtnMFB6zsHc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738881307; c=relaxed/simple; bh=1ZI6SBq5e4VMF7WCR4KuSUIlVRiklt4AHPT/rURa1qs=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=tfWvjZLR2Y3jg7FcBtW3r7cl+HaVENU1ZjjvpUUiRdj1qinisZAADzsIHROfydLKXbO2Li2pAenXLOxWYjdTyXb64jjsDWOGZFYkBp9h5fvMCSSDpmhJIvCoRknznzLE76imlQXFnuOXP8gD9bfrDIgwOIGQ93BB7Vlkn9kMqmE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=rkM814fM; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="rkM814fM" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 37E8CC4CEDD; Thu, 6 Feb 2025 22:35:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1738881307; bh=1ZI6SBq5e4VMF7WCR4KuSUIlVRiklt4AHPT/rURa1qs=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=rkM814fM5nFd2K3cP+MOIUIKz0kLisHTFC5GqCupFPMH/M5DYIgYgL1KASwo0jdzh 3IFY3J1W/Mo160u01+i19q0V5AKh/3n62HK91rLPODpa74wfwbCkLrTgQYSHHWHu9h IlljoQtPVYBuC34gilAx40QwkoS+PYxQIEY3VQf1qVHATixrQGHEyiRPjjxzKO9YWj 
vHMDDiZui5SpyqRYat0aaqy7Cp8Rn5lX1hFbY9mcZLldLeJRkIvmdVkILdhaixmczh wCStkO3BstI46vfn1XQPNqqVyQAuD//bt7UyIgkGy9Q+hazss6DVErsgVAiBMolMYo OME59R/AsdO4Q== Date: Thu, 06 Feb 2025 14:35:06 -0800 Subject: [PATCH 17/17] xfs_scrub: try harder to fill the bulkstat array with bulkstat() From: "Darrick J. Wong" To: djwong@kernel.org, aalbersh@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173888086317.2738568.6808179914591920294.stgit@frogsfrogsfrogs> In-Reply-To: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs> References: <173888086034.2738568.15125078367450007162.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Sometimes, the last bulkstat record returned by the first xfrog_bulkstat call in bulkstat_for_inumbers will contain an inumber less than the highest allocated inode mentioned in the inumbers record. This happens either because the inodes have been freed, or because the the kernel encountered a corrupt inode during bulkstat and stopped filling up the array. In both cases, we can call bulkstat again to try to fill up the rest of the array. If there are newly allocated inodes, they'll be returned; if we've truly hit the end of the filesystem, the kernel will return zero records; and if the first allocated inode is indeed corrupt, the kernel will return EFSCORRUPTED. As an optimization to avoid the single-step code, call bulkstat with an increasing ino parameter until the bulkstat array is full or the kernel tells us there are no bulkstat records to return. This speeds things up a bit in cases where the allocmask is all ones and only the second inode is corrupt. Signed-off-by: "Darrick J. 
Wong" Reviewed-by: Christoph Hellwig --- libfrog/bitmask.h | 6 +++ scrub/inodes.c | 110 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 115 insertions(+), 1 deletion(-) diff --git a/libfrog/bitmask.h b/libfrog/bitmask.h index 719a6bfd29db38..47e39a1e09d002 100644 --- a/libfrog/bitmask.h +++ b/libfrog/bitmask.h @@ -42,4 +42,10 @@ static inline int test_and_set_bit(int nr, volatile unsigned long *addr) return 0; } +/* Get high bit set out of 64-bit argument, -1 if none set */ +static inline int xfrog_highbit64(uint64_t v) +{ + return fls64(v) - 1; +} + #endif /* __LIBFROG_BITMASK_H_ */ diff --git a/scrub/inodes.c b/scrub/inodes.c index 24a1dcab94c22d..2f3c87be79f783 100644 --- a/scrub/inodes.c +++ b/scrub/inodes.c @@ -20,6 +20,8 @@ #include "libfrog/fsgeom.h" #include "libfrog/bulkstat.h" #include "libfrog/handle_priv.h" +#include "bitops.h" +#include "libfrog/bitmask.h" /* * Iterate a range of inodes. @@ -56,6 +58,15 @@ * avoid scanning inodes that are not in the inumber chunk. In case (3) we * conclude that there were no inodes left to scan and terminate. * + * In case (2) and (4) we don't know why bulkstat returned fewer than C + * elements. We might have found the end of the filesystem, or the kernel + * might have found a corrupt inode and stopped. This we must investigate by + * trying to fill out the rest of the bstat array starting with the next + * inumber after the last bstat array element filled, and continuing until S' + * is beyond S0 + C, or the array is full. Each time we succeed in loading + * new records, the kernel increases S' for us; if instead we encounter case + * (4), we can increment S' ourselves. + * * Inodes that are set in the allocmask but not set in the seen mask are the * corrupt inodes. For each of these cases, we try to populate the bulkstat * array one inode at a time. 
 If the kernel returns a matching record we can
@@ -105,6 +116,87 @@ seen_mask_from_bulkstat(
 	return ret;
 }
 
+/*
+ * Try to fill the rest of orig_breq with bulkstat data by re-running bulkstat
+ * with increasing start_ino until we either hit the end of the inumbers info
+ * or fill up the bstat array with something.  Returns a bitmask of the inodes
+ * within inums that were filled by the bulkstat requests.
+ */
+static __u64
+bulkstat_the_rest(
+	struct scrub_ctx		*ctx,
+	const struct xfs_inumbers	*inums,
+	struct xfs_bulkstat_req		*orig_breq,
+	int				orig_error)
+{
+	struct xfs_bulkstat_req		*new_breq;
+	struct xfs_bulkstat		*old_bstat =
+		&orig_breq->bulkstat[orig_breq->hdr.ocount];
+	const __u64			limit_ino =
+		inums->xi_startino + LIBFROG_BULKSTAT_CHUNKSIZE;
+	__u64				start_ino = orig_breq->hdr.ino;
+	__u64				seen_mask = 0;
+	int				error;
+
+	assert(orig_breq->hdr.ocount < orig_breq->hdr.icount);
+
+	/*
+	 * If the first bulkstat returned a corruption error, that means
+	 * start_ino is corrupt.  Restart instead at the next inumber.
+	 */
+	if (orig_error == EFSCORRUPTED)
+		start_ino++;
+	if (start_ino >= limit_ino)
+		return 0;
+
+	error = -xfrog_bulkstat_alloc_req(
+			orig_breq->hdr.icount - orig_breq->hdr.ocount,
+			start_ino, &new_breq);
+	if (error)
+		return error;
+	new_breq->hdr.flags = orig_breq->hdr.flags;
+
+	do {
+		/*
+		 * Fill the new bulkstat request with stat data starting at
+		 * start_ino.
+		 */
+		error = -xfrog_bulkstat(&ctx->mnt, new_breq);
+		if (error == EFSCORRUPTED) {
+			/*
+			 * start_ino is corrupt, increment and try the next
+			 * inode.
+			 */
+			start_ino++;
+			new_breq->hdr.ino = start_ino;
+			continue;
+		}
+		if (error) {
+			/*
+			 * Any other error means the caller falls back to
+			 * single stepping.
+			 */
+			break;
+		}
+		if (new_breq->hdr.ocount == 0)
+			break;
+
+		/* Copy new results to the original bstat buffer */
+		memcpy(old_bstat, new_breq->bulkstat,
+				new_breq->hdr.ocount * sizeof(struct xfs_bulkstat));
+		orig_breq->hdr.ocount += new_breq->hdr.ocount;
+		old_bstat += new_breq->hdr.ocount;
+		seen_mask |= seen_mask_from_bulkstat(inums, start_ino,
+				new_breq);
+
+		new_breq->hdr.icount -= new_breq->hdr.ocount;
+		start_ino = new_breq->hdr.ino;
+	} while (new_breq->hdr.icount > 0 && new_breq->hdr.ino < limit_ino);
+
+	free(new_breq);
+	return seen_mask;
+}
+
 #define cmp_int(l, r)		((l > r) - (l < r))
 
 /* Compare two bulkstat records by inumber. */
@@ -200,6 +292,12 @@ bulkstat_single_step(
 			sizeof(struct xfs_bulkstat), compare_bstat);
 }
 
+/* Return the inumber of the highest allocated inode in the inumbers data. */
+static inline uint64_t last_allocmask_ino(const struct xfs_inumbers *i)
+{
+	return i->xi_startino + xfrog_highbit64(i->xi_allocmask);
+}
+
 /*
  * Run bulkstat on an entire inode allocation group, then check that we got
  * exactly the inodes we expected.  If not, load them one at a time (or fake
@@ -229,6 +327,16 @@ bulkstat_for_inumbers(
 				inumbers->xi_startino, breq);
 	}
 
+	/*
+	 * If the last allocated inode as reported by inumbers is higher than
+	 * the last inode reported by bulkstat, two things could have happened.
+	 * Either all the inodes at the high end of the cluster were freed
+	 * since the inumbers call; or bulkstat encountered a corrupt inode and
+	 * returned early.  Try to bulkstat the rest of the array.
+	 */
+	if (last_allocmask_ino(inumbers) > last_bstat_ino(breq))
+		seen_mask |= bulkstat_the_rest(ctx, inumbers, breq, error);
+
 	/*
 	 * Bulkstat might return inodes beyond xi_startino + CHUNKSIZE.  Reduce
 	 * ocount to ignore inodes not described by the inumbers record.
@@ -241,7 +349,7 @@ bulkstat_for_inumbers(
 	/*
 	 * Fill in any missing inodes that are mentioned in the alloc mask but
-	 * weren't previously seen by bulkstat.
+	 * weren't previously seen by bulkstat.  These are the corrupt inodes.
 	 */
 	bulkstat_single_step(ctx, inumbers, seen_mask, breq);
 }
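[Editor's illustration, not part of the patch series.] The xfrog_highbit64() helper that patch 17 adds to libfrog/bitmask.h is just fls64(v) - 1, and last_allocmask_ino() uses it to turn the 64-bit allocmask into the inumber of the highest allocated inode in the chunk. A minimal standalone sketch of the same semantics, assuming a GCC/Clang builtin in place of libfrog's fls64():

```c
#include <stdint.h>

/*
 * Index of the highest set bit in v, or -1 if no bits are set.
 * Same semantics as the patch's xfrog_highbit64(), but implemented
 * with __builtin_clzll() rather than libfrog's fls64() helper.
 */
static inline int highbit64(uint64_t v)
{
	if (v == 0)
		return -1;	/* no bits set: mirror fls64(0) - 1 */
	return 63 - __builtin_clzll(v);
}
```

With a chunk's allocmask in hand, `xi_startino + highbit64(allocmask)` then yields the last allocated inumber that the bulkstat results are compared against.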