Message ID | 171988118237.2007602.9576505614542313879.stgit@frogsfrogsfrogs (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [1/8] xfs_scrub: move FITRIM to phase 8 | expand |
On Mon, Jul 01, 2024 at 06:02:36PM -0700, Darrick J. Wong wrote: > Therefore, we created a second implementation that will walk the bnobt > and perform the trims in block number order. This implementation avoids > the worst problems of the original code, though it lacks the desirable > attribute of freeing the biggest chunks first. > > On the other hand, this second implementation will be much easier to > constrain the system call latency, and makes it much easier to report > fstrim progress to anyone who's running xfs_scrub. Skip the first block > of each AG to ensure that we get the sub-AG algorithm. I completely fail to understand how skipping the first block improves performance and is otherwise a good idea. What am I missing?
On Tue, Jul 02, 2024 at 07:30:19AM +0200, Christoph Hellwig wrote: > On Mon, Jul 01, 2024 at 06:02:36PM -0700, Darrick J. Wong wrote: > > Therefore, we created a second implementation that will walk the bnobt > > and perform the trims in block number order. This implementation avoids > > the worst problems of the original code, though it lacks the desirable > > attribute of freeing the biggest chunks first. > > > > On the other hand, this second implementation will be much easier to > > constrain the system call latency, and makes it much easier to report > > fstrim progress to anyone who's running xfs_scrub. Skip the first block > > of each AG to ensure that we get the sub-AG algorithm. > > I completely fail to understand how skipping the first block improves > performance and is otherwise a good idea. What am I missing? Actually, I think this patch doesn't have a good reason to exist on its own anymore. The goal of this patch and the next one is to improve responsiveness to signals and smoothness of the progress bar by reducing the number of bytes per FITRIM call, but I think limiting that to 11GB per call is sufficient without the "skip the first block" games. Those games once was the magic method for getting the kernel to use the by-block discard algorithm instead of the by-length algorithm, but most of the deficiencies in both algorithsm have been fixed now. So in the end the only reason for this patch to continue existing is so that we only issue one log force and lock one AGF per FITRIM call. The next patch exists to constrain latencies. I think I'll combine these two patches into a single patch that avoids having FITRIM requests cross AGs and breaks the requests into 11G chunks to improve latency. --D
diff --git a/scrub/phase8.c b/scrub/phase8.c index 75400c968595..e5f5619a80b5 100644 --- a/scrub/phase8.c +++ b/scrub/phase8.c @@ -45,29 +45,73 @@ fstrim_ok( return true; } -/* Trim the filesystem, if desired. */ -int -phase8_func( - struct scrub_ctx *ctx) +/* Trim a certain range of the filesystem. */ +static int +fstrim_fsblocks( + struct scrub_ctx *ctx, + uint64_t start_fsb, + uint64_t fsbcount) { + uint64_t start = cvt_off_fsb_to_b(&ctx->mnt, start_fsb); + uint64_t len = cvt_off_fsb_to_b(&ctx->mnt, fsbcount); int error; - if (!fstrim_ok(ctx)) - return 0; - - error = fstrim(ctx); + error = fstrim(ctx, start, len); if (error == EOPNOTSUPP) return 0; - if (error) { - str_liberror(ctx, error, _("fstrim")); + char descr[DESCR_BUFSZ]; + + snprintf(descr, sizeof(descr) - 1, + _("fstrim start 0x%llx len 0x%llx"), + (unsigned long long)start, + (unsigned long long)len); + str_liberror(ctx, error, descr); return error; } + return 0; +} + +/* Trim each AG on the data device. */ +static int +fstrim_datadev( + struct scrub_ctx *ctx) +{ + struct xfs_fsop_geom *geo = &ctx->mnt.fsgeom; + uint64_t fsbno; + int error; + + for (fsbno = 0; fsbno < geo->datablocks; fsbno += geo->agblocks) { + uint64_t fsbcount; + + /* + * Skip the first block of each AG to ensure that we get the + * partial-AG discard implementation. This means that we can + * report trim progress to userspace in units smaller than + * entire AGs. + */ + fsbcount = min(geo->datablocks - (fsbno + 1), geo->agblocks - 1); + error = fstrim_fsblocks(ctx, fsbno + 1, fsbcount); + if (error) + return error; + } + progress_add(1); return 0; } +/* Trim the filesystem, if desired. */ +int +phase8_func( + struct scrub_ctx *ctx) +{ + if (!fstrim_ok(ctx)) + return 0; + + return fstrim_datadev(ctx); +} + /* Estimate how much work we're going to do. */ int phase8_estimate( diff --git a/scrub/vfs.c b/scrub/vfs.c index bcfd4f42ca8b..cc958ba9438e 100644 --- a/scrub/vfs.c +++ b/scrub/vfs.c @@ -298,11 +298,15 @@ struct fstrim_range { /* Call FITRIM to trim all the unused space in a filesystem. */ int fstrim( - struct scrub_ctx *ctx) + struct scrub_ctx *ctx, + uint64_t start, + uint64_t len) { - struct fstrim_range range = {0}; + struct fstrim_range range = { + .start = start, + .len = len, + }; - range.len = ULLONG_MAX; if (ioctl(ctx->mnt.fd, FITRIM, &range) == 0) return 0; if (errno == EOPNOTSUPP || errno == ENOTTY) diff --git a/scrub/vfs.h b/scrub/vfs.h index a8a4d72e290a..1af8d80d1de6 100644 --- a/scrub/vfs.h +++ b/scrub/vfs.h @@ -24,6 +24,6 @@ typedef int (*scan_fs_tree_dirent_fn)(struct scrub_ctx *, const char *, int scan_fs_tree(struct scrub_ctx *ctx, scan_fs_tree_dir_fn dir_fn, scan_fs_tree_dirent_fn dirent_fn, void *arg); -int fstrim(struct scrub_ctx *ctx); +int fstrim(struct scrub_ctx *ctx, uint64_t start, uint64_t len); #endif /* XFS_SCRUB_VFS_H_ */