From patchwork Fri Dec 30 22:13:55 2022 X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13084968 Subject: [PATCH 01/21] vfs: introduce new file
range exchange ioctl From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Fri, 30 Dec 2022 14:13:55 -0800 Message-ID: <167243843535.699466.9038562542810181371.stgit@magnolia> In-Reply-To: <167243843494.699466.5163281976943635014.stgit@magnolia> References: <167243843494.699466.5163281976943635014.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Darrick J. Wong Introduce a new ioctl to handle swapping ranges of bytes between files. Signed-off-by: Darrick J. Wong --- Documentation/filesystems/vfs.rst | 16 ++ fs/ioctl.c | 27 +++ fs/remap_range.c | 296 +++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_fs.h | 1 include/linux/fs.h | 14 ++ include/uapi/linux/fiexchange.h | 101 +++++++++++++ 6 files changed, 454 insertions(+), 1 deletion(-) create mode 100644 include/uapi/linux/fiexchange.h diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst index 2c15e7053113..cae6dd3a8a0b 100644 --- a/Documentation/filesystems/vfs.rst +++ b/Documentation/filesystems/vfs.rst @@ -1036,6 +1036,8 @@ This describes how the VFS can manipulate an open file. As of kernel loff_t (*remap_file_range)(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, loff_t len, unsigned int remap_flags); + int (*xchg_file_range)(struct file *file1, struct file *file2, + struct file_xchg_range *fxr); int (*fadvise)(struct file *, loff_t, loff_t, int); }; @@ -1154,6 +1156,20 @@ otherwise noted. ok with the implementation shortening the request length to satisfy alignment or EOF requirements (or any other reason). +``xchg_file_range`` + called by the ioctl(2) system call for FIEXCHANGE_RANGE to exchange the + contents of two file ranges. 
An implementation should exchange + fxr.length bytes starting at fxr.file1_offset in file1 with the same + number of bytes starting at fxr.file2_offset in file2. Refer to + fiexchange.h file for more information. Implementations must call + generic_xchg_file_range_prep to prepare the two files prior to taking + locks; they must call generic_xchg_file_range_check_fresh once the + inode is locked to abort the call if file2 has changed; and they must + update the inode change and mod times of both files as part of the + metadata update. The timestamp updates must be done atomically as part + of the data exchange operation to ensure correctness of the freshness + check. + ``fadvise`` possibly called by the fadvise64() system call. diff --git a/fs/ioctl.c b/fs/ioctl.c index 80ac36aea913..bd636daf8e90 100644 --- a/fs/ioctl.c +++ b/fs/ioctl.c @@ -259,6 +259,30 @@ static long ioctl_file_clone_range(struct file *file, args.src_length, args.dest_offset); } +static long ioctl_file_xchg_range(struct file *file2, + struct file_xchg_range __user *argp) +{ + struct file_xchg_range args; + struct fd file1; + int ret; + + if (copy_from_user(&args, argp, sizeof(args))) + return -EFAULT; + + file1 = fdget(args.file1_fd); + if (!file1.file) + return -EBADF; + + ret = -EXDEV; + if (file1.file->f_path.mnt != file2->f_path.mnt) + goto fdput; + + ret = vfs_xchg_file_range(file1.file, file2, &args); +fdput: + fdput(file1); + return ret; +} + /* * This provides compatibility with legacy XFS pre-allocation ioctls * which predate the fallocate syscall. 
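(Aside: from userspace, the new ioctl entry point above would be driven roughly as follows. This is a hedged sketch, not part of the patch: the struct layout and the `_IOWR('X', 129, ...)` number are mirrored from the proposed fiexchange.h later in this series, using stdint types in place of the kernel's `__s64`/`__u64`.)

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Userspace mirror of the proposed uapi struct; field names and the
 * ioctl number follow the fiexchange.h text in this patch. */
struct file_xchg_range {
	int64_t  file1_fd;
	int64_t  file1_offset;
	int64_t  file2_offset;
	uint64_t length;
	uint64_t flags;
	int64_t  file2_ino;
	int64_t  file2_mtime;
	int64_t  file2_ctime;
	int32_t  file2_mtime_nsec;
	int32_t  file2_ctime_nsec;
	uint64_t pad[6];		/* must be zeroes */
};

#define FIEXCHANGE_RANGE _IOWR('X', 129, struct file_xchg_range)

/* Ask the kernel to swap len bytes between fd1 and fd2.  Note the
 * asymmetry in the ABI: the ioctl is issued against file2, and file1
 * rides along inside the struct. */
static int exchange_range(int fd1, int fd2, int64_t off1, int64_t off2,
			  uint64_t len, uint64_t flags)
{
	struct file_xchg_range fxr;

	memset(&fxr, 0, sizeof(fxr));	/* pad[] must be zero or -EINVAL */
	fxr.file1_fd = fd1;
	fxr.file1_offset = off1;
	fxr.file2_offset = off2;
	fxr.length = len;
	fxr.flags = flags;
	return ioctl(fd2, FIEXCHANGE_RANGE, &fxr);
}
```

On a kernel without this patch the call simply fails with ENOTTY, which makes probing for support cheap.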
@@ -825,6 +849,9 @@ static int do_vfs_ioctl(struct file *filp, unsigned int fd, case FIDEDUPERANGE: return ioctl_file_dedupe_range(filp, argp); + case FIEXCHANGE_RANGE: + return ioctl_file_xchg_range(filp, argp); + case FIONREAD: if (!S_ISREG(inode->i_mode)) return vfs_ioctl(filp, cmd, arg); diff --git a/fs/remap_range.c b/fs/remap_range.c index 41f60477bb41..469d53fb42e9 100644 --- a/fs/remap_range.c +++ b/fs/remap_range.c @@ -567,3 +567,299 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same) return ret; } EXPORT_SYMBOL(vfs_dedupe_file_range); + +/* Performs necessary checks before doing a range exchange. */ +static int generic_xchg_file_range_checks(struct file *file1, + struct file *file2, + struct file_xchg_range *fxr, + unsigned int blocksize) +{ + struct inode *inode1 = file1->f_mapping->host; + struct inode *inode2 = file2->f_mapping->host; + uint64_t blkmask = blocksize - 1; + int64_t test_len; + uint64_t blen; + loff_t size1, size2; + int ret; + + /* Don't touch certain kinds of inodes */ + if (IS_IMMUTABLE(inode1) || IS_IMMUTABLE(inode2)) + return -EPERM; + if (IS_SWAPFILE(inode1) || IS_SWAPFILE(inode2)) + return -ETXTBSY; + + size1 = i_size_read(inode1); + size2 = i_size_read(inode2); + + /* Ranges cannot start after EOF. */ + if (fxr->file1_offset > size1 || fxr->file2_offset > size2) + return -EINVAL; + + /* + * If the caller asked for full files, check that the offset/length + * values cover all of both files. + */ + if ((fxr->flags & FILE_XCHG_RANGE_FULL_FILES) && + (fxr->file1_offset != 0 || fxr->file2_offset != 0 || + fxr->length != size1 || fxr->length != size2)) + return -EDOM; + + /* + * If the caller said to exchange to EOF, we set the length of the + * request large enough to cover everything to the end of both files. + */ + if (fxr->flags & FILE_XCHG_RANGE_TO_EOF) + fxr->length = max_t(int64_t, size1 - fxr->file1_offset, + size2 - fxr->file2_offset); + + /* The start of both ranges must be aligned to an fs block. 
*/ + if (!IS_ALIGNED(fxr->file1_offset, blocksize) || + !IS_ALIGNED(fxr->file2_offset, blocksize)) + return -EINVAL; + + /* Ensure offsets don't wrap. */ + if (fxr->file1_offset + fxr->length < fxr->file1_offset || + fxr->file2_offset + fxr->length < fxr->file2_offset) + return -EINVAL; + + /* + * We require both ranges to be within EOF, unless we're exchanging + * to EOF. generic_xchg_range_prep already checked that both + * fxr->file1_offset and fxr->file2_offset are within EOF. + */ + if (!(fxr->flags & FILE_XCHG_RANGE_TO_EOF) && + (fxr->file1_offset + fxr->length > size1 || + fxr->file2_offset + fxr->length > size2)) + return -EINVAL; + + /* + * Make sure we don't hit any file size limits. If we hit any size + * limits such that test_length was adjusted, we abort the whole + * operation. + */ + test_len = fxr->length; + ret = generic_write_check_limits(file2, fxr->file2_offset, &test_len); + if (ret) + return ret; + ret = generic_write_check_limits(file1, fxr->file1_offset, &test_len); + if (ret) + return ret; + if (test_len != fxr->length) + return -EINVAL; + + /* + * If the user wanted us to exchange up to the infile's EOF, round up + * to the next block boundary for this check. Do the same for the + * outfile. + * + * Otherwise, reject the range length if it's not block aligned. We + * already confirmed the starting offsets' block alignment. + */ + if (fxr->file1_offset + fxr->length == size1) + blen = ALIGN(size1, blocksize) - fxr->file1_offset; + else if (fxr->file2_offset + fxr->length == size2) + blen = ALIGN(size2, blocksize) - fxr->file2_offset; + else if (!IS_ALIGNED(fxr->length, blocksize)) + return -EINVAL; + else + blen = fxr->length; + + /* Don't allow overlapped exchanges within the same file. */ + if (inode1 == inode2 && + fxr->file2_offset + blen > fxr->file1_offset && + fxr->file1_offset + blen > fxr->file2_offset) + return -EINVAL; + + /* If we already failed the freshness check, we're done. 
*/ + ret = generic_xchg_file_range_check_fresh(inode2, fxr); + if (ret) + return ret; + + /* + * Ensure that we don't exchange a partial EOF block into the middle of + * another file. + */ + if ((fxr->length & blkmask) == 0) + return 0; + + blen = fxr->length; + if (fxr->file2_offset + blen < size2) + blen &= ~blkmask; + + if (fxr->file1_offset + blen < size1) + blen &= ~blkmask; + + return blen == fxr->length ? 0 : -EINVAL; +} + +/* + * Check that the two inodes are eligible for range exchanges, the ranges make + * sense, and then flush all dirty data. Caller must ensure that the inodes + * have been locked against any other modifications. + */ +int generic_xchg_file_range_prep(struct file *file1, struct file *file2, + struct file_xchg_range *fxr, + unsigned int blocksize) +{ + struct inode *inode1 = file_inode(file1); + struct inode *inode2 = file_inode(file2); + bool same_inode = (inode1 == inode2); + int ret; + + /* Check that we don't violate system file offset limits. */ + ret = generic_xchg_file_range_checks(file1, file2, fxr, blocksize); + if (ret || fxr->length == 0) + return ret; + + /* Wait for the completion of any pending IOs on both files */ + inode_dio_wait(inode1); + if (!same_inode) + inode_dio_wait(inode2); + + ret = filemap_write_and_wait_range(inode1->i_mapping, fxr->file1_offset, + fxr->file1_offset + fxr->length - 1); + if (ret) + return ret; + + ret = filemap_write_and_wait_range(inode2->i_mapping, fxr->file2_offset, + fxr->file2_offset + fxr->length - 1); + if (ret) + return ret; + + /* + * If the files or inodes involved require synchronous writes, amend + * the request to force the filesystem to flush all data and metadata + * to disk after the operation completes. 
+ */ + if (((file1->f_flags | file2->f_flags) & (__O_SYNC | O_DSYNC)) || + IS_SYNC(inode1) || IS_SYNC(inode2)) + fxr->flags |= FILE_XCHG_RANGE_FSYNC; + + return 0; +} +EXPORT_SYMBOL_GPL(generic_xchg_file_range_prep); + +/* + * Finish a range exchange operation, if it was successful. Caller must ensure + * that the inodes are still locked against any other modifications. + */ +int generic_xchg_file_range_finish(struct file *file1, struct file *file2) +{ + int ret; + + ret = file_remove_privs(file1); + if (ret) + return ret; + if (file_inode(file1) == file_inode(file2)) + return 0; + + return file_remove_privs(file2); +} +EXPORT_SYMBOL_GPL(generic_xchg_file_range_finish); + +/* + * Check that both files' metadata agree with the snapshot that we took for + * the range exchange request. + + * This should be called after the filesystem has locked /all/ inode metadata + * against modification. + */ +int generic_xchg_file_range_check_fresh(struct inode *inode2, + const struct file_xchg_range *fxr) +{ + /* Check that file2 hasn't otherwise been modified. 
*/ + if ((fxr->flags & FILE_XCHG_RANGE_FILE2_FRESH) && + (fxr->file2_ino != inode2->i_ino || + fxr->file2_ctime != inode2->i_ctime.tv_sec || + fxr->file2_ctime_nsec != inode2->i_ctime.tv_nsec || + fxr->file2_mtime != inode2->i_mtime.tv_sec || + fxr->file2_mtime_nsec != inode2->i_mtime.tv_nsec)) + return -EBUSY; + + return 0; +} +EXPORT_SYMBOL_GPL(generic_xchg_file_range_check_fresh); + +static inline int xchg_range_verify_area(struct file *file, loff_t pos, + struct file_xchg_range *fxr) +{ + int64_t len = fxr->length; + + if (pos < 0) + return -EINVAL; + + if (fxr->flags & FILE_XCHG_RANGE_TO_EOF) + len = min_t(int64_t, len, i_size_read(file_inode(file)) - pos); + return remap_verify_area(file, pos, len, true); +} + +int do_xchg_file_range(struct file *file1, struct file *file2, + struct file_xchg_range *fxr) +{ + struct inode *inode1 = file_inode(file1); + struct inode *inode2 = file_inode(file2); + int ret; + + if ((fxr->flags & ~FILE_XCHG_RANGE_ALL_FLAGS) || + memchr_inv(&fxr->pad, 0, sizeof(fxr->pad))) + return -EINVAL; + + if ((fxr->flags & FILE_XCHG_RANGE_FULL_FILES) && + (fxr->flags & FILE_XCHG_RANGE_TO_EOF)) + return -EINVAL; + + /* + * The ioctl enforces that src and dest files are on the same mount. + * Practically, they only need to be on the same file system. + */ + if (inode1->i_sb != inode2->i_sb) + return -EXDEV; + + /* This only works for regular files. 
*/ + if (S_ISDIR(inode1->i_mode) || S_ISDIR(inode2->i_mode)) + return -EISDIR; + if (!S_ISREG(inode1->i_mode) || !S_ISREG(inode2->i_mode)) + return -EINVAL; + + ret = generic_file_rw_checks(file1, file2); + if (ret < 0) + return ret; + + ret = generic_file_rw_checks(file2, file1); + if (ret < 0) + return ret; + + if (!file1->f_op->xchg_file_range) + return -EOPNOTSUPP; + + ret = xchg_range_verify_area(file1, fxr->file1_offset, fxr); + if (ret) + return ret; + + ret = xchg_range_verify_area(file2, fxr->file2_offset, fxr); + if (ret) + return ret; + + ret = file2->f_op->xchg_file_range(file1, file2, fxr); + if (ret) + return ret; + + fsnotify_modify(file1); + if (file2 != file1) + fsnotify_modify(file2); + return 0; +} +EXPORT_SYMBOL_GPL(do_xchg_file_range); + +int vfs_xchg_file_range(struct file *file1, struct file *file2, + struct file_xchg_range *fxr) +{ + int ret; + + file_start_write(file2); + ret = do_xchg_file_range(file1, file2, fxr); + file_end_write(file2); + + return ret; +} +EXPORT_SYMBOL_GPL(vfs_xchg_file_range); diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index 400cf68e551e..210c17f5a16c 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -841,6 +841,7 @@ struct xfs_scrub_metadata { #define XFS_IOC_FSGEOMETRY _IOR ('X', 126, struct xfs_fsop_geom) #define XFS_IOC_BULKSTAT _IOR ('X', 127, struct xfs_bulkstat_req) #define XFS_IOC_INUMBERS _IOR ('X', 128, struct xfs_inumbers_req) +/* FIEXCHANGE_RANGE ----------- hoisted 129 */ /* XFS_IOC_GETFSUUID ---------- deprecated 140 */ diff --git a/include/linux/fs.h b/include/linux/fs.h index 066555ad1bf8..cd86ac22c339 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -46,6 +46,7 @@ #include #include +#include struct backing_dev_info; struct bdi_writeback; @@ -2125,6 +2126,8 @@ struct file_operations { loff_t (*remap_file_range)(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, loff_t len, unsigned int remap_flags); + int (*xchg_file_range)(struct 
file *file1, struct file *file2, + struct file_xchg_range *fsr); int (*fadvise)(struct file *, loff_t, loff_t, int); int (*uring_cmd)(struct io_uring_cmd *ioucmd, unsigned int issue_flags); int (*uring_cmd_iopoll)(struct io_uring_cmd *, struct io_comp_batch *, @@ -2205,6 +2208,10 @@ int __generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, loff_t *count, unsigned int remap_flags); +int generic_xchg_file_range_prep(struct file *file1, struct file *file2, + struct file_xchg_range *fsr, + unsigned int blocksize); +int generic_xchg_file_range_finish(struct file *file1, struct file *file2); extern loff_t do_clone_file_range(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, loff_t len, unsigned int remap_flags); @@ -2216,7 +2223,12 @@ extern int vfs_dedupe_file_range(struct file *file, extern loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos, struct file *dst_file, loff_t dst_pos, loff_t len, unsigned int remap_flags); - +extern int do_xchg_file_range(struct file *file1, struct file *file2, + struct file_xchg_range *fsr); +extern int vfs_xchg_file_range(struct file *file1, struct file *file2, + struct file_xchg_range *fsr); +extern int generic_xchg_file_range_check_fresh(struct inode *inode2, + const struct file_xchg_range *fsr); struct super_operations { struct inode *(*alloc_inode)(struct super_block *sb); diff --git a/include/uapi/linux/fiexchange.h b/include/uapi/linux/fiexchange.h new file mode 100644 index 000000000000..72bc228d4141 --- /dev/null +++ b/include/uapi/linux/fiexchange.h @@ -0,0 +1,101 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later WITH Linux-syscall-note */ +/* + * FIEXCHANGE_RANGE ioctl definitions, to facilitate exchanging parts of files. + * + * Copyright (C) 2022 Oracle. All Rights Reserved. + * + * Author: Darrick J. 
Wong + */ +#ifndef _LINUX_FIEXCHANGE_H +#define _LINUX_FIEXCHANGE_H + +#include <linux/types.h> + +/* + * Exchange part of file1 with part of the file that this ioctl is being + * called against (which we'll call file2). Filesystems must be able to + * restart and complete the operation even after the system goes down. + */ +struct file_xchg_range { + __s64 file1_fd; + __s64 file1_offset; /* file1 offset, bytes */ + __s64 file2_offset; /* file2 offset, bytes */ + __u64 length; /* bytes to exchange */ + + __u64 flags; /* see FILE_XCHG_RANGE_* below */ + + /* file2 metadata for optional freshness checks */ + __s64 file2_ino; /* inode number */ + __s64 file2_mtime; /* modification time */ + __s64 file2_ctime; /* change time */ + __s32 file2_mtime_nsec; /* mod time, nsec */ + __s32 file2_ctime_nsec; /* change time, nsec */ + + __u64 pad[6]; /* must be zeroes */ +}; + +/* + * Atomic exchange operations are not required. This relaxes the requirement + * that the filesystem must be able to complete the operation after a crash. + */ +#define FILE_XCHG_RANGE_NONATOMIC (1 << 0) + +/* + * Check file2's inode number, mtime, and ctime against the values + * provided, and return -EBUSY if there isn't an exact match. + */ +#define FILE_XCHG_RANGE_FILE2_FRESH (1 << 1) + +/* + * Check that file1's length is equal to file1_offset + length, and that + * file2's length is equal to file2_offset + length. Returns -EDOM if there + * isn't an exact match. + */ +#define FILE_XCHG_RANGE_FULL_FILES (1 << 2) + +/* + * Exchange file data all the way to the ends of both files, and then exchange + * the file sizes. This flag can be used to replace a file's contents with a + * different amount of data. length will be ignored. + */ +#define FILE_XCHG_RANGE_TO_EOF (1 << 3) + +/* Flush all changes in file data and file metadata to disk before returning. */ +#define FILE_XCHG_RANGE_FSYNC (1 << 4) + +/* Dry run; do all the parameter verification but do not change anything.
*/ +#define FILE_XCHG_RANGE_DRY_RUN (1 << 5) + +/* + * Do not exchange any part of the range where file1's mapping is a hole. This + * can be used to emulate scatter-gather atomic writes with a temp file. + */ +#define FILE_XCHG_RANGE_SKIP_FILE1_HOLES (1 << 6) + +/* + * Commit the contents of file1 into file2 if file2 has the same inode number, + * mtime, and ctime as the arguments provided to the call. The old contents of + * file2 will be moved to file1. + * + * With this flag, all committed information can be retrieved even if the + * system crashes or is rebooted. This includes writing through or flushing a + * disk cache if present. The call blocks until the device reports that the + * commit is complete. + * + * This flag should not be combined with NONATOMIC. It can be combined with + * SKIP_FILE1_HOLES. + */ +#define FILE_XCHG_RANGE_COMMIT (FILE_XCHG_RANGE_FILE2_FRESH | \ + FILE_XCHG_RANGE_FSYNC) + +#define FILE_XCHG_RANGE_ALL_FLAGS (FILE_XCHG_RANGE_NONATOMIC | \ + FILE_XCHG_RANGE_FILE2_FRESH | \ + FILE_XCHG_RANGE_FULL_FILES | \ + FILE_XCHG_RANGE_TO_EOF | \ + FILE_XCHG_RANGE_FSYNC | \ + FILE_XCHG_RANGE_DRY_RUN | \ + FILE_XCHG_RANGE_SKIP_FILE1_HOLES) + +#define FIEXCHANGE_RANGE _IOWR('X', 129, struct file_xchg_range) + +#endif /* _LINUX_FIEXCHANGE_H */ From patchwork Fri Dec 30 22:13:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13084969 Subject: [PATCH 02/21] xfs: create a new helper to return a file's allocation unit From: "Darrick J.
Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Fri, 30 Dec 2022 14:13:55 -0800 Message-ID: <167243843550.699466.10049718356899871040.stgit@magnolia> In-Reply-To: <167243843494.699466.5163281976943635014.stgit@magnolia> References: <167243843494.699466.5163281976943635014.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Darrick J. Wong Create a new helper function to calculate the fundamental allocation unit (i.e. the smallest unit of space we can allocate) of a file. Things are going to get hairy with range-exchange on the realtime device, so prepare for this now. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_file.c | 26 +++++++++----------------- fs/xfs/xfs_inode.c | 13 +++++++++++++ fs/xfs/xfs_inode.h | 1 + 3 files changed, 23 insertions(+), 17 deletions(-) diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index c4bdadd8fa71..b382380656d7 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -44,27 +44,19 @@ xfs_is_falloc_aligned( loff_t pos, long long int len) { - struct xfs_mount *mp = ip->i_mount; - uint64_t mask; + unsigned int alloc_unit = xfs_inode_alloc_unitsize(ip); - if (XFS_IS_REALTIME_INODE(ip)) { - if (!is_power_of_2(mp->m_sb.sb_rextsize)) { - u64 rextbytes; - u32 mod; + if (XFS_IS_REALTIME_INODE(ip) && !is_power_of_2(alloc_unit)) { + u32 mod; - rextbytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize); - div_u64_rem(pos, rextbytes, &mod); - if (mod) - return false; - div_u64_rem(len, rextbytes, &mod); - return mod == 0; - } - mask = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize) - 1; - } else { - mask = mp->m_sb.sb_blocksize - 1; + div_u64_rem(pos, alloc_unit, &mod); + if (mod) + return false; + div_u64_rem(len, alloc_unit, &mod); + return mod == 0; } - return !((pos | len) & mask); + return !((pos | len) & (alloc_unit - 1)); } /* diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 
b082222a9061..04ceafb936bc 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -3795,3 +3795,16 @@ xfs_inode_count_blocks( xfs_bmap_count_leaves(ifp, rblocks); *dblocks = ip->i_nblocks - *rblocks; } + +/* Returns the size of fundamental allocation unit for a file, in bytes. */ +unsigned int +xfs_inode_alloc_unitsize( + struct xfs_inode *ip) +{ + unsigned int blocks = 1; + + if (XFS_IS_REALTIME_INODE(ip)) + blocks = ip->i_mount->m_sb.sb_rextsize; + + return XFS_FSB_TO_B(ip->i_mount, blocks); +} diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 926e4dd566d0..4b01d078ace2 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -577,6 +577,7 @@ void xfs_iunlock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2); void xfs_inode_count_blocks(struct xfs_trans *tp, struct xfs_inode *ip, xfs_filblks_t *dblocks, xfs_filblks_t *rblocks); +unsigned int xfs_inode_alloc_unitsize(struct xfs_inode *ip); /* * Parameters for tracking bumplink and droplink operations. The hook From patchwork Fri Dec 30 22:13:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13084970 Subject: [PATCH 03/21] xfs: refactor non-power-of-two alignment checks From: "Darrick J.
Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Fri, 30 Dec 2022 14:13:55 -0800 Message-ID: <167243843564.699466.13518512328247458894.stgit@magnolia> In-Reply-To: <167243843494.699466.5163281976943635014.stgit@magnolia> References: <167243843494.699466.5163281976943635014.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Darrick J. Wong Create a helper function that can compute if a 64-bit number is an integer multiple of a 32-bit number, where the 32-bit number is not required to be an even power of two. This is needed for some new code for the realtime device, where we can set 37k allocation units and then have to remap them. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_file.c | 12 +++--------- fs/xfs/xfs_linux.h | 5 +++++ 2 files changed, 8 insertions(+), 9 deletions(-) diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index b382380656d7..78323574021c 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -46,15 +46,9 @@ xfs_is_falloc_aligned( { unsigned int alloc_unit = xfs_inode_alloc_unitsize(ip); - if (XFS_IS_REALTIME_INODE(ip) && !is_power_of_2(alloc_unit)) { - u32 mod; - - div_u64_rem(pos, alloc_unit, &mod); - if (mod) - return false; - div_u64_rem(len, alloc_unit, &mod); - return mod == 0; - } + if (XFS_IS_REALTIME_INODE(ip) && !is_power_of_2(alloc_unit)) + return isaligned_64(pos, alloc_unit) && + isaligned_64(len, alloc_unit); return !((pos | len) & (alloc_unit - 1)); } diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h index 3847719c3026..7e9bf03c80a3 100644 --- a/fs/xfs/xfs_linux.h +++ b/fs/xfs/xfs_linux.h @@ -197,6 +197,11 @@ static inline uint64_t howmany_64(uint64_t x, uint32_t y) return x; } +static inline bool isaligned_64(uint64_t x, uint32_t y) +{ + return do_div(x, y) == 0; +} + int xfs_rw_bdev(struct block_device *bdev, sector_t sector, unsigned int count, char *data, enum req_op 
op); From patchwork Fri Dec 30 22:13:55 2022 X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13084971 Subject: [PATCH 04/21] xfs: parameterize all
the incompat log feature helpers From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Fri, 30 Dec 2022 14:13:55 -0800 Message-ID: <167243843579.699466.1331325312571395676.stgit@magnolia> In-Reply-To: <167243843494.699466.5163281976943635014.stgit@magnolia> References: <167243843494.699466.5163281976943635014.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Darrick J. Wong We're about to define a new XFS_SB_FEAT_INCOMPAT_LOG_ bit, which means that callers will soon require the ability to toggle on and off different log incompat feature bits. Parameterize the xlog_{use,drop}_incompat_feat and xfs_sb_remove_incompat_log_features functions so that callers can specify which feature they're trying to use and so that we can clear individual log incompat bits as needed. Signed-off-by: Darrick J. Wong --- fs/xfs/libxfs/xfs_format.h | 5 +++-- fs/xfs/xfs_log.c | 34 +++++++++++++++++++++++++--------- fs/xfs/xfs_log.h | 9 ++++++--- fs/xfs/xfs_log_priv.h | 2 +- fs/xfs/xfs_log_recover.c | 3 ++- fs/xfs/xfs_mount.c | 11 +++++------ fs/xfs/xfs_mount.h | 2 +- fs/xfs/xfs_xattr.c | 6 +++--- 8 files changed, 46 insertions(+), 26 deletions(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 5ba2dae7aa2f..817adb36cb1e 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -404,9 +404,10 @@ xfs_sb_has_incompat_log_feature( static inline void xfs_sb_remove_incompat_log_features( - struct xfs_sb *sbp) + struct xfs_sb *sbp, + uint32_t feature) { - sbp->sb_features_log_incompat &= ~XFS_SB_FEAT_INCOMPAT_LOG_ALL; + sbp->sb_features_log_incompat &= ~feature; } static inline void diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index b32a8e57f576..a0ef09addc84 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -1082,7 +1082,7 @@ xfs_log_quiesce( * failures, though it's not 
fatal to have a higher log feature * protection level than the log contents actually require. */ - if (xfs_clear_incompat_log_features(mp)) { + if (xfs_clear_incompat_log_features(mp, XFS_SB_FEAT_INCOMPAT_LOG_ALL)) { int error; error = xfs_sync_sb(mp, false); @@ -1489,6 +1489,7 @@ xlog_clear_incompat( struct xlog *log) { struct xfs_mount *mp = log->l_mp; + uint32_t incompat_mask = 0; if (!xfs_sb_has_incompat_log_feature(&mp->m_sb, XFS_SB_FEAT_INCOMPAT_LOG_ALL)) @@ -1497,11 +1498,16 @@ xlog_clear_incompat( if (log->l_covered_state != XLOG_STATE_COVER_DONE2) return; - if (!down_write_trylock(&log->l_incompat_users)) + if (down_write_trylock(&log->l_incompat_xattrs)) + incompat_mask |= XFS_SB_FEAT_INCOMPAT_LOG_XATTRS; + + if (!incompat_mask) return; - xfs_clear_incompat_log_features(mp); - up_write(&log->l_incompat_users); + xfs_clear_incompat_log_features(mp, incompat_mask); + + if (incompat_mask & XFS_SB_FEAT_INCOMPAT_LOG_XATTRS) + up_write(&log->l_incompat_xattrs); } /* @@ -1618,7 +1624,7 @@ xlog_alloc_log( } log->l_sectBBsize = 1 << log2_size; - init_rwsem(&log->l_incompat_users); + init_rwsem(&log->l_incompat_xattrs); xlog_get_iclog_buffer_size(mp, log); @@ -3909,15 +3915,25 @@ xfs_log_check_lsn( */ void xlog_use_incompat_feat( - struct xlog *log) + struct xlog *log, + enum xlog_incompat_feat what) { - down_read(&log->l_incompat_users); + switch (what) { + case XLOG_INCOMPAT_FEAT_XATTRS: + down_read(&log->l_incompat_xattrs); + break; + } } /* Notify the log that we've finished using log incompat features. 
*/ void xlog_drop_incompat_feat( - struct xlog *log) + struct xlog *log, + enum xlog_incompat_feat what) { - up_read(&log->l_incompat_users); + switch (what) { + case XLOG_INCOMPAT_FEAT_XATTRS: + up_read(&log->l_incompat_xattrs); + break; + } } diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h index 2728886c2963..d187f6445909 100644 --- a/fs/xfs/xfs_log.h +++ b/fs/xfs/xfs_log.h @@ -159,8 +159,11 @@ bool xfs_log_check_lsn(struct xfs_mount *, xfs_lsn_t); xfs_lsn_t xlog_grant_push_threshold(struct xlog *log, int need_bytes); bool xlog_force_shutdown(struct xlog *log, uint32_t shutdown_flags); -void xlog_use_incompat_feat(struct xlog *log); -void xlog_drop_incompat_feat(struct xlog *log); -int xfs_attr_use_log_assist(struct xfs_mount *mp); +enum xlog_incompat_feat { + XLOG_INCOMPAT_FEAT_XATTRS = XFS_SB_FEAT_INCOMPAT_LOG_XATTRS, +}; + +void xlog_use_incompat_feat(struct xlog *log, enum xlog_incompat_feat what); +void xlog_drop_incompat_feat(struct xlog *log, enum xlog_incompat_feat what); #endif /* __XFS_LOG_H__ */ diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index 1bd2963e8fbd..a13b5b6b744d 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -447,7 +447,7 @@ struct xlog { uint32_t l_iclog_roundoff;/* padding roundoff */ /* Users of log incompat features should take a read lock. */ - struct rw_semaphore l_incompat_users; + struct rw_semaphore l_incompat_xattrs; }; /* diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 6b1f37bc3e95..81ce08c23306 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -3473,7 +3473,8 @@ xlog_recover_finish( * longer anything to protect. We rely on the AIL push to write out the * updated superblock after everything else. 
*/ - if (xfs_clear_incompat_log_features(log->l_mp)) { + if (xfs_clear_incompat_log_features(log->l_mp, + XFS_SB_FEAT_INCOMPAT_LOG_ALL)) { error = xfs_sync_sb(log->l_mp, false); if (error < 0) { xfs_alert(log->l_mp, diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index 31f49211fdd6..54cd47882991 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -1357,13 +1357,13 @@ xfs_add_incompat_log_feature( */ bool xfs_clear_incompat_log_features( - struct xfs_mount *mp) + struct xfs_mount *mp, + uint32_t features) { bool ret = false; if (!xfs_has_crc(mp) || - !xfs_sb_has_incompat_log_feature(&mp->m_sb, - XFS_SB_FEAT_INCOMPAT_LOG_ALL) || + !xfs_sb_has_incompat_log_feature(&mp->m_sb, features) || xfs_is_shutdown(mp)) return false; @@ -1375,9 +1375,8 @@ xfs_clear_incompat_log_features( xfs_buf_lock(mp->m_sb_bp); xfs_buf_hold(mp->m_sb_bp); - if (xfs_sb_has_incompat_log_feature(&mp->m_sb, - XFS_SB_FEAT_INCOMPAT_LOG_ALL)) { - xfs_sb_remove_incompat_log_features(&mp->m_sb); + if (xfs_sb_has_incompat_log_feature(&mp->m_sb, features)) { + xfs_sb_remove_incompat_log_features(&mp->m_sb, features); ret = true; } diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index ec8b185d45f8..7c48a2b70f6f 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -547,7 +547,7 @@ struct xfs_error_cfg * xfs_error_get_cfg(struct xfs_mount *mp, int error_class, int error); void xfs_force_summary_recalc(struct xfs_mount *mp); int xfs_add_incompat_log_feature(struct xfs_mount *mp, uint32_t feature); -bool xfs_clear_incompat_log_features(struct xfs_mount *mp); +bool xfs_clear_incompat_log_features(struct xfs_mount *mp, uint32_t feature); void xfs_mod_delalloc(struct xfs_mount *mp, int64_t delta); #endif /* __XFS_MOUNT_H__ */ diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c index 10aa1fd39d2b..e03f199f50c7 100644 --- a/fs/xfs/xfs_xattr.c +++ b/fs/xfs/xfs_xattr.c @@ -37,7 +37,7 @@ xfs_attr_grab_log_assist( * Protect ourselves from an idle log clearing the logged xattrs log * incompat 
feature bit. */ - xlog_use_incompat_feat(mp->m_log); + xlog_use_incompat_feat(mp->m_log, XLOG_INCOMPAT_FEAT_XATTRS); /* * If log-assisted xattrs are already enabled, the caller can use the @@ -57,7 +57,7 @@ xfs_attr_grab_log_assist( return 0; drop_incompat: - xlog_drop_incompat_feat(mp->m_log); + xlog_drop_incompat_feat(mp->m_log, XLOG_INCOMPAT_FEAT_XATTRS); return error; } @@ -65,7 +65,7 @@ static inline void xfs_attr_rele_log_assist( struct xfs_mount *mp) { - xlog_drop_incompat_feat(mp->m_log); + xlog_drop_incompat_feat(mp->m_log, XLOG_INCOMPAT_FEAT_XATTRS); } static inline bool From patchwork Fri Dec 30 22:13:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13084972 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A7738C4708E for ; Fri, 30 Dec 2022 23:51:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235778AbiL3Xva (ORCPT ); Fri, 30 Dec 2022 18:51:30 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48950 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235530AbiL3Xv3 (ORCPT ); Fri, 30 Dec 2022 18:51:29 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EBB3D262A; Fri, 30 Dec 2022 15:51:27 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 8826E61B98; Fri, 30 Dec 2022 23:51:27 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id EF0D9C433D2; Fri, 30 Dec 2022 23:51:26 +0000 (UTC) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672444287; bh=EY7L0C/1YQ2AVomIpr49dXSRG1X4cWxiGR+xoSKOoeU=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=V0OuBRTN/SBoKm4Iu04kXo+RKtX4B6+RhjRCV2F03kNT+s4ucBfHR1R0UE2wWQMyT /SsUTJyLbNGnRj4ybqI+HT5yfzovb102NSQvNkgZeehr0BiCi8RcwmPjmrPWL6g2To 8Ag8C/qRg527zCKK7poDkSFkPYfWnFR/HhRuuV8aXAf63uLMFBlPE7+z0/0xjWv5Ne v162//W4vRRn4Z+woGA98O4D4iNApQ3Fy180c24tn7oYZ31WlIPjs+groEK3AHPuvE t4KBOZaDlonLj7cVMzsrxXgaO0U6aqvv/gDqRnTYy8hn9pwme7xcchLo28vWdUfl5m ++BcovYWaEaiQ== Subject: [PATCH 05/21] xfs: create a log incompat flag for atomic extent swapping From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Fri, 30 Dec 2022 14:13:55 -0800 Message-ID: <167243843594.699466.3694791888640844542.stgit@magnolia> In-Reply-To: <167243843494.699466.5163281976943635014.stgit@magnolia> References: <167243843494.699466.5163281976943635014.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Darrick J. Wong Create a log incompat flag so that we only attempt to process swap extent log items if the filesystem supports it, and a geometry flag to advertise support if it's present. Signed-off-by: Darrick J. 
Wong --- fs/xfs/libxfs/xfs_format.h | 1 + fs/xfs/libxfs/xfs_fs.h | 1 + fs/xfs/libxfs/xfs_sb.c | 3 +++ fs/xfs/libxfs/xfs_swapext.h | 24 ++++++++++++++++++++++++ 4 files changed, 29 insertions(+) create mode 100644 fs/xfs/libxfs/xfs_swapext.h diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 817adb36cb1e..1424976ec955 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -391,6 +391,7 @@ xfs_sb_has_incompat_feature( } #define XFS_SB_FEAT_INCOMPAT_LOG_XATTRS (1 << 0) /* Delayed Attributes */ +#define XFS_SB_FEAT_INCOMPAT_LOG_SWAPEXT (1U << 31) /* file extent swap */ #define XFS_SB_FEAT_INCOMPAT_LOG_ALL \ (XFS_SB_FEAT_INCOMPAT_LOG_XATTRS) #define XFS_SB_FEAT_INCOMPAT_LOG_UNKNOWN ~XFS_SB_FEAT_INCOMPAT_LOG_ALL diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index 210c17f5a16c..a39fd65e6ee0 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -239,6 +239,7 @@ typedef struct xfs_fsop_resblks { #define XFS_FSOP_GEOM_FLAGS_BIGTIME (1 << 21) /* 64-bit nsec timestamps */ #define XFS_FSOP_GEOM_FLAGS_INOBTCNT (1 << 22) /* inobt btree counter */ #define XFS_FSOP_GEOM_FLAGS_NREXT64 (1 << 23) /* large extent counters */ +#define XFS_FSOP_GEOM_FLAGS_ATOMIC_SWAP (1U << 31) /* atomic file extent swap */ /* * Minimum and maximum sizes need for growth checks. diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index b3e8ab247b28..5b6f5939fda1 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -25,6 +25,7 @@ #include "xfs_da_format.h" #include "xfs_health.h" #include "xfs_ag.h" +#include "xfs_swapext.h" /* * Physical superblock buffer manipulations. Shared with libxfs in userspace. 
@@ -1197,6 +1198,8 @@ xfs_fs_geometry( } if (xfs_has_large_extent_counts(mp)) geo->flags |= XFS_FSOP_GEOM_FLAGS_NREXT64; + if (xfs_swapext_supported(mp)) + geo->flags |= XFS_FSOP_GEOM_FLAGS_ATOMIC_SWAP; geo->rtsectsize = sbp->sb_blocksize; geo->dirblocksize = xfs_dir2_dirblock_bytes(sbp); diff --git a/fs/xfs/libxfs/xfs_swapext.h b/fs/xfs/libxfs/xfs_swapext.h new file mode 100644 index 000000000000..316323339d76 --- /dev/null +++ b/fs/xfs/libxfs/xfs_swapext.h @@ -0,0 +1,24 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#ifndef __XFS_SWAPEXT_H_ +#define __XFS_SWAPEXT_H_ 1 + +/* + * Decide if this filesystem supports using log items to swap file extents and + * restart the operation if the system fails before the operation completes. + * + * This can be done to individual file extents by using the block mapping log + * intent items introduced with reflink and rmap; or to entire file ranges + * using swapext log intent items to track the overall progress across multiple + * extent mappings. Realtime is not supported yet. + */ +static inline bool xfs_swapext_supported(struct xfs_mount *mp) +{ + return (xfs_has_reflink(mp) || xfs_has_rmapbt(mp)) && + !xfs_has_realtime(mp); +} + +#endif /* __XFS_SWAPEXT_H_ */ From patchwork Fri Dec 30 22:13:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13084973 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 91DA2C3DA7D for ; Fri, 30 Dec 2022 23:51:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235667AbiL3Xvq (ORCPT ); Fri, 30 Dec 2022 18:51:46 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49028 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231164AbiL3Xvp (ORCPT ); Fri, 30 Dec 2022 18:51:45 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7FF1F8FC2; Fri, 30 Dec 2022 15:51:43 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 0F75F61B98; Fri, 30 Dec 2022 23:51:43 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 71633C433EF; Fri, 30 Dec 2022 23:51:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672444302; bh=8w0KrGvFUuIxoZdeLwDguV5toFy9tFeYMYThtVUxXHE=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=aijH4cJ4SdnCd9aU+BQo/VGGfs0U6q5JFcwD6j6jDNnGlmfoKl4ZfCxXCbnwhwT7u 8yjG1B32QAR8lQYMKS7fowoyhTQ+UTpmm6ciOFBO/vYnfox3m8SMSJjAEGMVCAa3cg l6dWPogSST5t0ZUuIZE7l+zHuqArQl2ziA3xkb7GQS4E2Uciyx8E4av99duLz7HD6v aGuzEEIu5TmWKD8H2LhoMOuVBxC8mvF5TPSQPUxxnexMNlb9/i2OVlcqIEQRXOYu5I 5wGWhJvHDfNHCEBQ7uMtvCLFhQtEvqmPwQMkPm4q99Sy+35tfNsmvzMYJg1RJ6PWGy 0kjEnyKD7Cj5g== Subject: [PATCH 06/21] xfs: introduce a swap-extent log intent item From: "Darrick J. 
Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Fri, 30 Dec 2022 14:13:56 -0800 Message-ID: <167243843608.699466.6025805169564879348.stgit@magnolia> In-Reply-To: <167243843494.699466.5163281976943635014.stgit@magnolia> References: <167243843494.699466.5163281976943635014.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Darrick J. Wong Introduce a new intent log item to handle swapping extents. Signed-off-by: Darrick J. Wong --- fs/xfs/Makefile | 1 fs/xfs/libxfs/xfs_log_format.h | 51 ++++++ fs/xfs/libxfs/xfs_log_recover.h | 2 fs/xfs/xfs_log_recover.c | 2 fs/xfs/xfs_super.c | 19 ++ fs/xfs/xfs_swapext_item.c | 320 +++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_swapext_item.h | 56 +++++++ 7 files changed, 448 insertions(+), 3 deletions(-) create mode 100644 fs/xfs/xfs_swapext_item.c create mode 100644 fs/xfs/xfs_swapext_item.h diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index fc83759656c6..c5cb8cf6ffbb 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -109,6 +109,7 @@ xfs-y += xfs_log.o \ xfs_iunlink_item.o \ xfs_refcount_item.o \ xfs_rmap_item.o \ + xfs_swapext_item.o \ xfs_log_recover.o \ xfs_trans_ail.o \ xfs_trans_buf.o diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h index 367f536d9881..b105a5ef6644 100644 --- a/fs/xfs/libxfs/xfs_log_format.h +++ b/fs/xfs/libxfs/xfs_log_format.h @@ -117,8 +117,9 @@ struct xfs_unmount_log_format { #define XLOG_REG_TYPE_ATTRD_FORMAT 28 #define XLOG_REG_TYPE_ATTR_NAME 29 #define XLOG_REG_TYPE_ATTR_VALUE 30 -#define XLOG_REG_TYPE_MAX 30 - +#define XLOG_REG_TYPE_SXI_FORMAT 31 +#define XLOG_REG_TYPE_SXD_FORMAT 32 +#define XLOG_REG_TYPE_MAX 32 /* * Flags to log operation header @@ -243,6 +244,8 @@ typedef struct xfs_trans_header { #define XFS_LI_BUD 0x1245 #define XFS_LI_ATTRI 0x1246 /* attr set/remove intent*/ #define XFS_LI_ATTRD 0x1247 /* 
attr set/remove done */ +#define XFS_LI_SXI 0x1248 /* extent swap intent */ +#define XFS_LI_SXD 0x1249 /* extent swap done */ #define XFS_LI_TYPE_DESC \ { XFS_LI_EFI, "XFS_LI_EFI" }, \ @@ -260,7 +263,9 @@ typedef struct xfs_trans_header { { XFS_LI_BUI, "XFS_LI_BUI" }, \ { XFS_LI_BUD, "XFS_LI_BUD" }, \ { XFS_LI_ATTRI, "XFS_LI_ATTRI" }, \ - { XFS_LI_ATTRD, "XFS_LI_ATTRD" } + { XFS_LI_ATTRD, "XFS_LI_ATTRD" }, \ + { XFS_LI_SXI, "XFS_LI_SXI" }, \ + { XFS_LI_SXD, "XFS_LI_SXD" } /* * Inode Log Item Format definitions. @@ -871,6 +876,46 @@ struct xfs_bud_log_format { uint64_t bud_bui_id; /* id of corresponding bui */ }; +/* + * SXI/SXD (extent swapping) log format definitions + */ + +struct xfs_swap_extent { + uint64_t sx_inode1; + uint64_t sx_inode2; + uint64_t sx_startoff1; + uint64_t sx_startoff2; + uint64_t sx_blockcount; + uint64_t sx_flags; + int64_t sx_isize1; + int64_t sx_isize2; +}; + +#define XFS_SWAP_EXT_FLAGS (0) + +#define XFS_SWAP_EXT_STRINGS + +/* This is the structure used to lay out an sxi log item in the log. */ +struct xfs_sxi_log_format { + uint16_t sxi_type; /* sxi log item type */ + uint16_t sxi_size; /* size of this item */ + uint32_t __pad; /* must be zero */ + uint64_t sxi_id; /* sxi identifier */ + struct xfs_swap_extent sxi_extent; /* extent to swap */ +}; + +/* + * This is the structure used to lay out an sxd log item in the log. There is + * no variable-size extent array; the sxd simply identifies the sxi it + * completes. + */ +struct xfs_sxd_log_format { + uint16_t sxd_type; /* sxd log item type */ + uint16_t sxd_size; /* size of this item */ + uint32_t __pad; + uint64_t sxd_sxi_id; /* id of corresponding sxi */ +}; + /* * Dquot Log format definitions.
* diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h index 2420865f3007..6162c93b5d38 100644 --- a/fs/xfs/libxfs/xfs_log_recover.h +++ b/fs/xfs/libxfs/xfs_log_recover.h @@ -74,6 +74,8 @@ extern const struct xlog_recover_item_ops xlog_cui_item_ops; extern const struct xlog_recover_item_ops xlog_cud_item_ops; extern const struct xlog_recover_item_ops xlog_attri_item_ops; extern const struct xlog_recover_item_ops xlog_attrd_item_ops; +extern const struct xlog_recover_item_ops xlog_sxi_item_ops; +extern const struct xlog_recover_item_ops xlog_sxd_item_ops; /* * Macros, structures, prototypes for internal log manager use. diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 81ce08c23306..006ceff1959d 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -1796,6 +1796,8 @@ static const struct xlog_recover_item_ops *xlog_recover_item_ops[] = { &xlog_bud_item_ops, &xlog_attri_item_ops, &xlog_attrd_item_ops, + &xlog_sxi_item_ops, + &xlog_sxd_item_ops, }; static const struct xlog_recover_item_ops * diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index a16d4d1b35d0..4cf26611f46f 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -42,6 +42,7 @@ #include "xfs_xattr.h" #include "xfs_iunlink_item.h" #include "scrub/rcbag_btree.h" +#include "xfs_swapext_item.h" #include #include @@ -2122,8 +2123,24 @@ xfs_init_caches(void) if (!xfs_iunlink_cache) goto out_destroy_attri_cache; + xfs_sxd_cache = kmem_cache_create("xfs_sxd_item", + sizeof(struct xfs_sxd_log_item), + 0, 0, NULL); + if (!xfs_sxd_cache) + goto out_destroy_iul_cache; + + xfs_sxi_cache = kmem_cache_create("xfs_sxi_item", + sizeof(struct xfs_sxi_log_item), + 0, 0, NULL); + if (!xfs_sxi_cache) + goto out_destroy_sxd_cache; + return 0; + out_destroy_sxd_cache: + kmem_cache_destroy(xfs_sxd_cache); + out_destroy_iul_cache: + kmem_cache_destroy(xfs_iunlink_cache); out_destroy_attri_cache: kmem_cache_destroy(xfs_attri_cache); out_destroy_attrd_cache: 
@@ -2180,6 +2197,8 @@ xfs_destroy_caches(void) * destroy caches. */ rcu_barrier(); + kmem_cache_destroy(xfs_sxd_cache); + kmem_cache_destroy(xfs_sxi_cache); kmem_cache_destroy(xfs_iunlink_cache); kmem_cache_destroy(xfs_attri_cache); kmem_cache_destroy(xfs_attrd_cache); diff --git a/fs/xfs/xfs_swapext_item.c b/fs/xfs/xfs_swapext_item.c new file mode 100644 index 000000000000..ea4a3a8de7e3 --- /dev/null +++ b/fs/xfs/xfs_swapext_item.c @@ -0,0 +1,320 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_format.h" +#include "xfs_log_format.h" +#include "xfs_trans_resv.h" +#include "xfs_bit.h" +#include "xfs_shared.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_inode.h" +#include "xfs_trans.h" +#include "xfs_trans_priv.h" +#include "xfs_swapext_item.h" +#include "xfs_log.h" +#include "xfs_bmap.h" +#include "xfs_icache.h" +#include "xfs_trans_space.h" +#include "xfs_error.h" +#include "xfs_log_priv.h" +#include "xfs_log_recover.h" + +struct kmem_cache *xfs_sxi_cache; +struct kmem_cache *xfs_sxd_cache; + +static const struct xfs_item_ops xfs_sxi_item_ops; + +static inline struct xfs_sxi_log_item *SXI_ITEM(struct xfs_log_item *lip) +{ + return container_of(lip, struct xfs_sxi_log_item, sxi_item); +} + +STATIC void +xfs_sxi_item_free( + struct xfs_sxi_log_item *sxi_lip) +{ + kmem_free(sxi_lip->sxi_item.li_lv_shadow); + kmem_cache_free(xfs_sxi_cache, sxi_lip); +} + +/* + * Freeing the SXI requires that we remove it from the AIL if it has already + * been placed there. However, the SXI may not yet have been placed in the AIL + * when called by xfs_sxi_release() from SXD processing due to the ordering of + * committed vs unpin operations in bulk insert operations. Hence the reference + * count to ensure only the last caller frees the SXI. 
+ */ +STATIC void +xfs_sxi_release( + struct xfs_sxi_log_item *sxi_lip) +{ + ASSERT(atomic_read(&sxi_lip->sxi_refcount) > 0); + if (atomic_dec_and_test(&sxi_lip->sxi_refcount)) { + xfs_trans_ail_delete(&sxi_lip->sxi_item, SHUTDOWN_LOG_IO_ERROR); + xfs_sxi_item_free(sxi_lip); + } +} + + +STATIC void +xfs_sxi_item_size( + struct xfs_log_item *lip, + int *nvecs, + int *nbytes) +{ + *nvecs += 1; + *nbytes += sizeof(struct xfs_sxi_log_format); +} + +/* + * This is called to fill in the vector of log iovecs for the given sxi log + * item. We use only 1 iovec, and we point that at the sxi_log_format structure + * embedded in the sxi item. + */ +STATIC void +xfs_sxi_item_format( + struct xfs_log_item *lip, + struct xfs_log_vec *lv) +{ + struct xfs_sxi_log_item *sxi_lip = SXI_ITEM(lip); + struct xfs_log_iovec *vecp = NULL; + + sxi_lip->sxi_format.sxi_type = XFS_LI_SXI; + sxi_lip->sxi_format.sxi_size = 1; + + xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_SXI_FORMAT, + &sxi_lip->sxi_format, + sizeof(struct xfs_sxi_log_format)); +} + +/* + * The unpin operation is the last place an SXI is manipulated in the log. It + * is either inserted in the AIL or aborted in the event of a log I/O error. In + * either case, the SXI transaction has been successfully committed to make it + * this far. Therefore, we expect whoever committed the SXI to either construct + * and commit the SXD or drop the SXD's reference in the event of error. Simply + * drop the log's SXI reference now that the log is done with it. + */ +STATIC void +xfs_sxi_item_unpin( + struct xfs_log_item *lip, + int remove) +{ + struct xfs_sxi_log_item *sxi_lip = SXI_ITEM(lip); + + xfs_sxi_release(sxi_lip); +} + +/* + * The SXI has been either committed or aborted if the transaction has been + * cancelled. If the transaction was cancelled, an SXD isn't going to be + * constructed and thus we free the SXI here directly. 
+ */ +STATIC void +xfs_sxi_item_release( + struct xfs_log_item *lip) +{ + xfs_sxi_release(SXI_ITEM(lip)); +} + +/* Allocate and initialize an sxi item. */ +STATIC struct xfs_sxi_log_item * +xfs_sxi_init( + struct xfs_mount *mp) +{ + struct xfs_sxi_log_item *sxi_lip; + + sxi_lip = kmem_cache_zalloc(xfs_sxi_cache, GFP_KERNEL | __GFP_NOFAIL); + + xfs_log_item_init(mp, &sxi_lip->sxi_item, XFS_LI_SXI, &xfs_sxi_item_ops); + sxi_lip->sxi_format.sxi_id = (uintptr_t)(void *)sxi_lip; + atomic_set(&sxi_lip->sxi_refcount, 2); + + return sxi_lip; +} + +static inline struct xfs_sxd_log_item *SXD_ITEM(struct xfs_log_item *lip) +{ + return container_of(lip, struct xfs_sxd_log_item, sxd_item); +} + +STATIC void +xfs_sxd_item_size( + struct xfs_log_item *lip, + int *nvecs, + int *nbytes) +{ + *nvecs += 1; + *nbytes += sizeof(struct xfs_sxd_log_format); +} + +/* + * This is called to fill in the vector of log iovecs for the given sxd log + * item. We use only 1 iovec, and we point that at the sxd_log_format structure + * embedded in the sxd item. + */ +STATIC void +xfs_sxd_item_format( + struct xfs_log_item *lip, + struct xfs_log_vec *lv) +{ + struct xfs_sxd_log_item *sxd_lip = SXD_ITEM(lip); + struct xfs_log_iovec *vecp = NULL; + + sxd_lip->sxd_format.sxd_type = XFS_LI_SXD; + sxd_lip->sxd_format.sxd_size = 1; + + xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_SXD_FORMAT, &sxd_lip->sxd_format, + sizeof(struct xfs_sxd_log_format)); +} + +/* + * The SXD is released when its transaction commits or is cancelled. Either + * way, drop our reference to the SXI and free the + * SXD.
+ */ +STATIC void +xfs_sxd_item_release( + struct xfs_log_item *lip) +{ + struct xfs_sxd_log_item *sxd_lip = SXD_ITEM(lip); + + kmem_free(sxd_lip->sxd_item.li_lv_shadow); + xfs_sxi_release(sxd_lip->sxd_intent_log_item); + kmem_cache_free(xfs_sxd_cache, sxd_lip); +} + +static struct xfs_log_item * +xfs_sxd_item_intent( + struct xfs_log_item *lip) +{ + return &SXD_ITEM(lip)->sxd_intent_log_item->sxi_item; +} + +static const struct xfs_item_ops xfs_sxd_item_ops = { + .flags = XFS_ITEM_RELEASE_WHEN_COMMITTED | + XFS_ITEM_INTENT_DONE, + .iop_size = xfs_sxd_item_size, + .iop_format = xfs_sxd_item_format, + .iop_release = xfs_sxd_item_release, + .iop_intent = xfs_sxd_item_intent, +}; + +/* Process a swapext update intent item that was recovered from the log. */ +STATIC int +xfs_sxi_item_recover( + struct xfs_log_item *lip, + struct list_head *capture_list) +{ + return -EFSCORRUPTED; +} + +STATIC bool +xfs_sxi_item_match( + struct xfs_log_item *lip, + uint64_t intent_id) +{ + return SXI_ITEM(lip)->sxi_format.sxi_id == intent_id; +} + +/* Relog an intent item to push the log tail forward. */ +static struct xfs_log_item * +xfs_sxi_item_relog( + struct xfs_log_item *intent, + struct xfs_trans *tp) +{ + ASSERT(0); + return NULL; +} + +static const struct xfs_item_ops xfs_sxi_item_ops = { + .flags = XFS_ITEM_INTENT, + .iop_size = xfs_sxi_item_size, + .iop_format = xfs_sxi_item_format, + .iop_unpin = xfs_sxi_item_unpin, + .iop_release = xfs_sxi_item_release, + .iop_recover = xfs_sxi_item_recover, + .iop_match = xfs_sxi_item_match, + .iop_relog = xfs_sxi_item_relog, +}; + +/* + * This routine is called to create an in-core extent swapext update item from + * the sxi format structure which was logged on disk. It allocates an in-core + * sxi, copies the extents from the format structure into it, and adds the sxi + * to the AIL with the given LSN. 
+ */ +STATIC int +xlog_recover_sxi_commit_pass2( + struct xlog *log, + struct list_head *buffer_list, + struct xlog_recover_item *item, + xfs_lsn_t lsn) +{ + struct xfs_mount *mp = log->l_mp; + struct xfs_sxi_log_item *sxi_lip; + struct xfs_sxi_log_format *sxi_formatp; + size_t len; + + sxi_formatp = item->ri_buf[0].i_addr; + + if (sxi_formatp->__pad != 0) { + XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, log->l_mp); + return -EFSCORRUPTED; + } + + len = sizeof(struct xfs_sxi_log_format); + if (item->ri_buf[0].i_len != len) { + XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, log->l_mp); + return -EFSCORRUPTED; + } + + sxi_lip = xfs_sxi_init(mp); + memcpy(&sxi_lip->sxi_format, sxi_formatp, len); + + xfs_trans_ail_insert(log->l_ailp, &sxi_lip->sxi_item, lsn); + xfs_sxi_release(sxi_lip); + return 0; +} + +const struct xlog_recover_item_ops xlog_sxi_item_ops = { + .item_type = XFS_LI_SXI, + .commit_pass2 = xlog_recover_sxi_commit_pass2, +}; + +/* + * This routine is called when an SXD format structure is found in a committed + * transaction in the log. Its purpose is to cancel the corresponding SXI if it + * was still in the log. To do this it searches the AIL for the SXI with an id + * equal to that in the SXD format structure. If we find it we drop the SXD + * reference, which removes the SXI from the AIL and frees it. 
+ */ +STATIC int +xlog_recover_sxd_commit_pass2( + struct xlog *log, + struct list_head *buffer_list, + struct xlog_recover_item *item, + xfs_lsn_t lsn) +{ + struct xfs_sxd_log_format *sxd_formatp; + + sxd_formatp = item->ri_buf[0].i_addr; + if (item->ri_buf[0].i_len != sizeof(struct xfs_sxd_log_format)) { + XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, log->l_mp); + return -EFSCORRUPTED; + } + + xlog_recover_release_intent(log, XFS_LI_SXI, sxd_formatp->sxd_sxi_id); + return 0; +} + +const struct xlog_recover_item_ops xlog_sxd_item_ops = { + .item_type = XFS_LI_SXD, + .commit_pass2 = xlog_recover_sxd_commit_pass2, +}; diff --git a/fs/xfs/xfs_swapext_item.h b/fs/xfs/xfs_swapext_item.h new file mode 100644 index 000000000000..e3cb59692e50 --- /dev/null +++ b/fs/xfs/xfs_swapext_item.h @@ -0,0 +1,56 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#ifndef __XFS_SWAPEXT_ITEM_H__ +#define __XFS_SWAPEXT_ITEM_H__ + +/* + * The extent swapping intent item helps us perform atomic extent swaps between + * two inode forks. It does this by tracking the range of logical offsets that + * still need to be swapped, and relogging the item as progress happens. + * + * *I items should be recorded in the *first* of a series of rolled + * transactions, and the *D items should be recorded in the same transaction + * that records the associated bmbt updates. + * + * Should the system crash after the commit of the first transaction but + * before the commit of the final transaction in a series, log recovery will + * use the redo information recorded by the intent items to replay the + * rest of the extent swaps. + */ + +/* kernel only SXI/SXD definitions */ + +struct xfs_mount; +struct kmem_cache; + +/* + * This is the "swapext update intent" log item. It is used to log the fact + * that we are swapping extents between two files.
It is used in conjunction + * with the "swapext update done" log item described below. + * + * These log items follow the same rules as struct xfs_efi_log_item; see the + * comments about that structure (in xfs_extfree_item.h) for more details. + */ +struct xfs_sxi_log_item { + struct xfs_log_item sxi_item; + atomic_t sxi_refcount; + struct xfs_sxi_log_format sxi_format; +}; + +/* + * This is the "swapext update done" log item. It is used to log the fact that + * some of the extent swapping mentioned in an earlier sxi item has been + * performed. + */ +struct xfs_sxd_log_item { + struct xfs_log_item sxd_item; + struct xfs_sxi_log_item *sxd_intent_log_item; + struct xfs_sxd_log_format sxd_format; +}; + +extern struct kmem_cache *xfs_sxi_cache; +extern struct kmem_cache *xfs_sxd_cache; + +#endif /* __XFS_SWAPEXT_ITEM_H__ */ From patchwork Fri Dec 30 22:13:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13084974 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2C0A5C3DA7C for ; Fri, 30 Dec 2022 23:52:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235804AbiL3XwE (ORCPT ); Fri, 30 Dec 2022 18:52:04 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49070 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231164AbiL3XwC (ORCPT ); Fri, 30 Dec 2022 18:52:02 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4A5109FD5; Fri, 30 Dec 2022 15:51:59 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client
certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id BEBF061B98; Fri, 30 Dec 2022 23:51:58 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1BC9AC433D2; Fri, 30 Dec 2022 23:51:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672444318; bh=hxZ1XfETEtKI0Ptap9W7r+Fcmst9kdNdbwcj+WQhAzY=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=FoTKoWA6lg965XMSeJNU6Ab6uXXhNbUNzU0mCyz3XzPZUi2Gek5rtaRKDWbmn0hOX Zlug1yFysicWR3m8JHJCnMdI1w0CuC3LNsqfgq3MTXlu9J5+FbMBOPRxbb0UXiLpvM fX1LO/t27em7wLDjdGscLtJIk12fY5x0cbQioBSQHCZeuQjfVbMogzyXPB9ZiITM1t /ZyYBKrjhLXfGbkjXx86VQQ9gue2WubOBKtA1XHzNON9VKLYjSPa+q5N7yqJXB2h6/ NqF6GQRKAQxFpjebIrZrt5jss4+ukJNVDEEpK2TS1JGbupfM4TKgKPwhHaRsxhr+3v z7WuHQy8W7dpQ== Subject: [PATCH 07/21] xfs: create deferred log items for extent swapping From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Fri, 30 Dec 2022 14:13:56 -0800 Message-ID: <167243843623.699466.18403745518586988680.stgit@magnolia> In-Reply-To: <167243843494.699466.5163281976943635014.stgit@magnolia> References: <167243843494.699466.5163281976943635014.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Darrick J. Wong Now that we've created the skeleton of a log intent item to track and restart extent swap operations, add the upper level logic to commit intent items and turn them into concrete work recorded in the log. We use the deferred item "multihop" feature that was introduced a few patches ago to constrain the number of active swap operations to one per thread. Signed-off-by: Darrick J. 
Wong --- fs/xfs/Makefile | 2 fs/xfs/libxfs/xfs_bmap.h | 4 fs/xfs/libxfs/xfs_defer.c | 7 fs/xfs/libxfs/xfs_defer.h | 3 fs/xfs/libxfs/xfs_format.h | 6 fs/xfs/libxfs/xfs_log_format.h | 28 + fs/xfs/libxfs/xfs_swapext.c | 1021 +++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_swapext.h | 142 +++++ fs/xfs/libxfs/xfs_trans_space.h | 4 fs/xfs/xfs_swapext_item.c | 357 +++++++++++++- fs/xfs/xfs_trace.c | 1 fs/xfs/xfs_trace.h | 215 ++++++++ fs/xfs/xfs_xchgrange.c | 65 ++ fs/xfs/xfs_xchgrange.h | 17 + 14 files changed, 1855 insertions(+), 17 deletions(-) create mode 100644 fs/xfs/libxfs/xfs_swapext.c create mode 100644 fs/xfs/xfs_xchgrange.c create mode 100644 fs/xfs/xfs_xchgrange.h diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index c5cb8cf6ffbb..23b0c40620cf 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -46,6 +46,7 @@ xfs-y += $(addprefix libxfs/, \ xfs_refcount.o \ xfs_refcount_btree.o \ xfs_sb.o \ + xfs_swapext.o \ xfs_symlink_remote.o \ xfs_trans_inode.o \ xfs_trans_resv.o \ @@ -92,6 +93,7 @@ xfs-y += xfs_aops.o \ xfs_sysfs.o \ xfs_trans.o \ xfs_xattr.o \ + xfs_xchgrange.o \ kmem.o # low-level transaction/log code diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h index cb09a43a2872..413ec27f2f24 100644 --- a/fs/xfs/libxfs/xfs_bmap.h +++ b/fs/xfs/libxfs/xfs_bmap.h @@ -144,7 +144,7 @@ static inline int xfs_bmapi_whichfork(uint32_t bmapi_flags) { BMAP_COWFORK, "COW" } /* Return true if the extent is an allocated extent, written or not. */ -static inline bool xfs_bmap_is_real_extent(struct xfs_bmbt_irec *irec) +static inline bool xfs_bmap_is_real_extent(const struct xfs_bmbt_irec *irec) { return irec->br_startblock != HOLESTARTBLOCK && irec->br_startblock != DELAYSTARTBLOCK && @@ -155,7 +155,7 @@ static inline bool xfs_bmap_is_real_extent(struct xfs_bmbt_irec *irec) * Return true if the extent is a real, allocated extent, or false if it is a * delayed allocation, and unwritten extent or a hole. 
*/ -static inline bool xfs_bmap_is_written_extent(struct xfs_bmbt_irec *irec) +static inline bool xfs_bmap_is_written_extent(const struct xfs_bmbt_irec *irec) { return xfs_bmap_is_real_extent(irec) && irec->br_state != XFS_EXT_UNWRITTEN; diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c index bcfb6a4203cd..1619b9b928db 100644 --- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -26,6 +26,7 @@ #include "xfs_da_format.h" #include "xfs_da_btree.h" #include "xfs_attr.h" +#include "xfs_swapext.h" static struct kmem_cache *xfs_defer_pending_cache; @@ -189,6 +190,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = { [XFS_DEFER_OPS_TYPE_FREE] = &xfs_extent_free_defer_type, [XFS_DEFER_OPS_TYPE_AGFL_FREE] = &xfs_agfl_free_defer_type, [XFS_DEFER_OPS_TYPE_ATTR] = &xfs_attr_defer_type, + [XFS_DEFER_OPS_TYPE_SWAPEXT] = &xfs_swapext_defer_type, }; /* @@ -913,6 +915,10 @@ xfs_defer_init_item_caches(void) error = xfs_attr_intent_init_cache(); if (error) goto err; + error = xfs_swapext_intent_init_cache(); + if (error) + goto err; + return 0; err: xfs_defer_destroy_item_caches(); @@ -923,6 +929,7 @@ xfs_defer_init_item_caches(void) void xfs_defer_destroy_item_caches(void) { + xfs_swapext_intent_destroy_cache(); xfs_attr_intent_destroy_cache(); xfs_extfree_intent_destroy_cache(); xfs_bmap_intent_destroy_cache(); diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h index 114a3a4930a3..bcc48b0c75c9 100644 --- a/fs/xfs/libxfs/xfs_defer.h +++ b/fs/xfs/libxfs/xfs_defer.h @@ -20,6 +20,7 @@ enum xfs_defer_ops_type { XFS_DEFER_OPS_TYPE_FREE, XFS_DEFER_OPS_TYPE_AGFL_FREE, XFS_DEFER_OPS_TYPE_ATTR, + XFS_DEFER_OPS_TYPE_SWAPEXT, XFS_DEFER_OPS_TYPE_MAX, }; @@ -65,7 +66,7 @@ extern const struct xfs_defer_op_type xfs_rmap_update_defer_type; extern const struct xfs_defer_op_type xfs_extent_free_defer_type; extern const struct xfs_defer_op_type xfs_agfl_free_defer_type; extern const struct xfs_defer_op_type xfs_attr_defer_type; - +extern const 
struct xfs_defer_op_type xfs_swapext_defer_type; /* * Deferred operation item relogging limits. diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 1424976ec955..bb8bff488017 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -425,6 +425,12 @@ static inline bool xfs_sb_version_haslogxattrs(struct xfs_sb *sbp) XFS_SB_FEAT_INCOMPAT_LOG_XATTRS); } +static inline bool xfs_sb_version_haslogswapext(struct xfs_sb *sbp) +{ + return xfs_sb_is_v5(sbp) && (sbp->sb_features_log_incompat & + XFS_SB_FEAT_INCOMPAT_LOG_SWAPEXT); +} + static inline bool xfs_is_quota_inode(struct xfs_sb *sbp, xfs_ino_t ino) { diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h index b105a5ef6644..65a84fdefe56 100644 --- a/fs/xfs/libxfs/xfs_log_format.h +++ b/fs/xfs/libxfs/xfs_log_format.h @@ -891,9 +891,33 @@ struct xfs_swap_extent { int64_t sx_isize2; }; -#define XFS_SWAP_EXT_FLAGS (0) +/* Swap extents between extended attribute forks. */ +#define XFS_SWAP_EXT_ATTR_FORK (1ULL << 0) -#define XFS_SWAP_EXT_STRINGS +/* Set the file sizes when finished. */ +#define XFS_SWAP_EXT_SET_SIZES (1ULL << 1) + +/* Do not swap any part of the range where inode1's mapping is a hole. */ +#define XFS_SWAP_EXT_SKIP_INO1_HOLES (1ULL << 2) + +/* Clear the reflink flag from inode1 after the operation. */ +#define XFS_SWAP_EXT_CLEAR_INO1_REFLINK (1ULL << 3) + +/* Clear the reflink flag from inode2 after the operation. 
*/ +#define XFS_SWAP_EXT_CLEAR_INO2_REFLINK (1ULL << 4) + +#define XFS_SWAP_EXT_FLAGS (XFS_SWAP_EXT_ATTR_FORK | \ + XFS_SWAP_EXT_SET_SIZES | \ + XFS_SWAP_EXT_SKIP_INO1_HOLES | \ + XFS_SWAP_EXT_CLEAR_INO1_REFLINK | \ + XFS_SWAP_EXT_CLEAR_INO2_REFLINK) + +#define XFS_SWAP_EXT_STRINGS \ + { XFS_SWAP_EXT_ATTR_FORK, "ATTRFORK" }, \ + { XFS_SWAP_EXT_SET_SIZES, "SETSIZES" }, \ + { XFS_SWAP_EXT_SKIP_INO1_HOLES, "SKIP_INO1_HOLES" }, \ + { XFS_SWAP_EXT_CLEAR_INO1_REFLINK, "CLEAR_INO1_REFLINK" }, \ + { XFS_SWAP_EXT_CLEAR_INO2_REFLINK, "CLEAR_INO2_REFLINK" } /* This is the structure used to lay out an sxi log item in the log. */ struct xfs_sxi_log_format { diff --git a/fs/xfs/libxfs/xfs_swapext.c b/fs/xfs/libxfs/xfs_swapext.c new file mode 100644 index 000000000000..0bc758c5cf5c --- /dev/null +++ b/fs/xfs/libxfs/xfs_swapext.c @@ -0,0 +1,1021 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_log_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_inode.h" +#include "xfs_trans.h" +#include "xfs_bmap.h" +#include "xfs_icache.h" +#include "xfs_quota.h" +#include "xfs_swapext.h" +#include "xfs_trace.h" +#include "xfs_bmap_btree.h" +#include "xfs_trans_space.h" +#include "xfs_error.h" +#include "xfs_errortag.h" +#include "xfs_health.h" + +struct kmem_cache *xfs_swapext_intent_cache; + +/* bmbt mappings adjacent to a pair of records. 
*/ +struct xfs_swapext_adjacent { + struct xfs_bmbt_irec left1; + struct xfs_bmbt_irec right1; + struct xfs_bmbt_irec left2; + struct xfs_bmbt_irec right2; +}; + +#define ADJACENT_INIT { \ + .left1 = { .br_startblock = HOLESTARTBLOCK }, \ + .right1 = { .br_startblock = HOLESTARTBLOCK }, \ + .left2 = { .br_startblock = HOLESTARTBLOCK }, \ + .right2 = { .br_startblock = HOLESTARTBLOCK }, \ +} + +/* Information to help us reset reflink flag / CoW fork state after a swap. */ + +/* Previous state of the two inodes' reflink flags. */ +#define XFS_REFLINK_STATE_IP1 (1U << 0) +#define XFS_REFLINK_STATE_IP2 (1U << 1) + +/* + * If the reflink flag is set on either inode, make sure it has an incore CoW + * fork, since all reflink inodes must have them. If there's a CoW fork and it + * has extents in it, make sure the inodes are tagged appropriately so that + * speculative preallocations can be GC'd if we run low on space. + */ +static inline void +xfs_swapext_ensure_cowfork( + struct xfs_inode *ip) +{ + struct xfs_ifork *cfork; + + if (xfs_is_reflink_inode(ip)) + xfs_ifork_init_cow(ip); + + cfork = xfs_ifork_ptr(ip, XFS_COW_FORK); + if (!cfork) + return; + if (cfork->if_bytes > 0) + xfs_inode_set_cowblocks_tag(ip); + else + xfs_inode_clear_cowblocks_tag(ip); +} + +/* Schedule an atomic extent swap. */ +void +xfs_swapext_schedule( + struct xfs_trans *tp, + struct xfs_swapext_intent *sxi) +{ + trace_xfs_swapext_defer(tp->t_mountp, sxi); + xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_SWAPEXT, &sxi->sxi_list); +} + +/* + * Adjust the on-disk inode size upwards if needed so that we never map extents + * into the file past EOF. This is crucial so that log recovery won't get + * confused by the sudden appearance of post-eof extents. 
+ */ +STATIC void +xfs_swapext_update_size( + struct xfs_trans *tp, + struct xfs_inode *ip, + struct xfs_bmbt_irec *imap, + xfs_fsize_t new_isize) +{ + struct xfs_mount *mp = tp->t_mountp; + xfs_fsize_t len; + + if (new_isize < 0) + return; + + len = min(XFS_FSB_TO_B(mp, imap->br_startoff + imap->br_blockcount), + new_isize); + + if (len <= ip->i_disk_size) + return; + + trace_xfs_swapext_update_inode_size(ip, len); + + ip->i_disk_size = len; + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); +} + +static inline bool +sxi_has_more_swap_work(const struct xfs_swapext_intent *sxi) +{ + return sxi->sxi_blockcount > 0; +} + +static inline bool +sxi_has_postop_work(const struct xfs_swapext_intent *sxi) +{ + return sxi->sxi_flags & (XFS_SWAP_EXT_CLEAR_INO1_REFLINK | + XFS_SWAP_EXT_CLEAR_INO2_REFLINK); +} + +static inline void +sxi_advance( + struct xfs_swapext_intent *sxi, + const struct xfs_bmbt_irec *irec) +{ + sxi->sxi_startoff1 += irec->br_blockcount; + sxi->sxi_startoff2 += irec->br_blockcount; + sxi->sxi_blockcount -= irec->br_blockcount; +} + +/* Check all extents to make sure we can actually swap them. */ +int +xfs_swapext_check_extents( + struct xfs_mount *mp, + const struct xfs_swapext_req *req) +{ + struct xfs_ifork *ifp1, *ifp2; + + /* No fork? */ + ifp1 = xfs_ifork_ptr(req->ip1, req->whichfork); + ifp2 = xfs_ifork_ptr(req->ip2, req->whichfork); + if (!ifp1 || !ifp2) + return -EINVAL; + + /* We don't know how to swap local format forks. */ + if (ifp1->if_format == XFS_DINODE_FMT_LOCAL || + ifp2->if_format == XFS_DINODE_FMT_LOCAL) + return -EINVAL; + + /* We don't support realtime data forks yet. */ + if (!XFS_IS_REALTIME_INODE(req->ip1)) + return 0; + if (req->whichfork == XFS_ATTR_FORK) + return 0; + return -EINVAL; +} + +#ifdef CONFIG_XFS_QUOTA +/* Log the actual updates to the quota accounting. 
*/ +static inline void +xfs_swapext_update_quota( + struct xfs_trans *tp, + struct xfs_swapext_intent *sxi, + struct xfs_bmbt_irec *irec1, + struct xfs_bmbt_irec *irec2) +{ + int64_t ip1_delta = 0, ip2_delta = 0; + unsigned int qflag; + + qflag = XFS_IS_REALTIME_INODE(sxi->sxi_ip1) ? XFS_TRANS_DQ_RTBCOUNT : + XFS_TRANS_DQ_BCOUNT; + + if (xfs_bmap_is_real_extent(irec1)) { + ip1_delta -= irec1->br_blockcount; + ip2_delta += irec1->br_blockcount; + } + + if (xfs_bmap_is_real_extent(irec2)) { + ip1_delta += irec2->br_blockcount; + ip2_delta -= irec2->br_blockcount; + } + + xfs_trans_mod_dquot_byino(tp, sxi->sxi_ip1, qflag, ip1_delta); + xfs_trans_mod_dquot_byino(tp, sxi->sxi_ip2, qflag, ip2_delta); +} +#else +# define xfs_swapext_update_quota(tp, sxi, irec1, irec2) ((void)0) +#endif + +/* + * Walk forward through the file ranges in @sxi until we find two different + * mappings to exchange. If there is work to do, return the mappings; + * otherwise we've reached the end of the range and sxi_blockcount will be + * zero. + * + * If the walk skips over a pair of mappings to the same storage, save them as + * the left records in @adj (if provided) so that the simulation phase can + * avoid an extra lookup. 
+ */ +static int +xfs_swapext_find_mappings( + struct xfs_swapext_intent *sxi, + struct xfs_bmbt_irec *irec1, + struct xfs_bmbt_irec *irec2, + struct xfs_swapext_adjacent *adj) +{ + int nimaps; + int bmap_flags; + int error; + + bmap_flags = xfs_bmapi_aflag(xfs_swapext_whichfork(sxi)); + + for (; sxi_has_more_swap_work(sxi); sxi_advance(sxi, irec1)) { + /* Read extent from the first file */ + nimaps = 1; + error = xfs_bmapi_read(sxi->sxi_ip1, sxi->sxi_startoff1, + sxi->sxi_blockcount, irec1, &nimaps, + bmap_flags); + if (error) + return error; + if (nimaps != 1 || + irec1->br_startblock == DELAYSTARTBLOCK || + irec1->br_startoff != sxi->sxi_startoff1) { + /* + * We should never get no mapping or a delalloc extent + * or something that doesn't match what we asked for, + * since the caller flushed both inodes and we hold the + * ILOCKs for both inodes. + */ + ASSERT(0); + return -EINVAL; + } + + /* + * If the caller told us to ignore sparse areas of file1, jump + * ahead to the next region. + */ + if ((sxi->sxi_flags & XFS_SWAP_EXT_SKIP_INO1_HOLES) && + irec1->br_startblock == HOLESTARTBLOCK) { + trace_xfs_swapext_extent1(sxi->sxi_ip1, irec1); + continue; + } + + /* Read extent from the second file */ + nimaps = 1; + error = xfs_bmapi_read(sxi->sxi_ip2, sxi->sxi_startoff2, + irec1->br_blockcount, irec2, &nimaps, + bmap_flags); + if (error) + return error; + if (nimaps != 1 || + irec2->br_startblock == DELAYSTARTBLOCK || + irec2->br_startoff != sxi->sxi_startoff2) { + /* + * We should never get no mapping or a delalloc extent + * or something that doesn't match what we asked for, + * since the caller flushed both inodes and we hold the + * ILOCKs for both inodes. + */ + ASSERT(0); + return -EINVAL; + } + + /* + * We can only swap as many blocks as the smaller of the two + * extent maps. 
+ */ + irec1->br_blockcount = min(irec1->br_blockcount, + irec2->br_blockcount); + + trace_xfs_swapext_extent1(sxi->sxi_ip1, irec1); + trace_xfs_swapext_extent2(sxi->sxi_ip2, irec2); + + /* We found something to swap, so return it. */ + if (irec1->br_startblock != irec2->br_startblock) + return 0; + + /* + * Two extents mapped to the same physical block must not have + * different states; that's filesystem corruption. Move on to + * the next extent if they're both holes or both the same + * physical extent. + */ + if (irec1->br_state != irec2->br_state) { + xfs_bmap_mark_sick(sxi->sxi_ip1, + xfs_swapext_whichfork(sxi)); + xfs_bmap_mark_sick(sxi->sxi_ip2, + xfs_swapext_whichfork(sxi)); + return -EFSCORRUPTED; + } + + /* + * Save the mappings if we're estimating work and skipping + * these identical mappings. + */ + if (adj) { + memcpy(&adj->left1, irec1, sizeof(*irec1)); + memcpy(&adj->left2, irec2, sizeof(*irec2)); + } + } + + return 0; +} + +/* Exchange these two mappings. */ +static void +xfs_swapext_exchange_mappings( + struct xfs_trans *tp, + struct xfs_swapext_intent *sxi, + struct xfs_bmbt_irec *irec1, + struct xfs_bmbt_irec *irec2) +{ + int whichfork = xfs_swapext_whichfork(sxi); + + xfs_swapext_update_quota(tp, sxi, irec1, irec2); + + /* Remove both mappings. */ + xfs_bmap_unmap_extent(tp, sxi->sxi_ip1, whichfork, irec1); + xfs_bmap_unmap_extent(tp, sxi->sxi_ip2, whichfork, irec2); + + /* + * Re-add both mappings. We swap the file offsets between the two maps + * and add the opposite map, which has the effect of filling the + * logical offsets we just unmapped, but with the physical mapping + * information swapped. + */ + swap(irec1->br_startoff, irec2->br_startoff); + xfs_bmap_map_extent(tp, sxi->sxi_ip1, whichfork, irec2); + xfs_bmap_map_extent(tp, sxi->sxi_ip2, whichfork, irec1); + + /* Make sure we're not mapping extents past EOF. 
*/ + if (whichfork == XFS_DATA_FORK) { + xfs_swapext_update_size(tp, sxi->sxi_ip1, irec2, + sxi->sxi_isize1); + xfs_swapext_update_size(tp, sxi->sxi_ip2, irec1, + sxi->sxi_isize2); + } + + /* + * Advance our cursor and exit. The caller (either defer ops or log + * recovery) will log the SXD item, and if *blockcount is nonzero, it + * will log a new SXI item for the remainder and call us back. + */ + sxi_advance(sxi, irec1); +} + +static inline void +xfs_swapext_clear_reflink( + struct xfs_trans *tp, + struct xfs_inode *ip) +{ + trace_xfs_reflink_unset_inode_flag(ip); + + ip->i_diflags2 &= ~XFS_DIFLAG2_REFLINK; + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); +} + +/* Finish whatever work might come after a swap operation. */ +static int +xfs_swapext_do_postop_work( + struct xfs_trans *tp, + struct xfs_swapext_intent *sxi) +{ + if (sxi->sxi_flags & XFS_SWAP_EXT_CLEAR_INO1_REFLINK) { + xfs_swapext_clear_reflink(tp, sxi->sxi_ip1); + sxi->sxi_flags &= ~XFS_SWAP_EXT_CLEAR_INO1_REFLINK; + } + + if (sxi->sxi_flags & XFS_SWAP_EXT_CLEAR_INO2_REFLINK) { + xfs_swapext_clear_reflink(tp, sxi->sxi_ip2); + sxi->sxi_flags &= ~XFS_SWAP_EXT_CLEAR_INO2_REFLINK; + } + + return 0; +} + +/* Finish one extent swap, possibly log more. */ +int +xfs_swapext_finish_one( + struct xfs_trans *tp, + struct xfs_swapext_intent *sxi) +{ + struct xfs_bmbt_irec irec1, irec2; + int error; + + if (sxi_has_more_swap_work(sxi)) { + /* + * If the operation state says that some range of the files + * have not yet been swapped, look for extents in that range to + * swap. If we find some extents, swap them. + */ + error = xfs_swapext_find_mappings(sxi, &irec1, &irec2, NULL); + if (error) + return error; + + if (sxi_has_more_swap_work(sxi)) + xfs_swapext_exchange_mappings(tp, sxi, &irec1, &irec2); + + /* + * If the caller asked us to exchange the file sizes after the + * swap and either we just swapped the last extents in the + * range or we didn't find anything to swap, update the ondisk + * file sizes. 
+ */ + if ((sxi->sxi_flags & XFS_SWAP_EXT_SET_SIZES) && + !sxi_has_more_swap_work(sxi)) { + sxi->sxi_ip1->i_disk_size = sxi->sxi_isize1; + sxi->sxi_ip2->i_disk_size = sxi->sxi_isize2; + + xfs_trans_log_inode(tp, sxi->sxi_ip1, XFS_ILOG_CORE); + xfs_trans_log_inode(tp, sxi->sxi_ip2, XFS_ILOG_CORE); + } + } else if (sxi_has_postop_work(sxi)) { + /* + * Now that we're finished with the swap operation, complete + * the post-op cleanup work. + */ + error = xfs_swapext_do_postop_work(tp, sxi); + if (error) + return error; + } + + /* If we still have work to do, ask for a new transaction. */ + if (sxi_has_more_swap_work(sxi) || sxi_has_postop_work(sxi)) { + trace_xfs_swapext_defer(tp->t_mountp, sxi); + return -EAGAIN; + } + + /* + * If we reach here, we've finished all the swapping work and the post + * operation work. The last thing we need to do before returning to + * the caller is to make sure that COW forks are set up correctly. + */ + if (!(sxi->sxi_flags & XFS_SWAP_EXT_ATTR_FORK)) { + xfs_swapext_ensure_cowfork(sxi->sxi_ip1); + xfs_swapext_ensure_cowfork(sxi->sxi_ip2); + } + + return 0; +} + +/* + * Compute the amount of bmbt blocks we should reserve for each file. In the + * worst case, each exchange will fill a hole with a new mapping, which could + * result in a btree split every time we add a new leaf block. + */ +static inline uint64_t +xfs_swapext_bmbt_blocks( + struct xfs_mount *mp, + const struct xfs_swapext_req *req) +{ + return howmany_64(req->nr_exchanges, + XFS_MAX_CONTIG_BMAPS_PER_BLOCK(mp)) * + XFS_EXTENTADD_SPACE_RES(mp, req->whichfork); +} + +static inline uint64_t +xfs_swapext_rmapbt_blocks( + struct xfs_mount *mp, + const struct xfs_swapext_req *req) +{ + if (!xfs_has_rmapbt(mp)) + return 0; + if (XFS_IS_REALTIME_INODE(req->ip1)) + return 0; + + return howmany_64(req->nr_exchanges, + XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp)) * + XFS_RMAPADD_SPACE_RES(mp); +} + +/* Estimate the bmbt and rmapbt overhead required to exchange extents. 
*/ +static int +xfs_swapext_estimate_overhead( + struct xfs_swapext_req *req) +{ + struct xfs_mount *mp = req->ip1->i_mount; + xfs_filblks_t bmbt_blocks; + xfs_filblks_t rmapbt_blocks; + xfs_filblks_t resblks = req->resblks; + + /* + * Compute the number of bmbt and rmapbt blocks we might need to handle + * the estimated number of exchanges. + */ + bmbt_blocks = xfs_swapext_bmbt_blocks(mp, req); + rmapbt_blocks = xfs_swapext_rmapbt_blocks(mp, req); + + trace_xfs_swapext_overhead(mp, bmbt_blocks, rmapbt_blocks); + + /* Make sure the change in file block count doesn't overflow. */ + if (check_add_overflow(req->ip1_bcount, bmbt_blocks, &req->ip1_bcount)) + return -EFBIG; + if (check_add_overflow(req->ip2_bcount, bmbt_blocks, &req->ip2_bcount)) + return -EFBIG; + + /* + * Add together the number of blocks we need to handle btree growth, + * then add it to the number of blocks we need to reserve to this + * transaction. + */ + if (check_add_overflow(resblks, bmbt_blocks, &resblks)) + return -ENOSPC; + if (check_add_overflow(resblks, bmbt_blocks, &resblks)) + return -ENOSPC; + if (check_add_overflow(resblks, rmapbt_blocks, &resblks)) + return -ENOSPC; + if (check_add_overflow(resblks, rmapbt_blocks, &resblks)) + return -ENOSPC; + + /* Can't actually reserve more than UINT_MAX blocks. */ + if (resblks > UINT_MAX) + return -ENOSPC; + + req->resblks = resblks; + trace_xfs_swapext_final_estimate(req); + return 0; +} + +/* Decide if we can merge two real extents. */ +static inline bool +can_merge( + const struct xfs_bmbt_irec *b1, + const struct xfs_bmbt_irec *b2) +{ + /* Don't merge holes. */ + if (b1->br_startblock == HOLESTARTBLOCK || + b2->br_startblock == HOLESTARTBLOCK) + return false; + + /* We don't merge delalloc extents. 
*/ + if (!xfs_bmap_is_real_extent(b1) || !xfs_bmap_is_real_extent(b2)) + return false; + + if (b1->br_startoff + b1->br_blockcount == b2->br_startoff && + b1->br_startblock + b1->br_blockcount == b2->br_startblock && + b1->br_state == b2->br_state && + b1->br_blockcount + b2->br_blockcount <= XFS_MAX_BMBT_EXTLEN) + return true; + + return false; +} + +#define CLEFT_CONTIG 0x01 +#define CRIGHT_CONTIG 0x02 +#define CHOLE 0x04 +#define CBOTH_CONTIG (CLEFT_CONTIG | CRIGHT_CONTIG) + +#define NLEFT_CONTIG 0x10 +#define NRIGHT_CONTIG 0x20 +#define NHOLE 0x40 +#define NBOTH_CONTIG (NLEFT_CONTIG | NRIGHT_CONTIG) + +/* Estimate the effect of a single swap on extent count. */ +static inline int +delta_nextents_step( + struct xfs_mount *mp, + const struct xfs_bmbt_irec *left, + const struct xfs_bmbt_irec *curr, + const struct xfs_bmbt_irec *new, + const struct xfs_bmbt_irec *right) +{ + bool lhole, rhole, chole, nhole; + unsigned int state = 0; + int ret = 0; + + lhole = left->br_startblock == HOLESTARTBLOCK; + rhole = right->br_startblock == HOLESTARTBLOCK; + chole = curr->br_startblock == HOLESTARTBLOCK; + nhole = new->br_startblock == HOLESTARTBLOCK; + + if (chole) + state |= CHOLE; + if (!lhole && !chole && can_merge(left, curr)) + state |= CLEFT_CONTIG; + if (!rhole && !chole && can_merge(curr, right)) + state |= CRIGHT_CONTIG; + if ((state & CBOTH_CONTIG) == CBOTH_CONTIG && + left->br_blockcount + curr->br_blockcount + + right->br_blockcount > XFS_MAX_BMBT_EXTLEN) + state &= ~CRIGHT_CONTIG; + + if (nhole) + state |= NHOLE; + if (!lhole && !nhole && can_merge(left, new)) + state |= NLEFT_CONTIG; + if (!rhole && !nhole && can_merge(new, right)) + state |= NRIGHT_CONTIG; + if ((state & NBOTH_CONTIG) == NBOTH_CONTIG && + left->br_blockcount + new->br_blockcount + + right->br_blockcount > XFS_MAX_BMBT_EXTLEN) + state &= ~NRIGHT_CONTIG; + + switch (state & (CLEFT_CONTIG | CRIGHT_CONTIG | CHOLE)) { + case CLEFT_CONTIG | CRIGHT_CONTIG: + /* + * left/curr/right are the same 
extent, so deleting curr causes + * 2 new extents to be created. + */ + ret += 2; + break; + case 0: + /* + * curr is not contiguous with any extent, so we remove curr + * completely + */ + ret--; + break; + case CHOLE: + /* hole, do nothing */ + break; + case CLEFT_CONTIG: + case CRIGHT_CONTIG: + /* trim either left or right, no change */ + break; + } + + switch (state & (NLEFT_CONTIG | NRIGHT_CONTIG | NHOLE)) { + case NLEFT_CONTIG | NRIGHT_CONTIG: + /* + * left/curr/right will become the same extent, so adding + * curr causes the deletion of right. + */ + ret--; + break; + case 0: + /* new is not contiguous with any extent */ + ret++; + break; + case NHOLE: + /* hole, do nothing. */ + break; + case NLEFT_CONTIG: + case NRIGHT_CONTIG: + /* new is absorbed into left or right, no change */ + break; + } + + trace_xfs_swapext_delta_nextents_step(mp, left, curr, new, right, ret, + state); + return ret; +} + +/* Make sure we don't overflow the extent counters. */ +static inline int +ensure_delta_nextents( + struct xfs_swapext_req *req, + struct xfs_inode *ip, + int64_t delta) +{ + struct xfs_mount *mp = ip->i_mount; + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, req->whichfork); + xfs_extnum_t max_extents; + bool large_extcount; + + if (delta < 0) + return 0; + + if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_REDUCE_MAX_IEXTENTS)) { + if (ifp->if_nextents + delta > 10) + return -EFBIG; + } + + if (req->req_flags & XFS_SWAP_REQ_NREXT64) + large_extcount = true; + else + large_extcount = xfs_inode_has_large_extent_counts(ip); + + max_extents = xfs_iext_max_nextents(large_extcount, req->whichfork); + if (ifp->if_nextents + delta <= max_extents) + return 0; + if (large_extcount) + return -EFBIG; + if (!xfs_has_large_extent_counts(mp)) + return -EFBIG; + + max_extents = xfs_iext_max_nextents(true, req->whichfork); + if (ifp->if_nextents + delta > max_extents) + return -EFBIG; + + req->req_flags |= XFS_SWAP_REQ_NREXT64; + return 0; +} + +/* Find the next extent after irec. 
*/ +static inline int +get_next_ext( + struct xfs_inode *ip, + int bmap_flags, + const struct xfs_bmbt_irec *irec, + struct xfs_bmbt_irec *nrec) +{ + xfs_fileoff_t off; + xfs_filblks_t blockcount; + int nimaps = 1; + int error; + + off = irec->br_startoff + irec->br_blockcount; + blockcount = XFS_MAX_FILEOFF - off; + error = xfs_bmapi_read(ip, off, blockcount, nrec, &nimaps, bmap_flags); + if (error) + return error; + if (nrec->br_startblock == DELAYSTARTBLOCK || + nrec->br_startoff != off) { + /* + * If we don't get the extent we want, return a zero-length + * mapping, which our estimator function will pretend is a hole. + * We shouldn't get delalloc reservations. + */ + nrec->br_startblock = HOLESTARTBLOCK; + } + + return 0; +} + +int __init +xfs_swapext_intent_init_cache(void) +{ + xfs_swapext_intent_cache = kmem_cache_create("xfs_swapext_intent", + sizeof(struct xfs_swapext_intent), + 0, 0, NULL); + + return xfs_swapext_intent_cache != NULL ? 0 : -ENOMEM; +} + +void +xfs_swapext_intent_destroy_cache(void) +{ + kmem_cache_destroy(xfs_swapext_intent_cache); + xfs_swapext_intent_cache = NULL; +} + +/* + * Decide if we will swap the reflink flags between the two files after the + * swap. The only time we want to do this is if we're exchanging all extents + * under EOF and the inode reflink flags have different states. + */ +static inline bool +sxi_can_exchange_reflink_flags( + const struct xfs_swapext_req *req, + unsigned int reflink_state) +{ + struct xfs_mount *mp = req->ip1->i_mount; + + if (hweight32(reflink_state) != 1) + return false; + if (req->startoff1 != 0 || req->startoff2 != 0) + return false; + if (req->blockcount != XFS_B_TO_FSB(mp, req->ip1->i_disk_size)) + return false; + if (req->blockcount != XFS_B_TO_FSB(mp, req->ip2->i_disk_size)) + return false; + return true; +} + + +/* Allocate and initialize a new incore intent item from a request. 
*/ +struct xfs_swapext_intent * +xfs_swapext_init_intent( + const struct xfs_swapext_req *req, + unsigned int *reflink_state) +{ + struct xfs_swapext_intent *sxi; + unsigned int rs = 0; + + sxi = kmem_cache_zalloc(xfs_swapext_intent_cache, + GFP_NOFS | __GFP_NOFAIL); + INIT_LIST_HEAD(&sxi->sxi_list); + sxi->sxi_ip1 = req->ip1; + sxi->sxi_ip2 = req->ip2; + sxi->sxi_startoff1 = req->startoff1; + sxi->sxi_startoff2 = req->startoff2; + sxi->sxi_blockcount = req->blockcount; + sxi->sxi_isize1 = sxi->sxi_isize2 = -1; + + if (req->whichfork == XFS_ATTR_FORK) + sxi->sxi_flags |= XFS_SWAP_EXT_ATTR_FORK; + + if (req->whichfork == XFS_DATA_FORK && + (req->req_flags & XFS_SWAP_REQ_SET_SIZES)) { + sxi->sxi_flags |= XFS_SWAP_EXT_SET_SIZES; + sxi->sxi_isize1 = req->ip2->i_disk_size; + sxi->sxi_isize2 = req->ip1->i_disk_size; + } + + if (req->req_flags & XFS_SWAP_REQ_SKIP_INO1_HOLES) + sxi->sxi_flags |= XFS_SWAP_EXT_SKIP_INO1_HOLES; + + if (req->req_flags & XFS_SWAP_REQ_LOGGED) + sxi->sxi_op_flags |= XFS_SWAP_EXT_OP_LOGGED; + if (req->req_flags & XFS_SWAP_REQ_NREXT64) + sxi->sxi_op_flags |= XFS_SWAP_EXT_OP_NREXT64; + + if (req->whichfork == XFS_DATA_FORK) { + /* + * Record the state of each inode's reflink flag before the + * operation. + */ + if (xfs_is_reflink_inode(req->ip1)) + rs |= XFS_REFLINK_STATE_IP1; + if (xfs_is_reflink_inode(req->ip2)) + rs |= XFS_REFLINK_STATE_IP2; + + /* + * Figure out if we're clearing the reflink flags (which + * effectively swaps them) after the operation. + */ + if (sxi_can_exchange_reflink_flags(req, rs)) { + if (rs & XFS_REFLINK_STATE_IP1) + sxi->sxi_flags |= + XFS_SWAP_EXT_CLEAR_INO1_REFLINK; + if (rs & XFS_REFLINK_STATE_IP2) + sxi->sxi_flags |= + XFS_SWAP_EXT_CLEAR_INO2_REFLINK; + } + } + + if (reflink_state) + *reflink_state = rs; + return sxi; +} + +/* + * Estimate the number of exchange operations and the number of file blocks + * in each file that will be affected by the exchange operation. 
+ */ +int +xfs_swapext_estimate( + struct xfs_swapext_req *req) +{ + struct xfs_swapext_intent *sxi; + struct xfs_bmbt_irec irec1, irec2; + struct xfs_swapext_adjacent adj = ADJACENT_INIT; + xfs_filblks_t ip1_blocks = 0, ip2_blocks = 0; + int64_t d_nexts1, d_nexts2; + int bmap_flags; + int error; + + ASSERT(!(req->req_flags & ~XFS_SWAP_REQ_FLAGS)); + + bmap_flags = xfs_bmapi_aflag(req->whichfork); + sxi = xfs_swapext_init_intent(req, NULL); + + /* + * To guard against the possibility of overflowing the extent counters, + * we have to estimate an upper bound on the potential increase in that + * counter. We can split the extent at each end of the range, and for + * each step of the swap we can split the extent that we're working on + * if the extents do not align. + */ + d_nexts1 = d_nexts2 = 3; + + while (sxi_has_more_swap_work(sxi)) { + /* + * Walk through the file ranges until we find something to + * swap. Because we're simulating the swap, pass in adj to + * capture skipped mappings for correct estimation of bmbt + * record merges. + */ + error = xfs_swapext_find_mappings(sxi, &irec1, &irec2, &adj); + if (error) + goto out_free; + if (!sxi_has_more_swap_work(sxi)) + break; + + /* Update accounting. */ + if (xfs_bmap_is_real_extent(&irec1)) + ip1_blocks += irec1.br_blockcount; + if (xfs_bmap_is_real_extent(&irec2)) + ip2_blocks += irec2.br_blockcount; + req->nr_exchanges++; + + /* Read the next extents from both files. */ + error = get_next_ext(req->ip1, bmap_flags, &irec1, &adj.right1); + if (error) + goto out_free; + + error = get_next_ext(req->ip2, bmap_flags, &irec2, &adj.right2); + if (error) + goto out_free; + + /* Update extent count deltas. */ + d_nexts1 += delta_nextents_step(req->ip1->i_mount, + &adj.left1, &irec1, &irec2, &adj.right1); + + d_nexts2 += delta_nextents_step(req->ip1->i_mount, + &adj.left2, &irec2, &irec1, &adj.right2); + + /* Now pretend we swapped the extents. 
*/ + if (can_merge(&adj.left2, &irec1)) + adj.left2.br_blockcount += irec1.br_blockcount; + else + memcpy(&adj.left2, &irec1, sizeof(irec1)); + + if (can_merge(&adj.left1, &irec2)) + adj.left1.br_blockcount += irec2.br_blockcount; + else + memcpy(&adj.left1, &irec2, sizeof(irec2)); + + sxi_advance(sxi, &irec1); + } + + /* Account for the blocks that are being exchanged. */ + if (XFS_IS_REALTIME_INODE(req->ip1) && + req->whichfork == XFS_DATA_FORK) { + req->ip1_rtbcount = ip1_blocks; + req->ip2_rtbcount = ip2_blocks; + } else { + req->ip1_bcount = ip1_blocks; + req->ip2_bcount = ip2_blocks; + } + + /* + * Make sure that both forks have enough slack left in their extent + * counters that the swap operation will not overflow. + */ + trace_xfs_swapext_delta_nextents(req, d_nexts1, d_nexts2); + if (req->ip1 == req->ip2) { + error = ensure_delta_nextents(req, req->ip1, + d_nexts1 + d_nexts2); + } else { + error = ensure_delta_nextents(req, req->ip1, d_nexts1); + if (error) + goto out_free; + error = ensure_delta_nextents(req, req->ip2, d_nexts2); + } + if (error) + goto out_free; + + trace_xfs_swapext_initial_estimate(req); + error = xfs_swapext_estimate_overhead(req); +out_free: + kmem_cache_free(xfs_swapext_intent_cache, sxi); + return error; +} + +static inline void +xfs_swapext_set_reflink( + struct xfs_trans *tp, + struct xfs_inode *ip) +{ + trace_xfs_reflink_set_inode_flag(ip); + + ip->i_diflags2 |= XFS_DIFLAG2_REFLINK; + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); +} + +/* + * If either file has shared blocks and we're swapping data forks, we must flag + * the other file as having shared blocks so that we get the shared-block rmap + * functions if we need to fix up the rmaps. 
+ */ +void +xfs_swapext_ensure_reflink( + struct xfs_trans *tp, + const struct xfs_swapext_intent *sxi, + unsigned int reflink_state) +{ + if ((reflink_state & XFS_REFLINK_STATE_IP1) && + !xfs_is_reflink_inode(sxi->sxi_ip2)) + xfs_swapext_set_reflink(tp, sxi->sxi_ip2); + + if ((reflink_state & XFS_REFLINK_STATE_IP2) && + !xfs_is_reflink_inode(sxi->sxi_ip1)) + xfs_swapext_set_reflink(tp, sxi->sxi_ip1); +} + +/* Widen the extent counts of both inodes if necessary. */ +static inline void +xfs_swapext_upgrade_extent_counts( + struct xfs_trans *tp, + const struct xfs_swapext_intent *sxi) +{ + if (!(sxi->sxi_op_flags & XFS_SWAP_EXT_OP_NREXT64)) + return; + + sxi->sxi_ip1->i_diflags2 |= XFS_DIFLAG2_NREXT64; + xfs_trans_log_inode(tp, sxi->sxi_ip1, XFS_ILOG_CORE); + + sxi->sxi_ip2->i_diflags2 |= XFS_DIFLAG2_NREXT64; + xfs_trans_log_inode(tp, sxi->sxi_ip2, XFS_ILOG_CORE); +} + +/* + * Schedule a swap of a range of extents from one inode to another. If the + * atomic swap feature is enabled, then the operation progress can be resumed + * even if the system goes down. The caller must commit the transaction to + * start the work. + * + * The caller must ensure that the inodes are joined to the transaction and + * ILOCKed; they will still be joined to the transaction at exit.
+ */ +void +xfs_swapext( + struct xfs_trans *tp, + const struct xfs_swapext_req *req) +{ + struct xfs_swapext_intent *sxi; + unsigned int reflink_state; + + ASSERT(xfs_isilocked(req->ip1, XFS_ILOCK_EXCL)); + ASSERT(xfs_isilocked(req->ip2, XFS_ILOCK_EXCL)); + ASSERT(req->whichfork != XFS_COW_FORK); + ASSERT(!(req->req_flags & ~XFS_SWAP_REQ_FLAGS)); + if (req->req_flags & XFS_SWAP_REQ_SET_SIZES) + ASSERT(req->whichfork == XFS_DATA_FORK); + + if (req->blockcount == 0) + return; + + sxi = xfs_swapext_init_intent(req, &reflink_state); + xfs_swapext_schedule(tp, sxi); + xfs_swapext_ensure_reflink(tp, sxi, reflink_state); + xfs_swapext_upgrade_extent_counts(tp, sxi); +} diff --git a/fs/xfs/libxfs/xfs_swapext.h b/fs/xfs/libxfs/xfs_swapext.h index 316323339d76..1987897ddc25 100644 --- a/fs/xfs/libxfs/xfs_swapext.h +++ b/fs/xfs/libxfs/xfs_swapext.h @@ -21,4 +21,146 @@ static inline bool xfs_swapext_supported(struct xfs_mount *mp) !xfs_has_realtime(mp); } +/* + * In-core information about an extent swap request between ranges of two + * inodes. + */ +struct xfs_swapext_intent { + /* List of other incore deferred work. */ + struct list_head sxi_list; + + /* Inodes participating in the operation. */ + struct xfs_inode *sxi_ip1; + struct xfs_inode *sxi_ip2; + + /* File offset range information. */ + xfs_fileoff_t sxi_startoff1; + xfs_fileoff_t sxi_startoff2; + xfs_filblks_t sxi_blockcount; + + /* Set these file sizes after the operation, unless negative. */ + xfs_fsize_t sxi_isize1; + xfs_fsize_t sxi_isize2; + + /* XFS_SWAP_EXT_* log operation flags */ + unsigned int sxi_flags; + + /* XFS_SWAP_EXT_OP_* flags */ + unsigned int sxi_op_flags; +}; + +/* Use log intent items to track and restart the entire operation. */ +#define XFS_SWAP_EXT_OP_LOGGED (1U << 0) + +/* Upgrade files to have large extent counts before proceeding. 
*/ +#define XFS_SWAP_EXT_OP_NREXT64 (1U << 1) + +#define XFS_SWAP_EXT_OP_STRINGS \ + { XFS_SWAP_EXT_OP_LOGGED, "LOGGED" }, \ + { XFS_SWAP_EXT_OP_NREXT64, "NREXT64" } + +static inline int +xfs_swapext_whichfork(const struct xfs_swapext_intent *sxi) +{ + if (sxi->sxi_flags & XFS_SWAP_EXT_ATTR_FORK) + return XFS_ATTR_FORK; + return XFS_DATA_FORK; +} + +/* Parameters for a swapext request. */ +struct xfs_swapext_req { + /* Inodes participating in the operation. */ + struct xfs_inode *ip1; + struct xfs_inode *ip2; + + /* File offset range information. */ + xfs_fileoff_t startoff1; + xfs_fileoff_t startoff2; + xfs_filblks_t blockcount; + + /* Data or attr fork? */ + int whichfork; + + /* XFS_SWAP_REQ_* operation flags */ + unsigned int req_flags; + + /* + * Fields below this line are filled out by xfs_swapext_estimate; + * callers should initialize this part of the struct to zero. + */ + + /* + * Data device blocks to be moved out of ip1, and free space needed to + * handle the bmbt changes. + */ + xfs_filblks_t ip1_bcount; + + /* + * Data device blocks to be moved out of ip2, and free space needed to + * handle the bmbt changes. + */ + xfs_filblks_t ip2_bcount; + + /* rt blocks to be moved out of ip1. */ + xfs_filblks_t ip1_rtbcount; + + /* rt blocks to be moved out of ip2. */ + xfs_filblks_t ip2_rtbcount; + + /* Free space needed to handle the bmbt changes */ + unsigned long long resblks; + + /* Number of extent swaps needed to complete the operation */ + unsigned long long nr_exchanges; +}; + +/* Caller has permission to use log intent items for the swapext operation. */ +#define XFS_SWAP_REQ_LOGGED (1U << 0) + +/* Set the file sizes when finished. */ +#define XFS_SWAP_REQ_SET_SIZES (1U << 1) + +/* Do not swap any part of the range where ip1's mapping is a hole. */ +#define XFS_SWAP_REQ_SKIP_INO1_HOLES (1U << 2) + +/* Files need to be upgraded to have large extent counts. 
*/ +#define XFS_SWAP_REQ_NREXT64 (1U << 3) + +#define XFS_SWAP_REQ_FLAGS (XFS_SWAP_REQ_LOGGED | \ + XFS_SWAP_REQ_SET_SIZES | \ + XFS_SWAP_REQ_SKIP_INO1_HOLES | \ + XFS_SWAP_REQ_NREXT64) + +#define XFS_SWAP_REQ_STRINGS \ + { XFS_SWAP_REQ_LOGGED, "LOGGED" }, \ + { XFS_SWAP_REQ_SET_SIZES, "SETSIZES" }, \ + { XFS_SWAP_REQ_SKIP_INO1_HOLES, "SKIP_INO1_HOLES" }, \ + { XFS_SWAP_REQ_NREXT64, "NREXT64" } + +unsigned int xfs_swapext_reflink_prep(const struct xfs_swapext_req *req); +void xfs_swapext_reflink_finish(struct xfs_trans *tp, + const struct xfs_swapext_req *req, unsigned int reflink_state); + +int xfs_swapext_estimate(struct xfs_swapext_req *req); + +extern struct kmem_cache *xfs_swapext_intent_cache; + +int __init xfs_swapext_intent_init_cache(void); +void xfs_swapext_intent_destroy_cache(void); + +struct xfs_swapext_intent *xfs_swapext_init_intent( + const struct xfs_swapext_req *req, unsigned int *reflink_state); +void xfs_swapext_ensure_reflink(struct xfs_trans *tp, + const struct xfs_swapext_intent *sxi, unsigned int reflink_state); + +void xfs_swapext_schedule(struct xfs_trans *tp, + struct xfs_swapext_intent *sxi); +int xfs_swapext_finish_one(struct xfs_trans *tp, + struct xfs_swapext_intent *sxi); + +int xfs_swapext_check_extents(struct xfs_mount *mp, + const struct xfs_swapext_req *req); + +void xfs_swapext(struct xfs_trans *tp, const struct xfs_swapext_req *req); + #endif /* __XFS_SWAPEXT_H_ */ diff --git a/fs/xfs/libxfs/xfs_trans_space.h b/fs/xfs/libxfs/xfs_trans_space.h index 87b31c69a773..9640fc232c14 100644 --- a/fs/xfs/libxfs/xfs_trans_space.h +++ b/fs/xfs/libxfs/xfs_trans_space.h @@ -10,6 +10,10 @@ * Components of space reservations. */ +/* Worst case number of bmaps that can be held in a block. */ +#define XFS_MAX_CONTIG_BMAPS_PER_BLOCK(mp) \ + (((mp)->m_bmap_dmxr[0]) - ((mp)->m_bmap_dmnr[0])) + /* Worst case number of rmaps that can be held in a block. 
*/ #define XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp) \ (((mp)->m_rmap_mxr[0]) - ((mp)->m_rmap_mnr[0])) diff --git a/fs/xfs/xfs_swapext_item.c b/fs/xfs/xfs_swapext_item.c index ea4a3a8de7e3..24331a702497 100644 --- a/fs/xfs/xfs_swapext_item.c +++ b/fs/xfs/xfs_swapext_item.c @@ -16,13 +16,17 @@ #include "xfs_trans.h" #include "xfs_trans_priv.h" #include "xfs_swapext_item.h" +#include "xfs_swapext.h" #include "xfs_log.h" #include "xfs_bmap.h" #include "xfs_icache.h" +#include "xfs_bmap_btree.h" #include "xfs_trans_space.h" #include "xfs_error.h" #include "xfs_log_priv.h" #include "xfs_log_recover.h" +#include "xfs_xchgrange.h" +#include "xfs_trace.h" struct kmem_cache *xfs_sxi_cache; struct kmem_cache *xfs_sxd_cache; @@ -206,13 +210,333 @@ static const struct xfs_item_ops xfs_sxd_item_ops = { .iop_intent = xfs_sxd_item_intent, }; +static struct xfs_sxd_log_item * +xfs_trans_get_sxd( + struct xfs_trans *tp, + struct xfs_sxi_log_item *sxi_lip) +{ + struct xfs_sxd_log_item *sxd_lip; + + sxd_lip = kmem_cache_zalloc(xfs_sxd_cache, GFP_KERNEL | __GFP_NOFAIL); + xfs_log_item_init(tp->t_mountp, &sxd_lip->sxd_item, XFS_LI_SXD, + &xfs_sxd_item_ops); + sxd_lip->sxd_intent_log_item = sxi_lip; + sxd_lip->sxd_format.sxd_sxi_id = sxi_lip->sxi_format.sxi_id; + + xfs_trans_add_item(tp, &sxd_lip->sxd_item); + return sxd_lip; +} + +/* + * Finish a swapext update and log it to the SXD. Note that the transaction is + * marked dirty regardless of whether the swapext update succeeds or fails to + * support the SXI/SXD lifecycle rules. + */ +static int +xfs_swapext_finish_update( + struct xfs_trans *tp, + struct xfs_log_item *done, + struct xfs_swapext_intent *sxi) +{ + int error; + + error = xfs_swapext_finish_one(tp, sxi); + + /* + * Mark the transaction dirty, even on error. This ensures the + * transaction is aborted, which: + * + * 1.) releases the SXI and frees the SXD + * 2.)
shuts down the filesystem + */ + tp->t_flags |= XFS_TRANS_DIRTY; + if (done) + set_bit(XFS_LI_DIRTY, &done->li_flags); + + return error; +} + +/* Log swapext updates in the intent item. */ +STATIC struct xfs_log_item * +xfs_swapext_create_intent( + struct xfs_trans *tp, + struct list_head *items, + unsigned int count, + bool sort) +{ + struct xfs_sxi_log_item *sxi_lip; + struct xfs_swapext_intent *sxi; + struct xfs_swap_extent *sx; + + ASSERT(count == 1); + + sxi = list_first_entry_or_null(items, struct xfs_swapext_intent, + sxi_list); + + /* + * We use the same defer ops control machinery to perform extent swaps + * even if we aren't using the machinery to track the operation status + * through log items. + */ + if (!(sxi->sxi_op_flags & XFS_SWAP_EXT_OP_LOGGED)) + return NULL; + + sxi_lip = xfs_sxi_init(tp->t_mountp); + xfs_trans_add_item(tp, &sxi_lip->sxi_item); + tp->t_flags |= XFS_TRANS_DIRTY; + set_bit(XFS_LI_DIRTY, &sxi_lip->sxi_item.li_flags); + + sx = &sxi_lip->sxi_format.sxi_extent; + sx->sx_inode1 = sxi->sxi_ip1->i_ino; + sx->sx_inode2 = sxi->sxi_ip2->i_ino; + sx->sx_startoff1 = sxi->sxi_startoff1; + sx->sx_startoff2 = sxi->sxi_startoff2; + sx->sx_blockcount = sxi->sxi_blockcount; + sx->sx_isize1 = sxi->sxi_isize1; + sx->sx_isize2 = sxi->sxi_isize2; + sx->sx_flags = sxi->sxi_flags; + + return &sxi_lip->sxi_item; +} + +STATIC struct xfs_log_item * +xfs_swapext_create_done( + struct xfs_trans *tp, + struct xfs_log_item *intent, + unsigned int count) +{ + if (intent == NULL) + return NULL; + return &xfs_trans_get_sxd(tp, SXI_ITEM(intent))->sxd_item; +} + +/* Process a deferred swapext update. */ +STATIC int +xfs_swapext_finish_item( + struct xfs_trans *tp, + struct xfs_log_item *done, + struct list_head *item, + struct xfs_btree_cur **state) +{ + struct xfs_swapext_intent *sxi; + int error; + + sxi = container_of(item, struct xfs_swapext_intent, sxi_list); + + /* + * Swap one more extent between the two files. 
If there's still more + * work to do, we want to requeue ourselves after all other pending + * deferred operations have finished. This includes all of the dfops + * that we queued directly as well as any new ones created in the + * process of finishing the others. Doing so prevents us from queuing + * a large number of SXI log items in kernel memory, which in turn + * prevents us from pinning the tail of the log (while logging those + * new SXI items) until the first SXI items can be processed. + */ + error = xfs_swapext_finish_update(tp, done, sxi); + if (error == -EAGAIN) + return error; + + kmem_cache_free(xfs_swapext_intent_cache, sxi); + return error; +} + +/* Abort all pending SXIs. */ +STATIC void +xfs_swapext_abort_intent( + struct xfs_log_item *intent) +{ + xfs_sxi_release(SXI_ITEM(intent)); +} + +/* Cancel a deferred swapext update. */ +STATIC void +xfs_swapext_cancel_item( + struct list_head *item) +{ + struct xfs_swapext_intent *sxi; + + sxi = container_of(item, struct xfs_swapext_intent, sxi_list); + kmem_cache_free(xfs_swapext_intent_cache, sxi); +} + +const struct xfs_defer_op_type xfs_swapext_defer_type = { + .max_items = 1, + .create_intent = xfs_swapext_create_intent, + .abort_intent = xfs_swapext_abort_intent, + .create_done = xfs_swapext_create_done, + .finish_item = xfs_swapext_finish_item, + .cancel_item = xfs_swapext_cancel_item, +}; + +/* Is this recovered SXI ok? 
*/ +static inline bool +xfs_sxi_validate( + struct xfs_mount *mp, + struct xfs_sxi_log_item *sxi_lip) +{ + struct xfs_swap_extent *sx = &sxi_lip->sxi_format.sxi_extent; + + if (!xfs_sb_version_haslogswapext(&mp->m_sb)) + return false; + + if (sxi_lip->sxi_format.__pad != 0) + return false; + + if (sx->sx_flags & ~XFS_SWAP_EXT_FLAGS) + return false; + + if (!xfs_verify_ino(mp, sx->sx_inode1) || + !xfs_verify_ino(mp, sx->sx_inode2)) + return false; + + if ((sx->sx_flags & XFS_SWAP_EXT_SET_SIZES) && + (sx->sx_isize1 < 0 || sx->sx_isize2 < 0)) + return false; + + if (!xfs_verify_fileext(mp, sx->sx_startoff1, sx->sx_blockcount)) + return false; + + return xfs_verify_fileext(mp, sx->sx_startoff2, sx->sx_blockcount); +} + +/* + * Use the recovered log state to create a new request, estimate resource + * requirements, and create a new incore intent state. + */ +STATIC struct xfs_swapext_intent * +xfs_sxi_item_recover_intent( + struct xfs_mount *mp, + const struct xfs_swap_extent *sx, + struct xfs_swapext_req *req, + unsigned int *reflink_state) +{ + struct xfs_inode *ip1, *ip2; + int error; + + /* + * Grab both inodes and set IRECOVERY to prevent trimming of post-eof + * extents and freeing of unlinked inodes until we're totally done + * processing files. 
+ */ + error = xlog_recover_iget(mp, sx->sx_inode1, &ip1); + if (error) + return ERR_PTR(error); + error = xlog_recover_iget(mp, sx->sx_inode2, &ip2); + if (error) + goto err_rele1; + + req->ip1 = ip1; + req->ip2 = ip2; + req->startoff1 = sx->sx_startoff1; + req->startoff2 = sx->sx_startoff2; + req->blockcount = sx->sx_blockcount; + + if (sx->sx_flags & XFS_SWAP_EXT_ATTR_FORK) + req->whichfork = XFS_ATTR_FORK; + else + req->whichfork = XFS_DATA_FORK; + + if (sx->sx_flags & XFS_SWAP_EXT_SET_SIZES) + req->req_flags |= XFS_SWAP_REQ_SET_SIZES; + if (sx->sx_flags & XFS_SWAP_EXT_SKIP_INO1_HOLES) + req->req_flags |= XFS_SWAP_REQ_SKIP_INO1_HOLES; + req->req_flags |= XFS_SWAP_REQ_LOGGED; + + xfs_xchg_range_ilock(NULL, ip1, ip2); + error = xfs_swapext_estimate(req); + xfs_xchg_range_iunlock(ip1, ip2); + if (error) + goto err_rele2; + + return xfs_swapext_init_intent(req, reflink_state); + +err_rele2: + xfs_irele(ip2); +err_rele1: + xfs_irele(ip1); + return ERR_PTR(error); +} + /* Process a swapext update intent item that was recovered from the log. 
*/ STATIC int xfs_sxi_item_recover( - struct xfs_log_item *lip, - struct list_head *capture_list) + struct xfs_log_item *lip, + struct list_head *capture_list) { - return -EFSCORRUPTED; + struct xfs_swapext_req req = { .req_flags = 0 }; + struct xfs_swapext_intent *sxi; + struct xfs_sxi_log_item *sxi_lip = SXI_ITEM(lip); + struct xfs_mount *mp = lip->li_log->l_mp; + struct xfs_swap_extent *sx = &sxi_lip->sxi_format.sxi_extent; + struct xfs_sxd_log_item *sxd_lip = NULL; + struct xfs_trans *tp; + struct xfs_inode *ip1, *ip2; + unsigned int reflink_state; + int error = 0; + + if (!xfs_sxi_validate(mp, sxi_lip)) { + XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, + &sxi_lip->sxi_format, + sizeof(sxi_lip->sxi_format)); + return -EFSCORRUPTED; + } + + sxi = xfs_sxi_item_recover_intent(mp, sx, &req, &reflink_state); + if (IS_ERR(sxi)) + return PTR_ERR(sxi); + + trace_xfs_swapext_recover(mp, sxi); + + ip1 = sxi->sxi_ip1; + ip2 = sxi->sxi_ip2; + + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, req.resblks, 0, 0, + &tp); + if (error) + goto err_rele; + + sxd_lip = xfs_trans_get_sxd(tp, sxi_lip); + + xfs_xchg_range_ilock(tp, ip1, ip2); + + xfs_swapext_ensure_reflink(tp, sxi, reflink_state); + error = xfs_swapext_finish_update(tp, &sxd_lip->sxd_item, sxi); + if (error == -EAGAIN) { + /* + * If there's more extent swapping to be done, we have to + * schedule that as a separate deferred operation to be run + * after we've finished replaying all of the intents we + * recovered from the log. Transfer ownership of the sxi to + * the transaction. + */ + xfs_swapext_schedule(tp, sxi); + error = 0; + sxi = NULL; + } + if (error == -EFSCORRUPTED) + XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, sx, + sizeof(*sx)); + if (error) + goto err_cancel; + + /* + * Commit transaction, which frees the transaction and saves the inodes + * for later replay activities. 
+ */ + error = xfs_defer_ops_capture_and_commit(tp, capture_list); + goto err_unlock; + +err_cancel: + xfs_trans_cancel(tp); +err_unlock: + xfs_xchg_range_iunlock(ip1, ip2); +err_rele: + if (sxi) + kmem_cache_free(xfs_swapext_intent_cache, sxi); + xfs_irele(ip2); + xfs_irele(ip1); + return error; } STATIC bool @@ -229,8 +553,21 @@ xfs_sxi_item_relog( struct xfs_log_item *intent, struct xfs_trans *tp) { - ASSERT(0); - return NULL; + struct xfs_sxd_log_item *sxd_lip; + struct xfs_sxi_log_item *sxi_lip; + struct xfs_swap_extent *sx; + + sx = &SXI_ITEM(intent)->sxi_format.sxi_extent; + + tp->t_flags |= XFS_TRANS_DIRTY; + sxd_lip = xfs_trans_get_sxd(tp, SXI_ITEM(intent)); + set_bit(XFS_LI_DIRTY, &sxd_lip->sxd_item.li_flags); + + sxi_lip = xfs_sxi_init(tp->t_mountp); + memcpy(&sxi_lip->sxi_format.sxi_extent, sx, sizeof(*sx)); + xfs_trans_add_item(tp, &sxi_lip->sxi_item); + set_bit(XFS_LI_DIRTY, &sxi_lip->sxi_item.li_flags); + return &sxi_lip->sxi_item; } static const struct xfs_item_ops xfs_sxi_item_ops = { @@ -264,17 +601,17 @@ xlog_recover_sxi_commit_pass2( sxi_formatp = item->ri_buf[0].i_addr; - if (sxi_formatp->__pad != 0) { - XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, log->l_mp); - return -EFSCORRUPTED; - } - len = sizeof(struct xfs_sxi_log_format); if (item->ri_buf[0].i_len != len) { XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, log->l_mp); return -EFSCORRUPTED; } + if (sxi_formatp->__pad != 0) { + XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, log->l_mp); + return -EFSCORRUPTED; + } + sxi_lip = xfs_sxi_init(mp); memcpy(&sxi_lip->sxi_format, sxi_formatp, len); diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c index c9a5d8087b63..b43b973f0e10 100644 --- a/fs/xfs/xfs_trace.c +++ b/fs/xfs/xfs_trace.c @@ -40,6 +40,7 @@ #include "scrub/xfbtree.h" #include "xfs_btree_mem.h" #include "xfs_bmap.h" +#include "xfs_swapext.h" /* * We include this last to have the helpers above available for the trace diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 
15bd6b86b514..9ebaa5ffe504 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -78,6 +78,8 @@ union xfs_btree_ptr; struct xfs_dqtrx; struct xfs_icwalk; struct xfs_bmap_intent; +struct xfs_swapext_intent; +struct xfs_swapext_req; #define XFS_ATTR_FILTER_FLAGS \ { XFS_ATTR_ROOT, "ROOT" }, \ @@ -2173,7 +2175,7 @@ TRACE_EVENT(xfs_dir2_leafn_moveents, __entry->count) ); -#define XFS_SWAPEXT_INODES \ +#define XFS_SWAP_EXT_INODES \ { 0, "target" }, \ { 1, "temp" } @@ -2208,7 +2210,7 @@ DECLARE_EVENT_CLASS(xfs_swap_extent_class, "broot size %d, forkoff 0x%x", MAJOR(__entry->dev), MINOR(__entry->dev), __entry->ino, - __print_symbolic(__entry->which, XFS_SWAPEXT_INODES), + __print_symbolic(__entry->which, XFS_SWAP_EXT_INODES), __print_symbolic(__entry->format, XFS_INODE_FORMAT_STR), __entry->nex, __entry->broot_size, @@ -3761,6 +3763,9 @@ DEFINE_INODE_IREC_EVENT(xfs_reflink_cancel_cow); DEFINE_INODE_IREC_EVENT(xfs_swap_extent_rmap_remap); DEFINE_INODE_IREC_EVENT(xfs_swap_extent_rmap_remap_piece); DEFINE_INODE_ERROR_EVENT(xfs_swap_extent_rmap_error); +DEFINE_INODE_IREC_EVENT(xfs_swapext_extent1); +DEFINE_INODE_IREC_EVENT(xfs_swapext_extent2); +DEFINE_ITRUNC_EVENT(xfs_swapext_update_inode_size); /* fsmap traces */ DECLARE_EVENT_CLASS(xfs_fsmap_class, @@ -4581,6 +4586,212 @@ DEFINE_PERAG_INTENTS_EVENT(xfs_perag_wait_intents); #endif /* CONFIG_XFS_DRAIN_INTENTS */ +TRACE_EVENT(xfs_swapext_overhead, + TP_PROTO(struct xfs_mount *mp, unsigned long long bmbt_blocks, + unsigned long long rmapbt_blocks), + TP_ARGS(mp, bmbt_blocks, rmapbt_blocks), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(unsigned long long, bmbt_blocks) + __field(unsigned long long, rmapbt_blocks) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->bmbt_blocks = bmbt_blocks; + __entry->rmapbt_blocks = rmapbt_blocks; + ), + TP_printk("dev %d:%d bmbt_blocks 0x%llx rmapbt_blocks 0x%llx", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->bmbt_blocks, + __entry->rmapbt_blocks) +); 
+ +DECLARE_EVENT_CLASS(xfs_swapext_estimate_class, + TP_PROTO(const struct xfs_swapext_req *req), + TP_ARGS(req), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_ino_t, ino1) + __field(xfs_ino_t, ino2) + __field(xfs_fileoff_t, startoff1) + __field(xfs_fileoff_t, startoff2) + __field(xfs_filblks_t, blockcount) + __field(int, whichfork) + __field(unsigned int, req_flags) + __field(xfs_filblks_t, ip1_bcount) + __field(xfs_filblks_t, ip2_bcount) + __field(xfs_filblks_t, ip1_rtbcount) + __field(xfs_filblks_t, ip2_rtbcount) + __field(unsigned long long, resblks) + __field(unsigned long long, nr_exchanges) + ), + TP_fast_assign( + __entry->dev = req->ip1->i_mount->m_super->s_dev; + __entry->ino1 = req->ip1->i_ino; + __entry->ino2 = req->ip2->i_ino; + __entry->startoff1 = req->startoff1; + __entry->startoff2 = req->startoff2; + __entry->blockcount = req->blockcount; + __entry->whichfork = req->whichfork; + __entry->req_flags = req->req_flags; + __entry->ip1_bcount = req->ip1_bcount; + __entry->ip2_bcount = req->ip2_bcount; + __entry->ip1_rtbcount = req->ip1_rtbcount; + __entry->ip2_rtbcount = req->ip2_rtbcount; + __entry->resblks = req->resblks; + __entry->nr_exchanges = req->nr_exchanges; + ), + TP_printk("dev %d:%d ino1 0x%llx fileoff1 0x%llx ino2 0x%llx fileoff2 0x%llx fsbcount 0x%llx flags (%s) fork %s bcount1 0x%llx rtbcount1 0x%llx bcount2 0x%llx rtbcount2 0x%llx resblks 0x%llx nr_exchanges %llu", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->ino1, __entry->startoff1, + __entry->ino2, __entry->startoff2, + __entry->blockcount, + __print_flags(__entry->req_flags, "|", XFS_SWAP_REQ_STRINGS), + __print_symbolic(__entry->whichfork, XFS_WHICHFORK_STRINGS), + __entry->ip1_bcount, + __entry->ip1_rtbcount, + __entry->ip2_bcount, + __entry->ip2_rtbcount, + __entry->resblks, + __entry->nr_exchanges) +); + +#define DEFINE_SWAPEXT_ESTIMATE_EVENT(name) \ +DEFINE_EVENT(xfs_swapext_estimate_class, name, \ + TP_PROTO(const struct xfs_swapext_req *req), \ + 
TP_ARGS(req)) +DEFINE_SWAPEXT_ESTIMATE_EVENT(xfs_swapext_initial_estimate); +DEFINE_SWAPEXT_ESTIMATE_EVENT(xfs_swapext_final_estimate); + +DECLARE_EVENT_CLASS(xfs_swapext_intent_class, + TP_PROTO(struct xfs_mount *mp, const struct xfs_swapext_intent *sxi), + TP_ARGS(mp, sxi), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_ino_t, ino1) + __field(xfs_ino_t, ino2) + __field(unsigned int, flags) + __field(unsigned int, opflags) + __field(xfs_fileoff_t, startoff1) + __field(xfs_fileoff_t, startoff2) + __field(xfs_filblks_t, blockcount) + __field(xfs_fsize_t, isize1) + __field(xfs_fsize_t, isize2) + __field(xfs_fsize_t, new_isize1) + __field(xfs_fsize_t, new_isize2) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->ino1 = sxi->sxi_ip1->i_ino; + __entry->ino2 = sxi->sxi_ip2->i_ino; + __entry->flags = sxi->sxi_flags; + __entry->opflags = sxi->sxi_op_flags; + __entry->startoff1 = sxi->sxi_startoff1; + __entry->startoff2 = sxi->sxi_startoff2; + __entry->blockcount = sxi->sxi_blockcount; + __entry->isize1 = sxi->sxi_ip1->i_disk_size; + __entry->isize2 = sxi->sxi_ip2->i_disk_size; + __entry->new_isize1 = sxi->sxi_isize1; + __entry->new_isize2 = sxi->sxi_isize2; + ), + TP_printk("dev %d:%d ino1 0x%llx fileoff1 0x%llx ino2 0x%llx fileoff2 0x%llx fsbcount 0x%llx flags (%s) opflags (%s) isize1 0x%llx newisize1 0x%llx isize2 0x%llx newisize2 0x%llx", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->ino1, __entry->startoff1, + __entry->ino2, __entry->startoff2, + __entry->blockcount, + __print_flags(__entry->flags, "|", XFS_SWAP_EXT_STRINGS), + __print_flags(__entry->opflags, "|", XFS_SWAP_EXT_OP_STRINGS), + __entry->isize1, __entry->new_isize1, + __entry->isize2, __entry->new_isize2) +); + +#define DEFINE_SWAPEXT_INTENT_EVENT(name) \ +DEFINE_EVENT(xfs_swapext_intent_class, name, \ + TP_PROTO(struct xfs_mount *mp, const struct xfs_swapext_intent *sxi), \ + TP_ARGS(mp, sxi)) +DEFINE_SWAPEXT_INTENT_EVENT(xfs_swapext_defer); 
+DEFINE_SWAPEXT_INTENT_EVENT(xfs_swapext_recover); + +TRACE_EVENT(xfs_swapext_delta_nextents_step, + TP_PROTO(struct xfs_mount *mp, + const struct xfs_bmbt_irec *left, + const struct xfs_bmbt_irec *curr, + const struct xfs_bmbt_irec *new, + const struct xfs_bmbt_irec *right, + int delta, unsigned int state), + TP_ARGS(mp, left, curr, new, right, delta, state), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_fileoff_t, loff) + __field(xfs_fsblock_t, lstart) + __field(xfs_filblks_t, lcount) + __field(xfs_fileoff_t, coff) + __field(xfs_fsblock_t, cstart) + __field(xfs_filblks_t, ccount) + __field(xfs_fileoff_t, noff) + __field(xfs_fsblock_t, nstart) + __field(xfs_filblks_t, ncount) + __field(xfs_fileoff_t, roff) + __field(xfs_fsblock_t, rstart) + __field(xfs_filblks_t, rcount) + __field(int, delta) + __field(unsigned int, state) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->loff = left->br_startoff; + __entry->lstart = left->br_startblock; + __entry->lcount = left->br_blockcount; + __entry->coff = curr->br_startoff; + __entry->cstart = curr->br_startblock; + __entry->ccount = curr->br_blockcount; + __entry->noff = new->br_startoff; + __entry->nstart = new->br_startblock; + __entry->ncount = new->br_blockcount; + __entry->roff = right->br_startoff; + __entry->rstart = right->br_startblock; + __entry->rcount = right->br_blockcount; + __entry->delta = delta; + __entry->state = state; + ), + TP_printk("dev %d:%d left 0x%llx:0x%llx:0x%llx; curr 0x%llx:0x%llx:0x%llx <- new 0x%llx:0x%llx:0x%llx; right 0x%llx:0x%llx:0x%llx delta %d state 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->loff, __entry->lstart, __entry->lcount, + __entry->coff, __entry->cstart, __entry->ccount, + __entry->noff, __entry->nstart, __entry->ncount, + __entry->roff, __entry->rstart, __entry->rcount, + __entry->delta, __entry->state) +); + +TRACE_EVENT(xfs_swapext_delta_nextents, + TP_PROTO(const struct xfs_swapext_req *req, int64_t d_nexts1, + int64_t 
d_nexts2), + TP_ARGS(req, d_nexts1, d_nexts2), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_ino_t, ino1) + __field(xfs_ino_t, ino2) + __field(xfs_extnum_t, nexts1) + __field(xfs_extnum_t, nexts2) + __field(int64_t, d_nexts1) + __field(int64_t, d_nexts2) + ), + TP_fast_assign( + __entry->dev = req->ip1->i_mount->m_super->s_dev; + __entry->ino1 = req->ip1->i_ino; + __entry->ino2 = req->ip2->i_ino; + __entry->nexts1 = xfs_ifork_ptr(req->ip1, req->whichfork)->if_nextents; + __entry->nexts2 = xfs_ifork_ptr(req->ip2, req->whichfork)->if_nextents; + __entry->d_nexts1 = d_nexts1; + __entry->d_nexts2 = d_nexts2; + ), + TP_printk("dev %d:%d ino1 0x%llx nexts %llu ino2 0x%llx nexts %llu delta1 %lld delta2 %lld", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->ino1, __entry->nexts1, + __entry->ino2, __entry->nexts2, + __entry->d_nexts1, __entry->d_nexts2) +); + #endif /* _TRACE_XFS_H */ #undef TRACE_INCLUDE_PATH diff --git a/fs/xfs/xfs_xchgrange.c b/fs/xfs/xfs_xchgrange.c new file mode 100644 index 000000000000..0dba5078c9f7 --- /dev/null +++ b/fs/xfs/xfs_xchgrange.c @@ -0,0 +1,65 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_log_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_inode.h" +#include "xfs_trans.h" +#include "xfs_swapext.h" +#include "xfs_xchgrange.h" + +/* Lock (and optionally join) two inodes for a file range exchange. 
*/ +void +xfs_xchg_range_ilock( + struct xfs_trans *tp, + struct xfs_inode *ip1, + struct xfs_inode *ip2) +{ + if (ip1 != ip2) + xfs_lock_two_inodes(ip1, XFS_ILOCK_EXCL, + ip2, XFS_ILOCK_EXCL); + else + xfs_ilock(ip1, XFS_ILOCK_EXCL); + if (tp) { + xfs_trans_ijoin(tp, ip1, 0); + if (ip2 != ip1) + xfs_trans_ijoin(tp, ip2, 0); + } + +} + +/* Unlock two inodes after a file range exchange operation. */ +void +xfs_xchg_range_iunlock( + struct xfs_inode *ip1, + struct xfs_inode *ip2) +{ + if (ip2 != ip1) + xfs_iunlock(ip2, XFS_ILOCK_EXCL); + xfs_iunlock(ip1, XFS_ILOCK_EXCL); +} + +/* + * Estimate the resource requirements to exchange file contents between the two + * files. The caller is required to hold the IOLOCK and the MMAPLOCK and to + * have flushed both inodes' pagecache and active direct-ios. + */ +int +xfs_xchg_range_estimate( + struct xfs_swapext_req *req) +{ + int error; + + xfs_xchg_range_ilock(NULL, req->ip1, req->ip2); + error = xfs_swapext_estimate(req); + xfs_xchg_range_iunlock(req->ip1, req->ip2); + return error; +} diff --git a/fs/xfs/xfs_xchgrange.h b/fs/xfs/xfs_xchgrange.h new file mode 100644 index 000000000000..89320a354efa --- /dev/null +++ b/fs/xfs/xfs_xchgrange.h @@ -0,0 +1,17 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#ifndef __XFS_XCHGRANGE_H__ +#define __XFS_XCHGRANGE_H__ + +struct xfs_swapext_req; + +void xfs_xchg_range_ilock(struct xfs_trans *tp, struct xfs_inode *ip1, + struct xfs_inode *ip2); +void xfs_xchg_range_iunlock(struct xfs_inode *ip1, struct xfs_inode *ip2); + +int xfs_xchg_range_estimate(struct xfs_swapext_req *req); + +#endif /* __XFS_XCHGRANGE_H__ */ From patchwork Fri Dec 30 22:13:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13084975 Subject: [PATCH 08/21] xfs: enable xlog users to toggle atomic extent swapping From: "Darrick J.
Wong"
To: djwong@kernel.org
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org
Date: Fri, 30 Dec 2022 14:13:56 -0800
Message-ID: <167243843639.699466.9221823995188278604.stgit@magnolia>
In-Reply-To: <167243843494.699466.5163281976943635014.stgit@magnolia>
References: <167243843494.699466.5163281976943635014.stgit@magnolia>

From: Darrick J. Wong

Plumb the necessary bits into the xlog code so that higher level callers
can enable the atomic extent swapping feature and have it clear
automatically when possible.

Signed-off-by: Darrick J. Wong
---
 fs/xfs/xfs_log.c      | 13 +++++++++++++
 fs/xfs/xfs_log.h      |  1 +
 fs/xfs/xfs_log_priv.h |  1 +
 3 files changed, 15 insertions(+)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index a0ef09addc84..37e85c1bb913 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1501,11 +1501,17 @@ xlog_clear_incompat(
 	if (down_write_trylock(&log->l_incompat_xattrs))
 		incompat_mask |= XFS_SB_FEAT_INCOMPAT_LOG_XATTRS;
 
+	if (down_write_trylock(&log->l_incompat_swapext))
+		incompat_mask |= XFS_SB_FEAT_INCOMPAT_LOG_SWAPEXT;
+
 	if (!incompat_mask)
 		return;
 
 	xfs_clear_incompat_log_features(mp, incompat_mask);
 
+	if (incompat_mask & XFS_SB_FEAT_INCOMPAT_LOG_SWAPEXT)
+		up_write(&log->l_incompat_swapext);
+
 	if (incompat_mask & XFS_SB_FEAT_INCOMPAT_LOG_XATTRS)
 		up_write(&log->l_incompat_xattrs);
 }
@@ -1625,6 +1631,7 @@ xlog_alloc_log(
 	log->l_sectBBsize = 1 << log2_size;
 
 	init_rwsem(&log->l_incompat_xattrs);
+	init_rwsem(&log->l_incompat_swapext);
 
 	xlog_get_iclog_buffer_size(mp, log);
 
@@ -3922,6 +3929,9 @@ xlog_use_incompat_feat(
 	case XLOG_INCOMPAT_FEAT_XATTRS:
 		down_read(&log->l_incompat_xattrs);
 		break;
+	case XLOG_INCOMPAT_FEAT_SWAPEXT:
+		down_read(&log->l_incompat_swapext);
+		break;
 	}
 }
 
@@ -3935,5 +3945,8 @@ xlog_drop_incompat_feat(
 	case XLOG_INCOMPAT_FEAT_XATTRS:
 		up_read(&log->l_incompat_xattrs);
 		break;
+	case XLOG_INCOMPAT_FEAT_SWAPEXT:
+		up_read(&log->l_incompat_swapext);
+		break;
 	}
 }
diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
index d187f6445909..30bdbf8ee25c 100644
--- a/fs/xfs/xfs_log.h
+++ b/fs/xfs/xfs_log.h
@@ -161,6 +161,7 @@ bool	xlog_force_shutdown(struct xlog *log, uint32_t shutdown_flags);
 
 enum xlog_incompat_feat {
 	XLOG_INCOMPAT_FEAT_XATTRS = XFS_SB_FEAT_INCOMPAT_LOG_XATTRS,
+	XLOG_INCOMPAT_FEAT_SWAPEXT = XFS_SB_FEAT_INCOMPAT_LOG_SWAPEXT
 };
 
 void xlog_use_incompat_feat(struct xlog *log, enum xlog_incompat_feat what);
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index a13b5b6b744d..6cbee6996de5 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -448,6 +448,7 @@ struct xlog {
 
 	/* Users of log incompat features should take a read lock. */
 	struct rw_semaphore	l_incompat_xattrs;
+	struct rw_semaphore	l_incompat_swapext;
 };
 
 /*

From patchwork Fri Dec 30 22:13:56 2022
X-Patchwork-Submitter: "Darrick J.
Wong"
X-Patchwork-Id: 13084976

Subject: [PATCH 09/21] xfs: add a ->xchg_file_range handler
From: "Darrick J.
Wong"
To: djwong@kernel.org
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org
Date: Fri, 30 Dec 2022 14:13:56 -0800
Message-ID: <167243843654.699466.3274644860168881046.stgit@magnolia>
In-Reply-To: <167243843494.699466.5163281976943635014.stgit@magnolia>
References: <167243843494.699466.5163281976943635014.stgit@magnolia>

From: Darrick J. Wong

Add a function to handle file range exchange requests from the vfs.

Signed-off-by: Darrick J. Wong
---
 fs/xfs/xfs_bmap_util.c |   1 
 fs/xfs/xfs_file.c      |  70 +++++++++
 fs/xfs/xfs_mount.h     |   5 +
 fs/xfs/xfs_trace.c     |   1 
 fs/xfs/xfs_trace.h     | 120 +++++++++++++++
 fs/xfs/xfs_xchgrange.c | 375 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_xchgrange.h |  23 +++
 7 files changed, 594 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 8621534b749b..d587015aec0e 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -28,6 +28,7 @@
 #include "xfs_icache.h"
 #include "xfs_iomap.h"
 #include "xfs_reflink.h"
+#include "xfs_swapext.h"
 
 /* Kernel only BMAP related definitions and functions */
 
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 78323574021c..b4629c8aa6b7 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -24,6 +24,7 @@
 #include "xfs_pnfs.h"
 #include "xfs_iomap.h"
 #include "xfs_reflink.h"
+#include "xfs_xchgrange.h"
 
 #include
 #include
@@ -1150,6 +1151,74 @@ xfs_file_remap_range(
 	return remapped > 0 ?
remapped : ret;
 }
 
+STATIC int
+xfs_file_xchg_range(
+	struct file		*file1,
+	struct file		*file2,
+	struct file_xchg_range	*fxr)
+{
+	struct inode		*inode1 = file_inode(file1);
+	struct inode		*inode2 = file_inode(file2);
+	struct xfs_inode	*ip1 = XFS_I(inode1);
+	struct xfs_inode	*ip2 = XFS_I(inode2);
+	struct xfs_mount	*mp = ip1->i_mount;
+	unsigned int		priv_flags = 0;
+	bool			use_logging = false;
+	int			error;
+
+	if (xfs_is_shutdown(mp))
+		return -EIO;
+
+	/* Update cmtime if the fd/inode don't forbid it. */
+	if (likely(!(file1->f_mode & FMODE_NOCMTIME) && !IS_NOCMTIME(inode1)))
+		priv_flags |= XFS_XCHG_RANGE_UPD_CMTIME1;
+	if (likely(!(file2->f_mode & FMODE_NOCMTIME) && !IS_NOCMTIME(inode2)))
+		priv_flags |= XFS_XCHG_RANGE_UPD_CMTIME2;
+
+	/* Lock both files against IO */
+	error = xfs_ilock2_io_mmap(ip1, ip2);
+	if (error)
+		goto out_err;
+
+	/* Prepare and then exchange file contents. */
+	error = xfs_xchg_range_prep(file1, file2, fxr);
+	if (error)
+		goto out_unlock;
+
+	/* Get permission to use log-assisted file content swaps. */
+	error = xfs_xchg_range_grab_log_assist(mp,
+			!(fxr->flags & FILE_XCHG_RANGE_NONATOMIC),
+			&use_logging);
+	if (error)
+		goto out_unlock;
+	if (use_logging)
+		priv_flags |= XFS_XCHG_RANGE_LOGGED;
+
+	error = xfs_xchg_range(ip1, ip2, fxr, priv_flags);
+	if (error)
+		goto out_drop_feat;
+
+	/*
+	 * Finish the exchange by removing special file privileges like any
+	 * other file write would do.  This may involve turning on support for
+	 * logged xattrs if either file has security capabilities, which means
+	 * xfs_xchg_range_grab_log_assist before xfs_attr_grab_log_assist.
+ */ + error = generic_xchg_file_range_finish(file1, file2); + if (error) + goto out_drop_feat; + +out_drop_feat: + if (use_logging) + xfs_xchg_range_rele_log_assist(mp); +out_unlock: + xfs_iunlock2_io_mmap(ip1, ip2); +out_err: + if (error) + trace_xfs_file_xchg_range_error(ip2, error, _RET_IP_); + return error; +} + STATIC int xfs_file_open( struct inode *inode, @@ -1439,6 +1508,7 @@ const struct file_operations xfs_file_operations = { .fallocate = xfs_file_fallocate, .fadvise = xfs_file_fadvise, .remap_file_range = xfs_file_remap_range, + .xchg_file_range = xfs_file_xchg_range, }; const struct file_operations xfs_dir_file_operations = { diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 7c48a2b70f6f..3b2601ab954d 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -399,6 +399,8 @@ __XFS_HAS_FEAT(nouuid, NOUUID) #define XFS_OPSTATE_WARNED_SHRINK 8 /* Kernel has logged a warning about logged xattr updates being used. */ #define XFS_OPSTATE_WARNED_LARP 9 +/* Kernel has logged a warning about extent swapping being used on this fs. 
*/ +#define XFS_OPSTATE_WARNED_SWAPEXT 10 #define __XFS_IS_OPSTATE(name, NAME) \ static inline bool xfs_is_ ## name (struct xfs_mount *mp) \ @@ -438,7 +440,8 @@ xfs_should_warn(struct xfs_mount *mp, long nr) { (1UL << XFS_OPSTATE_BLOCKGC_ENABLED), "blockgc" }, \ { (1UL << XFS_OPSTATE_WARNED_SCRUB), "wscrub" }, \ { (1UL << XFS_OPSTATE_WARNED_SHRINK), "wshrink" }, \ - { (1UL << XFS_OPSTATE_WARNED_LARP), "wlarp" } + { (1UL << XFS_OPSTATE_WARNED_LARP), "wlarp" }, \ + { (1UL << XFS_OPSTATE_WARNED_SWAPEXT), "wswapext" } /* * Max and min values for mount-option defined I/O diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c index b43b973f0e10..e38814f4380c 100644 --- a/fs/xfs/xfs_trace.c +++ b/fs/xfs/xfs_trace.c @@ -41,6 +41,7 @@ #include "xfs_btree_mem.h" #include "xfs_bmap.h" #include "xfs_swapext.h" +#include "xfs_xchgrange.h" /* * We include this last to have the helpers above available for the trace diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 9ebaa5ffe504..6841f04ee38d 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -3763,10 +3763,130 @@ DEFINE_INODE_IREC_EVENT(xfs_reflink_cancel_cow); DEFINE_INODE_IREC_EVENT(xfs_swap_extent_rmap_remap); DEFINE_INODE_IREC_EVENT(xfs_swap_extent_rmap_remap_piece); DEFINE_INODE_ERROR_EVENT(xfs_swap_extent_rmap_error); + +/* swapext tracepoints */ +DEFINE_INODE_ERROR_EVENT(xfs_file_xchg_range_error); DEFINE_INODE_IREC_EVENT(xfs_swapext_extent1); DEFINE_INODE_IREC_EVENT(xfs_swapext_extent2); DEFINE_ITRUNC_EVENT(xfs_swapext_update_inode_size); +#define FIEXCHANGE_FLAGS_STRS \ + { FILE_XCHG_RANGE_NONATOMIC, "NONATOMIC" }, \ + { FILE_XCHG_RANGE_FILE2_FRESH, "F2_FRESH" }, \ + { FILE_XCHG_RANGE_FULL_FILES, "FULL" }, \ + { FILE_XCHG_RANGE_TO_EOF, "TO_EOF" }, \ + { FILE_XCHG_RANGE_FSYNC , "FSYNC" }, \ + { FILE_XCHG_RANGE_DRY_RUN, "DRY_RUN" }, \ + { FILE_XCHG_RANGE_SKIP_FILE1_HOLES, "SKIP_F1_HOLES" } + +/* file exchange-range tracepoint class */ +DECLARE_EVENT_CLASS(xfs_xchg_range_class, + TP_PROTO(struct xfs_inode 
*ip1, const struct file_xchg_range *fxr, + struct xfs_inode *ip2, unsigned int xchg_flags), + TP_ARGS(ip1, fxr, ip2, xchg_flags), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_ino_t, ip1_ino) + __field(loff_t, ip1_isize) + __field(loff_t, ip1_disize) + __field(xfs_ino_t, ip2_ino) + __field(loff_t, ip2_isize) + __field(loff_t, ip2_disize) + + __field(loff_t, file1_offset) + __field(loff_t, file2_offset) + __field(unsigned long long, length) + __field(unsigned long long, vflags) + __field(unsigned int, xflags) + ), + TP_fast_assign( + __entry->dev = VFS_I(ip1)->i_sb->s_dev; + __entry->ip1_ino = ip1->i_ino; + __entry->ip1_isize = VFS_I(ip1)->i_size; + __entry->ip1_disize = ip1->i_disk_size; + __entry->ip2_ino = ip2->i_ino; + __entry->ip2_isize = VFS_I(ip2)->i_size; + __entry->ip2_disize = ip2->i_disk_size; + + __entry->file1_offset = fxr->file1_offset; + __entry->file2_offset = fxr->file2_offset; + __entry->length = fxr->length; + __entry->vflags = fxr->flags; + __entry->xflags = xchg_flags; + ), + TP_printk("dev %d:%d vfs_flags %s xchg_flags %s bytecount 0x%llx " + "ino1 0x%llx isize 0x%llx disize 0x%llx pos 0x%llx -> " + "ino2 0x%llx isize 0x%llx disize 0x%llx pos 0x%llx", + MAJOR(__entry->dev), MINOR(__entry->dev), + __print_flags(__entry->vflags, "|", FIEXCHANGE_FLAGS_STRS), + __print_flags(__entry->xflags, "|", XCHG_RANGE_FLAGS_STRS), + __entry->length, + __entry->ip1_ino, + __entry->ip1_isize, + __entry->ip1_disize, + __entry->file1_offset, + __entry->ip2_ino, + __entry->ip2_isize, + __entry->ip2_disize, + __entry->file2_offset) +) + +#define DEFINE_XCHG_RANGE_EVENT(name) \ +DEFINE_EVENT(xfs_xchg_range_class, name, \ + TP_PROTO(struct xfs_inode *ip1, const struct file_xchg_range *fxr, \ + struct xfs_inode *ip2, unsigned int xchg_flags), \ + TP_ARGS(ip1, fxr, ip2, xchg_flags)) +DEFINE_XCHG_RANGE_EVENT(xfs_xchg_range_prep); +DEFINE_XCHG_RANGE_EVENT(xfs_xchg_range_flush); +DEFINE_XCHG_RANGE_EVENT(xfs_xchg_range); + +TRACE_EVENT(xfs_xchg_range_freshness, + 
TP_PROTO(struct xfs_inode *ip2, const struct file_xchg_range *fxr), + TP_ARGS(ip2, fxr), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_ino_t, ip2_ino) + __field(long long, ip2_mtime) + __field(long long, ip2_ctime) + __field(int, ip2_mtime_nsec) + __field(int, ip2_ctime_nsec) + + __field(xfs_ino_t, file2_ino) + __field(long long, file2_mtime) + __field(long long, file2_ctime) + __field(int, file2_mtime_nsec) + __field(int, file2_ctime_nsec) + ), + TP_fast_assign( + __entry->dev = VFS_I(ip2)->i_sb->s_dev; + __entry->ip2_ino = ip2->i_ino; + __entry->ip2_mtime = VFS_I(ip2)->i_mtime.tv_sec; + __entry->ip2_ctime = VFS_I(ip2)->i_ctime.tv_sec; + __entry->ip2_mtime_nsec = VFS_I(ip2)->i_mtime.tv_nsec; + __entry->ip2_ctime_nsec = VFS_I(ip2)->i_ctime.tv_nsec; + + __entry->file2_ino = fxr->file2_ino; + __entry->file2_mtime = fxr->file2_mtime; + __entry->file2_ctime = fxr->file2_ctime; + __entry->file2_mtime_nsec = fxr->file2_mtime_nsec; + __entry->file2_ctime_nsec = fxr->file2_ctime_nsec; + ), + TP_printk("dev %d:%d " + "ino 0x%llx mtime %lld:%d ctime %lld:%d -> " + "file 0x%llx mtime %lld:%d ctime %lld:%d", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->ip2_ino, + __entry->ip2_mtime, + __entry->ip2_mtime_nsec, + __entry->ip2_ctime, + __entry->ip2_ctime_nsec, + __entry->file2_ino, + __entry->file2_mtime, + __entry->file2_mtime_nsec, + __entry->file2_ctime, + __entry->file2_ctime_nsec) +); + /* fsmap traces */ DECLARE_EVENT_CLASS(xfs_fsmap_class, TP_PROTO(struct xfs_mount *mp, u32 keydev, xfs_agnumber_t agno, diff --git a/fs/xfs/xfs_xchgrange.c b/fs/xfs/xfs_xchgrange.c index 0dba5078c9f7..9966938134c0 100644 --- a/fs/xfs/xfs_xchgrange.c +++ b/fs/xfs/xfs_xchgrange.c @@ -13,8 +13,15 @@ #include "xfs_defer.h" #include "xfs_inode.h" #include "xfs_trans.h" +#include "xfs_quota.h" +#include "xfs_bmap_util.h" +#include "xfs_reflink.h" +#include "xfs_trace.h" #include "xfs_swapext.h" #include "xfs_xchgrange.h" +#include "xfs_sb.h" +#include "xfs_icache.h" +#include 
"xfs_log.h" /* Lock (and optionally join) two inodes for a file range exchange. */ void @@ -63,3 +70,371 @@ xfs_xchg_range_estimate( xfs_xchg_range_iunlock(req->ip1, req->ip2); return error; } + +/* Prepare two files to have their data exchanged. */ +int +xfs_xchg_range_prep( + struct file *file1, + struct file *file2, + struct file_xchg_range *fxr) +{ + struct xfs_inode *ip1 = XFS_I(file_inode(file1)); + struct xfs_inode *ip2 = XFS_I(file_inode(file2)); + int error; + + trace_xfs_xchg_range_prep(ip1, fxr, ip2, 0); + + /* Verify both files are either real-time or non-realtime */ + if (XFS_IS_REALTIME_INODE(ip1) != XFS_IS_REALTIME_INODE(ip2)) + return -EINVAL; + + /* + * The alignment checks in the VFS helpers cannot deal with allocation + * units that are not powers of 2. This can happen with the realtime + * volume if the extent size is set. Note that alignment checks are + * skipped if FULL_FILES is set. + */ + if (!(fxr->flags & FILE_XCHG_RANGE_FULL_FILES) && + !is_power_of_2(xfs_inode_alloc_unitsize(ip2))) + return -EOPNOTSUPP; + + error = generic_xchg_file_range_prep(file1, file2, fxr, + xfs_inode_alloc_unitsize(ip2)); + if (error || fxr->length == 0) + return error; + + /* Attach dquots to both inodes before changing block maps. */ + error = xfs_qm_dqattach(ip2); + if (error) + return error; + error = xfs_qm_dqattach(ip1); + if (error) + return error; + + trace_xfs_xchg_range_flush(ip1, fxr, ip2, 0); + + /* Flush the relevant ranges of both files. */ + error = xfs_flush_unmap_range(ip2, fxr->file2_offset, fxr->length); + if (error) + return error; + error = xfs_flush_unmap_range(ip1, fxr->file1_offset, fxr->length); + if (error) + return error; + + /* + * Cancel CoW fork preallocations for the ranges of both files. The + * prep function should have flushed all the dirty data, so the only + * extents remaining should be speculative. 
+ */ + if (xfs_inode_has_cow_data(ip1)) { + error = xfs_reflink_cancel_cow_range(ip1, fxr->file1_offset, + fxr->length, true); + if (error) + return error; + } + + if (xfs_inode_has_cow_data(ip2)) { + error = xfs_reflink_cancel_cow_range(ip2, fxr->file2_offset, + fxr->length, true); + if (error) + return error; + } + + return 0; +} + +#define QRETRY_IP1 (0x1) +#define QRETRY_IP2 (0x2) + +/* + * Obtain a quota reservation to make sure we don't hit EDQUOT. We can skip + * this if quota enforcement is disabled or if both inodes' dquots are the + * same. The qretry structure must be initialized to zeroes before the first + * call to this function. + */ +STATIC int +xfs_xchg_range_reserve_quota( + struct xfs_trans *tp, + const struct xfs_swapext_req *req, + unsigned int *qretry) +{ + int64_t ddelta, rdelta; + int ip1_error = 0; + int error; + + /* + * Don't bother with a quota reservation if we're not enforcing them + * or the two inodes have the same dquots. + */ + if (!XFS_IS_QUOTA_ON(tp->t_mountp) || req->ip1 == req->ip2 || + (req->ip1->i_udquot == req->ip2->i_udquot && + req->ip1->i_gdquot == req->ip2->i_gdquot && + req->ip1->i_pdquot == req->ip2->i_pdquot)) + return 0; + + *qretry = 0; + + /* + * For each file, compute the net gain in the number of regular blocks + * that will be mapped into that file and reserve that much quota. The + * quota counts must be able to absorb at least that much space. + */ + ddelta = req->ip2_bcount - req->ip1_bcount; + rdelta = req->ip2_rtbcount - req->ip1_rtbcount; + if (ddelta > 0 || rdelta > 0) { + error = xfs_trans_reserve_quota_nblks(tp, req->ip1, + ddelta > 0 ? ddelta : 0, + rdelta > 0 ? rdelta : 0, + false); + if (error == -EDQUOT || error == -ENOSPC) { + /* + * Save this error and see what happens if we try to + * reserve quota for ip2. Then report both. 
+ */ + *qretry |= QRETRY_IP1; + ip1_error = error; + error = 0; + } + if (error) + return error; + } + if (ddelta < 0 || rdelta < 0) { + error = xfs_trans_reserve_quota_nblks(tp, req->ip2, + ddelta < 0 ? -ddelta : 0, + rdelta < 0 ? -rdelta : 0, + false); + if (error == -EDQUOT || error == -ENOSPC) + *qretry |= QRETRY_IP2; + if (error) + return error; + } + if (ip1_error) + return ip1_error; + + /* + * For each file, forcibly reserve the gross gain in mapped blocks so + * that we don't trip over any quota block reservation assertions. + * We must reserve the gross gain because the quota code subtracts from + * bcount the number of blocks that we unmap; it does not add that + * quantity back to the quota block reservation. + */ + error = xfs_trans_reserve_quota_nblks(tp, req->ip1, req->ip1_bcount, + req->ip1_rtbcount, true); + if (error) + return error; + + return xfs_trans_reserve_quota_nblks(tp, req->ip2, req->ip2_bcount, + req->ip2_rtbcount, true); +} + +/* + * Get permission to use log-assisted atomic exchange of file extents. + * + * Callers must hold the IOLOCK and MMAPLOCK of both files. They must not be + * running any transactions or hold any ILOCKS. If @use_logging is set after a + * successful return, callers must call xfs_xchg_range_rele_log_assist after + * the exchange is completed. + */ +int +xfs_xchg_range_grab_log_assist( + struct xfs_mount *mp, + bool force, + bool *use_logging) +{ + int error = 0; + + /* + * Protect ourselves from an idle log clearing the atomic swapext + * log incompat feature bit. + */ + xlog_use_incompat_feat(mp->m_log, XLOG_INCOMPAT_FEAT_SWAPEXT); + *use_logging = true; + + /* + * If log-assisted swapping is already enabled, the caller can use the + * log assisted swap functions with the log-incompat reference we got. + */ + if (xfs_sb_version_haslogswapext(&mp->m_sb)) + return 0; + + /* + * If the caller doesn't /require/ log-assisted swapping, drop the + * log-incompat feature protection and exit. 
The caller cannot use + * log assisted swapping. + */ + if (!force) + goto drop_incompat; + + /* + * Caller requires log-assisted swapping but the fs feature set isn't + * rich enough to support it. Bail out. + */ + if (!xfs_swapext_supported(mp)) { + error = -EOPNOTSUPP; + goto drop_incompat; + } + + error = xfs_add_incompat_log_feature(mp, + XFS_SB_FEAT_INCOMPAT_LOG_SWAPEXT); + if (error) + goto drop_incompat; + + xfs_warn_mount(mp, XFS_OPSTATE_WARNED_SWAPEXT, + "EXPERIMENTAL atomic file range swap feature in use. Use at your own risk!"); + + return 0; +drop_incompat: + xlog_drop_incompat_feat(mp->m_log, XLOG_INCOMPAT_FEAT_SWAPEXT); + *use_logging = false; + return error; +} + +/* Release permission to use log-assisted extent swapping. */ +void +xfs_xchg_range_rele_log_assist( + struct xfs_mount *mp) +{ + xlog_drop_incompat_feat(mp->m_log, XLOG_INCOMPAT_FEAT_SWAPEXT); +} + +/* Exchange the contents of two files. */ +int +xfs_xchg_range( + struct xfs_inode *ip1, + struct xfs_inode *ip2, + const struct file_xchg_range *fxr, + unsigned int xchg_flags) +{ + struct xfs_mount *mp = ip1->i_mount; + struct xfs_swapext_req req = { + .ip1 = ip1, + .ip2 = ip2, + .whichfork = XFS_DATA_FORK, + .startoff1 = XFS_B_TO_FSBT(mp, fxr->file1_offset), + .startoff2 = XFS_B_TO_FSBT(mp, fxr->file2_offset), + .blockcount = XFS_B_TO_FSB(mp, fxr->length), + }; + struct xfs_trans *tp; + unsigned int qretry; + bool retried = false; + int error; + + trace_xfs_xchg_range(ip1, fxr, ip2, xchg_flags); + + /* + * This function only supports using log intent items (SXI items if + * atomic exchange is required, or BUI items if not) to exchange file + * data. The legacy whole-fork swap will be ported in a later patch. 
+ */ + if (!(xchg_flags & XFS_XCHG_RANGE_LOGGED) && !xfs_swapext_supported(mp)) + return -EOPNOTSUPP; + + if (fxr->flags & FILE_XCHG_RANGE_TO_EOF) + req.req_flags |= XFS_SWAP_REQ_SET_SIZES; + if (fxr->flags & FILE_XCHG_RANGE_SKIP_FILE1_HOLES) + req.req_flags |= XFS_SWAP_REQ_SKIP_INO1_HOLES; + if (xchg_flags & XFS_XCHG_RANGE_LOGGED) + req.req_flags |= XFS_SWAP_REQ_LOGGED; + + error = xfs_xchg_range_estimate(&req); + if (error) + return error; + +retry: + /* Allocate the transaction, lock the inodes, and join them. */ + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, req.resblks, 0, + XFS_TRANS_RES_FDBLKS, &tp); + if (error) + return error; + + xfs_xchg_range_ilock(tp, ip1, ip2); + + trace_xfs_swap_extent_before(ip2, 0); + trace_xfs_swap_extent_before(ip1, 1); + + if (fxr->flags & FILE_XCHG_RANGE_FILE2_FRESH) + trace_xfs_xchg_range_freshness(ip2, fxr); + + /* + * Now that we've excluded all other inode metadata changes by taking + * the ILOCK, repeat the freshness check. + */ + error = generic_xchg_file_range_check_fresh(VFS_I(ip2), fxr); + if (error) + goto out_trans_cancel; + + error = xfs_swapext_check_extents(mp, &req); + if (error) + goto out_trans_cancel; + + /* + * Reserve ourselves some quota if any of them are in enforcing mode. + * In theory we only need enough to satisfy the change in the number + * of blocks between the two ranges being remapped. + */ + error = xfs_xchg_range_reserve_quota(tp, &req, &qretry); + if ((error == -EDQUOT || error == -ENOSPC) && !retried) { + xfs_trans_cancel(tp); + xfs_xchg_range_iunlock(ip1, ip2); + if (qretry & QRETRY_IP1) + xfs_blockgc_free_quota(ip1, 0); + if (qretry & QRETRY_IP2) + xfs_blockgc_free_quota(ip2, 0); + retried = true; + goto retry; + } + if (error) + goto out_trans_cancel; + + /* If we got this far on a dry run, all parameters are ok. */ + if (fxr->flags & FILE_XCHG_RANGE_DRY_RUN) + goto out_trans_cancel; + + /* Update the mtime and ctime of both files. 
*/ + if (xchg_flags & XFS_XCHG_RANGE_UPD_CMTIME1) + xfs_trans_ichgtime(tp, ip1, + XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG); + if (xchg_flags & XFS_XCHG_RANGE_UPD_CMTIME2) + xfs_trans_ichgtime(tp, ip2, + XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG); + + xfs_swapext(tp, &req); + + /* + * Force the log to persist metadata updates if the caller or the + * administrator requires this. The VFS prep function already flushed + * the relevant parts of the page cache. + */ + if (xfs_has_wsync(mp) || (fxr->flags & FILE_XCHG_RANGE_FSYNC)) + xfs_trans_set_sync(tp); + + error = xfs_trans_commit(tp); + + trace_xfs_swap_extent_after(ip2, 0); + trace_xfs_swap_extent_after(ip1, 1); + + if (error) + goto out_unlock; + + /* + * If the caller wanted us to exchange the contents of two complete + * files of unequal length, exchange the incore sizes now. This should + * be safe because we flushed both files' page caches, moved all the + * extents, and updated the ondisk sizes. + */ + if (fxr->flags & FILE_XCHG_RANGE_TO_EOF) { + loff_t temp; + + temp = i_size_read(VFS_I(ip2)); + i_size_write(VFS_I(ip2), i_size_read(VFS_I(ip1))); + i_size_write(VFS_I(ip1), temp); + } + +out_unlock: + xfs_xchg_range_iunlock(ip1, ip2); + return error; + +out_trans_cancel: + xfs_trans_cancel(tp); + goto out_unlock; +} diff --git a/fs/xfs/xfs_xchgrange.h b/fs/xfs/xfs_xchgrange.h index 89320a354efa..a0e64408784a 100644 --- a/fs/xfs/xfs_xchgrange.h +++ b/fs/xfs/xfs_xchgrange.h @@ -14,4 +14,27 @@ void xfs_xchg_range_iunlock(struct xfs_inode *ip1, struct xfs_inode *ip2); int xfs_xchg_range_estimate(struct xfs_swapext_req *req); +int xfs_xchg_range_grab_log_assist(struct xfs_mount *mp, bool force, + bool *use_logging); +void xfs_xchg_range_rele_log_assist(struct xfs_mount *mp); + +/* Caller has permission to use log intent items for the exchange operation. */ +#define XFS_XCHG_RANGE_LOGGED (1U << 0) + +/* Update ip1's change and mod time. */ +#define XFS_XCHG_RANGE_UPD_CMTIME1 (1U << 1) + +/* Update ip2's change and mod time. 
*/
+#define XFS_XCHG_RANGE_UPD_CMTIME2	(1U << 2)
+
+#define XCHG_RANGE_FLAGS_STRS \
+	{ XFS_XCHG_RANGE_LOGGED,	"LOGGED" }, \
+	{ XFS_XCHG_RANGE_UPD_CMTIME1,	"UPD_CMTIME1" }, \
+	{ XFS_XCHG_RANGE_UPD_CMTIME2,	"UPD_CMTIME2" }
+
+int xfs_xchg_range(struct xfs_inode *ip1, struct xfs_inode *ip2,
+		const struct file_xchg_range *fxr, unsigned int xchg_flags);
+int xfs_xchg_range_prep(struct file *file1, struct file *file2,
+		struct file_xchg_range *fxr);
+
 #endif /* __XFS_XCHGRANGE_H__ */

From patchwork Fri Dec 30 22:13:56 2022
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13084977

Subject: [PATCH 10/21] xfs: add error injection to test swapext recovery
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org
Date: Fri, 30 Dec 2022 14:13:56 -0800
Message-ID: <167243843669.699466.12618888634612571770.stgit@magnolia>
In-Reply-To: <167243843494.699466.5163281976943635014.stgit@magnolia>
References: <167243843494.699466.5163281976943635014.stgit@magnolia>

From: Darrick J. Wong

Add an errortag so that we can test recovery of swapext log items.

Signed-off-by: Darrick J. Wong
---
 fs/xfs/libxfs/xfs_errortag.h | 4 +++-
 fs/xfs/libxfs/xfs_swapext.c  | 3 +++
 fs/xfs/xfs_error.c           | 3 +++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_errortag.h b/fs/xfs/libxfs/xfs_errortag.h
index 01a9e86b3037..263d62a8d70f 100644
--- a/fs/xfs/libxfs/xfs_errortag.h
+++ b/fs/xfs/libxfs/xfs_errortag.h
@@ -63,7 +63,8 @@
 #define XFS_ERRTAG_ATTR_LEAF_TO_NODE			41
 #define XFS_ERRTAG_WB_DELAY_MS				42
 #define XFS_ERRTAG_WRITE_DELAY_MS			43
-#define XFS_ERRTAG_MAX					44
+#define XFS_ERRTAG_SWAPEXT_FINISH_ONE			44
+#define XFS_ERRTAG_MAX					45
 
 /*
  * Random factors for above tags, 1 means always, 2 means 1/2 time, etc.
@@ -111,5 +112,6 @@
 #define XFS_RANDOM_ATTR_LEAF_TO_NODE			1
 #define XFS_RANDOM_WB_DELAY_MS				3000
 #define XFS_RANDOM_WRITE_DELAY_MS			3000
+#define XFS_RANDOM_SWAPEXT_FINISH_ONE			1
 
 #endif /* __XFS_ERRORTAG_H_ */
diff --git a/fs/xfs/libxfs/xfs_swapext.c b/fs/xfs/libxfs/xfs_swapext.c
index 0bc758c5cf5c..227a08ac5d4b 100644
--- a/fs/xfs/libxfs/xfs_swapext.c
+++ b/fs/xfs/libxfs/xfs_swapext.c
@@ -426,6 +426,9 @@ xfs_swapext_finish_one(
 			return error;
 	}
 
+	if (XFS_TEST_ERROR(false, tp->t_mountp, XFS_ERRTAG_SWAPEXT_FINISH_ONE))
+		return -EIO;
+
 	/* If we still have work to do, ask for a new transaction. */
 	if (sxi_has_more_swap_work(sxi) || sxi_has_postop_work(sxi)) {
 		trace_xfs_swapext_defer(tp->t_mountp, sxi);
diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
index ae082808cfed..4b57a809ced5 100644
--- a/fs/xfs/xfs_error.c
+++ b/fs/xfs/xfs_error.c
@@ -62,6 +62,7 @@ static unsigned int xfs_errortag_random_default[] = {
 	XFS_RANDOM_ATTR_LEAF_TO_NODE,
 	XFS_RANDOM_WB_DELAY_MS,
 	XFS_RANDOM_WRITE_DELAY_MS,
+	XFS_RANDOM_SWAPEXT_FINISH_ONE,
 };
 
 struct xfs_errortag_attr {
@@ -179,6 +180,7 @@ XFS_ERRORTAG_ATTR_RW(da_leaf_split,	XFS_ERRTAG_DA_LEAF_SPLIT);
 XFS_ERRORTAG_ATTR_RW(attr_leaf_to_node,	XFS_ERRTAG_ATTR_LEAF_TO_NODE);
 XFS_ERRORTAG_ATTR_RW(wb_delay_ms,	XFS_ERRTAG_WB_DELAY_MS);
 XFS_ERRORTAG_ATTR_RW(write_delay_ms,	XFS_ERRTAG_WRITE_DELAY_MS);
+XFS_ERRORTAG_ATTR_RW(swapext_finish_one, XFS_ERRTAG_SWAPEXT_FINISH_ONE);
 
 static struct attribute *xfs_errortag_attrs[] = {
 	XFS_ERRORTAG_ATTR_LIST(noerror),
@@ -224,6 +226,7 @@ static struct attribute *xfs_errortag_attrs[] = {
 	XFS_ERRORTAG_ATTR_LIST(attr_leaf_to_node),
 	XFS_ERRORTAG_ATTR_LIST(wb_delay_ms),
 	XFS_ERRORTAG_ATTR_LIST(write_delay_ms),
+	XFS_ERRORTAG_ATTR_LIST(swapext_finish_one),
 	NULL,
 };
 ATTRIBUTE_GROUPS(xfs_errortag);

From patchwork Fri Dec 30 22:13:56 2022
X-Patchwork-Submitter: "Darrick J.
Wong"
X-Patchwork-Id: 13084978

Subject: [PATCH 11/21] xfs: port xfs_swap_extents_rmap to our new code
From: "Darrick J.
Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Fri, 30 Dec 2022 14:13:56 -0800 Message-ID: <167243843684.699466.18215544953813368074.stgit@magnolia> In-Reply-To: <167243843494.699466.5163281976943635014.stgit@magnolia> References: <167243843494.699466.5163281976943635014.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Darrick J. Wong The inner loop of xfs_swap_extents_rmap does the same work as xfs_swapext_finish_one, so adapt it to use that. Doing so has the side benefit that the older code path no longer wastes its time remapping shared extents. This forms the basis of the non-atomic swaprange implementation. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_bmap_util.c | 151 +++++------------------------------------------- fs/xfs/xfs_trace.h | 5 -- 2 files changed, 16 insertions(+), 140 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index d587015aec0e..4d4696bf9b08 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1360,138 +1360,6 @@ xfs_swap_extent_flush( return 0; } -/* - * Move extents from one file to another, when rmap is enabled. - */ -STATIC int -xfs_swap_extent_rmap( - struct xfs_trans **tpp, - struct xfs_inode *ip, - struct xfs_inode *tip) -{ - struct xfs_trans *tp = *tpp; - struct xfs_bmbt_irec irec; - struct xfs_bmbt_irec uirec; - struct xfs_bmbt_irec tirec; - xfs_fileoff_t offset_fsb; - xfs_fileoff_t end_fsb; - xfs_filblks_t count_fsb; - int error; - xfs_filblks_t ilen; - xfs_filblks_t rlen; - int nimaps; - uint64_t tip_flags2; - - /* - * If the source file has shared blocks, we must flag the donor - * file as having shared blocks so that we get the shared-block - * rmap functions when we go to fix up the rmaps. The flags - * will be switch for reals later. 
- */ - tip_flags2 = tip->i_diflags2; - if (ip->i_diflags2 & XFS_DIFLAG2_REFLINK) - tip->i_diflags2 |= XFS_DIFLAG2_REFLINK; - - offset_fsb = 0; - end_fsb = XFS_B_TO_FSB(ip->i_mount, i_size_read(VFS_I(ip))); - count_fsb = (xfs_filblks_t)(end_fsb - offset_fsb); - - while (count_fsb) { - /* Read extent from the donor file */ - nimaps = 1; - error = xfs_bmapi_read(tip, offset_fsb, count_fsb, &tirec, - &nimaps, 0); - if (error) - goto out; - ASSERT(nimaps == 1); - ASSERT(tirec.br_startblock != DELAYSTARTBLOCK); - - trace_xfs_swap_extent_rmap_remap(tip, &tirec); - ilen = tirec.br_blockcount; - - /* Unmap the old blocks in the source file. */ - while (tirec.br_blockcount) { - ASSERT(tp->t_firstblock == NULLFSBLOCK); - trace_xfs_swap_extent_rmap_remap_piece(tip, &tirec); - - /* Read extent from the source file */ - nimaps = 1; - error = xfs_bmapi_read(ip, tirec.br_startoff, - tirec.br_blockcount, &irec, - &nimaps, 0); - if (error) - goto out; - ASSERT(nimaps == 1); - ASSERT(tirec.br_startoff == irec.br_startoff); - trace_xfs_swap_extent_rmap_remap_piece(ip, &irec); - - /* Trim the extent. */ - uirec = tirec; - uirec.br_blockcount = rlen = min_t(xfs_filblks_t, - tirec.br_blockcount, - irec.br_blockcount); - trace_xfs_swap_extent_rmap_remap_piece(tip, &uirec); - - if (xfs_bmap_is_real_extent(&uirec)) { - error = xfs_iext_count_may_overflow(ip, - XFS_DATA_FORK, - XFS_IEXT_SWAP_RMAP_CNT); - if (error == -EFBIG) - error = xfs_iext_count_upgrade(tp, ip, - XFS_IEXT_SWAP_RMAP_CNT); - if (error) - goto out; - } - - if (xfs_bmap_is_real_extent(&irec)) { - error = xfs_iext_count_may_overflow(tip, - XFS_DATA_FORK, - XFS_IEXT_SWAP_RMAP_CNT); - if (error == -EFBIG) - error = xfs_iext_count_upgrade(tp, ip, - XFS_IEXT_SWAP_RMAP_CNT); - if (error) - goto out; - } - - /* Remove the mapping from the donor file. */ - xfs_bmap_unmap_extent(tp, tip, XFS_DATA_FORK, &uirec); - - /* Remove the mapping from the source file. 
*/ - xfs_bmap_unmap_extent(tp, ip, XFS_DATA_FORK, &irec); - - /* Map the donor file's blocks into the source file. */ - xfs_bmap_map_extent(tp, ip, XFS_DATA_FORK, &uirec); - - /* Map the source file's blocks into the donor file. */ - xfs_bmap_map_extent(tp, tip, XFS_DATA_FORK, &irec); - - error = xfs_defer_finish(tpp); - tp = *tpp; - if (error) - goto out; - - tirec.br_startoff += rlen; - if (tirec.br_startblock != HOLESTARTBLOCK && - tirec.br_startblock != DELAYSTARTBLOCK) - tirec.br_startblock += rlen; - tirec.br_blockcount -= rlen; - } - - /* Roll on... */ - count_fsb -= ilen; - offset_fsb += ilen; - } - - tip->i_diflags2 = tip_flags2; - return 0; - -out: - trace_xfs_swap_extent_rmap_error(ip, error, _RET_IP_); - tip->i_diflags2 = tip_flags2; - return error; -} - /* Swap the extents of two files by swapping data forks. */ STATIC int xfs_swap_extent_forks( @@ -1775,13 +1643,24 @@ xfs_swap_extents( src_log_flags = XFS_ILOG_CORE; target_log_flags = XFS_ILOG_CORE; - if (xfs_has_rmapbt(mp)) - error = xfs_swap_extent_rmap(&tp, ip, tip); - else + if (xfs_has_rmapbt(mp)) { + struct xfs_swapext_req req = { + .ip1 = tip, + .ip2 = ip, + .whichfork = XFS_DATA_FORK, + .blockcount = XFS_B_TO_FSB(ip->i_mount, + i_size_read(VFS_I(ip))), + }; + + xfs_swapext(tp, &req); + error = xfs_defer_finish(&tp); + } else error = xfs_swap_extent_forks(tp, ip, tip, &src_log_flags, &target_log_flags); - if (error) + if (error) { + trace_xfs_swap_extent_error(ip, error, _THIS_IP_); goto out_trans_cancel; + } /* Do we have to swap reflink flags? 
*/ if ((ip->i_diflags2 & XFS_DIFLAG2_REFLINK) ^ diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 6841f04ee38d..b0ced76af3b9 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -3759,13 +3759,10 @@ DEFINE_INODE_ERROR_EVENT(xfs_reflink_end_cow_error); DEFINE_INODE_IREC_EVENT(xfs_reflink_cancel_cow); -/* rmap swapext tracepoints */ -DEFINE_INODE_IREC_EVENT(xfs_swap_extent_rmap_remap); -DEFINE_INODE_IREC_EVENT(xfs_swap_extent_rmap_remap_piece); -DEFINE_INODE_ERROR_EVENT(xfs_swap_extent_rmap_error); /* swapext tracepoints */ DEFINE_INODE_ERROR_EVENT(xfs_file_xchg_range_error); +DEFINE_INODE_ERROR_EVENT(xfs_swap_extent_error); DEFINE_INODE_IREC_EVENT(xfs_swapext_extent1); DEFINE_INODE_IREC_EVENT(xfs_swapext_extent2); DEFINE_ITRUNC_EVENT(xfs_swapext_update_inode_size); From patchwork Fri Dec 30 22:13:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13084979 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1BB48C3DA7C for ; Fri, 30 Dec 2022 23:53:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235783AbiL3XxV (ORCPT ); Fri, 30 Dec 2022 18:53:21 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49748 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231348AbiL3XxT (ORCPT ); Fri, 30 Dec 2022 18:53:19 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 922AC1DF3A; Fri, 30 Dec 2022 15:53:18 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by 
ams.source.kernel.org (Postfix) with ESMTPS id 5358DB80883; Fri, 30 Dec 2022 23:53:17 +0000 (UTC) Subject: [PATCH 12/21] xfs: consolidate all of the xfs_swap_extent_forks code From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Fri, 30 Dec 2022 14:13:57 -0800 Message-ID: <167243843698.699466.5411234125659051866.stgit@magnolia> In-Reply-To: <167243843494.699466.5163281976943635014.stgit@magnolia> References: <167243843494.699466.5163281976943635014.stgit@magnolia> From: Darrick J. Wong Now that we've moved the old swapext code to use the new log-assisted extent swap code for rmap filesystems, let's start porting the old implementation to the new ioctl interface so that later we can port the old interface to the new interface. Consolidate the reflink flag swap code and the bmbt owner change scan code in xfs_swap_extent_forks, since both interfaces are going to need that. Signed-off-by: Darrick J.
Wong --- fs/xfs/xfs_bmap_util.c | 220 ++++++++++++++++++++++++------------------------ 1 file changed, 108 insertions(+), 112 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 4d4696bf9b08..dbd95d86addb 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1360,19 +1360,61 @@ xfs_swap_extent_flush( return 0; } +/* + * Fix up the owners of the bmbt blocks to refer to the current inode. The + * change owner scan attempts to order all modified buffers in the current + * transaction. In the event of ordered buffer failure, the offending buffer is + * physically logged as a fallback and the scan returns -EAGAIN. We must roll + * the transaction in this case to replenish the fallback log reservation and + * restart the scan. This process repeats until the scan completes. + */ +static int +xfs_swap_change_owner( + struct xfs_trans **tpp, + struct xfs_inode *ip, + struct xfs_inode *tmpip) +{ + int error; + struct xfs_trans *tp = *tpp; + + do { + error = xfs_bmbt_change_owner(tp, ip, XFS_DATA_FORK, ip->i_ino, + NULL); + /* success or fatal error */ + if (error != -EAGAIN) + break; + + error = xfs_trans_roll(tpp); + if (error) + break; + tp = *tpp; + + /* + * Redirty both inodes so they can relog and keep the log tail + * moving forward. + */ + xfs_trans_ijoin(tp, ip, 0); + xfs_trans_ijoin(tp, tmpip, 0); + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); + xfs_trans_log_inode(tp, tmpip, XFS_ILOG_CORE); + } while (true); + + return error; +} + /* Swap the extents of two files by swapping data forks. 
*/ STATIC int xfs_swap_extent_forks( - struct xfs_trans *tp, + struct xfs_trans **tpp, struct xfs_inode *ip, - struct xfs_inode *tip, - int *src_log_flags, - int *target_log_flags) + struct xfs_inode *tip) { xfs_filblks_t aforkblks = 0; xfs_filblks_t taforkblks = 0; xfs_extnum_t junk; uint64_t tmp; + int src_log_flags = XFS_ILOG_CORE; + int target_log_flags = XFS_ILOG_CORE; int error; /* @@ -1380,14 +1422,14 @@ xfs_swap_extent_forks( */ if (xfs_inode_has_attr_fork(ip) && ip->i_af.if_nextents > 0 && ip->i_af.if_format != XFS_DINODE_FMT_LOCAL) { - error = xfs_bmap_count_blocks(tp, ip, XFS_ATTR_FORK, &junk, + error = xfs_bmap_count_blocks(*tpp, ip, XFS_ATTR_FORK, &junk, &aforkblks); if (error) return error; } if (xfs_inode_has_attr_fork(tip) && tip->i_af.if_nextents > 0 && tip->i_af.if_format != XFS_DINODE_FMT_LOCAL) { - error = xfs_bmap_count_blocks(tp, tip, XFS_ATTR_FORK, &junk, + error = xfs_bmap_count_blocks(*tpp, tip, XFS_ATTR_FORK, &junk, &taforkblks); if (error) return error; @@ -1402,9 +1444,9 @@ xfs_swap_extent_forks( */ if (xfs_has_v3inodes(ip->i_mount)) { if (ip->i_df.if_format == XFS_DINODE_FMT_BTREE) - (*target_log_flags) |= XFS_ILOG_DOWNER; + target_log_flags |= XFS_ILOG_DOWNER; if (tip->i_df.if_format == XFS_DINODE_FMT_BTREE) - (*src_log_flags) |= XFS_ILOG_DOWNER; + src_log_flags |= XFS_ILOG_DOWNER; } /* @@ -1434,71 +1476,80 @@ xfs_swap_extent_forks( switch (ip->i_df.if_format) { case XFS_DINODE_FMT_EXTENTS: - (*src_log_flags) |= XFS_ILOG_DEXT; + src_log_flags |= XFS_ILOG_DEXT; break; case XFS_DINODE_FMT_BTREE: ASSERT(!xfs_has_v3inodes(ip->i_mount) || - (*src_log_flags & XFS_ILOG_DOWNER)); - (*src_log_flags) |= XFS_ILOG_DBROOT; + (src_log_flags & XFS_ILOG_DOWNER)); + src_log_flags |= XFS_ILOG_DBROOT; break; } switch (tip->i_df.if_format) { case XFS_DINODE_FMT_EXTENTS: - (*target_log_flags) |= XFS_ILOG_DEXT; + target_log_flags |= XFS_ILOG_DEXT; break; case XFS_DINODE_FMT_BTREE: - (*target_log_flags) |= XFS_ILOG_DBROOT; + target_log_flags |= 
XFS_ILOG_DBROOT; ASSERT(!xfs_has_v3inodes(ip->i_mount) || - (*target_log_flags & XFS_ILOG_DOWNER)); + (target_log_flags & XFS_ILOG_DOWNER)); break; } + /* Do we have to swap reflink flags? */ + if ((ip->i_diflags2 & XFS_DIFLAG2_REFLINK) ^ + (tip->i_diflags2 & XFS_DIFLAG2_REFLINK)) { + uint64_t f; + + f = ip->i_diflags2 & XFS_DIFLAG2_REFLINK; + ip->i_diflags2 &= ~XFS_DIFLAG2_REFLINK; + ip->i_diflags2 |= tip->i_diflags2 & XFS_DIFLAG2_REFLINK; + tip->i_diflags2 &= ~XFS_DIFLAG2_REFLINK; + tip->i_diflags2 |= f & XFS_DIFLAG2_REFLINK; + } + + /* Swap the cow forks. */ + if (xfs_has_reflink(ip->i_mount)) { + ASSERT(!ip->i_cowfp || + ip->i_cowfp->if_format == XFS_DINODE_FMT_EXTENTS); + ASSERT(!tip->i_cowfp || + tip->i_cowfp->if_format == XFS_DINODE_FMT_EXTENTS); + + swap(ip->i_cowfp, tip->i_cowfp); + + if (ip->i_cowfp && ip->i_cowfp->if_bytes) + xfs_inode_set_cowblocks_tag(ip); + else + xfs_inode_clear_cowblocks_tag(ip); + if (tip->i_cowfp && tip->i_cowfp->if_bytes) + xfs_inode_set_cowblocks_tag(tip); + else + xfs_inode_clear_cowblocks_tag(tip); + } + + xfs_trans_log_inode(*tpp, ip, src_log_flags); + xfs_trans_log_inode(*tpp, tip, target_log_flags); + + /* + * The extent forks have been swapped, but crc=1,rmapbt=0 filesystems + * have inode number owner values in the bmbt blocks that still refer to + * the old inode. Scan each bmbt to fix up the owner values with the + * inode number of the current inode. + */ + if (src_log_flags & XFS_ILOG_DOWNER) { + error = xfs_swap_change_owner(tpp, ip, tip); + if (error) + return error; + } + if (target_log_flags & XFS_ILOG_DOWNER) { + error = xfs_swap_change_owner(tpp, tip, ip); + if (error) + return error; + } + return 0; } -/* - * Fix up the owners of the bmbt blocks to refer to the current inode. The - * change owner scan attempts to order all modified buffers in the current - * transaction. In the event of ordered buffer failure, the offending buffer is - * physically logged as a fallback and the scan returns -EAGAIN. 
We must roll - * the transaction in this case to replenish the fallback log reservation and - * restart the scan. This process repeats until the scan completes. - */ -static int -xfs_swap_change_owner( - struct xfs_trans **tpp, - struct xfs_inode *ip, - struct xfs_inode *tmpip) -{ - int error; - struct xfs_trans *tp = *tpp; - - do { - error = xfs_bmbt_change_owner(tp, ip, XFS_DATA_FORK, ip->i_ino, - NULL); - /* success or fatal error */ - if (error != -EAGAIN) - break; - - error = xfs_trans_roll(tpp); - if (error) - break; - tp = *tpp; - - /* - * Redirty both inodes so they can relog and keep the log tail - * moving forward. - */ - xfs_trans_ijoin(tp, ip, 0); - xfs_trans_ijoin(tp, tmpip, 0); - xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); - xfs_trans_log_inode(tp, tmpip, XFS_ILOG_CORE); - } while (true); - - return error; -} - int xfs_swap_extents( struct xfs_inode *ip, /* target inode */ @@ -1508,9 +1559,7 @@ xfs_swap_extents( struct xfs_mount *mp = ip->i_mount; struct xfs_trans *tp; struct xfs_bstat *sbp = &sxp->sx_stat; - int src_log_flags, target_log_flags; int error = 0; - uint64_t f; int resblks = 0; unsigned int flags = 0; @@ -1640,9 +1689,6 @@ xfs_swap_extents( * recovery is going to see the fork as owned by the swapped inode, * not the pre-swapped inodes. */ - src_log_flags = XFS_ILOG_CORE; - target_log_flags = XFS_ILOG_CORE; - if (xfs_has_rmapbt(mp)) { struct xfs_swapext_req req = { .ip1 = tip, @@ -1655,62 +1701,12 @@ xfs_swap_extents( xfs_swapext(tp, &req); error = xfs_defer_finish(&tp); } else - error = xfs_swap_extent_forks(tp, ip, tip, &src_log_flags, - &target_log_flags); + error = xfs_swap_extent_forks(&tp, ip, tip); if (error) { trace_xfs_swap_extent_error(ip, error, _THIS_IP_); goto out_trans_cancel; } - /* Do we have to swap reflink flags? 
*/ - if ((ip->i_diflags2 & XFS_DIFLAG2_REFLINK) ^ - (tip->i_diflags2 & XFS_DIFLAG2_REFLINK)) { - f = ip->i_diflags2 & XFS_DIFLAG2_REFLINK; - ip->i_diflags2 &= ~XFS_DIFLAG2_REFLINK; - ip->i_diflags2 |= tip->i_diflags2 & XFS_DIFLAG2_REFLINK; - tip->i_diflags2 &= ~XFS_DIFLAG2_REFLINK; - tip->i_diflags2 |= f & XFS_DIFLAG2_REFLINK; - } - - /* Swap the cow forks. */ - if (xfs_has_reflink(mp)) { - ASSERT(!ip->i_cowfp || - ip->i_cowfp->if_format == XFS_DINODE_FMT_EXTENTS); - ASSERT(!tip->i_cowfp || - tip->i_cowfp->if_format == XFS_DINODE_FMT_EXTENTS); - - swap(ip->i_cowfp, tip->i_cowfp); - - if (ip->i_cowfp && ip->i_cowfp->if_bytes) - xfs_inode_set_cowblocks_tag(ip); - else - xfs_inode_clear_cowblocks_tag(ip); - if (tip->i_cowfp && tip->i_cowfp->if_bytes) - xfs_inode_set_cowblocks_tag(tip); - else - xfs_inode_clear_cowblocks_tag(tip); - } - - xfs_trans_log_inode(tp, ip, src_log_flags); - xfs_trans_log_inode(tp, tip, target_log_flags); - - /* - * The extent forks have been swapped, but crc=1,rmapbt=0 filesystems - * have inode number owner values in the bmbt blocks that still refer to - * the old inode. Scan each bmbt to fix up the owner values with the - * inode number of the current inode. - */ - if (src_log_flags & XFS_ILOG_DOWNER) { - error = xfs_swap_change_owner(&tp, ip, tip); - if (error) - goto out_trans_cancel; - } - if (target_log_flags & XFS_ILOG_DOWNER) { - error = xfs_swap_change_owner(&tp, tip, ip); - if (error) - goto out_trans_cancel; - } - /* * If this is a synchronous mount, make sure that the * transaction goes to disk before returning to the user. From patchwork Fri Dec 30 22:13:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13084980 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3AD67C3DA7D for ; Fri, 30 Dec 2022 23:53:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235799AbiL3Xxg (ORCPT ); Fri, 30 Dec 2022 18:53:36 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49894 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235608AbiL3Xxe (ORCPT ); Fri, 30 Dec 2022 18:53:34 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 48E9E1E3C3; Fri, 30 Dec 2022 15:53:34 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id EFA09B81DE0; Fri, 30 Dec 2022 23:53:32 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9F3CEC43392; Fri, 30 Dec 2022 23:53:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672444411; bh=Rl5SwoFy18xfolYMtYCB6aH9T6OJCaLqHEoNpxvqv8Q=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=P0/wxArwOglTj3cMbDqwQs4+uAD/BAdx9r/wr4iwX+5LxP4BCZ+MBngjy0yn78B70 rNnrnjYwiIHCBUIP5BM7hs+CDtoqq6bvA+h9pXMh6VlPAU7CXWqN+1iN5P4P42tqAw OW1RWzchfO1LWaQ91m7lAzDK0N4o7oaaN2Lq2HIChvHy33so6RbOtGbmhOrZJcVy96 AIKAbj3VoEWWRlOZTo+XiCJ5WBCA2A0iDT9Ti4RX5wPfcx2ZwcIliY4zctoqSGOXdn Wxr/wsqnfsOIJXTvt4m4fnk1B8khCmFUVDKShwurLngWSh7k2Myxz8WSX1Otxeq+yn RYFxapMi4lxdA== Subject: [PATCH 13/21] xfs: port xfs_swap_extent_forks to use xfs_swapext_req From: "Darrick J. 
Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Fri, 30 Dec 2022 14:13:57 -0800 Message-ID: <167243843712.699466.14764003951903637803.stgit@magnolia> In-Reply-To: <167243843494.699466.5163281976943635014.stgit@magnolia> References: <167243843494.699466.5163281976943635014.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Darrick J. Wong Port the old extent fork swapping function to take a xfs_swapext_req as input, which aligns it with the new fiexchange interface. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_bmap_util.c | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index dbd95d86addb..9d6337a05544 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1406,9 +1406,10 @@ xfs_swap_change_owner( STATIC int xfs_swap_extent_forks( struct xfs_trans **tpp, - struct xfs_inode *ip, - struct xfs_inode *tip) + struct xfs_swapext_req *req) { + struct xfs_inode *ip = req->ip2; + struct xfs_inode *tip = req->ip1; xfs_filblks_t aforkblks = 0; xfs_filblks_t taforkblks = 0; xfs_extnum_t junk; @@ -1556,6 +1557,11 @@ xfs_swap_extents( struct xfs_inode *tip, /* tmp inode */ struct xfs_swapext *sxp) { + struct xfs_swapext_req req = { + .ip1 = tip, + .ip2 = ip, + .whichfork = XFS_DATA_FORK, + }; struct xfs_mount *mp = ip->i_mount; struct xfs_trans *tp; struct xfs_bstat *sbp = &sxp->sx_stat; @@ -1689,19 +1695,12 @@ xfs_swap_extents( * recovery is going to see the fork as owned by the swapped inode, * not the pre-swapped inodes. 
*/ + req.blockcount = XFS_B_TO_FSB(ip->i_mount, i_size_read(VFS_I(ip))); if (xfs_has_rmapbt(mp)) { - struct xfs_swapext_req req = { - .ip1 = tip, - .ip2 = ip, - .whichfork = XFS_DATA_FORK, - .blockcount = XFS_B_TO_FSB(ip->i_mount, - i_size_read(VFS_I(ip))), - }; - xfs_swapext(tp, &req); error = xfs_defer_finish(&tp); } else - error = xfs_swap_extent_forks(&tp, ip, tip); + error = xfs_swap_extent_forks(&tp, &req); if (error) { trace_xfs_swap_extent_error(ip, error, _THIS_IP_); goto out_trans_cancel; From patchwork Fri Dec 30 22:13:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13084981 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D04B4C3DA7D for ; Fri, 30 Dec 2022 23:53:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235806AbiL3Xxu (ORCPT ); Fri, 30 Dec 2022 18:53:50 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50110 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235736AbiL3Xxt (ORCPT ); Fri, 30 Dec 2022 18:53:49 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 454FA1E3D6; Fri, 30 Dec 2022 15:53:48 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id D627A61B98; Fri, 30 Dec 2022 23:53:47 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 455D6C433D2; Fri, 30 Dec 2022 23:53:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672444427; 
Subject: [PATCH 14/21] xfs: allow xfs_swap_range to use older extent swap algorithms From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Fri, 30 Dec 2022 14:13:57 -0800 Message-ID: <167243843727.699466.11955722742191147402.stgit@magnolia> In-Reply-To: <167243843494.699466.5163281976943635014.stgit@magnolia> References: <167243843494.699466.5163281976943635014.stgit@magnolia> From: Darrick J. Wong If userspace permits non-atomic swap operations, use the older code paths to implement the same functionality. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_bmap_util.c | 4 +- fs/xfs/xfs_bmap_util.h | 4 ++ fs/xfs/xfs_xchgrange.c | 96 +++++++++++++++++++++++++++++++++++++++++++----- 3 files changed, 92 insertions(+), 12 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 9d6337a05544..e8562c4de7eb 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1261,7 +1261,7 @@ xfs_insert_file_space( * reject and log the attempt. basically we are putting the responsibility on * userspace to get this right. */ -static int +int xfs_swap_extents_check_format( struct xfs_inode *ip, /* target inode */ struct xfs_inode *tip) /* tmp inode */ @@ -1403,7 +1403,7 @@ xfs_swap_change_owner( } /* Swap the extents of two files by swapping data forks.
*/ -STATIC int +int xfs_swap_extent_forks( struct xfs_trans **tpp, struct xfs_swapext_req *req) diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h index 6888078f5c31..39c71da08403 100644 --- a/fs/xfs/xfs_bmap_util.h +++ b/fs/xfs/xfs_bmap_util.h @@ -69,6 +69,10 @@ int xfs_free_eofblocks(struct xfs_inode *ip); int xfs_swap_extents(struct xfs_inode *ip, struct xfs_inode *tip, struct xfs_swapext *sx); +struct xfs_swapext_req; +int xfs_swap_extent_forks(struct xfs_trans **tpp, struct xfs_swapext_req *req); +int xfs_swap_extents_check_format(struct xfs_inode *ip, struct xfs_inode *tip); + xfs_daddr_t xfs_fsb_to_db(struct xfs_inode *ip, xfs_fsblock_t fsb); xfs_extnum_t xfs_bmap_count_leaves(struct xfs_ifork *ifp, xfs_filblks_t *count); diff --git a/fs/xfs/xfs_xchgrange.c b/fs/xfs/xfs_xchgrange.c index 9966938134c0..2b7aedc49923 100644 --- a/fs/xfs/xfs_xchgrange.c +++ b/fs/xfs/xfs_xchgrange.c @@ -297,6 +297,33 @@ xfs_xchg_range_rele_log_assist( xlog_drop_incompat_feat(mp->m_log, XLOG_INCOMPAT_FEAT_SWAPEXT); } +/* Decide if we can use the old data fork exchange code. */ +static inline bool +xfs_xchg_use_forkswap( + const struct file_xchg_range *fxr, + struct xfs_inode *ip1, + struct xfs_inode *ip2) +{ + if (!(fxr->flags & FILE_XCHG_RANGE_NONATOMIC)) + return false; + if (!(fxr->flags & FILE_XCHG_RANGE_FULL_FILES)) + return false; + if (fxr->flags & FILE_XCHG_RANGE_TO_EOF) + return false; + if (fxr->file1_offset != 0 || fxr->file2_offset != 0) + return false; + if (fxr->length != ip1->i_disk_size) + return false; + if (fxr->length != ip2->i_disk_size) + return false; + return true; +} + +enum xchg_strategy { + SWAPEXT = 1, /* xfs_swapext() */ + FORKSWAP = 2, /* exchange forks */ +}; + /* Exchange the contents of two files. 
*/ int xfs_xchg_range( @@ -316,19 +343,13 @@ xfs_xchg_range( }; struct xfs_trans *tp; unsigned int qretry; + unsigned int flags = 0; bool retried = false; + enum xchg_strategy strategy; int error; trace_xfs_xchg_range(ip1, fxr, ip2, xchg_flags); - /* - * This function only supports using log intent items (SXI items if - * atomic exchange is required, or BUI items if not) to exchange file - * data. The legacy whole-fork swap will be ported in a later patch. - */ - if (!(xchg_flags & XFS_XCHG_RANGE_LOGGED) && !xfs_swapext_supported(mp)) - return -EOPNOTSUPP; - if (fxr->flags & FILE_XCHG_RANGE_TO_EOF) req.req_flags |= XFS_SWAP_REQ_SET_SIZES; if (fxr->flags & FILE_XCHG_RANGE_SKIP_FILE1_HOLES) @@ -340,10 +361,25 @@ xfs_xchg_range( if (error) return error; + /* + * We haven't decided which exchange strategy we want to use yet, but + * here we must choose if we want freed blocks during the swap to be + * added to the transaction block reservation (RES_FDBLKS) or freed + * into the global fdblocks. The legacy fork swap mechanism doesn't + * free any blocks, so it doesn't require it. It is also the only + * option that works for older filesystems. + * + * The bmap log intent items that were added with rmap and reflink can + * change the bmbt shape, so the intent-based swap strategies require + * us to set RES_FDBLKS. + */ + if (xfs_has_lazysbcount(mp)) + flags |= XFS_TRANS_RES_FDBLKS; + retry: /* Allocate the transaction, lock the inodes, and join them. */ error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, req.resblks, 0, - XFS_TRANS_RES_FDBLKS, &tp); + flags, &tp); if (error) return error; @@ -386,6 +422,40 @@ xfs_xchg_range( if (error) goto out_trans_cancel; + if ((xchg_flags & XFS_XCHG_RANGE_LOGGED) || xfs_swapext_supported(mp)) { + /* + * xfs_swapext() uses deferred bmap log intent items to swap + * extents between file forks. If the atomic log swap feature + * is enabled, it will also use swapext log intent items to + * restart the operation in case of failure. 
+ *
+ * This means that we can use it if we previously obtained
+ * permission from the log to use log-assisted atomic extent
+ * swapping; or if the fs supports rmap or reflink and the
+ * user said NONATOMIC.
+ */
+		strategy = SWAPEXT;
+	} else if (xfs_xchg_use_forkswap(fxr, ip1, ip2)) {
+		/*
+		 * Exchange the file contents by using the old bmap fork
+		 * exchange code, if we're a defrag tool doing a full file
+		 * swap.
+		 */
+		strategy = FORKSWAP;
+
+		error = xfs_swap_extents_check_format(ip2, ip1);
+		if (error) {
+			xfs_notice(mp,
+		"%s: inode 0x%llx format is incompatible for exchanging.",
+					__func__, ip2->i_ino);
+			goto out_trans_cancel;
+		}
+	} else {
+		/* We cannot exchange the file contents. */
+		error = -EOPNOTSUPP;
+		goto out_trans_cancel;
+	}
+
 	/* If we got this far on a dry run, all parameters are ok. */
 	if (fxr->flags & FILE_XCHG_RANGE_DRY_RUN)
 		goto out_trans_cancel;
@@ -398,7 +468,13 @@ xfs_xchg_range(
 	xfs_trans_ichgtime(tp, ip2, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
 
-	xfs_swapext(tp, &req);
+	if (strategy == SWAPEXT) {
+		xfs_swapext(tp, &req);
+	} else {
+		error = xfs_swap_extent_forks(&tp, &req);
+		if (error)
+			goto out_trans_cancel;
+	}
 
 	/*
 	 * Force the log to persist metadata updates if the caller or the

From patchwork Fri Dec 30 22:13:57 2022
Subject: [PATCH 15/21] xfs: remove old swap extents implementation
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org
Date: Fri, 30 Dec 2022 14:13:57 -0800
Message-ID: <167243843741.699466.605234553596100810.stgit@magnolia>
In-Reply-To: <167243843494.699466.5163281976943635014.stgit@magnolia>

From: Darrick J. Wong

Migrate the old XFS_IOC_SWAPEXT implementation to use our shiny new one.

Signed-off-by: Darrick J. Wong
---
 fs/xfs/xfs_bmap_util.c |  491 ------------------------------------------------
 fs/xfs/xfs_bmap_util.h |    7 -
 fs/xfs/xfs_ioctl.c     |  102 +++-------
 fs/xfs/xfs_ioctl.h     |    4 
 fs/xfs/xfs_ioctl32.c   |   11 -
 fs/xfs/xfs_xchgrange.c |  299 +++++++++++++++++++++++++++++
 6 files changed, 334 insertions(+), 580 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index e8562c4de7eb..47a583a94d58 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1240,494 +1240,3 @@ xfs_insert_file_space(
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return error;
 }
-
-/*
- * We need to check that the format of the data fork in the temporary inode is
- * valid for the target inode before doing the swap. This is not a problem with
- * attr1 because of the fixed fork offset, but attr2 has a dynamically sized
- * data fork depending on the space the attribute fork is taking so we can get
- * invalid formats on the target inode.
- *
- * E.g. target has space for 7 extents in extent format, temp inode only has
- * space for 6. If we defragment down to 7 extents, then the tmp format is a
- * btree, but when swapped it needs to be in extent format. Hence we can't just
- * blindly swap data forks on attr2 filesystems.
- *
- * Note that we check the swap in both directions so that we don't end up with
- * a corrupt temporary inode, either.
- * - * Note that fixing the way xfs_fsr sets up the attribute fork in the source - * inode will prevent this situation from occurring, so all we do here is - * reject and log the attempt. basically we are putting the responsibility on - * userspace to get this right. - */ -int -xfs_swap_extents_check_format( - struct xfs_inode *ip, /* target inode */ - struct xfs_inode *tip) /* tmp inode */ -{ - struct xfs_ifork *ifp = &ip->i_df; - struct xfs_ifork *tifp = &tip->i_df; - - /* User/group/project quota ids must match if quotas are enforced. */ - if (XFS_IS_QUOTA_ON(ip->i_mount) && - (!uid_eq(VFS_I(ip)->i_uid, VFS_I(tip)->i_uid) || - !gid_eq(VFS_I(ip)->i_gid, VFS_I(tip)->i_gid) || - ip->i_projid != tip->i_projid)) - return -EINVAL; - - /* Should never get a local format */ - if (ifp->if_format == XFS_DINODE_FMT_LOCAL || - tifp->if_format == XFS_DINODE_FMT_LOCAL) - return -EINVAL; - - /* - * if the target inode has less extents that then temporary inode then - * why did userspace call us? - */ - if (ifp->if_nextents < tifp->if_nextents) - return -EINVAL; - - /* - * If we have to use the (expensive) rmap swap method, we can - * handle any number of extents and any format. - */ - if (xfs_has_rmapbt(ip->i_mount)) - return 0; - - /* - * if the target inode is in extent form and the temp inode is in btree - * form then we will end up with the target inode in the wrong format - * as we already know there are less extents in the temp inode. 
- */ - if (ifp->if_format == XFS_DINODE_FMT_EXTENTS && - tifp->if_format == XFS_DINODE_FMT_BTREE) - return -EINVAL; - - /* Check temp in extent form to max in target */ - if (tifp->if_format == XFS_DINODE_FMT_EXTENTS && - tifp->if_nextents > XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK)) - return -EINVAL; - - /* Check target in extent form to max in temp */ - if (ifp->if_format == XFS_DINODE_FMT_EXTENTS && - ifp->if_nextents > XFS_IFORK_MAXEXT(tip, XFS_DATA_FORK)) - return -EINVAL; - - /* - * If we are in a btree format, check that the temp root block will fit - * in the target and that it has enough extents to be in btree format - * in the target. - * - * Note that we have to be careful to allow btree->extent conversions - * (a common defrag case) which will occur when the temp inode is in - * extent format... - */ - if (tifp->if_format == XFS_DINODE_FMT_BTREE) { - if (xfs_inode_has_attr_fork(ip) && - XFS_BMAP_BMDR_SPACE(tifp->if_broot) > xfs_inode_fork_boff(ip)) - return -EINVAL; - if (tifp->if_nextents <= XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK)) - return -EINVAL; - } - - /* Reciprocal target->temp btree format checks */ - if (ifp->if_format == XFS_DINODE_FMT_BTREE) { - if (xfs_inode_has_attr_fork(tip) && - XFS_BMAP_BMDR_SPACE(ip->i_df.if_broot) > xfs_inode_fork_boff(tip)) - return -EINVAL; - if (ifp->if_nextents <= XFS_IFORK_MAXEXT(tip, XFS_DATA_FORK)) - return -EINVAL; - } - - return 0; -} - -static int -xfs_swap_extent_flush( - struct xfs_inode *ip) -{ - int error; - - error = filemap_write_and_wait(VFS_I(ip)->i_mapping); - if (error) - return error; - truncate_pagecache_range(VFS_I(ip), 0, -1); - - /* Verify O_DIRECT for ftmp */ - if (VFS_I(ip)->i_mapping->nrpages) - return -EINVAL; - return 0; -} - -/* - * Fix up the owners of the bmbt blocks to refer to the current inode. The - * change owner scan attempts to order all modified buffers in the current - * transaction. 
In the event of ordered buffer failure, the offending buffer is - * physically logged as a fallback and the scan returns -EAGAIN. We must roll - * the transaction in this case to replenish the fallback log reservation and - * restart the scan. This process repeats until the scan completes. - */ -static int -xfs_swap_change_owner( - struct xfs_trans **tpp, - struct xfs_inode *ip, - struct xfs_inode *tmpip) -{ - int error; - struct xfs_trans *tp = *tpp; - - do { - error = xfs_bmbt_change_owner(tp, ip, XFS_DATA_FORK, ip->i_ino, - NULL); - /* success or fatal error */ - if (error != -EAGAIN) - break; - - error = xfs_trans_roll(tpp); - if (error) - break; - tp = *tpp; - - /* - * Redirty both inodes so they can relog and keep the log tail - * moving forward. - */ - xfs_trans_ijoin(tp, ip, 0); - xfs_trans_ijoin(tp, tmpip, 0); - xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); - xfs_trans_log_inode(tp, tmpip, XFS_ILOG_CORE); - } while (true); - - return error; -} - -/* Swap the extents of two files by swapping data forks. */ -int -xfs_swap_extent_forks( - struct xfs_trans **tpp, - struct xfs_swapext_req *req) -{ - struct xfs_inode *ip = req->ip2; - struct xfs_inode *tip = req->ip1; - xfs_filblks_t aforkblks = 0; - xfs_filblks_t taforkblks = 0; - xfs_extnum_t junk; - uint64_t tmp; - int src_log_flags = XFS_ILOG_CORE; - int target_log_flags = XFS_ILOG_CORE; - int error; - - /* - * Count the number of extended attribute blocks - */ - if (xfs_inode_has_attr_fork(ip) && ip->i_af.if_nextents > 0 && - ip->i_af.if_format != XFS_DINODE_FMT_LOCAL) { - error = xfs_bmap_count_blocks(*tpp, ip, XFS_ATTR_FORK, &junk, - &aforkblks); - if (error) - return error; - } - if (xfs_inode_has_attr_fork(tip) && tip->i_af.if_nextents > 0 && - tip->i_af.if_format != XFS_DINODE_FMT_LOCAL) { - error = xfs_bmap_count_blocks(*tpp, tip, XFS_ATTR_FORK, &junk, - &taforkblks); - if (error) - return error; - } - - /* - * Btree format (v3) inodes have the inode number stamped in the bmbt - * block headers. 
We can't start changing the bmbt blocks until the - * inode owner change is logged so recovery does the right thing in the - * event of a crash. Set the owner change log flags now and leave the - * bmbt scan as the last step. - */ - if (xfs_has_v3inodes(ip->i_mount)) { - if (ip->i_df.if_format == XFS_DINODE_FMT_BTREE) - target_log_flags |= XFS_ILOG_DOWNER; - if (tip->i_df.if_format == XFS_DINODE_FMT_BTREE) - src_log_flags |= XFS_ILOG_DOWNER; - } - - /* - * Swap the data forks of the inodes - */ - swap(ip->i_df, tip->i_df); - - /* - * Fix the on-disk inode values - */ - tmp = (uint64_t)ip->i_nblocks; - ip->i_nblocks = tip->i_nblocks - taforkblks + aforkblks; - tip->i_nblocks = tmp + taforkblks - aforkblks; - - /* - * The extents in the source inode could still contain speculative - * preallocation beyond EOF (e.g. the file is open but not modified - * while defrag is in progress). In that case, we need to copy over the - * number of delalloc blocks the data fork in the source inode is - * tracking beyond EOF so that when the fork is truncated away when the - * temporary inode is unlinked we don't underrun the i_delayed_blks - * counter on that inode. - */ - ASSERT(tip->i_delayed_blks == 0); - tip->i_delayed_blks = ip->i_delayed_blks; - ip->i_delayed_blks = 0; - - switch (ip->i_df.if_format) { - case XFS_DINODE_FMT_EXTENTS: - src_log_flags |= XFS_ILOG_DEXT; - break; - case XFS_DINODE_FMT_BTREE: - ASSERT(!xfs_has_v3inodes(ip->i_mount) || - (src_log_flags & XFS_ILOG_DOWNER)); - src_log_flags |= XFS_ILOG_DBROOT; - break; - } - - switch (tip->i_df.if_format) { - case XFS_DINODE_FMT_EXTENTS: - target_log_flags |= XFS_ILOG_DEXT; - break; - case XFS_DINODE_FMT_BTREE: - target_log_flags |= XFS_ILOG_DBROOT; - ASSERT(!xfs_has_v3inodes(ip->i_mount) || - (target_log_flags & XFS_ILOG_DOWNER)); - break; - } - - /* Do we have to swap reflink flags? 
*/ - if ((ip->i_diflags2 & XFS_DIFLAG2_REFLINK) ^ - (tip->i_diflags2 & XFS_DIFLAG2_REFLINK)) { - uint64_t f; - - f = ip->i_diflags2 & XFS_DIFLAG2_REFLINK; - ip->i_diflags2 &= ~XFS_DIFLAG2_REFLINK; - ip->i_diflags2 |= tip->i_diflags2 & XFS_DIFLAG2_REFLINK; - tip->i_diflags2 &= ~XFS_DIFLAG2_REFLINK; - tip->i_diflags2 |= f & XFS_DIFLAG2_REFLINK; - } - - /* Swap the cow forks. */ - if (xfs_has_reflink(ip->i_mount)) { - ASSERT(!ip->i_cowfp || - ip->i_cowfp->if_format == XFS_DINODE_FMT_EXTENTS); - ASSERT(!tip->i_cowfp || - tip->i_cowfp->if_format == XFS_DINODE_FMT_EXTENTS); - - swap(ip->i_cowfp, tip->i_cowfp); - - if (ip->i_cowfp && ip->i_cowfp->if_bytes) - xfs_inode_set_cowblocks_tag(ip); - else - xfs_inode_clear_cowblocks_tag(ip); - if (tip->i_cowfp && tip->i_cowfp->if_bytes) - xfs_inode_set_cowblocks_tag(tip); - else - xfs_inode_clear_cowblocks_tag(tip); - } - - xfs_trans_log_inode(*tpp, ip, src_log_flags); - xfs_trans_log_inode(*tpp, tip, target_log_flags); - - /* - * The extent forks have been swapped, but crc=1,rmapbt=0 filesystems - * have inode number owner values in the bmbt blocks that still refer to - * the old inode. Scan each bmbt to fix up the owner values with the - * inode number of the current inode. - */ - if (src_log_flags & XFS_ILOG_DOWNER) { - error = xfs_swap_change_owner(tpp, ip, tip); - if (error) - return error; - } - if (target_log_flags & XFS_ILOG_DOWNER) { - error = xfs_swap_change_owner(tpp, tip, ip); - if (error) - return error; - } - - return 0; -} - -int -xfs_swap_extents( - struct xfs_inode *ip, /* target inode */ - struct xfs_inode *tip, /* tmp inode */ - struct xfs_swapext *sxp) -{ - struct xfs_swapext_req req = { - .ip1 = tip, - .ip2 = ip, - .whichfork = XFS_DATA_FORK, - }; - struct xfs_mount *mp = ip->i_mount; - struct xfs_trans *tp; - struct xfs_bstat *sbp = &sxp->sx_stat; - int error = 0; - int resblks = 0; - unsigned int flags = 0; - - /* - * Lock the inodes against other IO, page faults and truncate to - * begin with. 
Then we can ensure the inodes are flushed and have no - * page cache safely. Once we have done this we can take the ilocks and - * do the rest of the checks. - */ - lock_two_nondirectories(VFS_I(ip), VFS_I(tip)); - filemap_invalidate_lock_two(VFS_I(ip)->i_mapping, - VFS_I(tip)->i_mapping); - - /* Verify that both files have the same format */ - if ((VFS_I(ip)->i_mode & S_IFMT) != (VFS_I(tip)->i_mode & S_IFMT)) { - error = -EINVAL; - goto out_unlock; - } - - /* Verify both files are either real-time or non-realtime */ - if (XFS_IS_REALTIME_INODE(ip) != XFS_IS_REALTIME_INODE(tip)) { - error = -EINVAL; - goto out_unlock; - } - - error = xfs_qm_dqattach(ip); - if (error) - goto out_unlock; - - error = xfs_qm_dqattach(tip); - if (error) - goto out_unlock; - - error = xfs_swap_extent_flush(ip); - if (error) - goto out_unlock; - error = xfs_swap_extent_flush(tip); - if (error) - goto out_unlock; - - if (xfs_inode_has_cow_data(tip)) { - error = xfs_reflink_cancel_cow_range(tip, 0, NULLFILEOFF, true); - if (error) - goto out_unlock; - } - - /* - * Extent "swapping" with rmap requires a permanent reservation and - * a block reservation because it's really just a remap operation - * performed with log redo items! - */ - if (xfs_has_rmapbt(mp)) { - int w = XFS_DATA_FORK; - uint32_t ipnext = ip->i_df.if_nextents; - uint32_t tipnext = tip->i_df.if_nextents; - - /* - * Conceptually this shouldn't affect the shape of either bmbt, - * but since we atomically move extents one by one, we reserve - * enough space to rebuild both trees. - */ - resblks = XFS_SWAP_RMAP_SPACE_RES(mp, ipnext, w); - resblks += XFS_SWAP_RMAP_SPACE_RES(mp, tipnext, w); - - /* - * If either inode straddles a bmapbt block allocation boundary, - * the rmapbt algorithm triggers repeated allocs and frees as - * extents are remapped. This can exhaust the block reservation - * prematurely and cause shutdown. Return freed blocks to the - * transaction reservation to counter this behavior. 
- */ - flags |= XFS_TRANS_RES_FDBLKS; - } - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, flags, - &tp); - if (error) - goto out_unlock; - - /* - * Lock and join the inodes to the tansaction so that transaction commit - * or cancel will unlock the inodes from this point onwards. - */ - xfs_lock_two_inodes(ip, XFS_ILOCK_EXCL, tip, XFS_ILOCK_EXCL); - xfs_trans_ijoin(tp, ip, 0); - xfs_trans_ijoin(tp, tip, 0); - - - /* Verify all data are being swapped */ - if (sxp->sx_offset != 0 || - sxp->sx_length != ip->i_disk_size || - sxp->sx_length != tip->i_disk_size) { - error = -EFAULT; - goto out_trans_cancel; - } - - trace_xfs_swap_extent_before(ip, 0); - trace_xfs_swap_extent_before(tip, 1); - - /* check inode formats now that data is flushed */ - error = xfs_swap_extents_check_format(ip, tip); - if (error) { - xfs_notice(mp, - "%s: inode 0x%llx format is incompatible for exchanging.", - __func__, ip->i_ino); - goto out_trans_cancel; - } - - /* - * Compare the current change & modify times with that - * passed in. If they differ, we abort this swap. - * This is the mechanism used to ensure the calling - * process that the file was not changed out from - * under it. - */ - if ((sbp->bs_ctime.tv_sec != VFS_I(ip)->i_ctime.tv_sec) || - (sbp->bs_ctime.tv_nsec != VFS_I(ip)->i_ctime.tv_nsec) || - (sbp->bs_mtime.tv_sec != VFS_I(ip)->i_mtime.tv_sec) || - (sbp->bs_mtime.tv_nsec != VFS_I(ip)->i_mtime.tv_nsec)) { - error = -EBUSY; - goto out_trans_cancel; - } - - /* - * Note the trickiness in setting the log flags - we set the owner log - * flag on the opposite inode (i.e. the inode we are setting the new - * owner to be) because once we swap the forks and log that, log - * recovery is going to see the fork as owned by the swapped inode, - * not the pre-swapped inodes. 
- */ - req.blockcount = XFS_B_TO_FSB(ip->i_mount, i_size_read(VFS_I(ip))); - if (xfs_has_rmapbt(mp)) { - xfs_swapext(tp, &req); - error = xfs_defer_finish(&tp); - } else - error = xfs_swap_extent_forks(&tp, &req); - if (error) { - trace_xfs_swap_extent_error(ip, error, _THIS_IP_); - goto out_trans_cancel; - } - - /* - * If this is a synchronous mount, make sure that the - * transaction goes to disk before returning to the user. - */ - if (xfs_has_wsync(mp)) - xfs_trans_set_sync(tp); - - error = xfs_trans_commit(tp); - - trace_xfs_swap_extent_after(ip, 0); - trace_xfs_swap_extent_after(tip, 1); - -out_unlock_ilock: - xfs_iunlock(ip, XFS_ILOCK_EXCL); - xfs_iunlock(tip, XFS_ILOCK_EXCL); -out_unlock: - filemap_invalidate_unlock_two(VFS_I(ip)->i_mapping, - VFS_I(tip)->i_mapping); - unlock_two_nondirectories(VFS_I(ip), VFS_I(tip)); - return error; - -out_trans_cancel: - xfs_trans_cancel(tp); - goto out_unlock_ilock; -} diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h index 39c71da08403..8eb7166aa9d4 100644 --- a/fs/xfs/xfs_bmap_util.h +++ b/fs/xfs/xfs_bmap_util.h @@ -66,13 +66,6 @@ int xfs_insert_file_space(struct xfs_inode *, xfs_off_t offset, bool xfs_can_free_eofblocks(struct xfs_inode *ip, bool force); int xfs_free_eofblocks(struct xfs_inode *ip); -int xfs_swap_extents(struct xfs_inode *ip, struct xfs_inode *tip, - struct xfs_swapext *sx); - -struct xfs_swapext_req; -int xfs_swap_extent_forks(struct xfs_trans **tpp, struct xfs_swapext_req *req); -int xfs_swap_extents_check_format(struct xfs_inode *ip, struct xfs_inode *tip); - xfs_daddr_t xfs_fsb_to_db(struct xfs_inode *ip, xfs_fsblock_t fsb); xfs_extnum_t xfs_bmap_count_leaves(struct xfs_ifork *ifp, xfs_filblks_t *count); diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 4b2a02a08dfa..85c33142c5ab 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1655,81 +1655,43 @@ xfs_ioc_scrub_metadata( int xfs_ioc_swapext( - xfs_swapext_t *sxp) + struct xfs_swapext *sxp) { - xfs_inode_t *ip, *tip; 
- struct fd f, tmp; - int error = 0; + struct file_xchg_range fxr = { 0 }; + struct fd fd2, fd1; + int error = 0; - /* Pull information for the target fd */ - f = fdget((int)sxp->sx_fdtarget); - if (!f.file) { - error = -EINVAL; - goto out; - } - - if (!(f.file->f_mode & FMODE_WRITE) || - !(f.file->f_mode & FMODE_READ) || - (f.file->f_flags & O_APPEND)) { - error = -EBADF; - goto out_put_file; - } + fd2 = fdget((int)sxp->sx_fdtarget); + if (!fd2.file) + return -EINVAL; - tmp = fdget((int)sxp->sx_fdtmp); - if (!tmp.file) { + fd1 = fdget((int)sxp->sx_fdtmp); + if (!fd1.file) { error = -EINVAL; - goto out_put_file; + goto dest_fdput; } - if (!(tmp.file->f_mode & FMODE_WRITE) || - !(tmp.file->f_mode & FMODE_READ) || - (tmp.file->f_flags & O_APPEND)) { - error = -EBADF; - goto out_put_tmp_file; - } + fxr.file1_fd = sxp->sx_fdtmp; + fxr.length = sxp->sx_length; + fxr.flags = FILE_XCHG_RANGE_NONATOMIC | FILE_XCHG_RANGE_FILE2_FRESH | + FILE_XCHG_RANGE_FULL_FILES; + fxr.file2_ino = sxp->sx_stat.bs_ino; + fxr.file2_mtime = sxp->sx_stat.bs_mtime.tv_sec; + fxr.file2_ctime = sxp->sx_stat.bs_ctime.tv_sec; + fxr.file2_mtime_nsec = sxp->sx_stat.bs_mtime.tv_nsec; + fxr.file2_ctime_nsec = sxp->sx_stat.bs_ctime.tv_nsec; - if (IS_SWAPFILE(file_inode(f.file)) || - IS_SWAPFILE(file_inode(tmp.file))) { - error = -EINVAL; - goto out_put_tmp_file; - } + error = vfs_xchg_file_range(fd1.file, fd2.file, &fxr); /* - * We need to ensure that the fds passed in point to XFS inodes - * before we cast and access them as XFS structures as we have no - * control over what the user passes us here. + * The old implementation returned EFAULT if the swap range was not + * the entirety of both files. 
*/ - if (f.file->f_op != &xfs_file_operations || - tmp.file->f_op != &xfs_file_operations) { - error = -EINVAL; - goto out_put_tmp_file; - } - - ip = XFS_I(file_inode(f.file)); - tip = XFS_I(file_inode(tmp.file)); - - if (ip->i_mount != tip->i_mount) { - error = -EINVAL; - goto out_put_tmp_file; - } - - if (ip->i_ino == tip->i_ino) { - error = -EINVAL; - goto out_put_tmp_file; - } - - if (xfs_is_shutdown(ip->i_mount)) { - error = -EIO; - goto out_put_tmp_file; - } - - error = xfs_swap_extents(ip, tip, sxp); - - out_put_tmp_file: - fdput(tmp); - out_put_file: - fdput(f); - out: + if (error == -EDOM) + error = -EFAULT; + fdput(fd1); +dest_fdput: + fdput(fd2); return error; } @@ -1988,14 +1950,10 @@ xfs_file_ioctl( case XFS_IOC_SWAPEXT: { struct xfs_swapext sxp; - if (copy_from_user(&sxp, arg, sizeof(xfs_swapext_t))) + if (copy_from_user(&sxp, arg, sizeof(struct xfs_swapext))) return -EFAULT; - error = mnt_want_write_file(filp); - if (error) - return error; - error = xfs_ioc_swapext(&sxp); - mnt_drop_write_file(filp); - return error; + + return xfs_ioc_swapext(&sxp); } case XFS_IOC_FSCOUNTS: { diff --git a/fs/xfs/xfs_ioctl.h b/fs/xfs/xfs_ioctl.h index d4abba2c13c1..e3f72d816e0e 100644 --- a/fs/xfs/xfs_ioctl.h +++ b/fs/xfs/xfs_ioctl.h @@ -10,9 +10,7 @@ struct xfs_bstat; struct xfs_ibulk; struct xfs_inogrp; -int -xfs_ioc_swapext( - xfs_swapext_t *sxp); +int xfs_ioc_swapext(struct xfs_swapext *sxp); extern int xfs_find_handle( diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c index 2f54b701eead..885d6e58d7ec 100644 --- a/fs/xfs/xfs_ioctl32.c +++ b/fs/xfs/xfs_ioctl32.c @@ -425,7 +425,6 @@ xfs_file_compat_ioctl( struct inode *inode = file_inode(filp); struct xfs_inode *ip = XFS_I(inode); void __user *arg = compat_ptr(p); - int error; trace_xfs_file_compat_ioctl(ip); @@ -435,6 +434,7 @@ xfs_file_compat_ioctl( return xfs_compat_ioc_fsgeometry_v1(ip->i_mount, arg); case XFS_IOC_FSGROWFSDATA_32: { struct xfs_growfs_data in; + int error; if 
(xfs_compat_growfs_data_copyin(&in, arg)) return -EFAULT; @@ -447,6 +447,7 @@ xfs_file_compat_ioctl( } case XFS_IOC_FSGROWFSRT_32: { struct xfs_growfs_rt in; + int error; if (xfs_compat_growfs_rt_copyin(&in, arg)) return -EFAULT; @@ -471,12 +472,8 @@ xfs_file_compat_ioctl( offsetof(struct xfs_swapext, sx_stat)) || xfs_ioctl32_bstat_copyin(&sxp.sx_stat, &sxu->sx_stat)) return -EFAULT; - error = mnt_want_write_file(filp); - if (error) - return error; - error = xfs_ioc_swapext(&sxp); - mnt_drop_write_file(filp); - return error; + + return xfs_ioc_swapext(&sxp); } case XFS_IOC_FSBULKSTAT_32: case XFS_IOC_FSBULKSTAT_SINGLE_32: diff --git a/fs/xfs/xfs_xchgrange.c b/fs/xfs/xfs_xchgrange.c index 2b7aedc49923..27bb88dcf228 100644 --- a/fs/xfs/xfs_xchgrange.c +++ b/fs/xfs/xfs_xchgrange.c @@ -2,6 +2,11 @@ /* * Copyright (C) 2022 Oracle. All Rights Reserved. * Author: Darrick J. Wong + * + * The xfs_swap_extent_* functions are: + * Copyright (c) 2000-2006 Silicon Graphics, Inc. + * Copyright (c) 2012 Red Hat, Inc. + * All Rights Reserved. */ #include "xfs.h" #include "xfs_fs.h" @@ -15,6 +20,7 @@ #include "xfs_trans.h" #include "xfs_quota.h" #include "xfs_bmap_util.h" +#include "xfs_bmap_btree.h" #include "xfs_reflink.h" #include "xfs_trace.h" #include "xfs_swapext.h" @@ -71,6 +77,299 @@ xfs_xchg_range_estimate( return error; } +/* + * We need to check that the format of the data fork in the temporary inode is + * valid for the target inode before doing the swap. This is not a problem with + * attr1 because of the fixed fork offset, but attr2 has a dynamically sized + * data fork depending on the space the attribute fork is taking so we can get + * invalid formats on the target inode. + * + * E.g. target has space for 7 extents in extent format, temp inode only has + * space for 6. If we defragment down to 7 extents, then the tmp format is a + * btree, but when swapped it needs to be in extent format. Hence we can't just + * blindly swap data forks on attr2 filesystems. 
+ *
+ * Note that we check the swap in both directions so that we don't end up with
+ * a corrupt temporary inode, either.
+ *
+ * Note that fixing the way xfs_fsr sets up the attribute fork in the source
+ * inode will prevent this situation from occurring, so all we do here is
+ * reject and log the attempt.  Basically we are putting the responsibility on
+ * userspace to get this right.
+ */
+STATIC int
+xfs_swap_extents_check_format(
+	struct xfs_inode	*ip,	/* target inode */
+	struct xfs_inode	*tip)	/* tmp inode */
+{
+	struct xfs_ifork	*ifp = &ip->i_df;
+	struct xfs_ifork	*tifp = &tip->i_df;
+
+	/* User/group/project quota ids must match if quotas are enforced. */
+	if (XFS_IS_QUOTA_ON(ip->i_mount) &&
+	    (!uid_eq(VFS_I(ip)->i_uid, VFS_I(tip)->i_uid) ||
+	     !gid_eq(VFS_I(ip)->i_gid, VFS_I(tip)->i_gid) ||
+	     ip->i_projid != tip->i_projid))
+		return -EINVAL;
+
+	/* Should never get a local format */
+	if (ifp->if_format == XFS_DINODE_FMT_LOCAL ||
+	    tifp->if_format == XFS_DINODE_FMT_LOCAL)
+		return -EINVAL;
+
+	/*
+	 * If the target inode has fewer extents than the temporary inode then
+	 * why did userspace call us?
+	 */
+	if (ifp->if_nextents < tifp->if_nextents)
+		return -EINVAL;
+
+	/*
+	 * If we have to use the (expensive) rmap swap method, we can
+	 * handle any number of extents and any format.
+	 */
+	if (xfs_has_rmapbt(ip->i_mount))
+		return 0;
+
+	/*
+	 * If the target inode is in extent form and the temp inode is in btree
+	 * form then we will end up with the target inode in the wrong format
+	 * as we already know there are fewer extents in the temp inode.
+ */ + if (ifp->if_format == XFS_DINODE_FMT_EXTENTS && + tifp->if_format == XFS_DINODE_FMT_BTREE) + return -EINVAL; + + /* Check temp in extent form to max in target */ + if (tifp->if_format == XFS_DINODE_FMT_EXTENTS && + tifp->if_nextents > XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK)) + return -EINVAL; + + /* Check target in extent form to max in temp */ + if (ifp->if_format == XFS_DINODE_FMT_EXTENTS && + ifp->if_nextents > XFS_IFORK_MAXEXT(tip, XFS_DATA_FORK)) + return -EINVAL; + + /* + * If we are in a btree format, check that the temp root block will fit + * in the target and that it has enough extents to be in btree format + * in the target. + * + * Note that we have to be careful to allow btree->extent conversions + * (a common defrag case) which will occur when the temp inode is in + * extent format... + */ + if (tifp->if_format == XFS_DINODE_FMT_BTREE) { + if (xfs_inode_has_attr_fork(ip) && + XFS_BMAP_BMDR_SPACE(tifp->if_broot) > xfs_inode_fork_boff(ip)) + return -EINVAL; + if (tifp->if_nextents <= XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK)) + return -EINVAL; + } + + /* Reciprocal target->temp btree format checks */ + if (ifp->if_format == XFS_DINODE_FMT_BTREE) { + if (xfs_inode_has_attr_fork(tip) && + XFS_BMAP_BMDR_SPACE(ip->i_df.if_broot) > xfs_inode_fork_boff(tip)) + return -EINVAL; + if (ifp->if_nextents <= XFS_IFORK_MAXEXT(tip, XFS_DATA_FORK)) + return -EINVAL; + } + + return 0; +} + +/* + * Fix up the owners of the bmbt blocks to refer to the current inode. The + * change owner scan attempts to order all modified buffers in the current + * transaction. In the event of ordered buffer failure, the offending buffer is + * physically logged as a fallback and the scan returns -EAGAIN. We must roll + * the transaction in this case to replenish the fallback log reservation and + * restart the scan. This process repeats until the scan completes. 
+ */ +static int +xfs_swap_change_owner( + struct xfs_trans **tpp, + struct xfs_inode *ip, + struct xfs_inode *tmpip) +{ + int error; + struct xfs_trans *tp = *tpp; + + do { + error = xfs_bmbt_change_owner(tp, ip, XFS_DATA_FORK, ip->i_ino, + NULL); + /* success or fatal error */ + if (error != -EAGAIN) + break; + + error = xfs_trans_roll(tpp); + if (error) + break; + tp = *tpp; + + /* + * Redirty both inodes so they can relog and keep the log tail + * moving forward. + */ + xfs_trans_ijoin(tp, ip, 0); + xfs_trans_ijoin(tp, tmpip, 0); + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); + xfs_trans_log_inode(tp, tmpip, XFS_ILOG_CORE); + } while (true); + + return error; +} + +/* Swap the extents of two files by swapping data forks. */ +STATIC int +xfs_swap_extent_forks( + struct xfs_trans **tpp, + struct xfs_swapext_req *req) +{ + struct xfs_inode *ip = req->ip2; + struct xfs_inode *tip = req->ip1; + xfs_filblks_t aforkblks = 0; + xfs_filblks_t taforkblks = 0; + xfs_extnum_t junk; + uint64_t tmp; + int src_log_flags = XFS_ILOG_CORE; + int target_log_flags = XFS_ILOG_CORE; + int error; + + /* + * Count the number of extended attribute blocks + */ + if (xfs_inode_has_attr_fork(ip) && ip->i_af.if_nextents > 0 && + ip->i_af.if_format != XFS_DINODE_FMT_LOCAL) { + error = xfs_bmap_count_blocks(*tpp, ip, XFS_ATTR_FORK, &junk, + &aforkblks); + if (error) + return error; + } + if (xfs_inode_has_attr_fork(tip) && tip->i_af.if_nextents > 0 && + tip->i_af.if_format != XFS_DINODE_FMT_LOCAL) { + error = xfs_bmap_count_blocks(*tpp, tip, XFS_ATTR_FORK, &junk, + &taforkblks); + if (error) + return error; + } + + /* + * Btree format (v3) inodes have the inode number stamped in the bmbt + * block headers. We can't start changing the bmbt blocks until the + * inode owner change is logged so recovery does the right thing in the + * event of a crash. Set the owner change log flags now and leave the + * bmbt scan as the last step. 
+ */ + if (xfs_has_v3inodes(ip->i_mount)) { + if (ip->i_df.if_format == XFS_DINODE_FMT_BTREE) + target_log_flags |= XFS_ILOG_DOWNER; + if (tip->i_df.if_format == XFS_DINODE_FMT_BTREE) + src_log_flags |= XFS_ILOG_DOWNER; + } + + /* + * Swap the data forks of the inodes + */ + swap(ip->i_df, tip->i_df); + + /* + * Fix the on-disk inode values + */ + tmp = (uint64_t)ip->i_nblocks; + ip->i_nblocks = tip->i_nblocks - taforkblks + aforkblks; + tip->i_nblocks = tmp + taforkblks - aforkblks; + + /* + * The extents in the source inode could still contain speculative + * preallocation beyond EOF (e.g. the file is open but not modified + * while defrag is in progress). In that case, we need to copy over the + * number of delalloc blocks the data fork in the source inode is + * tracking beyond EOF so that when the fork is truncated away when the + * temporary inode is unlinked we don't underrun the i_delayed_blks + * counter on that inode. + */ + ASSERT(tip->i_delayed_blks == 0); + tip->i_delayed_blks = ip->i_delayed_blks; + ip->i_delayed_blks = 0; + + switch (ip->i_df.if_format) { + case XFS_DINODE_FMT_EXTENTS: + src_log_flags |= XFS_ILOG_DEXT; + break; + case XFS_DINODE_FMT_BTREE: + ASSERT(!xfs_has_v3inodes(ip->i_mount) || + (src_log_flags & XFS_ILOG_DOWNER)); + src_log_flags |= XFS_ILOG_DBROOT; + break; + } + + switch (tip->i_df.if_format) { + case XFS_DINODE_FMT_EXTENTS: + target_log_flags |= XFS_ILOG_DEXT; + break; + case XFS_DINODE_FMT_BTREE: + target_log_flags |= XFS_ILOG_DBROOT; + ASSERT(!xfs_has_v3inodes(ip->i_mount) || + (target_log_flags & XFS_ILOG_DOWNER)); + break; + } + + /* Do we have to swap reflink flags? 
*/
+	if ((ip->i_diflags2 & XFS_DIFLAG2_REFLINK) ^
+	    (tip->i_diflags2 & XFS_DIFLAG2_REFLINK)) {
+		uint64_t	f;
+
+		f = ip->i_diflags2 & XFS_DIFLAG2_REFLINK;
+		ip->i_diflags2 &= ~XFS_DIFLAG2_REFLINK;
+		ip->i_diflags2 |= tip->i_diflags2 & XFS_DIFLAG2_REFLINK;
+		tip->i_diflags2 &= ~XFS_DIFLAG2_REFLINK;
+		tip->i_diflags2 |= f & XFS_DIFLAG2_REFLINK;
+	}
+
+	/* Swap the cow forks. */
+	if (xfs_has_reflink(ip->i_mount)) {
+		ASSERT(!ip->i_cowfp ||
+		       ip->i_cowfp->if_format == XFS_DINODE_FMT_EXTENTS);
+		ASSERT(!tip->i_cowfp ||
+		       tip->i_cowfp->if_format == XFS_DINODE_FMT_EXTENTS);
+
+		swap(ip->i_cowfp, tip->i_cowfp);
+
+		if (ip->i_cowfp && ip->i_cowfp->if_bytes)
+			xfs_inode_set_cowblocks_tag(ip);
+		else
+			xfs_inode_clear_cowblocks_tag(ip);
+		if (tip->i_cowfp && tip->i_cowfp->if_bytes)
+			xfs_inode_set_cowblocks_tag(tip);
+		else
+			xfs_inode_clear_cowblocks_tag(tip);
+	}
+
+	xfs_trans_log_inode(*tpp, ip, src_log_flags);
+	xfs_trans_log_inode(*tpp, tip, target_log_flags);
+
+	/*
+	 * The extent forks have been swapped, but crc=1,rmapbt=0 filesystems
+	 * have inode number owner values in the bmbt blocks that still refer to
+	 * the old inode. Scan each bmbt to fix up the owner values with the
+	 * inode number of the current inode.
+	 */
+	if (src_log_flags & XFS_ILOG_DOWNER) {
+		error = xfs_swap_change_owner(tpp, ip, tip);
+		if (error)
+			return error;
+	}
+	if (target_log_flags & XFS_ILOG_DOWNER) {
+		error = xfs_swap_change_owner(tpp, tip, ip);
+		if (error)
+			return error;
+	}
+
+	return 0;
+}
+
 /* Prepare two files to have their data exchanged. */
 int
 xfs_xchg_range_prep(

From patchwork Fri Dec 30 22:13:57 2022
Subject: [PATCH 16/21] xfs: condense extended attributes after an atomic swap
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org
Date: Fri, 30 Dec 2022 14:13:57 -0800
Message-ID: <167243843757.699466.12904802285567328629.stgit@magnolia>
In-Reply-To: <167243843494.699466.5163281976943635014.stgit@magnolia>

From: Darrick J. Wong

Add a new swapext flag that enables us to perform post-swap processing on
file2 once we're done swapping the extent maps.  If we were swapping the
extended attributes, we want to be able to convert file2's attr fork from
block to inline format.

This isn't used anywhere right now, but we need to have the basic ondisk
flags in place so that a future online xattr repair feature can create
salvaged attrs in a temporary file and swap the attr forks when ready.
If one file is in extents format and the other is inline, we will have to
promote both to extents format to perform the swap.  After the swap, we
can try to condense the fixed file's attr fork back down to inline format
if possible.

Signed-off-by: Darrick J. Wong
---
 fs/xfs/libxfs/xfs_log_format.h |    9 +++++--
 fs/xfs/libxfs/xfs_swapext.c    |   51 +++++++++++++++++++++++++++++++++++++++-
 fs/xfs/libxfs/xfs_swapext.h    |    9 +++++--
 3 files changed, 64 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index 65a84fdefe56..378201a70028 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -906,18 +906,23 @@ struct xfs_swap_extent {
 /* Clear the reflink flag from inode2 after the operation. */
 #define XFS_SWAP_EXT_CLEAR_INO2_REFLINK	(1ULL << 4)
 
+/* Try to convert inode2 from block to short format at the end, if possible.
*/ +#define XFS_SWAP_EXT_CVT_INO2_SF (1ULL << 5) + #define XFS_SWAP_EXT_FLAGS (XFS_SWAP_EXT_ATTR_FORK | \ XFS_SWAP_EXT_SET_SIZES | \ XFS_SWAP_EXT_SKIP_INO1_HOLES | \ XFS_SWAP_EXT_CLEAR_INO1_REFLINK | \ - XFS_SWAP_EXT_CLEAR_INO2_REFLINK) + XFS_SWAP_EXT_CLEAR_INO2_REFLINK | \ + XFS_SWAP_EXT_CVT_INO2_SF) #define XFS_SWAP_EXT_STRINGS \ { XFS_SWAP_EXT_ATTR_FORK, "ATTRFORK" }, \ { XFS_SWAP_EXT_SET_SIZES, "SETSIZES" }, \ { XFS_SWAP_EXT_SKIP_INO1_HOLES, "SKIP_INO1_HOLES" }, \ { XFS_SWAP_EXT_CLEAR_INO1_REFLINK, "CLEAR_INO1_REFLINK" }, \ - { XFS_SWAP_EXT_CLEAR_INO2_REFLINK, "CLEAR_INO2_REFLINK" } + { XFS_SWAP_EXT_CLEAR_INO2_REFLINK, "CLEAR_INO2_REFLINK" }, \ + { XFS_SWAP_EXT_CVT_INO2_SF, "CVT_INO2_SF" } /* This is the structure used to lay out an sxi log item in the log. */ struct xfs_sxi_log_format { diff --git a/fs/xfs/libxfs/xfs_swapext.c b/fs/xfs/libxfs/xfs_swapext.c index 227a08ac5d4b..6b5223e73692 100644 --- a/fs/xfs/libxfs/xfs_swapext.c +++ b/fs/xfs/libxfs/xfs_swapext.c @@ -23,6 +23,10 @@ #include "xfs_error.h" #include "xfs_errortag.h" #include "xfs_health.h" +#include "xfs_da_format.h" +#include "xfs_da_btree.h" +#include "xfs_attr_leaf.h" +#include "xfs_attr.h" struct kmem_cache *xfs_swapext_intent_cache; @@ -121,7 +125,8 @@ static inline bool sxi_has_postop_work(const struct xfs_swapext_intent *sxi) { return sxi->sxi_flags & (XFS_SWAP_EXT_CLEAR_INO1_REFLINK | - XFS_SWAP_EXT_CLEAR_INO2_REFLINK); + XFS_SWAP_EXT_CLEAR_INO2_REFLINK | + XFS_SWAP_EXT_CVT_INO2_SF); } static inline void @@ -350,6 +355,36 @@ xfs_swapext_exchange_mappings( sxi_advance(sxi, irec1); } +/* Convert inode2's leaf attr fork back to shortform, if possible.. 
*/ +STATIC int +xfs_swapext_attr_to_sf( + struct xfs_trans *tp, + struct xfs_swapext_intent *sxi) +{ + struct xfs_da_args args = { + .dp = sxi->sxi_ip2, + .geo = tp->t_mountp->m_attr_geo, + .whichfork = XFS_ATTR_FORK, + .trans = tp, + }; + struct xfs_buf *bp; + int forkoff; + int error; + + if (!xfs_attr_is_leaf(sxi->sxi_ip2)) + return 0; + + error = xfs_attr3_leaf_read(tp, sxi->sxi_ip2, 0, &bp); + if (error) + return error; + + forkoff = xfs_attr_shortform_allfit(bp, sxi->sxi_ip2); + if (forkoff == 0) + return 0; + + return xfs_attr3_leaf_to_shortform(bp, &args, forkoff); +} + static inline void xfs_swapext_clear_reflink( struct xfs_trans *tp, @@ -367,6 +402,16 @@ xfs_swapext_do_postop_work( struct xfs_trans *tp, struct xfs_swapext_intent *sxi) { + if (sxi->sxi_flags & XFS_SWAP_EXT_CVT_INO2_SF) { + int error = 0; + + if (sxi->sxi_flags & XFS_SWAP_EXT_ATTR_FORK) + error = xfs_swapext_attr_to_sf(tp, sxi); + sxi->sxi_flags &= ~XFS_SWAP_EXT_CVT_INO2_SF; + if (error) + return error; + } + if (sxi->sxi_flags & XFS_SWAP_EXT_CLEAR_INO1_REFLINK) { xfs_swapext_clear_reflink(tp, sxi->sxi_ip1); sxi->sxi_flags &= ~XFS_SWAP_EXT_CLEAR_INO1_REFLINK; @@ -794,6 +839,8 @@ xfs_swapext_init_intent( if (req->req_flags & XFS_SWAP_REQ_SKIP_INO1_HOLES) sxi->sxi_flags |= XFS_SWAP_EXT_SKIP_INO1_HOLES; + if (req->req_flags & XFS_SWAP_REQ_CVT_INO2_SF) + sxi->sxi_flags |= XFS_SWAP_EXT_CVT_INO2_SF; if (req->req_flags & XFS_SWAP_REQ_LOGGED) sxi->sxi_op_flags |= XFS_SWAP_EXT_OP_LOGGED; @@ -1013,6 +1060,8 @@ xfs_swapext( ASSERT(!(req->req_flags & ~XFS_SWAP_REQ_FLAGS)); if (req->req_flags & XFS_SWAP_REQ_SET_SIZES) ASSERT(req->whichfork == XFS_DATA_FORK); + if (req->req_flags & XFS_SWAP_REQ_CVT_INO2_SF) + ASSERT(req->whichfork == XFS_ATTR_FORK); if (req->blockcount == 0) return; diff --git a/fs/xfs/libxfs/xfs_swapext.h b/fs/xfs/libxfs/xfs_swapext.h index 1987897ddc25..6b610fea150a 100644 --- a/fs/xfs/libxfs/xfs_swapext.h +++ b/fs/xfs/libxfs/xfs_swapext.h @@ -126,16 +126,21 @@ struct xfs_swapext_req 
{ /* Files need to be upgraded to have large extent counts. */ #define XFS_SWAP_REQ_NREXT64 (1U << 3) +/* Try to convert inode2's fork to local format, if possible. */ +#define XFS_SWAP_REQ_CVT_INO2_SF (1U << 4) + #define XFS_SWAP_REQ_FLAGS (XFS_SWAP_REQ_LOGGED | \ XFS_SWAP_REQ_SET_SIZES | \ XFS_SWAP_REQ_SKIP_INO1_HOLES | \ - XFS_SWAP_REQ_NREXT64) + XFS_SWAP_REQ_NREXT64 | \ + XFS_SWAP_REQ_CVT_INO2_SF) #define XFS_SWAP_REQ_STRINGS \ { XFS_SWAP_REQ_LOGGED, "LOGGED" }, \ { XFS_SWAP_REQ_SET_SIZES, "SETSIZES" }, \ { XFS_SWAP_REQ_SKIP_INO1_HOLES, "SKIP_INO1_HOLES" }, \ - { XFS_SWAP_REQ_NREXT64, "NREXT64" } + { XFS_SWAP_REQ_NREXT64, "NREXT64" }, \ + { XFS_SWAP_REQ_CVT_INO2_SF, "CVT_INO2_SF" } unsigned int xfs_swapext_reflink_prep(const struct xfs_swapext_req *req); void xfs_swapext_reflink_finish(struct xfs_trans *tp, From patchwork Fri Dec 30 22:13:57 2022 X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13084984
Subject: [PATCH 17/21] xfs: condense directories after an atomic swap From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Fri, 30 Dec 2022 14:13:57 -0800 Message-ID: <167243843771.699466.8963465782484534140.stgit@magnolia> In-Reply-To: <167243843494.699466.5163281976943635014.stgit@magnolia> References: <167243843494.699466.5163281976943635014.stgit@magnolia> From: Darrick J. Wong The previous commit added a new swapext flag that enables us to perform post-swap processing on file2 once we're done swapping the extent maps. Now add this ability for directories. This isn't used anywhere right now, but we need to have the basic ondisk flags in place so that a future online directory repair feature can create salvaged dirents in a temporary directory and swap the data forks when ready. If one file is in extents format and the other is inline, we will have to promote both to extents format to perform the swap. After the swap, we can try to condense the fixed directory down to inline format if possible.
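[Editor's illustration] The condense step described above comes down to a size test: read the block-format directory, compute the size of its packed shortform encoding, and convert only when that encoding fits in the inode's inline fork area (the xfs_dir2_block_sfsize() versus xfs_inode_data_fork_size() comparison in the diff that follows). A minimal stand-alone sketch of that decision — the struct layout, header size, and per-entry costs here are invented for illustration and are not the real XFS on-disk format:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model of the "condense to shortform" decision: a block-format
 * directory can be converted back to inline (shortform) only when the
 * packed shortform image fits in the inode's literal area.  All sizes
 * below are hypothetical, not the real XFS ondisk layout.
 */
struct toy_dirent {
	uint8_t	namelen;	 /* length of the entry name */
	bool	needs_8byte_ino; /* inode number won't fit in 4 bytes */
};

#define TOY_SF_HDR_SIZE		10	/* fixed shortform header cost */
#define TOY_SF_ENTRY_FIXED	7	/* namelen + offset + short inumber */

/* Compute the size of the packed shortform encoding. */
static size_t toy_sf_size(const struct toy_dirent *ents, int nents)
{
	size_t size = TOY_SF_HDR_SIZE;
	int i;

	for (i = 0; i < nents; i++) {
		size += TOY_SF_ENTRY_FIXED + ents[i].namelen;
		if (ents[i].needs_8byte_ino)
			size += 4;	/* widen the inumber field */
	}
	return size;
}

/* Condense only when the shortform image fits in the inline fork area. */
static bool toy_should_condense(const struct toy_dirent *ents, int nents,
				size_t fork_size)
{
	return toy_sf_size(ents, nents) <= fork_size;
}
```

The real code must also do this under the buffer lock and log the conversion in the same transaction chain as the swap; the sketch only models the fits-or-not predicate.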
Signed-off-by: Darrick J. Wong --- fs/xfs/libxfs/xfs_swapext.c | 44 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 43 insertions(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_swapext.c b/fs/xfs/libxfs/xfs_swapext.c index 6b5223e73692..a52f72a499f4 100644 --- a/fs/xfs/libxfs/xfs_swapext.c +++ b/fs/xfs/libxfs/xfs_swapext.c @@ -27,6 +27,8 @@ #include "xfs_da_btree.h" #include "xfs_attr_leaf.h" #include "xfs_attr.h" +#include "xfs_dir2_priv.h" +#include "xfs_dir2.h" struct kmem_cache *xfs_swapext_intent_cache; @@ -385,6 +387,42 @@ xfs_swapext_attr_to_sf( return xfs_attr3_leaf_to_shortform(bp, &args, forkoff); } +/* Convert inode2's block dir fork back to shortform, if possible.. */ +STATIC int +xfs_swapext_dir_to_sf( + struct xfs_trans *tp, + struct xfs_swapext_intent *sxi) +{ + struct xfs_da_args args = { + .dp = sxi->sxi_ip2, + .geo = tp->t_mountp->m_dir_geo, + .whichfork = XFS_DATA_FORK, + .trans = tp, + }; + struct xfs_dir2_sf_hdr sfh; + struct xfs_buf *bp; + bool isblock; + int size; + int error; + + error = xfs_dir2_isblock(&args, &isblock); + if (error) + return error; + + if (!isblock) + return 0; + + error = xfs_dir3_block_read(tp, sxi->sxi_ip2, &bp); + if (error) + return error; + + size = xfs_dir2_block_sfsize(sxi->sxi_ip2, bp->b_addr, &sfh); + if (size > xfs_inode_data_fork_size(sxi->sxi_ip2)) + return 0; + + return xfs_dir2_block_to_sf(&args, bp, size, &sfh); +} + static inline void xfs_swapext_clear_reflink( struct xfs_trans *tp, @@ -407,6 +445,8 @@ xfs_swapext_do_postop_work( if (sxi->sxi_flags & XFS_SWAP_EXT_ATTR_FORK) error = xfs_swapext_attr_to_sf(tp, sxi); + else if (S_ISDIR(VFS_I(sxi->sxi_ip2)->i_mode)) + error = xfs_swapext_dir_to_sf(tp, sxi); sxi->sxi_flags &= ~XFS_SWAP_EXT_CVT_INO2_SF; if (error) return error; @@ -1061,7 +1101,9 @@ xfs_swapext( if (req->req_flags & XFS_SWAP_REQ_SET_SIZES) ASSERT(req->whichfork == XFS_DATA_FORK); if (req->req_flags & XFS_SWAP_REQ_CVT_INO2_SF) - ASSERT(req->whichfork == XFS_ATTR_FORK); + 
ASSERT(req->whichfork == XFS_ATTR_FORK || + (req->whichfork == XFS_DATA_FORK && + S_ISDIR(VFS_I(req->ip2)->i_mode))); if (req->blockcount == 0) return; From patchwork Fri Dec 30 22:13:57 2022 X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13084985
Subject: [PATCH 18/21] xfs: condense symbolic links after an atomic swap From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Fri, 30 Dec 2022 14:13:57 -0800 Message-ID: <167243843786.699466.11515465917822251289.stgit@magnolia> In-Reply-To: <167243843494.699466.5163281976943635014.stgit@magnolia> References: <167243843494.699466.5163281976943635014.stgit@magnolia> From: Darrick J. Wong The previous commit added a new swapext flag that enables us to perform post-swap processing on file2 once we're done swapping the extent maps. Now add this ability for symlinks. This isn't used anywhere right now, but we need to have the basic ondisk flags in place so that a future online symlink repair feature can salvage the remote target in a temporary link and swap the data forks when ready. If one file is in extents format and the other is inline, we will have to promote both to extents format to perform the swap. After the swap, we can try to condense the fixed symlink down to inline format if possible. Signed-off-by: Darrick J.
Wong --- fs/xfs/libxfs/xfs_swapext.c | 48 +++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_symlink_remote.c | 47 +++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_symlink_remote.h | 1 + fs/xfs/xfs_symlink.c | 49 ++++-------------------------------- 4 files changed, 101 insertions(+), 44 deletions(-) diff --git a/fs/xfs/libxfs/xfs_swapext.c b/fs/xfs/libxfs/xfs_swapext.c index a52f72a499f4..b27ceeb93a16 100644 --- a/fs/xfs/libxfs/xfs_swapext.c +++ b/fs/xfs/libxfs/xfs_swapext.c @@ -29,6 +29,7 @@ #include "xfs_attr.h" #include "xfs_dir2_priv.h" #include "xfs_dir2.h" +#include "xfs_symlink_remote.h" struct kmem_cache *xfs_swapext_intent_cache; @@ -423,6 +424,48 @@ xfs_swapext_dir_to_sf( return xfs_dir2_block_to_sf(&args, bp, size, &sfh); } +/* Convert inode2's remote symlink target back to shortform, if possible. */ +STATIC int +xfs_swapext_link_to_sf( + struct xfs_trans *tp, + struct xfs_swapext_intent *sxi) +{ + struct xfs_inode *ip = sxi->sxi_ip2; + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + char *buf; + int error; + + if (ifp->if_format == XFS_DINODE_FMT_LOCAL || + ip->i_disk_size > xfs_inode_data_fork_size(ip)) + return 0; + + /* Read the current symlink target into a buffer. */ + buf = kmem_alloc(ip->i_disk_size + 1, KM_NOFS); + if (!buf) { + ASSERT(0); + return -ENOMEM; + } + + error = xfs_symlink_remote_read(ip, buf); + if (error) + goto free; + + /* Remove the blocks. */ + error = xfs_symlink_remote_truncate(tp, ip); + if (error) + goto free; + + /* Convert fork to local format and log our changes. 
*/ + xfs_idestroy_fork(ifp); + ifp->if_bytes = 0; + ifp->if_format = XFS_DINODE_FMT_LOCAL; + xfs_init_local_fork(ip, XFS_DATA_FORK, buf, ip->i_disk_size); + xfs_trans_log_inode(tp, ip, XFS_ILOG_DDATA | XFS_ILOG_CORE); +free: + kmem_free(buf); + return error; +} + static inline void xfs_swapext_clear_reflink( struct xfs_trans *tp, @@ -447,6 +490,8 @@ xfs_swapext_do_postop_work( error = xfs_swapext_attr_to_sf(tp, sxi); else if (S_ISDIR(VFS_I(sxi->sxi_ip2)->i_mode)) error = xfs_swapext_dir_to_sf(tp, sxi); + else if (S_ISLNK(VFS_I(sxi->sxi_ip2)->i_mode)) + error = xfs_swapext_link_to_sf(tp, sxi); sxi->sxi_flags &= ~XFS_SWAP_EXT_CVT_INO2_SF; if (error) return error; @@ -1103,7 +1148,8 @@ xfs_swapext( if (req->req_flags & XFS_SWAP_REQ_CVT_INO2_SF) ASSERT(req->whichfork == XFS_ATTR_FORK || (req->whichfork == XFS_DATA_FORK && - S_ISDIR(VFS_I(req->ip2)->i_mode))); + (S_ISDIR(VFS_I(req->ip2)->i_mode) || + S_ISLNK(VFS_I(req->ip2)->i_mode)))); if (req->blockcount == 0) return; diff --git a/fs/xfs/libxfs/xfs_symlink_remote.c b/fs/xfs/libxfs/xfs_symlink_remote.c index 5261f15ea2ed..b48dcb893a2a 100644 --- a/fs/xfs/libxfs/xfs_symlink_remote.c +++ b/fs/xfs/libxfs/xfs_symlink_remote.c @@ -391,3 +391,50 @@ xfs_symlink_write_target( ASSERT(pathlen == 0); return 0; } + +/* Remove all the blocks from a symlink and invalidate buffers. */ +int +xfs_symlink_remote_truncate( + struct xfs_trans *tp, + struct xfs_inode *ip) +{ + struct xfs_bmbt_irec mval[XFS_SYMLINK_MAPS]; + struct xfs_mount *mp = tp->t_mountp; + struct xfs_buf *bp; + int nmaps = XFS_SYMLINK_MAPS; + int done = 0; + int i; + int error; + + /* Read mappings and invalidate buffers. 
*/ + error = xfs_bmapi_read(ip, 0, XFS_MAX_FILEOFF, mval, &nmaps, 0); + if (error) + return error; + + for (i = 0; i < nmaps; i++) { + if (!xfs_bmap_is_real_extent(&mval[i])) + break; + + error = xfs_trans_get_buf(tp, mp->m_ddev_targp, + XFS_FSB_TO_DADDR(mp, mval[i].br_startblock), + XFS_FSB_TO_BB(mp, mval[i].br_blockcount), 0, + &bp); + if (error) + return error; + + xfs_trans_binval(tp, bp); + } + + /* Unmap the remote blocks. */ + error = xfs_bunmapi(tp, ip, 0, XFS_MAX_FILEOFF, 0, nmaps, &done); + if (error) + return error; + if (!done) { + ASSERT(done); + xfs_inode_mark_sick(ip, XFS_SICK_INO_SYMLINK); + return -EFSCORRUPTED; + } + + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); + return 0; +} diff --git a/fs/xfs/libxfs/xfs_symlink_remote.h b/fs/xfs/libxfs/xfs_symlink_remote.h index d81461c06b6b..05eb9c3937d9 100644 --- a/fs/xfs/libxfs/xfs_symlink_remote.h +++ b/fs/xfs/libxfs/xfs_symlink_remote.h @@ -23,5 +23,6 @@ int xfs_symlink_remote_read(struct xfs_inode *ip, char *link); int xfs_symlink_write_target(struct xfs_trans *tp, struct xfs_inode *ip, const char *target_path, int pathlen, xfs_fsblock_t fs_blocks, uint resblks); +int xfs_symlink_remote_truncate(struct xfs_trans *tp, struct xfs_inode *ip); #endif /* __XFS_SYMLINK_REMOTE_H */ diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c index 548d9116e0c5..8cf69ca4bd7c 100644 --- a/fs/xfs/xfs_symlink.c +++ b/fs/xfs/xfs_symlink.c @@ -250,19 +250,12 @@ xfs_symlink( */ STATIC int xfs_inactive_symlink_rmt( - struct xfs_inode *ip) + struct xfs_inode *ip) { - struct xfs_buf *bp; - int done; - int error; - int i; - xfs_mount_t *mp; - xfs_bmbt_irec_t mval[XFS_SYMLINK_MAPS]; - int nmaps; - int size; - xfs_trans_t *tp; + struct xfs_mount *mp = ip->i_mount; + struct xfs_trans *tp; + int error; - mp = ip->i_mount; ASSERT(!xfs_need_iread_extents(&ip->i_df)); /* * We're freeing a symlink that has some @@ -286,44 +279,14 @@ xfs_inactive_symlink_rmt( * locked for the second transaction. 
In the error paths we need it * held so the cancel won't rele it, see below. */ - size = (int)ip->i_disk_size; ip->i_disk_size = 0; VFS_I(ip)->i_mode = (VFS_I(ip)->i_mode & ~S_IFMT) | S_IFREG; xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); - /* - * Find the block(s) so we can inval and unmap them. - */ - done = 0; - nmaps = ARRAY_SIZE(mval); - error = xfs_bmapi_read(ip, 0, xfs_symlink_blocks(mp, size), - mval, &nmaps, 0); - if (error) - goto error_trans_cancel; - /* - * Invalidate the block(s). No validation is done. - */ - for (i = 0; i < nmaps; i++) { - error = xfs_trans_get_buf(tp, mp->m_ddev_targp, - XFS_FSB_TO_DADDR(mp, mval[i].br_startblock), - XFS_FSB_TO_BB(mp, mval[i].br_blockcount), 0, - &bp); - if (error) - goto error_trans_cancel; - xfs_trans_binval(tp, bp); - } - /* - * Unmap the dead block(s) to the dfops. - */ - error = xfs_bunmapi(tp, ip, 0, size, 0, nmaps, &done); + + error = xfs_symlink_remote_truncate(tp, ip); if (error) goto error_trans_cancel; - ASSERT(done); - /* - * Commit the transaction. This first logs the EFI and the inode, then - * rolls and commits the transaction that frees the extents. - */ - xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); error = xfs_trans_commit(tp); if (error) { ASSERT(xfs_is_shutdown(mp)); From patchwork Fri Dec 30 22:13:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13084986 Subject: [PATCH 19/21] xfs: make atomic extent swapping support realtime files From: "Darrick J.
Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Fri, 30 Dec 2022 14:13:58 -0800 Message-ID: <167243843800.699466.8306225646421918407.stgit@magnolia> In-Reply-To: <167243843494.699466.5163281976943635014.stgit@magnolia> References: <167243843494.699466.5163281976943635014.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Darrick J. Wong Now that bmap items support the realtime device, we can add the necessary pieces to the atomic extent swapping code to support such things. Signed-off-by: Darrick J. Wong --- fs/xfs/libxfs/xfs_swapext.c | 109 +++++++++++++++++++++++++++++++++- fs/xfs/libxfs/xfs_swapext.h | 5 +- fs/xfs/xfs_bmap_util.c | 2 - fs/xfs/xfs_file.c | 2 - fs/xfs/xfs_inode.h | 5 ++ fs/xfs/xfs_rtalloc.c | 136 +++++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_rtalloc.h | 3 + fs/xfs/xfs_trace.h | 11 ++- fs/xfs/xfs_xchgrange.c | 71 ++++++++++++++++++++++ fs/xfs/xfs_xchgrange.h | 2 - 10 files changed, 329 insertions(+), 17 deletions(-) diff --git a/fs/xfs/libxfs/xfs_swapext.c b/fs/xfs/libxfs/xfs_swapext.c index b27ceeb93a16..69812594fd71 100644 --- a/fs/xfs/libxfs/xfs_swapext.c +++ b/fs/xfs/libxfs/xfs_swapext.c @@ -142,6 +142,108 @@ sxi_advance( sxi->sxi_blockcount -= irec->br_blockcount; } +#ifdef DEBUG +static inline bool +xfs_swapext_need_rt_conversion( + const struct xfs_swapext_req *req) +{ + struct xfs_inode *ip = req->ip2; + struct xfs_mount *mp = ip->i_mount; + + /* xattrs don't live on the rt device */ + if (req->whichfork == XFS_ATTR_FORK) + return false; + + /* + * Caller got permission to use logged swapext, so log recovery will + * finish the swap and not leave us with partially swapped rt extents + * exposed to userspace. + */ + if (req->req_flags & XFS_SWAP_REQ_LOGGED) + return false; + + /* + * If we can't use log intent items at all, the only supported + * operation is full fork swaps. 
+ */ + if (!xfs_swapext_supported(mp)) + return false; + + /* Conversion is only needed for realtime files with big rt extents */ + return xfs_inode_has_bigrtextents(ip); +} + +static inline int +xfs_swapext_check_rt_extents( + struct xfs_mount *mp, + const struct xfs_swapext_req *req) +{ + struct xfs_bmbt_irec irec1, irec2; + xfs_fileoff_t startoff1 = req->startoff1; + xfs_fileoff_t startoff2 = req->startoff2; + xfs_filblks_t blockcount = req->blockcount; + uint32_t mod; + int nimaps; + int error; + + if (!xfs_swapext_need_rt_conversion(req)) + return 0; + + while (blockcount > 0) { + /* Read extent from the first file */ + nimaps = 1; + error = xfs_bmapi_read(req->ip1, startoff1, blockcount, + &irec1, &nimaps, 0); + if (error) + return error; + ASSERT(nimaps == 1); + + /* Read extent from the second file */ + nimaps = 1; + error = xfs_bmapi_read(req->ip2, startoff2, + irec1.br_blockcount, &irec2, &nimaps, + 0); + if (error) + return error; + ASSERT(nimaps == 1); + + /* + * We can only swap as many blocks as the smaller of the two + * extent maps. + */ + irec1.br_blockcount = min(irec1.br_blockcount, + irec2.br_blockcount); + + /* Both mappings must be aligned to the realtime extent size. */ + div_u64_rem(irec1.br_startoff, mp->m_sb.sb_rextsize, &mod); + if (mod) { + ASSERT(mod == 0); + return -EINVAL; + } + + div_u64_rem(irec2.br_startoff, mp->m_sb.sb_rextsize, &mod); + if (mod) { + ASSERT(mod == 0); + return -EINVAL; + } + + div_u64_rem(irec1.br_blockcount, mp->m_sb.sb_rextsize, &mod); + if (mod) { + ASSERT(mod == 0); + return -EINVAL; + } + + startoff1 += irec1.br_blockcount; + startoff2 += irec1.br_blockcount; + blockcount -= irec1.br_blockcount; + } + + return 0; +} +#else +# define xfs_swapext_check_rt_extents(mp, req) (0) +#endif + /* Check all extents to make sure we can actually swap them. 
*/ int xfs_swapext_check_extents( @@ -161,12 +263,7 @@ xfs_swapext_check_extents( ifp2->if_format == XFS_DINODE_FMT_LOCAL) return -EINVAL; - /* We don't support realtime data forks yet. */ - if (!XFS_IS_REALTIME_INODE(req->ip1)) - return 0; - if (req->whichfork == XFS_ATTR_FORK) - return 0; - return -EINVAL; + return xfs_swapext_check_rt_extents(mp, req); } #ifdef CONFIG_XFS_QUOTA diff --git a/fs/xfs/libxfs/xfs_swapext.h b/fs/xfs/libxfs/xfs_swapext.h index 6b610fea150a..155add23d8e2 100644 --- a/fs/xfs/libxfs/xfs_swapext.h +++ b/fs/xfs/libxfs/xfs_swapext.h @@ -13,12 +13,11 @@ * This can be done to individual file extents by using the block mapping log * intent items introduced with reflink and rmap; or to entire file ranges * using swapext log intent items to track the overall progress across multiple - * extent mappings. Realtime is not supported yet. + * extent mappings. */ static inline bool xfs_swapext_supported(struct xfs_mount *mp) { - return (xfs_has_reflink(mp) || xfs_has_rmapbt(mp)) && - !xfs_has_realtime(mp); + return xfs_has_reflink(mp) || xfs_has_rmapbt(mp); } /* diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 47a583a94d58..3593c0f0ce13 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -989,7 +989,7 @@ xfs_free_file_space( endoffset_fsb = XFS_B_TO_FSBT(mp, offset + len); /* We can only free complete realtime extents. */ - if (XFS_IS_REALTIME_INODE(ip) && mp->m_sb.sb_rextsize > 1) { + if (xfs_inode_has_bigrtextents(ip)) { startoffset_fsb = roundup_64(startoffset_fsb, mp->m_sb.sb_rextsize); endoffset_fsb = rounddown_64(endoffset_fsb, diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index b4629c8aa6b7..87dfb05640a8 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -1181,7 +1181,7 @@ xfs_file_xchg_range( goto out_err; /* Prepare and then exchange file contents. 
 	 */
-	error = xfs_xchg_range_prep(file1, file2, fxr);
+	error = xfs_xchg_range_prep(file1, file2, fxr, priv_flags);
 	if (error)
 		goto out_unlock;

diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 4b01d078ace2..444c43571e31 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -287,6 +287,11 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
 	return ip->i_diflags2 & XFS_DIFLAG2_NREXT64;
 }
 
+static inline bool xfs_inode_has_bigrtextents(struct xfs_inode *ip)
+{
+	return XFS_IS_REALTIME_INODE(ip) && ip->i_mount->m_sb.sb_rextsize > 1;
+}
+
 /*
  * Return the buftarg used for data allocations on a given inode.
  */
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 790191316a32..883333036519 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -21,6 +21,7 @@
 #include "xfs_sb.h"
 #include "xfs_log_priv.h"
 #include "xfs_health.h"
+#include "xfs_trace.h"
 
 /*
  * Read and return the summary information for a given extent size,
@@ -1461,3 +1462,138 @@ xfs_rtpick_extent(
 	*pick = b;
 	return 0;
 }
+
+/*
+ * Decide if this is an unwritten extent that isn't aligned to a rt extent
+ * boundary.  If it is, shorten the mapping so that we're ready to convert
+ * everything up to the next rt extent to a zeroed written extent.  If not,
+ * return false.
+ */
+static inline bool
+xfs_rtfile_want_conversion(
+	struct xfs_mount	*mp,
+	struct xfs_bmbt_irec	*irec)
+{
+	xfs_fileoff_t		rext_next;
+	uint32_t		modoff, modcnt;
+
+	if (irec->br_state != XFS_EXT_UNWRITTEN)
+		return false;
+
+	div_u64_rem(irec->br_startoff, mp->m_sb.sb_rextsize, &modoff);
+	if (modoff == 0) {
+		uint64_t	rexts = div_u64_rem(irec->br_blockcount,
+						mp->m_sb.sb_rextsize, &modcnt);
+
+		if (rexts > 0) {
+			/*
+			 * Unwritten mapping starts at an rt extent boundary
+			 * and is longer than one rt extent.  Round the length
+			 * down to the nearest extent but don't select it for
+			 * conversion.
+			 */
+			irec->br_blockcount -= modcnt;
+			modcnt = 0;
+		}
+
+		/* Unwritten mapping is perfectly aligned, do not convert. */
+		if (modcnt == 0)
+			return false;
+	}
+
+	/*
+	 * Unaligned and unwritten; trim to the current rt extent and select it
+	 * for conversion.
+	 */
+	rext_next = (irec->br_startoff - modoff) + mp->m_sb.sb_rextsize;
+	xfs_trim_extent(irec, irec->br_startoff, rext_next - irec->br_startoff);
+	return true;
+}
+
+/*
+ * For all realtime extents backing the given range of a file, search for
+ * unwritten mappings that do not cover a full rt extent and convert them
+ * to zeroed written mappings.  The goal is to end up with one mapping per rt
+ * extent so that we can perform a remapping operation.  Callers must ensure
+ * that there are no dirty pages in the given range.
+ */
+int
+xfs_rtfile_convert_unwritten(
+	struct xfs_inode	*ip,
+	loff_t			pos,
+	uint64_t		len)
+{
+	struct xfs_bmbt_irec	irec;
+	struct xfs_trans	*tp;
+	struct xfs_mount	*mp = ip->i_mount;
+	xfs_fileoff_t		off;
+	xfs_fileoff_t		endoff;
+	unsigned int		resblks;
+	int			ret;
+
+	if (mp->m_sb.sb_rextsize == 1)
+		return 0;
+
+	off = rounddown_64(XFS_B_TO_FSBT(mp, pos), mp->m_sb.sb_rextsize);
+	endoff = roundup_64(XFS_B_TO_FSB(mp, pos + len), mp->m_sb.sb_rextsize);
+
+	trace_xfs_rtfile_convert_unwritten(ip, pos, len);
+
+	while (off < endoff) {
+		int		nmap = 1;
+
+		if (fatal_signal_pending(current))
+			return -EINTR;
+
+		resblks = XFS_DIOSTRAT_SPACE_RES(mp, 1);
+		ret = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0,
+				&tp);
+		if (ret)
+			return ret;
+
+		xfs_ilock(ip, XFS_ILOCK_EXCL);
+		xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+
+		/*
+		 * Read the mapping.  If we find an unwritten extent that isn't
+		 * aligned to an rt extent boundary...
+		 */
+		ret = xfs_bmapi_read(ip, off, endoff - off, &irec, &nmap, 0);
+		if (ret)
+			goto err;
+		ASSERT(nmap == 1);
+		ASSERT(irec.br_startoff == off);
+		if (!xfs_rtfile_want_conversion(mp, &irec)) {
+			xfs_trans_cancel(tp);
+			off += irec.br_blockcount;
+			continue;
+		}
+
+		/*
+		 * ...make sure this partially unwritten rt extent gets
+		 * converted to a zeroed written extent that we can remap.
+		 */
+		nmap = 1;
+		ret = xfs_bmapi_write(tp, ip, off, irec.br_blockcount,
+				XFS_BMAPI_CONVERT | XFS_BMAPI_ZERO, 0, &irec,
+				&nmap);
+		if (ret)
+			goto err;
+		ASSERT(nmap == 1);
+		if (irec.br_state != XFS_EXT_NORM) {
+			ASSERT(0);
+			ret = -EIO;
+			goto err;
+		}
+		ret = xfs_trans_commit(tp);
+		if (ret)
+			return ret;
+
+		off += irec.br_blockcount;
+	}
+
+	return 0;
+err:
+	xfs_trans_cancel(tp);
+	return ret;
+}
diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h
index 3b2f1b499a11..e440f793dd98 100644
--- a/fs/xfs/xfs_rtalloc.h
+++ b/fs/xfs/xfs_rtalloc.h
@@ -140,6 +140,8 @@ int xfs_rtalloc_extent_is_free(struct xfs_mount *mp, struct xfs_trans *tp,
 		xfs_rtblock_t start, xfs_extlen_t len, bool *is_free);
 int xfs_rtalloc_reinit_frextents(struct xfs_mount *mp);
+int xfs_rtfile_convert_unwritten(struct xfs_inode *ip, loff_t pos,
+		uint64_t len);
 #else
 # define xfs_rtallocate_extent(t,b,min,max,l,f,p,rb)	(ENOSYS)
 # define xfs_rtfree_extent(t,b,l)			(ENOSYS)
@@ -164,6 +166,7 @@ xfs_rtmount_init(
 }
 # define xfs_rtmount_inodes(m)	(((mp)->m_sb.sb_rblocks == 0)? 0 : (ENOSYS))
 # define xfs_rtunmount_inodes(m)
+# define xfs_rtfile_convert_unwritten(ip, pos, len)	(0)
 #endif	/* CONFIG_XFS_RT */
 
 #endif	/* __XFS_RTALLOC_H__ */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index b0ced76af3b9..0802f078a945 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1519,7 +1519,7 @@ DEFINE_IMAP_EVENT(xfs_iomap_alloc);
 DEFINE_IMAP_EVENT(xfs_iomap_found);
 
 DECLARE_EVENT_CLASS(xfs_simple_io_class,
-	TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count),
+	TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, u64 count),
 	TP_ARGS(ip, offset, count),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
@@ -1527,7 +1527,7 @@ DECLARE_EVENT_CLASS(xfs_simple_io_class,
 		__field(loff_t, isize)
 		__field(loff_t, disize)
 		__field(loff_t, offset)
-		__field(size_t, count)
+		__field(u64, count)
 	),
 	TP_fast_assign(
 		__entry->dev = VFS_I(ip)->i_sb->s_dev;
@@ -1538,7 +1538,7 @@ DECLARE_EVENT_CLASS(xfs_simple_io_class,
 		__entry->count = count;
 	),
 	TP_printk("dev %d:%d ino 0x%llx isize 0x%llx disize 0x%llx "
-		  "pos 0x%llx bytecount 0x%zx",
+		  "pos 0x%llx bytecount 0x%llx",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->ino,
 		  __entry->isize,
@@ -1549,7 +1549,7 @@ DECLARE_EVENT_CLASS(xfs_simple_io_class,
 
 #define DEFINE_SIMPLE_IO_EVENT(name)	\
 DEFINE_EVENT(xfs_simple_io_class, name,	\
-	TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count), \
+	TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, u64 count), \
 	TP_ARGS(ip, offset, count))
 DEFINE_SIMPLE_IO_EVENT(xfs_delalloc_enospc);
 DEFINE_SIMPLE_IO_EVENT(xfs_unwritten_convert);
@@ -3741,6 +3741,9 @@ TRACE_EVENT(xfs_ioctl_clone,
 /* unshare tracepoints */
 DEFINE_SIMPLE_IO_EVENT(xfs_reflink_unshare);
 DEFINE_INODE_ERROR_EVENT(xfs_reflink_unshare_error);
+#ifdef CONFIG_XFS_RT
+DEFINE_SIMPLE_IO_EVENT(xfs_rtfile_convert_unwritten);
+#endif /* CONFIG_XFS_RT */
 
 /* copy on write */
 DEFINE_INODE_IREC_EVENT(xfs_reflink_trim_around_shared);
diff --git a/fs/xfs/xfs_xchgrange.c b/fs/xfs/xfs_xchgrange.c
index 27bb88dcf228..6a66d09099b0 100644
--- a/fs/xfs/xfs_xchgrange.c
+++ b/fs/xfs/xfs_xchgrange.c
@@ -28,6 +28,7 @@
 #include "xfs_sb.h"
 #include "xfs_icache.h"
 #include "xfs_log.h"
+#include "xfs_rtalloc.h"
 
 /* Lock (and optionally join) two inodes for a file range exchange. */
 void
@@ -370,12 +371,58 @@ xfs_swap_extent_forks(
 	return 0;
 }
 
+/*
+ * There may be partially written rt extents lurking in the ranges to be
+ * swapped.  According to the rules for realtime files with big rt extents, we
+ * must guarantee that an outside observer (an IO thread, realistically) never
+ * can see multiple physical rt extents mapped to the same logical file rt
+ * extent.  The deferred bmap log intent items that we use under the hood
+ * operate on single block mappings and not rt extents, which means we must
+ * have a strategy to ensure that log recovery after a failure won't stop in
+ * the middle of an rt extent.
+ *
+ * The preferred strategy is to use deferred extent swap log intent items to
+ * track the status of the overall swap operation so that we can complete the
+ * work during crash recovery.  If that isn't possible, we fall back to
+ * requiring the selected mappings in both forks to be aligned to rt extent
+ * boundaries.  As an aside, the old fork swap routine didn't have this
+ * requirement, but at an extreme cost in flexibility (full files only, and no
+ * support if rmapbt is enabled).
+ */
+static bool
+xfs_xchg_range_need_rt_conversion(
+	struct xfs_inode	*ip,
+	unsigned int		xchg_flags)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+
+	/*
+	 * Caller got permission to use logged swapext, so log recovery will
+	 * finish the swap and not leave us with partially swapped rt extents
+	 * exposed to userspace.
+	 */
+	if (xchg_flags & XFS_XCHG_RANGE_LOGGED)
+		return false;
+
+	/*
+	 * If we can't use log intent items at all, the only supported
+	 * operation is full fork swaps, so no conversions are needed.
+	 * The range requirements are enforced by the swapext code itself.
+	 */
+	if (!xfs_swapext_supported(mp))
+		return false;
+
+	/* Conversion is only needed for realtime files with big rt extents */
+	return xfs_inode_has_bigrtextents(ip);
+}
+
 /* Prepare two files to have their data exchanged. */
 int
 xfs_xchg_range_prep(
 	struct file		*file1,
 	struct file		*file2,
-	struct file_xchg_range	*fxr)
+	struct file_xchg_range	*fxr,
+	unsigned int		xchg_flags)
 {
 	struct xfs_inode	*ip1 = XFS_I(file_inode(file1));
 	struct xfs_inode	*ip2 = XFS_I(file_inode(file2));
@@ -439,6 +486,19 @@ xfs_xchg_range_prep(
 		return error;
 	}
 
+	/* Convert unwritten sub-extent mappings if required. */
+	if (xfs_xchg_range_need_rt_conversion(ip2, xchg_flags)) {
+		error = xfs_rtfile_convert_unwritten(ip2, fxr->file2_offset,
+				fxr->length);
+		if (error)
+			return error;
+
+		error = xfs_rtfile_convert_unwritten(ip1, fxr->file1_offset,
+				fxr->length);
+		if (error)
+			return error;
+	}
+
 	return 0;
 }
 
@@ -656,6 +716,15 @@ xfs_xchg_range(
 	if (xchg_flags & XFS_XCHG_RANGE_LOGGED)
 		req.req_flags |= XFS_SWAP_REQ_LOGGED;
 
+	/*
+	 * Round the request length up to the nearest fundamental unit of
+	 * allocation.  The prep function already checked that the request
+	 * offsets and length in @fxr are safe to round up.
+	 */
+	if (XFS_IS_REALTIME_INODE(ip2))
+		req.blockcount = roundup_64(req.blockcount,
+				mp->m_sb.sb_rextsize);
+
 	error = xfs_xchg_range_estimate(&req);
 	if (error)
 		return error;
diff --git a/fs/xfs/xfs_xchgrange.h b/fs/xfs/xfs_xchgrange.h
index a0e64408784a..e356fe09a40c 100644
--- a/fs/xfs/xfs_xchgrange.h
+++ b/fs/xfs/xfs_xchgrange.h
@@ -35,6 +35,6 @@ void xfs_xchg_range_rele_log_assist(struct xfs_mount *mp);
 int xfs_xchg_range(struct xfs_inode *ip1, struct xfs_inode *ip2,
 		const struct file_xchg_range *fxr, unsigned int xchg_flags);
 int xfs_xchg_range_prep(struct file *file1, struct file *file2,
-		struct file_xchg_range *fxr);
+		struct file_xchg_range *fxr, unsigned int xchg_flags);
 
 #endif /* __XFS_XCHGRANGE_H__ */

From patchwork Fri Dec 30 22:13:58 2022
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13084987
Subject: [PATCH 20/21] xfs: support non-power-of-two rtextsize with
 exchange-range
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-api@vger.kernel.org
Date: Fri, 30 Dec 2022 14:13:58 -0800
Message-ID: <167243843816.699466.13150984573684791764.stgit@magnolia>
In-Reply-To: <167243843494.699466.5163281976943635014.stgit@magnolia>
References: <167243843494.699466.5163281976943635014.stgit@magnolia>
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Darrick J. Wong

The VFS exchange-range alignment checks use (fast) bitmasks to perform
block alignment checks on the exchange parameters.  Unfortunately,
bitmasks require that the alignment size be a power of two.  This isn't
true for realtime devices, so we have to copy the VFS checks using long
division for this to work properly.

Signed-off-by: Darrick J. Wong
---
 fs/xfs/xfs_xchgrange.c |  102 +++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 91 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/xfs_xchgrange.c b/fs/xfs/xfs_xchgrange.c
index 6a66d09099b0..ae030a6f607e 100644
--- a/fs/xfs/xfs_xchgrange.c
+++ b/fs/xfs/xfs_xchgrange.c
@@ -416,6 +416,86 @@ xfs_xchg_range_need_rt_conversion(
 	return xfs_inode_has_bigrtextents(ip);
 }
 
+/*
+ * Check the alignment of an exchange request when the allocation unit size
+ * isn't a power of two.  The VFS helpers use (fast) bitmask-based alignment
+ * checks, but here we have to use slow long division.
+ */
+static int
+xfs_xchg_range_check_rtalign(
+	struct xfs_inode		*ip1,
+	struct xfs_inode		*ip2,
+	const struct file_xchg_range	*fxr)
+{
+	struct xfs_mount		*mp = ip1->i_mount;
+	uint32_t			rextbytes;
+	uint64_t			length = fxr->length;
+	uint64_t			blen;
+	loff_t				size1, size2;
+
+	rextbytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize);
+	size1 = i_size_read(VFS_I(ip1));
+	size2 = i_size_read(VFS_I(ip2));
+
+	/* The start of both ranges must be aligned to a rt extent. */
+	if (!isaligned_64(fxr->file1_offset, rextbytes) ||
+	    !isaligned_64(fxr->file2_offset, rextbytes))
+		return -EINVAL;
+
+	/*
+	 * If the caller asked for full files, check that the offset/length
+	 * values cover all of both files.
+	 */
+	if ((fxr->flags & FILE_XCHG_RANGE_FULL_FILES) &&
+	    (fxr->file1_offset != 0 || fxr->file2_offset != 0 ||
+	     fxr->length != size1 || fxr->length != size2))
+		return -EDOM;
+
+	if (fxr->flags & FILE_XCHG_RANGE_TO_EOF)
+		length = max_t(int64_t, size1 - fxr->file1_offset,
+					size2 - fxr->file2_offset);
+
+	/*
+	 * If the user wanted us to exchange up to the infile's EOF, round up
+	 * to the next rt extent boundary for this check.  Do the same for the
+	 * outfile.
+	 *
+	 * Otherwise, reject the range length if it's not rt extent aligned.
+	 * We already confirmed the starting offsets' rt extent block
+	 * alignment.
+	 */
+	if (fxr->file1_offset + length == size1)
+		blen = roundup_64(size1, rextbytes) - fxr->file1_offset;
+	else if (fxr->file2_offset + length == size2)
+		blen = roundup_64(size2, rextbytes) - fxr->file2_offset;
+	else if (!isaligned_64(length, rextbytes))
+		return -EINVAL;
+	else
+		blen = length;
+
+	/* Don't allow overlapped exchanges within the same file. */
+	if (ip1 == ip2 &&
+	    fxr->file2_offset + blen > fxr->file1_offset &&
+	    fxr->file1_offset + blen > fxr->file2_offset)
+		return -EINVAL;
+
+	/*
+	 * Ensure that we don't exchange a partial EOF rt extent into the
+	 * middle of another file.
+	 */
+	if (isaligned_64(length, rextbytes))
+		return 0;
+
+	blen = length;
+	if (fxr->file2_offset + length < size2)
+		blen = rounddown_64(blen, rextbytes);
+
+	if (fxr->file1_offset + blen < size1)
+		blen = rounddown_64(blen, rextbytes);
+
+	return blen == length ? 0 : -EINVAL;
+}
+
 /* Prepare two files to have their data exchanged. */
 int
 xfs_xchg_range_prep(
@@ -426,6 +506,7 @@ xfs_xchg_range_prep(
 {
 	struct xfs_inode	*ip1 = XFS_I(file_inode(file1));
 	struct xfs_inode	*ip2 = XFS_I(file_inode(file2));
+	unsigned int		alloc_unit = xfs_inode_alloc_unitsize(ip2);
 	int			error;
 
 	trace_xfs_xchg_range_prep(ip1, fxr, ip2, 0);
@@ -434,18 +515,17 @@ xfs_xchg_range_prep(
 	if (XFS_IS_REALTIME_INODE(ip1) != XFS_IS_REALTIME_INODE(ip2))
 		return -EINVAL;
 
-	/*
-	 * The alignment checks in the VFS helpers cannot deal with allocation
-	 * units that are not powers of 2.  This can happen with the realtime
-	 * volume if the extent size is set.  Note that alignment checks are
-	 * skipped if FULL_FILES is set.
-	 */
-	if (!(fxr->flags & FILE_XCHG_RANGE_FULL_FILES) &&
-	    !is_power_of_2(xfs_inode_alloc_unitsize(ip2)))
-		return -EOPNOTSUPP;
+	/* Check non-power of two alignment issues, if necessary. */
+	if (XFS_IS_REALTIME_INODE(ip2) && !is_power_of_2(alloc_unit)) {
+		error = xfs_xchg_range_check_rtalign(ip1, ip2, fxr);
+		if (error)
+			return error;
 
-	error = generic_xchg_file_range_prep(file1, file2, fxr,
-			xfs_inode_alloc_unitsize(ip2));
+		/* Do the VFS checks with the regular block alignment. */
+		alloc_unit = ip1->i_mount->m_sb.sb_blocksize;
+	}
+
+	error = generic_xchg_file_range_prep(file1, file2, fxr, alloc_unit);
 	if (error || fxr->length == 0)
 		return error;

From patchwork Fri Dec 30 22:13:58 2022
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13084988
Subject: [PATCH 21/21] xfs: enable atomic swapext feature
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-api@vger.kernel.org
Date: Fri, 30 Dec 2022 14:13:58 -0800
Message-ID: <167243843830.699466.7903927007917457600.stgit@magnolia>
In-Reply-To: <167243843494.699466.5163281976943635014.stgit@magnolia>
References: <167243843494.699466.5163281976943635014.stgit@magnolia>
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Darrick J. Wong

Add the atomic swapext feature to the set of features that we will
permit.

Signed-off-by: Darrick J. Wong
---
 fs/xfs/libxfs/xfs_format.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index bb8bff488017..0c457905cce5 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -393,7 +393,8 @@ xfs_sb_has_incompat_feature(
 #define XFS_SB_FEAT_INCOMPAT_LOG_XATTRS   (1 << 0)	/* Delayed Attributes */
 #define XFS_SB_FEAT_INCOMPAT_LOG_SWAPEXT  (1U << 31)	/* file extent swap */
 #define XFS_SB_FEAT_INCOMPAT_LOG_ALL \
-	(XFS_SB_FEAT_INCOMPAT_LOG_XATTRS)
+	(XFS_SB_FEAT_INCOMPAT_LOG_XATTRS | \
+	 XFS_SB_FEAT_INCOMPAT_LOG_SWAPEXT)
 #define XFS_SB_FEAT_INCOMPAT_LOG_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_LOG_ALL
 
 static inline bool
 xfs_sb_has_incompat_log_feature(