Message ID | bda8a1de78c3d938a71a816401f96f0e0d6c3f72.1733929328.git.fdmanana@suse.com (mailing list archive) |
---|---|
State | New |
Series | btrfs: fixes around swap file activation and cleanups |
On 2024/12/12 01:34, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> When activating a swap file, to determine if an extent is shared we use
> can_nocow_extent(), which ends up at btrfs_cross_ref_exist(). That helper
> is meant to be quick because it's used in the NOCOW write path, when
> flushing delalloc and when doing a direct IO write, however it does return
> some false positives, meaning it may indicate that an extent is shared
> even if it's no longer the case. For the write path this is fine, we just
> do an unnecessary COW operation instead of doing a more rigorous check
> which would be too heavy (calling btrfs_is_data_extent_shared()).
>
> However when activating a swap file, the false positives simply result
> in a failure, which is confusing for users/applications. One particular
> case where this happens is when a data extent only has 1 reference but
> that reference is not inlined in the extent item located in the extent
> tree - this happens when we create more than 33 references for an extent
> and then delete those 33 references plus every other non-inline reference
> except one. The function check_committed_ref() assumes that if the size
> of an extent item doesn't match the size of struct btrfs_extent_item
> plus the size of an inline reference (plus an owner reference in case
> simple quotas are enabled), then the extent is shared - that is not the
> case however, we can have a single reference but it's not inlined - the
> reason we do this is to be fast and avoid inspecting non-inline references
> which may be located in another leaf of the extent tree, slowing down
> write paths.
>
> The following test script reproduces the bug:
>
>   $ cat test.sh
>   #!/bin/bash
>
>   DEV=/dev/sdi
>   MNT=/mnt/sdi
>   NUM_CLONES=50
>
>   umount $DEV &> /dev/null
>
>   run_test()
>   {
>      local sync_after_add_reflinks=$1
>      local sync_after_remove_reflinks=$2
>
>      mkfs.btrfs -f $DEV > /dev/null
>      #mkfs.xfs -f $DEV > /dev/null
>      mount $DEV $MNT
>
>      touch $MNT/foo
>      chmod 0600 $MNT/foo
>      # On btrfs the file must be NOCOW.
>      chattr +C $MNT/foo &> /dev/null
>      xfs_io -s -c "pwrite -b 1M 0 1M" $MNT/foo
>      mkswap $MNT/foo
>
>      for ((i = 1; i <= $NUM_CLONES; i++)); do
>          touch $MNT/foo_clone_$i
>          chmod 0600 $MNT/foo_clone_$i
>          # On btrfs the file must be NOCOW.
>          chattr +C $MNT/foo_clone_$i &> /dev/null
>          cp --reflink=always $MNT/foo $MNT/foo_clone_$i
>      done
>
>      if [ $sync_after_add_reflinks -ne 0 ]; then
>          # Flush delayed refs and commit current transaction.
>          sync -f $MNT
>      fi
>
>      # Remove the original file and all clones except the last.
>      rm -f $MNT/foo
>      for ((i = 1; i < $NUM_CLONES; i++)); do
>          rm -f $MNT/foo_clone_$i
>      done
>
>      if [ $sync_after_remove_reflinks -ne 0 ]; then
>          # Flush delayed refs and commit current transaction.
>          sync -f $MNT
>      fi
>
>      # Now use the last clone as a swap file. It should work since
>      # its extents are not shared anymore.
>      swapon $MNT/foo_clone_${NUM_CLONES}
>      swapoff $MNT/foo_clone_${NUM_CLONES}
>
>      umount $MNT
>   }
>
>   echo -e "\nTest without sync after creating and removing clones"
>   run_test 0 0
>
>   echo -e "\nTest with sync after creating clones"
>   run_test 1 0
>
>   echo -e "\nTest with sync after removing clones"
>   run_test 0 1
>
>   echo -e "\nTest with sync after creating and removing clones"
>   run_test 1 1
>
> Running the test:
>
>   $ ./test.sh
>   Test without sync after creating and removing clones
>   wrote 1048576/1048576 bytes at offset 0
>   1 MiB, 1 ops; 0.0017 sec (556.793 MiB/sec and 556.7929 ops/sec)
>   Setting up swapspace version 1, size = 1020 KiB (1044480 bytes)
>   no label, UUID=a6b9c29e-5ef4-4689-a8ac-bc199c750f02
>   swapon: /mnt/sdi/foo_clone_50: swapon failed: Invalid argument
>   swapoff: /mnt/sdi/foo_clone_50: swapoff failed: Invalid argument
>
>   Test with sync after creating clones
>   wrote 1048576/1048576 bytes at offset 0
>   1 MiB, 1 ops; 0.0036 sec (271.739 MiB/sec and 271.7391 ops/sec)
>   Setting up swapspace version 1, size = 1020 KiB (1044480 bytes)
>   no label, UUID=5e9008d6-1f7a-4948-a1b4-3f30aba20a33
>   swapon: /mnt/sdi/foo_clone_50: swapon failed: Invalid argument
>   swapoff: /mnt/sdi/foo_clone_50: swapoff failed: Invalid argument
>
>   Test with sync after removing clones
>   wrote 1048576/1048576 bytes at offset 0
>   1 MiB, 1 ops; 0.0103 sec (96.665 MiB/sec and 96.6651 ops/sec)
>   Setting up swapspace version 1, size = 1020 KiB (1044480 bytes)
>   no label, UUID=916c2740-fa9f-4385-9f06-29c3f89e4764
>
>   Test with sync after creating and removing clones
>   wrote 1048576/1048576 bytes at offset 0
>   1 MiB, 1 ops; 0.0031 sec (314.268 MiB/sec and 314.2678 ops/sec)
>   Setting up swapspace version 1, size = 1020 KiB (1044480 bytes)
>   no label, UUID=06aab1dd-4d90-49c0-bd9f-3a8db4e2f912
>   swapon: /mnt/sdi/foo_clone_50: swapon failed: Invalid argument
>   swapoff: /mnt/sdi/foo_clone_50: swapoff failed: Invalid argument
>
> Fix this by reworking btrfs_swap_activate() so that, instead of using
> extent maps and checking for shared extents with can_nocow_extent(), it
> iterates over the inode's file extent items and uses the accurate
> btrfs_is_data_extent_shared().
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

The patch looks good to me.

Reviewed-by: Qu Wenruo <wqu@suse.com>

Although there is no mention of why we get rid of btrfs_get_extent().

I guess it's to avoid caching those extent maps to save space? Just like
what we did in fiemap.
Thanks,
Qu
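For readers who want to see the failure mode concretely: the fast check described in the commit message is purely size based. The following stand-alone sketch (user-space C with made-up struct sizes; it is not the kernel's check_committed_ref()) models how an extent whose single remaining reference is non-inline (keyed) gets classified as "maybe shared":

    /*
     * User-space model of the fast check described in the commit message
     * (not kernel code). It only looks at the extent item size: anything
     * other than "extent item + exactly one inline ref (+ owner ref when
     * simple quotas are enabled)" is treated as possibly shared, even when
     * the only remaining reference is a non-inline (keyed) one.
     */
    #include <stdbool.h>
    #include <stdio.h>

    struct mock_extent {
            unsigned long item_size;   /* size of the extent item payload */
            unsigned long refs;        /* true total number of references */
    };

    /* Hypothetical sizes, for illustration only. */
    #define EXTENT_ITEM_SIZE 24
    #define INLINE_REF_SIZE  37
    #define OWNER_REF_SIZE   12

    static bool fast_check_says_shared(const struct mock_extent *e, bool simple_quotas)
    {
            unsigned long expected = EXTENT_ITEM_SIZE + INLINE_REF_SIZE;

            if (simple_quotas)
                    expected += OWNER_REF_SIZE;

            /* Anything other than "exactly one inline ref" is treated as shared. */
            return e->item_size != expected;
    }

    int main(void)
    {
            /* One inline ref left: fast check and reality agree (not shared). */
            struct mock_extent inline_single = { EXTENT_ITEM_SIZE + INLINE_REF_SIZE, 1 };
            /* One keyed ref left: no inline refs in the item, so the fast
             * check reports "shared" although refs == 1. */
            struct mock_extent keyed_single = { EXTENT_ITEM_SIZE, 1 };

            printf("inline single ref -> shared? %d (refs=%lu)\n",
                   fast_check_says_shared(&inline_single, false), inline_single.refs);
            printf("keyed single ref  -> shared? %d (refs=%lu)\n",
                   fast_check_says_shared(&keyed_single, false), keyed_single.refs);
            return 0;
    }

This false positive is harmless for NOCOW writes (it merely forces a COW), but for swap file activation it turns into the -EINVAL seen in the test output above.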
On Fri, Dec 13, 2024 at 8:52 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
> [...]
>
> The patch looks good to me.
>
> Reviewed-by: Qu Wenruo <wqu@suse.com>
>
> Although there is no mention of why we get rid of btrfs_get_extent().
>
> I guess it's to avoid caching those extent maps to save space? Just like
> what we did in fiemap.

It's not so much about avoiding loading extent maps unnecessarily, but
about correctness. Extent maps can be merged, and we need to look at the
individual extents, because some of the extents that were merged may be
shared while others are not, and vice versa.

Otherwise it would be a bug, similar to the one we had with fiemap, fixed
in commit ac3c0d36a2a2f7a8f9778faef3f2639f5bf29d44, for which there's test
case generic/702.

That's why we iterate over extent items and not extent maps.

Thanks.
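To illustrate the point about per-extent granularity, here is a rough stand-alone model (user-space C, invented offsets and values, not the kernel loop) of iterating file extent items by file offset and giving each extent its own shared/not-shared verdict, which a merged extent map covering several items could not provide:

    /*
     * User-space model (not kernel code): walk file extent items by file
     * offset and check sharedness per extent, instead of asking a single
     * question about a merged extent map range.
     */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Mock file extent item: [offset, offset + num_bytes) backed by disk_bytenr. */
    struct mock_extent_item {
            unsigned long long offset;
            unsigned long long num_bytes;
            unsigned long long disk_bytenr;
            bool shared;   /* what a per-extent sharedness check would report */
    };

    int main(void)
    {
            const unsigned long long isize = 3 * 1048576ULL;
            /* Two contiguous extents: only the first still has a reflinked clone. */
            const struct mock_extent_item items[] = {
                    { 0,       1 * 1048576ULL, 13631488, true  },
                    { 1048576, 2 * 1048576ULL, 14680064, false },
            };
            unsigned long long prev_extent_end = 0;
            size_t i = 0;

            while (prev_extent_end < isize && i < sizeof(items) / sizeof(items[0])) {
                    const struct mock_extent_item *ei = &items[i++];

                    prev_extent_end = ei->offset + ei->num_bytes;
                    printf("extent at file offset %llu (disk %llu): %s\n",
                           ei->offset, ei->disk_bytenr,
                           ei->shared ? "shared -> reject swapfile"
                                      : "not shared -> ok");
            }
            return 0;
    }

A merged extent map spanning both ranges would have to return one answer for the whole range, losing exactly the distinction the swap activation check needs.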
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 926d82fbdbae..7ddb8a01b60f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9799,15 +9799,16 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
         struct btrfs_fs_info *fs_info = root->fs_info;
         struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
         struct extent_state *cached_state = NULL;
-        struct extent_map *em = NULL;
         struct btrfs_chunk_map *map = NULL;
         struct btrfs_device *device = NULL;
         struct btrfs_swap_info bsi = {
                 .lowest_ppage = (sector_t)-1ULL,
         };
+        struct btrfs_backref_share_check_ctx *backref_ctx = NULL;
+        struct btrfs_path *path = NULL;
         int ret = 0;
         u64 isize;
-        u64 start;
+        u64 prev_extent_end = 0;
 
         /*
          * Acquire the inode's mmap lock to prevent races with memory mapped
@@ -9846,6 +9847,13 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
                 goto out_unlock_mmap;
         }
 
+        path = btrfs_alloc_path();
+        backref_ctx = btrfs_alloc_backref_share_check_ctx();
+        if (!path || !backref_ctx) {
+                ret = -ENOMEM;
+                goto out_unlock_mmap;
+        }
+
         /*
          * Balance or device remove/replace/resize can move stuff around from
          * under us. The exclop protection makes sure they aren't running/won't
@@ -9904,24 +9912,39 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
         isize = ALIGN_DOWN(inode->i_size, fs_info->sectorsize);
 
         lock_extent(io_tree, 0, isize - 1, &cached_state);
-        start = 0;
-        while (start < isize) {
-                u64 logical_block_start, physical_block_start;
+        while (prev_extent_end < isize) {
+                struct btrfs_key key;
+                struct extent_buffer *leaf;
+                struct btrfs_file_extent_item *ei;
                 struct btrfs_block_group *bg;
-                u64 len = isize - start;
+                u64 logical_block_start;
+                u64 physical_block_start;
+                u64 extent_gen;
+                u64 disk_bytenr;
+                u64 len;
 
-                em = btrfs_get_extent(BTRFS_I(inode), NULL, start, len);
-                if (IS_ERR(em)) {
-                        ret = PTR_ERR(em);
+                key.objectid = btrfs_ino(BTRFS_I(inode));
+                key.type = BTRFS_EXTENT_DATA_KEY;
+                key.offset = prev_extent_end;
+
+                ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+                if (ret < 0)
                         goto out;
-                }
 
-                if (em->disk_bytenr == EXTENT_MAP_HOLE) {
+                /*
+                 * If key not found it means we have an implicit hole (NO_HOLES
+                 * is enabled).
+                 */
+                if (ret > 0) {
                         btrfs_warn(fs_info, "swapfile must not have holes");
                         ret = -EINVAL;
                         goto out;
                 }
-                if (em->disk_bytenr == EXTENT_MAP_INLINE) {
+
+                leaf = path->nodes[0];
+                ei = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_file_extent_item);
+
+                if (btrfs_file_extent_type(leaf, ei) == BTRFS_FILE_EXTENT_INLINE) {
                         /*
                          * It's unlikely we'll ever actually find ourselves
                          * here, as a file small enough to fit inline won't be
@@ -9933,23 +9956,45 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
                         ret = -EINVAL;
                         goto out;
                 }
-                if (extent_map_is_compressed(em)) {
+
+                if (btrfs_file_extent_compression(leaf, ei) != BTRFS_COMPRESS_NONE) {
                         btrfs_warn(fs_info, "swapfile must not be compressed");
                         ret = -EINVAL;
                         goto out;
                 }
 
-                logical_block_start = extent_map_block_start(em) + (start - em->start);
-                len = min(len, em->len - (start - em->start));
-                free_extent_map(em);
-                em = NULL;
+                disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, ei);
+                if (disk_bytenr == 0) {
+                        btrfs_warn(fs_info, "swapfile must not have holes");
+                        ret = -EINVAL;
+                        goto out;
+                }
+
+                logical_block_start = disk_bytenr + btrfs_file_extent_offset(leaf, ei);
+                extent_gen = btrfs_file_extent_generation(leaf, ei);
+                prev_extent_end = btrfs_file_extent_end(path);
+
+                if (prev_extent_end > isize)
+                        len = isize - key.offset;
+                else
+                        len = btrfs_file_extent_num_bytes(leaf, ei);
 
-                ret = can_nocow_extent(inode, start, &len, NULL, false, true);
+                backref_ctx->curr_leaf_bytenr = leaf->start;
+
+                /*
+                 * Don't need the path anymore, release to avoid deadlocks when
+                 * calling btrfs_is_data_extent_shared() because when joining a
+                 * transaction it can block waiting for the current one's commit
+                 * which in turn may be trying to lock the same leaf to flush
+                 * delayed items for example.
+                 */
+                btrfs_release_path(path);
+
+                ret = btrfs_is_data_extent_shared(BTRFS_I(inode), disk_bytenr,
+                                                  extent_gen, backref_ctx);
                 if (ret < 0) {
                         goto out;
-                } else if (ret) {
-                        ret = 0;
-                } else {
+                } else if (ret > 0) {
                         btrfs_warn(fs_info,
                                    "swapfile must not be copy-on-write");
                         ret = -EINVAL;
@@ -9984,7 +10029,6 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 
                 physical_block_start = (map->stripes[0].physical +
                                         (logical_block_start - map->start));
-                len = min(len, map->chunk_len - (logical_block_start - map->start));
                 btrfs_free_chunk_map(map);
                 map = NULL;
 
@@ -10025,20 +10069,16 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
                                 if (ret)
                                         goto out;
                         }
-                        bsi.start = start;
+                        bsi.start = key.offset;
                         bsi.block_start = physical_block_start;
                         bsi.block_len = len;
                 }
-
-                start += len;
         }
 
         if (bsi.block_len)
                 ret = btrfs_add_swap_extent(sis, &bsi);
 
 out:
-        if (!IS_ERR_OR_NULL(em))
-                free_extent_map(em);
         if (!IS_ERR_OR_NULL(map))
                 btrfs_free_chunk_map(map);
 
@@ -10053,6 +10093,8 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 
 out_unlock_mmap:
         up_write(&BTRFS_I(inode)->i_mmap_lock);
+        btrfs_free_backref_share_ctx(backref_ctx);
+        btrfs_free_path(path);
         if (ret)
                 return ret;
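One small detail of the new loop that is easy to miss: the last file extent item may extend beyond the sector-aligned i_size, so the length fed to the swap extent is clamped. A minimal user-space model of that clamping (illustrative values only, not the kernel code):

    /* Minimal model of the tail-extent clamping done by the new loop. */
    #include <stdio.h>

    int main(void)
    {
            const unsigned long long isize = 1048576;      /* aligned i_size */
            const unsigned long long key_offset = 0;       /* extent's file offset */
            const unsigned long long num_bytes = 1310720;  /* full extent length */
            const unsigned long long extent_end = key_offset + num_bytes;
            unsigned long long len;

            /* Mirrors: if (prev_extent_end > isize) len = isize - key.offset; */
            if (extent_end > isize)
                    len = isize - key_offset;
            else
                    len = num_bytes;

            printf("len used for the swap extent: %llu\n", len);
            return 0;
    }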