Message ID | 20240918140850.27261-1-jth@kernel.org (mailing list archive) |
---|---|
Headers | show |
Series | Add dummy RAID stripe tree entries for PREALLOC data | expand |
在 2024/9/18 23:38, Johannes Thumshirn 写道: > This is an RFC implementation of Qu's request to be able to > distinguish preallocated extents in the stripe tree for scrub. > > It's not 100% working yet but only showing the basic "how it's going to > look like". > > I'm not really sure this is a better solution than returning ENOENT > and ignoring it in scrub. If RST without zoned supports preallocation and NOCOW, then I think we should not just insert a dummy, but a real RST entries. So that NOCOW/preallocated writes can just re-use the existing RST entry, without any new RST updates. At least logically it makes more sense. At least a quick glance into the handling shows, NOCOW writes doesn't trigger any RST updates, so I guess it should also apply to PREALLOCATED writes too. Or did I miss something else? Thanks, Qu > > A third possibility would be to do a full backref walk on > btrfs_map_block() error and then check if it's a preallocated extent. > > Johannes Thumshirn (2): > btrfs: introduce dummy RAID stripes for preallocated data > btrfs: insert dummy RAID stripe extents for preallocated data > > fs/btrfs/inode.c | 47 +++++++++++++++++++++++++++++++++ > fs/btrfs/raid-stripe-tree.c | 47 +++++++++++++++++++++++++++++++++ > fs/btrfs/raid-stripe-tree.h | 2 ++ > include/uapi/linux/btrfs_tree.h | 1 + > 4 files changed, 97 insertions(+) >
On 19.09.24 01:46, Qu Wenruo wrote: > 在 2024/9/18 23:38, Johannes Thumshirn 写道: >> This is an RFC implementation of Qu's request to be able to >> distinguish preallocated extents in the stripe tree for scrub. >> >> It's not 100% working yet but only showing the basic "how it's going to >> look like". >> >> I'm not really sure this is a better solution than returning ENOENT >> and ignoring it in scrub. > > If RST without zoned supports preallocation and NOCOW, then I think we > should not just insert a dummy, but a real RST entries. > > So that NOCOW/preallocated writes can just re-use the existing RST > entry, without any new RST updates. > > At least logically it makes more sense. > > At least a quick glance into the handling shows, NOCOW writes doesn't > trigger any RST updates, so I guess it should also apply to PREALLOCATED > writes too. Yup, got a fix for NOCOW. That again was a stupid mistake because I only had zoned in mind. For prealloc I don't this I should add RST entries, because there's nothing on disk for these ranges. It's pre-allocated after all. Once we write into the range, RST entries will be created: virtme-scsi:/mnt # xfs_io -fc "falloc 0 1M" test virtme-scsi:/mnt # btrfs ins dump-tree -t extent /dev/sda | grep -A 1 EXTENT_ITEM item 13 key (298844160 EXTENT_ITEM 1048576) itemoff 15828 itemsize 53 refs 1 gen 8 flags DATA virtme-scsi:/mnt # btrfs ins dump-tree -t fs /dev/sda | grep -A 3 EXTENT_DATA item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53 generation 8 type 2 (prealloc) prealloc data disk byte 298844160 nr 1048576 prealloc data offset 0 nr 1048576 virtme-scsi:/mnt # btrfs ins dump-tree -t raid-stripe /dev/sda btrfs-progs v6.9 raid stripe tree key (RAID_STRIPE_TREE ROOT_ITEM 0) leaf 5259264 items 0 free space 16283 generation 3 owner RAID_STRIPE_TREE leaf 5259264 flags 0x1(WRITTEN) backref revision 1 checksum stored f54ade94 checksum calced f54ade94 fs uuid bebf1755-9379-4ac8-a623-ad0dc52641cf chunk uuid 463f7b1d-c0b8-4373-8334-52f5bf83475e total bytes 21474836480 bytes used 1212416 uuid bebf1755-9379-4ac8-a623-ad0dc52641cf virtme-scsi:/mnt # xfs_io -fc "pwrite 64k 64k" test wrote 65536/65536 bytes at offset 65536 64 KiB, 16 ops; 0.0003 sec (183.284 MiB/sec and 46920.8211 ops/sec) virtme-scsi:/mnt # btrfs ins dump-tree -t extent /dev/sda | grep -A 1 EXTENT_ITEM item 13 key (298844160 EXTENT_ITEM 1048576) itemoff 15828 itemsize 53 refs 3 gen 8 flags DATA virtme-scsi:/mnt # btrfs ins dump-tree -t fs /dev/sda | grep -A 3 EXTENT_DATA item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 298844160 nr 1048576 prealloc data offset 0 nr 65536 item 7 key (257 EXTENT_DATA 65536) itemoff 15763 itemsize 53 generation 9 type 1 (regular) extent data disk byte 298844160 nr 1048576 extent data offset 65536 nr 65536 ram 1048576 -- item 8 key (257 EXTENT_DATA 131072) itemoff 15710 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 298844160 nr 1048576 prealloc data offset 131072 nr 917504 virtme-scsi:/mnt # btrfs ins dump-tree -t raid-stripe /dev/sda btrfs-progs v6.9 raid stripe tree key (RAID_STRIPE_TREE ROOT_ITEM 0) leaf 30638080 items 1 free space 16226 generation 9 owner RAID_STRIPE_TREE leaf 30638080 flags 0x1(WRITTEN) backref revision 1 checksum stored 9ed94b4d checksum calced 9ed94b4d fs uuid bebf1755-9379-4ac8-a623-ad0dc52641cf chunk uuid 463f7b1d-c0b8-4373-8334-52f5bf83475e item 0 key (298909696 RAID_STRIPE 65536) itemoff 16251 itemsize 32 stripe 0 devid 1 physical 298909696 stripe 1 devid 2 physical 277938176 total bytes 21474836480 bytes used 1212416 uuid bebf1755-9379-4ac8-a623-ad0dc52641cf
On 18.09.2024 16:08, Johannes Thumshirn wrote: > This is an RFC implementation of Qu's request to be able to > distinguish preallocated extents in the stripe tree for scrub. > > It's not 100% working yet but only showing the basic "how it's going to > look like". > > I'm not really sure this is a better solution than returning ENOENT > and ignoring it in scrub. > > A third possibility would be to do a full backref walk on > btrfs_map_block() error and then check if it's a preallocated extent. > > Johannes Thumshirn (2): > btrfs: introduce dummy RAID stripes for preallocated data > btrfs: insert dummy RAID stripe extents for preallocated data > > fs/btrfs/inode.c | 47 +++++++++++++++++++++++++++++++++ > fs/btrfs/raid-stripe-tree.c | 47 +++++++++++++++++++++++++++++++++ > fs/btrfs/raid-stripe-tree.h | 2 ++ > include/uapi/linux/btrfs_tree.h | 1 + > 4 files changed, 97 insertions(+) > Will this also solve the compression topic (see subject "BTRFS doesn't compress on the fly") for e.g. PostgreSQL (which preallocates)? Thnx. Ciao, Gerhard