Message ID | c9979f47d503ce623e9e8b8d1fb32188844c1990.1675853489.git.johannes.thumshirn@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Series | btrfs: introduce RAID stripe tree |
On Wed, Feb 08, 2023 at 02:57:39AM -0800, Johannes Thumshirn wrote:
> Add definitions for the raid stripe tree. This tree will hold information
> about the on-disk layout of the stripes in a RAID set.
>
> Each stripe extent has a 1:1 relationship with an on-disk extent item and
> is doing the logical to per-drive physical address translation for the
> extent item in question.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef
On Wed, Feb 08, 2023 at 02:57:39AM -0800, Johannes Thumshirn wrote:
> Add definitions for the raid stripe tree. This tree will hold information
> about the on-disk layout of the stripes in a RAID set.
>
> Each stripe extent has a 1:1 relationship with an on-disk extent item and
> is doing the logical to per-drive physical address translation for the
> extent item in question.

So this basically removes the need to track the physical address in
the chunk tree. Is there any way to stop maintaining it at all?
If not, why?
On 13.02.23 07:50, Christoph Hellwig wrote:
> On Wed, Feb 08, 2023 at 02:57:39AM -0800, Johannes Thumshirn wrote:
>> Add definitions for the raid stripe tree. This tree will hold information
>> about the on-disk layout of the stripes in a RAID set.
>>
>> Each stripe extent has a 1:1 relationship with an on-disk extent item and
>> is doing the logical to per-drive physical address translation for the
>> extent item in question.
>
> So this basically removes the need to track the physical address in
> the chunk tree. Is there any way to stop maintaining it at all?
> If not, why?

Isn't the chunk tree only storing the physical start of a chunk/block-group?

What we /could/ do is change the absolute physical addresses in the stripe
tree to offsets from the chunk start. On the upside, that would give us the
ability to use u32 instead of u64 and thus shrink the on-disk format, but on
the flip side we'd need to obtain the chunk start addresses and calculate the
offsets on each endio. Classic time-memory tradeoff, I guess.

But then the chunk tree is needed to bootstrap the FS as well. And the RST
is an optional incompatible feature, so the code would get uglier if we had
to distinguish between these two cases.

Josef, David? What are your thoughts on this?
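To make the tradeoff discussed above concrete, here is a minimal userspace sketch comparing the absolute layout from the patch with a chunk-relative one. The names `stride_abs`, `stride_rel`, `stride_rel_encode` and `stride_rel_decode` are hypothetical and not part of the series. The sketch uses host-endian integers instead of `__le64`, and it assumes every per-device offset within a chunk fits in 32 bits, which the thread does not establish.

```c
#include <assert.h>
#include <stdint.h>

/* Layout as in the patch: absolute physical address, 16 bytes per stride. */
struct stride_abs {
	uint64_t devid;
	uint64_t physical;
};

/*
 * Hypothetical alternative: 32-bit offset from the chunk's physical start.
 * Packed so the struct occupies 12 bytes instead of 16.
 */
struct stride_rel {
	uint64_t devid;
	uint32_t offset;
} __attribute__((__packed__));

/* Encode an absolute address; only valid while the offset fits in 32 bits. */
static inline uint32_t stride_rel_encode(uint64_t chunk_start, uint64_t physical)
{
	assert(physical >= chunk_start);
	assert(physical - chunk_start <= UINT32_MAX);
	return (uint32_t)(physical - chunk_start);
}

/*
 * Decode back to an absolute address. In the relative scheme, the caller
 * first has to obtain chunk_start, which is the extra per-endio work
 * mentioned in the reply.
 */
static inline uint64_t stride_rel_decode(uint64_t chunk_start, uint32_t offset)
{
	return chunk_start + offset;
}
```

The relative form saves 4 bytes per stride at the cost of one chunk lookup and one addition on every translation.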
On 08/02/2023 18:57, Johannes Thumshirn wrote:
> Add definitions for the raid stripe tree. This tree will hold information
> about the on-disk layout of the stripes in a RAID set.
>
> Each stripe extent has a 1:1 relationship with an on-disk extent item and
> is doing the logical to per-drive physical address translation for the
> extent item in question.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

LGTM

Reviewed-by: Anand Jain <anand.jain@oracle.com>
diff --git a/fs/btrfs/accessors.h b/fs/btrfs/accessors.h
index ceadfc5d6c66..6e753b63faae 100644
--- a/fs/btrfs/accessors.h
+++ b/fs/btrfs/accessors.h
@@ -306,6 +306,35 @@ BTRFS_SETGET_FUNCS(timespec_nsec, struct btrfs_timespec, nsec, 32);
 BTRFS_SETGET_STACK_FUNCS(stack_timespec_sec, struct btrfs_timespec, sec, 64);
 BTRFS_SETGET_STACK_FUNCS(stack_timespec_nsec, struct btrfs_timespec, nsec, 32);
 
+BTRFS_SETGET_FUNCS(raid_stride_devid, struct btrfs_raid_stride, devid, 64);
+BTRFS_SETGET_FUNCS(raid_stride_physical, struct btrfs_raid_stride, physical, 64);
+BTRFS_SETGET_STACK_FUNCS(stack_raid_stride_devid, struct btrfs_raid_stride, devid, 64);
+BTRFS_SETGET_STACK_FUNCS(stack_raid_stride_physical, struct btrfs_raid_stride, physical, 64);
+
+static inline struct btrfs_raid_stride *btrfs_raid_stride_nr(
+						struct btrfs_stripe_extent *dps, int nr)
+{
+	unsigned long offset = (unsigned long)dps;
+
+	offset += offsetof(struct btrfs_stripe_extent, strides);
+	offset += nr * sizeof(struct btrfs_raid_stride);
+	return (struct btrfs_raid_stride *)offset;
+}
+
+static inline u64 btrfs_raid_stride_devid_nr(const struct extent_buffer *eb,
+					     struct btrfs_stripe_extent *dps,
+					     int nr)
+{
+	return btrfs_raid_stride_devid(eb, btrfs_raid_stride_nr(dps, nr));
+}
+
+static inline u64 btrfs_raid_stride_physical_nr(const struct extent_buffer *eb,
+						struct btrfs_stripe_extent *dps,
+						int nr)
+{
+	return btrfs_raid_stride_physical(eb, btrfs_raid_stride_nr(dps, nr));
+}
+
 /* struct btrfs_dev_extent */
 BTRFS_SETGET_FUNCS(dev_extent_chunk_tree, struct btrfs_dev_extent, chunk_tree, 64);
 BTRFS_SETGET_FUNCS(dev_extent_chunk_objectid, struct btrfs_dev_extent,
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index ab38d0f411fa..64e6bf2a10d8 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -4,9 +4,8 @@
 
 #include <linux/btrfs.h>
 #include <linux/types.h>
-#ifdef __KERNEL__
 #include <linux/stddef.h>
-#else
+#ifndef __KERNEL__
 #include <stddef.h>
 #endif
 
@@ -73,6 +72,9 @@
 /* Holds the block group items for extent tree v2. */
 #define BTRFS_BLOCK_GROUP_TREE_OBJECTID 11ULL
 
+/* tracks RAID stripes in block groups. */
+#define BTRFS_RAID_STRIPE_TREE_OBJECTID 12ULL
+
 /* device stats in the device tree */
 #define BTRFS_DEV_STATS_OBJECTID 0ULL
 
@@ -281,6 +283,8 @@
  */
 #define BTRFS_QGROUP_RELATION_KEY 246
 
+#define BTRFS_RAID_STRIPE_KEY 247
+
 /*
  * Obsolete name, see BTRFS_TEMPORARY_ITEM_KEY.
  */
@@ -715,6 +719,18 @@ struct btrfs_free_space_header {
 	__le64 num_bitmaps;
 } __attribute__ ((__packed__));
 
+struct btrfs_raid_stride {
+	/* btrfs device-id this raid extent lives on */
+	__le64 devid;
+	/* physical location on disk */
+	__le64 physical;
+};
+
+struct btrfs_stripe_extent {
+	/* array of raid strides this stripe is composed of */
+	__DECLARE_FLEX_ARRAY(struct btrfs_raid_stride, strides);
+};
+
 #define BTRFS_HEADER_FLAG_WRITTEN (1ULL << 0)
 #define BTRFS_HEADER_FLAG_RELOC (1ULL << 1)
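The `btrfs_raid_stride_nr()` helper above only computes an offset: the start of the stripe extent item, plus the offset of `strides`, plus `nr` strides. A minimal userspace sketch of the same arithmetic over a plain byte buffer may help illustrate it. Everything here is hypothetical and not kernel API: the buffer stands in for an extent buffer, `read_le64`/`write_le64` stand in for the generated accessors, and deriving the stride count from the item size is an assumption, since the patch defines no count field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct btrfs_raid_stride (two little-endian u64s). */
struct raid_stride {
	uint64_t devid;
	uint64_t physical;
};

/* strides[] is the only member of btrfs_stripe_extent, so it starts at 0. */
#define STRIDES_OFFSET 0u

/* Read a little-endian u64 regardless of host byte order. */
static inline uint64_t read_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Write a little-endian u64 regardless of host byte order. */
static inline void write_le64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

/* Assumption: the number of strides follows from the item size. */
static inline unsigned int num_strides(size_t item_size)
{
	return (unsigned int)((item_size - STRIDES_OFFSET) /
			      sizeof(struct raid_stride));
}

/* Same arithmetic as btrfs_raid_stride_nr(), on byte offsets. */
static inline size_t raid_stride_offset(size_t item_off, unsigned int nr)
{
	return item_off + STRIDES_OFFSET + nr * sizeof(struct raid_stride);
}

static inline uint64_t raid_stride_devid_nr(const uint8_t *leaf, size_t item_off,
					    size_t item_size, unsigned int nr)
{
	assert(nr < num_strides(item_size));
	return read_le64(leaf + raid_stride_offset(item_off, nr) +
			 offsetof(struct raid_stride, devid));
}

static inline uint64_t raid_stride_physical_nr(const uint8_t *leaf, size_t item_off,
					       size_t item_size, unsigned int nr)
{
	assert(nr < num_strides(item_size));
	return read_le64(leaf + raid_stride_offset(item_off, nr) +
			 offsetof(struct raid_stride, physical));
}
```

With a 32-byte item at offset 8 in the buffer, `num_strides()` yields 2, and stride 1 is read from bytes 24 to 39.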
Add definitions for the raid stripe tree. This tree will hold information
about the on-disk layout of the stripes in a RAID set.

Each stripe extent has a 1:1 relationship with an on-disk extent item and
is doing the logical to per-drive physical address translation for the
extent item in question.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/accessors.h            | 29 +++++++++++++++++++++++++++++
 include/uapi/linux/btrfs_tree.h | 20 ++++++++++++++++++--
 2 files changed, 47 insertions(+), 2 deletions(-)