Message ID | cover.1666007330.git.johannes.thumshirn@wdc.com |
---|---|
Series | btrfs: raid-stripe-tree draft patches |
On Mon, Oct 17, 2022 at 04:55:18AM -0700, Johannes Thumshirn wrote:
> Here's yet another draft of my btrfs zoned RAID patches. It's based on
> Christoph's bio splitting series for btrfs.
>
> Updates of the raid-stripe-tree are done at delayed-ref time to save on
> bandwidth, while for reading we do the stripe-tree lookup at bio mapping time,
> i.e. when the logical to physical translation happens for regular btrfs RAID
> as well.
>
> The stripe tree is keyed by an extent's disk_bytenr and disk_num_bytes and
> its contents are the respective physical device id and position.
>

So generally I'm good with this design and everything, I just have a few asks:

1. I want a design doc for btrfs-dev-docs that lays out the design and how it's
   meant to be used: the on-disk format, as well as the update that happens
   after the delayed ref runs.
2. Additionally, I would love to see it written down where exactly you want to
   use this in the future. I know you've talked about using it for other RAID
   levels, but I have a hard time paying attention to my own stuff, so I'd like
   to see what the long-term vision is for this. Again, this would probably be
   well suited for the btrfs-dev-docs update.
3. I super don't love the fact that we have mirrored extents in both places;
   especially with zoned stripping it down to 128k, this tree is going to be
   huge. There's no way around this, but it makes the global roots thing even
   more important for scalability with zoned+RST. I don't really think you need
   to add that bit here now; I'll make it global in my patches for extent tree
   v2. Mostly I'm just lamenting that you're going to be ready before me and
   now you'll have to wait for the benefits of the global roots work.

Thanks,

Josef
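To make the keying scheme described in the cover letter concrete, here is a minimal userspace sketch: the key is the extent's disk_bytenr and disk_num_bytes, and the item holds one (device id, physical position) pair per copy of the extent. All struct and field names below are hypothetical illustrations of that description, not the actual on-disk format from the patches.

```c
/*
 * Sketch only: a userspace model of a raid-stripe-tree entry, based on the
 * description in the cover letter. Names and layout are hypothetical and do
 * not come from the patch series.
 */
#include <stdint.h>
#include <stdio.h>

/* Key: the extent's disk_bytenr plus its disk_num_bytes. */
struct rst_key {
	uint64_t disk_bytenr;		/* logical start of the extent */
	uint64_t disk_num_bytes;	/* length of the extent */
};

/* One physical placement of the extent: device id and physical offset. */
struct rst_stride {
	uint64_t devid;
	uint64_t physical;
};

/* Item payload: one stride per copy the extent was written to. */
struct rst_item {
	struct rst_key key;
	int num_strides;
	struct rst_stride strides[2];	/* e.g. RAID1 has two copies */
};

int main(void)
{
	/* A 128K RAID1 extent mapped to two devices. */
	struct rst_item item = {
		.key = { .disk_bytenr = 1048576, .disk_num_bytes = 131072 },
		.num_strides = 2,
		.strides = {
			{ .devid = 1, .physical = 2097152 },
			{ .devid = 2, .physical = 4194304 },
		},
	};

	for (int i = 0; i < item.num_strides; i++)
		printf("copy %d: devid %llu physical %llu\n", i,
		       (unsigned long long)item.strides[i].devid,
		       (unsigned long long)item.strides[i].physical);
	return 0;
}
```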
On 20.10.22 17:42, Josef Bacik wrote:
> On Mon, Oct 17, 2022 at 04:55:18AM -0700, Johannes Thumshirn wrote:
>> Here's yet another draft of my btrfs zoned RAID patches. It's based on
>> Christoph's bio splitting series for btrfs.
>>
>> Updates of the raid-stripe-tree are done at delayed-ref time to save on
>> bandwidth, while for reading we do the stripe-tree lookup at bio mapping time,
>> i.e. when the logical to physical translation happens for regular btrfs RAID
>> as well.
>>
>> The stripe tree is keyed by an extent's disk_bytenr and disk_num_bytes and
>> its contents are the respective physical device id and position.
>>
>
> So generally I'm good with this design and everything, I just have a few asks:
>
> 1. I want a design doc for btrfs-dev-docs that lays out the design and how it's
>    meant to be used: the on-disk format, as well as the update that happens
>    after the delayed ref runs.
> 2. Additionally, I would love to see it written down where exactly you want to
>    use this in the future. I know you've talked about using it for other RAID
>    levels, but I have a hard time paying attention to my own stuff, so I'd like
>    to see what the long-term vision is for this. Again, this would probably be
>    well suited for the btrfs-dev-docs update.

Sure, I'll add a document to btrfs-dev-docs (and send it to the list for
review). There's still a problem with the delayed-ref update to be solved that
has the leak checker yelling on unmount, so maybe documenting what I've done
will show me where I messed up.

> 3. I super don't love the fact that we have mirrored extents in both places;
>    especially with zoned stripping it down to 128k, this tree is going to be
>    huge. There's no way around this, but it makes the global roots thing even
>    more important for scalability with zoned+RST. I don't really think you need
>    to add that bit here now; I'll make it global in my patches for extent tree
>    v2. Mostly I'm just lamenting that you're going to be ready before me and
>    now you'll have to wait for the benefits of the global roots work.

Well, I'm pretty sure I won't be done before the global roots work is. But I
agree the extra amount of metadata for the RST is a concern to me as well.
Especially for overwrite-heavy workloads it produces a lot of new extents for
each CoW write. Combine that with zoned, and we really need working reclaim,
otherwise it all goes down the drain.

Can you please remind me: with your global roots, am I getting a root per
metadata block group or per data block group?
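For the read side discussed in the thread, i.e. the stripe-tree lookup at bio mapping time, here is a small self-contained sketch of the logical-to-physical translation, reusing the hypothetical entry layout from the earlier sketch. rst_lookup() stands in for what would be a btree search keyed on (disk_bytenr, disk_num_bytes) in the kernel; none of these names come from the actual patches.

```c
/*
 * Sketch only: a userspace model of the read-side lookup -- translate a
 * logical address to (devid, physical) for a chosen mirror at bio mapping
 * time. All names are hypothetical.
 */
#include <stdint.h>
#include <stdio.h>

struct rst_stride {
	uint64_t devid;
	uint64_t physical;
};

struct rst_entry {
	uint64_t disk_bytenr;
	uint64_t disk_num_bytes;
	struct rst_stride strides[2];	/* RAID1: two copies */
};

/*
 * Find the entry covering @logical and return the placement of @mirror.
 * A linear scan stands in for the btree search the kernel would do.
 */
static int rst_lookup(const struct rst_entry *tree, int nr, uint64_t logical,
		      int mirror, uint64_t *devid, uint64_t *physical)
{
	for (int i = 0; i < nr; i++) {
		const struct rst_entry *e = &tree[i];

		if (logical >= e->disk_bytenr &&
		    logical < e->disk_bytenr + e->disk_num_bytes) {
			*devid = e->strides[mirror].devid;
			/* offset within the extent carries over unchanged */
			*physical = e->strides[mirror].physical +
				    (logical - e->disk_bytenr);
			return 0;
		}
	}
	return -1;
}

int main(void)
{
	struct rst_entry tree[] = {
		{ 1048576, 131072, { { 1, 2097152 }, { 2, 4194304 } } },
		{ 1179648, 131072, { { 1, 2228224 }, { 2, 4325376 } } },
	};
	uint64_t devid, physical;

	/* Read 64K into the first extent from the second copy (mirror 1). */
	if (!rst_lookup(tree, 2, 1114112, 1, &devid, &physical))
		printf("read from devid %llu at physical %llu\n",
		       (unsigned long long)devid,
		       (unsigned long long)physical);
	return 0;
}
```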