Message ID | 20210116071533.105780-1-wqu@suse.com (mailing list archive) |
---|---|
Headers | show |
Series | btrfs: add read-only support for subpage sector size | expand |
On Sat, Jan 16, 2021 at 03:15:15PM +0800, Qu Wenruo wrote: > Patches can be fetched from github: > https://github.com/adam900710/linux/tree/subpage > Currently the branch also contains partial RW data support (still some > ordered extent and data csum mismatch problems) > > Great thanks to David/Nikolay/Josef for their effort reviewing and > merging the preparation patches into misc-next. > > === What works === > > Just from the patchset: > - Data read > Both regular and compressed data, with csum check. > > - Metadata read > > This means, with these patchset, 64K page systems can at least mount > btrfs with 4K sector size. I haven't found anything serious, the comments are cosmetic and I can fixup that or other simple things at commit time. Is there anthing serious still not working? As the subpage support is sort of an isolated feature we could afford to get the first batch of code in and continue polishing. Read-only suppot with 64k/4k is a good milestone so I'm not worried too much about some smaller things left behind, as long as the default case page size == sectorsize works. Tests of this branch are still running but so far so good. I'll add it as a topic branch to for-next for testing and my current plan is to push it to misc-next soon, targeting 5.12. > In the subpage branch > - Metadata read write and balance > Not yet full tested due to data write still has bugs need to be > solved. > But considering that metadata operations from previous iteration > is mostly untouched, metadata read write should be pretty stable. I assume the bugs are for the 64k/4k usecase. > - Data read write and balance > Only uncompressed data writes. Fsstress can survive for around 5000 > ops and more. > But still some random data csum error, and even more rare ordered > extent related BUG_ON(). > Still invetigating. You say it's for 'read write', right now getting the read-only suport without known bugs would be sufficient. > === Needs feedback === > The following design needs extra comments: > > - u16 bitmap > As David mentioned, using u16 as bit map is not the fastest way. > That's also why current bitmap code requires unsigned long (u32) as > minimal unit. > But using bitmap directly would double the memory usage. > Thus the best way is to pack two u16 bitmap into one u32 bitmap, but > that still needs extra investigation to find better practice. I think that for first implementation we can afford to trade off correctness for performance. In this case not optimal bitmap tracking with the spinlock. Replacing a better bitmap tracking with atomics would be a separate step and can be reviewed independently once we know the slow but coorrect case works as expected. > Anyway the skeleton should be pretty simple to expand. > > - Separate handling for subpage metadata > Currently the metadata read and (later write path) handles subpage > metadata differently. Mostly due to the page locking must be skipped > for subpage metadata. > I tried several times to use as many common code as possible, but > every time I ended up reverting back to current code. > > Thankfully, for data handling we will use the same common code. Ok. > - Incompatible subpage strcuture against iomap_page > In btrfs we need extra bits than iomap_page. > This is due to we need sector perfect write for data balance. > E.g. if only one 4K sector is dirty in a 64K page, we should only > write that dirty 4K back to disk, not the full 64K page. > > As data balance requires the new data extents to have exactly the > same size as the original ones. > This means, unless iomap_page get extra bits like what we're doing in > btrfs for dirty, we can't merge the btrfs_subpage with iomap_page. Ok, so implementing the subpage support inside btrfs first gives us some space for the specific needs or workarounds that would perhaps need extensions of the iomap API. Once we have that working and understand what exactly we need, then we can ask for iomap changes. This has worked well, eg. during the direct io conversion, so we can build on that.
On 2021/1/19 上午7:17, David Sterba wrote: > On Sat, Jan 16, 2021 at 03:15:15PM +0800, Qu Wenruo wrote: >> Patches can be fetched from github: >> https://github.com/adam900710/linux/tree/subpage >> Currently the branch also contains partial RW data support (still some >> ordered extent and data csum mismatch problems) >> >> Great thanks to David/Nikolay/Josef for their effort reviewing and >> merging the preparation patches into misc-next. >> >> === What works === >> >> Just from the patchset: >> - Data read >> Both regular and compressed data, with csum check. >> >> - Metadata read >> >> This means, with these patchset, 64K page systems can at least mount >> btrfs with 4K sector size. > > I haven't found anything serious, the comments are cosmetic and I can > fixup that or other simple things at commit time. > > Is there anthing serious still not working? Compression write (not even touching it). Random (rare) ordered extent related bugs (from BUG_ON() due to missing ordered extent to data csum mismatch). Working on the ordered extent bug now. > As the subpage support is > sort of an isolated feature we could afford to get the first batch of > code in and continue polishing. Read-only suppot with 64k/4k is a good > milestone so I'm not worried too much about some smaller things left > behind, as long as the default case page size == sectorsize works. Yeah, that's the core design of current subpage support, all subpage will be handled in a different routine, leaving minimal impact to existing code. > > Tests of this branch are still running but so far so good. I'll add it > as a topic branch to for-next for testing and my current plan is to push > it to misc-next soon, targeting 5.12. That's great to hear. > >> In the subpage branch >> - Metadata read write and balance >> Not yet full tested due to data write still has bugs need to be >> solved. >> But considering that metadata operations from previous iteration >> is mostly untouched, metadata read write should be pretty stable. > > I assume the bugs are for the 64k/4k usecase. Yes, at least the 4K case passes fstests in my local env. Thanks, Qu > >> - Data read write and balance >> Only uncompressed data writes. Fsstress can survive for around 5000 >> ops and more. >> But still some random data csum error, and even more rare ordered >> extent related BUG_ON(). >> Still invetigating. > > You say it's for 'read write', right now getting the read-only suport > without known bugs would be sufficient. > >> === Needs feedback === >> The following design needs extra comments: >> >> - u16 bitmap >> As David mentioned, using u16 as bit map is not the fastest way. >> That's also why current bitmap code requires unsigned long (u32) as >> minimal unit. >> But using bitmap directly would double the memory usage. >> Thus the best way is to pack two u16 bitmap into one u32 bitmap, but >> that still needs extra investigation to find better practice. > > I think that for first implementation we can afford to trade off > correctness for performance. In this case not optimal bitmap tracking > with the spinlock. Replacing a better bitmap tracking with atomics would > be a separate step and can be reviewed independently once we know the > slow but coorrect case works as expected. > >> Anyway the skeleton should be pretty simple to expand. >> >> - Separate handling for subpage metadata >> Currently the metadata read and (later write path) handles subpage >> metadata differently. Mostly due to the page locking must be skipped >> for subpage metadata. >> I tried several times to use as many common code as possible, but >> every time I ended up reverting back to current code. >> >> Thankfully, for data handling we will use the same common code. > > Ok. > >> - Incompatible subpage strcuture against iomap_page >> In btrfs we need extra bits than iomap_page. >> This is due to we need sector perfect write for data balance. >> E.g. if only one 4K sector is dirty in a 64K page, we should only >> write that dirty 4K back to disk, not the full 64K page. >> >> As data balance requires the new data extents to have exactly the >> same size as the original ones. >> This means, unless iomap_page get extra bits like what we're doing in >> btrfs for dirty, we can't merge the btrfs_subpage with iomap_page. > > Ok, so implementing the subpage support inside btrfs first gives us some > space for the specific needs or workarounds that would perhaps need > extensions of the iomap API. Once we have that working and understand > what exactly we need, then we can ask for iomap changes. This has worked > well, eg. during the direct io conversion, so we can build on that. >
On Tue, Jan 19, 2021 at 07:26:17AM +0800, Qu Wenruo wrote: > On 2021/1/19 上午7:17, David Sterba wrote: > > On Sat, Jan 16, 2021 at 03:15:15PM +0800, Qu Wenruo wrote: > > As the subpage support is > > sort of an isolated feature we could afford to get the first batch of > > code in and continue polishing. Read-only suppot with 64k/4k is a good > > milestone so I'm not worried too much about some smaller things left > > behind, as long as the default case page size == sectorsize works. > > Yeah, that's the core design of current subpage support, all subpage > will be handled in a different routine, leaving minimal impact to > existing code. > > > > > Tests of this branch are still running but so far so good. I'll add it > > as a topic branch to for-next for testing and my current plan is to push > > it to misc-next soon, targeting 5.12. > > That's great to hear. > > > >> In the subpage branch > >> - Metadata read write and balance > >> Not yet full tested due to data write still has bugs need to be > >> solved. > >> But considering that metadata operations from previous iteration > >> is mostly untouched, metadata read write should be pretty stable. > > > > I assume the bugs are for the 64k/4k usecase. > > Yes, at least the 4K case passes fstests in my local env. I'd done a pre-merge pass last week with fixups in changlogs, subjects and some coding style fixes but that was before Josef's comments. Some of them still need updates but I also don't want to throw away my changes. (Ideally I don't have to do them at all, you can get the gist of what are the most common things I'm fixing by comparing both versions.) Please have a look at the branch ext/qu/subpage-v4 in my github repo, the patches are in the same order as in this posted patchset. If the patch does not change you can keep it as is, I'll reuse what I have. For the final merge of the read-only support, patch 1 could be dropped as discussed. The rest is hopefully ok to go, please resend, thanks.
On 2021/1/24 下午8:29, David Sterba wrote: > On Tue, Jan 19, 2021 at 07:26:17AM +0800, Qu Wenruo wrote: >> On 2021/1/19 上午7:17, David Sterba wrote: >>> On Sat, Jan 16, 2021 at 03:15:15PM +0800, Qu Wenruo wrote: >>> As the subpage support is >>> sort of an isolated feature we could afford to get the first batch of >>> code in and continue polishing. Read-only suppot with 64k/4k is a good >>> milestone so I'm not worried too much about some smaller things left >>> behind, as long as the default case page size == sectorsize works. >> >> Yeah, that's the core design of current subpage support, all subpage >> will be handled in a different routine, leaving minimal impact to >> existing code. >> >>> >>> Tests of this branch are still running but so far so good. I'll add it >>> as a topic branch to for-next for testing and my current plan is to push >>> it to misc-next soon, targeting 5.12. >> >> That's great to hear. >>> >>>> In the subpage branch >>>> - Metadata read write and balance >>>> Not yet full tested due to data write still has bugs need to be >>>> solved. >>>> But considering that metadata operations from previous iteration >>>> is mostly untouched, metadata read write should be pretty stable. >>> >>> I assume the bugs are for the 64k/4k usecase. >> >> Yes, at least the 4K case passes fstests in my local env. > > I'd done a pre-merge pass last week with fixups in changlogs, subjects > and some coding style fixes but that was before Josef's comments. Some > of them still need updates but I also don't want to throw away my > changes. (Ideally I don't have to do them at all, you can get the gist > of what are the most common things I'm fixing by comparing both versions.) > > Please have a look at the branch ext/qu/subpage-v4 in my github repo, > the patches are in the same order as in this posted patchset. If the > patch does not change you can keep it as is, I'll reuse what I have. Already doing this, using the ext/qu/subpage-v4 as base, so all your modification should still be there. Thanks, Qu > > For the final merge of the read-only support, patch 1 could be dropped > as discussed. The rest is hopefully ok to go, please resend, thanks. >