Message ID: 20211201051756.53742-1-wqu@suse.com
Series: btrfs: split bio at btrfs_map_bio() time
On 2021/12/1 13:17, Qu Wenruo wrote:
> [BACKGROUND]
>
> Currently btrfs never uses bio_split() to split its bios against RAID
> stripe boundaries.
>
> Instead, inside btrfs we check the stripe boundary every time we allocate
> a new bio, and ensure the new bio never crosses a stripe boundary.
>
> [PROBLEMS]
>
> Although this behavior works fine, it is against the common practice used
> in stacked drivers, and it makes the effort to convert btrfs to iomap
> harder.
>
> There is also a hidden burden: every time we allocate a new bio, we use
> BIO_MAX_VECS slots, but since we know the boundaries, for RAID0/RAID10 we
> can fit at most 16 pages (fixed 64K stripe size and 4K page size),
> wasting most of the 256 slots we allocated.
>
> [CHALLENGES]
>
> To improve the situation, this patchset moves the bio split to
> btrfs_map_bio() time, so the upper layers no longer need to bother with
> splitting bios against RAID stripe or even chunk boundaries.
>
> But there are several challenges:
>
> - Conflicts in various endio functions
>   We want the existing granularity, instead of chained endio, thus we
>   must make the involved endio functions handle split bios.
>
>   Although most endio functions already do their work independently of
>   the bio size, they do not yet fully handle split bios.
>
>   These patches convert them to use the saved bi_iter and only iterate
>   the split range instead of the whole bio.
>   This change involves 3 types of IO:
>
>   * Buffered IO
>     Including both data and metadata
>   * Direct IO
>   * Compressed IO
>
>   Their endio functions need different levels of updates to handle split
>   bios.
>
>   Furthermore, there is another endio, end_workqueue_bio(), which cannot
>   handle split bios at all, thus we change the timing so that
>   btrfs_bio_wq_end_io() is only called after the bio has been split.
>
> - Checksum verification
>   Currently we rely on btrfs_bio::csum to contain the checksum for the
>   whole bio.
>   If a bio gets split, csum no longer points to the correct location for
>   the split bio.
>
>   This is solved by introducing btrfs_bio::offset_to_original, and using
>   that new member to calculate where we should read the csum from.
>
>   The parent bio still has btrfs_bio::csum for the whole bio, thus it can
>   still be freed correctly.
>
> - Independent endio for each split bio
>   Unlike stacked drivers, for RAID10 btrfs needs to try its best to read
>   every sector, to handle the following case: (X means bad, either unable
>   to read or failed to pass checksum verification, V means good)
>
>   Dev 1 (missing) | D1 (X) |
>   Dev 2 (OK)      | D1 (V) |
>   Dev 3 (OK)      | D2 (V) |
>   Dev 4 (OK)      | D2 (X) |
>
>   In the above RAID10 case, dev1 is missing, and although dev4 is fine,
>   its D2 sector is corrupted (by bit rot or whatever).
>
>   If we use bio_chain(), the read bio for both D1 and D2 will be split,
>   and since D1 is missing, the whole D1 and D2 read will be marked as an
>   error, thus we will try to read from dev2 and dev4.
>
>   But D2 in dev4 has a csum mismatch, so we can only rebuild D1 and D2
>   correctly from dev2:D1 and dev3:D2.
>
>   This patchset resolves this by saving bi_iter into btrfs_bio::iter, and
>   using that at endio time to iterate only the split part of a bio.
>   Other than this, the existing read/write page endio functions can
>   handle split bios properly without problem.
>
> - Bad RAID56 naming/functionality
>   There are quite a few RAID56 call sites that rely on specific behavior
>   of __btrfs_map_block(), like returning @map_length as stripe_len rather
>   than the real mapped length.
>
>   This is handled by some small cleanups specific to RAID56.
>
> [NEED FEEDBACK]
> In this refactor, btrfs ends up with a lot of call sites like:
>
>   btrfs_bio_save_iter(); // Save bi_iter into some other location
>   __bio_for_each_segment(bvec, bio, iter, btrfs_bio->iter) {
>           /* Doing endio for each bvec */
>   }
>
> and manually implements an endio which does some of the work of
> __bio_chain_endio(), but with extra btrfs-specific workload.
>
> I'm wondering whether the block layer would be fine with providing some
> *enhanced* chained bio facilities?
>
> [CHANGELOG]
> RFC->v1:
> - Better patch split
>   Now patches 01~06 are refactors/cleanups/preparations.
>   Patches 07~13 do the conversion and can handle both the old and new bio
>   split timings.
>   Finally, patches 14~16 convert the bio split call sites one by one to
>   the newer facility.
>   The final patch is just a small cleanup.
>
> - Various bug fixes
>   During the full fstests run, various stupid bugs were exposed and
>   fixed.

The latest version can be fetched from this branch:
https://github.com/adam900710/linux/tree/refactor_chunk_map

It already has the fix for the btrfs/187 crash, which was caused by a
missing modification for scrub_stripe_index_and_offset().

Thanks,
Qu

>
> Qu Wenruo (17):
>   btrfs: update an stale comment on btrfs_submit_bio_hook()
>   btrfs: save bio::bi_iter into btrfs_bio::iter before submitting
>   btrfs: use correct bio size for error message in btrfs_end_dio_bio()
>   btrfs: refactor btrfs_map_bio()
>   btrfs: move btrfs_bio_wq_end_io() calls into submit_stripe_bio()
>   btrfs: replace btrfs_dio_private::refs with
>     btrfs_dio_private::pending_bytes
>   btrfs: introduce btrfs_bio_split() helper
>   btrfs: make data buffered read path to handle split bio properly
>   btrfs: make data buffered write endio function to be split bio
>     compatible
>   btrfs: make metadata write endio functions to be split bio compatible
>   btrfs: make dec_and_test_compressed_bio() to be split bio compatible
>   btrfs: return proper mapped length for RAID56 profiles in
>     __btrfs_map_block()
>   btrfs: allow btrfs_map_bio() to split bio according to chunk stripe
>     boundaries
>   btrfs: remove buffered IO stripe boundary calculation
>   btrfs: remove stripe boundary calculation for compressed IO
>   btrfs: remove the stripe boundary calculation for direct IO
>   btrfs: unexport btrfs_get_io_geometry()
>
>  fs/btrfs/btrfs_inode.h |  10 +-
>  fs/btrfs/compression.c |  70 +++-----------
>  fs/btrfs/disk-io.c     |   9 +-
>  fs/btrfs/extent_io.c   | 189 +++++++++++++++++++++++++------------
>  fs/btrfs/extent_io.h   |   2 +
>  fs/btrfs/inode.c       | 210 ++++++++++++++++------------------------
>  fs/btrfs/raid56.c      |  14 ++-
>  fs/btrfs/raid56.h      |   2 +-
>  fs/btrfs/scrub.c       |   4 +-
>  fs/btrfs/volumes.c     | 157 ++++++++++++++++++++++--------
>  fs/btrfs/volumes.h     |  75 +++++++++++++--
>  11 files changed, 435 insertions(+), 307 deletions(-)
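As a reference for the saved-iter pattern discussed in the [NEED FEEDBACK]
section above, here is a minimal, self-contained sketch in kernel-style C.
The struct layout and the sketch_* helper names are simplified assumptions
for illustration only, not the actual code from this series;
__bio_for_each_segment() and container_of() are the only real kernel
facilities used.

    /*
     * Illustrative sketch only: the fields mirror what the cover letter
     * describes (btrfs_bio::iter, btrfs_bio::offset_to_original), but this
     * is a simplified stand-in, not fs/btrfs code.
     */
    #include <linux/bio.h>

    struct split_bio_sketch {
            struct bvec_iter iter;    /* copy of bi_iter saved at submit time */
            u32 offset_to_original;   /* byte offset into the parent bio      */
            u8 *csum;                 /* csums of the whole parent bio        */
            struct bio bio;           /* embedded bio, kept as last member    */
    };

    /* Call after the bio has been split and right before submit_bio(). */
    static void sketch_save_iter(struct split_bio_sketch *sbio)
    {
            sbio->iter = sbio->bio.bi_iter;
    }

    static void sketch_endio(struct bio *bio)
    {
            struct split_bio_sketch *sbio =
                    container_of(bio, struct split_bio_sketch, bio);
            struct bio_vec bvec;
            struct bvec_iter iter;

            /*
             * bi_iter has already been advanced by the lower layers, so walk
             * the saved copy instead: it covers exactly the range of this
             * (possibly split) bio, not the whole parent bio.
             */
            __bio_for_each_segment(bvec, bio, iter, sbio->iter) {
                    /* per-bvec work, e.g. csum lookup via offset_to_original */
            }
    }

The point is that bi_iter is consumed as the lower layers advance it, so the
copy saved right before submission is the only record of the exact range a
split bio covered, and that is what the endio code has to iterate.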