Message ID | dd9a8794a1da2a4f3e7c47cc4df42ef972568ce4.1687133480.git.wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: add trace events for bio split at storage layer | expand |
On Mon, Jun 19, 2023 at 08:11:37AM +0800, Qu Wenruo wrote: > [BACKGROUND] > David recently reported some weird RAID56 errors where huge btrfs bio > (tens of MiBs) is directly passed to RAID56 path, without proper bio > split. > > To my surprise, there is no trace events around the bio split code of > storage layer. > > [NEW TRACE EVENTS] > This patch would add 3 new trace events: > > - trace_initial_bbio() > - trace_before_split_bbio() > - trace_after_split_bbio() > > The example output would look like the following (headers and uuid > removed): > > initial_bbio: root=5 ino=257 logical=389152768 length=524288 opf=0x1 mirror_num=0 map_length=-1 > before_split_bbio: root=5 ino=257 logical=389152768 length=524288 opf=0x1 mirror_num=0 map_length=131072 > after_split_bbio: root=5 ino=257 logical=389152768 length=131072 opf=0x1 mirror_num=0 map_length=131072 > before_split_bbio: root=5 ino=257 logical=389283840 length=393216 opf=0x1 mirror_num=0 map_length=131072 > after_split_bbio: root=5 ino=257 logical=389283840 length=131072 opf=0x1 mirror_num=0 map_length=131072 > before_split_bbio: root=5 ino=257 logical=389414912 length=262144 opf=0x1 mirror_num=0 map_length=131072 > after_split_bbio: root=5 ino=257 logical=389414912 length=131072 opf=0x1 mirror_num=0 map_length=131072 > before_split_bbio: root=5 ino=257 logical=389545984 length=131072 opf=0x1 mirror_num=0 map_length=131072 > > The above lines show a 512K bbio submitted, then it got split by each > 128K boundary (this is a 3 disks RAID5). > > initial_bbio: root=1 ino=1 logical=30441472 length=16384 opf=0x1 mirror_num=0 map_length=-1 > before_split_bbio: root=1 ino=1 logical=30441472 length=16384 opf=0x1 mirror_num=0 map_length=16384 > initial_bbio: root=1 ino=1 logical=30457856 length=16384 opf=0x1 mirror_num=0 map_length=-1 > before_split_bbio: root=1 ino=1 logical=30457856 length=16384 opf=0x1 mirror_num=0 map_length=16384 > > And the above lines are metadata writes which didn't need to be split at > all. > > These new events should allow us to debug bio split related problems > easier. > > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/bio.c | 4 +++ > include/trace/events/btrfs.h | 51 ++++++++++++++++++++++++++++++++++++ > 2 files changed, 55 insertions(+) > > diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c > index 12b12443efaa..204c30391086 100644 > --- a/fs/btrfs/bio.c > +++ b/fs/btrfs/bio.c > @@ -669,9 +669,12 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num) > if (use_append) > map_length = min(map_length, fs_info->max_zone_append_size); > > + trace_before_split_bbio(bbio, mirror_num, map_length); > + > if (map_length < length) { > bbio = btrfs_split_bio(fs_info, bbio, map_length, use_append); > bio = &bbio->bio; > + trace_after_split_bbio(bbio, mirror_num, map_length); > } > > /* > @@ -731,6 +734,7 @@ void btrfs_submit_bio(struct btrfs_bio *bbio, int mirror_num) > /* If bbio->inode is not populated, its file_offset must be 0. */ > ASSERT(bbio->inode || bbio->file_offset == 0); > > + trace_initial_bbio(bbio, mirror_num, -1); > while (!btrfs_submit_chunk(bbio, mirror_num)) > ; > } > diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h > index c6eee9b414cf..1e6d87abd677 100644 > --- a/include/trace/events/btrfs.h > +++ b/include/trace/events/btrfs.h > @@ -31,6 +31,7 @@ struct extent_io_tree; > struct prelim_ref; > struct btrfs_space_info; > struct btrfs_raid_bio; > +struct btrfs_bio; > struct raid56_bio_trace_info; > struct find_free_extent_ctl; > > @@ -2521,6 +2522,56 @@ DEFINE_EVENT(btrfs_raid56_bio, raid56_scrub_read_recover, > TP_ARGS(rbio, bio, trace_info) > ); > > +DECLARE_EVENT_CLASS(btrfs_bio, > + > + TP_PROTO(const struct btrfs_bio *bbio, int mirror_num, u64 map_length), > + > + TP_ARGS(bbio, mirror_num, map_length), > + > + TP_STRUCT__entry_btrfs( > + __field( u64, logical ) > + __field( u64, root ) > + __field( u64, ino ) > + __field( s64, map_length ) > + __field( u32, length ) > + __field( int, mirror_num ) > + __field( u8, opf ) > + ), > + > + TP_fast_assign_btrfs(bbio->fs_info, > + __entry->logical = bbio->bio.bi_iter.bi_sector << SECTOR_SHIFT; > + __entry->length = bbio->bio.bi_iter.bi_size; > + __entry->map_length = map_length; > + __entry->mirror_num = mirror_num; > + __entry->opf = bio_op(&bbio->bio); > + __entry->root = bbio->inode ? > + bbio->inode->root->root_key.objectid : 0; Can't we use show_root_type() here? Other than that, looks good to me. Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
On 2023/6/19 12:15, Naohiro Aota wrote: > On Mon, Jun 19, 2023 at 08:11:37AM +0800, Qu Wenruo wrote: >> [BACKGROUND] >> David recently reported some weird RAID56 errors where huge btrfs bio >> (tens of MiBs) is directly passed to RAID56 path, without proper bio >> split. >> >> To my surprise, there is no trace events around the bio split code of >> storage layer. >> >> [NEW TRACE EVENTS] >> This patch would add 3 new trace events: >> >> - trace_initial_bbio() >> - trace_before_split_bbio() >> - trace_after_split_bbio() >> >> The example output would look like the following (headers and uuid >> removed): >> >> initial_bbio: root=5 ino=257 logical=389152768 length=524288 opf=0x1 mirror_num=0 map_length=-1 >> before_split_bbio: root=5 ino=257 logical=389152768 length=524288 opf=0x1 mirror_num=0 map_length=131072 >> after_split_bbio: root=5 ino=257 logical=389152768 length=131072 opf=0x1 mirror_num=0 map_length=131072 >> before_split_bbio: root=5 ino=257 logical=389283840 length=393216 opf=0x1 mirror_num=0 map_length=131072 >> after_split_bbio: root=5 ino=257 logical=389283840 length=131072 opf=0x1 mirror_num=0 map_length=131072 >> before_split_bbio: root=5 ino=257 logical=389414912 length=262144 opf=0x1 mirror_num=0 map_length=131072 >> after_split_bbio: root=5 ino=257 logical=389414912 length=131072 opf=0x1 mirror_num=0 map_length=131072 >> before_split_bbio: root=5 ino=257 logical=389545984 length=131072 opf=0x1 mirror_num=0 map_length=131072 >> >> The above lines show a 512K bbio submitted, then it got split by each >> 128K boundary (this is a 3 disks RAID5). >> >> initial_bbio: root=1 ino=1 logical=30441472 length=16384 opf=0x1 mirror_num=0 map_length=-1 >> before_split_bbio: root=1 ino=1 logical=30441472 length=16384 opf=0x1 mirror_num=0 map_length=16384 >> initial_bbio: root=1 ino=1 logical=30457856 length=16384 opf=0x1 mirror_num=0 map_length=-1 >> before_split_bbio: root=1 ino=1 logical=30457856 length=16384 opf=0x1 mirror_num=0 map_length=16384 >> >> And the above lines are metadata writes which didn't need to be split at >> all. >> >> These new events should allow us to debug bio split related problems >> easier. >> >> Signed-off-by: Qu Wenruo <wqu@suse.com> >> --- >> fs/btrfs/bio.c | 4 +++ >> include/trace/events/btrfs.h | 51 ++++++++++++++++++++++++++++++++++++ >> 2 files changed, 55 insertions(+) >> >> diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c >> index 12b12443efaa..204c30391086 100644 >> --- a/fs/btrfs/bio.c >> +++ b/fs/btrfs/bio.c >> @@ -669,9 +669,12 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num) >> if (use_append) >> map_length = min(map_length, fs_info->max_zone_append_size); >> >> + trace_before_split_bbio(bbio, mirror_num, map_length); >> + >> if (map_length < length) { >> bbio = btrfs_split_bio(fs_info, bbio, map_length, use_append); >> bio = &bbio->bio; >> + trace_after_split_bbio(bbio, mirror_num, map_length); >> } >> >> /* >> @@ -731,6 +734,7 @@ void btrfs_submit_bio(struct btrfs_bio *bbio, int mirror_num) >> /* If bbio->inode is not populated, its file_offset must be 0. */ >> ASSERT(bbio->inode || bbio->file_offset == 0); >> >> + trace_initial_bbio(bbio, mirror_num, -1); >> while (!btrfs_submit_chunk(bbio, mirror_num)) >> ; >> } >> diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h >> index c6eee9b414cf..1e6d87abd677 100644 >> --- a/include/trace/events/btrfs.h >> +++ b/include/trace/events/btrfs.h >> @@ -31,6 +31,7 @@ struct extent_io_tree; >> struct prelim_ref; >> struct btrfs_space_info; >> struct btrfs_raid_bio; >> +struct btrfs_bio; >> struct raid56_bio_trace_info; >> struct find_free_extent_ctl; >> >> @@ -2521,6 +2522,56 @@ DEFINE_EVENT(btrfs_raid56_bio, raid56_scrub_read_recover, >> TP_ARGS(rbio, bio, trace_info) >> ); >> >> +DECLARE_EVENT_CLASS(btrfs_bio, >> + >> + TP_PROTO(const struct btrfs_bio *bbio, int mirror_num, u64 map_length), >> + >> + TP_ARGS(bbio, mirror_num, map_length), >> + >> + TP_STRUCT__entry_btrfs( >> + __field( u64, logical ) >> + __field( u64, root ) >> + __field( u64, ino ) >> + __field( s64, map_length ) >> + __field( u32, length ) >> + __field( int, mirror_num ) >> + __field( u8, opf ) >> + ), >> + >> + TP_fast_assign_btrfs(bbio->fs_info, >> + __entry->logical = bbio->bio.bi_iter.bi_sector << SECTOR_SHIFT; >> + __entry->length = bbio->bio.bi_iter.bi_size; >> + __entry->map_length = map_length; >> + __entry->mirror_num = mirror_num; >> + __entry->opf = bio_op(&bbio->bio); >> + __entry->root = bbio->inode ? >> + bbio->inode->root->root_key.objectid : 0; > > Can't we use show_root_type() here? Good idea! Would go that way in the next update. Thanks, Qu > > Other than that, looks good to me. > > Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c index 12b12443efaa..204c30391086 100644 --- a/fs/btrfs/bio.c +++ b/fs/btrfs/bio.c @@ -669,9 +669,12 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num) if (use_append) map_length = min(map_length, fs_info->max_zone_append_size); + trace_before_split_bbio(bbio, mirror_num, map_length); + if (map_length < length) { bbio = btrfs_split_bio(fs_info, bbio, map_length, use_append); bio = &bbio->bio; + trace_after_split_bbio(bbio, mirror_num, map_length); } /* @@ -731,6 +734,7 @@ void btrfs_submit_bio(struct btrfs_bio *bbio, int mirror_num) /* If bbio->inode is not populated, its file_offset must be 0. */ ASSERT(bbio->inode || bbio->file_offset == 0); + trace_initial_bbio(bbio, mirror_num, -1); while (!btrfs_submit_chunk(bbio, mirror_num)) ; } diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h index c6eee9b414cf..1e6d87abd677 100644 --- a/include/trace/events/btrfs.h +++ b/include/trace/events/btrfs.h @@ -31,6 +31,7 @@ struct extent_io_tree; struct prelim_ref; struct btrfs_space_info; struct btrfs_raid_bio; +struct btrfs_bio; struct raid56_bio_trace_info; struct find_free_extent_ctl; @@ -2521,6 +2522,56 @@ DEFINE_EVENT(btrfs_raid56_bio, raid56_scrub_read_recover, TP_ARGS(rbio, bio, trace_info) ); +DECLARE_EVENT_CLASS(btrfs_bio, + + TP_PROTO(const struct btrfs_bio *bbio, int mirror_num, u64 map_length), + + TP_ARGS(bbio, mirror_num, map_length), + + TP_STRUCT__entry_btrfs( + __field( u64, logical ) + __field( u64, root ) + __field( u64, ino ) + __field( s64, map_length ) + __field( u32, length ) + __field( int, mirror_num ) + __field( u8, opf ) + ), + + TP_fast_assign_btrfs(bbio->fs_info, + __entry->logical = bbio->bio.bi_iter.bi_sector << SECTOR_SHIFT; + __entry->length = bbio->bio.bi_iter.bi_size; + __entry->map_length = map_length; + __entry->mirror_num = mirror_num; + __entry->opf = bio_op(&bbio->bio); + __entry->root = bbio->inode ? + bbio->inode->root->root_key.objectid : 0; + __entry->ino = bbio->inode ? btrfs_ino(bbio->inode) : 0; + ), + TP_printk_btrfs( +"root=%lld ino=%llu logical=%llu length=%u opf=0x%x mirror_num=%d map_length=%lld", + __entry->root, __entry->ino, + __entry->logical, __entry->length, __entry->opf, + __entry->mirror_num, __entry->map_length) +); + +DEFINE_EVENT(btrfs_bio, initial_bbio, + TP_PROTO(const struct btrfs_bio *bbio, int mirror_num, u64 map_length), + + TP_ARGS(bbio, mirror_num, map_length) +); + +DEFINE_EVENT(btrfs_bio, before_split_bbio, + TP_PROTO(const struct btrfs_bio *bbio, int mirror_num, u64 map_length), + + TP_ARGS(bbio, mirror_num, map_length) +); + +DEFINE_EVENT(btrfs_bio, after_split_bbio, + TP_PROTO(const struct btrfs_bio *bbio, int mirror_num, u64 map_length), + + TP_ARGS(bbio, mirror_num, map_length) +); #endif /* _TRACE_BTRFS_H */ /* This part must be outside protection */
[BACKGROUND] David recently reported some weird RAID56 errors where huge btrfs bio (tens of MiBs) is directly passed to RAID56 path, without proper bio split. To my surprise, there is no trace events around the bio split code of storage layer. [NEW TRACE EVENTS] This patch would add 3 new trace events: - trace_initial_bbio() - trace_before_split_bbio() - trace_after_split_bbio() The example output would look like the following (headers and uuid removed): initial_bbio: root=5 ino=257 logical=389152768 length=524288 opf=0x1 mirror_num=0 map_length=-1 before_split_bbio: root=5 ino=257 logical=389152768 length=524288 opf=0x1 mirror_num=0 map_length=131072 after_split_bbio: root=5 ino=257 logical=389152768 length=131072 opf=0x1 mirror_num=0 map_length=131072 before_split_bbio: root=5 ino=257 logical=389283840 length=393216 opf=0x1 mirror_num=0 map_length=131072 after_split_bbio: root=5 ino=257 logical=389283840 length=131072 opf=0x1 mirror_num=0 map_length=131072 before_split_bbio: root=5 ino=257 logical=389414912 length=262144 opf=0x1 mirror_num=0 map_length=131072 after_split_bbio: root=5 ino=257 logical=389414912 length=131072 opf=0x1 mirror_num=0 map_length=131072 before_split_bbio: root=5 ino=257 logical=389545984 length=131072 opf=0x1 mirror_num=0 map_length=131072 The above lines show a 512K bbio submitted, then it got split by each 128K boundary (this is a 3 disks RAID5). initial_bbio: root=1 ino=1 logical=30441472 length=16384 opf=0x1 mirror_num=0 map_length=-1 before_split_bbio: root=1 ino=1 logical=30441472 length=16384 opf=0x1 mirror_num=0 map_length=16384 initial_bbio: root=1 ino=1 logical=30457856 length=16384 opf=0x1 mirror_num=0 map_length=-1 before_split_bbio: root=1 ino=1 logical=30457856 length=16384 opf=0x1 mirror_num=0 map_length=16384 And the above lines are metadata writes which didn't need to be split at all. These new events should allow us to debug bio split related problems easier. Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/bio.c | 4 +++ include/trace/events/btrfs.h | 51 ++++++++++++++++++++++++++++++++++++ 2 files changed, 55 insertions(+)