Message ID | de8cc48c6141a20fb2ccf2b774981b150caee27b.1653988869.git.wqu@suse.com (mailing list archive)
---|---
State | New, archived
Series | btrfs: add btrfs_debug() output for every bio submitted by btrfs RAID56
On Tue, May 31, 2022 at 05:22:56PM +0800, Qu Wenruo wrote:
> For the later incoming RAID56J, it's better to know each bio we're
> submitting from the btrfs RAID56 layer.
>
> The output will look like this:
>
> BTRFS debug (device dm-4): partial rmw, full stripe=389152768 opf=0x0 devid=3 type=1 offset=16384 physical=323043328 len=49152
> BTRFS debug (device dm-4): partial rmw, full stripe=389152768 opf=0x0 devid=1 type=2 offset=0 physical=67174400 len=65536
> BTRFS debug (device dm-4): full stripe rmw, full stripe=389152768 opf=0x1 devid=3 type=1 offset=0 physical=323026944 len=16384
> BTRFS debug (device dm-4): full stripe rmw, full stripe=389152768 opf=0x1 devid=2 type=-1 offset=0 physical=323026944 len=16384
>
> The above debug output is from a 16K data write into an empty RAID56
> data chunk.
>
> Some explanation of the fields:
>   opf:      bio operation
>   devid:    btrfs devid
>   type:     raid stripe type
>             >=1 is the Nth data stripe
>             -1 for the P stripe, -2 for the Q stripe
>             0 for error (btrfs device not found)
>   offset:   the offset inside the stripe
>   physical: the physical offset the bio is for
>   len:      the length of the bio
>
> The first two lines are from the partial RMW read, which reads the
> remaining data stripes from disk.
>
> The last two lines are from the full stripe RMW write, which writes the
> two involved 16K stripes (one for data1, one for parity).
> The stripe for data2 doesn't need to be written.
>
> To enable any btrfs_debug() output, it's recommended to use the kernel
> dynamic debug interface.
>
> For this RAID56 example:
>
> # echo 'file fs/btrfs/raid56.c +p' > /sys/kernel/debug/dynamic_debug/control

Have you considered using a tracepoint instead of dynamic debug?
On 2022/5/31 22:43, David Sterba wrote:
> On Tue, May 31, 2022 at 05:22:56PM +0800, Qu Wenruo wrote:
>> For the later incoming RAID56J, it's better to know each bio we're
>> submitting from the btrfs RAID56 layer.
>>
>> [...]
>
> Have you considered using a tracepoint instead of dynamic debug?

I have considered it, but there is still a problem I cannot solve that well.

When using trace events, we have the advantage that everything in a trace
event is only executed if that event is enabled.

But I'm not able to put the devid/stripe type search code into a trace event.
It needs to iterate through the rbio->bioc->stripes[] array, and I'm not
even sure that's possible in trace events.

So I went with dynamic debug, at the extra cost of running the devid/stripe
search every time, even when the debug output is not enabled.

Thanks,
Qu
On 1.06.22 at 2:12, Qu Wenruo wrote:
> On 2022/5/31 22:43, David Sterba wrote:
>> [...]
>>
>> Have you considered using a tracepoint instead of dynamic debug?
>
> I have considered it, but there is still a problem I cannot solve that well.
>
> When using trace events, we have the advantage that everything in a trace
> event is only executed if that event is enabled.
>
> But I'm not able to put the devid/stripe type search code into a trace event.
> It needs to iterate through the rbio->bioc->stripes[] array, and I'm not
> even sure that's possible in trace events.

With a trace event you can do:

	if (trace_btrfs_raid56_enabled()) {
		stripe = expensive_search_code();
	}

	trace_btrfs_raid56(..., stripe);

i.e. execute the code only if that particular event is enabled, and pass the
resulting information to the event. For reference, you can look up how the
'start_ns' variable is assigned in __reserve_bytes and later passed to
handle_reserve_ticket, which in turn passes it to trace_btrfs_reserve_ticket.

> So I went with dynamic debug, at the extra cost of running the devid/stripe
> search every time, even when the debug output is not enabled.
>
> Thanks,
> Qu
On 2022/6/1 14:07, Nikolay Borisov wrote:
> On 1.06.22 at 2:12, Qu Wenruo wrote:
>> [...]
>>
>> But I'm not able to put the devid/stripe type search code into a trace
>> event.
>> It needs to iterate through the rbio->bioc->stripes[] array, and I'm not
>> even sure that's possible in trace events.
>
> With a trace event you can do:
>
> 	if (trace_btrfs_raid56_enabled()) {
> 		stripe = expensive_search_code();
> 	}
>
> 	trace_btrfs_raid56(..., stripe);
>
> i.e. execute the code only if that particular event is enabled, and pass
> the resulting information to the event. For reference, you can look up
> how the 'start_ns' variable is assigned in __reserve_bytes and later
> passed to handle_reserve_ticket, which in turn passes it to
> trace_btrfs_reserve_ticket.

Awesome, exactly what I need!

Now I will go fully with trace events, since dynamic debug would still need to
run those helpers anyway.

And with trace events, I can provide a more readable result for the stripe
type now.

Thanks,
Qu
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 88640f7e1622..65073abd92ab 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1257,6 +1257,82 @@ static void index_rbio_pages(struct btrfs_raid_bio *rbio)
 	spin_unlock_irq(&rbio->bio_list_lock);
 }
 
+/*
+ * Return 0 if we can not get the device for it.
+ * Return >=1 for the nth data stripe.
+ * Return -1 for P stripe.
+ * Return -2 for Q stripe.
+ */
+static int bio_get_stripe_type(struct btrfs_raid_bio *rbio, struct bio *bio)
+{
+	int i;
+
+	if (!bio->bi_bdev)
+		return 0;
+
+	for (i = 0; i < rbio->bioc->num_stripes; i++) {
+		if (bio->bi_bdev == rbio->bioc->stripes[i].dev->bdev) {
+			/* For data stripe, we need to add one. */
+			if (i < rbio->nr_data)
+				return i + 1;
+
+			/* P stripe. */
+			if (i == rbio->nr_data)
+				return -1;
+			/* Q stripe. */
+			return -2;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Return the devid of a bio for RAID56 profile.
+ * Return (u64)-1 if we can not find the btrfs device for it.
+ */
+static u64 bio_get_btrfs_devid(struct btrfs_raid_bio *rbio, struct bio *bio)
+{
+	int i;
+
+	if (!bio->bi_bdev)
+		return -1;
+
+	for (i = 0; i < rbio->bioc->num_stripes; i++) {
+		if (bio->bi_bdev == rbio->bioc->stripes[i].dev->bdev)
+			return rbio->bioc->stripes[i].dev->devid;
+	}
+	return -1;
+}
+
+/* Return the offset inside its (data/P/Q) stripe */
+static int bio_get_offset(struct btrfs_raid_bio *rbio, struct bio *bio)
+{
+	int i;
+
+	if (!bio->bi_bdev)
+		return -1;
+
+	for (i = 0; i < rbio->bioc->num_stripes; i++) {
+		if (bio->bi_bdev == rbio->bioc->stripes[i].dev->bdev)
+			return (bio->bi_iter.bi_sector << SECTOR_SHIFT) -
+				rbio->bioc->stripes[i].physical;
+	}
+	return -1;
+}
+
+static void debug_raid_bio(struct btrfs_raid_bio *rbio, struct bio *bio,
+			   const char *prefix)
+{
+	btrfs_debug(rbio->bioc->fs_info,
+"%s, full stripe=%llu opf=0x%x devid=%lld type=%d offset=%d physical=%llu len=%u",
+		    prefix, rbio->bioc->raid_map[0], bio_op(bio),
+		    bio_get_btrfs_devid(rbio, bio),
+		    bio_get_stripe_type(rbio, bio),
+		    bio_get_offset(rbio, bio),
+		    bio->bi_iter.bi_sector << SECTOR_SHIFT,
+		    bio->bi_iter.bi_size);
+}
+
 /*
  * this is called from one of two situations. We either
  * have a full stripe from the higher layers, or we've read all
@@ -1422,6 +1498,7 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 
 	while ((bio = bio_list_pop(&bio_list))) {
 		bio->bi_end_io = raid_write_end_io;
+		debug_raid_bio(rbio, bio, "full stripe rmw");
 		submit_bio(bio);
 	}
 	return;
@@ -1683,6 +1760,7 @@ static int raid56_rmw_stripe(struct btrfs_raid_bio *rbio)
 		btrfs_bio_wq_end_io(rbio->bioc->fs_info, bio,
 				    BTRFS_WQ_ENDIO_RAID56);
+		debug_raid_bio(rbio, bio, "partial rmw");
 		submit_bio(bio);
 	}
 	/* the actual write will happen once the reads are done */
@@ -2256,6 +2334,7 @@ static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
 		btrfs_bio_wq_end_io(rbio->bioc->fs_info, bio,
 				    BTRFS_WQ_ENDIO_RAID56);
+		debug_raid_bio(rbio, bio, "parity recover");
 		submit_bio(bio);
 	}
@@ -2625,6 +2704,7 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 	while ((bio = bio_list_pop(&bio_list))) {
 		bio->bi_end_io = raid_write_end_io;
+		debug_raid_bio(rbio, bio, "scrub parity");
 		submit_bio(bio);
 	}
 	return;
@@ -2804,6 +2884,7 @@ static void raid56_parity_scrub_stripe(struct btrfs_raid_bio *rbio)
 		btrfs_bio_wq_end_io(rbio->bioc->fs_info, bio,
 				    BTRFS_WQ_ENDIO_RAID56);
+		debug_raid_bio(rbio, bio, "scrub stripe");
 		submit_bio(bio);
 	}
 	/* the actual write will happen once the reads are done */