
btrfs: add btrfs_debug() output for every bio submitted by btrfs RAID56

Message ID de8cc48c6141a20fb2ccf2b774981b150caee27b.1653988869.git.wqu@suse.com (mailing list archive)
State New, archived

Commit Message

Qu Wenruo May 31, 2022, 9:22 a.m. UTC
For the upcoming RAID56J feature, it's useful to know about each bio
submitted from the btrfs RAID56 layer.

The output will look like this:

 BTRFS debug (device dm-4): partial rmw, full stripe=389152768 opf=0x0 devid=3 type=1 offset=16384 physical=323043328 len=49152
 BTRFS debug (device dm-4): partial rmw, full stripe=389152768 opf=0x0 devid=1 type=2 offset=0 physical=67174400 len=65536
 BTRFS debug (device dm-4): full stripe rmw, full stripe=389152768 opf=0x1 devid=3 type=1 offset=0 physical=323026944 len=16384
 BTRFS debug (device dm-4): full stripe rmw, full stripe=389152768 opf=0x1 devid=2 type=-1 offset=0 physical=323026944 len=16384

The above debug output is from a 16K data write into an empty RAID56
data chunk.

Some explanation of the fields:
 opf:		bio operation
 devid:		btrfs devid
 type:		raid stripe type.
		>=1 are the Nth data stripe.
		-1 for P stripe, -2 for Q stripe.
		0 for error (btrfs device not found)
 offset:	the offset inside the stripe.
 physical:	the physical offset the bio is for
 len:		the length of the bio

The first two lines are from the partial RMW read, which reads the
remaining data stripes from disk.

The last two lines are from the full stripe RMW write, which writes the
two involved 16K stripes (one for data1, one for parity).
The stripe for data2 doesn't need to be written.

To enable any btrfs_debug() output, it's recommended to use the kernel
dynamic debug interface.

For this RAID56 example:

  # echo 'file fs/btrfs/raid56.c +p' > /sys/kernel/debug/dynamic_debug/control

The above command enables every `btrfs_debug()` call in fs/btrfs/raid56.c;
currently there is only one callsite in that file.
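For completeness, a sketch of the surrounding dynamic debug workflow (these commands assume a kernel built with CONFIG_DYNAMIC_DEBUG=y, debugfs mounted at /sys/kernel/debug, and root privileges):

```shell
# Enable all pr_debug()/btrfs_debug() callsites in raid56.c.
echo 'file fs/btrfs/raid56.c +p' > /sys/kernel/debug/dynamic_debug/control

# Inspect which callsites are currently enabled (flags column shows 'p').
grep raid56 /sys/kernel/debug/dynamic_debug/control

# Disable them again once done collecting output.
echo 'file fs/btrfs/raid56.c -p' > /sys/kernel/debug/dynamic_debug/control
```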

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

Comments

David Sterba May 31, 2022, 2:43 p.m. UTC | #1
On Tue, May 31, 2022 at 05:22:56PM +0800, Qu Wenruo wrote:
> For the later incoming RAID56J, it's better to know each bio we're
> submitting from btrfs RAID56 layer.
> 
> The output will look like this:
> 
>  BTRFS debug (device dm-4): partial rmw, full stripe=389152768 opf=0x0 devid=3 type=1 offset=16384 physical=323043328 len=49152
>  BTRFS debug (device dm-4): partial rmw, full stripe=389152768 opf=0x0 devid=1 type=2 offset=0 physical=67174400 len=65536
>  BTRFS debug (device dm-4): full stripe rmw, full stripe=389152768 opf=0x1 devid=3 type=1 offset=0 physical=323026944 len=16384
>  BTRFS debug (device dm-4): full stripe rmw, full stripe=389152768 opf=0x1 devid=2 type=-1 offset=0 physical=323026944 len=16384
> 
> The above debug output is from a 16K data write into an empty RAID56
> data chunk.
> 
> Some explanation on them:
>  opf:		bio operation
>  devid:		btrfs devid
>  type:		raid stripe type.
> 		>=1 are the Nth data stripe.
> 		-1 for P stripe, -2 for Q stripe.
> 		0 for error (btrfs device not found)
>  offset:	the offset inside the stripe.
>  physical:	the physical offset the bio is for
>  len:		the length of the bio
> 
> The first two lines are from partial RMW read, which is reading the
> remaining data stripes from disks.
> 
> The last two lines are for full stripe RMW write, which is writing the
> involved two 16K stripes (one for data1, one for parity).
> The stripe for data2 doesn't need to be written.
> 
> To enable any btrfs_debug() output, it's recommended to use kernel
> dynamic debug interface.
> 
> For this RAID56 example:
> 
>   # echo 'file fs/btrfs/raid56.c +p' > /sys/kernel/debug/dynamic_debug/control

Have you considered using a tracepoint instead of dynamic debug?
Qu Wenruo May 31, 2022, 11:12 p.m. UTC | #2
On 2022/5/31 22:43, David Sterba wrote:
> On Tue, May 31, 2022 at 05:22:56PM +0800, Qu Wenruo wrote:
>> For the later incoming RAID56J, it's better to know each bio we're
>> submitting from btrfs RAID56 layer.
>>
>> The output will look like this:
>>
>>   BTRFS debug (device dm-4): partial rmw, full stripe=389152768 opf=0x0 devid=3 type=1 offset=16384 physical=323043328 len=49152
>>   BTRFS debug (device dm-4): partial rmw, full stripe=389152768 opf=0x0 devid=1 type=2 offset=0 physical=67174400 len=65536
>>   BTRFS debug (device dm-4): full stripe rmw, full stripe=389152768 opf=0x1 devid=3 type=1 offset=0 physical=323026944 len=16384
>>   BTRFS debug (device dm-4): full stripe rmw, full stripe=389152768 opf=0x1 devid=2 type=-1 offset=0 physical=323026944 len=16384
>>
>> The above debug output is from a 16K data write into an empty RAID56
>> data chunk.
>>
>> Some explanation on them:
>>   opf:		bio operation
>>   devid:		btrfs devid
>>   type:		raid stripe type.
>> 		>=1 are the Nth data stripe.
>> 		-1 for P stripe, -2 for Q stripe.
>> 		0 for error (btrfs device not found)
>>   offset:	the offset inside the stripe.
>>   physical:	the physical offset the bio is for
>>   len:		the length of the bio
>>
>> The first two lines are from partial RMW read, which is reading the
>> remaining data stripes from disks.
>>
>> The last two lines are for full stripe RMW write, which is writing the
>> involved two 16K stripes (one for data1, one for parity).
>> The stripe for data2 doesn't need to be written.
>>
>> To enable any btrfs_debug() output, it's recommended to use kernel
>> dynamic debug interface.
>>
>> For this RAID56 example:
>>
>>    # echo 'file fs/btrfs/raid56.c +p' > /sys/kernel/debug/dynamic_debug/control
>
> Have you considered using a tracepoint instead of dynamic debug?
>

I have considered it, but there is a problem I could not solve well.

When using trace events, we have an advantage that everything in trace
event is only executed if that event is enabled.

But I'm not able to put the devid/stripe type search code into the
trace event itself: it would need to iterate through the
rbio->bioc->stripes[] array, and I'm not even sure that's possible
inside trace events.

So I went with dynamic debug, at the extra cost of running the
devid/stripe search every time, even when the debug output is not
enabled.

Thanks,
Qu
Nikolay Borisov June 1, 2022, 6:07 a.m. UTC | #3
On 1.06.22 г. 2:12 ч., Qu Wenruo wrote:
> 
> 
> On 2022/5/31 22:43, David Sterba wrote:
>> On Tue, May 31, 2022 at 05:22:56PM +0800, Qu Wenruo wrote:
>>> For the later incoming RAID56J, it's better to know each bio we're
>>> submitting from btrfs RAID56 layer.
>>>
>>> The output will look like this:
>>>
>>>   BTRFS debug (device dm-4): partial rmw, full stripe=389152768 
>>> opf=0x0 devid=3 type=1 offset=16384 physical=323043328 len=49152
>>>   BTRFS debug (device dm-4): partial rmw, full stripe=389152768 
>>> opf=0x0 devid=1 type=2 offset=0 physical=67174400 len=65536
>>>   BTRFS debug (device dm-4): full stripe rmw, full stripe=389152768 
>>> opf=0x1 devid=3 type=1 offset=0 physical=323026944 len=16384
>>>   BTRFS debug (device dm-4): full stripe rmw, full stripe=389152768 
>>> opf=0x1 devid=2 type=-1 offset=0 physical=323026944 len=16384
>>>
>>> The above debug output is from a 16K data write into an empty RAID56
>>> data chunk.
>>>
>>> Some explanation on them:
>>>   opf:        bio operation
>>>   devid:        btrfs devid
>>>   type:        raid stripe type.
>>>         >=1 are the Nth data stripe.
>>>         -1 for P stripe, -2 for Q stripe.
>>>         0 for error (btrfs device not found)
>>>   offset:    the offset inside the stripe.
>>>   physical:    the physical offset the bio is for
>>>   len:        the length of the bio
>>>
>>> The first two lines are from partial RMW read, which is reading the
>>> remaining data stripes from disks.
>>>
>>> The last two lines are for full stripe RMW write, which is writing the
>>> involved two 16K stripes (one for data1, one for parity).
>>> The stripe for data2 doesn't need to be written.
>>>
>>> To enable any btrfs_debug() output, it's recommended to use kernel
>>> dynamic debug interface.
>>>
>>> For this RAID56 example:
>>>
>>>    # echo 'file fs/btrfs/raid56.c +p' > 
>>> /sys/kernel/debug/dynamic_debug/control
>>
>> Have you considered using a tracepoint instead of dynamic debug?
>>
> 
> I have considered, but there is still a problem I can not solve that well.
> 
> When using trace events, we have an advantage that everything in trace
> event is only executed if that event is enabled.
> 
> But I'm not able to put the devid/stripe type search code into trace event.
> It will need to iterate through the rbio->bioc->stripes[] array.
> I'm not even sure if it's possible in trace events.

With the trace event you can do:

if (trace_btrfs_raid56_enabled())
	stripe = expensive_search_code();

trace_btrfs_raid56(..., stripe);


I.e. execute the code only if that particular event is enabled and pass
the resulting information to the event. For reference you can look up
how the 'start_ns' variable is assigned in __reserve_bytes and later
passed to handle_reserve_ticket, which in turn passes it to
trace_btrfs_reserve_ticket.



Qu Wenruo June 1, 2022, 6:10 a.m. UTC | #4
On 2022/6/1 14:07, Nikolay Borisov wrote:
>
>
> On 1.06.22 г. 2:12 ч., Qu Wenruo wrote:
>>
>>
>> On 2022/5/31 22:43, David Sterba wrote:
>>> On Tue, May 31, 2022 at 05:22:56PM +0800, Qu Wenruo wrote:
>>>> For the later incoming RAID56J, it's better to know each bio we're
>>>> submitting from btrfs RAID56 layer.
>>>>
>>>> The output will look like this:
>>>>
>>>>   BTRFS debug (device dm-4): partial rmw, full stripe=389152768
>>>> opf=0x0 devid=3 type=1 offset=16384 physical=323043328 len=49152
>>>>   BTRFS debug (device dm-4): partial rmw, full stripe=389152768
>>>> opf=0x0 devid=1 type=2 offset=0 physical=67174400 len=65536
>>>>   BTRFS debug (device dm-4): full stripe rmw, full stripe=389152768
>>>> opf=0x1 devid=3 type=1 offset=0 physical=323026944 len=16384
>>>>   BTRFS debug (device dm-4): full stripe rmw, full stripe=389152768
>>>> opf=0x1 devid=2 type=-1 offset=0 physical=323026944 len=16384
>>>>
>>>> The above debug output is from a 16K data write into an empty RAID56
>>>> data chunk.
>>>>
>>>> Some explanation on them:
>>>>   opf:        bio operation
>>>>   devid:        btrfs devid
>>>>   type:        raid stripe type.
>>>>         >=1 are the Nth data stripe.
>>>>         -1 for P stripe, -2 for Q stripe.
>>>>         0 for error (btrfs device not found)
>>>>   offset:    the offset inside the stripe.
>>>>   physical:    the physical offset the bio is for
>>>>   len:        the length of the bio
>>>>
>>>> The first two lines are from partial RMW read, which is reading the
>>>> remaining data stripes from disks.
>>>>
>>>> The last two lines are for full stripe RMW write, which is writing the
>>>> involved two 16K stripes (one for data1, one for parity).
>>>> The stripe for data2 doesn't need to be written.
>>>>
>>>> To enable any btrfs_debug() output, it's recommended to use kernel
>>>> dynamic debug interface.
>>>>
>>>> For this RAID56 example:
>>>>
>>>>    # echo 'file fs/btrfs/raid56.c +p' >
>>>> /sys/kernel/debug/dynamic_debug/control
>>>
>>> Have you considered using a tracepoint instead of dynamic debug?
>>>
>>
>> I have considered, but there is still a problem I can not solve that
>> well.
>>
>> When using trace events, we have an advantage that everything in trace
>> event is only executed if that event is enabled.
>>
>> But I'm not able to put the devid/stripe type search code into trace
>> event.
>> It will need to iterate through the rbio->bioc->stripes[] array.
>> I'm not even sure if it's possible in trace events.
>
> With the trace event you can do:
>
> if (trace_btrfs_raid56_enabled()) {
>      stripe = expensive_search_code()
>
> }
>
>
> trace_btrfs_raid56(..., stripe)
>
>
> I.e execute the code iff that particular event is enabled and pass the
> resulting information to the event. For reference you can lookup how
> 'start_ns' variable is assigned in __reserve_bytes and later passed to
> handle_reserve_ticket which in turn passes it to
> trace_btrfs_reserve_ticket.

Awesome! Exactly what I need.

Now I will go fully with trace events, since dynamic debug would still
need to run those helpers anyway.

And with trace events, I can now provide a more readable result for the
stripe type.

Thanks,
Qu

Patch

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 88640f7e1622..65073abd92ab 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1257,6 +1257,82 @@  static void index_rbio_pages(struct btrfs_raid_bio *rbio)
 	spin_unlock_irq(&rbio->bio_list_lock);
 }
 
+/*
+ * Return 0 if we can not get the device for it.
+ * Return >=1 for the nth data stripe.
+ * Return -1 for P stripe.
+ * Return -2 for Q stripe.
+ */
+static int bio_get_stripe_type(struct btrfs_raid_bio *rbio, struct bio *bio)
+{
+	int i;
+
+	if (!bio->bi_bdev)
+		return 0;
+
+	for (i = 0; i < rbio->bioc->num_stripes; i++) {
+		if (bio->bi_bdev == rbio->bioc->stripes[i].dev->bdev) {
+			/* For data stripe, we need to add one. */
+			if (i < rbio->nr_data)
+				return i + 1;
+
+			/* P stripe. */
+			if (i == rbio->nr_data)
+				return -1;
+			/* Q stripe. */
+			return -2;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Return the devid of a bio for RAID56 profile.
+ * Return (u64)-1 if we can not find the btrfs device for it.
+ */
+static u64 bio_get_btrfs_devid(struct btrfs_raid_bio *rbio, struct bio *bio)
+{
+	int i;
+
+	if (!bio->bi_bdev)
+		return -1;
+
+	for (i = 0; i < rbio->bioc->num_stripes; i++) {
+		if (bio->bi_bdev == rbio->bioc->stripes[i].dev->bdev)
+			return rbio->bioc->stripes[i].dev->devid;
+	}
+	return -1;
+}
+
+/* Return the offset inside its (data/P/Q) stripe */
+static int bio_get_offset(struct btrfs_raid_bio *rbio, struct bio *bio)
+{
+	int i;
+
+	if (!bio->bi_bdev)
+		return -1;
+
+	for (i = 0; i < rbio->bioc->num_stripes; i++) {
+		if (bio->bi_bdev == rbio->bioc->stripes[i].dev->bdev)
+			return (bio->bi_iter.bi_sector << SECTOR_SHIFT) -
+				rbio->bioc->stripes[i].physical;
+	}
+	return -1;
+}
+
+static void debug_raid_bio(struct btrfs_raid_bio *rbio, struct bio *bio,
+			   const char *prefix)
+{
+	btrfs_debug(rbio->bioc->fs_info,
+		"%s, full stripe=%llu opf=0x%x devid=%lld type=%d offset=%d physical=%llu len=%u",
+		prefix, rbio->bioc->raid_map[0], bio_op(bio),
+		bio_get_btrfs_devid(rbio, bio),
+		bio_get_stripe_type(rbio, bio),
+		bio_get_offset(rbio, bio),
+		bio->bi_iter.bi_sector << SECTOR_SHIFT,
+		bio->bi_iter.bi_size);
+}
+
 /*
  * this is called from one of two situations.  We either
  * have a full stripe from the higher layers, or we've read all
@@ -1422,6 +1498,7 @@  static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 	while ((bio = bio_list_pop(&bio_list))) {
 		bio->bi_end_io = raid_write_end_io;
 
+		debug_raid_bio(rbio, bio, "full stripe rmw");
 		submit_bio(bio);
 	}
 	return;
@@ -1683,6 +1760,7 @@  static int raid56_rmw_stripe(struct btrfs_raid_bio *rbio)
 
 		btrfs_bio_wq_end_io(rbio->bioc->fs_info, bio, BTRFS_WQ_ENDIO_RAID56);
 
+		debug_raid_bio(rbio, bio, "partial rmw");
 		submit_bio(bio);
 	}
 	/* the actual write will happen once the reads are done */
@@ -2256,6 +2334,7 @@  static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
 
 		btrfs_bio_wq_end_io(rbio->bioc->fs_info, bio, BTRFS_WQ_ENDIO_RAID56);
 
+		debug_raid_bio(rbio, bio, "parity recover");
 		submit_bio(bio);
 	}
 
@@ -2625,6 +2704,7 @@  static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 	while ((bio = bio_list_pop(&bio_list))) {
 		bio->bi_end_io = raid_write_end_io;
 
+		debug_raid_bio(rbio, bio, "scrub parity");
 		submit_bio(bio);
 	}
 	return;
@@ -2804,6 +2884,7 @@  static void raid56_parity_scrub_stripe(struct btrfs_raid_bio *rbio)
 
 		btrfs_bio_wq_end_io(rbio->bioc->fs_info, bio, BTRFS_WQ_ENDIO_RAID56);
 
+		debug_raid_bio(rbio, bio, "scrub stripe");
 		submit_bio(bio);
 	}
 	/* the actual write will happen once the reads are done */