diff mbox series

btrfs: relocation: Output current relocation stage at btrfs_relocate_block_group()

Message ID 20191128075437.10621-1-wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: relocation: Output current relocation stage at btrfs_relocate_block_group() | expand

Commit Message

Qu Wenruo Nov. 28, 2019, 7:54 a.m. UTC
There are several reports of hanging relocation, populating the dmesg
with things like:
  BTRFS info (device dm-5): found 1 extents

The investigation is still on going, but will never hurt to output a
little more info.

This patch will also output the current relocation stage, making that
output something like:

  BTRFS info (device dm-5): balance: start -d -m -s
  BTRFS info (device dm-5): relocating block group 30408704 flags metadata|dup
  BTRFS info (device dm-5): found 2 extents at MOVE_DATA_EXTENT stage
  BTRFS info (device dm-5): relocating block group 22020096 flags system|dup
  BTRFS info (device dm-5): found 1 extents at MOVE_DATA_EXTENT stage
  BTRFS info (device dm-5): relocating block group 13631488 flags data
  BTRFS info (device dm-5): found 1 extents at MOVE_DATA_EXTENT stage
  BTRFS info (device dm-5): found 1 extents at UPDATE_DATA_PTRS stage
  BTRFS info (device dm-5): balance: ended with status: 0

The string "MOVE_DATA_EXTENT" and "UPDATE_DATA_PTRS" is mostly from the
macro MOVE_DATA_EXTENTS and UPDATE_DATA_PTRS, but the 'S' from
MOVE_DATA_EXTENTS is removed in the output string to make the alignment
better.

This patch will not increase the number of lines, but with extra info
for us to debug the reported problem.
(Although it's very likely the bug is sticking at UPDATE_DATA_PTRS
stage, even without the patch)

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/relocation.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

Comments

Su Yue Nov. 28, 2019, 10:50 a.m. UTC | #1
On 2019/11/28 3:54 PM, Qu Wenruo wrote:
> There are several reports of hanging relocation, populating the dmesg
> with things like:
>    BTRFS info (device dm-5): found 1 extents
>
> The investigation is still on going, but will never hurt to output a
> little more info.
>
> This patch will also output the current relocation stage, making that
> output something like:
>
>    BTRFS info (device dm-5): balance: start -d -m -s
>    BTRFS info (device dm-5): relocating block group 30408704 flags metadata|dup
>    BTRFS info (device dm-5): found 2 extents at MOVE_DATA_EXTENT stage
>    BTRFS info (device dm-5): relocating block group 22020096 flags system|dup
>    BTRFS info (device dm-5): found 1 extents at MOVE_DATA_EXTENT stage
>    BTRFS info (device dm-5): relocating block group 13631488 flags data
>    BTRFS info (device dm-5): found 1 extents at MOVE_DATA_EXTENT stage
>    BTRFS info (device dm-5): found 1 extents at UPDATE_DATA_PTRS stage
>    BTRFS info (device dm-5): balance: ended with status: 0
>
> The string "MOVE_DATA_EXTENT" and "UPDATE_DATA_PTRS" is mostly from the
> macro MOVE_DATA_EXTENTS and UPDATE_DATA_PTRS, but the 'S' from
> MOVE_DATA_EXTENTS is removed in the output string to make the alignment
> better.
>
> This patch will not increase the number of lines, but with extra info
> for us to debug the reported problem.
> (Although it's very likely the bug is sticking at UPDATE_DATA_PTRS
> stage, even without the patch)
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/relocation.c | 16 ++++++++++++++--
>   1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index d897a8e5e430..88fd9182852d 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -4291,6 +4291,15 @@ static void describe_relocation(struct btrfs_fs_info *fs_info,
>   		   block_group->start, buf);
>   }
>
> +static const char *stage_to_string(int stage)
> +{
> +	if (stage == MOVE_DATA_EXTENTS)
> +		return "MOVE_DATA_EXTENT";
> +	if (stage == UPDATE_DATA_PTRS)
> +		return "UPDATE_DATA_PTRS";
> +	return "UNKNOWN";
> +}
> +
>   /*
>    * function to relocate all extents in a block group.
>    */
> @@ -4365,12 +4374,15 @@ int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start)
>   				 rc->block_group->length);
>
>   	while (1) {
> +		int finishes_stage;
> +

NIT: the rc::stage is an unsigned integer.


>   		mutex_lock(&fs_info->cleaner_mutex);
>   		ret = relocate_block_group(rc);
>   		mutex_unlock(&fs_info->cleaner_mutex);
>   		if (ret < 0)
>   			err = ret;
>
> +		finishes_stage = rc->stage;
>   		/*
>   		 * We may have gotten ENOSPC after we already dirtied some
>   		 * extents.  If writeout happens while we're relocating a
> @@ -4396,8 +4408,8 @@ int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start)
>   		if (rc->extents_found == 0)
>   			break;
>
> -		btrfs_info(fs_info, "found %llu extents", rc->extents_found);
> -
> +		btrfs_info(fs_info, "found %llu extents at %s stage",
> +			   rc->extents_found, stage_to_string(finishes_stage));
>   	}
>
>   	WARN_ON(rc->block_group->pinned > 0);
>
Qu Wenruo Nov. 28, 2019, 10:58 a.m. UTC | #2
On 2019/11/28 下午6:50, Su Yue wrote:
> 
> 
> On 2019/11/28 3:54 PM, Qu Wenruo wrote:
>> There are several reports of hanging relocation, populating the dmesg
>> with things like:
>>    BTRFS info (device dm-5): found 1 extents
>>
>> The investigation is still on going, but will never hurt to output a
>> little more info.
>>
>> This patch will also output the current relocation stage, making that
>> output something like:
>>
>>    BTRFS info (device dm-5): balance: start -d -m -s
>>    BTRFS info (device dm-5): relocating block group 30408704 flags
>> metadata|dup
>>    BTRFS info (device dm-5): found 2 extents at MOVE_DATA_EXTENT stage
>>    BTRFS info (device dm-5): relocating block group 22020096 flags
>> system|dup
>>    BTRFS info (device dm-5): found 1 extents at MOVE_DATA_EXTENT stage
>>    BTRFS info (device dm-5): relocating block group 13631488 flags data
>>    BTRFS info (device dm-5): found 1 extents at MOVE_DATA_EXTENT stage
>>    BTRFS info (device dm-5): found 1 extents at UPDATE_DATA_PTRS stage
>>    BTRFS info (device dm-5): balance: ended with status: 0
>>
>> The string "MOVE_DATA_EXTENT" and "UPDATE_DATA_PTRS" is mostly from the
>> macro MOVE_DATA_EXTENTS and UPDATE_DATA_PTRS, but the 'S' from
>> MOVE_DATA_EXTENTS is removed in the output string to make the alignment
>> better.
>>
>> This patch will not increase the number of lines, but with extra info
>> for us to debug the reported problem.
>> (Although it's very likely the bug is sticking at UPDATE_DATA_PTRS
>> stage, even without the patch)
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/relocation.c | 16 ++++++++++++++--
>>   1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>> index d897a8e5e430..88fd9182852d 100644
>> --- a/fs/btrfs/relocation.c
>> +++ b/fs/btrfs/relocation.c
>> @@ -4291,6 +4291,15 @@ static void describe_relocation(struct
>> btrfs_fs_info *fs_info,
>>              block_group->start, buf);
>>   }
>>
>> +static const char *stage_to_string(int stage)
>> +{
>> +    if (stage == MOVE_DATA_EXTENTS)
>> +        return "MOVE_DATA_EXTENT";
>> +    if (stage == UPDATE_DATA_PTRS)
>> +        return "UPDATE_DATA_PTRS";
>> +    return "UNKNOWN";
>> +}
>> +
>>   /*
>>    * function to relocate all extents in a block group.
>>    */
>> @@ -4365,12 +4374,15 @@ int btrfs_relocate_block_group(struct
>> btrfs_fs_info *fs_info, u64 group_start)
>>                    rc->block_group->length);
>>
>>       while (1) {
>> +        int finishes_stage;
>> +
> 
> NIT: the rc::stage is an unsigned integer.

It doesn't matter. rc::stage is only a 8bit value.

Both unsigned int/int can handle it without any problem.
And the strage_to_string() function can handle any value, so no problem
at all.

Thanks,
Qu
> 
> 
>>           mutex_lock(&fs_info->cleaner_mutex);
>>           ret = relocate_block_group(rc);
>>           mutex_unlock(&fs_info->cleaner_mutex);
>>           if (ret < 0)
>>               err = ret;
>>
>> +        finishes_stage = rc->stage;
>>           /*
>>            * We may have gotten ENOSPC after we already dirtied some
>>            * extents.  If writeout happens while we're relocating a
>> @@ -4396,8 +4408,8 @@ int btrfs_relocate_block_group(struct
>> btrfs_fs_info *fs_info, u64 group_start)
>>           if (rc->extents_found == 0)
>>               break;
>>
>> -        btrfs_info(fs_info, "found %llu extents", rc->extents_found);
>> -
>> +        btrfs_info(fs_info, "found %llu extents at %s stage",
>> +               rc->extents_found, stage_to_string(finishes_stage));
>>       }
>>
>>       WARN_ON(rc->block_group->pinned > 0);
>>
David Sterba Nov. 28, 2019, 3:40 p.m. UTC | #3
On Thu, Nov 28, 2019 at 03:54:36PM +0800, Qu Wenruo wrote:
> There are several reports of hanging relocation, populating the dmesg
> with things like:
>   BTRFS info (device dm-5): found 1 extents
> 
> The investigation is still on going, but will never hurt to output a
> little more info.
> 
> This patch will also output the current relocation stage, making that
> output something like:
> 
>   BTRFS info (device dm-5): balance: start -d -m -s
>   BTRFS info (device dm-5): relocating block group 30408704 flags metadata|dup
>   BTRFS info (device dm-5): found 2 extents at MOVE_DATA_EXTENT stage
>   BTRFS info (device dm-5): relocating block group 22020096 flags system|dup
>   BTRFS info (device dm-5): found 1 extents at MOVE_DATA_EXTENT stage
>   BTRFS info (device dm-5): relocating block group 13631488 flags data
>   BTRFS info (device dm-5): found 1 extents at MOVE_DATA_EXTENT stage
>   BTRFS info (device dm-5): found 1 extents at UPDATE_DATA_PTRS stage
>   BTRFS info (device dm-5): balance: ended with status: 0
> 
> The string "MOVE_DATA_EXTENT" and "UPDATE_DATA_PTRS" is mostly from the
> macro MOVE_DATA_EXTENTS and UPDATE_DATA_PTRS, but the 'S' from
> MOVE_DATA_EXTENTS is removed in the output string to make the alignment
> better.
> 
> This patch will not increase the number of lines, but with extra info
> for us to debug the reported problem.

Nice. I'd suggest to make it more user friendly

	relocation: found 111 extents, stage: move data blocks
	relocation: found 111 extents, stage: update data pointers

The identifier can be understood what it means but it's IMHO not
important to copy it to the message verbatim.
Qu Wenruo Nov. 29, 2019, 12:35 a.m. UTC | #4
On 2019/11/28 下午11:40, David Sterba wrote:
> On Thu, Nov 28, 2019 at 03:54:36PM +0800, Qu Wenruo wrote:
>> There are several reports of hanging relocation, populating the dmesg
>> with things like:
>>   BTRFS info (device dm-5): found 1 extents
>>
>> The investigation is still on going, but will never hurt to output a
>> little more info.
>>
>> This patch will also output the current relocation stage, making that
>> output something like:
>>
>>   BTRFS info (device dm-5): balance: start -d -m -s
>>   BTRFS info (device dm-5): relocating block group 30408704 flags metadata|dup
>>   BTRFS info (device dm-5): found 2 extents at MOVE_DATA_EXTENT stage
>>   BTRFS info (device dm-5): relocating block group 22020096 flags system|dup
>>   BTRFS info (device dm-5): found 1 extents at MOVE_DATA_EXTENT stage
>>   BTRFS info (device dm-5): relocating block group 13631488 flags data
>>   BTRFS info (device dm-5): found 1 extents at MOVE_DATA_EXTENT stage
>>   BTRFS info (device dm-5): found 1 extents at UPDATE_DATA_PTRS stage
>>   BTRFS info (device dm-5): balance: ended with status: 0
>>
>> The string "MOVE_DATA_EXTENT" and "UPDATE_DATA_PTRS" is mostly from the
>> macro MOVE_DATA_EXTENTS and UPDATE_DATA_PTRS, but the 'S' from
>> MOVE_DATA_EXTENTS is removed in the output string to make the alignment
>> better.
>>
>> This patch will not increase the number of lines, but with extra info
>> for us to debug the reported problem.
> 
> Nice. I'd suggest to make it more user friendly
> 
> 	relocation: found 111 extents, stage: move data blocks
> 	relocation: found 111 extents, stage: update data pointers

This is much better, keeps the indent while provide better readability.

I'll go this way in the next version.

Thanks,
Qu
> 
> The identifier can be understood what it means but it's IMHO not
> important to copy it to the message verbatim.
>
diff mbox series

Patch

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index d897a8e5e430..88fd9182852d 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4291,6 +4291,15 @@  static void describe_relocation(struct btrfs_fs_info *fs_info,
 		   block_group->start, buf);
 }
 
+static const char *stage_to_string(int stage)
+{
+	if (stage == MOVE_DATA_EXTENTS)
+		return "MOVE_DATA_EXTENT";
+	if (stage == UPDATE_DATA_PTRS)
+		return "UPDATE_DATA_PTRS";
+	return "UNKNOWN";
+}
+
 /*
  * function to relocate all extents in a block group.
  */
@@ -4365,12 +4374,15 @@  int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start)
 				 rc->block_group->length);
 
 	while (1) {
+		int finishes_stage;
+
 		mutex_lock(&fs_info->cleaner_mutex);
 		ret = relocate_block_group(rc);
 		mutex_unlock(&fs_info->cleaner_mutex);
 		if (ret < 0)
 			err = ret;
 
+		finishes_stage = rc->stage;
 		/*
 		 * We may have gotten ENOSPC after we already dirtied some
 		 * extents.  If writeout happens while we're relocating a
@@ -4396,8 +4408,8 @@  int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start)
 		if (rc->extents_found == 0)
 			break;
 
-		btrfs_info(fs_info, "found %llu extents", rc->extents_found);
-
+		btrfs_info(fs_info, "found %llu extents at %s stage",
+			   rc->extents_found, stage_to_string(finishes_stage));
 	}
 
 	WARN_ON(rc->block_group->pinned > 0);