[1/2] btrfs: account for pinned bytes and bytes_may_use in should_alloc_chunk

Message ID 1497455067-6050-1-git-send-email-jeffm@suse.com
State New

Commit Message

Jeff Mahoney June 14, 2017, 3:44 p.m. UTC
From: Jeff Mahoney <jeffm@suse.com>

In a heavy write scenario, we can end up with a large number of pinned
bytes.  This can translate into (very) premature ENOSPC because pinned
bytes must be accounted for when allowing a reservation but aren't
accounted for when deciding whether to create a new chunk.

This patch adds the accounting to should_alloc_chunk so that we can
create the chunk.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
---
 fs/btrfs/extent-tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
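To make the effect concrete, here is a hypothetical, self-contained sketch of the before/after decision. The struct and the fixed 80% threshold are simplifications invented for illustration; only the field names come from struct btrfs_space_info, and the real thresh logic in should_alloc_chunk() is more involved:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct btrfs_space_info. */
struct space_info {
	uint64_t total_bytes;
	uint64_t bytes_readonly;
	uint64_t bytes_used;
	uint64_t bytes_reserved;
	uint64_t bytes_pinned;
	uint64_t bytes_may_use;
};

/* Before the patch: only used + reserved bytes count as allocated. */
static int should_alloc_before(const struct space_info *s)
{
	uint64_t num_bytes = s->total_bytes - s->bytes_readonly;
	uint64_t num_allocated = s->bytes_used + s->bytes_reserved;

	/* Hypothetical fixed 80% threshold in place of the real thresh logic. */
	return num_allocated >= num_bytes * 8 / 10;
}

/* After the patch: pinned and may_use bytes count as allocated too. */
static int should_alloc_after(const struct space_info *s)
{
	uint64_t num_bytes = s->total_bytes - s->bytes_readonly;
	uint64_t num_allocated = s->bytes_used + s->bytes_reserved +
				 s->bytes_pinned + s->bytes_may_use;

	return num_allocated >= num_bytes * 8 / 10;
}
```

In a heavy-write scenario where pinned and may_use bytes dominate, the old calculation sees the chunk as mostly free and declines to allocate (so reservations that do count those bytes hit ENOSPC), while the new one crosses the threshold and allocates. It also shows the downside the thread discusses: bytes_may_use inflates num_allocated even when little metadata is actually written.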

Comments

Jeff Mahoney June 21, 2017, 8:14 p.m. UTC | #1
On 6/14/17 11:44 AM, jeffm@suse.com wrote:
> From: Jeff Mahoney <jeffm@suse.com>
> 
> In a heavy write scenario, we can end up with a large number of pinned
> bytes.  This can translate into (very) premature ENOSPC because pinned
> bytes must be accounted for when allowing a reservation but aren't
> accounted for when deciding whether to create a new chunk.
> 
> This patch adds the accounting to should_alloc_chunk so that we can
> create the chunk.
> 
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> ---
>  fs/btrfs/extent-tree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index cb0b924..d027807 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4389,7 +4389,7 @@ static int should_alloc_chunk(struct btrfs_fs_info *fs_info,
>  {
>  	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
>  	u64 num_bytes = sinfo->total_bytes - sinfo->bytes_readonly;
> -	u64 num_allocated = sinfo->bytes_used + sinfo->bytes_reserved;
> +	u64 num_allocated = sinfo->bytes_used + sinfo->bytes_reserved + sinfo->bytes_pinned + sinfo->bytes_may_use;
>  	u64 thresh;
>  
>  	if (force == CHUNK_ALLOC_FORCE)
> 


Ignore this patch.  It certainly allocates chunks more aggressively, but
it means we end up with a ton of metadata chunks even when we don't have
much metadata.

-Jeff
Chris Mason June 21, 2017, 8:31 p.m. UTC | #2
On 06/21/2017 04:14 PM, Jeff Mahoney wrote:
> On 6/14/17 11:44 AM, jeffm@suse.com wrote:
>> From: Jeff Mahoney <jeffm@suse.com>
>>
>> In a heavy write scenario, we can end up with a large number of pinned
>> bytes.  This can translate into (very) premature ENOSPC because pinned
>> bytes must be accounted for when allowing a reservation but aren't
>> accounted for when deciding whether to create a new chunk.
>>
>> This patch adds the accounting to should_alloc_chunk so that we can
>> create the chunk.
>>
>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>> ---
>>  fs/btrfs/extent-tree.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index cb0b924..d027807 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -4389,7 +4389,7 @@ static int should_alloc_chunk(struct btrfs_fs_info *fs_info,
>>  {
>>  	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
>>  	u64 num_bytes = sinfo->total_bytes - sinfo->bytes_readonly;
>> -	u64 num_allocated = sinfo->bytes_used + sinfo->bytes_reserved;
>> +	u64 num_allocated = sinfo->bytes_used + sinfo->bytes_reserved + sinfo->bytes_pinned + sinfo->bytes_may_use;
>>  	u64 thresh;
>>
>>  	if (force == CHUNK_ALLOC_FORCE)
>>
>
>
> Ignore this patch.  It certainly allocates chunks more aggressively, but
> it means we end up with a ton of metadata chunks even when we don't have
> much metadata.
>

Josef and I pushed this needle back and forth a bunch of times in the 
early days.  I still think we can allocate a few more chunks than we do 
now...

-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Mahoney June 21, 2017, 9:08 p.m. UTC | #3
On 6/21/17 4:31 PM, Chris Mason wrote:
> On 06/21/2017 04:14 PM, Jeff Mahoney wrote:
>> On 6/14/17 11:44 AM, jeffm@suse.com wrote:
>>> From: Jeff Mahoney <jeffm@suse.com>
>>>
>>> In a heavy write scenario, we can end up with a large number of pinned
>>> bytes.  This can translate into (very) premature ENOSPC because pinned
>>> bytes must be accounted for when allowing a reservation but aren't
>>> accounted for when deciding whether to create a new chunk.
>>>
>>> This patch adds the accounting to should_alloc_chunk so that we can
>>> create the chunk.
>>>
>>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>>> ---
>>>  fs/btrfs/extent-tree.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index cb0b924..d027807 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -4389,7 +4389,7 @@ static int should_alloc_chunk(struct
>>> btrfs_fs_info *fs_info,
>>>  {
>>>      struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
>>>      u64 num_bytes = sinfo->total_bytes - sinfo->bytes_readonly;
>>> -    u64 num_allocated = sinfo->bytes_used + sinfo->bytes_reserved;
>>> +    u64 num_allocated = sinfo->bytes_used + sinfo->bytes_reserved +
>>> sinfo->bytes_pinned + sinfo->bytes_may_use;
>>>      u64 thresh;
>>>
>>>      if (force == CHUNK_ALLOC_FORCE)
>>>
>>
>>
>> Ignore this patch.  It certainly allocates chunks more aggressively, but
>> it means we end up with a ton of metadata chunks even when we don't have
>> much metadata.
>>
> 
> Josef and I pushed this needle back and forth a bunch of times in the
> early days.  I still think we can allocate a few more chunks than we do
> now...

I agree.  This patch was to fix an issue that we are seeing during
installation.  It'd stop with ENOSPC with >50GB completely unallocated.
The patch passed the test cases that were failing before but now it's
failing differently.  I was worried this pattern might be the end result:

Data,single: Size:4.00GiB, Used:3.32GiB
   /dev/vde        4.00GiB

Metadata,DUP: Size:20.00GiB, Used:204.12MiB
   /dev/vde       40.00GiB

System,DUP: Size:8.00MiB, Used:16.00KiB
   /dev/vde       16.00MiB

This is on a fresh file system with just "cp /usr /mnt" executed.

I'm looking into it a bit more now.

-Jeff
Chris Mason June 21, 2017, 9:15 p.m. UTC | #4
On 06/21/2017 05:08 PM, Jeff Mahoney wrote:
> On 6/21/17 4:31 PM, Chris Mason wrote:
>> On 06/21/2017 04:14 PM, Jeff Mahoney wrote:
>>> On 6/14/17 11:44 AM, jeffm@suse.com wrote:
>>>> From: Jeff Mahoney <jeffm@suse.com>
>>>>
>>>> In a heavy write scenario, we can end up with a large number of pinned
>>>> bytes.  This can translate into (very) premature ENOSPC because pinned
>>>> bytes must be accounted for when allowing a reservation but aren't
>>>> accounted for when deciding whether to create a new chunk.
>>>>
>>>> This patch adds the accounting to should_alloc_chunk so that we can
>>>> create the chunk.
>>>>
>>>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>>>> ---
>>>>  fs/btrfs/extent-tree.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>>> index cb0b924..d027807 100644
>>>> --- a/fs/btrfs/extent-tree.c
>>>> +++ b/fs/btrfs/extent-tree.c
>>>> @@ -4389,7 +4389,7 @@ static int should_alloc_chunk(struct
>>>> btrfs_fs_info *fs_info,
>>>>  {
>>>>      struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
>>>>      u64 num_bytes = sinfo->total_bytes - sinfo->bytes_readonly;
>>>> -    u64 num_allocated = sinfo->bytes_used + sinfo->bytes_reserved;
>>>> +    u64 num_allocated = sinfo->bytes_used + sinfo->bytes_reserved +
>>>> sinfo->bytes_pinned + sinfo->bytes_may_use;
>>>>      u64 thresh;
>>>>
>>>>      if (force == CHUNK_ALLOC_FORCE)
>>>>
>>>
>>>
>>> Ignore this patch.  It certainly allocates chunks more aggressively, but
>>> it means we end up with a ton of metadata chunks even when we don't have
>>> much metadata.
>>>
>>
>> Josef and I pushed this needle back and forth a bunch of times in the
>> early days.  I still think we can allocate a few more chunks than we do
>> now...
>
> I agree.  This patch was to fix an issue that we are seeing during
> installation.  It'd stop with ENOSPC with >50GB completely unallocated.
> The patch passed the test cases that were failing before but now it's
> failing differently.  I was worried this pattern might be the end result:
>
> Data,single: Size:4.00GiB, Used:3.32GiB
>    /dev/vde        4.00GiB
>
> Metadata,DUP: Size:20.00GiB, Used:204.12MiB
>    /dev/vde       40.00GiB
>
> System,DUP: Size:8.00MiB, Used:16.00KiB
>    /dev/vde       16.00MiB
>
> This is on a fresh file system with just "cp /usr /mnt" executed.
>
> I'm looking into it a bit more now.

Does this failure still happen with Omar's ENOSPC fix (commit 
70e7af244f24c94604ef6eca32ad297632018583)?

-chris

Jeff Mahoney June 21, 2017, 9:41 p.m. UTC | #5
On 6/21/17 5:15 PM, Chris Mason wrote:
> 
> 
> On 06/21/2017 05:08 PM, Jeff Mahoney wrote:
>> On 6/21/17 4:31 PM, Chris Mason wrote:
>>> On 06/21/2017 04:14 PM, Jeff Mahoney wrote:
>>>> On 6/14/17 11:44 AM, jeffm@suse.com wrote:
>>>>> From: Jeff Mahoney <jeffm@suse.com>
>>>>>
>>>>> In a heavy write scenario, we can end up with a large number of pinned
>>>>> bytes.  This can translate into (very) premature ENOSPC because pinned
>>>>> bytes must be accounted for when allowing a reservation but aren't
>>>>> accounted for when deciding whether to create a new chunk.
>>>>>
>>>>> This patch adds the accounting to should_alloc_chunk so that we can
>>>>> create the chunk.
>>>>>
>>>>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>>>>> ---
>>>>>  fs/btrfs/extent-tree.c | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>>>> index cb0b924..d027807 100644
>>>>> --- a/fs/btrfs/extent-tree.c
>>>>> +++ b/fs/btrfs/extent-tree.c
>>>>> @@ -4389,7 +4389,7 @@ static int should_alloc_chunk(struct
>>>>> btrfs_fs_info *fs_info,
>>>>>  {
>>>>>      struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
>>>>>      u64 num_bytes = sinfo->total_bytes - sinfo->bytes_readonly;
>>>>> -    u64 num_allocated = sinfo->bytes_used + sinfo->bytes_reserved;
>>>>> +    u64 num_allocated = sinfo->bytes_used + sinfo->bytes_reserved +
>>>>> sinfo->bytes_pinned + sinfo->bytes_may_use;
>>>>>      u64 thresh;
>>>>>
>>>>>      if (force == CHUNK_ALLOC_FORCE)
>>>>>
>>>>
>>>>
>>>> Ignore this patch.  It certainly allocates chunks more aggressively,
>>>> but
>>>> it means we end up with a ton of metadata chunks even when we don't
>>>> have
>>>> much metadata.
>>>>
>>>
>>> Josef and I pushed this needle back and forth a bunch of times in the
>>> early days.  I still think we can allocate a few more chunks than we do
>>> now...
>>
>> I agree.  This patch was to fix an issue that we are seeing during
>> installation.  It'd stop with ENOSPC with >50GB completely unallocated.
>> The patch passed the test cases that were failing before but now it's
>> failing differently.  I was worried this pattern might be the end result:
>>
>> Data,single: Size:4.00GiB, Used:3.32GiB
>>    /dev/vde        4.00GiB
>>
>> Metadata,DUP: Size:20.00GiB, Used:204.12MiB
>>    /dev/vde       40.00GiB
>>
>> System,DUP: Size:8.00MiB, Used:16.00KiB
>>    /dev/vde       16.00MiB
>>
>> This is on a fresh file system with just "cp /usr /mnt" executed.
>>
>> I'm looking into it a bit more now.
> 
> Does this failure still happen with Omar's ENOSPC fix (commit:
> 70e7af244f24c94604ef6eca32ad297632018583)

Nope.  There aren't any warnings either with or without my patch.
Adding Omar's didn't make a difference.

-Jeff

Patch

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index cb0b924..d027807 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4389,7 +4389,7 @@ static int should_alloc_chunk(struct btrfs_fs_info *fs_info,
 {
 	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
 	u64 num_bytes = sinfo->total_bytes - sinfo->bytes_readonly;
-	u64 num_allocated = sinfo->bytes_used + sinfo->bytes_reserved;
+	u64 num_allocated = sinfo->bytes_used + sinfo->bytes_reserved + sinfo->bytes_pinned + sinfo->bytes_may_use;
 	u64 thresh;
 
 	if (force == CHUNK_ALLOC_FORCE)