btrfs: fix deadlock when writing out space cache

Message ID 1510780852-8862-1-git-send-email-josef@toxicpanda.com (mailing list archive)
State New, archived

Commit Message

Josef Bacik Nov. 15, 2017, 9:20 p.m. UTC
From: Josef Bacik <jbacik@fb.com>

If we fail to prepare our pages for whatever reason (out of memory in
our case) we need to make sure to drop the block_group->data_rwsem,
otherwise hilarity ensues.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 fs/btrfs/free-space-cache.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Omar Sandoval Nov. 15, 2017, 9:29 p.m. UTC | #1
On Wed, Nov 15, 2017 at 04:20:52PM -0500, Josef Bacik wrote:
> From: Josef Bacik <jbacik@fb.com>
> 
> If we fail to prepare our pages for whatever reason (out of memory in
> our case) we need to make sure to drop the block_group->data_rwsem,
> otherwise hilarity ensues.
> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>

Reviewed-by: Omar Sandoval <osandov@fb.com>

> ---
>  fs/btrfs/free-space-cache.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index cdc9f4015ec3..a6c643275210 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -1263,8 +1263,12 @@ static int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
>  
>  	/* Lock all pages first so we can lock the extent safely. */
>  	ret = io_ctl_prepare_pages(io_ctl, inode, 0);
> -	if (ret)
> +	if (ret) {
> +		if (block_group &&
> +		    (block_group->flags & BTRFS_BLOCK_GROUP_DATA))
> +			up_write(&block_group->data_rwsem);
>  		goto out;
> +	}
>  
>  	lock_extent_bits(&BTRFS_I(inode)->io_tree, 0, i_size_read(inode) - 1,
>  			 &cached_state);
> -- 
> 2.7.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liu Bo Nov. 15, 2017, 11:46 p.m. UTC | #2
On Wed, Nov 15, 2017 at 04:20:52PM -0500, Josef Bacik wrote:
> From: Josef Bacik <jbacik@fb.com>
> 
> If we fail to prepare our pages for whatever reason (out of memory in
> our case) we need to make sure to drop the block_group->data_rwsem,
> otherwise hilarity ensues.
>

Reviewed-by: Liu Bo <bo.li.liu@oracle.com>

Thanks,

-liubo
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  fs/btrfs/free-space-cache.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index cdc9f4015ec3..a6c643275210 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -1263,8 +1263,12 @@ static int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
>  
>  	/* Lock all pages first so we can lock the extent safely. */
>  	ret = io_ctl_prepare_pages(io_ctl, inode, 0);
> -	if (ret)
> +	if (ret) {
> +		if (block_group &&
> +		    (block_group->flags & BTRFS_BLOCK_GROUP_DATA))
> +			up_write(&block_group->data_rwsem);
>  		goto out;
> +	}
>  
>  	lock_extent_bits(&BTRFS_I(inode)->io_tree, 0, i_size_read(inode) - 1,
>  			 &cached_state);
> -- 
> 2.7.5
> 
Chris Mason Nov. 16, 2017, 1:52 a.m. UTC | #3
On 11/15/2017 06:46 PM, Liu Bo wrote:
> On Wed, Nov 15, 2017 at 04:20:52PM -0500, Josef Bacik wrote:
>> From: Josef Bacik <jbacik@fb.com>
>>
>> If we fail to prepare our pages for whatever reason (out of memory in
>> our case) we need to make sure to drop the block_group->data_rwsem,
>> otherwise hilarity ensues.
>>

Thanks Josef, I searched all the logs and it looks like we've really 
only hit this twice this month.  It's surprising we haven't seen this 
more given how often we OOM.

-chris
Nikolay Borisov Nov. 16, 2017, 8:09 a.m. UTC | #4
On 15.11.2017 23:20, Josef Bacik wrote:
> From: Josef Bacik <jbacik@fb.com>
> 
> If we fail to prepare our pages for whatever reason (out of memory in
> our case) we need to make sure to drop the block_group->data_rwsem,
> otherwise hilarity ensues.
> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  fs/btrfs/free-space-cache.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index cdc9f4015ec3..a6c643275210 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -1263,8 +1263,12 @@ static int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
>  
>  	/* Lock all pages first so we can lock the extent safely. */
>  	ret = io_ctl_prepare_pages(io_ctl, inode, 0);
> -	if (ret)
> +	if (ret) {
> +		if (block_group &&
> +		    (block_group->flags & BTRFS_BLOCK_GROUP_DATA))
> +			up_write(&block_group->data_rwsem);
>  		goto out;
> +	}

Which function after out: label causes a deadlock - btrfs_update_inode
(unlikely) or invalidate_inode_pages2?

>  
>  	lock_extent_bits(&BTRFS_I(inode)->io_tree, 0, i_size_read(inode) - 1,
>  			 &cached_state);
> 
Chris Mason Nov. 16, 2017, 1:50 p.m. UTC | #5
On 11/16/2017 03:09 AM, Nikolay Borisov wrote:
> 
> 
> On 15.11.2017 23:20, Josef Bacik wrote:
>> From: Josef Bacik <jbacik@fb.com>
>>
>> If we fail to prepare our pages for whatever reason (out of memory in
>> our case) we need to make sure to drop the block_group->data_rwsem,
>> otherwise hilarity ensues.
>>
>> Signed-off-by: Josef Bacik <jbacik@fb.com>
>> ---
>>   fs/btrfs/free-space-cache.c | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
>> index cdc9f4015ec3..a6c643275210 100644
>> --- a/fs/btrfs/free-space-cache.c
>> +++ b/fs/btrfs/free-space-cache.c
>> @@ -1263,8 +1263,12 @@ static int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
>>   
>>   	/* Lock all pages first so we can lock the extent safely. */
>>   	ret = io_ctl_prepare_pages(io_ctl, inode, 0);
>> -	if (ret)
>> +	if (ret) {
>> +		if (block_group &&
>> +		    (block_group->flags & BTRFS_BLOCK_GROUP_DATA))
>> +			up_write(&block_group->data_rwsem);
>>   		goto out;
>> +	}
> 
> Which function after out: label causes a deadlock - btrfs_update_inode
> (unlikely) or invalidate_inode_pages2?

Neither, out: just doesn't drop the data_rwsem, so it leaves the 
block group locked.

-chris
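
A minimal userspace sketch of the point above, not the kernel code. It
assumes a toy lock that only records whether a writer holds it; every
toy_* name is hypothetical, and toy_prepare_pages() merely stands in for
io_ctl_prepare_pages() failing with -ENOMEM. The `patched` flag switches
between the behaviour before and after the hunk under review.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a kernel rw_semaphore: only tracks a held writer. */
struct toy_rwsem {
	bool write_held;
};

static void toy_down_write(struct toy_rwsem *sem) { sem->write_held = true; }
static void toy_up_write(struct toy_rwsem *sem) { sem->write_held = false; }

/* Hypothetical block group carrying only the lock discussed here. */
struct toy_block_group {
	struct toy_rwsem data_rwsem;
};

/* Stand-in for io_ctl_prepare_pages(); fails with -ENOMEM on request. */
static int toy_prepare_pages(bool simulate_oom)
{
	return simulate_oom ? -ENOMEM : 0;
}

static int toy_write_out_cache(struct toy_block_group *bg,
			       bool simulate_oom, bool patched)
{
	int ret;

	toy_down_write(&bg->data_rwsem);

	ret = toy_prepare_pages(simulate_oom);
	if (ret) {
		/* The fix: drop the lock before taking the early exit. */
		if (patched)
			toy_up_write(&bg->data_rwsem);
		goto out;
	}

	/* Success path: the real work would happen before this unlock. */
	toy_up_write(&bg->data_rwsem);
out:
	/* As described in the thread, out: returns without unlocking. */
	return ret;
}
```

With patched == false and a simulated OOM, the function returns -ENOMEM
while write_held is still true, so in this model the next writer would
block forever. With patched == true the lock is released on the same path.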
Nikolay Borisov Nov. 16, 2017, 1:53 p.m. UTC | #6
On 16.11.2017 15:50, Chris Mason wrote:
> 
> 
> On 11/16/2017 03:09 AM, Nikolay Borisov wrote:
>>
>>
>> On 15.11.2017 23:20, Josef Bacik wrote:
>>> From: Josef Bacik <jbacik@fb.com>
>>>
>>> If we fail to prepare our pages for whatever reason (out of memory in
>>> our case) we need to make sure to drop the block_group->data_rwsem,
>>> otherwise hilarity ensues.
>>>
>>> Signed-off-by: Josef Bacik <jbacik@fb.com>
>>> ---
>>>   fs/btrfs/free-space-cache.c | 6 +++++-
>>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
>>> index cdc9f4015ec3..a6c643275210 100644
>>> --- a/fs/btrfs/free-space-cache.c
>>> +++ b/fs/btrfs/free-space-cache.c
>>> @@ -1263,8 +1263,12 @@ static int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
>>>         /* Lock all pages first so we can lock the extent safely. */
>>>       ret = io_ctl_prepare_pages(io_ctl, inode, 0);
>>> -    if (ret)
>>> +    if (ret) {
>>> +        if (block_group &&
>>> +            (block_group->flags & BTRFS_BLOCK_GROUP_DATA))
>>> +            up_write(&block_group->data_rwsem);
>>>           goto out;
>>> +    }
>>
>> Which function after out: label causes a deadlock - btrfs_update_inode
>> (unlikely) or invalidate_inode_pages2?
> 
> Neither, out: just doesn't drop the data_rwsem, so it leaves the
> block group locked.

Ah, it has a return ret; and never hits the code under out_nospc, fair
enough.

> 
> -chris
> 
David Sterba Nov. 20, 2017, 5:22 p.m. UTC | #7
On Wed, Nov 15, 2017 at 04:20:52PM -0500, Josef Bacik wrote:
> From: Josef Bacik <jbacik@fb.com>
> 
> If we fail to prepare our pages for whatever reason (out of memory in
> our case) we need to make sure to drop the block_group->data_rwsem,
> otherwise hilarity ensues.
> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  fs/btrfs/free-space-cache.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index cdc9f4015ec3..a6c643275210 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -1263,8 +1263,12 @@ static int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
>  
>  	/* Lock all pages first so we can lock the extent safely. */
>  	ret = io_ctl_prepare_pages(io_ctl, inode, 0);
> -	if (ret)
> +	if (ret) {
> +		if (block_group &&
> +		    (block_group->flags & BTRFS_BLOCK_GROUP_DATA))
> +			up_write(&block_group->data_rwsem);
>  		goto out;

The unlocking sequence is in the exit block but does not have a label.
It would be better to reuse the code.
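
One way the suggestion above could look, as a userspace sketch under
stated assumptions rather than the actual kernel change. The label name
out_unlock, the sketch_* types, and the enospc_cleanups counter are all
hypothetical; the layout (an out: label that returns, followed by an exit
block that unlocks and jumps back to out) only follows what this thread
says about the function.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a kernel rw_semaphore: only tracks a held writer. */
struct sketch_rwsem {
	bool write_held;
};

/* Hypothetical block group; is_data models the BTRFS_BLOCK_GROUP_DATA test. */
struct sketch_block_group {
	struct sketch_rwsem data_rwsem;
	bool is_data;
	int enospc_cleanups;	/* counts runs of the ENOSPC-only cleanup */
};

static int sketch_write_out_cache(struct sketch_block_group *bg,
				  bool simulate_oom, bool simulate_nospc)
{
	int ret;

	if (bg->is_data)
		bg->data_rwsem.write_held = true;	/* down_write() */

	/* Stand-in for io_ctl_prepare_pages() failing. */
	ret = simulate_oom ? -ENOMEM : 0;
	if (ret)
		goto out_unlock;	/* reuse the exit block's unlock */

	/* Stand-in for a later step running out of space. */
	ret = simulate_nospc ? -ENOSPC : 0;
	if (ret)
		goto out_nospc;

	if (bg->is_data)
		bg->data_rwsem.write_held = false;	/* up_write() */
	return 0;

out:
	return ret;

out_nospc:
	bg->enospc_cleanups++;	/* ENOSPC-specific cleanup only */
out_unlock:
	if (bg->is_data)
		bg->data_rwsem.write_held = false;	/* single up_write() site */
	goto out;
}
```

The page-preparation failure skips the ENOSPC-specific cleanup but shares
the one unlock site with it, so the unlock condition is written only once
for both error paths.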
Patch

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index cdc9f4015ec3..a6c643275210 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1263,8 +1263,12 @@  static int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
 
 	/* Lock all pages first so we can lock the extent safely. */
 	ret = io_ctl_prepare_pages(io_ctl, inode, 0);
-	if (ret)
+	if (ret) {
+		if (block_group &&
+		    (block_group->flags & BTRFS_BLOCK_GROUP_DATA))
+			up_write(&block_group->data_rwsem);
 		goto out;
+	}
 
 	lock_extent_bits(&BTRFS_I(inode)->io_tree, 0, i_size_read(inode) - 1,
 			 &cached_state);