diff mbox series

btrfs: add a warning to check on the leaking device close

Message ID 8ad72827dc32542915f87db73cbb6b609f24dce4.1600074377.git.anand.jain@oracle.com (mailing list archive)
State New, archived
Headers show
Series btrfs: add a warning to check on the leaking device close | expand

Commit Message

Anand Jain Sept. 14, 2020, 9:11 a.m. UTC
To help better understand the device-close leaks, add a warning if the
device freed is still open.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/volumes.c | 1 +
 1 file changed, 1 insertion(+)

Comments

David Sterba Sept. 16, 2020, 10:10 a.m. UTC | #1
On Mon, Sep 14, 2020 at 05:11:14PM +0800, Anand Jain wrote:
> To help better understand the device-close leaks, add a warning if the
> device freed is still open.

Have you seen that happen or is it just a precaution? I've checked where
the bdev is set to NULL and all paths seem to be covered, so the warn_on
does not harm anything just that it does not seem to be possible to hit.
For that an assert would be better.

> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  fs/btrfs/volumes.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 2d5cc644741e..c0dfc0b569e9 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -371,6 +371,7 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid,
>  void btrfs_free_device(struct btrfs_device *device)
>  {
>  	WARN_ON(!list_empty(&device->post_commit_list));
> +	WARN_ON(device->bdev);
>  	rcu_string_free(device->name);
>  	extent_io_tree_release(&device->alloc_state);
>  	bio_put(device->flush_bio);
> -- 
> 2.25.1
Anand Jain Sept. 16, 2020, 11:51 a.m. UTC | #2
On 16/9/20 6:10 pm, David Sterba wrote:
> On Mon, Sep 14, 2020 at 05:11:14PM +0800, Anand Jain wrote:
>> To help better understand the device-close leaks, add a warning if the
>> device freed is still open.
> 
> Have you seen that happen or is it just a precaution? I've checked where
> the bdev is set to NULL and all paths seem to be covered, so the warn_on
> does not harm anything just that it does not seem to be possible to hit.
> For that an assert would be better.

There is an early/unconfirmed report [1] that after the forget
sub-command a device had partition changes and the new partitions failed
to recognize by the kernel.
[1]
https://lore.kernel.org/linux-btrfs/40e2315e-e60e-6161-ceb7-acd8b8a4e4a0@oracle.com/T/#t

The mount thread can't use device_list_mutex (because of bd_mutex),
and we rely on the uuid_mutex during mount.

The forget thread used both uuid_mutex and device_list_mutex.

So there isn't race between these two.

As of now we don't know. So the warning will help to know if we are
missing something.

Thanks, Anand


> 
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>   fs/btrfs/volumes.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 2d5cc644741e..c0dfc0b569e9 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -371,6 +371,7 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid,
>>   void btrfs_free_device(struct btrfs_device *device)
>>   {
>>   	WARN_ON(!list_empty(&device->post_commit_list));
>> +	WARN_ON(device->bdev);
>>   	rcu_string_free(device->name);
>>   	extent_io_tree_release(&device->alloc_state);
>>   	bio_put(device->flush_bio);
>> -- 
>> 2.25.1
Josef Bacik Sept. 28, 2020, 6:16 p.m. UTC | #3
On 9/16/20 7:51 AM, Anand Jain wrote:
> 
> 
> On 16/9/20 6:10 pm, David Sterba wrote:
>> On Mon, Sep 14, 2020 at 05:11:14PM +0800, Anand Jain wrote:
>>> To help better understand the device-close leaks, add a warning if the
>>> device freed is still open.
>>
>> Have you seen that happen or is it just a precaution? I've checked where
>> the bdev is set to NULL and all paths seem to be covered, so the warn_on
>> does not harm anything just that it does not seem to be possible to hit.
>> For that an assert would be better.
> 
> There is an early/unconfirmed report [1] that after the forget
> sub-command a device had partition changes and the new partitions failed
> to recognize by the kernel.
> [1]
> https://lore.kernel.org/linux-btrfs/40e2315e-e60e-6161-ceb7-acd8b8a4e4a0@oracle.com/T/#t 
> 
> 
> The mount thread can't use device_list_mutex (because of bd_mutex),
> and we rely on the uuid_mutex during mount.
> 
> The forget thread used both uuid_mutex and device_list_mutex.
> 
> So there isn't race between these two.
> 
> As of now we don't know. So the warning will help to know if we are
> missing something.
> 

It is clear that it can't really happen, but if we're worried about it I'd 
rather it be an ASSERT().  Thanks,

Josef
David Sterba Nov. 8, 2021, 7:58 p.m. UTC | #4
On Mon, Sep 28, 2020 at 02:16:34PM -0400, Josef Bacik wrote:
> On 9/16/20 7:51 AM, Anand Jain wrote:
> > 
> > 
> > On 16/9/20 6:10 pm, David Sterba wrote:
> >> On Mon, Sep 14, 2020 at 05:11:14PM +0800, Anand Jain wrote:
> >>> To help better understand the device-close leaks, add a warning if the
> >>> device freed is still open.
> >>
> >> Have you seen that happen or is it just a precaution? I've checked where
> >> the bdev is set to NULL and all paths seem to be covered, so the warn_on
> >> does not harm anything just that it does not seem to be possible to hit.
> >> For that an assert would be better.
> > 
> > There is an early/unconfirmed report [1] that after the forget
> > sub-command a device had partition changes and the new partitions failed
> > to recognize by the kernel.
> > [1]
> > https://lore.kernel.org/linux-btrfs/40e2315e-e60e-6161-ceb7-acd8b8a4e4a0@oracle.com/T/#t 
> > 
> > 
> > The mount thread can't use device_list_mutex (because of bd_mutex),
> > and we rely on the uuid_mutex during mount.
> > 
> > The forget thread used both uuid_mutex and device_list_mutex.
> > 
> > So there isn't race between these two.
> > 
> > As of now we don't know. So the warning will help to know if we are
> > missing something.
> > 
> 
> It is clear that it can't really happen, but if we're worried about it I'd 
> rather it be an ASSERT().  Thanks,

I'm going through the patch backlog and the patch may be still relevant
but a lot of device locking code has changed recently.

Anand, if this is still relevant, please resend, thanks.
Anand Jain Nov. 9, 2021, 8:55 a.m. UTC | #5
On 9/11/21 3:58 am, David Sterba wrote:
> On Mon, Sep 28, 2020 at 02:16:34PM -0400, Josef Bacik wrote:
>> On 9/16/20 7:51 AM, Anand Jain wrote:
>>>
>>>
>>> On 16/9/20 6:10 pm, David Sterba wrote:
>>>> On Mon, Sep 14, 2020 at 05:11:14PM +0800, Anand Jain wrote:
>>>>> To help better understand the device-close leaks, add a warning if the
>>>>> device freed is still open.
>>>>
>>>> Have you seen that happen or is it just a precaution? I've checked where
>>>> the bdev is set to NULL and all paths seem to be covered, so the warn_on
>>>> does not harm anything just that it does not seem to be possible to hit.
>>>> For that an assert would be better.
>>>
>>> There is an early/unconfirmed report [1] that after the forget
>>> sub-command a device had partition changes and the new partitions failed
>>> to recognize by the kernel.
>>> [1]
>>> https://lore.kernel.org/linux-btrfs/40e2315e-e60e-6161-ceb7-acd8b8a4e4a0@oracle.com/T/#t
>>>
>>>
>>> The mount thread can't use device_list_mutex (because of bd_mutex),
>>> and we rely on the uuid_mutex during mount.
>>>
>>> The forget thread used both uuid_mutex and device_list_mutex.
>>>
>>> So there isn't race between these two.
>>>
>>> As of now we don't know. So the warning will help to know if we are
>>> missing something.
>>>
>>
>> It is clear that it can't really happen, but if we're worried about it I'd
>> rather it be an ASSERT().  Thanks,
> 
> I'm going through the patch backlog and the patch may be still relevant
> but a lot of device locking code has changed recently.
> 
> Anand, if this is still relevant, please resend, thanks.
> 

This patch is not relevant anymore. The commit 3fa421dedbc8  ("btrfs: 
delay blkdev_put until after the device removes") introduced blkdev_put 
after the device-free. So pls ignore this patch.

Thanks, Anand
diff mbox series

Patch

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2d5cc644741e..c0dfc0b569e9 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -371,6 +371,7 @@  static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid,
 void btrfs_free_device(struct btrfs_device *device)
 {
 	WARN_ON(!list_empty(&device->post_commit_list));
+	WARN_ON(device->bdev);
 	rcu_string_free(device->name);
 	extent_io_tree_release(&device->alloc_state);
 	bio_put(device->flush_bio);