Message ID | 8ad72827dc32542915f87db73cbb6b609f24dce4.1600074377.git.anand.jain@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: add a warning to check on the leaking device close | expand |
On Mon, Sep 14, 2020 at 05:11:14PM +0800, Anand Jain wrote: > To help better understand the device-close leaks, add a warning if the > device freed is still open. Have you seen that happen or is it just a precaution? I've checked where the bdev is set to NULL and all paths seem to be covered, so the warn_on does not harm anything just that it does not seem to be possible to hit. For that an assert would be better. > Signed-off-by: Anand Jain <anand.jain@oracle.com> > --- > fs/btrfs/volumes.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index 2d5cc644741e..c0dfc0b569e9 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -371,6 +371,7 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid, > void btrfs_free_device(struct btrfs_device *device) > { > WARN_ON(!list_empty(&device->post_commit_list)); > + WARN_ON(device->bdev); > rcu_string_free(device->name); > extent_io_tree_release(&device->alloc_state); > bio_put(device->flush_bio); > -- > 2.25.1
On 16/9/20 6:10 pm, David Sterba wrote: > On Mon, Sep 14, 2020 at 05:11:14PM +0800, Anand Jain wrote: >> To help better understand the device-close leaks, add a warning if the >> device freed is still open. > > Have you seen that happen or is it just a precaution? I've checked where > the bdev is set to NULL and all paths seem to be covered, so the warn_on > does not harm anything just that it does not seem to be possible to hit. > For that an assert would be better. There is an early/unconfirmed report [1] that after the forget sub-command a device had partition changes and the new partitions failed to recognize by the kernel. [1] https://lore.kernel.org/linux-btrfs/40e2315e-e60e-6161-ceb7-acd8b8a4e4a0@oracle.com/T/#t The mount thread can't use device_list_mutex (because of bd_mutex), and we rely on the uuid_mutex during mount. The forget thread used both uuid_mutex and device_list_mutex. So there isn't race between these two. As of now we don't know. So the warning will help to know if we are missing something. Thanks, Anand > >> Signed-off-by: Anand Jain <anand.jain@oracle.com> >> --- >> fs/btrfs/volumes.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c >> index 2d5cc644741e..c0dfc0b569e9 100644 >> --- a/fs/btrfs/volumes.c >> +++ b/fs/btrfs/volumes.c >> @@ -371,6 +371,7 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid, >> void btrfs_free_device(struct btrfs_device *device) >> { >> WARN_ON(!list_empty(&device->post_commit_list)); >> + WARN_ON(device->bdev); >> rcu_string_free(device->name); >> extent_io_tree_release(&device->alloc_state); >> bio_put(device->flush_bio); >> -- >> 2.25.1
On 9/16/20 7:51 AM, Anand Jain wrote: > > > On 16/9/20 6:10 pm, David Sterba wrote: >> On Mon, Sep 14, 2020 at 05:11:14PM +0800, Anand Jain wrote: >>> To help better understand the device-close leaks, add a warning if the >>> device freed is still open. >> >> Have you seen that happen or is it just a precaution? I've checked where >> the bdev is set to NULL and all paths seem to be covered, so the warn_on >> does not harm anything just that it does not seem to be possible to hit. >> For that an assert would be better. > > There is an early/unconfirmed report [1] that after the forget > sub-command a device had partition changes and the new partitions failed > to recognize by the kernel. > [1] > https://lore.kernel.org/linux-btrfs/40e2315e-e60e-6161-ceb7-acd8b8a4e4a0@oracle.com/T/#t > > > The mount thread can't use device_list_mutex (because of bd_mutex), > and we rely on the uuid_mutex during mount. > > The forget thread used both uuid_mutex and device_list_mutex. > > So there isn't race between these two. > > As of now we don't know. So the warning will help to know if we are > missing something. > It is clear that it can't really happen, but if we're worried about it I'd rather it be an ASSERT(). Thanks, Josef
On Mon, Sep 28, 2020 at 02:16:34PM -0400, Josef Bacik wrote: > On 9/16/20 7:51 AM, Anand Jain wrote: > > > > > > On 16/9/20 6:10 pm, David Sterba wrote: > >> On Mon, Sep 14, 2020 at 05:11:14PM +0800, Anand Jain wrote: > >>> To help better understand the device-close leaks, add a warning if the > >>> device freed is still open. > >> > >> Have you seen that happen or is it just a precaution? I've checked where > >> the bdev is set to NULL and all paths seem to be covered, so the warn_on > >> does not harm anything just that it does not seem to be possible to hit. > >> For that an assert would be better. > > > > There is an early/unconfirmed report [1] that after the forget > > sub-command a device had partition changes and the new partitions failed > > to recognize by the kernel. > > [1] > > https://lore.kernel.org/linux-btrfs/40e2315e-e60e-6161-ceb7-acd8b8a4e4a0@oracle.com/T/#t > > > > > > The mount thread can't use device_list_mutex (because of bd_mutex), > > and we rely on the uuid_mutex during mount. > > > > The forget thread used both uuid_mutex and device_list_mutex. > > > > So there isn't race between these two. > > > > As of now we don't know. So the warning will help to know if we are > > missing something. > > > > It is clear that it can't really happen, but if we're worried about it I'd > rather it be an ASSERT(). Thanks, I'm going through the patch backlog and the patch may be still relevant but a lot of device locking code has changed recently. Anand, if this is still relevant, please resend, thanks.
On 9/11/21 3:58 am, David Sterba wrote: > On Mon, Sep 28, 2020 at 02:16:34PM -0400, Josef Bacik wrote: >> On 9/16/20 7:51 AM, Anand Jain wrote: >>> >>> >>> On 16/9/20 6:10 pm, David Sterba wrote: >>>> On Mon, Sep 14, 2020 at 05:11:14PM +0800, Anand Jain wrote: >>>>> To help better understand the device-close leaks, add a warning if the >>>>> device freed is still open. >>>> >>>> Have you seen that happen or is it just a precaution? I've checked where >>>> the bdev is set to NULL and all paths seem to be covered, so the warn_on >>>> does not harm anything just that it does not seem to be possible to hit. >>>> For that an assert would be better. >>> >>> There is an early/unconfirmed report [1] that after the forget >>> sub-command a device had partition changes and the new partitions failed >>> to recognize by the kernel. >>> [1] >>> https://lore.kernel.org/linux-btrfs/40e2315e-e60e-6161-ceb7-acd8b8a4e4a0@oracle.com/T/#t >>> >>> >>> The mount thread can't use device_list_mutex (because of bd_mutex), >>> and we rely on the uuid_mutex during mount. >>> >>> The forget thread used both uuid_mutex and device_list_mutex. >>> >>> So there isn't race between these two. >>> >>> As of now we don't know. So the warning will help to know if we are >>> missing something. >>> >> >> It is clear that it can't really happen, but if we're worried about it I'd >> rather it be an ASSERT(). Thanks, > > I'm going through the patch backlog and the patch may be still relevant > but a lot of device locking code has changed recently. > > Anand, if this is still relevant, please resend, thanks. > This patch is not relevant anymore. The commit 3fa421dedbc8 ("btrfs: delay blkdev_put until after the device removes") introduced blkdev_put after the device-free. So pls ignore this patch. Thanks, Anand
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 2d5cc644741e..c0dfc0b569e9 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -371,6 +371,7 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid, void btrfs_free_device(struct btrfs_device *device) { WARN_ON(!list_empty(&device->post_commit_list)); + WARN_ON(device->bdev); rcu_string_free(device->name); extent_io_tree_release(&device->alloc_state); bio_put(device->flush_bio);
To help better understand the device-close leaks, add a warning if the device freed is still open. Signed-off-by: Anand Jain <anand.jain@oracle.com> --- fs/btrfs/volumes.c | 1 + 1 file changed, 1 insertion(+)