diff mbox

Btrfs: fix a deadlock on chunk mutex

Message ID 20130130163832.GH3660@localhost.localdomain (mailing list archive)
State New, archived
Headers show

Commit Message

Josef Bacik Jan. 30, 2013, 4:38 p.m. UTC
On Tue, Jan 29, 2013 at 04:05:17PM -0700, Jim Schutt wrote:
> On 01/29/2013 01:04 PM, Josef Bacik wrote:
> > On Tue, Jan 29, 2013 at 11:41:10AM -0700, Jim Schutt wrote:
> >> > On 01/28/2013 02:23 PM, Josef Bacik wrote:
> >>> > > On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote:
> >>>> > >> Hi Josef,
> >>>> > >>
> >>>> > >> Thanks for the patch - sorry for the long delay in testing...
> >>>> > >>
> >>> > > 
> >>> > > Jim,
> >>> > > 
> >>> > > I've been trying to reason out how this happens, could you do a btrfs fi df on
> >>> > > the filesystem thats giving you trouble so I can see if what I think is
> >>> > > happening is what's actually happening.  Thanks,
> >> > 
> >> > Here's an example, using a slightly different kernel than
> >> > my previous report.  It's your btrfs-next master branch
> >> > (commit 8f139e59d5 "Btrfs: use bit operation for ->fs_state")
> >> > with ceph 3.8 for-linus (commit 0fa6ebc600 from linus' tree).
> >> > 
> >> > 
> >> > Here I'm finding the file system in question:
> >> > 
> >> > # ls -l /dev/mapper | grep dm-93
> >> > lrwxrwxrwx 1 root root       8 Jan 29 11:13 cs53s19p2 -> ../dm-93
> >> > 
> >> > # df -h | grep -A 1 cs53s19p2
> >> > /dev/mapper/cs53s19p2
> >> >                       896G  1.1G  896G   1% /ram/mnt/ceph/data.osd.522
> >> > 
> >> > 
> >> > Here's the info you asked for:
> >> > 
> >> > # btrfs fi df /ram/mnt/ceph/data.osd.522
> >> > Data: total=2.01GB, used=1.00GB
> >> > System: total=4.00MB, used=64.00KB
> >> > Metadata: total=8.00MB, used=7.56MB
> >> > 
> > How big is the disk you are using, and what mount options?  I have a patch to
> > keep the panic from happening and hopefully the abort, could you try this?  I
> > still want to keep the underlying error from happening because it shouldn't be,
> > but no reason I can't fix the error case while you can easily reproduce it :).
> > Thanks,
> > 
> > Josef
> > 
> >>From c50b725c74c7d39064e553ef85ac9753efbd8aec Mon Sep 17 00:00:00 2001
> > From: Josef Bacik <jbacik@fusionio.com>
> > Date: Tue, 29 Jan 2013 15:03:37 -0500
> > Subject: [PATCH] Btrfs: fix chunk allocation error handling
> > 
> > If we error out allocating a dev extent we will have already created the
> > block group and such which will cause problems since the allocator may have
> > tried to allocate out of the block group that no longer exists.  This will
> > cause BUG_ON()'s in the bio submission path.  This also makes a failure to
> > allocate a dev extent a non-abort error, we will just clean up the dev
> > extents we did allocate and exit.  Now if we fail to delete the dev extents
> > we will abort since we can't have half of the dev extents hanging around,
> > but this will make us much less likely to abort.  Thanks,
> > 
> > Signed-off-by: Josef Bacik <jbacik@fusionio.com>
> > ---
> 
> Interesting - with your patch applied I triggered the following, just
> bringing up a fresh Ceph filesystem - I didn't even get a chance to
> mount it on my Ceph clients:
> 

Ok can you give this patch a whirl as well?  It seems to fix the problem for me.
Thanks,

Josef

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jim Schutt Jan. 30, 2013, 9:37 p.m. UTC | #1
On 01/30/2013 09:38 AM, Josef Bacik wrote:
> On Tue, Jan 29, 2013 at 04:05:17PM -0700, Jim Schutt wrote:
>> > On 01/29/2013 01:04 PM, Josef Bacik wrote:
>>> > > On Tue, Jan 29, 2013 at 11:41:10AM -0700, Jim Schutt wrote:
>>>>> > >> > On 01/28/2013 02:23 PM, Josef Bacik wrote:
>>>>>>> > >>> > > On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote:
>>>>>>>>> > >>>> > >> Hi Josef,
>>>>>>>>> > >>>> > >>
>>>>>>>>> > >>>> > >> Thanks for the patch - sorry for the long delay in testing...
>>>>>>>>> > >>>> > >>
>>>>>>> > >>> > > 
>>>>>>> > >>> > > Jim,
>>>>>>> > >>> > > 
>>>>>>> > >>> > > I've been trying to reason out how this happens, could you do a btrfs fi df on
>>>>>>> > >>> > > the filesystem thats giving you trouble so I can see if what I think is
>>>>>>> > >>> > > happening is what's actually happening.  Thanks,
>>>>> > >> > 
>>>>> > >> > Here's an example, using a slightly different kernel than
>>>>> > >> > my previous report.  It's your btrfs-next master branch
>>>>> > >> > (commit 8f139e59d5 "Btrfs: use bit operation for ->fs_state")
>>>>> > >> > with ceph 3.8 for-linus (commit 0fa6ebc600 from linus' tree).
>>>>> > >> > 
>>>>> > >> > 
>>>>> > >> > Here I'm finding the file system in question:
>>>>> > >> > 
>>>>> > >> > # ls -l /dev/mapper | grep dm-93
>>>>> > >> > lrwxrwxrwx 1 root root       8 Jan 29 11:13 cs53s19p2 -> ../dm-93
>>>>> > >> > 
>>>>> > >> > # df -h | grep -A 1 cs53s19p2
>>>>> > >> > /dev/mapper/cs53s19p2
>>>>> > >> >                       896G  1.1G  896G   1% /ram/mnt/ceph/data.osd.522
>>>>> > >> > 
>>>>> > >> > 
>>>>> > >> > Here's the info you asked for:
>>>>> > >> > 
>>>>> > >> > # btrfs fi df /ram/mnt/ceph/data.osd.522
>>>>> > >> > Data: total=2.01GB, used=1.00GB
>>>>> > >> > System: total=4.00MB, used=64.00KB
>>>>> > >> > Metadata: total=8.00MB, used=7.56MB
>>>>> > >> > 
>>> > > How big is the disk you are using, and what mount options?  I have a patch to
>>> > > keep the panic from happening and hopefully the abort, could you try this?  I
>>> > > still want to keep the underlying error from happening because it shouldn't be,
>>> > > but no reason I can't fix the error case while you can easily reproduce it :).
>>> > > Thanks,
>>> > > 
>>> > > Josef
>>> > > 
>>> > >>From c50b725c74c7d39064e553ef85ac9753efbd8aec Mon Sep 17 00:00:00 2001
>>> > > From: Josef Bacik <jbacik@fusionio.com>
>>> > > Date: Tue, 29 Jan 2013 15:03:37 -0500
>>> > > Subject: [PATCH] Btrfs: fix chunk allocation error handling
>>> > > 
>>> > > If we error out allocating a dev extent we will have already created the
>>> > > block group and such which will cause problems since the allocator may have
>>> > > tried to allocate out of the block group that no longer exists.  This will
>>> > > cause BUG_ON()'s in the bio submission path.  This also makes a failure to
>>> > > allocate a dev extent a non-abort error, we will just clean up the dev
>>> > > extents we did allocate and exit.  Now if we fail to delete the dev extents
>>> > > we will abort since we can't have half of the dev extents hanging around,
>>> > > but this will make us much less likely to abort.  Thanks,
>>> > > 
>>> > > Signed-off-by: Josef Bacik <jbacik@fusionio.com>
>>> > > ---
>> > 
>> > Interesting - with your patch applied I triggered the following, just
>> > bringing up a fresh Ceph filesystem - I didn't even get a chance to
>> > mount it on my Ceph clients:
>> > 
> Ok can you give this patch a whirl as well?  It seems to fix the problem for me.

With this patch on top of your previous patch, after several trials of
my test I am also unable to reproduce the issue.  Since I had been
having trouble first time, every time, I think it also seems to fix
the problem for me.

Thanks again!

-- Jim

> Thanks,
> 
> Josef


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Josef Bacik Jan. 30, 2013, 9:55 p.m. UTC | #2
On Wed, Jan 30, 2013 at 02:37:40PM -0700, Jim Schutt wrote:
> On 01/30/2013 09:38 AM, Josef Bacik wrote:
> > On Tue, Jan 29, 2013 at 04:05:17PM -0700, Jim Schutt wrote:
> >> > On 01/29/2013 01:04 PM, Josef Bacik wrote:
> >>> > > On Tue, Jan 29, 2013 at 11:41:10AM -0700, Jim Schutt wrote:
> >>>>> > >> > On 01/28/2013 02:23 PM, Josef Bacik wrote:
> >>>>>>> > >>> > > On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote:
> >>>>>>>>> > >>>> > >> Hi Josef,
> >>>>>>>>> > >>>> > >>
> >>>>>>>>> > >>>> > >> Thanks for the patch - sorry for the long delay in testing...
> >>>>>>>>> > >>>> > >>
> >>>>>>> > >>> > > 
> >>>>>>> > >>> > > Jim,
> >>>>>>> > >>> > > 
> >>>>>>> > >>> > > I've been trying to reason out how this happens, could you do a btrfs fi df on
> >>>>>>> > >>> > > the filesystem thats giving you trouble so I can see if what I think is
> >>>>>>> > >>> > > happening is what's actually happening.  Thanks,
> >>>>> > >> > 
> >>>>> > >> > Here's an example, using a slightly different kernel than
> >>>>> > >> > my previous report.  It's your btrfs-next master branch
> >>>>> > >> > (commit 8f139e59d5 "Btrfs: use bit operation for ->fs_state")
> >>>>> > >> > with ceph 3.8 for-linus (commit 0fa6ebc600 from linus' tree).
> >>>>> > >> > 
> >>>>> > >> > 
> >>>>> > >> > Here I'm finding the file system in question:
> >>>>> > >> > 
> >>>>> > >> > # ls -l /dev/mapper | grep dm-93
> >>>>> > >> > lrwxrwxrwx 1 root root       8 Jan 29 11:13 cs53s19p2 -> ../dm-93
> >>>>> > >> > 
> >>>>> > >> > # df -h | grep -A 1 cs53s19p2
> >>>>> > >> > /dev/mapper/cs53s19p2
> >>>>> > >> >                       896G  1.1G  896G   1% /ram/mnt/ceph/data.osd.522
> >>>>> > >> > 
> >>>>> > >> > 
> >>>>> > >> > Here's the info you asked for:
> >>>>> > >> > 
> >>>>> > >> > # btrfs fi df /ram/mnt/ceph/data.osd.522
> >>>>> > >> > Data: total=2.01GB, used=1.00GB
> >>>>> > >> > System: total=4.00MB, used=64.00KB
> >>>>> > >> > Metadata: total=8.00MB, used=7.56MB
> >>>>> > >> > 
> >>> > > How big is the disk you are using, and what mount options?  I have a patch to
> >>> > > keep the panic from happening and hopefully the abort, could you try this?  I
> >>> > > still want to keep the underlying error from happening because it shouldn't be,
> >>> > > but no reason I can't fix the error case while you can easily reproduce it :).
> >>> > > Thanks,
> >>> > > 
> >>> > > Josef
> >>> > > 
> >>> > >>From c50b725c74c7d39064e553ef85ac9753efbd8aec Mon Sep 17 00:00:00 2001
> >>> > > From: Josef Bacik <jbacik@fusionio.com>
> >>> > > Date: Tue, 29 Jan 2013 15:03:37 -0500
> >>> > > Subject: [PATCH] Btrfs: fix chunk allocation error handling
> >>> > > 
> >>> > > If we error out allocating a dev extent we will have already created the
> >>> > > block group and such which will cause problems since the allocator may have
> >>> > > tried to allocate out of the block group that no longer exists.  This will
> >>> > > cause BUG_ON()'s in the bio submission path.  This also makes a failure to
> >>> > > allocate a dev extent a non-abort error, we will just clean up the dev
> >>> > > extents we did allocate and exit.  Now if we fail to delete the dev extents
> >>> > > we will abort since we can't have half of the dev extents hanging around,
> >>> > > but this will make us much less likely to abort.  Thanks,
> >>> > > 
> >>> > > Signed-off-by: Josef Bacik <jbacik@fusionio.com>
> >>> > > ---
> >> > 
> >> > Interesting - with your patch applied I triggered the following, just
> >> > bringing up a fresh Ceph filesystem - I didn't even get a chance to
> >> > mount it on my Ceph clients:
> >> > 
> > Ok can you give this patch a whirl as well?  It seems to fix the problem for me.
> 
> With this patch on top of your previous patch, after several trials of
> my test I am also unable to reproduce the issue.  Since I had been
> having trouble first time, every time, I think it also seems to fix
> the problem for me.
> 
> Thanks again!
> 

Awesome thanks for testing!

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index dca5679..874bcf2 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3677,8 +3677,18 @@  static int can_overcommit(struct btrfs_root *root,
 	u64 used;
 
 	used = space_info->bytes_used + space_info->bytes_reserved +
-		space_info->bytes_pinned + space_info->bytes_readonly +
-		space_info->bytes_may_use;
+		space_info->bytes_pinned + space_info->bytes_readonly;
+
+	/*
+	 * We only want to allow over committing if we have lots of actual space
+	 * free, but if we've tied up more than 80% of the space with actual
+	 * space reservation (not including bytes we _might_ use) then don't
+	 * allow overcommitting as it will just make things go badly for us.
+	 */
+	if (used > div_factor(space_info->total_bytes, 8))
+		return 0;
+
+	used += space_info->bytes_may_use;
 
 	spin_lock(&root->fs_info->free_chunk_lock);
 	avail = root->fs_info->free_chunk_space;