From patchwork Wed Jan 30 16:38:32 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Josef Bacik X-Patchwork-Id: 2068701 Return-Path: X-Original-To: patchwork-linux-btrfs@patchwork.kernel.org Delivered-To: patchwork-process-083081@patchwork2.kernel.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by patchwork2.kernel.org (Postfix) with ESMTP id 2D882DF264 for ; Wed, 30 Jan 2013 16:38:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754368Ab3A3Qig (ORCPT ); Wed, 30 Jan 2013 11:38:36 -0500 Received: from mx2.fusionio.com ([66.114.96.31]:39699 "EHLO mx2.fusionio.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753919Ab3A3Qig (ORCPT ); Wed, 30 Jan 2013 11:38:36 -0500 X-ASG-Debug-ID: 1359563914-0421b503d55dff0001-6jHSXT Received: from mail1.int.fusionio.com (mail1.int.fusionio.com [10.101.1.21]) by mx2.fusionio.com with ESMTP id 9FbrCtiqaXfTUtBV (version=TLSv1 cipher=AES128-SHA bits=128 verify=NO); Wed, 30 Jan 2013 09:38:34 -0700 (MST) X-Barracuda-Envelope-From: JBacik@fusionio.com Received: from localhost (98.26.82.158) by mail.fusionio.com (10.101.1.19) with Microsoft SMTP Server (TLS) id 8.3.83.0; Wed, 30 Jan 2013 09:38:34 -0700 Date: Wed, 30 Jan 2013 11:38:32 -0500 From: Josef Bacik To: Jim Schutt CC: Josef Bacik , Liu Bo , "linux-btrfs@vger.kernel.org" Subject: Re: [PATCH] Btrfs: fix a deadlock on chunk mutex Message-ID: <20130130163832.GH3660@localhost.localdomain> X-ASG-Orig-Subj: Re: [PATCH] Btrfs: fix a deadlock on chunk mutex References: <1355363557-2962-1-git-send-email-bo.li.liu@oracle.com> <20121218135242.GC2403@localhost.localdomain> <50E5D19E.3060406@sandia.gov> <20130128212331.GG3257@localhost.localdomain> <510817C6.5070007@sandia.gov> <20130129200415.GE3660@localhost.localdomain> <510855AD.2020602@sandia.gov> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <510855AD.2020602@sandia.gov> User-Agent: Mutt/1.5.21 (2011-07-01) X-Barracuda-Connect: mail1.int.fusionio.com[10.101.1.21] X-Barracuda-Start-Time: 1359563914 X-Barracuda-Encrypted: AES128-SHA X-Barracuda-URL: http://10.101.1.181:8000/cgi-mod/mark.cgi X-Virus-Scanned: by bsmtpd at fusionio.com X-Barracuda-Spam-Score: 0.00 X-Barracuda-Spam-Status: No, SCORE=0.00 using global scores of TAG_LEVEL=1000.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=9.0 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.121340 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org On Tue, Jan 29, 2013 at 04:05:17PM -0700, Jim Schutt wrote: > On 01/29/2013 01:04 PM, Josef Bacik wrote: > > On Tue, Jan 29, 2013 at 11:41:10AM -0700, Jim Schutt wrote: > >> > On 01/28/2013 02:23 PM, Josef Bacik wrote: > >>> > > On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote: > >>>> > >> Hi Josef, > >>>> > >> > >>>> > >> Thanks for the patch - sorry for the long delay in testing... > >>>> > >> > >>> > > > >>> > > Jim, > >>> > > > >>> > > I've been trying to reason out how this happens, could you do a btrfs fi df on > >>> > > the filesystem thats giving you trouble so I can see if what I think is > >>> > > happening is what's actually happening. Thanks, > >> > > >> > Here's an example, using a slightly different kernel than > >> > my previous report. It's your btrfs-next master branch > >> > (commit 8f139e59d5 "Btrfs: use bit operation for ->fs_state") > >> > with ceph 3.8 for-linus (commit 0fa6ebc600 from linus' tree). > >> > > >> > > >> > Here I'm finding the file system in question: > >> > > >> > # ls -l /dev/mapper | grep dm-93 > >> > lrwxrwxrwx 1 root root 8 Jan 29 11:13 cs53s19p2 -> ../dm-93 > >> > > >> > # df -h | grep -A 1 cs53s19p2 > >> > /dev/mapper/cs53s19p2 > >> > 896G 1.1G 896G 1% /ram/mnt/ceph/data.osd.522 > >> > > >> > > >> > Here's the info you asked for: > >> > > >> > # btrfs fi df /ram/mnt/ceph/data.osd.522 > >> > Data: total=2.01GB, used=1.00GB > >> > System: total=4.00MB, used=64.00KB > >> > Metadata: total=8.00MB, used=7.56MB > >> > > > How big is the disk you are using, and what mount options? I have a patch to > > keep the panic from happening and hopefully the abort, could you try this? I > > still want to keep the underlying error from happening because it shouldn't be, > > but no reason I can't fix the error case while you can easily reproduce it :). > > Thanks, > > > > Josef > > > >>From c50b725c74c7d39064e553ef85ac9753efbd8aec Mon Sep 17 00:00:00 2001 > > From: Josef Bacik > > Date: Tue, 29 Jan 2013 15:03:37 -0500 > > Subject: [PATCH] Btrfs: fix chunk allocation error handling > > > > If we error out allocating a dev extent we will have already created the > > block group and such which will cause problems since the allocator may have > > tried to allocate out of the block group that no longer exists. This will > > cause BUG_ON()'s in the bio submission path. This also makes a failure to > > allocate a dev extent a non-abort error, we will just clean up the dev > > extents we did allocate and exit. Now if we fail to delete the dev extents > > we will abort since we can't have half of the dev extents hanging around, > > but this will make us much less likely to abort. Thanks, > > > > Signed-off-by: Josef Bacik > > --- > > Interesting - with your patch applied I triggered the following, just > bringing up a fresh Ceph filesystem - I didn't even get a chance to > mount it on my Ceph clients: > Ok can you give this patch a whirl as well? It seems to fix the problem for me. Thanks, Josef --- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index dca5679..874bcf2 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3677,8 +3677,18 @@ static int can_overcommit(struct btrfs_root *root, u64 used; used = space_info->bytes_used + space_info->bytes_reserved + - space_info->bytes_pinned + space_info->bytes_readonly + - space_info->bytes_may_use; + space_info->bytes_pinned + space_info->bytes_readonly; + + /* + * We only want to allow over committing if we have lots of actual space + * free, but if we've tied up more than 80% of the space with actual + * space reservation (not including bytes we _might_ use) then don't + * allow overcommitting as it will just make things go badly for us. + */ + if (used > div_factor(space_info->total_bytes, 8)) + return 0; + + used += space_info->bytes_may_use; spin_lock(&root->fs_info->free_chunk_lock); avail = root->fs_info->free_chunk_space;