From patchwork Tue Mar  1 07:24:03 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo <quwenruo@cn.fujitsu.com>
X-Patchwork-Id: 8462671
Subject: Re: Again, no space left on device while rebalancing and recipe
	doesnt work
To: Marc Haber <mh+linux-btrfs@zugschlus.de>
References: <20160227211450.GS26042@torres.zugschlus.de>
	<56D3A56A.20809@cn.fujitsu.com>
	<20160229153352.GE2334@torres.zugschlus.de>
	<56D4E621.3010604@cn.fujitsu.com>
	<20160301065448.GJ2334@torres.zugschlus.de>
CC: <linux-btrfs@vger.kernel.org>
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
Message-ID: <56D54393.8060307@cn.fujitsu.com>
Date: Tue, 1 Mar 2016 15:24:03 +0800
In-Reply-To: <20160301065448.GJ2334@torres.zugschlus.de>
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-btrfs.vger.kernel.org>
X-Mailing-List: linux-btrfs@vger.kernel.org

Marc Haber wrote on 2016/03/01 07:54 +0100:
> On Tue, Mar 01, 2016 at 08:45:21AM +0800, Qu Wenruo wrote:
>> Didn't see the attachment though, seems to be filtered by maillist police.
>
> Trying again.

OK, I got the attachment.

Surprisingly, balancing only the data chunks works without problems,
but a plain btrfs balance command fails.

>
>>> I now have a kworker and a btfs-transact kernel process taking most of
>>> one CPU core each, even after the userspace programs have terminated.
>>> Is there a way to find out what these threads are actually doing?
>>
>> Did btrfs balance status gives any hint?
>
> It says 'No balance found on /mnt/fanbtr'. I do have a second btrfs on
> the box, which is acting up as well (it has a five digit number of
> snapshots, and deleting a single snapshot takes about five to ten
> minutes. I was planning to write another mailing list article once
> this balance issue is through).

I assume the high CPU usage is related to the large number of snapshots.
With that many snapshots, btrfs needs a long time to resolve its
backrefs, and the backtrace seems to confirm that.

As a workaround, I'd suggest removing unused snapshots and keeping their
number within 4 digits.

But I'm still not sure whether it's related to the ENOSPC problem.

It would be a great help if you could patch your kernel to add the
following debug output (same as the attachment):

From f2cc7af0aea659a522b97d3776b719f14532bce9 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
Date: Tue, 1 Mar 2016 15:21:18 +0800
Subject: [PATCH] btrfs: debug patch

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 083783b..70b284b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9393,8 +9393,10 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 	block_group = btrfs_lookup_block_group(root->fs_info, bytenr);
 
 	/* odd, couldn't find the block group, leave it alone */
-	if (!block_group)
+	if (!block_group) {
+		pr_info("no such chunk: %llu\n", bytenr);
 		return -1;
+	}
 
 	min_free = btrfs_block_group_used(&block_group->item);
 
@@ -9419,6 +9421,11 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 	     space_info->bytes_pinned + space_info->bytes_readonly +
 	     min_free < space_info->total_bytes)) {
 		spin_unlock(&space_info->lock);
+		pr_info("no space: total:%llu, bg_len:%llu, used:%llu, reserved:%llu, pinned:%llu, ro:%llu, min_free:%llu\n",
+			space_info->total_bytes, block_group->key.offset,
+			space_info->bytes_used, space_info->bytes_reserved,
+			space_info->bytes_pinned, space_info->bytes_readonly,
+			min_free);
 		goto out;
 	}
 	spin_unlock(&space_info->lock);
@@ -9448,8 +9455,10 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 		 * this is just a balance, so if we were marked as full
 		 * we know there is no space for a new chunk
 		 */
-		if (full)
+		if (full) {
+			pr_info("space full\n");
 			goto out;
+		}
 
 		index = get_block_group_index(block_group);
 	}
@@ -9496,6 +9505,8 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 			ret = -1;
 		}
 	}
+	if (ret == -1)
+		pr_info("no new chunk allocatable\n");
 	mutex_unlock(&root->fs_info->chunk_mutex);
 	btrfs_end_transaction(trans, root);
 out:
-- 
2.7.2
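
To make the numbers from the first pr_info() easier to read, here is a
rough userspace sketch of the comparison that guards it. It is not kernel
code. The struct and the function name first_branch_taken() are made up
for illustration. Only the tail of the condition is visible in the hunk
context above, so the first clause (total != bg_len) is an assumption
based on my reading of btrfs_can_relocate() in kernels of this era.
Please check it against your own tree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container; fields mirror the values the debug line prints. */
struct space_sample {
	uint64_t total;		/* space_info->total_bytes */
	uint64_t bg_len;	/* block_group->key.offset */
	uint64_t used;		/* space_info->bytes_used */
	uint64_t reserved;	/* space_info->bytes_reserved */
	uint64_t pinned;	/* space_info->bytes_pinned */
	uint64_t ro;		/* space_info->bytes_readonly */
	uint64_t min_free;	/* bytes used inside the block group */
};

/*
 * Returns true when the condition in front of the first pr_info() holds,
 * i.e. when that debug line would be printed.
 * ASSUMPTION: the "total != bg_len" clause is not shown in the hunk
 * context and is taken from the upstream function as I remember it.
 */
static bool first_branch_taken(const struct space_sample *s)
{
	uint64_t needed = s->used + s->reserved + s->pinned + s->ro +
			  s->min_free;

	return s->total != s->bg_len && needed < s->total;
}
```

Fill a struct space_sample with the values from one line of dmesg output
and pass it to first_branch_taken(). If the sum of used, reserved,
pinned, ro and min_free is not below total, the function returns false
and the kernel goes on to the chunk allocation checks further down.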