From patchwork Thu May 18 21:47:24 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Liu Bo X-Patchwork-Id: 9735279 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 95353600CC for ; Thu, 18 May 2017 21:50:22 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 80487287CA for ; Thu, 18 May 2017 21:50:22 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 742F628A28; Thu, 18 May 2017 21:50:22 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D41E3287CA for ; Thu, 18 May 2017 21:50:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756167AbdERVuT (ORCPT ); Thu, 18 May 2017 17:50:19 -0400 Received: from aserp1040.oracle.com ([141.146.126.69]:18228 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754479AbdERVuS (ORCPT ); Thu, 18 May 2017 17:50:18 -0400 Received: from aserv0021.oracle.com (aserv0021.oracle.com [141.146.126.233]) by aserp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id v4ILnDU2019491 (version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 18 May 2017 21:49:13 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by aserv0021.oracle.com (8.13.8/8.14.4) with ESMTP id v4ILnDh8017827 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 18 May 2017 21:49:13 GMT Received: from abhmp0016.oracle.com (abhmp0016.oracle.com [141.146.116.22]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id v4ILnCYp026994; Thu, 18 May 2017 21:49:12 GMT Received: from lim.localdomain (/10.159.245.252) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Thu, 18 May 2017 14:49:12 -0700 Date: Thu, 18 May 2017 14:47:24 -0700 From: Liu Bo To: Nikolay Borisov Cc: linux-btrfs , jeffm@suse.com Subject: Re: btrfs metadata reclaim behavior/performance characteristics Message-ID: <20170518214724.GA10554@lim.localdomain> Reply-To: bo.li.liu@oracle.com References: <20170518144532.GA28854@lim.localdomain> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.8.2 (2017-04-18) X-Source-IP: aserv0021.oracle.com [141.146.126.233] Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Thu, May 18, 2017 at 06:41:08PM +0300, Nikolay Borisov wrote: > > > On 18.05.2017 17:45, Liu Bo wrote: > > On Thu, May 18, 2017 at 11:40:05AM +0300, Nikolay Borisov wrote: > > > > > > > Just some random thoughts here. > > > > Hmm, not sure if this matters, but fstests now doesn't set --mixed even if the > > disk size is as small as 256mb. So are you testing a mixed btrfs or not? > > You are right that fstest only sets mixed mode in case the filesystem is > less than or equal to 100mb. IN this case the fs is 256mb which means > mixed mode _is not_ set. > > > > > So now we've observed there're too many 'commit transaction' happening, I think it's because via commiting transaction it doesn't reclaim enough metadata space, esp. looks like space->bytes_may_use is not reduced somehow. > > You've given me the idea to basically compare the state of the various > space_info counters after each transaction commit. Before and after the > ticketed work. Also what makes you believe it's the commit transaction > itself not freeing enough memory and not some of the other, "cheaper" > flush states? > Firstly, commit transaction is a checkpoint, where all queued delayed refs are supposed to be processed and this's one of the factors to update space_info counters (bytes_may_use -> reserve, pinned -> bytes_used), secondly, it's the last resort, if any cheaper flush works, ie. offers enough space to tickets, it won't commit, will it? > > > > The metadata space_info->bytes_may_use may come from: > > 1) 1K file with buffered IO ends up living in btree leaf, so it will contribute to the number, > > 2) if it's mixed btrfs, then 1k file with direct IO ends up with creating a 4k extent in mixed block group. > > 3) while writing 1k files, metadata is reserved to make it run, and when to release depends on writeback (in the buffered IO case) or endio (in the direct IO case) > > > > When running several commit transaction concurrently, if one has entered TRANS_STATE_COMMIT_START state, others just wait there, have you observed that if each commit transaction actually writes superblock in the end? > > Haven't gone that far, will have to instrument the code to confirm this. > What exactly should writing the superblock reveal? > On a second thinking, it shouldn't matter whether they go to the part of writing superblock. Commit transaction is costly for sure, I think we're trying to find why there're so many commit transaction with ticket patch set. In btrfs_async_reclaim_metadata_space(), flush_state++ happens when one ticket is not satisfied, and we do commit transaction only if there're enough pinned bytes, but what I don't understand is the link shows 'pinned' is 0 and I spotted something wrong (shown in the included patch). Thanks, -liubo From: Liu Bo Subject: [PATCH] Btrfs: skip commit transaction if we don't have enough pinned bytes We commit transaction in order to reclaim space from pinned bytes because it could process delayed refs, and in may_commit_transaction(), we check first if pinned bytes are enough for the required space, we then check if that plus bytes reserved for delayed insert are enough for the required space. This changes the code to the above logic. Signed-off-by: Liu Bo --- fs/btrfs/extent-tree.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index e390451c72e6..bded1ddd1bb6 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4837,7 +4837,7 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info, spin_lock(&delayed_rsv->lock); if (percpu_counter_compare(&space_info->total_bytes_pinned, - bytes - delayed_rsv->size) >= 0) { + bytes - delayed_rsv->size) < 0) { spin_unlock(&delayed_rsv->lock); return -ENOSPC; }