From patchwork Sat Mar 1 10:57:03 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 3746731 Return-Path: X-Original-To: patchwork-linux-btrfs@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 973ACBF13A for ; Sat, 1 Mar 2014 10:57:17 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 9BDC720328 for ; Sat, 1 Mar 2014 10:57:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 8B1152034B for ; Sat, 1 Mar 2014 10:57:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752431AbaCAK5M (ORCPT ); Sat, 1 Mar 2014 05:57:12 -0500 Received: from mail-wi0-f179.google.com ([209.85.212.179]:42380 "EHLO mail-wi0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752157AbaCAK5M (ORCPT ); Sat, 1 Mar 2014 05:57:12 -0500 Received: by mail-wi0-f179.google.com with SMTP id bs8so1521492wib.12 for ; Sat, 01 Mar 2014 02:57:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id; bh=TUkc6VFd7iSRDjAk9H9h4yHx3Vi0OuwLQt+OturrGe0=; b=VEUZ1Oz3E7cyHAolyuy4PFqLNHE9OU1aAMY6nZpRRkPIDwqxyGuDl4boENnD/NQ2py izqB7gQVztUifcj7nXFZIuriqUUr00uxzCy0wvOT51YZ6TxC7HYCgxOI2YdmCELt1TYP Wub5qvRTrflqIHB7cI3Ew+3MxcYg1lHj12d59gbP8F0lqfsXIOTXV57jGAAtZ37kjvyj jjVEoHDupahdMa3mtyezx/WM4RFarctiaUrAX2A0O46ySeATCuZfct4g7kgTU5ZG4NjA DV4Q8+nla+DZuHN2M8gKJfPTJMBzRWH/Ve/2HlJoaqO2WtKBeNR4m87yAhtuylcuq+fG glag== X-Received: by 10.180.7.227 with SMTP id m3mr6456869wia.59.1393671430599; Sat, 01 Mar 2014 02:57:10 -0800 (PST) Received: from storm-desktop.lan (bl5-78-108.dsl.telepac.pt. [82.154.78.108]) by mx.google.com with ESMTPSA id ff7sm14476895wic.10.2014.03.01.02.57.09 for (version=TLSv1.1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Sat, 01 Mar 2014 02:57:09 -0800 (PST) From: Filipe David Borba Manana To: xfs@oss.sgi.com Cc: linux-btrfs@vger.kernel.org, Filipe David Borba Manana Subject: [PATCH] Btrfs: make defrag not fragment files when using prealloc extents Date: Sat, 1 Mar 2014 10:57:03 +0000 Message-Id: <1393671423-7349-1-git-send-email-fdmanana@gmail.com> X-Mailer: git-send-email 1.7.9.5 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Spam-Status: No, score=-6.8 required=5.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FREEMAIL_FROM, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, T_DKIM_INVALID, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When using prealloc extents, a file defragment operation may actually fragment the file and increase the amount of data space used by the file. This change fixes that behaviour. Example: $ mkfs.btrfs -f /dev/sdb3 $ mount /dev/sdb3 /mnt $ cd /mnt $ xfs_io -f -c 'falloc 0 1048576' foobar && sync $ xfs_io -c 'pwrite -S 0xff -b 100000 5000 100000' foobar $ xfs_io -c 'pwrite -S 0xac -b 100000 200000 100000' foobar $ xfs_io -c 'pwrite -S 0xe1 -b 100000 900000 100000' foobar && sync Before defragmenting the file: $ btrfs filesystem df /mnt Data, single: total=8.00MiB, used=1.25MiB System, DUP: total=8.00MiB, used=16.00KiB System, single: total=4.00MiB, used=0.00 Metadata, DUP: total=1.00GiB, used=112.00KiB Metadata, single: total=8.00MiB, used=0.00 $ btrfs-debug-tree /dev/sdb3 (...) item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53 prealloc data disk byte 12845056 nr 1048576 prealloc data offset 0 nr 4096 item 7 key (257 EXTENT_DATA 4096) itemoff 15757 itemsize 53 extent data disk byte 12845056 nr 1048576 extent data offset 4096 nr 102400 ram 1048576 extent compression 0 item 8 key (257 EXTENT_DATA 106496) itemoff 15704 itemsize 53 prealloc data disk byte 12845056 nr 1048576 prealloc data offset 106496 nr 90112 item 9 key (257 EXTENT_DATA 196608) itemoff 15651 itemsize 53 extent data disk byte 12845056 nr 1048576 extent data offset 196608 nr 106496 ram 1048576 extent compression 0 item 10 key (257 EXTENT_DATA 303104) itemoff 15598 itemsize 53 prealloc data disk byte 12845056 nr 1048576 prealloc data offset 303104 nr 593920 item 11 key (257 EXTENT_DATA 897024) itemoff 15545 itemsize 53 extent data disk byte 12845056 nr 1048576 extent data offset 897024 nr 106496 ram 1048576 extent compression 0 item 12 key (257 EXTENT_DATA 1003520) itemoff 15492 itemsize 53 prealloc data disk byte 12845056 nr 1048576 prealloc data offset 1003520 nr 45056 (...) Now defragmenting the file results in more data space used than before: $ btrfs filesystem defragment -f foobar && sync $ btrfs filesystem df /mnt Data, single: total=8.00MiB, used=1.55MiB System, DUP: total=8.00MiB, used=16.00KiB System, single: total=4.00MiB, used=0.00 Metadata, DUP: total=1.00GiB, used=112.00KiB Metadata, single: total=8.00MiB, used=0.00 And the corresponding file extent items are now no longer perfectly sequential as before, and we're now needlessly using more space from data block groups: $ btrfs-debug-tree /dev/sdb3 (...) item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53 extent data disk byte 12845056 nr 1048576 extent data offset 0 nr 4096 ram 1048576 extent compression 0 item 7 key (257 EXTENT_DATA 4096) itemoff 15757 itemsize 53 extent data disk byte 13893632 nr 102400 extent data offset 0 nr 102400 ram 102400 extent compression 0 item 8 key (257 EXTENT_DATA 106496) itemoff 15704 itemsize 53 extent data disk byte 12845056 nr 1048576 extent data offset 106496 nr 90112 ram 1048576 extent compression 0 item 9 key (257 EXTENT_DATA 196608) itemoff 15651 itemsize 53 extent data disk byte 13996032 nr 106496 extent data offset 0 nr 106496 ram 106496 extent compression 0 item 10 key (257 EXTENT_DATA 303104) itemoff 15598 itemsize 53 prealloc data disk byte 12845056 nr 1048576 prealloc data offset 303104 nr 593920 item 11 key (257 EXTENT_DATA 897024) itemoff 15545 itemsize 53 extent data disk byte 14102528 nr 106496 extent data offset 0 nr 106496 ram 106496 extent compression 0 item 12 key (257 EXTENT_DATA 1003520) itemoff 15492 itemsize 53 extent data disk byte 12845056 nr 1048576 extent data offset 1003520 nr 45056 ram 1048576 extent compression 0 (...) With this change, the above example will no longer cause allocation of new data space nor change the sequentiality of the file extents, that is, defragment will be effectless, leaving all extent items pointing to the extent starting at disk byte 12845056. In a 20Gb filesystem I had, mounted with the autodefrag option and 20 files of 400Mb each, initially consisting of a single prealloc extent of 400Mb, having random writes happening at a low rate, lead to a total of over ~17Gb of data space used, not far from eventually reaching an ENOSPC state. Signed-off-by: Filipe David Borba Manana --- fs/btrfs/ioctl.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index f914b5d..1ae45bd 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -983,7 +983,8 @@ static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em) return false; next = defrag_lookup_extent(inode, em->start + em->len); - if (!next || next->block_start >= EXTENT_MAP_LAST_BYTE) + if (!next || next->block_start >= EXTENT_MAP_LAST_BYTE || + (em->block_start + em->block_len == next->block_start)) ret = false; free_extent_map(next);