[RFC] spaceman: physically move a regular inode

From: Dave Chinner <dchinner@redhat.com>

From: Dave Chinner <dchinner@redhat.com>

To be able to shrink a filesystem, we need to be able to physically
move an inode and all it's data and metadata from it's current
location to a new AG.  Add a command to spaceman to allow an inode
to be moved to a new AG.

This new command is not intended to be a perfect solution. I am not
trying to handle atomic movement of open files - this is intended to
be run as a maintenance operation on idle filesystem. If root
filesystems are the target, then this should be run via a rescue
environment that is not executing directly on the root fs. With
those caveats in place, we can do the entire inode move as a set of
non-destructive operations finalised by an atomic inode swap
without any needing special kernel support.

To ensure we move metadata such as BMBT blocks even if we don't need
to move data, we clone the data to a new inode that we've allocated
in the destination AG. This will result in new bmbt blocks being
allocated in the new location even though the data is not copied.
Attributes need to be copied one at a time from the original inode.

If data needs to be moved, then we use fallocate(UNSHARE) to create
a private copy of the range of data that needs to be moved in the
new inode. This will be allocated in the destination AG by normal
allocation policy.

Once the new inode has been finalised, use RENAME_EXCHANGE to swap
it into place and unlink the original inode to free up all the
resources it still pins.

There are many optimisations still possible to speed this up, but
the goal here is "functional" rather than "optimal". Performance can
be optimised once all the parts for a "empty the tail of the
filesystem before shrink" operation are implemented and solidly
tested.

This functionality has been smoke tested by creating a 32MB data
file with 4k extents and several hundred attributes:

$ cat test.sh
fname=/mnt/scratch/foo
xfs_io -f -c "pwrite 0 32m" -c sync $fname
for (( i=0; i < 4096 ; i++ )); do
	xfs_io -c "fpunch $((i * 8))k 4k" $fname
done

for (( i=0; i < 100 ; i++ )); do
	setfattr -n user.blah.$i.$i.blah -v blah.$i.$i.blah $fname
	setfattr -n user.foo.$i.$i.foo -v $i.cantbele.$i.ve.$i.tsnotbutter $fname
done
for (( i=0; i < 100 ; i++ )); do
	setfattr -n security.baz.$i.$i.baz -v wotchul$i$iookinat $fname
done

xfs_io -c stat -c "bmap -vp" -c "bmap -avp" $fname
xfs_spaceman -c "move_inode -a 22" /mnt/scratch/foo
xfs_io -c stat -c "bmap -vp" -c "bmap -avp" $fname
$

and the output looks something like:

$ sudo ./test.sh
....
fd.path = "/mnt/scratch/foo"
fd.flags = non-sync,non-direct,read-write
stat.ino = 133
/mnt/scratch/foo:
 EXT: FILE-OFFSET      BLOCK-RANGE       AG AG-OFFSET        TOTAL FLAGS
   0: [0..7]:          hole                                      8
   1: [8..15]:         208..215           0 (208..215)           8 000000
   2: [16..23]:        hole                                      8
   3: [24..31]:        224..231           0 (224..231)           8 000000
....
8189: [65512..65519]:  65712..65719       0 (65712..65719)       8 000000
8190: [65520..65527]:  hole                                      8
8191: [65528..65535]:  65728..65735       0 (65728..65735)       8 000000
mnt/scratch/foo:
 EXT: FILE-OFFSET      BLOCK-RANGE       AG AG-OFFSET        TOTAL FLAGS
   0: [0..7]:          392..399           0 (392..399)           8 000000
   1: [8..15]:         408..415           0 (408..415)           8 000000
   2: [16..23]:        424..431           0 (424..431)           8 000000
   3: [24..31]:        456..463           0 (456..463)           8 000000
move mnt /mnt/scratch, path /mnt/scratch/foo, agno 22
fd.path = "/mnt/scratch/foo"
fd.flags = non-sync,non-direct,read-write
stat.ino = 47244651475
....
/mnt/scratch/foo:
 EXT: FILE-OFFSET      BLOCK-RANGE               AG AG-OFFSET        TOTAL FLAGS
   0: [0..7]:          hole                                              8
   1: [8..15]:         47244763192..47244763199  22 (123112..123119)     8 000000
   2: [16..23]:        hole                                              8
   3: [24..31]:        47244763208..47244763215  22 (123128..123135)     8 000000
....
8189: [65512..65519]:  47244828808..47244828815  22 (188728..188735)     8 000000
8190: [65520..65527]:  hole                                              8
8191: [65528..65535]:  47244828824..47244828831  22 (188744..188751)     8 000000
/mnt/scratch/foo:
 EXT: FILE-OFFSET      BLOCK-RANGE               AG AG-OFFSET        TOTAL FLAGS
   0: [0..7]:          47244763176..47244763183  22 (123096..123103)     8 000000
$

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 spaceman/Makefile     |   6 +-
 spaceman/file.c       |   2 +-
 spaceman/init.c       |   1 +
 spaceman/move_inode.c | 518 ++++++++++++++++++++++++++++++++++++++++++
 spaceman/space.h      |   1 +
 5 files changed, 524 insertions(+), 4 deletions(-)
 create mode 100644 spaceman/move_inode.c

Message ID	20201110225924.4031404-1-david@fromorbit.com (mailing list archive)
State	Superseded, archived
Headers	show Return-Path: <linux-xfs-owner@kernel.org> Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CA93A138B for <patchwork-linux-xfs@patchwork.kernel.org>; Tue, 10 Nov 2020 22:59:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9A1602080A for <patchwork-linux-xfs@patchwork.kernel.org>; Tue, 10 Nov 2020 22:59:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732054AbgKJW7g (ORCPT <rfc822;patchwork-linux-xfs@patchwork.kernel.org>); Tue, 10 Nov 2020 17:59:36 -0500 Received: from mail105.syd.optusnet.com.au ([211.29.132.249]:49962 "EHLO mail105.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731907AbgKJW7g (ORCPT <rfc822;linux-xfs@vger.kernel.org>); Tue, 10 Nov 2020 17:59:36 -0500 Received: from dread.disaster.area (pa49-179-6-140.pa.nsw.optusnet.com.au [49.179.6.140]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 39E953A9B82 for <linux-xfs@vger.kernel.org>; Wed, 11 Nov 2020 09:59:26 +1100 (AEDT) Received: from discord.disaster.area ([192.168.253.110]) by dread.disaster.area with esmtp (Exim 4.92.3) (envelope-from <david@fromorbit.com>) id 1kccbl-009pNe-1k for linux-xfs@vger.kernel.org; Wed, 11 Nov 2020 09:59:25 +1100 Received: from dave by discord.disaster.area with local (Exim 4.94) (envelope-from <david@fromorbit.com>) id 1kccbk-00GulN-Nt for linux-xfs@vger.kernel.org; Wed, 11 Nov 2020 09:59:24 +1100 From: Dave Chinner <david@fromorbit.com> To: linux-xfs@vger.kernel.org Subject: [PATCH] [RFC] spaceman: physically move a regular inode Date: Wed, 11 Nov 2020 09:59:24 +1100 Message-Id: <20201110225924.4031404-1-david@fromorbit.com> X-Mailer: git-send-email 2.28.0 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.3 cv=Ubgvt5aN c=1 sm=1 tr=0 cx=a_idp_d a=uDU3YIYVKEaHT0eX+MXYOQ==:117 a=uDU3YIYVKEaHT0eX+MXYOQ==:17 a=nNwsprhYR40A:10 a=20KFwNOVAAAA:8 a=myu2YSmQo_11sOQ7t4IA:9 a=3DS5zZEWb2WOpUtQ:21 a=jDB7GZ6fhO4h3lOr:21 Precedence: bulk List-ID: <linux-xfs.vger.kernel.org> X-Mailing-List: linux-xfs@vger.kernel.org
Series	[RFC] spaceman: physically move a regular inode \| expand [RFC] spaceman: physically move a regular inode

[RFC] spaceman: physically move a regular inode

Commit Message

Comments

Patch