[RFC,00/18] xfs: atomic file updates

Message ID

158812825316.168506.932540609191384366.stgit@magnolia (mailing list archive)

Headers

Return-Path: <SRS0=Q7ae=6N=vger.kernel.org=linux-xfs-owner@kernel.org>
Subject: [PATCH RFC 00/18] xfs: atomic file updates
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: darrick.wong@oracle.com
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
        linux-api@vger.kernel.org
Date: Tue, 28 Apr 2020 19:44:14 -0700
Message-ID: <158812825316.168506.932540609191384366.stgit@magnolia>
User-Agent: StGit/0.17.1-dirty
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Sender: linux-xfs-owner@vger.kernel.org
Precedence: bulk

Series

xfs: atomic file updates

[RFC,00/18] xfs: atomic file updates
[01/18] xfs: clean up the error handling in xfs_swap_extent_rmap
[02/18] xfs: fix xfs_reflink_remap_prep calling conventions
[03/18] vfs: introduce new file extent swap ioctl
[04/18] xfs: support deferred bmap updates on the attr fork
[05/18] xfs: xfs_bmap_finish_one should map unwritten extents properly
[06/18] xfs: create a log incompat flag for atomic extent swapping
[07/18] xfs: allow deferred ops items to put themselves at the end of the pending queue
[08/18] xfs: introduce a swap-extent log intent item
[09/18] xfs: create deferred log items for extent swapping
[10/18] xfs: refactor locking and unlocking two inodes against userspace IO
[11/18] xfs: add a ->swap_file_range handler
[12/18] xfs: add error injection to test swapext recovery
[13/18] xfs: allow xfs_swap_range to use older extent swap algorithms
[14/18] xfs: port xfs_swap_extents_rmap to our new code
[15/18] xfs: consolidate all of the xfs_swap_extent_forks code
[16/18] xfs: refactor reflink flag handling in xfs_swap_extent_forks
[17/18] xfs: remove old swap extents implementation
[18/18] xfs: fix quota accounting in the old fork swap code

Message

Darrick J. Wong April 29, 2020, 2:44 a.m. UTC

Hi all,

This series creates a new log incompat feature and log intent items to
track high level progress of swapping ranges of two files and finish
interrupted work if the system goes down. It then adds a new
FISWAPRANGE ioctl so that userspace can access the atomic extent
swapping feature. With this feature, user programs will be able to
update files atomically by opening an O_TMPFILE, reflinking the source
file to it, making whatever updates they want to make, and then
atomically swapping the changed bits back to the source file. It even has
an optional ability to detect a changed source file and reject the
update.

The intent behind this new userspace functionality is to enable atomic
rewrites of arbitrary parts of individual files. For years, application
programmers wanting to ensure the atomicity of a file update had to
write the changes to a new file in the same directory, fsync the new
file, rename the new file on top of the old filename, and then fsync the
directory. People get it wrong all the time, and $fs hacks abound.

With atomic file updates, this is no longer necessary. Programmers
create an O_TMPFILE, optionally FICLONE the file contents into the
temporary file, make whatever changes they want to the tempfile, and
FISWAPRANGE the contents from the tempfile into the regular file. The
interface can optionally check the original file's [cm]time to reject
the swap operation if the file has been modified in the meantime. There are no
fsyncs to take care of; no directory operations at all; and the fs will
take care of finishing the swap operation if the system goes down in the
middle of the swap. Sample code can be found in the corresponding
changes to xfs_io to exercise the use case mentioned above.
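A rough sketch of the staging half of that workflow follows, using only interfaces that already exist (O_TMPFILE and FICLONE). The helper names stage_update(), snapshot_times(), and still_fresh() and the struct freshness are invented for this illustration. The final FISWAPRANGE call is deliberately left out, because its argument structure is defined in patch 03/18 and is not reproduced in this cover letter. The timestamp comparison is a userspace stand-in for the optional [cm]time check mentioned above; done in userspace it is inherently racy, which is presumably why the interface offers to do the check itself.

```c
#define _GNU_SOURCE		/* for O_TMPFILE */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>
#include <linux/fs.h>		/* FICLONE */

/* Timestamps of the target file, recorded when staging begins. */
struct freshness {
	struct timespec mtime;
	struct timespec ctime;
};

/* Record fd's current mtime and ctime.  Returns 0, or -1 with errno set. */
int snapshot_times(int fd, struct freshness *fr)
{
	struct stat st;

	assert(fr != NULL);
	if (fstat(fd, &st) < 0)
		return -1;
	fr->mtime = st.st_mtim;
	fr->ctime = st.st_ctim;
	return 0;
}

/* True if fd's mtime and ctime still match the snapshot. */
bool still_fresh(int fd, const struct freshness *fr)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return false;
	return st.st_mtim.tv_sec == fr->mtime.tv_sec &&
	       st.st_mtim.tv_nsec == fr->mtime.tv_nsec &&
	       st.st_ctim.tv_sec == fr->ctime.tv_sec &&
	       st.st_ctim.tv_nsec == fr->ctime.tv_nsec;
}

/*
 * Open an unnamed temporary file in `dir` and reflink the contents of
 * src_fd into it.  `dir` must live on the same filesystem as src_fd, and
 * that filesystem must support both O_TMPFILE and reflink; otherwise this
 * fails with errno set (typically EOPNOTSUPP, EXDEV, or EINVAL).
 * Returns the tempfile fd, or -1.
 */
int stage_update(int src_fd, const char *dir)
{
	int tmp_fd, saved;

	tmp_fd = open(dir, O_TMPFILE | O_RDWR, 0600);
	if (tmp_fd < 0)
		return -1;

	if (ioctl(tmp_fd, FICLONE, src_fd) < 0) {
		saved = errno;
		close(tmp_fd);
		errno = saved;
		return -1;
	}
	return tmp_fd;
}

/*
 * After stage_update(), the caller modifies the tempfile with ordinary
 * pwrite() calls and then commits with the swap ioctl from this series.
 * That call is omitted here; see patch 03/18 for its definition.
 */
```

The caller would take a snapshot, stage, edit the tempfile, and then either ask the kernel to verify freshness as part of the swap or (less safely) call still_fresh() just before committing.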

Note that this function is /not/ the O_DIRECT atomic file writes concept
that has been floating around for years. This is constructed entirely
in software, which means that there are no limitations other than the
regular filesystem limits.

As a side note, there's an extra motivation behind the kernel
functionality: online repair of file-based metadata. The atomic file
swap is implemented as an atomic inode fork swap, which means that we
can implement online reconstruction of extended attributes and
directories by building a new one in another inode and atomically
swapping the contents.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything. Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=atomic-file-updates

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=atomic-file-updates

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=atomic-file-updates

Comments

Jann Horn May 1, 2020, 7:46 p.m. UTC | #1

On Wed, Apr 29, 2020 at 4:46 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> This series creates a new log incompat feature and log intent items to
> track high level progress of swapping ranges of two files and finish
> interrupted work if the system goes down.  It then adds a new
> FISWAPRANGE ioctl so that userspace can access the atomic extent
> swapping feature.  With this feature, user programs will be able to
> update files atomically by opening an O_TMPFILE, reflinking the source
> file to it, making whatever updates they want to make, and then
> atomically swap the changed bits back to the source file.  It even has
> an optional ability to detect a changed source file and reject the
> update.
>
> The intent behind this new userspace functionality is to enable atomic
> rewrites of arbitrary parts of individual files.  For years, application
> programmers wanting to ensure the atomicity of a file update had to
> write the changes to a new file in the same directory, fsync the new
> file, rename the new file on top of the old filename, and then fsync the
> directory.  People get it wrong all the time, and $fs hacks abound.
>
> With atomic file updates, this is no longer necessary.  Programmers
> create an O_TMPFILE, optionally FICLONE the file contents into the
> temporary file, make whatever changes they want to the tempfile, and
> FISWAPRANGE the contents from the tempfile into the regular file.

That also requires the *readers* to be atomic though, right? Since now
the updates are visible to readers instantly, instead of only on the
next open()? If you used this to update /etc/passwd while someone else
is in the middle of reading it with a sequence of read() calls, there
would be fireworks...
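To make the reader-side concern concrete, here is a small userspace sketch. It does not use FISWAPRANGE at all: a plain pwrite() stands in for any update applied to the inode the reader already has open, and the helper names (update_by_rename, update_in_place, read_across_update) are invented for the example. fsyncs are omitted because only visibility is being shown, not durability.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

typedef int (*updater_fn)(const char *path, const char *data, size_t len);

/* Replace `path` by writing "<path>.new" and renaming it over `path`. */
int update_by_rename(const char *path, const char *data, size_t len)
{
	char tmp[4096];
	int fd;

	if (snprintf(tmp, sizeof(tmp), "%s.new", path) >= (int)sizeof(tmp))
		return -1;
	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (write(fd, data, len) != (ssize_t)len) {
		close(fd);
		unlink(tmp);
		return -1;
	}
	close(fd);
	return rename(tmp, path);
}

/* Overwrite the start of `path` in place, i.e. in the same inode. */
int update_in_place(const char *path, const char *data, size_t len)
{
	ssize_t n;
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	n = pwrite(fd, data, len, 0);
	close(fd);
	return n == (ssize_t)len ? 0 : -1;
}

/*
 * Simulate a reader that needs two read() calls: read `split` bytes,
 * let the writer run update(), then read the rest through the same fd.
 * Returns the total number of bytes stored in out, or -1 on error.
 */
ssize_t read_across_update(const char *path, updater_fn update,
			   const char *data, size_t len,
			   char *out, size_t outlen, size_t split)
{
	ssize_t a, b;
	int fd;

	assert(out != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (split > outlen)
		split = outlen;
	a = read(fd, out, split);
	if (a < 0 || update(path, data, len) < 0) {
		close(fd);
		return -1;
	}
	b = read(fd, out + a, outlen - (size_t)a);
	close(fd);
	return b < 0 ? -1 : a + b;
}
```

With a file holding "AAAAAAAA" and an update to "BBBBBBBB" landing between the two reads, the rename variant leaves the reader on the old inode, so it sees all A's; the in-place variant hands it "AAAABBBB".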

I guess maybe the new API could also be wired up to ext4's
EXT4_IOC_MOVE_EXT somehow, provided that the caller specifies
FILE_SWAP_RANGE_NONATOMIC?

Darrick J. Wong May 1, 2020, 8:11 p.m. UTC | #2

On Fri, May 01, 2020 at 09:46:07PM +0200, Jann Horn wrote:
> On Wed, Apr 29, 2020 at 4:46 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > This series creates a new log incompat feature and log intent items to
> > track high level progress of swapping ranges of two files and finish
> > interrupted work if the system goes down.  It then adds a new
> > FISWAPRANGE ioctl so that userspace can access the atomic extent
> > swapping feature.  With this feature, user programs will be able to
> > update files atomically by opening an O_TMPFILE, reflinking the source
> > file to it, making whatever updates they want to make, and then
> > atomically swap the changed bits back to the source file.  It even has
> > an optional ability to detect a changed source file and reject the
> > update.
> >
> > The intent behind this new userspace functionality is to enable atomic
> > rewrites of arbitrary parts of individual files.  For years, application
> > programmers wanting to ensure the atomicity of a file update had to
> > write the changes to a new file in the same directory, fsync the new
> > file, rename the new file on top of the old filename, and then fsync the
> > directory.  People get it wrong all the time, and $fs hacks abound.
> >
> > With atomic file updates, this is no longer necessary.  Programmers
> > create an O_TMPFILE, optionally FICLONE the file contents into the
> > temporary file, make whatever changes they want to the tempfile, and
> > FISWAPRANGE the contents from the tempfile into the regular file.
> 
> That also requires the *readers* to be atomic though, right? Since now
> the updates are visible to readers instantly, instead of only on the
> next open()? If you used this to update /etc/passwd while someone else
> is in the middle of reading it with a sequence of read() calls, there
> would be fireworks...

Right.  In XFS, we guarantee read atomicity by grabbing i_rwsem and
the xfs mmap lock, breaking any layout leases, draining the directios,
and then flushing and invalidating the page cache.  Once that
preparation step is
done, we do the actual extent swap.

> I guess maybe the new API could also be wired up to ext4's
> EXT4_IOC_MOVE_EXT somehow, provided that the caller specifies
> FILE_SWAP_RANGE_NONATOMIC?

Sort of.  ext4's MOVE_EXT also swaps the file contents, doing the swap
one buffer_head at a time, so you'd have to turn that off since this API
assumes that the caller already set each file's contents beforehand.

Ted has theorized that so long as the extent map size is less than 1/4
of the journal then it would be possible to do atomic swaps in ext4
without adding all the logical log item bits that were a prerequisite
for the xfs implementation.

--D