[RFC,0/9] ext4: Add direct-io atomic write support using fsawu

Message ID cover.1709356594.git.ritesh.list@gmail.com (mailing list archive)

Ritesh Harjani (IBM) March 2, 2024, 7:41 a.m. UTC
Hello all,

This RFC series adds atomic write support to ext4 direct-io using a
filesystem atomic write unit (fsawu). It is built on top of John's "block atomic
write v5" series [1], which adds the RWF_ATOMIC flag to pwritev2() and enables
atomic write support in the underlying device driver and block layer.

This series uses the same RWF_ATOMIC interface to add atomic write support
to ext4's direct-io path. It can be used with the first two of the three methods
described below: (1) mkfs.ext4 -b <BS>, or (2) bigalloc.

Filesystem atomic write unit (fsawu):
=====================================
Atomic writes within ext4 can be supported using the 3 methods below:
1. On a large pagesize system (e.g. Power or aarch64 with 64k pagesize),
   we can mkfs using different blocksizes, e.g. mkfs.ext4 -b <4k/8k/16k/32k/64k>.
   If the underlying HW device supports atomic writes, then a suitable
   blocksize can be chosen as the filesystem atomic write unit (fsawu). It
   must lie within the HW-defined [awu_min, awu_max] range.
   For such a filesystem, fsawu_min and fsawu_max are both equal to the
   blocksize (e.g. 16k).

   On a smaller pagesize system this can be used once support for LBS
   (large block size) is complete on ext4.

2. ext4 already supports a feature called bigalloc, with which it handles
   allocation in units of clusters. For example, we can create a filesystem with
   4k blocksize but 64k clustersize. Such a configuration can also be used
   to support atomic writes if the underlying HW device supports them.
   In that case fsawu_min will most likely be the filesystem blocksize and
   fsawu_max will most likely be the cluster size.

   A user can then do an atomic write of any size in the [fsawu_min, fsawu_max]
   range, as long as it satisfies the other constraints laid out by the HW device
   (or by the software stack) for atomic writes:
   e.g. len should be a power of 2, pos should be naturally aligned
   (pos % len == 0), and [start | end] (phys offsets) should not straddle
   an atomic write boundary.

3. ext4 mballoc can be made aware of aligned block allocation, e.g. by
   utilizing the cr-0 allocation criteria. With this support we won't need
   to format a new filesystem. Hopefully, once this support lands in mballoc,
   it can use the same interface/helper routines laid out in this
   patch series. Work on this is going on in parallel [2].
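
To make the limits above concrete, here is a minimal C sketch (not taken from
the patch series) of how the [fsawu_min, fsawu_max] range and the per-write
constraints fit together. The struct and function names (fsawu_range,
fsawu_compute, atomic_write_ok) are hypothetical, and clamping the range to the
device's [awu_min, awu_max] is an assumption about how methods 1 and 2 combine
with the HW limits. The physical-offset boundary check is left out because
userspace cannot see physical offsets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the limits discussed above (illustrative names). */
struct fsawu_range {
	uint64_t min;	/* fsawu_min in bytes, 0 if atomic writes are unsupported */
	uint64_t max;	/* fsawu_max in bytes */
};

static bool is_pow2(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Derive [fsawu_min, fsawu_max] from the filesystem geometry and the
 * device range [awu_min, awu_max]. For a non-bigalloc filesystem pass
 * cluster_size == block_size, which yields fsawu_min == fsawu_max.
 * Assumption: the filesystem range is simply clamped to the device range.
 */
static struct fsawu_range fsawu_compute(uint64_t block_size,
					uint64_t cluster_size,
					uint64_t awu_min, uint64_t awu_max)
{
	struct fsawu_range r = { 0, 0 };
	uint64_t lo = block_size > awu_min ? block_size : awu_min;
	uint64_t hi = cluster_size < awu_max ? cluster_size : awu_max;

	if (is_pow2(lo) && is_pow2(hi) && lo <= hi) {
		r.min = lo;
		r.max = hi;
	}
	return r;
}

/*
 * Check the logical constraints on one atomic write: len is a power of 2,
 * len lies in [fsawu_min, fsawu_max], and pos is naturally aligned to len.
 */
static bool atomic_write_ok(const struct fsawu_range *r,
			    uint64_t pos, uint64_t len)
{
	if (r->min == 0)
		return false;
	if (!is_pow2(len))
		return false;
	if (len < r->min || len > r->max)
		return false;
	return (pos % len) == 0;
}
```

With a 4k blocksize, 64k clustersize and a device range of [4k, 64k], a 16k
write at offset 0 passes these checks, while a 16k write at offset 8k or a 12k
write at offset 0 does not.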


Purpose of an early RFC:
========================
(Note: only minimal testing has been done on this.)

Other than getting early review comments on the design, this should hopefully also
help folks in their discussions at LSFMM, since there are various topic proposals
out there regarding atomic write support in xfs and ext4 [3][4].


How to utilize this support:
============================
1. mkfs.ext4 -O bigalloc -b 4096 -C 65536 /dev/<sdb> (scsi_debug or a device with atomic write support),
   or mkfs.ext4 -b <BS=16k> if your platform supports it.
2. mount /dev/sdb /mnt
3. touch /mnt/f1
4. chattr +W /mnt/f1
5. xfs_io -dc "pwrite <pos> <len>" /mnt/f1
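
For callers that want to use the RWF_ATOMIC interface directly instead of
going through xfs_io, a rough C sketch follows. It is an illustration only, not
code from this series: the helper names (open_direct, atomic_pwrite_fill) are
hypothetical, and the fallback RWF_ATOMIC value is an assumption that should be
checked against the uapi headers of the kernel actually in use.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* needed for pwritev2() and O_DIRECT */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Assumption: fallback value for older headers; verify against linux/fs.h. */
#ifndef RWF_ATOMIC
#define RWF_ATOMIC 0x00000040
#endif

/*
 * Open a file for direct I/O. The file is expected to already carry the
 * atomic write inode flag (step 4 above).
 */
static int open_direct(const char *path)
{
	return open(path, O_WRONLY | O_DIRECT);
}

/*
 * Write len bytes of the byte value 'fill' at offset pos with a single
 * RWF_ATOMIC pwritev2() call. Returns the number of bytes written, or -1
 * with errno set. len must be a power of 2; the buffer is aligned to len,
 * which is a conservative choice for O_DIRECT.
 */
static ssize_t atomic_pwrite_fill(int fd, off_t pos, size_t len, int fill)
{
	struct iovec iov;
	void *buf;
	ssize_t ret;
	int err;

	err = posix_memalign(&buf, len, len);
	if (err) {
		errno = err;
		return -1;
	}
	memset(buf, fill, len);

	iov.iov_base = buf;
	iov.iov_len = len;
	ret = pwritev2(fd, &iov, 1, pos, RWF_ATOMIC);

	/* Preserve errno from pwritev2() across free(). */
	err = errno;
	free(buf);
	errno = err;
	return ret;
}
```

A caller would do something like fd = open_direct("/mnt/f1") followed by
atomic_pwrite_fill(fd, 0, 16384, 0xab). On a kernel, device or file without
atomic write support the call is expected to fail, and the exact errno depends
on the kernel in use.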


References:
===========
[1]: https://lore.kernel.org/all/20240226173612.1478858-1-john.g.garry@oracle.com/
[2]: https://lore.kernel.org/linux-ext4/cover.1701339358.git.ojaswin@linux.ibm.com/
[3]: https://www.spinics.net/lists/linux-xfs/msg81086.html
[4]: https://www.spinics.net/lists/linux-fsdevel/msg265226.html

John Garry (1):
  fs: Add FS_XFLAG_ATOMICWRITES flag

Ritesh Harjani (IBM) (7):
  fs: Reserve inode flag FS_ATOMICWRITES_FL for atomic writes
  iomap: Add atomic write support for direct-io
  ext4: Add statx and other atomic write helper routines
  ext4: Adds direct-io atomic writes checks
  ext4: Add an inode flag for atomic writes
  ext4: Enable FMODE_CAN_ATOMIC_WRITE in open for direct-io
  ext4: Adds atomic writes using fsawu

Ritesh Harjani (IBM) (1):
  e2fsprogs/chattr: Supports atomic writes attribute

 fs/ext4/ext4.h           | 87 +++++++++++++++++++++++++++++++++++++++-
 fs/ext4/file.c           | 38 ++++++++++++++++--
 fs/ext4/inode.c          | 16 ++++++++
 fs/ext4/ioctl.c          | 11 +++++
 fs/ext4/super.c          |  1 +
 fs/ioctl.c               |  4 ++
 fs/iomap/direct-io.c     | 75 ++++++++++++++++++++++++++++++++--
 fs/iomap/trace.h         |  3 +-
 include/linux/fileattr.h |  4 +-
 include/linux/iomap.h    |  1 +
 include/uapi/linux/fs.h  |  2 +
 11 files changed, 232 insertions(+), 10 deletions(-)

--
2.39.2

John Garry March 6, 2024, 11:22 a.m. UTC | #1
On 02/03/2024 07:41, Ritesh Harjani (IBM) wrote:
> [...]

JFYI, I gave this a quick try, and it seems to work ok. Naturally it 
suffers from the same issue discussed at 
https://lore.kernel.org/linux-fsdevel/434c570e-39b2-4f1c-9b49-ac5241d310ca@oracle.com/ 
with regards to writing to partially written extents, which I have tried 
to address properly in my v2 for that same series.

Thanks,
John

Ritesh Harjani (IBM) March 6, 2024, 1:13 p.m. UTC | #2
John Garry <john.g.garry@oracle.com> writes:

> On 02/03/2024 07:41, Ritesh Harjani (IBM) wrote:
>> [...]
>
> JFYI, I gave this a quick try, and it seems to work ok. Naturally it 

Thanks John for giving this a try!

> suffers from the same issue discussed at 
> https://lore.kernel.org/linux-fsdevel/434c570e-39b2-4f1c-9b49-ac5241d310ca@oracle.com/ 
> with regards to writing to partially written extents, which I have tried 
> to address properly in my v2 for that same series.

I did go through other revisions, but I guess I missed going through this series.

Thanks Dave & John for your comments over the series.
Let me go through the revisions I have missed and John's latest revision.
I will update this series accordingly.

Appreciate your help!
-ritesh