Message ID | 20250415121425.4146847-10-john.g.garry@oracle.com |
---|---|
State | New |
Series | large atomic writes for xfs |
On Tue, Apr 15, 2025 at 12:14:20PM +0000, John Garry wrote:
> When large atomic writes (> 1x FS block) are supported, there will be
> various occasions when HW offload may not be possible.
>
> Such instances include:
> - unaligned extent mapping wrt write length
> - extent mappings which do not cover the full write, e.g. the write spans
>   sparse or mixed-mapping extents
> - the write length is greater than what HW offload can support
>
> In those cases, we need to fall back to the CoW-based atomic write mode.
> For this, report the special code -ENOPROTOOPT to inform the caller that
> the HW offload-based method is not possible.
>
> In addition to the occasions mentioned, if the write covers an unallocated
> range, we again judge that we need to rely on the CoW-based method when we
> would need to allocate anything more than 1x block. This is because if we
> allocate fewer blocks than are required for the write, then again the HW
> offload-based method would not be possible. So we are taking a pessimistic
> approach to writes covering unallocated space.
>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
> ---
>  fs/xfs/xfs_iomap.c | 65 ++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 63 insertions(+), 2 deletions(-)
>
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 049655ebc3f7..02bb8257ea24 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -798,6 +798,41 @@ imap_spans_range(
>  	return true;
>  }
>
> +static bool
> +xfs_bmap_hw_atomic_write_possible(
> +	struct xfs_inode	*ip,
> +	struct xfs_bmbt_irec	*imap,
> +	xfs_fileoff_t		offset_fsb,
> +	xfs_fileoff_t		end_fsb)
> +{
> +	struct xfs_mount	*mp = ip->i_mount;
> +	xfs_fsize_t		len = XFS_FSB_TO_B(mp, end_fsb - offset_fsb);
> +
> +	/*
> +	 * atomic writes are required to be naturally aligned for disk blocks,
> +	 * which ensures that we adhere to block layer rules that we won't
> +	 * straddle any boundary or violate write alignment requirement.
> +	 */
> +	if (!IS_ALIGNED(imap->br_startblock, imap->br_blockcount))
> +		return false;
> +
> +	/*
> +	 * Spanning multiple extents would mean that multiple BIOs would be
> +	 * issued, and so would lose atomicity required for REQ_ATOMIC-based
> +	 * atomics.
> +	 */
> +	if (!imap_spans_range(imap, offset_fsb, end_fsb))
> +		return false;
> +
> +	/*
> +	 * The ->iomap_begin caller should ensure this, but check anyway.
> +	 */
> +	if (len > xfs_inode_buftarg(ip)->bt_bdev_awu_max)
> +		return false;

This needs to check len against bt_bdev_awu_min so that we don't submit
too-short atomic writes to the hardware. Let's say that the hardware
minimum is 32k and the fsblock size is 4k. XFS can perform an out of
place write for 4k-16k writes, but right now we'll just throw invalid
commands at the bdev, and it'll return EINVAL.

/me wonders if statx should grow an atomic_write_unit_min_opt field
too, unless everyone in block layer land is convinced that awu_min will
always match lbasize? (I probably missed that conversation)

--D

> +
> +	return true;
> +}
> +
>  static int
>  xfs_direct_write_iomap_begin(
>  	struct inode		*inode,
> @@ -812,9 +847,11 @@ xfs_direct_write_iomap_begin(
>  	struct xfs_bmbt_irec	imap, cmap;
>  	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
>  	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, length);
> +	xfs_fileoff_t		orig_end_fsb = end_fsb;
>  	int			nimaps = 1, error = 0;
>  	bool			shared = false;
>  	u16			iomap_flags = 0;
> +	bool			needs_alloc;
>  	unsigned int		lockmode;
>  	u64			seq;
>
> @@ -875,13 +912,37 @@ xfs_direct_write_iomap_begin(
>  			(flags & IOMAP_DIRECT) || IS_DAX(inode));
>  		if (error)
>  			goto out_unlock;
> -		if (shared)
> +		if (shared) {
> +			if ((flags & IOMAP_ATOMIC) &&
> +			    !xfs_bmap_hw_atomic_write_possible(ip, &cmap,
> +					offset_fsb, end_fsb)) {
> +				error = -ENOPROTOOPT;
> +				goto out_unlock;
> +			}
>  			goto out_found_cow;
> +		}
>  		end_fsb = imap.br_startoff + imap.br_blockcount;
>  		length = XFS_FSB_TO_B(mp, end_fsb) - offset;
>  	}
>
> -	if (imap_needs_alloc(inode, flags, &imap, nimaps))
> +	needs_alloc = imap_needs_alloc(inode, flags, &imap, nimaps);
> +
> +	if (flags & IOMAP_ATOMIC) {
> +		error = -ENOPROTOOPT;
> +		/*
> +		 * If we allocate less than what is required for the write
> +		 * then we may end up with multiple extents, which means that
> +		 * REQ_ATOMIC-based cannot be used, so avoid this possibility.
> +		 */
> +		if (needs_alloc && orig_end_fsb - offset_fsb > 1)
> +			goto out_unlock;
> +
> +		if (!xfs_bmap_hw_atomic_write_possible(ip, &imap, offset_fsb,
> +				orig_end_fsb))
> +			goto out_unlock;
> +	}
> +
> +	if (needs_alloc)
>  		goto allocate_blocks;
>
>  	/*
> --
> 2.31.1
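For reference, a minimal sketch of the extra check being requested above,
mirroring the existing bt_bdev_awu_max test in xfs_bmap_hw_atomic_write_possible();
this is illustrative only, not a patch from this thread, and assumes the same
bt_bdev_awu_min field naming:

	/*
	 * Sketch only: also reject writes shorter than the device's
	 * minimum atomic write unit, so an undersized request is steered
	 * to the CoW-based fallback rather than being sent to the bdev
	 * and failed with EINVAL.
	 */
	if (len < xfs_inode_buftarg(ip)->bt_bdev_awu_min)
		return false;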
On 15/04/2025 18:34, Darrick J. Wong wrote:
>> +	/*
>> +	 * Spanning multiple extents would mean that multiple BIOs would be
>> +	 * issued, and so would lose atomicity required for REQ_ATOMIC-based
>> +	 * atomics.
>> +	 */
>> +	if (!imap_spans_range(imap, offset_fsb, end_fsb))
>> +		return false;
>> +
>> +	/*
>> +	 * The ->iomap_begin caller should ensure this, but check anyway.
>> +	 */
>> +	if (len > xfs_inode_buftarg(ip)->bt_bdev_awu_max)
>> +		return false;
> This needs to check len against bt_bdev_awu_min so that we don't submit
> too-short atomic writes to the hardware.

Right, let me check this. I think that we should only support sane HW
which can write 1x FS block or more.

> Let's say that the hardware minimum is 32k and the fsblock size is 4k.
> XFS can perform an out of place write for 4k-16k writes, but right now
> we'll just throw invalid commands at the bdev, and it'll return EINVAL.
>
> /me wonders if statx should grow an atomic_write_unit_min_opt field
> too, unless everyone in block layer land is convinced that awu_min will
> always match lbasize? (I probably missed that conversation)

Nothing states that it should (match lbasize), but again HW which can
only write > 1x FS block is something which I don't want to support (yet).

Thanks,
John
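As a rough illustration of the constraint John describes here (only supporting
hardware whose minimum atomic write unit does not exceed one FS block), a
hedged sketch of a sanity check against the superblock block size; the helper
name and its placement are assumptions, not code from this series:

/*
 * Hypothetical helper, not part of this series: report whether the
 * device's advertised atomic write limits are usable for a filesystem
 * whose block size is mp->m_sb.sb_blocksize bytes.
 */
static bool
xfs_buftarg_atomic_writes_sane(
	struct xfs_buftarg	*target,
	struct xfs_mount	*mp)
{
	/* Only support HW whose minimum unit is at most 1x FS block. */
	if (target->bt_bdev_awu_min > mp->m_sb.sb_blocksize)
		return false;

	/* ...and whose maximum unit covers at least 1x FS block. */
	return target->bt_bdev_awu_max >= mp->m_sb.sb_blocksize;
}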
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 049655ebc3f7..02bb8257ea24 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -798,6 +798,41 @@ imap_spans_range(
 	return true;
 }
 
+static bool
+xfs_bmap_hw_atomic_write_possible(
+	struct xfs_inode	*ip,
+	struct xfs_bmbt_irec	*imap,
+	xfs_fileoff_t		offset_fsb,
+	xfs_fileoff_t		end_fsb)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	xfs_fsize_t		len = XFS_FSB_TO_B(mp, end_fsb - offset_fsb);
+
+	/*
+	 * atomic writes are required to be naturally aligned for disk blocks,
+	 * which ensures that we adhere to block layer rules that we won't
+	 * straddle any boundary or violate write alignment requirement.
+	 */
+	if (!IS_ALIGNED(imap->br_startblock, imap->br_blockcount))
+		return false;
+
+	/*
+	 * Spanning multiple extents would mean that multiple BIOs would be
+	 * issued, and so would lose atomicity required for REQ_ATOMIC-based
+	 * atomics.
+	 */
+	if (!imap_spans_range(imap, offset_fsb, end_fsb))
+		return false;
+
+	/*
+	 * The ->iomap_begin caller should ensure this, but check anyway.
+	 */
+	if (len > xfs_inode_buftarg(ip)->bt_bdev_awu_max)
+		return false;
+
+	return true;
+}
+
 static int
 xfs_direct_write_iomap_begin(
 	struct inode		*inode,
@@ -812,9 +847,11 @@ xfs_direct_write_iomap_begin(
 	struct xfs_bmbt_irec	imap, cmap;
 	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
 	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, length);
+	xfs_fileoff_t		orig_end_fsb = end_fsb;
 	int			nimaps = 1, error = 0;
 	bool			shared = false;
 	u16			iomap_flags = 0;
+	bool			needs_alloc;
 	unsigned int		lockmode;
 	u64			seq;
 
@@ -875,13 +912,37 @@ xfs_direct_write_iomap_begin(
 			(flags & IOMAP_DIRECT) || IS_DAX(inode));
 		if (error)
 			goto out_unlock;
-		if (shared)
+		if (shared) {
+			if ((flags & IOMAP_ATOMIC) &&
+			    !xfs_bmap_hw_atomic_write_possible(ip, &cmap,
+					offset_fsb, end_fsb)) {
+				error = -ENOPROTOOPT;
+				goto out_unlock;
+			}
 			goto out_found_cow;
+		}
 		end_fsb = imap.br_startoff + imap.br_blockcount;
 		length = XFS_FSB_TO_B(mp, end_fsb) - offset;
 	}
 
-	if (imap_needs_alloc(inode, flags, &imap, nimaps))
+	needs_alloc = imap_needs_alloc(inode, flags, &imap, nimaps);
+
+	if (flags & IOMAP_ATOMIC) {
+		error = -ENOPROTOOPT;
+		/*
+		 * If we allocate less than what is required for the write
+		 * then we may end up with multiple extents, which means that
+		 * REQ_ATOMIC-based cannot be used, so avoid this possibility.
+		 */
+		if (needs_alloc && orig_end_fsb - offset_fsb > 1)
+			goto out_unlock;
+
+		if (!xfs_bmap_hw_atomic_write_possible(ip, &imap, offset_fsb,
+				orig_end_fsb))
+			goto out_unlock;
+	}
+
+	if (needs_alloc)
 		goto allocate_blocks;
 
 	/*
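For context on how the -ENOPROTOOPT convention described in the commit message
might be consumed, a hedged caller-side sketch; both callee names below are
placeholders for illustration, not symbols from this series:

/*
 * Hypothetical caller-side sketch: if the HW offload path reports
 * -ENOPROTOOPT, retry the same atomic write through the CoW-based
 * (software) path instead of failing the I/O.
 */
static ssize_t
xfs_file_write_atomic(
	struct kiocb		*iocb,
	struct iov_iter		*from)
{
	ssize_t			ret;

	ret = xfs_atomic_write_hw(iocb, from);		/* placeholder name */
	if (ret == -ENOPROTOOPT)
		ret = xfs_atomic_write_cow(iocb, from);	/* placeholder name */

	return ret;
}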