diff mbox series

[v7,15/15] xfs: Add async buffered write support

Message ID 20220601210141.3773402-16-shr@fb.com (mailing list archive)
State New
Headers show
Series io-uring/xfs: support async buffered writes | expand

Commit Message

Stefan Roesch June 1, 2022, 9:01 p.m. UTC
This adds the async buffered write support to XFS. For async buffered
write requests, the request will return -EAGAIN if the ilock cannot be
obtained immediately.

Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_file.c  | 11 +++++------
 fs/xfs/xfs_iomap.c |  5 ++++-
 2 files changed, 9 insertions(+), 7 deletions(-)

Comments

Al Viro July 1, 2022, 4:39 a.m. UTC | #1
On Wed, Jun 01, 2022 at 02:01:41PM -0700, Stefan Roesch wrote:
> This adds the async buffered write support to XFS. For async buffered
> write requests, the request will return -EAGAIN if the ilock cannot be
> obtained immediately.

breaks generic/471...
Jens Axboe July 1, 2022, 2:19 p.m. UTC | #2
On 6/30/22 10:39 PM, Al Viro wrote:
> On Wed, Jun 01, 2022 at 02:01:41PM -0700, Stefan Roesch wrote:
>> This adds the async buffered write support to XFS. For async buffered
>> write requests, the request will return -EAGAIN if the ilock cannot be
>> obtained immediately.
> 
> breaks generic/471...

That test case is odd, because it makes some weird assumptions about
what RWF_NOWAIT means. Most notably that it makes it mean if we should
instantiate blocks or not. Where did those assumed semantics come from?
On the read side, we have clearly documented that it should "not wait
for data which is not immediately available".

Now it is possible that we're returning a spurious -EAGAIN here when we
should not be. And that would be a bug imho. I'll dig in and see what's
going on.
Jens Axboe July 1, 2022, 2:30 p.m. UTC | #3
On 7/1/22 8:19 AM, Jens Axboe wrote:
> On 6/30/22 10:39 PM, Al Viro wrote:
>> On Wed, Jun 01, 2022 at 02:01:41PM -0700, Stefan Roesch wrote:
>>> This adds the async buffered write support to XFS. For async buffered
>>> write requests, the request will return -EAGAIN if the ilock cannot be
>>> obtained immediately.
>>
>> breaks generic/471...
> 
> That test case is odd, because it makes some weird assumptions about
> what RWF_NOWAIT means. Most notably that it makes it mean if we should
> instantiate blocks or not. Where did those assumed semantics come from?
> On the read side, we have clearly documented that it should "not wait
> for data which is not immediately available".
> 
> Now it is possible that we're returning a spurious -EAGAIN here when we
> should not be. And that would be a bug imho. I'll dig in and see what's
> going on.

This is the timestamp update that needs doing which will now return
-EAGAIN if IOCB_NOWAIT is set as it may block.

I do wonder if we should just allow inode time updates with IOCB_NOWAIT,
even on the io_uring side. Either that, or passed in RWF_NOWAIT
semantics don't map completely to internal IOCB_NOWAIT semantics. At
least in terms of what generic/471 is doing, but I'm not sure who came
up with that and if it's established semantics or just some made up ones
from whomever wrote that test. I don't think they make any sense, to be
honest.
Jens Axboe July 1, 2022, 2:38 p.m. UTC | #4
On 7/1/22 8:30 AM, Jens Axboe wrote:
> On 7/1/22 8:19 AM, Jens Axboe wrote:
>> On 6/30/22 10:39 PM, Al Viro wrote:
>>> On Wed, Jun 01, 2022 at 02:01:41PM -0700, Stefan Roesch wrote:
>>>> This adds the async buffered write support to XFS. For async buffered
>>>> write requests, the request will return -EAGAIN if the ilock cannot be
>>>> obtained immediately.
>>>
>>> breaks generic/471...
>>
>> That test case is odd, because it makes some weird assumptions about
>> what RWF_NOWAIT means. Most notably that it makes it mean if we should
>> instantiate blocks or not. Where did those assumed semantics come from?
>> On the read side, we have clearly documented that it should "not wait
>> for data which is not immediately available".
>>
>> Now it is possible that we're returning a spurious -EAGAIN here when we
>> should not be. And that would be a bug imho. I'll dig in and see what's
>> going on.
> 
> This is the timestamp update that needs doing which will now return
> -EAGAIN if IOCB_NOWAIT is set as it may block.
> 
> I do wonder if we should just allow inode time updates with IOCB_NOWAIT,
> even on the io_uring side. Either that, or passed in RWF_NOWAIT
> semantics don't map completely to internal IOCB_NOWAIT semantics. At
> least in terms of what generic/471 is doing, but I'm not sure who came
> up with that and if it's established semantics or just some made up ones
> from whomever wrote that test. I don't think they make any sense, to be
> honest.

Further support that generic/471 is just randomly made up semantics,
it needs to special case btrfs with nocow or you'd get -EAGAIN anyway
for that test.

And it's relying on some random timing to see if this works. I really
think that test case is just hot garbage, and doesn't test anything
meaningful.
Darrick J. Wong July 1, 2022, 6:05 p.m. UTC | #5
On Fri, Jul 01, 2022 at 08:38:07AM -0600, Jens Axboe wrote:
> On 7/1/22 8:30 AM, Jens Axboe wrote:
> > On 7/1/22 8:19 AM, Jens Axboe wrote:
> >> On 6/30/22 10:39 PM, Al Viro wrote:
> >>> On Wed, Jun 01, 2022 at 02:01:41PM -0700, Stefan Roesch wrote:
> >>>> This adds the async buffered write support to XFS. For async buffered
> >>>> write requests, the request will return -EAGAIN if the ilock cannot be
> >>>> obtained immediately.
> >>>
> >>> breaks generic/471...
> >>
> >> That test case is odd, because it makes some weird assumptions about
> >> what RWF_NOWAIT means. Most notably that it makes it mean if we should
> >> instantiate blocks or not. Where did those assumed semantics come from?
> >> On the read side, we have clearly documented that it should "not wait
> >> for data which is not immediately available".
> >>
> >> Now it is possible that we're returning a spurious -EAGAIN here when we
> >> should not be. And that would be a bug imho. I'll dig in and see what's
> >> going on.
> > 
> > This is the timestamp update that needs doing which will now return
> > -EAGAIN if IOCB_NOWAIT is set as it may block.
> > 
> > I do wonder if we should just allow inode time updates with IOCB_NOWAIT,
> > even on the io_uring side. Either that, or passed in RWF_NOWAIT
> > semantics don't map completely to internal IOCB_NOWAIT semantics. At
> > least in terms of what generic/471 is doing, but I'm not sure who came
> > up with that and if it's established semantics or just some made up ones
> > from whomever wrote that test. I don't think they make any sense, to be
> > honest.
> 
> Further support that generic/471 is just randomly made up semantics,
> it needs to special case btrfs with nocow or you'd get -EAGAIN anyway
> for that test.
> 
> And it's relying on some random timing to see if this works. I really
> think that test case is just hot garbage, and doesn't test anything
> meaningful.

<shrug> I had thought that NOWAIT means "don't wait for *any*thing",
which would include timestamp updates... but then I've never been all
that clear on what specifically NOWAIT will and won't wait for. :/

--D

> -- 
> Jens Axboe
>
Jens Axboe July 1, 2022, 6:14 p.m. UTC | #6
On 7/1/22 12:05 PM, Darrick J. Wong wrote:
> On Fri, Jul 01, 2022 at 08:38:07AM -0600, Jens Axboe wrote:
>> On 7/1/22 8:30 AM, Jens Axboe wrote:
>>> On 7/1/22 8:19 AM, Jens Axboe wrote:
>>>> On 6/30/22 10:39 PM, Al Viro wrote:
>>>>> On Wed, Jun 01, 2022 at 02:01:41PM -0700, Stefan Roesch wrote:
>>>>>> This adds the async buffered write support to XFS. For async buffered
>>>>>> write requests, the request will return -EAGAIN if the ilock cannot be
>>>>>> obtained immediately.
>>>>>
>>>>> breaks generic/471...
>>>>
>>>> That test case is odd, because it makes some weird assumptions about
>>>> what RWF_NOWAIT means. Most notably that it makes it mean if we should
>>>> instantiate blocks or not. Where did those assumed semantics come from?
>>>> On the read side, we have clearly documented that it should "not wait
>>>> for data which is not immediately available".
>>>>
>>>> Now it is possible that we're returning a spurious -EAGAIN here when we
>>>> should not be. And that would be a bug imho. I'll dig in and see what's
>>>> going on.
>>>
>>> This is the timestamp update that needs doing which will now return
>>> -EAGAIN if IOCB_NOWAIT is set as it may block.
>>>
>>> I do wonder if we should just allow inode time updates with IOCB_NOWAIT,
>>> even on the io_uring side. Either that, or passed in RWF_NOWAIT
>>> semantics don't map completely to internal IOCB_NOWAIT semantics. At
>>> least in terms of what generic/471 is doing, but I'm not sure who came
>>> up with that and if it's established semantics or just some made up ones
>>> from whomever wrote that test. I don't think they make any sense, to be
>>> honest.
>>
>> Further support that generic/471 is just randomly made up semantics,
>> it needs to special case btrfs with nocow or you'd get -EAGAIN anyway
>> for that test.
>>
>> And it's relying on some random timing to see if this works. I really
>> think that test case is just hot garbage, and doesn't test anything
>> meaningful.
> 
> <shrug> I had thought that NOWAIT means "don't wait for *any*thing",
> which would include timestamp updates... but then I've never been all
> that clear on what specifically NOWAIT will and won't wait for. :/

Agree, at least the read semantics (kind of) make sense, but the ones
seemingly made up by generic/471 don't seem to make any sense at all.
Josef Bacik July 5, 2022, 1:47 p.m. UTC | #7
On Fri, Jul 01, 2022 at 12:14:41PM -0600, Jens Axboe wrote:
> On 7/1/22 12:05 PM, Darrick J. Wong wrote:
> > On Fri, Jul 01, 2022 at 08:38:07AM -0600, Jens Axboe wrote:
> >> On 7/1/22 8:30 AM, Jens Axboe wrote:
> >>> On 7/1/22 8:19 AM, Jens Axboe wrote:
> >>>> On 6/30/22 10:39 PM, Al Viro wrote:
> >>>>> On Wed, Jun 01, 2022 at 02:01:41PM -0700, Stefan Roesch wrote:
> >>>>>> This adds the async buffered write support to XFS. For async buffered
> >>>>>> write requests, the request will return -EAGAIN if the ilock cannot be
> >>>>>> obtained immediately.
> >>>>>
> >>>>> breaks generic/471...
> >>>>
> >>>> That test case is odd, because it makes some weird assumptions about
> >>>> what RWF_NOWAIT means. Most notably that it makes it mean if we should
> >>>> instantiate blocks or not. Where did those assumed semantics come from?
> >>>> On the read side, we have clearly documented that it should "not wait
> >>>> for data which is not immediately available".
> >>>>
> >>>> Now it is possible that we're returning a spurious -EAGAIN here when we
> >>>> should not be. And that would be a bug imho. I'll dig in and see what's
> >>>> going on.
> >>>
> >>> This is the timestamp update that needs doing which will now return
> >>> -EAGAIN if IOCB_NOWAIT is set as it may block.
> >>>
> >>> I do wonder if we should just allow inode time updates with IOCB_NOWAIT,
> >>> even on the io_uring side. Either that, or passed in RWF_NOWAIT
> >>> semantics don't map completely to internal IOCB_NOWAIT semantics. At
> >>> least in terms of what generic/471 is doing, but I'm not sure who came
> >>> up with that and if it's established semantics or just some made up ones
> >>> from whomever wrote that test. I don't think they make any sense, to be
> >>> honest.
> >>
> >> Further support that generic/471 is just randomly made up semantics,
> >> it needs to special case btrfs with nocow or you'd get -EAGAIN anyway
> >> for that test.
> >>
> >> And it's relying on some random timing to see if this works. I really
> >> think that test case is just hot garbage, and doesn't test anything
> >> meaningful.
> > 
> > <shrug> I had thought that NOWAIT means "don't wait for *any*thing",
> > which would include timestamp updates... but then I've never been all
> > that clear on what specifically NOWAIT will and won't wait for. :/
> 
> Agree, at least the read semantics (kind of) make sense, but the ones
> seemingly made up by generic/471 don't seem to make any sense at all.
>

Added Goldwyn to the CC list for this.

This appears to be just a confusion about what we think NOWAIT should mean.
Looking at the btrfs code it seems like Goldwyn took it as literally as possible
so we wouldn't do any NOWAIT IO's unless it was into a NOCOW area, meaning we
literally wouldn't do anything other than wrap the bio up and fire it off.

The general consensus seems to be that NOWAIT isn't that strict, and that
BTRFS's definition was too strict.  I wrote initial patches to give to Stefan to
clean up the Btrfs side to allow us to use NOWAIT under a lot more
circumstances.

Goldwyn, this test seems to be a little specific to our case, and can be flakey
if the timing isn't just right.  I think we should just remove it?  Especially
since how we define NOWAIT isn't quite right.  Does that sound reasonable to
you?

I think a decent followup would be to add a NOWAIT specific fio test to fsperf
so we still can catch any NOWAIT related regressions, without trying to test for
specific behavior for something that can fail under a whole lot of conditions
unrelated to our implementation.  Thanks,

Josef
Goldwyn Rodrigues July 5, 2022, 2:23 p.m. UTC | #8
On  9:47 05/07, Josef Bacik wrote:
> On Fri, Jul 01, 2022 at 12:14:41PM -0600, Jens Axboe wrote:
> > On 7/1/22 12:05 PM, Darrick J. Wong wrote:
> > > On Fri, Jul 01, 2022 at 08:38:07AM -0600, Jens Axboe wrote:
> > >> On 7/1/22 8:30 AM, Jens Axboe wrote:
> > >>> On 7/1/22 8:19 AM, Jens Axboe wrote:
> > >>>> On 6/30/22 10:39 PM, Al Viro wrote:
> > >>>>> On Wed, Jun 01, 2022 at 02:01:41PM -0700, Stefan Roesch wrote:
> > >>>>>> This adds the async buffered write support to XFS. For async buffered
> > >>>>>> write requests, the request will return -EAGAIN if the ilock cannot be
> > >>>>>> obtained immediately.
> > >>>>>
> > >>>>> breaks generic/471...
> > >>>>
> > >>>> That test case is odd, because it makes some weird assumptions about
> > >>>> what RWF_NOWAIT means. Most notably that it makes it mean if we should
> > >>>> instantiate blocks or not. Where did those assumed semantics come from?
> > >>>> On the read side, we have clearly documented that it should "not wait
> > >>>> for data which is not immediately available".
> > >>>>
> > >>>> Now it is possible that we're returning a spurious -EAGAIN here when we
> > >>>> should not be. And that would be a bug imho. I'll dig in and see what's
> > >>>> going on.
> > >>>
> > >>> This is the timestamp update that needs doing which will now return
> > >>> -EAGAIN if IOCB_NOWAIT is set as it may block.
> > >>>
> > >>> I do wonder if we should just allow inode time updates with IOCB_NOWAIT,
> > >>> even on the io_uring side. Either that, or passed in RWF_NOWAIT
> > >>> semantics don't map completely to internal IOCB_NOWAIT semantics. At
> > >>> least in terms of what generic/471 is doing, but I'm not sure who came
> > >>> up with that and if it's established semantics or just some made up ones
> > >>> from whomever wrote that test. I don't think they make any sense, to be
> > >>> honest.
> > >>
> > >> Further support that generic/471 is just randomly made up semantics,
> > >> it needs to special case btrfs with nocow or you'd get -EAGAIN anyway
> > >> for that test.
> > >>
> > >> And it's relying on some random timing to see if this works. I really
> > >> think that test case is just hot garbage, and doesn't test anything
> > >> meaningful.
> > > 
> > > <shrug> I had thought that NOWAIT means "don't wait for *any*thing",
> > > which would include timestamp updates... but then I've never been all
> > > that clear on what specifically NOWAIT will and won't wait for. :/
> > 
> > Agree, at least the read semantics (kind of) make sense, but the ones
> > seemingly made up by generic/471 don't seem to make any sense at all.
> >
> 
> Added Goldwyn to the CC list for this.
> 
> This appears to be just a confusion about what we think NOWAIT should mean.
> Looking at the btrfs code it seems like Goldwyn took it as literally as possible
> so we wouldn't do any NOWAIT IO's unless it was into a NOCOW area, meaning we
> literally wouldn't do anything other than wrap the bio up and fire it off.

When I introduced NOWAIT, it was only meant for writes and for those
which would block on block allocations or locking. Over time it was
included for buffered reads as well.

> 
> The general consensus seems to be that NOWAIT isn't that strict, and that
> BTRFS's definition was too strict.  I wrote initial patches to give to Stefan to
> clean up the Btrfs side to allow us to use NOWAIT under a lot more
> circumstances.

BTRFS COW path would allocate blocks and the reason it returns -EAGAIN.

> 
> Goldwyn, this test seems to be a little specific to our case, and can be flakey
> if the timing isn't just right.  I think we should just remove it?  Especially
> since how we define NOWAIT isn't quite right.  Does that sound reasonable to
> you?

Yes, I agree. It was based on the initial definition and now can be
removed.
Christoph Hellwig July 5, 2022, 4:11 p.m. UTC | #9
On Tue, Jul 05, 2022 at 09:47:25AM -0400, Josef Bacik wrote:
> This appears to be just a confusion about what we think NOWAIT should mean.
> Looking at the btrfs code it seems like Goldwyn took it as literally as possible
> so we wouldn't do any NOWAIT IO's unless it was into a NOCOW area, meaning we
> literally wouldn't do anything other than wrap the bio up and fire it off.

It means don't do anything that would block.  And I suspect in btrfs
the above is about as much as you can do without blocking.
diff mbox series

Patch

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index a60632ecc3f0..4d65ff007c7d 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -410,7 +410,7 @@  xfs_file_write_checks(
 		spin_unlock(&ip->i_flags_lock);
 
 out:
-	return file_modified(file);
+	return kiocb_modified(iocb);
 }
 
 static int
@@ -700,12 +700,11 @@  xfs_file_buffered_write(
 	bool			cleared_space = false;
 	unsigned int		iolock;
 
-	if (iocb->ki_flags & IOCB_NOWAIT)
-		return -EOPNOTSUPP;
-
 write_retry:
 	iolock = XFS_IOLOCK_EXCL;
-	xfs_ilock(ip, iolock);
+	ret = xfs_ilock_iocb(iocb, iolock);
+	if (ret)
+		return ret;
 
 	ret = xfs_file_write_checks(iocb, from, &iolock);
 	if (ret)
@@ -1165,7 +1164,7 @@  xfs_file_open(
 {
 	if (xfs_is_shutdown(XFS_M(inode->i_sb)))
 		return -EIO;
-	file->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC;
+	file->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC;
 	return generic_file_open(inode, file);
 }
 
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index bcf7c3694290..5d50fed291b4 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -886,6 +886,7 @@  xfs_buffered_write_iomap_begin(
 	bool			eof = false, cow_eof = false, shared = false;
 	int			allocfork = XFS_DATA_FORK;
 	int			error = 0;
+	unsigned int		lockmode = XFS_ILOCK_EXCL;
 
 	if (xfs_is_shutdown(mp))
 		return -EIO;
@@ -897,7 +898,9 @@  xfs_buffered_write_iomap_begin(
 
 	ASSERT(!XFS_IS_REALTIME_INODE(ip));
 
-	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	error = xfs_ilock_for_iomap(ip, flags, &lockmode);
+	if (error)
+		return error;
 
 	if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(&ip->i_df)) ||
 	    XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) {