mbox series

[0/3] Transient errors in Direct I/O

Message ID 20200605204838.10765-1-rgoldwyn@suse.de (mailing list archive)
Headers show
Series Transient errors in Direct I/O | expand

Message

Goldwyn Rodrigues June 5, 2020, 8:48 p.m. UTC
In current scenarios, for XFS, it would mean that a page invalidation
would end up being a writeback error. So, if iomap returns zero, fall
back to biffered I/O. XFS has never supported fallback to buffered I/O.
I hope it is not "never will" ;)

With mixed buffered and direct writes in btrfs, the pages may not be
released the extent may be locked in the ordered extents cleanup thread,
which must make changes to the btrfs trees. In case of btrfs, if it is
possible to wait, depending on the memory flags passed, wait for extent
bit to be cleared so direct I/O is executed so there is no need to
fallback to buffered I/O.

Comments

Dave Chinner June 10, 2020, 2:59 a.m. UTC | #1
[ Please cc the XFS list on XFS and iomap infrastructure changes.]

On Fri, Jun 05, 2020 at 03:48:35PM -0500, Goldwyn Rodrigues wrote:
> In current scenarios, for XFS, it would mean that a page invalidation
> would end up being a writeback error. So, if iomap returns zero, fall
> back to biffered I/O. XFS has never supported fallback to buffered I/O.
> I hope it is not "never will" ;)

I wouldn't say "never", but we are not going to change XFS behaviour
because btrfs has a page invalidation vs DIO bug in it...

If you want to make a specific filesystem fall back to buffered IO
in this case, pass a new flag into iomap_dio_rw() to conditionally
abort the DIO on invalidation failure and let filesystem
implementations opt-in to use that flag.

Cheers,

Dave.
Dave Chinner June 10, 2020, 5:05 a.m. UTC | #2
On Wed, Jun 10, 2020 at 12:59:00PM +1000, Dave Chinner wrote:
> [ Please cc the XFS list on XFS and iomap infrastructure changes.]
> 
> On Fri, Jun 05, 2020 at 03:48:35PM -0500, Goldwyn Rodrigues wrote:
> > In current scenarios, for XFS, it would mean that a page invalidation
> > would end up being a writeback error. So, if iomap returns zero, fall
> > back to biffered I/O. XFS has never supported fallback to buffered I/O.
> > I hope it is not "never will" ;)
> 
> I wouldn't say "never", but we are not going to change XFS behaviour
> because btrfs has a page invalidation vs DIO bug in it...

Let me point out a specific "oh shit, I didn't think of that" sort
of problem that your blind fallback to buffered IO causes. Do this
via direct IO:

	pwritev2(RWF_NOWAIT)

now have it fail invalidation in the direct IO path and fallback to
buffered write. What does buffered write do with it?


	if (iocb->ki_flags & IOCB_NOWAIT)
		return -EOPNOTSUPP;

Yup, userspace gets a completely spurious and bogus -EOPNOTSUPP
error to pwritev2() because some 3rd party is accessing the same
file via mmap or buffered IO.

Cheers,

Dave.
Qu Wenruo June 10, 2020, 5:31 a.m. UTC | #3
On 2020/6/6 上午4:48, Goldwyn Rodrigues wrote:
> In current scenarios, for XFS, it would mean that a page invalidation
> would end up being a writeback error. So, if iomap returns zero, fall
> back to biffered I/O. XFS has never supported fallback to buffered I/O.
> I hope it is not "never will" ;)
> 
> With mixed buffered and direct writes in btrfs, the pages may not be
> released the extent may be locked in the ordered extents cleanup thread,

I'm wondering can we handle this case in a different way.

In fact btrfs has its own special handling for invalidating pages.
Btrfs will first look for any ordered extents covering the page, finish
the ordered extent manually, then invalidate the page.

I'm not sure why invalidate_inode_pages2_range() used in dio iomap code
does not use the fs specific invalidatepage(), but only do_lander_page()
then releasepage().

Shouldn'y we btrfs implement the lander_page() to handle ordered extents
properly?
Or is there any special requirement?

Thanks,
Qu

> which must make changes to the btrfs trees. In case of btrfs, if it is
> possible to wait, depending on the memory flags passed, wait for extent
> bit to be cleared so direct I/O is executed so there is no need to
> fallback to buffered I/O.
>
Goldwyn Rodrigues June 11, 2020, 2:11 p.m. UTC | #4
On 13:31 10/06, Qu Wenruo wrote:
> 
> 
> On 2020/6/6 上午4:48, Goldwyn Rodrigues wrote:
> > In current scenarios, for XFS, it would mean that a page invalidation
> > would end up being a writeback error. So, if iomap returns zero, fall
> > back to biffered I/O. XFS has never supported fallback to buffered I/O.
> > I hope it is not "never will" ;)
> > 
> > With mixed buffered and direct writes in btrfs, the pages may not be
> > released the extent may be locked in the ordered extents cleanup thread,
> 
> I'm wondering can we handle this case in a different way.
> 
> In fact btrfs has its own special handling for invalidating pages.
> Btrfs will first look for any ordered extents covering the page, finish
> the ordered extent manually, then invalidate the page.
> 
> I'm not sure why invalidate_inode_pages2_range() used in dio iomap code
> does not use the fs specific invalidatepage(), but only do_lander_page()
> then releasepage().
> 
> Shouldn'y we btrfs implement the lander_page() to handle ordered extents
> properly?
> Or is there any special requirement?
> 

The problem is aops->launder_page() is called only if PG_Dirty is
set. In this case it is not because we just performed a writeback.

Also, it is not just ordered ordered extent writeback which may lock
the extent, a buffered read can lock as well.
Goldwyn Rodrigues June 11, 2020, 2:13 p.m. UTC | #5
On 15:05 10/06, Dave Chinner wrote:
> On Wed, Jun 10, 2020 at 12:59:00PM +1000, Dave Chinner wrote:
> > [ Please cc the XFS list on XFS and iomap infrastructure changes.]
> > 
> > On Fri, Jun 05, 2020 at 03:48:35PM -0500, Goldwyn Rodrigues wrote:
> > > In current scenarios, for XFS, it would mean that a page invalidation
> > > would end up being a writeback error. So, if iomap returns zero, fall
> > > back to biffered I/O. XFS has never supported fallback to buffered I/O.
> > > I hope it is not "never will" ;)
> > 
> > I wouldn't say "never", but we are not going to change XFS behaviour
> > because btrfs has a page invalidation vs DIO bug in it...
> 
> Let me point out a specific "oh shit, I didn't think of that" sort
> of problem that your blind fallback to buffered IO causes. Do this
> via direct IO:
> 
> 	pwritev2(RWF_NOWAIT)
> 
> now have it fail invalidation in the direct IO path and fallback to
> buffered write. What does buffered write do with it?
> 
> 
> 	if (iocb->ki_flags & IOCB_NOWAIT)
> 		return -EOPNOTSUPP;
> 
> Yup, userspace gets a completely spurious and bogus -EOPNOTSUPP
> error to pwritev2() because some 3rd party is accessing the same
> file via mmap or buffered IO.
> 

Oh shit, I didn't think about that!

I think adding a flag to iomap_dio_rw() to return in case of page
invalidation failure is the best option for now.
Qu Wenruo June 12, 2020, 12:56 p.m. UTC | #6
On 2020/6/11 下午10:11, Goldwyn Rodrigues wrote:
> On 13:31 10/06, Qu Wenruo wrote:
>>
>>
>> On 2020/6/6 上午4:48, Goldwyn Rodrigues wrote:
>>> In current scenarios, for XFS, it would mean that a page invalidation
>>> would end up being a writeback error. So, if iomap returns zero, fall
>>> back to biffered I/O. XFS has never supported fallback to buffered I/O.
>>> I hope it is not "never will" ;)
>>>
>>> With mixed buffered and direct writes in btrfs, the pages may not be
>>> released the extent may be locked in the ordered extents cleanup thread,
>>
>> I'm wondering can we handle this case in a different way.
>>
>> In fact btrfs has its own special handling for invalidating pages.
>> Btrfs will first look for any ordered extents covering the page, finish
>> the ordered extent manually, then invalidate the page.
>>
>> I'm not sure why invalidate_inode_pages2_range() used in dio iomap code
>> does not use the fs specific invalidatepage(), but only do_lander_page()
>> then releasepage().
>>
>> Shouldn'y we btrfs implement the lander_page() to handle ordered extents
>> properly?
>> Or is there any special requirement?
>>
> 
> The problem is aops->launder_page() is called only if PG_Dirty is
> set. In this case it is not because we just performed a writeback.

For the dio iomap code, before btrfs_finish_ordered_io(), the pages in
ordered ranges are still dirty.
So launder_page() here can still get triggered to finish the ordered extent.

> 
> Also, it is not just ordered ordered extent writeback which may lock
> the extent, a buffered read can lock as well.
> 
That's right.
But at least that would greatly reduce the possibility for btrfs to fall
back to buffered IO, right?

Thanks,
Qu