diff mbox series

fs: Return EOPNOTSUPP if block layer does not support REQ_NOWAIT

Message ID 20181213115306.fm2mjc3qszjiwkgf@merlin (mailing list archive)
State New, archived

Commit Message

Goldwyn Rodrigues Dec. 13, 2018, 11:53 a.m. UTC
For AIO+DIO with RWF_NOWAIT, if the block layer does not support REQ_NOWAIT,
it returns EIO. Return EOPNOTSUPP to represent the correct error code.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/direct-io.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

Comments

Avi Kivity Dec. 13, 2018, 12:04 p.m. UTC | #1
On 12/13/18 1:53 PM, Goldwyn Rodrigues wrote:
> For AIO+DIO with RWF_NOWAIT, if the block layer does not support REQ_NOWAIT,
> it returns EIO. Return EOPNOTSUPP to represent the correct error code.


Cc: stable@?


> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>   fs/direct-io.c | 11 +++++++----
>   1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index 41a0e97252ae..77adf33916b8 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -542,10 +542,13 @@ static blk_status_t dio_bio_complete(struct dio *dio, struct bio *bio)
>   	blk_status_t err = bio->bi_status;
>   
>   	if (err) {
> -		if (err == BLK_STS_AGAIN && (bio->bi_opf & REQ_NOWAIT))
> -			dio->io_error = -EAGAIN;
> -		else
> -			dio->io_error = -EIO;
> +		dio->io_error = -EIO;
> +		if (bio->bi_opf & REQ_NOWAIT) {
> +			if (err == BLK_STS_AGAIN)
> +				dio->io_error = -EAGAIN;
> +			else if (err == BLK_STS_NOTSUPP)
> +				dio->io_error = -EOPNOTSUPP;
> +		}
>   	}
>   
>   	if (dio->is_async && dio->op == REQ_OP_READ && dio->should_dirty) {


Looks good. I wonder why it only shows up so rarely. Is there an 
alternative path that generates EOPNOTSUPP, that works most of the time?
Christoph Hellwig Dec. 13, 2018, 2:24 p.m. UTC | #2
On Thu, Dec 13, 2018 at 02:04:41PM +0200, Avi Kivity wrote:
> On 12/13/18 1:53 PM, Goldwyn Rodrigues wrote:
> > For AIO+DIO with RWF_NOWAIT, if the block layer does not support REQ_NOWAIT,
> > it returns EIO. Return EOPNOTSUPP to represent the correct error code.
> 
> 
> Cc: stable@?
> 
> 
> > Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> > ---
> >   fs/direct-io.c | 11 +++++++----
> >   1 file changed, 7 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/direct-io.c b/fs/direct-io.c
> > index 41a0e97252ae..77adf33916b8 100644
> > --- a/fs/direct-io.c
> > +++ b/fs/direct-io.c
> > @@ -542,10 +542,13 @@ static blk_status_t dio_bio_complete(struct dio *dio, struct bio *bio)
> >   	blk_status_t err = bio->bi_status;
> >   	if (err) {

I think this just needs to become:

	if (err)
		dio->io_error = blk_status_to_errno(bio->bi_status);

And Avi, you really should be using XFS ;-)
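
For readers unfamiliar with the helper suggested above, a table-driven status conversion can be sketched in userspace roughly as follows. The `mock_*` names and the table contents are assumptions made for illustration; the real `blk_status_to_errno()` lives in the block layer and may differ in detail.

```c
#include <assert.h>	/* only needed for the usage checks shown below */
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for a few blk_status_t values. */
enum mock_blk_status {
	MOCK_STS_OK = 0,
	MOCK_STS_NOTSUPP,
	MOCK_STS_AGAIN,
	MOCK_STS_NOSPC,
	MOCK_STS_IOERR,
};

/* One errno per status, indexed by the status value. */
static const int mock_errno_table[] = {
	[MOCK_STS_OK]      = 0,
	[MOCK_STS_NOTSUPP] = -EOPNOTSUPP,
	[MOCK_STS_AGAIN]   = -EAGAIN,
	[MOCK_STS_NOSPC]   = -ENOSPC,
	[MOCK_STS_IOERR]   = -EIO,
};

/* Convert a status to a negative errno; unknown values degrade to -EIO. */
int mock_status_to_errno(enum mock_blk_status status)
{
	size_t idx = (size_t)status;

	if (idx >= sizeof(mock_errno_table) / sizeof(mock_errno_table[0]))
		return -EIO;
	return mock_errno_table[idx];
}
```

With such a table the caller no longer special-cases REQ_NOWAIT: `assert(mock_status_to_errno(MOCK_STS_NOTSUPP) == -EOPNOTSUPP);` holds for every bio, and detailed codes such as -ENOSPC are preserved instead of being collapsed to -EIO.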
Goldwyn Rodrigues Dec. 13, 2018, 3:44 p.m. UTC | #3
On  6:24 13/12, Christoph Hellwig wrote:
> On Thu, Dec 13, 2018 at 02:04:41PM +0200, Avi Kivity wrote:
> > On 12/13/18 1:53 PM, Goldwyn Rodrigues wrote:
> > > For AIO+DIO with RWF_NOWAIT, if the block layer does not support REQ_NOWAIT,
> > > it returns EIO. Return EOPNOTSUPP to represent the correct error code.
> > 
> > 
> > Cc: stable@?
> > 
> > 
> > > Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> > > ---
> > >   fs/direct-io.c | 11 +++++++----
> > >   1 file changed, 7 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/fs/direct-io.c b/fs/direct-io.c
> > > index 41a0e97252ae..77adf33916b8 100644
> > > --- a/fs/direct-io.c
> > > +++ b/fs/direct-io.c
> > > @@ -542,10 +542,13 @@ static blk_status_t dio_bio_complete(struct dio *dio, struct bio *bio)
> > >   	blk_status_t err = bio->bi_status;
> > >   	if (err) {
> 
> I think this just needs to become:
> 
> 	if (err)
> 		dio->io_error = blk_status_to_errno(bio->bi_status);
> 

Ahh.. I didn't know of its existence. Yes, the function is much more elaborate.

Thanks!
Matthew Wilcox Dec. 13, 2018, 4:27 p.m. UTC | #4
On Thu, Dec 13, 2018 at 05:53:06AM -0600, Goldwyn Rodrigues wrote:
> For AIO+DIO with RWF_NOWAIT, if the block layer does not support REQ_NOWAIT,
> it returns EIO. Return EOPNOTSUPP to represent the correct error code.

Why is EOPNOTSUPP the "correct" error code?  That's a networking error,
not a block layer error.
Goldwyn Rodrigues Dec. 13, 2018, 7:04 p.m. UTC | #5
On  8:27 13/12, Matthew Wilcox wrote:
> On Thu, Dec 13, 2018 at 05:53:06AM -0600, Goldwyn Rodrigues wrote:
> > For AIO+DIO with RWF_NOWAIT, if the block layer does not support REQ_NOWAIT,
> > it returns EIO. Return EOPNOTSUPP to represent the correct error code.
> 
> Why is EOPNOTSUPP the "correct" error code?  That's a networking error,
> not a block layer error.

No. We return EOPNOTSUPP in filesystems as well, in case RWF_NOWAIT is not
supported.
Dave Chinner Dec. 13, 2018, 10:43 p.m. UTC | #6
On Thu, Dec 13, 2018 at 05:53:06AM -0600, Goldwyn Rodrigues wrote:
> For AIO+DIO with RWF_NOWAIT, if the block layer does not support REQ_NOWAIT,
> it returns EIO. Return EOPNOTSUPP to represent the correct error code.

Say what?

Does this mean that if a filesystem supports RWF_NOWAIT, but the
underlying block device/storage stack doesn't support it, then we'll
be getting EIO/EOPNOTSUPP errors returned to userspace?

Isn't that highly unfriendly to userspace applications? i.e. instead
of just ignoring RWF_NOWAIT in this case and having the AIO succeed,
we return a /fatal/ error from deep in the guts of the IO subsystem
that the user has no obvious way of tracking down?

I'm also concerned that this is highly hardware dependent - two
identical filesystems on different storage hardware on the same
machine could behave differently. i.e. it works on one filesystem
but not on the other, and there's no way to tell when it will work
or fail apart from trying to use RWF_NOWAIT?

I'd also like to point out that this error (whether EIO or
EOPNOTSUPP) is completely undocumented in the preadv2/pwritev2 man
page, so application developers that get bug reports about
EOPNOTSUPP errors are going to be rather confused....

Cheers,

Dave.
Goldwyn Rodrigues Dec. 14, 2018, 5:09 p.m. UTC | #7
On  9:43 14/12, Dave Chinner wrote:
> On Thu, Dec 13, 2018 at 05:53:06AM -0600, Goldwyn Rodrigues wrote:
> > For AIO+DIO with RWF_NOWAIT, if the block layer does not support REQ_NOWAIT,
> > it returns EIO. Return EOPNOTSUPP to represent the correct error code.
> 
> Say what?
> 
> Does this mean that if a filesystem supports RWF_NOWAIT, but the
> underlying block device/storage stack doesn't support it, then we'll
> be getting EIO/EOPNOTSUPP errors returned to userspace?
> 
> Isn't that highly unfriendly to userspace applications? i.e. instead
> of just ignoring RWF_NOWAIT in this case and having the AIO succeed,
> we return a /fatal/ error from deep in the guts of the IO subsystem
> that the user has no obvious way of tracking down?

Well, if it is not supported, we'd rather let users decide how they
want to handle it rather than manipulating the request in the kernel.
For all you know, it could be a probe call to understand if RWF_NOWAIT
is supported or not.

> 
> I'm also concerned that this is highly hardware dependent - two
> identical filesystems on different storage hardware on the same
> machine could behave differently. i.e. it works on one filesystem
> but not on the other, and there's no way to tell when it will work
> or fail apart from trying to use RWF_NOWAIT?

I was not too happy getting it all the way down to block layer either.
The multi-devices makes it worse. However, here we are and we need to
tell the user that RWF_NOWAIT is not supported in this environment.

> 
> I'd also like to point out that this error (whether EIO or
> EOPNOTSUPP) is completely undocumented in the preadv2/pwritev2 man
> page, so application developers that get bug reports about
> EOPNOTSUPP errors are going to be rather confused....

Yes, I will send a patch to update the man page.
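
If userspace is to decide how to handle each outcome, an application might classify the errno of a RWF_NOWAIT submission along these lines. This is a minimal sketch: the `nowait_action` enum and `classify_nowait_errno()` are hypothetical names, and the policy shown is only one possible choice.

```c
#include <assert.h>	/* only needed for the usage checks shown below */
#include <errno.h>

/* Hypothetical application-side reactions to a RWF_NOWAIT attempt. */
enum nowait_action {
	NOWAIT_OK,		/* the request completed */
	NOWAIT_RETRY_BLOCKING,	/* would have blocked: resubmit without the flag */
	NOWAIT_DISABLE,		/* unsupported here: stop setting the flag */
	NOWAIT_FATAL,		/* anything else, including a genuine EIO */
};

/* err is 0 on success, otherwise the positive errno of the failed request. */
enum nowait_action classify_nowait_errno(int err)
{
	switch (err) {
	case 0:
		return NOWAIT_OK;
	case EAGAIN:
		return NOWAIT_RETRY_BLOCKING;
	case EOPNOTSUPP:
		return NOWAIT_DISABLE;
	default:
		return NOWAIT_FATAL;
	}
}
```

The sketch also shows why the error code matters: `assert(classify_nowait_errno(EIO) == NOWAIT_FATAL);` holds, so as long as the unsupported case is reported as EIO the application cannot tell it apart from a real I/O failure.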
Avi Kivity Dec. 16, 2018, 10:45 a.m. UTC | #8
On 12/13/18 4:24 PM, Christoph Hellwig wrote:
>
>>> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
>>> ---
>>>    fs/direct-io.c | 11 +++++++----
>>>    1 file changed, 7 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/direct-io.c b/fs/direct-io.c
>>> index 41a0e97252ae..77adf33916b8 100644
>>> --- a/fs/direct-io.c
>>> +++ b/fs/direct-io.c
>>> @@ -542,10 +542,13 @@ static blk_status_t dio_bio_complete(struct dio *dio, struct bio *bio)
>>>    	blk_status_t err = bio->bi_status;
>>>    	if (err) {
> I think this just needs to become:
>
> 	if (err)
> 		dio->io_error = blk_status_to_errno(bio->bi_status);
>
> And Avi, you really should be using XFS ;-)


I did see this on XFS too. The whole thing bothers me, it doesn't happen 
consistently in some setups, which I don't understand. Either it should 
trigger always or never.
Dave Chinner Dec. 16, 2018, 9:35 p.m. UTC | #9
On Fri, Dec 14, 2018 at 11:09:10AM -0600, Goldwyn Rodrigues wrote:
> On  9:43 14/12, Dave Chinner wrote:
> > On Thu, Dec 13, 2018 at 05:53:06AM -0600, Goldwyn Rodrigues wrote:
> > > For AIO+DIO with RWF_NOWAIT, if the block layer does not support REQ_NOWAIT,
> > > it returns EIO. Return EOPNOTSUPP to represent the correct error code.
> > 
> > Say what?
> > 
> > Does this mean that if a filesystem supports RWF_NOWAIT, but the
> > underlying block device/storage stack doesn't support it, then we'll
> > be getting EIO/EOPNOTSUPP errors returned to userspace?
> > 
> > Isn't that highly unfriendly to userspace applications? i.e. instead
> > of just ignoring RWF_NOWAIT in this case and having the AIO succeed,
> > we return a /fatal/ error from deep in the guts of the IO subsystem
> > that the user has no obvious way of tracking down?
> 
> Well, if it is not supported, we'd rather let users decide how they
> want to handle it rather than manipulating the request in the kernel.
> For all you know, it could be a probe call to understand if RWF_NOWAIT
> is supported or not.

So even though the filesystem supports it and the app can avoid
blocking on filesystem locks (the biggest problem they have by far),
we're going to prevent the filesystems from being non-blocking
because the underlying block device isn't non-blocking?

That makes no sense to me at all.

> > I'm also concerned that this is highly hardware dependent - two
> > identical filesystems on different storage hardware on the same
> > machine could behave differently. i.e. it works on one filesystem
> > but not on the other, and there's no way to tell when it will work
> > or fail apart from trying to use RWF_NOWAIT?
> 
> I was not too happy getting it all the way down to block layer either.
> The multi-devices makes it worse. However, here we are and we need to
> tell the user that RWF_NOWAIT is not supported in this environment.

RWF_NOWAIT matters for filesystems much more than the underlying
block device. If the application is accessing the block device
directly, then yes, RWF_NOWAIT support in the block device matters.
But when the IO is being done through the filesystem it's far more
important to avoid blocking on filesystem locks than whatever the
block device does....

Hence I think that if the bio is coming from a filesystem,
REQ_NOWAIT should always be accepted or bounced with EAGAIN and
never failed with EOPNOTSUPP. It just makes no sense at all for
filesystem-based IO....

Cheers,

Dave.
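
One possible reading of the policy proposed above, as a minimal userspace sketch. All `mock_*` types, the `from_fs` marker, and the choice to silently drop the flag for filesystem I/O are assumptions made for illustration; nothing here is an existing kernel interface.

```c
#include <assert.h>	/* only needed for the usage checks shown below */
#include <stdbool.h>

enum mock_blk_status { MOCK_STS_OK, MOCK_STS_NOTSUPP, MOCK_STS_AGAIN };

struct mock_bio {
	bool nowait;	/* models REQ_NOWAIT in bi_opf */
	bool from_fs;	/* hypothetical: set when a filesystem issued the bio */
};

struct mock_dev {
	bool supports_nowait;
	bool would_block;	/* e.g. no free request available right now */
};

enum mock_blk_status mock_submit(struct mock_bio *bio, const struct mock_dev *dev)
{
	if (bio->nowait && !dev->supports_nowait) {
		/* Direct block device users still learn that it is unsupported. */
		if (!bio->from_fs)
			return MOCK_STS_NOTSUPP;
		/* Filesystem I/O: ignore the flag rather than fail the request. */
		bio->nowait = false;
	}

	/* A device that honours the flag bounces instead of waiting. */
	if (bio->nowait && dev->would_block)
		return MOCK_STS_AGAIN;

	return MOCK_STS_OK;	/* accepted; a blocking bio may wait in reality */
}
```

Under this model a filesystem bio only ever sees success or `MOCK_STS_AGAIN`, while the NOTSUPP status remains visible to callers that open the block device directly.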
Christoph Hellwig Dec. 17, 2018, 5:38 p.m. UTC | #10
On Sun, Dec 16, 2018 at 12:45:19PM +0200, Avi Kivity wrote:
> I did see this on XFS too. The whole thing bothers me, it doesn't happen
> consistently in some setups, which I don't understand. Either it should
> trigger always or never.

Well, if it also happens in XFS the above change isn't going to fix
it alone, there must be another issue hiding in addition to the error
conversion problems.
Goldwyn Rodrigues Dec. 18, 2018, 11:53 a.m. UTC | #11
On  8:35 17/12, Dave Chinner wrote:
> > 
> > I was not too happy getting it all the way down to block layer either.
> > The multi-devices makes it worse. However, here we are and we need to
> > tell the user that RWF_NOWAIT is not supported in this environment.
> 
> RWF_NOWAIT matters for filesystems much more than the underlying
> block device. If the application is accessing the block device
> directly, then yes, RWF_NOWAIT support in the block device matters.
> But when the IO is being done through the filesystem it's far more
> important to avoid blocking on filesystem locks than whatever the
> block device does....
> 
> Hence I think that if the bio is coming from a filesystem,
> REQ_NOWAIT should always be accepted or bounced with EAGAIN and
> never failed with EOPNOTSUPP. It just makes no sense at all for
> filesystem-based IO....

It was initially suggested that the block layer would retry getting
a bio in get_request(). While request-based devices were fine, the
bio-based ones such as MD needed extra work. However, when I actually got
down to writing code for multi-device, I ran into more hurdles than
solutions, primarily in the area of bio merging.

RWF_NOWAIT should have been restricted to filesystems and I think we
should do away with (or at least ignore) REQ_NOWAIT for now.
Goldwyn Rodrigues Dec. 18, 2018, 11:55 a.m. UTC | #12
On  9:38 17/12, Christoph Hellwig wrote:
> On Sun, Dec 16, 2018 at 12:45:19PM +0200, Avi Kivity wrote:
> > I did see this on XFS too. The whole thing bothers me, it doesn't happen
> > consistently in some setups, which I don't understand. Either it should
> > trigger always or never.
> 
> Well, if it also happens in XFS the above change isn't going to fix
> it alone, there must be another issue hiding in addition to the error
> conversion problems.

Are you using a multi-device setup as your block device? That could make
it return EOPNOTSUPP since we never got to a point where we could
merge code which supported bio based devices.
Avi Kivity Dec. 20, 2018, 3:32 p.m. UTC | #13
On 12/18/18 1:55 PM, Goldwyn Rodrigues wrote:
> On  9:38 17/12, Christoph Hellwig wrote:
>> On Sun, Dec 16, 2018 at 12:45:19PM +0200, Avi Kivity wrote:
>>> I did see this on XFS too. The whole thing bothers me, it doesn't happen
>>> consistently in some setups, which I don't understand. Either it should
>>> trigger always or never.
>> Well, if it also happens in XFS the above change isn't going to fix
>> it alone, there must be another issue hiding in addition to the error
>> conversion problems.
> Are you using a multi-device setup as your block device? That could make
> it return EOPNOTSUPP since we never got to a point where we could
> merge code which supported bio based devices.
>

Yes, an lvm linear device on top of a single SATA SSD.
Avi Kivity July 22, 2020, 4:08 p.m. UTC | #14
On 13/12/2018 13.53, Goldwyn Rodrigues wrote:
> For AIO+DIO with RWF_NOWAIT, if the block layer does not support REQ_NOWAIT,
> it returns EIO. Return EOPNOTSUPP to represent the correct error code.
>
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>   fs/direct-io.c | 11 +++++++----
>   1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index 41a0e97252ae..77adf33916b8 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -542,10 +542,13 @@ static blk_status_t dio_bio_complete(struct dio *dio, struct bio *bio)
>   	blk_status_t err = bio->bi_status;
>   
>   	if (err) {
> -		if (err == BLK_STS_AGAIN && (bio->bi_opf & REQ_NOWAIT))
> -			dio->io_error = -EAGAIN;
> -		else
> -			dio->io_error = -EIO;
> +		dio->io_error = -EIO;
> +		if (bio->bi_opf & REQ_NOWAIT) {
> +			if (err == BLK_STS_AGAIN)
> +				dio->io_error = -EAGAIN;
> +			else if (err == BLK_STS_NOTSUPP)
> +				dio->io_error = -EOPNOTSUPP;
> +		}
>   	}
>   
>   	if (dio->is_async && dio->op == REQ_OP_READ && dio->should_dirty) {


In the end, did this or some alternative get applied? I'd like to enable 
RWF_NOWAIT support, but EIO scares me and my application.
Goldwyn Rodrigues July 28, 2020, 1:38 p.m. UTC | #15
On 19:08 22/07, Avi Kivity wrote:
> 
> On 13/12/2018 13.53, Goldwyn Rodrigues wrote:
> > For AIO+DIO with RWF_NOWAIT, if the block layer does not support REQ_NOWAIT,
> > it returns EIO. Return EOPNOTSUPP to represent the correct error code.
> > 
> > Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> > ---
> >   fs/direct-io.c | 11 +++++++----
> >   1 file changed, 7 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/direct-io.c b/fs/direct-io.c
> > index 41a0e97252ae..77adf33916b8 100644
> > --- a/fs/direct-io.c
> > +++ b/fs/direct-io.c
> > @@ -542,10 +542,13 @@ static blk_status_t dio_bio_complete(struct dio *dio, struct bio *bio)
> >   	blk_status_t err = bio->bi_status;
> >   	if (err) {
> > -		if (err == BLK_STS_AGAIN && (bio->bi_opf & REQ_NOWAIT))
> > -			dio->io_error = -EAGAIN;
> > -		else
> > -			dio->io_error = -EIO;
> > +		dio->io_error = -EIO;
> > +		if (bio->bi_opf & REQ_NOWAIT) {
> > +			if (err == BLK_STS_AGAIN)
> > +				dio->io_error = -EAGAIN;
> > +			else if (err == BLK_STS_NOTSUPP)
> > +				dio->io_error = -EOPNOTSUPP;
> > +		}
> >   	}
> >   	if (dio->is_async && dio->op == REQ_OP_READ && dio->should_dirty) {
> 
> 
> In the end, did this or some alternative get applied? I'd like to enable
> RWF_NOWAIT support, but EIO scares me and my application.
> 

No, it was not. There were a lot of objections to returning an error from the
block layer for a filesystem nowait request.
Avi Kivity July 28, 2020, 1:47 p.m. UTC | #16
On 28/07/2020 16.38, Goldwyn Rodrigues wrote:
> On 19:08 22/07, Avi Kivity wrote:
>> On 13/12/2018 13.53, Goldwyn Rodrigues wrote:
>>> For AIO+DIO with RWF_NOWAIT, if the block layer does not support REQ_NOWAIT,
>>> it returns EIO. Return EOPNOTSUPP to represent the correct error code.
>>>
>>> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
>>> ---
>>>    fs/direct-io.c | 11 +++++++----
>>>    1 file changed, 7 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/direct-io.c b/fs/direct-io.c
>>> index 41a0e97252ae..77adf33916b8 100644
>>> --- a/fs/direct-io.c
>>> +++ b/fs/direct-io.c
>>> @@ -542,10 +542,13 @@ static blk_status_t dio_bio_complete(struct dio *dio, struct bio *bio)
>>>    	blk_status_t err = bio->bi_status;
>>>    	if (err) {
>>> -		if (err == BLK_STS_AGAIN && (bio->bi_opf & REQ_NOWAIT))
>>> -			dio->io_error = -EAGAIN;
>>> -		else
>>> -			dio->io_error = -EIO;
>>> +		dio->io_error = -EIO;
>>> +		if (bio->bi_opf & REQ_NOWAIT) {
>>> +			if (err == BLK_STS_AGAIN)
>>> +				dio->io_error = -EAGAIN;
>>> +			else if (err == BLK_STS_NOTSUPP)
>>> +				dio->io_error = -EOPNOTSUPP;
>>> +		}
>>>    	}
>>>    	if (dio->is_async && dio->op == REQ_OP_READ && dio->should_dirty) {
>>
>> In the end, did this or some alternative get applied? I'd like to enable
>> RWF_NOWAIT support, but EIO scares me and my application.
>>
> No, it was not. There were a lot of objections to returning an error from the
> block layer for a filesystem nowait request.
>

I see. For me, it makes RWF_NOWAIT unusable, since I have no way to 
distinguish between real EIO and EIO due to this bug.


Maybe the filesystem should ask the block device if it supports nowait 
ahead of time (during mounting), and not pass REQ_NOWAIT at all in those 
cases.
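
The mount-time idea could look roughly like this in a userspace model. The `mock_*` structures, `mock_mount()`, and the flag value are hypothetical and only illustrate caching the capability once and consulting it when building the bio flags.

```c
#include <assert.h>	/* only needed for the usage checks shown below */
#include <stdbool.h>

#define MOCK_REQ_NOWAIT (1u << 0)	/* hypothetical bit, not the kernel's value */

struct mock_bdev {
	bool supports_nowait;
};

struct mock_sb {
	bool bdev_nowait;	/* capability cached once at mount time */
};

/* At mount: ask the block device once and remember the answer. */
void mock_mount(struct mock_sb *sb, const struct mock_bdev *bdev)
{
	sb->bdev_nowait = bdev->supports_nowait;
}

/* At submission: only pass the flag down if the device can honour it. */
unsigned int mock_bio_flags(const struct mock_sb *sb, bool iocb_nowait)
{
	unsigned int opf = 0;

	if (iocb_nowait && sb->bdev_nowait)
		opf |= MOCK_REQ_NOWAIT;
	return opf;
}
```

On a device without nowait support the filesystem would then still avoid blocking on its own locks, but the bio goes down without `MOCK_REQ_NOWAIT`, so the block layer has no reason to fail it.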
Christoph Hellwig July 31, 2020, 1:11 p.m. UTC | #17
On Wed, Jul 22, 2020 at 07:08:21PM +0300, Avi Kivity wrote:
> 
> On 13/12/2018 13.53, Goldwyn Rodrigues wrote:
> > For AIO+DIO with RWF_NOWAIT, if the block layer does not support REQ_NOWAIT,
> > it returns EIO. Return EOPNOTSUPP to represent the correct error code.
> > 
> > Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

I think the main problem is the EOPNOTSUPP return value.  Everywhere
else we treat the lack of support as BLK_STS_AGAIN / -EAGAIN, so it
should return that.  Independent of that, the legacy direct I/O code
really should just use blk_status_to_errno like most of the other
infrastructure instead of having its own conversion and dropping
the detailed error status on the floor.

Patch

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 41a0e97252ae..77adf33916b8 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -542,10 +542,13 @@  static blk_status_t dio_bio_complete(struct dio *dio, struct bio *bio)
 	blk_status_t err = bio->bi_status;
 
 	if (err) {
-		if (err == BLK_STS_AGAIN && (bio->bi_opf & REQ_NOWAIT))
-			dio->io_error = -EAGAIN;
-		else
-			dio->io_error = -EIO;
+		dio->io_error = -EIO;
+		if (bio->bi_opf & REQ_NOWAIT) {
+			if (err == BLK_STS_AGAIN)
+				dio->io_error = -EAGAIN;
+			else if (err == BLK_STS_NOTSUPP)
+				dio->io_error = -EOPNOTSUPP;
+		}
 	}
 
 	if (dio->is_async && dio->op == REQ_OP_READ && dio->should_dirty) {
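
A minimal userspace sketch of the hunk above, for comparing the old and new mappings side by side. The `MOCK_STS_*` values and both function names are hypothetical stand-ins for `blk_status_t` and for the logic inside `dio_bio_complete()`; this is not kernel code, and returning 0 for a successful bio is a simplification of "leave `dio->io_error` untouched".

```c
#include <assert.h>	/* only needed for the usage checks shown below */
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's blk_status_t values. */
enum mock_blk_status {
	MOCK_STS_OK = 0,
	MOCK_STS_NOTSUPP,
	MOCK_STS_AGAIN,
	MOCK_STS_IOERR,
};

/* Before the patch: only a NOWAIT bio failing with AGAIN escapes -EIO. */
int dio_errno_before(enum mock_blk_status err, bool nowait)
{
	if (err == MOCK_STS_OK)
		return 0;
	if (err == MOCK_STS_AGAIN && nowait)
		return -EAGAIN;
	return -EIO;
}

/* With the patch: NOTSUPP on a NOWAIT bio becomes -EOPNOTSUPP. */
int dio_errno_after(enum mock_blk_status err, bool nowait)
{
	int ret;

	if (err == MOCK_STS_OK)
		return 0;

	ret = -EIO;
	if (nowait) {
		if (err == MOCK_STS_AGAIN)
			ret = -EAGAIN;
		else if (err == MOCK_STS_NOTSUPP)
			ret = -EOPNOTSUPP;
	}
	return ret;
}
```

The only observable difference is the unsupported case: `assert(dio_errno_before(MOCK_STS_NOTSUPP, true) == -EIO);` and `assert(dio_errno_after(MOCK_STS_NOTSUPP, true) == -EOPNOTSUPP);` both hold, while every other input maps to the same value in both versions.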