direct-io: Fix negative return from dio read beyond eof
diff mbox

Message ID 1447964734-16010-1-git-send-email-jack@suse.cz
State New
Headers show

Commit Message

Jan Kara Nov. 19, 2015, 8:25 p.m. UTC
Assume a filesystem with 4KB blocks. When a file has size 1000 bytes and
we issue direct IO read at offset 1024, blockdev_direct_IO() reads the
tail of the last block and the logic for handling short DIO reads in
dio_complete() results in a return value -24 (1000 - 1024) which
obviously confuses userspace.

Fix the problem by bailing out early once we sample i_size and can
reliably check that direct IO read starts beyond i_size.

Reported-by: Avi Kivity <avi@scylladb.com>
Fixes: 9fe55eea7e4b444bafc42fa0000cc2d1d2847275
CC: stable@vger.kernel.org
CC: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/direct-io.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Avi, this patch fixes the issue for me.

								Honza

Comments

Jan Kara Nov. 30, 2015, 1:10 p.m. UTC | #1
On Thu 19-11-15 21:25:34, Jan Kara wrote:
> Assume a filesystem with 4KB blocks. When a file has size 1000 bytes and
> we issue direct IO read at offset 1024, blockdev_direct_IO() reads the
> tail of the last block and the logic for handling short DIO reads in
> dio_complete() results in a return value -24 (1000 - 1024) which
> obviously confuses userspace.
> 
> Fix the problem by bailing out early once we sample i_size and can
> reliably check that direct IO read starts beyond i_size.
> 
> Reported-by: Avi Kivity <avi@scylladb.com>
> Fixes: 9fe55eea7e4b444bafc42fa0000cc2d1d2847275
> CC: stable@vger.kernel.org
> CC: Steven Whitehouse <swhiteho@redhat.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/direct-io.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> Avi, this patch fixes the issue for me.

Jens, can you pick up this fix please? Thanks!

								Honza

> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index 18e7554cf94c..08094c9d8172 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -1163,6 +1163,15 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
>  		}
>  	}
>  
> +	/* Once we sampled i_size check for reads beyond EOF */
> +	dio->i_size = i_size_read(inode);
> +	if (iov_iter_rw(iter) == READ && offset >= dio->i_size) {
> +		if (dio->flags & DIO_LOCKING)
> +			mutex_unlock(&inode->i_mutex);
> +		kmem_cache_free(dio_cache, dio);
> +		goto out;
> +	}
> +
>  	/*
>  	 * For file extending writes updating i_size before data writeouts
>  	 * complete can expose uninitialized blocks in dumb filesystems.
> @@ -1216,7 +1225,6 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
>  	sdio.next_block_for_io = -1;
>  
>  	dio->iocb = iocb;
> -	dio->i_size = i_size_read(inode);
>  
>  	spin_lock_init(&dio->bio_lock);
>  	dio->refcount = 1;
> -- 
> 2.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jens Axboe Nov. 30, 2015, 5:16 p.m. UTC | #2
On 11/30/2015 06:10 AM, Jan Kara wrote:
> On Thu 19-11-15 21:25:34, Jan Kara wrote:
>> Assume a filesystem with 4KB blocks. When a file has size 1000 bytes and
>> we issue direct IO read at offset 1024, blockdev_direct_IO() reads the
>> tail of the last block and the logic for handling short DIO reads in
>> dio_complete() results in a return value -24 (1000 - 1024) which
>> obviously confuses userspace.
>>
>> Fix the problem by bailing out early once we sample i_size and can
>> reliably check that direct IO read starts beyond i_size.
>>
>> Reported-by: Avi Kivity <avi@scylladb.com>
>> Fixes: 9fe55eea7e4b444bafc42fa0000cc2d1d2847275
>> CC: stable@vger.kernel.org
>> CC: Steven Whitehouse <swhiteho@redhat.com>
>> Signed-off-by: Jan Kara <jack@suse.cz>
>> ---
>>   fs/direct-io.c | 10 +++++++++-
>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> Avi, this patch fixes the issue for me.
>
> Jens, can you pick up this fix please? Thanks!

Yup, added for 4.4, thanks!
Avi Kivity Jan. 27, 2016, 10:38 a.m. UTC | #3
On 11/19/2015 10:25 PM, Jan Kara wrote:
> Assume a filesystem with 4KB blocks. When a file has size 1000 bytes and
> we issue direct IO read at offset 1024, blockdev_direct_IO() reads the
> tail of the last block and the logic for handling short DIO reads in
> dio_complete() results in a return value -24 (1000 - 1024) which
> obviously confuses userspace.
>
> Fix the problem by bailing out early once we sample i_size and can
> reliably check that direct IO read starts beyond i_size.
>
> Reported-by: Avi Kivity <avi@scylladb.com>
> Fixes: 9fe55eea7e4b444bafc42fa0000cc2d1d2847275
> CC: stable@vger.kernel.org

While this patch made it into upstream, it did not appear in 4.3.4. Did 
it slip through the proverbial cracks?  Can it be queued for 4.3.5?

> CC: Steven Whitehouse <swhiteho@redhat.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>   fs/direct-io.c | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)
>
> Avi, this patch fixes the issue for me.
>
> 								Honza
>
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index 18e7554cf94c..08094c9d8172 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -1163,6 +1163,15 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
>   		}
>   	}
>   
> +	/* Once we sampled i_size check for reads beyond EOF */
> +	dio->i_size = i_size_read(inode);
> +	if (iov_iter_rw(iter) == READ && offset >= dio->i_size) {
> +		if (dio->flags & DIO_LOCKING)
> +			mutex_unlock(&inode->i_mutex);
> +		kmem_cache_free(dio_cache, dio);
> +		goto out;
> +	}
> +
>   	/*
>   	 * For file extending writes updating i_size before data writeouts
>   	 * complete can expose uninitialized blocks in dumb filesystems.
> @@ -1216,7 +1225,6 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
>   	sdio.next_block_for_io = -1;
>   
>   	dio->iocb = iocb;
> -	dio->i_size = i_size_read(inode);
>   
>   	spin_lock_init(&dio->bio_lock);
>   	dio->refcount = 1;

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg KH Jan. 27, 2016, 5:13 p.m. UTC | #4
On Wed, Jan 27, 2016 at 12:38:14PM +0200, Avi Kivity wrote:
> On 11/19/2015 10:25 PM, Jan Kara wrote:
> >Assume a filesystem with 4KB blocks. When a file has size 1000 bytes and
> >we issue direct IO read at offset 1024, blockdev_direct_IO() reads the
> >tail of the last block and the logic for handling short DIO reads in
> >dio_complete() results in a return value -24 (1000 - 1024) which
> >obviously confuses userspace.
> >
> >Fix the problem by bailing out early once we sample i_size and can
> >reliably check that direct IO read starts beyond i_size.
> >
> >Reported-by: Avi Kivity <avi@scylladb.com>
> >Fixes: 9fe55eea7e4b444bafc42fa0000cc2d1d2847275
> >CC: stable@vger.kernel.org
> 
> While this patch made it into upstream, it did not appear in 4.3.4. Did it
> slip through the proverbial cracks?  Can it be queued for 4.3.5?

There are over 400 patches right now in my queue that haven't made it
into a 4.3.x kernel.  These are in the queue, in good company :)

I'll go dig these out as I guess people care about them more than
others...

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity Jan. 27, 2016, 5:16 p.m. UTC | #5
On 01/27/2016 07:13 PM, Greg KH wrote:
> On Wed, Jan 27, 2016 at 12:38:14PM +0200, Avi Kivity wrote:
>> On 11/19/2015 10:25 PM, Jan Kara wrote:
>>> Assume a filesystem with 4KB blocks. When a file has size 1000 bytes and
>>> we issue direct IO read at offset 1024, blockdev_direct_IO() reads the
>>> tail of the last block and the logic for handling short DIO reads in
>>> dio_complete() results in a return value -24 (1000 - 1024) which
>>> obviously confuses userspace.
>>>
>>> Fix the problem by bailing out early once we sample i_size and can
>>> reliably check that direct IO read starts beyond i_size.
>>>
>>> Reported-by: Avi Kivity <avi@scylladb.com>
>>> Fixes: 9fe55eea7e4b444bafc42fa0000cc2d1d2847275
>>> CC: stable@vger.kernel.org
>> While this patch made it into upstream, it did not appear in 4.3.4. Did it
>> slip through the proverbial cracks?  Can it be queued for 4.3.5?
> There are over 400 patches right now in my queue that haven't made it
> into a 4.3.x kernel.

Many projects would consider 400 patches a major release, and here they 
are behind two dots...

>    These are in the queue, in good company :)
>
> I'll go dig these out as I guess people care about them more than
> others...
>
>

Much appreciated.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg KH Jan. 27, 2016, 5:45 p.m. UTC | #6
On Wed, Jan 27, 2016 at 07:16:32PM +0200, Avi Kivity wrote:
> On 01/27/2016 07:13 PM, Greg KH wrote:
> >On Wed, Jan 27, 2016 at 12:38:14PM +0200, Avi Kivity wrote:
> >>On 11/19/2015 10:25 PM, Jan Kara wrote:
> >>>Assume a filesystem with 4KB blocks. When a file has size 1000 bytes and
> >>>we issue direct IO read at offset 1024, blockdev_direct_IO() reads the
> >>>tail of the last block and the logic for handling short DIO reads in
> >>>dio_complete() results in a return value -24 (1000 - 1024) which
> >>>obviously confuses userspace.
> >>>
> >>>Fix the problem by bailing out early once we sample i_size and can
> >>>reliably check that direct IO read starts beyond i_size.
> >>>
> >>>Reported-by: Avi Kivity <avi@scylladb.com>
> >>>Fixes: 9fe55eea7e4b444bafc42fa0000cc2d1d2847275
> >>>CC: stable@vger.kernel.org
> >>While this patch made it into upstream, it did not appear in 4.3.4. Did it
> >>slip through the proverbial cracks?  Can it be queued for 4.3.5?
> >There are over 400 patches right now in my queue that haven't made it
> >into a 4.3.x kernel.
> 
> Many projects would consider 400 patches a major release, and here they are
> behind two dots...

Look at the percentages, it's just a tiny drop in the bucket compared to
what is hitting Linus's tree :)

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg KH Jan. 27, 2016, 5:46 p.m. UTC | #7
On Wed, Jan 27, 2016 at 07:16:32PM +0200, Avi Kivity wrote:
> >   These are in the queue, in good company :)
> >
> >I'll go dig these out as I guess people care about them more than
> >others...
> >
> >
> 
> Much appreciated.

Note, they didn't build on 3.14 so I had to revert both of them.  If you
could provide a fix for that release, that would be great.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity Jan. 27, 2016, 5:49 p.m. UTC | #8
On 01/27/2016 07:46 PM, Greg KH wrote:
> On Wed, Jan 27, 2016 at 07:16:32PM +0200, Avi Kivity wrote:
>>>    These are in the queue, in good company :)
>>>
>>> I'll go dig these out as I guess people care about them more than
>>> others...
>>>
>>>
>> Much appreciated.
> Note, they didn't build on 3.14

I think the regression was introduced post 14, let me check.

>   so I had to revert both of them.

Both of what?  There was only one patch.  Or did you mean you drop this 
patch for 4.3.5?

>    If you
> could provide a fix for that release, that would be great.
>


--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity Jan. 27, 2016, 5:52 p.m. UTC | #9
On 01/27/2016 07:49 PM, Avi Kivity wrote:
> On 01/27/2016 07:46 PM, Greg KH wrote:
>> On Wed, Jan 27, 2016 at 07:16:32PM +0200, Avi Kivity wrote:
>>>>    These are in the queue, in good company :)
>>>>
>>>> I'll go dig these out as I guess people care about them more than
>>>> others...
>>>>
>>>>
>>> Much appreciated.
>> Note, they didn't build on 3.14
>
> I think the regression was introduced post 14, let me check.
>

Actually it was introduced in 3.14-rc1.  I could try a backport, but it 
would be better if someone who actually knows the code did this.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg KH Jan. 27, 2016, 5:59 p.m. UTC | #10
On Wed, Jan 27, 2016 at 07:49:15PM +0200, Avi Kivity wrote:
> On 01/27/2016 07:46 PM, Greg KH wrote:
> >On Wed, Jan 27, 2016 at 07:16:32PM +0200, Avi Kivity wrote:
> >>>   These are in the queue, in good company :)
> >>>
> >>>I'll go dig these out as I guess people care about them more than
> >>>others...
> >>>
> >>>
> >>Much appreciated.
> >Note, they didn't build on 3.14
> 
> I think the regression was introduced post 14, let me check.

Not according to the stable tag in the commit :)

> >  so I had to revert both of them.
> 
> Both of what?  There was only one patch.  Or did you mean you drop this
> patch for 4.3.5?

There was a follow-on fix for this patch as well, commit
2d4594acbf6d8f75a27f3578476b6a27d8b13ebb.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch
diff mbox

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 18e7554cf94c..08094c9d8172 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -1163,6 +1163,15 @@  do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 		}
 	}
 
+	/* Once we sampled i_size check for reads beyond EOF */
+	dio->i_size = i_size_read(inode);
+	if (iov_iter_rw(iter) == READ && offset >= dio->i_size) {
+		if (dio->flags & DIO_LOCKING)
+			mutex_unlock(&inode->i_mutex);
+		kmem_cache_free(dio_cache, dio);
+		goto out;
+	}
+
 	/*
 	 * For file extending writes updating i_size before data writeouts
 	 * complete can expose uninitialized blocks in dumb filesystems.
@@ -1216,7 +1225,6 @@  do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	sdio.next_block_for_io = -1;
 
 	dio->iocb = iocb;
-	dio->i_size = i_size_read(inode);
 
 	spin_lock_init(&dio->bio_lock);
 	dio->refcount = 1;