xfs: Don't trim file extents during iomap

Message ID	1511437203-4329-1-git-send-email-nborisov@suse.com (mailing list archive)
State	Deferred, archived
Headers	show Return-Path: <linux-xfs-owner@kernel.org> From: Nikolay Borisov <nborisov@suse.com> To: linux-xfs@vger.kernel.org, hch@infradead.org Cc: sandeen@redhat.com, darrick.wong@oracle.com, Nikolay Borisov <nborisov@suse.com> Subject: [PATCH] xfs: Don't trim file extents during iomap Date: Thu, 23 Nov 2017 13:40:03 +0200 Message-Id: <1511437203-4329-1-git-send-email-nborisov@suse.com> Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk

Nikolay Borisov Nov. 23, 2017, 11:40 a.m. UTC

For file extents xfs currently calls xfs_bmapi_read with flag set to 0,
meaning extents are going to be truncated to the requested range. Since the
same codepath is used for fiemap this means xfs is special in that regard, since
other filesystems (ext4/btrfs) do not trim extents for fiemap. Make the behavior
consistent by always passing XFS_BMAPI_ENTIRE. A full xfstest run validated that
this doesn't regress on ordinary read/write IO.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/xfs/xfs_iomap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

'Christoph Hellwig' Nov. 23, 2017, 1:26 p.m. UTC | #1

On Thu, Nov 23, 2017 at 01:40:03PM +0200, Nikolay Borisov wrote:
> For file extents xfs currently calls xfs_bmapi_read with flag set to 0,
> meaning extents are going to be truncated to the requested range. Since the
> same codepath is used for fiemap this means xfs is special in that regard, since
> other filesystems (ext4/btrfs) do not trim extents for fiemap. Make the behavior
> consistent by always passing XFS_BMAPI_ENTIRE. A full xfstest run validated that
> this doesn't regress on ordinary read/write IO.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

Looks good, except that the changelog could use line wrapping at
75 characters :)

Reviewed-by: Christoph Hellwig <hch@lst.de>
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Darrick J. Wong Nov. 27, 2017, 5:44 p.m. UTC | #2

On Thu, Nov 23, 2017 at 01:40:03PM +0200, Nikolay Borisov wrote:
> For file extents xfs currently calls xfs_bmapi_read with flag set to 0,
> meaning extents are going to be truncated to the requested range. Since the
> same codepath is used for fiemap this means xfs is special in that regard, since
> other filesystems (ext4/btrfs) do not trim extents for fiemap. Make the behavior
> consistent by always passing XFS_BMAPI_ENTIRE. A full xfstest run validated that
> this doesn't regress on ordinary read/write IO.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

Looks ok, will also test...
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

> ---
>  fs/xfs/xfs_iomap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index f179bdf1644d..8942324a4d3d 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1008,7 +1008,7 @@ xfs_file_iomap_begin(
>  	end_fsb = XFS_B_TO_FSB(mp, offset + length);
>  
>  	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
> -			       &nimaps, 0);
> +			       &nimaps, XFS_BMAPI_ENTIRE);
>  	if (error)
>  		goto out_unlock;
>  
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Darrick J. Wong Nov. 28, 2017, 1:47 a.m. UTC | #3

On Mon, Nov 27, 2017 at 09:44:34AM -0800, Darrick J. Wong wrote:
> On Thu, Nov 23, 2017 at 01:40:03PM +0200, Nikolay Borisov wrote:
> > For file extents xfs currently calls xfs_bmapi_read with flag set to 0,
> > meaning extents are going to be truncated to the requested range. Since the
> > same codepath is used for fiemap this means xfs is special in that regard, since
> > other filesystems (ext4/btrfs) do not trim extents for fiemap. Make the behavior
> > consistent by always passing XFS_BMAPI_ENTIRE. A full xfstest run validated that
> > this doesn't regress on ordinary read/write IO.
> > 
> > Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> 
> Looks ok, will also test...
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

...and now withdrawn.

Looking at iomap_apply, it seems we call ->iomap_begin, and at line 73 we
trim an iomap that is longer than the range we requested:

/*
 * Cut down the length to the one actually provided by the filesystem,
 * as it might not be able to give us the whole size that we requested.
 */
if (iomap.offset + iomap.length < pos + length)
	length = iomap.offset + iomap.length - pos;

However, we do not trim the beginning off an iomap that starts before
the 'pos' we passed to ->iomap_begin, so we're left with an iomap that
can begin before the range.

FWIW I ran xfstests (like I told you to on IRC) and saw regressions
in a whole bunch of tests:

generic/170 generic/287 generic/295 generic/326 generic/330 generic/333
generic/372 xfs/214 xfs/237 xfs/249 xfs/258 xfs/346 xfs/420 xfs/421

From a brief glance it looks as though most of the iomap_apply'ers
try to trim the iomap before using it, but clearly there's something
wrong here.

--D

> 
> > ---
> >  fs/xfs/xfs_iomap.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index f179bdf1644d..8942324a4d3d 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -1008,7 +1008,7 @@ xfs_file_iomap_begin(
> >  	end_fsb = XFS_B_TO_FSB(mp, offset + length);
> >  
> >  	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
> > -			       &nimaps, 0);
> > +			       &nimaps, XFS_BMAPI_ENTIRE);
> >  	if (error)
> >  		goto out_unlock;
> >  
> > -- 
> > 2.7.4
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Nikolay Borisov Nov. 28, 2017, 6:46 a.m. UTC | #4

On 28.11.2017 03:47, Darrick J. Wong wrote:
> On Mon, Nov 27, 2017 at 09:44:34AM -0800, Darrick J. Wong wrote:
>> On Thu, Nov 23, 2017 at 01:40:03PM +0200, Nikolay Borisov wrote:
>>> For file extents xfs currently calls xfs_bmapi_read with flag set to 0,
>>> meaning extents are going to be truncated to the requested range. Since the
>>> same codepath is used for fiemap this means xfs is special in that regard, since
>>> other filesystems (ext4/btrfs) do not trim extents for fiemap. Make the behavior
>>> consistent by always passing XFS_BMAPI_ENTIRE. A full xfstest run validated that
>>> this doesn't regress on ordinary read/write IO.
>>>
>>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>>
>> Looks ok, will also test...
>> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> ...and now withdrawn.
> 
> Looking at iomap_apply, it seems we call ->iomap_begin, and at line 73 we
> trim an iomap that is longer than the range we requested:
> 
> /*
>  * Cut down the length to the one actually provided by the filesystem,
>  * as it might not be able to give us the whole size that we requested.
>  */
> if (iomap.offset + iomap.length < pos + length)
> 	length = iomap.offset + iomap.length - pos;
> 
> However, we do not trim the beginning off an iomap that starts before
> the 'pos' we passed to ->iomap_begin, so we're left with an iomap that
> can begin before the range.
> 
> FWIW I ran xfstests (like I told you to on IRC) and saw regressions
> in a whole bunch of tests:
> 
> generic/170 generic/287 generic/295 generic/326 generic/330 generic/333
> generic/372 xfs/214 xfs/237 xfs/249 xfs/258 xfs/346 xfs/420 xfs/421
> 
> From a brief glance it looks as though most of the iomap_apply'ers
> try to trim the iomap before using it, but clearly there's something
> wrong here.

Strange, I didn't see those failures o_O. Anyways, how about using
XFS_BMAPI_ENTIRE in case when IOMAP_REPORT is passed i.e the first
iteration of this patch.

> 
> --D
> 
>>
>>> ---
>>>  fs/xfs/xfs_iomap.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
>>> index f179bdf1644d..8942324a4d3d 100644
>>> --- a/fs/xfs/xfs_iomap.c
>>> +++ b/fs/xfs/xfs_iomap.c
>>> @@ -1008,7 +1008,7 @@ xfs_file_iomap_begin(
>>>  	end_fsb = XFS_B_TO_FSB(mp, offset + length);
>>>  
>>>  	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
>>> -			       &nimaps, 0);
>>> +			       &nimaps, XFS_BMAPI_ENTIRE);
>>>  	if (error)
>>>  		goto out_unlock;
>>>  
>>> -- 
>>> 2.7.4
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Nikolay Borisov Nov. 28, 2017, 7:05 a.m. UTC | #5

On 28.11.2017 03:47, Darrick J. Wong wrote:
> generic/170 generic/287 generic/295 generic/326 generic/330 generic/333
> generic/372 xfs/214 xfs/237 xfs/249 xfs/258 xfs/346 xfs/420 xfs/421

right, so all of these are reflink tests which aren't run in my
configuration
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Nikolay Borisov Nov. 28, 2017, 9:17 a.m. UTC | #6

On 28.11.2017 03:47, Darrick J. Wong wrote:
> On Mon, Nov 27, 2017 at 09:44:34AM -0800, Darrick J. Wong wrote:
>> On Thu, Nov 23, 2017 at 01:40:03PM +0200, Nikolay Borisov wrote:
>>> For file extents xfs currently calls xfs_bmapi_read with flag set to 0,
>>> meaning extents are going to be truncated to the requested range. Since the
>>> same codepath is used for fiemap this means xfs is special in that regard, since
>>> other filesystems (ext4/btrfs) do not trim extents for fiemap. Make the behavior
>>> consistent by always passing XFS_BMAPI_ENTIRE. A full xfstest run validated that
>>> this doesn't regress on ordinary read/write IO.
>>>
>>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>>
>> Looks ok, will also test...
>> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> ...and now withdrawn.
> 
> Looking at iomap_apply, it seems we call ->iomap_begin, and at line 73 we
> trim an iomap that is longer than the range we requested:
> 
> /*
>  * Cut down the length to the one actually provided by the filesystem,
>  * as it might not be able to give us the whole size that we requested.
>  */
> if (iomap.offset + iomap.length < pos + length)
> 	length = iomap.offset + iomap.length - pos;

This actually trims the requested length to match what iomap has given
is. I.e if we request a range between [50, 100] (length 50, offset 50)
but iomap returns

[40,60] (offset 40, length 20) then the trimmer range is :

[50, 10]. SO it's actually trimming the requested range against what was
actually returned by the filesystem not the other way around.


> 
> However, we do not trim the beginning off an iomap that starts before
> the 'pos' we passed to ->iomap_begin, so we're left with an iomap that
> can begin before the range.
> 
> FWIW I ran xfstests (like I told you to on IRC) and saw regressions
> in a whole bunch of tests:
> 
> generic/170 generic/287 generic/295 generic/326 generic/330 generic/333
> generic/372 xfs/214 xfs/237 xfs/249 xfs/258 xfs/346 xfs/420 xfs/421
> 
> From a brief glance it looks as though most of the iomap_apply'ers
> try to trim the iomap before using it, but clearly there's something
> wrong here.
> 
> --D
> 
>>
>>> ---
>>>  fs/xfs/xfs_iomap.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
>>> index f179bdf1644d..8942324a4d3d 100644
>>> --- a/fs/xfs/xfs_iomap.c
>>> +++ b/fs/xfs/xfs_iomap.c
>>> @@ -1008,7 +1008,7 @@ xfs_file_iomap_begin(
>>>  	end_fsb = XFS_B_TO_FSB(mp, offset + length);
>>>  
>>>  	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
>>> -			       &nimaps, 0);
>>> +			       &nimaps, XFS_BMAPI_ENTIRE);
>>>  	if (error)
>>>  		goto out_unlock;
>>>  
>>> -- 
>>> 2.7.4
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Darrick J. Wong Nov. 29, 2017, 2:30 a.m. UTC | #7

On Tue, Nov 28, 2017 at 08:46:27AM +0200, Nikolay Borisov wrote:
> 
> 
> On 28.11.2017 03:47, Darrick J. Wong wrote:
> > On Mon, Nov 27, 2017 at 09:44:34AM -0800, Darrick J. Wong wrote:
> >> On Thu, Nov 23, 2017 at 01:40:03PM +0200, Nikolay Borisov wrote:
> >>> For file extents xfs currently calls xfs_bmapi_read with flag set to 0,
> >>> meaning extents are going to be truncated to the requested range. Since the
> >>> same codepath is used for fiemap this means xfs is special in that regard, since
> >>> other filesystems (ext4/btrfs) do not trim extents for fiemap. Make the behavior
> >>> consistent by always passing XFS_BMAPI_ENTIRE. A full xfstest run validated that
> >>> this doesn't regress on ordinary read/write IO.
> >>>
> >>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> >>
> >> Looks ok, will also test...
> >> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > ...and now withdrawn.
> > 
> > Looking at iomap_apply, it seems we call ->iomap_begin, and at line 73 we
> > trim an iomap that is longer than the range we requested:
> > 
> > /*
> >  * Cut down the length to the one actually provided by the filesystem,
> >  * as it might not be able to give us the whole size that we requested.
> >  */
> > if (iomap.offset + iomap.length < pos + length)
> > 	length = iomap.offset + iomap.length - pos;
> > 
> > However, we do not trim the beginning off an iomap that starts before
> > the 'pos' we passed to ->iomap_begin, so we're left with an iomap that
> > can begin before the range.
> > 
> > FWIW I ran xfstests (like I told you to on IRC) and saw regressions
> > in a whole bunch of tests:
> > 
> > generic/170 generic/287 generic/295 generic/326 generic/330 generic/333
> > generic/372 xfs/214 xfs/237 xfs/249 xfs/258 xfs/346 xfs/420 xfs/421
> > 
> > From a brief glance it looks as though most of the iomap_apply'ers
> > try to trim the iomap before using it, but clearly there's something
> > wrong here.
> 
> Strange, I didn't see those failures o_O. Anyways, how about using
> XFS_BMAPI_ENTIRE in case when IOMAP_REPORT is passed i.e the first
> iteration of this patch.

Hey Christoph, what are the rules about ->iomap_begin passing back an
extent that extends outside the range that was requested?  Mr. Borisov
wants XFS fiemap to return the raw extent without trimming (apparently
to follow the ext4 fiemap behavior), but enabling XFS_BMAPI_ENTIRE
unconditionally causes xfstests regressions, which we cannot have.

I /guess/ we could trim if IOMAP_REPORT is set (a la the first patch),
but then we have the situation where the xfs iomap_begin can return more
of an answer than was asked for, depending on the passed in flags.

Soooo... since you're the author of the vfs iomap patchset, I'm kicking
this question to you?   Does the iomap code require that the returned
&iomap cannot extend beyond the requested pos/length?  Is it supposed to
trim the iomap?  Or... what?

(I dunno, it's FIEMAP, which only says that the returned extent /may/
start before and /may/ end after.)

--D

> > 
> > --D
> > 
> >>
> >>> ---
> >>>  fs/xfs/xfs_iomap.c | 2 +-
> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> >>> index f179bdf1644d..8942324a4d3d 100644
> >>> --- a/fs/xfs/xfs_iomap.c
> >>> +++ b/fs/xfs/xfs_iomap.c
> >>> @@ -1008,7 +1008,7 @@ xfs_file_iomap_begin(
> >>>  	end_fsb = XFS_B_TO_FSB(mp, offset + length);
> >>>  
> >>>  	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
> >>> -			       &nimaps, 0);
> >>> +			       &nimaps, XFS_BMAPI_ENTIRE);
> >>>  	if (error)
> >>>  		goto out_unlock;
> >>>  
> >>> -- 
> >>> 2.7.4
> >>>
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> >>> the body of a message to majordomo@vger.kernel.org
> >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

'Christoph Hellwig' Nov. 29, 2017, 7:57 p.m. UTC | #8

On Tue, Nov 28, 2017 at 06:30:34PM -0800, Darrick J. Wong wrote:
> Hey Christoph, what are the rules about ->iomap_begin passing back an
> extent that extends outside the range that was requested?  Mr. Borisov
> wants XFS fiemap to return the raw extent without trimming (apparently
> to follow the ext4 fiemap behavior), but enabling XFS_BMAPI_ENTIRE
> unconditionally causes xfstests regressions, which we cannot have.

No rules yet as you see.  But I think the best plan in the long run
is that the iomap.c code will trim any mapping if needed instead of having
duplicate instance of the trimming code in the file systems.
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

xfs: Don't trim file extents during iomap

Commit Message

Comments

Patch