Message ID | 1511437203-4329-1-git-send-email-nborisov@suse.com (mailing list archive) |
---|---|
State | Deferred, archived |
Headers | show |
On Thu, Nov 23, 2017 at 01:40:03PM +0200, Nikolay Borisov wrote: > For file extents xfs currently calls xfs_bmapi_read with flag set to 0, > meaning extents are going to be truncated to the requested range. Since the > same codepath is used for fiemap this means xfs is special in that regard, since > other filesystems (ext4/btrfs) do not trim extents for fiemap. Make the behavior > consistent by always passing XFS_BMAPI_ENTIRE. A full xfstest run validated that > this doesn't regress on ordinary read/write IO. > > Signed-off-by: Nikolay Borisov <nborisov@suse.com> Looks good, except that the changelog could use line wrapping at 75 characters :) Reviewed-by: Christoph Hellwig <hch@lst.de> -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Nov 23, 2017 at 01:40:03PM +0200, Nikolay Borisov wrote: > For file extents xfs currently calls xfs_bmapi_read with flag set to 0, > meaning extents are going to be truncated to the requested range. Since the > same codepath is used for fiemap this means xfs is special in that regard, since > other filesystems (ext4/btrfs) do not trim extents for fiemap. Make the behavior > consistent by always passing XFS_BMAPI_ENTIRE. A full xfstest run validated that > this doesn't regress on ordinary read/write IO. > > Signed-off-by: Nikolay Borisov <nborisov@suse.com> Looks ok, will also test... Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> > --- > fs/xfs/xfs_iomap.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c > index f179bdf1644d..8942324a4d3d 100644 > --- a/fs/xfs/xfs_iomap.c > +++ b/fs/xfs/xfs_iomap.c > @@ -1008,7 +1008,7 @@ xfs_file_iomap_begin( > end_fsb = XFS_B_TO_FSB(mp, offset + length); > > error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap, > - &nimaps, 0); > + &nimaps, XFS_BMAPI_ENTIRE); > if (error) > goto out_unlock; > > -- > 2.7.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Nov 27, 2017 at 09:44:34AM -0800, Darrick J. Wong wrote: > On Thu, Nov 23, 2017 at 01:40:03PM +0200, Nikolay Borisov wrote: > > For file extents xfs currently calls xfs_bmapi_read with flag set to 0, > > meaning extents are going to be truncated to the requested range. Since the > > same codepath is used for fiemap this means xfs is special in that regard, since > > other filesystems (ext4/btrfs) do not trim extents for fiemap. Make the behavior > > consistent by always passing XFS_BMAPI_ENTIRE. A full xfstest run validated that > > this doesn't regress on ordinary read/write IO. > > > > Signed-off-by: Nikolay Borisov <nborisov@suse.com> > > Looks ok, will also test... > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> ...and now withdrawn. Looking at iomap_apply, it seems we call ->iomap_begin, and at line 73 we trim an iomap that is longer than the range we requested: /* * Cut down the length to the one actually provided by the filesystem, * as it might not be able to give us the whole size that we requested. */ if (iomap.offset + iomap.length < pos + length) length = iomap.offset + iomap.length - pos; However, we do not trim the beginning off an iomap that starts before the 'pos' we passed to ->iomap_begin, so we're left with an iomap that can begin before the range. FWIW I ran xfstests (like I told you to on IRC) and saw regressions in a whole bunch of tests: generic/170 generic/287 generic/295 generic/326 generic/330 generic/333 generic/372 xfs/214 xfs/237 xfs/249 xfs/258 xfs/346 xfs/420 xfs/421 From a brief glance it looks as though most of the iomap_apply'ers try to trim the iomap before using it, but clearly there's something wrong here. --D > > > --- > > fs/xfs/xfs_iomap.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c > > index f179bdf1644d..8942324a4d3d 100644 > > --- a/fs/xfs/xfs_iomap.c > > +++ b/fs/xfs/xfs_iomap.c > > @@ -1008,7 +1008,7 @@ xfs_file_iomap_begin( > > end_fsb = XFS_B_TO_FSB(mp, offset + length); > > > > error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap, > > - &nimaps, 0); > > + &nimaps, XFS_BMAPI_ENTIRE); > > if (error) > > goto out_unlock; > > > > -- > > 2.7.4 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 28.11.2017 03:47, Darrick J. Wong wrote: > On Mon, Nov 27, 2017 at 09:44:34AM -0800, Darrick J. Wong wrote: >> On Thu, Nov 23, 2017 at 01:40:03PM +0200, Nikolay Borisov wrote: >>> For file extents xfs currently calls xfs_bmapi_read with flag set to 0, >>> meaning extents are going to be truncated to the requested range. Since the >>> same codepath is used for fiemap this means xfs is special in that regard, since >>> other filesystems (ext4/btrfs) do not trim extents for fiemap. Make the behavior >>> consistent by always passing XFS_BMAPI_ENTIRE. A full xfstest run validated that >>> this doesn't regress on ordinary read/write IO. >>> >>> Signed-off-by: Nikolay Borisov <nborisov@suse.com> >> >> Looks ok, will also test... >> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> > > ...and now withdrawn. > > Looking at iomap_apply, it seems we call ->iomap_begin, and at line 73 we > trim an iomap that is longer than the range we requested: > > /* > * Cut down the length to the one actually provided by the filesystem, > * as it might not be able to give us the whole size that we requested. > */ > if (iomap.offset + iomap.length < pos + length) > length = iomap.offset + iomap.length - pos; > > However, we do not trim the beginning off an iomap that starts before > the 'pos' we passed to ->iomap_begin, so we're left with an iomap that > can begin before the range. > > FWIW I ran xfstests (like I told you to on IRC) and saw regressions > in a whole bunch of tests: > > generic/170 generic/287 generic/295 generic/326 generic/330 generic/333 > generic/372 xfs/214 xfs/237 xfs/249 xfs/258 xfs/346 xfs/420 xfs/421 > > From a brief glance it looks as though most of the iomap_apply'ers > try to trim the iomap before using it, but clearly there's something > wrong here. Strange, I didn't see those failures o_O. Anyways, how about using XFS_BMAPI_ENTIRE in case when IOMAP_REPORT is passed i.e the first iteration of this patch. > > --D > >> >>> --- >>> fs/xfs/xfs_iomap.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c >>> index f179bdf1644d..8942324a4d3d 100644 >>> --- a/fs/xfs/xfs_iomap.c >>> +++ b/fs/xfs/xfs_iomap.c >>> @@ -1008,7 +1008,7 @@ xfs_file_iomap_begin( >>> end_fsb = XFS_B_TO_FSB(mp, offset + length); >>> >>> error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap, >>> - &nimaps, 0); >>> + &nimaps, XFS_BMAPI_ENTIRE); >>> if (error) >>> goto out_unlock; >>> >>> -- >>> 2.7.4 >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 28.11.2017 03:47, Darrick J. Wong wrote: > generic/170 generic/287 generic/295 generic/326 generic/330 generic/333 > generic/372 xfs/214 xfs/237 xfs/249 xfs/258 xfs/346 xfs/420 xfs/421 right, so all of these are reflink tests which aren't run in my configuration -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 28.11.2017 03:47, Darrick J. Wong wrote: > On Mon, Nov 27, 2017 at 09:44:34AM -0800, Darrick J. Wong wrote: >> On Thu, Nov 23, 2017 at 01:40:03PM +0200, Nikolay Borisov wrote: >>> For file extents xfs currently calls xfs_bmapi_read with flag set to 0, >>> meaning extents are going to be truncated to the requested range. Since the >>> same codepath is used for fiemap this means xfs is special in that regard, since >>> other filesystems (ext4/btrfs) do not trim extents for fiemap. Make the behavior >>> consistent by always passing XFS_BMAPI_ENTIRE. A full xfstest run validated that >>> this doesn't regress on ordinary read/write IO. >>> >>> Signed-off-by: Nikolay Borisov <nborisov@suse.com> >> >> Looks ok, will also test... >> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> > > ...and now withdrawn. > > Looking at iomap_apply, it seems we call ->iomap_begin, and at line 73 we > trim an iomap that is longer than the range we requested: > > /* > * Cut down the length to the one actually provided by the filesystem, > * as it might not be able to give us the whole size that we requested. > */ > if (iomap.offset + iomap.length < pos + length) > length = iomap.offset + iomap.length - pos; This actually trims the requested length to match what iomap has given is. I.e if we request a range between [50, 100] (length 50, offset 50) but iomap returns [40,60] (offset 40, length 20) then the trimmer range is : [50, 10]. SO it's actually trimming the requested range against what was actually returned by the filesystem not the other way around. > > However, we do not trim the beginning off an iomap that starts before > the 'pos' we passed to ->iomap_begin, so we're left with an iomap that > can begin before the range. > > FWIW I ran xfstests (like I told you to on IRC) and saw regressions > in a whole bunch of tests: > > generic/170 generic/287 generic/295 generic/326 generic/330 generic/333 > generic/372 xfs/214 xfs/237 xfs/249 xfs/258 xfs/346 xfs/420 xfs/421 > > From a brief glance it looks as though most of the iomap_apply'ers > try to trim the iomap before using it, but clearly there's something > wrong here. > > --D > >> >>> --- >>> fs/xfs/xfs_iomap.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c >>> index f179bdf1644d..8942324a4d3d 100644 >>> --- a/fs/xfs/xfs_iomap.c >>> +++ b/fs/xfs/xfs_iomap.c >>> @@ -1008,7 +1008,7 @@ xfs_file_iomap_begin( >>> end_fsb = XFS_B_TO_FSB(mp, offset + length); >>> >>> error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap, >>> - &nimaps, 0); >>> + &nimaps, XFS_BMAPI_ENTIRE); >>> if (error) >>> goto out_unlock; >>> >>> -- >>> 2.7.4 >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 28, 2017 at 08:46:27AM +0200, Nikolay Borisov wrote: > > > On 28.11.2017 03:47, Darrick J. Wong wrote: > > On Mon, Nov 27, 2017 at 09:44:34AM -0800, Darrick J. Wong wrote: > >> On Thu, Nov 23, 2017 at 01:40:03PM +0200, Nikolay Borisov wrote: > >>> For file extents xfs currently calls xfs_bmapi_read with flag set to 0, > >>> meaning extents are going to be truncated to the requested range. Since the > >>> same codepath is used for fiemap this means xfs is special in that regard, since > >>> other filesystems (ext4/btrfs) do not trim extents for fiemap. Make the behavior > >>> consistent by always passing XFS_BMAPI_ENTIRE. A full xfstest run validated that > >>> this doesn't regress on ordinary read/write IO. > >>> > >>> Signed-off-by: Nikolay Borisov <nborisov@suse.com> > >> > >> Looks ok, will also test... > >> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> > > > > ...and now withdrawn. > > > > Looking at iomap_apply, it seems we call ->iomap_begin, and at line 73 we > > trim an iomap that is longer than the range we requested: > > > > /* > > * Cut down the length to the one actually provided by the filesystem, > > * as it might not be able to give us the whole size that we requested. > > */ > > if (iomap.offset + iomap.length < pos + length) > > length = iomap.offset + iomap.length - pos; > > > > However, we do not trim the beginning off an iomap that starts before > > the 'pos' we passed to ->iomap_begin, so we're left with an iomap that > > can begin before the range. > > > > FWIW I ran xfstests (like I told you to on IRC) and saw regressions > > in a whole bunch of tests: > > > > generic/170 generic/287 generic/295 generic/326 generic/330 generic/333 > > generic/372 xfs/214 xfs/237 xfs/249 xfs/258 xfs/346 xfs/420 xfs/421 > > > > From a brief glance it looks as though most of the iomap_apply'ers > > try to trim the iomap before using it, but clearly there's something > > wrong here. > > Strange, I didn't see those failures o_O. Anyways, how about using > XFS_BMAPI_ENTIRE in case when IOMAP_REPORT is passed i.e the first > iteration of this patch. Hey Christoph, what are the rules about ->iomap_begin passing back an extent that extends outside the range that was requested? Mr. Borisov wants XFS fiemap to return the raw extent without trimming (apparently to follow the ext4 fiemap behavior), but enabling XFS_BMAPI_ENTIRE unconditionally causes xfstests regressions, which we cannot have. I /guess/ we could trim if IOMAP_REPORT is set (a la the first patch), but then we have the situation where the xfs iomap_begin can return more of an answer than was asked for, depending on the passed in flags. Soooo... since you're the author of the vfs iomap patchset, I'm kicking this question to you? Does the iomap code require that the returned &iomap cannot extend beyond the requested pos/length? Is it supposed to trim the iomap? Or... what? (I dunno, it's FIEMAP, which only says that the returned extent /may/ start before and /may/ end after.) --D > > > > --D > > > >> > >>> --- > >>> fs/xfs/xfs_iomap.c | 2 +- > >>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>> > >>> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c > >>> index f179bdf1644d..8942324a4d3d 100644 > >>> --- a/fs/xfs/xfs_iomap.c > >>> +++ b/fs/xfs/xfs_iomap.c > >>> @@ -1008,7 +1008,7 @@ xfs_file_iomap_begin( > >>> end_fsb = XFS_B_TO_FSB(mp, offset + length); > >>> > >>> error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap, > >>> - &nimaps, 0); > >>> + &nimaps, XFS_BMAPI_ENTIRE); > >>> if (error) > >>> goto out_unlock; > >>> > >>> -- > >>> 2.7.4 > >>> > >>> -- > >>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > >>> the body of a message to majordomo@vger.kernel.org > >>> More majordomo info at http://vger.kernel.org/majordomo-info.html > >> -- > >> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > >> the body of a message to majordomo@vger.kernel.org > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 28, 2017 at 06:30:34PM -0800, Darrick J. Wong wrote: > Hey Christoph, what are the rules about ->iomap_begin passing back an > extent that extends outside the range that was requested? Mr. Borisov > wants XFS fiemap to return the raw extent without trimming (apparently > to follow the ext4 fiemap behavior), but enabling XFS_BMAPI_ENTIRE > unconditionally causes xfstests regressions, which we cannot have. No rules yet as you see. But I think the best plan in the long run is that the iomap.c code will trim any mapping if needed instead of having duplicate instance of the trimming code in the file systems. -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index f179bdf1644d..8942324a4d3d 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -1008,7 +1008,7 @@ xfs_file_iomap_begin( end_fsb = XFS_B_TO_FSB(mp, offset + length); error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap, - &nimaps, 0); + &nimaps, XFS_BMAPI_ENTIRE); if (error) goto out_unlock;
For file extents xfs currently calls xfs_bmapi_read with flag set to 0, meaning extents are going to be truncated to the requested range. Since the same codepath is used for fiemap this means xfs is special in that regard, since other filesystems (ext4/btrfs) do not trim extents for fiemap. Make the behavior consistent by always passing XFS_BMAPI_ENTIRE. A full xfstest run validated that this doesn't regress on ordinary read/write IO. Signed-off-by: Nikolay Borisov <nborisov@suse.com> --- fs/xfs/xfs_iomap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)