diff mbox

Fiemap inconsistent behaviour when file offset range isn't on a block boundary

Message ID 20140715121512.GA54811@bfoster.bfoster (mailing list archive)
State New, archived
Headers show

Commit Message

Brian Foster July 15, 2014, 12:15 p.m. UTC
On Tue, Jul 15, 2014 at 04:20:29PM +0630, Chandan Rajendra wrote:
> All the filesystems created and used below have 4k blocksize. The
> "file.bin" file mentioned below is 8k in size and has two 4k
> extents. The test program used can be found at http://fpaste.org/118057/.
> 
> 1. First run (file range being passed is on block boundaries).
>    ,----
>    | [root@guest0 btrfs]# for f in $(ls -1 /mnt/{btrfs,ext4,xfs}/file.bin) ;
>    | >        do
>    | >        echo "-------------- File: $f -----------";
>    | >        /root/print-fiemap 0 8192 $f;
>    | >        done
>    | -------------- File: /mnt/btrfs/file.bin -----------
>    | File range: 0 - 8191.
>    | Found 2 extents.
>    | Fiemap information:
>    | 		Logical: 0
>    | 		Physical: 12582912
>    | 		Length: 4096
>    | 		Flags:
>    | 
>    | 		Logical: 4096
>    | 		Physical: 12656640
>    | 		Length: 4096
>    | 		Flags: FIEMAP_EXTENT_LAST |
>    | 
>    | -------------- File: /mnt/ext4/file.bin -----------
>    | File range: 0 - 8191.
>    | Found 2 extents.
>    | Fiemap information:
>    | 		Logical: 0
>    | 		Physical: 135270400
>    | 		Length: 4096
>    | 		Flags:
>    | 
>    | 		Logical: 4096
>    | 		Physical: 135278592
>    | 		Length: 4096
>    | 		Flags: FIEMAP_EXTENT_LAST |
>    | 
>    | -------------- File: /mnt/xfs/file.bin -----------
>    | File range: 0 - 8191.
>    | Found 2 extents.
>    | Fiemap information:
>    | 		Logical: 0
>    | 		Physical: 49152
>    | 		Length: 4096
>    | 		Flags:
>    | 
>    | 		Logical: 4096
>    | 		Physical: 57344
>    | 		Length: 4096
>    | 		Flags: FIEMAP_EXTENT_LAST |
>    `----
> 
> 2. If the file offset range being passed as input to fiemap ioctl is
>    not on block boundaries and falls within an extent's range then that
>    extent is skipped.
>    ,----
>    | [root@guest0 btrfs]# for f in $(ls -1 /mnt/{btrfs,ext4,xfs}/file.bin) ;
>    | > do
>    | > echo "-------------- File: $f -----------";
>    | > /root/print-fiemap 1 4095 $f;
>    | > done
>    | -------------- File: /mnt/btrfs/file.bin -----------
>    | File range: 1 - 4095.
>    | Found 0 extents.
>    | 
>    | 
>    | -------------- File: /mnt/ext4/file.bin -----------
>    | File range: 1 - 4095.
>    | Found 1 extents.
>    | Fiemap information:
>    | 		Logical: 0
>    | 		Physical: 135270400
>    | 		Length: 4096
>    | 		Flags:
>    | 
>    | -------------- File: /mnt/xfs/file.bin -----------
>    | File range: 1 - 4095.
>    | Found 2 extents.
>    | Fiemap information:
>    | 		Logical: 0
>    | 		Physical: 49152
>    | 		Length: 4096
>    | 		Flags:
>    | 
>    | 		Logical: 4096
>    | 		Physical: 57344
>    | 		Length: 4096
>    | 		Flags: FIEMAP_EXTENT_LAST |
>    `----
> 
>    From linux/Documentation/filesystems/fiemap.txt, "fm_start, and
>    fm_length specify the logical range within the file which the
>    process would like mappings for. Extents returned mirror those on
>    disk - that is, the logical offset of the 1st returned extent may
>    start before fm_start, and the range covered by the last returned
>    extent may end after fm_length. All offsets and lengths are in
>    bytes."
> 
>    So IMHO, the above would mean that all the extents that map the
>    file range [fm_start, fm_start + fm_length - 1] should be returned
>    by a fiemap ioctl call (as done by ext4).
> 
>    In the case of Btrfs, the commit
>    ea8efc74bd0402b4d5f663d007b4e25fa29ea778 i.e. "Btrfs: make sure not
>    to return overlapping extents to fiemap", forces the first extent
>    returned by btrfs_fiemap() to start from fm_start (if fm_start is
>    greater than the file offset mapped by the containing extent's
>    first byte). Can somebody please list some example scenarios where
>    extent_fiemap() ends up returning dupclicate and overlapping
>    extents?
>    Also, the commit 4d479cf010d56ec9c54f3099992d039918f1296b
>    i.e. "Btrfs: sectorsize align offsets in fiemap", rounds up first
>    byte of the file offset range to the next block. Shouldn't it be
>    rounded down instead?
> 
>    XFS lists both the extents even though the first one encompasses the
>    file range specified in the input.
>      

I gave this a test on XFS with a file that looks like this:

# xfs_bmap -v /mnt/file 
/mnt/file:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
   0: [0..15]:         102368..102383    1 (51168..51183)      16 10000
   1: [16..63]:        1832..1879        0 (1832..1879)        48 10000

I narrowed the print_fiemap behavior down to this:

# ./print_fiemap 1 7680 /mnt/file 
File range: 1 - 7680.
Found 1 extents.
Fiemap information:
		Logical: 0
		Physical: 52412416
		Length: 8192
		Flags: FIEMAP_EXTENT_UNWRITTEN | 

# ./print_fiemap 1 7681 /mnt/file 
File range: 1 - 7681.
Found 2 extents.
Fiemap information:
		Logical: 0
		Physical: 52412416
		Length: 8192
		Flags: FIEMAP_EXTENT_UNWRITTEN | 

		Logical: 8192
		Physical: 937984
		Length: 4096
		Flags: FIEMAP_EXTENT_LAST | FIEMAP_EXTENT_UNWRITTEN | 

... which is caused by using the BTOBB() macro on the provided length
value. This rounds the length up by a basic block (512 bytes). Switching
this to use BTOBBT() fixes it for me. Patch below, care to test? Thanks.

Brian

---8<---


> -- 
> chandan
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Brian Foster July 15, 2014, 1:53 p.m. UTC | #1
On Tue, Jul 15, 2014 at 08:15:12AM -0400, Brian Foster wrote:
> On Tue, Jul 15, 2014 at 04:20:29PM +0630, Chandan Rajendra wrote:
> > All the filesystems created and used below have 4k blocksize. The
> > "file.bin" file mentioned below is 8k in size and has two 4k
> > extents. The test program used can be found at http://fpaste.org/118057/.
> > 
> > 1. First run (file range being passed is on block boundaries).
> >    ,----
> >    | [root@guest0 btrfs]# for f in $(ls -1 /mnt/{btrfs,ext4,xfs}/file.bin) ;
> >    | >        do
> >    | >        echo "-------------- File: $f -----------";
> >    | >        /root/print-fiemap 0 8192 $f;
> >    | >        done
> >    | -------------- File: /mnt/btrfs/file.bin -----------
> >    | File range: 0 - 8191.
> >    | Found 2 extents.
> >    | Fiemap information:
> >    | 		Logical: 0
> >    | 		Physical: 12582912
> >    | 		Length: 4096
> >    | 		Flags:
> >    | 
> >    | 		Logical: 4096
> >    | 		Physical: 12656640
> >    | 		Length: 4096
> >    | 		Flags: FIEMAP_EXTENT_LAST |
> >    | 
> >    | -------------- File: /mnt/ext4/file.bin -----------
> >    | File range: 0 - 8191.
> >    | Found 2 extents.
> >    | Fiemap information:
> >    | 		Logical: 0
> >    | 		Physical: 135270400
> >    | 		Length: 4096
> >    | 		Flags:
> >    | 
> >    | 		Logical: 4096
> >    | 		Physical: 135278592
> >    | 		Length: 4096
> >    | 		Flags: FIEMAP_EXTENT_LAST |
> >    | 
> >    | -------------- File: /mnt/xfs/file.bin -----------
> >    | File range: 0 - 8191.
> >    | Found 2 extents.
> >    | Fiemap information:
> >    | 		Logical: 0
> >    | 		Physical: 49152
> >    | 		Length: 4096
> >    | 		Flags:
> >    | 
> >    | 		Logical: 4096
> >    | 		Physical: 57344
> >    | 		Length: 4096
> >    | 		Flags: FIEMAP_EXTENT_LAST |
> >    `----
> > 
> > 2. If the file offset range being passed as input to fiemap ioctl is
> >    not on block boundaries and falls within an extent's range then that
> >    extent is skipped.
> >    ,----
> >    | [root@guest0 btrfs]# for f in $(ls -1 /mnt/{btrfs,ext4,xfs}/file.bin) ;
> >    | > do
> >    | > echo "-------------- File: $f -----------";
> >    | > /root/print-fiemap 1 4095 $f;
> >    | > done
> >    | -------------- File: /mnt/btrfs/file.bin -----------
> >    | File range: 1 - 4095.
> >    | Found 0 extents.
> >    | 
> >    | 
> >    | -------------- File: /mnt/ext4/file.bin -----------
> >    | File range: 1 - 4095.
> >    | Found 1 extents.
> >    | Fiemap information:
> >    | 		Logical: 0
> >    | 		Physical: 135270400
> >    | 		Length: 4096
> >    | 		Flags:
> >    | 
> >    | -------------- File: /mnt/xfs/file.bin -----------
> >    | File range: 1 - 4095.
> >    | Found 2 extents.
> >    | Fiemap information:
> >    | 		Logical: 0
> >    | 		Physical: 49152
> >    | 		Length: 4096
> >    | 		Flags:
> >    | 
> >    | 		Logical: 4096
> >    | 		Physical: 57344
> >    | 		Length: 4096
> >    | 		Flags: FIEMAP_EXTENT_LAST |
> >    `----
> > 
> >    From linux/Documentation/filesystems/fiemap.txt, "fm_start, and
> >    fm_length specify the logical range within the file which the
> >    process would like mappings for. Extents returned mirror those on
> >    disk - that is, the logical offset of the 1st returned extent may
> >    start before fm_start, and the range covered by the last returned
> >    extent may end after fm_length. All offsets and lengths are in
> >    bytes."
> > 
> >    So IMHO, the above would mean that all the extents that map the
> >    file range [fm_start, fm_start + fm_length - 1] should be returned
> >    by a fiemap ioctl call (as done by ext4).
> > 
> >    In the case of Btrfs, the commit
> >    ea8efc74bd0402b4d5f663d007b4e25fa29ea778 i.e. "Btrfs: make sure not
> >    to return overlapping extents to fiemap", forces the first extent
> >    returned by btrfs_fiemap() to start from fm_start (if fm_start is
> >    greater than the file offset mapped by the containing extent's
> >    first byte). Can somebody please list some example scenarios where
> >    extent_fiemap() ends up returning dupclicate and overlapping
> >    extents?
> >    Also, the commit 4d479cf010d56ec9c54f3099992d039918f1296b
> >    i.e. "Btrfs: sectorsize align offsets in fiemap", rounds up first
> >    byte of the file offset range to the next block. Shouldn't it be
> >    rounded down instead?
> > 
> >    XFS lists both the extents even though the first one encompasses the
> >    file range specified in the input.
> >      
> 
> I gave this a test on XFS with a file that looks like this:
> 
> # xfs_bmap -v /mnt/file 
> /mnt/file:
>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
>    0: [0..15]:         102368..102383    1 (51168..51183)      16 10000
>    1: [16..63]:        1832..1879        0 (1832..1879)        48 10000
> 
> I narrowed the print_fiemap behavior down to this:
> 
> # ./print_fiemap 1 7680 /mnt/file 
> File range: 1 - 7680.
> Found 1 extents.
> Fiemap information:
> 		Logical: 0
> 		Physical: 52412416
> 		Length: 8192
> 		Flags: FIEMAP_EXTENT_UNWRITTEN | 
> 
> # ./print_fiemap 1 7681 /mnt/file 
> File range: 1 - 7681.
> Found 2 extents.
> Fiemap information:
> 		Logical: 0
> 		Physical: 52412416
> 		Length: 8192
> 		Flags: FIEMAP_EXTENT_UNWRITTEN | 
> 
> 		Logical: 8192
> 		Physical: 937984
> 		Length: 4096
> 		Flags: FIEMAP_EXTENT_LAST | FIEMAP_EXTENT_UNWRITTEN | 
> 
> ... which is caused by using the BTOBB() macro on the provided length
> value. This rounds the length up by a basic block (512 bytes). Switching
> this to use BTOBBT() fixes it for me. Patch below, care to test? Thanks.
> 
> Brian
> 
> ---8<---
> 
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index d75621a..d2fbc42 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -1060,7 +1060,7 @@ xfs_vn_fiemap(
>  	if (length == FIEMAP_MAX_OFFSET)
>  		bm.bmv_length = -1LL;
>  	else
> -		bm.bmv_length = BTOBB(length);
> +		bm.bmv_length = BTOBBT(length);
>  
>  	/* We add one because in getbmap world count includes the header */
>  	bm.bmv_count = !fieinfo->fi_extents_max ? MAXEXTNUM :

It occurs to me that this probably isn't the right fix. The
documentation doesn't clarify for me what the expected behavior is for
unaligned offsets to fiemap. We could round down the start offset and
round up the length and go with that..?

I suppose this is confusing with regard to how your test application
presents the output. E.g., it shows the file range as start
offset+length, but that could just be wrong. Thoughts?

Brian

> 
> > -- 
> > chandan
> > 
> > _______________________________________________
> > xfs mailing list
> > xfs@oss.sgi.com
> > http://oss.sgi.com/mailman/listinfo/xfs
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chandan Rajendra July 15, 2014, 2:26 p.m. UTC | #2
On Tuesday 15 Jul 2014 09:53:49 Brian Foster wrote:
> On Tue, Jul 15, 2014 at 08:15:12AM -0400, Brian Foster wrote:
> > On Tue, Jul 15, 2014 at 04:20:29PM +0630, Chandan Rajendra wrote:
> > > All the filesystems created and used below have 4k blocksize. The
> > > "file.bin" file mentioned below is 8k in size and has two 4k
> > > extents. The test program used can be found at http://fpaste.org/118057/.
> > > 
> > > 1. First run (file range being passed is on block boundaries).
> > >    ,----
> > >    | [root@guest0 btrfs]# for f in $(ls -1 /mnt/{btrfs,ext4,xfs}/file.bin) ;
> > >    | >        do
> > >    | >        echo "-------------- File: $f -----------";
> > >    | >        /root/print-fiemap 0 8192 $f;
> > >    | >        done
> > >    | -------------- File: /mnt/btrfs/file.bin -----------
> > >    | File range: 0 - 8191.
> > >    | Found 2 extents.
> > >    | Fiemap information:
> > >    | 		Logical: 0
> > >    | 		Physical: 12582912
> > >    | 		Length: 4096
> > >    | 		Flags:
> > >    | 
> > >    | 		Logical: 4096
> > >    | 		Physical: 12656640
> > >    | 		Length: 4096
> > >    | 		Flags: FIEMAP_EXTENT_LAST |
> > >    | 
> > >    | -------------- File: /mnt/ext4/file.bin -----------
> > >    | File range: 0 - 8191.
> > >    | Found 2 extents.
> > >    | Fiemap information:
> > >    | 		Logical: 0
> > >    | 		Physical: 135270400
> > >    | 		Length: 4096
> > >    | 		Flags:
> > >    | 
> > >    | 		Logical: 4096
> > >    | 		Physical: 135278592
> > >    | 		Length: 4096
> > >    | 		Flags: FIEMAP_EXTENT_LAST |
> > >    | 
> > >    | -------------- File: /mnt/xfs/file.bin -----------
> > >    | File range: 0 - 8191.
> > >    | Found 2 extents.
> > >    | Fiemap information:
> > >    | 		Logical: 0
> > >    | 		Physical: 49152
> > >    | 		Length: 4096
> > >    | 		Flags:
> > >    | 
> > >    | 		Logical: 4096
> > >    | 		Physical: 57344
> > >    | 		Length: 4096
> > >    | 		Flags: FIEMAP_EXTENT_LAST |
> > >    `----
> > > 
> > > 2. If the file offset range being passed as input to fiemap ioctl is
> > >    not on block boundaries and falls within an extent's range then that
> > >    extent is skipped.
> > >    ,----
> > >    | [root@guest0 btrfs]# for f in $(ls -1 /mnt/{btrfs,ext4,xfs}/file.bin) ;
> > >    | > do
> > >    | > echo "-------------- File: $f -----------";
> > >    | > /root/print-fiemap 1 4095 $f;
> > >    | > done
> > >    | -------------- File: /mnt/btrfs/file.bin -----------
> > >    | File range: 1 - 4095.
> > >    | Found 0 extents.
> > >    | 
> > >    | 
> > >    | -------------- File: /mnt/ext4/file.bin -----------
> > >    | File range: 1 - 4095.
> > >    | Found 1 extents.
> > >    | Fiemap information:
> > >    | 		Logical: 0
> > >    | 		Physical: 135270400
> > >    | 		Length: 4096
> > >    | 		Flags:
> > >    | 
> > >    | -------------- File: /mnt/xfs/file.bin -----------
> > >    | File range: 1 - 4095.
> > >    | Found 2 extents.
> > >    | Fiemap information:
> > >    | 		Logical: 0
> > >    | 		Physical: 49152
> > >    | 		Length: 4096
> > >    | 		Flags:
> > >    | 
> > >    | 		Logical: 4096
> > >    | 		Physical: 57344
> > >    | 		Length: 4096
> > >    | 		Flags: FIEMAP_EXTENT_LAST |
> > >    `----
> > > 
> > >    From linux/Documentation/filesystems/fiemap.txt, "fm_start, and
> > >    fm_length specify the logical range within the file which the
> > >    process would like mappings for. Extents returned mirror those on
> > >    disk - that is, the logical offset of the 1st returned extent may
> > >    start before fm_start, and the range covered by the last returned
> > >    extent may end after fm_length. All offsets and lengths are in
> > >    bytes."
> > > 
> > >    So IMHO, the above would mean that all the extents that map the
> > >    file range [fm_start, fm_start + fm_length - 1] should be returned
> > >    by a fiemap ioctl call (as done by ext4).
> > > 
> > >    In the case of Btrfs, the commit
> > >    ea8efc74bd0402b4d5f663d007b4e25fa29ea778 i.e. "Btrfs: make sure not
> > >    to return overlapping extents to fiemap", forces the first extent
> > >    returned by btrfs_fiemap() to start from fm_start (if fm_start is
> > >    greater than the file offset mapped by the containing extent's
> > >    first byte). Can somebody please list some example scenarios where
> > >    extent_fiemap() ends up returning dupclicate and overlapping
> > >    extents?
> > >    Also, the commit 4d479cf010d56ec9c54f3099992d039918f1296b
> > >    i.e. "Btrfs: sectorsize align offsets in fiemap", rounds up first
> > >    byte of the file offset range to the next block. Shouldn't it be
> > >    rounded down instead?
> > > 
> > >    XFS lists both the extents even though the first one encompasses the
> > >    file range specified in the input.
> > >      
> > 
> > I gave this a test on XFS with a file that looks like this:
> > 
> > # xfs_bmap -v /mnt/file 
> > /mnt/file:
> >  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
> >    0: [0..15]:         102368..102383    1 (51168..51183)      16 10000
> >    1: [16..63]:        1832..1879        0 (1832..1879)        48 10000
> > 
> > I narrowed the print_fiemap behavior down to this:
> > 
> > # ./print_fiemap 1 7680 /mnt/file 
> > File range: 1 - 7680.
> > Found 1 extents.
> > Fiemap information:
> > 		Logical: 0
> > 		Physical: 52412416
> > 		Length: 8192
> > 		Flags: FIEMAP_EXTENT_UNWRITTEN | 
> > 
> > # ./print_fiemap 1 7681 /mnt/file 
> > File range: 1 - 7681.
> > Found 2 extents.
> > Fiemap information:
> > 		Logical: 0
> > 		Physical: 52412416
> > 		Length: 8192
> > 		Flags: FIEMAP_EXTENT_UNWRITTEN | 
> > 
> > 		Logical: 8192
> > 		Physical: 937984
> > 		Length: 4096
> > 		Flags: FIEMAP_EXTENT_LAST | FIEMAP_EXTENT_UNWRITTEN | 
> > 
> > ... which is caused by using the BTOBB() macro on the provided length
> > value. This rounds the length up by a basic block (512 bytes). Switching
> > this to use BTOBBT() fixes it for me. Patch below, care to test? Thanks.
> > 
> > Brian
> > 
> > ---8<---
> > 
> > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > index d75621a..d2fbc42 100644
> > --- a/fs/xfs/xfs_iops.c
> > +++ b/fs/xfs/xfs_iops.c
> > @@ -1060,7 +1060,7 @@ xfs_vn_fiemap(
> >  	if (length == FIEMAP_MAX_OFFSET)
> >  		bm.bmv_length = -1LL;
> >  	else
> > -		bm.bmv_length = BTOBB(length);
> > +		bm.bmv_length = BTOBBT(length);
> >  
> >  	/* We add one because in getbmap world count includes the header */
> >  	bm.bmv_count = !fieinfo->fi_extents_max ? MAXEXTNUM :
> 

Hello Brian,

> It occurs to me that this probably isn't the right fix. The
> documentation doesn't clarify for me what the expected behavior is for
> unaligned offsets to fiemap. We could round down the start offset and
> round up the length and go with that..?

As I had indicated for Btrfs, I think we should round down
start_offset and round up 'start_offset + length'.

> 
> I suppose this is confusing with regard to how your test application
> presents the output. E.g., it shows the file range as start
> offset+length, but that could just be wrong. Thoughts?

Are you referring to the below statement?
	printf("File range: %llu - %llu.\n", fiemap->fm_start,
		fiemap->fm_start + fiemap->fm_length - 1);

If yes, then the file range printed would be, 
[start_offset, start_offset + len - 1]


> 
> Brian
> 
> > 
> > 
> > _______________________________________________
> > xfs mailing list
> > xfs@oss.sgi.com
> > http://oss.sgi.com/mailman/listinfo/xfs
>
diff mbox

Patch

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index d75621a..d2fbc42 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1060,7 +1060,7 @@  xfs_vn_fiemap(
 	if (length == FIEMAP_MAX_OFFSET)
 		bm.bmv_length = -1LL;
 	else
-		bm.bmv_length = BTOBB(length);
+		bm.bmv_length = BTOBBT(length);
 
 	/* We add one because in getbmap world count includes the header */
 	bm.bmv_count = !fieinfo->fi_extents_max ? MAXEXTNUM :