diff mbox

btrfs: fix file truncation if FALLOC_FL_KEEP_SIZE is specified

Message ID 51CD1C08.40801@oracle.com (mailing list archive)
State Accepted, archived
Headers show

Commit Message

jeff.liu June 28, 2013, 5:15 a.m. UTC
From: Jie Liu <jeff.liu@oracle.com>

Create a small file and fallocate it to a big size with
FALLOC_FL_KEEP_SIZE option, then truncate it back to the
small size again, the disk free space is not changed back
in this case. i.e,

# dd if=/dev/zero of=/mnt/test bs=512 count=1
# ls -l /mnt
total 4
-rw-r--r-- 1 root root 512 Jun 28 11:35 test

# df -h
Filesystem      Size  Used Avail Use% Mounted on
....
/dev/sdb1       8.0G   56K  7.2G   1% /mnt

# xfs_io -c 'falloc -k 512 5G' /mnt/test 
# ls -l /mnt/test
-rw-r--r-- 1 root root 512 Jun 28 11:35 /mnt/test

# sync; df -h
Filesystem      Size  Used Avail Use% Mounted on
....
/dev/sdb1       8.0G  5.1G  2.2G  70% /mnt

# xfs_io -c 'truncate 512' /mnt/test
# sync; df -h
Filesystem      Size  Used Avail Use% Mounted on
....
/dev/sdb1       8.0G  5.1G  2.2G  70% /mnt

With this fix, the truncated up space is back as:
# sync; df -h
Filesystem      Size  Used Avail Use% Mounted on
....
/dev/sdb1       8.0G   56K  7.2G   1% /mnt

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
---
 fs/btrfs/inode.c |    3 ---
 1 file changed, 3 deletions(-)

Comments

Josef Bacik June 28, 2013, 12:41 p.m. UTC | #1
On Fri, Jun 28, 2013 at 01:15:52PM +0800, Jeff Liu wrote:
> From: Jie Liu <jeff.liu@oracle.com>
> 
> Create a small file and fallocate it to a big size with
> FALLOC_FL_KEEP_SIZE option, then truncate it back to the
> small size again, the disk free space is not changed back
> in this case. i.e,
> 
> # dd if=/dev/zero of=/mnt/test bs=512 count=1
> # ls -l /mnt
> total 4
> -rw-r--r-- 1 root root 512 Jun 28 11:35 test
> 
> # df -h
> Filesystem      Size  Used Avail Use% Mounted on
> ....
> /dev/sdb1       8.0G   56K  7.2G   1% /mnt
> 
> # xfs_io -c 'falloc -k 512 5G' /mnt/test 
> # ls -l /mnt/test
> -rw-r--r-- 1 root root 512 Jun 28 11:35 /mnt/test
> 
> # sync; df -h
> Filesystem      Size  Used Avail Use% Mounted on
> ....
> /dev/sdb1       8.0G  5.1G  2.2G  70% /mnt
> 
> # xfs_io -c 'truncate 512' /mnt/test
> # sync; df -h
> Filesystem      Size  Used Avail Use% Mounted on
> ....
> /dev/sdb1       8.0G  5.1G  2.2G  70% /mnt
> 
> With this fix, the truncated up space is back as:
> # sync; df -h
> Filesystem      Size  Used Avail Use% Mounted on
> ....
> /dev/sdb1       8.0G   56K  7.2G   1% /mnt
> 
> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
> ---
>  fs/btrfs/inode.c |    3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 4f9d16b..7e1a5ff 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -4509,9 +4509,6 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
>  	int mask = attr->ia_valid;
>  	int ret;
>  
> -	if (newsize == oldsize)
> -		return 0;
> -
>  	/*
>  	 * The regular truncate() case without ATTR_CTIME and ATTR_MTIME is a
>  	 * special case where we need to update the times despite not having

Cc'ing a few people on this since I'd like their opinion.  Looking at other fs's
it looks like ext4 does the same thing we do and would leave the prealloc'ed
space, but it appears that xfs will truncate it.  What do we think is the
correct behavior?  I'm inclined to take this patch, but I'd like to have an
xfstest made for it so other file systems can be made to be consistent, and I'd
like to make sure we all agree what is the correct behavior before we wander
down that road.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
jeff.liu June 28, 2013, 1:07 p.m. UTC | #2
On 06/28/2013 08:41 PM, Josef Bacik wrote:

> On Fri, Jun 28, 2013 at 01:15:52PM +0800, Jeff Liu wrote:
>> From: Jie Liu <jeff.liu@oracle.com>
>>
>> Create a small file and fallocate it to a big size with
>> FALLOC_FL_KEEP_SIZE option, then truncate it back to the
>> small size again, the disk free space is not changed back
>> in this case. i.e,
>>
>> # dd if=/dev/zero of=/mnt/test bs=512 count=1
>> # ls -l /mnt
>> total 4
>> -rw-r--r-- 1 root root 512 Jun 28 11:35 test
>>
>> # df -h
>> Filesystem      Size  Used Avail Use% Mounted on
>> ....
>> /dev/sdb1       8.0G   56K  7.2G   1% /mnt
>>
>> # xfs_io -c 'falloc -k 512 5G' /mnt/test 
>> # ls -l /mnt/test
>> -rw-r--r-- 1 root root 512 Jun 28 11:35 /mnt/test
>>
>> # sync; df -h
>> Filesystem      Size  Used Avail Use% Mounted on
>> ....
>> /dev/sdb1       8.0G  5.1G  2.2G  70% /mnt
>>
>> # xfs_io -c 'truncate 512' /mnt/test
>> # sync; df -h
>> Filesystem      Size  Used Avail Use% Mounted on
>> ....
>> /dev/sdb1       8.0G  5.1G  2.2G  70% /mnt
>>
>> With this fix, the truncated up space is back as:
>> # sync; df -h
>> Filesystem      Size  Used Avail Use% Mounted on
>> ....
>> /dev/sdb1       8.0G   56K  7.2G   1% /mnt
>>
>> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
>> ---
>>  fs/btrfs/inode.c |    3 ---
>>  1 file changed, 3 deletions(-)
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 4f9d16b..7e1a5ff 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -4509,9 +4509,6 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
>>  	int mask = attr->ia_valid;
>>  	int ret;
>>  
>> -	if (newsize == oldsize)
>> -		return 0;
>> -
>>  	/*
>>  	 * The regular truncate() case without ATTR_CTIME and ATTR_MTIME is a
>>  	 * special case where we need to update the times despite not having
> 
> Cc'ing a few people on this since I'd like their opinion.  Looking at other fs's
> it looks like ext4 does the same thing we do and would leave the prealloc'ed
> space, but it appears that xfs will truncate it.  What do we think is the
> correct behavior?  I'm inclined to take this patch, but I'd like to have an
> xfstest made for it so other file systems can be made to be consistent, and I'd
> like to make sure we all agree what is the correct behavior before we wander
> down that road.  Thanks,

Looks Ext4 does the same thing to XFS in this case :), but OCFS2 does not.
I'd like to write a test case for xfstest if we reach an agreement.

Thanks,
-Jeff


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Josef Bacik June 28, 2013, 1:12 p.m. UTC | #3
On Fri, Jun 28, 2013 at 09:07:49PM +0800, Jeff Liu wrote:
> On 06/28/2013 08:41 PM, Josef Bacik wrote:
> 
> > On Fri, Jun 28, 2013 at 01:15:52PM +0800, Jeff Liu wrote:
> >> From: Jie Liu <jeff.liu@oracle.com>
> >>
> >> Create a small file and fallocate it to a big size with
> >> FALLOC_FL_KEEP_SIZE option, then truncate it back to the
> >> small size again, the disk free space is not changed back
> >> in this case. i.e,
> >>
> >> # dd if=/dev/zero of=/mnt/test bs=512 count=1
> >> # ls -l /mnt
> >> total 4
> >> -rw-r--r-- 1 root root 512 Jun 28 11:35 test
> >>
> >> # df -h
> >> Filesystem      Size  Used Avail Use% Mounted on
> >> ....
> >> /dev/sdb1       8.0G   56K  7.2G   1% /mnt
> >>
> >> # xfs_io -c 'falloc -k 512 5G' /mnt/test 
> >> # ls -l /mnt/test
> >> -rw-r--r-- 1 root root 512 Jun 28 11:35 /mnt/test
> >>
> >> # sync; df -h
> >> Filesystem      Size  Used Avail Use% Mounted on
> >> ....
> >> /dev/sdb1       8.0G  5.1G  2.2G  70% /mnt
> >>
> >> # xfs_io -c 'truncate 512' /mnt/test
> >> # sync; df -h
> >> Filesystem      Size  Used Avail Use% Mounted on
> >> ....
> >> /dev/sdb1       8.0G  5.1G  2.2G  70% /mnt
> >>
> >> With this fix, the truncated up space is back as:
> >> # sync; df -h
> >> Filesystem      Size  Used Avail Use% Mounted on
> >> ....
> >> /dev/sdb1       8.0G   56K  7.2G   1% /mnt
> >>
> >> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
> >> ---
> >>  fs/btrfs/inode.c |    3 ---
> >>  1 file changed, 3 deletions(-)
> >>
> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> >> index 4f9d16b..7e1a5ff 100644
> >> --- a/fs/btrfs/inode.c
> >> +++ b/fs/btrfs/inode.c
> >> @@ -4509,9 +4509,6 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
> >>  	int mask = attr->ia_valid;
> >>  	int ret;
> >>  
> >> -	if (newsize == oldsize)
> >> -		return 0;
> >> -
> >>  	/*
> >>  	 * The regular truncate() case without ATTR_CTIME and ATTR_MTIME is a
> >>  	 * special case where we need to update the times despite not having
> > 
> > Cc'ing a few people on this since I'd like their opinion.  Looking at other fs's
> > it looks like ext4 does the same thing we do and would leave the prealloc'ed
> > space, but it appears that xfs will truncate it.  What do we think is the
> > correct behavior?  I'm inclined to take this patch, but I'd like to have an
> > xfstest made for it so other file systems can be made to be consistent, and I'd
> > like to make sure we all agree what is the correct behavior before we wander
> > down that road.  Thanks,
> 
> Looks Ext4 does the same thing to XFS in this case :), but OCFS2 does not.
> I'd like to write a test case for xfstest if we reach an agreement.
> 

This is where I realize I didn't actually format my fs as ext4 when I was seeing
what ext4 did.  So looks like you are right and this is what we should be doing,
I'll take this and I'll look out for the xfstest.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner June 29, 2013, 2:22 a.m. UTC | #4
On Fri, Jun 28, 2013 at 08:41:00AM -0400, Josef Bacik wrote:
> On Fri, Jun 28, 2013 at 01:15:52PM +0800, Jeff Liu wrote:
> > From: Jie Liu <jeff.liu@oracle.com>
> > 
> > Create a small file and fallocate it to a big size with
> > FALLOC_FL_KEEP_SIZE option, then truncate it back to the
> > small size again, the disk free space is not changed back
> > in this case. i.e,
> > 
> > # dd if=/dev/zero of=/mnt/test bs=512 count=1
> > # ls -l /mnt
> > total 4
> > -rw-r--r-- 1 root root 512 Jun 28 11:35 test
> > 
> > # df -h
> > Filesystem      Size  Used Avail Use% Mounted on
> > ....
> > /dev/sdb1       8.0G   56K  7.2G   1% /mnt
> > 
> > # xfs_io -c 'falloc -k 512 5G' /mnt/test 
> > # ls -l /mnt/test
> > -rw-r--r-- 1 root root 512 Jun 28 11:35 /mnt/test
> > 
> > # sync; df -h
> > Filesystem      Size  Used Avail Use% Mounted on
> > ....
> > /dev/sdb1       8.0G  5.1G  2.2G  70% /mnt
> > 
> > # xfs_io -c 'truncate 512' /mnt/test
> > # sync; df -h
> > Filesystem      Size  Used Avail Use% Mounted on
> > ....
> > /dev/sdb1       8.0G  5.1G  2.2G  70% /mnt
> > 
> > With this fix, the truncated up space is back as:
> > # sync; df -h
> > Filesystem      Size  Used Avail Use% Mounted on
> > ....
> > /dev/sdb1       8.0G   56K  7.2G   1% /mnt
> > 
> > Signed-off-by: Jie Liu <jeff.liu@oracle.com>
> > ---
> >  fs/btrfs/inode.c |    3 ---
> >  1 file changed, 3 deletions(-)
> > 
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index 4f9d16b..7e1a5ff 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -4509,9 +4509,6 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
> >  	int mask = attr->ia_valid;
> >  	int ret;
> >  
> > -	if (newsize == oldsize)
> > -		return 0;
> > -
> >  	/*
> >  	 * The regular truncate() case without ATTR_CTIME and ATTR_MTIME is a
> >  	 * special case where we need to update the times despite not having
> 
> Cc'ing a few people on this since I'd like their opinion.  Looking at other fs's
> it looks like ext4 does the same thing we do and would leave the prealloc'ed
> space, but it appears that xfs will truncate it.  What do we think is the
> correct behavior? 

XFS has had this truncate behaviour since at least the start of the
git tree history (2005). Given that these fallocate()
prealloc-blocks-beyond-EOF behaviours are modelled on what XFS has
historically provided, I think y'all can see what i think should be
done...

> I'm inclined to take this patch, but I'd like to have an
> xfstest made for it so other file systems can be made to be consistent, and I'd
> like to make sure we all agree what is the correct behavior before we wander
> down that road.  Thanks,

I couldn't have said it better myself. Jeff, can you take care of
this, please?

Cheers,

Dave.
jeff.liu June 29, 2013, 2:44 a.m. UTC | #5
On 06/29/2013 10:22 AM, Dave Chinner wrote:

> On Fri, Jun 28, 2013 at 08:41:00AM -0400, Josef Bacik wrote:
>> On Fri, Jun 28, 2013 at 01:15:52PM +0800, Jeff Liu wrote:
>>> From: Jie Liu <jeff.liu@oracle.com>
>>>
>>> Create a small file and fallocate it to a big size with
>>> FALLOC_FL_KEEP_SIZE option, then truncate it back to the
>>> small size again, the disk free space is not changed back
>>> in this case. i.e,
>>>
>>> # dd if=/dev/zero of=/mnt/test bs=512 count=1
>>> # ls -l /mnt
>>> total 4
>>> -rw-r--r-- 1 root root 512 Jun 28 11:35 test
>>>
>>> # df -h
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> ....
>>> /dev/sdb1       8.0G   56K  7.2G   1% /mnt
>>>
>>> # xfs_io -c 'falloc -k 512 5G' /mnt/test 
>>> # ls -l /mnt/test
>>> -rw-r--r-- 1 root root 512 Jun 28 11:35 /mnt/test
>>>
>>> # sync; df -h
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> ....
>>> /dev/sdb1       8.0G  5.1G  2.2G  70% /mnt
>>>
>>> # xfs_io -c 'truncate 512' /mnt/test
>>> # sync; df -h
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> ....
>>> /dev/sdb1       8.0G  5.1G  2.2G  70% /mnt
>>>
>>> With this fix, the truncated up space is back as:
>>> # sync; df -h
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> ....
>>> /dev/sdb1       8.0G   56K  7.2G   1% /mnt
>>>
>>> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
>>> ---
>>>  fs/btrfs/inode.c |    3 ---
>>>  1 file changed, 3 deletions(-)
>>>
>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>> index 4f9d16b..7e1a5ff 100644
>>> --- a/fs/btrfs/inode.c
>>> +++ b/fs/btrfs/inode.c
>>> @@ -4509,9 +4509,6 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
>>>  	int mask = attr->ia_valid;
>>>  	int ret;
>>>  
>>> -	if (newsize == oldsize)
>>> -		return 0;
>>> -
>>>  	/*
>>>  	 * The regular truncate() case without ATTR_CTIME and ATTR_MTIME is a
>>>  	 * special case where we need to update the times despite not having
>>
>> Cc'ing a few people on this since I'd like their opinion.  Looking at other fs's
>> it looks like ext4 does the same thing we do and would leave the prealloc'ed
>> space, but it appears that xfs will truncate it.  What do we think is the
>> correct behavior? 
> 
> XFS has had this truncate behaviour since at least the start of the
> git tree history (2005). Given that these fallocate()
> prealloc-blocks-beyond-EOF behaviours are modelled on what XFS has
> historically provided, I think y'all can see what i think should be
> done...
> 
>> I'm inclined to take this patch, but I'd like to have an
>> xfstest made for it so other file systems can be made to be consistent, and I'd
>> like to make sure we all agree what is the correct behavior before we wander
>> down that road.  Thanks,
> 
> I couldn't have said it better myself. Jeff, can you take care of
> this, please?

Yes, I'll take care of this.

Thanks,
-Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4f9d16b..7e1a5ff 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4509,9 +4509,6 @@  static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 	int mask = attr->ia_valid;
 	int ret;
 
-	if (newsize == oldsize)
-		return 0;
-
 	/*
 	 * The regular truncate() case without ATTR_CTIME and ATTR_MTIME is a
 	 * special case where we need to update the times despite not having