diff mbox

[v2,2/2] Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use.

Message ID 1418373875-2921-2-git-send-email-yangds.fnst@cn.fujitsu.com (mailing list archive)
State Superseded
Headers show

Commit Message

Yang Dongsheng Dec. 12, 2014, 8:44 a.m. UTC
Currently, for pre_alloc or delay_alloc, the bytes will be accounted
in space_info by the three guys.
space_info->bytes_may_use --- space_info->reserved --- space_info->used.
But on the other hand, in qgroup, there are only two counters to account the
bytes, qgroup->reserved and qgroup->excl. And qg->reserved accounts
bytes in space_info->bytes_may_use and qg->excl accounts bytes in
space_info->used. So the bytes in space_info->reserved is not accounted
in qgroup. If so, there is a window we can exceed the quota limit when
bytes is in space_info->reserved.

Example:
	# btrfs quota enable /mnt
	# btrfs qgroup limit -e 10M /mnt
	# for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
	# sync
	# btrfs qgroup show -pcre /mnt
qgroupid rfer     excl     max_rfer max_excl parent  child
-------- ----     ----     -------- -------- ------  -----
0/5      20987904 20987904 0        10485760 ---     ---

qg->excl is 20987904 larger than max_excl 10485760.

This patch introduce a new counter named may_use to qgroup, then
there are three counters in qgroup to account bytes in space_info
as below.
space_info->bytes_may_use --- space_info->reserved --- space_info->used.
qgroup->may_use           --- qgroup->reserved     --- qgroup->excl

With this patch applied:
	# btrfs quota enable /mnt
	# btrfs qgroup limit -e 10M /mnt
	# for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
fallocate: /mnt/data9: fallocate failed: Disk quota exceeded
fallocate: /mnt/data10: fallocate failed: Disk quota exceeded
fallocate: /mnt/data11: fallocate failed: Disk quota exceeded
fallocate: /mnt/data12: fallocate failed: Disk quota exceeded
fallocate: /mnt/data13: fallocate failed: Disk quota exceeded
fallocate: /mnt/data14: fallocate failed: Disk quota exceeded
fallocate: /mnt/data15: fallocate failed: Disk quota exceeded
fallocate: /mnt/data16: fallocate failed: Disk quota exceeded
fallocate: /mnt/data17: fallocate failed: Disk quota exceeded
fallocate: /mnt/data18: fallocate failed: Disk quota exceeded
fallocate: /mnt/data19: fallocate failed: Disk quota exceeded
	# sync
	# btrfs qgroup show -pcre /mnt
qgroupid rfer    excl    max_rfer max_excl parent  child
-------- ----    ----    -------- -------- ------  -----
0/5      9453568 9453568 0        10485760 ---     ---

Reported-by: Cyril SCETBON <cyril.scetbon@free.fr>
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
---
Changelog:
	v1 -> v2:
		Remove the redundant check for fs_info->quota_enabled.

 fs/btrfs/extent-tree.c | 20 ++++++++++++++-
 fs/btrfs/inode.c       | 18 ++++++++++++-
 fs/btrfs/qgroup.c      | 68 +++++++++++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/qgroup.h      |  4 +++
 4 files changed, 104 insertions(+), 6 deletions(-)

Comments

Satoru Takeuchi Dec. 26, 2014, 5:43 a.m. UTC | #1
Hi Yang,

On 2014/12/26 10:32, Dongsheng Yang wrote:
> Hi Satoru,
>
> I saw your mail of "[BUG] "Quota Ignored On write" problem still exist with 3.16-rc5 <http://news.gmane.org/find-root.php?message_id=53C8DEB0.1060404%40jp.fujitsu.com>" in gmane.
> I guess this patch will fix the problem you mentioned.
> Could you help to give a try?

Although it reached disk quota before writing whole 1.5GB,
the copied size was only about 700 MB. In this case,
expected behavior is to reach disk quota just after
writing 1 GB.

* 3.19-rc1 without your two v2 patches

===============================================================================
+ mkfs.btrfs -f /dev/vdb
Btrfs v3.17
See http://btrfs.wiki.kernel.org for more information.

Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
fs created label (null) on /dev/vdb
         nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB
+ mount /dev/vdb /root/btrfs-auto-test/
+ ret=0
+ btrfs quota enable /root/btrfs-auto-test/
+ btrfs subvolume create /root/btrfs-auto-test//sub
Create subvolume '/root/btrfs-auto-test/sub'
+ btrfs qgroup limit 1G /root/btrfs-auto-test//sub
+ dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 count=1500000
1500000+0 records in
1500000+0 records out
1536000000 bytes (1.5 GB) copied, 25.6601 s, 59.9 MB/s
===============================================================================

* 3.19-rc1 with your two v2 patches

===============================================================================
+ mkfs.btrfs -f /dev/vdb
Btrfs v3.17
See http://btrfs.wiki.kernel.org for more information.

Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
fs created label (null) on /dev/vdb
         nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB
+ mount /dev/vdb /root/btrfs-auto-test/
+ ret=0
+ btrfs quota enable /root/btrfs-auto-test/
+ btrfs subvolume create /root/btrfs-auto-test//sub
Create subvolume '/root/btrfs-auto-test/sub'
+ btrfs qgroup limit 1G /root/btrfs-auto-test//sub
+ dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 count=1500000
dd: error writing '/root/btrfs-auto-test//sub/file': Disk quota exceeded
681353+0 records in
681352+0 records out
697704448 bytes (698 MB) copied, 8.15563 s, 85.5 MB/s
===============================================================================

Thanks,
Satoru

>
> Thanx
> Yang
>
> On 12/12/2014 04:44 PM, Dongsheng Yang wrote:
>> Currently, for pre_alloc or delay_alloc, the bytes will be accounted
>> in space_info by the three guys.
>> space_info->bytes_may_use --- space_info->reserved --- space_info->used.
>> But on the other hand, in qgroup, there are only two counters to account the
>> bytes, qgroup->reserved and qgroup->excl. And qg->reserved accounts
>> bytes in space_info->bytes_may_use and qg->excl accounts bytes in
>> space_info->used. So the bytes in space_info->reserved is not accounted
>> in qgroup. If so, there is a window we can exceed the quota limit when
>> bytes is in space_info->reserved.
>>
>> Example:
>> 	# btrfs quota enable /mnt
>> 	# btrfs qgroup limit -e 10M /mnt
>> 	# for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
>> 	# sync
>> 	# btrfs qgroup show -pcre /mnt
>> qgroupid rfer     excl     max_rfer max_excl parent  child
>> -------- ----     ----     -------- -------- ------  -----
>> 0/5      20987904 20987904 0        10485760 ---     ---
>>
>> qg->excl is 20987904 larger than max_excl 10485760.
>>
>> This patch introduce a new counter named may_use to qgroup, then
>> there are three counters in qgroup to account bytes in space_info
>> as below.
>> space_info->bytes_may_use --- space_info->reserved --- space_info->used.
>> qgroup->may_use           --- qgroup->reserved     --- qgroup->excl
>>
>> With this patch applied:
>> 	# btrfs quota enable /mnt
>> 	# btrfs qgroup limit -e 10M /mnt
>> 	# for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
>> fallocate: /mnt/data9: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data10: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data11: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data12: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data13: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data14: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data15: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data16: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data17: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data18: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data19: fallocate failed: Disk quota exceeded
>> 	# sync
>> 	# btrfs qgroup show -pcre /mnt
>> qgroupid rfer    excl    max_rfer max_excl parent  child
>> -------- ----    ----    -------- -------- ------  -----
>> 0/5      9453568 9453568 0        10485760 ---     ---
>>
>> Reported-by: Cyril SCETBON<cyril.scetbon@free.fr>
>> Signed-off-by: Dongsheng Yang<yangds.fnst@cn.fujitsu.com>
>> ---
>> Changelog:
>> 	v1 -> v2:
>> 		Remove the redundant check for fs_info->quota_enabled.
>>
>>   fs/btrfs/extent-tree.c | 20 ++++++++++++++-
>>   fs/btrfs/inode.c       | 18 ++++++++++++-
>>   fs/btrfs/qgroup.c      | 68 +++++++++++++++++++++++++++++++++++++++++++++++---
>>   fs/btrfs/qgroup.h      |  4 +++
>>   4 files changed, 104 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 014b7f2..f4ad737 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -5500,8 +5500,12 @@ static int pin_down_extent(struct btrfs_root *root,
>>
>>   	set_extent_dirty(root->fs_info->pinned_extents, bytenr,
>>   			 bytenr + num_bytes - 1, GFP_NOFS | __GFP_NOFAIL);
>> -	if (reserved)
>> +	if (reserved) {
>> +		btrfs_qgroup_update_reserved_bytes(root->fs_info,
>> +						   root->root_key.objectid,
>> +						   num_bytes, -1);
>>   		trace_btrfs_reserved_extent_free(root, bytenr, num_bytes);
>> +	}
>>   	return 0;
>>   }
>>
>> @@ -6230,6 +6234,9 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
>>   		btrfs_update_reserved_bytes(cache, buf->len, RESERVE_FREE, 0);
>>   		trace_btrfs_reserved_extent_free(root, buf->start, buf->len);
>>   		pin = 0;
>> +		btrfs_qgroup_update_reserved_bytes(root->fs_info,
>> +						   root->root_key.objectid,
>> +						   buf->len, -1);
>>   	}
>>   out:
>>   	if (pin)
>> @@ -6964,7 +6971,11 @@ static int __btrfs_free_reserved_extent(struct btrfs_root *root,
>>   	else {
>>   		btrfs_add_free_space(cache, start, len);
>>   		btrfs_update_reserved_bytes(cache, len, RESERVE_FREE, delalloc);
>> +		btrfs_qgroup_update_reserved_bytes(root->fs_info,
>> +						   root->root_key.objectid,
>> +						   len, -1);
>>   	}
>> +
>>   	btrfs_put_block_group(cache);
>>
>>   	trace_btrfs_reserved_extent_free(root, start, len);
>> @@ -7200,6 +7211,9 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
>>   	BUG_ON(ret); /* logic error */
>>   	ret = alloc_reserved_file_extent(trans, root, 0, root_objectid,
>>   					 0, owner, offset, ins, 1);
>> +	btrfs_qgroup_update_reserved_bytes(root->fs_info,
>> +					   root->root_key.objectid,
>> +					   ins->offset, 1);
>>   	btrfs_put_block_group(block_group);
>>   	return ret;
>>   }
>> @@ -7346,6 +7360,10 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
>>   		return ERR_PTR(ret);
>>   	}
>>
>> +	btrfs_qgroup_update_reserved_bytes(root->fs_info,
>> +					   root_objectid,
>> +					   ins.offset, 1);
>> +
>>   	buf = btrfs_init_new_buffer(trans, root, ins.objectid,
>>   				    blocksize, level);
>>   	BUG_ON(IS_ERR(buf)); /* -ENOMEM */
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index d23362f..0c824f3 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -59,6 +59,7 @@
>>   #include "backref.h"
>>   #include "hash.h"
>>   #include "props.h"
>> +#include "qgroup.h"
>>
>>   struct btrfs_iget_args {
>>   	struct btrfs_key *location;
>> @@ -739,7 +740,9 @@ retry:
>>   			}
>>   			goto out_free;
>>   		}
>> -
>> +		btrfs_qgroup_update_reserved_bytes(root->fs_info,
>> +						   root->root_key.objectid,
>> +						   ins.offset, 1);
>>   		/*
>>   		 * here we're doing allocation and writeback of the
>>   		 * compressed pages
>> @@ -951,6 +954,10 @@ static noinline int cow_file_range(struct inode *inode,
>>   		if (ret < 0)
>>   			goto out_unlock;
>>
>> +		btrfs_qgroup_update_reserved_bytes(root->fs_info,
>> +						   root->root_key.objectid,
>> +						   ins.offset, 1);
>> +
>>   		em = alloc_extent_map();
>>   		if (!em) {
>>   			ret = -ENOMEM;
>> @@ -6745,6 +6752,10 @@ static struct extent_map *btrfs_new_extent_direct(struct inode *inode,
>>   		return ERR_PTR(ret);
>>   	}
>>
>> +	btrfs_qgroup_update_reserved_bytes(root->fs_info,
>> +					   root->root_key.objectid,
>> +					   ins.offset, 1);
>> +
>>   	return em;
>>   }
>>
>> @@ -9266,6 +9277,11 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
>>   				btrfs_end_transaction(trans, root);
>>   			break;
>>   		}
>> +
>> +		btrfs_qgroup_update_reserved_bytes(root->fs_info,
>> +						   root->root_key.objectid,
>> +						   ins.offset, 1);
>> +
>>   		btrfs_drop_extent_cache(inode, cur_offset,
>>   					cur_offset + ins.offset -1, 0);
>>
>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>> index 48b60db..d147082 100644
>> --- a/fs/btrfs/qgroup.c
>> +++ b/fs/btrfs/qgroup.c
>> @@ -72,6 +72,7 @@ struct btrfs_qgroup {
>>   	/*
>>   	 * reservation tracking
>>   	 */
>> +	u64 may_use;
>>   	u64 reserved;
>>
>>   	/*
>> @@ -1414,6 +1415,8 @@ static int qgroup_excl_accounting(struct btrfs_fs_info *fs_info,
>>   	WARN_ON(sign < 0 && qgroup->excl < oper->num_bytes);
>>   	qgroup->excl += sign * oper->num_bytes;
>>   	qgroup->excl_cmpr += sign * oper->num_bytes;
>> +	if (sign > 0)
>> +		qgroup->reserved -= oper->num_bytes;
>>
>>   	qgroup_dirty(fs_info, qgroup);
>>
>> @@ -1434,6 +1437,8 @@ static int qgroup_excl_accounting(struct btrfs_fs_info *fs_info,
>>   		qgroup->excl += sign * oper->num_bytes;
>>   		if (sign < 0)
>>   			WARN_ON(qgroup->excl < oper->num_bytes);
>> +		if (sign > 0)
>> +			qgroup->reserved -= oper->num_bytes;
>>   		qgroup->excl_cmpr += sign * oper->num_bytes;
>>   		qgroup_dirty(fs_info, qgroup);
>>
>> @@ -2359,6 +2364,61 @@ out:
>>   	return ret;
>>   }
>>
>> +int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
>> +					    u64 ref_root,
>> +					    u64 num_bytes,
>> +					    int sign)
>> +{
>> +	struct btrfs_root *quota_root;
>> +	struct btrfs_qgroup *qgroup;
>> +	int ret = 0;
>> +	struct ulist_node *unode;
>> +	struct ulist_iterator uiter;
>> +
>> +	if (!is_fstree(ref_root) || !fs_info->quota_enabled)
>> +		return 0;
>> +
>> +	if (num_bytes == 0)
>> +		return 0;
>> +
>> +	spin_lock(&fs_info->qgroup_lock);
>> +	quota_root = fs_info->quota_root;
>> +	if (!quota_root)
>> +		goto out;
>> +
>> +	qgroup = find_qgroup_rb(fs_info, ref_root);
>> +	if (!qgroup)
>> +		goto out;
>> +
>> +	ulist_reinit(fs_info->qgroup_ulist);
>> +	ret = ulist_add(fs_info->qgroup_ulist, qgroup->qgroupid,
>> +			(uintptr_t)qgroup, GFP_ATOMIC);
>> +	if (ret < 0)
>> +		goto out;
>> +
>> +	ULIST_ITER_INIT(&uiter);
>> +	while ((unode = ulist_next(fs_info->qgroup_ulist, &uiter))) {
>> +		struct btrfs_qgroup *qg;
>> +		struct btrfs_qgroup_list *glist;
>> +
>> +		qg = u64_to_ptr(unode->aux);
>> +
>> +		qg->reserved += sign * num_bytes;
>> +
>> +		list_for_each_entry(glist, &qg->groups, next_group) {
>> +			ret = ulist_add(fs_info->qgroup_ulist,
>> +					glist->group->qgroupid,
>> +					(uintptr_t)glist->group, GFP_ATOMIC);
>> +			if (ret < 0)
>> +				goto out;
>> +		}
>> +	}
>> +
>> +out:
>> +	spin_unlock(&fs_info->qgroup_lock);
>> +	return ret;
>> +}
>> +
>>   /*
>>    * reserve some space for a qgroup and all its parents. The reservation takes
>>    * place with start_transaction or dealloc_reserve, similar to ENOSPC
>> @@ -2407,14 +2467,14 @@ int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes)
>>   		qg = u64_to_ptr(unode->aux);
>>
>>   		if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
>> -		    qg->reserved + (s64)qg->rfer + num_bytes >
>> +		    qg->reserved + qg->may_use + (s64)qg->rfer + num_bytes >
>>   		    qg->max_rfer) {
>>   			ret = -EDQUOT;
>>   			goto out;
>>   		}
>>
>>   		if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) &&
>> -		    qg->reserved + (s64)qg->excl + num_bytes >
>> +		    qg->reserved + qg->may_use + (s64)qg->excl + num_bytes >
>>   		    qg->max_excl) {
>>   			ret = -EDQUOT;
>>   			goto out;
>> @@ -2438,7 +2498,7 @@ int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes)
>>
>>   		qg = u64_to_ptr(unode->aux);
>>
>> -		qg->reserved += num_bytes;
>> +		qg->may_use += num_bytes;
>>   	}
>>
>>   out:
>> @@ -2484,7 +2544,7 @@ void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes)
>>
>>   		qg = u64_to_ptr(unode->aux);
>>
>> -		qg->reserved -= num_bytes;
>> +		qg->may_use -= num_bytes;
>>
>>   		list_for_each_entry(glist, &qg->groups, next_group) {
>>   			ret = ulist_add(fs_info->qgroup_ulist,
>> diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
>> index 18cc68c..31de88e 100644
>> --- a/fs/btrfs/qgroup.h
>> +++ b/fs/btrfs/qgroup.h
>> @@ -95,6 +95,10 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
>>   int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
>>   			 struct btrfs_fs_info *fs_info, u64 srcid, u64 objectid,
>>   			 struct btrfs_qgroup_inherit *inherit);
>> +int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
>> +				       u64 ref_root,
>> +				       u64 num_bytes,
>> +				       int sign);
>>   int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes);
>>   void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes);
>>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yang Dongsheng Dec. 26, 2014, 6:49 a.m. UTC | #2
On 12/26/2014 01:43 PM, Satoru Takeuchi wrote:
> Hi Yang,
>
> On 2014/12/26 10:32, Dongsheng Yang wrote:
>> Hi Satoru,
>>
>> I saw your mail of "[BUG] "Quota Ignored On write" problem still 
>> exist with 3.16-rc5 
>> <http://news.gmane.org/find-root.php?message_id=53C8DEB0.1060404%40jp.fujitsu.com>" 
>> in gmane.
>> I guess this patch will fix the problem you mentioned.
>> Could you help to give a try?
>
> Although it reached disk quota before writing whole 1.5GB,
> the copied size was only about 700 MB. In this case,
> expected behavior is to reach disk quota just after
> writing 1 GB.

Hi Satoru,
Thanx for your testing a lot. But stranger, I tried it in 3.19-rc1 and 
3.18, both works well
writing 1GB. Could you give me some more information about your testing 
machine?
And could you execute "sync && btrfs qgroup show -pecr $MNT" after writing?

*3.19-rc1 with my 2/2 patch applied.
=======================================================================
[root@atest-guest linux_btrfs]# uname -a
Linux atest-guest 3.19.0-rc1+ #66 SMP Fri Dec 26 10:38:12 EST 2014 
x86_64 x86_64 x86_64 GNU/Linux
[root@atest-guest linux_btrfs]# sh quota.sh
umount: /mnt: not mounted
Btrfs v3.17-1-gbef9fd4
See http://btrfs.wiki.kernel.org for more information.

Turning ON incompat feature 'extref': increased hardlink limit per file 
to 65536
fs created label (null) on /dev/vdb
     nodesize 16384 leafsize 16384 sectorsize 4096 size 50.00GiB
Create subvolume '/mnt/quota_test'
dd: error writing \u2018/mnt/quota_test/test\u2019: Disk quota exceeded
999842+0 records in
999841+0 records out
1023837184 bytes (1.0 GB) copied, 8.06996 s, 127 MB/s
[PASS] quota works correctly
[root@atest-guest linux_btrfs]# sync
[root@atest-guest linux_btrfs]# df -h /mnt/
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb         50G  996M   49G   2% /mnt
[root@atest-guest linux_btrfs]# btrfs qgroup show -pecr /mnt/
qgroupid rfer       excl       max_rfer   max_excl parent  child
-------- ----       ----       --------   -------- ------  -----
0/5      16384      16384      0          0        ---     ---
0/256    1023868928 1023868928 1024000000 0        ---     ---
=============================================================
>
> * 3.19-rc1 without your two v2 patches
>
> =============================================================================== 
>
> + mkfs.btrfs -f /dev/vdb
> Btrfs v3.17
> See http://btrfs.wiki.kernel.org for more information.
>
> Turning ON incompat feature 'extref': increased hardlink limit per 
> file to 65536
> fs created label (null) on /dev/vdb
>         nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB
> + mount /dev/vdb /root/btrfs-auto-test/
> + ret=0
> + btrfs quota enable /root/btrfs-auto-test/
> + btrfs subvolume create /root/btrfs-auto-test//sub
> Create subvolume '/root/btrfs-auto-test/sub'
> + btrfs qgroup limit 1G /root/btrfs-auto-test//sub
> + dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 
> count=1500000
> 1500000+0 records in
> 1500000+0 records out
> 1536000000 bytes (1.5 GB) copied, 25.6601 s, 59.9 MB/s
> =============================================================================== 
>
>
> * 3.19-rc1 with your two v2 patches
>
> =============================================================================== 
>
> + mkfs.btrfs -f /dev/vdb
> Btrfs v3.17
> See http://btrfs.wiki.kernel.org for more information.
>
> Turning ON incompat feature 'extref': increased hardlink limit per 
> file to 65536
> fs created label (null) on /dev/vdb
>         nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB
> + mount /dev/vdb /root/btrfs-auto-test/
> + ret=0
> + btrfs quota enable /root/btrfs-auto-test/
> + btrfs subvolume create /root/btrfs-auto-test//sub
> Create subvolume '/root/btrfs-auto-test/sub'
> + btrfs qgroup limit 1G /root/btrfs-auto-test//sub
> + dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 
> count=1500000
> dd: error writing '/root/btrfs-auto-test//sub/file': Disk quota exceeded
> 681353+0 records in
> 681352+0 records out
> 697704448 bytes (698 MB) copied, 8.15563 s, 85.5 MB/s
> =============================================================================== 
>
>
> Thanks,
> Satoru
>
>>
>> Thanx
>> Yang
>>
>> On 12/12/2014 04:44 PM, Dongsheng Yang wrote:
>>> Currently, for pre_alloc or delay_alloc, the bytes will be accounted
>>> in space_info by the three guys.
>>> space_info->bytes_may_use --- space_info->reserved --- 
>>> space_info->used.
>>> But on the other hand, in qgroup, there are only two counters to 
>>> account the
>>> bytes, qgroup->reserved and qgroup->excl. And qg->reserved accounts
>>> bytes in space_info->bytes_may_use and qg->excl accounts bytes in
>>> space_info->used. So the bytes in space_info->reserved is not accounted
>>> in qgroup. If so, there is a window we can exceed the quota limit when
>>> bytes is in space_info->reserved.
>>>
>>> Example:
>>>     # btrfs quota enable /mnt
>>>     # btrfs qgroup limit -e 10M /mnt
>>>     # for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
>>>     # sync
>>>     # btrfs qgroup show -pcre /mnt
>>> qgroupid rfer     excl     max_rfer max_excl parent  child
>>> -------- ----     ----     -------- -------- ------  -----
>>> 0/5      20987904 20987904 0        10485760 ---     ---
>>>
>>> qg->excl is 20987904 larger than max_excl 10485760.
>>>
>>> This patch introduce a new counter named may_use to qgroup, then
>>> there are three counters in qgroup to account bytes in space_info
>>> as below.
>>> space_info->bytes_may_use --- space_info->reserved --- 
>>> space_info->used.
>>> qgroup->may_use           --- qgroup->reserved     --- qgroup->excl
>>>
>>> With this patch applied:
>>>     # btrfs quota enable /mnt
>>>     # btrfs qgroup limit -e 10M /mnt
>>>     # for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
>>> fallocate: /mnt/data9: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data10: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data11: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data12: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data13: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data14: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data15: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data16: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data17: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data18: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data19: fallocate failed: Disk quota exceeded
>>>     # sync
>>>     # btrfs qgroup show -pcre /mnt
>>> qgroupid rfer    excl    max_rfer max_excl parent  child
>>> -------- ----    ----    -------- -------- ------  -----
>>> 0/5      9453568 9453568 0        10485760 ---     ---
>>>
>>> Reported-by: Cyril SCETBON<cyril.scetbon@free.fr>
>>> Signed-off-by: Dongsheng Yang<yangds.fnst@cn.fujitsu.com>
>>> ---
>>> Changelog:
>>>     v1 -> v2:
>>>         Remove the redundant check for fs_info->quota_enabled.
>>>
>>>   fs/btrfs/extent-tree.c | 20 ++++++++++++++-
>>>   fs/btrfs/inode.c       | 18 ++++++++++++-
>>>   fs/btrfs/qgroup.c      | 68 
>>> +++++++++++++++++++++++++++++++++++++++++++++++---
>>>   fs/btrfs/qgroup.h      |  4 +++
>>>   4 files changed, 104 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index 014b7f2..f4ad737 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -5500,8 +5500,12 @@ static int pin_down_extent(struct btrfs_root 
>>> *root,
>>>
>>>       set_extent_dirty(root->fs_info->pinned_extents, bytenr,
>>>                bytenr + num_bytes - 1, GFP_NOFS | __GFP_NOFAIL);
>>> -    if (reserved)
>>> +    if (reserved) {
>>> +        btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                           root->root_key.objectid,
>>> +                           num_bytes, -1);
>>>           trace_btrfs_reserved_extent_free(root, bytenr, num_bytes);
>>> +    }
>>>       return 0;
>>>   }
>>>
>>> @@ -6230,6 +6234,9 @@ void btrfs_free_tree_block(struct 
>>> btrfs_trans_handle *trans,
>>>           btrfs_update_reserved_bytes(cache, buf->len, RESERVE_FREE, 
>>> 0);
>>>           trace_btrfs_reserved_extent_free(root, buf->start, buf->len);
>>>           pin = 0;
>>> +        btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                           root->root_key.objectid,
>>> +                           buf->len, -1);
>>>       }
>>>   out:
>>>       if (pin)
>>> @@ -6964,7 +6971,11 @@ static int 
>>> __btrfs_free_reserved_extent(struct btrfs_root *root,
>>>       else {
>>>           btrfs_add_free_space(cache, start, len);
>>>           btrfs_update_reserved_bytes(cache, len, RESERVE_FREE, 
>>> delalloc);
>>> +        btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                           root->root_key.objectid,
>>> +                           len, -1);
>>>       }
>>> +
>>>       btrfs_put_block_group(cache);
>>>
>>>       trace_btrfs_reserved_extent_free(root, start, len);
>>> @@ -7200,6 +7211,9 @@ int btrfs_alloc_logged_file_extent(struct 
>>> btrfs_trans_handle *trans,
>>>       BUG_ON(ret); /* logic error */
>>>       ret = alloc_reserved_file_extent(trans, root, 0, root_objectid,
>>>                        0, owner, offset, ins, 1);
>>> +    btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                       root->root_key.objectid,
>>> +                       ins->offset, 1);
>>>       btrfs_put_block_group(block_group);
>>>       return ret;
>>>   }
>>> @@ -7346,6 +7360,10 @@ struct extent_buffer 
>>> *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
>>>           return ERR_PTR(ret);
>>>       }
>>>
>>> +    btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                       root_objectid,
>>> +                       ins.offset, 1);
>>> +
>>>       buf = btrfs_init_new_buffer(trans, root, ins.objectid,
>>>                       blocksize, level);
>>>       BUG_ON(IS_ERR(buf)); /* -ENOMEM */
>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>> index d23362f..0c824f3 100644
>>> --- a/fs/btrfs/inode.c
>>> +++ b/fs/btrfs/inode.c
>>> @@ -59,6 +59,7 @@
>>>   #include "backref.h"
>>>   #include "hash.h"
>>>   #include "props.h"
>>> +#include "qgroup.h"
>>>
>>>   struct btrfs_iget_args {
>>>       struct btrfs_key *location;
>>> @@ -739,7 +740,9 @@ retry:
>>>               }
>>>               goto out_free;
>>>           }
>>> -
>>> +        btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                           root->root_key.objectid,
>>> +                           ins.offset, 1);
>>>           /*
>>>            * here we're doing allocation and writeback of the
>>>            * compressed pages
>>> @@ -951,6 +954,10 @@ static noinline int cow_file_range(struct inode 
>>> *inode,
>>>           if (ret < 0)
>>>               goto out_unlock;
>>>
>>> +        btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                           root->root_key.objectid,
>>> +                           ins.offset, 1);
>>> +
>>>           em = alloc_extent_map();
>>>           if (!em) {
>>>               ret = -ENOMEM;
>>> @@ -6745,6 +6752,10 @@ static struct extent_map 
>>> *btrfs_new_extent_direct(struct inode *inode,
>>>           return ERR_PTR(ret);
>>>       }
>>>
>>> +    btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                       root->root_key.objectid,
>>> +                       ins.offset, 1);
>>> +
>>>       return em;
>>>   }
>>>
>>> @@ -9266,6 +9277,11 @@ static int __btrfs_prealloc_file_range(struct 
>>> inode *inode, int mode,
>>>                   btrfs_end_transaction(trans, root);
>>>               break;
>>>           }
>>> +
>>> +        btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                           root->root_key.objectid,
>>> +                           ins.offset, 1);
>>> +
>>>           btrfs_drop_extent_cache(inode, cur_offset,
>>>                       cur_offset + ins.offset -1, 0);
>>>
>>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>>> index 48b60db..d147082 100644
>>> --- a/fs/btrfs/qgroup.c
>>> +++ b/fs/btrfs/qgroup.c
>>> @@ -72,6 +72,7 @@ struct btrfs_qgroup {
>>>       /*
>>>        * reservation tracking
>>>        */
>>> +    u64 may_use;
>>>       u64 reserved;
>>>
>>>       /*
>>> @@ -1414,6 +1415,8 @@ static int qgroup_excl_accounting(struct 
>>> btrfs_fs_info *fs_info,
>>>       WARN_ON(sign < 0 && qgroup->excl < oper->num_bytes);
>>>       qgroup->excl += sign * oper->num_bytes;
>>>       qgroup->excl_cmpr += sign * oper->num_bytes;
>>> +    if (sign > 0)
>>> +        qgroup->reserved -= oper->num_bytes;
>>>
>>>       qgroup_dirty(fs_info, qgroup);
>>>
>>> @@ -1434,6 +1437,8 @@ static int qgroup_excl_accounting(struct 
>>> btrfs_fs_info *fs_info,
>>>           qgroup->excl += sign * oper->num_bytes;
>>>           if (sign < 0)
>>>               WARN_ON(qgroup->excl < oper->num_bytes);
>>> +        if (sign > 0)
>>> +            qgroup->reserved -= oper->num_bytes;
>>>           qgroup->excl_cmpr += sign * oper->num_bytes;
>>>           qgroup_dirty(fs_info, qgroup);
>>>
>>> @@ -2359,6 +2364,61 @@ out:
>>>       return ret;
>>>   }
>>>
>>> +int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
>>> +                        u64 ref_root,
>>> +                        u64 num_bytes,
>>> +                        int sign)
>>> +{
>>> +    struct btrfs_root *quota_root;
>>> +    struct btrfs_qgroup *qgroup;
>>> +    int ret = 0;
>>> +    struct ulist_node *unode;
>>> +    struct ulist_iterator uiter;
>>> +
>>> +    if (!is_fstree(ref_root) || !fs_info->quota_enabled)
>>> +        return 0;
>>> +
>>> +    if (num_bytes == 0)
>>> +        return 0;
>>> +
>>> +    spin_lock(&fs_info->qgroup_lock);
>>> +    quota_root = fs_info->quota_root;
>>> +    if (!quota_root)
>>> +        goto out;
>>> +
>>> +    qgroup = find_qgroup_rb(fs_info, ref_root);
>>> +    if (!qgroup)
>>> +        goto out;
>>> +
>>> +    ulist_reinit(fs_info->qgroup_ulist);
>>> +    ret = ulist_add(fs_info->qgroup_ulist, qgroup->qgroupid,
>>> +            (uintptr_t)qgroup, GFP_ATOMIC);
>>> +    if (ret < 0)
>>> +        goto out;
>>> +
>>> +    ULIST_ITER_INIT(&uiter);
>>> +    while ((unode = ulist_next(fs_info->qgroup_ulist, &uiter))) {
>>> +        struct btrfs_qgroup *qg;
>>> +        struct btrfs_qgroup_list *glist;
>>> +
>>> +        qg = u64_to_ptr(unode->aux);
>>> +
>>> +        qg->reserved += sign * num_bytes;
>>> +
>>> +        list_for_each_entry(glist, &qg->groups, next_group) {
>>> +            ret = ulist_add(fs_info->qgroup_ulist,
>>> +                    glist->group->qgroupid,
>>> +                    (uintptr_t)glist->group, GFP_ATOMIC);
>>> +            if (ret < 0)
>>> +                goto out;
>>> +        }
>>> +    }
>>> +
>>> +out:
>>> +    spin_unlock(&fs_info->qgroup_lock);
>>> +    return ret;
>>> +}
>>> +
>>>   /*
>>>    * reserve some space for a qgroup and all its parents. The 
>>> reservation takes
>>>    * place with start_transaction or dealloc_reserve, similar to ENOSPC
>>> @@ -2407,14 +2467,14 @@ int btrfs_qgroup_reserve(struct btrfs_root 
>>> *root, u64 num_bytes)
>>>           qg = u64_to_ptr(unode->aux);
>>>
>>>           if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
>>> -            qg->reserved + (s64)qg->rfer + num_bytes >
>>> +            qg->reserved + qg->may_use + (s64)qg->rfer + num_bytes >
>>>               qg->max_rfer) {
>>>               ret = -EDQUOT;
>>>               goto out;
>>>           }
>>>
>>>           if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) &&
>>> -            qg->reserved + (s64)qg->excl + num_bytes >
>>> +            qg->reserved + qg->may_use + (s64)qg->excl + num_bytes >
>>>               qg->max_excl) {
>>>               ret = -EDQUOT;
>>>               goto out;
>>> @@ -2438,7 +2498,7 @@ int btrfs_qgroup_reserve(struct btrfs_root 
>>> *root, u64 num_bytes)
>>>
>>>           qg = u64_to_ptr(unode->aux);
>>>
>>> -        qg->reserved += num_bytes;
>>> +        qg->may_use += num_bytes;
>>>       }
>>>
>>>   out:
>>> @@ -2484,7 +2544,7 @@ void btrfs_qgroup_free(struct btrfs_root 
>>> *root, u64 num_bytes)
>>>
>>>           qg = u64_to_ptr(unode->aux);
>>>
>>> -        qg->reserved -= num_bytes;
>>> +        qg->may_use -= num_bytes;
>>>
>>>           list_for_each_entry(glist, &qg->groups, next_group) {
>>>               ret = ulist_add(fs_info->qgroup_ulist,
>>> diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
>>> index 18cc68c..31de88e 100644
>>> --- a/fs/btrfs/qgroup.h
>>> +++ b/fs/btrfs/qgroup.h
>>> @@ -95,6 +95,10 @@ int btrfs_run_qgroups(struct btrfs_trans_handle 
>>> *trans,
>>>   int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
>>>                struct btrfs_fs_info *fs_info, u64 srcid, u64 objectid,
>>>                struct btrfs_qgroup_inherit *inherit);
>>> +int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
>>> +                       u64 ref_root,
>>> +                       u64 num_bytes,
>>> +                       int sign);
>>>   int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes);
>>>   void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes);
>>>
>>
>
> .
>

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yang Dongsheng Dec. 26, 2014, 7 a.m. UTC | #3
On 12/26/2014 02:49 PM, Dongsheng Yang wrote:
> On 12/26/2014 01:43 PM, Satoru Takeuchi wrote:
>> Hi Yang,
>>
>> On 2014/12/26 10:32, Dongsheng Yang wrote:
>>> Hi Satoru,
>>>
>>> I saw your mail of "[BUG] "Quota Ignored On write" problem still 
>>> exist with 3.16-rc5 
>>> <http://news.gmane.org/find-root.php?message_id=53C8DEB0.1060404%40jp.fujitsu.com>" 
>>> in gmane.
>>> I guess this patch will fix the problem you mentioned.
>>> Could you help to give a try?
>>
>> Although it reached disk quota before writing whole 1.5GB,
>> the copied size was only about 700 MB. In this case,
>> expected behavior is to reach disk quota just after
>> writing 1 GB.
>
> Hi Satoru,
> Thanx for your testing a lot. But stranger 

Typo, "strange". :)
> I tried it in 3.19-rc1 and 3.18, both works well
> writing 1GB. Could you give me some more information about your 
> testing machine?
> And could you execute "sync && btrfs qgroup show -pecr $MNT" after 
> writing?
>
> *3.19-rc1 with my 2/2 patch applied.

I also tried with the two patches applied, and got the same result.

Thanx
> =======================================================================
> [root@atest-guest linux_btrfs]# uname -a
> Linux atest-guest 3.19.0-rc1+ #66 SMP Fri Dec 26 10:38:12 EST 2014 
> x86_64 x86_64 x86_64 GNU/Linux
> [root@atest-guest linux_btrfs]# sh quota.sh
> umount: /mnt: not mounted
> Btrfs v3.17-1-gbef9fd4
> See http://btrfs.wiki.kernel.org for more information.
>
> Turning ON incompat feature 'extref': increased hardlink limit per 
> file to 65536
> fs created label (null) on /dev/vdb
>     nodesize 16384 leafsize 16384 sectorsize 4096 size 50.00GiB
> Create subvolume '/mnt/quota_test'
> dd: error writing \u2018/mnt/quota_test/test\u2019: Disk quota exceeded
> 999842+0 records in
> 999841+0 records out
> 1023837184 bytes (1.0 GB) copied, 8.06996 s, 127 MB/s
> [PASS] quota works correctly
> [root@atest-guest linux_btrfs]# sync
> [root@atest-guest linux_btrfs]# df -h /mnt/
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/vdb         50G  996M   49G   2% /mnt
> [root@atest-guest linux_btrfs]# btrfs qgroup show -pecr /mnt/
> qgroupid rfer       excl       max_rfer   max_excl parent  child
> -------- ----       ----       --------   -------- ------  -----
> 0/5      16384      16384      0          0        ---     ---
> 0/256    1023868928 1023868928 1024000000 0        ---     ---
> =============================================================
>>
>> * 3.19-rc1 without your two v2 patches
>>
>> =============================================================================== 
>>
>> + mkfs.btrfs -f /dev/vdb
>> Btrfs v3.17
>> See http://btrfs.wiki.kernel.org for more information.
>>
>> Turning ON incompat feature 'extref': increased hardlink limit per 
>> file to 65536
>> fs created label (null) on /dev/vdb
>>         nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB
>> + mount /dev/vdb /root/btrfs-auto-test/
>> + ret=0
>> + btrfs quota enable /root/btrfs-auto-test/
>> + btrfs subvolume create /root/btrfs-auto-test//sub
>> Create subvolume '/root/btrfs-auto-test/sub'
>> + btrfs qgroup limit 1G /root/btrfs-auto-test//sub
>> + dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 
>> count=1500000
>> 1500000+0 records in
>> 1500000+0 records out
>> 1536000000 bytes (1.5 GB) copied, 25.6601 s, 59.9 MB/s
>> =============================================================================== 
>>
>>
>> * 3.19-rc1 with your two v2 patches
>>
>> =============================================================================== 
>>
>> + mkfs.btrfs -f /dev/vdb
>> Btrfs v3.17
>> See http://btrfs.wiki.kernel.org for more information.
>>
>> Turning ON incompat feature 'extref': increased hardlink limit per 
>> file to 65536
>> fs created label (null) on /dev/vdb
>>         nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB
>> + mount /dev/vdb /root/btrfs-auto-test/
>> + ret=0
>> + btrfs quota enable /root/btrfs-auto-test/
>> + btrfs subvolume create /root/btrfs-auto-test//sub
>> Create subvolume '/root/btrfs-auto-test/sub'
>> + btrfs qgroup limit 1G /root/btrfs-auto-test//sub
>> + dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 
>> count=1500000
>> dd: error writing '/root/btrfs-auto-test//sub/file': Disk quota exceeded
>> 681353+0 records in
>> 681352+0 records out
>> 697704448 bytes (698 MB) copied, 8.15563 s, 85.5 MB/s
>> =============================================================================== 
>>
>>
>> Thanks,
>> Satoru
>>
>>>
>>> Thanx
>>> Yang
>>>
>>> On 12/12/2014 04:44 PM, Dongsheng Yang wrote:
>>>> Currently, for pre_alloc or delay_alloc, the bytes will be accounted
>>>> in space_info by the three guys.
>>>> space_info->bytes_may_use --- space_info->reserved --- 
>>>> space_info->used.
>>>> But on the other hand, in qgroup, there are only two counters to 
>>>> account the
>>>> bytes, qgroup->reserved and qgroup->excl. And qg->reserved accounts
>>>> bytes in space_info->bytes_may_use and qg->excl accounts bytes in
>>>> space_info->used. So the bytes in space_info->reserved is not 
>>>> accounted
>>>> in qgroup. If so, there is a window we can exceed the quota limit when
>>>> bytes is in space_info->reserved.
>>>>
>>>> Example:
>>>>     # btrfs quota enable /mnt
>>>>     # btrfs qgroup limit -e 10M /mnt
>>>>     # for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
>>>>     # sync
>>>>     # btrfs qgroup show -pcre /mnt
>>>> qgroupid rfer     excl     max_rfer max_excl parent  child
>>>> -------- ----     ----     -------- -------- ------  -----
>>>> 0/5      20987904 20987904 0        10485760 ---     ---
>>>>
>>>> qg->excl is 20987904 larger than max_excl 10485760.
>>>>
>>>> This patch introduce a new counter named may_use to qgroup, then
>>>> there are three counters in qgroup to account bytes in space_info
>>>> as below.
>>>> space_info->bytes_may_use --- space_info->reserved --- 
>>>> space_info->used.
>>>> qgroup->may_use           --- qgroup->reserved     --- qgroup->excl
>>>>
>>>> With this patch applied:
>>>>     # btrfs quota enable /mnt
>>>>     # btrfs qgroup limit -e 10M /mnt
>>>>     # for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
>>>> fallocate: /mnt/data9: fallocate failed: Disk quota exceeded
>>>> fallocate: /mnt/data10: fallocate failed: Disk quota exceeded
>>>> fallocate: /mnt/data11: fallocate failed: Disk quota exceeded
>>>> fallocate: /mnt/data12: fallocate failed: Disk quota exceeded
>>>> fallocate: /mnt/data13: fallocate failed: Disk quota exceeded
>>>> fallocate: /mnt/data14: fallocate failed: Disk quota exceeded
>>>> fallocate: /mnt/data15: fallocate failed: Disk quota exceeded
>>>> fallocate: /mnt/data16: fallocate failed: Disk quota exceeded
>>>> fallocate: /mnt/data17: fallocate failed: Disk quota exceeded
>>>> fallocate: /mnt/data18: fallocate failed: Disk quota exceeded
>>>> fallocate: /mnt/data19: fallocate failed: Disk quota exceeded
>>>>     # sync
>>>>     # btrfs qgroup show -pcre /mnt
>>>> qgroupid rfer    excl    max_rfer max_excl parent  child
>>>> -------- ----    ----    -------- -------- ------  -----
>>>> 0/5      9453568 9453568 0        10485760 ---     ---
>>>>
>>>> Reported-by: Cyril SCETBON<cyril.scetbon@free.fr>
>>>> Signed-off-by: Dongsheng Yang<yangds.fnst@cn.fujitsu.com>
>>>> ---
>>>> Changelog:
>>>>     v1 -> v2:
>>>>         Remove the redundant check for fs_info->quota_enabled.
>>>>
>>>>   fs/btrfs/extent-tree.c | 20 ++++++++++++++-
>>>>   fs/btrfs/inode.c       | 18 ++++++++++++-
>>>>   fs/btrfs/qgroup.c      | 68 
>>>> +++++++++++++++++++++++++++++++++++++++++++++++---
>>>>   fs/btrfs/qgroup.h      |  4 +++
>>>>   4 files changed, 104 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>>> index 014b7f2..f4ad737 100644
>>>> --- a/fs/btrfs/extent-tree.c
>>>> +++ b/fs/btrfs/extent-tree.c
>>>> @@ -5500,8 +5500,12 @@ static int pin_down_extent(struct btrfs_root 
>>>> *root,
>>>>
>>>>       set_extent_dirty(root->fs_info->pinned_extents, bytenr,
>>>>                bytenr + num_bytes - 1, GFP_NOFS | __GFP_NOFAIL);
>>>> -    if (reserved)
>>>> +    if (reserved) {
>>>> + btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>>> +                           root->root_key.objectid,
>>>> +                           num_bytes, -1);
>>>>           trace_btrfs_reserved_extent_free(root, bytenr, num_bytes);
>>>> +    }
>>>>       return 0;
>>>>   }
>>>>
>>>> @@ -6230,6 +6234,9 @@ void btrfs_free_tree_block(struct 
>>>> btrfs_trans_handle *trans,
>>>>           btrfs_update_reserved_bytes(cache, buf->len, 
>>>> RESERVE_FREE, 0);
>>>>           trace_btrfs_reserved_extent_free(root, buf->start, 
>>>> buf->len);
>>>>           pin = 0;
>>>> + btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>>> +                           root->root_key.objectid,
>>>> +                           buf->len, -1);
>>>>       }
>>>>   out:
>>>>       if (pin)
>>>> @@ -6964,7 +6971,11 @@ static int 
>>>> __btrfs_free_reserved_extent(struct btrfs_root *root,
>>>>       else {
>>>>           btrfs_add_free_space(cache, start, len);
>>>>           btrfs_update_reserved_bytes(cache, len, RESERVE_FREE, 
>>>> delalloc);
>>>> + btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>>> +                           root->root_key.objectid,
>>>> +                           len, -1);
>>>>       }
>>>> +
>>>>       btrfs_put_block_group(cache);
>>>>
>>>>       trace_btrfs_reserved_extent_free(root, start, len);
>>>> @@ -7200,6 +7211,9 @@ int btrfs_alloc_logged_file_extent(struct 
>>>> btrfs_trans_handle *trans,
>>>>       BUG_ON(ret); /* logic error */
>>>>       ret = alloc_reserved_file_extent(trans, root, 0, root_objectid,
>>>>                        0, owner, offset, ins, 1);
>>>> +    btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>>> +                       root->root_key.objectid,
>>>> +                       ins->offset, 1);
>>>>       btrfs_put_block_group(block_group);
>>>>       return ret;
>>>>   }
>>>> @@ -7346,6 +7360,10 @@ struct extent_buffer 
>>>> *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
>>>>           return ERR_PTR(ret);
>>>>       }
>>>>
>>>> +    btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>>> +                       root_objectid,
>>>> +                       ins.offset, 1);
>>>> +
>>>>       buf = btrfs_init_new_buffer(trans, root, ins.objectid,
>>>>                       blocksize, level);
>>>>       BUG_ON(IS_ERR(buf)); /* -ENOMEM */
>>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>>> index d23362f..0c824f3 100644
>>>> --- a/fs/btrfs/inode.c
>>>> +++ b/fs/btrfs/inode.c
>>>> @@ -59,6 +59,7 @@
>>>>   #include "backref.h"
>>>>   #include "hash.h"
>>>>   #include "props.h"
>>>> +#include "qgroup.h"
>>>>
>>>>   struct btrfs_iget_args {
>>>>       struct btrfs_key *location;
>>>> @@ -739,7 +740,9 @@ retry:
>>>>               }
>>>>               goto out_free;
>>>>           }
>>>> -
>>>> + btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>>> +                           root->root_key.objectid,
>>>> +                           ins.offset, 1);
>>>>           /*
>>>>            * here we're doing allocation and writeback of the
>>>>            * compressed pages
>>>> @@ -951,6 +954,10 @@ static noinline int cow_file_range(struct 
>>>> inode *inode,
>>>>           if (ret < 0)
>>>>               goto out_unlock;
>>>>
>>>> + btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>>> +                           root->root_key.objectid,
>>>> +                           ins.offset, 1);
>>>> +
>>>>           em = alloc_extent_map();
>>>>           if (!em) {
>>>>               ret = -ENOMEM;
>>>> @@ -6745,6 +6752,10 @@ static struct extent_map 
>>>> *btrfs_new_extent_direct(struct inode *inode,
>>>>           return ERR_PTR(ret);
>>>>       }
>>>>
>>>> +    btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>>> +                       root->root_key.objectid,
>>>> +                       ins.offset, 1);
>>>> +
>>>>       return em;
>>>>   }
>>>>
>>>> @@ -9266,6 +9277,11 @@ static int 
>>>> __btrfs_prealloc_file_range(struct inode *inode, int mode,
>>>>                   btrfs_end_transaction(trans, root);
>>>>               break;
>>>>           }
>>>> +
>>>> + btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>>> +                           root->root_key.objectid,
>>>> +                           ins.offset, 1);
>>>> +
>>>>           btrfs_drop_extent_cache(inode, cur_offset,
>>>>                       cur_offset + ins.offset -1, 0);
>>>>
>>>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>>>> index 48b60db..d147082 100644
>>>> --- a/fs/btrfs/qgroup.c
>>>> +++ b/fs/btrfs/qgroup.c
>>>> @@ -72,6 +72,7 @@ struct btrfs_qgroup {
>>>>       /*
>>>>        * reservation tracking
>>>>        */
>>>> +    u64 may_use;
>>>>       u64 reserved;
>>>>
>>>>       /*
>>>> @@ -1414,6 +1415,8 @@ static int qgroup_excl_accounting(struct 
>>>> btrfs_fs_info *fs_info,
>>>>       WARN_ON(sign < 0 && qgroup->excl < oper->num_bytes);
>>>>       qgroup->excl += sign * oper->num_bytes;
>>>>       qgroup->excl_cmpr += sign * oper->num_bytes;
>>>> +    if (sign > 0)
>>>> +        qgroup->reserved -= oper->num_bytes;
>>>>
>>>>       qgroup_dirty(fs_info, qgroup);
>>>>
>>>> @@ -1434,6 +1437,8 @@ static int qgroup_excl_accounting(struct 
>>>> btrfs_fs_info *fs_info,
>>>>           qgroup->excl += sign * oper->num_bytes;
>>>>           if (sign < 0)
>>>>               WARN_ON(qgroup->excl < oper->num_bytes);
>>>> +        if (sign > 0)
>>>> +            qgroup->reserved -= oper->num_bytes;
>>>>           qgroup->excl_cmpr += sign * oper->num_bytes;
>>>>           qgroup_dirty(fs_info, qgroup);
>>>>
>>>> @@ -2359,6 +2364,61 @@ out:
>>>>       return ret;
>>>>   }
>>>>
>>>> +int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
>>>> +                        u64 ref_root,
>>>> +                        u64 num_bytes,
>>>> +                        int sign)
>>>> +{
>>>> +    struct btrfs_root *quota_root;
>>>> +    struct btrfs_qgroup *qgroup;
>>>> +    int ret = 0;
>>>> +    struct ulist_node *unode;
>>>> +    struct ulist_iterator uiter;
>>>> +
>>>> +    if (!is_fstree(ref_root) || !fs_info->quota_enabled)
>>>> +        return 0;
>>>> +
>>>> +    if (num_bytes == 0)
>>>> +        return 0;
>>>> +
>>>> +    spin_lock(&fs_info->qgroup_lock);
>>>> +    quota_root = fs_info->quota_root;
>>>> +    if (!quota_root)
>>>> +        goto out;
>>>> +
>>>> +    qgroup = find_qgroup_rb(fs_info, ref_root);
>>>> +    if (!qgroup)
>>>> +        goto out;
>>>> +
>>>> +    ulist_reinit(fs_info->qgroup_ulist);
>>>> +    ret = ulist_add(fs_info->qgroup_ulist, qgroup->qgroupid,
>>>> +            (uintptr_t)qgroup, GFP_ATOMIC);
>>>> +    if (ret < 0)
>>>> +        goto out;
>>>> +
>>>> +    ULIST_ITER_INIT(&uiter);
>>>> +    while ((unode = ulist_next(fs_info->qgroup_ulist, &uiter))) {
>>>> +        struct btrfs_qgroup *qg;
>>>> +        struct btrfs_qgroup_list *glist;
>>>> +
>>>> +        qg = u64_to_ptr(unode->aux);
>>>> +
>>>> +        qg->reserved += sign * num_bytes;
>>>> +
>>>> +        list_for_each_entry(glist, &qg->groups, next_group) {
>>>> +            ret = ulist_add(fs_info->qgroup_ulist,
>>>> +                    glist->group->qgroupid,
>>>> +                    (uintptr_t)glist->group, GFP_ATOMIC);
>>>> +            if (ret < 0)
>>>> +                goto out;
>>>> +        }
>>>> +    }
>>>> +
>>>> +out:
>>>> +    spin_unlock(&fs_info->qgroup_lock);
>>>> +    return ret;
>>>> +}
>>>> +
>>>>   /*
>>>>    * reserve some space for a qgroup and all its parents. The 
>>>> reservation takes
>>>>    * place with start_transaction or dealloc_reserve, similar to 
>>>> ENOSPC
>>>> @@ -2407,14 +2467,14 @@ int btrfs_qgroup_reserve(struct btrfs_root 
>>>> *root, u64 num_bytes)
>>>>           qg = u64_to_ptr(unode->aux);
>>>>
>>>>           if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
>>>> -            qg->reserved + (s64)qg->rfer + num_bytes >
>>>> +            qg->reserved + qg->may_use + (s64)qg->rfer + num_bytes >
>>>>               qg->max_rfer) {
>>>>               ret = -EDQUOT;
>>>>               goto out;
>>>>           }
>>>>
>>>>           if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) &&
>>>> -            qg->reserved + (s64)qg->excl + num_bytes >
>>>> +            qg->reserved + qg->may_use + (s64)qg->excl + num_bytes >
>>>>               qg->max_excl) {
>>>>               ret = -EDQUOT;
>>>>               goto out;
>>>> @@ -2438,7 +2498,7 @@ int btrfs_qgroup_reserve(struct btrfs_root 
>>>> *root, u64 num_bytes)
>>>>
>>>>           qg = u64_to_ptr(unode->aux);
>>>>
>>>> -        qg->reserved += num_bytes;
>>>> +        qg->may_use += num_bytes;
>>>>       }
>>>>
>>>>   out:
>>>> @@ -2484,7 +2544,7 @@ void btrfs_qgroup_free(struct btrfs_root 
>>>> *root, u64 num_bytes)
>>>>
>>>>           qg = u64_to_ptr(unode->aux);
>>>>
>>>> -        qg->reserved -= num_bytes;
>>>> +        qg->may_use -= num_bytes;
>>>>
>>>>           list_for_each_entry(glist, &qg->groups, next_group) {
>>>>               ret = ulist_add(fs_info->qgroup_ulist,
>>>> diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
>>>> index 18cc68c..31de88e 100644
>>>> --- a/fs/btrfs/qgroup.h
>>>> +++ b/fs/btrfs/qgroup.h
>>>> @@ -95,6 +95,10 @@ int btrfs_run_qgroups(struct btrfs_trans_handle 
>>>> *trans,
>>>>   int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
>>>>                struct btrfs_fs_info *fs_info, u64 srcid, u64 objectid,
>>>>                struct btrfs_qgroup_inherit *inherit);
>>>> +int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
>>>> +                       u64 ref_root,
>>>> +                       u64 num_bytes,
>>>> +                       int sign);
>>>>   int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes);
>>>>   void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes);
>>>>
>>>
>>
>> .
>>
>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> .
>

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dongsheng Yang Dec. 28, 2014, 2:03 a.m. UTC | #4
On Fri, Dec 26, 2014 at 1:43 PM, Satoru Takeuchi
<takeuchi_satoru@jp.fujitsu.com> wrote:
> Hi Yang,
>
> On 2014/12/26 10:32, Dongsheng Yang wrote:
>>
>> Hi Satoru,
>>
>> I saw your mail of "[BUG] "Quota Ignored On write" problem still exist
>> with 3.16-rc5
>> <http://news.gmane.org/find-root.php?message_id=53C8DEB0.1060404%40jp.fujitsu.com>"
>> in gmane.
>> I guess this patch will fix the problem you mentioned.
>> Could you help to give a try?
>
>
> Although it reached disk quota before writing whole 1.5GB,
> the copied size was only about 700 MB. In this case,
> expected behavior is to reach disk quota just after
> writing 1 GB.

Hi Satoru-san,

After some more investigation, I did meet this problem you described here.
There is a window between "reserved += num_bytes" and "may_use -= num_bytes".
And the similar problem exists in real space accounting. I am cooking a patch
to solve this problem.

Thanx for your test a lot.

Yang
>
> * 3.19-rc1 without your two v2 patches
>
> ===============================================================================
> + mkfs.btrfs -f /dev/vdb
> Btrfs v3.17
> See http://btrfs.wiki.kernel.org for more information.
>
> Turning ON incompat feature 'extref': increased hardlink limit per file to
> 65536
> fs created label (null) on /dev/vdb
>         nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB
> + mount /dev/vdb /root/btrfs-auto-test/
> + ret=0
> + btrfs quota enable /root/btrfs-auto-test/
> + btrfs subvolume create /root/btrfs-auto-test//sub
> Create subvolume '/root/btrfs-auto-test/sub'
> + btrfs qgroup limit 1G /root/btrfs-auto-test//sub
> + dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 count=1500000
> 1500000+0 records in
> 1500000+0 records out
> 1536000000 bytes (1.5 GB) copied, 25.6601 s, 59.9 MB/s
> ===============================================================================
>
> * 3.19-rc1 with your two v2 patches
>
> ===============================================================================
> + mkfs.btrfs -f /dev/vdb
> Btrfs v3.17
> See http://btrfs.wiki.kernel.org for more information.
>
> Turning ON incompat feature 'extref': increased hardlink limit per file to
> 65536
> fs created label (null) on /dev/vdb
>         nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB
> + mount /dev/vdb /root/btrfs-auto-test/
> + ret=0
> + btrfs quota enable /root/btrfs-auto-test/
> + btrfs subvolume create /root/btrfs-auto-test//sub
> Create subvolume '/root/btrfs-auto-test/sub'
> + btrfs qgroup limit 1G /root/btrfs-auto-test//sub
> + dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 count=1500000
> dd: error writing '/root/btrfs-auto-test//sub/file': Disk quota exceeded
> 681353+0 records in
> 681352+0 records out
> 697704448 bytes (698 MB) copied, 8.15563 s, 85.5 MB/s
> ===============================================================================
>
> Thanks,
> Satoru
>
>>
>> Thanx
>> Yang
>>
>>
>> On 12/12/2014 04:44 PM, Dongsheng Yang wrote:
>>>
>>> Currently, for pre_alloc or delay_alloc, the bytes will be accounted
>>> in space_info by the three guys.
>>> space_info->bytes_may_use --- space_info->reserved --- space_info->used.
>>> But on the other hand, in qgroup, there are only two counters to account
>>> the
>>> bytes, qgroup->reserved and qgroup->excl. And qg->reserved accounts
>>> bytes in space_info->bytes_may_use and qg->excl accounts bytes in
>>> space_info->used. So the bytes in space_info->reserved is not accounted
>>> in qgroup. If so, there is a window we can exceed the quota limit when
>>> bytes is in space_info->reserved.
>>>
>>> Example:
>>>         # btrfs quota enable /mnt
>>>         # btrfs qgroup limit -e 10M /mnt
>>>         # for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
>>>         # sync
>>>         # btrfs qgroup show -pcre /mnt
>>> qgroupid rfer     excl     max_rfer max_excl parent  child
>>> -------- ----     ----     -------- -------- ------  -----
>>> 0/5      20987904 20987904 0        10485760 ---     ---
>>>
>>> qg->excl is 20987904 larger than max_excl 10485760.
>>>
>>> This patch introduce a new counter named may_use to qgroup, then
>>> there are three counters in qgroup to account bytes in space_info
>>> as below.
>>> space_info->bytes_may_use --- space_info->reserved --- space_info->used.
>>> qgroup->may_use           --- qgroup->reserved     --- qgroup->excl
>>>
>>> With this patch applied:
>>>         # btrfs quota enable /mnt
>>>         # btrfs qgroup limit -e 10M /mnt
>>>         # for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
>>> fallocate: /mnt/data9: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data10: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data11: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data12: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data13: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data14: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data15: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data16: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data17: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data18: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data19: fallocate failed: Disk quota exceeded
>>>         # sync
>>>         # btrfs qgroup show -pcre /mnt
>>> qgroupid rfer    excl    max_rfer max_excl parent  child
>>> -------- ----    ----    -------- -------- ------  -----
>>> 0/5      9453568 9453568 0        10485760 ---     ---
>>>
>>> Reported-by: Cyril SCETBON<cyril.scetbon@free.fr>
>>> Signed-off-by: Dongsheng Yang<yangds.fnst@cn.fujitsu.com>
>>> ---
>>> Changelog:
>>>         v1 -> v2:
>>>                 Remove the redundant check for fs_info->quota_enabled.
>>>
>>>   fs/btrfs/extent-tree.c | 20 ++++++++++++++-
>>>   fs/btrfs/inode.c       | 18 ++++++++++++-
>>>   fs/btrfs/qgroup.c      | 68
>>> +++++++++++++++++++++++++++++++++++++++++++++++---
>>>   fs/btrfs/qgroup.h      |  4 +++
>>>   4 files changed, 104 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index 014b7f2..f4ad737 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -5500,8 +5500,12 @@ static int pin_down_extent(struct btrfs_root
>>> *root,
>>>
>>>         set_extent_dirty(root->fs_info->pinned_extents, bytenr,
>>>                          bytenr + num_bytes - 1, GFP_NOFS |
>>> __GFP_NOFAIL);
>>> -       if (reserved)
>>> +       if (reserved) {
>>> +               btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +
>>> root->root_key.objectid,
>>> +                                                  num_bytes, -1);
>>>                 trace_btrfs_reserved_extent_free(root, bytenr,
>>> num_bytes);
>>> +       }
>>>         return 0;
>>>   }
>>>
>>> @@ -6230,6 +6234,9 @@ void btrfs_free_tree_block(struct
>>> btrfs_trans_handle *trans,
>>>                 btrfs_update_reserved_bytes(cache, buf->len,
>>> RESERVE_FREE, 0);
>>>                 trace_btrfs_reserved_extent_free(root, buf->start,
>>> buf->len);
>>>                 pin = 0;
>>> +               btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +
>>> root->root_key.objectid,
>>> +                                                  buf->len, -1);
>>>         }
>>>   out:
>>>         if (pin)
>>> @@ -6964,7 +6971,11 @@ static int __btrfs_free_reserved_extent(struct
>>> btrfs_root *root,
>>>         else {
>>>                 btrfs_add_free_space(cache, start, len);
>>>                 btrfs_update_reserved_bytes(cache, len, RESERVE_FREE,
>>> delalloc);
>>> +               btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +
>>> root->root_key.objectid,
>>> +                                                  len, -1);
>>>         }
>>> +
>>>         btrfs_put_block_group(cache);
>>>
>>>         trace_btrfs_reserved_extent_free(root, start, len);
>>> @@ -7200,6 +7211,9 @@ int btrfs_alloc_logged_file_extent(struct
>>> btrfs_trans_handle *trans,
>>>         BUG_ON(ret); /* logic error */
>>>         ret = alloc_reserved_file_extent(trans, root, 0, root_objectid,
>>>                                          0, owner, offset, ins, 1);
>>> +       btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                                          root->root_key.objectid,
>>> +                                          ins->offset, 1);
>>>         btrfs_put_block_group(block_group);
>>>         return ret;
>>>   }
>>> @@ -7346,6 +7360,10 @@ struct extent_buffer
>>> *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
>>>                 return ERR_PTR(ret);
>>>         }
>>>
>>> +       btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                                          root_objectid,
>>> +                                          ins.offset, 1);
>>> +
>>>         buf = btrfs_init_new_buffer(trans, root, ins.objectid,
>>>                                     blocksize, level);
>>>         BUG_ON(IS_ERR(buf)); /* -ENOMEM */
>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>> index d23362f..0c824f3 100644
>>> --- a/fs/btrfs/inode.c
>>> +++ b/fs/btrfs/inode.c
>>> @@ -59,6 +59,7 @@
>>>   #include "backref.h"
>>>   #include "hash.h"
>>>   #include "props.h"
>>> +#include "qgroup.h"
>>>
>>>   struct btrfs_iget_args {
>>>         struct btrfs_key *location;
>>> @@ -739,7 +740,9 @@ retry:
>>>                         }
>>>                         goto out_free;
>>>                 }
>>> -
>>> +               btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +
>>> root->root_key.objectid,
>>> +                                                  ins.offset, 1);
>>>                 /*
>>>                  * here we're doing allocation and writeback of the
>>>                  * compressed pages
>>> @@ -951,6 +954,10 @@ static noinline int cow_file_range(struct inode
>>> *inode,
>>>                 if (ret < 0)
>>>                         goto out_unlock;
>>>
>>> +               btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +
>>> root->root_key.objectid,
>>> +                                                  ins.offset, 1);
>>> +
>>>                 em = alloc_extent_map();
>>>                 if (!em) {
>>>                         ret = -ENOMEM;
>>> @@ -6745,6 +6752,10 @@ static struct extent_map
>>> *btrfs_new_extent_direct(struct inode *inode,
>>>                 return ERR_PTR(ret);
>>>         }
>>>
>>> +       btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                                          root->root_key.objectid,
>>> +                                          ins.offset, 1);
>>> +
>>>         return em;
>>>   }
>>>
>>> @@ -9266,6 +9277,11 @@ static int __btrfs_prealloc_file_range(struct
>>> inode *inode, int mode,
>>>                                 btrfs_end_transaction(trans, root);
>>>                         break;
>>>                 }
>>> +
>>> +               btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +
>>> root->root_key.objectid,
>>> +                                                  ins.offset, 1);
>>> +
>>>                 btrfs_drop_extent_cache(inode, cur_offset,
>>>                                         cur_offset + ins.offset -1, 0);
>>>
>>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>>> index 48b60db..d147082 100644
>>> --- a/fs/btrfs/qgroup.c
>>> +++ b/fs/btrfs/qgroup.c
>>> @@ -72,6 +72,7 @@ struct btrfs_qgroup {
>>>         /*
>>>          * reservation tracking
>>>          */
>>> +       u64 may_use;
>>>         u64 reserved;
>>>
>>>         /*
>>> @@ -1414,6 +1415,8 @@ static int qgroup_excl_accounting(struct
>>> btrfs_fs_info *fs_info,
>>>         WARN_ON(sign < 0 && qgroup->excl < oper->num_bytes);
>>>         qgroup->excl += sign * oper->num_bytes;
>>>         qgroup->excl_cmpr += sign * oper->num_bytes;
>>> +       if (sign > 0)
>>> +               qgroup->reserved -= oper->num_bytes;
>>>
>>>         qgroup_dirty(fs_info, qgroup);
>>>
>>> @@ -1434,6 +1437,8 @@ static int qgroup_excl_accounting(struct
>>> btrfs_fs_info *fs_info,
>>>                 qgroup->excl += sign * oper->num_bytes;
>>>                 if (sign < 0)
>>>                         WARN_ON(qgroup->excl < oper->num_bytes);
>>> +               if (sign > 0)
>>> +                       qgroup->reserved -= oper->num_bytes;
>>>                 qgroup->excl_cmpr += sign * oper->num_bytes;
>>>                 qgroup_dirty(fs_info, qgroup);
>>>
>>> @@ -2359,6 +2364,61 @@ out:
>>>         return ret;
>>>   }
>>>
>>> +int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
>>> +                                           u64 ref_root,
>>> +                                           u64 num_bytes,
>>> +                                           int sign)
>>> +{
>>> +       struct btrfs_root *quota_root;
>>> +       struct btrfs_qgroup *qgroup;
>>> +       int ret = 0;
>>> +       struct ulist_node *unode;
>>> +       struct ulist_iterator uiter;
>>> +
>>> +       if (!is_fstree(ref_root) || !fs_info->quota_enabled)
>>> +               return 0;
>>> +
>>> +       if (num_bytes == 0)
>>> +               return 0;
>>> +
>>> +       spin_lock(&fs_info->qgroup_lock);
>>> +       quota_root = fs_info->quota_root;
>>> +       if (!quota_root)
>>> +               goto out;
>>> +
>>> +       qgroup = find_qgroup_rb(fs_info, ref_root);
>>> +       if (!qgroup)
>>> +               goto out;
>>> +
>>> +       ulist_reinit(fs_info->qgroup_ulist);
>>> +       ret = ulist_add(fs_info->qgroup_ulist, qgroup->qgroupid,
>>> +                       (uintptr_t)qgroup, GFP_ATOMIC);
>>> +       if (ret < 0)
>>> +               goto out;
>>> +
>>> +       ULIST_ITER_INIT(&uiter);
>>> +       while ((unode = ulist_next(fs_info->qgroup_ulist, &uiter))) {
>>> +               struct btrfs_qgroup *qg;
>>> +               struct btrfs_qgroup_list *glist;
>>> +
>>> +               qg = u64_to_ptr(unode->aux);
>>> +
>>> +               qg->reserved += sign * num_bytes;
>>> +
>>> +               list_for_each_entry(glist, &qg->groups, next_group) {
>>> +                       ret = ulist_add(fs_info->qgroup_ulist,
>>> +                                       glist->group->qgroupid,
>>> +                                       (uintptr_t)glist->group,
>>> GFP_ATOMIC);
>>> +                       if (ret < 0)
>>> +                               goto out;
>>> +               }
>>> +       }
>>> +
>>> +out:
>>> +       spin_unlock(&fs_info->qgroup_lock);
>>> +       return ret;
>>> +}
>>> +
>>>   /*
>>>    * reserve some space for a qgroup and all its parents. The reservation
>>> takes
>>>    * place with start_transaction or dealloc_reserve, similar to ENOSPC
>>> @@ -2407,14 +2467,14 @@ int btrfs_qgroup_reserve(struct btrfs_root *root,
>>> u64 num_bytes)
>>>                 qg = u64_to_ptr(unode->aux);
>>>
>>>                 if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
>>> -                   qg->reserved + (s64)qg->rfer + num_bytes >
>>> +                   qg->reserved + qg->may_use + (s64)qg->rfer +
>>> num_bytes >
>>>                     qg->max_rfer) {
>>>                         ret = -EDQUOT;
>>>                         goto out;
>>>                 }
>>>
>>>                 if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) &&
>>> -                   qg->reserved + (s64)qg->excl + num_bytes >
>>> +                   qg->reserved + qg->may_use + (s64)qg->excl +
>>> num_bytes >
>>>                     qg->max_excl) {
>>>                         ret = -EDQUOT;
>>>                         goto out;
>>> @@ -2438,7 +2498,7 @@ int btrfs_qgroup_reserve(struct btrfs_root *root,
>>> u64 num_bytes)
>>>
>>>                 qg = u64_to_ptr(unode->aux);
>>>
>>> -               qg->reserved += num_bytes;
>>> +               qg->may_use += num_bytes;
>>>         }
>>>
>>>   out:
>>> @@ -2484,7 +2544,7 @@ void btrfs_qgroup_free(struct btrfs_root *root, u64
>>> num_bytes)
>>>
>>>                 qg = u64_to_ptr(unode->aux);
>>>
>>> -               qg->reserved -= num_bytes;
>>> +               qg->may_use -= num_bytes;
>>>
>>>                 list_for_each_entry(glist, &qg->groups, next_group) {
>>>                         ret = ulist_add(fs_info->qgroup_ulist,
>>> diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
>>> index 18cc68c..31de88e 100644
>>> --- a/fs/btrfs/qgroup.h
>>> +++ b/fs/btrfs/qgroup.h
>>> @@ -95,6 +95,10 @@ int btrfs_run_qgroups(struct btrfs_trans_handle
>>> *trans,
>>>   int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
>>>                          struct btrfs_fs_info *fs_info, u64 srcid, u64
>>> objectid,
>>>                          struct btrfs_qgroup_inherit *inherit);
>>> +int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
>>> +                                      u64 ref_root,
>>> +                                      u64 num_bytes,
>>> +                                      int sign);
>>>   int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes);
>>>   void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes);
>>>
>>
>
> --
>
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yang Dongsheng Jan. 5, 2015, 6:16 a.m. UTC | #5
Hi Josef and others,

This patch set is about enhancing qgroup.

[1/3]: fix a bug about qgroup leak when we exceed quota limit,
	It is reviewd by Josef.
[2/3]: introduce a new accounter in qgroup to close a window where
	user will exceed the limit by qgroup. It "looks good" to Josef.
[3/3]: a new patch to fix a bug reported by Satoru.

BTW, I have some other plan about qgroup in my TODO list:

Kernel:
a). adjust the accounters in parent qgroup when we move
the child qgroup.
	Currently, when we move a qgroup, the parent qgroup
will not updated at the same time. This will cause some wrong
numbers in qgroup.

b). add a ioctl to show the qgroup info.
	Command "btrfs qgroup show" is showing the qgroup info
read from qgroup tree. But there is some information in memory
which is not synced into device. Then it will show some outdate
number.

c). limit and account size in 3 modes, data, metadata and both.
	qgroup is accounting the size both of data and metadata
togather, but to a user, the data size is the most useful to them.

d). remove a subvolume related qgroup when subvolume is deleted and
there is no other reference to it.

user-tool:
a). Add the unit of B/K/M/G to btrfs qgroup show.
b). get the information via ioctl rather than reading it from
btree. Will keep the old way as a fallback for compatiblity.

Any comment and sugguestion is welcome. :)

Yang

Dongsheng Yang (3):
  Btrfs: qgroup: free reserved in exceeding quota.
  Btrfs: qgroup: Introduce a may_use to account
    space_info->bytes_may_use.
  Btrfs: qgroup, Account data space in more proper timings.

 fs/btrfs/extent-tree.c | 41 +++++++++++++++++++++++-------
 fs/btrfs/file.c        |  9 -------
 fs/btrfs/inode.c       | 18 ++++++++++++-
 fs/btrfs/qgroup.c      | 68 +++++++++++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/qgroup.h      |  4 +++
 5 files changed, 117 insertions(+), 23 deletions(-)
Satoru Takeuchi Jan. 7, 2015, 12:49 a.m. UTC | #6
Hi Yang,

On 2015/01/05 15:16, Dongsheng Yang wrote:
> Hi Josef and others,
> 
> This patch set is about enhancing qgroup.
> 
> [1/3]: fix a bug about qgroup leak when we exceed quota limit,
> 	It is reviewd by Josef.
> [2/3]: introduce a new accounter in qgroup to close a window where
> 	user will exceed the limit by qgroup. It "looks good" to Josef.
> [3/3]: a new patch to fix a bug reported by Satoru.

I tested your the patchset v3. Although it's far better
than the patchset v2, there is still one problem in this patchset.
When I wrote 1.5GiB to a subvolume with 1.0 GiB limit,
1.0GiB - 139 block (in this case, 1KiB/block) was written.

I consider user should be able to write just 1.0GiB in this case.

* Test result

===============================================================================
+ mkfs.btrfs -f /dev/vdb
Btrfs v3.17
See http://btrfs.wiki.kernel.org for more information.

Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
fs created label (null) on /dev/vdb
        nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB
+ mount /dev/vdb /root/btrfs-auto-test/
+ ret=0
+ btrfs quota enable /root/btrfs-auto-test/
+ btrfs subvolume create /root/btrfs-auto-test//sub
Create subvolume '/root/btrfs-auto-test/sub'
+ btrfs qgroup limit 1G /root/btrfs-auto-test//sub
+ dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 count=1500000
dd: error writing '/root/btrfs-auto-test//sub/file': Disk quota exceeded
1048438+0 records in                    # Tried to write 1GiB - 138 KiB
1048437+0 records out                   # Succeeded to write 1GiB - 139 KiB
1073599488 bytes (1.1 GB) copied, 19.0247 s, 56.4 MB/s
===============================================================================

* note

I tried to run the reproducer five times and the result is
a bit different for each time.

=========================
#	Written
-------------------------
1	1GiB - 139 KiB
2	1GiB - 139 KiB
3	1GiB - 145 KiB
4	1GiB - 135 KiB
5	1GiB - 135 KiB
==========================

So I consider it's a problem comes from timing.

If I changed the block size from 1KiB to 1 MiB,
the difference in bytes got larger.

============================
#	Written
----------------------------
1	1GiB - 1 MiB
2	1GiB - 1 MiB
3	1GiB - 1 MiB
4	1GiB - 1 MiB
5	1GiB - 1 MiB
============================

Thanks,
Satoru

> 
> BTW, I have some other plan about qgroup in my TODO list:
> 
> Kernel:
> a). adjust the accounters in parent qgroup when we move
> the child qgroup.
> 	Currently, when we move a qgroup, the parent qgroup
> will not updated at the same time. This will cause some wrong
> numbers in qgroup.
> 
> b). add a ioctl to show the qgroup info.
> 	Command "btrfs qgroup show" is showing the qgroup info
> read from qgroup tree. But there is some information in memory
> which is not synced into device. Then it will show some outdate
> number.
> 
> c). limit and account size in 3 modes, data, metadata and both.
> 	qgroup is accounting the size both of data and metadata
> togather, but to a user, the data size is the most useful to them.
> 
> d). remove a subvolume related qgroup when subvolume is deleted and
> there is no other reference to it.
> 
> user-tool:
> a). Add the unit of B/K/M/G to btrfs qgroup show.
> b). get the information via ioctl rather than reading it from
> btree. Will keep the old way as a fallback for compatiblity.
> 
> Any comment and sugguestion is welcome. :)
> 
> Yang
> 
> Dongsheng Yang (3):
>    Btrfs: qgroup: free reserved in exceeding quota.
>    Btrfs: qgroup: Introduce a may_use to account
>      space_info->bytes_may_use.
>    Btrfs: qgroup, Account data space in more proper timings.
> 
>   fs/btrfs/extent-tree.c | 41 +++++++++++++++++++++++-------
>   fs/btrfs/file.c        |  9 -------
>   fs/btrfs/inode.c       | 18 ++++++++++++-
>   fs/btrfs/qgroup.c      | 68 +++++++++++++++++++++++++++++++++++++++++++++++---
>   fs/btrfs/qgroup.h      |  4 +++
>   5 files changed, 117 insertions(+), 23 deletions(-)
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yang Dongsheng Jan. 8, 2015, 3:53 a.m. UTC | #7
On 01/07/2015 08:49 AM, Satoru Takeuchi wrote:
> Hi Yang,
>
> On 2015/01/05 15:16, Dongsheng Yang wrote:
>> Hi Josef and others,
>>
>> This patch set is about enhancing qgroup.
>>
>> [1/3]: fix a bug about qgroup leak when we exceed quota limit,
>> 	It is reviewd by Josef.
>> [2/3]: introduce a new accounter in qgroup to close a window where
>> 	user will exceed the limit by qgroup. It "looks good" to Josef.
>> [3/3]: a new patch to fix a bug reported by Satoru.
> I tested your the patchset v3. Although it's far better
> than the patchset v2, there is still one problem in this patchset.
> When I wrote 1.5GiB to a subvolume with 1.0 GiB limit,
> 1.0GiB - 139 block (in this case, 1KiB/block) was written.
>
> I consider user should be able to write just 1.0GiB in this case.

Hi Satoru,

Yes, Currently, user can not write 1.0GiB in this case. Because qgroup
is accounting data and metadata togather. And I have posted an idea
in this thread that split it into three modes, data, metadata and both.

TODO issues:
c). limit and account size in 3 modes, data, metadata and both.
	qgroup is accounting the size both of data and metadata
togather, but to users, the data size is the most useful to them.


But, you mentioned that the result is different in each time.
Hmmm.... there must be something wrong in it. I need some more
investigation to answer this question.

Thanx a lot for your test!
Yang
>
> * Test result
>
> ===============================================================================
> + mkfs.btrfs -f /dev/vdb
> Btrfs v3.17
> See http://btrfs.wiki.kernel.org for more information.
>
> Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
> fs created label (null) on /dev/vdb
>         nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB
> + mount /dev/vdb /root/btrfs-auto-test/
> + ret=0
> + btrfs quota enable /root/btrfs-auto-test/
> + btrfs subvolume create /root/btrfs-auto-test//sub
> Create subvolume '/root/btrfs-auto-test/sub'
> + btrfs qgroup limit 1G /root/btrfs-auto-test//sub
> + dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 count=1500000
> dd: error writing '/root/btrfs-auto-test//sub/file': Disk quota exceeded
> 1048438+0 records in                    # Tried to write 1GiB - 138 KiB
> 1048437+0 records out                   # Succeeded to write 1GiB - 139 KiB
> 1073599488 bytes (1.1 GB) copied, 19.0247 s, 56.4 MB/s
> ===============================================================================
>
> * note
>
> I tried to run the reproducer five times and the result is
> a bit different for each time.
>
> =========================
> #	Written
> -------------------------
> 1	1GiB - 139 KiB
> 2	1GiB - 139 KiB
> 3	1GiB - 145 KiB
> 4	1GiB - 135 KiB
> 5	1GiB - 135 KiB
> ==========================
>
> So I consider it's a problem comes from timing.
>
> If I changed the block size from 1KiB to 1 MiB,
> the difference in bytes got larger.
>
> ============================
> #	Written
> ----------------------------
> 1	1GiB - 1 MiB
> 2	1GiB - 1 MiB
> 3	1GiB - 1 MiB
> 4	1GiB - 1 MiB
> 5	1GiB - 1 MiB
> ============================
>
> Thanks,
> Satoru
>
>> BTW, I have some other plan about qgroup in my TODO list:
>>
>> Kernel:
>> a). adjust the accounters in parent qgroup when we move
>> the child qgroup.
>> 	Currently, when we move a qgroup, the parent qgroup
>> will not updated at the same time. This will cause some wrong
>> numbers in qgroup.
>>
>> b). add a ioctl to show the qgroup info.
>> 	Command "btrfs qgroup show" is showing the qgroup info
>> read from qgroup tree. But there is some information in memory
>> which is not synced into device. Then it will show some outdate
>> number.
>>
>> c). limit and account size in 3 modes, data, metadata and both.
>> 	qgroup is accounting the size both of data and metadata
>> togather, but to a user, the data size is the most useful to them.
>>
>> d). remove a subvolume related qgroup when subvolume is deleted and
>> there is no other reference to it.
>>
>> user-tool:
>> a). Add the unit of B/K/M/G to btrfs qgroup show.
>> b). get the information via ioctl rather than reading it from
>> btree. Will keep the old way as a fallback for compatiblity.
>>
>> Any comment and sugguestion is welcome. :)
>>
>> Yang
>>
>> Dongsheng Yang (3):
>>    Btrfs: qgroup: free reserved in exceeding quota.
>>    Btrfs: qgroup: Introduce a may_use to account
>>      space_info->bytes_may_use.
>>    Btrfs: qgroup, Account data space in more proper timings.
>>
>>   fs/btrfs/extent-tree.c | 41 +++++++++++++++++++++++-------
>>   fs/btrfs/file.c        |  9 -------
>>   fs/btrfs/inode.c       | 18 ++++++++++++-
>>   fs/btrfs/qgroup.c      | 68 +++++++++++++++++++++++++++++++++++++++++++++++---
>>   fs/btrfs/qgroup.h      |  4 +++
>>   5 files changed, 117 insertions(+), 23 deletions(-)
>>
> .
>

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 014b7f2..f4ad737 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5500,8 +5500,12 @@  static int pin_down_extent(struct btrfs_root *root,
 
 	set_extent_dirty(root->fs_info->pinned_extents, bytenr,
 			 bytenr + num_bytes - 1, GFP_NOFS | __GFP_NOFAIL);
-	if (reserved)
+	if (reserved) {
+		btrfs_qgroup_update_reserved_bytes(root->fs_info,
+						   root->root_key.objectid,
+						   num_bytes, -1);
 		trace_btrfs_reserved_extent_free(root, bytenr, num_bytes);
+	}
 	return 0;
 }
 
@@ -6230,6 +6234,9 @@  void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 		btrfs_update_reserved_bytes(cache, buf->len, RESERVE_FREE, 0);
 		trace_btrfs_reserved_extent_free(root, buf->start, buf->len);
 		pin = 0;
+		btrfs_qgroup_update_reserved_bytes(root->fs_info,
+						   root->root_key.objectid,
+						   buf->len, -1);
 	}
 out:
 	if (pin)
@@ -6964,7 +6971,11 @@  static int __btrfs_free_reserved_extent(struct btrfs_root *root,
 	else {
 		btrfs_add_free_space(cache, start, len);
 		btrfs_update_reserved_bytes(cache, len, RESERVE_FREE, delalloc);
+		btrfs_qgroup_update_reserved_bytes(root->fs_info,
+						   root->root_key.objectid,
+						   len, -1);
 	}
+
 	btrfs_put_block_group(cache);
 
 	trace_btrfs_reserved_extent_free(root, start, len);
@@ -7200,6 +7211,9 @@  int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
 	BUG_ON(ret); /* logic error */
 	ret = alloc_reserved_file_extent(trans, root, 0, root_objectid,
 					 0, owner, offset, ins, 1);
+	btrfs_qgroup_update_reserved_bytes(root->fs_info,
+					   root->root_key.objectid,
+					   ins->offset, 1);
 	btrfs_put_block_group(block_group);
 	return ret;
 }
@@ -7346,6 +7360,10 @@  struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
 		return ERR_PTR(ret);
 	}
 
+	btrfs_qgroup_update_reserved_bytes(root->fs_info,
+					   root_objectid,
+					   ins.offset, 1);
+
 	buf = btrfs_init_new_buffer(trans, root, ins.objectid,
 				    blocksize, level);
 	BUG_ON(IS_ERR(buf)); /* -ENOMEM */
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d23362f..0c824f3 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -59,6 +59,7 @@ 
 #include "backref.h"
 #include "hash.h"
 #include "props.h"
+#include "qgroup.h"
 
 struct btrfs_iget_args {
 	struct btrfs_key *location;
@@ -739,7 +740,9 @@  retry:
 			}
 			goto out_free;
 		}
-
+		btrfs_qgroup_update_reserved_bytes(root->fs_info,
+						   root->root_key.objectid,
+						   ins.offset, 1);
 		/*
 		 * here we're doing allocation and writeback of the
 		 * compressed pages
@@ -951,6 +954,10 @@  static noinline int cow_file_range(struct inode *inode,
 		if (ret < 0)
 			goto out_unlock;
 
+		btrfs_qgroup_update_reserved_bytes(root->fs_info,
+						   root->root_key.objectid,
+						   ins.offset, 1);
+
 		em = alloc_extent_map();
 		if (!em) {
 			ret = -ENOMEM;
@@ -6745,6 +6752,10 @@  static struct extent_map *btrfs_new_extent_direct(struct inode *inode,
 		return ERR_PTR(ret);
 	}
 
+	btrfs_qgroup_update_reserved_bytes(root->fs_info,
+					   root->root_key.objectid,
+					   ins.offset, 1);
+
 	return em;
 }
 
@@ -9266,6 +9277,11 @@  static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 				btrfs_end_transaction(trans, root);
 			break;
 		}
+
+		btrfs_qgroup_update_reserved_bytes(root->fs_info,
+						   root->root_key.objectid,
+						   ins.offset, 1);
+
 		btrfs_drop_extent_cache(inode, cur_offset,
 					cur_offset + ins.offset -1, 0);
 
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 48b60db..d147082 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -72,6 +72,7 @@  struct btrfs_qgroup {
 	/*
 	 * reservation tracking
 	 */
+	u64 may_use;
 	u64 reserved;
 
 	/*
@@ -1414,6 +1415,8 @@  static int qgroup_excl_accounting(struct btrfs_fs_info *fs_info,
 	WARN_ON(sign < 0 && qgroup->excl < oper->num_bytes);
 	qgroup->excl += sign * oper->num_bytes;
 	qgroup->excl_cmpr += sign * oper->num_bytes;
+	if (sign > 0)
+		qgroup->reserved -= oper->num_bytes;
 
 	qgroup_dirty(fs_info, qgroup);
 
@@ -1434,6 +1437,8 @@  static int qgroup_excl_accounting(struct btrfs_fs_info *fs_info,
 		qgroup->excl += sign * oper->num_bytes;
 		if (sign < 0)
 			WARN_ON(qgroup->excl < oper->num_bytes);
+		if (sign > 0)
+			qgroup->reserved -= oper->num_bytes;
 		qgroup->excl_cmpr += sign * oper->num_bytes;
 		qgroup_dirty(fs_info, qgroup);
 
@@ -2359,6 +2364,61 @@  out:
 	return ret;
 }
 
+int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
+					    u64 ref_root,
+					    u64 num_bytes,
+					    int sign)
+{
+	struct btrfs_root *quota_root;
+	struct btrfs_qgroup *qgroup;
+	int ret = 0;
+	struct ulist_node *unode;
+	struct ulist_iterator uiter;
+
+	if (!is_fstree(ref_root) || !fs_info->quota_enabled)
+		return 0;
+
+	if (num_bytes == 0)
+		return 0;
+
+	spin_lock(&fs_info->qgroup_lock);
+	quota_root = fs_info->quota_root;
+	if (!quota_root)
+		goto out;
+
+	qgroup = find_qgroup_rb(fs_info, ref_root);
+	if (!qgroup)
+		goto out;
+
+	ulist_reinit(fs_info->qgroup_ulist);
+	ret = ulist_add(fs_info->qgroup_ulist, qgroup->qgroupid,
+			(uintptr_t)qgroup, GFP_ATOMIC);
+	if (ret < 0)
+		goto out;
+
+	ULIST_ITER_INIT(&uiter);
+	while ((unode = ulist_next(fs_info->qgroup_ulist, &uiter))) {
+		struct btrfs_qgroup *qg;
+		struct btrfs_qgroup_list *glist;
+
+		qg = u64_to_ptr(unode->aux);
+
+		qg->reserved += sign * num_bytes;
+
+		list_for_each_entry(glist, &qg->groups, next_group) {
+			ret = ulist_add(fs_info->qgroup_ulist,
+					glist->group->qgroupid,
+					(uintptr_t)glist->group, GFP_ATOMIC);
+			if (ret < 0)
+				goto out;
+		}
+	}
+
+out:
+	spin_unlock(&fs_info->qgroup_lock);
+	return ret;
+}
+
 /*
  * reserve some space for a qgroup and all its parents. The reservation takes
  * place with start_transaction or dealloc_reserve, similar to ENOSPC
@@ -2407,14 +2467,14 @@  int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes)
 		qg = u64_to_ptr(unode->aux);
 
 		if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
-		    qg->reserved + (s64)qg->rfer + num_bytes >
+		    qg->reserved + qg->may_use + (s64)qg->rfer + num_bytes >
 		    qg->max_rfer) {
 			ret = -EDQUOT;
 			goto out;
 		}
 
 		if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) &&
-		    qg->reserved + (s64)qg->excl + num_bytes >
+		    qg->reserved + qg->may_use + (s64)qg->excl + num_bytes >
 		    qg->max_excl) {
 			ret = -EDQUOT;
 			goto out;
@@ -2438,7 +2498,7 @@  int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes)
 
 		qg = u64_to_ptr(unode->aux);
 
-		qg->reserved += num_bytes;
+		qg->may_use += num_bytes;
 	}
 
 out:
@@ -2484,7 +2544,7 @@  void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes)
 
 		qg = u64_to_ptr(unode->aux);
 
-		qg->reserved -= num_bytes;
+		qg->may_use -= num_bytes;
 
 		list_for_each_entry(glist, &qg->groups, next_group) {
 			ret = ulist_add(fs_info->qgroup_ulist,
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 18cc68c..31de88e 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -95,6 +95,10 @@  int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
 int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
 			 struct btrfs_fs_info *fs_info, u64 srcid, u64 objectid,
 			 struct btrfs_qgroup_inherit *inherit);
+int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
+				       u64 ref_root,
+				       u64 num_bytes,
+				       int sign);
 int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes);
 void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes);