diff mbox series

[v2,2/2] btrfs: qgroup: add sysfs interface for debug

Message ID 20200628050715.60961-3-wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: add sysfs interface for qgroup | expand

Commit Message

Qu Wenruo June 28, 2020, 5:07 a.m. UTC
This patch will add the following sysfs interface:
/sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/reference
/sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/exclusive
/sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/max_reference
/sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/max_exclusive
/sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/limit_flags
 ^^^ Above are already in "btrfs qgroup show" command output ^^^

/sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/rsv_data
/sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/rsv_meta_pertrans
/sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/rsv_meta_prealloc

The last 3 rsv related members are not visible to users, but can be very
useful to debug qgroup limit related bugs.

Also, to avoid '/' used in <qgroup_id>, the separator between qgroup
level and qgroup id is changed to '_'.

The interface is not hidden behind 'debug' as I want this interface to
be included into production build so we could have an easier life to
debug qgroup rsv related bugs.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.h  |   1 +
 fs/btrfs/qgroup.c |  44 +++++++++++---
 fs/btrfs/qgroup.h |  11 ++++
 fs/btrfs/sysfs.c  | 151 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/sysfs.h  |   6 ++
 5 files changed, 204 insertions(+), 9 deletions(-)

Comments

David Sterba June 29, 2020, 9:30 p.m. UTC | #1
On Sun, Jun 28, 2020 at 01:07:15PM +0800, Qu Wenruo wrote:
> +QGROUP_ATTR(rfer, reference);

Note that this is 'referenced'.

> +QGROUP_ATTR(excl, exclusive);
> +QGROUP_ATTR(max_rfer, max_reference);

And here max_referenced.

> +QGROUP_ATTR(max_excl, max_exclusive);
> +QGROUP_ATTR(lim_flags, limit_flags);
> +QGROUP_RSV_ATTR(data, BTRFS_QGROUP_RSV_DATA);
> +QGROUP_RSV_ATTR(meta_pertrans, BTRFS_QGROUP_RSV_META_PERTRANS);
> +QGROUP_RSV_ATTR(meta_prealloc, BTRFS_QGROUP_RSV_META_PREALLOC);

The two above fixed but otherwise it's good, thanks.

The qgroup membership and relations could be added to the sysfs export
too, but we're limited by the PAGE_SIZE output buffer so the information
could be incomplete.
Qu Wenruo June 29, 2020, 11:17 p.m. UTC | #2
On 2020/6/30 上午5:30, David Sterba wrote:
> On Sun, Jun 28, 2020 at 01:07:15PM +0800, Qu Wenruo wrote:
>> +QGROUP_ATTR(rfer, reference);
> 
> Note that this is 'referenced'.
> 
>> +QGROUP_ATTR(excl, exclusive);
>> +QGROUP_ATTR(max_rfer, max_reference);
> 
> And here max_referenced.
> 
>> +QGROUP_ATTR(max_excl, max_exclusive);
>> +QGROUP_ATTR(lim_flags, limit_flags);
>> +QGROUP_RSV_ATTR(data, BTRFS_QGROUP_RSV_DATA);
>> +QGROUP_RSV_ATTR(meta_pertrans, BTRFS_QGROUP_RSV_META_PERTRANS);
>> +QGROUP_RSV_ATTR(meta_prealloc, BTRFS_QGROUP_RSV_META_PREALLOC);
> 
> The two above fixed but otherwise it's good, thanks.
> 
> The qgroup membership and relations could be added to the sysfs export
> too, but we're limited by the PAGE_SIZE output buffer so the information
> could be incomplete.
> 
Yep, PAGE_SIZE is one limitation.
But we can also go another direction, just using a new dir for related
qgroups, then we can workaround the limitation.

But personally speaking, the main objective is still the rsv_* members
for debug.
I don't really want to turn the sysfs interface into a way to export all
qgroup info yet.


But if this inspired us to explore more usage of sysfs other than adding
new ioctls to get information from btrfs, then I can't be more happier.

Thanks,
Qu
David Sterba June 30, 2020, 8:07 a.m. UTC | #3
On Tue, Jun 30, 2020 at 07:17:25AM +0800, Qu Wenruo wrote:
> On 2020/6/30 上午5:30, David Sterba wrote:
> > On Sun, Jun 28, 2020 at 01:07:15PM +0800, Qu Wenruo wrote:
> >> +QGROUP_ATTR(rfer, reference);
> > 
> > Note that this is 'referenced'.
> > 
> >> +QGROUP_ATTR(excl, exclusive);
> >> +QGROUP_ATTR(max_rfer, max_reference);
> > 
> > And here max_referenced.
> > 
> >> +QGROUP_ATTR(max_excl, max_exclusive);
> >> +QGROUP_ATTR(lim_flags, limit_flags);
> >> +QGROUP_RSV_ATTR(data, BTRFS_QGROUP_RSV_DATA);
> >> +QGROUP_RSV_ATTR(meta_pertrans, BTRFS_QGROUP_RSV_META_PERTRANS);
> >> +QGROUP_RSV_ATTR(meta_prealloc, BTRFS_QGROUP_RSV_META_PREALLOC);
> > 
> > The two above fixed but otherwise it's good, thanks.
> > 
> > The qgroup membership and relations could be added to the sysfs export
> > too, but we're limited by the PAGE_SIZE output buffer so the information
> > could be incomplete.
> > 
> Yep, PAGE_SIZE is one limitation.
> But we can also go another direction, just using a new dir for related
> qgroups, then we can workaround the limitation.
>
> But personally speaking, the main objective is still the rsv_* members
> for debug.
> I don't really want to turn the sysfs interface into a way to export all
> qgroup info yet.

Well, that's a bit of a problem, if you're designing a public interface
for your own needs and neglecting that it will be used as a way to read
qgroup information. Sooner or later someone will ask for the additional
information to be added.

If it's just for debugging then it should be under 'debug', but as the
bug reporter does not want to apply patches/change config you ask to
make it visible unconditionally. I'm ok with that as long as the
interface is done right because any mistake will be potentially much
harder to fix in the future.

We need to at lest have a solid idea how to extend it, not neccessarily
to implement it right now. Adding the group membership to qgroup
directories under the qgroup directory as symlinks between the kobjects
sounds ok-ish to me but it's a new idea and I need to think about it.
David Sterba June 30, 2020, 2:27 p.m. UTC | #4
On Tue, Jun 30, 2020 at 10:07:35AM +0200, David Sterba wrote:
> We need to at lest have a solid idea how to extend it, not neccessarily
> to implement it right now. Adding the group membership to qgroup
> directories under the qgroup directory as symlinks between the kobjects
> sounds ok-ish to me but it's a new idea and I need to think about it.

So my current winner idea for the hierarchy representation is to keep
all child qgroups as symlinks in the parent group. This does not require
extra kojbect to keep, and the path in sysfs would reflect the
hierarchy, like:

/sys/fs/btrfs/UUID/qgroups/4_1/3_1/2_1/1234_0

This tracks only the parent -> child pointers, so to see all parent
qgroups one has to build the reverse graph or simply search which
1st level directories contain the group as child.

Implementation is one more sysfs_create_link, the leaf qgroups
representing subvolumes will contain only the files with stats.

One potential drawback is the maximum size of path that will not allow
to show deep qgroup hierarchies, but still about 200-400 in case of
average qgroup id 20 or 10 respectively.
David Sterba June 30, 2020, 4:57 p.m. UTC | #5
On Sun, Jun 28, 2020 at 01:07:15PM +0800, Qu Wenruo wrote:
> @@ -1030,6 +1040,11 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
>  				btrfs_abort_transaction(trans, ret);
>  				goto out_free_path;
>  			}
> +			ret = btrfs_sysfs_add_one_qgroup(fs_info, qgroup);
> +			if (ret < 0) {
> +				btrfs_abort_transaction(trans, ret);
> +				goto out_free_path;
> +			}
>  		}
>  		ret = btrfs_next_item(tree_root, path);
>  		if (ret < 0) {
> @@ -1054,6 +1069,11 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
>  		btrfs_abort_transaction(trans, ret);
>  		goto out_free_path;
>  	}
> +	ret = btrfs_sysfs_add_one_qgroup(fs_info, qgroup);
> +	if (ret < 0) {
> +		btrfs_abort_transaction(trans, ret);
> +		goto out_free_path;
> +	}

This adds 2 new transaction abort sites altough I don't think it's
justified, the filesystem is fine. If system is that low on memory it's
gonna be very bad elsewhere too so we don't need to make things worse
jsut because of some missing sysfs entries.

A warning would be better, though in that case the validity of the
kobjects should be double checked where it's accessed.
Qu Wenruo July 1, 2020, 12:06 a.m. UTC | #6
On 2020/7/1 上午12:57, David Sterba wrote:
> On Sun, Jun 28, 2020 at 01:07:15PM +0800, Qu Wenruo wrote:
>> @@ -1030,6 +1040,11 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
>>  				btrfs_abort_transaction(trans, ret);
>>  				goto out_free_path;
>>  			}
>> +			ret = btrfs_sysfs_add_one_qgroup(fs_info, qgroup);
>> +			if (ret < 0) {
>> +				btrfs_abort_transaction(trans, ret);
>> +				goto out_free_path;
>> +			}
>>  		}
>>  		ret = btrfs_next_item(tree_root, path);
>>  		if (ret < 0) {
>> @@ -1054,6 +1069,11 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
>>  		btrfs_abort_transaction(trans, ret);
>>  		goto out_free_path;
>>  	}
>> +	ret = btrfs_sysfs_add_one_qgroup(fs_info, qgroup);
>> +	if (ret < 0) {
>> +		btrfs_abort_transaction(trans, ret);
>> +		goto out_free_path;
>> +	}
> 
> This adds 2 new transaction abort sites altough I don't think it's
> justified, the filesystem is fine. If system is that low on memory it's
> gonna be very bad elsewhere too so we don't need to make things worse
> jsut because of some missing sysfs entries.

The problem here is, we don't have good enough way to revert back to
previous status.

This is common among a lot of qgroup code, and I prefer to fix them
later, as it will be a big qgroup error patch cleanup.

> 
> A warning would be better, though in that case the validity of the
> kobjects should be double checked where it's accessed.
> 
It would be even worse if the qgroup relationship is also exported
through sysfs.

In that case, warning is not good enough.

So I still prefer error path cleanup as the ultimate fix.
The objective is, if we hit any error during qgroup enabling or other
qgroup operations, we revert to previous status if possible.

For qgroup enable, if we hit any non-critical error, we don't abort
trans at all, but remove all qgroups along with its qgroup items, remove
the qgroup tree, then reverts back to qgroup disabled case.
This includes -ENOMEM case.

While for critical error like tree operations errors, we still abort
transaction.

In fact, I'm already working on a similar project, but for extent_changeset.
So I guess it won't take too long for qgroup.

Thanks,
Qu
Chris Down July 15, 2020, 1:49 p.m. UTC | #7
Hi Wenruo,

While testing my pending patches on top of linux-next, I encountered a bug that 
seems related to this patch during btrfs unmount. Specifically, a null pointer 
dereference in kobject_del inside btrfs_sysfs_del_qgroups from close_ctree.

The fix may be as simple as checking if the kobject is initialised, although 
perhaps it should always be initialised in this case, so I'll leave you to work 
out what the real issue is :-)


     RIP: kobject_del+0x1/0x20

     [...]

     Call Trace:
      btrfs_sysfs_del_qgroups+0xa5/0xe0
      close_ctree+0x1cd/0x2c0
      generic_shutdown_super+0x6c/0x100
      kill_anon_super+0x14/0x30
      btrfs_kill_super+0x12/0x20
      deactivate_locked_super+0x36/0x90
      cleanup_mnt+0x12d/0x190
      task_work_run+0x5c/0x90
      __prepare_exit_to_usermode+0x164/0x170
      [...]

Thanks,

Chris
Qu Wenruo July 16, 2020, 12:15 a.m. UTC | #8
On 2020/7/15 下午9:49, Chris Down wrote:
> Hi Wenruo,
> 
> While testing my pending patches on top of linux-next, I encountered a
> bug that seems related to this patch during btrfs unmount. Specifically,
> a null pointer dereference in kobject_del inside btrfs_sysfs_del_qgroups
> from close_ctree.
> 
> The fix may be as simple as checking if the kobject is initialised,
> although perhaps it should always be initialised in this case, so I'll
> leave you to work out what the real issue is :-)

Thank you very much for the report.

May I ask if the qgroup is enabled? Or qgroup is not enabled at all?

Thanks,
Qu
> 
> 
>     RIP: kobject_del+0x1/0x20
> 
>     [...]
> 
>     Call Trace:
>      btrfs_sysfs_del_qgroups+0xa5/0xe0
>      close_ctree+0x1cd/0x2c0
>      generic_shutdown_super+0x6c/0x100
>      kill_anon_super+0x14/0x30
>      btrfs_kill_super+0x12/0x20
>      deactivate_locked_super+0x36/0x90
>      cleanup_mnt+0x12d/0x190
>      task_work_run+0x5c/0x90
>      __prepare_exit_to_usermode+0x164/0x170
>      [...]
> 
> Thanks,
> 
> Chris
Chris Down July 16, 2020, 12:25 a.m. UTC | #9
Qu Wenruo writes:
>On 2020/7/15 下午9:49, Chris Down wrote:
>> While testing my pending patches on top of linux-next, I encountered a
>> bug that seems related to this patch during btrfs unmount. Specifically,
>> a null pointer dereference in kobject_del inside btrfs_sysfs_del_qgroups
>> from close_ctree.
>>
>> The fix may be as simple as checking if the kobject is initialised,
>> although perhaps it should always be initialised in this case, so I'll
>> leave you to work out what the real issue is :-)
>
>Thank you very much for the report.
>
>May I ask if the qgroup is enabled? Or qgroup is not enabled at all?

I have quotas disabled, so I assume qgroup is also implicitly disabled.
Qu Wenruo July 16, 2020, 12:27 a.m. UTC | #10
On 2020/7/16 上午8:15, Qu Wenruo wrote:
> 
> 
> On 2020/7/15 下午9:49, Chris Down wrote:
>> Hi Wenruo,
>>
>> While testing my pending patches on top of linux-next, I encountered a
>> bug that seems related to this patch during btrfs unmount. Specifically,
>> a null pointer dereference in kobject_del inside btrfs_sysfs_del_qgroups
>> from close_ctree.
>>
>> The fix may be as simple as checking if the kobject is initialised,
>> although perhaps it should always be initialised in this case, so I'll
>> leave you to work out what the real issue is :-)
> 
> Thank you very much for the report.
> 
> May I ask if the qgroup is enabled? Or qgroup is not enabled at all?

BTW, after checking the code, it looks a little strange to me.

Firstly, both kobject_del and kobject_put() has extra check on NULL
pointers, thus if fs_info->qgroups_kobj is NULL, it should do nothing
and exit.

Secondly, the fs_info->qgroup_kobj is initialized to zero, by kvzalloc()
in btrfs_mount_root().

Thus unless we modified it manually, it should always be NULL.

And for the locations modifying qgroups_kobj, it's either allocating it,
in btrfs_sysfs_add_qgroups(), or removing it and set it back to NULL in
btrfs_sysfs_del_qgroups().

Thus this looks pretty weird.

Would you please provide the full call trace (especially the address
causing the NULL pointer deref) and the reproducer (if possible)?

Thanks,
Qu
> 
> Thanks,
> Qu
>>
>>
>>     RIP: kobject_del+0x1/0x20
>>
>>     [...]
>>
>>     Call Trace:
>>      btrfs_sysfs_del_qgroups+0xa5/0xe0
>>      close_ctree+0x1cd/0x2c0
>>      generic_shutdown_super+0x6c/0x100
>>      kill_anon_super+0x14/0x30
>>      btrfs_kill_super+0x12/0x20
>>      deactivate_locked_super+0x36/0x90
>>      cleanup_mnt+0x12d/0x190
>>      task_work_run+0x5c/0x90
>>      __prepare_exit_to_usermode+0x164/0x170
>>      [...]
>>
>> Thanks,
>>
>> Chris
>
Qu Wenruo July 16, 2020, 1:51 a.m. UTC | #11
On 2020/7/16 上午8:40, Chris Down wrote:
> Qu Wenruo writes:
>> Would you please provide the full call trace (especially the address
>> causing the NULL pointer deref) and the reproducer (if possible)?
>
> I'll try to reproduce it again when I have time. I didn't have kdump
> set, and am not currently running linux-next, so until then, you can
> have this crappy photo that I rushed to take earlier before panic=30
> took effect :-)

No problem, the image is already good enough for me to lookinto the case.

And the extra clue of linux-next would also help a lot.

Thank you very much again for the report,
Qu
Qu Wenruo July 16, 2020, 6:21 a.m. UTC | #12
On 2020/7/16 上午8:40, Chris Down wrote:
> Qu Wenruo writes:
>> Would you please provide the full call trace (especially the address
>> causing the NULL pointer deref) and the reproducer (if possible)?
>
> I'll try to reproduce it again when I have time. I didn't have kdump
> set, and am not currently running linux-next, so until then, you can
> have this crappy photo that I rushed to take earlier before panic=30
> took effect :-)

The linux-next is using an older version of that patch, which has a bad
memory allocation inside atomic critical section.

If you're unlucky enough, it could cause various weird bugs.
Thankfully, David has exposed it and the latest version in btrfs tree
should have it fixed.

If you're uncertain if that's the case, feel free to try the latest
misc-next branch to see if it's fixed.
https://github.com/kdave/btrfs-devel/tree/misc-next

Thanks for your report,
Qu
Chris Down July 16, 2020, 8:41 a.m. UTC | #13
Qu Wenruo writes:
>
>
>On 2020/7/16 上午8:40, Chris Down wrote:
>> Qu Wenruo writes:
>>> Would you please provide the full call trace (especially the address
>>> causing the NULL pointer deref) and the reproducer (if possible)?
>>
>> I'll try to reproduce it again when I have time. I didn't have kdump
>> set, and am not currently running linux-next, so until then, you can
>> have this crappy photo that I rushed to take earlier before panic=30
>> took effect :-)
>
>The linux-next is using an older version of that patch, which has a bad
>memory allocation inside atomic critical section.
>
>If you're unlucky enough, it could cause various weird bugs.
>Thankfully, David has exposed it and the latest version in btrfs tree
>should have it fixed.

Great, thanks! When I run my tests again on linux-next I'll be sure to reapply 
the latest version of your patch first, then.

Thanks,

Chris
diff mbox series

Patch

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index d8301bf240e0..9c0394893e8e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -779,6 +779,7 @@  struct btrfs_fs_info {
 	u32 thread_pool_size;
 
 	struct kobject *space_info_kobj;
+	struct kobject *qgroups_kobj;
 
 	u64 total_pinned;
 
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 014d32429165..fc90ec0a86f1 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -22,6 +22,7 @@ 
 #include "extent_io.h"
 #include "qgroup.h"
 #include "block-group.h"
+#include "sysfs.h"
 
 /* TODO XXX FIXME
  *  - subvol delete -> delete when ref goes to 0? delete limits also?
@@ -220,10 +221,12 @@  static struct btrfs_qgroup *add_qgroup_rb(struct btrfs_fs_info *fs_info,
 	return qgroup;
 }
 
-static void __del_qgroup_rb(struct btrfs_qgroup *qgroup)
+static void __del_qgroup_rb(struct btrfs_fs_info *fs_info,
+			    struct btrfs_qgroup *qgroup)
 {
 	struct btrfs_qgroup_list *list;
 
+	btrfs_sysfs_del_one_qgroup(fs_info, qgroup);
 	list_del(&qgroup->dirty);
 	while (!list_empty(&qgroup->groups)) {
 		list = list_first_entry(&qgroup->groups,
@@ -252,7 +255,7 @@  static int del_qgroup_rb(struct btrfs_fs_info *fs_info, u64 qgroupid)
 		return -ENOENT;
 
 	rb_erase(&qgroup->node, &fs_info->qgroup_tree);
-	__del_qgroup_rb(qgroup);
+	__del_qgroup_rb(fs_info, qgroup);
 	return 0;
 }
 
@@ -351,6 +354,9 @@  int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info)
 		goto out;
 	}
 
+	ret = btrfs_sysfs_add_qgroups(fs_info);
+	if (ret < 0)
+		goto out;
 	/* default this to quota off, in case no status key is found */
 	fs_info->qgroup_flags = 0;
 
@@ -412,6 +418,10 @@  int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info)
 				goto out;
 			}
 		}
+		ret = btrfs_sysfs_add_one_qgroup(fs_info, qgroup);
+		if (ret < 0)
+			goto out;
+
 		switch (found_key.type) {
 		case BTRFS_QGROUP_INFO_KEY: {
 			struct btrfs_qgroup_info_item *ptr;
@@ -500,16 +510,12 @@  int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info)
 		ulist_free(fs_info->qgroup_ulist);
 		fs_info->qgroup_ulist = NULL;
 		fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
+		btrfs_sysfs_del_qgroups(fs_info);
 	}
 
 	return ret < 0 ? ret : 0;
 }
 
-static u64 btrfs_qgroup_subvolid(u64 qgroupid)
-{
-	return (qgroupid & ((1ULL << BTRFS_QGROUP_LEVEL_SHIFT) - 1));
-}
-
 /*
  * Called in close_ctree() when quota is still enabled.  This verifies we don't
  * leak some reserved space.
@@ -562,7 +568,7 @@  void btrfs_free_qgroup_config(struct btrfs_fs_info *fs_info)
 	while ((n = rb_first(&fs_info->qgroup_tree))) {
 		qgroup = rb_entry(n, struct btrfs_qgroup, node);
 		rb_erase(n, &fs_info->qgroup_tree);
-		__del_qgroup_rb(qgroup);
+		__del_qgroup_rb(fs_info, qgroup);
 	}
 	/*
 	 * We call btrfs_free_qgroup_config() when unmounting
@@ -571,6 +577,7 @@  void btrfs_free_qgroup_config(struct btrfs_fs_info *fs_info)
 	 */
 	ulist_free(fs_info->qgroup_ulist);
 	fs_info->qgroup_ulist = NULL;
+	btrfs_sysfs_del_qgroups(fs_info);
 }
 
 static int add_qgroup_relation_item(struct btrfs_trans_handle *trans, u64 src,
@@ -943,6 +950,9 @@  int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
 		goto out;
 	}
 
+	ret = btrfs_sysfs_add_qgroups(fs_info);
+	if (ret < 0)
+		goto out;
 	/*
 	 * 1 for quota root item
 	 * 1 for BTRFS_QGROUP_STATUS item
@@ -1030,6 +1040,11 @@  int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
 				btrfs_abort_transaction(trans, ret);
 				goto out_free_path;
 			}
+			ret = btrfs_sysfs_add_one_qgroup(fs_info, qgroup);
+			if (ret < 0) {
+				btrfs_abort_transaction(trans, ret);
+				goto out_free_path;
+			}
 		}
 		ret = btrfs_next_item(tree_root, path);
 		if (ret < 0) {
@@ -1054,6 +1069,11 @@  int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
 		btrfs_abort_transaction(trans, ret);
 		goto out_free_path;
 	}
+	ret = btrfs_sysfs_add_one_qgroup(fs_info, qgroup);
+	if (ret < 0) {
+		btrfs_abort_transaction(trans, ret);
+		goto out_free_path;
+	}
 
 	ret = btrfs_commit_transaction(trans);
 	trans = NULL;
@@ -1089,6 +1109,7 @@  int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
 		fs_info->qgroup_ulist = NULL;
 		if (trans)
 			btrfs_end_transaction(trans);
+		btrfs_sysfs_del_qgroups(fs_info);
 	}
 	mutex_unlock(&fs_info->qgroup_ioctl_lock);
 	return ret;
@@ -1441,8 +1462,11 @@  int btrfs_create_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
 	qgroup = add_qgroup_rb(fs_info, qgroupid);
 	spin_unlock(&fs_info->qgroup_lock);
 
-	if (IS_ERR(qgroup))
+	if (IS_ERR(qgroup)) {
 		ret = PTR_ERR(qgroup);
+		goto out;
+	}
+	ret = btrfs_sysfs_add_one_qgroup(fs_info, qgroup);
 out:
 	mutex_unlock(&fs_info->qgroup_ioctl_lock);
 	return ret;
@@ -2861,6 +2885,8 @@  int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
 
 unlock:
 	spin_unlock(&fs_info->qgroup_lock);
+	if (!ret)
+		ret = btrfs_sysfs_add_one_qgroup(fs_info, dstgroup);
 out:
 	if (!committing)
 		mutex_unlock(&fs_info->qgroup_ioctl_lock);
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 3be5198a3719..0116779d34e5 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -8,6 +8,7 @@ 
 
 #include <linux/spinlock.h>
 #include <linux/rbtree.h>
+#include <linux/kobject.h>
 #include "ulist.h"
 #include "delayed-ref.h"
 
@@ -223,8 +224,18 @@  struct btrfs_qgroup {
 	 */
 	u64 old_refcnt;
 	u64 new_refcnt;
+
+	/*
+	 * Sysfs kobjectid
+	 */
+	struct kobject kobj;
 };
 
+static inline u64 btrfs_qgroup_subvolid(u64 qgroupid)
+{
+	return (qgroupid & ((1ULL << BTRFS_QGROUP_LEVEL_SHIFT) - 1));
+}
+
 /*
  * For qgroup event trace points only
  */
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index a39bff64ff24..3d7a939601a9 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -19,6 +19,7 @@ 
 #include "volumes.h"
 #include "space-info.h"
 #include "block-group.h"
+#include "qgroup.h"
 
 struct btrfs_feature_attr {
 	struct kobj_attribute kobj_attr;
@@ -1455,6 +1456,156 @@  int btrfs_sysfs_add_mounted(struct btrfs_fs_info *fs_info)
 	return error;
 }
 
+static struct btrfs_fs_info *qgroup_kobj_to_fs_info(struct kobject *kobj)
+{
+	return to_fs_info(kobj->parent->parent);
+}
+
+#define QGROUP_ATTR(_member, _show_name)				\
+static ssize_t btrfs_qgroup_show_##_member(struct kobject *qgroup_kobj,	\
+					struct kobj_attribute *a,	\
+					char *buf)			\
+{									\
+	struct btrfs_fs_info *fs_info = qgroup_kobj_to_fs_info(		\
+			qgroup_kobj);					\
+	struct btrfs_qgroup *qgroup = container_of(qgroup_kobj,		\
+			struct btrfs_qgroup, kobj);			\
+									\
+	return btrfs_show_u64(&qgroup->_member, &fs_info->qgroup_lock,	\
+			buf);						\
+}									\
+BTRFS_ATTR(qgroup, _show_name, btrfs_qgroup_show_##_member)
+
+#define QGROUP_RSV_ATTR(_name, _type)					\
+static ssize_t btrfs_qgroup_rsv_show_##_name(struct kobject *qgroup_kobj, \
+					struct kobj_attribute *a,	\
+					char *buf)			\
+{									\
+	struct btrfs_fs_info *fs_info = qgroup_kobj_to_fs_info(		\
+			qgroup_kobj);					\
+	struct btrfs_qgroup *qgroup = container_of(qgroup_kobj,		\
+			struct btrfs_qgroup, kobj);			\
+									\
+	return btrfs_show_u64(&qgroup->rsv.values[_type],		\
+			&fs_info->qgroup_lock, buf);			\
+}									\
+BTRFS_ATTR(qgroup, rsv_##_name, btrfs_qgroup_rsv_show_##_name)
+
+QGROUP_ATTR(rfer, reference);
+QGROUP_ATTR(excl, exclusive);
+QGROUP_ATTR(max_rfer, max_reference);
+QGROUP_ATTR(max_excl, max_exclusive);
+QGROUP_ATTR(lim_flags, limit_flags);
+QGROUP_RSV_ATTR(data, BTRFS_QGROUP_RSV_DATA);
+QGROUP_RSV_ATTR(meta_pertrans, BTRFS_QGROUP_RSV_META_PERTRANS);
+QGROUP_RSV_ATTR(meta_prealloc, BTRFS_QGROUP_RSV_META_PREALLOC);
+
+static struct attribute *qgroup_attrs[] = {
+	BTRFS_ATTR_PTR(qgroup, reference),
+	BTRFS_ATTR_PTR(qgroup, exclusive),
+	BTRFS_ATTR_PTR(qgroup, max_reference),
+	BTRFS_ATTR_PTR(qgroup, max_exclusive),
+	BTRFS_ATTR_PTR(qgroup, limit_flags),
+	BTRFS_ATTR_PTR(qgroup, rsv_data),
+	BTRFS_ATTR_PTR(qgroup, rsv_meta_pertrans),
+	BTRFS_ATTR_PTR(qgroup, rsv_meta_prealloc),
+	NULL
+};
+ATTRIBUTE_GROUPS(qgroup);
+
+static void qgroup_release(struct kobject *kobj)
+{
+	struct btrfs_qgroup *qgroup = container_of(kobj, struct btrfs_qgroup,
+			kobj);
+
+	memset(&qgroup->kobj, 0, sizeof(*kobj));
+}
+
+static struct kobj_type qgroup_ktype = {
+	.sysfs_ops = &kobj_sysfs_ops,
+	.release = qgroup_release,
+	.default_groups = qgroup_groups,
+};
+
+int btrfs_sysfs_add_one_qgroup(struct btrfs_fs_info *fs_info,
+				struct btrfs_qgroup *qgroup)
+{
+	struct kobject *qgroups_kobj = fs_info->qgroups_kobj;
+	int ret;
+
+	if (test_bit(BTRFS_FS_STATE_DUMMY_FS_INFO, &fs_info->fs_state))
+		return 0;
+	if (qgroup->kobj.state_initialized)
+		return 0;
+	if (!qgroups_kobj)
+		return -EINVAL;
+
+	ret = kobject_init_and_add(&qgroup->kobj, &qgroup_ktype, qgroups_kobj,
+			"%hu_%llu", btrfs_qgroup_level(qgroup->qgroupid),
+			btrfs_qgroup_subvolid(qgroup->qgroupid));
+	if (ret < 0)
+		kobject_put(&qgroup->kobj);
+
+	return ret;
+}
+
+void btrfs_sysfs_del_qgroups(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_qgroup *qgroup;
+	struct btrfs_qgroup *next;
+
+	if (test_bit(BTRFS_FS_STATE_DUMMY_FS_INFO, &fs_info->fs_state))
+		return;
+	rbtree_postorder_for_each_entry_safe(qgroup, next,
+			&fs_info->qgroup_tree, node)
+		btrfs_sysfs_del_one_qgroup(fs_info, qgroup);
+	kobject_del(fs_info->qgroups_kobj);
+	kobject_put(fs_info->qgroups_kobj);
+	fs_info->qgroups_kobj = NULL;
+}
+
+/* Called when qgroup get initialized, thus there is no need for extra lock. */
+int btrfs_sysfs_add_qgroups(struct btrfs_fs_info *fs_info)
+{
+	struct kobject *fsid_kobj = &fs_info->fs_devices->fsid_kobj;
+	struct btrfs_qgroup *qgroup;
+	struct btrfs_qgroup *next;
+	int ret = 0;
+
+	if (test_bit(BTRFS_FS_STATE_DUMMY_FS_INFO, &fs_info->fs_state))
+		return 0;
+	ASSERT(fsid_kobj);
+	if (fs_info->qgroups_kobj)
+		return 0;
+
+	fs_info->qgroups_kobj = kobject_create_and_add("qgroups", fsid_kobj);
+	if (!fs_info->qgroups_kobj) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	rbtree_postorder_for_each_entry_safe(qgroup, next,
+			&fs_info->qgroup_tree, node) {
+		ret = btrfs_sysfs_add_one_qgroup(fs_info, qgroup);
+		if (ret < 0)
+			goto out;
+	}
+
+out:
+	if (ret < 0)
+		btrfs_sysfs_del_qgroups(fs_info);
+	return ret;
+}
+
+void btrfs_sysfs_del_one_qgroup(struct btrfs_fs_info *fs_info,
+				struct btrfs_qgroup *qgroup)
+{
+	if (test_bit(BTRFS_FS_STATE_DUMMY_FS_INFO, &fs_info->fs_state))
+		return;
+	if (qgroup->kobj.state_initialized) {
+		kobject_del(&qgroup->kobj);
+		kobject_put(&qgroup->kobj);
+	}
+}
 
 /*
  * Change per-fs features in /sys/fs/btrfs/UUID/features to match current
diff --git a/fs/btrfs/sysfs.h b/fs/btrfs/sysfs.h
index 718a26c97833..1e27a9c94c84 100644
--- a/fs/btrfs/sysfs.h
+++ b/fs/btrfs/sysfs.h
@@ -36,4 +36,10 @@  int btrfs_sysfs_add_space_info_type(struct btrfs_fs_info *fs_info,
 void btrfs_sysfs_remove_space_info(struct btrfs_space_info *space_info);
 void btrfs_sysfs_update_devid(struct btrfs_device *device);
 
+int btrfs_sysfs_add_one_qgroup(struct btrfs_fs_info *fs_info,
+				struct btrfs_qgroup *qgroup);
+void btrfs_sysfs_del_qgroups(struct btrfs_fs_info *fs_info);
+int btrfs_sysfs_add_qgroups(struct btrfs_fs_info *fs_info);
+void btrfs_sysfs_del_one_qgroup(struct btrfs_fs_info *fs_info,
+				struct btrfs_qgroup *qgroup);
 #endif