Message ID | 1380468336-5170-1-git-send-email-anand.jain@oracle.com (mailing list archive) |
---|---|
State | Under Review, archived |
Headers | show |
On Sun, Sep 29, 2013 at 11:25:36PM +0800, Anand Jain wrote: > It needs a lot more information about the snapshots if > snapshot's life cycle has to be all auto managed by > scripts _some day_. this patch is a step towards that. > > This patch provides the size which would be freed > if the subvol/snapshot is deleted. > preview: > --------------------- > btrfs su show /btrfs/sv1 > :: > Unshared space: 89.09MiB Useful information, but it takes time to calculate the number. The tree search ioctl has significant overhead and seeking all over the extent tree can be also lengthy. I've tried it on a random subvolume on a SSD disk and it's been running 10 minutes already, one cpu is at 100%. david -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Sun, Sep 29, 2013 at 11:25:36PM +0800, Anand Jain wrote: >> It needs a lot more information about the snapshots if >> snapshot's life cycle has to be all auto managed by >> scripts _some day_. this patch is a step towards that. >> >> This patch provides the size which would be freed >> if the subvol/snapshot is deleted. >> preview: >> --------------------- >> btrfs su show /btrfs/sv1 >> :: >> Unshared space: 89.09MiB > > Useful information, but it takes time to calculate the number. The tree > search ioctl has significant overhead and seeking all over the extent > tree can be also lengthy. I've tried it on a random subvolume on a SSD > disk and it's been running 10 minutes already, one cpu is at 100%. No wonder Anand's approach is inefficient.In Anand's approach: Search Every EXTENT_DATA Search every extent's refs in extent tree. If we want to speed up this progress, i think we can split it into two parts: 1. First to search inline extent_data 2. Secondly search in extent tree to calculate (extent refs=1) This will avoid n*n rather than n+n which is better… Thanks, Wang > > david > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> If we want to speed up this progress, i think we can split it into two parts: > > 1. First to search inline extent_data > 2. Secondly search in extent tree to calculate (extent refs=1) > > This will avoid n*n rather than n+n which is better… (sorry for delay in reply, I was on vacation). Thanks for what might help. I was kind of expecting this when preliminary patch was sent before. Let me try what you suggest and send out v3. I was also thinking if this should be inside kernel facilitated by a new ioctl? so that we avoid number of search ioctl thats required. Thanks, Anand -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/07/2013 10:47 AM, Anand Jain wrote: > > >> If we want to speed up this progress, i think we can split it into two parts: >> >> 1. First to search inline extent_data >> 2. Secondly search in extent tree to calculate (extent refs=1) >> >> This will avoid n*n rather than n+n which is better… > (sorry for delay in reply, I was on vacation). > > Thanks for what might help. I was kind of expecting > this when preliminary patch was sent before. Let me > try what you suggest and send out v3. > > I was also thinking if this should be inside kernel > facilitated by a new ioctl? so that we avoid number > of search ioctl thats required. Yeah, adding another ioctl maybe help. Anyway, there is another question. Did we need to consider metadata into subvolume's sole size? Personaly, the answer is yes. Considering a subvolume is filled with a lot of empty files, but we get subvolume's sole size *0*. Thanks, Wang > > Thanks, Anand > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/07/2013 11:01 AM, Wang Shilong wrote: > On 10/07/2013 10:47 AM, Anand Jain wrote: >> >> >>> If we want to speed up this progress, i think we can split it into two parts: >>> >>> 1. First to search inline extent_data >>> 2. Secondly search in extent tree to calculate (extent refs=1) >>> >>> This will avoid n*n rather than n+n which is better… >> (sorry for delay in reply, I was on vacation). >> >> Thanks for what might help. I was kind of expecting >> this when preliminary patch was sent before. Let me >> try what you suggest and send out v3. >> >> I was also thinking if this should be inside kernel >> facilitated by a new ioctl? so that we avoid number >> of search ioctl thats required. > Yeah, adding another ioctl maybe help. Anyway, there is another question. > Did we need to consider metadata into subvolume's sole size? > > Personaly, the answer is yes. Considering a subvolume is filled with a lot > of empty files, but we get subvolume's sole size *0*. yes. Will consider for an update patch later. Thanks, Anand -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Oct 07, 2013 at 10:47:56AM +0800, Anand Jain wrote: > I was also thinking if this should be inside kernel > facilitated by a new ioctl? so that we avoid number > of search ioctl thats required. I think so. And for the feature itself, it can be handy in case where qgroups are not established. david -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Wang, > 1. First to search inline extent_data > 2. Secondly search in extent tree to calculate (extent refs=1) but these extents will be of entire fs, how do you filter it for a subvol ? Thanks, Anand > This will avoid n*n rather than n+n which is better… -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/09/2013 04:03 PM, Anand Jain wrote: > Wang, > > >> 1. First to search inline extent_data >> 2. Secondly search in extent tree to calculate (extent refs=1) > but these extents will be of entire fs, how do you filter it for a subvol ? Sorry, this could not be true as i have expected,(we have to iterate extent tree with my approach) If we considered both metadata and data for a subvolume, we just iterate all the extent items in the extent tree and add those extents into subvolume's sole size iff: 1> extent ref=1 2> backref root=subvol However, this also may come very *slowly* when extent tree comes very large. Thanks, Wang > > > Thanks, Anand > >> This will avoid n*n rather than n+n which is better… > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello David, Anand, > On Mon, Oct 07, 2013 at 10:47:56AM +0800, Anand Jain wrote: >> I was also thinking if this should be inside kernel >> facilitated by a new ioctl? so that we avoid number >> of search ioctl thats required. > > I think so. And for the feature itself, it can be handy in case where > qgroups are not established. Actually, i have not found an effectively way to work out this. Whatever we implement this as an ioctl in kernel space or just in userspace, why not just trigger quota enabled, but not set qgroup limit. If quota been enabled, quota rescan will calculate every subvolume's referenced and exclusive size(to implement this we have to iterate extent tree though). Luckily, after that, quota itself will track every subvolume's sole size(exclusive size). Justing using 'btrfs qgroup show', we can figure out every subvolume's referenced size and exclusive size easily, i think this is another important use of qgroup. If users want to find top subvolume's sole size, they can use: btrfs qgroup show <mnt> --sort=+/-excl This approach maybe a little tricky, but it is better than implement an ioctl to calculate one subvolume's size every time. Thanks, Wang > > david -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/10/2013 11:35 AM, Anand Jain wrote: > > If 'btrfs_file_extent_item' can contain the ref count it would > solve the current problem quite easily. (problem is that, its > of n * n searches to know data extents with its ref for a given > subvol). Just considering btrfs_file_extent_item is not enough, because we should consider metadata(as i have said before). > > But what'r the challenge(s) to have ref count in the > btrfs_file_extent_item ? any thoughts ? Haven't thought a better idea yet. > > Thanks, Anand > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
If 'btrfs_file_extent_item' can contain the ref count it would solve the current problem quite easily. (problem is that, its of n * n searches to know data extents with its ref for a given subvol). But what'r the challenge(s) to have ref count in the btrfs_file_extent_item ? any thoughts ? Thanks, Anand -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello Anand, I have sent a similar comment to your email thread in http://www.spinics.net/lists/linux-btrfs/msg27547.html I believe this approach of calculating freeable space is incorrect. Try this: - create a fresh btrfs - create a regular file - write some data into it in such a way, that it was, say 4000 EXTENT_DATA items, so that file tree and extent tree get deep enough - run btrfs-debug-tree and verify that all EXTENT_ITEMs of this file (in the extent tree) have refcnt=1 - create a snapshot of the subvolume - run btrfs-debug-tree again You will see that most (in my case - all) of EXTENT_ITEMs still have refcnt=1. Reason for this is as I mentioned in http://www.spinics.net/lists/linux-btrfs/msg27547.html But if you delete the subvolume, no space will be freed, because all these extents are also shared by the snapshot. Although it seems that your tool will report that all subvolume's space is freeable (based on refcnt=1). Can you pls try that experiment and comment on it? Perhaps I am missing something here? Thanks! Alex. On Thu, Oct 10, 2013 at 6:33 AM, Wang Shilong <wangsl.fnst@cn.fujitsu.com> wrote: > On 10/10/2013 11:35 AM, Anand Jain wrote: >> >> >> If 'btrfs_file_extent_item' can contain the ref count it would >> solve the current problem quite easily. (problem is that, its >> of n * n searches to know data extents with its ref for a given >> subvol). > > Just considering btrfs_file_extent_item is not enough, because > we should consider metadata(as i have said before). > >> >> But what'r the challenge(s) to have ref count in the >> btrfs_file_extent_item ? any thoughts ? > > Haven't thought a better idea yet. > > >> >> Thanks, Anand >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Alex, On 11/29/2013 01:39 AM, Alex Lyakas wrote: > Hello Anand, > I have sent a similar comment to your email thread in > http://www.spinics.net/lists/linux-btrfs/msg27547.html > > I believe this approach of calculating freeable space is incorrect. > Try this: > - create a fresh btrfs > - create a regular file > - write some data into it in such a way, that it was, say 4000 > EXTENT_DATA items, so that file tree and extent tree get deep enough > - run btrfs-debug-tree and verify that all EXTENT_ITEMs of this file > (in the extent tree) have refcnt=1 > - create a snapshot of the subvolume > - run btrfs-debug-tree again > > You will see that most (in my case - all) of EXTENT_ITEMs still have > refcnt=1. Reason for this is as I mentioned in > http://www.spinics.net/lists/linux-btrfs/msg27547.html > > But if you delete the subvolume, no space will be freed, because all > these extents are also shared by the snapshot. Although it seems that > your tool will report that all subvolume's space is freeable (based on > refcnt=1). > > Can you pls try that experiment and comment on it? Perhaps I am > missing something here? I think you are right here, the only way we can make sure whether a extent_item is shared by many roots is to walk backref to find all roots, and if number of roots is 1, we know it's sole space. If so, i don't think this should be implemented in user space, Btrfs quota implement such function in kernel space, but it also face a problem that deleting a subvolume will break space accounting(because subvolume deletion won't iterate the whole fs tree). Thanks, Wang > > Thanks! > Alex. > > > > On Thu, Oct 10, 2013 at 6:33 AM, Wang Shilong > <wangsl.fnst@cn.fujitsu.com> wrote: >> On 10/10/2013 11:35 AM, Anand Jain wrote: >>> >>> If 'btrfs_file_extent_item' can contain the ref count it would >>> solve the current problem quite easily. (problem is that, its >>> of n * n searches to know data extents with its ref for a given >>> subvol). >> Just considering btrfs_file_extent_item is not enough, because >> we should consider metadata(as i have said before). >> >>> But what'r the challenge(s) to have ref count in the >>> btrfs_file_extent_item ? any thoughts ? >> Haven't thought a better idea yet. >> >> >>> Thanks, Anand >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> If so, i don't think this should be implemented in user space, Btrfs quota > implement such function in kernel space, but it also face a problem that > deleting a subvolume will break space accounting(because subvolume deletion > won't iterate the whole fs tree). Hi Alex, Wang. Thanks for comments, as Wang suggested before, I am working on the qgroups to handle our user space requisites here. I will be sending out review patch soon. pls stay tuned. Thanks, Anand -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/cmds-subvolume.c b/cmds-subvolume.c index de246ab..d801e00 100644 --- a/cmds-subvolume.c +++ b/cmds-subvolume.c @@ -915,6 +915,8 @@ static int cmd_subvol_show(int argc, char **argv) else printf("\tFlags: \t\t\t-\n"); + printf("\tUnshared space: \t%s\n", + pretty_size(get_subvol_freeable_bytes(fd))); /* print the snapshots of the given subvol if any*/ printf("\tSnapshot(s):\n"); filter_set = btrfs_list_alloc_filter_set(); diff --git a/utils.c b/utils.c index ccb5199..ca30485 100644 --- a/utils.c +++ b/utils.c @@ -2062,3 +2062,157 @@ int lookup_ino_rootid(int fd, u64 *rootid) return 0; } + +/* gets the ref count for given extent + * 0 = didn't find the item + * n = number of references +*/ +u64 get_extent_refcnt(int fd, u64 disk_blk) +{ + int ret = 0, i, e; + struct btrfs_ioctl_search_args args; + struct btrfs_ioctl_search_key *sk = &args.key; + struct btrfs_ioctl_search_header sh; + unsigned long off = 0; + + memset(&args, 0, sizeof(args)); + + sk->tree_id = BTRFS_EXTENT_TREE_OBJECTID; + + sk->min_type = BTRFS_EXTENT_ITEM_KEY; + sk->max_type = BTRFS_EXTENT_ITEM_KEY; + + sk->min_objectid = disk_blk; + sk->max_objectid = disk_blk; + + sk->max_offset = (u64)-1; + sk->max_transid = (u64)-1; + + while (1) { + sk->nr_items = 4096; + + ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args); + e = errno; + if (ret < 0) { + fprintf(stderr, "ERROR: search failed - %s\n", + strerror(e)); + return 0; + } + if (sk->nr_items == 0) + break; + + off = 0; + for (i = 0; i < sk->nr_items; i++) { + struct btrfs_extent_item *ei; + u64 ref; + + memcpy(&sh, args.buf + off, sizeof(sh)); + off += sizeof(sh); + + if (sh.type != BTRFS_EXTENT_ITEM_KEY) { + off += sh.len; + continue; + } + + ei = (struct btrfs_extent_item *)(args.buf + off); + ref = btrfs_stack_extent_refs(ei); + return ref; + } + sk->min_objectid = sh.objectid; + sk->min_offset = sh.offset; + sk->min_type = sh.type; + if (sk->min_offset < (u64)-1) + sk->min_offset++; + else if (sk->min_objectid < (u64)-1) { + sk->min_objectid++; + sk->min_offset = 0; + sk->min_type = 0; + } else + break; + } + return 0; +} + +u64 get_subvol_freeable_bytes(int fd) +{ + int ret = 0, i, e; + struct btrfs_ioctl_search_args args; + struct btrfs_ioctl_search_key *sk = &args.key; + struct btrfs_ioctl_search_header sh; + unsigned long off = 0; + u64 size_bytes = 0; + + memset(&args, 0, sizeof(args)); + + sk->tree_id = 0; + + sk->min_type = BTRFS_EXTENT_DATA_KEY; + sk->max_type = BTRFS_EXTENT_DATA_KEY; + + sk->max_objectid = (u64) -1; + sk->max_offset = (u64)-1; + sk->max_transid = (u64)-1; + + while (1) { + sk->nr_items = 4096; + + ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args); + e = errno; + if (ret < 0) { + fprintf(stderr, "ERROR: search failed - %s\n", + strerror(e)); + return 0; + } + if (sk->nr_items == 0) + break; + + off = 0; + for (i = 0; i < sk->nr_items; i++) { + struct btrfs_file_extent_item *efi; + u64 disk_bytenr = 0; + u64 num_bytes = 0; + u64 refcnt; + u8 type; + + memcpy(&sh, args.buf + off, sizeof(sh)); + off += sizeof(sh); + + if (sh.type != BTRFS_EXTENT_DATA_KEY) { + off += sh.len; + continue; + } + + efi = (struct btrfs_file_extent_item *)(args.buf + off); + type = btrfs_stack_file_extent_type(efi); + + if (type == BTRFS_FILE_EXTENT_INLINE) { + size_bytes += + btrfs_stack_file_extent_ram_bytes(efi); + goto skip_extent_data; + } + disk_bytenr = btrfs_stack_file_extent_disk_bytenr(efi); + num_bytes = btrfs_stack_file_extent_num_bytes(efi); + + if (disk_bytenr) { + refcnt = get_extent_refcnt(fd, disk_bytenr); + if (refcnt == 1) + size_bytes += num_bytes; + } +skip_extent_data: + off += sh.len; + } + sk->min_objectid = sh.objectid; + sk->min_offset = sh.offset; + sk->min_type = sh.type; + + if (sk->min_offset < (u64)-1) + sk->min_offset++; + else if (sk->min_objectid < (u64)-1) { + sk->min_objectid++; + sk->min_offset = 0; + sk->min_type = 0; + } else + break; + } + return size_bytes; +} diff --git a/utils.h b/utils.h index 0f31db7..4ddcf09 100644 --- a/utils.h +++ b/utils.h @@ -91,5 +91,6 @@ int csum_tree_block(struct btrfs_root *root, struct extent_buffer *buf, int ask_user(char *question); int lookup_ino_rootid(int fd, u64 *rootid); int btrfs_scan_lblkid(int update_kernel); +u64 get_subvol_freeable_bytes(int fd); #endif
It needs a lot more information about the snapshots if snapshot's life cycle has to be all auto managed by scripts _some day_. this patch is a step towards that. This patch provides the size which would be freed if the subvol/snapshot is deleted. preview: --------------------- btrfs su show /btrfs/sv1 :: Unshared space: 89.09MiB --------------------- v2: rename to 'unshared space' and edit commit text worked on review comments Signed-off-by: Anand Jain <anand.jain@oracle.com> --- cmds-subvolume.c | 2 + utils.c | 154 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ utils.h | 1 + 3 files changed, 157 insertions(+)