Message ID | 1466616406-28087-1-git-send-email-dsterba@suse.com (mailing list archive)
State      | New, archived
On 06/22/2016 07:26 PM, David Sterba wrote: > From: David Sterba <dsterba@suse.cz> > > Hi, > > the chunk dump is a useful thing, for debugging or balance filters. Nice! > Example output: > > Chunks on device id: 1 > PNumber Type PStart Length PEnd Age LStart Usage > ----------------------------------------------------------------------------------------------- > 0 System/RAID1 1.00MiB 32.00MiB 33.00MiB 47 1.40TiB 0.06 > 1 Metadata/RAID1 33.00MiB 1.00GiB 1.03GiB 31 1.36TiB 64.03 > 2 Metadata/RAID1 1.03GiB 32.00MiB 1.06GiB 36 1.36TiB 77.28 > 3 Data/single 1.06GiB 1.00GiB 2.06GiB 12 422.30GiB 78.90 > 4 Data/single 2.06GiB 1.00GiB 3.06GiB 11 420.30GiB 78.47 > ... > > (full output is at http://susepaste.org/view/raw/089a9877) > > This patch does the basic output, not filtering or sorting besides physical and > logical offset. There are possibilities to do more or enhance the output (e.g. > starting with logical chunk and list the related physical chunks together). > Or filter by type/profile, or understand the balance filters format. > > As it is now, it's per-device dump of physical layout, which was the original > idea. > > Printing 'usage' is not default as it's quite slow, it uses the search ioctl > and probably not in the best way, or there's some other issue in the > implementation.

Interesting. So after reading this, I wrote a little test to try some scenarios: https://github.com/knorrie/python-btrfs/commit/1ca99880dfa0e14b148f3d9e2b6b381b781eb52d It's very clear that the fastest way to do this search is to set nr_items=1 and, if possible, specify the length in offset. I have no idea yet what happens inside the kernel when nr_items is not 1, in combination with a large extent tree. I'll try to follow the code and see if I can find what's making this big difference. Also, the filesystem with the 1200-second search result is running Debian 3.16.7-ckt25-2; I haven't checked yet whether search handling differs between that kernel and e.g. 4.5.
After loading a huge amount of data in memory, the search with nr_items > 1 only takes 4.6 seconds to come up with the block group item itself, and another 4.6 seconds to come up with 0 additional results, using only CPU time. It seems that searching the empty space between (vaddr BLOCK_GROUP_ITEM length+1) and (vaddr BLOCK_GROUP_ITEM ULLONG_MAX) is really expensive, even though there's absolutely nothing there to find. I ran into this two days ago, caused by doing an unnecessary extra search after having already found the block group item: https://github.com/knorrie/python-btrfs/commit/5a69d9cf0477515ce2d6e50f1741276a49b33af8

> I'll add the patch to devel branch but will not add it for any particular > release yet, I'd like some feedback first. Thanks. > > ------------- > New command 'btrfs inspect-internal dump-chunks' will dump layout of > chunks as stored on the devices. This corresponds to the physical > layout, sorted by the physical offset. The block group usage can be > shown as well, but the search is too slow so it's off by default. > > If the physical offset sorting is selected, the empty space between > chunks is also shown. > > Signed-off-by: David Sterba <dsterba@suse.com> > --- > cmds-inspect.c | 364 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 364 insertions(+) > [...] > +} > + > +static u64 fill_usage(int fd, u64 lstart) > +{ > + struct btrfs_ioctl_search_args args; > + struct btrfs_ioctl_search_key *sk = &args.key; > + struct btrfs_ioctl_search_header sh; > + struct btrfs_block_group_item *item; > + int ret; > + > + memset(&args, 0, sizeof(args)); > + sk->tree_id = BTRFS_EXTENT_TREE_OBJECTID; > + sk->min_objectid = lstart; > + sk->max_objectid = lstart; > + sk->min_type = BTRFS_BLOCK_GROUP_ITEM_KEY; > + sk->max_type = BTRFS_BLOCK_GROUP_ITEM_KEY; > + sk->min_offset = 0; > + sk->max_offset = (u64)-1; The chunk length is already known here, so it can go in offset.
> + sk->max_transid = (u64)-1; > + > + sk->nr_items = 4096; Or set to 1 because we already know there can only be one result.
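The two suggestions combine naturally: pin the offset range to the known chunk length and ask for a single item. A minimal sketch of what the narrowed key could look like, using a simplified stand-in struct (`search_key` and `fill_usage_key` are illustrative names, not btrfs-progs API; the numeric key constants match the on-disk format):

```c
#include <stdint.h>

/* Simplified stand-in for struct btrfs_ioctl_search_key: only the
 * fields discussed in the thread. */
struct search_key {
	uint64_t tree_id;
	uint64_t min_objectid, max_objectid;
	uint32_t min_type, max_type;
	uint64_t min_offset, max_offset;
	uint64_t max_transid;
	uint32_t nr_items;
};

#define EXTENT_TREE_OBJECTID	2	/* BTRFS_EXTENT_TREE_OBJECTID */
#define BLOCK_GROUP_ITEM_KEY	192	/* BTRFS_BLOCK_GROUP_ITEM_KEY */

/* A block group item has the key (lstart, BLOCK_GROUP_ITEM,
 * chunk_length), so when the length is already known the offset range
 * can be pinned to exactly that value and nr_items dropped to 1: the
 * search can stop as soon as the item is found instead of walking the
 * empty keyspace up to (u64)-1. */
static void fill_usage_key(struct search_key *sk, uint64_t lstart,
			   uint64_t length)
{
	sk->tree_id = EXTENT_TREE_OBJECTID;
	sk->min_objectid = sk->max_objectid = lstart;
	sk->min_type = sk->max_type = BLOCK_GROUP_ITEM_KEY;
	sk->min_offset = sk->max_offset = length;	/* was 0 .. (u64)-1 */
	sk->max_transid = (uint64_t)-1;
	sk->nr_items = 1;				/* was 4096 */
}
```

With min_offset == max_offset there is only one key that can possibly match, which is exactly the case measured above where the search returns quickly.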
On 06/22/2016 07:26 PM, David Sterba wrote: > From: David Sterba <dsterba@suse.cz> > > Hi, > > the chunk dump is a useful thing, for debugging or balance filters. > > Example output: > > Chunks on device id: 1 > PNumber Type PStart Length PEnd Age LStart Usage > ----------------------------------------------------------------------------------------------- > 0 System/RAID1 1.00MiB 32.00MiB 33.00MiB 47 1.40TiB 0.06 > 1 Metadata/RAID1 33.00MiB 1.00GiB 1.03GiB 31 1.36TiB 64.03 > 2 Metadata/RAID1 1.03GiB 32.00MiB 1.06GiB 36 1.36TiB 77.28 > 3 Data/single 1.06GiB 1.00GiB 2.06GiB 12 422.30GiB 78.90 > 4 Data/single 2.06GiB 1.00GiB 3.06GiB 11 420.30GiB 78.47 > ...

On RAID0, it looks funny :)

# ./btrfs inspect-internal dump-chunks /mnt/extents/
Chunks on device id: 1
PNumber Type            PStart     Length     PEnd     Age  LStart
----------------------------------------------------------------------------------------
      0 Metadata/RAID0  1.00MiB    256.00MiB  257.00MiB  1  13.36GiB
      . empty           .          16.00EiB   .          .  .
      1 System/RAID0    129.00MiB  64.00MiB   193.00MiB  2  13.61GiB
      . empty           .          16.00EiB   .          .  .
      2 Data/RAID0      161.00MiB  2.00GiB    2.16GiB    3  15.68GiB
      . empty           .          115.00MiB  .          .  .
      3 Data/RAID0      2.27GiB    2.00GiB    4.27GiB    0  11.36GiB

Chunks on device id: 2
PNumber Type            PStart     Length     PEnd     Age  LStart
----------------------------------------------------------------------------------------
      0 Data/RAID0      1.00MiB    2.00GiB    2.00GiB    0  11.36GiB
      . empty           .          16.00EiB   .          .  .
      1 Metadata/RAID0  1.00GiB    256.00MiB  1.25GiB    1  13.36GiB
      . empty           .          16.00EiB   .          .  .
      2 System/RAID0    1.13GiB    64.00MiB   1.19GiB    2  13.61GiB
      . empty           .          16.00EiB   .          .  .
      3 Data/RAID0      1.16GiB    2.00GiB    3.16GiB    3  15.68GiB

To get the correct sizes, you would need to do some math, like dividing by 2 for RAID0, but I guess this gets horrible really quickly with RAID5 etc...
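The per-profile math hinted at here could be sketched as follows; `dev_extent_len` is a hypothetical helper, not part of the patch, assuming the usual btrfs stripe accounting (mirrored profiles store a full copy per stripe; striped profiles spread the logical length over the non-parity stripes; RAID10's sub_stripes are left out entirely, which is part of why this approach gets ugly):

```c
#include <stdint.h>

/* Hypothetical helper: physical length of one dev extent, given the
 * chunk's logical length and its layout.  Assumptions: mirrored
 * profiles (RAID1/DUP) put a full copy on every device; striped
 * profiles spread the logical bytes over the non-parity stripes,
 * where nparity is 0 for RAID0, 1 for RAID5, 2 for RAID6. */
static uint64_t dev_extent_len(uint64_t chunk_len, int num_stripes,
			       int nparity, int is_mirrored)
{
	if (is_mirrored)
		return chunk_len;
	return chunk_len / (uint64_t)(num_stripes - nparity);
}
```

So a 2GiB RAID0 chunk over two devices takes 1GiB per device, while a 2GiB RAID5 chunk over three devices also takes 1GiB per device (two data stripes plus parity), and every profile needs its own case.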
Or, if you want to print everything in physical order, why not just dump the device tree's DEV_EXTENT items, which carry the exact on-disk length, and fill in the missing information via the reference to the chunk they belong to? https://github.com/knorrie/python-btrfs/commit/be85c2a0e2774909b532326e99e0c23ab8467263 Sorry, couldn't resist /:D\
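A sketch of the key range such a DEV_EXTENT walk could use; `search_key` and `dev_extent_range` are illustrative stand-ins rather than btrfs-progs API, but the numeric constants are the on-disk values:

```c
#include <stdint.h>

/* Simplified stand-in for struct btrfs_ioctl_search_key; only the
 * range fields matter for this sketch. */
struct search_key {
	uint64_t tree_id;
	uint64_t min_objectid, max_objectid;
	uint32_t min_type, max_type;
	uint64_t min_offset, max_offset;
};

#define DEV_TREE_OBJECTID	4	/* BTRFS_DEV_TREE_OBJECTID */
#define DEV_EXTENT_KEY		204	/* BTRFS_DEV_EXTENT_KEY */

/* DEV_EXTENT items are keyed (devid, DEV_EXTENT_KEY, physical_offset)
 * in the device tree, so a plain forward walk over this range already
 * yields physical order for one device, and each item stores the
 * exact on-disk length plus a chunk_offset back-reference. */
static void dev_extent_range(struct search_key *sk, uint64_t devid)
{
	sk->tree_id = DEV_TREE_OBJECTID;
	sk->min_objectid = sk->max_objectid = devid;
	sk->min_type = sk->max_type = DEV_EXTENT_KEY;
	sk->min_offset = 0;
	sk->max_offset = (uint64_t)-1;
}
```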
At 06/23/2016 01:26 AM, David Sterba wrote: > From: David Sterba <dsterba@suse.cz> > > Hi, > > the chunk dump is a useful thing, for debugging or balance filters. Yes, quite useful. I'll no longer need to manually check btrfs-debug-tree output to figure out the dev-extent layout. But according to the output, it's more of a dev-extent dump than a chunk dump. It's better to make things clearer for objects at different logical levels. > > Example output: > > Chunks on device id: 1 > PNumber Type PStart Length PEnd Age LStart Usage > ----------------------------------------------------------------------------------------------- > 0 System/RAID1 1.00MiB 32.00MiB 33.00MiB 47 1.40TiB 0.06

[Mixed chunk/dev-extent output] For "PStart/Length/PEnd/LStart" that's OK. (Although I'd prefer "PLength" over "Length", since "Length" is a bit ambiguous between the chunk's logical length and the dev_extent's physical length.) Since "PStart" = key.offset, "Length" = dev_extent_length, "PEnd" = PStart + Length and "LStart" = dev_extent_chunk_offset, these values are all straightforward. But "Usage" is entirely a chunk/block-group-level thing, so personally speaking it's better not to output per-chunk usage in a dev-extent dump. If there is a real "chunk dump", that's the best place to output per-chunk "usage". At the dev-extent level, usage should be the usage of a device, not of a chunk.

["Age" calculation] Another concern is about "age": since there is no generation in either chunk_item or dev_extent, you're using the logical bytenr as a hint. This method relies on the assumption that btrfs always allocates new chunks at increasing logical bytenrs and never reuses a lower logical bytenr. That behavior is the basis for a lot of other btrfs functionality, like balance and qgroup rescan, and IMHO it won't change any time soon. But I'm still a little concerned about depending on such an assumption, since "age" here is more like a sorting index than the generation at which the chunk was created.
It may not really reflect how long a chunk has actually lived. It may be completely OK for most people, but at least it's not as straightforward as a generation.

[Support for offline btrfs] BTW, it would also be a good idea to support not only mounted filesystems, but offline ones too. For an offline filesystem there's no need to bother with the tree search ioctl, just straightforward search_slots(). (That's why I like the offline btrfs tools.) Thanks, Qu

> 1 Metadata/RAID1 33.00MiB 1.00GiB 1.03GiB 31 1.36TiB 64.03 > 2 Metadata/RAID1 1.03GiB 32.00MiB 1.06GiB 36 1.36TiB 77.28 > 3 Data/single 1.06GiB 1.00GiB 2.06GiB 12 422.30GiB 78.90 > 4 Data/single 2.06GiB 1.00GiB 3.06GiB 11 420.30GiB 78.47 > ... > > (full output is at http://susepaste.org/view/raw/089a9877) > > This patch does the basic output, not filtering or sorting besides physical and > logical offset. There are possibilities to do more or enhance the output (e.g. > starting with logical chunk and list the related physical chunks together). > Or filter by type/profile, or understand the balance filters format. > > As it is now, it's per-device dump of physical layout, which was the original > idea. > > Printing 'usage' is not default as it's quite slow, it uses the search ioctl > and probably not in the best way, or there's some other issue in the > implementation. > > I'll add the patch to devel branch but will not add it for any particular > release yet, I'd like some feedback first. Thanks. > > ------------- > New command 'btrfs inspect-internal dump-chunks' will dump layout of > chunks as stored on the devices. This corresponds to the physical > layout, sorted by the physical offset. The block group usage can be > shown as well, but the search is too slow so it's off by default. > > If the physical offset sorting is selected, the empty space between > chunks is also shown.
> > Signed-off-by: David Sterba <dsterba@suse.com> > --- > cmds-inspect.c | 364 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 364 insertions(+) > > diff --git a/cmds-inspect.c b/cmds-inspect.c > index dd7b9dd278f2..4eace63d8517 100644 > --- a/cmds-inspect.c > +++ b/cmds-inspect.c > @@ -623,6 +623,368 @@ static int cmd_inspect_min_dev_size(int argc, char **argv) > return !!ret; > } > > +static const char * const cmd_dump_chunks_usage[] = { > + "btrfs inspect-internal chunk-stats [options] <path>", > + "Show chunks (block groups) layout", > + "Show chunks (block groups) layout for all devices", > + "", > + HELPINFO_UNITS_LONG, > + "--sort=MODE sort by the physical or logical chunk start", > + " MODE is one of pstart or lstart (default: pstart)", > + "--usage show usage per block group, note this can be slow", > + NULL > +}; > + > +enum { > + CHUNK_SORT_PSTART, > + CHUNK_SORT_LSTART, > + CHUNK_SORT_DEFAULT = CHUNK_SORT_PSTART > +}; > + > +struct dump_chunks_entry { > + u64 devid; > + u64 start; > + u64 lstart; > + u64 length; > + u64 flags; > + u64 age; > + u64 used; > + u32 pnumber; > +}; > + > +struct dump_chunks_ctx { > + unsigned length; > + unsigned size; > + struct dump_chunks_entry *stats; > +}; > + > +int cmp_cse_devid_start(const void *va, const void *vb) > +{ > + const struct dump_chunks_entry *a = va; > + const struct dump_chunks_entry *b = vb; > + > + if (a->devid < b->devid) > + return -1; > + if (a->devid > b->devid) > + return 1; > + > + if (a->start < b->start) > + return -1; > + if (a->start == b->start) { > + error( > + "chunks start on same offset in the same device: devid %llu start %llu", > + (unsigned long long)a->devid, (unsigned long long)a->start); > + return 0; > + } > + return 1; > +} > + > +int cmp_cse_devid_lstart(const void *va, const void *vb) > +{ > + const struct dump_chunks_entry *a = va; > + const struct dump_chunks_entry *b = vb; > + > + if (a->devid < b->devid) > + return -1; > + if (a->devid > 
b->devid) > + return 1; > + > + if (a->lstart < b->lstart) > + return -1; > + if (a->lstart == b->lstart) { > + error( > +"chunks logically start on same offset in the same device: devid %llu start %llu", > + (unsigned long long)a->devid, (unsigned long long)a->lstart); > + return 0; > + } > + return 1; > +} > + > +void print_dump_chunks(struct dump_chunks_ctx *ctx, unsigned sort_mode, > + unsigned unit_mode, int with_usage) > +{ > + u64 devid; > + struct dump_chunks_entry e; > + int i; > + int chidx; > + u64 lastend = 0; > + u64 age; > + > + /* > + * Chunks are sorted logically as found by the ioctl, we need to sort > + * them once to find the physical ordering. This is the default mode. > + */ > + qsort(ctx->stats, ctx->length, sizeof(ctx->stats[0]), cmp_cse_devid_start); > + devid = 0; > + age = 0; > + for (i = 0; i < ctx->length; i++) { > + e = ctx->stats[i]; > + if (e.devid != devid) { > + devid = e.devid; > + age = 0; > + } > + ctx->stats[i].pnumber = age; > + age++; > + } > + > + if (sort_mode == CHUNK_SORT_LSTART) > + qsort(ctx->stats, ctx->length, sizeof(ctx->stats[0]), cmp_cse_devid_lstart); > + > + devid = 0; > + for (i = 0; i < ctx->length; i++) { > + e = ctx->stats[i]; > + if (e.devid != devid) { > + devid = e.devid; > + if (i != 0) > + putchar('\n'); > + printf("Chunks on device id: %llu\n", devid); > + printf("PNumber Type PStart Length PEnd Age LStart%s\n", > + with_usage ? " Usage" : ""); > + printf("----------------------------------------------------------------------------------------%s\n", > + with_usage ? "-------" : ""); > + chidx = 0; > + lastend = 0; > + } > + if (sort_mode == CHUNK_SORT_PSTART && lastend > 0 > + && e.start != lastend) { > + printf(" . empty . "); > + printf("%12s ", > + pretty_size_mode(e.start - lastend, unit_mode)); > + printf(" . . 
.\n"); > + } > + > + printf("%7u ", e.pnumber); > + printf("%8s/%-6s ", btrfs_group_type_str(e.flags), > + btrfs_group_profile_str(e.flags)); > + printf("%12s ", pretty_size_mode(e.start, unit_mode)); > + printf("%12s ", pretty_size_mode(e.length, unit_mode)); > + printf("%12s ", > + pretty_size_mode(e.start + e.length - 1, unit_mode)); > + printf("%6llu ", e.age); > + printf("%14s", pretty_size_mode(e.lstart, unit_mode)); > + if (with_usage) > + printf(" %5.2f", (float)e.used / e.length * 100); > + printf("\n"); > + > + lastend = e.start + e.length; > + chidx++; > + } > +} > + > +static u64 fill_usage(int fd, u64 lstart) > +{ > + struct btrfs_ioctl_search_args args; > + struct btrfs_ioctl_search_key *sk = &args.key; > + struct btrfs_ioctl_search_header sh; > + struct btrfs_block_group_item *item; > + int ret; > + > + memset(&args, 0, sizeof(args)); > + sk->tree_id = BTRFS_EXTENT_TREE_OBJECTID; > + sk->min_objectid = lstart; > + sk->max_objectid = lstart; > + sk->min_type = BTRFS_BLOCK_GROUP_ITEM_KEY; > + sk->max_type = BTRFS_BLOCK_GROUP_ITEM_KEY; > + sk->min_offset = 0; > + sk->max_offset = (u64)-1; > + sk->max_transid = (u64)-1; > + > + sk->nr_items = 4096; > + ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args); > + if (ret < 0) { > + error("cannot perform the search: %s", strerror(errno)); > + return 1; > + } > + if (sk->nr_items == 0) { > + warning("blockgroup %llu not found", > + (unsigned long long)lstart); > + return 0; > + } > + if (sk->nr_items > 1) { > + warning("found more than one blockgroup %llu", > + (unsigned long long)lstart); > + } > + > + memcpy(&sh, args.buf, sizeof(sh)); > + item = (struct btrfs_block_group_item*)(args.buf + sizeof(sh)); > + > + return item->used; > +} > + > +static int cmd_dump_chunks(int argc, char **argv) > +{ > + struct btrfs_ioctl_search_args args; > + struct btrfs_ioctl_search_key *sk = &args.key; > + struct btrfs_ioctl_search_header sh; > + unsigned long off = 0; > + u64 *age = 0; > + unsigned age_size = 128; > + int ret; > + 
int fd; > + int i; > + int e; > + DIR *dirstream = NULL; > + unsigned unit_mode; > + unsigned sort_mode = 0; > + int with_usage = 0; > + const char *path; > + struct dump_chunks_ctx ctx = { > + .length = 0, > + .size = 1024, > + .stats = NULL > + }; > + > + unit_mode = get_unit_mode_from_arg(&argc, argv, 0); > + > + while (1) { > + int c; > + enum { GETOPT_VAL_SORT = 256, GETOPT_VAL_USAGE }; > + static const struct option long_options[] = { > + {"sort", required_argument, NULL, GETOPT_VAL_SORT }, > + {"usage", no_argument, NULL, GETOPT_VAL_USAGE }, > + {NULL, 0, NULL, 0} > + }; > + > + c = getopt_long(argc, argv, "", long_options, NULL); > + if (c < 0) > + break; > + > + switch (c) { > + case GETOPT_VAL_SORT: > + if (strcmp(optarg, "pstart") == 0) { > + sort_mode = CHUNK_SORT_PSTART; > + } else if (strcmp(optarg, "lstart") == 0) { > + sort_mode = CHUNK_SORT_LSTART; > + } else { > + error("unknown sort mode: %s", optarg); > + exit(1); > + } > + break; > + case GETOPT_VAL_USAGE: > + with_usage = 1; > + break; > + default: > + usage(cmd_dump_chunks_usage); > + } > + } > + > + if (check_argc_exact(argc - optind, 1)) > + usage(cmd_dump_chunks_usage); > + > + ctx.stats = calloc(ctx.size, sizeof(ctx.stats[0])); > + if (!ctx.stats) > + goto out_nomem; > + > + path = argv[optind]; > + > + fd = open_file_or_dir(path, &dirstream); > + if (fd < 0) { > + error("cannot access '%s': %s", path, strerror(errno)); > + return 1; > + } > + > + memset(&args, 0, sizeof(args)); > + sk->tree_id = BTRFS_CHUNK_TREE_OBJECTID; > + sk->min_objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID; > + sk->max_objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID; > + sk->min_type = BTRFS_CHUNK_ITEM_KEY; > + sk->max_type = BTRFS_CHUNK_ITEM_KEY; > + sk->max_offset = (u64)-1; > + sk->max_transid = (u64)-1; > + age = calloc(age_size, sizeof(u64)); > + if (!age) > + goto out_nomem; > + > + while (1) { > + sk->nr_items = 4096; > + ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args); > + e = errno; > + if (ret < 0) { > + 
error("cannot perform the search: %s", strerror(e)); > + return 1; > + } > + if (sk->nr_items == 0) > + break; > + > + off = 0; > + for (i = 0; i < sk->nr_items; i++) { > + struct btrfs_chunk *item; > + struct btrfs_stripe *stripes; > + int sidx; > + u64 used = (u64)-1; > + > + memcpy(&sh, args.buf + off, sizeof(sh)); > + off += sizeof(sh); > + item = (struct btrfs_chunk*)(args.buf + off); > + off += sh.len; > + > + stripes = &item->stripe; > + for (sidx = 0; sidx < item->num_stripes; sidx++) { > + struct dump_chunks_entry *e; > + u64 devid; > + > + e = &ctx.stats[ctx.length]; > + devid = stripes[sidx].devid; > + e->devid = devid; > + e->start = stripes[sidx].offset; > + e->lstart = sh.offset; > + e->length = item->length; > + e->flags = item->type; > + e->pnumber = -1; > + while (devid > age_size) { > + u64 *tmp; > + unsigned old_size = age_size; > + > + age_size += 128; > + tmp = calloc(age_size, sizeof(u64)); > + if (!tmp) { > + free(age); > + goto out_nomem; > + } > + memcpy(tmp, age, sizeof(u64) * old_size); > + age = tmp; > + } > + e->age = age[devid]++; > + if (with_usage) { > + if (used == (u64)-1) > + used = fill_usage(fd, sh.offset); > + e->used = used; > + } else { > + e->used = 0; > + } > + > + ctx.length++; > + > + if (ctx.length == ctx.size) { > + ctx.size += 1024; > + ctx.stats = realloc(ctx.stats, ctx.size > + * sizeof(ctx.stats[0])); > + if (!ctx.stats) > + goto out_nomem; > + } > + } > + > + sk->min_objectid = sh.objectid; > + sk->min_type = sh.type; > + sk->min_offset = sh.offset; > + } > + if (sk->min_offset < (u64)-1) > + sk->min_offset++; > + else > + break; > + } > + > + print_dump_chunks(&ctx, sort_mode, unit_mode, with_usage); > + free(ctx.stats); > + > + close_file_or_dir(fd, dirstream); > + return 0; > + > +out_nomem: > + error("not enough memory"); > + return 1; > +} > + > static const char inspect_cmd_group_info[] = > "query various internal information"; > > @@ -644,6 +1006,8 @@ const struct cmd_group inspect_cmd_group = { > 
cmd_inspect_dump_super_usage, NULL, 0 }, > { "tree-stats", cmd_inspect_tree_stats, > cmd_inspect_tree_stats_usage, NULL, 0 }, > + { "dump-chunks", cmd_dump_chunks, cmd_dump_chunks_usage, NULL, > + 0 }, > NULL_CMD_STRUCT > } > }; > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
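The continuation logic at the tail of the search loop in cmd_dump_chunks above (copy the last returned key into the min_* fields, bump min_offset, and break on overflow) can be isolated into a small sketch; `advance_min_key` is a hypothetical helper, not part of the patch:

```c
#include <stdint.h>
#include <stdbool.h>

/* Minimal key, mirroring the (objectid, type, offset) triple used by
 * the tree search ioctl. */
struct key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Resume a tree search after the last item the previous
 * BTRFS_IOC_TREE_SEARCH call returned: restart at that key but bump
 * the lowest component by one.  Returns false when the offset would
 * wrap, i.e. the keyspace is exhausted and the loop should break. */
static bool advance_min_key(struct key *min_key, const struct key *last)
{
	min_key->objectid = last->objectid;
	min_key->type = last->type;
	if (last->offset == (uint64_t)-1)
		return false;	/* nothing can follow (u64)-1 */
	min_key->offset = last->offset + 1;
	return true;
}
```

This is the standard pagination idiom for the search ioctl: without the overflow check, min_offset++ on (u64)-1 would wrap to 0 and the loop would rescan the tree forever.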
On Wed, Jun 22, 2016 at 07:26:46PM +0200, David Sterba wrote: > From: David Sterba <dsterba@suse.cz> > > Hi, > > the chunk dump is a useful thing, for debugging or balance filters. > > Example output: > > Chunks on device id: 1 > PNumber Type PStart Length PEnd Age LStart Usage > ----------------------------------------------------------------------------------------------- > 0 System/RAID1 1.00MiB 32.00MiB 33.00MiB 47 1.40TiB 0.06 > 1 Metadata/RAID1 33.00MiB 1.00GiB 1.03GiB 31 1.36TiB 64.03 > 2 Metadata/RAID1 1.03GiB 32.00MiB 1.06GiB 36 1.36TiB 77.28 > 3 Data/single 1.06GiB 1.00GiB 2.06GiB 12 422.30GiB 78.90 > 4 Data/single 2.06GiB 1.00GiB 3.06GiB 11 420.30GiB 78.47 > ... > > (full output is at http://susepaste.org/view/raw/089a9877) > > This patch does the basic output, not filtering or sorting besides physical and > logical offset. There are possiblities to do more or enhance the output (eg. > starting with logical chunk and list the related physcial chunks together). > Or filter by type/profile, or understand the balance filters format. > > As it is now, it's per-device dump of physical layout, which was the original > idea. > > Printing 'usage' is not default as it's quite slow, it uses the search ioctl > and probably not in the best way, or there's some other issue in the > implementation. > > I'll add the patch to devel branch but will not add it for any particular > release yet, I'd like some feedback first. Thanks. > > ------------- > New command 'btrfs inspect-internal dump-chunks' will dump layout of > chunks as stored on the devices. This corresponds to the physical > layout, sorted by the physical offset. The block group usage can be > shown as well, but the search is too slow so it's off by default. > > If the physical offset sorting is selected, the empty space between > chunks is also shown. 
> > Signed-off-by: David Sterba <dsterba@suse.com> > --- > cmds-inspect.c | 364 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 364 insertions(+) > > diff --git a/cmds-inspect.c b/cmds-inspect.c > index dd7b9dd278f2..4eace63d8517 100644 > --- a/cmds-inspect.c > +++ b/cmds-inspect.c > @@ -623,6 +623,368 @@ static int cmd_inspect_min_dev_size(int argc, char **argv) > return !!ret; > } > > +static const char * const cmd_dump_chunks_usage[] = { > + "btrfs inspect-internal chunk-stats [options] <path>", > + "Show chunks (block groups) layout", > + "Show chunks (block groups) layout for all devices", > + "", > + HELPINFO_UNITS_LONG, > + "--sort=MODE sort by the physical or logical chunk start", > + " MODE is one of pstart or lstart (default: pstart)", > + "--usage show usage per block group, note this can be slow", > + NULL > +}; > + > +enum { > + CHUNK_SORT_PSTART, > + CHUNK_SORT_LSTART, > + CHUNK_SORT_DEFAULT = CHUNK_SORT_PSTART > +}; > + > +struct dump_chunks_entry { > + u64 devid; > + u64 start; > + u64 lstart; > + u64 length; > + u64 flags; > + u64 age; > + u64 used; > + u32 pnumber; > +}; > + > +struct dump_chunks_ctx { > + unsigned length; > + unsigned size; > + struct dump_chunks_entry *stats; > +}; > + > +int cmp_cse_devid_start(const void *va, const void *vb) > +{ > + const struct dump_chunks_entry *a = va; > + const struct dump_chunks_entry *b = vb; > + > + if (a->devid < b->devid) > + return -1; > + if (a->devid > b->devid) > + return 1; > + > + if (a->start < b->start) > + return -1; > + if (a->start == b->start) { > + error( > + "chunks start on same offset in the same device: devid %llu start %llu", > + (unsigned long long)a->devid, (unsigned long long)a->start); > + return 0; > + } > + return 1; > +} > + > +int cmp_cse_devid_lstart(const void *va, const void *vb) > +{ > + const struct dump_chunks_entry *a = va; > + const struct dump_chunks_entry *b = vb; > + > + if (a->devid < b->devid) > + return -1; > + if (a->devid > 
b->devid) > + return 1; > + > + if (a->lstart < b->lstart) > + return -1; > + if (a->lstart == b->lstart) { > + error( > +"chunks logically start on same offset in the same device: devid %llu start %llu", > + (unsigned long long)a->devid, (unsigned long long)a->lstart); > + return 0; > + } > + return 1; > +} > + > +void print_dump_chunks(struct dump_chunks_ctx *ctx, unsigned sort_mode, > + unsigned unit_mode, int with_usage) > +{ > + u64 devid; > + struct dump_chunks_entry e; > + int i; > + int chidx; > + u64 lastend = 0; > + u64 age; > + > + /* > + * Chunks are sorted logically as found by the ioctl, we need to sort > + * them once to find the physical ordering. This is the default mode. > + */ > + qsort(ctx->stats, ctx->length, sizeof(ctx->stats[0]), cmp_cse_devid_start); > + devid = 0; > + age = 0; > + for (i = 0; i < ctx->length; i++) { > + e = ctx->stats[i]; > + if (e.devid != devid) { > + devid = e.devid; > + age = 0; > + } > + ctx->stats[i].pnumber = age; > + age++; > + } > + > + if (sort_mode == CHUNK_SORT_LSTART) > + qsort(ctx->stats, ctx->length, sizeof(ctx->stats[0]), cmp_cse_devid_lstart); > + > + devid = 0; > + for (i = 0; i < ctx->length; i++) { > + e = ctx->stats[i]; > + if (e.devid != devid) { > + devid = e.devid; > + if (i != 0) > + putchar('\n'); > + printf("Chunks on device id: %llu\n", devid); > + printf("PNumber Type PStart Length PEnd Age LStart%s\n", > + with_usage ? " Usage" : ""); > + printf("----------------------------------------------------------------------------------------%s\n", > + with_usage ? "-------" : ""); > + chidx = 0; > + lastend = 0; > + } > + if (sort_mode == CHUNK_SORT_PSTART && lastend > 0 > + && e.start != lastend) { > + printf(" . empty . "); > + printf("%12s ", > + pretty_size_mode(e.start - lastend, unit_mode)); > + printf(" . . 
.\n"); > + } > + > + printf("%7u ", e.pnumber); > + printf("%8s/%-6s ", btrfs_group_type_str(e.flags), > + btrfs_group_profile_str(e.flags)); > + printf("%12s ", pretty_size_mode(e.start, unit_mode)); > + printf("%12s ", pretty_size_mode(e.length, unit_mode)); > + printf("%12s ", > + pretty_size_mode(e.start + e.length - 1, unit_mode)); > + printf("%6llu ", e.age); > + printf("%14s", pretty_size_mode(e.lstart, unit_mode)); > + if (with_usage) > + printf(" %5.2f", (float)e.used / e.length * 100); > + printf("\n"); > + > + lastend = e.start + e.length; > + chidx++; > + } > +} > + > +static u64 fill_usage(int fd, u64 lstart) > +{ > + struct btrfs_ioctl_search_args args; > + struct btrfs_ioctl_search_key *sk = &args.key; > + struct btrfs_ioctl_search_header sh; > + struct btrfs_block_group_item *item; > + int ret; > + > + memset(&args, 0, sizeof(args)); > + sk->tree_id = BTRFS_EXTENT_TREE_OBJECTID; > + sk->min_objectid = lstart; > + sk->max_objectid = lstart; > + sk->min_type = BTRFS_BLOCK_GROUP_ITEM_KEY; > + sk->max_type = BTRFS_BLOCK_GROUP_ITEM_KEY; > + sk->min_offset = 0; > + sk->max_offset = (u64)-1; > + sk->max_transid = (u64)-1; > + > + sk->nr_items = 4096; What if we set nr_items = 1? From the code review, this can let us stop and return immediately. 
Thanks, -liubo > + ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args); > + if (ret < 0) { > + error("cannot perform the search: %s", strerror(errno)); > + return 1; > + } > + if (sk->nr_items == 0) { > + warning("blockgroup %llu not found", > + (unsigned long long)lstart); > + return 0; > + } > + if (sk->nr_items > 1) { > + warning("found more than one blockgroup %llu", > + (unsigned long long)lstart); > + } > + > + memcpy(&sh, args.buf, sizeof(sh)); > + item = (struct btrfs_block_group_item*)(args.buf + sizeof(sh)); > + > + return item->used; > +} > + > +static int cmd_dump_chunks(int argc, char **argv) > +{ > + struct btrfs_ioctl_search_args args; > + struct btrfs_ioctl_search_key *sk = &args.key; > + struct btrfs_ioctl_search_header sh; > + unsigned long off = 0; > + u64 *age = 0; > + unsigned age_size = 128; > + int ret; > + int fd; > + int i; > + int e; > + DIR *dirstream = NULL; > + unsigned unit_mode; > + unsigned sort_mode = 0; > + int with_usage = 0; > + const char *path; > + struct dump_chunks_ctx ctx = { > + .length = 0, > + .size = 1024, > + .stats = NULL > + }; > + > + unit_mode = get_unit_mode_from_arg(&argc, argv, 0); > + > + while (1) { > + int c; > + enum { GETOPT_VAL_SORT = 256, GETOPT_VAL_USAGE }; > + static const struct option long_options[] = { > + {"sort", required_argument, NULL, GETOPT_VAL_SORT }, > + {"usage", no_argument, NULL, GETOPT_VAL_USAGE }, > + {NULL, 0, NULL, 0} > + }; > + > + c = getopt_long(argc, argv, "", long_options, NULL); > + if (c < 0) > + break; > + > + switch (c) { > + case GETOPT_VAL_SORT: > + if (strcmp(optarg, "pstart") == 0) { > + sort_mode = CHUNK_SORT_PSTART; > + } else if (strcmp(optarg, "lstart") == 0) { > + sort_mode = CHUNK_SORT_LSTART; > + } else { > + error("unknown sort mode: %s", optarg); > + exit(1); > + } > + break; > + case GETOPT_VAL_USAGE: > + with_usage = 1; > + break; > + default: > + usage(cmd_dump_chunks_usage); > + } > + } > + > + if (check_argc_exact(argc - optind, 1)) > + usage(cmd_dump_chunks_usage); 
> + > + ctx.stats = calloc(ctx.size, sizeof(ctx.stats[0])); > + if (!ctx.stats) > + goto out_nomem; > + > + path = argv[optind]; > + > + fd = open_file_or_dir(path, &dirstream); > + if (fd < 0) { > + error("cannot access '%s': %s", path, strerror(errno)); > + return 1; > + } > + > + memset(&args, 0, sizeof(args)); > + sk->tree_id = BTRFS_CHUNK_TREE_OBJECTID; > + sk->min_objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID; > + sk->max_objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID; > + sk->min_type = BTRFS_CHUNK_ITEM_KEY; > + sk->max_type = BTRFS_CHUNK_ITEM_KEY; > + sk->max_offset = (u64)-1; > + sk->max_transid = (u64)-1; > + age = calloc(age_size, sizeof(u64)); > + if (!age) > + goto out_nomem; > + > + while (1) { > + sk->nr_items = 4096; > + ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args); > + e = errno; > + if (ret < 0) { > + error("cannot perform the search: %s", strerror(e)); > + return 1; > + } > + if (sk->nr_items == 0) > + break; > + > + off = 0; > + for (i = 0; i < sk->nr_items; i++) { > + struct btrfs_chunk *item; > + struct btrfs_stripe *stripes; > + int sidx; > + u64 used = (u64)-1; > + > + memcpy(&sh, args.buf + off, sizeof(sh)); > + off += sizeof(sh); > + item = (struct btrfs_chunk*)(args.buf + off); > + off += sh.len; > + > + stripes = &item->stripe; > + for (sidx = 0; sidx < item->num_stripes; sidx++) { > + struct dump_chunks_entry *e; > + u64 devid; > + > + e = &ctx.stats[ctx.length]; > + devid = stripes[sidx].devid; > + e->devid = devid; > + e->start = stripes[sidx].offset; > + e->lstart = sh.offset; > + e->length = item->length; > + e->flags = item->type; > + e->pnumber = -1; > + while (devid > age_size) { > + u64 *tmp; > + unsigned old_size = age_size; > + > + age_size += 128; > + tmp = calloc(age_size, sizeof(u64)); > + if (!tmp) { > + free(age); > + goto out_nomem; > + } > + memcpy(tmp, age, sizeof(u64) * old_size); > + age = tmp; > + } > + e->age = age[devid]++; > + if (with_usage) { > + if (used == (u64)-1) > + used = fill_usage(fd, sh.offset); > + e->used 
= used; > + } else { > + e->used = 0; > + } > + > + ctx.length++; > + > + if (ctx.length == ctx.size) { > + ctx.size += 1024; > + ctx.stats = realloc(ctx.stats, ctx.size > + * sizeof(ctx.stats[0])); > + if (!ctx.stats) > + goto out_nomem; > + } > + } > + > + sk->min_objectid = sh.objectid; > + sk->min_type = sh.type; > + sk->min_offset = sh.offset; > + } > + if (sk->min_offset < (u64)-1) > + sk->min_offset++; > + else > + break; > + } > + > + print_dump_chunks(&ctx, sort_mode, unit_mode, with_usage); > + free(ctx.stats); > + > + close_file_or_dir(fd, dirstream); > + return 0; > + > +out_nomem: > + error("not enough memory"); > + return 1; > +} > + > static const char inspect_cmd_group_info[] = > "query various internal information"; > > @@ -644,6 +1006,8 @@ const struct cmd_group inspect_cmd_group = { > cmd_inspect_dump_super_usage, NULL, 0 }, > { "tree-stats", cmd_inspect_tree_stats, > cmd_inspect_tree_stats_usage, NULL, 0 }, > + { "dump-chunks", cmd_dump_chunks, cmd_dump_chunks_usage, NULL, > + 0 }, > NULL_CMD_STRUCT > } > }; > -- > 2.7.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 22, 2016 at 06:53:52PM -0700, Liu Bo wrote:
> > +static u64 fill_usage(int fd, u64 lstart)
> > +{
> > +	struct btrfs_ioctl_search_args args;
> > +	struct btrfs_ioctl_search_key *sk = &args.key;
> > +	struct btrfs_ioctl_search_header sh;
> > +	struct btrfs_block_group_item *item;
> > +	int ret;
> > +
> > +	memset(&args, 0, sizeof(args));
> > +	sk->tree_id = BTRFS_EXTENT_TREE_OBJECTID;
> > +	sk->min_objectid = lstart;
> > +	sk->max_objectid = lstart;
> > +	sk->min_type = BTRFS_BLOCK_GROUP_ITEM_KEY;
> > +	sk->max_type = BTRFS_BLOCK_GROUP_ITEM_KEY;
> > +	sk->min_offset = 0;
> > +	sk->max_offset = (u64)-1;
> > +	sk->max_transid = (u64)-1;
> > +
> > +	sk->nr_items = 4096;
>
> What if we set nr_items = 1? From the code review, this can let us
> stop and return immediately.

Indeed, I've copy-pasted the search code. When it's 1, it returns
immediately if the metadata is cached; a cold read took a few seconds on
a terabyte-sized filesystem.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jun 23, 2016 at 09:20:40AM +0800, Qu Wenruo wrote:
> Yes, quite useful.
> I no longer need to manually check btrfs-debug-tree output to figure
> out the dev-extent layout.
>
> But according to the output, it's more like a dev-extent dump, not a
> chunk dump.
>
> It's better to make things clear for objects at different logical
> levels.

Agreed, therefore I'd create two commands, one for dumping the logical
chunks and another for the dev-extents (which unfortunately doesn't make
for as short a command name).

> > Example output:
> >
> > Chunks on device id: 1
> > PNumber            Type       PStart       Length         PEnd    Age        LStart  Usage
> > -----------------------------------------------------------------------------------------------
> >       0    System/RAID1      1.00MiB     32.00MiB     33.00MiB     47       1.40TiB   0.06
>
> [Mixed chunk/dev-extent output]
> For "PStart/Length/PEnd/LStart" that's OK.
> (Although I prefer "PLength" to "Length", as it's a little confusing
> between the chunk logical length and the dev-extent physical length.)

This should be sorted out once the commands are split. Using the 'P' or
'L' prefixes for clarity is a good thing.

> Since "pstart" = key.offset, "length" = dev_extent_length, "Pend" =
> pstart + length and "Lstart" = dev_extent_chunk_offset, these values
> are all straightforward.
>
> But Usage is completely a chunk/block-group-level thing, so personally
> speaking it's better not to output per-chunk usage in a dev-extent
> dump.

Agreed.

> If there is a real "chunk dump", then that's the best place to output
> the per-chunk "usage".
>
> For dev-extent-level usage, it should be the usage of a device, not
> per chunk.
>
> ["Age" calculation]
> Another concern is about "age": since there is no generation in
> chunk_item nor dev_extent, you're using the logical bytenr as a hint.
>
> This method relies on the fact that btrfs always allocates new chunks
> at an increasing logical bytenr, never reusing a lower logical bytenr.
> Such behavior provides the basis for a lot of other btrfs functions,
> like balance/qgroup rescan.
> IMHO it won't be changed any time soon, but I'm still a little
> concerned about relying on such an assumption.
>
> Since "age" here is more like a sorting index than the generation when
> the chunk was created, it may not really reflect how long a chunk has
> lived.
>
> It may be completely OK for most people, but at least it's not as
> straightforward as generation.

Yes, I'll change it to LNumber.

> [Support for offline btrfs]
> BTW, it's also a good idea to support not only a mounted fs, but also
> an offline one.
>
> For an offline fs, no need to bother with the tree search ioctl, just
> a straightforward search_slots().
> (That's why I like offline btrfs tools.)

Yeah, the offline mode would be useful, and not only for this command, so
this should be sorted out at the whole-tool level first so we don't
stitch it onto each command separately.
On Thu, Jun 23, 2016 at 12:20:38AM +0200, Hans van Kranenburg wrote:
> > Printing 'usage' is not default as it's quite slow; it uses the
> > search ioctl and probably not in the best way, or there's some other
> > issue in the implementation.
>
> Interesting.
>
> So after reading this, I wrote a little test to test some scenarios:
>
> https://github.com/knorrie/python-btrfs/commit/1ca99880dfa0e14b148f3d9e2b6b381b781eb52d
>
> It's very clear that the most optimal way of doing this search is to
> have nr_items=1 and, if possible, specify the length in offset.

And that solved it.

[...]

> It seems that searching in the empty space between
> (vaddr BLOCK_GROUP_ITEM length+1) and
> (vaddr BLOCK_GROUP_ITEM ULLONG_MAX)
> is really expensive, while there's absolutely nothing to find.

Yeah, the few BLOCK_GROUP_ITEMs are scattered among tons of
EXTENT_ITEMs.
On 06/23/2016 03:13 PM, David Sterba wrote:
> On Thu, Jun 23, 2016 at 12:20:38AM +0200, Hans van Kranenburg wrote:
>>> Printing 'usage' is not default as it's quite slow; it uses the
>>> search ioctl and probably not in the best way, or there's some other
>>> issue in the implementation.
>>
>> Interesting.
>>
>> So after reading this, I wrote a little test to test some scenarios:
>>
>> https://github.com/knorrie/python-btrfs/commit/1ca99880dfa0e14b148f3d9e2b6b381b781eb52d
>>
>> It's very clear that the most optimal way of doing this search is to
>> have nr_items=1 and, if possible, specify the length in offset.
>
> And that solved it.
>
> [...]
>
>> It seems that searching in the empty space between
>> (vaddr BLOCK_GROUP_ITEM length+1) and
>> (vaddr BLOCK_GROUP_ITEM ULLONG_MAX)
>> is really expensive, while there's absolutely nothing to find.
>
> Yeah, the few BLOCK_GROUP_ITEMs are scattered among tons of
> EXTENT_ITEMs

On the same vaddr objectid, there should be at most one extent, and
EXTENT_ITEM_KEY < BLOCK_GROUP_ITEM_KEY, so the range after the block
group item should always contain exactly 0 items? Still, it takes very
long...
On Thu, Jun 23, 2016 at 03:10:09AM +0200, Hans van Kranenburg wrote:
> On 06/22/2016 07:26 PM, David Sterba wrote:
> > From: David Sterba <dsterba@suse.cz>
> >
> > Hi,
> >
> > the chunk dump is a useful thing, for debugging or balance filters.
> >
> > Example output:
> >
> > Chunks on device id: 1
> > PNumber            Type       PStart       Length         PEnd    Age        LStart  Usage
> > -----------------------------------------------------------------------------------------------
> >       0    System/RAID1      1.00MiB     32.00MiB     33.00MiB     47       1.40TiB   0.06
> >       1  Metadata/RAID1     33.00MiB      1.00GiB      1.03GiB     31       1.36TiB  64.03
> >       2  Metadata/RAID1      1.03GiB     32.00MiB      1.06GiB     36       1.36TiB  77.28
> >       3     Data/single      1.06GiB      1.00GiB      2.06GiB     12     422.30GiB  78.90
> >       4     Data/single      2.06GiB      1.00GiB      3.06GiB     11     420.30GiB  78.47
> > ...
>
> On RAID0, it looks funny :)
>
> # ./btrfs inspect-internal dump-chunks /mnt/extents/
> Chunks on device id: 1
> PNumber            Type       PStart       Length         PEnd    Age        LStart
> ----------------------------------------------------------------------------------------
>       0  Metadata/RAID0      1.00MiB    256.00MiB    257.00MiB      1      13.36GiB
>       .  empty            .    16.00EiB            .      .             .
>       1    System/RAID0    129.00MiB     64.00MiB    193.00MiB      2      13.61GiB
>       .  empty            .    16.00EiB            .      .             .
>       2      Data/RAID0    161.00MiB      2.00GiB      2.16GiB      3      15.68GiB
>       .  empty            .   115.00MiB            .      .             .
>       3      Data/RAID0      2.27GiB      2.00GiB      4.27GiB      0      11.36GiB
>
> Chunks on device id: 2
> PNumber            Type       PStart       Length         PEnd    Age        LStart
> ----------------------------------------------------------------------------------------
>       0      Data/RAID0      1.00MiB      2.00GiB      2.00GiB      0      11.36GiB
>       .  empty            .    16.00EiB            .      .             .
>       1  Metadata/RAID0      1.00GiB    256.00MiB      1.25GiB      1      13.36GiB
>       .  empty            .    16.00EiB            .      .             .
>       2    System/RAID0      1.13GiB     64.00MiB      1.19GiB      2      13.61GiB
>       .  empty            .    16.00EiB            .      .             .
>       3      Data/RAID0      1.16GiB      2.00GiB      3.16GiB      3      15.68GiB
>
> For the correct sizes, you'd either need to do some math, like
> "if RAID0 then divide by 2", but I guess this gets horrible really soon
> with RAID5 etc...
> Or, if you want to print everything in the physical order, why not
> just dump the device tree with DEV_EXTENT, which has the exact length
> on disk, and fill in the missing information using the reference to
> the chunk they belong to?

That's a good idea and would avoid the guesswork that makes the current
numbers funny.
diff --git a/cmds-inspect.c b/cmds-inspect.c
index dd7b9dd278f2..4eace63d8517 100644
--- a/cmds-inspect.c
+++ b/cmds-inspect.c
@@ -623,6 +623,368 @@ static int cmd_inspect_min_dev_size(int argc, char **argv)
 	return !!ret;
 }
 
+static const char * const cmd_dump_chunks_usage[] = {
+	"btrfs inspect-internal chunk-stats [options] <path>",
+	"Show chunks (block groups) layout",
+	"Show chunks (block groups) layout for all devices",
+	"",
+	HELPINFO_UNITS_LONG,
+	"--sort=MODE        sort by the physical or logical chunk start",
+	"                   MODE is one of pstart or lstart (default: pstart)",
+	"--usage            show usage per block group, note this can be slow",
+	NULL
+};
+
+enum {
+	CHUNK_SORT_PSTART,
+	CHUNK_SORT_LSTART,
+	CHUNK_SORT_DEFAULT = CHUNK_SORT_PSTART
+};
+
+struct dump_chunks_entry {
+	u64 devid;
+	u64 start;
+	u64 lstart;
+	u64 length;
+	u64 flags;
+	u64 age;
+	u64 used;
+	u32 pnumber;
+};
+
+struct dump_chunks_ctx {
+	unsigned length;
+	unsigned size;
+	struct dump_chunks_entry *stats;
+};
+
+int cmp_cse_devid_start(const void *va, const void *vb)
+{
+	const struct dump_chunks_entry *a = va;
+	const struct dump_chunks_entry *b = vb;
+
+	if (a->devid < b->devid)
+		return -1;
+	if (a->devid > b->devid)
+		return 1;
+
+	if (a->start < b->start)
+		return -1;
+	if (a->start == b->start) {
+		error(
+"chunks start on same offset in the same device: devid %llu start %llu",
+			(unsigned long long)a->devid,
+			(unsigned long long)a->start);
+		return 0;
+	}
+	return 1;
+}
+
+int cmp_cse_devid_lstart(const void *va, const void *vb)
+{
+	const struct dump_chunks_entry *a = va;
+	const struct dump_chunks_entry *b = vb;
+
+	if (a->devid < b->devid)
+		return -1;
+	if (a->devid > b->devid)
+		return 1;
+
+	if (a->lstart < b->lstart)
+		return -1;
+	if (a->lstart == b->lstart) {
+		error(
+"chunks logically start on same offset in the same device: devid %llu start %llu",
+			(unsigned long long)a->devid,
+			(unsigned long long)a->lstart);
+		return 0;
+	}
+	return 1;
+}
+
+void print_dump_chunks(struct dump_chunks_ctx *ctx, unsigned sort_mode,
+		unsigned unit_mode, int with_usage)
+{
+	u64 devid;
+	struct dump_chunks_entry e;
+	int i;
+	int chidx;
+	u64 lastend = 0;
+	u64 age;
+
+	/*
+	 * Chunks are sorted logically as found by the ioctl, we need to sort
+	 * them once to find the physical ordering. This is the default mode.
+	 */
+	qsort(ctx->stats, ctx->length, sizeof(ctx->stats[0]),
+			cmp_cse_devid_start);
+	devid = 0;
+	age = 0;
+	for (i = 0; i < ctx->length; i++) {
+		e = ctx->stats[i];
+		if (e.devid != devid) {
+			devid = e.devid;
+			age = 0;
+		}
+		ctx->stats[i].pnumber = age;
+		age++;
+	}
+
+	if (sort_mode == CHUNK_SORT_LSTART)
+		qsort(ctx->stats, ctx->length, sizeof(ctx->stats[0]),
+				cmp_cse_devid_lstart);
+
+	devid = 0;
+	for (i = 0; i < ctx->length; i++) {
+		e = ctx->stats[i];
+		if (e.devid != devid) {
+			devid = e.devid;
+			if (i != 0)
+				putchar('\n');
+			printf("Chunks on device id: %llu\n", devid);
+			printf("PNumber            Type       PStart       Length         PEnd    Age        LStart%s\n",
+				with_usage ? "  Usage" : "");
+			printf("----------------------------------------------------------------------------------------%s\n",
+				with_usage ? "-------" : "");
+			chidx = 0;
+			lastend = 0;
+		}
+		if (sort_mode == CHUNK_SORT_PSTART && lastend > 0
+		    && e.start != lastend) {
+			printf("      .  empty            . ");
+			printf("%12s ",
+				pretty_size_mode(e.start - lastend, unit_mode));
+			printf("           .      .             .\n");
+		}
+
+		printf("%7u ", e.pnumber);
+		printf("%8s/%-6s ", btrfs_group_type_str(e.flags),
+				btrfs_group_profile_str(e.flags));
+		printf("%12s ", pretty_size_mode(e.start, unit_mode));
+		printf("%12s ", pretty_size_mode(e.length, unit_mode));
+		printf("%12s ",
+			pretty_size_mode(e.start + e.length - 1, unit_mode));
+		printf("%6llu ", e.age);
+		printf("%14s", pretty_size_mode(e.lstart, unit_mode));
+		if (with_usage)
+			printf("  %5.2f", (float)e.used / e.length * 100);
+		printf("\n");
+
+		lastend = e.start + e.length;
+		chidx++;
+	}
+}
+
+static u64 fill_usage(int fd, u64 lstart)
+{
+	struct btrfs_ioctl_search_args args;
+	struct btrfs_ioctl_search_key *sk = &args.key;
+	struct btrfs_ioctl_search_header sh;
+	struct btrfs_block_group_item *item;
+	int ret;
+
+	memset(&args, 0, sizeof(args));
+	sk->tree_id = BTRFS_EXTENT_TREE_OBJECTID;
+	sk->min_objectid = lstart;
+	sk->max_objectid = lstart;
+	sk->min_type = BTRFS_BLOCK_GROUP_ITEM_KEY;
+	sk->max_type = BTRFS_BLOCK_GROUP_ITEM_KEY;
+	sk->min_offset = 0;
+	sk->max_offset = (u64)-1;
+	sk->max_transid = (u64)-1;
+
+	sk->nr_items = 4096;
+	ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
+	if (ret < 0) {
+		error("cannot perform the search: %s", strerror(errno));
+		return 1;
+	}
+	if (sk->nr_items == 0) {
+		warning("blockgroup %llu not found",
+				(unsigned long long)lstart);
+		return 0;
+	}
+	if (sk->nr_items > 1) {
+		warning("found more than one blockgroup %llu",
+				(unsigned long long)lstart);
+	}
+
+	memcpy(&sh, args.buf, sizeof(sh));
+	item = (struct btrfs_block_group_item*)(args.buf + sizeof(sh));
+
+	return item->used;
+}
+
+static int cmd_dump_chunks(int argc, char **argv)
+{
+	struct btrfs_ioctl_search_args args;
+	struct btrfs_ioctl_search_key *sk = &args.key;
+	struct btrfs_ioctl_search_header sh;
+	unsigned long off = 0;
+	u64 *age = 0;
+	unsigned age_size = 128;
+	int ret;
+	int fd;
+	int i;
+	int e;
+	DIR *dirstream = NULL;
+	unsigned unit_mode;
+	unsigned sort_mode = 0;
+	int with_usage = 0;
+	const char *path;
+	struct dump_chunks_ctx ctx = {
+		.length = 0,
+		.size = 1024,
+		.stats = NULL
+	};
+
+	unit_mode = get_unit_mode_from_arg(&argc, argv, 0);
+
+	while (1) {
+		int c;
+		enum { GETOPT_VAL_SORT = 256, GETOPT_VAL_USAGE };
+		static const struct option long_options[] = {
+			{"sort", required_argument, NULL, GETOPT_VAL_SORT },
+			{"usage", no_argument, NULL, GETOPT_VAL_USAGE },
+			{NULL, 0, NULL, 0}
+		};
+
+		c = getopt_long(argc, argv, "", long_options, NULL);
+		if (c < 0)
+			break;
+
+		switch (c) {
+		case GETOPT_VAL_SORT:
+			if (strcmp(optarg, "pstart") == 0) {
+				sort_mode = CHUNK_SORT_PSTART;
+			} else if (strcmp(optarg, "lstart") == 0) {
+				sort_mode = CHUNK_SORT_LSTART;
+			} else {
+				error("unknown sort mode: %s", optarg);
+				exit(1);
+			}
+			break;
+		case GETOPT_VAL_USAGE:
+			with_usage = 1;
+			break;
+		default:
+			usage(cmd_dump_chunks_usage);
+		}
+	}
+
+	if (check_argc_exact(argc - optind, 1))
+		usage(cmd_dump_chunks_usage);
+
+	ctx.stats = calloc(ctx.size, sizeof(ctx.stats[0]));
+	if (!ctx.stats)
+		goto out_nomem;
+
+	path = argv[optind];
+
+	fd = open_file_or_dir(path, &dirstream);
+	if (fd < 0) {
+		error("cannot access '%s': %s", path, strerror(errno));
+		return 1;
+	}
+
+	memset(&args, 0, sizeof(args));
+	sk->tree_id = BTRFS_CHUNK_TREE_OBJECTID;
+	sk->min_objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
+	sk->max_objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
+	sk->min_type = BTRFS_CHUNK_ITEM_KEY;
+	sk->max_type = BTRFS_CHUNK_ITEM_KEY;
+	sk->max_offset = (u64)-1;
+	sk->max_transid = (u64)-1;
+	age = calloc(age_size, sizeof(u64));
+	if (!age)
+		goto out_nomem;
+
+	while (1) {
+		sk->nr_items = 4096;
+		ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
+		e = errno;
+		if (ret < 0) {
+			error("cannot perform the search: %s", strerror(e));
+			return 1;
+		}
+		if (sk->nr_items == 0)
+			break;
+
+		off = 0;
+		for (i = 0; i < sk->nr_items; i++) {
+			struct btrfs_chunk *item;
+			struct btrfs_stripe *stripes;
+			int sidx;
+			u64 used = (u64)-1;
+
+			memcpy(&sh, args.buf + off, sizeof(sh));
+			off += sizeof(sh);
+			item = (struct btrfs_chunk*)(args.buf + off);
+			off += sh.len;
+
+			stripes = &item->stripe;
+			for (sidx = 0; sidx < item->num_stripes; sidx++) {
+				struct dump_chunks_entry *e;
+				u64 devid;
+
+				e = &ctx.stats[ctx.length];
+				devid = stripes[sidx].devid;
+				e->devid = devid;
+				e->start = stripes[sidx].offset;
+				e->lstart = sh.offset;
+				e->length = item->length;
+				e->flags = item->type;
+				e->pnumber = -1;
+				while (devid > age_size) {
+					u64 *tmp;
+					unsigned old_size = age_size;
+
+					age_size += 128;
+					tmp = calloc(age_size, sizeof(u64));
+					if (!tmp) {
+						free(age);
+						goto out_nomem;
+					}
+					memcpy(tmp, age, sizeof(u64) * old_size);
+					age = tmp;
+				}
+				e->age = age[devid]++;
+				if (with_usage) {
+					if (used == (u64)-1)
+						used = fill_usage(fd, sh.offset);
+					e->used = used;
+				} else {
+					e->used = 0;
+				}
+
+				ctx.length++;
+
+				if (ctx.length == ctx.size) {
+					ctx.size += 1024;
+					ctx.stats = realloc(ctx.stats, ctx.size
+							* sizeof(ctx.stats[0]));
+					if (!ctx.stats)
+						goto out_nomem;
+				}
+			}
+
+			sk->min_objectid = sh.objectid;
+			sk->min_type = sh.type;
+			sk->min_offset = sh.offset;
+		}
+		if (sk->min_offset < (u64)-1)
+			sk->min_offset++;
+		else
+			break;
+	}
+
+	print_dump_chunks(&ctx, sort_mode, unit_mode, with_usage);
+	free(ctx.stats);
+
+	close_file_or_dir(fd, dirstream);
+	return 0;
+
+out_nomem:
+	error("not enough memory");
+	return 1;
+}
+
 static const char inspect_cmd_group_info[] =
 	"query various internal information";
 
@@ -644,6 +1006,8 @@ const struct cmd_group inspect_cmd_group = {
 			cmd_inspect_dump_super_usage, NULL, 0 },
 		{ "tree-stats", cmd_inspect_tree_stats,
 			cmd_inspect_tree_stats_usage, NULL, 0 },
+		{ "dump-chunks", cmd_dump_chunks, cmd_dump_chunks_usage, NULL,
+			0 },
 		NULL_CMD_STRUCT
 	}
 };