diff mbox series

[v5,13/15] bugreport: add packed object summary

Message ID 20200124033436.81097-14-emilyshaffer@google.com (mailing list archive)
State New, archived
Headers show
Series add git-bugreport tool | expand

Commit Message

Emily Shaffer Jan. 24, 2020, 3:34 a.m. UTC
From: Emily Shaffer <emilyshaffer@google.com>

Alongside the loose object counts, it can be useful to show the number
of packs and packed objects. This way we can check whether the repo has
an appropriate ratio of packed to loose objects to help determine
whether it's behaving correctly.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 bugreport.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

Comments

Junio C Hamano Feb. 4, 2020, 7 p.m. UTC | #1
emilyshaffer@google.com writes:

> From: Emily Shaffer <emilyshaffer@google.com>
>
> Alongside the loose object counts, it can be useful to show the number
> of packs and packed objects. This way we can check whether the repo has
> an appropriate ratio of packed to loose objects to help determine
> whether it's behaving correctly.

This lets the enumeration machinery to enumerate at the individual
object level, relies on the enumeration machinery to show your
callback all objects from the same pack before an object from a
different pack, to count both objects and packs.

But given that packfiles record how many objects there are in the
pack, and that the packfile.c layer must know what are the pack
files we have available, I wonder if we have an API already to allow
us to enumerate each packfile, and while counting the number of
times our callback was called, add the objects contained in each of
them.  If there isn't, it does not sound like a bad API function to
add (i.e. given a repository r, iterate over r->objects->packed_git
and let the API caller see each of them; the API caller can grab
num_objects and add them up).
Junio C Hamano Feb. 4, 2020, 7:03 p.m. UTC | #2
emilyshaffer@google.com writes:

> From: Emily Shaffer <emilyshaffer@google.com>
>
> Alongside the loose object counts, it can be useful to show the number
> of packs and packed objects. This way we can check whether the repo has
> an appropriate ratio of packed to loose objects to help determine
> whether it's behaving correctly.

This step makes me wonder if we want to see the midx as well.
Didn't we have a bug that manifests only when midx is (or is not) in
use?

Similarly, I think use (or non-use) of the commit graph may also be
a useful information for diagnosing.
Emily Shaffer Feb. 5, 2020, 3:09 a.m. UTC | #3
On Tue, Feb 04, 2020 at 11:03:57AM -0800, Junio C Hamano wrote:
> emilyshaffer@google.com writes:
> 
> > From: Emily Shaffer <emilyshaffer@google.com>
> >
> > Alongside the loose object counts, it can be useful to show the number
> > of packs and packed objects. This way we can check whether the repo has
> > an appropriate ratio of packed to loose objects to help determine
> > whether it's behaving correctly.
> 
> This step makes me wonder if we want to see the midx as well.
> Didn't we have a bug that manifests only when midx is (or is not) in
> use?
> 
> Similarly, I think use (or non-use) of the commit graph may also be
> a useful information for diagnosing.
> 

Sure. I'll look into both of these. Would you rather see them as add-ons
later, or do you want me to hold off on a v6 of this set until they're
ready?

 - Emily
Emily Shaffer Feb. 5, 2020, 3:15 a.m. UTC | #4
On Tue, Feb 04, 2020 at 11:00:41AM -0800, Junio C Hamano wrote:
> emilyshaffer@google.com writes:
> 
> > From: Emily Shaffer <emilyshaffer@google.com>
> >
> > Alongside the loose object counts, it can be useful to show the number
> > of packs and packed objects. This way we can check whether the repo has
> > an appropriate ratio of packed to loose objects to help determine
> > whether it's behaving correctly.
> 
> This lets the enumeration machinery to enumerate at the individual
> object level, relies on the enumeration machinery to show your
> callback all objects from the same pack before an object from a
> different pack, to count both objects and packs.
> 
> But given that packfiles record how many objects there are in the
> pack, and that the packfile.c layer must know what are the pack
> files we have available, I wonder if we have an API already to allow
> us to enumerate each packfile, and while counting the number of
> times our callback was called, add the objects contained in each of
> them.  If there isn't, it does not sound like a bad API function to
> add (i.e. given a repository r, iterate over r->objects->packed_git
> and let the API caller see each of them; the API caller can grab
> num_objects and add them up).

Yeah, I don't see one. I'll go ahead and add it - that sounds
straightforward enough.

 - Emily
diff mbox series

Patch

diff --git a/bugreport.c b/bugreport.c
index bf10857183..45cc1764e0 100644
--- a/bugreport.c
+++ b/bugreport.c
@@ -214,6 +214,41 @@  static void get_loose_object_summary(struct strbuf *obj_info) {
 		    total_count_questionable ? " (problem during count)" : "");
 }
 
+struct packed_object_cb_data {
+	struct packed_git *last_pack;
+	int pack_count;
+	int object_count;
+};
+
+static int packed_object_cb(const struct object_id *oid,
+			    struct packed_git *pack,
+			    uint32_t pos,
+			    void *data) {
+	struct packed_object_cb_data *cb_data = data;
+
+	if (!cb_data)
+		return 1;
+
+	if (pack && pack != cb_data->last_pack) {
+		cb_data->last_pack = pack;
+		cb_data->pack_count++;
+	}
+
+	cb_data->object_count++;
+
+	return 0;
+}
+
+static void get_packed_object_summary(struct strbuf *obj_info)
+{
+	struct packed_object_cb_data cb_data = {NULL, 0, 0};
+
+	for_each_packed_object(packed_object_cb, &cb_data, 0);
+
+	strbuf_addf(obj_info, "%d total packs (%d objects)\n",
+		    cb_data.pack_count, cb_data.object_count);
+}
+
 static const char * const bugreport_usage[] = {
 	N_("git bugreport [-o|--output <file>]"),
 	NULL
@@ -292,6 +327,9 @@  int cmd_main(int argc, const char **argv)
 	get_header(&buffer, "Loose Object Counts");
 	get_loose_object_summary(&buffer);
 
+	get_header(&buffer, "Packed Object Summary");
+	get_packed_object_summary(&buffer);
+
 	report = fopen_for_writing(report_path.buf);
 	strbuf_write(&buffer, report);
 	fclose(report);