diff mbox series

[v2,06/10] builtin/diagnose.c: create 'git diagnose' builtin

Message ID 73e139ee377f9c50e671b0d94a28b93c1db28a69.1659577543.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose' | expand

Commit Message

Victoria Dye Aug. 4, 2022, 1:45 a.m. UTC
From: Victoria Dye <vdye@github.com>

Create a 'git diagnose' builtin to generate a standalone zip archive of
repository diagnostics.

The "diagnose" functionality was originally implemented for Scalar in
aa5c79a331 (scalar: implement `scalar diagnose`, 2022-05-28). However, the
diagnostics gathered are not specific to Scalar-cloned repositories and
can be useful when diagnosing issues in any Git repository.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 .gitignore                     |  1 +
 Documentation/git-diagnose.txt | 52 ++++++++++++++++++++++++++++++
 Makefile                       |  1 +
 builtin.h                      |  1 +
 builtin/diagnose.c             | 58 ++++++++++++++++++++++++++++++++++
 git.c                          |  1 +
 t/t0092-diagnose.sh            | 28 ++++++++++++++++
 7 files changed, 142 insertions(+)
 create mode 100644 Documentation/git-diagnose.txt
 create mode 100644 builtin/diagnose.c
 create mode 100755 t/t0092-diagnose.sh

Comments

Ævar Arnfjörð Bjarmason Aug. 4, 2022, 6:27 a.m. UTC | #1
On Thu, Aug 04 2022, Victoria Dye via GitGitGadget wrote:

> From: Victoria Dye <vdye@github.com>
>
> Create a 'git diagnose' builtin to generate a standalone zip archive of
> repository diagnostics.

It's good to have this as a built-in separate from "git bugreport",
but...

> +git-diagnose - Generate a zip archive of diagnostic information

...I'd really prefer for this not to squat on such a common name we
might regret having reserved later for such very specific
functionality. I'd think e.g. these would be better:

	git mk-diagnostics-zip

Or maybe:

	git archive-interesting-for-report

If I had to guess what a "git diagnose" did, I'd probably think:

 * It analyzes your config, and suggests redundancies/alternatives
 * It does some perf tests / heuritics, and e.g. suggests you turn on
   the commit-graph writing.

etc., this (arguably even too generic then) made sense as "scalar
diagnose" because scalar is all about being an opinionated interface
targeted at performance", so there's an implied "my repo's performance"
following a "scalar diagnose".

But as a top-level command-name I think we should pick something more
specific to what it does, which is (I haven't fully read ahead in the
re-roll, but I'm assuming is) mostly/entirely to be a "helper" for
"scalar diagnose" and/or "git bugreport".

> +	switch (safe_create_leading_directories(zip_path.buf)) {
> +	case SCLD_OK:
> +	case SCLD_EXISTS:
> +		break;
> +	default:
> +		die(_("could not create leading directories for '%s'"),
> +		    zip_path.buf);

This seems to be carrying forward a minor bug from bugreport.c we should
probably fix: we should use die_errno() here (and maybe lead with a
commit to fix bugreport.c's version).

The strbuf*() before that also seems like a good candidate for a utility
function in your new diagnose library, i.e. to have both bugreport.c and
diagnose.c pass it the prefix/suffix/format, then try to create that
directory, then replace the copy/pasting here with a one-line call to
that now-shared function.

The two codepaths only seem to differ in the prefix & suffix from a
quick skim, the rest is all copy/pasted...
Derrick Stolee Aug. 5, 2022, 7:11 p.m. UTC | #2
On 8/3/2022 9:45 PM, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
> 
> Create a 'git diagnose' builtin to generate a standalone zip archive of
> repository diagnostics.

> +  * The contents of the `.git`, `.git/hooks`, `.git/info`, `.git/logs`, and
> +    `.git/objects/info` directories

You remove these lines in the next patch, which is called "gate certain data
behind '--all'" but maybe we shouldn't have this functionality now and
instead add it in the future.

The biggest reason for the --all option is that these contents will likely
include private IP (path names and branch names, but not file contents) that
the user would probably not want to share with the public mailing list, but
might want to share with a trusted Git expert in order to resolve a problem.
You mention earlier that

  The generated archive can then, for example, be shared with the
  Git mailing list to help debug an issue or serve as a reference for
  independent debugging.

So, if you're sending a v3, then moving this out of this patch and into the
next one would be a good way to be sure that this possibly-private data is
not mentioned as something to share super publicly.

(Of course, this requires making the change to create_diagnostics_archive()
in advance of creating the builtin, so maybe this reorganization isn't
worth it.)

> @@ -0,0 +1,58 @@
> +#include "builtin.h"
> +#include "parse-options.h"
> +#include "diagnose.h"
> +
> +

nit: double empty line

> +++ b/t/t0092-diagnose.sh
> @@ -0,0 +1,28 @@
> +#!/bin/sh
> +
> +test_description='git diagnose'
> +
> +TEST_PASSES_SANITIZE_LEAK=true
> +. ./test-lib.sh
> +
> +test_expect_success UNZIP 'creates diagnostics zip archive' '
> +	test_when_finished rm -rf report &&
> +
> +	git diagnose -o report -s test >out &&
> +
> +	zip_path=report/git-diagnostics-test.zip &&
> +	grep "Available space" out &&
> +	test_path_is_file "$zip_path" &&

nit: 'grep' the output immediately after the 'git diagnose' command and
keep the zip_path use immediately after its definition.

Thanks,
-Stolee
Derrick Stolee Aug. 5, 2022, 7:38 p.m. UTC | #3
On 8/4/2022 2:27 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Aug 04 2022, Victoria Dye via GitGitGadget wrote:
> 
>> From: Victoria Dye <vdye@github.com>
>>
>> Create a 'git diagnose' builtin to generate a standalone zip archive of
>> repository diagnostics.
> 
> It's good to have this as a built-in separate from "git bugreport",
> but...
> 
>> +git-diagnose - Generate a zip archive of diagnostic information
> 
> ...I'd really prefer for this not to squat on such a common name we
> might regret having reserved later for such very specific
> functionality. I'd think e.g. these would be better:
> 
> 	git mk-diagnostics-zip
> 
> Or maybe:
> 
> 	git archive-interesting-for-report

These are not realistic replacements.

> If I had to guess what a "git diagnose" did, I'd probably think:
> 
>  * It analyzes your config, and suggests redundancies/alternatives
>  * It does some perf tests / heuritics, and e.g. suggests you turn on
>    the commit-graph writing.

These sound like great options to add in the future, such as:

   --perf-test: Run performance tests on your repository using different
   Git config options and recommend certain settings.

(This --perf-test option would be a great way to get wider adoption
of parallel checkout, since its optimal settings are so machine
dependent.)

The thing is, even if we did these other things, it would result in
some kind of document that summarizes the repository shape and features.
That kind of data is exactly what this version of 'git diagnose' does.

For now, it leaves the human reader responsible for making decisions
based on those documents, but they have been incredibly helpful when we
are _diagnosing_ issues users are having with their repositories.

Thanks,
-Stolee
Ævar Arnfjörð Bjarmason Aug. 11, 2022, 11:06 a.m. UTC | #4
On Fri, Aug 05 2022, Derrick Stolee wrote:

> On 8/4/2022 2:27 AM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Thu, Aug 04 2022, Victoria Dye via GitGitGadget wrote:
>> 
>>> From: Victoria Dye <vdye@github.com>
>>>
>>> Create a 'git diagnose' builtin to generate a standalone zip archive of
>>> repository diagnostics.
>> 
>> It's good to have this as a built-in separate from "git bugreport",
>> but...
>> 
>>> +git-diagnose - Generate a zip archive of diagnostic information
>> 
>> ...I'd really prefer for this not to squat on such a common name we
>> might regret having reserved later for such very specific
>> functionality. I'd think e.g. these would be better:
>> 
>> 	git mk-diagnostics-zip
>> 
>> Or maybe:
>> 
>> 	git archive-interesting-for-report
>
> These are not realistic replacements.

Maybe:

	git diagnose create-zip

?

>> If I had to guess what a "git diagnose" did, I'd probably think:
>> 
>>  * It analyzes your config, and suggests redundancies/alternatives
>>  * It does some perf tests / heuritics, and e.g. suggests you turn on
>>    the commit-graph writing.
>
> These sound like great options to add in the future, such as:
>
>    --perf-test: Run performance tests on your repository using different
>    Git config options and recommend certain settings.
>
> (This --perf-test option would be a great way to get wider adoption
> of parallel checkout, since its optimal settings are so machine
> dependent.)

...

> The thing is, even if we did these other things, it would result in
> some kind of document that summarizes the repository shape and features.
> That kind of data is exactly what this version of 'git diagnose' does.

I think a command like "git diagnose" that had options to do other
unrelated stuff, but by default created a zip archive when given no
options would be rather confusing.

Yes, it makes sense to emit some human-readable summary, but to zip it
up as well? That's something we just need for the "git bugreport"
case...

> For now, it leaves the human reader responsible for making decisions
> based on those documents, but they have been incredibly helpful when we
> are _diagnosing_ issues users are having with their repositories.

This is orthagonal to what I'm pointing out. You're basically saying the
user can just read the documentation to find out what this built-in
does.

That's true, what I'm pointing out is that it's unfortunate that such
highly specific functionality is squatting on such a short & generic
name, but just e.g. adding a "create-zip" sub-command would address
that.
diff mbox series

Patch

diff --git a/.gitignore b/.gitignore
index 42fd7253b44..80b530bbed2 100644
--- a/.gitignore
+++ b/.gitignore
@@ -53,6 +53,7 @@ 
 /git-cvsimport
 /git-cvsserver
 /git-daemon
+/git-diagnose
 /git-diff
 /git-diff-files
 /git-diff-index
diff --git a/Documentation/git-diagnose.txt b/Documentation/git-diagnose.txt
new file mode 100644
index 00000000000..b12ef98f013
--- /dev/null
+++ b/Documentation/git-diagnose.txt
@@ -0,0 +1,52 @@ 
+git-diagnose(1)
+================
+
+NAME
+----
+git-diagnose - Generate a zip archive of diagnostic information
+
+SYNOPSIS
+--------
+[verse]
+'git diagnose' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]
+
+DESCRIPTION
+-----------
+Collects detailed information about the user's machine, Git client, and
+repository state and packages that information into a zip archive. The
+generated archive can then, for example, be shared with the Git mailing list to
+help debug an issue or serve as a reference for independent debugging.
+
+The following information is captured in the archive:
+
+  * 'git version --build-options'
+  * The path to the repository root
+  * The available disk space on the filesystem
+  * The name and size of each packfile, including those in alternate object
+    stores
+  * The total count of loose objects, as well as counts broken down by
+    `.git/objects` subdirectory
+  * The contents of the `.git`, `.git/hooks`, `.git/info`, `.git/logs`, and
+    `.git/objects/info` directories
+
+This tool differs from linkgit:git-bugreport[1] in that it collects much more
+detailed information with a greater focus on reporting the size and data shape
+of repository contents.
+
+OPTIONS
+-------
+-o <path>::
+--output-directory <path>::
+	Place the resulting diagnostics archive in `<path>` instead of the
+	current directory.
+
+-s <format>::
+--suffix <format>::
+	Specify an alternate suffix for the diagnostics archive name, to create
+	a file named 'git-diagnostics-<formatted suffix>'. This should take the
+	form of a strftime(3) format string; the current local time will be
+	used.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index ad27a0bd70c..9ceaf55582a 100644
--- a/Makefile
+++ b/Makefile
@@ -1154,6 +1154,7 @@  BUILTIN_OBJS += builtin/credential-cache.o
 BUILTIN_OBJS += builtin/credential-store.o
 BUILTIN_OBJS += builtin/credential.o
 BUILTIN_OBJS += builtin/describe.o
+BUILTIN_OBJS += builtin/diagnose.o
 BUILTIN_OBJS += builtin/diff-files.o
 BUILTIN_OBJS += builtin/diff-index.o
 BUILTIN_OBJS += builtin/diff-tree.o
diff --git a/builtin.h b/builtin.h
index 40e9ecc8485..8901a34d6bf 100644
--- a/builtin.h
+++ b/builtin.h
@@ -144,6 +144,7 @@  int cmd_credential_cache(int argc, const char **argv, const char *prefix);
 int cmd_credential_cache_daemon(int argc, const char **argv, const char *prefix);
 int cmd_credential_store(int argc, const char **argv, const char *prefix);
 int cmd_describe(int argc, const char **argv, const char *prefix);
+int cmd_diagnose(int argc, const char **argv, const char *prefix);
 int cmd_diff_files(int argc, const char **argv, const char *prefix);
 int cmd_diff_index(int argc, const char **argv, const char *prefix);
 int cmd_diff(int argc, const char **argv, const char *prefix);
diff --git a/builtin/diagnose.c b/builtin/diagnose.c
new file mode 100644
index 00000000000..c545c6bae1d
--- /dev/null
+++ b/builtin/diagnose.c
@@ -0,0 +1,58 @@ 
+#include "builtin.h"
+#include "parse-options.h"
+#include "diagnose.h"
+
+
+static const char * const diagnose_usage[] = {
+	N_("git diagnose [-o|--output-directory <file>] [-s|--suffix <format>]"),
+	NULL
+};
+
+int cmd_diagnose(int argc, const char **argv, const char *prefix)
+{
+	struct strbuf zip_path = STRBUF_INIT;
+	time_t now = time(NULL);
+	struct tm tm;
+	char *option_output = NULL;
+	char *option_suffix = "%Y-%m-%d-%H%M";
+	char *prefixed_filename;
+
+	const struct option diagnose_options[] = {
+		OPT_STRING('o', "output-directory", &option_output, N_("path"),
+			   N_("specify a destination for the diagnostics archive")),
+		OPT_STRING('s', "suffix", &option_suffix, N_("format"),
+			   N_("specify a strftime format suffix for the filename")),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, diagnose_options,
+			     diagnose_usage, 0);
+
+	/* Prepare the path to put the result */
+	prefixed_filename = prefix_filename(prefix,
+					    option_output ? option_output : "");
+	strbuf_addstr(&zip_path, prefixed_filename);
+	strbuf_complete(&zip_path, '/');
+
+	strbuf_addstr(&zip_path, "git-diagnostics-");
+	strbuf_addftime(&zip_path, option_suffix, localtime_r(&now, &tm), 0, 0);
+	strbuf_addstr(&zip_path, ".zip");
+
+	switch (safe_create_leading_directories(zip_path.buf)) {
+	case SCLD_OK:
+	case SCLD_EXISTS:
+		break;
+	default:
+		die(_("could not create leading directories for '%s'"),
+		    zip_path.buf);
+	}
+
+	/* Prepare diagnostics */
+	if (create_diagnostics_archive(&zip_path))
+		die_errno(_("unable to create diagnostics archive %s"),
+			  zip_path.buf);
+
+	free(prefixed_filename);
+	strbuf_release(&zip_path);
+	return 0;
+}
diff --git a/git.c b/git.c
index e5d62fa5a92..0b9d8ef7677 100644
--- a/git.c
+++ b/git.c
@@ -522,6 +522,7 @@  static struct cmd_struct commands[] = {
 	{ "credential-cache--daemon", cmd_credential_cache_daemon },
 	{ "credential-store", cmd_credential_store },
 	{ "describe", cmd_describe, RUN_SETUP },
+	{ "diagnose", cmd_diagnose, RUN_SETUP_GENTLY },
 	{ "diff", cmd_diff, NO_PARSEOPT },
 	{ "diff-files", cmd_diff_files, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
 	{ "diff-index", cmd_diff_index, RUN_SETUP | NO_PARSEOPT },
diff --git a/t/t0092-diagnose.sh b/t/t0092-diagnose.sh
new file mode 100755
index 00000000000..fa05bf6046f
--- /dev/null
+++ b/t/t0092-diagnose.sh
@@ -0,0 +1,28 @@ 
+#!/bin/sh
+
+test_description='git diagnose'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success UNZIP 'creates diagnostics zip archive' '
+	test_when_finished rm -rf report &&
+
+	git diagnose -o report -s test >out &&
+
+	zip_path=report/git-diagnostics-test.zip &&
+	grep "Available space" out &&
+	test_path_is_file "$zip_path" &&
+
+	# Check zipped archive content
+	"$GIT_UNZIP" -p "$zip_path" diagnostics.log >out &&
+	test_file_not_empty out &&
+
+	"$GIT_UNZIP" -p "$zip_path" packs-local.txt >out &&
+	grep ".git/objects" out &&
+
+	"$GIT_UNZIP" -p "$zip_path" objects-local.txt >out &&
+	grep "^Total: [0-9][0-9]*" out
+'
+
+test_done