diff mbox series

[v3,4/9] bugreport: add config values from whitelist

Message ID 20191025025129.250049-5-emilyshaffer@google.com (mailing list archive)
State New, archived
Headers show
Series add git-bugreport tool | expand

Commit Message

Emily Shaffer Oct. 25, 2019, 2:51 a.m. UTC
Teach bugreport to gather the values of config options which are present
in 'git-bugreport-config-whitelist'.

Many config options are sensitive, and many Git add-ons use config
options which git-core does not know about; it is better only to gather
config options which we know to be safe, rather than excluding options
which we know to be unsafe.

Reading the whitelist into memory and sorting it saves us time -
since git_config_bugreport() is called for every option the user has
configured, limiting the file IO to one open/read/close and performing
option lookup in sublinear time is a useful optimization.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 bugreport.c         | 50 +++++++++++++++++++++++++++++++++++++++++++++
 bugreport.h         |  7 +++++++
 builtin/bugreport.c |  4 ++++
 3 files changed, 61 insertions(+)

Comments

Johannes Schindelin Oct. 28, 2019, 2:14 p.m. UTC | #1
Hi Emily,

On Thu, 24 Oct 2019, Emily Shaffer wrote:

> Teach bugreport to gather the values of config options which are present
> in 'git-bugreport-config-whitelist'.
>
> Many config options are sensitive, and many Git add-ons use config
> options which git-core does not know about; it is better only to gather
> config options which we know to be safe, rather than excluding options
> which we know to be unsafe.

Should we still have the `// bugreport-exclude` comments, then?

>
> Reading the whitelist into memory and sorting it saves us time -
> since git_config_bugreport() is called for every option the user has
> configured, limiting the file IO to one open/read/close and performing
> option lookup in sublinear time is a useful optimization.

Maybe we even want a hashmap? That would reduce the time complexity even
further.

> diff --git a/bugreport.c b/bugreport.c
> index ada54fe583..afa4836ab1 100644
> --- a/bugreport.c
> +++ b/bugreport.c
> @@ -1,10 +1,24 @@
>  #include "cache.h"
>
>  #include "bugreport.h"
> +#include "config.h"
> +#include "exec-cmd.h"
>  #include "help.h"
>  #include "run-command.h"
>  #include "version.h"
>
> +/**
> + * A sorted list of config options which we will add to the bugreport. Managed
> + * by 'gather_whitelist(...)'.
> + */
> +struct string_list whitelist = STRING_LIST_INIT_DUP;
> +struct strbuf configs_and_values = STRBUF_INIT;
> +
> +// git version --build-options
> +// uname -a
> +// curl-config --version
> +// ldd --version
> +// echo $SHELL

These comments probably want to move to a single, C style comment, and
they probably want to be introduced together with `get_system_info()`.

I also have to admit that I might have missed where `$SHELL` was added
to the output...

>  void get_system_info(struct strbuf *sys_info)
>  {
>  	struct child_process cp = CHILD_PROCESS_INIT;
> @@ -53,3 +67,39 @@ void get_system_info(struct strbuf *sys_info)
>  	argv_array_clear(&cp.args);
>  	strbuf_reset(&std_out);
>  }
> +
> +void gather_whitelist(struct strbuf *path)
> +{
> +	struct strbuf tmp = STRBUF_INIT;
> +	strbuf_read_file(&tmp, path->buf, 0);
> +	string_list_init(&whitelist, 1);
> +	string_list_split(&whitelist, tmp.buf, '\n', -1);
> +	string_list_sort(&whitelist);
> +}
> +
> +int git_config_bugreport(const char *var, const char *value, void *cb)
> +{
> +	if (string_list_has_string(&whitelist, var)) {
> +		strbuf_addf(&configs_and_values,
> +			    "%s : %s\n",
> +			    var, value);

A quite useful piece of information would be the config source. Not sure
whether we can do that outside of `config.c` yet...

> +	}
> +
> +	return 0;
> +}
> +
> +void get_whitelisted_config(struct strbuf *config_info)
> +{
> +	struct strbuf path = STRBUF_INIT;
> +
> +	strbuf_addstr(&path, git_exec_path());
> +	strbuf_addstr(&path, "/git-bugreport-config-whitelist");

Hmm. I would have expected this patch to come directly after the patch
2/9 that generates that white-list, and I would also have expected that
to be pre-sorted, and compiled in.

Do you want users to _edit_ the file in the exec path? In general, that
path will be write-protected, though. A better alternative would
probably be to compile in a hard-coded list, and to allow including more
values e.g. by offering command-line options to specify config setting
patterns. But if we allow patterns, we might actually want to have those
exclusions to prevent sensitive data from being included.

> +
> +	gather_whitelist(&path);
> +	strbuf_init(&configs_and_values, whitelist.nr);
> +
> +	git_config(git_config_bugreport, NULL);
> +
> +	strbuf_reset(config_info);
> +	strbuf_addbuf(config_info, &configs_and_values);
> +}
> diff --git a/bugreport.h b/bugreport.h
> index ba216acf3f..7413e7e1be 100644
> --- a/bugreport.h
> +++ b/bugreport.h
> @@ -5,3 +5,10 @@
>   * The previous contents of sys_info will be discarded.
>   */
>  void get_system_info(struct strbuf *sys_info);
> +
> +/**

I also frequently use JavaDoc-style `/**`, but I am not sure that this
is actually desired in Git's source code ;-)

> + * Adds the values of the config items listed in
> + * 'git-bugreport-config-whitelist' to config_info. The previous contents of
> + * config_info will be discarded.
> + */
> +void get_whitelisted_config(struct strbuf *sys_info);
> diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> index 7232d31be7..70fe0d2b85 100644
> --- a/builtin/bugreport.c
> +++ b/builtin/bugreport.c
> @@ -56,6 +56,10 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
>  	get_system_info(&buffer);
>  	strbuf_write(&buffer, report);
>
> +	add_header(report, "Whitelisted Config");

Quite honestly, I would like to avoid the term "whitelist" for good. How
about "Selected config settings" instead?

Thanks,
Dscho

> +	get_whitelisted_config(&buffer);
> +	strbuf_write(&buffer, report);
> +
>  	fclose(report);
>
>  	launch_editor(report_path.buf, NULL, NULL);
> --
> 2.24.0.rc0.303.g954a862665-goog
>
>
Josh Steadmon Oct. 29, 2019, 8:58 p.m. UTC | #2
On 2019.10.24 19:51, Emily Shaffer wrote:
[snip]
> diff --git a/bugreport.c b/bugreport.c
> index ada54fe583..afa4836ab1 100644
> --- a/bugreport.c
> +++ b/bugreport.c
> @@ -1,10 +1,24 @@
>  #include "cache.h"
>  
>  #include "bugreport.h"
> +#include "config.h"
> +#include "exec-cmd.h"
>  #include "help.h"
>  #include "run-command.h"
>  #include "version.h"
>  
> +/**
> + * A sorted list of config options which we will add to the bugreport. Managed
> + * by 'gather_whitelist(...)'.
> + */
> +struct string_list whitelist = STRING_LIST_INIT_DUP;
> +struct strbuf configs_and_values = STRBUF_INIT;
> +
> +// git version --build-options
> +// uname -a
> +// curl-config --version
> +// ldd --version
> +// echo $SHELL
>  void get_system_info(struct strbuf *sys_info)
>  {
>  	struct child_process cp = CHILD_PROCESS_INIT;
> @@ -53,3 +67,39 @@ void get_system_info(struct strbuf *sys_info)
>  	argv_array_clear(&cp.args);
>  	strbuf_reset(&std_out);
>  }
> +
> +void gather_whitelist(struct strbuf *path)

This and git_config_bugreport() below should both be static as well.
Rather than repeating advice on the later patches, I'll just note that
any new functions that don't show up in the corresponding .h file should
be marked static.
Junio C Hamano Oct. 30, 2019, 1:37 a.m. UTC | #3
Josh Steadmon <steadmon@google.com> writes:

> This and git_config_bugreport() below should both be static as well.
> Rather than repeating advice on the later patches, I'll just note that
> any new functions that don't show up in the corresponding .h file should
> be marked static.

Good advice.  

More importantly, given that "git bugreport" itself has no service
of itself to offer other existing parts of Git (it is just a "gather
various pieces of information from different places, and then
produce a text file output" application), I do not see much point in
it having its own header file that others would #include (i.e. the
include file is to define services that are offered by it).  If
there are common enough service routines invented to support the
need of bugreport.c (e.g. perhaps it wants to give more info than
what is currently available via the existing API on the contents of
in-core index), by definition of being them common enough, they
should be added to the header that can be used by both bugreport.c
and other existing users of the same subsystem (e.g. if it is about
in-core index, perhaps cache.h).

It makes perfect sense for bugreport.c to #include header files for
the Git internals to collect pieces of information from inside Git,
though.
Emily Shaffer Nov. 14, 2019, 9:55 p.m. UTC | #4
On Wed, Oct 30, 2019 at 10:37:13AM +0900, Junio C Hamano wrote:
> Josh Steadmon <steadmon@google.com> writes:
> 
> > This and git_config_bugreport() below should both be static as well.
> > Rather than repeating advice on the later patches, I'll just note that
> > any new functions that don't show up in the corresponding .h file should
> > be marked static.
> 
> Good advice.  
> 
> More importantly, given that "git bugreport" itself has no service
> of itself to offer other existing parts of Git (it is just a "gather
> various pieces of information from different places, and then
> produce a text file output" application), I do not see much point in
> it having its own header file that others would #include (i.e. the
> include file is to define services that are offered by it).  If
> there are common enough service routines invented to support the
> need of bugreport.c (e.g. perhaps it wants to give more info than
> what is currently available via the existing API on the contents of
> in-core index), by definition of being them common enough, they
> should be added to the header that can be used by both bugreport.c
> and other existing users of the same subsystem (e.g. if it is about
> in-core index, perhaps cache.h).

Are you asking me to have only builtin/bugreport.c and implement each
function there, and eliminate both <basedir>/bugreport.[ch]?

It may be true that deciding to put this functionality into a library
"in case someone needs it later" was premature, so I can do that - I
just want to make sure I understand you.

 - Emily
Emily Shaffer Dec. 11, 2019, 8:48 p.m. UTC | #5
On Mon, Oct 28, 2019 at 03:14:35PM +0100, Johannes Schindelin wrote:
> Hi Emily,

Sorry for the delay in replying. This work has been backburnered and
this mail slipped through the cracks.

> 
> On Thu, 24 Oct 2019, Emily Shaffer wrote:
> 
> > Teach bugreport to gather the values of config options which are present
> > in 'git-bugreport-config-whitelist'.
> >
> > Many config options are sensitive, and many Git add-ons use config
> > options which git-core does not know about; it is better only to gather
> > config options which we know to be safe, rather than excluding options
> > which we know to be unsafe.
> 
> Should we still have the `// bugreport-exclude` comments, then?

They were optional (useless) before too. I can remove them if you want;
I suppose I like the idea of having precedent if someone wants to build
their own internal version with opt-out configs rather than opt-in. I
can remove them if we want; it doesn't matter very much to me either
way.

> 
> >
> > Reading the whitelist into memory and sorting it saves us time -
> > since git_config_bugreport() is called for every option the user has
> > configured, limiting the file IO to one open/read/close and performing
> > option lookup in sublinear time is a useful optimization.
> 
> Maybe we even want a hashmap? That would reduce the time complexity even
> further.

Sure, we can do it. I'll make that change.

> 
> > diff --git a/bugreport.c b/bugreport.c
> > index ada54fe583..afa4836ab1 100644
> > --- a/bugreport.c
> > +++ b/bugreport.c
> > @@ -1,10 +1,24 @@
> >  #include "cache.h"
> >
> >  #include "bugreport.h"
> > +#include "config.h"
> > +#include "exec-cmd.h"
> >  #include "help.h"
> >  #include "run-command.h"
> >  #include "version.h"
> >
> > +/**
> > + * A sorted list of config options which we will add to the bugreport. Managed
> > + * by 'gather_whitelist(...)'.
> > + */
> > +struct string_list whitelist = STRING_LIST_INIT_DUP;
> > +struct strbuf configs_and_values = STRBUF_INIT;
> > +
> > +// git version --build-options
> > +// uname -a
> > +// curl-config --version
> > +// ldd --version
> > +// echo $SHELL
> 
> These comments probably want to move to a single, C style comment, and
> they probably want to be introduced together with `get_system_info()`.

Yeah, it's stale and has been removed now. It was less commentary and
more todo list for author ;)

> 
> I also have to admit that I might have missed where `$SHELL` was added
> to the output...

I skipped it entirely since bugreport doesn't run in shell anymore. If
you have advice for gathering the user's shell I can try to add it; is
there such a difference between, say, a Debian user using bash and a
Debian user using zsh? I suppose it could be useful if someone has an
issue with GIT_PS1, or with autocompletion. I'll look into gathering it.

> 
> >  void get_system_info(struct strbuf *sys_info)
> >  {
> >  	struct child_process cp = CHILD_PROCESS_INIT;
> > @@ -53,3 +67,39 @@ void get_system_info(struct strbuf *sys_info)
> >  	argv_array_clear(&cp.args);
> >  	strbuf_reset(&std_out);
> >  }
> > +
> > +void gather_whitelist(struct strbuf *path)
> > +{
> > +	struct strbuf tmp = STRBUF_INIT;
> > +	strbuf_read_file(&tmp, path->buf, 0);
> > +	string_list_init(&whitelist, 1);
> > +	string_list_split(&whitelist, tmp.buf, '\n', -1);
> > +	string_list_sort(&whitelist);
> > +}
> > +
> > +int git_config_bugreport(const char *var, const char *value, void *cb)
> > +{
> > +	if (string_list_has_string(&whitelist, var)) {
> > +		strbuf_addf(&configs_and_values,
> > +			    "%s : %s\n",
> > +			    var, value);
> 
> A quite useful piece of information would be the config source. Not sure
> whether we can do that outside of `config.c` yet...

It's possible. I can add it.

> 
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +void get_whitelisted_config(struct strbuf *config_info)
> > +{
> > +	struct strbuf path = STRBUF_INIT;
> > +
> > +	strbuf_addstr(&path, git_exec_path());
> > +	strbuf_addstr(&path, "/git-bugreport-config-whitelist");
> 
> Hmm. I would have expected this patch to come directly after the patch
> 2/9 that generates that white-list, and I would also have expected that
> to be pre-sorted, and compiled in.
> 
> Do you want users to _edit_ the file in the exec path? In general, that
> path will be write-protected, though. A better alternative would
> probably be to compile in a hard-coded list, and to allow including more
> values e.g. by offering command-line options to specify config setting
> patterns. But if we allow patterns, we might actually want to have those
> exclusions to prevent sensitive data from being included.

Hm, interesting. Do we have precedent for compiling in a header
generated during the build process? I think I saw one when I was adding
this script - I'll take a look.

> 
> > +
> > +	gather_whitelist(&path);
> > +	strbuf_init(&configs_and_values, whitelist.nr);
> > +
> > +	git_config(git_config_bugreport, NULL);
> > +
> > +	strbuf_reset(config_info);
> > +	strbuf_addbuf(config_info, &configs_and_values);
> > +}
> > diff --git a/bugreport.h b/bugreport.h
> > index ba216acf3f..7413e7e1be 100644
> > --- a/bugreport.h
> > +++ b/bugreport.h
> > @@ -5,3 +5,10 @@
> >   * The previous contents of sys_info will be discarded.
> >   */
> >  void get_system_info(struct strbuf *sys_info);
> > +
> > +/**
> 
> I also frequently use JavaDoc-style `/**`, but I am not sure that this
> is actually desired in Git's source code ;-)
> 
> > + * Adds the values of the config items listed in
> > + * 'git-bugreport-config-whitelist' to config_info. The previous contents of
> > + * config_info will be discarded.
> > + */
> > +void get_whitelisted_config(struct strbuf *sys_info);
> > diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> > index 7232d31be7..70fe0d2b85 100644
> > --- a/builtin/bugreport.c
> > +++ b/builtin/bugreport.c
> > @@ -56,6 +56,10 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
> >  	get_system_info(&buffer);
> >  	strbuf_write(&buffer, report);
> >
> > +	add_header(report, "Whitelisted Config");
> 
> Quite honestly, I would like to avoid the term "whitelist" for good. How
> about "Selected config settings" instead?

Will do - thanks for the callout.

> 
> Thanks,
> Dscho
> 
> > +	get_whitelisted_config(&buffer);
> > +	strbuf_write(&buffer, report);
> > +
> >  	fclose(report);
> >
> >  	launch_editor(report_path.buf, NULL, NULL);
> > --
> > 2.24.0.rc0.303.g954a862665-goog
> >
> >
Johannes Schindelin Dec. 15, 2019, 5:30 p.m. UTC | #6
Hi Emily,

On Wed, 11 Dec 2019, Emily Shaffer wrote:

> On Mon, Oct 28, 2019 at 03:14:35PM +0100, Johannes Schindelin wrote:
>
> >
> > On Thu, 24 Oct 2019, Emily Shaffer wrote:
> >
> > > Teach bugreport to gather the values of config options which are
> > > present in 'git-bugreport-config-whitelist'.
> > >
> > > Many config options are sensitive, and many Git add-ons use config
> > > options which git-core does not know about; it is better only to
> > > gather config options which we know to be safe, rather than
> > > excluding options which we know to be unsafe.
> >
> > Should we still have the `// bugreport-exclude` comments, then?
>
> They were optional (useless) before too. I can remove them if you want;
> I suppose I like the idea of having precedent if someone wants to build
> their own internal version with opt-out configs rather than opt-in. I
> can remove them if we want; it doesn't matter very much to me either
> way.

How are you guaranteeing that this information does not become stale,
like, immediately?

If you _still_ insist on keeping those comments even after answering this
question, at least please turn them into C-style comments: we have no
C++-style comments in Git and want to keep it this way.

> > > diff --git a/bugreport.c b/bugreport.c
> > > [...]
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +void get_whitelisted_config(struct strbuf *config_info)
> > > +{
> > > +	struct strbuf path = STRBUF_INIT;
> > > +
> > > +	strbuf_addstr(&path, git_exec_path());
> > > +	strbuf_addstr(&path, "/git-bugreport-config-whitelist");
> >
> > Hmm. I would have expected this patch to come directly after the patch
> > 2/9 that generates that white-list, and I would also have expected that
> > to be pre-sorted, and compiled in.
> >
> > Do you want users to _edit_ the file in the exec path? In general, that
> > path will be write-protected, though. A better alternative would
> > probably be to compile in a hard-coded list, and to allow including more
> > values e.g. by offering command-line options to specify config setting
> > patterns. But if we allow patterns, we might actually want to have those
> > exclusions to prevent sensitive data from being included.
>
> Hm, interesting. Do we have precedent for compiling in a header
> generated during the build process? I think I saw one when I was adding
> this script - I'll take a look.

Yes, we do. The entire `command-list.h` is generated during the build.
Look for `GENERATED_H` in the `Makefile`.

Ciao,
Dscho
diff mbox series

Patch

diff --git a/bugreport.c b/bugreport.c
index ada54fe583..afa4836ab1 100644
--- a/bugreport.c
+++ b/bugreport.c
@@ -1,10 +1,24 @@ 
 #include "cache.h"
 
 #include "bugreport.h"
+#include "config.h"
+#include "exec-cmd.h"
 #include "help.h"
 #include "run-command.h"
 #include "version.h"
 
+/**
+ * A sorted list of config options which we will add to the bugreport. Managed
+ * by 'gather_whitelist(...)'.
+ */
+struct string_list whitelist = STRING_LIST_INIT_DUP;
+struct strbuf configs_and_values = STRBUF_INIT;
+
+// git version --build-options
+// uname -a
+// curl-config --version
+// ldd --version
+// echo $SHELL
 void get_system_info(struct strbuf *sys_info)
 {
 	struct child_process cp = CHILD_PROCESS_INIT;
@@ -53,3 +67,39 @@  void get_system_info(struct strbuf *sys_info)
 	argv_array_clear(&cp.args);
 	strbuf_reset(&std_out);
 }
+
+void gather_whitelist(struct strbuf *path)
+{
+	struct strbuf tmp = STRBUF_INIT;
+	strbuf_read_file(&tmp, path->buf, 0);
+	string_list_init(&whitelist, 1);
+	string_list_split(&whitelist, tmp.buf, '\n', -1);
+	string_list_sort(&whitelist);
+}
+
+int git_config_bugreport(const char *var, const char *value, void *cb)
+{
+	if (string_list_has_string(&whitelist, var)) {
+		strbuf_addf(&configs_and_values,
+			    "%s : %s\n",
+			    var, value);
+	}
+
+	return 0;
+}
+
+void get_whitelisted_config(struct strbuf *config_info)
+{
+	struct strbuf path = STRBUF_INIT;
+
+	strbuf_addstr(&path, git_exec_path());
+	strbuf_addstr(&path, "/git-bugreport-config-whitelist");
+
+	gather_whitelist(&path);
+	strbuf_init(&configs_and_values, whitelist.nr);
+
+	git_config(git_config_bugreport, NULL);
+
+	strbuf_reset(config_info);
+	strbuf_addbuf(config_info, &configs_and_values);
+}
diff --git a/bugreport.h b/bugreport.h
index ba216acf3f..7413e7e1be 100644
--- a/bugreport.h
+++ b/bugreport.h
@@ -5,3 +5,10 @@ 
  * The previous contents of sys_info will be discarded.
  */
 void get_system_info(struct strbuf *sys_info);
+
+/**
+ * Adds the values of the config items listed in
+ * 'git-bugreport-config-whitelist' to config_info. The previous contents of
+ * config_info will be discarded.
+ */
+void get_whitelisted_config(struct strbuf *sys_info);
diff --git a/builtin/bugreport.c b/builtin/bugreport.c
index 7232d31be7..70fe0d2b85 100644
--- a/builtin/bugreport.c
+++ b/builtin/bugreport.c
@@ -56,6 +56,10 @@  int cmd_bugreport(int argc, const char **argv, const char *prefix)
 	get_system_info(&buffer);
 	strbuf_write(&buffer, report);
 
+	add_header(report, "Whitelisted Config");
+	get_whitelisted_config(&buffer);
+	strbuf_write(&buffer, report);
+
 	fclose(report);
 
 	launch_editor(report_path.buf, NULL, NULL);