[1/2] bugreport: generate config safelist based on docs
diff mbox series

Message ID 20200624012827.34126-2-emilyshaffer@google.com
State New
Headers show
Series
  • bugreport: report configs from safelist
Related show

Commit Message

Emily Shaffer June 24, 2020, 1:28 a.m. UTC
Add a new step to the build to generate a safelist of git-config
variables which are appropriate to include in the output of
git-bugreport. New variables can be added to the safelist by annotating
their documentation in Documentation/config with the "annotate" macro,
which is a no-op in AsciiDoc and AsciiDoctor.

Some configs are private in nature, and can contain remote URLs,
passwords, or other sensitive information. In the event that a user
doesn't notice their information while reviewing a bugreport, that user
may leak their credentials to other individuals, mailing lists, or bug
tracking tools inadvertently. Heuristic blocklisting of configuration
keys is imperfect and prone to false negatives; given the nature of the
information which can be leaked, a safelist is more reliable.

However, it's possible that in some situations, an organization may be
less concerned with privacy of things like remote URLs and branch names,
and more concerned with ease of diagnosis for their support staff. In
those cases, it may make more sense for that organization to modify the
code to use a blocklist. To that end, we should try to mark configs which
are definitely safe, and configs which are definitely unsafe, and leave
blank configs which are somewhere in between. To mark a config as safe,
add "annotate:bugreport[include]" to the corresponding line in the
config documentation; to mark it as unsafe, add
"annotate:bugreport[exclude]" instead.

Generating bugreport-config-safelist.h at build time by grepping the
documentation for this new macro helps us prevent staleness. The macro
itself is a no-op and should not alter the appearance of the
documentation in either AsciiDoc or AsciiDoctor, confirmable by running:

  cd Documentation
  ./doc-diff --asciidoctor HEAD^ HEAD
  ./doc-diff --asciidoc HEAD^ HEAD

Diffing the rendered HTML shows that only inline comments were added,
which shouldn't be a problem.

Additionally, add annotations to the sendemail config documentation in
order to demonstrate a proof of concept.

Helped-by: Martin Ă…gren <martin.agren@gmail.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 .gitignore                              |  1 +
 Documentation/asciidoc.conf             |  9 ++++
 Documentation/asciidoctor-extensions.rb |  7 ++++
 Documentation/config/sendemail.txt      | 56 ++++++++++++-------------
 Makefile                                |  7 ++++
 generate-bugreport-config-safelist.sh   | 18 ++++++++
 6 files changed, 70 insertions(+), 28 deletions(-)
 create mode 100755 generate-bugreport-config-safelist.sh

Comments

Junio C Hamano June 24, 2020, 4:35 a.m. UTC | #1
Emily Shaffer <emilyshaffer@google.com> writes:

> ... To mark a config as safe,
> add "annotate:bugreport[include]" to the corresponding line in the
> config documentation; to mark it as unsafe, add
> "annotate:bugreport[exclude]" instead.

Hmph,...

> -sendemail.smtpEncryption::
> +sendemail.smtpEncryption annotate:bugreport[include] ::
>  	See linkgit:git-send-email[1] for description.  Note that this
>  	setting is not subject to the 'identity' mechanism.
>  
> @@ -15,7 +15,7 @@ sendemail.smtpsslcertpath::
>  	Path to ca-certificates (either a directory or a single file).
>  	Set it to an empty string to disable certificate verification.
>  
> -sendemail.<identity>.*::
> +sendemail.<identity>.* annotate:bugreport[exclude] ::

So "sendemail.git-devel.cc" is not included due to [exclude] here,
but ...

> +sendemail.annotate annotate:bugreport[include] ::
> +sendemail.bcc annotate:bugreport[include] ::
> +sendemail.cc annotate:bugreport[include] ::

... "sendemail.cc" that is a fallback value for other "sendemail.*.cc"
is included?  

> +++ b/generate-bugreport-config-safelist.sh
> @@ -0,0 +1,18 @@
> +#!/bin/sh
> +
> +cat <<"EOF"
> +/* Automatically generated by bugreport-generate-config-safelist.sh */
> +
> +
> +static const char *bugreport_config_safelist[] = {
> +EOF
> +
> +# cat all regular files in Documentation/config
> +find Documentation/config -type f -exec cat {} \; |
> +# print the command name which matches the annotate-bugreport macro
> +sed -n 's/^\([^ ]*\)  *annotate:bugreport\[include\].* ::$/  "\1",/p' |
> +	sort

We just care about "include" entries, so it does not matter whether
we mark entries with [exclude] or not anyway?

Puzzled...
Emily Shaffer June 24, 2020, 6:37 p.m. UTC | #2
On Tue, Jun 23, 2020 at 09:35:15PM -0700, Junio C Hamano wrote:
> 
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > ... To mark a config as safe,
> > add "annotate:bugreport[include]" to the corresponding line in the
> > config documentation; to mark it as unsafe, add
> > "annotate:bugreport[exclude]" instead.
> 
> Hmph,...
> 
> > -sendemail.smtpEncryption::
> > +sendemail.smtpEncryption annotate:bugreport[include] ::
> >  	See linkgit:git-send-email[1] for description.  Note that this
> >  	setting is not subject to the 'identity' mechanism.
> >  
> > @@ -15,7 +15,7 @@ sendemail.smtpsslcertpath::
> >  	Path to ca-certificates (either a directory or a single file).
> >  	Set it to an empty string to disable certificate verification.
> >  
> > -sendemail.<identity>.*::
> > +sendemail.<identity>.* annotate:bugreport[exclude] ::
> 
> So "sendemail.git-devel.cc" is not included due to [exclude] here,
> but ...

Hm, I can change this for the example, but I think it may still not be
included correctly because of the wildcard.  We had talked about whether
to include wildcarded options like this in the past too - I think that's
part of why I removed these couple patches from the initial bugreport
series.

If this line becomes included, then what we're really searching for is
anything matching "sendemail.*.*"; is this syntax regular enough to
count on in the generator script? Could the generator script do some
magic to turn <foo> and * into valid regex in the generated header and
then use that regex during the config parse?

Maybe using regex is overkill and I should just check for "*" during the
parse. Or maybe there's already a library to use. I'll dig into this
some today. Thanks for pointing it out.

> 
> > +sendemail.annotate annotate:bugreport[include] ::
> > +sendemail.bcc annotate:bugreport[include] ::
> > +sendemail.cc annotate:bugreport[include] ::
> 
> ... "sendemail.cc" that is a fallback value for other "sendemail.*.cc"
> is included?  
> 
> > +++ b/generate-bugreport-config-safelist.sh
> > @@ -0,0 +1,18 @@
> > +#!/bin/sh
> > +
> > +cat <<"EOF"
> > +/* Automatically generated by bugreport-generate-config-safelist.sh */
> > +
> > +
> > +static const char *bugreport_config_safelist[] = {
> > +EOF
> > +
> > +# cat all regular files in Documentation/config
> > +find Documentation/config -type f -exec cat {} \; |
> > +# print the command name which matches the annotate-bugreport macro
> > +sed -n 's/^\([^ ]*\)  *annotate:bugreport\[include\].* ::$/  "\1",/p' |
> > +	sort
> 
> We just care about "include" entries, so it does not matter whether
> we mark entries with [exclude] or not anyway?
> 
> Puzzled...

I wonder where I forgot to include the context for encouraging
"exclude". My thinking was that some organizations might have a lower
expectation of employee privacy, and therefore opt to use a blocklist
rather than a safelist for bug reports included internally. I suppose I
was thinking then that organization only needs to carry a patch to
invert the generator script, and not have to comb through all the
configs they care or don't care about themselves.

But as I'm revisiting this, it seems like that means a user who works
for Big Brother Corp but is told to "file that bug upstream" could then
leak their confidential info by accident, not realizing their
git-bugreport is using a wider net than upstream wants.

What do others think?

 - Emily

Patch
diff mbox series

diff --git a/.gitignore b/.gitignore
index ee509a2ad2..0177839ae8 100644
--- a/.gitignore
+++ b/.gitignore
@@ -191,6 +191,7 @@ 
 /gitweb/static/gitweb.min.*
 /config-list.h
 /command-list.h
+/bugreport-config-safelist.h
 *.tar.gz
 *.dsc
 *.deb
diff --git a/Documentation/asciidoc.conf b/Documentation/asciidoc.conf
index 3e4c13971b..fd225e30cc 100644
--- a/Documentation/asciidoc.conf
+++ b/Documentation/asciidoc.conf
@@ -6,9 +6,14 @@ 
 #
 # Show Git link as: <command>(<section>); if section is defined, else just show
 # the command.
+#
+# The annotate macro does nothing as far as rendering is
+# concerned -- we just grep for it in the sources to populate
+# things like the bugreport safelist.
 
 [macros]
 (?su)[\\]?(?P<name>linkgit):(?P<target>\S*?)\[(?P<attrlist>.*?)\]=
+(?su)[\\]?(?P<name>annotate):(?P<target>\S*?)\[(?P<attrlist>.*?)\]=
 
 [attributes]
 asterisk=&#42;
@@ -28,6 +33,8 @@  ifdef::backend-docbook[]
 {0#<citerefentry>}
 {0#<refentrytitle>{target}</refentrytitle><manvolnum>{0}</manvolnum>}
 {0#</citerefentry>}
+[annotate-inlinemacro]
+{0#}
 endif::backend-docbook[]
 
 ifdef::backend-docbook[]
@@ -75,4 +82,6 @@  ifdef::backend-xhtml11[]
 git-relative-html-prefix=
 [linkgit-inlinemacro]
 <a href="{git-relative-html-prefix}{target}.html">{target}{0?({0})}</a>
+[annotate-inlinemacro]
+<!-- -->
 endif::backend-xhtml11[]
diff --git a/Documentation/asciidoctor-extensions.rb b/Documentation/asciidoctor-extensions.rb
index d906a00803..03c80af0e5 100644
--- a/Documentation/asciidoctor-extensions.rb
+++ b/Documentation/asciidoctor-extensions.rb
@@ -39,10 +39,17 @@  module Git
         output
       end
     end
+
+    class AnnotateProcessor < Asciidoctor::Extensions::InlineMacroProcessor
+      def process(parent, target, attrs)
+        ""
+      end
+    end
   end
 end
 
 Asciidoctor::Extensions.register do
   inline_macro Git::Documentation::LinkGitProcessor, :linkgit
   postprocessor Git::Documentation::DocumentPostProcessor
+  inline_macro Git::Documentation::AnnotateProcessor, :annotate
 end
diff --git a/Documentation/config/sendemail.txt b/Documentation/config/sendemail.txt
index 0006faf800..fe27473e44 100644
--- a/Documentation/config/sendemail.txt
+++ b/Documentation/config/sendemail.txt
@@ -4,7 +4,7 @@  sendemail.identity::
 	values in the 'sendemail' section. The default identity is
 	the value of `sendemail.identity`.
 
-sendemail.smtpEncryption::
+sendemail.smtpEncryption annotate:bugreport[include] ::
 	See linkgit:git-send-email[1] for description.  Note that this
 	setting is not subject to the 'identity' mechanism.
 
@@ -15,7 +15,7 @@  sendemail.smtpsslcertpath::
 	Path to ca-certificates (either a directory or a single file).
 	Set it to an empty string to disable certificate verification.
 
-sendemail.<identity>.*::
+sendemail.<identity>.* annotate:bugreport[exclude] ::
 	Identity-specific versions of the 'sendemail.*' parameters
 	found below, taking precedence over those when this
 	identity is selected, through either the command-line or
@@ -23,41 +23,41 @@  sendemail.<identity>.*::
 
 sendemail.aliasesFile::
 sendemail.aliasFileType::
-sendemail.annotate::
-sendemail.bcc::
-sendemail.cc::
-sendemail.ccCmd::
-sendemail.chainReplyTo::
-sendemail.confirm::
-sendemail.envelopeSender::
-sendemail.from::
-sendemail.multiEdit::
-sendemail.signedoffbycc::
-sendemail.smtpPass::
-sendemail.suppresscc::
-sendemail.suppressFrom::
-sendemail.to::
-sendemail.tocmd::
-sendemail.smtpDomain::
-sendemail.smtpServer::
-sendemail.smtpServerPort::
-sendemail.smtpServerOption::
-sendemail.smtpUser::
-sendemail.thread::
-sendemail.transferEncoding::
-sendemail.validate::
-sendemail.xmailer::
+sendemail.annotate annotate:bugreport[include] ::
+sendemail.bcc annotate:bugreport[include] ::
+sendemail.cc annotate:bugreport[include] ::
+sendemail.ccCmd annotate:bugreport[include] ::
+sendemail.chainReplyTo annotate:bugreport[include] ::
+sendemail.confirm annotate:bugreport[include] ::
+sendemail.envelopeSender annotate:bugreport[include] ::
+sendemail.from annotate:bugreport[include] ::
+sendemail.multiEdit annotate:bugreport[include] ::
+sendemail.signedoffbycc annotate:bugreport[include] ::
+sendemail.smtpPass annotate:bugreport[exclude] ::
+sendemail.suppresscc annotate:bugreport[include] ::
+sendemail.suppressFrom annotate:bugreport[include] ::
+sendemail.to annotate:bugreport[include] ::
+sendemail.tocmd annotate:bugreport[include] ::
+sendemail.smtpDomain annotate:bugreport[include] ::
+sendemail.smtpServer annotate:bugreport[include] ::
+sendemail.smtpServerPort annotate:bugreport[include] ::
+sendemail.smtpServerOption annotate:bugreport[include] ::
+sendemail.smtpUser annotate:bugreport[exclude] ::
+sendemail.thread annotate:bugreport[include] ::
+sendemail.transferEncoding annotate:bugreport[include] ::
+sendemail.validate annotate:bugreport[include] ::
+sendemail.xmailer annotate:bugreport[include] ::
 	See linkgit:git-send-email[1] for description.
 
 sendemail.signedoffcc (deprecated)::
 	Deprecated alias for `sendemail.signedoffbycc`.
 
-sendemail.smtpBatchSize::
+sendemail.smtpBatchSize annotate:bugreport[include] ::
 	Number of messages to be sent per connection, after that a relogin
 	will happen.  If the value is 0 or undefined, send all messages in
 	one connection.
 	See also the `--batch-size` option of linkgit:git-send-email[1].
 
-sendemail.smtpReloginDelay::
+sendemail.smtpReloginDelay annotate:bugreport[include] ::
 	Seconds wait before reconnecting to smtp server.
 	See also the `--relogin-delay` option of linkgit:git-send-email[1].
diff --git a/Makefile b/Makefile
index 372139f1f2..11d4029003 100644
--- a/Makefile
+++ b/Makefile
@@ -810,6 +810,7 @@  VCSSVN_LIB = vcs-svn/lib.a
 
 GENERATED_H += config-list.h
 GENERATED_H += command-list.h
+GENERATED_H += bugreport-config-safelist.h
 
 LIB_H := $(sort $(patsubst ./%,%,$(shell git ls-files '*.h' ':!t/' ':!Documentation/' 2>/dev/null || \
 	$(FIND) . \
@@ -2164,6 +2165,12 @@  command-list.h: $(wildcard Documentation/git*.txt) Documentation/*config.txt Doc
 		$(patsubst %,--exclude-program %,$(EXCLUDED_PROGRAMS)) \
 		command-list.txt >$@+ && mv $@+ $@
 
+bugreport-config-safelist.h: generate-bugreport-config-safelist.sh
+
+bugreport-config-safelist.h: Documentation/config/*.txt
+	$(QUIET_GEN)$(SHELL_PATH) ./generate-bugreport-config-safelist.sh \
+		>$@+ && mv $@+ $@
+
 SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):$(GIT_VERSION):\
 	$(localedir_SQ):$(NO_CURL):$(USE_GETTEXT_SCHEME):$(SANE_TOOL_PATH_SQ):\
 	$(gitwebdir_SQ):$(PERL_PATH_SQ):$(SANE_TEXT_GREP):$(PAGER_ENV):\
diff --git a/generate-bugreport-config-safelist.sh b/generate-bugreport-config-safelist.sh
new file mode 100755
index 0000000000..5bf1f4ad82
--- /dev/null
+++ b/generate-bugreport-config-safelist.sh
@@ -0,0 +1,18 @@ 
+#!/bin/sh
+
+cat <<"EOF"
+/* Automatically generated by bugreport-generate-config-safelist.sh */
+
+
+static const char *bugreport_config_safelist[] = {
+EOF
+
+# cat all regular files in Documentation/config
+find Documentation/config -type f -exec cat {} \; |
+# print the command name which matches the annotate-bugreport macro
+sed -n 's/^\([^ ]*\)  *annotate:bugreport\[include\].* ::$/  "\1",/p' |
+	sort
+
+cat <<"EOF"
+};
+EOF