[v2,2/2] bugreport: generate config whitelist based on docs
diff mbox series

Message ID 20190817003926.102944-3-emilyshaffer@google.com
State New
Headers show
Series
  • Untitled series #161079
Related show

Commit Message

Emily Shaffer Aug. 17, 2019, 12:39 a.m. UTC
Add a new step to the build to generate a whitelist of git-config
variables which are appropriate to include in the output of
git-bugreport. New variables can be added to the whitelist by annotating
their documentation in Documentation/config with the line
"// bugreport-include".

Some configs are private in nature, and can contain remote URLs,
passwords, or other sensitive information. In the event that a user
doesn't notice their information while reviewing a bugreport, that user
may leak their credentials to other individuals, mailing lists, or bug
tracking tools inadvertently. Heuristic blacklisting of configuration
keys is imperfect and prone to false negatives; given the nature of the
information which can be leaked, a whitelist is more reliable.

In order to prevent staleness of the whitelist, add a mechanism to
generate the whitelist from annotations in the config documentation,
where contributors are already used to documenting their new config
keys.

Additionally, add annotations to the sendemail config documentation in
order to demonstrate a proof of concept.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 .gitignore                             |  1 +
 Documentation/config/sendemail.txt     | 68 +++++++++++++-------------
 Makefile                               |  9 +++-
 bugreport-generate-config-whitelist.sh |  4 ++
 git-bugreport-config-whitelist         | 22 ---------
 5 files changed, 47 insertions(+), 57 deletions(-)
 create mode 100755 bugreport-generate-config-whitelist.sh
 delete mode 100644 git-bugreport-config-whitelist

Comments

Martin Ågren Aug. 17, 2019, 8:38 p.m. UTC | #1
On Sat, 17 Aug 2019 at 02:42, Emily Shaffer <emilyshaffer@google.com> wrote:
>
> Add a new step to the build to generate a whitelist of git-config
> variables which are appropriate to include in the output of
> git-bugreport. New variables can be added to the whitelist by annotating
> their documentation in Documentation/config with the line
> "// bugreport-include".

These "// bugreport-include" show up in the rendered manpages, both with
AsciiDoc and Asciidoctor. :-(

> --- a/Documentation/config/sendemail.txt
> +++ b/Documentation/config/sendemail.txt
> @@ -1,63 +1,63 @@
> -sendemail.identity::
> +sendemail.identity:: // bugreport-exclude
>         A configuration identity. When given, causes values in the

If I put each comment on a line of its own (after the config option, but
I suppose before would work the same way), Asciidoctor truly ignores
them and everything's fine. And AsciiDoc renders this one and others
like it ok.

> -sendemail.aliasesFile::
> -sendemail.aliasFileType::
> -sendemail.annotate::
> +sendemail.aliasesFile:: // bugreport-exclude
> +sendemail.aliasFileType:: // bugreport-exclude
> +sendemail.annotate:: // bugreport-include

However, AsciiDoc (version 8.6.10) seems to effectively replace those
comments with an empty line during processing, and it makes quite the
difference here. Instead of these appearing in a compact comma-separated
list, they are treated as individual items in the description list with
no supporting content.

FWIW, I like the idea of annotating things here to make it harder to
forget this whitelisting when adding a new config item.

Below is what I came up with as an alternative approach. Feel free to
steal, squash and/or ignore as you see fit.

BTW, I'm not sure the grep invocation is portable (-R ? -h ? -P ? -o ?).
We should probably also do the usual "create a candidate output file,
then move it into place" dance for robustness.

I do think we should test the generated whitelist in some minimal way,
e.g., to check that it does contain something which objectively belongs
in the whitelist and -- more importantly IMHO -- does *not* contain
something that shouldn't be there, such as sendemail.smtpPass.

Martin

-- >8 --
Subject: [PATCH] Use a bugreport:include/exclude macro instead

Implement a no-op "bugreport" macro for use as "bugreport:include[x]" to
annotate the config keys that should be included in the automatically
generated whitelist. Use "exclude" for the others.

With Asciidoctor, it's ok to say "bugreport:include[]", but AsciiDoc
seems to want something between the brackets. A bit unfortunate, but
not a huge problem -- we'll just provide an "x".

"doc-diff" reports that this macro doesn't render at all. That is,
these are both empty after this commit:

  cd Documentation
  ./doc-diff --asciidoctor :/"bugreport: add tool" HEAD
  ./doc-diff --asciidoc    :/"bugreport: add tool" HEAD

Diffing the rendered HTML shows that there is some small amount of
whitespace and comments added. That shouldn't be a problem.

We could perhaps let the implementation verify that the "action" is one
of "include" and "exclude". For the Asciidoctor implementation that
should be straightforward, but for AsciiDoc I don't immediately know how
to do it. Anyway, if someone stumbles on the keyboard and writes
"bugreport:icndule", they'll "only" miss out on the config key being
included in the whitelist. If this were a blacklist, the consequences of
a misspelled target could be a lot more severe.

Disclaimer: This is the outcome of a combination of copy-paste and
trial and error -- better solutions might be possible...

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 I suppose this could also be "annotate:bugreport[ignore]" to avoid the
 "[x]" and to place all of this under a more general "annotate" macro.

 Documentation/asciidoc.conf             |  8 +++
 Documentation/asciidoctor-extensions.rb |  8 +++
 Documentation/config/sendemail.txt      | 68 ++++++++++++-------------
 bugreport-generate-config-whitelist.sh  |  2 +-
 4 files changed, 51 insertions(+), 35 deletions(-)

diff --git a/Documentation/asciidoc.conf b/Documentation/asciidoc.conf
index 2c16c536ba..5a3c5ef028 100644
--- a/Documentation/asciidoc.conf
+++ b/Documentation/asciidoc.conf
@@ -6,9 +6,13 @@
 #
 # Show Git link as: <command>(<section>); if section is defined, else just show
 # the command.
+#
+# The bugreport macro does nothing as far as rendering is
+# concerned -- we just grep for it in the sources.
 
 [macros]
 (?su)[\\]?(?P<name>linkgit):(?P<target>\S*?)\[(?P<attrlist>.*?)\]=
+(?su)[\\]?(?P<name>bugreport):(?P<action>\S*?)\[(?P<attrlist>.*?)\]=
 
 [attributes]
 asterisk=&#42;
@@ -28,6 +32,8 @@ ifdef::backend-docbook[]
 {0#<citerefentry>}
 {0#<refentrytitle>{target}</refentrytitle><manvolnum>{0}</manvolnum>}
 {0#</citerefentry>}
+[bugreport-inlinemacro]
+{0#}
 endif::backend-docbook[]
 
 ifdef::backend-docbook[]
@@ -94,4 +100,6 @@ ifdef::backend-xhtml11[]
 git-relative-html-prefix=
 [linkgit-inlinemacro]
 <a href="{git-relative-html-prefix}{target}.html">{target}{0?({0})}</a>
+[bugreport-inlinemacro]
+<!-- -->
 endif::backend-xhtml11[]
diff --git a/Documentation/asciidoctor-extensions.rb b/Documentation/asciidoctor-extensions.rb
index 0089e0cfb8..54cbd5ddaf 100644
--- a/Documentation/asciidoctor-extensions.rb
+++ b/Documentation/asciidoctor-extensions.rb
@@ -20,9 +20,17 @@ module Git
         end
       end
     end
+    class BugReportProcessor < Asciidoctor::Extensions::InlineMacroProcessor
+      def process(parent, action, attrs)
+        ""
+      end
+    end
   end
 end
 
 Asciidoctor::Extensions.register do
   inline_macro Git::Documentation::LinkGitProcessor, :linkgit
+  # The bugreport macro does nothing as far as rendering is
+  # concerned -- we just grep for it in the sources.
+  inline_macro Git::Documentation::BugReportProcessor, :bugreport
 end
diff --git a/Documentation/config/sendemail.txt b/Documentation/config/sendemail.txt
index 69f3e4f219..92f5082013 100644
--- a/Documentation/config/sendemail.txt
+++ b/Documentation/config/sendemail.txt
@@ -1,63 +1,63 @@
-sendemail.identity:: // bugreport-exclude
+sendemail.identity bugreport:exclude[x] ::
 	A configuration identity. When given, causes values in the
 	'sendemail.<identity>' subsection to take precedence over
 	values in the 'sendemail' section. The default identity is
 	the value of `sendemail.identity`.
 
-sendemail.smtpEncryption:: // bugreport-include
+sendemail.smtpEncryption bugreport:include[x] ::
 	See linkgit:git-send-email[1] for description.  Note that this
 	setting is not subject to the 'identity' mechanism.
 
-sendemail.smtpssl (deprecated):: // bugreport-exclude
+sendemail.smtpssl (deprecated) bugreport:exclude[x] ::
 	Deprecated alias for 'sendemail.smtpEncryption = ssl'.
 
-sendemail.smtpsslcertpath:: // bugreport-exclude
+sendemail.smtpsslcertpath bugreport:exclude[x] ::
 	Path to ca-certificates (either a directory or a single file).
 	Set it to an empty string to disable certificate verification.
 
-sendemail.<identity>.*:: // bugreport-exclude
+sendemail.<identity>.* bugreport:exclude[x] ::
 	Identity-specific versions of the 'sendemail.*' parameters
 	found below, taking precedence over those when this
 	identity is selected, through either the command-line or
 	`sendemail.identity`.
 
-sendemail.aliasesFile:: // bugreport-exclude
-sendemail.aliasFileType:: // bugreport-exclude
-sendemail.annotate:: // bugreport-include
-sendemail.bcc:: // bugreport-include
-sendemail.cc:: // bugreport-include
-sendemail.ccCmd:: // bugreport-include
-sendemail.chainReplyTo:: // bugreport-include
-sendemail.confirm:: // bugreport-include
-sendemail.envelopeSender:: // bugreport-include
-sendemail.from:: // bugreport-include
-sendemail.multiEdit:: // bugreport-include
-sendemail.signedoffbycc:: // bugreport-include
-sendemail.smtpPass:: // bugreport-exclude
-sendemail.suppresscc:: // bugreport-include
-sendemail.suppressFrom:: // bugreport-include
-sendemail.to:: // bugreport-include
-sendemail.tocmd:: // bugreport-include
-sendemail.smtpDomain:: // bugreport-include
-sendemail.smtpServer:: // bugreport-include
-sendemail.smtpServerPort:: // bugreport-include
-sendemail.smtpServerOption:: // bugreport-include
-sendemail.smtpUser:: // bugreport-exclude
-sendemail.thread:: // bugreport-include
-sendemail.transferEncoding:: // bugreport-include
-sendemail.validate:: // bugreport-include
-sendemail.xmailer:: // bugreport-include
+sendemail.aliasesFile bugreport:exclude[x] ::
+sendemail.aliasFileType bugreport:exclude[x] ::
+sendemail.annotate bugreport:include[x] ::
+sendemail.bcc bugreport:include[x] ::
+sendemail.cc bugreport:include[x] ::
+sendemail.ccCmd bugreport:include[x] ::
+sendemail.chainReplyTo bugreport:include[x] ::
+sendemail.confirm bugreport:include[x] ::
+sendemail.envelopeSender bugreport:include[x] ::
+sendemail.from bugreport:include[x] ::
+sendemail.multiEdit bugreport:include[x] ::
+sendemail.signedoffbycc bugreport:include[x] ::
+sendemail.smtpPass bugreport:exclude[x] ::
+sendemail.suppresscc bugreport:include[x] ::
+sendemail.suppressFrom bugreport:include[x] ::
+sendemail.to bugreport:include[x] ::
+sendemail.tocmd bugreport:include[x] ::
+sendemail.smtpDomain bugreport:include[x] ::
+sendemail.smtpServer bugreport:include[x] ::
+sendemail.smtpServerPort bugreport:include[x] ::
+sendemail.smtpServerOption bugreport:include[x] ::
+sendemail.smtpUser bugreport:exclude[x] ::
+sendemail.thread bugreport:include[x] ::
+sendemail.transferEncoding bugreport:include[x] ::
+sendemail.validate bugreport:include[x] ::
+sendemail.xmailer bugreport:include[x] ::
 	See linkgit:git-send-email[1] for description.
 
-sendemail.signedoffcc (deprecated):: // bugreport-exclude
+sendemail.signedoffcc (deprecated) bugreport:exclude[x] ::
 	Deprecated alias for `sendemail.signedoffbycc`.
 
-sendemail.smtpBatchSize:: // bugreport-include
+sendemail.smtpBatchSize bugreport:include[x] ::
 	Number of messages to be sent per connection, after that a relogin
 	will happen.  If the value is 0 or undefined, send all messages in
 	one connection.
 	See also the `--batch-size` option of linkgit:git-send-email[1].
 
-sendemail.smtpReloginDelay:: // bugreport-include
+sendemail.smtpReloginDelay bugreport:include[x] ::
 	Seconds wait before reconnecting to smtp server.
 	See also the `--relogin-delay` option of linkgit:git-send-email[1].
diff --git a/bugreport-generate-config-whitelist.sh b/bugreport-generate-config-whitelist.sh
index ca6b232024..896b838cfa 100755
--- a/bugreport-generate-config-whitelist.sh
+++ b/bugreport-generate-config-whitelist.sh
@@ -1,4 +1,4 @@
 #!/bin/sh
 
-grep -RhPo ".*(?=:: \/\/ bugreport-include)" Documentation/config \
+grep -RhPo ".*(?= +bugreport:include.* ::)" Documentation/config \
   >git-bugreport-config-whitelist
Emily Shaffer Aug. 21, 2019, 5:40 p.m. UTC | #2
On Sat, Aug 17, 2019 at 10:38:46PM +0200, Martin Ågren wrote:
> On Sat, 17 Aug 2019 at 02:42, Emily Shaffer <emilyshaffer@google.com> wrote:
> >
> > Add a new step to the build to generate a whitelist of git-config
> > variables which are appropriate to include in the output of
> > git-bugreport. New variables can be added to the whitelist by annotating
> > their documentation in Documentation/config with the line
> > "// bugreport-include".
> 
> These "// bugreport-include" show up in the rendered manpages, both with
> AsciiDoc and Asciidoctor. :-(

Hmm, I don't see it in the troff or the HTML with asciidoc using `make`
- but I do see with asciidoctor, or `asciidoc -d html5`. Nice catch,
thanks.

> 
> > --- a/Documentation/config/sendemail.txt
> > +++ b/Documentation/config/sendemail.txt
> > @@ -1,63 +1,63 @@
> > -sendemail.identity::
> > +sendemail.identity:: // bugreport-exclude
> >         A configuration identity. When given, causes values in the
> 
> If I put each comment on a line of its own (after the config option, but
> I suppose before would work the same way), Asciidoctor truly ignores
> them and everything's fine. And AsciiDoc renders this one and others
> like it ok.
> 
> > -sendemail.aliasesFile::
> > -sendemail.aliasFileType::
> > -sendemail.annotate::
> > +sendemail.aliasesFile:: // bugreport-exclude
> > +sendemail.aliasFileType:: // bugreport-exclude
> > +sendemail.annotate:: // bugreport-include
> 
> However, AsciiDoc (version 8.6.10) seems to effectively replace those
> comments with an empty line during processing, and it makes quite the
> difference here. Instead of these appearing in a compact comma-separated
> list, they are treated as individual items in the description list with
> no supporting content.
> 
> FWIW, I like the idea of annotating things here to make it harder to
> forget this whitelisting when adding a new config item.
> 
> Below is what I came up with as an alternative approach. Feel free to
> steal, squash and/or ignore as you see fit.

This is cool - I'll add this to the commit chain and build on top of it.
Thanks!

> 
> BTW, I'm not sure the grep invocation is portable (-R ? -h ? -P ? -o ?).

Yeah, I'll see what I can do about that (recursive, no filename,
Perl-compatible regex, match only) - I can probably replace this with
something like `find | xargs pgrep` - I'll keep digging. Thanks.

> We should probably also do the usual "create a candidate output file,
> then move it into place" dance for robustness.
> 
> I do think we should test the generated whitelist in some minimal way,
> e.g., to check that it does contain something which objectively belongs
> in the whitelist and -- more importantly IMHO -- does *not* contain
> something that shouldn't be there, such as sendemail.smtpPass.

Good point, I'll add a test for this.

Thanks very much for this!
 - Emily

Patch
diff mbox series

diff --git a/.gitignore b/.gitignore
index b4f5433084..65cc74c748 100644
--- a/.gitignore
+++ b/.gitignore
@@ -26,6 +26,7 @@ 
 /git-blame
 /git-branch
 /git-bugreport
+/git-bugreport-config-whitelist
 /git-bundle
 /git-cat-file
 /git-check-attr
diff --git a/Documentation/config/sendemail.txt b/Documentation/config/sendemail.txt
index 0006faf800..69f3e4f219 100644
--- a/Documentation/config/sendemail.txt
+++ b/Documentation/config/sendemail.txt
@@ -1,63 +1,63 @@ 
-sendemail.identity::
+sendemail.identity:: // bugreport-exclude
 	A configuration identity. When given, causes values in the
 	'sendemail.<identity>' subsection to take precedence over
 	values in the 'sendemail' section. The default identity is
 	the value of `sendemail.identity`.
 
-sendemail.smtpEncryption::
+sendemail.smtpEncryption:: // bugreport-include
 	See linkgit:git-send-email[1] for description.  Note that this
 	setting is not subject to the 'identity' mechanism.
 
-sendemail.smtpssl (deprecated)::
+sendemail.smtpssl (deprecated):: // bugreport-exclude
 	Deprecated alias for 'sendemail.smtpEncryption = ssl'.
 
-sendemail.smtpsslcertpath::
+sendemail.smtpsslcertpath:: // bugreport-exclude
 	Path to ca-certificates (either a directory or a single file).
 	Set it to an empty string to disable certificate verification.
 
-sendemail.<identity>.*::
+sendemail.<identity>.*:: // bugreport-exclude
 	Identity-specific versions of the 'sendemail.*' parameters
 	found below, taking precedence over those when this
 	identity is selected, through either the command-line or
 	`sendemail.identity`.
 
-sendemail.aliasesFile::
-sendemail.aliasFileType::
-sendemail.annotate::
-sendemail.bcc::
-sendemail.cc::
-sendemail.ccCmd::
-sendemail.chainReplyTo::
-sendemail.confirm::
-sendemail.envelopeSender::
-sendemail.from::
-sendemail.multiEdit::
-sendemail.signedoffbycc::
-sendemail.smtpPass::
-sendemail.suppresscc::
-sendemail.suppressFrom::
-sendemail.to::
-sendemail.tocmd::
-sendemail.smtpDomain::
-sendemail.smtpServer::
-sendemail.smtpServerPort::
-sendemail.smtpServerOption::
-sendemail.smtpUser::
-sendemail.thread::
-sendemail.transferEncoding::
-sendemail.validate::
-sendemail.xmailer::
+sendemail.aliasesFile:: // bugreport-exclude
+sendemail.aliasFileType:: // bugreport-exclude
+sendemail.annotate:: // bugreport-include
+sendemail.bcc:: // bugreport-include
+sendemail.cc:: // bugreport-include
+sendemail.ccCmd:: // bugreport-include
+sendemail.chainReplyTo:: // bugreport-include
+sendemail.confirm:: // bugreport-include
+sendemail.envelopeSender:: // bugreport-include
+sendemail.from:: // bugreport-include
+sendemail.multiEdit:: // bugreport-include
+sendemail.signedoffbycc:: // bugreport-include
+sendemail.smtpPass:: // bugreport-exclude
+sendemail.suppresscc:: // bugreport-include
+sendemail.suppressFrom:: // bugreport-include
+sendemail.to:: // bugreport-include
+sendemail.tocmd:: // bugreport-include
+sendemail.smtpDomain:: // bugreport-include
+sendemail.smtpServer:: // bugreport-include
+sendemail.smtpServerPort:: // bugreport-include
+sendemail.smtpServerOption:: // bugreport-include
+sendemail.smtpUser:: // bugreport-exclude
+sendemail.thread:: // bugreport-include
+sendemail.transferEncoding:: // bugreport-include
+sendemail.validate:: // bugreport-include
+sendemail.xmailer:: // bugreport-include
 	See linkgit:git-send-email[1] for description.
 
-sendemail.signedoffcc (deprecated)::
+sendemail.signedoffcc (deprecated):: // bugreport-exclude
 	Deprecated alias for `sendemail.signedoffbycc`.
 
-sendemail.smtpBatchSize::
+sendemail.smtpBatchSize:: // bugreport-include
 	Number of messages to be sent per connection, after that a relogin
 	will happen.  If the value is 0 or undefined, send all messages in
 	one connection.
 	See also the `--batch-size` option of linkgit:git-send-email[1].
 
-sendemail.smtpReloginDelay::
+sendemail.smtpReloginDelay:: // bugreport-include
 	Seconds wait before reconnecting to smtp server.
 	See also the `--relogin-delay` option of linkgit:git-send-email[1].
diff --git a/Makefile b/Makefile
index 69801a1c45..a62bbefb8f 100644
--- a/Makefile
+++ b/Makefile
@@ -639,6 +639,10 @@  SCRIPT_PYTHON += git-p4.py
 SCRIPT_SH_GEN = $(patsubst %.sh,%,$(SCRIPT_SH))
 SCRIPT_PERL_GEN = $(patsubst %.perl,%,$(SCRIPT_PERL))
 SCRIPT_PYTHON_GEN = $(patsubst %.py,%,$(SCRIPT_PYTHON))
+SCRIPT_DEPENDENCIES = git-bugreport-config-whitelist
+
+$(SCRIPT_DEPENDENCIES): Documentation/config/*.txt
+	sh bugreport-generate-config-whitelist.sh
 
 # Individual rules to allow e.g.
 # "make -C ../.. SCRIPT_PERL=contrib/foo/bar.perl build-perl-script"
@@ -656,17 +660,20 @@  install-perl-script: $(SCRIPT_PERL_GEN)
 install-python-script: $(SCRIPT_PYTHON_GEN)
 	$(INSTALL) $^ '$(DESTDIR_SQ)$(gitexec_instdir_SQ)'
 
-.PHONY: clean-perl-script clean-sh-script clean-python-script
+.PHONY: clean-perl-script clean-sh-script clean-python-script clean-script-dependencies
 clean-sh-script:
 	$(RM) $(SCRIPT_SH_GEN)
 clean-perl-script:
 	$(RM) $(SCRIPT_PERL_GEN)
 clean-python-script:
 	$(RM) $(SCRIPT_PYTHON_GEN)
+clean-script-dependencies:
+	$(RM) $(SCRIPT_DEPENDENCIES)
 
 SCRIPTS = $(SCRIPT_SH_GEN) \
 	  $(SCRIPT_PERL_GEN) \
 	  $(SCRIPT_PYTHON_GEN) \
+	  $(SCRIPT_DEPENDENCIES) \
 	  git-instaweb
 
 ETAGS_TARGET = TAGS
diff --git a/bugreport-generate-config-whitelist.sh b/bugreport-generate-config-whitelist.sh
new file mode 100755
index 0000000000..ca6b232024
--- /dev/null
+++ b/bugreport-generate-config-whitelist.sh
@@ -0,0 +1,4 @@ 
+#!/bin/sh
+
+grep -RhPo ".*(?=:: \/\/ bugreport-include)" Documentation/config \
+  >git-bugreport-config-whitelist
diff --git a/git-bugreport-config-whitelist b/git-bugreport-config-whitelist
deleted file mode 100644
index e4f07f7175..0000000000
--- a/git-bugreport-config-whitelist
+++ /dev/null
@@ -1,22 +0,0 @@ 
-http.version
-protocol.version
-protocol.persistent-https.allow
-protocol.rpc.allow
-protocol.sso.allow
-submodule.repolike
-trace2.eventtarget
-trace2.configparams
-color.ui
-core.pager
-sendemail.smtpencryption
-sendemail.smtpserver
-sendemail.smtpserverport
-sendemail.smtpsslcertpath
-credential.helper
-merge.tool
-grep.linenumber
-rerere.enabled
-core.repositoryformatversion
-core.filemode
-core.bare
-core.logallrefupdates