[v2] fetch-pack: redact packfile urls in traces

Message ID pull.1052.v2.git.1633746024175.gitgitgadget@gmail.com (mailing list archive)
State Superseded

Commit Message

Ivan Frade Oct. 9, 2021, 2:20 a.m. UTC
From: Ivan Frade <ifrade@google.com>

In some setups, packfile uris act as bearer tokens. It is not
recommended to expose them plainly in logs, although in special
circumstances (e.g. debug) it makes sense to write them.

Redact the packfile-uri lines by default, unless the GIT_TRACE_REDACT
variable is set to false. This mimics the redacting of the
Authorization header in HTTP.

Changes since v1:
- Removed non-POSIX flags in tests
- More accurate regex for the non-encrypted packfile line
- Dropped documentation change
- Dropped redacting the die message in http-fetch

Signed-off-by: Ivan Frade <ifrade@google.com>
---
    fetch-pack: redact packfile urls in traces
    
    In some setups, packfile uris act as bearer tokens. It is not recommended
    to expose them plainly in logs, although in special circumstances (e.g.
    debug) it makes sense to write them.
    
    Redact the packfile-uri lines by default, unless the GIT_TRACE_REDACT
    variable is set to false. This mimics the redacting of the Authorization
    header in HTTP.
    
    Signed-off-by: Ivan Frade ifrade@google.com
    
    cc: Ævar Arnfjörð Bjarmason avarab@gmail.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1052%2Fifradeo%2Fredact-packfile-uri-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1052/ifradeo/redact-packfile-uri-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1052

Range-diff vs v1:

 1:  b473f145a87 ! 1:  701cb7a6ab9 fetch-pack: redact packfile urls in traces
     @@ Commit message
          variable is set to false. This mimics the redacting of the
          Authorization header in HTTP.
      
     +    Changes since v1:
     +    - Removed non-POSIX flags in tests
     +    - More accurate regex for the non-encrypted packfile line
     +    - Dropped documentation change
     +    - Dropped redacting the die message in http-fetch
     +
          Signed-off-by: Ivan Frade <ifrade@google.com>
      
       ## fetch-pack.c ##
     @@ fetch-pack.c: static void receive_packfile_uris(struct packet_reader *reader,
       		die("expected DELIM");
       }
      
     - ## http-fetch.c ##
     -@@ http-fetch.c: static void fetch_single_packfile(struct object_id *packfile_hash,
     - 	if (start_active_slot(preq->slot)) {
     - 		run_active_slot(preq->slot);
     - 		if (results.curl_result != CURLE_OK) {
     --			die("Unable to get pack file %s\n%s", preq->url,
     -+			int showUrl = git_env_bool("GIT_TRACE_REDACT", 1);
     -+			die("Unable to get offloaded pack file %s\n%s",
     -+			    showUrl ? preq->url : "<redacted>",
     - 			    curl_errorstr);
     - 		}
     - 	} else {
     -
       ## pkt-line.c ##
      @@ pkt-line.c: enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
       		len--;
     @@ t/t5702-protocol-v2.sh: test_expect_success 'packfile-uri with transfer.fsckobje
      +		-c fetch.uriprotocols=http,https \
      +		clone "$HTTPD_URL/smart/http_parent" http_child &&
      +
     -+	grep -A1 "clone<\ ..packfile-uris" log | grep "clone<\ <redacted>"
     ++	grep "clone< <redacted>" log
      +'
      +
      +test_expect_success 'packfile-uri not redacted in trace when GIT_TRACE_REDACT=0' '
     @@ t/t5702-protocol-v2.sh: test_expect_success 'packfile-uri with transfer.fsckobje
      +		-c fetch.uriprotocols=http,https \
      +		clone "$HTTPD_URL/smart/http_parent" http_child &&
      +
     -+	grep -A1 "clone<\ ..packfile-uris" log  | grep -E "clone<\ ..[[:alnum:]]{40,64}\ http"
     ++	grep -E "clone< ..[0-9a-f]{40,64} http://" log
      +'
      +
       test_expect_success 'http:// --negotiate-only' '
 2:  497c5fd18d7 < -:  ----------- Documentation: packfile-uri hash can be longer than 40 hex chars


 fetch-pack.c           | 11 +++++++++++
 pkt-line.c             |  7 ++++++-
 pkt-line.h             |  1 +
 t/t5702-protocol-v2.sh | 43 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 61 insertions(+), 1 deletion(-)


base-commit: 0785eb769886ae81e346df10e88bc49ffc0ac64e

Comments

Junio C Hamano Oct. 11, 2021, 8:39 p.m. UTC | #1
"Ivan Frade via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Ivan Frade <ifrade@google.com>
>
> In some setups, packfile uris act as bearer tokens. It is not
> recommended to expose them plainly in logs, although in special
> circumstances (e.g. debug) it makes sense to write them.
>
> Redact the packfile-uri lines by default, unless the GIT_TRACE_REDACT
> variable is set to false. This mimics the redacting of the
> Authorization header in HTTP.

Well explained.

It of course is a different matter if the explained idea is
agreeable, though ;-).  Hiding the entire packet, based on the "it
might be in some setups" seems a bit too much.

Is it often the case that the whole URI is sensitive, or perhaps
leading "<scheme>://<host>/pack-<abc>.pack" part is not sensitive at
all, and what follows after that "public" part has some "nonce"
material that makes it sensitive?

> Changes since v1:
> - Removed non-POSIX flags in tests
> - More accurate regex for the non-encrypted packfile line
> - Dropped documentation change
> - Dropped redacting the die message in http-fetch

These are not for those who read "git log" in 3 months, as they may
not even have seen the "v1".  But these are very helpful for those
who read the "v1" to see how good this round is.  Please write such
material below the three-dash line.

> Signed-off-by: Ivan Frade <ifrade@google.com>
> ---

i.e. here.

>     fetch-pack: redact packfile urls in traces
>     
>     In some setups, packfile uris act as bearer tokens. It is not recommended
>     to expose them plainly in logs, although in special circumstances (e.g.
>     debug) it makes sense to write them.
>     
>     Redact the packfile-uri lines by default, unless the GIT_TRACE_REDACT
>     variable is set to false. This mimics the redacting of the Authorization
>     header in HTTP.
>     
>     Signed-off-by: Ivan Frade ifrade@google.com
>     
>     cc: Ævar Arnfjörð Bjarmason avarab@gmail.com

And there is no need to duplicate the log message here ;-)

> diff --git a/fetch-pack.c b/fetch-pack.c
> index a9604f35a3e..05c85eeafa1 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -1518,7 +1518,16 @@ static void receive_wanted_refs(struct packet_reader *reader,
>  static void receive_packfile_uris(struct packet_reader *reader,
>  				  struct string_list *uris)
>  {
> +	int original_options;
>  	process_section_header(reader, "packfile-uris", 0);
> +	/*
> +	 * In some setups, packfile-uris act as bearer tokens,
> +	 * redact them by default.
> +	 */
> +	original_options = reader->options;
> +	if (git_env_bool("GIT_TRACE_REDACT", 1))
> +		reader->options |= PACKET_READ_REDACT_ON_TRACE;
> +
>  	while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
>  		if (reader->pktlen < the_hash_algo->hexsz ||
>  		    reader->line[the_hash_algo->hexsz] != ' ')
> @@ -1526,6 +1535,8 @@ static void receive_packfile_uris(struct packet_reader *reader,
>  
>  		string_list_append(uris, reader->line);
>  	}
> +	reader->options = original_options;

So "original_options" is used to save away the reader->options so
that it can be restored before returning to our caller?  

OK (it may be more common in this codebase to call such a variable
"saved_X", though).

> diff --git a/pkt-line.c b/pkt-line.c
> index de4a94b437e..8da8ed88ccf 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -443,7 +443,12 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
>  		len--;
>  
>  	buffer[len] = 0;
> -	packet_trace(buffer, len, 0);
> +	if (options & PACKET_READ_REDACT_ON_TRACE) {
> +		const char *redacted = "<redacted>";
> +		packet_trace(redacted, strlen(redacted), 0);
> +	} else {
> +		packet_trace(buffer, len, 0);
> +	}
> ...
> +	GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \
> +	git -c protocol.version=2 \
> +		-c fetch.uriprotocols=http,https \
> +		clone "$HTTPD_URL/smart/http_parent" http_child &&
> +
> +	grep "clone< <redacted>" log

This checks only that "redacted" string appears, but what the theme
of the change really cares about is different, no?  You want to
ensure that no sensitive substring of the URI appears in the log.

Imagine somebody breaking the redact logic by making it prepend that
string to the payload, instead of replacing the payload with that
string---this test will not catch such a regression.
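
To illustrate the point above, here is a minimal sketch of a stricter check. It assumes the trace format used by the tests in this patch ("clone< " followed by the payload); the helper name trace_hides_packfile_uri is hypothetical and not part of Git's test library. The check requires the marker and also rejects any surviving "<hash> <uri>" payload, so a "prepend instead of replace" regression would fail it.

```shell
# Hypothetical helper: succeed only if the packet trace shows the
# redaction marker AND no packfile-uri payload leaked into the log.
trace_hides_packfile_uri () {
	log=$1
	# The marker must be present ...
	grep "clone< <redacted>" "$log" >/dev/null || return 1
	# ... and no "<hash> <uri>" payload may survive anywhere in the log.
	! grep -E "[0-9a-f]{40,64} https?://" "$log" >/dev/null
}
```

In a test body this could be called as `trace_hides_packfile_uri log` in place of the bare grep for the marker.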

Thanks.

Ivan Frade Oct. 26, 2021, 7:32 p.m. UTC | #2
It seems I sent my original reply only to the GitHub PR. Sorry for the
confusion:

On Mon, Oct 11, 2021 at 1:39 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Ivan Frade via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Ivan Frade <ifrade@google.com>
...
>
> It of course is a different matter if the explained idea is
> agreeable, though ;-).  Hiding the entire packet, based on the "it
> might be in some setups" seems a bit too much.
>
> Is it often the case that the whole URI is sensitive, or perhaps
> leading "<scheme>://<host>/pack-<abc>.pack" part is not sensitive at
> all, and what follows after that "public" part has some "nonce"
> material that makes it sensitive?

In the specific case I am working on, the path of the URL is an
encrypted string that shouldn't be completely exposed (exposing part
of it would be fine). In general, I think we can assume that
<scheme>://<host>/ are always "public" but the path could be
sensitive.

We could redact only the path (<scheme>://<host>/REDACTED), or perhaps
only a fixed-length portion of the URL (<scheme>://<host>/pack-<xxREDACTED).

In the next version of the patch I go with redacting the path.
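
As an illustration of redacting only the path, here is a minimal shell sketch. It is not the code from the next version of the patch: the helper name redact_uri_path is hypothetical, and it assumes the URI has the simple <scheme>://<host>/<path> shape discussed above.

```shell
# Hypothetical helper: keep "<scheme>://<host>/" and replace everything
# after it with a fixed marker. A URI with no path is printed unchanged.
redact_uri_path () {
	printf '%s\n' "$1" |
	sed 's#^\([A-Za-z][A-Za-z0-9+.-]*://[^/]*/\).*#\1<redacted>#'
}

redact_uri_path 'https://example.com/pack-abc.pack?token=x'
# prints: https://example.com/<redacted>
```

Keeping the scheme and host in the trace still tells the reader which server was contacted, while the part of the URI that may act as a credential never reaches the log.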


> > Changes since v1:
...
>  Please write such material below the three-dash line.
Done

> And there is no need to duplicate the log message here ;-)
Done

> So "original_options" is used to save away the reader->options so
> that it can be restored before returning to our caller?
>
> OK (it may be more common in this codebase to call such a variable
> "saved_X", though).

In the latest iteration, the option is enabled for all sections and
there is no need to set/unset the flag.

> > +     grep "clone< <redacted>" log
>
> This checks only that "redacted" string appears, but what the theme
> of the change really cares about is different, no?  You want to
> ensure that no sensitive substring of the URI appears in the log.
>
> Imagine somebody breaking the redact logic by making it prepend that
> string to the payload, instead of replacing the payload with that
> string---this test will not catch such a regression.

Now the tests verify that the full expected packfile-uri line is in the log.

Thanks,

Ivan

Patch

diff --git a/fetch-pack.c b/fetch-pack.c
index a9604f35a3e..05c85eeafa1 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1518,7 +1518,16 @@  static void receive_wanted_refs(struct packet_reader *reader,
 static void receive_packfile_uris(struct packet_reader *reader,
 				  struct string_list *uris)
 {
+	int original_options;
 	process_section_header(reader, "packfile-uris", 0);
+	/*
+	 * In some setups, packfile-uris act as bearer tokens,
+	 * redact them by default.
+	 */
+	original_options = reader->options;
+	if (git_env_bool("GIT_TRACE_REDACT", 1))
+		reader->options |= PACKET_READ_REDACT_ON_TRACE;
+
 	while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
 		if (reader->pktlen < the_hash_algo->hexsz ||
 		    reader->line[the_hash_algo->hexsz] != ' ')
@@ -1526,6 +1535,8 @@  static void receive_packfile_uris(struct packet_reader *reader,
 
 		string_list_append(uris, reader->line);
 	}
+	reader->options = original_options;
+
 	if (reader->status != PACKET_READ_DELIM)
 		die("expected DELIM");
 }
diff --git a/pkt-line.c b/pkt-line.c
index de4a94b437e..8da8ed88ccf 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -443,7 +443,12 @@  enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
 		len--;
 
 	buffer[len] = 0;
-	packet_trace(buffer, len, 0);
+	if (options & PACKET_READ_REDACT_ON_TRACE) {
+		const char *redacted = "<redacted>";
+		packet_trace(redacted, strlen(redacted), 0);
+	} else {
+		packet_trace(buffer, len, 0);
+	}
 
 	if ((options & PACKET_READ_DIE_ON_ERR_PACKET) &&
 	    starts_with(buffer, "ERR "))
diff --git a/pkt-line.h b/pkt-line.h
index 82b95e4bdd3..44c02f3bc6e 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -88,6 +88,7 @@  void packet_fflush(FILE *f);
 #define PACKET_READ_CHOMP_NEWLINE        (1u<<1)
 #define PACKET_READ_DIE_ON_ERR_PACKET    (1u<<2)
 #define PACKET_READ_GENTLE_ON_READ_ERROR (1u<<3)
+#define PACKET_READ_REDACT_ON_TRACE      (1u<<4)
 int packet_read(int fd, char **src_buffer, size_t *src_len, char
 		*buffer, unsigned size, int options);
 
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index d527cf6c49f..f0273317861 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -1107,6 +1107,49 @@  test_expect_success 'packfile-uri with transfer.fsckobjects fails when .gitmodul
 	test_i18ngrep "disallowed submodule name" err
 '
 
+test_expect_success 'packfile-uri redacted in trace' '
+	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+	rm -rf "$P" http_child log &&
+
+	git init "$P" &&
+	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
+
+	echo my-blob >"$P/my-blob" &&
+	git -C "$P" add my-blob &&
+	git -C "$P" commit -m x &&
+
+	configure_exclusion "$P" my-blob >h &&
+
+	GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \
+	git -c protocol.version=2 \
+		-c fetch.uriprotocols=http,https \
+		clone "$HTTPD_URL/smart/http_parent" http_child &&
+
+	grep "clone< <redacted>" log
+'
+
+test_expect_success 'packfile-uri not redacted in trace when GIT_TRACE_REDACT=0' '
+	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+	rm -rf "$P" http_child log &&
+
+	git init "$P" &&
+	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
+
+	echo my-blob >"$P/my-blob" &&
+	git -C "$P" add my-blob &&
+	git -C "$P" commit -m x &&
+
+	configure_exclusion "$P" my-blob >h &&
+
+	GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \
+	GIT_TRACE_REDACT=0 \
+	git -c protocol.version=2 \
+		-c fetch.uriprotocols=http,https \
+		clone "$HTTPD_URL/smart/http_parent" http_child &&
+
+	grep -E "clone< ..[0-9a-f]{40,64} http://" log
+'
+
 test_expect_success 'http:// --negotiate-only' '
 	SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" &&
 	URI="$HTTPD_URL/smart/server" &&