[1/2] fetch-pack: redact packfile urls in traces

Message ID b473f145a87a22db99734c6a21395f0d24c3da3c.1633708986.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Series fetch-pack: redact packfile urls in traces

Commit Message

Ivan Frade Oct. 8, 2021, 4:03 p.m. UTC
From: Ivan Frade <ifrade@google.com>

In some setups, packfile URIs act as bearer tokens. Exposing them
plainly in logs is not recommended, although in special circumstances
(e.g. debugging) it makes sense to write them.

Redact the packfile-uri lines by default, unless the GIT_TRACE_REDACT
variable is set to false. This mimics the redacting of the
Authorization header in HTTP.

Signed-off-by: Ivan Frade <ifrade@google.com>
---
 fetch-pack.c           | 11 +++++++++++
 http-fetch.c           |  4 +++-
 pkt-line.c             |  7 ++++++-
 pkt-line.h             |  1 +
 t/t5702-protocol-v2.sh | 43 ++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 64 insertions(+), 2 deletions(-)
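
As a rough illustration of the approach the commit message describes
(redact by default, opt out through GIT_TRACE_REDACT), here is a
minimal C sketch. It is not the code from this patch:
sketch_env_bool() is a simplified stand-in for git_env_bool(), and
SKETCH_REDACT_ON_TRACE plus the other sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag, modeled on PACKET_READ_REDACT_ON_TRACE in the patch. */
#define SKETCH_REDACT_ON_TRACE (1u << 4)

/*
 * Simplified stand-in for git_env_bool() (an assumption, not git's parser):
 * returns `def` when the variable is unset, and otherwise treats
 * "", "0", "false", "no" and "off" as false and anything else as true.
 */
static int sketch_env_bool(const char *name, int def)
{
	const char *v = getenv(name);

	if (!v)
		return def;
	return !(!*v || !strcmp(v, "0") || !strcmp(v, "false") ||
		 !strcmp(v, "no") || !strcmp(v, "off"));
}

/* Set the redaction bit unless the environment opts out. */
static unsigned sketch_section_options(unsigned options)
{
	if (sketch_env_bool("GIT_TRACE_REDACT", 1))
		options |= SKETCH_REDACT_ON_TRACE;
	return options;
}

/* Pick the text that should reach the trace log for one packet line. */
static const char *sketch_trace_text(const char *line, unsigned options)
{
	assert(line != NULL);
	return (options & SKETCH_REDACT_ON_TRACE) ? "<redacted>" : line;
}
```

A caller would save its current options, apply sketch_section_options()
while reading the sensitive section, and restore the saved value
afterwards, so that only that section is redacted.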

Comments

Ævar Arnfjörð Bjarmason Oct. 8, 2021, 7:36 p.m. UTC | #1
On Fri, Oct 08 2021, Ivan Frade via GitGitGadget wrote:

> diff --git a/http-fetch.c b/http-fetch.c
> index fa642462a9e..d35e33e4f65 100644
> --- a/http-fetch.c
> +++ b/http-fetch.c
> @@ -63,7 +63,9 @@ static void fetch_single_packfile(struct object_id *packfile_hash,
>  	if (start_active_slot(preq->slot)) {
>  		run_active_slot(preq->slot);
>  		if (results.curl_result != CURLE_OK) {
> -			die("Unable to get pack file %s\n%s", preq->url,
> +			int redact = git_env_bool("GIT_TRACE_REDACT", 1);
> +			die("Unable to get offloaded pack file %s\n%s",
> +			    redact ? "<redacted>" : preq->url,
>  			    curl_errorstr);
>  		}
>  	} else {

Your CL and commit message just talk about traces, but this is a die()
message.

Perhaps it makes sense to redact it there too for some reason, but that
seems to be a thing to separately argue for.

This message is shown interactively to users, and I could see it be
annoying to not have the URL that failed in your terminal output, even
if it has some one-time token.

Which is presumably different from the use-cases you're thinking of, I'm
assuming some logging of detached processes, or central logging of user
actions.

> +test_expect_success 'packfile-uri redacted in trace' '
> +	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
> +	rm -rf "$P" http_child log &&
> +
> +	git init "$P" &&
> +	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
> +
> +	echo my-blob >"$P/my-blob" &&
> +	git -C "$P" add my-blob &&
> +	git -C "$P" commit -m x &&
> +
> +	configure_exclusion "$P" my-blob >h &&
> +
> +	GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \
> +	git -c protocol.version=2 \
> +		-c fetch.uriprotocols=http,https \
> +		clone "$HTTPD_URL/smart/http_parent" http_child &&
> +
> +	grep -A1 "clone<\ ..packfile-uris" log | grep "clone<\ <redacted>"

We don't rely on GNU options like those for the test suite, it'll break
on various supported platforms.

In this case the whole LHS of the pipe looks like it could be dropped,
why not grep for "^clone< <redacted>"?

Also you don't need to quote the space character in regexes, it's not a
metacharacter.

Ivan Frade Oct. 8, 2021, 11:15 p.m. UTC | #2
On Fri, Oct 8, 2021 at 12:42 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Fri, Oct 08 2021, Ivan Frade via GitGitGadget wrote:
>
> > diff --git a/http-fetch.c b/http-fetch.c
> > index fa642462a9e..d35e33e4f65 100644
> > --- a/http-fetch.c
> > +++ b/http-fetch.c
> > @@ -63,7 +63,9 @@ static void fetch_single_packfile(struct object_id *packfile_hash,
> >       if (start_active_slot(preq->slot)) {
> >               run_active_slot(preq->slot);
> >               if (results.curl_result != CURLE_OK) {
> > -                     die("Unable to get pack file %s\n%s", preq->url,
> > +                     int redact = git_env_bool("GIT_TRACE_REDACT", 1);
> > +                     die("Unable to get offloaded pack file %s\n%s",
> > +                         redact ? "<redacted>" : preq->url,
> >                           curl_errorstr);
> >               }
> >       } else {
>
> Your CL and commit message just talk about traces, but this is a die()
> message.
>
> Perhaps it makes sense to redact it there too for some reason, but that
> seems to be a thing to separately argue for.
>
> This message is shown interactively to users, and I could see it be
> annoying to not have the URL that failed in your terminal output, even
> if it has some one-time token.


For a regular user the URL could be confusing (should they click on
it? try to download it by themselves?). I also got a suggestion to
print e.g. only the domain and maybe the packname.

In any case, I agree it is a different thing than trace logging. I
removed it from this patch.
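
The suggestion above (print only the domain) could look roughly like
the following C sketch. It is an illustration only and not part of
this series; sketch_url_origin() is a hypothetical helper that does
naive string handling instead of full URL parsing, and it keeps any
port number as part of the host.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy "scheme://host" from `url` into `out`, dropping any userinfo,
 * path, query and fragment, which is where a token would usually live.
 * Returns 0 on success, -1 if `url` has no "://", has an empty host,
 * or the result does not fit in `outsz` bytes.
 */
static int sketch_url_origin(const char *url, char *out, size_t outsz)
{
	const char *sep, *host, *end, *at;
	size_t scheme_len, host_len;

	assert(url != NULL && out != NULL);

	sep = strstr(url, "://");
	if (!sep)
		return -1;

	host = sep + 3;
	end = host + strcspn(host, "/?#");

	/* Skip "user:password@" if it appears before the host. */
	for (at = host; at < end; at++)
		if (*at == '@')
			host = at + 1;

	scheme_len = (size_t)(sep - url) + 3;
	host_len = (size_t)(end - host);
	if (host_len == 0 || scheme_len + host_len + 1 > outsz)
		return -1;

	memcpy(out, url, scheme_len);
	memcpy(out + scheme_len, host, host_len);
	out[scheme_len + host_len] = '\0';
	return 0;
}
```

An error path could then print the shortened form and fall back to
"<redacted>" whenever the helper returns -1.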

>
> > +
> > +     grep -A1 "clone<\ ..packfile-uris" log | grep "clone<\ <redacted>"
>
> We don't rely on GNU options like those for the test suite, it'll break
> on various supported platforms.
>
> In this case the whole LHS of the pipe looks like it could be dropped,
> why not grep for "^clone< <redacted>"?
>
>
> Also you don't need to quote the space character in regexes, it's not a
> metacharacter.

Updated the grep expressions to look only for the relevant lines and
removed the escaping of the space char.

I was trying to limit the grep to the "packfile-uri" section, not to
match something else by accident, but I think "obj-id http://"
shouldn't match anything else in the clone response (no ref can start
with http://).
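
For reference, the "obj-id http://" line shape mentioned above can be
checked along these lines. This is a sketch modeled on the length and
separator test in receive_packfile_uris(), with an added scheme check;
sketch_is_packfile_uri_line() is a hypothetical name and `hexsz`
stands in for the_hash_algo->hexsz.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Return 1 if `line` looks like "<hexsz hex digits> SP http(s)://...",
 * 0 otherwise.
 */
static int sketch_is_packfile_uri_line(const char *line, size_t hexsz)
{
	size_t i, len;

	assert(line != NULL);

	len = strlen(line);
	if (len < hexsz + 1 || line[hexsz] != ' ')
		return 0;

	/* The object id part must consist of hex digits only. */
	for (i = 0; i < hexsz; i++)
		if (!isxdigit((unsigned char)line[i]))
			return 0;

	return !strncmp(line + hexsz + 1, "http://", 7) ||
	       !strncmp(line + hexsz + 1, "https://", 8);
}
```

A ref advertisement such as "<oid> refs/heads/main" fails the scheme
check, which is the property the grep in the test relies on.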

Thanks for the quick review!

Ivan

Patch

diff --git a/fetch-pack.c b/fetch-pack.c
index a9604f35a3e..05c85eeafa1 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1518,7 +1518,16 @@  static void receive_wanted_refs(struct packet_reader *reader,
 static void receive_packfile_uris(struct packet_reader *reader,
 				  struct string_list *uris)
 {
+	int original_options;
 	process_section_header(reader, "packfile-uris", 0);
+	/*
+	 * In some setups, packfile-uris act as bearer tokens;
+	 * redact them from traces by default.
+	 */
+	original_options = reader->options;
+	if (git_env_bool("GIT_TRACE_REDACT", 1))
+		reader->options |= PACKET_READ_REDACT_ON_TRACE;
+
 	while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
 		if (reader->pktlen < the_hash_algo->hexsz ||
 		    reader->line[the_hash_algo->hexsz] != ' ')
@@ -1526,6 +1535,8 @@  static void receive_packfile_uris(struct packet_reader *reader,
 
 		string_list_append(uris, reader->line);
 	}
+	reader->options = original_options;
+
 	if (reader->status != PACKET_READ_DELIM)
 		die("expected DELIM");
 }
diff --git a/http-fetch.c b/http-fetch.c
index fa642462a9e..d35e33e4f65 100644
--- a/http-fetch.c
+++ b/http-fetch.c
@@ -63,7 +63,9 @@  static void fetch_single_packfile(struct object_id *packfile_hash,
 	if (start_active_slot(preq->slot)) {
 		run_active_slot(preq->slot);
 		if (results.curl_result != CURLE_OK) {
-			die("Unable to get pack file %s\n%s", preq->url,
+			int redact = git_env_bool("GIT_TRACE_REDACT", 1);
+			die("Unable to get offloaded pack file %s\n%s",
+			    redact ? "<redacted>" : preq->url,
 			    curl_errorstr);
 		}
 	} else {
diff --git a/pkt-line.c b/pkt-line.c
index de4a94b437e..8da8ed88ccf 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -443,7 +443,12 @@  enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
 		len--;
 
 	buffer[len] = 0;
-	packet_trace(buffer, len, 0);
+	if (options & PACKET_READ_REDACT_ON_TRACE) {
+		const char *redacted = "<redacted>";
+		packet_trace(redacted, strlen(redacted), 0);
+	} else {
+		packet_trace(buffer, len, 0);
+	}
 
 	if ((options & PACKET_READ_DIE_ON_ERR_PACKET) &&
 	    starts_with(buffer, "ERR "))
diff --git a/pkt-line.h b/pkt-line.h
index 82b95e4bdd3..44c02f3bc6e 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -88,6 +88,7 @@  void packet_fflush(FILE *f);
 #define PACKET_READ_CHOMP_NEWLINE        (1u<<1)
 #define PACKET_READ_DIE_ON_ERR_PACKET    (1u<<2)
 #define PACKET_READ_GENTLE_ON_READ_ERROR (1u<<3)
+#define PACKET_READ_REDACT_ON_TRACE      (1u<<4)
 int packet_read(int fd, char **src_buffer, size_t *src_len, char
 		*buffer, unsigned size, int options);
 
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index d527cf6c49f..a620a678a56 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -1107,6 +1107,49 @@  test_expect_success 'packfile-uri with transfer.fsckobjects fails when .gitmodul
 	test_i18ngrep "disallowed submodule name" err
 '
 
+test_expect_success 'packfile-uri redacted in trace' '
+	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+	rm -rf "$P" http_child log &&
+
+	git init "$P" &&
+	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
+
+	echo my-blob >"$P/my-blob" &&
+	git -C "$P" add my-blob &&
+	git -C "$P" commit -m x &&
+
+	configure_exclusion "$P" my-blob >h &&
+
+	GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \
+	git -c protocol.version=2 \
+		-c fetch.uriprotocols=http,https \
+		clone "$HTTPD_URL/smart/http_parent" http_child &&
+
+	grep -A1 "clone<\ ..packfile-uris" log | grep "clone<\ <redacted>"
+'
+
+test_expect_success 'packfile-uri not redacted in trace when GIT_TRACE_REDACT=0' '
+	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+	rm -rf "$P" http_child log &&
+
+	git init "$P" &&
+	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
+
+	echo my-blob >"$P/my-blob" &&
+	git -C "$P" add my-blob &&
+	git -C "$P" commit -m x &&
+
+	configure_exclusion "$P" my-blob >h &&
+
+	GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \
+	GIT_TRACE_REDACT=0 \
+	git -c protocol.version=2 \
+		-c fetch.uriprotocols=http,https \
+		clone "$HTTPD_URL/smart/http_parent" http_child &&
+
+	grep -A1 "clone<\ ..packfile-uris" log  | grep -E "clone<\ ..[[:alnum:]]{40,64}\ http"
+'
+
 test_expect_success 'http:// --negotiate-only' '
 	SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" &&
 	URI="$HTTPD_URL/smart/server" &&