Message ID | b473f145a87a22db99734c6a21395f0d24c3da3c.1633708986.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Series | fetch-pack: redact packfile urls in traces |
On Fri, Oct 08 2021, Ivan Frade via GitGitGadget wrote:

> diff --git a/http-fetch.c b/http-fetch.c
> index fa642462a9e..d35e33e4f65 100644
> --- a/http-fetch.c
> +++ b/http-fetch.c
> @@ -63,7 +63,9 @@ static void fetch_single_packfile(struct object_id *packfile_hash,
>  	if (start_active_slot(preq->slot)) {
>  		run_active_slot(preq->slot);
>  		if (results.curl_result != CURLE_OK) {
> -			die("Unable to get pack file %s\n%s", preq->url,
> +			int showUrl = git_env_bool("GIT_TRACE_REDACT", 1);
> +			die("Unable to get offloaded pack file %s\n%s",
> +			    showUrl ? preq->url : "<redacted>",
>  			    curl_errorstr);
>  		}
>  	} else {

Your CL and commit message just talk about traces, but this is a die()
message.

Perhaps it makes sense to redact it there too for some reason, but that
seems to be a thing to separately argue for.

This message is shown interactively to users, and I could see it be
annoying to not have the URL that failed in your terminal output, even
if it has some one-time token. Which is presumably different from the
use-cases you're thinking of, I'm assuming some logging of detached
processes, or central logging of user actions.

> +test_expect_success 'packfile-uri redacted in trace' '
> +	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
> +	rm -rf "$P" http_child log &&
> +
> +	git init "$P" &&
> +	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
> +
> +	echo my-blob >"$P/my-blob" &&
> +	git -C "$P" add my-blob &&
> +	git -C "$P" commit -m x &&
> +
> +	configure_exclusion "$P" my-blob >h &&
> +
> +	GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \
> +	git -c protocol.version=2 \
> +		-c fetch.uriprotocols=http,https \
> +		clone "$HTTPD_URL/smart/http_parent" http_child &&
> +
> +	grep -A1 "clone<\ ..packfile-uris" log | grep "clone<\ <redacted>"

We don't rely on GNU options like those for the test suite, it'll break
on various supported platforms.

In this case the whole LHS of the pipe looks like it could be dropped,
why not grep for "^clone< <redacted>"?

Also you don't need to quote the space character in regexes, it's not a metacharacter.
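
A minimal sketch of the kind of check suggested here, using only POSIX grep. The contents of `log` below are hypothetical and heavily simplified (a real GIT_TRACE_PACKET log carries extra prefixes on every line), so the pattern is left unanchored. Dropping the backslash before the space also avoids a construct whose meaning POSIX leaves undefined in basic regular expressions.

```shell
#!/bin/sh
# Hypothetical, simplified packet trace; not real git output.
cat >log <<'EOF'
packet:        clone< packfile-uris
packet:        clone< <redacted>
packet:        clone< 0001
EOF

# Plain POSIX grep: no -A/-B context options and no escaped space.
# "<" and ">" are ordinary characters in a basic regular expression.
if grep "clone< <redacted>" log >/dev/null
then
	echo "redacted line found"
else
	echo "redacted line missing"
fi
```

A single grep is enough because the literal `<redacted>` marker is already specific to the section being checked.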
On Fri, Oct 8, 2021 at 12:42 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
>
> On Fri, Oct 08 2021, Ivan Frade via GitGitGadget wrote:
>
> > diff --git a/http-fetch.c b/http-fetch.c
> > index fa642462a9e..d35e33e4f65 100644
> > --- a/http-fetch.c
> > +++ b/http-fetch.c
> > @@ -63,7 +63,9 @@ static void fetch_single_packfile(struct object_id *packfile_hash,
> >  	if (start_active_slot(preq->slot)) {
> >  		run_active_slot(preq->slot);
> >  		if (results.curl_result != CURLE_OK) {
> > -			die("Unable to get pack file %s\n%s", preq->url,
> > +			int showUrl = git_env_bool("GIT_TRACE_REDACT", 1);
> > +			die("Unable to get offloaded pack file %s\n%s",
> > +			    showUrl ? preq->url : "<redacted>",
> >  			    curl_errorstr);
> >  		}
> >  	} else {
>
> Your CL and commit message just talk about traces, but this is a die()
> message.
>
> Perhaps it makes sense to redact it there too for some reason, but that
> seems to be a thing to separately argue for.
>
> This message is shown interactively to users, and I could see it be
> annoying to not have the URL that failed in your terminal output, even
> if it has some one-time token.

For a regular user the URL could be confusing (should they click on
it? try to download it by themselves?). I also got a suggestion to
print e.g. only the domain and maybe the packname.

In any case, I agree it is a different thing than trace logging. I
removed it from this patch.

> > +
> > +	grep -A1 "clone<\ ..packfile-uris" log | grep "clone<\ <redacted>"
>
> We don't rely on GNU options like those for the test suite, it'll break
> on various supported platforms.
>
> In this case the whole LHS of the pipe looks like it could be dropped,
> why not grep for "^clone< <redacted>"?
>
>
> Also you don't need to quote the space character in regexes, it's not a
> metacharacter.

Updated the grep expressions to look only for the relevant lines and
removed the escaping of the space char. I was trying to limit the grep
to the "packfile-uri" section, not to match something else by accident,
but I think "obj-id http://" shouldn't match anything else in the clone
response (no ref can start with http://).

Thanks for the quick review!

Ivan

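
As a rough illustration of the "obj-id http://" idea for the unredacted case, here is a sketch that matches a hash followed by an http:// URI. The sample `log`, the hash value, and the example.com URL are all made up for the example, and the line layout is a simplification of real trace output.

```shell
#!/bin/sh
# Hypothetical 40-character object id and a simplified trace; not real git output.
hash=0123456789abcdef0123456789abcdef01234567
cat >log <<EOF
packet:        clone< packfile-uris
packet:        clone< $hash http://example.com/pack-1.pack
packet:        clone< 0001
EOF

# 40 hex digits (SHA-1) up to 64 (SHA-256), one space, then the URL scheme.
# grep -E and interval expressions are both specified by POSIX.
if grep -E "clone< [[:xdigit:]]{40,64} http://" log >/dev/null
then
	echo "packfile uri visible in trace"
else
	echo "packfile uri not found in trace"
fi
```

Requiring the hash immediately before `http://` keeps the pattern from matching other lines that merely mention a URL.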
diff --git a/fetch-pack.c b/fetch-pack.c
index a9604f35a3e..05c85eeafa1 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1518,7 +1518,16 @@ static void receive_wanted_refs(struct packet_reader *reader,
 static void receive_packfile_uris(struct packet_reader *reader,
 				  struct string_list *uris)
 {
+	int original_options;
 	process_section_header(reader, "packfile-uris", 0);
+	/*
+	 * In some setups, packfile-uris act as bearer tokens,
+	 * redact them by default.
+	 */
+	original_options = reader->options;
+	if (git_env_bool("GIT_TRACE_REDACT", 1))
+		reader->options |= PACKET_READ_REDACT_ON_TRACE;
+
 	while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
 		if (reader->pktlen < the_hash_algo->hexsz ||
 		    reader->line[the_hash_algo->hexsz] != ' ')
@@ -1526,6 +1535,8 @@ static void receive_packfile_uris(struct packet_reader *reader,
 		string_list_append(uris, reader->line);
 	}
 
+	reader->options = original_options;
+
 	if (reader->status != PACKET_READ_DELIM)
 		die("expected DELIM");
 }
diff --git a/http-fetch.c b/http-fetch.c
index fa642462a9e..d35e33e4f65 100644
--- a/http-fetch.c
+++ b/http-fetch.c
@@ -63,7 +63,9 @@ static void fetch_single_packfile(struct object_id *packfile_hash,
 	if (start_active_slot(preq->slot)) {
 		run_active_slot(preq->slot);
 		if (results.curl_result != CURLE_OK) {
-			die("Unable to get pack file %s\n%s", preq->url,
+			int showUrl = git_env_bool("GIT_TRACE_REDACT", 1);
+			die("Unable to get offloaded pack file %s\n%s",
+			    showUrl ? preq->url : "<redacted>",
 			    curl_errorstr);
 		}
 	} else {
diff --git a/pkt-line.c b/pkt-line.c
index de4a94b437e..8da8ed88ccf 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -443,7 +443,12 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
 		len--;
 
 	buffer[len] = 0;
-	packet_trace(buffer, len, 0);
+	if (options & PACKET_READ_REDACT_ON_TRACE) {
+		const char *redacted = "<redacted>";
+		packet_trace(redacted, strlen(redacted), 0);
+	} else {
+		packet_trace(buffer, len, 0);
+	}
 
 	if ((options & PACKET_READ_DIE_ON_ERR_PACKET) &&
 	    starts_with(buffer, "ERR "))
diff --git a/pkt-line.h b/pkt-line.h
index 82b95e4bdd3..44c02f3bc6e 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -88,6 +88,7 @@ void packet_fflush(FILE *f);
 #define PACKET_READ_CHOMP_NEWLINE        (1u<<1)
 #define PACKET_READ_DIE_ON_ERR_PACKET    (1u<<2)
 #define PACKET_READ_GENTLE_ON_READ_ERROR (1u<<3)
+#define PACKET_READ_REDACT_ON_TRACE      (1u<<4)
 int packet_read(int fd, char **src_buffer, size_t *src_len, char
 		*buffer, unsigned size, int options);
 
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index d527cf6c49f..a620a678a56 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -1107,6 +1107,49 @@ test_expect_success 'packfile-uri with transfer.fsckobjects fails when .gitmodul
 	test_i18ngrep "disallowed submodule name" err
 '
 
+test_expect_success 'packfile-uri redacted in trace' '
+	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+	rm -rf "$P" http_child log &&
+
+	git init "$P" &&
+	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
+
+	echo my-blob >"$P/my-blob" &&
+	git -C "$P" add my-blob &&
+	git -C "$P" commit -m x &&
+
+	configure_exclusion "$P" my-blob >h &&
+
+	GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \
+	git -c protocol.version=2 \
+		-c fetch.uriprotocols=http,https \
+		clone "$HTTPD_URL/smart/http_parent" http_child &&
+
+	grep -A1 "clone<\ ..packfile-uris" log | grep "clone<\ <redacted>"
+'
+
+test_expect_success 'packfile-uri not redacted in trace when GIT_TRACE_REDACT=0' '
+	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+	rm -rf "$P" http_child log &&
+
+	git init "$P" &&
+	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
+
+	echo my-blob >"$P/my-blob" &&
+	git -C "$P" add my-blob &&
+	git -C "$P" commit -m x &&
+
+	configure_exclusion "$P" my-blob >h &&
+
+	GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \
+	GIT_TRACE_REDACT=0 \
+	git -c protocol.version=2 \
+		-c fetch.uriprotocols=http,https \
+		clone "$HTTPD_URL/smart/http_parent" http_child &&
+
+	grep -A1 "clone<\ ..packfile-uris" log | grep -E "clone<\ ..[[:alnum:]]{40,64}\ http"
+'
+
 test_expect_success 'http:// --negotiate-only' '
 	SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" &&
 	URI="$HTTPD_URL/smart/server" &&