diff mbox series

[v4,2/2] http-fetch: redact url on die() message

Message ID c7f0977cabd4ba7311b8045bc57e9e30198651fd.1635288599.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series fetch-pack: redact packfile urls in traces | expand

Commit Message

Ivan Frade Oct. 26, 2021, 10:49 p.m. UTC
From: Ivan Frade <ifrade@google.com>

http-fetch prints the URL after failing to fetch it. This can be
confusing to users (they cannot really do anything with it) but even
worse, they can share by accident a sensitive URL (e.g. with
credentials) while looking for help.

Redact the URL unless the GIT_TRACE_REDACT variable is set to false. This
mimics the redaction of other sensitive information in git, like the
Authorization header in HTTP.

Signed-off-by: Ivan Frade <ifrade@google.com>
---
 http-fetch.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

Comments

Ævar Arnfjörð Bjarmason Oct. 28, 2021, 4:39 p.m. UTC | #1
On Tue, Oct 26 2021, Ivan Frade via GitGitGadget wrote:

> From: Ivan Frade <ifrade@google.com>
>
> http-fetch prints the URL after failing to fetch it. This can be
> confusing to users (they cannot really do anything with it) but even
> worse, they can share by accident a sensitive URL (e.g. with
> credentials) while looking for help.
>
> Redact the URL unless the GIT_TRACE_REDACT variable is set to false. This
> mimics the redaction of other sensitive information in git, like the
> Authorization header in HTTP.
>
> Signed-off-by: Ivan Frade <ifrade@google.com>
> ---
>  http-fetch.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/http-fetch.c b/http-fetch.c
> index fa642462a9e..bbe09a6ad9f 100644
> --- a/http-fetch.c
> +++ b/http-fetch.c
> @@ -4,6 +4,7 @@
>  #include "http.h"
>  #include "walker.h"
>  #include "strvec.h"
> +#include "urlmatch.h"
>  
>  static const char http_fetch_usage[] = "git http-fetch "
>  "[-c] [-t] [-a] [-v] [--recover] [-w ref] [--stdin | --packfile=hash | commit-id] url";
> @@ -63,8 +64,18 @@ static void fetch_single_packfile(struct object_id *packfile_hash,
>  	if (start_active_slot(preq->slot)) {
>  		run_active_slot(preq->slot);
>  		if (results.curl_result != CURLE_OK) {
> -			die("Unable to get pack file %s\n%s", preq->url,
> -			    curl_errorstr);
> +			struct url_info url;
> +			char *nurl = url_normalize(preq->url, &url);
> +			if (!git_env_bool("GIT_TRACE_REDACT", 1) || !nurl) {
> +				die("Unable to get pack file %s\n%s", preq->url,
> +				    curl_errorstr);

small nit: arrange if's from "if (cheap || expensive)", i.e. no need for
getenv() if !nurl, but maybe compilers are smart enough for that...

nit: die() messages should start with lower-case (in CodingGuidelines), and I think it's better to quote both, so:

    die("unable to get pack '%s': '%s'", ...)

Or maybe without the second '%s', as in 3e8084f1884 (http: check
CURLE_SSL_PINNEDPUBKEYNOTMATCH when emitting errors, 2021-09-24) (which
I authored, but just copy/pasted the convention in the surrounding
code)>

> +			} else {
> +				char *schema = xstrndup(url.url, url.scheme_len);
> +				char *host = xstrndup(&url.url[url.host_off], url.host_len);
> +				die("failed to get '%s' url from '%s' "
> +				    "(full URL redacted due to GIT_TRACE_REDACT setting)\n%s",
> +				    schema, host, curl_errorstr);

Hrm, I haven't tested, but aren't both of those xstrndup's redundant to
using %*s instead of %s for the printf format? I.e.:

    die("failed to get '%*s'[...]", url.schema_len, url.url, )
Eric Sunshine Oct. 28, 2021, 5:25 p.m. UTC | #2
On Thu, Oct 28, 2021 at 12:46 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> On Tue, Oct 26 2021, Ivan Frade via GitGitGadget wrote:
> >               if (results.curl_result != CURLE_OK) {
> > -                     die("Unable to get pack file %s\n%s", preq->url,
> > -                         curl_errorstr);
> > +                     struct url_info url;
> > +                     char *nurl = url_normalize(preq->url, &url);
> > +                     if (!git_env_bool("GIT_TRACE_REDACT", 1) || !nurl) {
> > +                             die("Unable to get pack file %s\n%s", preq->url,
> > +                                 curl_errorstr);
>
> small nit: arrange if's from "if (cheap || expensive)", i.e. no need for
> getenv() if !nurl, but maybe compilers are smart enough for that...

I had the same passing thought when glancing over this code (although
this appears to be an error patch, thus not performance critical, so
not terribly important).

> nit: die() messages should start with lower-case (in CodingGuidelines), and I think it's better to quote both, so:
>
>     die("unable to get pack '%s': '%s'", ...)
>
> Or maybe without the second '%s', as in 3e8084f1884 (http: check
> CURLE_SSL_PINNEDPUBKEYNOTMATCH when emitting errors, 2021-09-24) (which
> I authored, but just copy/pasted the convention in the surrounding
> code)>

Note that this is not a new die() call; it just got indented as-is by
this patch, so the changes you suggest to the message string are
potentially outside the scope of this patch. Possibilities: (1) make
the changes in this patch but mention them in the commit message; (2)
make the changes in a preparatory patch; (3) punt on the changes for
now.

> > +                     } else {
> > +                             char *schema = xstrndup(url.url, url.scheme_len);
> > +                             char *host = xstrndup(&url.url[url.host_off], url.host_len);
> > +                             die("failed to get '%s' url from '%s' "
> > +                                 "(full URL redacted due to GIT_TRACE_REDACT setting)\n%s",
> > +                                 schema, host, curl_errorstr);
>
> Hrm, I haven't tested, but aren't both of those xstrndup's redundant to
> using %*s instead of %s for the printf format? I.e.:
>
>     die("failed to get '%*s'[...]", url.schema_len, url.url, )

I wondered the same when reading the patch. Thanks for mentioning it.
Ivan Frade Oct. 28, 2021, 10:41 p.m. UTC | #3
On Thu, Oct 28, 2021 at 9:46 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Tue, Oct 26 2021, Ivan Frade via GitGitGadget wrote:
>
> > From: Ivan Frade <ifrade@google.com>
> >
...
> > +                     if (!git_env_bool("GIT_TRACE_REDACT", 1) || !nurl) {
> > +                             die("Unable to get pack file %s\n%s", preq->url,
> > +                                 curl_errorstr);
>
> small nit: arrange if's from "if (cheap || expensive)", i.e. no need for
> getenv() if !nurl, but maybe compilers are smart enough for that...

Done

> nit: die() messages should start with lower-case (in CodingGuidelines), and I think it's better to quote both, so:
>
>     die("unable to get pack '%s': '%s'", ...)
>
> Or maybe without the second '%s', as in 3e8084f1884 (http: check
> CURLE_SSL_PINNEDPUBKEYNOTMATCH when emitting errors, 2021-09-24) (which
> I authored, but just copy/pasted the convention in the surrounding
> code)>

Done

> > +                     } else {
> > +                             char *schema = xstrndup(url.url, url.scheme_len);
> > +                             char *host = xstrndup(&url.url[url.host_off], url.host_len);
> > +                             die("failed to get '%s' url from '%s' "
> > +                                 "(full URL redacted due to GIT_TRACE_REDACT setting)\n%s",
> > +                                 schema, host, curl_errorstr);
>
> Hrm, I haven't tested, but aren't both of those xstrndup's redundant to
> using %*s instead of %s for the printf format? I.e.:
>
>     die("failed to get '%*s'[...]", url.schema_len, url.url, )

Indeed, "%.*s" did the trick. Thanks!

Ivan
Ivan Frade Oct. 28, 2021, 10:44 p.m. UTC | #4
On Thu, Oct 28, 2021 at 10:26 AM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Thu, Oct 28, 2021 at 12:46 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:

>
> > nit: die() messages should start with lower-case (in CodingGuidelines), and I think it's better to quote both, so:
> >
> >     die("unable to get pack '%s': '%s'", ...)
> >
> > Or maybe without the second '%s', as in 3e8084f1884 (http: check
> > CURLE_SSL_PINNEDPUBKEYNOTMATCH when emitting errors, 2021-09-24) (which
> > I authored, but just copy/pasted the convention in the surrounding
> > code)>
>
> Note that this is not a new die() call; it just got indented as-is by
> this patch, so the changes you suggest to the message string are
> potentially outside the scope of this patch. Possibilities: (1) make
> the changes in this patch but mention them in the commit message; (2)
> make the changes in a preparatory patch; (3) punt on the changes for
> now.

I went for option (1). It is a minimal change and we are moving that
line in this commit.

Ivan
Junio C Hamano Oct. 29, 2021, 11:18 p.m. UTC | #5
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> +			if (!git_env_bool("GIT_TRACE_REDACT", 1) || !nurl) {
>> +				die("Unable to get pack file %s\n%s", preq->url,
>> +				    curl_errorstr);
>
> small nit: arrange if's from "if (cheap || expensive)", i.e. no need for
> getenv() if !nurl, but maybe compilers are smart enough for that...

They typically do not see what happens inside git_env_bool() while
compling this compilation unit, and cannot tell if the programmer
wanted to call it first for its side effects, hence they cannot
swap them safely.
Ævar Arnfjörð Bjarmason Nov. 9, 2021, 1:54 a.m. UTC | #6
On Fri, Oct 29 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>>> +			if (!git_env_bool("GIT_TRACE_REDACT", 1) || !nurl) {
>>> +				die("Unable to get pack file %s\n%s", preq->url,
>>> +				    curl_errorstr);
>>
>> small nit: arrange if's from "if (cheap || expensive)", i.e. no need for
>> getenv() if !nurl, but maybe compilers are smart enough for that...
>
> They typically do not see what happens inside git_env_bool() while
> compling this compilation unit, and cannot tell if the programmer
> wanted to call it first for its side effects, hence they cannot
> swap them safely.

*nod*, but since that function is just:
    
    int git_env_bool(const char *k, int def)
    {
            const char *v = getenv(k);
            return v ? git_config_bool(k, v) : def;
    }

I was hedging and pondering if some compilers were smart enough these
days to optimize things like that.

I.e. in this case getenv() is a simple C library function, the env
variable is constant, and we do a boolean test of it before calling
git_config_bool().

So a sufficiently smart compiler could turn that into:

     /* global, probably something iterated over env already */
    static int __have_seen_GIT_TRACE_REDACT = 0;
    ...

    if ((!__have_seen_GIT_TRACE_REDACT || !nurl) ||
        (__have_seen_GIT_TRACE_REDACT && git_env_bool_without_v_bool_check(...)))

But probably not, since it wolud need quite a bit of C library
cooperation/hooks...
diff mbox series

Patch

diff --git a/http-fetch.c b/http-fetch.c
index fa642462a9e..bbe09a6ad9f 100644
--- a/http-fetch.c
+++ b/http-fetch.c
@@ -4,6 +4,7 @@ 
 #include "http.h"
 #include "walker.h"
 #include "strvec.h"
+#include "urlmatch.h"
 
 static const char http_fetch_usage[] = "git http-fetch "
 "[-c] [-t] [-a] [-v] [--recover] [-w ref] [--stdin | --packfile=hash | commit-id] url";
@@ -63,8 +64,18 @@  static void fetch_single_packfile(struct object_id *packfile_hash,
 	if (start_active_slot(preq->slot)) {
 		run_active_slot(preq->slot);
 		if (results.curl_result != CURLE_OK) {
-			die("Unable to get pack file %s\n%s", preq->url,
-			    curl_errorstr);
+			struct url_info url;
+			char *nurl = url_normalize(preq->url, &url);
+			if (!git_env_bool("GIT_TRACE_REDACT", 1) || !nurl) {
+				die("Unable to get pack file %s\n%s", preq->url,
+				    curl_errorstr);
+			} else {
+				char *schema = xstrndup(url.url, url.scheme_len);
+				char *host = xstrndup(&url.url[url.host_off], url.host_len);
+				die("failed to get '%s' url from '%s' "
+				    "(full URL redacted due to GIT_TRACE_REDACT setting)\n%s",
+				    schema, host, curl_errorstr);
+			}
 		}
 	} else {
 		die("Unable to start request");