diff mbox series

Docs: web server must setenv GIT_PROTOCOL for v2

Message ID 20210904151721.445117-1-konstantin@linuxfoundation.org (mailing list archive)
State New, archived
Headers show
Series Docs: web server must setenv GIT_PROTOCOL for v2 | expand

Commit Message

Konstantin Ryabitsev Sept. 4, 2021, 3:17 p.m. UTC
For the server-side to properly respond to v2 protocol requests, the
webserver must set the GIT_PROTOCOL environment variable to the value of
the Git-Protocol: request header.

Link: https://lore.kernel.org/git/YTNtVJy6sCfQ7T3L@coredump.intra.peff.net/
Reported-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
---
 Documentation/technical/protocol-v2.txt | 15 +++++++++++++++
 1 file changed, 15 insertions(+)


base-commit: e0a2f5cbc585657e757385ad918f167f519cfb96

Comments

Jeff King Sept. 4, 2021, 3:55 p.m. UTC | #1
On Sat, Sep 04, 2021 at 11:17:21AM -0400, Konstantin Ryabitsev wrote:

> For the server-side to properly respond to v2 protocol requests, the
> webserver must set the GIT_PROTOCOL environment variable to the value of
> the Git-Protocol: request header.

Thanks for assembling these examples.

I don't mind having these in the technical documentation, but I think
most users won't find them there (nor would they even know they need to
be looking). Maybe the manpage for git-http-backend would be a better
spot. We can mention v2 in the "description" section, and then there's
some example config near the end that could include it.

Unfortunately there isn't any nginx example config there at all yet. If
you have kernel.org config you could share, that would be great. But
even starting with just the "here's how you do v2" part would be
welcome.

-Peff
Junio C Hamano Sept. 7, 2021, 8:57 p.m. UTC | #2
Jeff King <peff@peff.net> writes:

> On Sat, Sep 04, 2021 at 11:17:21AM -0400, Konstantin Ryabitsev wrote:
>
>> For the server-side to properly respond to v2 protocol requests, the
>> webserver must set the GIT_PROTOCOL environment variable to the value of
>> the Git-Protocol: request header.
>
> Thanks for assembling these examples.
>
> I don't mind having these in the technical documentation, but I think
> most users won't find them there (nor would they even know they need to
> be looking). Maybe the manpage for git-http-backend would be a better
> spot. We can mention v2 in the "description" section, and then there's
> some example config near the end that could include it.
>
> Unfortunately there isn't any nginx example config there at all yet. If
> you have kernel.org config you could share, that would be great. But
> even starting with just the "here's how you do v2" part would be
> welcome.

True, true.

In the meantime, I'll queue this as-is.

Thanks.
Konstantin Ryabitsev Sept. 7, 2021, 9:11 p.m. UTC | #3
On Sat, Sep 04, 2021 at 11:55:11AM -0400, Jeff King wrote:
> Unfortunately there isn't any nginx example config there at all yet. If
> you have kernel.org config you could share, that would be great. But
> even starting with just the "here's how you do v2" part would be
> welcome.

I'll see if I can come up with something to put into
Documentation/git-http-backend.txt, but I can't right away -- hopefully in
early October once a bunch of conferences are over.

Best regards,
-K
Jeff King Sept. 8, 2021, 10:48 a.m. UTC | #4
On Tue, Sep 07, 2021 at 05:11:28PM -0400, Konstantin Ryabitsev wrote:

> On Sat, Sep 04, 2021 at 11:55:11AM -0400, Jeff King wrote:
> > Unfortunately there isn't any nginx example config there at all yet. If
> > you have kernel.org config you could share, that would be great. But
> > even starting with just the "here's how you do v2" part would be
> > welcome.
> 
> I'll see if I can come up with something to put into
> Documentation/git-http-backend.txt, but I can't right away -- hopefully in
> early October once a bunch of conferences are over.

It would be great if you could add nginx examples at some point. But in
the meantime, we can do this much easier patch to make sure we don't
forget about mentioning the protocol bits.

-- >8 --
Subject: [PATCH] docs/http-backend: mention v2 protocol

There's a little bit of configuration needed at the webserver level in
order to get the client's v2 protocol probes to Git. But when we
introduced the v2 protocol, we never documented these explicitly.

Commit 9181c4a9ac (Docs: web server must setenv GIT_PROTOCOL for v2,
2021-09-04) now mentions them in the v2 docs themselves, but users
configuring git-over-http for the first time are more likely to be
looking in the git-http-backend manpage. Let's make sure we mention it
there, too, and give some examples.

Both of the included examples here have been tested to work. The one for
lighttpd is a little less direct than I'd like, but I couldn't find a
way to directly set an environment variable to the value of a request
header. From my reading of the documentation, lighttpd will set
HTTP_GIT_PROTOCOL automatically, but git-http-backend looks only at
GIT_PROTOCOL. Arguably http-backend should do this translation itself.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-http-backend.txt | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-http-backend.txt b/Documentation/git-http-backend.txt
index 558966aa83..4797bc8aec 100644
--- a/Documentation/git-http-backend.txt
+++ b/Documentation/git-http-backend.txt
@@ -16,7 +16,9 @@ A simple CGI program to serve the contents of a Git repository to Git
 clients accessing the repository over http:// and https:// protocols.
 The program supports clients fetching using both the smart HTTP protocol
 and the backwards-compatible dumb HTTP protocol, as well as clients
-pushing using the smart HTTP protocol.
+pushing using the smart HTTP protocol. It also supports Git's
+more-efficient "v2" protocol if properly configured; see the
+discussion of `GIT_PROTOCOL` in the ENVIRONMENT section below.
 
 It verifies that the directory has the magic file
 "git-daemon-export-ok", and it will refuse to export any Git directory
@@ -76,6 +78,7 @@ Apache 2.x::
 ----------------------------------------------------------------
 SetEnv GIT_PROJECT_ROOT /var/www/git
 SetEnv GIT_HTTP_EXPORT_ALL
+SetEnvIf Git-Protocol ".*" GIT_PROTOCOL=$0
 ScriptAlias /git/ /usr/libexec/git-core/git-http-backend/
 ----------------------------------------------------------------
 +
@@ -203,6 +206,9 @@ $HTTP["url"] =~ "^/git" {
 		"GIT_PROJECT_ROOT" => "/var/www/git",
 		"GIT_HTTP_EXPORT_ALL" => ""
 	)
+	$REQUEST_HEADER["Git-Protocol"] == "version=2" {
+		setenv.add-environment += ("GIT_PROTOCOL" => "version=2")
+	}
 }
 ----------------------------------------------------------------
 +
@@ -264,6 +270,11 @@ a repository with an extremely large number of refs.  The value can be
 specified with a unit (e.g., `100M` for 100 megabytes). The default is
 10 megabytes.
 
+Clients may probe for optional protocol capabilities using the
+`Git-Protocol` HTTP header. In order to support these, the webserver
+must be configured to pass the contents of that header to
+`git-http-backend` in the `GIT_PROTOCOL` environment variable.
+
 The backend process sets GIT_COMMITTER_NAME to '$REMOTE_USER' and
 GIT_COMMITTER_EMAIL to '$\{REMOTE_USER}@http.$\{REMOTE_ADDR\}',
 ensuring that any reflogs created by 'git-receive-pack' contain some
Jeff King Sept. 8, 2021, 10:57 a.m. UTC | #5
On Wed, Sep 08, 2021 at 06:48:47AM -0400, Jeff King wrote:

> Both of the included examples here have been tested to work. The one for
> lighttpd is a little less direct than I'd like, but I couldn't find a
> way to directly set an environment variable to the value of a request
> header. From my reading of the documentation, lighttpd will set
> HTTP_GIT_PROTOCOL automatically, but git-http-backend looks only at
> GIT_PROTOCOL. Arguably http-backend should do this translation itself.

So having discovered this, I kind of wonder if these documentation
patches are barking up the wrong tree. There is no reason we would not
want v2 to work out of the box (after all, it does for git://).

The patch below does that (and could replace both my and Konstantin's
documentation patches).

This also makes me wonder if we should be documenting the use of
AcceptEnv for ssh (which sadly I don't think we can make work
out-of-the-box).

-- >8 --
Subject: [PATCH] http-backend: handle HTTP_GIT_PROTOCOL CGI variable

When a client requests the v2 protocol over HTTP, they set the
Git-Protocol header. Webservers will generaly make that available to our
CGI as HTTP_GIT_PROTOCOL in the environment. However, that's not
sufficient for upload-pack, etc, to respect it; they look in
GIT_PROTOCOL (without the HTTP_ prefix).

Either the webserver or the CGI is responsible for relaying that HTTP
header into the GIT_PROTOCOL variable. Traditionally, our tests have
configured the webserver to do so, but that's a burden on the server
admin. We can make this work out of the box by having the http-backend
CGI copy the contents.

Signed-off-by: Jeff King <peff@peff.net>
---
 http-backend.c          | 4 ++++
 t/lib-httpd/apache.conf | 2 --
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/http-backend.c b/http-backend.c
index b329bf63f0..2f4b4c11de 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -739,6 +739,7 @@ static int bad_request(struct strbuf *hdr, const struct service_cmd *c)
 int cmd_main(int argc, const char **argv)
 {
 	char *method = getenv("REQUEST_METHOD");
+	const char *proto_header;
 	char *dir;
 	struct service_cmd *cmd = NULL;
 	char *cmd_arg = NULL;
@@ -789,6 +790,9 @@ int cmd_main(int argc, const char **argv)
 	http_config();
 	max_request_buffer = git_env_ulong("GIT_HTTP_MAX_REQUEST_BUFFER",
 					   max_request_buffer);
+	proto_header = getenv("HTTP_GIT_PROTOCOL");
+	if (proto_header)
+		setenv(GIT_PROTOCOL_ENVIRONMENT, proto_header, 1);
 
 	cmd->imp(&hdr, cmd_arg);
 	return 0;
diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
index afa91e38b0..71761e3299 100644
--- a/t/lib-httpd/apache.conf
+++ b/t/lib-httpd/apache.conf
@@ -81,8 +81,6 @@ PassEnv GIT_TRACE
 PassEnv GIT_CONFIG_NOSYSTEM
 PassEnv GIT_TEST_SIDEBAND_ALL
 
-SetEnvIf Git-Protocol ".*" GIT_PROTOCOL=$0
-
 Alias /dumb/ www/
 Alias /auth/dumb/ www/auth/dumb/
Eric Wong Sept. 8, 2021, 4:50 p.m. UTC | #6
Jeff King <peff@peff.net> wrote:
> On Wed, Sep 08, 2021 at 06:48:47AM -0400, Jeff King wrote:
> 
> > Both of the included examples here have been tested to work. The one for
> > lighttpd is a little less direct than I'd like, but I couldn't find a
> > way to directly set an environment variable to the value of a request
> > header. From my reading of the documentation, lighttpd will set
> > HTTP_GIT_PROTOCOL automatically, but git-http-backend looks only at
> > GIT_PROTOCOL. Arguably http-backend should do this translation itself.
> 
> So having discovered this, I kind of wonder if these documentation
> patches are barking up the wrong tree. There is no reason we would not
> want v2 to work out of the box (after all, it does for git://).

Agreed.

> The patch below does that (and could replace both my and Konstantin's
> documentation patches).

<snip>

> -- >8 --
> Subject: [PATCH] http-backend: handle HTTP_GIT_PROTOCOL CGI variable
> 
> When a client requests the v2 protocol over HTTP, they set the
> Git-Protocol header. Webservers will generaly make that available to our

"generally"

> CGI as HTTP_GIT_PROTOCOL in the environment. However, that's not
> sufficient for upload-pack, etc, to respect it; they look in
> GIT_PROTOCOL (without the HTTP_ prefix).
> 
> Either the webserver or the CGI is responsible for relaying that HTTP
> header into the GIT_PROTOCOL variable. Traditionally, our tests have
> configured the webserver to do so, but that's a burden on the server
> admin. We can make this work out of the box by having the http-backend
> CGI copy the contents.

Agreed.  I've completely overlooked GIT_PROTOCOL support, so far...

This seems to be the right thing to do; I think I'll add support
for it when I spawn git-http-backend in something I work on.
(I also don't currently pass all HTTP headers in env when
spawning CGI, maybe I should *shrug*)
Junio C Hamano Sept. 9, 2021, 5:28 p.m. UTC | #7
Jeff King <peff@peff.net> writes:

> On Wed, Sep 08, 2021 at 06:48:47AM -0400, Jeff King wrote:
>
>> Both of the included examples here have been tested to work. The one for
>> lighttpd is a little less direct than I'd like, but I couldn't find a
>> way to directly set an environment variable to the value of a request
>> header. From my reading of the documentation, lighttpd will set
>> HTTP_GIT_PROTOCOL automatically, but git-http-backend looks only at
>> GIT_PROTOCOL. Arguably http-backend should do this translation itself.

Nice.

These headers get HTTP_* prefixed as a security measure when servers
expose them to their configuration mechanisms because these names
are attacker controlled.  I had a flawed mental model in which the
servers' configuration controls which one of these resulting HTTP_*
headers are passed to CGI and externals selectively, but if servers
pass all HTTP_* environment variables to CGI and externals without
any filtering, the patch you gave here is the most logical solution.

Will queue.

> -- >8 --
> Subject: [PATCH] http-backend: handle HTTP_GIT_PROTOCOL CGI variable
>
> When a client requests the v2 protocol over HTTP, they set the
> Git-Protocol header. Webservers will generaly make that available to our
> CGI as HTTP_GIT_PROTOCOL in the environment. However, that's not
> sufficient for upload-pack, etc, to respect it; they look in
> GIT_PROTOCOL (without the HTTP_ prefix).
>
> Either the webserver or the CGI is responsible for relaying that HTTP
> header into the GIT_PROTOCOL variable. Traditionally, our tests have
> configured the webserver to do so, but that's a burden on the server
> admin. We can make this work out of the box by having the http-backend
> CGI copy the contents.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  http-backend.c          | 4 ++++
>  t/lib-httpd/apache.conf | 2 --
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/http-backend.c b/http-backend.c
> index b329bf63f0..2f4b4c11de 100644
> --- a/http-backend.c
> +++ b/http-backend.c
> @@ -739,6 +739,7 @@ static int bad_request(struct strbuf *hdr, const struct service_cmd *c)
>  int cmd_main(int argc, const char **argv)
>  {
>  	char *method = getenv("REQUEST_METHOD");
> +	const char *proto_header;
>  	char *dir;
>  	struct service_cmd *cmd = NULL;
>  	char *cmd_arg = NULL;
> @@ -789,6 +790,9 @@ int cmd_main(int argc, const char **argv)
>  	http_config();
>  	max_request_buffer = git_env_ulong("GIT_HTTP_MAX_REQUEST_BUFFER",
>  					   max_request_buffer);
> +	proto_header = getenv("HTTP_GIT_PROTOCOL");
> +	if (proto_header)
> +		setenv(GIT_PROTOCOL_ENVIRONMENT, proto_header, 1);
>  
>  	cmd->imp(&hdr, cmd_arg);
>  	return 0;
> diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
> index afa91e38b0..71761e3299 100644
> --- a/t/lib-httpd/apache.conf
> +++ b/t/lib-httpd/apache.conf
> @@ -81,8 +81,6 @@ PassEnv GIT_TRACE
>  PassEnv GIT_CONFIG_NOSYSTEM
>  PassEnv GIT_TEST_SIDEBAND_ALL
>  
> -SetEnvIf Git-Protocol ".*" GIT_PROTOCOL=$0
> -
>  Alias /dumb/ www/
>  Alias /auth/dumb/ www/auth/dumb/
Junio C Hamano Sept. 9, 2021, 5:35 p.m. UTC | #8
Junio C Hamano <gitster@pobox.com> writes:

>> @@ -789,6 +790,9 @@ int cmd_main(int argc, const char **argv)
>>  	http_config();
>>  	max_request_buffer = git_env_ulong("GIT_HTTP_MAX_REQUEST_BUFFER",
>>  					   max_request_buffer);
>> +	proto_header = getenv("HTTP_GIT_PROTOCOL");
>> +	if (proto_header)
>> +		setenv(GIT_PROTOCOL_ENVIRONMENT, proto_header, 1);

Since this overwrites (I noticed the "1" at the end), the server
operator cannot force a particular protocol with their server
configuration, no?

Would a weaker form to use 0 (set if there isn't any, but keep the
value if somebody else already has set it) work OK?  Would that have
a downside?
Philippe Blain Sept. 9, 2021, 5:50 p.m. UTC | #9
Hi Peff,

Le 2021-09-08 à 06:57, Jeff King a écrit :
> On Wed, Sep 08, 2021 at 06:48:47AM -0400, Jeff King wrote:
> 
>> Both of the included examples here have been tested to work. The one for
>> lighttpd is a little less direct than I'd like, but I couldn't find a
>> way to directly set an environment variable to the value of a request
>> header. From my reading of the documentation, lighttpd will set
>> HTTP_GIT_PROTOCOL automatically, but git-http-backend looks only at
>> GIT_PROTOCOL. Arguably http-backend should do this translation itself.
> 
> So having discovered this, I kind of wonder if these documentation
> patches are barking up the wrong tree. There is no reason we would not
> want v2 to work out of the box (after all, it does for git://).
> 
> The patch below does that (and could replace both my and Konstantin's
> documentation patches).

I agree it's nice to make it work out of the box, without the web server
admin having to configure anything. But, I'm not sure we should completely
drop the documentation patches: your patch will only affect future versions
of git-http-backend, and users of previous versions will be left without
any documentation as to how to configure it for protocol v2. So I would think we should
keep the documentation patches, maybe with a mention "this should not be necessary
in Git 2.34 and later versions" or something like that (since your
commit message mentions that it "generally" should work like that depending
on the web servers).

> 
> This also makes me wonder if we should be documenting the use of
> AcceptEnv for ssh (which sadly I don't think we can make work
> out-of-the-box).

I think it would be a good idea to document it, yes. FWIW I found out about
it at https://docs.gitlab.com/ee/administration/git_protocol.html.

Cheers,
Philippe.
Junio C Hamano Sept. 10, 2021, 5:39 a.m. UTC | #10
Philippe Blain <levraiphilippeblain@gmail.com> writes:

> I agree it's nice to make it work out of the box, without the web server
> admin having to configure anything. But, I'm not sure we should completely
> drop the documentation patches: your patch will only affect future versions
> of git-http-backend, and users of previous versions will be left without
> any documentation as to how to configure it for protocol v2. So I would think we should
> keep the documentation patches, maybe with a mention "this should not be necessary
> in Git 2.34 and later versions" or something like that (since your
> commit message mentions that it "generally" should work like that depending
> on the web servers).

Thanks, exactly my thought on the need for docs.
Jeff King Sept. 10, 2021, 11:39 a.m. UTC | #11
On Thu, Sep 09, 2021 at 10:35:51AM -0700, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> >> @@ -789,6 +790,9 @@ int cmd_main(int argc, const char **argv)
> >>  	http_config();
> >>  	max_request_buffer = git_env_ulong("GIT_HTTP_MAX_REQUEST_BUFFER",
> >>  					   max_request_buffer);
> >> +	proto_header = getenv("HTTP_GIT_PROTOCOL");
> >> +	if (proto_header)
> >> +		setenv(GIT_PROTOCOL_ENVIRONMENT, proto_header, 1);
> 
> Since this overwrites (I noticed the "1" at the end), the server
> operator cannot force a particular protocol with their server
> configuration, no?
> 
> Would a weaker form to use 0 (set if there isn't any, but keep the
> value if somebody else already has set it) work OK?  Would that have
> a downside?

Yeah, I wondered about that while writing it, but struggled to think of
a reason why the server operator would want to set it at all.

But maybe as a workaround for some misbehaving client (e.g., recognizing
it by a user-agent header). Or we could even perhaps use this in our
tests to test the v2-client-to-v0-server fallback behavior.

I'll re-roll with that change, plus some documentation changes adapted
to this new approach.

-Peff
Jeff King Sept. 10, 2021, 11:40 a.m. UTC | #12
On Thu, Sep 09, 2021 at 10:39:29PM -0700, Junio C Hamano wrote:

> Philippe Blain <levraiphilippeblain@gmail.com> writes:
> 
> > I agree it's nice to make it work out of the box, without the web server
> > admin having to configure anything. But, I'm not sure we should completely
> > drop the documentation patches: your patch will only affect future versions
> > of git-http-backend, and users of previous versions will be left without
> > any documentation as to how to configure it for protocol v2. So I would think we should
> > keep the documentation patches, maybe with a mention "this should not be necessary
> > in Git 2.34 and later versions" or something like that (since your
> > commit message mentions that it "generally" should work like that depending
> > on the web servers).
> 
> Thanks, exactly my thought on the need for docs.

OK. I'll try to cook something up here.

-Peff
Jeff King Sept. 10, 2021, 2:02 p.m. UTC | #13
On Fri, Sep 10, 2021 at 07:39:54AM -0400, Jeff King wrote:

> I'll re-roll with that change, plus some documentation changes adapted
> to this new approach.

Here's what I came up with. I think this should replace both
jk/http-backend-handle-proto-header and kr/doc-webserver-config-for-v2.
The latter does give some specific nginx tips which I didn't carry over,
but they shouldn't be necessary after the change in http-backend. If we
do want to include them, they can be mentioned as optional if we later
add an nginx example config to the http-backend manpage.

  [1/5]: t5551: test v2-to-v0 http protocol fallback
  [2/5]: http-backend: handle HTTP_GIT_PROTOCOL CGI variable
  [3/5]: docs/http-backend: mention v2 protocol
  [4/5]: docs/git: discuss server-side config for GIT_PROTOCOL
  [5/5]: docs/protocol-v2: point readers transport config discussion

 Documentation/git-http-backend.txt      | 26 ++++++++++++++++++++++++-
 Documentation/git-upload-pack.txt       |  8 ++++++++
 Documentation/git.txt                   | 15 ++++++++++++++
 Documentation/technical/protocol-v2.txt |  8 +++++++-
 http-backend.c                          |  4 ++++
 t/lib-httpd/apache.conf                 |  7 +++++--
 t/t5551-http-fetch-smart.sh             |  9 +++++++++
 7 files changed, 73 insertions(+), 4 deletions(-)

-Peff
Junio C Hamano Sept. 10, 2021, 10:08 p.m. UTC | #14
Jeff King <peff@peff.net> writes:

> On Fri, Sep 10, 2021 at 07:39:54AM -0400, Jeff King wrote:
>
>> I'll re-roll with that change, plus some documentation changes adapted
>> to this new approach.
>
> Here's what I came up with. I think this should replace both
> jk/http-backend-handle-proto-header and kr/doc-webserver-config-for-v2.
> The latter does give some specific nginx tips which I didn't carry over,
> but they shouldn't be necessary after the change in http-backend. If we
> do want to include them, they can be mentioned as optional if we later
> add an nginx example config to the http-backend manpage.

All look sensible.  This topic will appear in the next round of
integration, not today's.

Thanks.
diff mbox series

Patch

diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
index 1040d85319..7a0e97cc8d 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -81,6 +81,21 @@  A v2 server would reply:
 Subsequent requests are then made directly to the service
 `$GIT_URL/git-upload-pack`. (This works the same for git-receive-pack).
 
+The web server handling the requests must properly set the GIT_PROTOCOL
+environment variable when it finds `Git-Protocol` in the request headers.
+
+Apache example:
+
+   SetEnvIf Git-Protocol ".*" GIT_PROTOCOL=$0
+
+Nginx + uwsgi example:
+
+   uwsgi_param GIT_PROTOCOL $http_git_protocol;
+
+Nginx + fastcgi example:
+
+   fastcgi_param GIT_PROTOCOL $http_git_protocol;
+
 Capability Advertisement
 ------------------------