
Close transport helper on protocol error

Message ID 20190722212250.44011-1-thibault.jamet@gmail.com (mailing list archive)
State New, archived

Commit Message

thibault.jamet@gmail.com July 22, 2019, 9:22 p.m. UTC
From: Thibault Jamet <thibault.jamet@gmail.com>

We have noticed that in some cases, when the transport is not fully
compliant with the protocol, git exits with a status of 128 without
closing the transport.

This usually has no consequences in a standard Unix environment, as
the orphaned process is then reparented to init, which takes care of
properly closing the remaining transport.

We notably observed this behaviour on GitHub Enterprise v2.14.24 after
a repository had been migrated to another GitHub Enterprise instance.
The output of the transport is then:

remote: Repository `org/repo' is disabled. Please ask the owner to check their account.
fatal: unable to access 'https://github.example.com/org/repo/': The requested URL returned error: 403

and the code exits inside the function get_refs_list().

In container contexts, there might not be such an init process
to take care of terminating the transport processes, and hence they
remain as zombies, as mentioned in
https://github.com/rancher/rancher/issues/13858 or
https://github.com/coreos/clair/issues/441.

Although running an init inside the container works around this,
clean up the processes git created at the time git exits.

Signed-off-by: Thibault Jamet <thibault.jamet@gmail.com>
---
 transport-helper.c | 44 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 33 insertions(+), 11 deletions(-)

Comments

Junio C Hamano July 23, 2019, 5:33 p.m. UTC | #1
thibault.jamet@gmail.com writes:

> Subject: Re: [PATCH] Close transport helper on protocol error

Perhaps

    Subject: transport: close helper on protocol error

> +static int disconnect_helper(struct transport *transport);
> +
> +static int disconnect_helper_data(struct helper_data *data);

Even after reading it twice, disconnect_helper_data() does not ring
the "this is to disconnect the helper process, based on what is
contained in a helper_data instance" bell, which you wanted to ring.
It sounds like it is trying to disconnect "helper_data" from
something unsaid.

I think the root cause of this awkwardness is because this split of
the function into two is suboptimal.  There is only one existing
caller of disconnect_helper() and it passes transport->data (and the
"data" is of type helper_data).  As it is a file-scope static
function, why not just change the type of the parameter from the
whole transport to just helper_data, without introducing the new
function to hold the bulk of the original code?
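
A minimal sketch of the single-function shape suggested above. The
three structs are hypothetical, stripped-down stand-ins (the real ones
in git carry many more fields), and the "running" flag merely stands
in for the real work of shutting the helper down:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the real git structures. */
struct child_process { int running; };
struct helper_data { struct child_process *helper; };
struct transport { void *data; };

/*
 * Takes the helper_data directly instead of the whole transport, so
 * callers that only hold a helper_data can use it as well.
 */
static int disconnect_helper(struct helper_data *data)
{
	if (!data || !data->helper)
		return 0;
	data->helper->running = 0; /* placeholder for the real shutdown */
	data->helper = NULL;
	return 0;
}

/* The existing caller simply passes transport->data along. */
static int release_helper(struct transport *transport)
{
	return disconnect_helper(transport->data);
}
```

With this shape there is no need for a second, awkwardly named
function; the name keeps saying "disconnect the helper".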

> +static int release_helper(struct transport *transport);
> +
>  static struct child_process *get_helper(struct transport *transport)
>  {
>  	struct helper_data *data = transport->data;
> @@ -155,8 +161,10 @@ static struct child_process *get_helper(struct transport *transport)
>  	while (1) {
>  		const char *capname, *arg;
>  		int mandatory = 0;
> -		if (recvline(data, &buf))
> +		if (recvline(data, &buf)){
> +			release_helper(transport);
>  			exit(128);
> +		}

This, together with the other exit(128) calls we see in this patch,
now has release_helper() in front of it, which is in line with what
the log message claims that the patch does.

I however wonder if we want to do a bit more, perhaps with atexit().
I am not hinting-suggesting to do so (as you said, if the init
process ought to take care of the zombies, the patch under review
might already be unneeded, and atexit() makes things even worse),
but I am having trouble convincing myself that this patch stops at
the right place.

Thanks.
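
For illustration only, and not as a recommendation, the atexit() idea
mentioned above could look roughly like the sketch below. It is plain
POSIX code, not git code: helper_pid, helper_in, start_helper() and
reap_helper() are hypothetical names, and the "helper" is just a
forked child that reads its input until EOF.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t helper_pid = -1; /* hypothetical: the one child we track */
static int helper_in = -1;    /* write end of the child's input pipe */

/* Close the child's input so it sees EOF, then wait for it. */
static void reap_helper(void)
{
	if (helper_pid <= 0)
		return;
	if (helper_in >= 0)
		close(helper_in);
	while (waitpid(helper_pid, NULL, 0) < 0 && errno == EINTR)
		; /* retry if interrupted by a signal */
	helper_pid = -1;
	helper_in = -1;
}

/* Fork a child that drains its input; register the cleanup once. */
static int start_helper(void)
{
	static int registered;
	int fds[2];

	if (pipe(fds) < 0)
		return -1;
	helper_pid = fork();
	if (helper_pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (helper_pid == 0) {
		char c;
		close(fds[1]);
		while (read(fds[0], &c, 1) > 0)
			; /* stand-in for the helper's real work */
		_exit(0);
	}
	close(fds[0]);
	helper_in = fds[1];
	if (!registered) {
		atexit(reap_helper);
		registered = 1;
	}
	return 0;
}
```

Handlers registered with atexit() run on exit() and on return from
main(), but not on _exit() or when the process is killed by a signal,
so this covers the exit(128) paths and nothing beyond them.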

Jeff King July 23, 2019, 7:40 p.m. UTC | #2
On Tue, Jul 23, 2019 at 10:33:10AM -0700, Junio C Hamano wrote:

> > +		if (recvline(data, &buf)){
> > +			release_helper(transport);
> >  			exit(128);
> > +		}
> 
> This, together with the other exit(128) calls we see in this patch,
> now has release_helper() in front of it, which is in line with what
> the log message claims that the patch does.
> 
> I however wonder if we want to do a bit more, perhaps with atexit().
> I am not hinting-suggesting to do so (as you said, if the init
> process ought to take care of the zombies, the patch under review
> might already be unneeded, and atexit() makes things even worse),
> but I am having trouble convincing myself that this patch stops at
> the right place.

I was just writing a similar comment when I read this. It probably fixes
the particular case the OP saw, but Git is quite happy to die() in
various code-paths when it encounters an error.

Rather than try to annotate every possible exit, atexit() might be a
better solution. But isn't this a more general problem even than that?
Lots of parts of Git may spawn a sub-process and assume that the
children will be reaped automatically (as long as they do exit, but that
is usually guaranteed by us closing their input pipes when we ourselves
exit).

So I think you'd have to atexit() quite a few places. Or possibly
instrument run_command() to do so, though it might need some extra
annotation to mark whether a particular sub-process should be waited for
(there is some prior art in the child_process.clean_on_exit option).

At which point I do wonder if this is better handled by a wrapper
process which simply reaps everything. And indeed, people have already
come up with similar solutions for containers:

  https://github.com/Yelp/dumb-init

So I dunno. I am not really opposed to this patch, as it is just adding
some extra cleanup. But it seems like it is really hitting the tip of
the iceberg, and I'm not sure it's an iceberg I'd like to get to the
bottom of.

-Peff
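
A rough sketch of the "wrapper process which simply reaps everything"
idea mentioned above, in plain POSIX C. This is not how dumb-init is
implemented; run_and_reap() is a hypothetical name, and the sketch
leaves out signal forwarding and everything else a real init does.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv as a child, then wait for every child this process has,
 * not only the one it started. When this runs as PID 1 in a
 * container, orphaned processes are reparented to it and are reaped
 * by the same loop.
 *
 * Returns the command's exit status (128 + signal number if it was
 * killed by a signal), or -1 if fork() failed.
 */
static int run_and_reap(char *const argv[])
{
	int status = 0;
	int code = -1;
	pid_t child = fork();

	if (child < 0)
		return -1;
	if (child == 0) {
		execvp(argv[0], argv);
		_exit(127); /* exec failed */
	}
	for (;;) {
		pid_t pid = waitpid(-1, &status, 0);
		if (pid < 0) {
			if (errno == EINTR)
				continue;
			break; /* ECHILD: nothing left to reap */
		}
		if (pid == child)
			code = WIFEXITED(status) ? WEXITSTATUS(status)
						 : 128 + WTERMSIG(status);
	}
	return code;
}
```

A wrapper built around this would call run_and_reap() with the git
command line and exit with the returned code, so no change to git
itself would be needed.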

Junio C Hamano July 23, 2019, 9:10 p.m. UTC | #3
Jeff King <peff@peff.net> writes:

> ...
> At which point I do wonder if this is better handled by a wrapper
> process which simply reaps everything. And indeed, people have already
> come up with similar solutions for containers:
>
>   https://github.com/Yelp/dumb-init
>
> So I dunno. I am not really opposed to this patch, as it is just adding
> some extra cleanup. But it seems like it is really hitting the tip of
> the iceberg, and I'm not sure it's an iceberg I'd like to get to the
> bottom of.

Thanks for stating what I had in mind a lot better than I said ;-)

Patch

diff --git a/transport-helper.c b/transport-helper.c
index cec83bd663..34caa75a72 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -101,6 +101,12 @@  static void do_take_over(struct transport *transport)
 
 static void standard_options(struct transport *t);
 
+static int disconnect_helper(struct transport *transport);
+
+static int disconnect_helper_data(struct helper_data *data);
+
+static int release_helper(struct transport *transport);
+
 static struct child_process *get_helper(struct transport *transport)
 {
 	struct helper_data *data = transport->data;
@@ -155,8 +161,10 @@  static struct child_process *get_helper(struct transport *transport)
 	while (1) {
 		const char *capname, *arg;
 		int mandatory = 0;
-		if (recvline(data, &buf))
+		if (recvline(data, &buf)){
+			release_helper(transport);
 			exit(128);
+		}
 
 		if (!*buf.buf)
 			break;
@@ -215,10 +223,14 @@  static struct child_process *get_helper(struct transport *transport)
 
 static int disconnect_helper(struct transport *transport)
 {
-	struct helper_data *data = transport->data;
+	return disconnect_helper_data(transport->data);
+}
+
+static int disconnect_helper_data(struct helper_data *data)
+{
 	int res = 0;
 
-	if (data->helper) {
+	if (data && data->helper) {
 		if (debug)
 			fprintf(stderr, "Debug: Disconnecting.\n");
 		if (!data->no_disconnect_req) {
@@ -261,8 +273,10 @@  static int strbuf_set_helper_option(struct helper_data *data,
 	int ret;
 
 	sendline(data, buf);
-	if (recvline(data, buf))
+	if (recvline(data, buf)) {
+		disconnect_helper_data(data);
 		exit(128);
+	}
 
 	if (!strcmp(buf->buf, "ok"))
 		ret = 0;
@@ -366,10 +380,12 @@  static void standard_options(struct transport *t)
 static int release_helper(struct transport *transport)
 {
 	int res = 0;
-	struct helper_data *data = transport->data;
-	refspec_clear(&data->rs);
-	res = disconnect_helper(transport);
-	free(transport->data);
+	if (transport){
+		struct helper_data *data = transport->data;
+		refspec_clear(&data->rs);
+		res = disconnect_helper(transport);
+		free(transport->data);
+	}
 	return res;
 }
 
@@ -394,8 +410,10 @@  static int fetch_with_fetch(struct transport *transport,
 	sendline(data, &buf);
 
 	while (1) {
-		if (recvline(data, &buf))
+		if (recvline(data, &buf)){
+			release_helper(transport);
 			exit(128);
+		}
 
 		if (starts_with(buf.buf, "lock ")) {
 			const char *name = buf.buf + 5;
@@ -561,8 +579,10 @@  static int run_connect(struct transport *transport, struct strbuf *cmdbuf)
 	setvbuf(input, NULL, _IONBF, 0);
 
 	sendline(data, cmdbuf);
-	if (recvline_fh(input, cmdbuf))
+	if (recvline_fh(input, cmdbuf)){
+		release_helper(transport);
 		exit(128);
+	}
 
 	if (!strcmp(cmdbuf->buf, "")) {
 		data->no_disconnect_req = 1;
@@ -1074,8 +1094,10 @@  static struct ref *get_refs_list(struct transport *transport, int for_push,
 
 	while (1) {
 		char *eov, *eon;
-		if (recvline(data, &buf))
+		if (recvline(data, &buf)){
+			release_helper(transport);
 			exit(128);
+		}
 
 		if (!*buf.buf)
 			break;