diff mbox series

[v2,7/8] serve: add support for a "startup" git_config() callback

Message ID patch-7.8-0a4fb01ae38-20210628T191634Z-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series serve: add "startup_config" callback | expand

Commit Message

Ævar Arnfjörð Bjarmason June 28, 2021, 7:19 p.m. UTC
Since the introduction of serve.c we've added git_config() callbacks
and other config reading for capabilities in the following commits:

- e20b4192a37 (upload-pack: support hidden refs with protocol v2, 2018-12-18)
- 08450ef7918 (upload-pack: clear filter_options for each v2 fetch command, 2020-05-08)
- 6b5b6e422ee (serve: advertise session ID in v2 capabilities, 2020-11-11)
- 59e1205d167 (ls-refs: report unborn targets of symrefs, 2021-02-05)

Of these 08450ef7918 fixed code that needed to read config on a
per-request basis, whereas most of the config reading just wants to
check if we've enabled one semi-static config variable or other. We'd
like to re-read that value eventually, but from request-to-request
it's OK if we retain the old one, and it isn't impacted by other
request data.

So let's support this common pattern as a "startup_config" callback,
making use of our recently added "call_{advertise,command}()"
functions. This allows us to simplify e.g. the "ensure_config_read()"
function added in 59e1205d167 (ls-refs: report unborn targets of
symrefs, 2021-02-05).

We could read all the config for all the protocol capabilities, but
let's do it one callback at a time in anticipation that some won't be
called at all, and that some might be more expensive than others in
the future.

I'm not migrating over the code in the upload_pack_v2 function in
upload-pack.c yet, that case is more complex since it deals with both
v1 and v2. It will be dealt with in a code a subsequent commit.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 ls-refs.c | 42 ++++++++++++++----------------------------
 ls-refs.h |  1 +
 serve.c   | 42 ++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 55 insertions(+), 30 deletions(-)

Comments

Jeff King July 1, 2021, 4:43 p.m. UTC | #1
On Mon, Jun 28, 2021 at 09:19:24PM +0200, Ævar Arnfjörð Bjarmason wrote:

> So let's support this common pattern as a "startup_config" callback,
> making use of our recently added "call_{advertise,command}()"
> functions. This allows us to simplify e.g. the "ensure_config_read()"
> function added in 59e1205d167 (ls-refs: report unborn targets of
> symrefs, 2021-02-05).
> 
> We could read all the config for all the protocol capabilities, but
> let's do it one callback at a time in anticipation that some won't be
> called at all, and that some might be more expensive than others in
> the future.

Sadly I don't think this addresses my "v2 receive-pack" concern. The
ls_refs_startup_config() function will get called after we've received a
request for "ls-refs", which is good. But:

> +static void read_startup_config(struct protocol_capability *command)
> +{
> +	if (!command->startup_config)
> +		return;
> +	if (command->have_startup_config++)
> +		return;
> +	git_config(command->startup_config, NULL);
> +}

...we don't pass any context to the config callback here. I thought
passing "command" might work, but looking at the ls_refs() function, it
is the one who actually reads the pkt-lines that will tell us "hey, I'm
doing an ls-refs for a push".

So none of the serve() infrastructure can help us there; we need to read
pkt-lines and _then_ read config.

I dunno. Maybe the solution is for ls_refs() to just do a separate
config call to pick up the operation-specific bits, like:

diff --git a/ls-refs.c b/ls-refs.c
index 88f6c3f60d..6ee70126aa 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -130,12 +130,13 @@ static void send_possibly_unborn_head(struct ls_refs_data *data)
 
 static int ls_refs_config(const char *var, const char *value, void *data)
 {
+	struct ls_refs_data *d = data;
 	/*
 	 * We only serve fetches over v2 for now, so respect only "uploadpack"
 	 * config. This may need to eventually be expanded to "receive", but we
 	 * don't yet know how that information will be passed to ls-refs.
 	 */
-	return parse_hide_refs_config(var, value, "uploadpack");
+	return parse_hide_refs_config(var, value, d->for_push ? "receive" : "uploadpack");
 }
 
 int ls_refs(struct repository *r, struct strvec *keys,
@@ -147,7 +148,6 @@ int ls_refs(struct repository *r, struct strvec *keys,
 	strvec_init(&data.prefixes);
 
 	ensure_config_read();
-	git_config(ls_refs_config, NULL);
 
 	while (packet_reader_read(request) == PACKET_READ_NORMAL) {
 		const char *arg = request->line;
@@ -161,8 +161,12 @@ int ls_refs(struct repository *r, struct strvec *keys,
 			strvec_push(&data.prefixes, out);
 		else if (!strcmp("unborn", arg))
 			data.unborn = allow_unborn;
+		else if (!strcmp("for-push", arg)) /* imagine this exists */
+			data.for_push = 1;
 	}
 
+	git_config(ls_refs_config, &data);
+
 	if (request->status != PACKET_READ_FLUSH)
 		die(_("expected flush after ls-refs arguments"));
 

And then it is a separate thing entirely from your "serve will read
config" code. It's arguably more confusing, because now config is read
in two places, but that is already true because of this
ensure_config_read() thing.

The patch above obviously makes no sense until we're working on v2
receive-pack. But I think it illustrates that your patch here is not
getting in the way (though technically I think that would also be true
of your v1, I do like this version better).

-Peff
Jeff King July 1, 2021, 4:47 p.m. UTC | #2
On Thu, Jul 01, 2021 at 12:43:43PM -0400, Jeff King wrote:

> I dunno. Maybe the solution is for ls_refs() to just do a separate
> config call to pick up the operation-specific bits, like:

By the way, I think both currently and after the patch I showed,
ls_refs() has the same "bug" that we fixed for upload_pack_v2() a while
ago: in a v2 world, a client could request "ls-refs" over and over, and
each time we'd load the hiderefs config, appending duplicate config to
the list each time.

In practice this doesn't happen because unlike "fetch", which clients
must do many rounds of, clients usually issue only a single ls-refs. So
it may not be worth worrying too much about. I guess a malicious client
could convince us to very slowly allocate an arbitrary amount of memory.

-Peff
Ævar Arnfjörð Bjarmason July 2, 2021, 12:55 p.m. UTC | #3
On Thu, Jul 01 2021, Jeff King wrote:

> On Mon, Jun 28, 2021 at 09:19:24PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> So let's support this common pattern as a "startup_config" callback,
>> making use of our recently added "call_{advertise,command}()"
>> functions. This allows us to simplify e.g. the "ensure_config_read()"
>> function added in 59e1205d167 (ls-refs: report unborn targets of
>> symrefs, 2021-02-05).
>> 
>> We could read all the config for all the protocol capabilities, but
>> let's do it one callback at a time in anticipation that some won't be
>> called at all, and that some might be more expensive than others in
>> the future.
>
> Sadly I don't think this addresses my "v2 receive-pack" concern. The
> ls_refs_startup_config() function will get called after we've received a
> request for "ls-refs", which is good. But:

I think it does in that you rightly objected to us moving all config to
such a callback, because for some of it we don't have the information
needed to look it up yet, we do that in the request handler.

But for a lot of our config it's fine to do it early, hence "startup"
config.

Yes I've moved the ls-refs handling into the "startup" because /right
now/ it's only handling fetches, it'll need to be moved out if and when
we start handling pushes.

But isn't it going to be obvious that we'll need to do that then? Since
we'll have the example of upload-pack.c doing that exact thing?

I.e. do you not want to have the "startup config" concept at all, or
would just prefer to have the ls-refs part of it pre-emotively moved out
of it in anticipation of handling pushes some day, even if we can do
that on "startup" now?

(More below)

>> +static void read_startup_config(struct protocol_capability *command)
>> +{
>> +	if (!command->startup_config)
>> +		return;
>> +	if (command->have_startup_config++)
>> +		return;
>> +	git_config(command->startup_config, NULL);
>> +}
>
> ...we don't pass any context to the config callback here. I thought
> passing "command" might work, but looking at the ls_refs() function, it
> is the one who actually reads the pkt-lines that will tell us "hey, I'm
> doing an ls-refs for a push".
>
> So none of the serve() infrastructure can help us there; we need to read
> pkt-lines and _then_ read config.
>
> I dunno. Maybe the solution is for ls_refs() to just do a separate
> config call to pick up the operation-specific bits, like:
>
> diff --git a/ls-refs.c b/ls-refs.c
> index 88f6c3f60d..6ee70126aa 100644
> --- a/ls-refs.c
> +++ b/ls-refs.c
> @@ -130,12 +130,13 @@ static void send_possibly_unborn_head(struct ls_refs_data *data)
>  
>  static int ls_refs_config(const char *var, const char *value, void *data)
>  {
> +	struct ls_refs_data *d = data;
>  	/*
>  	 * We only serve fetches over v2 for now, so respect only "uploadpack"
>  	 * config. This may need to eventually be expanded to "receive", but we
>  	 * don't yet know how that information will be passed to ls-refs.
>  	 */
> -	return parse_hide_refs_config(var, value, "uploadpack");
> +	return parse_hide_refs_config(var, value, d->for_push ? "receive" : "uploadpack");
>  }
>  
>  int ls_refs(struct repository *r, struct strvec *keys,
> @@ -147,7 +148,6 @@ int ls_refs(struct repository *r, struct strvec *keys,
>  	strvec_init(&data.prefixes);
>  
>  	ensure_config_read();
> -	git_config(ls_refs_config, NULL);
>  
>  	while (packet_reader_read(request) == PACKET_READ_NORMAL) {
>  		const char *arg = request->line;
> @@ -161,8 +161,12 @@ int ls_refs(struct repository *r, struct strvec *keys,
>  			strvec_push(&data.prefixes, out);
>  		else if (!strcmp("unborn", arg))
>  			data.unborn = allow_unborn;
> +		else if (!strcmp("for-push", arg)) /* imagine this exists */
> +			data.for_push = 1;
>  	}
>  
> +	git_config(ls_refs_config, &data);
> +
>  	if (request->status != PACKET_READ_FLUSH)
>  		die(_("expected flush after ls-refs arguments"));
>  
>
> And then it is a separate thing entirely from your "serve will read
> config" code. It's arguably more confusing, because now config is read
> in two places, but that is already true because of this
> ensure_config_read() thing.

This suggests to me you'd like to preemptively move it out of "startup",
correct?

Anyway, I can do that if that addresses your concern. I thought the v1
objection was mainly "the config flow won't work that way in all cases",
which you're right that I incorrectly assumed.

I just thought preemptively doing it for "ls-refs" wouldn't be needed,
since we'd notice in testing such a feature that "do it once" would
break in obvious ways for multi-requests, especially with the comment
for the "startup_config" callback explicitly calling out that case.

> The patch above obviously makes no sense until we're working on v2
> receive-pack. But I think it illustrates that your patch here is not
> getting in the way (though technically I think that would also be true
> of your v1, I do like this version better).
Jeff King July 2, 2021, 9:13 p.m. UTC | #4
On Fri, Jul 02, 2021 at 02:55:38PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Sadly I don't think this addresses my "v2 receive-pack" concern. The
> > ls_refs_startup_config() function will get called after we've received a
> > request for "ls-refs", which is good. But:
> 
> I think it does in that you rightly objected to us moving all config to
> such a callback, because for some of it we don't have the information
> needed to look it up yet, we do that in the request handler.
> 
> But for a lot of our config it's fine to do it early, hence "startup"
> config.

So I think your patch is OK, but just to lay out my thinking, it was
this. The v2 serve code basically does:

  1. Somebody calls serve().

  2. It reads a capability/operation name, then dispatches to the
     appropriate function.

  3. That function reads operation-specific data, like options.

Your first patch read config at step 1. This one reads at step 2.

But from past discussions, the hide-refs config could only be read (or
interpreted) after step 3, because that's when we'd see an option saying
"this is for pushing".

So anything except step 3 is "too late". But that doesn't stop us from
having some early config and some operation-specific late config.

> Yes I've moved the ls-refs handling into the "startup" because /right
> now/ it's only handling fetches, it'll need to be moved out if and when
> we start handling pushes.
> 
> But isn't it going to be obvious that we'll need to do that then? Since
> we'll have the example of upload-pack.c doing that exact thing?
>
> I.e. do you not want to have the "startup config" concept at all, or
> would just prefer to have the ls-refs part of it pre-emotively moved out
> of it in anticipation of handling pushes some day, even if we can do
> that on "startup" now?

Sorry to be so confusing. About halfway through writing my previous
email I had realized "well, I guess we can have two phases of
config-reading, and that's not so bad". So I'm OK with this direction.

I don't think anything else needs to happen on top of your patch here.
The "late" phase for ls-refs config reading already happens totally
separately (it's in the git_config(ls_refs_config) call, which is
already separate from the ensure_config_read() thing. And your patch
leaves that line untouched.

> > And then it is a separate thing entirely from your "serve will read
> > config" code. It's arguably more confusing, because now config is read
> > in two places, but that is already true because of this
> > ensure_config_read() thing.
> 
> This suggests to me you'd like to preemptively move it out of "startup",
> correct?

It's already out of the startup, so I think we are OK.

My big question here was: is the concept of "startup config" and "other
config we read after seeing the options" too confusing? But probably
not.

> I just thought preemptively doing it for "ls-refs" wouldn't be needed,
> since we'd notice in testing such a feature that "do it once" would
> break in obvious ways for multi-requests, especially with the comment
> for the "startup_config" callback explicitly calling out that case.

Yeah, we'd definitely notice. I was just wondering if we'd end up
needing to back out these changes to accommodate it. But the two-phase
thing solves that.

-Peff
Han-Wen Nienhuys July 5, 2021, 12:23 p.m. UTC | #5
On Mon, Jun 28, 2021 at 9:19 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> So let's support this common pattern as a "startup_config" callback,
> making use of our recently added "call_{advertise,command}()"

I think this is an improvement over ensure_config_read(), but I don't
understand the whole picture. If you want to control configuration and
be sure it is set early, isn't it more natural to put this in struct
serve_options? With your patch, there are still two routes to
injecting configuration: 'struct serve_options' and the git config.

IMO the former is more principled, and the control flow (serve_options
is an argument to serve()) ensures that the options are read only
once.

> +       /*
> +        * A git_config() callback that'll be called only once for the
> +        * lifetime of the process, possibly over many different

Putting it in a struct supplied by the caller provides more
flexibility for testing: you could test multiple behaviors from a
single process, by calling serve multiple times supplying different
arguments.
Han-Wen Nienhuys July 5, 2021, 12:34 p.m. UTC | #6
On Mon, Jun 28, 2021 at 9:19 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>  /* Main serve loop for protocol version 2 */
>  void serve(struct serve_options *options)

side note: this is a very short name for a globally visible symbol.
Would be nice if this were something like protocol_v2_serve_loop() .
Then one could drop the comment.
diff mbox series

Patch

diff --git a/ls-refs.c b/ls-refs.c
index 88f6c3f60d8..02fbcfd9bde 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -7,37 +7,26 @@ 
 #include "pkt-line.h"
 #include "config.h"
 
-static int config_read;
-static int advertise_unborn;
-static int allow_unborn;
+/* "unborn" is on by default if there's no lsrefs.unborn config */
+static int advertise_unborn = 1;
+static int allow_unborn = 1;
 
-static void ensure_config_read(void)
+int ls_refs_startup_config(const char *var, const char *value, void *data)
 {
-	const char *str = NULL;
-
-	if (config_read)
-		return;
-
-	if (repo_config_get_string_tmp(the_repository, "lsrefs.unborn", &str)) {
-		/*
-		 * If there is no such config, advertise and allow it by
-		 * default.
-		 */
-		advertise_unborn = 1;
-		allow_unborn = 1;
-	} else {
-		if (!strcmp(str, "advertise")) {
-			advertise_unborn = 1;
+	if (!strcmp(var, "lsrefs.unborn")) {
+		if (!strcmp(value, "advertise")) {
+			/* Allowed and advertised by default */
+		} else if (!strcmp(value, "allow")) {
+			advertise_unborn = 0;
 			allow_unborn = 1;
-		} else if (!strcmp(str, "allow")) {
-			allow_unborn = 1;
-		} else if (!strcmp(str, "ignore")) {
-			/* do nothing */
+		} else if (!strcmp(value, "ignore")) {
+			advertise_unborn = 0;
+			allow_unborn = 0;
 		} else {
-			die(_("invalid value '%s' for lsrefs.unborn"), str);
+			die(_("invalid value '%s' for lsrefs.unborn"), value);
 		}
 	}
-	config_read = 1;
+	return 0;
 }
 
 /*
@@ -145,8 +134,6 @@  int ls_refs(struct repository *r, struct strvec *keys,
 
 	memset(&data, 0, sizeof(data));
 	strvec_init(&data.prefixes);
-
-	ensure_config_read();
 	git_config(ls_refs_config, NULL);
 
 	while (packet_reader_read(request) == PACKET_READ_NORMAL) {
@@ -179,7 +166,6 @@  int ls_refs(struct repository *r, struct strvec *keys,
 int ls_refs_advertise(struct repository *r, struct strbuf *value)
 {
 	if (value) {
-		ensure_config_read();
 		if (advertise_unborn)
 			strbuf_addstr(value, "unborn");
 	}
diff --git a/ls-refs.h b/ls-refs.h
index a99e4be0bdd..5fcab0d1dca 100644
--- a/ls-refs.h
+++ b/ls-refs.h
@@ -6,6 +6,7 @@  struct strvec;
 struct packet_reader;
 int ls_refs(struct repository *r, struct strvec *keys,
 	    struct packet_reader *request);
+int ls_refs_startup_config(const char *var, const char *value, void *data);
 int ls_refs_advertise(struct repository *r, struct strbuf *value);
 
 #endif /* LS_REFS_H */
diff --git a/serve.c b/serve.c
index 6dbd05248b9..cce96dd5a81 100644
--- a/serve.c
+++ b/serve.c
@@ -33,6 +33,13 @@  static int object_format_advertise(struct repository *r,
 	return 1;
 }
 
+static int session_id_startup_config(const char *var, const char *value, void *data)
+{
+	if (!strcmp(var, "transfer.advertisesid"))
+		advertise_sid = git_config_bool(var, value);
+	return 0;
+}
+
 static int session_id_advertise(struct repository *r, struct strbuf *value)
 {
 	if (!advertise_sid)
@@ -54,6 +61,24 @@  struct protocol_capability {
 	 */
 	const char *name;
 
+	/*
+	 * A git_config() callback that'll be called only once for the
+	 * lifetime of the process, possibly over many different
+	 * requests. Used for reading config that's expected to be
+	 * static.
+	 *
+	 * The "command" or "advertise" callbacks themselves are
+	 * expected to read config that needs to be more current than
+	 * that, or which is dependent on request data.
+	 */
+	int (*startup_config)(const char *var, const char *value, void *data);
+
+	/*
+	 * A boolean to check if we've called our "startup_config"
+	 * callback.
+	 */
+	int have_startup_config;
+
 	/*
 	 * Function queried to see if a capability should be advertised.
 	 * Optionally a value can be specified by adding it to 'value'.
@@ -81,6 +106,7 @@  static struct protocol_capability capabilities[] = {
 	},
 	{
 		.name = "ls-refs",
+		.startup_config = ls_refs_startup_config,
 		.advertise = ls_refs_advertise,
 		.command = ls_refs,
 	},
@@ -99,6 +125,7 @@  static struct protocol_capability capabilities[] = {
 	},
 	{
 		.name = "session-id",
+		.startup_config = session_id_startup_config,
 		.advertise = session_id_advertise,
 	},
 	{
@@ -108,6 +135,15 @@  static struct protocol_capability capabilities[] = {
 	},
 };
 
+static void read_startup_config(struct protocol_capability *command)
+{
+	if (!command->startup_config)
+		return;
+	if (command->have_startup_config++)
+		return;
+	git_config(command->startup_config, NULL);
+}
+
 static int call_advertise(struct protocol_capability *command,
 			  struct repository *r, struct strbuf *value)
 {
@@ -115,6 +151,8 @@  static int call_advertise(struct protocol_capability *command,
 	struct strbuf sb = STRBUF_INIT;
 	const char *msg;
 
+	read_startup_config(command);
+
 	strbuf_addf(&sb, "advertise/%s", command->name);
 	trace2_region_enter("serve", sb.buf, r);
 	ret = command->advertise(r, value);
@@ -132,6 +170,8 @@  static int call_command(struct protocol_capability *command,
 	int ret;
 	struct strbuf sb = STRBUF_INIT;
 
+	read_startup_config(command);
+
 	strbuf_addf(&sb, "command/%s", command->name);
 	trace2_region_enter("serve", sb.buf, r);
 	ret = command->command(r, keys, request);
@@ -338,8 +378,6 @@  static int process_request(void)
 /* Main serve loop for protocol version 2 */
 void serve(struct serve_options *options)
 {
-	git_config_get_bool("transfer.advertisesid", &advertise_sid);
-
 	if (options->advertise_capabilities || !options->stateless_rpc) {
 		/* serve by default supports v2 */
 		packet_write_fmt(1, "version 2\n");