Message ID | 20250127151701.2321341-6-christian.couder@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Series | Introduce a "promisor-remote" capability |
Christian Couder <christian.couder@gmail.com> writes:

> A previous commit introduced a "promisor.acceptFromServer" configuration
> variable with only "None" or "All" as valid values.
>
> Let's introduce "KnownName" and "KnownUrl" as valid values for this
> configuration option to give more choice to a client about which
> promisor remotes it might accept among those that the server advertised.

OK.

>  promisor.acceptFromServer::
>  	If set to "all", a client will accept all the promisor remotes
>  	a server might advertise using the "promisor-remote"
> -	capability. Default is "none", which means no promisor remote
> -	advertised by a server will be accepted. By accepting a
> -	promisor remote, the client agrees that the server might omit
> -	objects that are lazily fetchable from this promisor remote
> -	from its responses to "fetch" and "clone" requests from the
> -	client. See linkgit:gitprotocol-v2[5].
> +	capability. If set to "knownName" the client will accept
> +	promisor remotes which are already configured on the client
> +	and have the same name as those advertised by the client. This
> +	is not very secure, but could be used in a corporate setup
> +	where servers and clients are trusted to not switch name and
> +	URLs.

I wonder if the reader needs to be told a bit more about the security argument here. I imagine that the attack vector behind the use of "secure" in the above paragraph is for a malicious server that guesses a promisor remote name the client already uses, which has a different URL from what the client expects to be associated with the name, thereby such an acceptance means that the URL used in future fetches would be replaced without the user's consent. Being able to silently repoint the remote.origin.url at an evil repository you control is indeed a powerful thing, I would guess. Of course, in a corp environment, such a mechanism to drive the clients to a new repository after upgrading or migrating may be extremely handy.
Or does the above paragraph assume some other attack vectors, perhaps?

> +	If set to "knownUrl", the client will accept promisor
> +	remotes which have both the same name and the same URL
> +	configured on the client as the name and URL advertised by the
> +	server. This is more secure than "all" or "knownUrl", so it
> +	should be used if possible instead of those options. Default
> +	is "none", which means no promisor remote advertised by a
> +	server will be accepted.

OK.

> diff --git a/promisor-remote.c b/promisor-remote.c
> index 5ac282ed27..790a96aa19 100644
> --- a/promisor-remote.c
> +++ b/promisor-remote.c
> @@ -370,30 +370,73 @@ char *promisor_remote_info(struct repository *repo)
>  	return strbuf_detach(&sb, NULL);
>  }
>
> +/*
> + * Find first index of 'vec' where there is 'val'. 'val' is compared
> + * case insensively to the strings in 'vec'. If not found 'vec->nr' is
> + * returned.
> + */
> +static size_t strvec_find_index(struct strvec *vec, const char *val)
> +{
> +	for (size_t i = 0; i < vec->nr; i++)
> +		if (!strcasecmp(vec->v[i], val))
> +			return i;
> +	return vec->nr;
> +}

Hmph, without the hardcoded strcasecmp(), strvec_find() might make a fine public API in <strvec.h>.

Unless we intend to create a generic function that qualifies as a part of the public strvec API, we shouldn't call it strvec_anything. This is a great helper that finds a matching remote nickname from a list of remote nicknames, so

    remote_nick_find(struct strvec *nicks, const char *nick)

may be more appropriate.
When we lift it out of here and make it more generic to move it to strvec.[ch], perhaps

    size_t strvec_find(struct strvec *vec, const void *needle,
		       int (*match)(const char *, const void *))
    {
	    for (size_t ix = 0; ix < vec->nr; ix++)
		    if (match(vec->v[ix], needle))
			    return ix;
	    return vec->nr;
    }

which will be used to rewrite remote_nick_find() like so:

    static int nicks_match(const char *nick, const void *needle)
    {
	    return !strcasecmp(nick, (const char *)needle);
    }

    static size_t remote_nick_find(struct strvec *nicks, const char *nick)
    {
	    return strvec_find(nicks, nick, nicks_match);
    }

it would be better to use a more generic parameter name "vec", but until then, it is better to be more specific and explicit about why the immediate callers call the function, which is where my "nicks" vs "nick" comes from (it is OK to call the latter "needle", though).

>  enum accept_promisor {
>  	ACCEPT_NONE = 0,
> +	ACCEPT_KNOWN_URL,
> +	ACCEPT_KNOWN_NAME,
>  	ACCEPT_ALL
>  };
>
>  static int should_accept_remote(enum accept_promisor accept,
> -				const char *remote_name UNUSED,
> -				const char *remote_url UNUSED)
> +				const char *remote_name, const char *remote_url,
> +				struct strvec *names, struct strvec *urls)
>  {
> +	size_t i;
> +
>  	if (accept == ACCEPT_ALL)
>  		return 1;
>
> -	BUG("Unhandled 'enum accept_promisor' value '%d'", accept);
> +	i = strvec_find_index(names, remote_name);
> +
> +	if (i >= names->nr)
> +		/* We don't know about that remote */
> +		return 0;

OK.

> +	if (accept == ACCEPT_KNOWN_NAME)
> +		return 1;
> +
> +	if (accept != ACCEPT_KNOWN_URL)
> +		BUG("Unhandled 'enum accept_promisor' value '%d'", accept);

I can see why this defensiveness may be a better idea than not having any, but I wonder if we can take advantage of compile time checks some compilers have to ensure that case arms in a switch statement are exhaustive?

> +	if (!strcasecmp(urls->v[i], remote_url))
> +		return 1;

This is iffy.
The <schema>://<host>/ part might want to be compared case insensitively, but the rest of the URL is generally case sensitive (unless the material served is stored on a machine with a case-insensitive filesystem)?

Given that the existing URL must have come by either cloning from this server or another related server or by an earlier acceptFromServer behaviour, I do not see a need for being extra lax here. We should be more careful about our use of case-insensitive comparison, and I do not see how this URL comparison could be something the end users would expect to be done case insensitively.

> -static void filter_promisor_remote(struct strvec *accepted, const char *info)
> +static void filter_promisor_remote(struct repository *repo,
> +				   struct strvec *accepted,
> +				   const char *info)
>  {
>  	struct strbuf **remotes;
>  	const char *accept_str;
>  	enum accept_promisor accept = ACCEPT_NONE;
> +	struct strvec names = STRVEC_INIT;
> +	struct strvec urls = STRVEC_INIT;
>
>  	if (!git_config_get_string_tmp("promisor.acceptfromserver", &accept_str)) {
>  		if (!accept_str || !*accept_str || !strcasecmp("None", accept_str))

Not a fault of this step, but is it sensible to even expect !accept_str in an error case? *accept_str could be NUL, but accept_str is either left uninitialized (because this caller does not initialize it) when the get_string_tmp() returns non-zero, or points at the internal cached value in the config_set if it returns 0 (and the control comes into this block).

>  			accept = ACCEPT_NONE;
> +		else if (!strcasecmp("KnownUrl", accept_str))
> +			accept = ACCEPT_KNOWN_URL;
> +		else if (!strcasecmp("KnownName", accept_str))
> +			accept = ACCEPT_KNOWN_NAME;
>  		else if (!strcasecmp("All", accept_str))
>  			accept = ACCEPT_ALL;
>  		else

Ditto about icase for all of the above.

> +test_expect_success "clone with 'KnownUrl' and different remote urls" '
> +	ln -s server2 serverTwo &&
> +
> +	git -C server config promisor.advertise true &&
> +
> +	# Clone from server to create a client
> +	GIT_NO_LAZY_FETCH=0 git clone -c remote.server2.promisor=true \
> +		-c remote.server2.fetch="+refs/heads/*:refs/remotes/server2/*" \
> +		-c remote.server2.url="file://$(pwd)/serverTwo" \
> +		-c promisor.acceptfromserver=KnownUrl \
> +		--no-local --filter="blob:limit=5k" server client &&
> +	test_when_finished "rm -rf client" &&
> +
> +	# Check that the largest object is not missing on the server
> +	check_missing_objects server 0 "" &&
> +
> +	# Reinitialize server so that the largest object is missing again
> +	initialize_server 1 "$oid"
> +'

Nice ;-)

Here, I also notice that we are not testing that serverTwo and servertwo are considered the same thanks to the use of icase comparison. We shouldn't compare URLs with strcasecmp().

Thanks.
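On the question above about compile-time exhaustiveness checks, here is a minimal sketch, assuming GCC or Clang with -Wswitch (enabled by -Wall), which warns when a switch over an enum has no "default:" label and misses an enumerator. The enum mirrors the one in the patch. accept_decision() and its name_known and url_matches parameters are hypothetical simplifications that stand in for the strvec lookups; this is not the code in promisor-remote.c.

```c
#include <assert.h>

enum accept_promisor {
	ACCEPT_NONE = 0,
	ACCEPT_KNOWN_URL,
	ACCEPT_KNOWN_NAME,
	ACCEPT_ALL
};

/*
 * Hypothetical stand-in for should_accept_remote(): the caller has
 * already looked up the advertised remote, so the lookups are reduced
 * to two flags. Returns 1 to accept the remote, 0 to reject it.
 */
static int accept_decision(enum accept_promisor accept,
			   int name_known, int url_matches)
{
	/*
	 * No "default:" label on purpose: with -Wswitch the compiler
	 * warns here when a new enumerator is added but not handled.
	 */
	switch (accept) {
	case ACCEPT_NONE:
		return 0;
	case ACCEPT_KNOWN_URL:
		return name_known && url_matches;
	case ACCEPT_KNOWN_NAME:
		return name_known;
	case ACCEPT_ALL:
		return 1;
	}

	/* Only reachable with an out-of-range value; git would use BUG(). */
	assert(!"unhandled 'enum accept_promisor' value");
	return 0;
}
```

The trade-off raised later in the thread still applies: the two "known" cases share the name lookup, so a switch in the real function would need that lookup hoisted above it or repeated in both arms.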
Junio C Hamano <gitster@pobox.com> writes:

>> +	if (!strcasecmp(urls->v[i], remote_url))
>> +		return 1;
>
> This is iffy. The <schema>://<host>/ part might want to be compared
> case insensitively, but the rest of the URL is generally case
> sensitive (unless the material served is stored on a machine with
> case-insensitive filesystem)?
>
> Given that the existing URL must have come by either cloning from
> this server or another related server or by an earlier
> acceptFromServer behaviour, I do not see a need for being extra lax
> here. We should be more careful about our use of case-insensitive
> comparison, and I do not see how this URL comparison could be
> something the end users would expect to be done case insensitively.

Note that I am not advocating to compare the earlier part case insensitively while comparing the remainder case sensitively.

Because we are not comparing URLs that come from random sources, but we know they come from only a few very controlled sources (i.e., the original server we cloned from, and the promisor remotes suggested by the original server and other promisor remotes whose suggestion we accepted, recursively), it should be sufficient to compare the whole string case sensitively.

Thanks.
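To make the difference between the two comparisons concrete, here is a small sketch in plain C. The function names urls_match() and urls_match_icase() and the example paths are assumptions made for illustration; only strcmp() and the POSIX strcasecmp() are real library calls.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>	/* strcasecmp() (POSIX) */

/* Whole-string, case-sensitive comparison, as suggested in the review. */
static int urls_match(const char *configured, const char *advertised)
{
	assert(configured && advertised);
	return !strcmp(configured, advertised);
}

/* The comparison used by the patch under review, kept for contrast. */
static int urls_match_icase(const char *configured, const char *advertised)
{
	assert(configured && advertised);
	return !strcasecmp(configured, advertised);
}
```

With these helpers, "file:///srv/serverTwo" and "file:///srv/servertwo" are distinct under urls_match() but equal under urls_match_icase(), which is the laxness the review objects to.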
On Mon, Jan 27, 2025 at 03:48:08PM -0800, Junio C Hamano wrote:
> Christian Couder <christian.couder@gmail.com> writes:
> >  promisor.acceptFromServer::
> >  	If set to "all", a client will accept all the promisor remotes
> >  	a server might advertise using the "promisor-remote"
> > -	capability. Default is "none", which means no promisor remote
> > -	advertised by a server will be accepted. By accepting a
> > -	promisor remote, the client agrees that the server might omit
> > -	objects that are lazily fetchable from this promisor remote
> > -	from its responses to "fetch" and "clone" requests from the
> > -	client. See linkgit:gitprotocol-v2[5].
> > +	capability. If set to "knownName" the client will accept
> > +	promisor remotes which are already configured on the client
> > +	and have the same name as those advertised by the client. This
> > +	is not very secure, but could be used in a corporate setup
> > +	where servers and clients are trusted to not switch name and
> > +	URLs.
>
> I wonder if the reader needs to be told a bit more about the
> security argument here. I imagine that the attack vector behind the
> use of "secure" in the above paragraph is for a malicious server
> that guesses a promisor remote name the client already uses, which
> has a different URL from what the client expects to be associated
> with the name, thereby such an acceptance means that the URL used in
> future fetches would be replaced without the user's consent. Being
> able to silently repoint the remote.origin.url at an evil repository
> you control is indeed a powerful thing, I would guess. Of course,
> in a corp environment, such a mechanism to drive the clients to a
> new repository after upgrading or migrating may be extremely handy.

I'm still very hesitant about letting the server-side control remote names at all, as I've already mentioned in previous review rounds. I think that it opens up the client for a whole lot of issues that should rather be avoided. Most importantly, it takes control away from the user, as they are not free anymore to name the remotes however they want to. It also casts into stone current behaviour because it is now part of the protocol.

That being said, I get the point that it may make sense to be "agile" regarding the promisor remotes. But I think we can achieve that without having to compromise on either usability or security by using something like a promisor ID instead.

Instead of announcing remote names, each announced promisor would have an ID. This ID is opaque and merely used to identify the promisor after the fact. It could for example be a UUID or something else that is mostly unique.

The client will then create a promisor remote for each of the remote names. The name of the promisor is derived from the remote name that it is being created from. When there's a single promisor only it could for example be called "origin-promisor". When there are multiple ones they could be enumerated as "origin-promisor-1". In practice, we can even roll the dice to generate the name, even though that may not be as user friendly.

These names are _not_ used to identify the promisor. Instead, we also write "remote.origin-promisor.id" and point it to the UUID that the server has advertised. Furthermore, for each promisor that gets added in this way, we'll also add "remote.origin.promisor" pointing to the promisor name.

So on a subsequent fetch, we can now:

  1. Look up all the promisors for the remote we're fetching from via the "remote.origin.promisor" multivalue config.

  2. For each promisor, we figure out whether its ID is still being advertised by the remote server. If not, then it is a stale promisor and we can optionally remove it.

  3. If the promisor ID is still being announced we double check whether the URL we have stored is still valid. If not, we can optionally update it to point to the new URL.

This buys us a bunch of things:

  - We have promisor agility and are easily able to update URLs and prune out stale promisors.

  - Promisors can be renamed by the user at will, as they are identified by ID and not by remote name. We have to add logic to update the "remote.*.promisor" links, but that should be doable.

  - Each remote has its own set of promisors that cannot conflict with one another.

From here on, I'd also redesign "promisor.acceptFromServer" a bit:

  - "new" allows newly announced promisor remotes.

  - "update" allows updating existing promisor remotes.

  - "prune" allows pruning existing promisor remotes.

All of that only applies to promisors connected to the current remote, of course. Furthermore, the values may be combined arbitrarily with one another, e.g. you can say "new,update" to only accept new or updated remotes but not allow pruning, or "update,prune" to only allow updating or pruning promisors without adding new ones.

I realize that this is a bit more work than what we currently have, but I think that the design is significantly better than the proposed one. From my point of view none of this really needs to be part of the current patch series though, as these are all client-side changes in the first place, and as far as I understand we don't have the client-side ready yet anyway.

The only change required would be to adapt the protocol so that we don't advertise promisor names anymore, but instead promisor IDs.

Patrick
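The combinable "new", "update" and "prune" values proposed above could be parsed into a bitmask along the following lines. This is only a sketch of the idea: the enum, the flag names and parse_promisor_accept() are hypothetical and do not exist in git, and whether the values should be matched case sensitively is an assumption the proposal does not settle.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag names for the proposed values; not part of git. */
enum promisor_accept_flag {
	PROMISOR_ACCEPT_NEW    = 1 << 0,
	PROMISOR_ACCEPT_UPDATE = 1 << 1,
	PROMISOR_ACCEPT_PRUNE  = 1 << 2
};

/* Does the token value[0..len) equal the NUL-terminated 'word'? */
static int token_is(const char *value, size_t len, const char *word)
{
	return strlen(word) == len && !strncmp(value, word, len);
}

/*
 * Parse a comma-separated list such as "new,update" into a bitmask.
 * Returns -1 on an unknown token (including an empty one before or
 * between commas). An empty string yields 0, i.e. accept nothing,
 * and a single trailing comma is tolerated.
 */
static int parse_promisor_accept(const char *value)
{
	int flags = 0;

	assert(value);
	while (*value) {
		size_t len = strcspn(value, ",");

		if (token_is(value, len, "new"))
			flags |= PROMISOR_ACCEPT_NEW;
		else if (token_is(value, len, "update"))
			flags |= PROMISOR_ACCEPT_UPDATE;
		else if (token_is(value, len, "prune"))
			flags |= PROMISOR_ACCEPT_PRUNE;
		else
			return -1;

		value += len;
		if (*value == ',')
			value++;
	}
	return flags;
}
```

A caller would then test individual bits, for example checking PROMISOR_ACCEPT_PRUNE before removing a promisor whose ID is no longer advertised.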
On Thu, Jan 30, 2025 at 11:51 AM Patrick Steinhardt <ps@pks.im> wrote:
>
> On Mon, Jan 27, 2025 at 03:48:08PM -0800, Junio C Hamano wrote:

> > I wonder if the reader needs to be told a bit more about the
> > security argument here. I imagine that the attack vector behind the
> > use of "secure" in the above paragraph is for a malicious server
> > that guesses a promisor remote name the client already uses, which
> > has a different URL from what the client expects to be associated
> > with the name, thereby such an acceptance means that the URL used in
> > future fetches would be replaced without the user's consent. Being
> > able to silently repoint the remote.origin.url at an evil repository
> > you control is indeed a powerful thing, I would guess. Of course,
> > in a corp environment, such a mechanism to drive the clients to a
> > new repository after upgrading or migrating may be extremely handy.
>
> I'm still very hesitant about letting the server-side control remote
> names at all, as I've already mentioned in previous review rounds. I
> think that it opens up the client for a whole lot of issues that should
> rather be avoided. Most importantly, it takes control away from the
> user, as they are not free anymore to name the remotes however they want
> to. It also casts into stone current behaviour because it is now part of
> the protocol.

The server-side doesn't control remote names at all in this series. There is just a match or no match, depending on the value of promisor.acceptFromServer on the client-side, between what the client already has configured (for example using the clone -c option) and what the server advertises.

> That being said, I get the point that it may make sense to be "agile"
> regarding the promisor remotes. But I think we can achieve that without
> having to compromise on either usability or security by using something
> like a promisor ID instead.

Thanks for the suggestion and the ideas, but I think that what you suggest could be discussed and implemented as part of a follow up patch series.

This patch series implements basic checks with information (name and URL) that already exists on the server side and might also be available on the client side. For a number of use cases it is likely enough, and it's also not very complex.

I would be fine with resending the series without this patch, if that's what is preferred though.
On Tue, Jan 28, 2025 at 12:48 AM Junio C Hamano <gitster@pobox.com> wrote: > > Christian Couder <christian.couder@gmail.com> writes: > > > A previous commit introduced a "promisor.acceptFromServer" configuration > > variable with only "None" or "All" as valid values. > > > > Let's introduce "KnownName" and "KnownUrl" as valid values for this > > configuration option to give more choice to a client about which > > promisor remotes it might accept among those that the server advertised. > > OK. > > > promisor.acceptFromServer:: > > If set to "all", a client will accept all the promisor remotes > > a server might advertise using the "promisor-remote" > > - capability. Default is "none", which means no promisor remote > > - advertised by a server will be accepted. By accepting a > > - promisor remote, the client agrees that the server might omit > > - objects that are lazily fetchable from this promisor remote > > - from its responses to "fetch" and "clone" requests from the > > - client. See linkgit:gitprotocol-v2[5]. > > + capability. If set to "knownName" the client will accept > > + promisor remotes which are already configured on the client > > + and have the same name as those advertised by the client. This > > + is not very secure, but could be used in a corporate setup > > + where servers and clients are trusted to not switch name and > > + URLs. > > I wonder if the reader needs to be told a bit more about the > security argument here. I imagine that the attack vector behind the > use of "secure" in the above paragraph is for a malicious server > that guesses a promisor remote name the client already uses, which > has a different URL from what the client expects to be associated > with the name, thereby such an acceptance means that the URL used in > future fetches would be replaced without the user's consent. There is currently no mechanism for the URL to be replaced on the client side by the one advertised by the server. 
The client will still use the URL that has been configured in another way, likely the clone `-c` option. But yeah it could lead to misunderstandings between the client and the server. And if we later develop such a mechanism to replace the URL on the client side, or to just temporarily use the one advertised by the server, this could be a problem. > Being > able to silently repoint the remote.origin.url at an evil repository > you control is indeed a powerful thing, I would guess. Of course, > in a corp environment, such a mechanism to drive the clients to a > new repository after upgrading or migrating may be extremely handy. Yeah, that's why there are chances that such a mechanism will be developed later, and we should take care of warning users even if currently there are no real security risks. > Or does the above paragraph assumes some other attack vectors, > perhaps? No, I don't see another attack vector. > > + If set to "knownUrl", the client will accept promisor > > + remotes which have both the same name and the same URL > > + configured on the client as the name and URL advertised by the > > + server. This is more secure than "all" or "knownUrl", so it Here I see that it should be "knownName" instead of "knownUrl". I have fixed this in the next version I will send soon. > > + should be used if possible instead of those options. Default > > + is "none", which means no promisor remote advertised by a > > + server will be accepted. > > OK. > > > diff --git a/promisor-remote.c b/promisor-remote.c > > index 5ac282ed27..790a96aa19 100644 > > --- a/promisor-remote.c > > +++ b/promisor-remote.c > > @@ -370,30 +370,73 @@ char *promisor_remote_info(struct repository *repo) > > return strbuf_detach(&sb, NULL); > > } > > > > +/* > > + * Find first index of 'vec' where there is 'val'. 'val' is compared > > + * case insensively to the strings in 'vec'. If not found 'vec->nr' is I mean "insensitively" instead of "insensively". This is fixed in the next version. 
> > + * returned. > > + */ > > +static size_t strvec_find_index(struct strvec *vec, const char *val) > > +{ > > + for (size_t i = 0; i < vec->nr; i++) > > + if (!strcasecmp(vec->v[i], val)) > > + return i; > > + return vec->nr; > > +} > > Hmph, without the hardcoded strcasecmp(), strvec_find() might make a > fine public API in <strvec.h>. Yeah, but I didn't find any other places in the code where a strvec_find() function could be useful. > Unless we intend to create a generic function that qualifies as a > part of the public strvec API, we shouldn't call it strvec_anything. > This is a great helper that finds a matching remote nickname from > list of remote nicknames, so > > remote_nick_find(struct strvec *nicks, const char *nick) > > may be more appropriate. Ok, I have renamed it remote_nick_find() in the next version. > When we lift it out of here and make it > more generic to move it to strvec.[ch], perhaps > > size_t strvec_find(struct strvec *vec, void *needle, > int (*match)(const char *, void *)) { > for (size_t ix = 0; ix < vec->nr, ix++) > if (match(vec->v[ix], needle)) > return ix; > return vec->nr; > } > > which will be used to rewrite remote_nick_find() like so: > > static int nicks_match(const char *nick, void *needle) > { > return !strcasecmp(nick, (conat char *)needle); > } > > remote_hick_find(struct strvec *nicks, const char *nick) > { > return strvec_find(nicks, nick, nicks_match); > } > > it would be better to use a more generic parameter name "vec", but > until then, it is better to be more specific and explicit about the > reason why the immediate callers call the function for, which is > where my "nicks" vs "nick" comes from (it is OK to call the latter > "needle", though). Yeah, I would be fine with this solution if there were other places where strvec_find() could be useful. 
> > enum accept_promisor { > > ACCEPT_NONE = 0, > > + ACCEPT_KNOWN_URL, > > + ACCEPT_KNOWN_NAME, > > ACCEPT_ALL > > }; > > > > static int should_accept_remote(enum accept_promisor accept, > > - const char *remote_name UNUSED, > > - const char *remote_url UNUSED) > > + const char *remote_name, const char *remote_url, > > + struct strvec *names, struct strvec *urls) > > { > > + size_t i; > > + > > if (accept == ACCEPT_ALL) > > return 1; > > > > - BUG("Unhandled 'enum accept_promisor' value '%d'", accept); > > + i = strvec_find_index(names, remote_name); > > + > > + if (i >= names->nr) > > + /* We don't know about that remote */ > > + return 0; > > OK. > > > + if (accept == ACCEPT_KNOWN_NAME) > > + return 1; > > + > > + if (accept != ACCEPT_KNOWN_URL) > > + BUG("Unhandled 'enum accept_promisor' value '%d'", accept); > > I can see why this defensiveness may be a good idea than not having > any, but I wonder if we can take advantage of compile time checks > some compilers have to ensure that case arms in a switch statement > are exhausitive? Perhaps, but otherwise I am not sure that using a switch statement would make the code better. The ACCEPT_KNOWN_NAME and ACCEPT_KNOWN_URL cases need to share some code and the ACCEPT_NONE case seems better handled by the caller. > > + if (!strcasecmp(urls->v[i], remote_url)) > > + return 1; > > This is iffy. The <schema>://<host>/ part might want to be compared > case insensitively, but the rest of the URL is generally case > sensitive (unless the material served is stored on a machine with > case-insensitive filesystem)? I am fine with comparing the whole URL case sensitively. So "strcasecmp()" is replaced with "strcmp()" in the next version. > Given that the existing URL must have come by either cloning from > this server or another related server or by an earlier > acceptFromServer behaviour, I do not see a need for being extra lax > here. 
We should be more careful about our use of case-insensitive > comparison, and I do not see how this URL comparison could be > something the end users would expect to be done case insensitively. In another email you also said: > Note that I am not advocating to compare the earlier part case > insensitively while comparing the remainder case sensitively. > > Because we are not comparing URLs that come from random sources, but > we know they come from a only few very controlled sources (i.e., the > original server we cloned from, and the promisor remotes sugggested > by the original server and other promisor remotes whose suggestion > we accepted, recursively), it should be sufficient to compare the > whole string case sensitively. When I implemented this, I was just thinking that some users might for example spell the scheme part "HTTPS" in their client config and then complain that it should work when the server advertises the same URL with "https" instead of "HTTPS", because yeah the <schema>://<host>/ part should be case insensitive. But I agree we can start with everything being case sensitive and improve on this (likely by comparing the <schema>://<host>/ part case insensitively and the rest case sensitively) if/when users complain. > > -static void filter_promisor_remote(struct strvec *accepted, const char *info) > > +static void filter_promisor_remote(struct repository *repo, > > + struct strvec *accepted, > > + const char *info) > > { > > struct strbuf **remotes; > > const char *accept_str; > > enum accept_promisor accept = ACCEPT_NONE; > > + struct strvec names = STRVEC_INIT; > > + struct strvec urls = STRVEC_INIT; > > > > if (!git_config_get_string_tmp("promisor.acceptfromserver", &accept_str)) { > > if (!accept_str || !*accept_str || !strcasecmp("None", accept_str)) > > Not a fault of this step, but is it sensible to even expect > !accept_str in an error case? 
*accept_str could be NUL, but > accept_str be either left uninitialized (because this caller does > not initialize it) when the get_string_tmp() returns non-zero, or > points at the internal cached value in the config_set if it returns > 0 (and the control comes into this block). Yeah, I agree accept_str cannot be NULL here. I have removed "!accept_str || " in the next version. > > accept = ACCEPT_NONE; > > + else if (!strcasecmp("KnownUrl", accept_str)) > > + accept = ACCEPT_KNOWN_URL; > > + else if (!strcasecmp("KnownName", accept_str)) > > + accept = ACCEPT_KNOWN_NAME; > > else if (!strcasecmp("All", accept_str)) > > accept = ACCEPT_ALL; > > else > > Ditto about icase for all of the above. These are config values that can take only a specific set of values. I think those are most often compared case insensitively in Git, for example there is no distinction between "True" and "true" for bool values. So I am not sure what you suggest here. > > +test_expect_success "clone with 'KnownUrl' and different remote urls" ' > > + ln -s server2 serverTwo && > > + > > + git -C server config promisor.advertise true && > > + > > + # Clone from server to create a client > > + GIT_NO_LAZY_FETCH=0 git clone -c remote.server2.promisor=true \ > > + -c remote.server2.fetch="+refs/heads/*:refs/remotes/server2/*" \ > > + -c remote.server2.url="file://$(pwd)/serverTwo" \ > > + -c promisor.acceptfromserver=KnownUrl \ > > + --no-local --filter="blob:limit=5k" server client && > > + test_when_finished "rm -rf client" && > > + > > + # Check that the largest object is not missing on the server > > + check_missing_objects server 0 "" && > > + > > + # Reinitialize server so that the largest object is missing again > > + initialize_server 1 "$oid" > > +' > > Nice ;-) > > Here, I also notice that we are not testing that serverTwo and > servertwo are considered the same thanks to the use of icase > comparison. We shouldn't compare URLs with strcasecmp(). Ok, thanks.
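The possible later refinement mentioned in the reply above (compare the <schema>://<host>/ part case insensitively and the rest case sensitively) might look like the sketch below. Both function names are hypothetical and nothing like this is in the series, which settles on a plain strcmp() of the whole URL. The helper also makes a simplifying assumption: everything between "://" and the next '/' is treated as the host, so any userinfo or port there is compared case insensitively too.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>	/* strncasecmp() (POSIX) */

/*
 * Length of the leading "<scheme>://<host>" part of 'url', or 0 if
 * the URL has no "://" separator. Hypothetical helper.
 */
static size_t scheme_host_len(const char *url)
{
	const char *sep = strstr(url, "://");

	if (!sep)
		return 0;
	sep += 3;
	return (size_t)(sep - url) + strcspn(sep, "/");
}

/*
 * Return 1 if the two URLs are equal when the scheme and host are
 * compared case insensitively and the remainder case sensitively.
 */
static int urls_match_mixed(const char *a, const char *b)
{
	size_t la, lb;

	assert(a && b);
	la = scheme_host_len(a);
	lb = scheme_host_len(b);

	if (la != lb)
		return 0;
	if (strncasecmp(a, b, la))
		return 0;
	return !strcmp(a + la, b + la);
}
```

Under this sketch "HTTPS://Example.com/repo.git" matches "https://example.com/repo.git", while a difference in the path, such as "/Repo.git" versus "/repo.git", does not match. URLs without "://" fall back to a fully case-sensitive comparison.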
diff --git a/Documentation/config/promisor.txt b/Documentation/config/promisor.txt
index 9cbfe3e59e..d1364bc018 100644
--- a/Documentation/config/promisor.txt
+++ b/Documentation/config/promisor.txt
@@ -12,9 +12,19 @@ promisor.advertise::
 promisor.acceptFromServer::
 	If set to "all", a client will accept all the promisor remotes
 	a server might advertise using the "promisor-remote"
-	capability. Default is "none", which means no promisor remote
-	advertised by a server will be accepted. By accepting a
-	promisor remote, the client agrees that the server might omit
-	objects that are lazily fetchable from this promisor remote
-	from its responses to "fetch" and "clone" requests from the
-	client. See linkgit:gitprotocol-v2[5].
+	capability. If set to "knownName" the client will accept
+	promisor remotes which are already configured on the client
+	and have the same name as those advertised by the server. This
+	is not very secure, but could be used in a corporate setup
+	where servers and clients are trusted to not switch name and
+	URLs. If set to "knownUrl", the client will accept promisor
+	remotes which have both the same name and the same URL
+	configured on the client as the name and URL advertised by the
+	server. This is more secure than "all" or "knownName", so it
+	should be used if possible instead of those options. Default
+	is "none", which means no promisor remote advertised by a
+	server will be accepted. By accepting a promisor remote, the
+	client agrees that the server might omit objects that are
+	lazily fetchable from this promisor remote from its responses
+	to "fetch" and "clone" requests from the client. See
+	linkgit:gitprotocol-v2[5].
diff --git a/promisor-remote.c b/promisor-remote.c
index 5ac282ed27..790a96aa19 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -370,30 +370,73 @@ char *promisor_remote_info(struct repository *repo)
 	return strbuf_detach(&sb, NULL);
 }

+/*
+ * Find first index of 'vec' where there is 'val'. 'val' is compared
+ * case insensitively to the strings in 'vec'. If not found 'vec->nr' is
+ * returned.
+ */
+static size_t strvec_find_index(struct strvec *vec, const char *val)
+{
+	for (size_t i = 0; i < vec->nr; i++)
+		if (!strcasecmp(vec->v[i], val))
+			return i;
+	return vec->nr;
+}
+
 enum accept_promisor {
 	ACCEPT_NONE = 0,
+	ACCEPT_KNOWN_URL,
+	ACCEPT_KNOWN_NAME,
 	ACCEPT_ALL
 };

 static int should_accept_remote(enum accept_promisor accept,
-				const char *remote_name UNUSED,
-				const char *remote_url UNUSED)
+				const char *remote_name, const char *remote_url,
+				struct strvec *names, struct strvec *urls)
 {
+	size_t i;
+
 	if (accept == ACCEPT_ALL)
 		return 1;

-	BUG("Unhandled 'enum accept_promisor' value '%d'", accept);
+	i = strvec_find_index(names, remote_name);
+
+	if (i >= names->nr)
+		/* We don't know about that remote */
+		return 0;
+
+	if (accept == ACCEPT_KNOWN_NAME)
+		return 1;
+
+	if (accept != ACCEPT_KNOWN_URL)
+		BUG("Unhandled 'enum accept_promisor' value '%d'", accept);
+
+	if (!strcasecmp(urls->v[i], remote_url))
+		return 1;
+
+	warning(_("known remote named '%s' but with url '%s' instead of '%s'"),
+		remote_name, urls->v[i], remote_url);
+
+	return 0;
 }

-static void filter_promisor_remote(struct strvec *accepted, const char *info)
+static void filter_promisor_remote(struct repository *repo,
+				   struct strvec *accepted,
+				   const char *info)
 {
 	struct strbuf **remotes;
 	const char *accept_str;
 	enum accept_promisor accept = ACCEPT_NONE;
+	struct strvec names = STRVEC_INIT;
+	struct strvec urls = STRVEC_INIT;

 	if (!git_config_get_string_tmp("promisor.acceptfromserver", &accept_str)) {
 		if (!accept_str || !*accept_str || !strcasecmp("None", accept_str))
 			accept = ACCEPT_NONE;
+		else if (!strcasecmp("KnownUrl", accept_str))
+			accept = ACCEPT_KNOWN_URL;
+		else if (!strcasecmp("KnownName", accept_str))
+			accept = ACCEPT_KNOWN_NAME;
 		else if (!strcasecmp("All", accept_str))
 			accept = ACCEPT_ALL;
 		else
@@ -404,6 +447,9 @@ static void filter_promisor_remote(struct strvec *accepted, const char *info)
 	if (accept == ACCEPT_NONE)
 		return;

+	if (accept != ACCEPT_ALL)
+		promisor_info_vecs(repo, &names, &urls);
+
 	/* Parse remote info received */

 	remotes = strbuf_split_str(info, ';', 0);
@@ -433,7 +479,7 @@ static void filter_promisor_remote(struct strvec *accepted, const char *info)
 		if (remote_url)
 			decoded_url = url_percent_decode(remote_url);

-		if (decoded_name && should_accept_remote(accept, decoded_name, decoded_url))
+		if (decoded_name && should_accept_remote(accept, decoded_name, decoded_url, &names, &urls))
 			strvec_push(accepted, decoded_name);

 		strbuf_list_free(elems);
@@ -441,6 +487,8 @@ static void filter_promisor_remote(struct strvec *accepted, const char *info)
 		free(decoded_url);
 	}

+	strvec_clear(&names);
+	strvec_clear(&urls);
 	strbuf_list_free(remotes);
 }

@@ -449,7 +497,7 @@ char *promisor_remote_reply(const char *info)
 	struct strvec accepted = STRVEC_INIT;
 	struct strbuf reply = STRBUF_INIT;

-	filter_promisor_remote(&accepted, info);
+	filter_promisor_remote(the_repository, &accepted, info);
 	if (!accepted.nr)
 		return NULL;

diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
index 0390c1dbad..5bce99f5eb 100755
--- a/t/t5710-promisor-remote-capability.sh
+++ b/t/t5710-promisor-remote-capability.sh
@@ -160,6 +160,74 @@ test_expect_success "init + fetch with promisor.advertise set to 'true'" '
 	check_missing_objects server 1 "$oid"
 '

+test_expect_success "clone with promisor.acceptfromserver set to 'KnownName'" '
+	git -C server config promisor.advertise true &&
+
+	# Clone from server to create a client
+	GIT_NO_LAZY_FETCH=0 git clone -c remote.server2.promisor=true \
+		-c remote.server2.fetch="+refs/heads/*:refs/remotes/server2/*" \
+		-c remote.server2.url="file://$(pwd)/server2" \
+		-c promisor.acceptfromserver=KnownName \
+		--no-local --filter="blob:limit=5k" server client &&
+	test_when_finished "rm -rf client" &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
+test_expect_success "clone with 'KnownName' and different remote names" '
+	git -C server config promisor.advertise true &&
+
+	# Clone from server to create a client
+	GIT_NO_LAZY_FETCH=0 git clone -c remote.serverTwo.promisor=true \
+		-c remote.serverTwo.fetch="+refs/heads/*:refs/remotes/server2/*" \
+		-c remote.serverTwo.url="file://$(pwd)/server2" \
+		-c promisor.acceptfromserver=KnownName \
+		--no-local --filter="blob:limit=5k" server client &&
+	test_when_finished "rm -rf client" &&
+
+	# Check that the largest object is not missing on the server
+	check_missing_objects server 0 "" &&
+
+	# Reinitialize server so that the largest object is missing again
+	initialize_server 1 "$oid"
+'
+
+test_expect_success "clone with promisor.acceptfromserver set to 'KnownUrl'" '
+	git -C server config promisor.advertise true &&
+
+	# Clone from server to create a client
+	GIT_NO_LAZY_FETCH=0 git clone -c remote.server2.promisor=true \
+		-c remote.server2.fetch="+refs/heads/*:refs/remotes/server2/*" \
+		-c remote.server2.url="file://$(pwd)/server2" \
+		-c promisor.acceptfromserver=KnownUrl \
+		--no-local --filter="blob:limit=5k" server client &&
+	test_when_finished "rm -rf client" &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
+test_expect_success "clone with 'KnownUrl' and different remote urls" '
+	ln -s server2 serverTwo &&
+
+	git -C server config promisor.advertise true &&
+
+	# Clone from server to create a client
+	GIT_NO_LAZY_FETCH=0 git clone -c remote.server2.promisor=true \
+		-c remote.server2.fetch="+refs/heads/*:refs/remotes/server2/*" \
+		-c remote.server2.url="file://$(pwd)/serverTwo" \
+		-c promisor.acceptfromserver=KnownUrl \
+		--no-local --filter="blob:limit=5k" server client &&
+	test_when_finished "rm -rf client" &&
+
+	# Check that the largest object is not missing on the server
+	check_missing_objects server 0 "" &&
+
+	# Reinitialize server so that the largest object is missing again
+	initialize_server 1 "$oid"
+'
+
 test_expect_success "clone with promisor.advertise set to 'true' but don't delete the client" '
 	git -C server config promisor.advertise true &&

A previous commit introduced a "promisor.acceptFromServer" configuration
variable with only "None" or "All" as valid values.

Let's introduce "KnownName" and "KnownUrl" as valid values for this
configuration option to give more choice to a client about which
promisor remotes it might accept among those that the server advertised.

In case of "KnownName", the client will accept promisor remotes which
are already configured on the client and have the same name as those
advertised by the server. This could be useful in a corporate setup
where servers and clients are trusted to not switch names and URLs,
but where some kind of control is still useful.

In case of "KnownUrl", the client will accept promisor remotes which
have both the same name and the same URL configured on the client as
the name and URL advertised by the server. This is the most secure
option, so it should be used if possible.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/config/promisor.txt     | 22 ++++++---
 promisor-remote.c                     | 60 ++++++++++++++++++++---
 t/t5710-promisor-remote-capability.sh | 68 +++++++++++++++++++++++++++
 3 files changed, 138 insertions(+), 12 deletions(-)
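
For readers who want to experiment with the four acceptance modes outside
of Git, here is a minimal standalone sketch of the decision described
above. It is not the code from this patch: the names `accept_mode`,
`known_remote`, `accept_advertised()` and `self_check()` are hypothetical,
plain arrays stand in for Git's strvec, and the NULL check on the
advertised URL is an assumption added for the sketch. Like the patch, it
compares both names and URLs case-insensitively with strcasecmp().

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h> /* strcasecmp() (POSIX) */

/* Hypothetical stand-in for the acceptance modes discussed above. */
enum accept_mode { MODE_NONE, MODE_KNOWN_URL, MODE_KNOWN_NAME, MODE_ALL };

/* Hypothetical record for a remote already configured on the client. */
struct known_remote {
	const char *name;
	const char *url;
};

/*
 * Return 1 if the advertised remote should be accepted, 0 otherwise.
 * 'known' holds the 'nr' remotes configured on the client.
 */
int accept_advertised(enum accept_mode mode,
		      const struct known_remote *known, size_t nr,
		      const char *adv_name, const char *adv_url)
{
	size_t i;

	if (mode == MODE_NONE)
		return 0;
	if (mode == MODE_ALL)
		return 1;

	/* Both remaining modes require a configured remote with that name. */
	for (i = 0; i < nr; i++)
		if (!strcasecmp(known[i].name, adv_name))
			break;
	if (i == nr)
		return 0;

	if (mode == MODE_KNOWN_NAME)
		return 1;

	/* MODE_KNOWN_URL: the URL must match too (NULL guard is an assumption). */
	return adv_url && !strcasecmp(known[i].url, adv_url);
}

/* Usage example: one configured remote, a few advertised candidates. */
void self_check(void)
{
	const struct known_remote known[] = {
		{ "server2", "file:///srv/server2" },
	};

	/* Same name, different URL: accepted by name only. */
	assert(accept_advertised(MODE_KNOWN_NAME, known, 1,
				 "server2", "file:///elsewhere") == 1);
	assert(accept_advertised(MODE_KNOWN_URL, known, 1,
				 "server2", "file:///elsewhere") == 0);

	/* Same name and same URL: accepted in both modes. */
	assert(accept_advertised(MODE_KNOWN_URL, known, 1,
				 "server2", "file:///srv/server2") == 1);

	/* Unknown name: rejected unless everything is accepted. */
	assert(accept_advertised(MODE_KNOWN_NAME, known, 1,
				 "serverTwo", "file:///srv/server2") == 0);
	assert(accept_advertised(MODE_ALL, known, 1,
				 "serverTwo", "file:///srv/server2") == 1);
}
```

The "same name, different URL" case is where the two new modes differ:
the name-only mode accepts it, while the name-and-URL mode rejects it.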