Message ID | 20240910163000.1985723-1-christian.couder@gmail.com (mailing list archive)
---|---
Series | Introduce a "promisor-remote" capability
Christian Couder <christian.couder@gmail.com> writes:

> Changes compared to version 1
> ...
> Thanks to Junio, Patrick, Eric and Taylor for their suggestions.

We haven't heard from anybody in support of (or against, for that matter) this series even after a few weeks, which is not a good sign, even with everybody away for GitMerge for a few days.

IIRC, the comments that the initial iteration has received were mostly about clarifying the intent of this new capability (and some typofixes). What are the opinions on this round from folks (especially those who did not read the initial round)? Does this round clearly explain what the capability means and why projects would want to use it, and under what conditions?

Personally, I still find that knownName increases the potential attack surface without much benefit, but in a tightly controlled intranet environment it might have convenience value. I dunno.
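[Editorial aside for readers who did not follow the first round: the knobs under discussion look roughly like the following sketch, using the configuration names proposed by the series. Treat the exact names and values, in particular "KnownName", as illustrative of this round rather than final.]

    # Server side (sketch): advertise this repository's configured promisor
    # remote(s) to clients via the proposed "promisor-remote" capability.
    git config promisor.advertise true

    # Client side (sketch): only accept an advertised promisor remote whose
    # name is already configured locally -- the "knownName" mode Junio
    # mentions above.
    git config promisor.acceptFromServer KnownName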
On Thu, Sep 26, 2024 at 8:09 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > Changes compared to version 1
> > ...
> > Thanks to Junio, Patrick, Eric and Taylor for their suggestions.
>
> We haven't heard from anybody in support of (or against, for that matter) this series even after a few weeks, which is not a good sign, even with everybody away for GitMerge for a few days.

By the way, there was an unconference breakout session on day 2 of the Git Merge called "Git LFS Can we do better?" where this was discussed with a number of people. Scott Chacon took some notes:

https://github.com/git/git-merge/blob/main/breakouts/git-lfs.md

It was in parallel with the Contributor Summit, so few contributors participated in this session (maybe only Michael Haggerty, John Cai and me). But the impression of the GitLab people there, including me, was that folks in general would be happy to have an alternative to Git LFS based on this.

> IIRC, the comments that the initial iteration has received were mostly about clarifying the intent of this new capability (and some typofixes). What are the opinions on this round from folks (especially those who did not read the initial round)? Does this round clearly explain what the capability means and why projects would want to use it, and under what conditions?
>
> Personally, I still find that knownName increases the potential attack surface without much benefit, but in a tightly controlled intranet environment it might have convenience value. I dunno.
Christian Couder <christian.couder@gmail.com> writes:

> By the way, there was an unconference breakout session on day 2 of the Git Merge called "Git LFS Can we do better?" where this was discussed with a number of people. Scott Chacon took some notes:
>
> https://github.com/git/git-merge/blob/main/breakouts/git-lfs.md

Thanks for the link.

> It was in parallel with the Contributor Summit, so few contributors participated in this session (maybe only Michael Haggerty, John Cai and me). But the impression of the GitLab people there, including me, was that folks in general would be happy to have an alternative to Git LFS based on this.

I am not sure what "based on this" is really about, though.

This series adds a feature to redirect requests from one server to another, but does it really do much to solve the problem LFS wants to solve? I would imagine that you would want to be able to manage larger objects separately, to avoid affecting the performance and convenience of handling smaller objects, and to serve these larger objects from a dedicated server. You certainly can filter the larger blobs away with a blob size filter, but when you really need these larger blobs, it is unclear how the new capability helps, as you cannot really tell what criteria the serving side that gave you the "promisor-remote" capability wants you to use to sift your requests between the original server and the new promisor. Wouldn't your requests _all_ be redirected to a single place, the promisor remote you learned via the capability?

Coming up with a better alternative to LFS is certainly good, and it would be a worthwhile addition to the system. I just do not see how the topic of this series helps further that goal.

Thanks.
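[Editorial note: for background on the blob size filter referred to above, which already exists independently of this series, a partial clone can leave out large blobs and any missing blob is lazily fetched from the promisor remote when a command actually needs it. A minimal example; the threshold, URL and path are placeholders.]

    # Clone without any blob larger than 1 MiB; "origin" becomes a promisor
    # remote from which the omitted blobs can be fetched later.
    git clone --filter=blob:limit=1m https://example.com/repo.git
    cd repo

    # A command that needs one of the omitted blobs (e.g. showing a diff
    # that touches a large file) triggers a lazy fetch of just that object.
    git log -p -- path/to/large.bin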
On September 27, 2024 6:48 PM, Junio C Hamano wrote:
>Christian Couder <christian.couder@gmail.com> writes:
>
>> By the way, there was an unconference breakout session on day 2 of the Git Merge called "Git LFS Can we do better?" where this was discussed with a number of people. Scott Chacon took some notes:
>>
>> https://github.com/git/git-merge/blob/main/breakouts/git-lfs.md
>
>Thanks for the link.
>
>> It was in parallel with the Contributor Summit, so few contributors participated in this session (maybe only Michael Haggerty, John Cai and me). But the impression of the GitLab people there, including me, was that folks in general would be happy to have an alternative to Git LFS based on this.
>
>I am not sure what "based on this" is really about, though.
>
>This series adds a feature to redirect requests from one server to another, but does it really do much to solve the problem LFS wants to solve? I would imagine that you would want to be able to manage larger objects separately, to avoid affecting the performance and convenience of handling smaller objects, and to serve these larger objects from a dedicated server. You certainly can filter the larger blobs away with a blob size filter, but when you really need these larger blobs, it is unclear how the new capability helps, as you cannot really tell what criteria the serving side that gave you the "promisor-remote" capability wants you to use to sift your requests between the original server and the new promisor. Wouldn't your requests _all_ be redirected to a single place, the promisor remote you learned via the capability?
>
>Coming up with a better alternative to LFS is certainly good, and it would be a worthwhile addition to the system. I just do not see how the topic of this series helps further that goal.

I am one of those who really would like to see an improvement in this area. My community needs large binaries, and the GitHub LFS support limits sizes to the point of being pretty much not enough. I would be happy to participate in requirements gathering for this effort (even if it goes to Rust
On Sat, Sep 28, 2024, at 01:31, rsbecker@nexbridge.com wrote:

> I am one of those who really would like to see an improvement in this area. My community needs large binaries, and the GitHub LFS support limits sizes to the point of being pretty much not enough. I would be happy to participate in requirements gathering for this effort (even if it goes to Rust
On Fri, Sep 27, 2024 at 03:48:11PM -0700, Junio C Hamano wrote:
> Christian Couder <christian.couder@gmail.com> writes:
>
> > By the way, there was an unconference breakout session on day 2 of the Git Merge called "Git LFS Can we do better?" where this was discussed with a number of people. Scott Chacon took some notes:
> >
> > https://github.com/git/git-merge/blob/main/breakouts/git-lfs.md
>
> Thanks for the link.
>
> > It was in parallel with the Contributor Summit, so few contributors participated in this session (maybe only Michael Haggerty, John Cai and me). But the impression of the GitLab people there, including me, was that folks in general would be happy to have an alternative to Git LFS based on this.
>
> I am not sure what "based on this" is really about, though.
>
> This series adds a feature to redirect requests from one server to another, but does it really do much to solve the problem LFS wants to solve? I would imagine that you would want to be able to manage larger objects separately, to avoid affecting the performance and convenience of handling smaller objects, and to serve these larger objects from a dedicated server. You certainly can filter the larger blobs away with a blob size filter, but when you really need these larger blobs, it is unclear how the new capability helps, as you cannot really tell what criteria the serving side that gave you the "promisor-remote" capability wants you to use to sift your requests between the original server and the new promisor. Wouldn't your requests _all_ be redirected to a single place, the promisor remote you learned via the capability?
>
> Coming up with a better alternative to LFS is certainly good, and it would be a worthwhile addition to the system. I just do not see how the topic of this series helps further that goal.

I guess it helps to address part of the problem. I'm not sure whether my understanding is aligned with Chris' intention, but I could certainly see that at some point in time we start to advertise promisor remote URLs that use different transport helpers to fetch objects. This would allow hosting providers to offload objects to e.g. blob storage or some such thing, and the client would know how to fetch them.

But there are still a couple of pieces missing in the bigger puzzle:

  - How would a client know to omit certain objects? Right now it only knows that there are promisor remotes, but it doesn't know that it e.g. should omit every blob larger than X megabytes. The answer could of course be that the client should just know to do a partial clone by themselves.

  - Storing those large objects locally is still expensive. We had discussions in the past where such objects could be stored uncompressed to stop wasting compute here. At GitLab, we're thinking about the ability to use rolling hash functions to chunk such big objects into smaller parts to also allow for somewhat efficient deduplication. We're also thinking about how to make the overall ODB pluggable such that we can eventually make it more scalable in this context. But that's of course thinking into the future quite a bit.

  - Local repositories would likely want to prune large objects that have not been accessed for a while to eventually regain some storage space.

I think chipping away the problems one by one is fine. But it would be nice to draw something like a "big picture" of where we eventually want to end up and how all the parts connect with each other to form a viable native replacement for Git LFS.

Also Cc'ing brian, who likely has a thing or two to say about this :)

Patrick
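[Editorial note on the first missing piece: everything the capability would carry can be wired up by hand today, which also shows exactly which bits of information (the extra remote's URL and the filter it assumes) the client currently has to learn out of band. A sketch with made-up remote name, URL and filter.]

    # Assumes the repository was already created with a partial clone
    # (extensions.partialClone is set by "git clone --filter=...").

    # Add a second remote that promises to serve the objects we omitted,
    # and record which filter this arrangement assumes.
    git remote add big-objects https://large.example.com/repo.git
    git config remote.big-objects.promisor true
    git config remote.big-objects.partialclonefilter blob:limit=10m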
On Mon, Sep 30, 2024 at 9:57 AM Patrick Steinhardt <ps@pks.im> wrote:
>
> On Fri, Sep 27, 2024 at 03:48:11PM -0700, Junio C Hamano wrote:
> > Christian Couder <christian.couder@gmail.com> writes:
> >
> > > By the way, there was an unconference breakout session on day 2 of the Git Merge called "Git LFS Can we do better?" where this was discussed with a number of people. Scott Chacon took some notes:
> > >
> > > https://github.com/git/git-merge/blob/main/breakouts/git-lfs.md
> >
> > Thanks for the link.
> >
> > > It was in parallel with the Contributor Summit, so few contributors participated in this session (maybe only Michael Haggerty, John Cai and me). But the impression of the GitLab people there, including me, was that folks in general would be happy to have an alternative to Git LFS based on this.
> >
> > I am not sure what "based on this" is really about, though.
> >
> > This series adds a feature to redirect requests from one server to another, but does it really do much to solve the problem LFS wants to solve? I would imagine that you would want to be able to manage larger objects separately, to avoid affecting the performance and convenience of handling smaller objects, and to serve these larger objects from a dedicated server. You certainly can filter the larger blobs away with a blob size filter, but when you really need these larger blobs, it is unclear how the new capability helps, as you cannot really tell what criteria the serving side that gave you the "promisor-remote" capability wants you to use to sift your requests between the original server and the new promisor. Wouldn't your requests _all_ be redirected to a single place, the promisor remote you learned via the capability?
> >
> > Coming up with a better alternative to LFS is certainly good, and it would be a worthwhile addition to the system. I just do not see how the topic of this series helps further that goal.
>
> I guess it helps to address part of the problem. I'm not sure whether my understanding is aligned with Chris' intention, but I could certainly see that at some point in time we start to advertise promisor remote URLs that use different transport helpers to fetch objects. This would allow hosting providers to offload objects to e.g. blob storage or some such thing, and the client would know how to fetch them.
>
> But there are still a couple of pieces missing in the bigger puzzle:
>
>   - How would a client know to omit certain objects? Right now it only knows that there are promisor remotes, but it doesn't know that it e.g. should omit every blob larger than X megabytes. The answer could of course be that the client should just know to do a partial clone by themselves.

If we add a "filter" field to the "promisor-remote" capability in a future patch series, then the server could pass information like a filter-spec that the client could use to omit some large blobs. Patch 3/4 has the following in its commit message about it: "In the future, it might be possible to pass other information like a filter-spec that the client should use when cloning from S".

>   - Storing those large objects locally is still expensive. We had discussions in the past where such objects could be stored uncompressed to stop wasting compute here.

Yeah, I think a new "verbatim" object representation in the object database, as discussed in https://lore.kernel.org/git/xmqqbkdometi.fsf@gitster.g/, is the most likely and easiest in the short term.

>     At GitLab, we're thinking about the ability to use rolling hash functions to chunk such big objects into smaller parts to also allow for somewhat efficient deduplication. We're also thinking about how to make the overall ODB pluggable such that we can eventually make it more scalable in this context. But that's of course thinking into the future quite a bit.

Yeah, there are different options for this. For example HuggingFace (https://huggingface.co/) recently acquired the XetHub company (see https://huggingface.co/blog/xethub-joins-hf), and said they might open source the XetHub software that does chunking and deduplicates chunks, so that could be an option too.

>   - Local repositories would likely want to prune large objects that have not been accessed for a while to eventually regain some storage space.

`git repack --filter` and such might already help a bit in this area. I agree that more work is needed though.

> I think chipping away the problems one by one is fine. But it would be nice to draw something like a "big picture" of where we eventually want to end up and how all the parts connect with each other to form a viable native replacement for Git LFS.

I have tried to discuss this at the Git Merge 2022 and 2024 and perhaps even before that. But as you know, it's difficult to make people agree on big projects that are not backed by patches and that might span several years (especially when very few people actually work on them and when they might have other things to work on too).

Thanks,
Christian.
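[Editorial note: the existing option mentioned above can already thin out a local object store along a filter. A rough example; the size threshold is arbitrary, and the omitted objects must remain fetchable from a promisor remote.]

    # Rewrite the local packs, leaving out blobs larger than 10 MiB.
    # Only safe if the omitted objects can still be obtained from a
    # promisor remote when they are needed again.
    git repack -a -d --filter=blob:limit=10m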
Patrick Steinhardt <ps@pks.im> writes:

> I guess it helps to address part of the problem. I'm not sure whether my understanding is aligned with Chris' intention, but I could certainly see that at some point in time we start to advertise promisor remote URLs that use different transport helpers to fetch objects. This would allow hosting providers to offload objects to e.g. blob storage or some such thing, and the client would know how to fetch them.
>
> But there are still a couple of pieces missing in the bigger puzzle:
> ...
> I think chipping away the problems one by one is fine. But it would be nice to draw something like a "big picture" of where we eventually want to end up and how all the parts connect with each other to form a viable native replacement for Git LFS.

Yes, thanks for stating this a lot more clearly than I said in the reviews so far.

> Also Cc'ing brian, who likely has a thing or two to say about this :)
Christian Couder <christian.couder@gmail.com> writes:

>> But there are still a couple of pieces missing in the bigger puzzle:
>>
>>   - How would a client know to omit certain objects? Right now it only knows that there are promisor remotes, but it doesn't know that it e.g. should omit every blob larger than X megabytes. The answer could of course be that the client should just know to do a partial clone by themselves.
>
> If we add a "filter" field to the "promisor-remote" capability in a future patch series, then the server could pass information like a filter-spec that the client could use to omit some large blobs.

Yes, but at that point, doesn't the current scheme of marking a promisor pack with a single bit, recording only the fact that the pack came from a promisor remote (which one? and with what filter settings was the remote used?), become insufficient? Chipping away one by one is fine, but we'd at least need to be aware that it is one of the things we need to upgrade in the scope of the bigger picture.

It may even be OK to upgrade the on-the-wire protocol side before the code on both ends learns to take advantage of the feature (e.g., to add the "promisor-remote" capability itself, or to add a capability that can also convey the associated filter specification to that remote), but without even the design (let alone the implementation) of what runs on both ends of the connection to make use of what is communicated via the capability, it is rather hard to get the details of the protocol design right.

As the on-the-wire protocol is harder to upgrade due to compatibility constraints, it smells like a better order of doing things would leave it as the _last_ piece to be designed and implemented, if we were to chip away one by one. That may, for example, go like this:

 (0) We want to ensure that projects can specify what kind of objects are to be offloaded to other transports.

 (1) We design the client end first. We may want to be able to choose what remote to run a lazy fetch against, based on a filter spec, for example. We realize and make a mental note that our new "capability" needs to tell the client enough information to make such a decision.

 (2) We design the server end to supply the above pieces of information to the client end. During this process, we may realize that some pieces of information cannot be prepared on the server end, and (1) may need to get adjusted.

 (3) There may be tons of other things that need to be designed and implemented before we know what pieces of information our new "capability" needs to convey, and what these pieces of information mean, by iterating (1) and (2).

 (4) Once we nail (3) down, we can add a new protocol capability, knowing how it should work, and knowing that the client and the server ends will work well once it is implemented.

>> At GitLab, we're thinking about the ability to use rolling hash functions to chunk such big objects into smaller parts to also allow for somewhat efficient deduplication. We're also thinking about how to make the overall ODB pluggable such that we can eventually make it more scalable in this context. But that's of course thinking into the future quite a bit.

Reminds me of rsync and bup ;-).

Thanks.
On 2024-09-30 at 07:57:17, Patrick Steinhardt wrote:
> But there are still a couple of pieces missing in the bigger puzzle:
>
>   - How would a client know to omit certain objects? Right now it only knows that there are promisor remotes, but it doesn't know that it e.g. should omit every blob larger than X megabytes. The answer could of course be that the client should just know to do a partial clone by themselves.

It would be helpful to have some sort of protocol v2 feature that says that a partial clone (of whatever sort) is recommended, and let honouring that be a config flag. Otherwise, you're going to have a bunch of users who try to download every giant object in the repository when they don't need to. Git LFS has the advantage that this is the default behaviour, which is really valuable.

>   - Storing those large objects locally is still expensive. We had discussions in the past where such objects could be stored uncompressed to stop wasting compute here. At GitLab, we're thinking about the ability to use rolling hash functions to chunk such big objects into smaller parts to also allow for somewhat efficient deduplication. We're also thinking about how to make the overall ODB pluggable such that we can eventually make it more scalable in this context. But that's of course thinking into the future quite a bit.

Git LFS has a `git lfs dedup` command, which takes the files in the working tree and creates a copy using the copy-on-write functionality in the operating system and file system to avoid duplicating them. There are certainly some users who simply cannot afford to store multiple copies of the files (say, because their repository is 500 GB), and this is important functionality for them. Note that this doesn't work for all file systems. It does for APFS on macOS, XFS and Btrfs on Linux, and ReFS on Windows, but not HFS+, ext4, or NTFS, which lack copy-on-write functionality.

We'd probably need to add an extension for uncompressed objects for this, since it's a repository format change, but it shouldn't be hard to do. In Git LFS, it's also possible to share a set of objects across repositories, although one must be careful not to prune them. We already have that through alternates, so I don't think we're lacking anything there.

>   - Local repositories would likely want to prune large objects that have not been accessed for a while to eventually regain some storage space.

Git LFS has a `git lfs prune` command for this as well. It does have to be run manually, though.

> I think chipping away the problems one by one is fine. But it would be nice to draw something like a "big picture" of where we eventually want to end up and how all the parts connect with each other to form a viable native replacement for Git LFS.

I think a native replacement would be a valuable feature. An essential component is going to be a way to handle this gracefully during pushes, since part of the goal of Git LFS is to get large blobs off the main server storage, where they tend to make repacks extremely expensive, and into an external store. Without that, it's unlikely that this feature is going to be viable on the server side. GitHub doesn't allow large blobs for exactly that reason, so we'd want some way to store them outside the main repository but still have the repo think they are present.

One idea I had about this was pluggable storage backends, which might be a nice feature to add via a dynamically loaded shared library. In addition, this seems like the kind of feature that one might like to use Rust for, since it probably will involve HTTP code, and generally people like doing that less in C (I do, at least).

> Also Cc'ing brian, who likely has a thing or two to say about this :)

I certainly have thought about this a lot. I will say that I've stepped down from being one of the Git LFS maintainers (endless supply of work, not nearly enough time), but I am still familiar with the architecture of the project.
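[Editorial note: to make the copy-on-write discussion above concrete, this is roughly what it looks like today with Git LFS, plus the underlying file-system primitive. File names are placeholders, and support depends on the file system, as noted above.]

    # Replace working-tree copies of LFS-tracked files with copy-on-write
    # clones of the objects stored under .git/lfs, where supported.
    git lfs dedup

    # Remove local copies of LFS objects that only old revisions reference.
    git lfs prune

    # The primitive involved: a reflink copy shares data blocks until one
    # side is modified (works on e.g. XFS and Btrfs, fails on plain ext4).
    cp --reflink=always big-asset.bin big-asset.copy.bin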
"brian m. carlson" <sandals@crustytoothpaste.net> writes: > One idea I had about this was pluggable storage backends, which might be > a nice feature to add via a dynamically loaded shared library. In > addition, this seems like the kind of feature that one might like to use > Rust for, since it probably will involve HTTP code, and generally people > like doing that less in C (I do, at least). Yes, yes, and yes. >> Also Cc'ing brian, who likely has a thing or two to say about this :) > > I certainly have thought about this a lot. I will say that I've stepped > down from being one of the Git LFS maintainers (endless supply of work, > not nearly enough time), but I am still familiar with the architecture > of the project. Thanks.
On Mon, Sep 30, 2024 at 03:27:14PM -0700, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
> > One idea I had about this was pluggable storage backends, which might be a nice feature to add via a dynamically loaded shared library. In addition, this seems like the kind of feature that one might like to use Rust for, since it probably will involve HTTP code, and generally people like doing that less in C (I do, at least).
>
> Yes, yes, and yes.

Indeed, I strongly agree with this. In fact, pluggable ODBs are the next big topic I'll be working on now that the refdb is pluggable. Naturally this is a huge undertaking that will likely take more on the order of years to realize, but one has to start at some point, I guess.

I'm also aligned with the idea of having something like dlopen-style implementations of the backends. While the reftable library is nice and fixes some of the issues that we have at GitLab, the more important win is that it demonstrates that the abstractions we have hold. Which also means that adding a new backend has gotten a ton easier now.

And yes, being able to implement self-contained features like a refdb implementation or an ODB implementation in Rust would be a sensible first step for adopting it. It doesn't interact with anything else, and initially we could continue to support platforms that do not have Rust by simply not compiling such a backend.

Patrick
On Mon, Sep 30, 2024 at 11:17:48AM +0200, Christian Couder wrote:
> On Mon, Sep 30, 2024 at 9:57 AM Patrick Steinhardt <ps@pks.im> wrote:
> > I think chipping away the problems one by one is fine. But it would be nice to draw something like a "big picture" of where we eventually want to end up and how all the parts connect with each other to form a viable native replacement for Git LFS.
>
> I have tried to discuss this at the Git Merge 2022 and 2024 and perhaps even before that. But as you know, it's difficult to make people agree on big projects that are not backed by patches and that might span several years (especially when very few people actually work on them and when they might have other things to work on too).

Certainly true, yeah. But we did have documents in our tree in the past that outlined long-term visions, and it may help the project as a whole to better understand the long-term vision we're heading toward. And by encouraging discussion up front, we may be able to spot any weaknesses and address them before it is too late.

Patrick