[00/25,RFC] Bundle URIs

Message ID pull.1160.git.1645641063.gitgitgadget@gmail.com (mailing list archive)

Message

Derrick Stolee via GitGitGadget Feb. 23, 2022, 6:30 p.m. UTC
There have been several suggestions to improve Git clone speeds and
reliability by supplementing the Git protocol with static content. The
Packfile URI [0] feature lets the Git response include URIs that point to
packfiles that the client must download to complete the request.

Last year, Ævar suggested using bundles instead of packfiles [1] [2]. This
design has the same benefits as the packfile URI feature because it offloads
most object downloads to static content fetches. The main advantage over
packfile URIs is that the remote Git server does not need to know what is in
those bundles. The Git client tells the server what it downloaded during the
fetch negotiation afterwards. This includes any chance that the client did
not have access to those bundles or otherwise failed to access them. I
agreed that this was a much more desirable way to serve static content, but
had concerns about the flexibility of that design [3]. I have not heard more
on the topic since October, so I started investigating this idea myself in
December, resulting in this RFC.

I focused on maximizing flexibility for the service that organizes and
serves bundles. This includes:

 * Bundle URIs work for full and partial clones.

 * Bundle URIs can assist with git fetch in addition to git clone.

 * Users can set up bundle servers independent of the remote Git server if
   they specify the bundle URI via a --bundle-uri argument.

This series is based on the recently-submitted series that adds object
filters to bundles [4]. There is a slight adjacent-line-add conflict with
js/apply-partial-clone-filters-recursively, but that is in the last few
patches, so it will be easy to rebase by the time we have a fully-reviewable
patch series for those steps.

The general breakdown is as follows:

 * Patch 1 adds documentation for the feature in its entirety.

 * Patches 2-14 add the ability to run ‘git clone --bundle-uri=’

 * Patches 15-17 add bundle fetches to ‘git fetch’ calls

 * Patches 18-25 add a new ‘features’ capability that allows a server to
   advertise bundle URIs (and in the future, other features).

I consider the patches in their current form to be “RFC quality”. There are
multiple places where tests are missing or special cases are not checked.
The goal for this RFC is to seek feedback on the high-level ideas before
committing to the deep work of creating mergeable patches.


Testing with this series
========================

To get a full test of the features being proposed here, I created an MVP of
a bundle server by pushing bundle data to some publicly-readable Azure
Storage accounts. These bundle servers mirror the following public
repositories on GitHub:

 * git/git
 * git-for-windows/git
 * homebrew/homebrew-core
 * cocoapods/specs
 * torvalds/linux

In addition, the Azure Storage accounts are available in different regions:

 * East US: https://gitbundleserver.z13.web.core.windows.net
 * West US: https://gitbundleserverwestus.z22.web.core.windows.net
 * Europe: https://gitbundleservereurope.z6.web.core.windows.net
 * East Asia: https://gitbundleservereastasia.z7.web.core.windows.net
 * South Asia: https://gitbundleserversouthasia.z23.web.core.windows.net
 * Australia: https://gitbundleserveraustralia.z26.web.core.windows.net

To test this RFC against these servers, choose your $org/$repo to clone and
your region's bundle server $url and run

$ git clone --bundle-uri=$url/$org/$repo/ https://github.com/$org/$repo


Note that these servers are set up using "almost free" storage in Azure.
Network connectivity of this storage can be slower than that of GitHub data
centers, so your results may vary.
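
For anyone scripting this test, here is a minimal sketch that assembles the
clone command above from an org, a repo, and a region. The helper name and the
short region keys are made up for illustration; the URLs are the ones listed
above, and --bundle-uri only exists with this RFC applied.

```python
# Region -> bundle server base URL, copied from the list above.
# The short region keys are invented for this sketch.
BUNDLE_SERVERS = {
    "east-us": "https://gitbundleserver.z13.web.core.windows.net",
    "west-us": "https://gitbundleserverwestus.z22.web.core.windows.net",
    "europe": "https://gitbundleservereurope.z6.web.core.windows.net",
    "east-asia": "https://gitbundleservereastasia.z7.web.core.windows.net",
    "south-asia": "https://gitbundleserversouthasia.z23.web.core.windows.net",
    "australia": "https://gitbundleserveraustralia.z26.web.core.windows.net",
}


def clone_command(org, repo, region):
    """Return the argv for the clone command shown above (hypothetical helper)."""
    url = BUNDLE_SERVERS[region]
    return [
        "git", "clone",
        f"--bundle-uri={url}/{org}/{repo}/",
        f"https://github.com/{org}/{repo}",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is downloaded.
    print(" ".join(clone_command("torvalds", "linux", "east-us")))
```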

From my location in Raleigh, NC, USA, I am able to clone torvalds/linux from
the bundle servers quite a bit faster than from GitHub:

@derrickstolee    With Bundles    Without Bundles
Full Clone        491.9s          964.9s
Partial Clone     132.3s          171.7s
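
To put those timings in relative terms, the short calculation below derives
the speedup factor and the seconds saved from the four numbers above. It is
plain arithmetic on one measurement from one location, not a benchmark.

```python
# (with bundles, without bundles) in seconds, taken from the table above.
TIMINGS = {
    "Full Clone": (491.9, 964.9),
    "Partial Clone": (132.3, 171.7),
}


def speedup(with_bundles, without_bundles):
    """How many times faster the clone was when bundles were used."""
    return without_bundles / with_bundles


if __name__ == "__main__":
    for name, (with_b, without_b) in TIMINGS.items():
        factor = speedup(with_b, without_b)
        saved = without_b - with_b
        print(f"{name}: {factor:.2f}x faster, {saved:.1f}s saved")
```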

I recruited GitHub employees from across the globe to test this experimental
deployment of bundle servers and found mixed results. Some users had better
performance with bundle servers, and many had very close results. Some had
much worse connections to bundle servers than the GitHub remote, because of
the use of the cheapest form of Azure Storage services compared to highly
optimized GitHub infrastructure.

I plan to keep these servers running for a while, so this test should be
possible throughout the review of this RFC and of the patch series that
follow it. I'd love to hear if anyone else does this experiment and has
anything to say about the results.


Implementation Plan
===================

The first patch contains a design document that is "aspirational": It
describes the feature as it should be when the implementation is complete.
Most of that implementation is included in this RFC, though there are some
minor tweaks to the functionality I want to change or add before the patches
are under full review. In particular, the patches are less well tested and
less well documented the further you look into the RFC.

Here is a potential plan for splitting this RFC into digestible pieces that
can be reviewed in sequence:

 0. Update the git bundle create command to take a --filter option, allowing
    bundles to store packfiles restricted to an object filter. This is
    necessary for using bundle URIs to benefit partial clones. This step was
    already submitted for full review [4]. These patches are based on those.

 1. Integrate bundle URIs into git clone with a --bundle-uri option. This
    will include the full understanding of a table of contents, but will not
    integrate with git fetch or allow the server to advertise URIs.

 2. Integrate bundle URIs into git fetch, triggered by config values that
    are set during git clone if the server indicates that the bundle
    strategy works for fetches.

 3. Create a new "recommended features" capability in protocol v2 where the
    server can recommend features such as bundle URIs, partial clone, and
    sparse-checkout. These features will be extremely limited in scope and
    blocked by opt-in config options. The design for this portion could be
    replaced by a "bundle-uri" capability that only advertises bundle URIs
    and no other information.


Intended Focus of this RFC
==========================

This RFC is very large, and that's even with many patches not including full
documentation or tests. These commits are not meant to be reviewed as if
I intended to merge them as-is.

One thing this feature establishes is a new standard by which the Git client
will communicate with external servers. The goal of this RFC is to determine
if this standard is well designed, and whether we need to make it more
robust. Alternatively, the design might need to be changed for reasons I
cannot predict.

For that reason, hopefully most of the feedback is directly on the first
patch, which contains the design document. In particular, the design
document repeats the implementation plan, and I'd like extra eyes on that,
too.

[0]
https://github.com/git/git/blob/master/Documentation/technical/packfile-uri.txt
The packfile URI feature in Git (Created June 2020)

[1]
https://lore.kernel.org/git/RFC-cover-00.13-0000000000-20210805T150534Z-avarab@gmail.com/
An earlier RFC for a bundle URI feature. (August 2021)

[2]
https://lore.kernel.org/git/cover-0.3-00000000000-20211025T211159Z-avarab@gmail.com/
An earlier patch series creating a 'bundle-uri' protocol v2 capability.
(October 2021)

[3]
https://lore.kernel.org/git/e7fe220b-2877-107e-8f7e-ea507a65feff@gmail.com/
My earlier thoughts on the previous RFCs, many of which are integrated into
this RFC (August 2021)

[4]
https://lore.kernel.org/git/pull.1159.git.1645638911.gitgitgadget@gmail.com/
Add object filters to bundles (February 2022)

Thanks, -Stolee

Derrick Stolee (24):
  docs: document bundle URI standard
  bundle: alphabetize subcommands better
  dir: extract starts_with_dot[_dot]_slash()
  remote: move relative_url()
  remote: allow relative_url() to return an absolute url
  http: make http_get_file() external
  remote-curl: add 'get' capability
  bundle: implement 'fetch' command for direct bundles
  bundle: parse table of contents during 'fetch'
  bundle: add --filter option to 'fetch'
  bundle: allow relative URLs in table of contents
  bundle: make it easy to call 'git bundle fetch'
  clone: add --bundle-uri option
  clone: --bundle-uri cannot be combined with --depth
  config: add git_config_get_timestamp()
  bundle: only fetch bundles if timestamp is new
  fetch: fetch bundles before fetching original data
  protocol-caps: implement cap_features()
  serve: understand but do not advertise 'features' capability
  serve: advertise 'features' when config exists
  connect: implement get_recommended_features()
  transport: add connections for 'features' capability
  clone: use server-recommended bundle URI
  t5601: basic bundle URI test

Ævar Arnfjörð Bjarmason (1):
  connect.c: refactor sending of agent & object-format

 Documentation/gitremote-helpers.txt    |   6 +
 Documentation/technical/bundle-uri.txt | 404 +++++++++++++++++++++
 builtin/bundle.c                       | 478 ++++++++++++++++++++++++-
 builtin/clone.c                        |  51 +++
 builtin/fetch.c                        |  17 +
 builtin/submodule--helper.c            | 129 -------
 bundle.c                               |  21 ++
 bundle.h                               |   9 +
 config.c                               |  39 ++
 config.h                               |  14 +
 connect.c                              |  69 +++-
 dir.h                                  |  11 +
 fsck.c                                 |  14 +-
 http.c                                 |   4 +-
 http.h                                 |   9 +
 protocol-caps.c                        |  66 ++++
 protocol-caps.h                        |   1 +
 remote-curl.c                          |  32 ++
 remote.c                               | 104 ++++++
 remote.h                               |  35 ++
 serve.c                                |  23 ++
 t/t5601-clone.sh                       |  12 +
 t/t5701-git-serve.sh                   |   9 +
 transport-helper.c                     |  14 +
 transport-internal.h                   |   9 +
 transport.c                            |  38 ++
 transport.h                            |   5 +
 27 files changed, 1467 insertions(+), 156 deletions(-)
 create mode 100644 Documentation/technical/bundle-uri.txt


base-commit: ec51d0a50e6e64ae37795d77f7d33204b9b71ecd
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1160%2Fderrickstolee%2Fbundle%2Frfc-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1160/derrickstolee/bundle/rfc-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1160

Comments

Ævar Arnfjörð Bjarmason Feb. 23, 2022, 10:17 p.m. UTC | #1
On Wed, Feb 23 2022, Derrick Stolee via GitGitGadget wrote:

[Note: The E-Mail address you CC'd for me (presumably, dropped in this
reply) is not my E-Mail address, this one is]

[Also CC-ing some people who have expressed interest in this area, and
would probably like to be kept in the loop going forward]

> There have been several suggestions to improve Git clone speeds and
> reliability by supplementing the Git protocol with static content. The
> Packfile URI [0] feature lets the Git response include URIs that point to
> packfiles that the client must download to complete the request.
>
> Last year, Ævar suggested using bundles instead of packfiles [1] [2]. This
> design has the same benefits as the packfile URI feature because it offloads
> most object downloads to static content fetches. The main advantage over
> packfile URIs is that the remote Git server does not need to know what is in
> those bundles. The Git client tells the server what it downloaded during the
> fetch negotiation afterwards. This includes any chance that the client did
> not have access to those bundles or otherwise failed to access them. I
> agreed that this was a much more desirable way to serve static content, but
> had concerns about the flexibility of that design [3]. I have not heard more
> on the topic since October, so I started investigating this idea myself in
> December, resulting in this RFC.

This timing is both quite fortunate & unfortunate for me, since I'd been
blocked / waiting on various things until very recently to submit a
non-RFC re-roll of (a larger version of) that series you mentioned from
October.

I guess the good news is that we'll have at least one guaranteed very
interested reviewer for each other's patches, and that the design that
makes it into git.git in the end will definitely be well hashed out :)

I won't be able to review this in any detail right at this hour, but
will be doing so. I'd also like to submit what I've got in some form
soon for hashing the two out.

It will be some 50+ patches on the ML in total though related to this
topic, so I think the two of us coming up with some way to manage all of
that for both ourselves & others would be nice. Perhaps we could also
have an off-list (video) chat in real time to clarify/discuss various
things related to this.

Having said that, basically:

> I focused on maximizing flexibility for the service that organizes and
> serves bundles. This includes:
>
>  * Bundle URIs work for full and partial clones.
>
>  * Bundle URIs can assist with git fetch in addition to git clone.
>
>  * Users can set up bundle servers independent of the remote Git server if
>    they specify the bundle URI via a --bundle-uri argument.
>
> This series is based on the recently-submitted series that adds object
> filters to bundles [4]. There is a slight adjacent-line-add conflict with
> js/apply-partial-clone-filters-recursively, but that is in the last few
> patches, so it will be easy to rebase by the time we have a fully-reviewable
> patch series for those steps.
>
> The general breakdown is as follows:
>
>  * Patch 1 adds documentation for the feature in its entirety.
>
>  * Patches 2-14 add the ability to run ‘git clone --bundle-uri=’
>
>  * Patches 15-17 add bundle fetches to ‘git fetch’ calls
>
>  * Patches 18-25 add a new ‘features’ capability that allows a server to
>    advertise bundle URIs (and in the future, other features).
>
> I consider the patches in their current form to be “RFC quality”. There are
> multiple places where tests are missing or special cases are not checked.
> The goal for this RFC is to seek feedback on the high-level ideas before
> committing to the deep work of creating mergeable patches.

Having skimmed through all of this a *very rough* overview of what
you've got here & the direction I chose to go in is:

1. I didn't go for an initial step of teaching "git bundle" any direct
   remote operation, rather it's straight to the protocol v2 bits etc.

   I don't think there's anything wrong with that, but didn't see much
   point in teaching "git bundle" to do that when the eventual state is
   to have "git fetch" do so anyway.

   But in either case the "fetch" parts are either a thin wrapper for
   "git bundle fetch", or a "git bundle fetch/unbundle" is a thin
   equivalent to "init" "fetch" (with bundle-uri) + "unbundle".

2. By far the main difference is that you're heavily leaning on a TOC
   format which encodes certain assumptions that aren't true of
   clones/fetches in general (but probably are for most fetches), whereas
   my design (as we previously discussed) leans entirely on the client
   making sense of the bundle header & content itself.

   E.g. you have a "bundle.tableOfContents.forFetch", but e.g. if you've
   got a git.git clone of "master" and want to:

       git fetch origin refs/heads/todo:todo

   The assumption that we can cleanly separate "clone" from "fetch" breaks
   down.

   I.e. such a thing needs to assume that "clone" implies "you have
   most of the objects you need already" and that "fetch" means "..an
   incremental update thereof", doesn't it?

   Whereas I think (but we'll hash that out) that having a client fetch the
   bundle header and working that out via current reachability checks will
   be just as fast/faster, and such a thing is definitely more
   general/applicable to all sorts/types of fetches.

   (A TOC mechanism might still be good/valuable, but I hope it can be a
   cheap/discardable way to simply cache those bundle headers, or serve
   them up all at once)

3. Ditto "bundle.<id>.timestamp" in the design (presumably assumes not-rewound
   histories), and "requires" (can also currently be inferred from bundle
   headers).

4. I still need to go over your just-submitted "bundle filters"
   (https://lore.kernel.org/git/pull.1159.git.1645638911.gitgitgadget@gmail.com/)
   in detail but by adding a @filter to the file format (good!) presumably the
   "bundle.<id>.filter" amounts to a cache of the headers (which was 100% in line
   with any design I had for such extra information associated with a bundle).

In (partial) summary: I really want to lean more heavily into the
distributed nature of git in that a "bundle clone" be no more special
than the same operation performed locally where "clone/fetch" is
pointed-to a directory containing X number of local bundles, and has to
make sense of whether those help with the clone/fetch operation. I.e. by
parsing their headers & comparing that to the ref advertisement.
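
As a rough illustration of "parsing their headers & comparing that to the
ref advertisement", here is a sketch in Python. It assumes the documented
bundle layout (a "# v2 git bundle" or "# v3 git bundle" signature, optional
"@key=value" capability lines in v3, "-<oid>" prerequisite lines, "<oid>
<refname>" lines, then a blank line before the pack). The function names and
the usefulness rule are simplifications made up for this example, not what
either series implements.

```python
SIGNATURES = {"# v2 git bundle": 2, "# v3 git bundle": 3}


def parse_bundle_header(data):
    """Parse the textual header that precedes the pack data in a bundle."""
    header, separator, _pack = data.partition(b"\n\n")
    if not separator:
        raise ValueError("bundle header is not terminated by a blank line")
    lines = header.decode("utf-8").split("\n")
    if lines[0] not in SIGNATURES:
        raise ValueError("unrecognized bundle signature: %r" % lines[0])
    capabilities, prerequisites, refs = {}, [], {}
    for line in lines[1:]:
        if line.startswith("@"):
            # v3 capability, e.g. "@filter=blob:none"
            key, _, value = line[1:].partition("=")
            capabilities[key] = value
        elif line.startswith("-"):
            # prerequisite: "-<oid>" optionally followed by a comment
            prerequisites.append(line[1:].split(" ", 1)[0])
        else:
            oid, refname = line.split(" ", 1)
            refs[refname] = oid
    return {
        "version": SIGNATURES[lines[0]],
        "capabilities": capabilities,
        "prerequisites": prerequisites,
        "refs": refs,
    }


def bundle_helps(header, advertised, have):
    """Simplified rule: all prerequisites are present locally and the bundle
    carries at least one advertised tip that we do not have yet."""
    if any(oid not in have for oid in header["prerequisites"]):
        return False
    wanted = set(advertised.values()) - set(have)
    return any(oid in wanted for oid in header["refs"].values())
```

Here "advertised" stands for a refname-to-OID mapping as returned by ls-refs,
and "have" for the set of OIDs already in the local object store.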

Maybe a meta-format TOC will be needed eventually, and I'm not against
such a thing.

I'd just like to make sure we wouldn't add such a thing as a premature
optimization or something that would needlessly complicate the
design. In particular (quoting from a part of 01/25):

    +A further optimization is that the client can avoid downloading any
    +bundles if their timestamps are not larger than the stored timestamp.
    +After fetching new bundles, this local timestamp value is updated.
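
A minimal sketch of the quoted timestamp rule, assuming a table of contents
that has already been parsed into a list of entries. The dictionary keys "id"
and "timestamp" and the function name are hypothetical; only the comparison
itself comes from the quoted text.

```python
def bundles_to_download(toc_entries, stored_timestamp):
    """Return (entries newer than stored_timestamp, new stored timestamp).

    Entries whose timestamp is not larger than the stored one are skipped,
    and the stored value advances to the newest timestamp that was fetched.
    """
    newer = [e for e in toc_entries if e["timestamp"] > stored_timestamp]
    newer.sort(key=lambda e: e["timestamp"])  # oldest first
    new_stored = max([stored_timestamp] + [e["timestamp"] for e in newer])
    return newer, new_stored
```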

Such caching seems sensible, but to me seems basically redundant to what
you'd get by doing the same with just:

 * A set of dumb bundle files in a directory on a webserver
 * Having unique names for each of those (e.g. naming them
   https://<host>/<hash-of-content>.bundle instead of
   https://<host>/weekly.bundle)
 * Since the content wouldn't change (HTTP headers indicating caching
   forever) a client would have downloaded say the last 6 of your set of
   7 "daily" rotating bundles already, and we'd locally cache their
   entire header, not just a timestamp.
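
The naming scheme in the second bullet could look like the sketch below. The
choice of SHA-256 and the helper names are assumptions for illustration; the
point is only that identical content always maps to the same URL, so it can
be cached indefinitely.

```python
import hashlib


def content_addressed_name(bundle_bytes, suffix=".bundle"):
    """Name a bundle after a hash of its content (SHA-256 is an assumption)."""
    return hashlib.sha256(bundle_bytes).hexdigest() + suffix


def content_addressed_url(host, bundle_bytes):
    """Build https://<host>/<hash-of-content>.bundle for the given content."""
    return f"https://{host}/{content_addressed_name(bundle_bytes)}"
```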

I.e. I think you'd get the same reduction in requests and more from
that. I.e. (to go back to the earlier example) of:

    git fetch origin refs/heads/todo:todo

You'd get the tip of "ls-refs" for TODO, and locally discover that one
of the 6 "daily" bundles whose headers (but not necessarily content) you
already downloaded had that advertised OID, and grab it from there.
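
A sketch of that local lookup, assuming the client keeps a cache that maps
each bundle name to the refname-to-OID pairs found in its header. The cache
shape and the function name are hypothetical.

```python
def find_bundle_for_tip(cached_headers, advertised_oid):
    """Return the name of a cached bundle whose header lists advertised_oid
    as a ref tip, or None if no cached header mentions it."""
    for bundle_name, refs in cached_headers.items():
        if advertised_oid in refs.values():
            return bundle_name
    return None
```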

The critical difference being that such an arrangement would not be
assuming linear history/additive only (i.e. only fast-forward) which the
"forFetch" + "timestamp" surely does.

And, I think we'll be much better off both in the short and long term by
heavily leaning into HTTP caching features and things like request
pipelining + range requests than a custom meta-index format.

IOW is a TOC format needed if we assume for a moment for the sake of
argument that for a given repository the say 100 bundles you'd
potentially serve up aren't remote at all, but something you've got
mmap()'d and can inspect the bundle headers for and compare with the
remote "ls-refs"?

Because if that's the case we could basically get to the same place via
HTTP caching features, and doing it that way has the advantage of
piggy-backing on all existing caching infrastructure.

Have 1000 computers on your network that keep fetching torvalds/linux?
Stick a proxy configured to cache the first say 1MB of <bundle-base-url>
in front of them.

Now all their requests to discover if the bundles help will be local
(and it would probably make sense to cache the actual content
too). Whereas any type of custom caching strategy would be
per-git-client.

Just food for thought, and sorry that this E-Mail/braindump got so long
already...

Derrick Stolee Feb. 24, 2022, 2:11 p.m. UTC | #2
On 2/23/2022 5:17 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Feb 23 2022, Derrick Stolee via GitGitGadget wrote:
> 
> [Note: The E-Mail address you CC'd for me (presumably, dropped in this
> reply) is not my E-Mail address, this one is]

Sorry about that. I appear to have gotten it right on the partial
bundles series, but somehow had a brain fart here.
 
> [Also CC-ing some people who have expressed interest in this area, and
> would probably like to be kept in the loop going forward]

Thanks. The more eyes, the better.

>> There have been several suggestions to improve Git clone speeds and
>> reliability by supplementing the Git protocol with static content. The
>> Packfile URI [0] feature lets the Git response include URIs that point to
>> packfiles that the client must download to complete the request.
>>
>> Last year, Ævar suggested using bundles instead of packfiles [1] [2]. This
>> design has the same benefits as the packfile URI feature because it offloads
>> most object downloads to static content fetches. The main advantage over
>> packfile URIs is that the remote Git server does not need to know what is in
>> those bundles. The Git client tells the server what it downloaded during the
>> fetch negotiation afterwards. This includes any chance that the client did
>> not have access to those bundles or otherwise failed to access them. I
>> agreed that this was a much more desirable way to serve static content, but
>> had concerns about the flexibility of that design [3]. I have not heard more
>> on the topic since October, so I started investigating this idea myself in
>> December, resulting in this RFC.
> 
> This timing is both quite fortunate & unfortunate for me, since I'd been
> blocked / waiting on various things until very recently to submit a
> non-RFC re-roll of (a larger version of) that series you mentioned from
> October.
> 
> I guess the good news is that we'll have at least one guaranteed very
> interested reviewer for each other's patches, and that the design that
> makes it into git.git in the end will definitely be well hashed out :)
> 
> I won't be able to review this in any detail right at this hour, but
> will be doing so. I'd also like to submit what I've got in some form
> soon for hashing the two out.
> 
> It will be some 50+ patches on the ML in total though related to this
> topic, so I think the two of us coming up with some way to manage all of
> that for both ourselves & others would be nice. Perhaps we could also
> have an off-list (video) chat in real time to clarify/discuss various
> things related to this.

I look forward to seeing your full implementation. There are many things
about your RFC that left me confused and not fully understanding your
vision. You reference a few of them further down, so I'll mention them
specifically in that context.

> Having said that, basically:
> 
>> I focused on maximizing flexibility for the service that organizes and
>> serves bundles. This includes:
>>
>>  * Bundle URIs work for full and partial clones.
>>
>>  * Bundle URIs can assist with git fetch in addition to git clone.
>>
>>  * Users can set up bundle servers independent of the remote Git server if
>>    they specify the bundle URI via a --bundle-uri argument.
>>
>> This series is based on the recently-submitted series that adds object
>> filters to bundles [4]. There is a slight adjacent-line-add conflict with
>> js/apply-partial-clone-filters-recursively, but that is in the last few
>> patches, so it will be easy to rebase by the time we have a fully-reviewable
>> patch series for those steps.
>>
>> The general breakdown is as follows:
>>
>>  * Patch 1 adds documentation for the feature in its entirety.
>>
>>  * Patches 2-14 add the ability to run ‘git clone --bundle-uri=’
>>
>>  * Patches 15-17 add bundle fetches to ‘git fetch’ calls
>>
>>  * Patches 18-25 add a new ‘features’ capability that allows a server to
>>    advertise bundle URIs (and in the future, other features).
>>
>> I consider the patches in their current form to be “RFC quality”. There are
>> multiple places where tests are missing or special cases are not checked.
>> The goal for this RFC is to seek feedback on the high-level ideas before
>> committing to the deep work of creating mergeable patches.
> 
> Having skimmed through all of this a *very rough* overview of what
> you've got here & the direction I chose to go in is:
> 
> 1. I didn't go for an initial step of teaching "git bundle" any direct
>    remote operation, rather it's straight to the protocol v2 bits etc.
> 
>    I don't think there's anything wrong with that, but didn't see much
>    point in teaching "git bundle" to do that when the eventual state is
>    to have "git fetch" do so anyway.
> 
>    But in either case the "fetch" parts are either a thin wrapper for
>    "git bundle fetch", or a "git bundle fetch/unbundle" is a thin
>    equivalent to "init" "fetch" (with bundle-uri) + "unbundle".

I'm not married to this specific implementation, although having the
bundle fetch be something a user could run independently of 'git fetch'
or 'git clone' might be desirable.

> 2. By far the main difference is that you're heavily leaning on a TOC
>    format which encodes certain assumptions that aren't true of
>    clones/fetches in general (but probably are for most fetches), whereas
>    my design (as we previously discussed) leans entirely on the client
>    making sense of the bundle header & content itself.
> 
>    E.g. you have a "bundle.tableOfContents.forFetch", but e.g. if you've
>    got a git.git clone of "master" and want to:
> 
>        git fetch origin refs/heads/todo:todo
> 
>    The assumption that we can cleanly separate "clone" from "fetch" breaks
>    down.
> 
>    I.e. such a thing needs to assume that "clone" implies "you have
>    most of the objects you need already" and that "fetch" means "..an
>    incremental update thereof", doesn't it?
> 
>    Whereas I think (but we'll hash that out) that having a client fetch the
>    bundle header and working that out via current reachability checks will
>    be just as fast/faster, and such a thing is definitely more
>    general/applicable to all sorts/types of fetches.
> 
>    (A TOC mechanism might still be good/valuable, but I hope it can be a
>    cheap/discardable way to simply cache those bundle headers, or serve
>    them up all at once)

Note that the TOC is completely optional, and you could serve a bundle from
the advertised URI. The biggest difference is that the TOC allows flexibility
that your design did not (or at least, I could not detect how to get that
flexibility out of it).

The biggest thing that I am trying to understand from your design is something
I'm going to make an educated guess about: if you _do_ break the repo into
multiple bundles, then the bundle URI advertisement sends multiple URIs that
must _all_ be inspected to get the full bundle content.

The change in the TOC model is that there can be multiple potential servers
hosting multiple bundles, and the bundle URI advertisement sending multiple
URIs means "any of these would work. Try one close to you, or use the others
as a fallback." This seems like the biggest incompatibility in our approaches
based on my understanding.
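
To make the "any of these would work" reading concrete, here is a small
sketch of trying advertised URIs in order and falling back on failure. The
function name is made up, and the download itself is passed in as a callable
so the sketch does not assume any particular HTTP client.

```python
from typing import Callable, Iterable, Optional, Tuple


def fetch_with_fallback(
    uris: Iterable[str],
    fetch: Callable[[str], bytes],
) -> Tuple[Optional[str], Optional[bytes]]:
    """Try each URI in order; return (uri, content) for the first success.

    Every URI is treated as an equivalent mirror, so a failure simply moves
    on to the next one. Returns (None, None) if all of them fail.
    """
    for uri in uris:
        try:
            return uri, fetch(uri)
        except OSError:
            # Unreachable or unreadable mirror: fall back to the next URI.
            continue
    return None, None
```

A caller would sort the URIs by preference first (for example, nearest
region), and fall back to a normal fetch from the origin if nothing works.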

Finally, the biggest thing that is possible in my model that is not in yours
is the idea of an independent bundle server that is not known by the origin
server. It seems that everything relies on the bundle URI advertisement,
while this implementation allows a --bundle-uri=<X> at clone time or a
configured URI.

Again, the TOC is critical here, so there can be one well-known URI for the
TOC that doesn't change over time, and can be used to fetch multiple bundles.

Perhaps you intend the bundle URI to point to a directory listing, but that
seems like it would need a format that Git understands. You also mention
parsing bundle headers, and I would really like to see how you intend to
implement that without downloading too much data from the remotes.
The TOC can be extended to include whatever header information you intend to
use for these decisions.
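
As one example of a decision made from the TOC alone, the sketch below works
out a download order from "requires" links without touching any bundle file.
It assumes the TOC has been parsed into a mapping from bundle ID to its
fields and that "requires" names at most one other bundle ID; both the data
shape and the function name are assumptions for illustration.

```python
def download_order(toc, wanted_id):
    """Follow 'requires' links from wanted_id back to a bundle with no
    requirement, then return the IDs in the order they should be applied."""
    order = []
    seen = set()
    current = wanted_id
    while current is not None:
        if current in seen:
            raise ValueError("cycle in 'requires' chain at %r" % current)
        seen.add(current)
        order.append(current)
        current = toc[current].get("requires")
    order.reverse()  # base bundle first, wanted bundle last
    return order
```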

> 3. Ditto "bundle.<id>.timestamp" in the design (presumably assumes not-rewound
>    histories), and "requires" (can also currently be inferred from bundle
>    headers).
> 
> 4. I still need to go over your just-submitted "bundle filters"
>    (https://lore.kernel.org/git/pull.1159.git.1645638911.gitgitgadget@gmail.com/)
>    in detail but by adding a @filter to the file format (good!) presumably the
>    "bundle.<id>.filter" amounts to a cache of the headers (which was 100% in line
>    with any design I had for such extra information associated with a bundle).
> 
> In (partial) summary: I really want to lean more heavily into the
> distributed nature of git in that a "bundle clone" be no more special
> than the same operation performed locally where "clone/fetch" is
> pointed-to a directory containing X number of local bundles, and has to
> make sense of whether those help with the clone/fetch operation. I.e. by
> parsing their headers & comparing that to the ref advertisement.

Cloning and fetching from bundles is fundamentally different from the
dynamic fetch negotiation of the Git protocol, so I see this intent as
a drawback to your approach, not a strength.

> Maybe a meta-format TOC will be needed eventually, and I'm not against
> such a thing.
> 
> I'd just like to make sure we wouldn't add such a thing as a premature
> optimization or something that would needlessly complicate the
> design. In particular (quoting from a part of 01/25:
>     
>     +A further optimization is that the client can avoid downloading any
>     +bundles if their timestamps are not larger than the stored timestamp.
>     +After fetching new bundles, this local timestamp value is updated.
> 
> Such caching seems sensible, but to me seems basically redundant to what
> you'd get by doing the same with just:
> 
>  * A set of dumb bundle files in a directory on a webserver
>  * Having unique names for each of those (e.g. naming them
>    https://<host>/<hash-of-content>.bundle instead of
>    https://<host>/weekly.bundle)

The bundles on my prototype server do have unique names (the timestamp
is part of the name), even though they don't include a hash of the
contents. This is mostly for readability when a human looks at the
TOC, since it does not affect the correctness. A hash could be used in
the name if the bundle server wanted, but it also isn't required.

>  * Since the content wouldn't change (HTTP headers indicating caching
>    forever) a client would have downloaded say the last 6 of your set of
>    7 "daily" rotating bundles already, and we'd locally cache their
>    entire header, not just a timestamp.

This model either requires Git understanding how to walk that
directory of files on the webserver OR for the origin server to be
aware of the hash of every bundle that might be fetched. That coupling
is exactly the kind of thing I think is too difficult for serving with
packfile URIs.

> I.e. I think you'd get the same reduction in requests and more from
> that. I.e. (to go back to the earlier example) of:
> 
>     git fetch origin refs/heads/todo:todo
> 
> You'd get the tip of "ls-refs" for TODO, and locally discover that one
> of the 6 "daily" bundles whose headers (but not necessarily content) you
> already downloaded had that advertised OID, and grab it from there.

It would be easy to special-case custom refspecs as not wanting bundle
data. This goes back to the idea that serving from static content is
_not_ dynamic enough to handle these cases and therefore should not be
a major concern for the design.

Everything about using static content should be a _heuristic_ that
works for most users' most frequent operations. Most users have a
standard refs/heads/*:refs/remotes/origin/* refspec when fetching, so
they want as much data as they can get from that refspec from the
bundles.

> The critical difference being that such an arrangement would not be
> assuming linear history/additive only (i.e. only fast-forward) which the
> "forFetch" + "timestamp" surely does.

Generally, refs/heads/* does move forward. Yes, if someone force-pushes
then there is a chance that some extra data from their older version is
stored in a bundle somewhere.

> And, I think we'll be much better off both in the short and long term by
> heavily leaning into HTTP caching features and things like request
> pipelining + range requests than a custom meta-index format.

The point of the TOC is not to split bundles small enough to avoid those
features. Those things are still possible, and I expect that anyone
organizing a bundle server would want one very large bundle to handle
the majority of the repo data. Range requests are critical for allowing
resumable clones for that large data.
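As a rough illustration of why range requests matter for the one very
large bundle, here is a minimal Python sketch of how a client could ask
to resume a partial download. The function name, the URL, and the
".part" file are hypothetical and are not part of the series; only the
standard HTTP "Range" header is assumed.

```python
import os
import urllib.request


def build_resume_request(url, partial_path):
    """Build a GET request that resumes after the bytes already on disk.

    Hypothetical helper for illustration; not code from the patch series.
    """
    # Size of the partial download, or 0 if nothing has been saved yet.
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    headers = {}
    if offset > 0:
        # Ask only for the bytes from `offset` to the end of the file.
        headers["Range"] = f"bytes={offset}-"
    return urllib.request.Request(url, headers=headers)
```

A server that honors the header answers with 206 Partial Content and the
client appends the body to the partial file. A plain 200 response means
the whole bundle is being sent again and the client should start over.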

> IOW is a TOC format needed if we assume for a moment for the sake of
> argument that for a given repository the say 100 bundles you'd
> potentially serve up aren't remote at all, but something you've got
> mmap()'d and can inspect the bundle headers for and compare with the
> remote "ls-refs"?
> 
> Because if that's the case we could basically get to the same place via
> HTTP caching features, and doing it that way has the advantage of
> piggy-backing on all existing caching infrastructure.

1. 100 bundles is probably at least triple the maximum I would
   recommend, unless you have a super-active monorepo that
   wants bundles computed hourly.

2. You keep saying "inspecting the headers" to make a decision, but
   I have yet to see you present a way that you would organize bundles
   such that that decision isn't just "get all bundles I haven't
   already downloaded" and then be equivalent to the timestamp
   heuristic.
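
To make point 2 concrete, here is a small Python sketch comparing the
two decisions. The TOC layout (a list of entries with "name" and
"timestamp" fields) and both function names are assumptions made for
illustration only. It also assumes bundle names are unique and that the
client has fetched every bundle up to its stored timestamp.

```python
def select_by_timestamp(toc, stored_timestamp):
    """Return bundles newer than the stored timestamp, plus the new timestamp."""
    wanted = [b for b in toc if b["timestamp"] > stored_timestamp]
    # After fetching, the local timestamp moves to the newest bundle seen.
    new_timestamp = max([stored_timestamp] + [b["timestamp"] for b in wanted])
    return wanted, new_timestamp


def select_not_downloaded(toc, downloaded_names):
    """Return bundles whose names the client has not fetched before."""
    return [b for b in toc if b["name"] not in downloaded_names]


# Hypothetical TOC: three bundles with strictly increasing timestamps.
toc = [
    {"name": "base.bundle", "timestamp": 100},
    {"name": "day-1.bundle", "timestamp": 200},
    {"name": "day-2.bundle", "timestamp": 300},
]

# A client that already fetched the first two bundles stored timestamp 200.
by_time, new_ts = select_by_timestamp(toc, 200)
by_name = select_not_downloaded(toc, {"base.bundle", "day-1.bundle"})
print(by_time == by_name, new_ts)
```

Under those assumptions both functions pick the same bundles, but the
timestamp version needs only one stored integer instead of a record of
every bundle name or header.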

> Have 1000 computers on your network that keep fetching torvalds/linux?
> Stick a proxy configured to cache the first say 1MB of <bundle-base-url>
> in front of them.
> 
> Now all their requests to discover if the bundles help will be local
> (and it would probably make sense to cache the actual content
> too). Whereas any type of custom caching strategy would be
> per-git-client.
> 
> Just food for thought, and sorry that this E-Mail/braindump got so long
> already...

I recommend that when you have the time you look carefully at Patch 1
which actually includes the design document for the feature. That doc
contains concrete descriptions of all the ways the presented design is
flexible for users and server administrators.

In particular, I think an extremely valuable aspect of this design is
the ability for someone to spin up their own bundle server in their
build lab and modify their build scripts to fetch from that bundle
server before going to the origin remote. If your design can handle
that scenario, then I'd love to see how.
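
For concreteness, the build-lab scenario could look roughly like the
Python sketch below. It assumes the --bundle-uri argument described in
the cover letter is accepted by "git clone"; the helper name and both
URLs are hypothetical.

```python
import subprocess


def clone_command(origin_url, bundle_uri, target_dir):
    """Build the argv for a clone that seeds itself from a bundle server.

    Hypothetical build-script helper; the --bundle-uri flag is the one
    named in the cover letter, everything else is an assumption.
    """
    return [
        "git", "clone",
        f"--bundle-uri={bundle_uri}",  # bundle server inside the build lab
        origin_url,                    # origin remote, used for the remainder
        target_dir,
    ]


def run_clone(origin_url, bundle_uri, target_dir):
    """Run the clone and raise if git exits with a non-zero status."""
    subprocess.run(clone_command(origin_url, bundle_uri, target_dir), check=True)
```

A build script would call run_clone() with the lab's bundle server URI,
so most objects come from the local bundles and only the remainder is
negotiated with the origin remote.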

Take your time preparing your patches, and I'll give them a careful
read when they are available. I expect that you have some great ideas
that could augment this design, and I'm sure that your code will have
some implementation details that are better than mine. We can find a
way to combine to get the best of both worlds.

That said, I do feel that I must not have done an adequate job of
describing how important I think it is to have a more flexible design
than what you presented before. You continue to push back that your
ideas are sufficient, but I disagree. Unfortunately, I cannot be
sure either way until I see your full implementation.

Thanks,
-Stolee
Derrick Stolee March 4, 2022, 1:30 p.m. UTC | #3
On 2/24/2022 9:11 AM, Derrick Stolee wrote:
> On 2/23/2022 5:17 PM, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Wed, Feb 23 2022, Derrick Stolee via GitGitGadget wrote:

>>> There have been several suggestions to improve Git clone speeds and
>>> reliability by supplementing the Git protocol with static content. The
>>> Packfile URI [0] feature lets the Git response include URIs that point to
>>> packfiles that the client must download to complete the request.
>>>
>>> Last year, Ævar suggested using bundles instead of packfiles [1] [2]. This
>>> design has the same benefits to the packfile URI feature because it offloads
>>> most object downloads to static content fetches. The main advantage over
>>> packfile URIs is that the remote Git server does not need to know what is in
>>> those bundles. The Git client tells the server what it downloaded during the
>>> fetch negotiation afterwards. This includes any chance that the client did
>>> not have access to those bundles or otherwise failed to access them. I
>>> agreed that this was a much more desirable way to serve static content, but
>>> had concerns about the flexibility of that design [3]. I have not heard more
>>> on the topic since October, so I started investigating this idea myself in
>>> December, resulting in this RFC.
>>
>> This timing is both quite fortunate & unfortunate for me, since I'd been
>> blocked / waiting on various things until very recently to submit a
>> non-RFC re-roll of (a larger version of) that series you mentioned from
>> October.
>>
>> I guess the good news is that we'll have at least one guaranteed very
>> interested reviewer for each other's patches, and that the design that
>> makes it into git.git in the end will definitely be well hashed out :)
>>
>> I won't be able to review this in any detail right at this hour, but
>> will be doing so. I'd also like to submit what I've got in some form
>> soon for hashing the two out.
>>
>> It will be some 50+ patches on the ML in total though related to this
>> topic, so I think the two of us coming up with some way to manage all of
>> that for both ourselves & others would be nice. Perhaps we could also
>> have an off-list (video) chat in real time to clarify/discuss various
>> things related to this.
> 
> I look forward to seeing your full implementation. There are many things
> about your RFC that left me confused and not fully understanding your
> vision.

I am genuinely curious to see your full implementation of bundle URIs.
I've been having trouble joining the Git IRC chats, but I saw from the
logs that you are working on getting patches together.

Do you have an expected timeline on that progress?

I would like to move forward in getting bundle URIs submitted as a full
feature, but it is important to see your intended design so we can take
the best parts of both to create a version that satisfies us both.

Thanks,
-Stolee
Ævar Arnfjörð Bjarmason March 4, 2022, 2:49 p.m. UTC | #4
On Fri, Mar 04 2022, Derrick Stolee wrote:

> On 2/24/2022 9:11 AM, Derrick Stolee wrote:
>> On 2/23/2022 5:17 PM, Ævar Arnfjörð Bjarmason wrote:
>>>
>>> On Wed, Feb 23 2022, Derrick Stolee via GitGitGadget wrote:
>
>>>> There have been several suggestions to improve Git clone speeds and
>>>> reliability by supplementing the Git protocol with static content. The
>>>> Packfile URI [0] feature lets the Git response include URIs that point to
>>>> packfiles that the client must download to complete the request.
>>>>
>>>> Last year, Ævar suggested using bundles instead of packfiles [1] [2]. This
>>>> design has the same benefits to the packfile URI feature because it offloads
>>>> most object downloads to static content fetches. The main advantage over
>>>> packfile URIs is that the remote Git server does not need to know what is in
>>>> those bundles. The Git client tells the server what it downloaded during the
>>>> fetch negotiation afterwards. This includes any chance that the client did
>>>> not have access to those bundles or otherwise failed to access them. I
>>>> agreed that this was a much more desirable way to serve static content, but
>>>> had concerns about the flexibility of that design [3]. I have not heard more
>>>> on the topic since October, so I started investigating this idea myself in
>>>> December, resulting in this RFC.
>>>
>>> This timing is both quite fortunate & unfortunate for me, since I'd been
>>> blocked / waiting on various things until very recently to submit a
>>> non-RFC re-roll of (a larger version of) that series you mentioned from
>>> October.
>>>
>>> I guess the good news is that we'll have at least one guaranteed very
>>> interested reviewer for each other's patches, and that the design that
>>> makes it into git.git in the end will definitely be well hashed out :)
>>>
>>> I won't be able to review this in any detail right at this hour, but
>>> will be doing so. I'd also like to submit what I've got in some form
>>> soon for hashing the two out.
>>>
>>> It will be some 50+ patches on the ML in total though related to this
>>> topic, so I think the two of us coming up with some way to manage all of
>>> that for both ourselves & others would be nice. Perhaps we could also
>>> have an off-list (video) chat in real time to clarify/discuss various
>>> things related to this.
>> 
>> I look forward to seeing your full implementation. There are many things
>> about your RFC that left me confused and not fully understanding your
>> vision.
>
> I am genuinely curious to see your full implementation of bundle URIs.
> I've been having trouble joining the Git IRC chats, but I saw from the
> logs that you are working on getting patches together.
>
> Do you have an expected timeline on that progress?
>
> I would like to move forward in getting bundle URIs submitted as a full
> feature, but it is important to see your intended design so we can take
> the best parts of both to create a version that satisfies us both.

Hi. Very sorry about the late reply. I really meant to have something
ready to send at the end of this week, but it's been a bit of a
whirlwind of random things coming up & distracting me too much. Sorry.

I'm also totally on board with that goal. Are you OK with waiting until
the end of next week at the latest?

Also, as noted in the upthread
<220224.86czjdb22l.gmgdl@evledraar.gmail.com> it might be useful to chat
in a more voice/video medium in parallel (maybe mid-next-week) about the
high-level ideas & to get a feel for our goals, conflicts etc. Doing
that over very long E-Mail exchanges (and the fault of "long" there is
mostly on my side:) can be a bit harder...
Derrick Stolee March 4, 2022, 3:12 p.m. UTC | #5
On 3/4/2022 9:49 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Mar 04 2022, Derrick Stolee wrote:
> 
>> On 2/24/2022 9:11 AM, Derrick Stolee wrote:
>>> I look forward to seeing your full implementation. There are many things
>>> about your RFC that left me confused and not fully understanding your
>>> vision.
>>
>> I am genuinely curious to see your full implementation of bundle URIs.
>> I've been having trouble joining the Git IRC chats, but I saw from the
>> logs that you are working on getting patches together.
>>
>> Do you have an expected timeline on that progress?
>>
>> I would like to move forward in getting bundle URIs submitted as a full
>> feature, but it is important to see your intended design so we can take
>> the best parts of both to create a version that satisfies us both.
> 
> Hi. Very sorry about the late reply. I really meant to have something
> ready to send at the end of this week, but it's been a bit of a
> whirlwind of random things coming up & distracting me too much. Sorry.
> 
> I'm also totally on board with that goal. Are you OK with waiting until
> the end of next week at the latest?

I'm OK with waiting, especially when I have a timeline in mind.
 
> Also, as noted in the upthread
> <220224.86czjdb22l.gmgdl@evledraar.gmail.com> it might be useful to chat
> in a more voice/video medium in parallel (maybe mid-next-week) about the
> high-level ideas & to get a feel for our goals, conflicts etc. Doing
> that over very long E-Mail exchanges (and the fault of "long" there is
> mostly on my side:) can be a bit harder...

I agree. We can work out a time in a private thread and I can send
you a video call invite.

Thanks,
-Stolee
Teng Long March 8, 2022, 8:18 a.m. UTC | #6
Ævar Arnfjörð Bjarmason wrote on Wed, 23 Feb 2022 23:17:22 +0100:

>[Note: The E-Mail address you CC'd for me (presumably, dropped in this
>reply) is not my E-Mail address, this one is]
>
>[Also CC-ing some people who have expressed interest in this area, and
>would probably like to be kept in the loop going forward]

I appreciate the CC.

I have been interested in this topic for a while. Some time ago I posted
a patchset extending "packfile-uri" for similar reasons, but after I saw
the idea and RFC from Ævar Arnfjörð Bjarmason, I suspended it.

I'm really looking forward to the solution once you reach a consensus,
and I'd be glad to attend and listen to your meetings, if possible (I
read the context a little late, so I may have already missed them).

Thanks.
Derrick Stolee March 8, 2022, 5:15 p.m. UTC | #7
On 3/4/2022 10:12 AM, Derrick Stolee wrote:
> On 3/4/2022 9:49 AM, Ævar Arnfjörð Bjarmason wrote:
>> Also, as noted in the upthread
>> <220224.86czjdb22l.gmgdl@evledraar.gmail.com> it might be useful to chat
>> in a more voice/video medium in parallel (maybe mid-next-week) about the
>> high-level ideas & to get a feel for our goals, conflicts etc. Doing
>> that over very long E-Mail exchanges (and the fault of "long" there is
>> mostly on my side:) can be a bit harder...
> 
> I agree. We can work out a time in a private thread and I can send
> you a video call invite.

Ævar and I just finished our chat and came away with these three
action items:

1. Ævar will finish prepping his RFC as-is and send it to the list.
   It contains several deeply technical optimizations that are
   critical to how his model works, but could also be used to
   improve scenarios in the table of contents model.

2. Ævar will then do a round of taking both series and combining
   them in a way that allows the union of possible functionality
   to work.

3. As these things come out, I will make it a priority to read the
   patches and provide feedback focusing on high-level concepts
   and ways we can split the future, non-RFC series into chunks
   that provide incremental functionality while keeping review
   easier than reading the whole series.

Thanks,
-Stolee
Johannes Schindelin March 10, 2022, 2:45 p.m. UTC | #8
Hi Stolee & Ævar,

On Tue, 8 Mar 2022, Derrick Stolee wrote:

> On 3/4/2022 10:12 AM, Derrick Stolee wrote:
> > On 3/4/2022 9:49 AM, Ævar Arnfjörð Bjarmason wrote:
> >> Also, as noted in the upthread
> >> <220224.86czjdb22l.gmgdl@evledraar.gmail.com> it might be useful to chat
> >> in a more voice/video medium in parallel (maybe mid-next-week) about the
> >> high-level ideas & to get a feel for our goals, conflicts etc. Doing
> >> that over very long E-Mail exchanges (and the fault of "long" there is
> >> mostly on my side:) can be a bit harder...
> >
> > I agree. We can work out a time in a private thread and I can send
> > you a video call invite.
>
> Ævar and I just finished our chat and came away with these three
> action items:
>
> 1. Ævar will finish prepping his RFC as-is and send it to the list.
>    It contains several deeply technical optimizations that are
>    critical to how his model works, but could also be used to
>    improve scenarios in the table of contents model.
>
> 2. Ævar will then do a round of taking both series and combining
>    them in a way that allows the union of possible functionality
>    to work.
>
> 3. As these things come out, I will make it a priority to read the
>    patches and provide feedback focusing on high-level concepts
>    and ways we can split the future, non-RFC series into chunks
>    that provide incremental functionality while keeping review
>    easier than reading the whole series.

I very much look forward to seeing the combined work soon!

Thank you, both,
Johannes
Derrick Stolee April 7, 2022, 7:08 p.m. UTC | #9
On 3/8/2022 12:15 PM, Derrick Stolee wrote:
> On 3/4/2022 10:12 AM, Derrick Stolee wrote:
>> On 3/4/2022 9:49 AM, Ævar Arnfjörð Bjarmason wrote:
>>> Also, as noted in the upthread
>>> <220224.86czjdb22l.gmgdl@evledraar.gmail.com> it might be useful to chat
>>> in a more voice/video medium in parallel (maybe mid-next-week) about the
>>> high-level ideas & to get a feel for our goals, conflicts etc. Doing
>>> that over very long E-Mail exchanges (and the fault of "long" there is
>>> mostly on my side:) can be a bit harder...
>>
>> I agree. We can work out a time in a private thread and I can send
>> you a video call invite.
> 
> Ævar and I just finished our chat and came away with these three
> action items:
> 
> 1. Ævar will finish prepping his RFC as-is and send it to the list.
>    It contains several deeply technical optimizations that are
>    critical to how his model works, but could also be used to
>    improve scenarios in the table of contents model.

Ævar: I'm still waiting on the full version of this. While you
updated [1] your original RFC [2], it was incomplete. I am still
looking forward to seeing your full vision of how it works with
incremental fetch and how your optimizations to download only the
headers of the bundles will work.

[1] [RFC PATCH v2 00/13] bundle-uri: a "dumb CDN" for git
    https://lore.kernel.org/git/RFC-cover-v2-00.13-00000000000-20220311T155841Z-avarab@gmail.com/

[2] [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc.
    https://lore.kernel.org/git/RFC-cover-00.13-0000000000-20210805T150534Z-avarab@gmail.com/

> 2. Ævar will then do a round of taking both series and combining
>    them in a way that allows the union of possible functionality
>    to work.

Or perhaps you are jumping straight to this part?

> 3. As these things come out, I will make it a priority to read the
>    patches and provide feedback focusing on high-level concepts
>    and ways we can split the future, non-RFC series into chunks
>    that provide incremental functionality while keeping review
>    easier than reading the whole series.

I'm still looking forward to seeing progress in this area. Please
let me know what your plan is here.

Thanks,
-Stolee
Ævar Arnfjörð Bjarmason April 8, 2022, 9:15 a.m. UTC | #10
On Thu, Apr 07 2022, Derrick Stolee wrote:

> On 3/8/2022 12:15 PM, Derrick Stolee wrote:
>> On 3/4/2022 10:12 AM, Derrick Stolee wrote:
>>> On 3/4/2022 9:49 AM, Ævar Arnfjörð Bjarmason wrote:
>>>> Also, as noted in the upthread
>>>> <220224.86czjdb22l.gmgdl@evledraar.gmail.com> it might be useful to chat
>>>> in a more voice/video medium in parallel (maybe mid-next-week) about the
>>>> high-level ideas & to get a feel for our goals, conflicts etc. Doing
>>>> that over very long E-Mail exchanges (and the fault of "long" there is
>>>> mostly on my side:) can be a bit harder...
>>>
>>> I agree. We can work out a time in a private thread and I can send
>>> you a video call invite.
>> 
>> Ævar and I just finished our chat and came away with these three
>> action items:
>> 
>> 1. Ævar will finish prepping his RFC as-is and send it to the list.
>>    It contains several deeply technical optimizations that are
>>    critical to how his model works, but could also be used to
>>    improve scenarios in the table of contents model.
>
> Ævar: I'm still waiting on the full version of this. While you
> updated [1] your original RFC [2], it was incomplete. I am still
> looking forward to seeing your full vision of how it works with
> incremental fetch and how your optimizations to download only the
> headers of the bundles will work.
>
> [1] [RFC PATCH v2 00/13] bundle-uri: a "dumb CDN" for git
>     https://lore.kernel.org/git/RFC-cover-v2-00.13-00000000000-20220311T155841Z-avarab@gmail.com/
>
> [2] [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc.
>     https://lore.kernel.org/git/RFC-cover-00.13-0000000000-20210805T150534Z-avarab@gmail.com/
>
>> 2. Ævar will then do a round of taking both series and combining
>>    them in a way that allows the union of possible functionality
>>    to work.
>
> Or perhaps you are jumping straight to this part?

Yeah, that was part of it...

>> 3. As these things come out, I will make it a priority to read the
>>    patches and provide feedback focusing on high-level concepts
>>    and ways we can split the future, non-RFC series into chunks
>>    that provide incremental functionality while keeping review
>>    easier than reading the whole series.
>
> I'm still looking forward to seeing progress in this area. Please
> let me know what your plan is here.

Hi. I'm sorry about the delay, I ran into various life/software things,
and found that this topic required a lot of continuous "sit down for a
day and work on it" attention from me v.s. some other topics where I'd
deal with interruption better.

Then I was hoping that the merger of your bundle.c changes would come a
bit earlier before the rc, but they pretty much coincided, and since the
rc dropped I've been hesitant to send a very large topic to the list
(c.f. e.g. [1]).

Maybe I should just bite the bullet and submit it anyway, what do you
think?

1. https://lore.kernel.org/git/nycvar.QRO.7.76.6.2204071407160.347@tvgsbejvaqbjf.bet/
Derrick Stolee April 8, 2022, 1:13 p.m. UTC | #11
On 4/8/2022 5:15 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Apr 07 2022, Derrick Stolee wrote:
>> Ævar: I'm still waiting on the full version of this. While you
>> updated [1] your original RFC [2], it was incomplete. I am still
>> looking forward to seeing your full vision of how it works with
>> incremental fetch and how your optimizations to download only the
>> headers of the bundles will work.
>>
>> [1] [RFC PATCH v2 00/13] bundle-uri: a "dumb CDN" for git
>>     https://lore.kernel.org/git/RFC-cover-v2-00.13-00000000000-20220311T155841Z-avarab@gmail.com/
>>
>> [2] [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc.
>>     https://lore.kernel.org/git/RFC-cover-00.13-0000000000-20210805T150534Z-avarab@gmail.com/
>>
>>> 2. Ævar will then do a round of taking both series and combining
>>>    them in a way that allows the union of possible functionality
>>>    to work.
>>
>> Or perhaps you are jumping straight to this part?
> 
> Yeah, that was part of it...
> 
>>> 3. As these things come out, I will make it a priority to read the
>>>    patches and provide feedback focusing on high-level concepts
>>>    and ways we can split the future, non-RFC series into chunks
>>>    that provide incremental functionality while keeping review
>>>    easier than reading the whole series.
>>
>> I'm still looking forward to seeing progress in this area. Please
>> let me know what your plan is here.
> 
> Hi. I'm sorry about the delay, I ran into various life/software things,
> and found that this topic required a lot of continuous "sit down for a
> day and work on it" attention from me v.s. some other topics where I'd
> deal with interruption better.
> 
> Then I was hoping that the merger of your bundle.c changes would come a
> bit earlier before the rc, but they pretty much coincided, and since the
> rc dropped I've been hesitant to send a very large topic to the list
> (c.f. e.g. [1]).
> 
> Maybe I should just bite the bullet and submit it anyway, what do you
> think?
> 
> 1. https://lore.kernel.org/git/nycvar.QRO.7.76.6.2204071407160.347@tvgsbejvaqbjf.bet/

We could err on the side of not distracting from the list. I tend to
think the release phase can be a good time to talk about an RFC, since
Junio can ignore the thread until it is actually ready for full review.

At minimum, I appreciate knowing your progress. I can be patient for a
few more weeks as long as I have confidence that it will be shared as
soon as external blockers are cleared.

Thanks!
-Stolee
Junio C Hamano April 8, 2022, 6:26 p.m. UTC | #12
Derrick Stolee <derrickstolee@github.com> writes:

> We could err on the side of not distracting from the list. I tend to
> think the release phase can be a good time to talk about an RFC, since

Yes, I think that it is a good thing to do.  I'd need to be careful
not to miss updates and issues critical to the upcoming release but
a large topic that clearly marks itself as RFC would not get in the
way, I would think.

> Junio can ignore the thread until it is actually ready for full review.

It is true for both during -rc freeze and outside ;-)