Message ID | pull.1160.git.1645641063.gitgitgadget@gmail.com (mailing list archive) |
---|---|
Series | Bundle URIs |
On Wed, Feb 23 2022, Derrick Stolee via GitGitGadget wrote:

[Note: The E-Mail address you CC'd for me (presumably, dropped in this
reply) is not my E-Mail address, this one is]

[Also CC-ing some people who have expressed interest in this area, and
would probably like to be kept in the loop going forward]

> There have been several suggestions to improve Git clone speeds and
> reliability by supplementing the Git protocol with static content. The
> Packfile URI [0] feature lets the Git response include URIs that point to
> packfiles that the client must download to complete the request.
>
> Last year, Ævar suggested using bundles instead of packfiles [1] [2]. This
> design has the same benefits to the packfile URI feature because it offloads
> most object downloads to static content fetches. The main advantage over
> packfile URIs is that the remote Git server does not need to know what is in
> those bundles. The Git client tells the server what it downloaded during the
> fetch negotiation afterwards. This includes any chance that the client did
> not have access to those bundles or otherwise failed to access them. I
> agreed that this was a much more desirable way to serve static content, but
> had concerns about the flexibility of that design [3]. I have not heard more
> on the topic since October, so I started investigating this idea myself in
> December, resulting in this RFC.

This timing is both quite fortunate & unfortunate for me, since I'd been
blocked / waiting on various things until very recently to submit a
non-RFC re-roll of (a larger version of) that series you mentioned from
October.

I guess the good news is that we'll have at least one guaranteed very
interested reviewer for each other's patches, and that the design that
makes it into git.git in the end will definitely be well hashed out :)

I won't be able to review this in any detail right at this hour, but
will be doing so. I'd also like to submit what I've got in some form
soon for hashing the two out.

It will be some 50+ patches on the ML in total though related to this
topic, so I think the two of us coming up with some way to manage all of
that for both ourselves & others would be nice. Perhaps we could also
have an off-list (video) chat in real time to clarify/discuss various
things related to this.

Having said that, basically:

> I focused on maximizing flexibility for the service that organizes and
> serves bundles. This includes:
>
>  * Bundle URIs work for full and partial clones.
>
>  * Bundle URIs can assist with git fetch in addition to git clone.
>
>  * Users can set up bundle servers independent of the remote Git server if
>    they specify the bundle URI via a --bundle-uri argument.
>
> This series is based on the recently-submitted series that adds object
> filters to bundles [4]. There is a slight adjacent-line-add conflict with
> js/apply-partial-clone-filters-recursively, but that is in the last few
> patches, so it will be easy to rebase by the time we have a fully-reviewable
> patch series for those steps.
>
> The general breakdown is as follows:
>
>  * Patch 1 adds documentation for the feature in its entirety.
>
>  * Patches 2-14 add the ability to run ‘git clone --bundle-uri=’
>
>  * Patches 15-17 add bundle fetches to ‘git fetch’ calls
>
>  * Patches 18-25 add a new ‘features’ capability that allows a server to
>    advertise bundle URIs (and in the future, other features).
>
> I consider the patches in their current form to be “RFC quality”. There are
> multiple places where tests are missing or special cases are not checked.
> The goal for this RFC is to seek feedback on the high-level ideas before
> committing to the deep work of creating mergeable patches.

Having skimmed through all of this a *very rough* overview of what
you've got here & the direction I chose to go in is:

1. I didn't go for an initial step of teaching "git bundle" any direct
   remote operation, rather it's straight to the protocol v2 bits etc.

   I don't think there's anything wrong with that, but didn't see much
   point in teaching "git bundle" to do that when the eventual state is
   to have "git fetch" do so anyway.

   But in either case the "fetch" parts are either a thin wrapper for
   "git bundle fetch", or a "git bundle fetch/unbundle" is a thin
   equivalent to "init" "fetch" (with bundle-uri) + "unbundle".

2. By far the main difference is that you're heavily leaning on a TOC
   format which encodes certain assumptions that aren't true of
   clones/fetches in general (but probably are for most fetches), whereas
   my design (as we previously discussed) leans entirely on the client
   making sense of the bundle header & content itself.

   E.g. you have a "bundle.tableOfContents.forFetch", but e.g. if you've
   got a git.git clone of "master" and want to:

       git fetch origin refs/heads/todo:todo

   The assumption that we can cleanly separate "clone" from "fetch" breaks
   down.

   I.e. such a thing needs to assume that "clone" implies "you have
   most of the objects you need already" and that "fetch" means "..an
   incremental update thereof", doesn't it?

   Whereas I think (but we'll hash that out) that having a client fetch the
   bundle header and working that out via current reachability checks will
   be just as fast/faster, and such a thing is definitely more
   general/applicable to all sorts/types of fetches.

   (A TOC mechanism might still be good/valuable, but I hope it can be a
   cheap/discardable way to simply cache those bundle headers, or serve
   them up all at once)

3. Ditto "bundle.<id>.timestamp" in the design (presumably assumes not-rewound
   histories), and "requires" (can also currently be inferred from bundle
   headers).

4. I still need to go over your just-submitted "bundle filters"
   (https://lore.kernel.org/git/pull.1159.git.1645638911.gitgitgadget@gmail.com/)
   in detail but by adding a @filter to the file format (good!) presumably the
   "bundle.<id>.filter" amounts to a cache of the headers (which was 100% in line
   with any design I had for such extra information associated with a bundle).

In (partial) summary: I really want to lean more heavily into the
distributed nature of git in that a "bundle clone" be no more special
than the same operation performed locally where "clone/fetch" is
pointed-to a directory containing X number of local bundles, and has to
make sense of whether those help with the clone/fetch operation. I.e. by
parsing their headers & comparing that to the ref advertisement.

Maybe a meta-format TOC will be needed eventually, and I'm not against
such a thing.

I'd just like to make sure we wouldn't add such a thing as a premature
optimization or something that would needlessly complicate the
design. In particular (quoting from a part of 01/25):

    +A further optimization is that the client can avoid downloading any
    +bundles if their timestamps are not larger than the stored timestamp.
    +After fetching new bundles, this local timestamp value is updated.

Such caching seems sensible, but to me seems basically redundant to what
you'd get by doing the same with just:

 * A set of dumb bundle files in a directory on a webserver
 * Having unique names for each of those (e.g. naming them
   https://<host>/<hash-of-content>.bundle instead of
   https://<host>/weekly.bundle)
 * Since the content wouldn't change (HTTP headers indicating caching
   forever) a client would have downloaded say the last 6 of your set of
   7 "daily" rotating bundles already, and we'd locally cache their
   entire header, not just a timestamp.

I.e. I think you'd get the same reduction in requests and more from
that. I.e. (to go back to the earlier example) of:

    git fetch origin refs/heads/todo:todo

You'd get the tip of "ls-refs" for TODO, and locally discover that one
of the 6 "daily" bundles whose headers (but not necessarily content) you
already downloaded had that advertised OID, and grab it from there.

The critical difference being that such an arrangement would not be
assuming linear history/additive only (i.e. only fast-forward) which the
"forFetch" + "timestamp" surely does.

And, I think we'll be much better off both in the short and long term by
heavily leaning into HTTP caching features and things like request
pipelining + range requests than a custom meta-index format.

IOW is a TOC format needed if we assume for a moment for the sake of
argument that for a given repository the say 100 bundles you'd
potentially serve up aren't remote at all, but something you've got
mmap()'d and can inspect the bundle headers for and compare with the
remote "ls-refs"?

Because if that's the case we could basically get to the same place via
HTTP caching features, and doing it that way has the advantage of
piggy-backing on all existing caching infrastructure.

Have 1000 computers on your network that keep fetching torvalds/linux?
Stick a proxy configured to cache the first say 1MB of <bundle-base-url>
in front of them.

Now all their requests to discover if the bundles help will be local
(and it would probably make sense to cache the actual content
too). Whereas any type of custom caching strategy would be
per-git-client.

Just food for thought, and sorry that this E-Mail/braindump got so long
already...
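The header-based approach described above (cache each bundle's header, then
compare it with the "ls-refs" advertisement) might look roughly like the
following Python sketch. It is an illustration only, not code from either
patch series. It assumes the documented v2/v3 bundle layout: a signature
line, optional "@key=value" capability lines in v3, "-<oid> <comment>"
prerequisite lines, "<oid> <refname>" lines, then a blank line before the
pack data. The names BundleHeader, parse_bundle_header, bundles_advertising
and prerequisites_met are hypothetical, and the prerequisite check is
simplified to set membership rather than a real reachability check.

```python
from dataclasses import dataclass, field


@dataclass
class BundleHeader:
    version: int
    capabilities: dict = field(default_factory=dict)   # v3 "@key=value" lines
    prerequisites: list = field(default_factory=list)  # OIDs the bundle builds on
    refs: dict = field(default_factory=dict)           # refname -> tip OID


def parse_bundle_header(data: bytes) -> BundleHeader:
    """Parse the text header that precedes the pack data in a bundle.

    `data` may be just the first chunk of the file (e.g. from a range
    request), as long as it contains the blank line ending the header.
    """
    header_bytes, sep, _pack = data.partition(b"\n\n")
    if not sep:
        raise ValueError("header is incomplete: no terminating blank line")

    lines = header_bytes.decode("utf-8", errors="replace").split("\n")
    signatures = {"# v2 git bundle": 2, "# v3 git bundle": 3}
    if lines[0] not in signatures:
        raise ValueError(f"not a bundle signature: {lines[0]!r}")
    header = BundleHeader(version=signatures[lines[0]])

    for line in lines[1:]:
        if line.startswith("@"):
            if header.version < 3:
                raise ValueError("capability line in a v2 bundle")
            key, _, value = line[1:].partition("=")
            header.capabilities[key] = value
        elif line.startswith("-"):
            # "-<oid> <comment>": only the OID matters here.
            header.prerequisites.append(line[1:].split(" ", 1)[0])
        else:
            oid, _, refname = line.partition(" ")
            header.refs[refname] = oid
    return header


def bundles_advertising(cached_headers: dict, wanted_oid: str) -> list:
    """Names of cached bundles with a ref tip equal to the advertised OID."""
    return [
        name
        for name, header in cached_headers.items()
        if wanted_oid in header.refs.values()
    ]


def prerequisites_met(header: BundleHeader, have_oids: set) -> bool:
    """Simplified check: every prerequisite OID is already present locally."""
    return all(oid in have_oids for oid in header.prerequisites)
```

With a cache that maps each bundle file name to its parsed header, a client
could call bundles_advertising(cache, oid_from_ls_refs) and download only
the bundles that are returned and whose prerequisites are met.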
On 2/23/2022 5:17 PM, Ævar Arnfjörð Bjarmason wrote:
>
> On Wed, Feb 23 2022, Derrick Stolee via GitGitGadget wrote:
>
> [Note: The E-Mail address you CC'd for me (presumably, dropped in this
> reply) is not my E-Mail address, this one is]

Sorry about that. I appear to have gotten it right on the partial bundles
series, but somehow had a brain fart here.

> [Also CC-ing some people who have expressed interest in this area, and
> would probably like to be kept in the loop going forward]

Thanks. The more eyes, the better.

>> There have been several suggestions to improve Git clone speeds and
>> reliability by supplementing the Git protocol with static content. The
>> Packfile URI [0] feature lets the Git response include URIs that point to
>> packfiles that the client must download to complete the request.
>>
>> Last year, Ævar suggested using bundles instead of packfiles [1] [2]. This
>> design has the same benefits to the packfile URI feature because it offloads
>> most object downloads to static content fetches. The main advantage over
>> packfile URIs is that the remote Git server does not need to know what is in
>> those bundles. The Git client tells the server what it downloaded during the
>> fetch negotiation afterwards. This includes any chance that the client did
>> not have access to those bundles or otherwise failed to access them. I
>> agreed that this was a much more desirable way to serve static content, but
>> had concerns about the flexibility of that design [3]. I have not heard more
>> on the topic since October, so I started investigating this idea myself in
>> December, resulting in this RFC.
>
> This timing is both quite fortunate & unfortunate for me, since I'd been
> blocked / waiting on various things until very recently to submit a
> non-RFC re-roll of (a larger version of) that series you mentioned from
> October.
>
> I guess the good news is that we'll have at least one guaranteed very
> interested reviewer for each other's patches, and that the design that
> makes it into git.git in the end will definitely be well hashed out :)
>
> I won't be able to review this in any detail right at this hour, but
> will be doing so. I'd also like to submit what I've got in some form
> soon for hashing the two out.
>
> It will be some 50+ patches on the ML in total though related to this
> topic, so I think the two of us coming up with some way to manage all of
> that for both ourselves & others would be nice. Perhaps we could also
> have an off-list (video) chat in real time to clarify/discuss various
> things related to this.

I look forward to seeing your full implementation. There are many things
about your RFC that left me confused and not fully understanding your
vision. You reference a few of them further down, so I'll mention them
specifically in that context.

> Having said that, basically:
>
>> I focused on maximizing flexibility for the service that organizes and
>> serves bundles. This includes:
>>
>>  * Bundle URIs work for full and partial clones.
>>
>>  * Bundle URIs can assist with git fetch in addition to git clone.
>>
>>  * Users can set up bundle servers independent of the remote Git server if
>>    they specify the bundle URI via a --bundle-uri argument.
>>
>> This series is based on the recently-submitted series that adds object
>> filters to bundles [4]. There is a slight adjacent-line-add conflict with
>> js/apply-partial-clone-filters-recursively, but that is in the last few
>> patches, so it will be easy to rebase by the time we have a fully-reviewable
>> patch series for those steps.
>>
>> The general breakdown is as follows:
>>
>>  * Patch 1 adds documentation for the feature in its entirety.
>>
>>  * Patches 2-14 add the ability to run ‘git clone --bundle-uri=’
>>
>>  * Patches 15-17 add bundle fetches to ‘git fetch’ calls
>>
>>  * Patches 18-25 add a new ‘features’ capability that allows a server to
>>    advertise bundle URIs (and in the future, other features).
>>
>> I consider the patches in their current form to be “RFC quality”. There are
>> multiple places where tests are missing or special cases are not checked.
>> The goal for this RFC is to seek feedback on the high-level ideas before
>> committing to the deep work of creating mergeable patches.
>
> Having skimmed through all of this a *very rough* overview of what
> you've got here & the direction I chose to go in is:
>
> 1. I didn't go for an initial step of teaching "git bundle" any direct
>    remote operation, rather it's straight to the protocol v2 bits etc.
>
>    I don't think there's anything wrong with that, but didn't see much
>    point in teaching "git bundle" to do that when the eventual state is
>    to have "git fetch" do so anyway.
>
>    But in either case the "fetch" parts are either a thin wrapper for
>    "git bundle fetch", or a "git bundle fetch/unbundle" is a thin
>    equivalent to "init" "fetch" (with bundle-uri) + "unbundle".

I'm not married to this specific implementation, although having the
bundle fetch be something a user could run independently of 'git fetch'
or 'git clone' might be desirable.

> 2. By far the main difference is that you're heavily leaning on a TOC
>    format which encodes certain assumptions that aren't true of
>    clones/fetches in general (but probably are for most fetches), whereas
>    my design (as we previously discussed) leans entirely on the client
>    making sense of the bundle header & content itself.
>
>    E.g. you have a "bundle.tableOfContents.forFetch", but e.g. if you've
>    got a git.git clone of "master" and want to:
>
>        git fetch origin refs/heads/todo:todo
>
>    The assumption that we can cleanly separate "clone" from "fetch" breaks
>    down.
>
>    I.e. such a thing needs to assume that "clone" implies "you have
>    most of the objects you need already" and that "fetch" means "..an
>    incremental update thereof", doesn't it?
>
>    Whereas I think (but we'll hash that out) that having a client fetch the
>    bundle header and working that out via current reachability checks will
>    be just as fast/faster, and such a thing is definitely more
>    general/applicable to all sorts/types of fetches.
>
>    (A TOC mechanism might still be good/valuable, but I hope it can be a
>    cheap/discardable way to simply cache those bundle headers, or serve
>    them up all at once)

Note that the TOC is completely optional, and you could serve a bundle
from the advertised URI. The biggest difference is that the TOC allows
flexibility that your design did not (or at least, I could not detect
how to get that flexibility out of it).

The biggest thing that I am trying to understand from your design is
something I'm going to make an educated guess about: if you _do_ break
the repo into multiple bundles, then the bundle URI advertisement sends
multiple URIs that must _all_ be inspected to get the full bundle
content.

The change in the TOC model is that there can be multiple potential
servers hosting multiple bundles, and the bundle URI advertisement
sending multiple URIs means "any of these would work. Try one close to
you, or use the others as a fallback." This seems like the biggest
incompatibility in our approaches based on my understanding.

Finally, the biggest thing that is possible in my model that is not in
yours is the idea of an independent bundle server that is not known by
the origin server. It seems that everything relies on the bundle URI
advertisement, while this implementation allows a --bundle-uri=<X> at
clone time or a configured URI.

Again, the TOC is critical here, so there can be one well-known URI for
the TOC that doesn't change over time, and can be used to fetch multiple
bundles. And perhaps you are intending the bundle URI to point to a
directory listing, but that seems like it would need a format that Git
understands.

And you mention parsing bundle headers, which I would really like to see
how you intend to implement that without downloading too much data from
the remotes. The TOC can be extended to include whatever header
information you intend to use for these decisions.

> 3. Ditto "bundle.<id>.timestamp" in the design (presumably assumes not-rewound
>    histories), and "requires" (can also currently be inferred from bundle
>    headers).
>
> 4. I still need to go over your just-submitted "bundle filters"
>    (https://lore.kernel.org/git/pull.1159.git.1645638911.gitgitgadget@gmail.com/)
>    in detail but by adding a @filter to the file format (good!) presumably the
>    "bundle.<id>.filter" amounts to a cache of the headers (which was 100% in line
>    with any design I had for such extra information associated with a bundle).
>
> In (partial) summary: I really want to lean more heavily into the
> distributed nature of git in that a "bundle clone" be no more special
> than the same operation performed locally where "clone/fetch" is
> pointed-to a directory containing X number of local bundles, and has to
> make sense of whether those help with the clone/fetch operation. I.e. by
> parsing their headers & comparing that to the ref advertisement.

Cloning and fetching from bundles is fundamentally different from the
dynamic fetch negotiation of the Git protocol, so I see this intent as a
drawback to your approach, not a strength.

> Maybe a meta-format TOC will be needed eventually, and I'm not against
> such a thing.
>
> I'd just like to make sure we wouldn't add such a thing as a premature
> optimization or something that would needlessly complicate the
> design. In particular (quoting from a part of 01/25):
>
> +A further optimization is that the client can avoid downloading any
> +bundles if their timestamps are not larger than the stored timestamp.
> +After fetching new bundles, this local timestamp value is updated.
>
> Such caching seems sensible, but to me seems basically redundant to what
> you'd get by doing the same with just:
>
>  * A set of dumb bundle files in a directory on a webserver
>  * Having unique names for each of those (e.g. naming them
>    https://<host>/<hash-of-content>.bundle instead of
>    https://<host>/weekly.bundle)

The bundles on my prototype server do have unique names (the timestamp
is part of the name), even though they don't include a hash of the
contents. This is mostly for readability when a human looks at the TOC,
since it does not affect the correctness. A hash could be used in the
name if the bundle server wanted, but it also isn't required.

>  * Since the content wouldn't change (HTTP headers indicating caching
>    forever) a client would have downloaded say the last 6 of your set of
>    7 "daily" rotating bundles already, and we'd locally cache their
>    entire header, not just a timestamp.

This model either requires Git understanding how to walk that directory
of files on the webserver OR for the origin server to be aware of the
hash of every bundle that might be fetched. That coupling is exactly the
kind of thing I think is too difficult for serving with packfile URIs.

> I.e. I think you'd get the same reduction in requests and more from
> that. I.e. (to go back to the earlier example) of:
>
>     git fetch origin refs/heads/todo:todo
>
> You'd get the tip of "ls-refs" for TODO, and locally discover that one
> of the 6 "daily" bundles whose headers (but not necessarily content) you
> already downloaded had that advertised OID, and grab it from there.

It would be easy to special-case custom refspecs as not wanting bundle
data. This goes back to the idea that serving from static content is
_not_ dynamic enough to handle these cases and therefore should not be a
major concern for the design.

Everything about using static content should be a _heuristic_ that works
for most users' most-frequent operations. Most users have a standard
refs/heads/*:refs/remotes/origin/* refspec when fetching, so they want
as much data as they can get from that refspec from the bundles.

> The critical difference being that such an arrangement would not be
> assuming linear history/additive only (i.e. only fast-forward) which the
> "forFetch" + "timestamp" surely does.

Generally, refs/heads/* does move forward. Yes, if someone force-pushes
then there is a chance that some extra data from their older version is
stored in a bundle somewhere.

> And, I think we'll be much better off both in the short and long term by
> heavily leaning into HTTP caching features and things like request
> pipelining + range requests than a custom meta-index format.

The point of TOC is not to split bundles small enough to avoid those
features. Those things are still possible, and I expect that anyone
organizing a bundle server would want one very large bundle to handle
the majority of the repo data. Range requests are critical for allowing
resumable clones for that large data.

> IOW is a TOC format needed if we assume for a moment for the sake of
> argument that for a given repository the say 100 bundles you'd
> potentially serve up aren't remote at all, but something you've got
> mmap()'d and can inspect the bundle headers for and compare with the
> remote "ls-refs"?
>
> Because if that's the case we could basically get to the same place via
> HTTP caching features, and doing it that way has the advantage of
> piggy-backing on all existing caching infrastructure.

1. 100 bundles is probably at least triple the maximum I would
   recommend, unless you have a super-active monorepo that wants bundles
   computed hourly.

2. You keep saying "inspecting the headers" to make a decision, but I
   have yet to see you present a way that you would organize bundles
   such that that decision isn't just "get all bundles I haven't already
   downloaded" and then be equivalent to the timestamp heuristic.

> Have 1000 computers on your network that keep fetching torvalds/linux?
> Stick a proxy configured to cache the first say 1MB of <bundle-base-url>
> in front of them.
>
> Now all their requests to discover if the bundles help will be local
> (and it would probably make sense to cache the actual content
> too). Whereas any type of custom caching strategy would be
> per-git-client.
>
> Just food for thought, and sorry that this E-Mail/braindump got so long
> already...

I recommend that when you have the time you look carefully at Patch 1
which actually includes the design document for the feature. That doc
contains concrete descriptions of all the ways the presented design is
flexible for users and server administrators.

In particular, I think an extremely valuable aspect of this design is
the ability for someone to spin up their own bundle server in their
build lab and modify their build scripts to fetch from that bundle
server before going to the origin remote. If your design can handle that
scenario, then I'd love to see how.

Take your time preparing your patches, and I'll give them a careful read
when they are available. I expect that you have some great ideas that
could augment this design, and I'm sure that your code will have some
implementation details that are better than mine. We can find a way to
combine to get the best of both worlds.

That said, I do feel that I must not have done an adequate job of
describing how important I think it is to have a more flexible design
than what you presented before. You continue to push back that your
ideas are sufficient, but I disagree. Unfortunately, I cannot be sure
either way until I see your full implementation.

Thanks,
-Stolee
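For comparison, the timestamp heuristic quoted from patch 01/25 ("avoid
downloading any bundles if their timestamps are not larger than the stored
timestamp", then update the stored value) can be sketched in a few lines of
Python. This is a rough illustration, not the format or code from the
series: the table of contents is modelled as a plain list of records, only
the "timestamp" and "requires" fields correspond to keys named in this
thread, and the TocEntry class, its "uri" field, the example.com URIs and
both function names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TocEntry:
    bundle_id: str                  # the <id> in "bundle.<id>.*"
    uri: str                        # hypothetical field: where the bundle lives
    timestamp: int                  # "bundle.<id>.timestamp"
    requires: Optional[str] = None  # "bundle.<id>.requires"


def bundles_to_download(toc: list, stored_timestamp: int) -> list:
    """Entries strictly newer than the locally stored timestamp, oldest first.

    Oldest-first ordering means a bundle that requires an earlier one is
    unpacked after it, assuming timestamps grow along the "requires" chain.
    """
    newer = [entry for entry in toc if entry.timestamp > stored_timestamp]
    return sorted(newer, key=lambda entry: entry.timestamp)


def updated_timestamp(downloaded: list, stored_timestamp: int) -> int:
    """The value to store locally after fetching `downloaded`."""
    return max([stored_timestamp] + [entry.timestamp for entry in downloaded])


# Hypothetical table of contents with one large base bundle and two
# incremental bundles; all values are made up for illustration.
toc = [
    TocEntry("base", "https://example.com/base.bundle", 100),
    TocEntry("day-1", "https://example.com/day-1.bundle", 200, requires="base"),
    TocEntry("day-2", "https://example.com/day-2.bundle", 300, requires="day-1"),
]
```

A client that last stored timestamp 100 would select "day-1" and "day-2" and
then store 300. A client already at 300 would select nothing, which is the
case the quoted optimization is meant to make cheap.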
On 2/24/2022 9:11 AM, Derrick Stolee wrote:
> On 2/23/2022 5:17 PM, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Wed, Feb 23 2022, Derrick Stolee via GitGitGadget wrote:
>>> There have been several suggestions to improve Git clone speeds and
>>> reliability by supplementing the Git protocol with static content. The
>>> Packfile URI [0] feature lets the Git response include URIs that point to
>>> packfiles that the client must download to complete the request.
>>>
>>> Last year, Ævar suggested using bundles instead of packfiles [1] [2]. This
>>> design has the same benefits to the packfile URI feature because it offloads
>>> most object downloads to static content fetches. The main advantage over
>>> packfile URIs is that the remote Git server does not need to know what is in
>>> those bundles. The Git client tells the server what it downloaded during the
>>> fetch negotiation afterwards. This includes any chance that the client did
>>> not have access to those bundles or otherwise failed to access them. I
>>> agreed that this was a much more desirable way to serve static content, but
>>> had concerns about the flexibility of that design [3]. I have not heard more
>>> on the topic since October, so I started investigating this idea myself in
>>> December, resulting in this RFC.
>>
>> This timing is both quite fortunate & unfortunate for me, since I'd been
>> blocked / waiting on various things until very recently to submit a
>> non-RFC re-roll of (a larger version of) that series you mentioned from
>> October.
>>
>> I guess the good news is that we'll have at least one guaranteed very
>> interested reviewer for each other's patches, and that the design that
>> makes it into git.git in the end will definitely be well hashed out :)
>>
>> I won't be able to review this in any detail right at this hour, but
>> will be doing so. I'd also like to submit what I've got in some form
>> soon for hashing the two out.
>>
>> It will be some 50+ patches on the ML in total though related to this
>> topic, so I think the two of us coming up with some way to manage all of
>> that for both ourselves & others would be nice. Perhaps we could also
>> have an off-list (video) chat in real time to clarify/discuss various
>> things related to this.
>
> I look forward to seeing your full implementation. There are many things
> about your RFC that left me confused and not fully understanding your
> vision.

I am genuinely curious to see your full implementation of bundle URIs.
I've been having trouble joining the Git IRC chats, but I saw from the
logs that you are working on getting patches together.

Do you have an expected timeline on that progress?

I would like to move forward in getting bundle URIs submitted as a full
feature, but it is important to see your intended design so we can take
the best parts of both to create a version that satisfies us both.

Thanks,
-Stolee
On Fri, Mar 04 2022, Derrick Stolee wrote:

> On 2/24/2022 9:11 AM, Derrick Stolee wrote:
>> On 2/23/2022 5:17 PM, Ævar Arnfjörð Bjarmason wrote:
>>>
>>> On Wed, Feb 23 2022, Derrick Stolee via GitGitGadget wrote:
>
>>>> There have been several suggestions to improve Git clone speeds and
>>>> reliability by supplementing the Git protocol with static content. The
>>>> Packfile URI [0] feature lets the Git response include URIs that point to
>>>> packfiles that the client must download to complete the request.
>>>>
>>>> Last year, Ævar suggested using bundles instead of packfiles [1] [2]. This
>>>> design has the same benefits to the packfile URI feature because it offloads
>>>> most object downloads to static content fetches. The main advantage over
>>>> packfile URIs is that the remote Git server does not need to know what is in
>>>> those bundles. The Git client tells the server what it downloaded during the
>>>> fetch negotiation afterwards. This includes any chance that the client did
>>>> not have access to those bundles or otherwise failed to access them. I
>>>> agreed that this was a much more desirable way to serve static content, but
>>>> had concerns about the flexibility of that design [3]. I have not heard more
>>>> on the topic since October, so I started investigating this idea myself in
>>>> December, resulting in this RFC.
>>>
>>> This timing is both quite fortunate & unfortunate for me, since I'd been
>>> blocked / waiting on various things until very recently to submit a
>>> non-RFC re-roll of (a larger version of) that series you mentioned from
>>> October.
>>>
>>> I guess the good news is that we'll have at least one guaranteed very
>>> interested reviewer for each other's patches, and that the design that
>>> makes it into git.git in the end will definitely be well hashed out :)
>>>
>>> I won't be able to review this in any detail right at this hour, but
>>> will be doing so. I'd also like to submit what I've got in some form
>>> soon for hashing the two out.
>>>
>>> It will be some 50+ patches on the ML in total though related to this
>>> topic, so I think the two of us coming up with some way to manage all of
>>> that for both ourselves & others would be nice. Perhaps we could also
>>> have an off-list (video) chat in real time to clarify/discuss various
>>> things related to this.
>>
>> I look forward to seeing your full implementation. There are many things
>> about your RFC that left me confused and not fully understanding your
>> vision.
>
> I am genuinely curious to see your full implementation of bundle URIs.
> I've been having trouble joining the Git IRC chats, but I saw from the
> logs that you are working on getting patches together.
>
> Do you have an expected timeline on that progress?
>
> I would like to move forward in getting bundle URIs submitted as a full
> feature, but it is important to see your intended design so we can take
> the best parts of both to create a version that satisfies us both.

Hi. Very sorry about the late reply. I really meant to have something
ready to send at the end of this week, but it's been a bit of a
whirlwind of random things coming up & distracting me too much. Sorry.

I'm also totally on board with that goal, are you OK with waiting until
the end of next week at the latest?

Also, as noted in the upthread
<220224.86czjdb22l.gmgdl@evledraar.gmail.com> it might be useful to chat
in a more voice/video medium in parallel (maybe mid-next-week) about the
high-level ideas & to get a feel for our goals, conflicts etc. Doing
that over very long E-Mail exchanges (and the fault of "long" there is
mostly on my side:) can be a bit harder...
On 3/4/2022 9:49 AM, Ævar Arnfjörð Bjarmason wrote:
>
> On Fri, Mar 04 2022, Derrick Stolee wrote:
>
>> On 2/24/2022 9:11 AM, Derrick Stolee wrote:
>>> I look forward to seeing your full implementation. There are many things
>>> about your RFC that left me confused and not fully understanding your
>>> vision.
>>
>> I am genuinely curious to see your full implementation of bundle URIs.
>> I've been having trouble joining the Git IRC chats, but I saw from the
>> logs that you are working on getting patches together.
>>
>> Do you have an expected timeline on that progress?
>>
>> I would like to move forward in getting bundle URIs submitted as a full
>> feature, but it is important to see your intended design so we can take
>> the best parts of both to create a version that satisfies us both.
>
> Hi. Very sorry about the late reply. I really meant to have something
> ready to send at the end of this week, but it's been a bit of a
> whirlwind of random things coming up & distracting me too much. Sorry.
>
> I'm also totally on board with that goal, are you OK with waiting until
> the end of next week at the latest?

I'm OK with waiting, especially when I have a timeline in mind.

> Also, as noted in the upthread
> <220224.86czjdb22l.gmgdl@evledraar.gmail.com> it might be useful to chat
> in a more voice/video medium in parallel (maybe mid-next-week) about the
> high-level ideas & to get a feel for our goals, conflicts etc. Doing
> that over very long E-Mail exchanges (and the fault of "long" there is
> mostly on my side:) can be a bit harder...

I agree. We can work out a time in a private thread and I can send
you a video call invite.

Thanks,
-Stolee
Ævar Arnfjörð Bjarmason wrote on Wed, 23 Feb 2022 23:17:22 +0100:

>[Note: The E-Mail address you CC'd for me (presumably, dropped in this
>reply) is not my E-Mail address, this one is]
>
>[Also CC-ing some people who have expressed interest in this area, and
>would probably like to be kept in the loop going forward]

Thanks for the CC. I have been interested in this for a while. Earlier, I
posted a patchset to extend "packfile-uri" for similar reasons, but I
suspended it after I saw the idea and RFC from Ævar Arnfjörð Bjarmason.

I'm really looking forward to the solution once you reach consensus, and
I'd be glad to attend and listen to your meetings, if possible (I read the
context a little late, so I may have already missed them).

Thanks.
On 3/4/2022 10:12 AM, Derrick Stolee wrote:
> On 3/4/2022 9:49 AM, Ævar Arnfjörð Bjarmason wrote:
>> Also, as noted in the upthread
>> <220224.86czjdb22l.gmgdl@evledraar.gmail.com> it might be useful to chat
>> in a more voice/video medium in parallel (maybe mid-next-week) about the
>> high-level ideas & to get a feel for our goals, conflicts etc. Doing
>> that over very long E-Mail exchanges (and the fault of "long" there is
>> mostly on my side:) can be a bit harder...
>
> I agree. We can work out a time in a private thread and I can send
> you a video call invite.

Ævar and I just finished our chat and came away with these three
action items:

1. Ævar will finish prepping his RFC as-is and send it to the list.
   It contains several deeply technical optimizations that are
   critical to how his model works, but could also be used to
   improve scenarios in the table of contents model.

2. Ævar will then do a round of taking both series and combining
   them in a way that allows the union of possible functionality
   to work.

3. As these things come out, I will make it a priority to read the
   patches and provide feedback focusing on high-level concepts
   and ways we can split the future, non-RFC series into chunks
   that provide incremental functionality while keeping review
   easier than reading the whole series.

Thanks,
-Stolee
Hi Stolee & Ævar,

On Tue, 8 Mar 2022, Derrick Stolee wrote:

> On 3/4/2022 10:12 AM, Derrick Stolee wrote:
> > On 3/4/2022 9:49 AM, Ævar Arnfjörð Bjarmason wrote:
> >> Also, as noted in the upthread
> >> <220224.86czjdb22l.gmgdl@evledraar.gmail.com> it might be useful to chat
> >> in a more voice/video medium in parallel (maybe mid-next-week) about the
> >> high-level ideas & to get a feel for our goals, conflicts etc. Doing
> >> that over very long E-Mail exchanges (and the fault of "long" there is
> >> mostly on my side:) can be a bit harder...
> >
> > I agree. We can work out a time in a private thread and I can send
> > you a video call invite.
>
> Ævar and I just finished our chat and came away with these three
> action items:
>
> 1. Ævar will finish prepping his RFC as-is and send it to the list.
>    It contains several deeply technical optimizations that are
>    critical to how his model works, but could also be used to
>    improve scenarios in the table of contents model.
>
> 2. Ævar will then do a round of taking both series and combining
>    them in a way that allows the union of possible functionality
>    to work.
>
> 3. As these things come out, I will make it a priority to read the
>    patches and provide feedback focusing on high-level concepts
>    and ways we can split the future, non-RFC series into chunks
>    that provide incremental functionality while keeping review
>    easier than reading the whole series.

I very much look forward to seeing the combined work soon!

Thank you, both,
Johannes
On 3/8/2022 12:15 PM, Derrick Stolee wrote:
> On 3/4/2022 10:12 AM, Derrick Stolee wrote:
>> On 3/4/2022 9:49 AM, Ævar Arnfjörð Bjarmason wrote:
>>> Also, as noted in the upthread
>>> <220224.86czjdb22l.gmgdl@evledraar.gmail.com> it might be useful to chat
>>> in a more voice/video medium in parallel (maybe mid-next-week) about the
>>> high-level ideas & to get a feel for our goals, conflicts etc. Doing
>>> that over very long E-Mail exchanges (and the fault of "long" there is
>>> mostly on my side:) can be a bit harder...
>>
>> I agree. We can work out a time in a private thread and I can send
>> you a video call invite.
>
> Ævar and I just finished our chat and came away with these three
> action items:
>
> 1. Ævar will finish prepping his RFC as-is and send it to the list.
>    It contains several deeply technical optimizations that are
>    critical to how his model works, but could also be used to
>    improve scenarios in the table of contents model.

Ævar: I'm still waiting on the full version of this. While you
updated [1] your original RFC [2], it was incomplete. I am still
looking forward to seeing your full vision of how it works with
incremental fetch and how your optimizations to download only the
headers of the bundles will work.

[1] [RFC PATCH v2 00/13] bundle-uri: a "dumb CDN" for git
https://lore.kernel.org/git/RFC-cover-v2-00.13-00000000000-20220311T155841Z-avarab@gmail.com/

[2] [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc.
https://lore.kernel.org/git/RFC-cover-00.13-0000000000-20210805T150534Z-avarab@gmail.com/

> 2. Ævar will then do a round of taking both series and combining
>    them in a way that allows the union of possible functionality
>    to work.

Or perhaps you are jumping straight to this part?

> 3. As these things come out, I will make it a priority to read the
>    patches and provide feedback focusing on high-level concepts
>    and ways we can split the future, non-RFC series into chunks
>    that provide incremental functionality while keeping review
>    easier than reading the whole series.

I'm still looking forward to seeing progress in this area. Please
let me know what your plan is here.

Thanks,
-Stolee
On Thu, Apr 07 2022, Derrick Stolee wrote:

> On 3/8/2022 12:15 PM, Derrick Stolee wrote:
>> On 3/4/2022 10:12 AM, Derrick Stolee wrote:
>>> On 3/4/2022 9:49 AM, Ævar Arnfjörð Bjarmason wrote:
>>>> Also, as noted in the upthread
>>>> <220224.86czjdb22l.gmgdl@evledraar.gmail.com> it might be useful to chat
>>>> in a more voice/video medium in parallel (maybe mid-next-week) about the
>>>> high-level ideas & to get a feel for our goals, conflicts etc. Doing
>>>> that over very long E-Mail exchanges (and the fault of "long" there is
>>>> mostly on my side:) can be a bit harder...
>>>
>>> I agree. We can work out a time in a private thread and I can send
>>> you a video call invite.
>>
>> Ævar and I just finished our chat and came away with these three
>> action items:
>>
>> 1. Ævar will finish prepping his RFC as-is and send it to the list.
>>    It contains several deeply technical optimizations that are
>>    critical to how his model works, but could also be used to
>>    improve scenarios in the table of contents model.
>
> Ævar: I'm still waiting on the full version of this. While you
> updated [1] your original RFC [2], it was incomplete. I am still
> looking forward to seeing your full vision of how it works with
> incremental fetch and how your optimizations to download only the
> headers of the bundles will work.
>
> [1] [RFC PATCH v2 00/13] bundle-uri: a "dumb CDN" for git
> https://lore.kernel.org/git/RFC-cover-v2-00.13-00000000000-20220311T155841Z-avarab@gmail.com/
>
> [2] [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc.
> https://lore.kernel.org/git/RFC-cover-00.13-0000000000-20210805T150534Z-avarab@gmail.com/
>
>> 2. Ævar will then do a round of taking both series and combining
>>    them in a way that allows the union of possible functionality
>>    to work.
>
> Or perhaps you are jumping straight to this part?

Yeah, that was part of it...

>> 3. As these things come out, I will make it a priority to read the
>>    patches and provide feedback focusing on high-level concepts
>>    and ways we can split the future, non-RFC series into chunks
>>    that provide incremental functionality while keeping review
>>    easier than reading the whole series.
>
> I'm still looking forward to seeing progress in this area. Please
> let me know what your plan is here.

Hi. I'm sorry about the delay, I ran into various life/software things,
and found that this topic required a lot of continuous "sit down for a
day and work on it" attention from me vs. some other topics where I'd
deal with interruption better.

Then I was hoping that the merger of your bundle.c changes would come a
bit earlier before the rc, but they pretty much coincided, and since the
rc dropped I've been hesitant to send a very large topic to the list
(c.f. e.g. [1]).

Maybe I should just bite the bullet and submit it anyway, what do you
think?

1. https://lore.kernel.org/git/nycvar.QRO.7.76.6.2204071407160.347@tvgsbejvaqbjf.bet/
On 4/8/2022 5:15 AM, Ævar Arnfjörð Bjarmason wrote:
>
> On Thu, Apr 07 2022, Derrick Stolee wrote:
>> Ævar: I'm still waiting on the full version of this. While you
>> updated [1] your original RFC [2], it was incomplete. I am still
>> looking forward to seeing your full vision of how it works with
>> incremental fetch and how your optimizations to download only the
>> headers of the bundles will work.
>>
>> [1] [RFC PATCH v2 00/13] bundle-uri: a "dumb CDN" for git
>> https://lore.kernel.org/git/RFC-cover-v2-00.13-00000000000-20220311T155841Z-avarab@gmail.com/
>>
>> [2] [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc.
>> https://lore.kernel.org/git/RFC-cover-00.13-0000000000-20210805T150534Z-avarab@gmail.com/
>>
>>> 2. Ævar will then do a round of taking both series and combining
>>>    them in a way that allows the union of possible functionality
>>>    to work.
>>
>> Or perhaps you are jumping straight to this part?
>
> Yeah, that was part of it...
>
>>> 3. As these things come out, I will make it a priority to read the
>>>    patches and provide feedback focusing on high-level concepts
>>>    and ways we can split the future, non-RFC series into chunks
>>>    that provide incremental functionality while keeping review
>>>    easier than reading the whole series.
>>
>> I'm still looking forward to seeing progress in this area. Please
>> let me know what your plan is here.
>
> Hi. I'm sorry about the delay, I ran into various life/software things,
> and found that this topic required a lot of continuous "sit down for a
> day and work on it" attention from me vs. some other topics where I'd
> deal with interruption better.
>
> Then I was hoping that the merger of your bundle.c changes would come a
> bit earlier before the rc, but they pretty much coincided, and since the
> rc dropped I've been hesitant to send a very large topic to the list
> (c.f. e.g. [1]).
>
> Maybe I should just bite the bullet and submit it anyway, what do you
> think?
>
> 1. https://lore.kernel.org/git/nycvar.QRO.7.76.6.2204071407160.347@tvgsbejvaqbjf.bet/

We could err on the side of not distracting from the list. I tend to
think the release phase can be a good time to talk about an RFC, since
Junio can ignore the thread until it is actually ready for full review.

At minimum, I appreciate knowing your progress. I can be patient for
a few more weeks as long as I have confidence that it will be shared
as soon as external blockers are cleared.

Thanks!
-Stolee
Derrick Stolee <derrickstolee@github.com> writes:

> We could err on the side of not distracting from the list. I tend to
> think the release phase can be a good time to talk about an RFC, since

Yes, I think that it is a good thing to do. I'd need to be careful
not to miss updates and issues critical to the upcoming release but
a large topic that clearly marks itself as RFC would not get in the
way, I would think.

> Junio can ignore the thread until it is actually ready for full review.

It is true for both during -rc freeze and outside ;-)