Message ID | 6250c13897e3cc01f247d80c148cf8dc5e7f3ad0.1604362701.git.jonathantanmy@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | "Push" protocol change proposal: user-specified base | expand |
(+cc: Elijah Newren and Derrick Stolee because of [1]) Hi, Jonathan Tan wrote: > Includes protocol documentation and a design document. Thanks for writing this. > Signed-off-by: Jonathan Tan <jonathantanmy@google.com> > --- > Documentation/technical/pack-protocol.txt | 10 ++-- > Documentation/technical/push-with-base.txt | 61 ++++++++++++++++++++++ > 2 files changed, 68 insertions(+), 3 deletions(-) > create mode 100644 Documentation/technical/push-with-base.txt > > diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt > index e13a2c064d..0485616701 100644 > --- a/Documentation/technical/pack-protocol.txt > +++ b/Documentation/technical/pack-protocol.txt > @@ -59,9 +59,9 @@ Parameters", and are supported by the Git, SSH, and HTTP protocols. > Each Extra Parameter takes the form of `<key>=<value>` or `<key>`. > > Servers that receive any such Extra Parameters MUST ignore all > -unrecognized keys. Currently, the only Extra Parameter recognized is > -"version" with a value of '1' or '2'. See protocol-v2.txt for more > -information on protocol version 2. > +unrecognized keys. Currently, the only Extra Parameters recognized are > +"version" with a value of '1' or '2' and, for push, "base" with an OID. See > +protocol-v2.txt for more information on protocol version 2. It's nice to see an example of extra parameters being useful in protocol v0 as well. :) > > Git Transport > ------------- > @@ -506,6 +506,10 @@ real difference is that the capability listing is different - the only > possible values are 'report-status', 'report-status-v2', 'delete-refs', > 'ofs-delta', 'atomic' and 'push-options'. > > +If a "base=<oid>" Extra Parameter was sent by the client, and the > +server recognizes that object, the server MAY send "<oid> .have" in > +lieu of all the reference obj-ids and names. > + Can this only appear once, or is it permitted to pass multiple oids this way? I'm also curious about what Junio asked about elsewhere in the thread: for cases that benefit from more complex negotiation than the client proposing a particular oid, what comes next in this roadmap? I like that this is an optional feature so we could clean up later by removing support for it; do we expect to? If we do expect to, is there anything we could do to minimize the impact of the feature later (e.g. using a less short-and-sweet key name than `base`, maybe)? > Reference Update Request and Packfile Transfer > ---------------------------------------------- > > diff --git a/Documentation/technical/push-with-base.txt b/Documentation/technical/push-with-base.txt > new file mode 100644 > index 0000000000..d56aa7f900 > --- /dev/null > +++ b/Documentation/technical/push-with-base.txt > @@ -0,0 +1,61 @@ > +Push with base design notes > +=========================== > + > +This feature allows clients, when pushing, to indicate that a > +certain object is an ancestor of all pushed commits and that they > +believe that the server knows of this object. This in turn allows > +servers to send an abbreviated ref advertisement containing only that > +object. > + > +Besides bandwidth savings, this also ensures that the ref > +advertisement contains information relevant to the client. For > +example, at least one project (Gerrit [1]) have included workarounds > +to send ancestors of refs that move often, even though the ref > +advertisement is only meant to contain refs. > + > +[1] https://gerrit.googlesource.com/gerrit/+/refs/heads/master/java/com/google/gerrit/server/git/receive/HackPushNegotiateHook.java > + > + > +Design overview > +--------------- > + > +The "base" being sent is sent as an Extra Parameter, supported in the > +git://, ssh://, and http(s):// protocols. By sending it as an Extra > +Parameter, the server is aware of this parameter before it generates > +the ref advertisement, thus making it able to tailor the ref > +advertisement accordingly. Sending it as an Extra Parameter also makes > +this protocol backwards-compatible, as servers will ignore any Extra > +Parameters they do not understand. (The push will then proceed as if > +neither party had this feature.) > + > +The remote helper protocol has been extended to support the > +"push-base" capability and an option of the same name. When a remote > +helper advertises this capability, it thus indicates that it supports > +this option. Git then will send "option push-base" if the user > +specifies it when invoking "git push". > + > +The remote-curl remote helper bundled with Git has been updated to > +support this capability and option. Since this is an optional remote helper capability, it could be removed later. Good (but the capability and option names would still need to be reserved). > + > + > +Future work > +----------- > + > +In the future, we might want a way to automatically determine the base > +instead of always having the user specify it. However, this does not > +make obsolete any of the current work - once the base is automatically > +determined, we still need this protocol to communicate it to the > +server, and allowing the user to specify the base manually is still > +useful. Makes sense. I think this isn't "might", but "would"; most users that do not know why their push is slow wouldn't know to use this feature. > + > + > +Alternatives > +------------ > + > +- Making a more substantial protocol change like "fetch" protocol v2. > + This would eliminate the need for some of the remote helper updates; > + as part of the protocol change, the protocol could be made to > + support "stateless-connect" and thus no remote helper updates (like > + "push-base") would be needed. For "fetch", the protocol change has > + enabled features like wanted-refs and packfile-uris, but I do not > + have any similar ideas in mind for "push". I think you're saying that we don't need a "push" v2 because v0 already has what a user would want. Git protocol v2 for fetch brought two major changes: - it changed the response for the initial request, allowing abbreviating the ref advertisement at last - it defined a structure for requests and responses, simplifying the addition of later protocol improvements. In particular, because the initial response is a capability advertisement, it allows changing the ref advertisement format more in the future. Both of those changes would be valuable for push. The ref advertisements are large, and matching the structure of commands used by fetchv2 would make debugging easier. There are some specific applications I'm interested in after that (e.g., pushing symrefs), but the fundamental extensibility improvement is larger than any particular application I could think of. That said, I'm not against experimenting with extra parameters before we go there, as a way of getting more information about what a workable negotiation for push looks like. The push ref advertisement serves two roles: it advertises commits, to serve as negotiation, and it tells the client the current values of refs they may want to update, so they can send an appropriate compare-and-swap command for a force-push. If we have a workable replacement negotiation system, then the other role can be replaced with something simpler. For example: - listing ref updates before the pack, to allow the server to bail out early if they would be rejected, and - allowing an explicit force update of a ref instead of requiring the client to compare-and-swap if they don't care about the old value (but taking care to still make the client "fetch first" when appropriate!) Thanks, Jonathan [1] https://lore.kernel.org/git/CABPp-BEswHLhymQ1_07g3qqu=7kFR3eQyAHR0qMgSvi6THy=zQ@mail.gmail.com/
On 11/2/2020 7:26 PM, Jonathan Tan wrote: > Includes protocol documentation and a design document. > Servers that receive any such Extra Parameters MUST ignore all > -unrecognized keys. Currently, the only Extra Parameter recognized is > -"version" with a value of '1' or '2'. See protocol-v2.txt for more > -information on protocol version 2. > +unrecognized keys. Currently, the only Extra Parameters recognized are > +"version" with a value of '1' or '2' and, for push, "base" with an OID. See > +protocol-v2.txt for more information on protocol version 2. > > Git Transport > ------------- > @@ -506,6 +506,10 @@ real difference is that the capability listing is different - the only > possible values are 'report-status', 'report-status-v2', 'delete-refs', > 'ofs-delta', 'atomic' and 'push-options'. > > +If a "base=<oid>" Extra Parameter was sent by the client, and the > +server recognizes that object, the server MAY send "<oid> .have" in > +lieu of all the reference obj-ids and names. > + nit: no comma before "and the server recognizes" as these are both conditions of the "if". My preference is also to include a "then" before the result, especially with compound conditions like this. Perhaps also make it more active voice while we are here? + If the client sent a "base=<oid>" Extra Parameter and the server + recognizes that object, then the server MAY send "<oid> .have" in + lieu of all the reference object ids and names. > +Push with base design notes > +=========================== Perhaps start with a brief problem statement? + If a client wishes to push data to a server without first running + `git fetch`, then they might not have local copies of the objects + at the server's ref tips. The client cannot compute an adequate + common base and will send more objects than necessary. Then, + The "push with base" feature allows... > +This feature allows clients, when pushing, to indicate that a > +certain object is an ancestor of all pushed commits and that they > +believe that the server knows of this object. This in turn allows > +servers to send an abbreviated ref advertisement containing only that > +object. Why only one base? Could there be value in sending multiple possible bases and the server can report the subset that they know about? > +Besides bandwidth savings, It seems you are underselling the bandwidth savings here, because we save in both the ref advertisement and the push pack size. It might be good to call out both metrics so we can see that the ref improvement works even for cases where the client recently fetched and can figure out the proper base on its own! > this also ensures that the ref > +advertisement contains information relevant to the client. For > +example, at least one project (Gerrit [1]) have included workarounds > +to send ancestors of refs that move often, even though the ref > +advertisement is only meant to contain refs. > + > +[1] https://gerrit.googlesource.com/gerrit/+/refs/heads/master/java/com/google/gerrit/server/git/receive/HackPushNegotiateHook.java > + > + > +Design overview > +--------------- > + > +The "base" being sent is sent as an Extra Parameter, supported in the > +git://, ssh://, and http(s):// protocols. By sending it as an Extra > +Parameter, the server is aware of this parameter before it generates > +the ref advertisement, thus making it able to tailor the ref > +advertisement accordingly. Sending it as an Extra Parameter also makes > +this protocol backwards-compatible, as servers will ignore any Extra > +Parameters they do not understand. (The push will then proceed as if > +neither party had this feature.) I'm not familiar enough with the extra parameter logic to know if we allow multi-valued extra parameters. It would be good to relax the "base" parameter to either allow multiple instances of the parameter or to have something like a comma-separated list of OIDs be allowed. This might be particularly important for multiple refs being pushed at the same time. > +The remote helper protocol has been extended to support the > +"push-base" capability and an option of the same name. When a remote > +helper advertises this capability, it thus indicates that it supports > +this option. Git then will send "option push-base" if the user > +specifies it when invoking "git push". > + > +The remote-curl remote helper bundled with Git has been updated to > +support this capability and option.> + > + > +Future work > +----------- > + > +In the future, we might want a way to automatically determine the base > +instead of always having the user specify it. However, this does not > +make obsolete any of the current work - once the base is automatically > +determined, we still need this protocol to communicate it to the > +server, and allowing the user to specify the base manually is still > +useful. It is appropriate to separate the protocol ability from the automatic computation on the client. It would be nice to know how you expect that to work. For the single-ref push case, my default instinct would be to find a merge-base between that ref and all of the remote refs matching the remote we are pushing to. Bonus points if we find _all_ (maximal) merge-bases! The equivalent for the multi-ref push might be to find the boundary commits from something like the following command: git rev-list --boundary \ --not refs/remotes/origin/A \ --not refs/remotes/origin/B \ --not refs/remotes/origin/C \ refs/heads/toPush/1 \ refs/heads/toPush/2 \ refs/heads/toPush/3 But you might have something better in mind. > + > + > +Alternatives > +------------ Odd to have a single bullet in a bulleted list. > +- Making a more substantial protocol change like "fetch" protocol v2. This isn't a full sentence. Perhaps: "One considered approach was to introduce a multi-phase negotiation step similar to how 'git fetch' operates." > + This would eliminate the need for some of the remote helper updates; > + as part of the protocol change, the protocol could be made to > + support "stateless-connect" and thus no remote helper updates (like > + "push-base") would be needed. For "fetch", the protocol change has > + enabled features like wanted-refs and packfile-uris, but I do not > + have any similar ideas in mind for "push". I think that you should mention that a negotiation phase here would use significantly more bandwidth to do the ref advertisements on both ends with only a rare case that the negotiated bases are better than the locally-computed bases. Everything is about tradeoffs here! I'm generally excited about the opportunities here. I'd love to see some measurements for reduced ref advertisements and reduced object counts in the pushed pack. Thanks, -Stolee
On Mon, Nov 02, 2020 at 09:35:54PM -0800, Jonathan Nieder wrote: > I think you're saying that we don't need a "push" v2 because v0 > already has what a user would want. > > Git protocol v2 for fetch brought two major changes: > > - it changed the response for the initial request, allowing > abbreviating the ref advertisement at last > > - it defined a structure for requests and responses, simplifying the > addition of later protocol improvements. In particular, because the > initial response is a capability advertisement, it allows changing > the ref advertisement format more in the future. > > Both of those changes would be valuable for push. The ref > advertisements are large, and matching the structure of commands used > by fetchv2 would make debugging easier. > > There are some specific applications I'm interested in after that > (e.g., pushing symrefs), but the fundamental extensibility improvement > is larger than any particular application I could think of. You pretty much summed up what I was going to respond. :) But I'd go further here... > That said, I'm not against experimenting with extra parameters before > we go there, as a way of getting more information about what a > workable negotiation for push looks like. I'd prefer to avoid doing this as an extra parameter for a few reasons: - once it's in a released version, it's much harder for us to take it away - the extra parameters area is a hack that helped us bootstrap v2. We could probably use the same hack to bootstrap v3, etc. But it has limitations for stuffing in arbitrary data. An obvious one is size. We can transmit a single base, but would be limited if we wanted to be able to send multiple. We already ran into this once with the "symref=foo:bar" capability overflowing pkt-line limits. Here I'm not even sure what the limits might be (it's subject to things like how big an HTTP header a proxy will pass, or how large an environment variable an ssh implementation supports) - it potentially pushes more data/work outside of the git protocol itself. E.g., web servers have to translate Git-Protocol headers into the GIT_PROTOCOL environment for v2. I guess this new field works in our tests because we copy the header's value entirely in our apache.conf. But I wonder how many systems in the wild may only work if it contains "version=2". -Peff
Jeff King <peff@peff.net> writes: > On Mon, Nov 02, 2020 at 09:35:54PM -0800, Jonathan Nieder wrote: > >> I think you're saying that we don't need a "push" v2 because v0 >> already has what a user would want. >> >> Git protocol v2 for fetch brought two major changes: >> >> - it changed the response for the initial request, allowing >> abbreviating the ref advertisement at last >> >> - it defined a structure for requests and responses, simplifying the >> addition of later protocol improvements. In particular, because the >> initial response is a capability advertisement, it allows changing >> the ref advertisement format more in the future. >> >> Both of those changes would be valuable for push. The ref >> advertisements are large, and matching the structure of commands used >> by fetchv2 would make debugging easier. >> >> There are some specific applications I'm interested in after that >> (e.g., pushing symrefs), but the fundamental extensibility improvement >> is larger than any particular application I could think of. > > You pretty much summed up what I was going to respond. :) > > But I'd go further here... > >> That said, I'm not against experimenting with extra parameters before >> we go there, as a way of getting more information about what a >> workable negotiation for push looks like. > > I'd prefer to avoid doing this as an extra parameter for a few reasons: > > - once it's in a released version, it's much harder for us to take it > away > > - the extra parameters area is a hack that helped us bootstrap v2. We > could probably use the same hack to bootstrap v3, etc. But it has > limitations for stuffing in arbitrary data. An obvious one is size. > We can transmit a single base, but would be limited if we wanted to > be able to send multiple. We already ran into this once with the > "symref=foo:bar" capability overflowing pkt-line limits. Here I'm > not even sure what the limits might be (it's subject to things like > how big an HTTP header a proxy will pass, or how large an > environment variable an ssh implementation supports) > > - it potentially pushes more data/work outside of the git protocol > itself. E.g., web servers have to translate Git-Protocol headers > into the GIT_PROTOCOL environment for v2. I guess this new field > works in our tests because we copy the header's value entirely in > our apache.conf. But I wonder how many systems in the wild may only > work if it contains "version=2". I do not have much to add to what has been said so far, other than offering historical perspective. The single biggest reason why "fetch" has common ancestor discovery negotiation and "push" does not is because the design comes from the use case the inventor of Git and those worked on the early protocol wanted to support---you are pushing into your own repository you alone push into, your work is disseminated to others who fetch from your repository, and you get others' work by fetching from theirs. In such a world without a central server where everybody pushes into, by definition, a pusher knows all the objects that have ever been pushed into the receiving repository when running "git push". They are all objects that passed through the repository you are pushing from to the receiving repository in your previous pushes. The advertised ref(s) are expected to be known to the repository you are pusing from anyway, and if that is not the case, you would first fetch from there before force-pushing. Hence, when you push, there isn't much need to walk back from the tip of refs at the remote to discover common ancestor like we do for the fetch side in pre-central-server world. On the other hand, you expect that remote refs point at objects unknown to you when you fetch from your colleagues, so it is expected that you have to perform the common ancestor discovery negotiation. After 15 years, we live in a different world. People expect that a single repository at their hosting sites can be used as the central meeting point for the project, just like CVS/SVN servers were in older world. "git push" would need to accept that reality and start common ancestor discovery eventually.
> People expect that a single repository at their hosting sites can be > used as the central meeting point for the project, just like CVS/SVN > servers were in older world. "git push" would need to accept that > reality and start common ancestor discovery eventually. Thanks for your reply (and everyone else's). I was thinking that a more rudimentary form of the feature would suffice, since I wasn't expecting much more need in the future, but looks like this isn't the case. I'll be thinking of a more comprehensive idea.
On 11/9/20 2:56 PM, Jonathan Tan wrote: >> People expect that a single repository at their hosting sites can be >> used as the central meeting point for the project, just like CVS/SVN >> servers were in older world. "git push" would need to accept that >> reality and start common ancestor discovery eventually. > > Thanks for your reply (and everyone else's). I was thinking that a more > rudimentary form of the feature would suffice, since I wasn't expecting > much more need in the future, but looks like this isn't the case. I'll > be thinking of a more comprehensive idea. I think this "half round negotiation" idea you have has merit, and can get us 95% of the benefit that a multi-round negotiation would bring without those extra steps. My concerns with the current series is that it isn't fully ready for even that case. In my mind, a protocol change like this would need: 1. A top-to-bottom implementation that allows a user to opt-in to this new behavior with a config setting. 2. A demonstration of situations where this algorithm out-performs the existing algorithm (i.e. client is far behind server, but topic is a small change based on something in server's history) 3. A clear way to handle odd cases, such as multiple merge-bases. This leads to a change in how you are sending the data. Perhaps this "send multiple OIDs in a payload" is already half-way to implementing a full negotiation, and we might as well go all the way in the writing. I expect that sending all maximal merge-bases will be sufficient for the vast majority of cases, and so any multi-round negotiation process will almost always end after the client sends that data. Looking forward to your next version. Thanks, -Stolee
Jonathan Tan <jonathantanmy@google.com> writes: >> People expect that a single repository at their hosting sites can be >> used as the central meeting point for the project, just like CVS/SVN >> servers were in older world. "git push" would need to accept that >> reality and start common ancestor discovery eventually. > > Thanks for your reply (and everyone else's). I was thinking that a more > rudimentary form of the feature would suffice, since I wasn't expecting > much more need in the future, but looks like this isn't the case. I'll > be thinking of a more comprehensive idea. I said "eventually", meaning that we may not have to solve it immediately, but judging from the need for ad-hoc workarounds like sending older commits that are not necessarily at the tip of anything from the receiving end as if they are tips and then another ad-hoc workaround like this one, it seems that the need is real. Would the earlier refactoring of the negotiation part into a separate negotiator module help, or did the refactor not remove the deep assumption that it is only about the fetch/upload-pack traffic and we need a design for push/receive-pack from scratch? Thanks.
On Mon, Nov 09, 2020 at 01:20:14PM -0800, Junio C Hamano wrote: > Would the earlier refactoring of the negotiation part into a > separate negotiator module help, or did the refactor not remove the > deep assumption that it is only about the fetch/upload-pack traffic > and we need a design for push/receive-pack from scratch? I haven't thought too deeply about this problem area, but there is one challenge I'd expect with reusing the existing negotiation protocol and code: the stateless side is flipped for http. In the fetch protocol, the client is stateful and makes a series of requests to a stateless server, each time saying "I want X, we agreed previously on Y, and here are some more options Z; let me know if that's enough to generate a pack". If we were to flip that around, the client must remain the stateful party, but it's no longer the receiver of the pack. For the receiver (i.e., the stateless server) to iterate through its history looking for a shard point, I think the message would have to be more like "last time you told me about X, and gave me cut-points Y to resume your traversal; tell me more about Y so that I can decide if we're ready to send a pack". -Peff
> On 11/9/20 2:56 PM, Jonathan Tan wrote: > >> People expect that a single repository at their hosting sites can be > >> used as the central meeting point for the project, just like CVS/SVN > >> servers were in older world. "git push" would need to accept that > >> reality and start common ancestor discovery eventually. > > > > Thanks for your reply (and everyone else's). I was thinking that a more > > rudimentary form of the feature would suffice, since I wasn't expecting > > much more need in the future, but looks like this isn't the case. I'll > > be thinking of a more comprehensive idea. > > I think this "half round negotiation" idea you have has merit, and can > get us 95% of the benefit that a multi-round negotiation would bring > without those extra steps. Thanks. > My concerns with the current series is that it isn't fully ready for > even that case. In my mind, a protocol change like this would need: > > 1. A top-to-bottom implementation that allows a user to opt-in to > this new behavior with a config setting. Makes sense. The client already needs to opt-in by specifying the base, but yes, an option needs to be added for the server. > 2. A demonstration of situations where this algorithm out-performs > the existing algorithm (i.e. client is far behind server, but > topic is a small change based on something in server's history) Noted. > 3. A clear way to handle odd cases, such as multiple merge-bases. > This leads to a change in how you are sending the data. > > Perhaps this "send multiple OIDs in a payload" is already half-way to > implementing a full negotiation, and we might as well go all the way > in the writing. I expect that sending all maximal merge-bases will be > sufficient for the vast majority of cases, and so any multi-round > negotiation process will almost always end after the client sends that > data. I think the ability to send multiple bases would work in this scenario, but this might run into concerns that other reviewers have brought up (specifically, the fact that we're using environment variables and HTTP headers to pass this information - and among other things, line length limits). So a more comprehensive solution might be needed, even if only for this reason. > Looking forward to your next version. Thanks.
> Jonathan Tan <jonathantanmy@google.com> writes: > > >> People expect that a single repository at their hosting sites can be > >> used as the central meeting point for the project, just like CVS/SVN > >> servers were in older world. "git push" would need to accept that > >> reality and start common ancestor discovery eventually. > > > > Thanks for your reply (and everyone else's). I was thinking that a more > > rudimentary form of the feature would suffice, since I wasn't expecting > > much more need in the future, but looks like this isn't the case. I'll > > be thinking of a more comprehensive idea. > > I said "eventually", meaning that we may not have to solve it > immediately, but judging from the need for ad-hoc workarounds like > sending older commits that are not necessarily at the tip of > anything from the receiving end as if they are tips and then another > ad-hoc workaround like this one, it seems that the need is real. That's true. And if we need a more comprehensive solution than putting bases as Extra Parameters (and don't think that base-in-Extra-Parameter will form a part of that comprehensive solution), then I do think that we should go for the comprehensive solution instead of making something obsolete that we will still need to support later. > Would the earlier refactoring of the negotiation part into a > separate negotiator module help, or did the refactor not remove the > deep assumption that it is only about the fetch/upload-pack traffic > and we need a design for push/receive-pack from scratch? The negotiator module might help - looking that the API, it takes a list of known commons (none in this case, because we want to skip the long ref advertisement) and a list of tips to check (for this case, the tips we want to push), and then the negotiation starts. So we might be able to use it in a push v2. Having said that, it might be possible to reuse the existing fetch v0/v2 protocol to perform this negotiation (preferably v2, so that we can skip the ref advertisement). We'll just need to add the opposite of the "no-done" capability to make sure that the server never sends a packfile. Once we have found the base commit(s) either through push v2 or through fetch+opposite-of-no-done, we'll need to send the packfile somehow. My proposal here could do it, although then we might run into the problems Peff describes about Extra Parameters [1]. If we don't use Extra Parameters, we would probably need a push v2, but we might then run into similar problems to what we had during the fetch v2 migration (e.g. unexpected subtly different behaviors). [1] https://lore.kernel.org/git/20201103151859.GA444466@coredump.intra.peff.net/
diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt index e13a2c064d..0485616701 100644 --- a/Documentation/technical/pack-protocol.txt +++ b/Documentation/technical/pack-protocol.txt @@ -59,9 +59,9 @@ Parameters", and are supported by the Git, SSH, and HTTP protocols. Each Extra Parameter takes the form of `<key>=<value>` or `<key>`. Servers that receive any such Extra Parameters MUST ignore all -unrecognized keys. Currently, the only Extra Parameter recognized is -"version" with a value of '1' or '2'. See protocol-v2.txt for more -information on protocol version 2. +unrecognized keys. Currently, the only Extra Parameters recognized are +"version" with a value of '1' or '2' and, for push, "base" with an OID. See +protocol-v2.txt for more information on protocol version 2. Git Transport ------------- @@ -506,6 +506,10 @@ real difference is that the capability listing is different - the only possible values are 'report-status', 'report-status-v2', 'delete-refs', 'ofs-delta', 'atomic' and 'push-options'. +If a "base=<oid>" Extra Parameter was sent by the client, and the +server recognizes that object, the server MAY send "<oid> .have" in +lieu of all the reference obj-ids and names. + Reference Update Request and Packfile Transfer ---------------------------------------------- diff --git a/Documentation/technical/push-with-base.txt b/Documentation/technical/push-with-base.txt new file mode 100644 index 0000000000..d56aa7f900 --- /dev/null +++ b/Documentation/technical/push-with-base.txt @@ -0,0 +1,61 @@ +Push with base design notes +=========================== + +This feature allows clients, when pushing, to indicate that a +certain object is an ancestor of all pushed commits and that they +believe that the server knows of this object. This in turn allows +servers to send an abbreviated ref advertisement containing only that +object. + +Besides bandwidth savings, this also ensures that the ref +advertisement contains information relevant to the client. For +example, at least one project (Gerrit [1]) have included workarounds +to send ancestors of refs that move often, even though the ref +advertisement is only meant to contain refs. + +[1] https://gerrit.googlesource.com/gerrit/+/refs/heads/master/java/com/google/gerrit/server/git/receive/HackPushNegotiateHook.java + + +Design overview +--------------- + +The "base" being sent is sent as an Extra Parameter, supported in the +git://, ssh://, and http(s):// protocols. By sending it as an Extra +Parameter, the server is aware of this parameter before it generates +the ref advertisement, thus making it able to tailor the ref +advertisement accordingly. Sending it as an Extra Parameter also makes +this protocol backwards-compatible, as servers will ignore any Extra +Parameters they do not understand. (The push will then proceed as if +neither party had this feature.) + +The remote helper protocol has been extended to support the +"push-base" capability and an option of the same name. When a remote +helper advertises this capability, it thus indicates that it supports +this option. Git then will send "option push-base" if the user +specifies it when invoking "git push". + +The remote-curl remote helper bundled with Git has been updated to +support this capability and option. + + +Future work +----------- + +In the future, we might want a way to automatically determine the base +instead of always having the user specify it. However, this does not +make obsolete any of the current work - once the base is automatically +determined, we still need this protocol to communicate it to the +server, and allowing the user to specify the base manually is still +useful. + + +Alternatives +------------ + +- Making a more substantial protocol change like "fetch" protocol v2. + This would eliminate the need for some of the remote helper updates; + as part of the protocol change, the protocol could be made to + support "stateless-connect" and thus no remote helper updates (like + "push-base") would be needed. For "fetch", the protocol change has + enabled features like wanted-refs and packfile-uris, but I do not + have any similar ideas in mind for "push".
Includes protocol documentation and a design document. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- Documentation/technical/pack-protocol.txt | 10 ++-- Documentation/technical/push-with-base.txt | 61 ++++++++++++++++++++++ 2 files changed, 68 insertions(+), 3 deletions(-) create mode 100644 Documentation/technical/push-with-base.txt