Message ID | RFC-patch-13.13-1e657ed27a-20210805T150534Z-avarab@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | Add bundle-uri: resumably clones, static "dumb" CDN etc. |
On 2021-08-05 at 15:07:29, Ævar Arnfjörð Bjarmason wrote:
> Add a design doc for the bundle-uri protocol extension to go along
> with the packfile-uri extension added in cd8402e0fd8 (Documentation:
> add Packfile URIs design doc, 2020-06-10).
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  Documentation/technical/bundle-uri.txt  | 119 ++++++++++++++++++++++++
>  Documentation/technical/protocol-v2.txt |   5 +
>  2 files changed, 124 insertions(+)
>  create mode 100644 Documentation/technical/bundle-uri.txt
>
> diff --git a/Documentation/technical/bundle-uri.txt b/Documentation/technical/bundle-uri.txt
> new file mode 100644
> index 0000000000..5ae9a15eaf
> --- /dev/null
> +++ b/Documentation/technical/bundle-uri.txt
> @@ -0,0 +1,119 @@
> +Bundle URI Design Notes
> +=======================
> +
> +Protocol
> +--------
> +
> +See `bundle-uri` in the link:protocol-v2.html[protocol-v2]
> +documentation for a discussion of the bundle-uri command, and the
> +expectations of clients and servers.
> +
> +This document is a more general discussion of how the `bundle-uri`
> +command fits in with the rest of the git ecosystem, its design goals
> +and non-goals, comparison to alternatives etc.
> +
> +Comparison with Packfile URIs
> +-----------------------------
> +
> +There is a similar "Packfile URIs" facility, see the
> +link:packfile-uri.html[packfile-uri] documentation for details.
> +
> +The Packfile URIs facility requires a much closer cooperation between
> +CDN and server than the bundle URI facility.
> +
> +I.e. the server MUST know what objects exist in the packfile URI it's
> +pointing to, as well as its pack checksum. Failure to do so will not
> +only result in a client error (the packfile hash won't match), but
> +even if it got past that would likely result in a corrupt repository
> +with tips pointing to unreachable objects.
> +
> +By comparison the bundle URIs are meant to be a "dumb" solution
> +friendly to e.g. having a weekly cronjob take a snapshot of a git
> +repository, that snapshot being uploaded to a network of FTP mirrors
> +(which may be inconsistent or out of date).
> +
> +The server does not need to know what state the side-channel download
> +is at, because the client will first validate it, and then optionally
> +negotiate with the server using what it discovers there.
> +
> +Using the local `transfer.injectBundleURI` configuration variable (see
> +linkgit:git-config[1]) the `bundle-uri` mechanism doesn't even need
> +the server to support it.

One thing I'm not seeing with this doc that I brought up during the
packfile URI discussion is that HTTPS is broken for a decent number of
Git users, and for them SSH is the only viable option. This is true for
users of certain antivirus programs on Windows, as well as people who
have certain corporate proxies in their workplace. For those people, as
soon as the server offers a bundle URI, their connection will stop
working.

I know that you're probably thinking, "Gee, how often does that happen?"
but judging by the number of people on StackOverflow, this is actually
very common. The antivirus programs that break Git are actually not
uncommon and they are widely deployed on corporate machines, plus the
fact that lots of companies sell TLS intercepting proxies, which are
almost always broken in this way. Many of these users don't even know
what's going on, so they simply lack the knowledge to take any action or
ask their network administrator for a fix. For them, HTTPS just doesn't
work with Git, while it does for a web browser.

So we will probably want to make this behavior opt-in with a config
option for SSH, or just not available for SSH at all, so that we don't
magically break users on upgrade who are relying on the SSH protocol not
using HTTPS under the hood[0], especially the users who won't even know
what's wrong.

[0] I can't tell you how many times users have complained about the Git
LFS SSH protocol also using HTTPS implicitly.
On Tue, Aug 24 2021, brian m. carlson wrote:

> On 2021-08-05 at 15:07:29, Ævar Arnfjörð Bjarmason wrote:
>> Add a design doc for the bundle-uri protocol extension to go along
>> with the packfile-uri extension added in cd8402e0fd8 (Documentation:
>> add Packfile URIs design doc, 2020-06-10).
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  Documentation/technical/bundle-uri.txt  | 119 ++++++++++++++++++++++++
>>  Documentation/technical/protocol-v2.txt |   5 +
>>  2 files changed, 124 insertions(+)
>>  create mode 100644 Documentation/technical/bundle-uri.txt
>>
>> diff --git a/Documentation/technical/bundle-uri.txt b/Documentation/technical/bundle-uri.txt
>> new file mode 100644
>> index 0000000000..5ae9a15eaf
>> --- /dev/null
>> +++ b/Documentation/technical/bundle-uri.txt
>> @@ -0,0 +1,119 @@
>> +Bundle URI Design Notes
>> +=======================
>> +
>> +Protocol
>> +--------
>> +
>> +See `bundle-uri` in the link:protocol-v2.html[protocol-v2]
>> +documentation for a discussion of the bundle-uri command, and the
>> +expectations of clients and servers.
>> +
>> +This document is a more general discussion of how the `bundle-uri`
>> +command fits in with the rest of the git ecosystem, its design goals
>> +and non-goals, comparison to alternatives etc.
>> +
>> +Comparison with Packfile URIs
>> +-----------------------------
>> +
>> +There is a similar "Packfile URIs" facility, see the
>> +link:packfile-uri.html[packfile-uri] documentation for details.
>> +
>> +The Packfile URIs facility requires a much closer cooperation between
>> +CDN and server than the bundle URI facility.
>> +
>> +I.e. the server MUST know what objects exist in the packfile URI it's
>> +pointing to, as well as its pack checksum. Failure to do so will not
>> +only result in a client error (the packfile hash won't match), but
>> +even if it got past that would likely result in a corrupt repository
>> +with tips pointing to unreachable objects.
>> +
>> +By comparison the bundle URIs are meant to be a "dumb" solution
>> +friendly to e.g. having a weekly cronjob take a snapshot of a git
>> +repository, that snapshot being uploaded to a network of FTP mirrors
>> +(which may be inconsistent or out of date).
>> +
>> +The server does not need to know what state the side-channel download
>> +is at, because the client will first validate it, and then optionally
>> +negotiate with the server using what it discovers there.
>> +
>> +Using the local `transfer.injectBundleURI` configuration variable (see
>> +linkgit:git-config[1]) the `bundle-uri` mechanism doesn't even need
>> +the server to support it.
>
> One thing I'm not seeing with this doc that I brought up during the
> packfile URI discussion is that HTTPS is broken for a decent number of
> Git users, and for them SSH is the only viable option. This is true for
> users of certain antivirus programs on Windows, as well as people who
> have certain corporate proxies in their workplace. For those people, as
> soon as the server offers a bundle URI, their connection will stop
> working.
>
> I know that you're probably thinking, "Gee, how often does that happen?"
> but judging by the number of people on StackOverflow, this is actually
> very common. The antivirus programs that break Git are actually not
> uncommon and they are widely deployed on corporate machines, plus the
> fact that lots of companies sell TLS intercepting proxies, which are
> almost always broken in this way. Many of these users don't even know
> what's going on, so they simply lack the knowledge to take any action or
> ask their network administrator for a fix. For them, HTTPS just doesn't
> work with Git, while it does for a web browser.
>
> So we will probably want to make this behavior opt-in with a config
> option for SSH, or just not available for SSH at all, so that we don't
> magically break users on upgrade who are relying on the SSH protocol not
> using HTTPS under the hood[0], especially the users who won't even know
> what's wrong.

Good point. I think this sort of thing will be a non-issue with
bundle-uri, because in general it handles any sort of network /
fetching / validation failure gracefully.

I.e. with these patches you can point at a bad URI, a broken non-bundle
etc. We'll just move on to a full clone.

Whereas with packfile-uri the inline PACK and the URI are things you
MUST both get, as the provided packfile-uri completes the incomplete
inline PACK. So once you say that you're willing to accept things over
https, you MUST be able to get that thing.

We'll still waste a bit of time trying with bundle-uri, though. But I
think for the common case of bundle-uri helping more than not (which,
presumably, the server operator has tested), it's a better default to
try https:// even if the main dialog is over ssh://.
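The fallback behavior described in this reply can be sketched roughly as follows. This is only an illustration of the idea, not the code from the patch series: `clone_with_bundle_hints`, `fetch_bundle`, `negotiate` and the demo helpers are hypothetical names, and the real client is written in C. The point it shows is that a bundle download failure is downgraded to a warning, so the regular negotiation with the server always runs.

```python
from typing import Callable, Iterable, List


def clone_with_bundle_hints(
    bundle_uris: Iterable[str],
    fetch_bundle: Callable[[str], List[str]],
    negotiate: Callable[[List[str]], str],
) -> str:
    """Try every advertised bundle URI; a failure never aborts the clone.

    fetch_bundle (hypothetical) downloads and validates one bundle and
    returns the tips it provided, raising on any network or validation
    error. negotiate (hypothetical) runs the normal fetch with whatever
    tips were collected; an empty list means a plain full clone.
    """
    known_tips: List[str] = []
    for uri in bundle_uris:
        try:
            known_tips.extend(fetch_bundle(uri))
        except Exception as err:
            # Bad URI, broken TLS, non-bundle payload: warn and move on.
            print(f"warning: skipping bundle {uri}: {err}")
    return negotiate(known_tips)


def demo_fetch(uri: str) -> List[str]:
    """Stand-in fetcher: pretend HTTPS is broken on this machine."""
    if uri.startswith("https://"):
        raise OSError("TLS handshake failed")
    return ["tip-from-" + uri]


def demo_negotiate(tips: List[str]) -> str:
    """Stand-in negotiation: report which kind of fetch would happen."""
    if not tips:
        return "full clone"
    return f"incremental fetch on top of {len(tips)} tip(s)"
```

With the stand-in helpers, `clone_with_bundle_hints(["https://cdn.example/repo.bdl"], demo_fetch, demo_negotiate)` prints a warning and falls back to the full clone, which is the situation a user behind a TLS-intercepting proxy would be in. The cost is the time spent on the failed attempt, as noted above.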
diff --git a/Documentation/technical/bundle-uri.txt b/Documentation/technical/bundle-uri.txt
new file mode 100644
index 0000000000..5ae9a15eaf
--- /dev/null
+++ b/Documentation/technical/bundle-uri.txt
@@ -0,0 +1,119 @@
+Bundle URI Design Notes
+=======================
+
+Protocol
+--------
+
+See `bundle-uri` in the link:protocol-v2.html[protocol-v2]
+documentation for a discussion of the bundle-uri command, and the
+expectations of clients and servers.
+
+This document is a more general discussion of how the `bundle-uri`
+command fits in with the rest of the git ecosystem, its design goals
+and non-goals, comparison to alternatives etc.
+
+Comparison with Packfile URIs
+-----------------------------
+
+There is a similar "Packfile URIs" facility, see the
+link:packfile-uri.html[packfile-uri] documentation for details.
+
+The Packfile URIs facility requires a much closer cooperation between
+CDN and server than the bundle URI facility.
+
+I.e. the server MUST know what objects exist in the packfile URI it's
+pointing to, as well as its pack checksum. Failure to do so will not
+only result in a client error (the packfile hash won't match), but
+even if it got past that would likely result in a corrupt repository
+with tips pointing to unreachable objects.
+
+By comparison the bundle URIs are meant to be a "dumb" solution
+friendly to e.g. having a weekly cronjob take a snapshot of a git
+repository, that snapshot being uploaded to a network of FTP mirrors
+(which may be inconsistent or out of date).
+
+The server does not need to know what state the side-channel download
+is at, because the client will first validate it, and then optionally
+negotiate with the server using what it discovers there.
+
+Using the local `transfer.injectBundleURI` configuration variable (see
+linkgit:git-config[1]) the `bundle-uri` mechanism doesn't even need
+the server to support it.
+
+Security
+--------
+
+The omission of something equivalent to the packfile <OID> in the
+Packfile URIs protocol is intentional, as having it would require
+closer server and CDN cooperation than some server operators are
+comfortable with.
+
+Furthermore, it is not needed for security. The server doesn't need to
+trust its CDN. If the server were to attempt to send harmful content
+to the client, the result would not validate against the server's
+provided ref tips gotten from ls-refs.
+
+The lack of such a hash does leave room open to a malicious CDN
+operator to be annoying however. E.g. they could inject irrelevant
+objects into the bundles, which would enlarge the downloaded
+repository until a "gc" would eventually throw them away.
+
+In practice the lack of a hash is considered to be a non-issue. Anyone
+concerned about such security problems between their server and their
+CDN is going to be pointing to a "https" URL under their control. For
+a client the "threat" is the same as without bundle-uri, i.e. a server
+is free to be annoying today and send you garbage in the PACK that you
+won't need.
+
+Security issues peculiar to bundle-uri
+--------------------------------------
+
+Both packfile-uri and bundle-uri use the `fetch.uriProtocols`
+configuration variable (see linkgit:git-config[1]) to configure which
+protocols they support.
+
+By default this is set to "http,https" for both, but bundle-uri
+supports adding "file" to that list. The server can thus point to
+"file://" URIs it expects the client to have access to.
+
+This is primarily intended for use with the `transfer.injectBundleURI`
+mechanism, but can also be useful e.g. in a centralized environment
+where a server knows a "file:///mnt/bundles/big-repo.bdl" to be
+mounted on the local machine (e.g. a racked server), and so
+points to it in its "bundle-uri" response.
+
+The client can then add "file" to the `fetch.uriProtocols` list to
+obey such responses. That does mean that a malicious server can point
+to any arbitrary file on the local machine. The threat of this is
+considered minimal, since anyone adding `file` to `fetch.uriProtocols`
+likely knows what they're doing and controls both ends, and the worst
+they can do is make a curl(1) pipe garbage into "index-pack" (which
+will likely promptly die on the non-PACK-file).
+
+Security comparison with packfile-uri
+-------------------------------------
+
+The initial implementation of packfile-uri needed special adjusting to
+run "git fsck" on incoming .gitmodules files, this was to deal with a
+general security issue in git, see CVE-2018-17456.
+
+The current packfile-uri mechanism requires special handling around
+"fsck" to do such cross-PACK fsck's, this is because it first indexes
+the "incremental" PACK, and then any PACK(s) provided via
+packfile-uri, before finally doing a full connectivity check.
+
+This is in effect doing the fsck one might do via "clone" and "fetch" in
+reverse, or the equivalent of starting with the incremental "fetch",
+followed by the "clone".
+
+Since the packfile-uri mechanism can result in the .gitmodules blob
+referenced by such a "fetch" to be in the pack for the "clone" the
+fetch-pack process needs to keep state between the indexing of
+multiple packs, to remember to fsck the blob (via the "clone") later
+after seeing it in a tree (from the "fetch").
+
+There are no known security issues with the way packfile-uri does
+this, but since bundle-uri effectively emulates what a client which doesn't
+support either "bundle-uri" or "packfile-uri" would do on clone/fetch,
+any future security issues peculiar to the packfile-uri approach are
+unlikely to be shared by it.
diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
index d10d5e9ef6..5536ea4b7e 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -696,3 +696,8 @@ intended to support future features such as:
 they've got that OID already (for multi-tips the client would need
 to fetch the bundle, or do e.g. HTTP range requests to get its
 header).
+
+bundle-uri SEE ALSO
+^^^^^^^^^^^^^^^^^^^
+
+See the link:bundle-uri.html[Bundle URI Design Notes] for more.
Add a design doc for the bundle-uri protocol extension to go along
with the packfile-uri extension added in cd8402e0fd8 (Documentation:
add Packfile URIs design doc, 2020-06-10).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/technical/bundle-uri.txt  | 119 ++++++++++++++++++++++++
 Documentation/technical/protocol-v2.txt |   5 +
 2 files changed, 124 insertions(+)
 create mode 100644 Documentation/technical/bundle-uri.txt
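
The `fetch.uriProtocols` check that the design doc describes under "Security issues peculiar to bundle-uri" amounts to a scheme allow-list. A minimal sketch of that idea, assuming the comma-separated value format and the "http,https" default stated in the doc; `uri_allowed` is a hypothetical helper for illustration, not a function in git:

```python
from urllib.parse import urlsplit

# Default taken from the design doc text; treated here as an assumption.
DEFAULT_URI_PROTOCOLS = "http,https"


def uri_allowed(uri: str, uri_protocols: str = DEFAULT_URI_PROTOCOLS) -> bool:
    """Return True if the URI's scheme is in the comma-separated allow-list."""
    allowed = {p.strip().lower() for p in uri_protocols.split(",") if p.strip()}
    scheme = urlsplit(uri).scheme.lower()
    # A URI without a scheme (e.g. a bare path) is never accepted.
    return bool(scheme) and scheme in allowed
```

Under these assumptions, a server-advertised `file:///mnt/bundles/big-repo.bdl` is rejected with the default list and accepted only once the client has opted in by adding "file", which matches the doc's point that the user who enables it is expected to control both ends.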