Message ID | pull.1897.git.git.1740489585344.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | bundle-uri: copy all bundle references ino the refs/bundle space | expand |
"Scott Chacon via GitGitGadget" <gitgitgadget@gmail.com> writes: > Furthermore, when I bundled just a tag (thinking it would have most > reachable objects) it completely failed to work because there were no > refs/heads/ available for negotiation - so it downloaded a huge file and > then still started from scratch on the fetch. Nice finding. > Now I'm only getting an extra 43k objects, so 1% of the original total, > and the entire operation is a bit faster as well. That makes all sense. > I'm not sure if there is a downside here, it seems clearly how you would > want the negotiation to go. It ends up with way more refs under > refs/bundle (now there is refs/bundle/origin/master, etc) but that's > being polluted by the head refs anyhow, right? I am not sure what you mean by "being polluted by the head refs anyhow", but we should be equipped to deal with a repository with tons of local branches, so having the comparable number of remote-tracking branches instead of local branches now exposed in refs/bundle/remotes/* hierarchy should work equally as well, or we would have something we need to fix. So in principle I do not see a problem with this approach. The mapping used to be "refs/heads/foo" to "refs/bundles/foo", but now "refs/heads/foo" is mapped to "refs/bundles/heads/foo", so that you would presumably be mapping "refs/remotes/origin/master" to "refs/bundles/remotes/origin/master", right? I hope existing users are *not* looking at their resulting refs/bundles/ hierarchy and somehow assuming the original mapping. This is not something this "fix" changes, but unbundle_all_bundles() apparently is prepared to handle more than one bundles. I wonder what happens when multiple bundles expose the same branch pointing at different objects? The way I read unbundle_from_file() is that a new one overwrites the previous value, so even though we may have all unbundled many bundles, the objects from earlier bundles may lose their anchors, subject to be garbage-collected. Imagine creating a bundle with two refs, refs/heads and refs/remotes, and append that bundle as the last bundle of a bunch of bundles full of local and remote-tracking branches, that have populated refs/bundles/ hierarchy with tons of refs. Now the last bundle is unbundled, and these two phoney refs would nuke everything that used to be under refs/bundles/heads/* and refs/bundles/remotes/* left by unpacking previous bundles, right? > Is this a reasonable change? This is mostly Stolee's design, IIRC, so I have CC'ed; the work is mostly from 53a50892 (bundle-uri: create basic file-copy logic, 2022-08-09) that is more than 2 years ago. > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1897%2Fschacon%2Fsc-more-bundle-refs-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1897/schacon/sc-more-bundle-refs-v1 > Pull-Request: https://github.com/git/git/pull/1897 > > bundle-uri.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/bundle-uri.c b/bundle-uri.c > index 744257c49c1..3371d56f4ce 100644 > --- a/bundle-uri.c > +++ b/bundle-uri.c > @@ -403,7 +403,7 @@ static int unbundle_from_file(struct repository *r, const char *file) > const char *branch_name; > int has_old; > > - if (!skip_prefix(refname->string, "refs/heads/", &branch_name)) > + if (!skip_prefix(refname->string, "refs/", &branch_name)) > continue; By skipping only "refs/", "branch_name" is no longer a branch name, which may be something we may want to fix, but if we were to address the "names from later bundles seem to overwrite names used by previous boundles, and unanchor objects obtained from them" problem, I suspect we'd rather want to create new and unique names there, without assuming or relying on that the names used by these bundles are reasonably unique, so this part of the code may have to change anyway, so we may not care too deeply at this point.
On 2/25/25 1:14 PM, Junio C Hamano wrote: > "Scott Chacon via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> Furthermore, when I bundled just a tag (thinking it would have most >> reachable objects) it completely failed to work because there were no >> refs/heads/ available for negotiation - so it downloaded a huge file and >> then still started from scratch on the fetch. > > Nice finding. This is an interesting case that I had not considered in the original implementation. The intention of the design is to avoid having the bundle URI fetch changing tag refs, especially annotated tags. Those tag updates are expected to be advertised in the "git fetch" output. It would probably be best to peel the tag refs to a commit and then create a fake branch for the bundle. >> Now I'm only getting an extra 43k objects, so 1% of the original total, >> and the entire operation is a bit faster as well. > > That makes all sense. > >> I'm not sure if there is a downside here, it seems clearly how you would >> want the negotiation to go. It ends up with way more refs under >> refs/bundle (now there is refs/bundle/origin/master, etc) but that's >> being polluted by the head refs anyhow, right? > > I am not sure what you mean by "being polluted by the head refs > anyhow", but we should be equipped to deal with a repository with > tons of local branches, so having the comparable number of > remote-tracking branches instead of local branches now exposed in > refs/bundle/remotes/* hierarchy should work equally as well, or we > would have something we need to fix. So in principle I do not see > a problem with this approach. > > The mapping used to be "refs/heads/foo" to "refs/bundles/foo", but > now "refs/heads/foo" is mapped to "refs/bundles/heads/foo", so that > you would presumably be mapping "refs/remotes/origin/master" to > "refs/bundles/remotes/origin/master", right? I hope existing users > are *not* looking at their resulting refs/bundles/ hierarchy and > somehow assuming the original mapping. > > This is not something this "fix" changes, but unbundle_all_bundles() > apparently is prepared to handle more than one bundles. I wonder > what happens when multiple bundles expose the same branch pointing > at different objects? The way I read unbundle_from_file() is that > a new one overwrites the previous value, so even though we may have > all unbundled many bundles, the objects from earlier bundles may > lose their anchors, subject to be garbage-collected. > > Imagine creating a bundle with two refs, refs/heads and > refs/remotes, and append that bundle as the last bundle of a bunch > of bundles full of local and remote-tracking branches, that have > populated refs/bundles/ hierarchy with tons of refs. Now the last > bundle is unbundled, and these two phoney refs would nuke everything > that used to be under refs/bundles/heads/* and refs/bundles/remotes/* > left by unpacking previous bundles, right? > >> Is this a reasonable change? > > This is mostly Stolee's design, IIRC, so I have CC'ed; the work is > mostly from 53a50892 (bundle-uri: create basic file-copy logic, > 2022-08-09) that is more than 2 years ago. The biggest question I had (and tried to get ahead of on the PR) is the use of a test to demonstrate what kind of bundle files cause this issue. It would be important to demosntrate that the repo is still usable if "refs/bundles/tags/v1.0" exists and points to a tag object. >> - if (!skip_prefix(refname->string, "refs/heads/", &branch_name)) >> + if (!skip_prefix(refname->string, "refs/", &branch_name)) >> continue; So I'm OK with relaxing this to be more flexible, but I'm not sure why the bundles couldn't be created using "refs/heads/", possibly via changing the ref names during bundle creation. > By skipping only "refs/", "branch_name" is no longer a branch name, > which may be something we may want to fix, but if we were to address > the "names from later bundles seem to overwrite names used by > previous boundles, and unanchor objects obtained from them" problem, > I suspect we'd rather want to create new and unique names there, > without assuming or relying on that the names used by these bundles > are reasonably unique, so this part of the code may have to change > anyway, so we may not care too deeply at this point. Thanks, -Stolee
Hey, On Wed, Feb 26, 2025 at 12:36 AM Derrick Stolee <stolee@gmail.com> wrote: > > The intention of the design is to avoid having the bundle URI fetch > changing tag refs, especially annotated tags. Those tag updates are > expected to be advertised in the "git fetch" output. It would probably > be best to peel the tag refs to a commit and then create a fake branch > for the bundle. The issue for this and also for the other suggestion you have later on is that I'm not sure how this can be easily done with the bundle command. It seems like everyone would have to write some sort of script to create a special type of bundle so that all these objects are referenced in a way that makes the bundle-uri helper actually get most of the objects that are needed. Is there some option to rev-list that does this? Or are you saying it's better to write a script? > The biggest question I had (and tried to get ahead of on the PR) is > the use of a test to demonstrate what kind of bundle files cause this > issue. It would be important to demosntrate that the repo is still > usable if "refs/bundles/tags/v1.0" exists and points to a tag object. I have written a test and I'll submit the new series in a minute, but I'm not sure what you mean by 'usable' in this context. Is there a situation where Git gets mad if there are annotated tags that aren't under refs/tags? I have done these test clones and nothing bad seems to happen having them in refs/bundle/tags/v1.0 that I notice, but I don't know how to write a test that specifically verifies that. > So I'm OK with relaxing this to be more flexible, but I'm not sure > why the bundles couldn't be created using "refs/heads/", possibly via > changing the ref names during bundle creation. So same point here. I think the bundle-uri functionality isn't particularly effective if the creation of the bundle needs special scripts to create in a way that is expected. One other approach would be to add an option to `git bundle` that does this sanitization (unpeeling things into fake branch heads), like some `--bundle-for-uri`, but I feel like just using `--all` and having the clone handle it in the way I proposed might be much simpler and more usable. We could also immediately delete everything under `refs/bundle/tags` after the fetch if we don't like them there, but still having them be available for the fetch negotiation. I'll send a new series with the existing tests updated to look for `refs/bundle/heads/*` instead of `refs/bundle/*` and adding a very simple test to see that the tags were unpacked as the next step. Scott
Scott Chacon <schacon@gmail.com> writes: > Hey, > > On Wed, Feb 26, 2025 at 12:36 AM Derrick Stolee <stolee@gmail.com> wrote: >> >> The intention of the design is to avoid having the bundle URI fetch >> changing tag refs, especially annotated tags. Those tag updates are >> expected to be advertised in the "git fetch" output. It would probably >> be best to peel the tag refs to a commit and then create a fake branch >> for the bundle. I am not sure where that need to avoid including tags comes from. >> The biggest question I had (and tried to get ahead of on the PR) is >> the use of a test to demonstrate what kind of bundle files cause this >> issue. It would be important to demosntrate that the repo is still >> usable if "refs/bundles/tags/v1.0" exists and points to a tag object. > > I have written a test and I'll submit the new series in a minute, but > I'm not sure what you mean by 'usable' in this context. Is there a > situation where Git gets mad if there are annotated tags that aren't > under refs/tags? I do not know of any at least for a local consumption of these tags. > I have done these test clones and nothing bad seems to happen having > them in refs/bundle/tags/v1.0 that I notice, but I don't know how to > write a test that specifically verifies that. Can it be some brittleness Derrick is worried about auto-following of tags during future "git fetch"? You store a tag that a regular fetch may want to store at refs/tags/v1.0 in refs/bundles/tags/v1.0 taken from the bundle, and then a later fetch may advance the history based on you extracted from the bundle---without having to run an explicit "git fetch --tags" or "git fetch origin v1.0", would we ever obtain "refs/tags/v1.0" with only the usual auto-following when we have the same tag elsewhere? In any case, instead of me speculating, I'd prefer to hear from Derrick, who is a lot more familiar with the mechanism under discussion, what the issues are that we want to limit ourselves to local branches. Thanks.
On 3/3/25 12:12 PM, Junio C Hamano wrote: > Scott Chacon <schacon@gmail.com> writes: > >> Hey, >> >> On Wed, Feb 26, 2025 at 12:36 AM Derrick Stolee <stolee@gmail.com> wrote: >>> >>> The intention of the design is to avoid having the bundle URI fetch >>> changing tag refs, especially annotated tags. Those tag updates are >>> expected to be advertised in the "git fetch" output. It would probably >>> be best to peel the tag refs to a commit and then create a fake branch >>> for the bundle. > > I am not sure where that need to avoid including tags comes from. > >>> The biggest question I had (and tried to get ahead of on the PR) is >>> the use of a test to demonstrate what kind of bundle files cause this >>> issue. It would be important to demosntrate that the repo is still >>> usable if "refs/bundles/tags/v1.0" exists and points to a tag object. >> >> I have written a test and I'll submit the new series in a minute, but >> I'm not sure what you mean by 'usable' in this context. Is there a >> situation where Git gets mad if there are annotated tags that aren't >> under refs/tags? > > I do not know of any at least for a local consumption of these tags. These ideas about avoiding annotated tags outside of refs/tags/ is likely an invention of my own, and must not be a firm expectation of the Git tool. >> I have done these test clones and nothing bad seems to happen having >> them in refs/bundle/tags/v1.0 that I notice, but I don't know how to >> write a test that specifically verifies that. > > Can it be some brittleness Derrick is worried about auto-following > of tags during future "git fetch"? You store a tag that a regular > fetch may want to store at refs/tags/v1.0 in refs/bundles/tags/v1.0 > taken from the bundle, and then a later fetch may advance the > history based on you extracted from the bundle---without having to > run an explicit "git fetch --tags" or "git fetch origin v1.0", would > we ever obtain "refs/tags/v1.0" with only the usual auto-following > when we have the same tag elsewhere? > > In any case, instead of me speculating, I'd prefer to hear from > Derrick, who is a lot more familiar with the mechanism under > discussion, what the issues are that we want to limit ourselves to > local branches. My thinking here is similar to the prefetch maintenance task: we want users who run "git fetch [origin]" to see the refs that are being updated by that action in the normal foreground messages. This includes new tag messages, such as in my local copy of the Git repository: $ git fetch origin remote: Enumerating objects: 186, done. remote: Counting objects: 100% (168/168), done. remote: Compressing objects: 100% (51/51), done. remote: Total 186 (delta 120), reused 162 (delta 117), pack-reused 18 (from 2) Receiving objects: 100% (186/186), 118.04 KiB | 10.73 MiB/s, done. Resolving deltas: 100% (122/122), completed with 24 local objects. From github.com:git/git b838bf19389..db91954e186 master -> origin/master 2feabab25ac..627208d89de next -> origin/next + 9b2be6f7989...39dfbbf0521 seen -> origin/seen (forced update) 5fa232d8520..fb8899337a8 todo -> origin/todo * [new tag] v2.49.0-rc0 -> v2.49.0-rc0 As long as these messages are still appearing, then I'm fine with these annotated tags being added to the object database earlier by the bundle URI mechanism. Scott: You also asked about the intended design for bundle URIs and things, and the best places to look are Git's technical docs [1] and the bundle server reference implementation docs [2]. CC Victoria who implemented the reference implementation and wrote those docs. [1] https://github.com/git/git/blob/master/Documentation/technical/bundle-uri.adoc [2] https://github.com/git-ecosystem/git-bundle-server/blob/main/docs/technical/architecture.md Thanks, -Stolee
diff --git a/bundle-uri.c b/bundle-uri.c index 744257c49c1..3371d56f4ce 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -403,7 +403,7 @@ static int unbundle_from_file(struct repository *r, const char *file) const char *branch_name; int has_old; - if (!skip_prefix(refname->string, "refs/heads/", &branch_name)) + if (!skip_prefix(refname->string, "refs/", &branch_name)) continue; strbuf_setlen(&bundle_ref, bundle_prefix_len);