Message ID | e7fde2369495f32c7aa88c7b6b74ebee1a1bed24.1613000292.git.martinvonz@google.com (mailing list archive) |
---|---|
State | Superseded |
Series | docs: clarify that refs/notes/ do not keep the attached objects alive |
Martin von Zweigbergk <martinvonz@google.com> writes:

> ... In particular, it will keep not only
> +objects referenced by the index, remote-tracking branches, reflogs
> +(which may reference commits in branches that were later amended or
> +rewound), and anything else in the refs/* namespace. Notes saved by
> +'git notes' under refs/notes/ will be kept, but the objects (typically
> +commits) they are attached to will not be.

The notes will not contribute in keeping the objects they are
attached to. As long as the objects have some paths from refs and
reflog entries (reachability anchors), they will be kept. These
two are facts.

But I am afraid that the new phrasing can be misread as saying that
an object, if it has notes attached to it, will not be kept, period.

Knowing Git, we can tell immediately that it would be a nonsense
behaviour, but still, I think that is how it can be read, so I
suspect that the new text would invite a misunderstanding in the
opposite direction.

    ... and anything else in the refs/* namespace. Note that a note
    attached to an object does not contribute in keeping the object
    alive.

would be less misinterpretation-inducing, perhaps.

We could go further to explain by adding something like that
immediately after "keeping the object alive" above, e.g.

    ---when an object becomes unreachable (e.g. a branch gets
    rewound, a commit gets rewritten) and eventually gets pruned, a
    note attached to the object will become dangling (use "git notes
    prune" to remove them).

but I am not sure if that is necessary. Pruning notes attached to
objects that are pruned may be relevant in the context of discussing
"git gc", I guess.

> +If you are expecting some
> +objects to be deleted and they aren't, check all of those locations
> +and decide whether it makes sense in your case to remove those
> +references.
>
> On the other hand, when 'git gc' runs concurrently with another process,
> there is a risk of it deleting an object that the other process is using

Thanks.
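
The two facts stated in this message (notes do not anchor the objects they annotate; objects stay as long as some ref or reflog entry reaches them) can be illustrated with a toy reachability walk. This is only a sketch under stated assumptions: the object names, the `objects` and `refs` dictionaries, and the `reachable` helper are all hypothetical, and the model ignores the index and reflogs. It relies on the notes tree being keyed by the annotated object's name while its entries point at the note blobs, not at the annotated objects.

```python
from collections import deque

# Toy object graph (hypothetical names, not Git's storage format).
# Each object maps to the objects it references directly.
objects = {
    "commit_A": ["tree_A"],          # commit on a branch that was later rewound
    "tree_A": [],
    "commit_B": ["tree_B"],          # commit still on refs/heads/main
    "tree_B": [],
    "notes_commit": ["notes_tree"],  # tip of refs/notes/commits
    # Entries are *named* after the annotated objects but point at note blobs.
    "notes_tree": ["note_for_A", "note_for_B"],
    "note_for_A": [],
    "note_for_B": [],
}

# Reachability anchors in this toy model: refs only.
refs = {
    "refs/heads/main": "commit_B",
    "refs/notes/commits": "notes_commit",
}


def reachable(objects, roots):
    """Return every object reachable from the given roots (breadth-first)."""
    seen = set()
    queue = deque(roots)
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        queue.extend(objects[name])
    return seen


kept = reachable(objects, refs.values())
pruned = set(objects) - kept

print("kept:  ", sorted(kept))
print("pruned:", sorted(pruned))
```

In this model the note for `commit_A` survives because refs/notes/commits reaches it, while `commit_A` itself does not: having a note attached neither protects it nor condemns it.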
On Wed, Feb 10, 2021 at 2:35 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Martin von Zweigbergk <martinvonz@google.com> writes:
>
> > ... In particular, it will keep not only
> > +objects referenced by the index, remote-tracking branches, reflogs
> > +(which may reference commits in branches that were later amended or
> > +rewound), and anything else in the refs/* namespace. Notes saved by
> > +'git notes' under refs/notes/ will be kept, but the objects (typically
> > +commits) they are attached to will not be.
>
> The notes will not contribute in keeping the objects they are
> attached to. As long as the objects have some paths from refs and
> reflog entries (reachability anchors), they will be kept. These
> two are facts.
>
> But I am afraid that the new phrasing can be misread as saying that
> an object, if it has notes attached to it, will not be kept, period.
>
> Knowing Git, we can tell immediately that it would be a nonsense
> behaviour, but still, I think that is how it can be read, so I
> suspect that the new text would invite a misunderstanding in the
> opposite direction.
>
>     ... and anything else in the refs/* namespace. Note that a note
>     attached to an object does not contribute in keeping the object
>     alive.
>
> would be less misinterpretation-inducing, perhaps.

Good point. You dropped the bit about the notes (texts) being kept
alive. I don't know if you did that intentionally or not. I initially
thought that we should keep that bit, but it's probably not actually
very useful information. Users probably don't have large amounts of
information stored in notes, so they probably don't care whether notes
text is kept, especially since there's no good way of pruning the
notes.

So I took your proposed sentence, but I added a parenthesis to clarify
that we're talking about notes from 'git notes'.

>
> We could go further to explain by adding something like that
> immediately after "keeping the object alive" above, e.g.
>
>     ---when an object becomes unreachable (e.g. a branch gets
>     rewound, a commit gets rewritten) and eventually gets pruned, a
>     note attached to the object will become dangling (use "git notes
>     prune" to remove them).
>
> but I am not sure if that is necessary. Pruning notes attached to
> objects that are pruned may be relevant in the context of discussing
> "git gc", I guess.

Yes, seems only tangentially related, so I'll leave it out.

I'll send a v2 in a moment.
Martin von Zweigbergk <martinvonz@google.com> writes:

> Good point. You dropped the bit about the notes (texts) being kept
> alive. I don't know if you did that intentionally or not.

Yes, I did it on purpose, because it is just one of the things that
can be reached from refs/, but we shouldn't write our document for
those like me, who know what notes and other things in Git are.

> I initially
> thought that we should keep that bit, but it's probably not actually
> very useful information. Users probably don't have large amounts of
> information stored in notes, so they probably don't care whether notes
> text is kept, especially since there's no good way of pruning the
> notes.

I am not sure if I agree with any part of the above. End-user data
is precious no matter the volume, and we keep notes by making them
reachable from refs in the refs/notes/ hierarchy.

I am not sure what qualifies, in your eyes, as a "good" way, but "git
notes prune" is a good way to remove notes that are attached to
objects that have already been pruned away.
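
As a rough illustration of what "git notes prune" is described as doing here (removing notes attached to objects that have already been pruned away), the following Python sketch models the idea on plain dictionaries. The function name `prune_notes`, the object names, and the data layout are hypothetical; this is not how Git implements the command, and it says nothing about how much disk space is reclaimed.

```python
def prune_notes(notes, existing_objects):
    """Split notes into (kept, dangling).

    A note is dangling when the object it annotates no longer exists
    in the (toy) object store.
    """
    kept = {obj: text for obj, text in notes.items() if obj in existing_objects}
    dangling = {obj: text for obj, text in notes.items() if obj not in existing_objects}
    return kept, dangling


# Hypothetical state after a gc has already pruned commit_A.
notes = {
    "commit_A": "note on a commit that was rewound and pruned",
    "commit_B": "note on a commit still reachable from a branch",
}
existing_objects = {"commit_B", "tree_B"}

kept, dangling = prune_notes(notes, existing_objects)
print("kept notes for:    ", sorted(kept))
print("dangling notes for:", sorted(dangling))
```

Notes on objects that still exist are never touched in this model, which matches the point that notes about reachable commits are not lost.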
On Wed, Feb 10, 2021 at 9:30 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Martin von Zweigbergk <martinvonz@google.com> writes:
>
> > Good point. You dropped the bit about the notes (texts) being kept
> > alive. I don't know if you did that intentionally or not.
>
> Yes, I did it on purpose, because it is just one of the things that
> can be reached from refs/, but we shouldn't write our document for
> those like me, who know what notes and other things in Git are.
>
> > I initially
> > thought that we should keep that bit, but it's probably not actually
> > very useful information. Users probably don't have large amounts of
> > information stored in notes, so they probably don't care whether notes
> > text is kept, especially since there's no good way of pruning the
> > notes.
>
> I am not sure if I agree with any part of the above. End-user data
> is precious no matter the volume, and we keep notes by making them
> reachable from refs in the refs/notes/ hierarchy.

Sorry, I forgot to qualify that whole paragraph with something like
"Regarding notes attached to unreachable commits: ". Users will
obviously not want to lose notes about reachable commits and they
won't. So the only remaining concern in my mind was whether they might
care about it because they *want* to save the space that the note
used. Makes more sense then?

> I am not sure what qualifies, in your eyes, as a "good" way, but "git
> notes prune" is a good way to remove notes that are attached to
> objects that have already been pruned away.

My paragraph above probably clarifies (that I was thinking about
saving the space used by notes, which I don't think `git notes prune`
helps with).
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 0c114ad1ca..52824269a8 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -117,12 +117,14 @@ NOTES
 'git gc' tries very hard not to delete objects that are referenced
 anywhere in your repository. In particular, it will keep not only
 objects referenced by your current set of branches and tags, but also
-objects referenced by the index, remote-tracking branches, notes saved
-by 'git notes' under refs/notes/, reflogs (which may reference commits
-in branches that were later amended or rewound), and anything else in
-the refs/* namespace. If you are expecting some objects to be deleted
-and they aren't, check all of those locations and decide whether it
-makes sense in your case to remove those references.
+objects referenced by the index, remote-tracking branches, reflogs
+(which may reference commits in branches that were later amended or
+rewound), and anything else in the refs/* namespace. Notes saved by
+'git notes' under refs/notes/ will be kept, but the objects (typically
+commits) they are attached to will not be. If you are expecting some
+objects to be deleted and they aren't, check all of those locations
+and decide whether it makes sense in your case to remove those
+references.
 
 On the other hand, when 'git gc' runs concurrently with another process,
 there is a risk of it deleting an object that the other process is using