docs: clarify that refs/notes/ do not keep the attached objects alive

Message ID e7fde2369495f32c7aa88c7b6b74ebee1a1bed24.1613000292.git.martinvonz@google.com (mailing list archive)
State Superseded

Commit Message

Martin von Zweigbergk Feb. 11, 2021, midnight UTC
`git help gc` contains this snippet:

  "[...] it will keep [...] objects referenced by the index,
  remote-tracking branches, notes saved by git notes under refs/notes/"

I had interpreted that as saying that the objects that notes were
attached to are kept, but that is not the case. Let's clarify the
documentation by moving out the part about git notes to a separate
sentence.
---
 Documentation/git-gc.txt | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

Comments

Junio C Hamano Feb. 11, 2021, 12:35 a.m. UTC | #1
Martin von Zweigbergk <martinvonz@google.com> writes:

>  ... In particular, it will keep not only
> +objects referenced by the index, remote-tracking branches, reflogs
> +(which may reference commits in branches that were later amended or
> +rewound), and anything else in the refs/* namespace. Notes saved by
> +'git notes' under refs/notes/ will be kept, but the objects (typically
> +commits) they are attached to will not be.

The notes will not contribute to keeping the objects they are
attached to.  As long as the objects are reachable from refs or
reflog entries (reachability anchors), they will be kept.  These
two are facts.
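
These two facts can be illustrated with a toy reachability walk. This is
a minimal sketch in Python, not how git implements gc: the object names
and the `reachable` helper are made up for illustration. It relies on one
property of notes storage: the notes tree records the annotated object's
name only as a path, and the entry itself points at the blob holding the
note text.

```python
# Toy model of an object graph; all object names are hypothetical.
# Each object maps to the list of objects it references directly.
objects = {
    "commit_A": ["tree_A"],
    "tree_A": [],
    # commit_B was rewritten away: no branch or reflog entry points at it.
    "commit_B": ["tree_B", "commit_A"],
    "tree_B": [],
    "notes_commit": ["notes_tree"],
    # The notes tree names commit_B only as a path; the one object it
    # references is the blob that holds the note text.
    "notes_tree": ["note_blob_for_B"],
    "note_blob_for_B": [],
}

# Reachability anchors (reflogs and the index are left out of this sketch).
refs = {
    "refs/heads/main": "commit_A",
    "refs/notes/commits": "notes_commit",
}


def reachable(objects, anchors):
    """Return every object reachable from the given anchor objects."""
    seen = set()
    stack = list(anchors)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(objects[oid])
    return seen


kept = reachable(objects, refs.values())
pruned = set(objects) - kept
print(sorted(pruned))
```

In this model `commit_B` and `tree_B` end up in `pruned` even though a
note is attached to `commit_B`, while `note_blob_for_B` stays in `kept`
because refs/notes/commits reaches it.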

But I am afraid that the new phrasing can be misread as saying that
an object, if it has notes attached to it, will not be kept, period.

Knowing Git, we can tell immediately that this would be nonsensical
behaviour, but still, I think that is how it can be read, so I
suspect that the new text would invite a misunderstanding in the
opposite direction.

    ... and anything else in the refs/* namespace.  Note that a note
    attached to an object does not contribute to keeping the object
    alive.

would be less misinterpretation-inducing, perhaps.

We could go further to explain by adding something like this
immediately after "keeping the object alive" above, e.g.

    ---when an object becomes unreachable (e.g. a branch gets
    rewound, a commit gets rewritten) and eventually gets pruned,
    any note attached to it will become dangling (use "git notes
    prune" to remove such notes).

but I am not sure if that is necessary.  Pruning notes attached to
objects that are pruned may be relevant in the context of discussing
"git gc", I guess.

> +If you are expecting some
> +objects to be deleted and they aren't, check all of those locations
> +and decide whether it makes sense in your case to remove those
> +references.
>  
>  On the other hand, when 'git gc' runs concurrently with another process,
>  there is a risk of it deleting an object that the other process is using

Thanks.

Martin von Zweigbergk Feb. 11, 2021, 7:14 a.m. UTC | #2
On Wed, Feb 10, 2021 at 2:35 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Martin von Zweigbergk <martinvonz@google.com> writes:
>
> >  ... In particular, it will keep not only
> > +objects referenced by the index, remote-tracking branches, reflogs
> > +(which may reference commits in branches that were later amended or
> > +rewound), and anything else in the refs/* namespace. Notes saved by
> > +'git notes' under refs/notes/ will be kept, but the objects (typically
> > +commits) they are attached to will not be.
>
> The notes will not contribute to keeping the objects they are
> attached to.  As long as the objects are reachable from refs or
> reflog entries (reachability anchors), they will be kept.  These
> two are facts.
>
> But I am afraid that the new phrasing can be misread as saying that
> an object, if it has notes attached to it, will not be kept, period.
>
> Knowing Git, we can tell immediately that this would be nonsensical
> behaviour, but still, I think that is how it can be read, so I
> suspect that the new text would invite a misunderstanding in the
> opposite direction.
>
>     ... and anything else in the refs/* namespace.  Note that a note
>     attached to an object does not contribute to keeping the object
>     alive.
>
> would be less misinterpretation-inducing, perhaps.

Good point. You dropped the bit about the notes (texts) being kept
alive. I don't know if you did that intentionally or not. I initially
thought that we should keep that bit, but it's probably not actually
very useful information. Users probably don't have large amounts of
information stored in notes, so they probably don't care whether notes
text is kept, especially since there's no good way of pruning the
notes. So I took your proposed sentence, but I added a parenthetical to
clarify that we're talking about notes from 'git notes'.

>
> We could go further to explain by adding something like this
> immediately after "keeping the object alive" above, e.g.
>
>     ---when an object becomes unreachable (e.g. a branch gets
>     rewound, a commit gets rewritten) and eventually gets pruned,
>     any note attached to it will become dangling (use "git notes
>     prune" to remove such notes).
>
> but I am not sure if that is necessary.  Pruning notes attached to
> objects that are pruned may be relevant in the context of discussing
> "git gc", I guess.

Yes, seems only tangentially related, so I'll leave it out.

I'll send a v2 in a moment.

Junio C Hamano Feb. 11, 2021, 7:30 a.m. UTC | #3
Martin von Zweigbergk <martinvonz@google.com> writes:

> Good point. You dropped the bit about the notes (texts) being kept
> alive. I don't know if you did that intentionally or not.

Yes, I did it on purpose, because it is just one of the things that
can be reached from refs/, but we shouldn't write our documentation for
those like me, who know what notes and other things in Git are.

> I initially
> thought that we should keep that bit, but it's probably not actually
> very useful information. Users probably don't have large amounts of
> information stored in notes, so they probably don't care whether notes
> text is kept, especially since there's no good way of pruning the
> notes.

I am not sure if I agree with any part of the above.  End-user data
is precious no matter the volume, and we keep notes by making them
reachable from refs in the refs/notes/ hierarchy.

I am not sure what qualifies, in your eyes, as a "good" way, but "git
notes prune" is a good way to remove notes that are attached to
objects that have already been pruned away.
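
The effect of "git notes prune" described here can be sketched with a
small model. This is an illustration in Python under stated assumptions,
not git's implementation: `prune_notes`, the object names, and the note
texts are hypothetical, and the real command operates on the notes tree
under refs/notes/ rather than on a dictionary.

```python
def prune_notes(notes, existing_objects):
    """Split notes by whether the annotated object still exists.

    notes: dict mapping annotated object name -> note text (made-up data)
    existing_objects: set of object names still present in the repository
    Returns (kept, dangling): the surviving notes and the names of the
    objects whose notes would be removed.
    """
    kept = {oid: text for oid, text in notes.items() if oid in existing_objects}
    dangling = sorted(oid for oid in notes if oid not in existing_objects)
    return kept, dangling


# commit_B has already been pruned by gc; its note is now dangling.
notes = {
    "commit_A": "reviewed by Alice",
    "commit_B": "benchmark results",
}
existing_objects = {"commit_A", "tree_A"}

kept, dangling = prune_notes(notes, existing_objects)
print(dangling)
print(kept)
```

Returning the dangling names separately mirrors a dry run, which "git
notes prune -n" offers on the real command: it reports the objects whose
notes would be removed without removing anything.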

Martin von Zweigbergk Feb. 11, 2021, 7:38 a.m. UTC | #4
On Wed, Feb 10, 2021 at 9:30 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Martin von Zweigbergk <martinvonz@google.com> writes:
>
> > Good point. You dropped the bit about the notes (texts) being kept
> > alive. I don't know if you did that intentionally or not.
>
> Yes, I did it on purpose, because it is just one of the things that
> can be reached from refs/, but we shouldn't write our documentation for
> those like me, who know what notes and other things in Git are.
>
> > I initially
> > thought that we should keep that bit, but it's probably not actually
> > very useful information. Users probably don't have large amounts of
> > information stored in notes, so they probably don't care whether notes
> > text is kept, especially since there's no good way of pruning the
> > notes.
>
> I am not sure if I agree with any part of the above.  End-user data
> is precious no matter the volume, and we keep notes by making them
> reachable from refs in the refs/notes/ hierarchy.

Sorry, I forgot to qualify that whole paragraph with something like
"Regarding notes attached to unreachable commits: ". Users will
obviously not want to lose notes about reachable commits and they
won't. So the only remaining concern in my mind was whether they might
care about it because they *want* to save the space that the note
used. Makes more sense then?

> I am not sure what qualifies, in your eyes, as a "good" way, but "git
> notes prune" is a good way to remove notes that are attached to
> objects that have already been pruned away.

My paragraph above probably clarifies this: I was thinking about
saving the space used by notes, which I don't think `git notes prune`
helps with.

Patch

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 0c114ad1ca..52824269a8 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -117,12 +117,14 @@  NOTES
 'git gc' tries very hard not to delete objects that are referenced
 anywhere in your repository. In particular, it will keep not only
 objects referenced by your current set of branches and tags, but also
-objects referenced by the index, remote-tracking branches, notes saved
-by 'git notes' under refs/notes/, reflogs (which may reference commits
-in branches that were later amended or rewound), and anything else in
-the refs/* namespace.  If you are expecting some objects to be deleted
-and they aren't, check all of those locations and decide whether it
-makes sense in your case to remove those references.
+objects referenced by the index, remote-tracking branches, reflogs
+(which may reference commits in branches that were later amended or
+rewound), and anything else in the refs/* namespace. Notes saved by
+'git notes' under refs/notes/ will be kept, but the objects (typically
+commits) they are attached to will not be. If you are expecting some
+objects to be deleted and they aren't, check all of those locations
+and decide whether it makes sense in your case to remove those
+references.
 
 On the other hand, when 'git gc' runs concurrently with another process,
 there is a risk of it deleting an object that the other process is using