mbox series

[0/2] commit-graph: suggest deleting corrupt graphs

Message ID cover.1708643825.git.steadmon@google.com (mailing list archive)
Headers show
Series commit-graph: suggest deleting corrupt graphs | expand

Message

Josh Steadmon Feb. 22, 2024, 11:19 p.m. UTC
At $WORK, we've had a few occasions where someone's commit-graph becomes
corrupt, and hits various BUG()s that block their day-to-day work. When
this happens, we advise the user to either disable the commit graph, or
to delete it and let it be regenerated.

It would be a nicer user experience if we can make this a self-serve
procedure. To do this, let's add a new `git commit-graph clear`
subcommand so that users don't need to manually delete files under their
.git directories. And to make it self-documenting, update various BUG(),
die(), and error() messages to suggest removing the commit graph to
recover from the corruption.

This approach was suggested in [1] and generally positively received
[2], but was never implemented.

[1] https://lore.kernel.org/git/YBoBBie2t1EhcLAN@google.com/
[2] https://lore.kernel.org/git/xmqqk0rpc7uj.fsf@gitster.c.googlers.com/

Open questions for reviewers:
* Should we turn this into an advice setting instead?
* Should we also suggest running `commit-graph write` after clearing
  the graph? I lean towards no; everything will still function as normal
  without a commit graph.
* Does it make sense to add the suggestion in all of these corruption
  error messages? There are many other error()s in commit-graph.c,
  should we add this for all of them, or just the ones that specifically
  mention corruption? Or maybe just the fatal BUG()s and die()s?
* Any other places this suggestion should be added that I've missed?


Josh Steadmon (2):
  commit-graph: add `git commit-graph clear` subcommand
  commit-graph: suggest removing corrupt graphs

 Documentation/git-commit-graph.txt |  5 ++++
 builtin/commit-graph.c             | 40 +++++++++++++++++++++++++++
 commit-graph.c                     | 43 +++++++++++++++++++++++++++---
 commit-graph.h                     |  1 +
 commit-reach.c                     |  4 ++-
 t/t5318-commit-graph.sh            | 17 ++++++++++--
 t/t5324-split-commit-graph.sh      | 26 ++++++++++++------
 7 files changed, 122 insertions(+), 14 deletions(-)


base-commit: 3e0d3cd5c7def4808247caf168e17f2bbf47892b

Comments

Junio C Hamano Feb. 23, 2024, 12:05 a.m. UTC | #1
Josh Steadmon <steadmon@google.com> writes:

> At $WORK, we've had a few occasions where someone's commit-graph becomes
> corrupt, and hits various BUG()s that block their day-to-day work. When
> this happens, we advise the user to either disable the commit graph, or
> to delete it and let it be regenerated.
>
> It would be a nicer user experience if we can make this a self-serve
> procedure. To do this, let's add a new `git commit-graph clear`
> subcommand so that users don't need to manually delete files under their
> .git directories. And to make it self-documenting, update various BUG(),
> die(), and error() messages to suggest removing the commit graph to
> recover from the corruption.

I am of two minds.

For one, if we know there is a corruption and if we know that we
will certainly recover cleanly if we removed these files, it would
be fair for an end-user to respond with: instead of telling me to
run "commit-graph clear", you can run it for me, can't you?

The other one is if it hinders debugging the root cause to run
"clear", whether it is done by the end-user or by the mechanism that
detects and dies upon discovery of a corruption.  Do we know how
these commit-graph files become corrupt?  How valuable would these
corrupt files be to help us track down where the corruption comes
from?  If they are not all that useful in debugging, then removing
them ourselves or telling users to remove them may be OK, of course.

Do these BUG()s come from corruption that can be diagnosed upfront
when we "open" the commit-graph files?  I am wondering if it would
be the matter of teaching prepare_commit_graph() to check for
corruption and return without enabling the support.

Thanks.
Josh Steadmon April 24, 2024, 7:30 p.m. UTC | #2
On 2024.02.22 16:05, Junio C Hamano wrote:
> Josh Steadmon <steadmon@google.com> writes:
> 
> > At $WORK, we've had a few occasions where someone's commit-graph becomes
> > corrupt, and hits various BUG()s that block their day-to-day work. When
> > this happens, we advise the user to either disable the commit graph, or
> > to delete it and let it be regenerated.
> >
> > It would be a nicer user experience if we can make this a self-serve
> > procedure. To do this, let's add a new `git commit-graph clear`
> > subcommand so that users don't need to manually delete files under their
> > .git directories. And to make it self-documenting, update various BUG(),
> > die(), and error() messages to suggest removing the commit graph to
> > recover from the corruption.
> 
> I am of two minds.
> 
> For one, if we know there is a corruption and if we know that we
> will certainly recover cleanly if we removed these files, it would
> be fair for an end-user to respond with: instead of telling me to
> run "commit-graph clear", you can run it for me, can't you?
> 
> The other one is if it hinders debugging the root cause to run
> "clear", whether it is done by the end-user or by the mechanism that
> detects and dies upon discovery of a corruption.  Do we know how
> these commit-graph files become corrupt?  How valuable would these
> corrupt files be to help us track down where the corruption comes
> from?  If they are not all that useful in debugging, then removing
> them ourselves or telling users to remove them may be OK, of course.
> 
> Do these BUG()s come from corruption that can be diagnosed upfront
> when we "open" the commit-graph files?  I am wondering if it would
> be the matter of teaching prepare_commit_graph() to check for
> corruption and return without enabling the support.
> 
> Thanks.

Sorry for the late reply, this got buried in my inbox. The corruption we
saw was related to a generation numbers bug [1] that I think was only
present for a short while in 'next'.

[1] https://lore.kernel.org/git/YBn3fxFe978Up5Ly@google.com/

I believe that being able to examine the files after the corruption was
detected did help us narrow down the issue, so I would lean towards not
automatically deleting them upon detecting corruption.

I don't think that this case would be detectable without running a full
`git commit-graph verify` up front.
Junio C Hamano April 24, 2024, 9:28 p.m. UTC | #3
Josh Steadmon <steadmon@google.com> writes:

> I believe that being able to examine the files after the corruption was
> detected did help us narrow down the issue, so I would lean towards not
> automatically deleting them upon detecting corruption.

Understood.

> I don't think that this case would be detectable without running a full
> `git commit-graph verify` up front.

OK, then that approach is not worth pursuing.  Thanks for a dose of
sanity.