
[0/6] builtin/maintenance: introduce "reflog-expire" task

Message ID 20250226-pks-maintenance-reflog-expire-v1-0-a1204a814952@pks.im
Series builtin/maintenance: introduce "reflog-expire" task

Message

Patrick Steinhardt Feb. 26, 2025, 3:24 p.m. UTC
Hi,

this patch series introduces a new "reflog-expire" task to
git-maintenance(1). This task is designed to plug a gap when the "gc"
task is disabled, as there is no way to expire reflog entries in that
case.

This patch series has been inspired by the discussion at [1]. I consider
it to be another step in the direction of replacing git-gc(1) and
allowing for more flexible maintenance strategies overall. Next steps
could be:

  1. Enable the "reflog-expire" task by default when using the
     "incremental" strategy.

  2. Use "incremental" strategy when "features.experimental" is enabled.

  3. Switch over the default strategy to "incremental" after a couple of
     releases.

Thanks!

Patrick

[1]: <e650f4e4-e267-4f1f-bb3a-c71b1fe0b276@uxp.de>

---
Patrick Steinhardt (6):
      reflog: rename `cmd_reflog_expire_cb` to `reflog_expire_options`
      builtin/reflog: stop storing default reflog expiry dates globally
      builtin/reflog: stop storing per-reflog expiry dates globally
      builtin/reflog: make functions regarding `reflog_expire_options` public
      builtin/gc: split out function to expire reflog entries
      builtin/maintenance: introduce "reflog-expire" task

 Documentation/config/maintenance.adoc |   9 ++
 Documentation/git-maintenance.adoc    |   4 +
 builtin/gc.c                          |  72 +++++++++++++---
 builtin/reflog.c                      | 153 ++++------------------------------
 reflog.c                              | 137 ++++++++++++++++++++++++++----
 reflog.h                              |  35 +++++++-
 t/t7900-maintenance.sh                |  18 ++++
 7 files changed, 263 insertions(+), 165 deletions(-)


---
base-commit: 5a526e5e18ddb9a7dfc5a2967d21d6154df64a4f
change-id: 20250226-pks-maintenance-reflog-expire-61c61410751a

Comments

Ramsay Jones Feb. 26, 2025, 5:50 p.m. UTC | #1
On 26/02/2025 15:24, Patrick Steinhardt wrote:
> Hi,
> 
> this patch series introduces a new "reflog-expire" task to
> git-maintenance(1). This task is designed to plug a gap when the "gc"
> task is disabled, as there is no way to expire reflog entries in that
> case.
> 
> This patch series has been inspired by the discussion at [1]. I consider
> it to be another step in the direction of replacing git-gc(1) and
> allowing for more flexible maintenance strategies overall. Next steps

Hmm, I don't know what you have in mind, but just as a data-point, I have
never used, and have no inclination to use, git-maintenance. However, I do
use git-gc extensively: at least once (times the number of repos fetched
which have changes) per day, pretty much every day! :)

ATB,
Ramsay Jones
Junio C Hamano Feb. 26, 2025, 6:40 p.m. UTC | #2
Ramsay Jones <ramsay@ramsayjones.plus.com> writes:

> Hmm, I don't know what you have in mind, but just as a data-point, I have
> never used, and have no inclination to use, git-maintenance. However, I do
> use git-gc extensively: at least once (times the number of repos fetched
> which have changes) per day, pretty much every day! :)

That makes two of us, but everybody knows that we are old fashioned ;-)
Ramsay Jones Feb. 26, 2025, 6:54 p.m. UTC | #3
On 26/02/2025 18:40, Junio C Hamano wrote:
> Ramsay Jones <ramsay@ramsayjones.plus.com> writes:
> 
>> Hmm, I don't know what you have in mind, but just as a data-point, I have
>> never used, and have no inclination to use, git-maintenance. However, I do
>> use git-gc extensively: at least once (times the number of repos fetched
>> which have changes) per day, pretty much every day! :)
> 
> That makes two of us, but everybody knows that we are old fashioned ;-)

true, very true. :)

ATB,
Ramsay Jones
Junio C Hamano Feb. 27, 2025, 1:23 a.m. UTC | #4
Patrick Steinhardt <ps@pks.im> writes:

> this patch series introduces a new "reflog-expire" task to
> git-maintenance(1). This task is designed to plug a gap when the "gc"
> task is disabled, as there is no way to expire reflog entries in that
> case.

I think in the longer run, "maintenance" users should be able to
treat the single ball of wax "gc" task as a mere short-hand to
invoke a set of often used maintenance tasks, and we would want to
break down the component tasks grouped in it and make them
independently available.  This is a good step along that journey.

Are there other things that the "gc" task covers that are not
available elsewhere?  "git gc --help" suggests there are things
related to pruning (unused?) worktrees and stale rerere database
entries.

Another thing: how much control over the choice of tasks and the order
in which they run do we want to cede to the end users?  When you are
expiring stale reflog entries and repacking the object database to
discard unreachable objects, it would only make sense to do them in
the order I just said.  We could leave it up to the end users, but
that may be doing disservice to them.
Patrick Steinhardt Feb. 27, 2025, 9:10 a.m. UTC | #5
On Wed, Feb 26, 2025 at 06:54:48PM +0000, Ramsay Jones wrote:
> On 26/02/2025 18:40, Junio C Hamano wrote:
> > Ramsay Jones <ramsay@ramsayjones.plus.com> writes:
> > 
> >> Hmm, I don't know what you have in mind, but just as a data-point, I have
> >> never used, and have no inclination to use, git-maintenance. However, I do
> >> use git-gc extensively: at least once (times the number of repos fetched
> >> which have changes) per day, pretty much every day! :)
> > 
> > That makes two of us, but everybody knows that we are old fashioned ;-)
> 
> true, very true. :)

Well, it depends on what you mean by "use". In fact, both of you use it
implicitly, assuming that you use a recent version of Git, because that is
what Git nowadays spawns automatically: we don't use `git gc --auto`
anymore, but instead use `git maintenance run --auto`. It _does_ still
use git-gc(1) under the hood by default, but that is something we can
change going forward.

The opportunity here is to have a more fine-grained strategy to perform
maintenance, both when run explicitly but also when run automatically by
Git. git-maintenance(1) is written in a way that makes it significantly
more flexible overall, so we can iterate on how exactly it performs the
maintenance for the user. Different strategies may make sense in some
contexts, but not in others, and that is something we can account for
here.

It also allows us to bring newer features to the masses, features that
may improve performance or reduce the time everyone spends maintaining
repositories: multi-pack indices, split commit graphs, geometric
repacking, incremental bitmaps.

While we could move them into git-gc(1), I think that this tool is just
not well-suited for such changes, as it simply doesn't provide a good
foundation for tweakable behaviour.

Patrick
Patrick Steinhardt Feb. 27, 2025, 9:22 a.m. UTC | #6
On Wed, Feb 26, 2025 at 05:23:10PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > this patch series introduces a new "reflog-expire" task to
> > git-maintenance(1). This task is designed to plug a gap when the "gc"
> > task is disabled, as there is no way to expire reflog entries in that
> > case.
> 
> I think in the longer run, "maintenance" users should be able to
> treat the single ball of wax "gc" task as a mere short-hand to
> invoke a set of often used maintenance tasks, and we would want to
> break down the component tasks grouped in it and make them
> independently available.  This is a good step along that journey.
> 
> Are there other things that the "gc" task covers that are not
> available elsewhere?  "git gc --help" suggests there are things
> related to pruning (unused?) worktrees and stale rerere database
> entries.

These are more gaps indeed. I'm happy to work on them once this patch
series has landed. I don't know about any other gaps.

> Another thing: how much control over the choice of tasks and the order
> in which they run do we want to cede to the end users?  When you are
> expiring stale reflog entries and repacking the object database to
> discard unreachable objects, it would only make sense to do them in
> the order I just said.  We could leave it up to the end users, but
> that may be doing disservice to them.

This is a good question. From my perspective, there are three classes of
users here:

  - Those that don't care and don't have special needs. This class of
    users is unlikely to tweak things anyway.

  - Those that aren't deeply familiar with how Git works, but who do
    have special needs, e.g. because they have huge repositories. This
    class of users may need to tweak configuration, but we should give
    them an _easy_ way to do so. Configuring individual tasks isn't that,
    from my perspective.

  - Power users that are deeply familiar with how Git works. This class
    of users may even want to tweak the order in which specific tasks
    run.

"maintenance.strategy" exists to cater to the second class of users and
allows them to configure the high-level strategy used to maintain repos.
I don't know whether it's honored by `git maintenance run`, but I think
it is (and if it's not, it should be).

To me, that means the per-task configuration for power users can be as
flexible as possible, including configuring the order in which tasks
are run.

Patrick
Junio C Hamano Feb. 27, 2025, 5:01 p.m. UTC | #7
Patrick Steinhardt <ps@pks.im> writes:

> On Wed, Feb 26, 2025 at 05:23:10PM -0800, Junio C Hamano wrote:
>> Patrick Steinhardt <ps@pks.im> writes:
>> 
>> > this patch series introduces a new "reflog-expire" task to
>> > git-maintenance(1). This task is designed to plug a gap when the "gc"
>> > task is disabled, as there is no way to expire reflog entries in that
>> > case.
>> 
>> I think in the longer run, "maintenance" users should be able to
>> treat the single ball of wax "gc" task as a mere short-hand to
>> invoke a set of often used maintenance tasks, and we would want to
>> break down the component tasks grouped in it and make them
>> independently available.  This is a good step along that journey.
>> 
>> Are there other things that the "gc" task covers that are not
>> available elsewhere?  "git gc --help" suggests there are things
>> related to pruning (unused?) worktrees and stale rerere database
>> entries.
>
> These are more gaps indeed. I'm happy to work on them once this patch
> series has landed. I don't know about any other gaps.

Or maybe leave breadcrumbs and invite others to help advance the
cause?  If we know we have achieved consensus that it is a good
direction to go in, that is (we already saw a mention that indicates
that there are populations of us who do not care too much about
extending maintenance but are familiar with gc).
Patrick Steinhardt Feb. 28, 2025, 8:35 a.m. UTC | #8
On Thu, Feb 27, 2025 at 09:01:49AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > On Wed, Feb 26, 2025 at 05:23:10PM -0800, Junio C Hamano wrote:
> >> Patrick Steinhardt <ps@pks.im> writes:
> >> 
> >> > this patch series introduces a new "reflog-expire" task to
> >> > git-maintenance(1). This task is designed to plug a gap when the "gc"
> >> > task is disabled, as there is no way to expire reflog entries in that
> >> > case.
> >> 
> >> I think in the longer run, "maintenance" users should be able to
> >> treat the single ball of wax "gc" task as a mere short-hand to
> >> invoke a set of often used maintenance tasks, and we would want to
> >> break down the component tasks grouped in it and make them
> >> independently available.  This is a good step along that journey.
> >> 
> >> Are there other things that the "gc" task covers that are not
> >> available elsewhere?  "git gc --help" suggests there are things
> >> related to pruning (unused?) worktrees and stale rerere database
> >> entries.
> >
> > These are more gaps indeed. I'm happy to work on them once this patch
> > series has landed. I don't know about any other gaps.
> 
> Or maybe leave breadcrumbs and invite others to help advance the
> cause?  If we know we have achieved consensus that it is a good
> direction to go in, that is (we already saw a mention that indicates
> that there are populations of us who do not care too much about
> extending maintenance but are familiar with gc).

Oh, sure, I wouldn't mind at all if somebody else picked this up. The
question to me is where to leave the breadcrumb, other than having it in
this thread.

Patrick