diff mbox series

reflog expire: add progress output

Message ID 20180919141016.27930-1-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series reflog expire: add progress output | expand

Commit Message

Ævar Arnfjörð Bjarmason Sept. 19, 2018, 2:10 p.m. UTC
Before this change the "git reflog expire" command didn't report any
progress. This is the second command (after "pack-refs --all --prune")
that the "gc" command will run.

On small repositories like this command won't take long to run, my
test system it takes just under 1 second to run on git.git, but just
around 8 seconds on linux.git, and much longer on the
2015-04-03-1M-git.git[1] large test repository.

Taking so long means that "gc" will appear to hang at the beginning of
its run. It might still do so after this change if the earlier
"pack-refs" command takes a really long time to run, but that'll only
impact repositories with a really large set of refs to pack, and can
be addressed in some future change.

One thing that's bad about this change is that we might *in theory*
print a "Marking unreachable commits in reflog for expiry" message for
each ref with a reflog. This is because the abbreviated callstack
looks like this:

    0  mark_reachable at builtin/reflog.c:227
    1  in unreachable at builtin/reflog.c:290
    2  in should_expire_reflog_ent at builtin/reflog.c:317
    3  in expire_reflog_ent at refs/files-backend.c:2956
    4  in show_one_reflog_ent at refs/files-backend.c:1879
    # This is the last function that has the refname (e.g. "HEAD") available
    5  in files_for_each_reflog_ent at refs/files-backend.c:2025
    6  in refs_for_each_reflog_ent at refs.c:2066
    7  in files_reflog_expire at refs/files-backend.c:3043
    8  in refs_reflog_expire at refs.c:2117
    9  in reflog_expire at refs.c:2129
    # Here's where we collect reflogs to expire, and expire each one
    10 in cmd_reflog_expire  at builtin/reflog.c:595

I.e. this progress is being reported for each expired reflog. So if
start_progress() were used instead of start_delayed_progress() we'd
print (e.g. on my git.git) hundreds of these lines.

In practice I haven't been able to make it print anything except one
line. This is because validating the reflogs for these other
branches (not "HEAD") takes such a short amount of time.

That may just be some artifact of the repositories I've tested, but I
suspect It'll be true in general. As the callstack above shows, in
order to guarantee that we don't do that we'd need to pass some
"progress" variable through 10 levels of functions, many of which are
"for_each" callback functions with void* cb_data.

1. https://github.com/avar/2015-04-03-1M-git

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/reflog.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Duy Nguyen Sept. 19, 2018, 4:26 p.m. UTC | #1
On Wed, Sep 19, 2018 at 4:23 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> Before this change the "git reflog expire" command didn't report any
> progress.

I love these progress additions you've been pushing lately :)

> diff --git a/builtin/reflog.c b/builtin/reflog.c
> index 3acef5a0ab..d3075ee75a 100644
> --- a/builtin/reflog.c
> +++ b/builtin/reflog.c
> @@ -10,6 +10,7 @@
>  #include "diff.h"
>  #include "revision.h"
>  #include "reachable.h"
> +#include "progress.h"
>
>  /* NEEDSWORK: switch to using parse_options */
>  static const char reflog_expire_usage[] =
> @@ -225,14 +226,20 @@ static void mark_reachable(struct expire_reflog_policy_cb *cb)
>         struct commit_list *pending;
>         timestamp_t expire_limit = cb->mark_limit;
>         struct commit_list *leftover = NULL;
> +       struct progress *progress = NULL;
> +       int i = 0;
>
>         for (pending = cb->mark_list; pending; pending = pending->next)
>                 pending->item->object.flags &= ~REACHABLE;
>
>         pending = cb->mark_list;
> +       progress = start_delayed_progress(
> +               _("Marking unreachable commits in reflog for expiry"), 0);

Maybe just "Searching expired reflog entries" or something like that.
It's not technically as accurate, but I think it's easier to
understand by by new people.

Do we have --quiet option or something that needs to completely
suppress this progress thing?

>         while (pending) {
>                 struct commit_list *parent;
>                 struct commit *commit = pop_commit(&pending);
> +
> +               display_progress(progress, ++i);

maybe rename it to commit_count or something and leave "i" for
temporary/short lived usage.

>                 if (commit->object.flags & REACHABLE)
>                         continue;
>                 if (parse_commit(commit))
> @@ -253,6 +260,7 @@ static void mark_reachable(struct expire_reflog_policy_cb *cb)
>                 }
>         }
>         cb->mark_list = leftover;
> +       stop_progress(&progress);
>  }
>
>  static int unreachable(struct expire_reflog_policy_cb *cb, struct commit *commit, struct object_id *oid)
> --
> 2.19.0.444.g18242da7ef
>
Ævar Arnfjörð Bjarmason Sept. 19, 2018, 5:22 p.m. UTC | #2
On Wed, Sep 19 2018, Duy Nguyen wrote:

> On Wed, Sep 19, 2018 at 4:23 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> Before this change the "git reflog expire" command didn't report any
>> progress.
>
> I love these progress additions you've been pushing lately :)
>
>> diff --git a/builtin/reflog.c b/builtin/reflog.c
>> index 3acef5a0ab..d3075ee75a 100644
>> --- a/builtin/reflog.c
>> +++ b/builtin/reflog.c
>> @@ -10,6 +10,7 @@
>>  #include "diff.h"
>>  #include "revision.h"
>>  #include "reachable.h"
>> +#include "progress.h"
>>
>>  /* NEEDSWORK: switch to using parse_options */
>>  static const char reflog_expire_usage[] =
>> @@ -225,14 +226,20 @@ static void mark_reachable(struct expire_reflog_policy_cb *cb)
>>         struct commit_list *pending;
>>         timestamp_t expire_limit = cb->mark_limit;
>>         struct commit_list *leftover = NULL;
>> +       struct progress *progress = NULL;
>> +       int i = 0;
>>
>>         for (pending = cb->mark_list; pending; pending = pending->next)
>>                 pending->item->object.flags &= ~REACHABLE;
>>
>>         pending = cb->mark_list;
>> +       progress = start_delayed_progress(
>> +               _("Marking unreachable commits in reflog for expiry"), 0);
>
> Maybe just "Searching expired reflog entries" or something like that.
> It's not technically as accurate, but I think it's easier to
> understand by by new people.

Or "Pruning reflog entries"? I was aiming for some combination of a)
making it clear what we're doing (pruning stuff) b) that we're doing it
on a subset of the counter of the (very large) values we're showing.

So the current one has "a" && "b", "Searching..." has neither, and
"Pruning..." has "a" but not "b".

Maybe making a && b clear isn't that important, but I'm currently
leaning towards keeping the current one because it's not *that* long and
makes things clearer to the user.

> Do we have --quiet option or something that needs to completely
> suppress this progress thing?

Yes. I also see my commit graph process patches sitting in "next" broke
the "git gc --quiet" mode, and I'll need to submit something on top
(which'll be easy), and submit a v2 on this (pending further
comments...).

Is there a better way to test that (fake up the file descriptor check)
in the tests other than adding getenv("GIT_TEST...") to the progress.c
logic?

>>         while (pending) {
>>                 struct commit_list *parent;
>>                 struct commit *commit = pop_commit(&pending);
>> +
>> +               display_progress(progress, ++i);
>
> maybe rename it to commit_count or something and leave "i" for
> temporary/short lived usage.

Good point. Willdo.

>>                 if (commit->object.flags & REACHABLE)
>>                         continue;
>>                 if (parse_commit(commit))
>> @@ -253,6 +260,7 @@ static void mark_reachable(struct expire_reflog_policy_cb *cb)
>>                 }
>>         }
>>         cb->mark_list = leftover;
>> +       stop_progress(&progress);
>>  }
>>
>>  static int unreachable(struct expire_reflog_policy_cb *cb, struct commit *commit, struct object_id *oid)
>> --
>> 2.19.0.444.g18242da7ef
>>
Jeff King Sept. 19, 2018, 7:01 p.m. UTC | #3
On Wed, Sep 19, 2018 at 07:22:44PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Do we have --quiet option or something that needs to completely
> > suppress this progress thing?
> 
> Yes. I also see my commit graph process patches sitting in "next" broke
> the "git gc --quiet" mode, and I'll need to submit something on top
> (which'll be easy), and submit a v2 on this (pending further
> comments...).
> 
> Is there a better way to test that (fake up the file descriptor check)
> in the tests other than adding getenv("GIT_TEST...") to the progress.c
> logic?

The progress code doesn't do the isatty() check at all. The caller has
to do it (and ideally would respect --progress/--no-progress to
override, along with having --quiet imply --no-progress if such an
option exists).

Once you have all that, then you can test --progress explicitly. If you
want to check the isatty() handling, you can use test_terminal().

-Peff
Ævar Arnfjörð Bjarmason Sept. 19, 2018, 8:16 p.m. UTC | #4
On Wed, Sep 19 2018, Jeff King wrote:

> On Wed, Sep 19, 2018 at 07:22:44PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > Do we have --quiet option or something that needs to completely
>> > suppress this progress thing?
>>
>> Yes. I also see my commit graph process patches sitting in "next" broke
>> the "git gc --quiet" mode, and I'll need to submit something on top
>> (which'll be easy), and submit a v2 on this (pending further
>> comments...).
>>
>> Is there a better way to test that (fake up the file descriptor check)
>> in the tests other than adding getenv("GIT_TEST...") to the progress.c
>> logic?
>
> The progress code doesn't do the isatty() check at all. The caller has
> to do it (and ideally would respect --progress/--no-progress to
> override, along with having --quiet imply --no-progress if such an
> option exists).

Yeah, what I was confused by was testing this with "git gc", and until
my recent commit graph progress patches + this (which I wasn't testing
against) the progress output was all from pack-objects, which checks the
--progress option, and then proceeds to ignore all of that and just
check isatty().

> Once you have all that, then you can test --progress explicitly. If you
> want to check the isatty() handling, you can use test_terminal().

Thanks.
Duy Nguyen Sept. 20, 2018, 4:19 p.m. UTC | #5
On Wed, Sep 19, 2018 at 07:22:44PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >> @@ -225,14 +226,20 @@ static void mark_reachable(struct expire_reflog_policy_cb *cb)
> >>         struct commit_list *pending;
> >>         timestamp_t expire_limit = cb->mark_limit;
> >>         struct commit_list *leftover = NULL;
> >> +       struct progress *progress = NULL;
> >> +       int i = 0;
> >>
> >>         for (pending = cb->mark_list; pending; pending = pending->next)
> >>                 pending->item->object.flags &= ~REACHABLE;
> >>
> >>         pending = cb->mark_list;
> >> +       progress = start_delayed_progress(
> >> +               _("Marking unreachable commits in reflog for expiry"), 0);
> >
> > Maybe just "Searching expired reflog entries" or something like that.
> > It's not technically as accurate, but I think it's easier to
> > understand by by new people.
> 
> Or "Pruning reflog entries"? I was aiming for some combination of a)
> making it clear what we're doing (pruning stuff) b) that we're doing it
> on a subset of the counter of the (very large) values we're showing.
> 
> So the current one has "a" && "b", "Searching..." has neither, and
> "Pruning..." has "a" but not "b".
> 
> Maybe making a && b clear isn't that important, but I'm currently
> leaning towards keeping the current one because it's not *that* long and
> makes things clearer to the user.

OK

> >>         while (pending) {
> >>                 struct commit_list *parent;
> >>                 struct commit *commit = pop_commit(&pending);
> >> +
> >> +               display_progress(progress, ++i);
> >
> > maybe rename it to commit_count or something and leave "i" for
> > temporary/short lived usage.
> 
> Good point. Willdo.

Actually I'm still not sure if it's valuable to show the actual commit
count here. Some patch like this could show "activity" without the
number. But this is tangent.

-- 8< --
Subject: [PATCH] progress: add api for displaying a throbber

---
 progress.c | 29 +++++++++++++++++++++++++----
 progress.h |  1 +
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/progress.c b/progress.c
index 5a99c9fbf0..fada556bf0 100644
--- a/progress.c
+++ b/progress.c
@@ -36,6 +36,7 @@ struct progress {
 	unsigned delay;
 	struct throughput *throughput;
 	uint64_t start_ns;
+	int show_throbber;
 };
 
 static volatile sig_atomic_t progress_update;
@@ -81,11 +82,13 @@ static int is_foreground_fd(int fd)
 static int display(struct progress *progress, uint64_t n, const char *done)
 {
 	const char *eol, *tp;
+	static char throbber[] = "\\|/-";
 
 	if (progress->delay && (!progress_update || --progress->delay))
 		return 0;
 
-	progress->last_value = n;
+	if (!progress->show_throbber)
+		progress->last_value = n;
 	tp = (progress->throughput) ? progress->throughput->display.buf : "";
 	eol = done ? done : "   \r";
 	if (progress->total) {
@@ -104,8 +107,15 @@ static int display(struct progress *progress, uint64_t n, const char *done)
 		}
 	} else if (progress_update) {
 		if (is_foreground_fd(fileno(stderr)) || done) {
-			fprintf(stderr, "%s: %"PRIuMAX"%s%s",
-				progress->title, (uintmax_t)n, tp, eol);
+			if (progress->show_throbber) {
+				progress->last_value++;
+				fprintf(stderr, "%s: %c%s%s",
+					progress->title,
+					throbber[progress->last_value % 4],
+					tp, eol);
+			} else
+				fprintf(stderr, "%s: %"PRIuMAX"%s%s",
+					progress->title, (uintmax_t)n, tp, eol);
 			fflush(stderr);
 		}
 		progress_update = 0;
@@ -193,6 +203,14 @@ int display_progress(struct progress *progress, uint64_t n)
 	return progress ? display(progress, n, NULL) : 0;
 }
 
+void display_throbber(struct progress *progress)
+{
+	if (progress) {
+		progress->show_throbber = 1;
+		display(progress, 0, NULL);
+	}
+}
+
 static struct progress *start_progress_delay(const char *title, uint64_t total,
 					     unsigned delay)
 {
@@ -248,7 +266,10 @@ void stop_progress_msg(struct progress **p_progress, const char *msg)
 			throughput_string(&tp->display, tp->curr_total, rate);
 		}
 		progress_update = 1;
-		buf = xstrfmt(", %s.\n", msg);
+		if (progress->show_throbber)
+			buf = xstrfmt(" %s.\n", msg);
+		else
+			buf = xstrfmt(", %s.\n", msg);
 		display(progress, progress->last_value, buf);
 		free(buf);
 	}
diff --git a/progress.h b/progress.h
index 70a4d4a0d6..9d23ff242f 100644
--- a/progress.h
+++ b/progress.h
@@ -5,6 +5,7 @@ struct progress;
 
 void display_throughput(struct progress *progress, uint64_t total);
 int display_progress(struct progress *progress, uint64_t n);
+void display_throbber(struct progress *progress);
 struct progress *start_progress(const char *title, uint64_t total);
 struct progress *start_delayed_progress(const char *title, uint64_t total);
 void stop_progress(struct progress **progress);
Eric Sunshine Sept. 21, 2018, 2:17 a.m. UTC | #6
On Wed, Sep 19, 2018 at 10:10 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> Before this change the "git reflog expire" command didn't report any
> progress. This is the second command (after "pack-refs --all --prune")
> that the "gc" command will run.
> [...]
> That may just be some artifact of the repositories I've tested, but I
> suspect It'll be true in general. As the callstack above shows, in

s/It'll/it'll/

> order to guarantee that we don't do that we'd need to pass some
> "progress" variable through 10 levels of functions, many of which are
> "for_each" callback functions with void* cb_data.
diff mbox series

Patch

diff --git a/builtin/reflog.c b/builtin/reflog.c
index 3acef5a0ab..d3075ee75a 100644
--- a/builtin/reflog.c
+++ b/builtin/reflog.c
@@ -10,6 +10,7 @@ 
 #include "diff.h"
 #include "revision.h"
 #include "reachable.h"
+#include "progress.h"
 
 /* NEEDSWORK: switch to using parse_options */
 static const char reflog_expire_usage[] =
@@ -225,14 +226,20 @@  static void mark_reachable(struct expire_reflog_policy_cb *cb)
 	struct commit_list *pending;
 	timestamp_t expire_limit = cb->mark_limit;
 	struct commit_list *leftover = NULL;
+	struct progress *progress = NULL;
+	int i = 0;
 
 	for (pending = cb->mark_list; pending; pending = pending->next)
 		pending->item->object.flags &= ~REACHABLE;
 
 	pending = cb->mark_list;
+	progress = start_delayed_progress(
+		_("Marking unreachable commits in reflog for expiry"), 0);
 	while (pending) {
 		struct commit_list *parent;
 		struct commit *commit = pop_commit(&pending);
+
+		display_progress(progress, ++i);
 		if (commit->object.flags & REACHABLE)
 			continue;
 		if (parse_commit(commit))
@@ -253,6 +260,7 @@  static void mark_reachable(struct expire_reflog_policy_cb *cb)
 		}
 	}
 	cb->mark_list = leftover;
+	stop_progress(&progress);
 }
 
 static int unreachable(struct expire_reflog_policy_cb *cb, struct commit *commit, struct object_id *oid)