diff mbox series

[v4] git-log: add a --since=... --as-filter option

Message ID YlnYDgZRzDI87b/z@vmiklos.hu (mailing list archive)
State New, archived
Headers show
Series [v4] git-log: add a --since=... --as-filter option | expand

Commit Message

Miklos Vajna April 15, 2022, 8:39 p.m. UTC
This is similar to --since, but it will filter out not matching commits,
rather than stopping at the first not matching commit.

This is useful if you e.g. want to list the commits from the last year,
but one odd commit has a bad commit date and that would hide lots of
earlier commits in that range.

The behavior of --since is left unchanged, since it's valid to depend on
its current behavior.

Signed-off-by: Miklos Vajna <vmiklos@vmiklos.hu>
---

Hi Ævar,

On Tue, Apr 12, 2022 at 10:47:15AM +0200, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> > +--as-filter::
> > +	When combined with `--since=<date>`, show all commits more recent than
> > +	a specific date. This visits all commits in the range, rather than stopping at
> > +	the first commit which is older than a specific date.
> 
> I wonder if we should be more future-proof here and say that we'll run
> anything as a filter, and that --since is the one option currently
> affected.
> 
> But maybe there's no reason to do so...

My understanding is that in practice --since is the only option that
terminates the revision walk on the first match, so I would argue there
is no need for this.

> In any case these docs are inaccurate because they cover --since, but if
> you check revision.c we'll set "max_age" on other options too
> (synonyms?).

Good catch, I've added --max-age and --after as well.

> All in all I wonder if this wouldn't be much more understandable if we
> advertised is as another option to do "HISTORY SIMPLIFICATION", which
> looking at e.g. get_commit_action() and "prune" is kind of what we're
> doing with the existing --since behavior.

Makes sense, we kind of simplify history by default here & then this
option could be documented as one that modifies this terminating
behavior.

> I think it's good to do this as a general mechanism, but if you now
> remove the "max_age" field from "struct rev_info" and:
> 
> 	make -k
> 
> You'll see a bunch of callers who check "max_age" outside of revision.c,
> since those will accept these revision options are they doing the right
> thing now too?

I found the following callers:

- some builtins that want to make sure that no history limiting is used,
  an additional --as-filter doesn't change behavior there

- blame: this has its own commit walking loop, so --as-filter doesn't
  change any behavior here unintentionally.

- bundle: --since is not used for revision walking here, just to check
  what tags to include/exclude, so this is already not terminating

> In any case we should have tests for those callers, i.e. blame, bundle
> etc.

t/t5607-clone-bundle.sh already tests bundle --since. I've added a new
t/t4218-blame-limit.sh to test blame --since, it seems there were no
tests for this so far.

Thanks,

Miklos

 Documentation/rev-list-options.txt |  6 +++++
 revision.c                         | 13 +++++++++--
 revision.h                         |  1 +
 t/t4217-log-limit.sh               | 36 ++++++++++++++++++++++++++++++
 t/t4218-blame-limit.sh             | 36 ++++++++++++++++++++++++++++++
 5 files changed, 90 insertions(+), 2 deletions(-)
 create mode 100755 t/t4217-log-limit.sh
 create mode 100755 t/t4218-blame-limit.sh

Comments

Junio C Hamano April 15, 2022, 11:13 p.m. UTC | #1
Miklos Vajna <vmiklos@vmiklos.hu> writes:

> This is similar to --since, but it will filter out not matching commits,
> rather than stopping at the first not matching commit.
>
> This is useful if you e.g. want to list the commits from the last year,
> but one odd commit has a bad commit date and that would hide lots of
> earlier commits in that range.
>
> The behavior of --since is left unchanged, since it's valid to depend on
> its current behavior.

The above is good if it were a new --since-as-filter option.  Adding
"--as-filter" as a separate option to be used in conjunction with
"--since" would fundamentally mean "The behaviour of --since is left
unchanged" cannot possibly be true.

A suggested rewrite.  I label each section and highlight its point,
but in the end product, you should just separate them with a blank
line, making each of them into its own paragraph.

Title (single line, short and to the point)

    log: "--as-filter" option adjusts how "--since" cut-off works

Intro (observation of the current state)

    The "--since=<time>" option of "git log" limits the commits
    displayed by the command by stopping the traversal once it
    sees a commit whose timestamp is older than the given time and
    not digging further into its parents.

Problem description (pros and cons of the current state)

    This is OK in a history where a commit always has a newer
    timestamp than any of its parents'.  Once you see a commit older
    than the given <time>, all ancestor commits of it are even older
    than the time anyway.  It poses, however, a problem when there
    is a commit with a wrong timestamp that makes it appear older
    than its parents.  Stopping traversal at the "incorrectly old"
    commit will hide its ancestors that are newer than that wrong
    commit and are newer than the cut-off time given with the --since
    option.  --max-age and --after being the synonyms to --since,
    they share the same issue.

Solution (give orders to the codebase to "be like so")

    Add a new "--as-filter" option that modifies how "--since=<time>"
    is used.  Instead of stopping the traversal to hide an old
    enough commit and its all ancestors, exclude commits with an old
    timestamp from the output but still keep digging the history.

Other comments (caveats, etc.)

    Without other traversal stopping options, this will force the
    command in "git log" family to dig down the history to the root.
    It may be an acceptable cost for a small project with short
    history and many commits with screwy timestamps.

> diff --git a/revision.c b/revision.c
> index 7d435f8048..ff018c3976 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -1440,6 +1440,9 @@ static int limit_list(struct rev_info *revs)
>  		if (revs->min_age != -1 && (commit->date > revs->min_age) &&
>  		    !revs->line_level_traverse)
>  			continue;
> +		if (revs->max_age != -1 && revs->as_filter && (commit->date < revs->max_age) &&
> +		    !revs->line_level_traverse)

That's an overly long line.

> +			continue;

In any case, isn't this too late in limit_list() to adjust --since?
There is this logic earlier in the loop:

	while (original_list) {
		struct commit *commit = pop_commit(&original_list);
		struct object *obj = &commit->object;
		show_early_output_fn_t show;

		if (commit == interesting_cache)
			interesting_cache = NULL;

		if (revs->max_age != -1 && (commit->date < revs->max_age))
			obj->flags |= UNINTERESTING;
		if (process_parents(revs, commit, &original_list, NULL) < 0)
			return -1;

We look at max_age and if it is set and newer than the timestamp of
the commit we are looking at, we immediately mark that commit
uninteresting so that this and all its ancestors are excluded from
the output.  Don't you want to disable that logic so that the
traversal continues?

> @@ -3862,6 +3868,9 @@ enum commit_action get_commit_action(struct rev_info *revs, struct commit *commi
>  	if (revs->min_age != -1 &&
>  	    comparison_date(revs, commit) > revs->min_age)
>  			return commit_ignore;
> +	if (revs->max_age != -1 && revs->as_filter &&
> +	    comparison_date(revs, commit) < revs->max_age)
> +			return commit_ignore;
>  	if (revs->min_parents || (revs->max_parents >= 0)) {
>  		int n = commit_list_count(commit->parents);
>  		if ((n < revs->min_parents) ||

I am not sure how --since should affect (or not affect) the history
simplification, but my gut feeling says this may cause unintended
fallout.  I do not have time to dig deeper today, but something to
keep in mind...

> diff --git a/t/t4217-log-limit.sh b/t/t4217-log-limit.sh
> new file mode 100755
> index 0000000000..2a3705c714
> --- /dev/null
> +++ b/t/t4217-log-limit.sh
> @@ -0,0 +1,36 @@
> +#!/bin/sh
> +
> +test_description='git log with filter options limiting the output'

OK.

> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME

Does any of the tests in this file care?  The above is a mechanism
primarily for transitioning scripts that were written long time ago,
and newly written tests that do not have to care about the initial
branch should not need it (they can instead explicitly create a
branch and work on it, if they really need to refer the initial
branch by name, but as far as I can tell, none your test in this
file even needs to refer to any branch with any name).

> +
> +. ./test-lib.sh
> +
> +GIT_TEST_COMMIT_GRAPH=0
> +GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0

Do these matter, and should these matter?  Does anybody export them
to make any difference in this test?

> +test_expect_success 'setup test' '
> +	git init &&
> +	echo a > file &&

Style:

	echo a >file &&

(cf. Documentation/CodingGuidelines)

> +	git add file &&
> +	GIT_COMMITTER_DATE="2021-02-01 0:00" git commit -m init &&

It is somewhat annoying to see 00:00 spelled as 0:00 here (and below).

> +	echo a >> file &&
> +	git add file &&
> +	GIT_COMMITTER_DATE="2022-02-01 0:00" git commit -m first &&
> +	echo a >> file &&
> +	git add file &&
> +	GIT_COMMITTER_DATE="2021-03-01 0:00" git commit -m second &&
> +	echo a >> file &&
> +	git add file &&
> +	GIT_COMMITTER_DATE="2022-03-01 0:00" git commit -m third
> +'
> +
> +test_expect_success 'git log --since=... --as-filter' '
> +	git log --since="2022-01-01" --as-filter --pretty="format:%s" > actual &&

I think you meant --format=%s (which is --pretty="tformat:%s"), as
you do not want "actual" to end in an incomplete line.  The same
"no space between the redirection operator and its target" applies
here as everywhere else.

> +	! test_i18ngrep init actual &&
> +	test_i18ngrep first actual &&
> +	! test_i18ngrep second actual &&
> +	test_i18ngrep third actual

test_i18ngrep -> grep

But stepping back a bit, do we or do we not care the order in which
first and third appear in the "actual" file?  It's not like we have
a mergy history and two commits that cannot topologically be
compared.  We have a simple linear single strand of pearls here, and
if you start traversing from the tip, you'll reliably see third
first, then second, then first and then finally init, in that order
and in no other order.  So, wouldn't we be better off writing it
more like ...

	git log --since=2022-01-01 --as-filter --format=%s >actual &&
	cat >expect <<-\EOF &&
	third
	first
	EOF
	test_cmp expect actual

... i.e. explicitly spell out what we expect not to change?

Thanks.
diff mbox series

Patch

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index fd4f4e26c9..354bd29f10 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -350,6 +350,12 @@  The following options select the commits to be shown:
 <paths>::
 	Commits modifying the given <paths> are selected.
 
+--as-filter::
+	When combined with `--max-age=<date>`, `--since=<date>` or
+	`--after=<date>`, show all commits more recent than a specific date. This
+	visits all commits in the range, rather than stopping at the first commit
+	which is older than a specific date.
+
 --simplify-by-decoration::
 	Commits that are referred by some branch or tag are selected.
 
diff --git a/revision.c b/revision.c
index 7d435f8048..ff018c3976 100644
--- a/revision.c
+++ b/revision.c
@@ -1440,6 +1440,9 @@  static int limit_list(struct rev_info *revs)
 		if (revs->min_age != -1 && (commit->date > revs->min_age) &&
 		    !revs->line_level_traverse)
 			continue;
+		if (revs->max_age != -1 && revs->as_filter && (commit->date < revs->max_age) &&
+		    !revs->line_level_traverse)
+			continue;
 		date = commit->date;
 		p = &commit_list_insert(commit, p)->next;
 
@@ -1838,6 +1841,7 @@  void repo_init_revisions(struct repository *r,
 	revs->dense = 1;
 	revs->prefix = prefix;
 	revs->max_age = -1;
+	revs->as_filter = 0;
 	revs->min_age = -1;
 	revs->skip_count = -1;
 	revs->max_count = -1;
@@ -2218,6 +2222,8 @@  static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if ((argcount = parse_long_opt("since", argv, &optarg))) {
 		revs->max_age = approxidate(optarg);
 		return argcount;
+	} else if (!strcmp(arg, "--as-filter")) {
+		revs->as_filter = 1;
 	} else if ((argcount = parse_long_opt("after", argv, &optarg))) {
 		revs->max_age = approxidate(optarg);
 		return argcount;
@@ -3365,7 +3371,7 @@  static void explore_walk_step(struct rev_info *revs)
 	if (revs->sort_order == REV_SORT_BY_AUTHOR_DATE)
 		record_author_date(&info->author_date, c);
 
-	if (revs->max_age != -1 && (c->date < revs->max_age))
+	if (revs->max_age != -1 && !revs->as_filter && (c->date < revs->max_age))
 		c->object.flags |= UNINTERESTING;
 
 	if (process_parents(revs, c, NULL, NULL) < 0)
@@ -3862,6 +3868,9 @@  enum commit_action get_commit_action(struct rev_info *revs, struct commit *commi
 	if (revs->min_age != -1 &&
 	    comparison_date(revs, commit) > revs->min_age)
 			return commit_ignore;
+	if (revs->max_age != -1 && revs->as_filter &&
+	    comparison_date(revs, commit) < revs->max_age)
+			return commit_ignore;
 	if (revs->min_parents || (revs->max_parents >= 0)) {
 		int n = commit_list_count(commit->parents);
 		if ((n < revs->min_parents) ||
@@ -4019,7 +4028,7 @@  static struct commit *get_revision_1(struct rev_info *revs)
 		 * that we'd otherwise have done in limit_list().
 		 */
 		if (!revs->limited) {
-			if (revs->max_age != -1 &&
+			if (revs->max_age != -1 && !revs->as_filter &&
 			    comparison_date(revs, commit) < revs->max_age)
 				continue;
 
diff --git a/revision.h b/revision.h
index 5bc59c7bfe..fe37ebd83d 100644
--- a/revision.h
+++ b/revision.h
@@ -263,6 +263,7 @@  struct rev_info {
 	int skip_count;
 	int max_count;
 	timestamp_t max_age;
+	int as_filter;
 	timestamp_t min_age;
 	int min_parents;
 	int max_parents;
diff --git a/t/t4217-log-limit.sh b/t/t4217-log-limit.sh
new file mode 100755
index 0000000000..2a3705c714
--- /dev/null
+++ b/t/t4217-log-limit.sh
@@ -0,0 +1,36 @@ 
+#!/bin/sh
+
+test_description='git log with filter options limiting the output'
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+GIT_TEST_COMMIT_GRAPH=0
+GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0
+
+test_expect_success 'setup test' '
+	git init &&
+	echo a > file &&
+	git add file &&
+	GIT_COMMITTER_DATE="2021-02-01 0:00" git commit -m init &&
+	echo a >> file &&
+	git add file &&
+	GIT_COMMITTER_DATE="2022-02-01 0:00" git commit -m first &&
+	echo a >> file &&
+	git add file &&
+	GIT_COMMITTER_DATE="2021-03-01 0:00" git commit -m second &&
+	echo a >> file &&
+	git add file &&
+	GIT_COMMITTER_DATE="2022-03-01 0:00" git commit -m third
+'
+
+test_expect_success 'git log --since=... --as-filter' '
+	git log --since="2022-01-01" --as-filter --pretty="format:%s" > actual &&
+	! test_i18ngrep init actual &&
+	test_i18ngrep first actual &&
+	! test_i18ngrep second actual &&
+	test_i18ngrep third actual
+'
+
+test_done
diff --git a/t/t4218-blame-limit.sh b/t/t4218-blame-limit.sh
new file mode 100755
index 0000000000..03f513f331
--- /dev/null
+++ b/t/t4218-blame-limit.sh
@@ -0,0 +1,36 @@ 
+#!/bin/sh
+
+test_description='git blame with filter options limiting the output'
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+GIT_TEST_COMMIT_GRAPH=0
+GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0
+
+test_expect_success 'setup test' '
+	git init &&
+	echo a > file &&
+	git add file &&
+	GIT_AUTHOR_DATE="2020-01-01 0:00" GIT_COMMITTER_DATE="2020-01-01 0:00" git commit -m init &&
+	echo a >> file &&
+	git add file &&
+	GIT_AUTHOR_DATE="2020-02-01 0:00" GIT_COMMITTER_DATE="2020-02-01 0:00" git commit -m first &&
+	echo a >> file &&
+	git add file &&
+	GIT_AUTHOR_DATE="2020-03-01 0:00" GIT_COMMITTER_DATE="2020-03-01 0:00" git commit -m second &&
+	echo a >> file &&
+	git add file &&
+	GIT_AUTHOR_DATE="2020-04-01 0:00" GIT_COMMITTER_DATE="2020-04-01 0:00" git commit -m third
+'
+
+test_expect_success 'git blame --since=...' '
+	git blame --since="2020-02-15" file > actual &&
+	! test_i18ngrep 2020-01-01 actual &&
+	test_i18ngrep 2020-02-01 actual &&
+	test_i18ngrep 2020-03-01 actual &&
+	test_i18ngrep 2020-04-01 actual
+'
+
+test_done