diff mbox series

2.36 gitk/diff-tree --stdin regression fix

Message ID xmqqbkwpvyyc.fsf@gitster.g (mailing list archive)
State Superseded
Headers show
Series 2.36 gitk/diff-tree --stdin regression fix | expand

Commit Message

Junio C Hamano April 25, 2022, 5:45 p.m. UTC
This reverts commit 244c2724 (diff.[ch]: have diff_free() call
clear_pathspec(opts.pathspec), 2022-02-16).

The diff_free() call is to be used after a diffopt structure is used
to compare two sets of paths to release resources that were needed
only for that comparison, and keep the data such as pathspec that
are reused by the diffopt structure to make the next and subsequent
comparison (imagine "git log -p -<options> -- <pathspec>" where the
options and pathspec are kept in the diffopt structure, used to
compare HEAD and HEAD~, then used again when HEAD~ and HEAD~2 are
compared).

We by mistake started clearing the pathspec in diff_free(), so
programs like gitk that runs

    git diff-tree --stdin -- <pathspec>

downstream of a pipe, processing one commit after another, started
showing irrelevant comparison outside the given <pathspec> from the
second commit.

The buggy commit may have been hiding the places where diff
machinery is used only once and called diff_free() to release that
per-comparison resources, but forgetting to call clear_pathspec() to
release the resource held for the (potentially) repeated comparison,
and we eventually would want to add clear_pathspec() to clear
resources to be released after a (potentially repeated) diff session
is done (if there are similar resources other than pathspec that
need to be cleared at the end, we should then know where to clear
them), but that is "per program invocation" leak that will be
cleaned up by calling exit(3) and of lower priority than fixing this
behavior-breaking regression.

Reported-by: Matthias Aßhauer <mha1993@live.de>
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 add-interactive.c | 6 +++---
 blame.c           | 3 +++

 builtin/reset.c   | 1 +
 diff.c            | 1 -
 notes-merge.c     | 2 ++
 5 files changed, 9 insertions(+), 4 deletions(-)

Comments

Phillip Wood April 26, 2022, 10:09 a.m. UTC | #1
On 25/04/2022 18:45, Junio C Hamano wrote:
> This reverts commit 244c2724 (diff.[ch]: have diff_free() call
> clear_pathspec(opts.pathspec), 2022-02-16).
> 
> The diff_free() call is to be used after a diffopt structure is used
> to compare two sets of paths to release resources that were needed
> only for that comparison, and keep the data such as pathspec that
> are reused by the diffopt structure to make the next and subsequent
> comparison (imagine "git log -p -<options> -- <pathspec>" where the
> options and pathspec are kept in the diffopt structure, used to
> compare HEAD and HEAD~, then used again when HEAD~ and HEAD~2 are
> compared).
> 
> We by mistake started clearing the pathspec in diff_free(), so
> programs like gitk that runs
> 
>      git diff-tree --stdin -- <pathspec>
> 
> downstream of a pipe, processing one commit after another, started
> showing irrelevant comparison outside the given <pathspec> from the
> second commit.

I notice from the patch context that we are still calling 
diff_free_ignore_regex(options) which was added in c45dc9cf30 ("diff: 
plug memory leak from regcomp() on {log,diff} -I", 2021-02-11). I think 
that will need reverting as well as it freeing data that is needed when 
options is reused by "diff-tree --stdin" or "log -p".

Best Wishes

Phillip

> The buggy commit may have been hiding the places where diff
> machinery is used only once and called diff_free() to release that
> per-comparison resources, but forgetting to call clear_pathspec() to
> release the resource held for the (potentially) repeated comparison,
> and we eventually would want to add clear_pathspec() to clear
> resources to be released after a (potentially repeated) diff session
> is done (if there are similar resources other than pathspec that
> need to be cleared at the end, we should then know where to clear
> them), but that is "per program invocation" leak that will be
> cleaned up by calling exit(3) and of lower priority than fixing this
> behavior-breaking regression.
> 
> Reported-by: Matthias Aßhauer <mha1993@live.de>
> Helped-by: René Scharfe <l.s.r@web.de>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>   add-interactive.c | 6 +++---
>   blame.c           | 3 +++
> 
>   builtin/reset.c   | 1 +
>   diff.c            | 1 -
>   notes-merge.c     | 2 ++
>   5 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/add-interactive.c b/add-interactive.c
> index e1ab39cce3..6498ae196f 100644
> --- a/add-interactive.c
> +++ b/add-interactive.c
> @@ -797,14 +797,14 @@ static int run_revert(struct add_i_state *s, const struct pathspec *ps,
>   	diffopt.flags.override_submodule_config = 1;
>   	diffopt.repo = s->r;
>   
> -	if (do_diff_cache(&oid, &diffopt)) {
> -		diff_free(&diffopt);
> +	if (do_diff_cache(&oid, &diffopt))
>   		res = -1;
> -	} else {
> +	else {
>   		diffcore_std(&diffopt);
>   		diff_flush(&diffopt);
>   	}
>   	free(paths);
> +	clear_pathspec(&diffopt.pathspec);
>   
>   	if (!res && write_locked_index(s->r->index, &index_lock,
>   				       COMMIT_LOCK) < 0)
> diff --git a/blame.c b/blame.c
> index 401990726e..206c295660 100644
> --- a/blame.c
> +++ b/blame.c
> @@ -1403,6 +1403,7 @@ static struct blame_origin *find_origin(struct repository *r,
>   		}
>   	}
>   	diff_flush(&diff_opts);
> +	clear_pathspec(&diff_opts.pathspec);
>   	return porigin;
>   }
>   
> @@ -1446,6 +1447,7 @@ static struct blame_origin *find_rename(struct repository *r,
>   		}
>   	}
>   	diff_flush(&diff_opts);
> +	clear_pathspec(&diff_opts.pathspec);
>   	return porigin;
>   }
>   
> @@ -2326,6 +2328,7 @@ static void find_copy_in_parent(struct blame_scoreboard *sb,
>   	} while (unblamed);
>   	target->suspects = reverse_blame(leftover, NULL);
>   	diff_flush(&diff_opts);
> +	clear_pathspec(&diff_opts.pathspec);
>   }
>   
>   /*
> diff --git a/builtin/reset.c b/builtin/reset.c
> index 24968dd628..b97745ee94 100644
> --- a/builtin/reset.c
> +++ b/builtin/reset.c
> @@ -274,6 +274,7 @@ static int read_from_tree(const struct pathspec *pathspec,
>   		return 1;
>   	diffcore_std(&opt);
>   	diff_flush(&opt);
> +	clear_pathspec(&opt.pathspec);
>   
>   	return 0;
>   }
> diff --git a/diff.c b/diff.c
> index 0aef3db6e1..c862771a58 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -6345,7 +6345,6 @@ void diff_free(struct diff_options *options)
>   
>   	diff_free_file(options);
>   	diff_free_ignore_regex(options);
> -	clear_pathspec(&options->pathspec);
>   }
>   
>   void diff_flush(struct diff_options *options)
> diff --git a/notes-merge.c b/notes-merge.c
> index 7ba40cfb08..b4a3a903e8 100644
> --- a/notes-merge.c
> +++ b/notes-merge.c
> @@ -175,6 +175,7 @@ static struct notes_merge_pair *diff_tree_remote(struct notes_merge_options *o,
>   		       oid_to_hex(&mp->remote));
>   	}
>   	diff_flush(&opt);
> +	clear_pathspec(&opt.pathspec);
>   
>   	*num_changes = len;
>   	return changes;
> @@ -260,6 +261,7 @@ static void diff_tree_local(struct notes_merge_options *o,
>   		       oid_to_hex(&mp->local));
>   	}
>   	diff_flush(&opt);
> +	clear_pathspec(&opt.pathspec);
>   }
>   
>   static void check_notes_merge_worktree(struct notes_merge_options *o)
Phillip Wood April 26, 2022, 1:45 p.m. UTC | #2
On 26/04/2022 11:09, Phillip Wood wrote:
> On 25/04/2022 18:45, Junio C Hamano wrote:
>> This reverts commit 244c2724 (diff.[ch]: have diff_free() call
>> clear_pathspec(opts.pathspec), 2022-02-16).
>>
>> The diff_free() call is to be used after a diffopt structure is used
>> to compare two sets of paths to release resources that were needed
>> only for that comparison, and keep the data such as pathspec that
>> are reused by the diffopt structure to make the next and subsequent
>> comparison (imagine "git log -p -<options> -- <pathspec>" where the
>> options and pathspec are kept in the diffopt structure, used to
>> compare HEAD and HEAD~, then used again when HEAD~ and HEAD~2 are
>> compared).
>>
>> We by mistake started clearing the pathspec in diff_free(), so
>> programs like gitk that runs
>>
>>      git diff-tree --stdin -- <pathspec>
>>
>> downstream of a pipe, processing one commit after another, started
>> showing irrelevant comparison outside the given <pathspec> from the
>> second commit.
> 
> I notice from the patch context that we are still calling 
> diff_free_ignore_regex(options) which was added in c45dc9cf30 ("diff: 
> plug memory leak from regcomp() on {log,diff} -I", 2021-02-11). I think 
> that will need reverting as well as it freeing data that is needed when 
> options is reused by "diff-tree --stdin" or "log -p".

On further inspection we have tests for "log -p -I<regex>" in t4013 and 
e900d494dc ("diff: add an API for deferred freeing", 2021-02-11) 
modified builtin/log.c to set the new no_free flag so "log" should be 
OK. However "diff-tree --stdin -p -I<regex>" is not as 
builtin/diff-tree.c is unchanged by e900d494dc so the no_free flag is 
not set which I think is the cause of the problems reported here.

I think the close_file changes in e900d494dc should be safe as far as 
"diff-tree" is concerned as it never sets that flag.

In retrospect the no_free flag is pretty ugly and fragile. If we really 
cannot do it another way at least requiring callers to set a flag when 
they want things freeing would avoid nasty surprises like this at the 
expense of leaking when the caller forgets to set it. Perhaps once 
2.36.1 is out we should step back and think about exactly what we're 
trying to achieve by removing these bounded leaks rather than annotating 
them with UNLEAK().

Best Wishes

Phillip

> Best Wishes
> 
> Phillip
> 
>> The buggy commit may have been hiding the places where diff
>> machinery is used only once and called diff_free() to release that
>> per-comparison resources, but forgetting to call clear_pathspec() to
>> release the resource held for the (potentially) repeated comparison,
>> and we eventually would want to add clear_pathspec() to clear
>> resources to be released after a (potentially repeated) diff session
>> is done (if there are similar resources other than pathspec that
>> need to be cleared at the end, we should then know where to clear
>> them), but that is "per program invocation" leak that will be
>> cleaned up by calling exit(3) and of lower priority than fixing this
>> behavior-breaking regression.
>>
>> Reported-by: Matthias Aßhauer <mha1993@live.de>
>> Helped-by: René Scharfe <l.s.r@web.de>
>> Signed-off-by: Junio C Hamano <gitster@pobox.com>
>> ---
>>   add-interactive.c | 6 +++---
>>   blame.c           | 3 +++
>>
>>   builtin/reset.c   | 1 +
>>   diff.c            | 1 -
>>   notes-merge.c     | 2 ++
>>   5 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/add-interactive.c b/add-interactive.c
>> index e1ab39cce3..6498ae196f 100644
>> --- a/add-interactive.c
>> +++ b/add-interactive.c
>> @@ -797,14 +797,14 @@ static int run_revert(struct add_i_state *s, 
>> const struct pathspec *ps,
>>       diffopt.flags.override_submodule_config = 1;
>>       diffopt.repo = s->r;
>> -    if (do_diff_cache(&oid, &diffopt)) {
>> -        diff_free(&diffopt);
>> +    if (do_diff_cache(&oid, &diffopt))
>>           res = -1;
>> -    } else {
>> +    else {
>>           diffcore_std(&diffopt);
>>           diff_flush(&diffopt);
>>       }
>>       free(paths);
>> +    clear_pathspec(&diffopt.pathspec);
>>       if (!res && write_locked_index(s->r->index, &index_lock,
>>                          COMMIT_LOCK) < 0)
>> diff --git a/blame.c b/blame.c
>> index 401990726e..206c295660 100644
>> --- a/blame.c
>> +++ b/blame.c
>> @@ -1403,6 +1403,7 @@ static struct blame_origin *find_origin(struct 
>> repository *r,
>>           }
>>       }
>>       diff_flush(&diff_opts);
>> +    clear_pathspec(&diff_opts.pathspec);
>>       return porigin;
>>   }
>> @@ -1446,6 +1447,7 @@ static struct blame_origin *find_rename(struct 
>> repository *r,
>>           }
>>       }
>>       diff_flush(&diff_opts);
>> +    clear_pathspec(&diff_opts.pathspec);
>>       return porigin;
>>   }
>> @@ -2326,6 +2328,7 @@ static void find_copy_in_parent(struct 
>> blame_scoreboard *sb,
>>       } while (unblamed);
>>       target->suspects = reverse_blame(leftover, NULL);
>>       diff_flush(&diff_opts);
>> +    clear_pathspec(&diff_opts.pathspec);
>>   }
>>   /*
>> diff --git a/builtin/reset.c b/builtin/reset.c
>> index 24968dd628..b97745ee94 100644
>> --- a/builtin/reset.c
>> +++ b/builtin/reset.c
>> @@ -274,6 +274,7 @@ static int read_from_tree(const struct pathspec 
>> *pathspec,
>>           return 1;
>>       diffcore_std(&opt);
>>       diff_flush(&opt);
>> +    clear_pathspec(&opt.pathspec);
>>       return 0;
>>   }
>> diff --git a/diff.c b/diff.c
>> index 0aef3db6e1..c862771a58 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -6345,7 +6345,6 @@ void diff_free(struct diff_options *options)
>>       diff_free_file(options);
>>       diff_free_ignore_regex(options);
>> -    clear_pathspec(&options->pathspec);
>>   }
>>   void diff_flush(struct diff_options *options)
>> diff --git a/notes-merge.c b/notes-merge.c
>> index 7ba40cfb08..b4a3a903e8 100644
>> --- a/notes-merge.c
>> +++ b/notes-merge.c
>> @@ -175,6 +175,7 @@ static struct notes_merge_pair 
>> *diff_tree_remote(struct notes_merge_options *o,
>>                  oid_to_hex(&mp->remote));
>>       }
>>       diff_flush(&opt);
>> +    clear_pathspec(&opt.pathspec);
>>       *num_changes = len;
>>       return changes;
>> @@ -260,6 +261,7 @@ static void diff_tree_local(struct 
>> notes_merge_options *o,
>>                  oid_to_hex(&mp->local));
>>       }
>>       diff_flush(&opt);
>> +    clear_pathspec(&opt.pathspec);
>>   }
>>   static void check_notes_merge_worktree(struct notes_merge_options *o)
Junio C Hamano April 26, 2022, 3:16 p.m. UTC | #3
Phillip Wood <phillip.wood123@gmail.com> writes:

> On further inspection we have tests for "log -p -I<regex>" in t4013
> and e900d494dc ("diff: add an API for deferred freeing", 2021-02-11) 
> modified builtin/log.c to set the new no_free flag so "log" should be
> OK. However "diff-tree --stdin -p -I<regex>" is not as 
> builtin/diff-tree.c is unchanged by e900d494dc so the no_free flag is
> not set which I think is the cause of the problems reported here.

... reported here, meaning some reproduction exists?  It would be
good to have it in the test, next to the ones I added yesterday, I
think.

In any case, I think that is a much older breakage that can be left
after this "oops, where is my pathspec?" regression is dealt with.

> I think the close_file changes in e900d494dc should be safe as far as
> "diff-tree" is concerned as it never sets that flag.
>
> In retrospect the no_free flag is pretty ugly and fragile.

Yes.

> If we
> really cannot do it another way at least requiring callers to set a
> flag when they want things freeing would avoid nasty surprises like
> this at the expense of leaking when the caller forgets to set
> it. Perhaps once 2.36.1 is out we should step back and think about
> exactly what we're trying to achieve by removing these bounded leaks
> rather than annotating them with UNLEAK().

Doubly yes.

There is small per-task resources allocated that is not proportional
to the size of the task, i.e. "git log -p" may need more resource at
"peak" in a project with 100k files than in a project with 1k files,
and we do not want to leak these resources we use to compare two sets
of 100k (or 1k) files between "commit^" and "commit".  It may allocate
and deallocate more times in a project with 100k commits than in a
project with 1k commits, and we do not want to leak 100 times more
resources in the former project than the latter.  Aiming to reclaim
these resources needed proportinally to the size of the task is
absolutely a good thing to do.

But the final clean-up for the very top-level allocations that is
not proportional to the size of the task, like pathspec, regex, and
other result from parsing command line options and configuration
variables?  Using UNLEAK() to squelch the leak checker and letting
process termination to reclaim them is absolutely a no-cost solution
that is much better.
Junio C Hamano April 26, 2022, 3:26 p.m. UTC | #4
I wonder if we would have caught a regression like the one if we
used FREE_AND_NULL more sparingly.  For example, if we prematurely
called clear_pathspec(), the second iteration, because there is
free-and-null of pathspec->items and resetting pathspec->nr to 0,
would behave very normally as if there is no pathspec.  If we just
freed things, without NULLing them out or resetting .nr to 0, the
second iteration would try to access garbage and hopefully we will
catch a crash before such a code would have escaped the lab.

In any case, based on what I heard here, it appears that mimicking
"git log" does may probably be a better way to deal with this
regression?  As you said, all the other things diff_free() calls are
unwanted while "diff-tree --stdin" is still working, just like
"log"?

Thanks.
diff mbox series

Patch

diff --git a/add-interactive.c b/add-interactive.c
index e1ab39cce3..6498ae196f 100644
--- a/add-interactive.c
+++ b/add-interactive.c
@@ -797,14 +797,14 @@  static int run_revert(struct add_i_state *s, const struct pathspec *ps,
 	diffopt.flags.override_submodule_config = 1;
 	diffopt.repo = s->r;
 
-	if (do_diff_cache(&oid, &diffopt)) {
-		diff_free(&diffopt);
+	if (do_diff_cache(&oid, &diffopt))
 		res = -1;
-	} else {
+	else {
 		diffcore_std(&diffopt);
 		diff_flush(&diffopt);
 	}
 	free(paths);
+	clear_pathspec(&diffopt.pathspec);
 
 	if (!res && write_locked_index(s->r->index, &index_lock,
 				       COMMIT_LOCK) < 0)
diff --git a/blame.c b/blame.c
index 401990726e..206c295660 100644
--- a/blame.c
+++ b/blame.c
@@ -1403,6 +1403,7 @@  static struct blame_origin *find_origin(struct repository *r,
 		}
 	}
 	diff_flush(&diff_opts);
+	clear_pathspec(&diff_opts.pathspec);
 	return porigin;
 }
 
@@ -1446,6 +1447,7 @@  static struct blame_origin *find_rename(struct repository *r,
 		}
 	}
 	diff_flush(&diff_opts);
+	clear_pathspec(&diff_opts.pathspec);
 	return porigin;
 }
 
@@ -2326,6 +2328,7 @@  static void find_copy_in_parent(struct blame_scoreboard *sb,
 	} while (unblamed);
 	target->suspects = reverse_blame(leftover, NULL);
 	diff_flush(&diff_opts);
+	clear_pathspec(&diff_opts.pathspec);
 }
 
 /*
diff --git a/builtin/reset.c b/builtin/reset.c
index 24968dd628..b97745ee94 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -274,6 +274,7 @@  static int read_from_tree(const struct pathspec *pathspec,
 		return 1;
 	diffcore_std(&opt);
 	diff_flush(&opt);
+	clear_pathspec(&opt.pathspec);
 
 	return 0;
 }
diff --git a/diff.c b/diff.c
index 0aef3db6e1..c862771a58 100644
--- a/diff.c
+++ b/diff.c
@@ -6345,7 +6345,6 @@  void diff_free(struct diff_options *options)
 
 	diff_free_file(options);
 	diff_free_ignore_regex(options);
-	clear_pathspec(&options->pathspec);
 }
 
 void diff_flush(struct diff_options *options)
diff --git a/notes-merge.c b/notes-merge.c
index 7ba40cfb08..b4a3a903e8 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -175,6 +175,7 @@  static struct notes_merge_pair *diff_tree_remote(struct notes_merge_options *o,
 		       oid_to_hex(&mp->remote));
 	}
 	diff_flush(&opt);
+	clear_pathspec(&opt.pathspec);
 
 	*num_changes = len;
 	return changes;
@@ -260,6 +261,7 @@  static void diff_tree_local(struct notes_merge_options *o,
 		       oid_to_hex(&mp->local));
 	}
 	diff_flush(&opt);
+	clear_pathspec(&opt.pathspec);
 }
 
 static void check_notes_merge_worktree(struct notes_merge_options *o)