diff mbox series

[v2,5/7] ls-files: fix a trivial dir_clear() leak

Message ID patch-v2-5.7-73cf1018953-20211007T100014Z-avarab@gmail.com (mailing list archive)
State Accepted
Commit eab4ac6a233e505e78fc8b04df03d58628d5ff3e
Headers show
Series leak tests: fix "test-tool" & other small leaks | expand

Commit Message

Ævar Arnfjörð Bjarmason Oct. 7, 2021, 10:01 a.m. UTC
Fix an edge case that was missed when the dir_clear() call was added
in eceba532141 (dir: fix problematic API to avoid memory leaks,
2020-08-18), we need to also clean up when we're about to exit with
non-zero.

That commit says, on the topic of the dir_clear() API and UNLEAK():

    [...]two of them clearly thought about leaks since they had an
    UNLEAK(dir) directive, which to me suggests that the method to
    free the data was too unclear.

I think that 0e5bba53af7 (add UNLEAK annotation for reducing leak
false positives, 2017-09-08) which added the UNLEAK() makes it clear
that that wasn't the case, rather it was the desire to avoid the
complexity of freeing the memory at the end of the program.

This does add a bit of complexity, but I think it's worth it to just
fix these leaks when it's easy in built-ins. It allows them to serve
as canaries for underlying APIs that shouldn't be leaking, it
encourages us to make those freeing APIs nicer for all their users,
and it prevents other leaking regressions by being able to mark the
entire test as TEST_PASSES_SANITIZE_LEAK=true.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/ls-files.c                | 13 +++++--------
 t/t3005-ls-files-relative.sh      |  1 +
 t/t3020-ls-files-error-unmatch.sh |  2 ++
 t/t3700-add.sh                    |  1 +
 t/t7104-reset-hard.sh             |  1 +
 5 files changed, 10 insertions(+), 8 deletions(-)

Comments

Junio C Hamano Oct. 7, 2021, 10:46 p.m. UTC | #1
Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> This does add a bit of complexity, but I think it's worth it to just
> fix these leaks when it's easy in built-ins. It allows them to serve
> as canaries for underlying APIs that shouldn't be leaking, it
> encourages us to make those freeing APIs nicer for all their users,
> and it prevents other leaking regressions by being able to mark the
> entire test as TEST_PASSES_SANITIZE_LEAK=true.

This does more than necessary, though.  Introducing "ret", replacing
an early return with an assignment to it, and returning "ret"
instead of hardcoded 0, would have been the "fix a trivial leak",
and the "ah, report_path_error() always returns true" does not
belong here.

These things look small, but small things add up.
Ævar Arnfjörð Bjarmason Oct. 13, 2021, 1:39 p.m. UTC | #2
On Thu, Oct 07 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> This does add a bit of complexity, but I think it's worth it to just
>> fix these leaks when it's easy in built-ins. It allows them to serve
>> as canaries for underlying APIs that shouldn't be leaking, it
>> encourages us to make those freeing APIs nicer for all their users,
>> and it prevents other leaking regressions by being able to mark the
>> entire test as TEST_PASSES_SANITIZE_LEAK=true.
>
> This does more than necessary, though.  Introducing "ret", replacing
> an early return with an assignment to it, and returning "ret"
> instead of hardcoded 0, would have been the "fix a trivial leak",
> and the "ah, report_path_error() always returns true" does not
> belong here.
>
> These things look small, but small things add up.

I think you mean that you'd have liked:

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index a2000ed6bf2..5e6b6f2d4a0 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -672,6 +672,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
                         N_("suppress duplicate entries")),
                OPT_END()
        };
+       int ret = 0;
 
        if (argc == 2 && !strcmp(argv[1], "-h"))
                usage_with_options(ls_files_usage, builtin_ls_files_options);
@@ -776,15 +777,13 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
                show_ru_info(the_repository->index);
 
        if (ps_matched) {
-               int bad;
-               bad = report_path_error(ps_matched, &pathspec);
-               if (bad)
+               if (report_path_error(ps_matched, &pathspec)) {
                        fprintf(stderr, "Did you forget to 'git add'?\n");
-
-               return bad ? 1 : 0;
+                       ret = 1;
+               }
        }
 
        dir_clear(&dir);
        free(max_prefix);
-       return 0;
+       return ret;
 }

Instead of this:

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index a2000ed6bf2..fcc685947f9 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -672,6 +672,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
                         N_("suppress duplicate entries")),
                OPT_END()
        };
+       int ret = 0;
 
        if (argc == 2 && !strcmp(argv[1], "-h"))
                usage_with_options(ls_files_usage, builtin_ls_files_options);
@@ -775,16 +776,12 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
        if (show_resolve_undo)
                show_ru_info(the_repository->index);
 
-       if (ps_matched) {
-               int bad;
-               bad = report_path_error(ps_matched, &pathspec);
-               if (bad)
-                       fprintf(stderr, "Did you forget to 'git add'?\n");
-
-               return bad ? 1 : 0;
+       if (ps_matched && report_path_error(ps_matched, &pathspec)) {
+               fprintf(stderr, "Did you forget to 'git add'?\n");
+               ret = 1;
        }
 
        dir_clear(&dir);
        free(max_prefix);
-       return 0;
+       return ret;
 }

Doesn't make much sense, but I can re-roll with it if you feel strongly
about it. I think the current version is ready to be picked up.

Yeah we should avoid refactoring-while-at-it, but in cases where a patch
removes the only reason a nested if/if statement exists, unrolling it
into a single "if" handly seems too much. I think the alternative just
leaves the reader squinting at the diff wondering why we still need that
nesting...
Junio C Hamano Oct. 13, 2021, 7:01 p.m. UTC | #3
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> -       if (ps_matched) {
> -               int bad;
> -               bad = report_path_error(ps_matched, &pathspec);
> -               if (bad)
> -                       fprintf(stderr, "Did you forget to 'git add'?\n");
> -
> -               return bad ? 1 : 0;
> +       if (ps_matched && report_path_error(ps_matched, &pathspec)) {
> +               fprintf(stderr, "Did you forget to 'git add'?\n");
> +               ret = 1;
>         }
>  
>         dir_clear(&dir);
>         free(max_prefix);
> -       return 0;
> +       return ret;
>  }
>
> Doesn't make much sense, but I can re-roll with it if you feel strongly
> about it. I think the current version is ready to be picked up.

I do not see where that "doesn't make much sense" comes from.  If it
does not make sense, I wouldn't have mentioned it.

> Yeah we should avoid refactoring-while-at-it, but in cases where a patch
> removes the only reason a nested if/if statement exists, unrolling it

And I do not quite see what "the only reason" is in this case, or
what it has to do with the restructuring, either.  Care to either
clarify, or fix the patch, or perhaps both?

Thanks.
Ævar Arnfjörð Bjarmason Oct. 14, 2021, 12:15 a.m. UTC | #4
On Wed, Oct 13 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> -       if (ps_matched) {
>> -               int bad;
>> -               bad = report_path_error(ps_matched, &pathspec);
>> -               if (bad)
>> -                       fprintf(stderr, "Did you forget to 'git add'?\n");
>> -
>> -               return bad ? 1 : 0;
>> +       if (ps_matched && report_path_error(ps_matched, &pathspec)) {
>> +               fprintf(stderr, "Did you forget to 'git add'?\n");
>> +               ret = 1;
>>         }
>>  
>>         dir_clear(&dir);
>>         free(max_prefix);
>> -       return 0;
>> +       return ret;
>>  }
>>
>> Doesn't make much sense, but I can re-roll with it if you feel strongly
>> about it. I think the current version is ready to be picked up.
>
> I do not see where that "doesn't make much sense" comes from.  If it
> does not make sense, I wouldn't have mentioned it.

I meant "I don't think that makes much sense".

>> Yeah we should avoid refactoring-while-at-it, but in cases where a patch
>> removes the only reason a nested if/if statement exists, unrolling it
>
> And I do not quite see what "the only reason" is in this case, or
> what it has to do with the restructuring, either.  Care to either
> clarify, or fix the patch, or perhaps both?

I mean that we generally don't write code like:

	if (x) {
		if (y) {
			fprintf(...);
			ret = 1;
		}
	}

And instead write:

	if (x && y) {
		fprintf(...);
		ret = 1;
	}

So aside from the specific change here I thought your objection would
also apply to e.g. removing braces from a standalone "if" as it's
reduced to a one statement body, which we we generally do.

But reading your upthread:

    [...]the "ah, report_path_error() always returns true" does not
    belong here.

I think the objection might be in folding that in particular into the
"if" statement?

The point of this change is that if report_path_error() was non-zero
we'd skip the freeing before, but shouldn't.

So any change here would have wanted to fall through the "if", but we
need to do the fprintf() if report_path_error returns true.

So I don't really get that "always returns true" comment means in this
context. It returns true on error, and dealing with that code branch in
relation to the freeing is what this change needs to concern itself
with.
Junio C Hamano Oct. 14, 2021, 5:38 p.m. UTC | #5
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Wed, Oct 13 2021, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>
>>> -       if (ps_matched) {
>>> -               int bad;
>>> -               bad = report_path_error(ps_matched, &pathspec);
>>> -               if (bad)
>>> -                       fprintf(stderr, "Did you forget to 'git add'?\n");
>>> -
>>> -               return bad ? 1 : 0;
>>> +       if (ps_matched && report_path_error(ps_matched, &pathspec)) {
>>> +               fprintf(stderr, "Did you forget to 'git add'?\n");
>>> +               ret = 1;
>>>         }

 ...

> I mean that we generally don't write code like:
>
> 	if (x) {
> 		if (y) {
> 			fprintf(...);
> 			ret = 1;
> 		}
> 	}
>
> And instead write:
>
> 	if (x && y) {
> 		fprintf(...);
> 		ret = 1;
> 	}
>
> So aside from the specific change here I thought your objection would
> also apply to e.g. removing braces from a standalone "if" as it's
> reduced to a one statement body, which we we generally do.

Yes, but in a "plug-leak-only" fix, I would have expected

	if (ps_matched) {
		int bad;
		bad = report_path_error(ps_matched, &pathspec);
		if (bad)
			fprintf(stderr, "Did you forget to 'git add'?\n");

-		return bad ? 1 : 0;
+		ret = bad ? 1 : 0;

	}

Doing anything more than that is "while at it clean-up".
diff mbox series

Patch

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index a2000ed6bf2..fcc685947f9 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -672,6 +672,7 @@  int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 			 N_("suppress duplicate entries")),
 		OPT_END()
 	};
+	int ret = 0;
 
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(ls_files_usage, builtin_ls_files_options);
@@ -775,16 +776,12 @@  int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 	if (show_resolve_undo)
 		show_ru_info(the_repository->index);
 
-	if (ps_matched) {
-		int bad;
-		bad = report_path_error(ps_matched, &pathspec);
-		if (bad)
-			fprintf(stderr, "Did you forget to 'git add'?\n");
-
-		return bad ? 1 : 0;
+	if (ps_matched && report_path_error(ps_matched, &pathspec)) {
+		fprintf(stderr, "Did you forget to 'git add'?\n");
+		ret = 1;
 	}
 
 	dir_clear(&dir);
 	free(max_prefix);
-	return 0;
+	return ret;
 }
diff --git a/t/t3005-ls-files-relative.sh b/t/t3005-ls-files-relative.sh
index 727e9ae1a44..6ba8b589cd0 100755
--- a/t/t3005-ls-files-relative.sh
+++ b/t/t3005-ls-files-relative.sh
@@ -5,6 +5,7 @@  test_description='ls-files tests with relative paths
 This test runs git ls-files with various relative path arguments.
 '
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success 'prepare' '
diff --git a/t/t3020-ls-files-error-unmatch.sh b/t/t3020-ls-files-error-unmatch.sh
index 124e73b8e60..2cbcbc0721b 100755
--- a/t/t3020-ls-files-error-unmatch.sh
+++ b/t/t3020-ls-files-error-unmatch.sh
@@ -9,6 +9,8 @@  This test runs git ls-files --error-unmatch to ensure it correctly
 returns an error when a non-existent path is provided on the command
 line.
 '
+
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success 'setup' '
diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index 4086e1ebbc9..283a66955d6 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -5,6 +5,7 @@ 
 
 test_description='Test of git add, including the -- option.'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 # Test the file mode "$1" of the file "$2" in the index.
diff --git a/t/t7104-reset-hard.sh b/t/t7104-reset-hard.sh
index 7948ec392b3..cf9697eba9a 100755
--- a/t/t7104-reset-hard.sh
+++ b/t/t7104-reset-hard.sh
@@ -2,6 +2,7 @@ 
 
 test_description='reset --hard unmerged'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success setup '