diff mbox series

[02/11] submodule--helper: replace memset() with { 0 }-initialization

Message ID patch-02.11-4049362e9b4-20220713T131601Z-avarab@gmail.com (mailing list archive)
State Superseded
Headers show
Series submodule--helper: fix memory leaks | expand

Commit Message

Ævar Arnfjörð Bjarmason July 13, 2022, 1:16 p.m. UTC
Use the less verbose { 0 }-initialization syntax rather than memset()
in builtin/submodule--helper.c, this doesn't make a difference in
terms of behavior, but as we're about to modify adjacent code makes
this more consistent, and lets us avoid worrying about when the
memset() happens v.s. a "goto cleanup".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/submodule--helper.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

Comments

Glen Choo July 13, 2022, 9 p.m. UTC | #1
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> Use the less verbose { 0 }-initialization syntax rather than memset()
> in builtin/submodule--helper.c, this doesn't make a difference in
> terms of behavior, but as we're about to modify adjacent code makes
> this more consistent, and lets us avoid worrying about when the
> memset() happens v.s. a "goto cleanup".

Ok. I wonder if we could reduce this kind of churn in the future by
adding this to CodingGuidelines, e.g. "always use { 0 } for stack
variables". Tangentially, do we require { NULL } when the first element
is a pointer? (I'm not sure because this isn't in CodingGuidelines
either AFAICT.)

> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/submodule--helper.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index fac52ade5e1..73717be957c 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -1744,7 +1744,7 @@ static int module_clone(int argc, const char **argv, const char *prefix)
>  {
>  	int dissociate = 0, quiet = 0, progress = 0, require_init = 0;
>  	struct module_clone_data clone_data = MODULE_CLONE_DATA_INIT;
> -	struct list_objects_filter_options filter_options;
> +	struct list_objects_filter_options filter_options = { 0 };
>  
>  	struct option module_clone_options[] = {
>  		OPT_STRING(0, "prefix", &clone_data.prefix,
> @@ -1786,7 +1786,6 @@ static int module_clone(int argc, const char **argv, const char *prefix)
>  		NULL
>  	};
>  
> -	memset(&filter_options, 0, sizeof(filter_options));
>  	argc = parse_options(argc, argv, prefix, module_clone_options,
>  			     git_submodule_helper_usage, 0);
>  
> @@ -2563,7 +2562,7 @@ static int module_update(int argc, const char **argv, const char *prefix)
>  {
>  	struct pathspec pathspec;
>  	struct update_data opt = UPDATE_DATA_INIT;
> -	struct list_objects_filter_options filter_options;
> +	struct list_objects_filter_options filter_options = { 0 };
>  	int ret;
>  
>  	struct option module_update_options[] = {
> @@ -2623,7 +2622,6 @@ static int module_update(int argc, const char **argv, const char *prefix)
>  	update_clone_config_from_gitmodules(&opt.max_jobs);
>  	git_config(git_update_clone_config, &opt.max_jobs);
>  
> -	memset(&filter_options, 0, sizeof(filter_options));
>  	argc = parse_options(argc, argv, prefix, module_update_options,
>  			     git_submodule_helper_usage, 0);
>  
> -- 
> 2.37.0.932.g7b7031e73bc
Junio C Hamano July 13, 2022, 9:19 p.m. UTC | #2
Glen Choo <chooglen@google.com> writes:

> Ok. I wonder if we could reduce this kind of churn in the future by
> adding this to CodingGuidelines, e.g. "always use { 0 } for stack
> variables". Tangentially, do we require { NULL } when the first element
> is a pointer? (I'm not sure because this isn't in CodingGuidelines
> either AFAICT.)

A valiable can legitimately be left uninitialized, or initialized to
a known value that is not zero using designated initializers.  So
saying something like

    When zero-initializing an auto variable that is a struct or
    union in its definition, use "{ 0 }", whether the first member
    in the struct is of a number, a pointer, or a compound type.

may be OK.  I do not think we would want to say "always use X", as
the world is not that simple..

We do favor designated initializers over traditional initialization
in the order of members these days, so something like

    When declaring a struct/union variable or an array with initial
    value to some members or elements, consider using designated
    initializers, instead of listing the values in the order of
    members in the definition of the struct.

would also be good.

Thanks.
Glen Choo July 13, 2022, 10:41 p.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> Ok. I wonder if we could reduce this kind of churn in the future by
>> adding this to CodingGuidelines, e.g. "always use { 0 } for stack
>> variables". Tangentially, do we require { NULL } when the first element
>> is a pointer? (I'm not sure because this isn't in CodingGuidelines
>> either AFAICT.)
>
> A valiable can legitimately be left uninitialized, or initialized to
> a known value that is not zero using designated initializers.  So
> saying something like
>
>     When zero-initializing an auto variable that is a struct or
>     union in its definition, use "{ 0 }", whether the first member
>     in the struct is of a number, a pointer, or a compound type.
>
> may be OK.  I do not think we would want to say "always use X", as
> the world is not that simple..

Thanks. Also, I made a typo here, I meant to be more specific with
regards to memset(), like:

  When zero-initializing an auto variable that is a struct or
  union in its definition, use "{ 0 }", whether the first member
  in the struct is of a number, a pointer, or a compound type. _Do not
  use memset()._

Unless there really is a legitimate reason to use memset(&my_auto_var, 0
sizeof(my_autovar)) that I've missed?

> We do favor designated initializers over traditional initialization
> in the order of members these days, so something like
>
>     When declaring a struct/union variable or an array with initial
>     value to some members or elements, consider using designated
>     initializers, instead of listing the values in the order of
>     members in the definition of the struct.
>
> would also be good.

Thanks. If I have time I'll send that proposal to CodingGuidelines, or
someone else can send it (I don't mind either way).
Ævar Arnfjörð Bjarmason July 14, 2022, 10:25 a.m. UTC | #4
On Wed, Jul 13 2022, Glen Choo wrote:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Glen Choo <chooglen@google.com> writes:
>>
>>> Ok. I wonder if we could reduce this kind of churn in the future by
>>> adding this to CodingGuidelines, e.g. "always use { 0 } for stack
>>> variables". Tangentially, do we require { NULL } when the first element
>>> is a pointer? (I'm not sure because this isn't in CodingGuidelines
>>> either AFAICT.)
>>
>> A valiable can legitimately be left uninitialized, or initialized to
>> a known value that is not zero using designated initializers.  So
>> saying something like
>>
>>     When zero-initializing an auto variable that is a struct or
>>     union in its definition, use "{ 0 }", whether the first member
>>     in the struct is of a number, a pointer, or a compound type.
>>
>> may be OK.  I do not think we would want to say "always use X", as
>> the world is not that simple..
>
> Thanks. Also, I made a typo here, I meant to be more specific with
> regards to memset(), like:
>
>   When zero-initializing an auto variable that is a struct or
>   union in its definition, use "{ 0 }", whether the first member
>   in the struct is of a number, a pointer, or a compound type. _Do not
>   use memset()._

It's curious that the { 0 } v.s. { NULL } form jumps out at people, but
seemingly not that you don't memset(&x, NULL, ...). I.e. that we're
already dealing with a form where C's "0 is NULL in pointer context"
rules kick in :)

So I wonder if we should say anything about the first member at all. On
the other hand this is quite an odd bit of C syntax, and reveals a bit
of on edge case or wart that I'm not happy with.

It's why I believe GCC has a syntax extension so you can just do:

	struct foo x = {};

I.e. 6.7.8.10 and 6.7.8.21 disuss the semantics of this, basically that
if there are fewer initializers than elements or members that the
structure is initialized as though it were "static".

But for the first member we rely on 0 and NULL being the same in pointer
context (even though NULL my not be (void*)NULL!). So it's a tad nasty,
it's basically doing the same as:

    const char *foo = 0; /* not NULL */

As an aside I wonder if we should say "C says this is undefined, but
c'mon, let's just assume '#define NULL (void*)0'". I think in practice
it's like the recently standardized 2's compliment, in that almost
nobody can even remember a platform where it isn't true, and none are in
current use (but maybe I'm wrong...).

Anyway, I was curious about this so I tried the following locally:
	
	@@
	type T;
	identifier I;
	@@
	- T I;
	+ T I = { 0 };
	... when strict
	    when != \( I \| &I \)
	(
	- memset(&I, 0, sizeof(I));
	|
	- memset(&I, 0, sizeof(T));
	)
	

Which aside from whitespace issues (that I've asked the cocci ML about)
yields a sane result.

What *doesn't* yield a sane result is getting rid of the "when strict"
there, i.e. we must not do this in xdiff/xhistogram.c's
histogram_diff(), as we "goto" to the memset() to re-zero-out the
variable.

But removing the "when strict" yields a change to 45 more files, the
initial one with "when strict" is 76 files. With the exception of that
histogram_diff() case (and I manually skimmed the code for the rest) it
passes all tests at least... :)

> Unless there really is a legitimate reason to use memset(&my_auto_var, 0
> sizeof(my_autovar)) that I've missed?

Basically no, except when you need to do the init N times, as noted
above.

>> We do favor designated initializers over traditional initialization
>> in the order of members these days, so something like
>>
>>     When declaring a struct/union variable or an array with initial
>>     value to some members or elements, consider using designated
>>     initializers, instead of listing the values in the order of
>>     members in the definition of the struct.
>>
>> would also be good.
>
> Thanks. If I have time I'll send that proposal to CodingGuidelines, or
> someone else can send it (I don't mind either way).

Sounds good!
Junio C Hamano July 14, 2022, 5:19 p.m. UTC | #5
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> It's curious that the { 0 } v.s. { NULL } form jumps out at people, but
> seemingly not that you don't memset(&x, NULL, ...). I.e. that we're
> already dealing with a form where C's "0 is NULL in pointer context"
> rules kick in :)

Strictly speaking there are different mechanisms at play here.
Literal "0" spelled in source code and assigned to a pointer
variable assigns a NULL to the variable even on architectures where
the representation of NULL is *not* "all 0-bit filling the width of
the variable" and C language is what guarantees "NULL pointer is
spelled as integer 0 in the source code".

Also, the language, not runtime and pointer representation, is how
second and subsequent members of a struct that are pointers are
initialized to NULL (not necessarily to "all 0-bit filling the width
of the member").

memset(&ptr, '\0', sizeof(ptr)), where ptr is a pointer variable or
a struct/union with a pointer member, on the other hand, is unaware
of how a NULL pointer is represented on the platform.  All it can
(and should) do is to fill the thing "ptr" with all 0-bit.  On an
exotic architecture, where NULL is not "all 0-bit filling the width
of the variable", I do not think it would work.

So from that point of view, using memset() as a replacement for zero
initialization is simply wrong, but in practice people do not work
on such a platform that the distinction matters anymore, hopefully?

> So I wonder if we should say anything about the first member at all.

The mention of the first member of the struct is historically
important only because some checkers like sparse used to complain
about

    struct { TYPE *member0; ... } var = { 0 };

the same way as

    char *string;
    string = 0;

before they got fixed so that they know about "{ 0 }" convention to
be silent about the former, while still warning about "even though C
language says literal 0 is a valid way to spell NULL, you shouldn't
do that" to the latter.

These days, we can safely write "{ 0 }" without having to worry
about the type of the first member.
Glen Choo July 15, 2022, 12:17 a.m. UTC | #6
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Wed, Jul 13 2022, Glen Choo wrote:
> Anyway, I was curious about this so I tried the following locally:
> 	
> 	@@
> 	type T;
> 	identifier I;
> 	@@
> 	- T I;
> 	+ T I = { 0 };
> 	... when strict
> 	    when != \( I \| &I \)
> 	(
> 	- memset(&I, 0, sizeof(I));
> 	|
> 	- memset(&I, 0, sizeof(T));
> 	)
> 	
>
> Which aside from whitespace issues (that I've asked the cocci ML about)
> yields a sane result...
>> ....... If I have time I'll send that proposal to CodingGuidelines, or
>> someone else can send it (I don't mind either way).

Adding an extra cocci check sounds like a great addition alongside the
CodingGuidelines change (automatically checking the rule is way better
than doing it manually of course). The fact that { 0 } is wrapped around
two lines is annoying, e.g.

  -	struct update_callback_data data;
  +	struct update_callback_data data = {
  +		0
  +	};
    struct rev_info rev;

  -	memset(&data, 0, sizeof(data));
    data.flags = flags;

but since it goes away with "make style", I'm tempted to say that this
check is worth having. Is it too confusing to have coccinelle recommend
something that we expect users to fix afterwards?
diff mbox series

Patch

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index fac52ade5e1..73717be957c 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1744,7 +1744,7 @@  static int module_clone(int argc, const char **argv, const char *prefix)
 {
 	int dissociate = 0, quiet = 0, progress = 0, require_init = 0;
 	struct module_clone_data clone_data = MODULE_CLONE_DATA_INIT;
-	struct list_objects_filter_options filter_options;
+	struct list_objects_filter_options filter_options = { 0 };
 
 	struct option module_clone_options[] = {
 		OPT_STRING(0, "prefix", &clone_data.prefix,
@@ -1786,7 +1786,6 @@  static int module_clone(int argc, const char **argv, const char *prefix)
 		NULL
 	};
 
-	memset(&filter_options, 0, sizeof(filter_options));
 	argc = parse_options(argc, argv, prefix, module_clone_options,
 			     git_submodule_helper_usage, 0);
 
@@ -2563,7 +2562,7 @@  static int module_update(int argc, const char **argv, const char *prefix)
 {
 	struct pathspec pathspec;
 	struct update_data opt = UPDATE_DATA_INIT;
-	struct list_objects_filter_options filter_options;
+	struct list_objects_filter_options filter_options = { 0 };
 	int ret;
 
 	struct option module_update_options[] = {
@@ -2623,7 +2622,6 @@  static int module_update(int argc, const char **argv, const char *prefix)
 	update_clone_config_from_gitmodules(&opt.max_jobs);
 	git_config(git_update_clone_config, &opt.max_jobs);
 
-	memset(&filter_options, 0, sizeof(filter_options));
 	argc = parse_options(argc, argv, prefix, module_update_options,
 			     git_submodule_helper_usage, 0);