Message ID | 20241111162148.337375-1-simon.marchi@efficios.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | builtin/difftool: intialize some hashmap variables | expand |
On Mon, Nov 11, 2024 at 11:21:44AM -0500, Simon Marchi wrote: > When running a dir-diff command that produces no diff, variables > `wt_modified` and `tmp_modified` are used while uninitialized, causing: > > $ /home/smarchi/src/git/git-difftool --dir-diff master > free(): invalid pointer > [1] 334004 IOT instruction (core dumped) /home/smarchi/src/git/git-difftool --dir-diff master > $ valgrind --track-origins=yes /home/smarchi/src/git/git-difftool --dir-diff master > ... > Invalid free() / delete / delete[] / realloc() > at 0x48478EF: free (vg_replace_malloc.c:989) > by 0x422CAC: hashmap_clear_ (hashmap.c:208) > by 0x283830: run_dir_diff (difftool.c:667) > by 0x284103: cmd_difftool (difftool.c:801) > by 0x238E0F: run_builtin (git.c:484) > by 0x2392B9: handle_builtin (git.c:750) > by 0x2399BC: cmd_main (git.c:921) > by 0x356FEF: main (common-main.c:64) > Address 0x1ffefff180 is on thread 1's stack > in frame #2, created by run_dir_diff (difftool.c:358) > ... > > If taking any `goto finish` path before these variables are initialized, > `hashmap_clear_and_free()` operates on uninitialized data, sometimes > causing a crash. > > Fix it by zero-initializing these variables, making > `hashmap_clear_and_free()` a no-op in that case. The fix makes sense. I wondered if this had been broken for a long time, and if so, how we managed not to notice it. But it looks like it is a recent problem, via 7f795a1715 (builtin/difftool: plug several trivial memory leaks, 2024-09-26). > diff --git a/builtin/difftool.c b/builtin/difftool.c > index ca1b0890659b..b902f5d2ae17 100644 > --- a/builtin/difftool.c > +++ b/builtin/difftool.c > @@ -376,7 +376,8 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix, > struct checkout lstate, rstate; > int err = 0; > struct child_process cmd = CHILD_PROCESS_INIT; > - struct hashmap wt_modified, tmp_modified; > + struct hashmap wt_modified = {0}; > + struct hashmap tmp_modified = {0}; > int indices_loaded = 0; That commit likewise frees some other local variables, but they are all properly initialized. So touching these two are sufficient. I'm not sure if zero-initialization is being a little too familiar with the hashmap internals, though. The other variables use HASHMAP_INIT(). Should we do the same here, like this: diff --git a/builtin/difftool.c b/builtin/difftool.c index 1a68ab6699..86995390c7 100644 --- a/builtin/difftool.c +++ b/builtin/difftool.c @@ -374,7 +374,8 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix, struct checkout lstate, rstate; int err = 0; struct child_process cmd = CHILD_PROCESS_INIT; - struct hashmap wt_modified, tmp_modified; + struct hashmap wt_modified = HASHMAP_INIT(path_entry_cmp, NULL); + struct hashmap tmp_modified = HASHMAP_INIT(path_entry_cmp, NULL); int indices_loaded = 0; workdir = get_git_work_tree(); @@ -594,14 +595,7 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix, * should be copied back to the working tree. * Do not copy back files when symlinks are used and the * external tool did not replace the original link with a file. - * - * These hashes are loaded lazily since they aren't needed - * in the common case of --symlinks and the difftool updating - * files through the symlink. */ - hashmap_init(&wt_modified, path_entry_cmp, NULL, wtindex.cache_nr); - hashmap_init(&tmp_modified, path_entry_cmp, NULL, wtindex.cache_nr); - for (i = 0; i < wtindex.cache_nr; i++) { struct hashmap_entry dummy; const char *name = wtindex.cache[i]->name; That loses the initial table growth that the original had, but I think letting it grow in the usual way is fine here. -Peff
On 11/11/24 3:54 PM, Jeff King wrote: > On Mon, Nov 11, 2024 at 11:21:44AM -0500, Simon Marchi wrote: > >> When running a dir-diff command that produces no diff, variables >> `wt_modified` and `tmp_modified` are used while uninitialized, causing: >> >> $ /home/smarchi/src/git/git-difftool --dir-diff master >> free(): invalid pointer >> [1] 334004 IOT instruction (core dumped) /home/smarchi/src/git/git-difftool --dir-diff master >> $ valgrind --track-origins=yes /home/smarchi/src/git/git-difftool --dir-diff master >> ... >> Invalid free() / delete / delete[] / realloc() >> at 0x48478EF: free (vg_replace_malloc.c:989) >> by 0x422CAC: hashmap_clear_ (hashmap.c:208) >> by 0x283830: run_dir_diff (difftool.c:667) >> by 0x284103: cmd_difftool (difftool.c:801) >> by 0x238E0F: run_builtin (git.c:484) >> by 0x2392B9: handle_builtin (git.c:750) >> by 0x2399BC: cmd_main (git.c:921) >> by 0x356FEF: main (common-main.c:64) >> Address 0x1ffefff180 is on thread 1's stack >> in frame #2, created by run_dir_diff (difftool.c:358) >> ... >> >> If taking any `goto finish` path before these variables are initialized, >> `hashmap_clear_and_free()` operates on uninitialized data, sometimes >> causing a crash. >> >> Fix it by zero-initializing these variables, making >> `hashmap_clear_and_free()` a no-op in that case. > > The fix makes sense. I wondered if this had been broken for a long time, > and if so, how we managed not to notice it. But it looks like it is a > recent problem, via 7f795a1715 (builtin/difftool: plug several trivial > memory leaks, 2024-09-26). Are there tests for this specific scenario (no diff between the two versions)? > >> diff --git a/builtin/difftool.c b/builtin/difftool.c >> index ca1b0890659b..b902f5d2ae17 100644 >> --- a/builtin/difftool.c >> +++ b/builtin/difftool.c >> @@ -376,7 +376,8 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix, >> struct checkout lstate, rstate; >> int err = 0; >> struct child_process cmd = CHILD_PROCESS_INIT; >> - struct hashmap wt_modified, tmp_modified; >> + struct hashmap wt_modified = {0}; >> + struct hashmap tmp_modified = {0}; >> int indices_loaded = 0; > > That commit likewise frees some other local variables, but they are all > properly initialized. So touching these two are sufficient. Indeed, I checked the other variables, they look fine. > I'm not sure if zero-initialization is being a little too familiar with > the hashmap internals, though. Up to you. In other C projects I worked on, it was typical that zero-ing an object would get it in a valid initial empty state, properly handled by the destruction functions. This way, a big struct containing other objects could be initialized simply by zero-ing it, without having to initialize each component explicitly. > The other variables use HASHMAP_INIT(). > Should we do the same here, like this: > > diff --git a/builtin/difftool.c b/builtin/difftool.c > index 1a68ab6699..86995390c7 100644 > --- a/builtin/difftool.c > +++ b/builtin/difftool.c > @@ -374,7 +374,8 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix, > struct checkout lstate, rstate; > int err = 0; > struct child_process cmd = CHILD_PROCESS_INIT; > - struct hashmap wt_modified, tmp_modified; > + struct hashmap wt_modified = HASHMAP_INIT(path_entry_cmp, NULL); > + struct hashmap tmp_modified = HASHMAP_INIT(path_entry_cmp, NULL); > int indices_loaded = 0; > > workdir = get_git_work_tree(); > @@ -594,14 +595,7 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix, > * should be copied back to the working tree. > * Do not copy back files when symlinks are used and the > * external tool did not replace the original link with a file. > - * > - * These hashes are loaded lazily since they aren't needed > - * in the common case of --symlinks and the difftool updating > - * files through the symlink. > */ > - hashmap_init(&wt_modified, path_entry_cmp, NULL, wtindex.cache_nr); > - hashmap_init(&tmp_modified, path_entry_cmp, NULL, wtindex.cache_nr); > - > for (i = 0; i < wtindex.cache_nr; i++) { > struct hashmap_entry dummy; > const char *name = wtindex.cache[i]->name; > > That loses the initial table growth that the original had, but I think > letting it grow in the usual way is fine here. I thought about it, but was indeed afraid to be told that this removes an optimization. If you think it's fine, I'm happy with it too. Please let me know if you want a v2 or if you are just going to merge an updated version of this patch. Thanks, Simon
On Mon, Nov 11, 2024 at 04:22:26PM -0500, Simon Marchi wrote: > > The fix makes sense. I wondered if this had been broken for a long time, > > and if so, how we managed not to notice it. But it looks like it is a > > recent problem, via 7f795a1715 (builtin/difftool: plug several trivial > > memory leaks, 2024-09-26). > > Are there tests for this specific scenario (no diff between the two > versions)? Presumably not, or they'd have been segfaulting. ;) So it may be worth adding some to t7800. > > I'm not sure if zero-initialization is being a little too familiar with > > the hashmap internals, though. > > Up to you. In other C projects I worked on, it was typical that > zero-ing an object would get it in a valid initial empty state, properly > handled by the destruction functions. This way, a big struct containing > other objects could be initialized simply by zero-ing it, without having > to initialize each component explicitly. We often follow that rule in git.git, too, but have been increasingly moving to macro initializers. They make it easier if we ever need a non-zero state as an invariant (e.g., STRBUF_INIT always points to a dummy string). In this case, it does look like HASHMAP_INIT sets do_count_items. It's not a bug in your program since we still hashmap_init() over the zero'd state, but it does feel a bit weird to me. > Please let me know if you want a v2 or if you are just going to merge an > updated version of this patch. That's up to the maintainer. IMHO it's worth using HASHMAP_INIT, though, and perhaps adding a test. -Peff
Jeff King <peff@peff.net> writes: >> Please let me know if you want a v2 or if you are just going to merge an >> updated version of this patch. > > That's up to the maintainer. IMHO it's worth using HASHMAP_INIT, though, > and perhaps adding a test. Yup, please do. Thanks.
diff --git a/builtin/difftool.c b/builtin/difftool.c index ca1b0890659b..b902f5d2ae17 100644 --- a/builtin/difftool.c +++ b/builtin/difftool.c @@ -376,7 +376,8 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix, struct checkout lstate, rstate; int err = 0; struct child_process cmd = CHILD_PROCESS_INIT; - struct hashmap wt_modified, tmp_modified; + struct hashmap wt_modified = {0}; + struct hashmap tmp_modified = {0}; int indices_loaded = 0; workdir = repo_get_work_tree(the_repository);
When running a dir-diff command that produces no diff, variables `wt_modified` and `tmp_modified` are used while uninitialized, causing: $ /home/smarchi/src/git/git-difftool --dir-diff master free(): invalid pointer [1] 334004 IOT instruction (core dumped) /home/smarchi/src/git/git-difftool --dir-diff master $ valgrind --track-origins=yes /home/smarchi/src/git/git-difftool --dir-diff master ... Invalid free() / delete / delete[] / realloc() at 0x48478EF: free (vg_replace_malloc.c:989) by 0x422CAC: hashmap_clear_ (hashmap.c:208) by 0x283830: run_dir_diff (difftool.c:667) by 0x284103: cmd_difftool (difftool.c:801) by 0x238E0F: run_builtin (git.c:484) by 0x2392B9: handle_builtin (git.c:750) by 0x2399BC: cmd_main (git.c:921) by 0x356FEF: main (common-main.c:64) Address 0x1ffefff180 is on thread 1's stack in frame #2, created by run_dir_diff (difftool.c:358) ... If taking any `goto finish` path before these variables are initialized, `hashmap_clear_and_free()` operates on uninitialized data, sometimes causing a crash. Fix it by zero-initializing these variables, making `hashmap_clear_and_free()` a no-op in that case. Signed-off-by: Simon Marchi <simon.marchi@efficios.com> Cc: Junio C Hamano <gitster@pobox.com> Cc: René Scharfe <l.s.r@web.de> Cc: Taylor Blau <me@ttaylorr.com> Cc: Patrick Steinhardt <ps@pks.im> --- builtin/difftool.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) base-commit: b31fb630c0fc6869a33ed717163e8a1210460d94