Message ID | 20250109083310.GC2748836@coredump.intra.peff.net (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | combine-diff cleanups | expand |
Jeff King <peff@peff.net> writes: > All of the other functions which allocate a combine_diff_path struct > zero out the parent array, but this code path does not. There's no bug, > since our caller will fill in most of the fields. But leaving the unused > fields (like combine_diff_parent.path) uninitialized makes working with > the struct more error-prone than it needs to be. OK. We however will still not use the array at all when we do not need it, so it would be between accessing uninitialized bytes vs accessing 0-bytes by mistake? With my devil's advocate hat on, I wonder if this would lead to more sloppy users saying "I am not following the pointer; I am merely stopping when I see a NULL pointer at the end of the array" or something silly like that without checking the validity of the array itself (which presumably can be inferred by inspecting some other member in the containing struct, right?)". > Let's just zero the parent field to be consistent with the > combine_diff_path_new() allocator. But I like the "let's be consistent" reasoning, so I wouldn't complain ;-) > Signed-off-by: Jeff King <peff@peff.net> > --- > tree-diff.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/tree-diff.c b/tree-diff.c > index d9237ffd9b..24f7b5912c 100644 > --- a/tree-diff.c > +++ b/tree-diff.c > @@ -151,8 +151,6 @@ static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_ > * process(p); > * p = pprev; > * ; don't forget to free tail->next in the end > - * > - * p->parent[] remains uninitialized. > */ > static struct combine_diff_path *path_appendnew(struct combine_diff_path *last, > int nparent, const struct strbuf *base, const char *path, int pathlen, > @@ -187,6 +185,8 @@ static struct combine_diff_path *path_appendnew(struct combine_diff_path *last, > p->mode = mode; > oidcpy(&p->oid, oid ? oid : null_oid()); > > + memset(p->parent, 0, sizeof(p->parent[0]) * nparent); > + > return p; > }
On Thu, Jan 09, 2025 at 10:28:05AM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > All of the other functions which allocate a combine_diff_path struct > > zero out the parent array, but this code path does not. There's no bug, > > since our caller will fill in most of the fields. But leaving the unused > > fields (like combine_diff_parent.path) uninitialized makes working with > > the struct more error-prone than it needs to be. > > OK. We however will still not use the array at all when we do not > need it, so it would be between accessing uninitialized bytes vs > accessing 0-bytes by mistake? With my devil's advocate hat on, I > wonder if this would lead to more sloppy users saying "I am not > following the pointer; I am merely stopping when I see a NULL > pointer at the end of the array" or something silly like that > without checking the validity of the array itself (which presumably > can be inferred by inspecting some other member in the containing > struct, right?)". Yes, code may be equally wrong to look at uninitialized versus zero bytes, depending on what it's doing. I don't think "stop when you see NULL" is a danger here; this is an array of structs, one of which now happens to be NULL (rather than an array of char pointers, which might imply that NULL is the end). Some of that sloppiness already exists. For instance, before my series, check out intersect_paths(). If we are removing an element from the list, we clean it up like this: for (j = 0; j < num_parent; j++) if (combined_all_paths && filename_changed(p->parent[j].status)) strbuf_release(&p->parent[j].path); but if we allocated for 3 parents and have only gotten to the second pass, all of parent[2] will never have been filled in. We zero initialize the parents in that function, so there's no memory error. But it is relying on the fact that filename_changed() will reject a zero status to avoid calling strbuf_release() on a zero'd strbuf (which incidentally also works, but violates the strbuf API). Now in that case we are zero-ing, so it is not one of the uninitialized cases that Wink ran into. But even if he had tried to be careful with: if (filename_changed(p->parent[i].status)) /* ok to look at p->parent[i].path */ it would not have worked, because that status would have been uninitialized, too. > > Let's just zero the parent field to be consistent with the > > combine_diff_path_new() allocator. > > But I like the "let's be consistent" reasoning, so I wouldn't > complain ;-) So yeah. This is the part that I think is really helping new code. Changing the strbuf to a pointer makes it even simpler (you do not even have to check the status at all), but this is the commit that is preventing undefined behavior. ;) -Peff
diff --git a/tree-diff.c b/tree-diff.c index d9237ffd9b..24f7b5912c 100644 --- a/tree-diff.c +++ b/tree-diff.c @@ -151,8 +151,6 @@ static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_ * process(p); * p = pprev; * ; don't forget to free tail->next in the end - * - * p->parent[] remains uninitialized. */ static struct combine_diff_path *path_appendnew(struct combine_diff_path *last, int nparent, const struct strbuf *base, const char *path, int pathlen, @@ -187,6 +185,8 @@ static struct combine_diff_path *path_appendnew(struct combine_diff_path *last, p->mode = mode; oidcpy(&p->oid, oid ? oid : null_oid()); + memset(p->parent, 0, sizeof(p->parent[0]) * nparent); + return p; }
All of the other functions which allocate a combine_diff_path struct zero out the parent array, but this code path does not. There's no bug, since our caller will fill in most of the fields. But leaving the unused fields (like combine_diff_parent.path) uninitialized makes working with the struct more error-prone than it needs to be. Let's just zero the parent field to be consistent with the combine_diff_path_new() allocator. Signed-off-by: Jeff King <peff@peff.net> --- tree-diff.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)