diff mbox series

object_as_type: initialize commit-graph-related fields of 'struct commit'

Message ID 20190127130832.23652-1-szeder.dev@gmail.com (mailing list archive)
State New, archived
Headers show
Series object_as_type: initialize commit-graph-related fields of 'struct commit' | expand

Commit Message

SZEDER Gábor Jan. 27, 2019, 1:08 p.m. UTC
When the commit graph and generation numbers were introduced in
commits 177722b344 (commit: integrate commit graph with commit
parsing, 2018-04-10) and 83073cc994 (commit: add generation number to
struct commit, 2018-04-25), they tried to make sure that the
corresponding 'graph_pos' and 'generation' fields of 'struct commit'
are initialized conservatively, as if the commit were not included in
the commit-graph file.

Alas, initializing those fields only in alloc_commit_node() missed the
case when an object that happens to be a commit is first looked up via
lookup_unknown_object(), and is then later converted to a 'struct
commit' via the object_as_type() helper function (either calling it
directly, or as part of a subsequent lookup_commit() call).
Consequently, both of those fields incorrectly remain set to zero,
which means e.g. that the commit is present in and is the first entry
of the commit-graph file.  This will result in wrong timestamp, parent
and root tree hashes, if such a 'struct commit' instance is later
filled from the commit-graph.

Extract the initialization of 'struct commit's fields from
alloc_commit_node() into a helper function, and call it from
object_as_type() as well, to make sure that it properly initializes
the two commit-graph-related fields, too.  With this helper function
it is hopefully less likely that any new fields added to 'struct
commit' in the future would remain uninitialized.

With this change alloc_commit_index() won't have any remaining callers
outside of 'alloc.c', so mark it as static.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---

So, it turns out that ec0c5798ee (revision: use commit graph in
get_reference(), 2018-12-04) is not the culprit after all, it merely
highlighted a bug that is as old as the commit-graph feature itself.
This patch fixes this and all other related issues I reported
upthread.

I couldn't find any other place where an object of unknown type is
turned into a 'struct commit', so this might have been the only place
that needed fixing.

Other object types seem to be fine, because they contain only fields
that should be zero initialized.  At least for now, because a similar
issue might arise in the future, if one of them gains a new field that
should not be initialized to zero...  but will they ever get such a
field?  So I'm not too keen on introducing similar init_tree_node(),
etc. helper funcions at the moment.

 alloc.c  | 11 ++++++++---
 alloc.h  |  2 +-
 object.c |  5 +++--
 3 files changed, 12 insertions(+), 6 deletions(-)

Comments

SZEDER Gábor Jan. 27, 2019, 1:28 p.m. UTC | #1
On Sun, Jan 27, 2019 at 02:08:32PM +0100, SZEDER Gábor wrote:
> When the commit graph and generation numbers were introduced in
> commits 177722b344 (commit: integrate commit graph with commit
> parsing, 2018-04-10) and 83073cc994 (commit: add generation number to
> struct commit, 2018-04-25), they tried to make sure that the
> corresponding 'graph_pos' and 'generation' fields of 'struct commit'
> are initialized conservatively, as if the commit were not included in
> the commit-graph file.
> 
> Alas, initializing those fields only in alloc_commit_node() missed the
> case when an object that happens to be a commit is first looked up via
> lookup_unknown_object(), and is then later converted to a 'struct
> commit' via the object_as_type() helper function (either calling it
> directly, or as part of a subsequent lookup_commit() call).
> Consequently, both of those fields incorrectly remain set to zero,
> which means e.g. that the commit is present in and is the first entry
> of the commit-graph file.  This will result in wrong timestamp, parent
> and root tree hashes, if such a 'struct commit' instance is later
> filled from the commit-graph.
> 
> Extract the initialization of 'struct commit's fields from
> alloc_commit_node() into a helper function, and call it from
> object_as_type() as well, to make sure that it properly initializes
> the two commit-graph-related fields, too.  With this helper function
> it is hopefully less likely that any new fields added to 'struct
> commit' in the future would remain uninitialized.
> 
> With this change alloc_commit_index() won't have any remaining callers
> outside of 'alloc.c', so mark it as static.
> 
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
> 
> So, it turns out that ec0c5798ee (revision: use commit graph in
> get_reference(), 2018-12-04) is not the culprit after all, it merely
> highlighted a bug that is as old as the commit-graph feature itself.
> This patch fixes this and all other related issues I reported
> upthread.

And how/why does this affect 'git describe --dirty'?

  - 'git describe' first iterates over all refs, and somewhere deep
    inside for_each_ref() each commit (well, object) a ref points to
    is looked up via lookup_unknown_object().  This leaves all fields
    of the created object zero initialized.

  - Then it dereferences HEAD for '--dirty' and ec0c5798ee's changes
    to get_reference() kick in: lookup_commit() doesn't instantiate a
    brand new and freshly initialized 'struct commit', but returns the
    object created in the previous step converted into 'struct
    commit'.  This conversion doesn't set the commit-graph fields in
    'struct commit', but leaves both as zero.  get_reference() then
    tries to load HEAD's commit information from the commit-graph,
    find_commit_in_graph() sees the the still zero 'graph_pos' field
    and doesn't perform a search through the commit-graph file, and
    the subsequent fill_commit_in_graph() reads the commit info from
    the first entry.

    In case of the failing test I posted earlier, where only the first
    commit is in the commit-graph but HEAD isn't, this means that the
    HEAD's 'struct commit' is filled with the info of HEAD^.

  - Ultimately, the diff machinery then doesn't compare the worktree
    to HEAD's tree, but to HEAD^'s, finds that they differ, hence the
    incorrect '-dirty' flag in the output.

Before ec0c5798ee get_reference() simply called parse_object(), which
ignored the commit-graph, so the issue could remain hidden.
Derrick Stolee Jan. 27, 2019, 6:40 p.m. UTC | #2
On 1/27/2019 8:28 AM, SZEDER Gábor wrote:
> On Sun, Jan 27, 2019 at 02:08:32PM +0100, SZEDER Gábor wrote:
>> When the commit graph and generation numbers were introduced in
>> commits 177722b344 (commit: integrate commit graph with commit
>> parsing, 2018-04-10) and 83073cc994 (commit: add generation number to
>> struct commit, 2018-04-25), they tried to make sure that the
>> corresponding 'graph_pos' and 'generation' fields of 'struct commit'
>> are initialized conservatively, as if the commit were not included in
>> the commit-graph file.
>>
>> Alas, initializing those fields only in alloc_commit_node() missed the
>> case when an object that happens to be a commit is first looked up via
>> lookup_unknown_object(), and is then later converted to a 'struct
>> commit' via the object_as_type() helper function (either calling it
>> directly, or as part of a subsequent lookup_commit() call).
>> Consequently, both of those fields incorrectly remain set to zero,
>> which means e.g. that the commit is present in and is the first entry
>> of the commit-graph file.  This will result in wrong timestamp, parent
>> and root tree hashes, if such a 'struct commit' instance is later
>> filled from the commit-graph.
>>
>> Extract the initialization of 'struct commit's fields from
>> alloc_commit_node() into a helper function, and call it from
>> object_as_type() as well, to make sure that it properly initializes
>> the two commit-graph-related fields, too.  With this helper function
>> it is hopefully less likely that any new fields added to 'struct
>> commit' in the future would remain uninitialized.
>>
>> With this change alloc_commit_index() won't have any remaining callers
>> outside of 'alloc.c', so mark it as static.
>>
>> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
>> ---
>>
>> So, it turns out that ec0c5798ee (revision: use commit graph in
>> get_reference(), 2018-12-04) is not the culprit after all, it merely
>> highlighted a bug that is as old as the commit-graph feature itself.
>> This patch fixes this and all other related issues I reported
>> upthread.
> 
> And how/why does this affect 'git describe --dirty'?
> 
>   - 'git describe' first iterates over all refs, and somewhere deep
>     inside for_each_ref() each commit (well, object) a ref points to
>     is looked up via lookup_unknown_object().  This leaves all fields
>     of the created object zero initialized.
> 
>   - Then it dereferences HEAD for '--dirty' and ec0c5798ee's changes
>     to get_reference() kick in: lookup_commit() doesn't instantiate a
>     brand new and freshly initialized 'struct commit', but returns the
>     object created in the previous step converted into 'struct
>     commit'.  This conversion doesn't set the commit-graph fields in
>     'struct commit', but leaves both as zero.  get_reference() then
>     tries to load HEAD's commit information from the commit-graph,
>     find_commit_in_graph() sees the the still zero 'graph_pos' field
>     and doesn't perform a search through the commit-graph file, and
>     the subsequent fill_commit_in_graph() reads the commit info from
>     the first entry.
> 
>     In case of the failing test I posted earlier, where only the first
>     commit is in the commit-graph but HEAD isn't, this means that the
>     HEAD's 'struct commit' is filled with the info of HEAD^.
> 
>   - Ultimately, the diff machinery then doesn't compare the worktree
>     to HEAD's tree, but to HEAD^'s, finds that they differ, hence the
>     incorrect '-dirty' flag in the output.
> 
> Before ec0c5798ee get_reference() simply called parse_object(), which
> ignored the commit-graph, so the issue could remain hidden.

Thanks for digging in, Szeder. This is a very subtle interaction, and
I'm glad you caught the issue. There are likely other ways this could
become problematic, including hitting BUG() statements regarding
generation numbers.

I recommend this be merged to 'maint' if possible.

Thanks,
-Stolee
Jeff King Jan. 28, 2019, 4:15 p.m. UTC | #3
On Sun, Jan 27, 2019 at 02:08:32PM +0100, SZEDER Gábor wrote:

> When the commit graph and generation numbers were introduced in
> commits 177722b344 (commit: integrate commit graph with commit
> parsing, 2018-04-10) and 83073cc994 (commit: add generation number to
> struct commit, 2018-04-25), they tried to make sure that the
> corresponding 'graph_pos' and 'generation' fields of 'struct commit'
> are initialized conservatively, as if the commit were not included in
> the commit-graph file.
> 
> Alas, initializing those fields only in alloc_commit_node() missed the
> case when an object that happens to be a commit is first looked up via
> lookup_unknown_object(), and is then later converted to a 'struct
> commit' via the object_as_type() helper function (either calling it
> directly, or as part of a subsequent lookup_commit() call).
> Consequently, both of those fields incorrectly remain set to zero,
> which means e.g. that the commit is present in and is the first entry
> of the commit-graph file.  This will result in wrong timestamp, parent
> and root tree hashes, if such a 'struct commit' instance is later
> filled from the commit-graph.
> 
> Extract the initialization of 'struct commit's fields from
> alloc_commit_node() into a helper function, and call it from
> object_as_type() as well, to make sure that it properly initializes
> the two commit-graph-related fields, too.  With this helper function
> it is hopefully less likely that any new fields added to 'struct
> commit' in the future would remain uninitialized.
> 
> With this change alloc_commit_index() won't have any remaining callers
> outside of 'alloc.c', so mark it as static.

Good find, and nicely explained.

> ---
> 
> So, it turns out that ec0c5798ee (revision: use commit graph in
> get_reference(), 2018-12-04) is not the culprit after all, it merely
> highlighted a bug that is as old as the commit-graph feature itself.
> This patch fixes this and all other related issues I reported
> upthread.
> 
> I couldn't find any other place where an object of unknown type is
> turned into a 'struct commit', so this might have been the only place
> that needed fixing.

This should be the only place. We already ran into this with the
commit-index field, which was what caused us to create object_as_type()
in the first place, to give a central place for coercing OBJ_NONE into
other types.

> Other object types seem to be fine, because they contain only fields
> that should be zero initialized.  At least for now, because a similar
> issue might arise in the future, if one of them gains a new field that
> should not be initialized to zero...  but will they ever get such a
> field?  So I'm not too keen on introducing similar init_tree_node(),
> etc. helper funcions at the moment.

Agreed. We can deal with those if it ever becomes necessary. In theory
adding empty placeholder functions might help somebody realize they'd
need to handle this case, but I have the feeling that they'd be as
likely to miss init_tree_node() as they would object_as_type().

I dunno. I guess if init_tree_node() were actually called from
alloc_tree_node(), it might be harder to miss.

>  alloc.c  | 11 ++++++++---
>  alloc.h  |  2 +-
>  object.c |  5 +++--
>  3 files changed, 12 insertions(+), 6 deletions(-)

The patch itself looks good to me.

-Peff
Jonathan Tan Jan. 28, 2019, 4:57 p.m. UTC | #4
On Sun, Jan 27, 2019 at 5:08 AM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> When the commit graph and generation numbers were introduced in
> commits 177722b344 (commit: integrate commit graph with commit
> parsing, 2018-04-10) and 83073cc994 (commit: add generation number to
> struct commit, 2018-04-25), they tried to make sure that the
> corresponding 'graph_pos' and 'generation' fields of 'struct commit'
> are initialized conservatively, as if the commit were not included in
> the commit-graph file.

Thanks for looking into this! The patch looks good to me.
diff mbox series

Patch

diff --git a/alloc.c b/alloc.c
index e7aa81b7aa..1c64c4dd16 100644
--- a/alloc.c
+++ b/alloc.c
@@ -99,18 +99,23 @@  void *alloc_object_node(struct repository *r)
 	return obj;
 }
 
-unsigned int alloc_commit_index(struct repository *r)
+static unsigned int alloc_commit_index(struct repository *r)
 {
 	return r->parsed_objects->commit_count++;
 }
 
-void *alloc_commit_node(struct repository *r)
+void init_commit_node(struct repository *r, struct commit *c)
 {
-	struct commit *c = alloc_node(r->parsed_objects->commit_state, sizeof(struct commit));
 	c->object.type = OBJ_COMMIT;
 	c->index = alloc_commit_index(r);
 	c->graph_pos = COMMIT_NOT_FROM_GRAPH;
 	c->generation = GENERATION_NUMBER_INFINITY;
+}
+
+void *alloc_commit_node(struct repository *r)
+{
+	struct commit *c = alloc_node(r->parsed_objects->commit_state, sizeof(struct commit));
+	init_commit_node(r, c);
 	return c;
 }
 
diff --git a/alloc.h b/alloc.h
index ba356ed847..ed1071c11e 100644
--- a/alloc.h
+++ b/alloc.h
@@ -9,11 +9,11 @@  struct repository;
 
 void *alloc_blob_node(struct repository *r);
 void *alloc_tree_node(struct repository *r);
+void init_commit_node(struct repository *r, struct commit *c);
 void *alloc_commit_node(struct repository *r);
 void *alloc_tag_node(struct repository *r);
 void *alloc_object_node(struct repository *r);
 void alloc_report(struct repository *r);
-unsigned int alloc_commit_index(struct repository *r);
 
 struct alloc_state *allocate_alloc_state(void);
 void clear_alloc_state(struct alloc_state *s);
diff --git a/object.c b/object.c
index c4170d2d0c..7bccfd5d8e 100644
--- a/object.c
+++ b/object.c
@@ -164,8 +164,9 @@  void *object_as_type(struct repository *r, struct object *obj, enum object_type
 		return obj;
 	else if (obj->type == OBJ_NONE) {
 		if (type == OBJ_COMMIT)
-			((struct commit *)obj)->index = alloc_commit_index(r);
-		obj->type = type;
+			init_commit_node(r, (struct commit *) obj);
+		else
+			obj->type = type;
 		return obj;
 	}
 	else {