Message ID | 20190406113453.5149-1-pclouds@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | nd/sha1-name-c-wo-the-repository updates | expand |
Hi Duy, On Sat, 6 Apr 2019, Nguyễn Thái Ngọc Duy wrote: > 10: 68876a150f ! 11: 848456f59c commit.c: add repo_get_commit_tree() > @@ -2,6 +2,11 @@ > > commit.c: add repo_get_commit_tree() > > + Remove the implicit dependency on the_repository in this function. > + It will be used in sha1-name.c functions when they are updated to take > + any 'struct repository'. get_commit_tree() remains as a compat wrapper, > + to be slowly replaced later. > + > diff --git a/commit.c b/commit.c > --- a/commit.c > +++ b/commit.c > @@ -29,6 +34,15 @@ > --- a/commit.h > +++ b/commit.h > @@ > + > + /* > + * If the commit is loaded from the commit-graph file, then this > +- * member may be NULL. Only access it through get_commit_tree() > ++ * member may be NULL. Only access it through repo_get_commit_tree() > + * or get_commit_tree_oid(). > + */ > + struct tree *maybe_tree; > +@@ > */ > void free_commit_buffer(struct parsed_object_pool *pool, struct commit *); > > @@ -57,3 +71,10 @@ > ...>} > > @@ > + expression c; > ++expression r; > + expression s; > + @@ > +-- get_commit_tree(c) = s > ++- repo_get_commit_tree(r, c) = s > + + c->maybe_tree = s I think this is wrong, and admittedly I had the very same version originally. When you have an arbitrary `r` in any `repo_get_commit_tree(r, c)` (as opposed to `the_repository`), the conversion to `c->maybe_tree` is most likely incorrect. Therefore, I don't think that we can do that. Ciao, Johannes
On Thu, Apr 11, 2019 at 3:56 AM Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > > @@ > > + expression c; > > ++expression r; > > + expression s; > > + @@ > > +-- get_commit_tree(c) = s > > ++- repo_get_commit_tree(r, c) = s > > + + c->maybe_tree = s > > I think this is wrong, and admittedly I had the very same version > originally. > > When you have an arbitrary `r` in any `repo_get_commit_tree(r, c)` (as > opposed to `the_repository`), the conversion to `c->maybe_tree` is most > likely incorrect. I did read the get_commit_tree() source code before doing this. struct repository is only used to get commit graph to speed things up and we can't change a thing there when maybe_tree is reassigned. To reassign maybe_tree, commit-graph does not matter. Neither does the_repository (vs arbitrary struct repo) > > Therefore, I don't think that we can do that. > > Ciao, > Johannes
Hi, On Thu, 11 Apr 2019, Duy Nguyen wrote: > On Thu, Apr 11, 2019 at 3:56 AM Johannes Schindelin > <Johannes.Schindelin@gmx.de> wrote: > > > @@ > > > + expression c; > > > ++expression r; > > > + expression s; > > > + @@ > > > +-- get_commit_tree(c) = s > > > ++- repo_get_commit_tree(r, c) = s > > > + + c->maybe_tree = s > > > > I think this is wrong, and admittedly I had the very same version > > originally. > > > > When you have an arbitrary `r` in any `repo_get_commit_tree(r, c)` (as > > opposed to `the_repository`), the conversion to `c->maybe_tree` is most > > likely incorrect. > > I did read the get_commit_tree() source code before doing this. struct > repository is only used to get commit graph to speed things up and we > can't change a thing there when maybe_tree is reassigned. To reassign > maybe_tree, commit-graph does not matter. Neither does the_repository > (vs arbitrary struct repo) You read the current code. Obviously, you had no future code to read. The cocci patch you are changing will affect future code, too. The safe, and correct, thing to do, then, is to not pretend to know what future `get_commit_tree()` functions will do, and instead go with the stricter version of the cocci patch, as I had suggested. Ciao, Johannes > > > > Therefore, I don't think that we can do that. > > > > Ciao, > > Johannes > > > > -- > Duy >
On Wed, Apr 10, 2019 at 10:56:52PM +0200, Johannes Schindelin wrote: > Hi Duy, > > On Sat, 6 Apr 2019, Nguyễn Thái Ngọc Duy wrote: > > > 10: 68876a150f ! 11: 848456f59c commit.c: add repo_get_commit_tree() > > @@ -2,6 +2,11 @@ > > > > commit.c: add repo_get_commit_tree() > > > > + Remove the implicit dependency on the_repository in this function. > > + It will be used in sha1-name.c functions when they are updated to take > > + any 'struct repository'. get_commit_tree() remains as a compat wrapper, > > + to be slowly replaced later. > > + > > diff --git a/commit.c b/commit.c > > --- a/commit.c > > +++ b/commit.c > > @@ -29,6 +34,15 @@ > > --- a/commit.h > > +++ b/commit.h > > @@ > > + > > + /* > > + * If the commit is loaded from the commit-graph file, then this > > +- * member may be NULL. Only access it through get_commit_tree() > > ++ * member may be NULL. Only access it through repo_get_commit_tree() > > + * or get_commit_tree_oid(). > > + */ > > + struct tree *maybe_tree; > > +@@ > > */ > > void free_commit_buffer(struct parsed_object_pool *pool, struct commit *); > > > > @@ -57,3 +71,10 @@ > > ...>} > > > > @@ > > + expression c; > > ++expression r; > > + expression s; > > + @@ > > +-- get_commit_tree(c) = s > > ++- repo_get_commit_tree(r, c) = s > > + + c->maybe_tree = s > > I think this is wrong, and admittedly I had the very same version > originally. > > When you have an arbitrary `r` in any `repo_get_commit_tree(r, c)` (as > opposed to `the_repository`), the conversion to `c->maybe_tree` is most > likely incorrect. > > Therefore, I don't think that we can do that. So, as far as I understand, the goal of these 'c->maybe_tree'-related semantic patches is to prevent "generic" parts of Git from accessing this field directly, as it might not be initialized in a commit-graph-enabled repository. Only three functions are explicitly exempt, while this last semantic patch in question implicitly allows a few more that assign a value to 'c->maybe_tree'. These functions are release_commit_memory() and parse_commit_buffer() in 'commit.c' and fill_commit_in_graph() in 'commit-graph.c', and after a quick look these functions seem to be rather fundamenal in the life-cycle of a commit object. I think they deserve to be explicitly exempted, too, and then we could remove this last semantic patch altogether.
On Thu, Apr 11, 2019 at 10:51:46PM +0200, SZEDER Gábor wrote: > On Wed, Apr 10, 2019 at 10:56:52PM +0200, Johannes Schindelin wrote: > > > ...>} > > > > > > @@ > > > + expression c; > > > ++expression r; > > > + expression s; > > > + @@ > > > +-- get_commit_tree(c) = s > > > ++- repo_get_commit_tree(r, c) = s > > > + + c->maybe_tree = s > > > > I think this is wrong, and admittedly I had the very same version > > originally. > > > > When you have an arbitrary `r` in any `repo_get_commit_tree(r, c)` (as > > opposed to `the_repository`), the conversion to `c->maybe_tree` is most > > likely incorrect. > > > > Therefore, I don't think that we can do that. > > So, as far as I understand, the goal of these 'c->maybe_tree'-related > semantic patches is to prevent "generic" parts of Git from accessing > this field directly, as it might not be initialized in a > commit-graph-enabled repository. > > Only three functions are explicitly exempt, while this last semantic > patch in question implicitly allows a few more that assign a value to > 'c->maybe_tree'. These functions are release_commit_memory() and > parse_commit_buffer() in 'commit.c' and fill_commit_in_graph() in > 'commit-graph.c', and after a quick look these functions seem to be > rather fundamenal in the life-cycle of a commit object. Erm, not "commit object"; I meant the life-cycle of a 'struct commit' instance. > I think they deserve to be explicitly exempted, too, and then we could > remove this last semantic patch altogether. > >
On Thu, Apr 11, 2019 at 10:58:57PM +0200, SZEDER Gábor wrote: > On Thu, Apr 11, 2019 at 10:51:46PM +0200, SZEDER Gábor wrote: > > On Wed, Apr 10, 2019 at 10:56:52PM +0200, Johannes Schindelin wrote: > > > > ...>} > > > > > > > > @@ > > > > + expression c; > > > > ++expression r; > > > > + expression s; > > > > + @@ > > > > +-- get_commit_tree(c) = s > > > > ++- repo_get_commit_tree(r, c) = s > > > > + + c->maybe_tree = s > > > > > > I think this is wrong, and admittedly I had the very same version > > > originally. > > > > > > When you have an arbitrary `r` in any `repo_get_commit_tree(r, c)` (as > > > opposed to `the_repository`), the conversion to `c->maybe_tree` is most > > > likely incorrect. > > > > > > Therefore, I don't think that we can do that. > > > > So, as far as I understand, the goal of these 'c->maybe_tree'-related > > semantic patches is to prevent "generic" parts of Git from accessing > > this field directly, as it might not be initialized in a > > commit-graph-enabled repository. > > > > Only three functions are explicitly exempt, while this last semantic > > patch in question implicitly allows a few more that assign a value to > > 'c->maybe_tree'. These functions are release_commit_memory() and > > parse_commit_buffer() in 'commit.c' and fill_commit_in_graph() in > > 'commit-graph.c', ... and make_virtual_commit() in 'merge-recursive.c'. > and after a quick look these functions seem to be > > rather fundamenal in the life-cycle of a commit object. > > Erm, not "commit object"; I meant the life-cycle of a 'struct commit' > instance. > > > I think they deserve to be explicitly exempted, too, and then we could > > remove this last semantic patch altogether. And it would look like this. Yeah, that's a very long line there, but I don't think we can break it up. -- >8 -- diff --git a/contrib/coccinelle/commit.cocci b/contrib/coccinelle/commit.cocci index 57c8f71479..fe814f313e 100644 --- a/contrib/coccinelle/commit.cocci +++ b/contrib/coccinelle/commit.cocci @@ -10,20 +10,15 @@ expression c; - c->maybe_tree->object.oid.hash + get_commit_tree_oid(c)->hash -// These excluded functions must access c->maybe_tree direcly. +// These excluded functions must access/modify c->maybe_tree direcly. +// Note that if c->maybe_tree is written somewhere outside of these +// functions, then the recommended transformation will be bogus with +// repo_get_commit_tree() on the LHS. @@ -identifier f !~ "^(repo_get_commit_tree|get_commit_tree_in_graph_one|load_tree_for_commit)$"; -expression c; +identifier f !~ "^(repo_get_commit_tree|get_commit_tree_in_graph_one|load_tree_for_commit|fill_commit_in_graph|parse_commit_buffer|release_commit_memory|make_virtual_commit)$"; +struct commit *c; @@ f(...) {<... - c->maybe_tree + repo_get_commit_tree(the_repository, c) ...>} - -@@ -expression c; -expression r; -expression s; -@@ -- repo_get_commit_tree(r, c) = s -+ c->maybe_tree = s
Hi, On Fri, 12 Apr 2019, SZEDER Gábor wrote: > On Thu, Apr 11, 2019 at 10:58:57PM +0200, SZEDER Gábor wrote: > > On Thu, Apr 11, 2019 at 10:51:46PM +0200, SZEDER Gábor wrote: > > > On Wed, Apr 10, 2019 at 10:56:52PM +0200, Johannes Schindelin wrote: > > > > > ...>} > > > > > > > > > > @@ > > > > > + expression c; > > > > > ++expression r; > > > > > + expression s; > > > > > + @@ > > > > > +-- get_commit_tree(c) = s > > > > > ++- repo_get_commit_tree(r, c) = s > > > > > + + c->maybe_tree = s > > > > > > > > I think this is wrong, and admittedly I had the very same version > > > > originally. > > > > > > > > When you have an arbitrary `r` in any `repo_get_commit_tree(r, c)` (as > > > > opposed to `the_repository`), the conversion to `c->maybe_tree` is most > > > > likely incorrect. > > > > > > > > Therefore, I don't think that we can do that. > > > > > > So, as far as I understand, the goal of these 'c->maybe_tree'-related > > > semantic patches is to prevent "generic" parts of Git from accessing > > > this field directly, as it might not be initialized in a > > > commit-graph-enabled repository. > > > > > > Only three functions are explicitly exempt, while this last semantic > > > patch in question implicitly allows a few more that assign a value to > > > 'c->maybe_tree'. These functions are release_commit_memory() and > > > parse_commit_buffer() in 'commit.c' and fill_commit_in_graph() in > > > 'commit-graph.c', > > ... and make_virtual_commit() in 'merge-recursive.c'. > > > and after a quick look these functions seem to be > > > rather fundamenal in the life-cycle of a commit object. > > > > Erm, not "commit object"; I meant the life-cycle of a 'struct commit' > > instance. > > > > > I think they deserve to be explicitly exempted, too, and then we could > > > remove this last semantic patch altogether. > > And it would look like this. Yeah, that's a very long line there, but > I don't think we can break it up. > > -- >8 -- > > diff --git a/contrib/coccinelle/commit.cocci b/contrib/coccinelle/commit.cocci > index 57c8f71479..fe814f313e 100644 > --- a/contrib/coccinelle/commit.cocci > +++ b/contrib/coccinelle/commit.cocci > @@ -10,20 +10,15 @@ expression c; > - c->maybe_tree->object.oid.hash > + get_commit_tree_oid(c)->hash > > -// These excluded functions must access c->maybe_tree direcly. > +// These excluded functions must access/modify c->maybe_tree direcly. > +// Note that if c->maybe_tree is written somewhere outside of these > +// functions, then the recommended transformation will be bogus with > +// repo_get_commit_tree() on the LHS. > @@ > -identifier f !~ "^(repo_get_commit_tree|get_commit_tree_in_graph_one|load_tree_for_commit)$"; > -expression c; > +identifier f !~ "^(repo_get_commit_tree|get_commit_tree_in_graph_one|load_tree_for_commit|fill_commit_in_graph|parse_commit_buffer|release_commit_memory|make_virtual_commit)$"; Hahahaha! That's *really* long. And a good indicator that this should be hidden in a single helper function (`set_commit_tree()`, file-local of course) that is exempted in the cocci patch. Ciao, Dscho > +struct commit *c; > @@ > f(...) {<... > - c->maybe_tree > + repo_get_commit_tree(the_repository, c) > ...>} > - > -@@ > -expression c; > -expression r; > -expression s; > -@@ > -- repo_get_commit_tree(r, c) = s > -+ c->maybe_tree = s >
On Fri, Apr 12, 2019 at 04:25:08PM +0200, Johannes Schindelin wrote: > Hi, > > On Fri, 12 Apr 2019, SZEDER Gábor wrote: > > > On Thu, Apr 11, 2019 at 10:58:57PM +0200, SZEDER Gábor wrote: > > > On Thu, Apr 11, 2019 at 10:51:46PM +0200, SZEDER Gábor wrote: > > > > On Wed, Apr 10, 2019 at 10:56:52PM +0200, Johannes Schindelin wrote: > > > > > > ...>} > > > > > > > > > > > > @@ > > > > > > + expression c; > > > > > > ++expression r; > > > > > > + expression s; > > > > > > + @@ > > > > > > +-- get_commit_tree(c) = s > > > > > > ++- repo_get_commit_tree(r, c) = s > > > > > > + + c->maybe_tree = s > > > > > > > > > > I think this is wrong, and admittedly I had the very same version > > > > > originally. > > > > > > > > > > When you have an arbitrary `r` in any `repo_get_commit_tree(r, c)` (as > > > > > opposed to `the_repository`), the conversion to `c->maybe_tree` is most > > > > > likely incorrect. > > > > > > > > > > Therefore, I don't think that we can do that. > > > > > > > > So, as far as I understand, the goal of these 'c->maybe_tree'-related > > > > semantic patches is to prevent "generic" parts of Git from accessing > > > > this field directly, as it might not be initialized in a > > > > commit-graph-enabled repository. > > > > > > > > Only three functions are explicitly exempt, while this last semantic > > > > patch in question implicitly allows a few more that assign a value to > > > > 'c->maybe_tree'. These functions are release_commit_memory() and > > > > parse_commit_buffer() in 'commit.c' and fill_commit_in_graph() in > > > > 'commit-graph.c', > > > > ... and make_virtual_commit() in 'merge-recursive.c'. > > > > > and after a quick look these functions seem to be > > > > rather fundamenal in the life-cycle of a commit object. > > > > > > Erm, not "commit object"; I meant the life-cycle of a 'struct commit' > > > instance. > > > > > > > I think they deserve to be explicitly exempted, too, and then we could > > > > remove this last semantic patch altogether. > > > > And it would look like this. Yeah, that's a very long line there, but > > I don't think we can break it up. > > > > -- >8 -- > > > > diff --git a/contrib/coccinelle/commit.cocci b/contrib/coccinelle/commit.cocci > > index 57c8f71479..fe814f313e 100644 > > --- a/contrib/coccinelle/commit.cocci > > +++ b/contrib/coccinelle/commit.cocci > > @@ -10,20 +10,15 @@ expression c; > > - c->maybe_tree->object.oid.hash > > + get_commit_tree_oid(c)->hash > > > > -// These excluded functions must access c->maybe_tree direcly. > > +// These excluded functions must access/modify c->maybe_tree direcly. > > +// Note that if c->maybe_tree is written somewhere outside of these > > +// functions, then the recommended transformation will be bogus with > > +// repo_get_commit_tree() on the LHS. > > @@ > > -identifier f !~ "^(repo_get_commit_tree|get_commit_tree_in_graph_one|load_tree_for_commit)$"; > > -expression c; > > +identifier f !~ "^(repo_get_commit_tree|get_commit_tree_in_graph_one|load_tree_for_commit|fill_commit_in_graph|parse_commit_buffer|release_commit_memory|make_virtual_commit)$"; > > Hahahaha! That's *really* long. > > And a good indicator that this should be hidden in a single helper > function (`set_commit_tree()`, file-local of course) that is exempted in > the cocci patch. Note that this is not only about line length, and consider the slight differences between the three approaches: - Currently only direct read accesses to 'c->maybe_tree' outside of the listed functions are forbidden and transformed, but still any function can set this field directly (thanks to the last semantic patch in 'commit.cocci'). - Encapsulating writes in set_commit_tree() and adding this function to that list would prevent other functions from setting 'c->maybe_tree' directly, but still any function could set it indirectly by calling set_commit_tree(). - With that loooong line only those few listed special functions would be able read or write 'c->maybe_tree'. Does the additional restriction of the long line variant bring us benefits? Well, not sure. The root tree of a commit is needed in many places, and in the past we got used to it being always initialized in any 'struct commit' instance. However, with the commit graph that's not the case anymore, and any read accesses to the uninitialized root tree object would have bad consequences. That's why there is get_commit_tree() helper performing lazy-initialization, and the protection from direct reads in the form of the semantic patch. OTOH, apart from five functions, most parts of Git simply don't want to create new 'struct commit' instances themselves, or, more generally, set anything in a 'struct commit', so arguably there is not that much danger to protect ourselves from. Anyway, I just wanted to make sure that we fully understand the implications, and I think any solution is an improvement that eliminates the current "let's transform this code pattern, but then quickly transform it back in some cases" back-and-forth. > > +struct commit *c; > > @@ > > f(...) {<... > > - c->maybe_tree > > + repo_get_commit_tree(the_repository, c) > > ...>} > > - > > -@@ > > -expression c; > > -expression r; > > -expression s; > > -@@ > > -- repo_get_commit_tree(r, c) = s > > -+ c->maybe_tree = s > >
On Sat, Apr 13, 2019 at 7:14 PM SZEDER Gábor <szeder.dev@gmail.com> wrote: > > On Fri, Apr 12, 2019 at 04:25:08PM +0200, Johannes Schindelin wrote: > > Hi, > > > > On Fri, 12 Apr 2019, SZEDER Gábor wrote: > > > > > On Thu, Apr 11, 2019 at 10:58:57PM +0200, SZEDER Gábor wrote: > > > > On Thu, Apr 11, 2019 at 10:51:46PM +0200, SZEDER Gábor wrote: > > > > > On Wed, Apr 10, 2019 at 10:56:52PM +0200, Johannes Schindelin wrote: > > > > > > > ...>} > > > > > > > > > > > > > > @@ > > > > > > > + expression c; > > > > > > > ++expression r; > > > > > > > + expression s; > > > > > > > + @@ > > > > > > > +-- get_commit_tree(c) = s > > > > > > > ++- repo_get_commit_tree(r, c) = s > > > > > > > + + c->maybe_tree = s > > > > > > > > > > > > I think this is wrong, and admittedly I had the very same version > > > > > > originally. > > > > > > > > > > > > When you have an arbitrary `r` in any `repo_get_commit_tree(r, c)` (as > > > > > > opposed to `the_repository`), the conversion to `c->maybe_tree` is most > > > > > > likely incorrect. > > > > > > > > > > > > Therefore, I don't think that we can do that. > > > > > > > > > > So, as far as I understand, the goal of these 'c->maybe_tree'-related > > > > > semantic patches is to prevent "generic" parts of Git from accessing > > > > > this field directly, as it might not be initialized in a > > > > > commit-graph-enabled repository. > > > > > > > > > > Only three functions are explicitly exempt, while this last semantic > > > > > patch in question implicitly allows a few more that assign a value to > > > > > 'c->maybe_tree'. These functions are release_commit_memory() and > > > > > parse_commit_buffer() in 'commit.c' and fill_commit_in_graph() in > > > > > 'commit-graph.c', > > > > > > ... and make_virtual_commit() in 'merge-recursive.c'. > > > > > > > and after a quick look these functions seem to be > > > > > rather fundamenal in the life-cycle of a commit object. > > > > > > > > Erm, not "commit object"; I meant the life-cycle of a 'struct commit' > > > > instance. > > > > > > > > > I think they deserve to be explicitly exempted, too, and then we could > > > > > remove this last semantic patch altogether. > > > > > > And it would look like this. Yeah, that's a very long line there, but > > > I don't think we can break it up. > > > > > > -- >8 -- > > > > > > diff --git a/contrib/coccinelle/commit.cocci b/contrib/coccinelle/commit.cocci > > > index 57c8f71479..fe814f313e 100644 > > > --- a/contrib/coccinelle/commit.cocci > > > +++ b/contrib/coccinelle/commit.cocci > > > @@ -10,20 +10,15 @@ expression c; > > > - c->maybe_tree->object.oid.hash > > > + get_commit_tree_oid(c)->hash > > > > > > -// These excluded functions must access c->maybe_tree direcly. > > > +// These excluded functions must access/modify c->maybe_tree direcly. > > > +// Note that if c->maybe_tree is written somewhere outside of these > > > +// functions, then the recommended transformation will be bogus with > > > +// repo_get_commit_tree() on the LHS. > > > @@ > > > -identifier f !~ "^(repo_get_commit_tree|get_commit_tree_in_graph_one|load_tree_for_commit)$"; > > > -expression c; > > > +identifier f !~ "^(repo_get_commit_tree|get_commit_tree_in_graph_one|load_tree_for_commit|fill_commit_in_graph|parse_commit_buffer|release_commit_memory|make_virtual_commit)$"; > > > > Hahahaha! That's *really* long. > > > > And a good indicator that this should be hidden in a single helper > > function (`set_commit_tree()`, file-local of course) that is exempted in > > the cocci patch. > > Note that this is not only about line length, and consider the slight > differences between the three approaches: > > - Currently only direct read accesses to 'c->maybe_tree' outside of > the listed functions are forbidden and transformed, but still any > function can set this field directly (thanks to the last semantic > patch in 'commit.cocci'). > > - Encapsulating writes in set_commit_tree() and adding this function > to that list would prevent other functions from setting > 'c->maybe_tree' directly, but still any function could set it > indirectly by calling set_commit_tree(). > > - With that loooong line only those few listed special functions > would be able read or write 'c->maybe_tree'. > > Does the additional restriction of the long line variant bring us > benefits? Well, not sure. > > The root tree of a commit is needed in many places, and in the past we > got used to it being always initialized in any 'struct commit' > instance. However, with the commit graph that's not the case anymore, > and any read accesses to the uninitialized root tree object would have > bad consequences. That's why there is get_commit_tree() helper > performing lazy-initialization, and the protection from direct reads > in the form of the semantic patch. And I think set_commit_tree() is a good way to go. Basically "maybe_tree" is not safe to be manipulated directly because of traps and pitfalls. If we have one entry point to update maybe_tree, we can handle all that in one place (even though I don't foresee any special handling when updating it). Keeping commit.cocci shorter is just a side bonus.