mbox series

[v3,00/33] nd/sha1-name-c-wo-the-repository updates

Message ID 20190406113453.5149-1-pclouds@gmail.com (mailing list archive)
Headers show
Series nd/sha1-name-c-wo-the-repository updates | expand

Message

Duy Nguyen April 6, 2019, 11:34 a.m. UTC
Hopefully the final fix for commit.cocci in 11/33. Instead of adding
more commit details on 01/33 I replaced it with the two commits from
Szeder, he put more efforts into them anyway.

Range-diff dựa trên v2:
 1:  b992f6c799 !  1:  aa603ea09e rebase: 'make coccicheck' cleanup
    @@ -1,6 +1,14 @@
    -Author: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
    +Author: SZEDER Gábor <szeder.dev@gmail.com>
     
    -    rebase: 'make coccicheck' cleanup
    +    builtin rebase: use FREE_AND_NULL
    +
    +    Use the macro FREE_AND_NULL to release memory allocated for
    +    'head_name' and clear its pointer.
    +
    +    Patch generated with 'contrib/coccinelle/free.cocci' and Coccinelle
    +    v1.0.7 (previous Coccinelle versions don't notice this).
    +
    +    Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
     
      diff --git a/builtin/rebase.c b/builtin/rebase.c
      --- a/builtin/rebase.c
    @@ -15,12 +23,3 @@
      			branch_name = "HEAD";
      		}
      		if (get_oid("HEAD", &options.orig_head))
    -@@
    - 	 * we just fast-forwarded.
    - 	 */
    - 	strbuf_reset(&msg);
    --	if (!oidcmp(&merge_base, &options.orig_head)) {
    -+	if (oideq(&merge_base, &options.orig_head)) {
    - 		printf(_("Fast-forwarded %s to %s.\n"),
    - 			branch_name, options.onto_name);
    - 		strbuf_addf(&msg, "rebase finished: %s onto %s",
 -:  ---------- >  2:  b03cd28a4e builtin rebase: use oideq()
 2:  e76c73a75a =  3:  3a68454479 packfile.c: add repo_approximate_object_count()
 3:  962a168516 =  4:  9738ab8797 refs.c: add refs_ref_exists()
 4:  d06bea7909 =  5:  1a2cb46fe9 refs.c: add refs_shorten_unambiguous_ref()
 5:  fb880154e6 =  6:  ffdf9ad9bf refs.c: remove the_repo from substitute_branch_name()
 6:  6fd7960522 =  7:  697daf317f refs.c: remove the_repo from expand_ref()
 7:  f5ad3133bc =  8:  fac41b3b1c refs.c: add repo_dwim_ref()
 8:  7552a4085a =  9:  db528690fd refs.c: add repo_dwim_log()
 9:  962fff4c9c = 10:  9318416f94 refs.c: remove the_repo from read_ref_at()
10:  68876a150f ! 11:  848456f59c commit.c: add repo_get_commit_tree()
    @@ -2,6 +2,11 @@
     
         commit.c: add repo_get_commit_tree()
     
    +    Remove the implicit dependency on the_repository in this function.
    +    It will be used in sha1-name.c functions when they are updated to take
    +    any 'struct repository'. get_commit_tree() remains as a compat wrapper,
    +    to be slowly replaced later.
    +
      diff --git a/commit.c b/commit.c
      --- a/commit.c
      +++ b/commit.c
    @@ -29,6 +34,15 @@
      --- a/commit.h
      +++ b/commit.h
     @@
    + 
    + 	/*
    + 	 * If the commit is loaded from the commit-graph file, then this
    +-	 * member may be NULL. Only access it through get_commit_tree()
    ++	 * member may be NULL. Only access it through repo_get_commit_tree()
    + 	 * or get_commit_tree_oid().
    + 	 */
    + 	struct tree *maybe_tree;
    +@@
       */
      void free_commit_buffer(struct parsed_object_pool *pool, struct commit *);
      
    @@ -57,3 +71,10 @@
        ...>}
      
      @@
    + expression c;
    ++expression r;
    + expression s;
    + @@
    +-- get_commit_tree(c) = s
    ++- repo_get_commit_tree(r, c) = s
    + + c->maybe_tree = s
11:  9817d150b2 = 12:  a60c06b63a sha1-name.c: remove the_repo from sort_ambiguous()
12:  7ce23aaeb5 = 13:  3cbbe5de4a sha1-name.c: remove the_repo from find_abbrev_len_packed()
13:  2d99af106f = 14:  a79bf276bd sha1-name.c: add repo_find_unique_abbrev_r()
14:  f63626e937 = 15:  aa0fa191d9 sha1-name.c: store and use repo in struct disambiguate_state
15:  ce9f7ebab8 = 16:  ad00c19d30 sha1-name.c: add repo_for_each_abbrev()
16:  8cb8278d6e = 17:  d9ff762461 sha1-name.c: remove the_repo from get_short_oid()
17:  9210dd01b8 = 18:  46b8a05d8c sha1-name.c: remove the_repo from interpret_nth_prior_checkout()
18:  a161ed33d2 = 19:  52ec37f2d7 sha1-name.c: remove the_repo from interpret_branch_mark()
19:  6bf4df0131 = 20:  f50562b480 sha1-name.c: add repo_interpret_branch_name()
20:  94f707cb7f = 21:  66619e37cb sha1-name.c: remove the_repo from get_oid_oneline()
21:  2a1dd6368c = 22:  d960f47821 sha1-name.c: remove the_repo from get_describe_name()
22:  f958b565db = 23:  f38960304d sha1-name.c: remove the_repo from get_oid_basic()
23:  ec821007b2 = 24:  3b69cfcd8b sha1-name.c: remove the_repo from get_oid_1()
24:  317a365f30 = 25:  e889ae729a sha1-name.c: remove the_repo from handle_one_ref()
25:  3a46ea22b2 = 26:  386b6f4654 sha1-name.c: remove the_repo from diagnose_invalid_index_path()
26:  59a8cb9749 = 27:  d3ff3ee163 sha1-name.c: remove the_repo from resolve_relative_path()
27:  d1f4df1915 = 28:  226a30e6f6 sha1-name.c: remove the_repo from get_oid_with_context_1()
28:  3034e9cf1e = 29:  5d0aed2f70 sha1-name.c: add repo_get_oid()
29:  9c72941ec9 = 30:  72ab26a247 submodule-config.c: use repo_get_oid for reading .gitmodules
30:  40acfb6b82 = 31:  53dc11463d sha1-name.c: remove the_repo from maybe_die_on_misspelt_object_name
31:  3b8be75e77 = 32:  01b4deaa8a sha1-name.c: remove the_repo from other get_oid_*
32:  59cadc5deb = 33:  03a7283ef3 sha1-name.c: remove the_repo from get_oid_mb()

Nguyễn Thái Ngọc Duy (31):
  packfile.c: add repo_approximate_object_count()
  refs.c: add refs_ref_exists()
  refs.c: add refs_shorten_unambiguous_ref()
  refs.c: remove the_repo from substitute_branch_name()
  refs.c: remove the_repo from expand_ref()
  refs.c: add repo_dwim_ref()
  refs.c: add repo_dwim_log()
  refs.c: remove the_repo from read_ref_at()
  commit.c: add repo_get_commit_tree()
  sha1-name.c: remove the_repo from sort_ambiguous()
  sha1-name.c: remove the_repo from find_abbrev_len_packed()
  sha1-name.c: add repo_find_unique_abbrev_r()
  sha1-name.c: store and use repo in struct disambiguate_state
  sha1-name.c: add repo_for_each_abbrev()
  sha1-name.c: remove the_repo from get_short_oid()
  sha1-name.c: remove the_repo from interpret_nth_prior_checkout()
  sha1-name.c: remove the_repo from interpret_branch_mark()
  sha1-name.c: add repo_interpret_branch_name()
  sha1-name.c: remove the_repo from get_oid_oneline()
  sha1-name.c: remove the_repo from get_describe_name()
  sha1-name.c: remove the_repo from get_oid_basic()
  sha1-name.c: remove the_repo from get_oid_1()
  sha1-name.c: remove the_repo from handle_one_ref()
  sha1-name.c: remove the_repo from diagnose_invalid_index_path()
  sha1-name.c: remove the_repo from resolve_relative_path()
  sha1-name.c: remove the_repo from get_oid_with_context_1()
  sha1-name.c: add repo_get_oid()
  submodule-config.c: use repo_get_oid for reading .gitmodules
  sha1-name.c: remove the_repo from maybe_die_on_misspelt_object_name
  sha1-name.c: remove the_repo from other get_oid_*
  sha1-name.c: remove the_repo from get_oid_mb()

SZEDER Gábor (2):
  builtin rebase: use FREE_AND_NULL
  builtin rebase: use oideq()

 builtin/rebase.c                   |   5 +-
 builtin/show-branch.c              |   6 +-
 cache.h                            |  50 ++--
 commit.c                           |   5 +-
 commit.h                           |   5 +-
 contrib/coccinelle/commit.cocci    |   7 +-
 dir.c                              |   8 +
 dir.h                              |   4 +-
 packfile.c                         |  14 +-
 packfile.h                         |   3 +-
 refs.c                             |  71 ++++--
 refs.h                             |   9 +-
 setup.c                            |   7 +-
 sha1-name.c                        | 388 ++++++++++++++++++-----------
 submodule-config.c                 |  20 +-
 t/t7814-grep-recurse-submodules.sh |   6 +-
 upload-pack.c                      |   2 +-
 17 files changed, 378 insertions(+), 232 deletions(-)

Comments

Johannes Schindelin April 10, 2019, 8:56 p.m. UTC | #1
Hi Duy,

On Sat, 6 Apr 2019, Nguyễn Thái Ngọc Duy wrote:

> 10:  68876a150f ! 11:  848456f59c commit.c: add repo_get_commit_tree()
>     @@ -2,6 +2,11 @@
>
>          commit.c: add repo_get_commit_tree()
>
>     +    Remove the implicit dependency on the_repository in this function.
>     +    It will be used in sha1-name.c functions when they are updated to take
>     +    any 'struct repository'. get_commit_tree() remains as a compat wrapper,
>     +    to be slowly replaced later.
>     +
>       diff --git a/commit.c b/commit.c
>       --- a/commit.c
>       +++ b/commit.c
>     @@ -29,6 +34,15 @@
>       --- a/commit.h
>       +++ b/commit.h
>      @@
>     +
>     + 	/*
>     + 	 * If the commit is loaded from the commit-graph file, then this
>     +-	 * member may be NULL. Only access it through get_commit_tree()
>     ++	 * member may be NULL. Only access it through repo_get_commit_tree()
>     + 	 * or get_commit_tree_oid().
>     + 	 */
>     + 	struct tree *maybe_tree;
>     +@@
>        */
>       void free_commit_buffer(struct parsed_object_pool *pool, struct commit *);
>
>     @@ -57,3 +71,10 @@
>         ...>}
>
>       @@
>     + expression c;
>     ++expression r;
>     + expression s;
>     + @@
>     +-- get_commit_tree(c) = s
>     ++- repo_get_commit_tree(r, c) = s
>     + + c->maybe_tree = s

I think this is wrong, and admittedly I had the very same version
originally.

When you have an arbitrary `r` in any `repo_get_commit_tree(r, c)` (as
opposed to `the_repository`), the conversion to `c->maybe_tree` is most
likely incorrect.

Therefore, I don't think that we can do that.

Ciao,
Johannes
Duy Nguyen April 11, 2019, 12:26 a.m. UTC | #2
On Thu, Apr 11, 2019 at 3:56 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> >       @@
> >     + expression c;
> >     ++expression r;
> >     + expression s;
> >     + @@
> >     +-- get_commit_tree(c) = s
> >     ++- repo_get_commit_tree(r, c) = s
> >     + + c->maybe_tree = s
>
> I think this is wrong, and admittedly I had the very same version
> originally.
>
> When you have an arbitrary `r` in any `repo_get_commit_tree(r, c)` (as
> opposed to `the_repository`), the conversion to `c->maybe_tree` is most
> likely incorrect.

I did read the get_commit_tree() source code before doing this. struct
repository is only used to get commit graph to speed things up and we
can't change a thing there when maybe_tree is reassigned. To reassign
maybe_tree, commit-graph does not matter. Neither does the_repository
(vs arbitrary struct repo)

>
> Therefore, I don't think that we can do that.
>
> Ciao,
> Johannes
Johannes Schindelin April 11, 2019, 8:36 p.m. UTC | #3
Hi,

On Thu, 11 Apr 2019, Duy Nguyen wrote:

> On Thu, Apr 11, 2019 at 3:56 AM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> > >       @@
> > >     + expression c;
> > >     ++expression r;
> > >     + expression s;
> > >     + @@
> > >     +-- get_commit_tree(c) = s
> > >     ++- repo_get_commit_tree(r, c) = s
> > >     + + c->maybe_tree = s
> >
> > I think this is wrong, and admittedly I had the very same version
> > originally.
> >
> > When you have an arbitrary `r` in any `repo_get_commit_tree(r, c)` (as
> > opposed to `the_repository`), the conversion to `c->maybe_tree` is most
> > likely incorrect.
>
> I did read the get_commit_tree() source code before doing this. struct
> repository is only used to get commit graph to speed things up and we
> can't change a thing there when maybe_tree is reassigned. To reassign
> maybe_tree, commit-graph does not matter. Neither does the_repository
> (vs arbitrary struct repo)

You read the current code. Obviously, you had no future code to read.

The cocci patch you are changing will affect future code, too.

The safe, and correct, thing to do, then, is to not pretend to know what
future `get_commit_tree()` functions will do, and instead go with the
stricter  version of the cocci patch, as I had suggested.

Ciao,
Johannes

> >
> > Therefore, I don't think that we can do that.
> >
> > Ciao,
> > Johannes
>
>
>
> --
> Duy
>
SZEDER Gábor April 11, 2019, 8:51 p.m. UTC | #4
On Wed, Apr 10, 2019 at 10:56:52PM +0200, Johannes Schindelin wrote:
> Hi Duy,
> 
> On Sat, 6 Apr 2019, Nguyễn Thái Ngọc Duy wrote:
> 
> > 10:  68876a150f ! 11:  848456f59c commit.c: add repo_get_commit_tree()
> >     @@ -2,6 +2,11 @@
> >
> >          commit.c: add repo_get_commit_tree()
> >
> >     +    Remove the implicit dependency on the_repository in this function.
> >     +    It will be used in sha1-name.c functions when they are updated to take
> >     +    any 'struct repository'. get_commit_tree() remains as a compat wrapper,
> >     +    to be slowly replaced later.
> >     +
> >       diff --git a/commit.c b/commit.c
> >       --- a/commit.c
> >       +++ b/commit.c
> >     @@ -29,6 +34,15 @@
> >       --- a/commit.h
> >       +++ b/commit.h
> >      @@
> >     +
> >     + 	/*
> >     + 	 * If the commit is loaded from the commit-graph file, then this
> >     +-	 * member may be NULL. Only access it through get_commit_tree()
> >     ++	 * member may be NULL. Only access it through repo_get_commit_tree()
> >     + 	 * or get_commit_tree_oid().
> >     + 	 */
> >     + 	struct tree *maybe_tree;
> >     +@@
> >        */
> >       void free_commit_buffer(struct parsed_object_pool *pool, struct commit *);
> >
> >     @@ -57,3 +71,10 @@
> >         ...>}
> >
> >       @@
> >     + expression c;
> >     ++expression r;
> >     + expression s;
> >     + @@
> >     +-- get_commit_tree(c) = s
> >     ++- repo_get_commit_tree(r, c) = s
> >     + + c->maybe_tree = s
> 
> I think this is wrong, and admittedly I had the very same version
> originally.
> 
> When you have an arbitrary `r` in any `repo_get_commit_tree(r, c)` (as
> opposed to `the_repository`), the conversion to `c->maybe_tree` is most
> likely incorrect.
> 
> Therefore, I don't think that we can do that.

So, as far as I understand, the goal of these 'c->maybe_tree'-related
semantic patches is to prevent "generic" parts of Git from accessing
this field directly, as it might not be initialized in a
commit-graph-enabled repository.

Only three functions are explicitly exempt, while this last semantic
patch in question implicitly allows a few more that assign a value to
'c->maybe_tree'.  These functions are release_commit_memory() and
parse_commit_buffer() in 'commit.c' and fill_commit_in_graph() in
'commit-graph.c', and after a quick look these functions seem to be
rather fundamenal in the life-cycle of a commit object.

I think they deserve to be explicitly exempted, too, and then we could
remove this last semantic patch altogether.
SZEDER Gábor April 11, 2019, 8:58 p.m. UTC | #5
On Thu, Apr 11, 2019 at 10:51:46PM +0200, SZEDER Gábor wrote:
> On Wed, Apr 10, 2019 at 10:56:52PM +0200, Johannes Schindelin wrote:
> > >         ...>}
> > >
> > >       @@
> > >     + expression c;
> > >     ++expression r;
> > >     + expression s;
> > >     + @@
> > >     +-- get_commit_tree(c) = s
> > >     ++- repo_get_commit_tree(r, c) = s
> > >     + + c->maybe_tree = s
> > 
> > I think this is wrong, and admittedly I had the very same version
> > originally.
> > 
> > When you have an arbitrary `r` in any `repo_get_commit_tree(r, c)` (as
> > opposed to `the_repository`), the conversion to `c->maybe_tree` is most
> > likely incorrect.
> > 
> > Therefore, I don't think that we can do that.
> 
> So, as far as I understand, the goal of these 'c->maybe_tree'-related
> semantic patches is to prevent "generic" parts of Git from accessing
> this field directly, as it might not be initialized in a
> commit-graph-enabled repository.
> 
> Only three functions are explicitly exempt, while this last semantic
> patch in question implicitly allows a few more that assign a value to
> 'c->maybe_tree'.  These functions are release_commit_memory() and
> parse_commit_buffer() in 'commit.c' and fill_commit_in_graph() in
> 'commit-graph.c', and after a quick look these functions seem to be
> rather fundamenal in the life-cycle of a commit object.

Erm, not "commit object"; I meant the life-cycle of a 'struct commit'
instance.

> I think they deserve to be explicitly exempted, too, and then we could
> remove this last semantic patch altogether.
> 
>
SZEDER Gábor April 12, 2019, 12:17 a.m. UTC | #6
On Thu, Apr 11, 2019 at 10:58:57PM +0200, SZEDER Gábor wrote:
> On Thu, Apr 11, 2019 at 10:51:46PM +0200, SZEDER Gábor wrote:
> > On Wed, Apr 10, 2019 at 10:56:52PM +0200, Johannes Schindelin wrote:
> > > >         ...>}
> > > >
> > > >       @@
> > > >     + expression c;
> > > >     ++expression r;
> > > >     + expression s;
> > > >     + @@
> > > >     +-- get_commit_tree(c) = s
> > > >     ++- repo_get_commit_tree(r, c) = s
> > > >     + + c->maybe_tree = s
> > > 
> > > I think this is wrong, and admittedly I had the very same version
> > > originally.
> > > 
> > > When you have an arbitrary `r` in any `repo_get_commit_tree(r, c)` (as
> > > opposed to `the_repository`), the conversion to `c->maybe_tree` is most
> > > likely incorrect.
> > > 
> > > Therefore, I don't think that we can do that.
> > 
> > So, as far as I understand, the goal of these 'c->maybe_tree'-related
> > semantic patches is to prevent "generic" parts of Git from accessing
> > this field directly, as it might not be initialized in a
> > commit-graph-enabled repository.
> > 
> > Only three functions are explicitly exempt, while this last semantic
> > patch in question implicitly allows a few more that assign a value to
> > 'c->maybe_tree'.  These functions are release_commit_memory() and
> > parse_commit_buffer() in 'commit.c' and fill_commit_in_graph() in
> > 'commit-graph.c', 

... and make_virtual_commit() in 'merge-recursive.c'.

> and after a quick look these functions seem to be
> > rather fundamenal in the life-cycle of a commit object.
> 
> Erm, not "commit object"; I meant the life-cycle of a 'struct commit'
> instance.
> 
> > I think they deserve to be explicitly exempted, too, and then we could
> > remove this last semantic patch altogether.

And it would look like this.  Yeah, that's a very long line there, but
I don't think we can break it up.

  -- >8 --

diff --git a/contrib/coccinelle/commit.cocci b/contrib/coccinelle/commit.cocci
index 57c8f71479..fe814f313e 100644
--- a/contrib/coccinelle/commit.cocci
+++ b/contrib/coccinelle/commit.cocci
@@ -10,20 +10,15 @@ expression c;
 - c->maybe_tree->object.oid.hash
 + get_commit_tree_oid(c)->hash
 
-// These excluded functions must access c->maybe_tree direcly.
+// These excluded functions must access/modify c->maybe_tree direcly.
+// Note that if c->maybe_tree is written somewhere outside of these
+// functions, then the recommended transformation will be bogus with
+// repo_get_commit_tree() on the LHS.
 @@
-identifier f !~ "^(repo_get_commit_tree|get_commit_tree_in_graph_one|load_tree_for_commit)$";
-expression c;
+identifier f !~ "^(repo_get_commit_tree|get_commit_tree_in_graph_one|load_tree_for_commit|fill_commit_in_graph|parse_commit_buffer|release_commit_memory|make_virtual_commit)$";
+struct commit *c;
 @@
   f(...) {<...
 - c->maybe_tree
 + repo_get_commit_tree(the_repository, c)
   ...>}
-
-@@
-expression c;
-expression r;
-expression s;
-@@
-- repo_get_commit_tree(r, c) = s
-+ c->maybe_tree = s
Johannes Schindelin April 12, 2019, 2:25 p.m. UTC | #7
Hi,

On Fri, 12 Apr 2019, SZEDER Gábor wrote:

> On Thu, Apr 11, 2019 at 10:58:57PM +0200, SZEDER Gábor wrote:
> > On Thu, Apr 11, 2019 at 10:51:46PM +0200, SZEDER Gábor wrote:
> > > On Wed, Apr 10, 2019 at 10:56:52PM +0200, Johannes Schindelin wrote:
> > > > >         ...>}
> > > > >
> > > > >       @@
> > > > >     + expression c;
> > > > >     ++expression r;
> > > > >     + expression s;
> > > > >     + @@
> > > > >     +-- get_commit_tree(c) = s
> > > > >     ++- repo_get_commit_tree(r, c) = s
> > > > >     + + c->maybe_tree = s
> > > >
> > > > I think this is wrong, and admittedly I had the very same version
> > > > originally.
> > > >
> > > > When you have an arbitrary `r` in any `repo_get_commit_tree(r, c)` (as
> > > > opposed to `the_repository`), the conversion to `c->maybe_tree` is most
> > > > likely incorrect.
> > > >
> > > > Therefore, I don't think that we can do that.
> > >
> > > So, as far as I understand, the goal of these 'c->maybe_tree'-related
> > > semantic patches is to prevent "generic" parts of Git from accessing
> > > this field directly, as it might not be initialized in a
> > > commit-graph-enabled repository.
> > >
> > > Only three functions are explicitly exempt, while this last semantic
> > > patch in question implicitly allows a few more that assign a value to
> > > 'c->maybe_tree'.  These functions are release_commit_memory() and
> > > parse_commit_buffer() in 'commit.c' and fill_commit_in_graph() in
> > > 'commit-graph.c',
>
> ... and make_virtual_commit() in 'merge-recursive.c'.
>
> > and after a quick look these functions seem to be
> > > rather fundamenal in the life-cycle of a commit object.
> >
> > Erm, not "commit object"; I meant the life-cycle of a 'struct commit'
> > instance.
> >
> > > I think they deserve to be explicitly exempted, too, and then we could
> > > remove this last semantic patch altogether.
>
> And it would look like this.  Yeah, that's a very long line there, but
> I don't think we can break it up.
>
>   -- >8 --
>
> diff --git a/contrib/coccinelle/commit.cocci b/contrib/coccinelle/commit.cocci
> index 57c8f71479..fe814f313e 100644
> --- a/contrib/coccinelle/commit.cocci
> +++ b/contrib/coccinelle/commit.cocci
> @@ -10,20 +10,15 @@ expression c;
>  - c->maybe_tree->object.oid.hash
>  + get_commit_tree_oid(c)->hash
>
> -// These excluded functions must access c->maybe_tree direcly.
> +// These excluded functions must access/modify c->maybe_tree direcly.
> +// Note that if c->maybe_tree is written somewhere outside of these
> +// functions, then the recommended transformation will be bogus with
> +// repo_get_commit_tree() on the LHS.
>  @@
> -identifier f !~ "^(repo_get_commit_tree|get_commit_tree_in_graph_one|load_tree_for_commit)$";
> -expression c;
> +identifier f !~ "^(repo_get_commit_tree|get_commit_tree_in_graph_one|load_tree_for_commit|fill_commit_in_graph|parse_commit_buffer|release_commit_memory|make_virtual_commit)$";

Hahahaha! That's *really* long.

And a good indicator that this should be hidden in a single helper
function (`set_commit_tree()`, file-local of course) that is exempted in
the cocci patch.

Ciao,
Dscho

> +struct commit *c;
>  @@
>    f(...) {<...
>  - c->maybe_tree
>  + repo_get_commit_tree(the_repository, c)
>    ...>}
> -
> -@@
> -expression c;
> -expression r;
> -expression s;
> -@@
> -- repo_get_commit_tree(r, c) = s
> -+ c->maybe_tree = s
>
SZEDER Gábor April 13, 2019, 12:14 p.m. UTC | #8
On Fri, Apr 12, 2019 at 04:25:08PM +0200, Johannes Schindelin wrote:
> Hi,
> 
> On Fri, 12 Apr 2019, SZEDER Gábor wrote:
> 
> > On Thu, Apr 11, 2019 at 10:58:57PM +0200, SZEDER Gábor wrote:
> > > On Thu, Apr 11, 2019 at 10:51:46PM +0200, SZEDER Gábor wrote:
> > > > On Wed, Apr 10, 2019 at 10:56:52PM +0200, Johannes Schindelin wrote:
> > > > > >         ...>}
> > > > > >
> > > > > >       @@
> > > > > >     + expression c;
> > > > > >     ++expression r;
> > > > > >     + expression s;
> > > > > >     + @@
> > > > > >     +-- get_commit_tree(c) = s
> > > > > >     ++- repo_get_commit_tree(r, c) = s
> > > > > >     + + c->maybe_tree = s
> > > > >
> > > > > I think this is wrong, and admittedly I had the very same version
> > > > > originally.
> > > > >
> > > > > When you have an arbitrary `r` in any `repo_get_commit_tree(r, c)` (as
> > > > > opposed to `the_repository`), the conversion to `c->maybe_tree` is most
> > > > > likely incorrect.
> > > > >
> > > > > Therefore, I don't think that we can do that.
> > > >
> > > > So, as far as I understand, the goal of these 'c->maybe_tree'-related
> > > > semantic patches is to prevent "generic" parts of Git from accessing
> > > > this field directly, as it might not be initialized in a
> > > > commit-graph-enabled repository.
> > > >
> > > > Only three functions are explicitly exempt, while this last semantic
> > > > patch in question implicitly allows a few more that assign a value to
> > > > 'c->maybe_tree'.  These functions are release_commit_memory() and
> > > > parse_commit_buffer() in 'commit.c' and fill_commit_in_graph() in
> > > > 'commit-graph.c',
> >
> > ... and make_virtual_commit() in 'merge-recursive.c'.
> >
> > > and after a quick look these functions seem to be
> > > > rather fundamenal in the life-cycle of a commit object.
> > >
> > > Erm, not "commit object"; I meant the life-cycle of a 'struct commit'
> > > instance.
> > >
> > > > I think they deserve to be explicitly exempted, too, and then we could
> > > > remove this last semantic patch altogether.
> >
> > And it would look like this.  Yeah, that's a very long line there, but
> > I don't think we can break it up.
> >
> >   -- >8 --
> >
> > diff --git a/contrib/coccinelle/commit.cocci b/contrib/coccinelle/commit.cocci
> > index 57c8f71479..fe814f313e 100644
> > --- a/contrib/coccinelle/commit.cocci
> > +++ b/contrib/coccinelle/commit.cocci
> > @@ -10,20 +10,15 @@ expression c;
> >  - c->maybe_tree->object.oid.hash
> >  + get_commit_tree_oid(c)->hash
> >
> > -// These excluded functions must access c->maybe_tree direcly.
> > +// These excluded functions must access/modify c->maybe_tree direcly.
> > +// Note that if c->maybe_tree is written somewhere outside of these
> > +// functions, then the recommended transformation will be bogus with
> > +// repo_get_commit_tree() on the LHS.
> >  @@
> > -identifier f !~ "^(repo_get_commit_tree|get_commit_tree_in_graph_one|load_tree_for_commit)$";
> > -expression c;
> > +identifier f !~ "^(repo_get_commit_tree|get_commit_tree_in_graph_one|load_tree_for_commit|fill_commit_in_graph|parse_commit_buffer|release_commit_memory|make_virtual_commit)$";
> 
> Hahahaha! That's *really* long.
> 
> And a good indicator that this should be hidden in a single helper
> function (`set_commit_tree()`, file-local of course) that is exempted in
> the cocci patch.

Note that this is not only about line length, and consider the slight
differences between the three approaches:

  - Currently only direct read accesses to 'c->maybe_tree' outside of
    the listed functions are forbidden and transformed, but still any
    function can set this field directly (thanks to the last semantic
    patch in 'commit.cocci').

  - Encapsulating writes in set_commit_tree() and adding this function
    to that list would prevent other functions from setting
    'c->maybe_tree' directly, but still any function could set it
    indirectly by calling set_commit_tree().

  - With that loooong line only those few listed special functions
    would be able read or write 'c->maybe_tree'.

Does the additional restriction of the long line variant bring us
benefits?  Well, not sure.

The root tree of a commit is needed in many places, and in the past we
got used to it being always initialized in any 'struct commit'
instance.  However, with the commit graph that's not the case anymore,
and any read accesses to the uninitialized root tree object would have
bad consequences.  That's why there is get_commit_tree() helper
performing lazy-initialization, and the protection from direct reads
in the form of the semantic patch.

OTOH, apart from five functions, most parts of Git simply don't want
to create new 'struct commit' instances themselves, or, more
generally, set anything in a 'struct commit', so arguably there is not
that much danger to protect ourselves from.

Anyway, I just wanted to make sure that we fully understand the
implications, and I think any solution is an improvement that
eliminates the current "let's transform this code pattern, but then
quickly transform it back in some cases" back-and-forth.


> > +struct commit *c;
> >  @@
> >    f(...) {<...
> >  - c->maybe_tree
> >  + repo_get_commit_tree(the_repository, c)
> >    ...>}
> > -
> > -@@
> > -expression c;
> > -expression r;
> > -expression s;
> > -@@
> > -- repo_get_commit_tree(r, c) = s
> > -+ c->maybe_tree = s
> >
Duy Nguyen April 13, 2019, 2 p.m. UTC | #9
On Sat, Apr 13, 2019 at 7:14 PM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> On Fri, Apr 12, 2019 at 04:25:08PM +0200, Johannes Schindelin wrote:
> > Hi,
> >
> > On Fri, 12 Apr 2019, SZEDER Gábor wrote:
> >
> > > On Thu, Apr 11, 2019 at 10:58:57PM +0200, SZEDER Gábor wrote:
> > > > On Thu, Apr 11, 2019 at 10:51:46PM +0200, SZEDER Gábor wrote:
> > > > > On Wed, Apr 10, 2019 at 10:56:52PM +0200, Johannes Schindelin wrote:
> > > > > > >         ...>}
> > > > > > >
> > > > > > >       @@
> > > > > > >     + expression c;
> > > > > > >     ++expression r;
> > > > > > >     + expression s;
> > > > > > >     + @@
> > > > > > >     +-- get_commit_tree(c) = s
> > > > > > >     ++- repo_get_commit_tree(r, c) = s
> > > > > > >     + + c->maybe_tree = s
> > > > > >
> > > > > > I think this is wrong, and admittedly I had the very same version
> > > > > > originally.
> > > > > >
> > > > > > When you have an arbitrary `r` in any `repo_get_commit_tree(r, c)` (as
> > > > > > opposed to `the_repository`), the conversion to `c->maybe_tree` is most
> > > > > > likely incorrect.
> > > > > >
> > > > > > Therefore, I don't think that we can do that.
> > > > >
> > > > > So, as far as I understand, the goal of these 'c->maybe_tree'-related
> > > > > semantic patches is to prevent "generic" parts of Git from accessing
> > > > > this field directly, as it might not be initialized in a
> > > > > commit-graph-enabled repository.
> > > > >
> > > > > Only three functions are explicitly exempt, while this last semantic
> > > > > patch in question implicitly allows a few more that assign a value to
> > > > > 'c->maybe_tree'.  These functions are release_commit_memory() and
> > > > > parse_commit_buffer() in 'commit.c' and fill_commit_in_graph() in
> > > > > 'commit-graph.c',
> > >
> > > ... and make_virtual_commit() in 'merge-recursive.c'.
> > >
> > > > and after a quick look these functions seem to be
> > > > > rather fundamenal in the life-cycle of a commit object.
> > > >
> > > > Erm, not "commit object"; I meant the life-cycle of a 'struct commit'
> > > > instance.
> > > >
> > > > > I think they deserve to be explicitly exempted, too, and then we could
> > > > > remove this last semantic patch altogether.
> > >
> > > And it would look like this.  Yeah, that's a very long line there, but
> > > I don't think we can break it up.
> > >
> > >   -- >8 --
> > >
> > > diff --git a/contrib/coccinelle/commit.cocci b/contrib/coccinelle/commit.cocci
> > > index 57c8f71479..fe814f313e 100644
> > > --- a/contrib/coccinelle/commit.cocci
> > > +++ b/contrib/coccinelle/commit.cocci
> > > @@ -10,20 +10,15 @@ expression c;
> > >  - c->maybe_tree->object.oid.hash
> > >  + get_commit_tree_oid(c)->hash
> > >
> > > -// These excluded functions must access c->maybe_tree direcly.
> > > +// These excluded functions must access/modify c->maybe_tree direcly.
> > > +// Note that if c->maybe_tree is written somewhere outside of these
> > > +// functions, then the recommended transformation will be bogus with
> > > +// repo_get_commit_tree() on the LHS.
> > >  @@
> > > -identifier f !~ "^(repo_get_commit_tree|get_commit_tree_in_graph_one|load_tree_for_commit)$";
> > > -expression c;
> > > +identifier f !~ "^(repo_get_commit_tree|get_commit_tree_in_graph_one|load_tree_for_commit|fill_commit_in_graph|parse_commit_buffer|release_commit_memory|make_virtual_commit)$";
> >
> > Hahahaha! That's *really* long.
> >
> > And a good indicator that this should be hidden in a single helper
> > function (`set_commit_tree()`, file-local of course) that is exempted in
> > the cocci patch.
>
> Note that this is not only about line length, and consider the slight
> differences between the three approaches:
>
>   - Currently only direct read accesses to 'c->maybe_tree' outside of
>     the listed functions are forbidden and transformed, but still any
>     function can set this field directly (thanks to the last semantic
>     patch in 'commit.cocci').
>
>   - Encapsulating writes in set_commit_tree() and adding this function
>     to that list would prevent other functions from setting
>     'c->maybe_tree' directly, but still any function could set it
>     indirectly by calling set_commit_tree().
>
>   - With that loooong line only those few listed special functions
>     would be able read or write 'c->maybe_tree'.
>
> Does the additional restriction of the long line variant bring us
> benefits?  Well, not sure.
>
> The root tree of a commit is needed in many places, and in the past we
> got used to it being always initialized in any 'struct commit'
> instance.  However, with the commit graph that's not the case anymore,
> and any read accesses to the uninitialized root tree object would have
> bad consequences.  That's why there is get_commit_tree() helper
> performing lazy-initialization, and the protection from direct reads
> in the form of the semantic patch.

And I think set_commit_tree() is a good way to go. Basically
"maybe_tree" is not safe to be manipulated directly because of traps
and pitfalls. If we have one entry point to update maybe_tree, we can
handle all that in one place (even though I don't foresee any special
handling when updating it). Keeping commit.cocci shorter is just a
side bonus.