[5/5] path.c: don't call the match function without value in trie_find()
diff mbox series

Message ID 20191021160043.701-6-szeder.dev@gmail.com
State New
Headers show
Series
  • path.c: a couple of common dir/trie fixes
Related show

Commit Message

SZEDER Gábor Oct. 21, 2019, 4 p.m. UTC
'logs/refs' is not a working tree-specific path, but since commit
b9317d55a3 (Make sure refs/rewritten/ is per-worktree, 2019-03-07)
'git rev-parse --git-path' has been returning a bogus path if a
trailing '/' is present:

  $ git -C WT/ rev-parse --git-path logs/refs --git-path logs/refs/
  /home/szeder/src/git/.git/logs/refs
  /home/szeder/src/git/.git/worktrees/WT/logs/refs/

We use a trie data structure to efficiently decide whether a path
belongs to the common dir or is working tree-specific.  As it happens
b9317d55a3 triggered a bug that is as old as the trie implementation
itself, added in 4e09cf2acf (path: optimize common dir checking,
2015-08-31).

  - According to the comment describing trie_find(), it should only
    call the given match function 'fn' for a "/-or-\0-terminated
    prefix of the key for which the trie contains a value".  This is
    not true: there are three places where trie_find() calls the match
    function, but one of them is missing the check for value's
    existence.

  - b9317d55a3 added two new keys to the trie: 'logs/refs/rewritten'
    and 'logs/refs/worktree', next to the already existing
    'logs/refs/bisect'.  This resulted in a trie node with the path
    'logs/refs', which didn't exist before, and which doesn't have a
    value attached.  A query for 'logs/refs/' finds this node and then
    hits that one callsite of the match function which doesn't check
    for the value's existence, and thus invokes the match function
    with NULL as value.

  - When the match function check_common() is invoked with a NULL
    value, it returns 0, which indicates that the queried path doesn't
    belong to the common directory, ultimately resulting the bogus
    path shown above.

Add the missing condition to trie_find() so it will never invoke the
match function with a non-existing value.  check_common() will then no
longer have to check that it got a non-NULL value, so remove that
condition.

I believe that there are no other paths that could cause similar bogus
output.  AFAICT the only other key resulting in the match function
being called with a NULL value is 'co' (because of the keys 'common'
and 'config').  However, as they are not in a directory that belongs
to the common directory the resulting working tree-specific path is
expected.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 path.c                | 11 ++++++-----
 t/t0060-path-utils.sh |  2 ++
 2 files changed, 8 insertions(+), 5 deletions(-)

Comments

David Turner Oct. 21, 2019, 5:39 p.m. UTC | #1
On Mon, 2019-10-21 at 18:00 +0200, SZEDER Gábor wrote:
> Add the missing condition to trie_find() so it will never invoke the
> match function with a non-existing value.  check_common() will then
> no
> longer have to check that it got a non-NULL value, so remove that
> condition.
...
>  
>  	/* Partial path normalization: skip consecutive slashes */
>  	while (key[0] == '/' && key[1] == '/')
> @@ -345,9 +349,6 @@ static int check_common(const char *unmatched,
> void *value, void *baton)
>  {
>  	struct common_dir *dir = value;
>  
> -	if (!dir)
> -		return 0;


Do we want to assert(dir) here?

Overall, LGTM.  Thanks for the clean-up.
SZEDER Gábor Oct. 21, 2019, 8:57 p.m. UTC | #2
On Mon, Oct 21, 2019 at 06:00:43PM +0200, SZEDER Gábor wrote:
> 'logs/refs' is not a working tree-specific path, but since commit
> b9317d55a3 (Make sure refs/rewritten/ is per-worktree, 2019-03-07)
> 'git rev-parse --git-path' has been returning a bogus path if a
> trailing '/' is present:
> 
>   $ git -C WT/ rev-parse --git-path logs/refs --git-path logs/refs/
>   /home/szeder/src/git/.git/logs/refs
>   /home/szeder/src/git/.git/worktrees/WT/logs/refs/
> 
> We use a trie data structure to efficiently decide whether a path
> belongs to the common dir or is working tree-specific.  As it happens
> b9317d55a3 triggered a bug that is as old as the trie implementation
> itself, added in 4e09cf2acf (path: optimize common dir checking,
> 2015-08-31).
> 
>   - According to the comment describing trie_find(), it should only
>     call the given match function 'fn' for a "/-or-\0-terminated
>     prefix of the key for which the trie contains a value".  This is
>     not true: there are three places where trie_find() calls the match
>     function, but one of them is missing the check for value's
>     existence.
> 
>   - b9317d55a3 added two new keys to the trie: 'logs/refs/rewritten'
>     and 'logs/refs/worktree', next to the already existing
>     'logs/refs/bisect'.  This resulted in a trie node with the path
>     'logs/refs', which didn't exist before, and which doesn't have a

Oops, I missed the trailing slash, that must be 'logs/refs/'!

>     value attached.  A query for 'logs/refs/' finds this node and then
>     hits that one callsite of the match function which doesn't check
>     for the value's existence, and thus invokes the match function
>     with NULL as value.
Junio C Hamano Oct. 23, 2019, 4:01 a.m. UTC | #3
SZEDER Gábor <szeder.dev@gmail.com> writes:

>>   - b9317d55a3 added two new keys to the trie: 'logs/refs/rewritten'
>>     and 'logs/refs/worktree', next to the already existing
>>     'logs/refs/bisect'.  This resulted in a trie node with the path
>>     'logs/refs', which didn't exist before, and which doesn't have a
>
> Oops, I missed the trailing slash, that must be 'logs/refs/'!
>
>>     value attached.  A query for 'logs/refs/' finds this node and then
>>     hits that one callsite of the match function which doesn't check
>>     for the value's existence, and thus invokes the match function
>>     with NULL as value.

Given that the trie is maintained by hand in common_list[], I wonder
if we can mechanically catch errors like the one b9317d55a3 added,
by perhaps having a self-test function that a t/helper/ program
calls to perform consistency check after the "git" gets built.

Thanks.
SZEDER Gábor Oct. 23, 2019, 4:20 p.m. UTC | #4
On Wed, Oct 23, 2019 at 01:01:00PM +0900, Junio C Hamano wrote:
> SZEDER Gábor <szeder.dev@gmail.com> writes:
> 
> >>   - b9317d55a3 added two new keys to the trie: 'logs/refs/rewritten'
> >>     and 'logs/refs/worktree', next to the already existing
> >>     'logs/refs/bisect'.  This resulted in a trie node with the path
> >>     'logs/refs', which didn't exist before, and which doesn't have a
> >
> > Oops, I missed the trailing slash, that must be 'logs/refs/'!
> >
> >>     value attached.  A query for 'logs/refs/' finds this node and then
> >>     hits that one callsite of the match function which doesn't check
> >>     for the value's existence, and thus invokes the match function
> >>     with NULL as value.
> 
> Given that the trie is maintained by hand in common_list[], I wonder
> if we can mechanically catch errors like the one b9317d55a3 added,
> by perhaps having a self-test function that a t/helper/ program
> calls to perform consistency check after the "git" gets built.

I'm not sure what you mean by "consistency check".  The resulting trie
looked as expected both before and after b9317d55a3, i.e. each trie
node had the right contents, value, and children, as far as I could
tell.  The issue was in the lookup function.
Junio C Hamano Oct. 24, 2019, 3:29 a.m. UTC | #5
SZEDER Gábor <szeder.dev@gmail.com> writes:

> I'm not sure what you mean by "consistency check".  The resulting trie
> looked as expected both before and after b9317d55a3, i.e. each trie
> node had the right contents, value, and children, as far as I could
> tell.  The issue was in the lookup function.

I saw the change to the code as a band-aid, which wouldn't have been
necessary if we had the missing refs/ entry.  Fully populating the
leading levels explicitly may give the application more flexibility
by yielding results different from hardcoded -1 like the patched
code gives.  

But perhaps treating a path that matches a missing intermediate
level just like a path that did not match anything in the trie is
what we want anyway, so I guess the code change was the right thing
(as opposed to a band-aid).

Thanks.
Johannes Schindelin Oct. 28, 2019, 10:57 a.m. UTC | #6
Hi Gábor,

On Mon, 21 Oct 2019, SZEDER Gábor wrote:

> 'logs/refs' is not a working tree-specific path, but since commit
> b9317d55a3 (Make sure refs/rewritten/ is per-worktree, 2019-03-07)
> 'git rev-parse --git-path' has been returning a bogus path if a
> trailing '/' is present:
>
>   $ git -C WT/ rev-parse --git-path logs/refs --git-path logs/refs/
>   /home/szeder/src/git/.git/logs/refs
>   /home/szeder/src/git/.git/worktrees/WT/logs/refs/
>
> We use a trie data structure to efficiently decide whether a path
> belongs to the common dir or is working tree-specific.  As it happens
> b9317d55a3 triggered a bug that is as old as the trie implementation
> itself, added in 4e09cf2acf (path: optimize common dir checking,
> 2015-08-31).
>
>   - According to the comment describing trie_find(), it should only
>     call the given match function 'fn' for a "/-or-\0-terminated
>     prefix of the key for which the trie contains a value".  This is
>     not true: there are three places where trie_find() calls the match
>     function, but one of them is missing the check for value's
>     existence.
>
>   - b9317d55a3 added two new keys to the trie: 'logs/refs/rewritten'
>     and 'logs/refs/worktree', next to the already existing
>     'logs/refs/bisect'.  This resulted in a trie node with the path
>     'logs/refs', which didn't exist before, and which doesn't have a
>     value attached.  A query for 'logs/refs/' finds this node and then
>     hits that one callsite of the match function which doesn't check
>     for the value's existence, and thus invokes the match function
>     with NULL as value.
>
>   - When the match function check_common() is invoked with a NULL
>     value, it returns 0, which indicates that the queried path doesn't
>     belong to the common directory, ultimately resulting the bogus
>     path shown above.
>
> Add the missing condition to trie_find() so it will never invoke the
> match function with a non-existing value.  check_common() will then no
> longer have to check that it got a non-NULL value, so remove that
> condition.
>
> I believe that there are no other paths that could cause similar bogus
> output.  AFAICT the only other key resulting in the match function
> being called with a NULL value is 'co' (because of the keys 'common'
> and 'config').  However, as they are not in a directory that belongs
> to the common directory the resulting working tree-specific path is
> expected.
>
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>

Thank you for this entire patch series. Just one nit:


> diff --git a/path.c b/path.c
> index cf57bd52dd..e21b00c4d4 100644
> --- a/path.c
> +++ b/path.c
> @@ -299,9 +299,13 @@ static int trie_find(struct trie *root, const char *key, match_fn fn,
>
>  	/* Matched the entire compressed section */
>  	key += i;
> -	if (!*key)
> +	if (!*key) {
>  		/* End of key */
> -		return fn(key, root->value, baton);
> +		if (root->value)
> +			return fn(key, root->value, baton);
> +		else
> +			return -1;

I would have preferred this:

+		if (!root->value)
+			return -1;
+		return fn(key, root->value, baton);

... as it would more accurately reflect my mental model of an "early
out".

But as I said, this is just a nit-pick.

Thank you for working on this!
Dscho

> +	}
>
>  	/* Partial path normalization: skip consecutive slashes */
>  	while (key[0] == '/' && key[1] == '/')
> @@ -345,9 +349,6 @@ static int check_common(const char *unmatched, void *value, void *baton)
>  {
>  	struct common_dir *dir = value;
>
> -	if (!dir)
> -		return 0;
> -
>  	if (dir->is_dir && (unmatched[0] == 0 || unmatched[0] == '/'))
>  		return dir->is_common;
>
> diff --git a/t/t0060-path-utils.sh b/t/t0060-path-utils.sh
> index c7b53e494b..501e1a288d 100755
> --- a/t/t0060-path-utils.sh
> +++ b/t/t0060-path-utils.sh
> @@ -288,6 +288,8 @@ test_git_path GIT_COMMON_DIR=bar index                    .git/index
>  test_git_path GIT_COMMON_DIR=bar HEAD                     .git/HEAD
>  test_git_path GIT_COMMON_DIR=bar logs/HEAD                .git/logs/HEAD
>  test_git_path GIT_COMMON_DIR=bar logs/refs/bisect/foo     .git/logs/refs/bisect/foo
> +test_git_path GIT_COMMON_DIR=bar logs/refs                bar/logs/refs
> +test_git_path GIT_COMMON_DIR=bar logs/refs/               bar/logs/refs/
>  test_git_path GIT_COMMON_DIR=bar logs/refs/bisec/foo      bar/logs/refs/bisec/foo
>  test_git_path GIT_COMMON_DIR=bar logs/refs/bisec          bar/logs/refs/bisec
>  test_git_path GIT_COMMON_DIR=bar logs/refs/bisectfoo      bar/logs/refs/bisectfoo
> --
> 2.24.0.rc0.472.ga6f06c86b4
>
>
SZEDER Gábor Oct. 28, 2019, noon UTC | #7
On Mon, Oct 28, 2019 at 11:57:10AM +0100, Johannes Schindelin wrote:
> >   - According to the comment describing trie_find(), it should only
> >     call the given match function 'fn' for a "/-or-\0-terminated
> >     prefix of the key for which the trie contains a value".  This is
> >     not true: there are three places where trie_find() calls the match
> >     function, but one of them is missing the check for value's
> >     existence.

> Thank you for this entire patch series. Just one nit:
> 
> 
> > diff --git a/path.c b/path.c
> > index cf57bd52dd..e21b00c4d4 100644
> > --- a/path.c
> > +++ b/path.c
> > @@ -299,9 +299,13 @@ static int trie_find(struct trie *root, const char *key, match_fn fn,
> >
> >  	/* Matched the entire compressed section */
> >  	key += i;
> > -	if (!*key)
> > +	if (!*key) {
> >  		/* End of key */
> > -		return fn(key, root->value, baton);
> > +		if (root->value)
> > +			return fn(key, root->value, baton);
> > +		else
> > +			return -1;
> 
> I would have preferred this:
> 
> +		if (!root->value)
> +			return -1;
> +		return fn(key, root->value, baton);
> 
> ... as it would more accurately reflect my mental model of an "early
> out".

The checks at the other two of those three callsites look like this,
and I just followed suit for the sake of consistency.
Johannes Schindelin Oct. 28, 2019, 9:30 p.m. UTC | #8
Hi Gábor,

On Mon, 28 Oct 2019, SZEDER Gábor wrote:

> On Mon, Oct 28, 2019 at 11:57:10AM +0100, Johannes Schindelin wrote:
> > >   - According to the comment describing trie_find(), it should only
> > >     call the given match function 'fn' for a "/-or-\0-terminated
> > >     prefix of the key for which the trie contains a value".  This is
> > >     not true: there are three places where trie_find() calls the match
> > >     function, but one of them is missing the check for value's
> > >     existence.
>
> > Thank you for this entire patch series. Just one nit:
> >
> >
> > > diff --git a/path.c b/path.c
> > > index cf57bd52dd..e21b00c4d4 100644
> > > --- a/path.c
> > > +++ b/path.c
> > > @@ -299,9 +299,13 @@ static int trie_find(struct trie *root, const char *key, match_fn fn,
> > >
> > >  	/* Matched the entire compressed section */
> > >  	key += i;
> > > -	if (!*key)
> > > +	if (!*key) {
> > >  		/* End of key */
> > > -		return fn(key, root->value, baton);
> > > +		if (root->value)
> > > +			return fn(key, root->value, baton);
> > > +		else
> > > +			return -1;
> >
> > I would have preferred this:
> >
> > +		if (!root->value)
> > +			return -1;
> > +		return fn(key, root->value, baton);
> >
> > ... as it would more accurately reflect my mental model of an "early
> > out".
>
> The checks at the other two of those three callsites look like this,
> and I just followed suit for the sake of consistency.

Oh, okay. Sorry for the noise, then.

Thanks,
Dscho

Patch
diff mbox series

diff --git a/path.c b/path.c
index cf57bd52dd..e21b00c4d4 100644
--- a/path.c
+++ b/path.c
@@ -299,9 +299,13 @@  static int trie_find(struct trie *root, const char *key, match_fn fn,
 
 	/* Matched the entire compressed section */
 	key += i;
-	if (!*key)
+	if (!*key) {
 		/* End of key */
-		return fn(key, root->value, baton);
+		if (root->value)
+			return fn(key, root->value, baton);
+		else
+			return -1;
+	}
 
 	/* Partial path normalization: skip consecutive slashes */
 	while (key[0] == '/' && key[1] == '/')
@@ -345,9 +349,6 @@  static int check_common(const char *unmatched, void *value, void *baton)
 {
 	struct common_dir *dir = value;
 
-	if (!dir)
-		return 0;
-
 	if (dir->is_dir && (unmatched[0] == 0 || unmatched[0] == '/'))
 		return dir->is_common;
 
diff --git a/t/t0060-path-utils.sh b/t/t0060-path-utils.sh
index c7b53e494b..501e1a288d 100755
--- a/t/t0060-path-utils.sh
+++ b/t/t0060-path-utils.sh
@@ -288,6 +288,8 @@  test_git_path GIT_COMMON_DIR=bar index                    .git/index
 test_git_path GIT_COMMON_DIR=bar HEAD                     .git/HEAD
 test_git_path GIT_COMMON_DIR=bar logs/HEAD                .git/logs/HEAD
 test_git_path GIT_COMMON_DIR=bar logs/refs/bisect/foo     .git/logs/refs/bisect/foo
+test_git_path GIT_COMMON_DIR=bar logs/refs                bar/logs/refs
+test_git_path GIT_COMMON_DIR=bar logs/refs/               bar/logs/refs/
 test_git_path GIT_COMMON_DIR=bar logs/refs/bisec/foo      bar/logs/refs/bisec/foo
 test_git_path GIT_COMMON_DIR=bar logs/refs/bisec          bar/logs/refs/bisec
 test_git_path GIT_COMMON_DIR=bar logs/refs/bisectfoo      bar/logs/refs/bisectfoo