Message ID | 20191021160043.701-6-szeder.dev@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | path.c: a couple of common dir/trie fixes | expand |
On Mon, 2019-10-21 at 18:00 +0200, SZEDER Gábor wrote: > Add the missing condition to trie_find() so it will never invoke the > match function with a non-existing value. check_common() will then > no > longer have to check that it got a non-NULL value, so remove that > condition. ... > > /* Partial path normalization: skip consecutive slashes */ > while (key[0] == '/' && key[1] == '/') > @@ -345,9 +349,6 @@ static int check_common(const char *unmatched, > void *value, void *baton) > { > struct common_dir *dir = value; > > - if (!dir) > - return 0; Do we want to assert(dir) here? Overall, LGTM. Thanks for the clean-up.
On Mon, Oct 21, 2019 at 06:00:43PM +0200, SZEDER Gábor wrote: > 'logs/refs' is not a working tree-specific path, but since commit > b9317d55a3 (Make sure refs/rewritten/ is per-worktree, 2019-03-07) > 'git rev-parse --git-path' has been returning a bogus path if a > trailing '/' is present: > > $ git -C WT/ rev-parse --git-path logs/refs --git-path logs/refs/ > /home/szeder/src/git/.git/logs/refs > /home/szeder/src/git/.git/worktrees/WT/logs/refs/ > > We use a trie data structure to efficiently decide whether a path > belongs to the common dir or is working tree-specific. As it happens > b9317d55a3 triggered a bug that is as old as the trie implementation > itself, added in 4e09cf2acf (path: optimize common dir checking, > 2015-08-31). > > - According to the comment describing trie_find(), it should only > call the given match function 'fn' for a "/-or-\0-terminated > prefix of the key for which the trie contains a value". This is > not true: there are three places where trie_find() calls the match > function, but one of them is missing the check for value's > existence. > > - b9317d55a3 added two new keys to the trie: 'logs/refs/rewritten' > and 'logs/refs/worktree', next to the already existing > 'logs/refs/bisect'. This resulted in a trie node with the path > 'logs/refs', which didn't exist before, and which doesn't have a Oops, I missed the trailing slash, that must be 'logs/refs/'! > value attached. A query for 'logs/refs/' finds this node and then > hits that one callsite of the match function which doesn't check > for the value's existence, and thus invokes the match function > with NULL as value.
SZEDER Gábor <szeder.dev@gmail.com> writes: >> - b9317d55a3 added two new keys to the trie: 'logs/refs/rewritten' >> and 'logs/refs/worktree', next to the already existing >> 'logs/refs/bisect'. This resulted in a trie node with the path >> 'logs/refs', which didn't exist before, and which doesn't have a > > Oops, I missed the trailing slash, that must be 'logs/refs/'! > >> value attached. A query for 'logs/refs/' finds this node and then >> hits that one callsite of the match function which doesn't check >> for the value's existence, and thus invokes the match function >> with NULL as value. Given that the trie is maintained by hand in common_list[], I wonder if we can mechanically catch errors like the one b9317d55a3 added, by perhaps having a self-test function that a t/helper/ program calls to perform consistency check after the "git" gets built. Thanks.
On Wed, Oct 23, 2019 at 01:01:00PM +0900, Junio C Hamano wrote: > SZEDER Gábor <szeder.dev@gmail.com> writes: > > >> - b9317d55a3 added two new keys to the trie: 'logs/refs/rewritten' > >> and 'logs/refs/worktree', next to the already existing > >> 'logs/refs/bisect'. This resulted in a trie node with the path > >> 'logs/refs', which didn't exist before, and which doesn't have a > > > > Oops, I missed the trailing slash, that must be 'logs/refs/'! > > > >> value attached. A query for 'logs/refs/' finds this node and then > >> hits that one callsite of the match function which doesn't check > >> for the value's existence, and thus invokes the match function > >> with NULL as value. > > Given that the trie is maintained by hand in common_list[], I wonder > if we can mechanically catch errors like the one b9317d55a3 added, > by perhaps having a self-test function that a t/helper/ program > calls to perform consistency check after the "git" gets built. I'm not sure what you mean by "consistency check". The resulting trie looked as expected both before and after b9317d55a3, i.e. each trie node had the right contents, value, and children, as far as I could tell. The issue was in the lookup function.
SZEDER Gábor <szeder.dev@gmail.com> writes: > I'm not sure what you mean by "consistency check". The resulting trie > looked as expected both before and after b9317d55a3, i.e. each trie > node had the right contents, value, and children, as far as I could > tell. The issue was in the lookup function. I saw the change to the code as a band-aid, which wouldn't have been necessary if we had the missing refs/ entry. Fully populating the leading levels explicitly may give the application more flexibility by yielding results different from hardcoded -1 like the patched code gives. But perhaps treating a path that matches a missing intermediate level just like a path that did not match anything in the trie is what we want anyway, so I guess the code change was the right thing (as opposed to a band-aid). Thanks.
Hi Gábor, On Mon, 21 Oct 2019, SZEDER Gábor wrote: > 'logs/refs' is not a working tree-specific path, but since commit > b9317d55a3 (Make sure refs/rewritten/ is per-worktree, 2019-03-07) > 'git rev-parse --git-path' has been returning a bogus path if a > trailing '/' is present: > > $ git -C WT/ rev-parse --git-path logs/refs --git-path logs/refs/ > /home/szeder/src/git/.git/logs/refs > /home/szeder/src/git/.git/worktrees/WT/logs/refs/ > > We use a trie data structure to efficiently decide whether a path > belongs to the common dir or is working tree-specific. As it happens > b9317d55a3 triggered a bug that is as old as the trie implementation > itself, added in 4e09cf2acf (path: optimize common dir checking, > 2015-08-31). > > - According to the comment describing trie_find(), it should only > call the given match function 'fn' for a "/-or-\0-terminated > prefix of the key for which the trie contains a value". This is > not true: there are three places where trie_find() calls the match > function, but one of them is missing the check for value's > existence. > > - b9317d55a3 added two new keys to the trie: 'logs/refs/rewritten' > and 'logs/refs/worktree', next to the already existing > 'logs/refs/bisect'. This resulted in a trie node with the path > 'logs/refs', which didn't exist before, and which doesn't have a > value attached. A query for 'logs/refs/' finds this node and then > hits that one callsite of the match function which doesn't check > for the value's existence, and thus invokes the match function > with NULL as value. > > - When the match function check_common() is invoked with a NULL > value, it returns 0, which indicates that the queried path doesn't > belong to the common directory, ultimately resulting the bogus > path shown above. > > Add the missing condition to trie_find() so it will never invoke the > match function with a non-existing value. check_common() will then no > longer have to check that it got a non-NULL value, so remove that > condition. > > I believe that there are no other paths that could cause similar bogus > output. AFAICT the only other key resulting in the match function > being called with a NULL value is 'co' (because of the keys 'common' > and 'config'). However, as they are not in a directory that belongs > to the common directory the resulting working tree-specific path is > expected. > > Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Thank you for this entire patch series. Just one nit: > diff --git a/path.c b/path.c > index cf57bd52dd..e21b00c4d4 100644 > --- a/path.c > +++ b/path.c > @@ -299,9 +299,13 @@ static int trie_find(struct trie *root, const char *key, match_fn fn, > > /* Matched the entire compressed section */ > key += i; > - if (!*key) > + if (!*key) { > /* End of key */ > - return fn(key, root->value, baton); > + if (root->value) > + return fn(key, root->value, baton); > + else > + return -1; I would have preferred this: + if (!root->value) + return -1; + return fn(key, root->value, baton); ... as it would more accurately reflect my mental model of an "early out". But as I said, this is just a nit-pick. Thank you for working on this! Dscho > + } > > /* Partial path normalization: skip consecutive slashes */ > while (key[0] == '/' && key[1] == '/') > @@ -345,9 +349,6 @@ static int check_common(const char *unmatched, void *value, void *baton) > { > struct common_dir *dir = value; > > - if (!dir) > - return 0; > - > if (dir->is_dir && (unmatched[0] == 0 || unmatched[0] == '/')) > return dir->is_common; > > diff --git a/t/t0060-path-utils.sh b/t/t0060-path-utils.sh > index c7b53e494b..501e1a288d 100755 > --- a/t/t0060-path-utils.sh > +++ b/t/t0060-path-utils.sh > @@ -288,6 +288,8 @@ test_git_path GIT_COMMON_DIR=bar index .git/index > test_git_path GIT_COMMON_DIR=bar HEAD .git/HEAD > test_git_path GIT_COMMON_DIR=bar logs/HEAD .git/logs/HEAD > test_git_path GIT_COMMON_DIR=bar logs/refs/bisect/foo .git/logs/refs/bisect/foo > +test_git_path GIT_COMMON_DIR=bar logs/refs bar/logs/refs > +test_git_path GIT_COMMON_DIR=bar logs/refs/ bar/logs/refs/ > test_git_path GIT_COMMON_DIR=bar logs/refs/bisec/foo bar/logs/refs/bisec/foo > test_git_path GIT_COMMON_DIR=bar logs/refs/bisec bar/logs/refs/bisec > test_git_path GIT_COMMON_DIR=bar logs/refs/bisectfoo bar/logs/refs/bisectfoo > -- > 2.24.0.rc0.472.ga6f06c86b4 > >
On Mon, Oct 28, 2019 at 11:57:10AM +0100, Johannes Schindelin wrote: > > - According to the comment describing trie_find(), it should only > > call the given match function 'fn' for a "/-or-\0-terminated > > prefix of the key for which the trie contains a value". This is > > not true: there are three places where trie_find() calls the match > > function, but one of them is missing the check for value's > > existence. > Thank you for this entire patch series. Just one nit: > > > > diff --git a/path.c b/path.c > > index cf57bd52dd..e21b00c4d4 100644 > > --- a/path.c > > +++ b/path.c > > @@ -299,9 +299,13 @@ static int trie_find(struct trie *root, const char *key, match_fn fn, > > > > /* Matched the entire compressed section */ > > key += i; > > - if (!*key) > > + if (!*key) { > > /* End of key */ > > - return fn(key, root->value, baton); > > + if (root->value) > > + return fn(key, root->value, baton); > > + else > > + return -1; > > I would have preferred this: > > + if (!root->value) > + return -1; > + return fn(key, root->value, baton); > > ... as it would more accurately reflect my mental model of an "early > out". The checks at the other two of those three callsites look like this, and I just followed suit for the sake of consistency.
Hi Gábor, On Mon, 28 Oct 2019, SZEDER Gábor wrote: > On Mon, Oct 28, 2019 at 11:57:10AM +0100, Johannes Schindelin wrote: > > > - According to the comment describing trie_find(), it should only > > > call the given match function 'fn' for a "/-or-\0-terminated > > > prefix of the key for which the trie contains a value". This is > > > not true: there are three places where trie_find() calls the match > > > function, but one of them is missing the check for value's > > > existence. > > > Thank you for this entire patch series. Just one nit: > > > > > > > diff --git a/path.c b/path.c > > > index cf57bd52dd..e21b00c4d4 100644 > > > --- a/path.c > > > +++ b/path.c > > > @@ -299,9 +299,13 @@ static int trie_find(struct trie *root, const char *key, match_fn fn, > > > > > > /* Matched the entire compressed section */ > > > key += i; > > > - if (!*key) > > > + if (!*key) { > > > /* End of key */ > > > - return fn(key, root->value, baton); > > > + if (root->value) > > > + return fn(key, root->value, baton); > > > + else > > > + return -1; > > > > I would have preferred this: > > > > + if (!root->value) > > + return -1; > > + return fn(key, root->value, baton); > > > > ... as it would more accurately reflect my mental model of an "early > > out". > > The checks at the other two of those three callsites look like this, > and I just followed suit for the sake of consistency. Oh, okay. Sorry for the noise, then. Thanks, Dscho
diff --git a/path.c b/path.c index cf57bd52dd..e21b00c4d4 100644 --- a/path.c +++ b/path.c @@ -299,9 +299,13 @@ static int trie_find(struct trie *root, const char *key, match_fn fn, /* Matched the entire compressed section */ key += i; - if (!*key) + if (!*key) { /* End of key */ - return fn(key, root->value, baton); + if (root->value) + return fn(key, root->value, baton); + else + return -1; + } /* Partial path normalization: skip consecutive slashes */ while (key[0] == '/' && key[1] == '/') @@ -345,9 +349,6 @@ static int check_common(const char *unmatched, void *value, void *baton) { struct common_dir *dir = value; - if (!dir) - return 0; - if (dir->is_dir && (unmatched[0] == 0 || unmatched[0] == '/')) return dir->is_common; diff --git a/t/t0060-path-utils.sh b/t/t0060-path-utils.sh index c7b53e494b..501e1a288d 100755 --- a/t/t0060-path-utils.sh +++ b/t/t0060-path-utils.sh @@ -288,6 +288,8 @@ test_git_path GIT_COMMON_DIR=bar index .git/index test_git_path GIT_COMMON_DIR=bar HEAD .git/HEAD test_git_path GIT_COMMON_DIR=bar logs/HEAD .git/logs/HEAD test_git_path GIT_COMMON_DIR=bar logs/refs/bisect/foo .git/logs/refs/bisect/foo +test_git_path GIT_COMMON_DIR=bar logs/refs bar/logs/refs +test_git_path GIT_COMMON_DIR=bar logs/refs/ bar/logs/refs/ test_git_path GIT_COMMON_DIR=bar logs/refs/bisec/foo bar/logs/refs/bisec/foo test_git_path GIT_COMMON_DIR=bar logs/refs/bisec bar/logs/refs/bisec test_git_path GIT_COMMON_DIR=bar logs/refs/bisectfoo bar/logs/refs/bisectfoo
'logs/refs' is not a working tree-specific path, but since commit b9317d55a3 (Make sure refs/rewritten/ is per-worktree, 2019-03-07) 'git rev-parse --git-path' has been returning a bogus path if a trailing '/' is present: $ git -C WT/ rev-parse --git-path logs/refs --git-path logs/refs/ /home/szeder/src/git/.git/logs/refs /home/szeder/src/git/.git/worktrees/WT/logs/refs/ We use a trie data structure to efficiently decide whether a path belongs to the common dir or is working tree-specific. As it happens b9317d55a3 triggered a bug that is as old as the trie implementation itself, added in 4e09cf2acf (path: optimize common dir checking, 2015-08-31). - According to the comment describing trie_find(), it should only call the given match function 'fn' for a "/-or-\0-terminated prefix of the key for which the trie contains a value". This is not true: there are three places where trie_find() calls the match function, but one of them is missing the check for value's existence. - b9317d55a3 added two new keys to the trie: 'logs/refs/rewritten' and 'logs/refs/worktree', next to the already existing 'logs/refs/bisect'. This resulted in a trie node with the path 'logs/refs', which didn't exist before, and which doesn't have a value attached. A query for 'logs/refs/' finds this node and then hits that one callsite of the match function which doesn't check for the value's existence, and thus invokes the match function with NULL as value. - When the match function check_common() is invoked with a NULL value, it returns 0, which indicates that the queried path doesn't belong to the common directory, ultimately resulting the bogus path shown above. Add the missing condition to trie_find() so it will never invoke the match function with a non-existing value. check_common() will then no longer have to check that it got a non-NULL value, so remove that condition. I believe that there are no other paths that could cause similar bogus output. AFAICT the only other key resulting in the match function being called with a NULL value is 'co' (because of the keys 'common' and 'config'). However, as they are not in a directory that belongs to the common directory the resulting working tree-specific path is expected. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> --- path.c | 11 ++++++----- t/t0060-path-utils.sh | 2 ++ 2 files changed, 8 insertions(+), 5 deletions(-)