Message ID | 59276a5b3fd1fd3b25db73e096cf0e834af2d4f9.1696615769.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Series | Performance improvement & cleanup in loose ref iteration |
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Victoria Dye <vdye@github.com> > > Update 'cache_ref_iterator_advance' to skip over refs that are not matched > by the given prefix. > > Currently, a ref entry is considered "matched" if the entry name is fully > contained within the prefix: > > * prefix: "refs/heads/v1" > * entry: "refs/heads/v1.0" > > OR if the prefix is fully contained in the entry name: > > * prefix: "refs/heads/v1.0" > * entry: "refs/heads/v1" > > The first case is always correct, but the second is only correct if the ref > cache entry is a directory, for example: > > * prefix: "refs/heads/example" > * entry: "refs/heads/" > > Modify the logic in 'cache_ref_iterator_advance' to reflect these > expectations: > > 1. If 'overlaps_prefix' returns 'PREFIX_EXCLUDES_DIR', then the prefix and > ref cache entry do not overlap at all. Skip this entry. > 2. If 'overlaps_prefix' returns 'PREFIX_WITHIN_DIR', then the prefix matches > inside this entry if it is a directory. Skip if the entry is not a > directory, otherwise iterate over it. > 3. Otherwise, 'overlaps_prefix' returned 'PREFIX_CONTAINS_DIR', indicating > that the cache entry (directory or not) is fully contained by or equal to > the prefix. Iterate over this entry. > > Note that condition 2 relies on the names of directory entries having the > appropriate trailing slash. The existing function documentation of > 'create_dir_entry' explicitly calls out the trailing slash requirement, so > this is a safe assumption to make. Thanks for explaining it very well and clearly. Allowing prefix="refs/heads/v1.0" to yield entry="refs/heads/v1" (case #2 above that this patch fixes the behaviour for) would cause ref_iterator_advance() to return a ref outside the hierarhcy, wouldn't it? 
So it appears to me that either one of the two would be true: * the code is structured in such a way that such a condition does not actually happen (in which case this patch would be a no-op), or * there is a bug in the current code that is fixed by this patch, whose externally observable behaviour can be verified with a test. It is not quite clear to me which is the case here. The code with the patch looks more logical than the original, but I am not sure how to demonstrate the existing breakage (if any). > Signed-off-by: Victoria Dye <vdye@github.com> > --- > refs/ref-cache.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/refs/ref-cache.c b/refs/ref-cache.c > index 2294c4564fb..6e3b725245c 100644 > --- a/refs/ref-cache.c > +++ b/refs/ref-cache.c > @@ -412,7 +412,8 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator) > > if (level->prefix_state == PREFIX_WITHIN_DIR) { > entry_prefix_state = overlaps_prefix(entry->name, iter->prefix); > - if (entry_prefix_state == PREFIX_EXCLUDES_DIR) > + if (entry_prefix_state == PREFIX_EXCLUDES_DIR || > + (entry_prefix_state == PREFIX_WITHIN_DIR && !(entry->flag & REF_DIR))) > continue; > } else { > entry_prefix_state = level->prefix_state;
On Fri, Oct 06, 2023 at 02:51:24PM -0700, Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Victoria Dye <vdye@github.com>
> >
> > Update 'cache_ref_iterator_advance' to skip over refs that are not matched
> > by the given prefix.
> >
> > Currently, a ref entry is considered "matched" if the entry name is fully
> > contained within the prefix:
> >
> > * prefix: "refs/heads/v1"
> > * entry: "refs/heads/v1.0"
> >
> > OR if the prefix is fully contained in the entry name:
> >
> > * prefix: "refs/heads/v1.0"
> > * entry: "refs/heads/v1"
> >
> > The first case is always correct, but the second is only correct if the ref
> > cache entry is a directory, for example:
> >
> > * prefix: "refs/heads/example"
> > * entry: "refs/heads/"
> >
> > Modify the logic in 'cache_ref_iterator_advance' to reflect these
> > expectations:
> >
> > 1. If 'overlaps_prefix' returns 'PREFIX_EXCLUDES_DIR', then the prefix and
> >    ref cache entry do not overlap at all. Skip this entry.
> > 2. If 'overlaps_prefix' returns 'PREFIX_WITHIN_DIR', then the prefix matches
> >    inside this entry if it is a directory. Skip if the entry is not a
> >    directory, otherwise iterate over it.
> > 3. Otherwise, 'overlaps_prefix' returned 'PREFIX_CONTAINS_DIR', indicating
> >    that the cache entry (directory or not) is fully contained by or equal to
> >    the prefix. Iterate over this entry.
> >
> > Note that condition 2 relies on the names of directory entries having the
> > appropriate trailing slash. The existing function documentation of
> > 'create_dir_entry' explicitly calls out the trailing slash requirement, so
> > this is a safe assumption to make.
>
> Thanks for explaining it very well and clearly.
>
> Allowing prefix="refs/heads/v1.0" to yield entry="refs/heads/v1"
> (case #2 above that this patch fixes the behaviour for) would cause
> ref_iterator_advance() to return a ref outside the hierarchy,
> wouldn't it? So it appears to me that either one of the two would
> be true:
>
>  * the code is structured in such a way that such a condition does
>    not actually happen (in which case this patch would be a no-op),
>    or
>
>  * there is a bug in the current code that is fixed by this patch,
>    whose externally observable behaviour can be verified with a
>    test.
>
> It is not quite clear to me which is the case here. The code with
> the patch looks more logical than the original, but I am not sure
> how to demonstrate the existing breakage (if any).

Agreed, I also had a bit of a hard time to figure out whether this is an
actual bug fix, a performance improvement or merely a refactoring.

Patrick

> > Signed-off-by: Victoria Dye <vdye@github.com>
> > ---
> >  refs/ref-cache.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/refs/ref-cache.c b/refs/ref-cache.c
> > index 2294c4564fb..6e3b725245c 100644
> > --- a/refs/ref-cache.c
> > +++ b/refs/ref-cache.c
> > @@ -412,7 +412,8 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
> >
> >  		if (level->prefix_state == PREFIX_WITHIN_DIR) {
> >  			entry_prefix_state = overlaps_prefix(entry->name, iter->prefix);
> > -			if (entry_prefix_state == PREFIX_EXCLUDES_DIR)
> > +			if (entry_prefix_state == PREFIX_EXCLUDES_DIR ||
> > +			    (entry_prefix_state == PREFIX_WITHIN_DIR && !(entry->flag & REF_DIR)))
> >  				continue;
> >  		} else {
> >  			entry_prefix_state = level->prefix_state;
Patrick Steinhardt wrote:
>> Allowing prefix="refs/heads/v1.0" to yield entry="refs/heads/v1"
>> (case #2 above that this patch fixes the behaviour for) would cause
>> ref_iterator_advance() to return a ref outside the hierarchy,
>> wouldn't it? So it appears to me that either one of the two would
>> be true:
>>
>>  * the code is structured in such a way that such a condition does
>>    not actually happen (in which case this patch would be a no-op),
>>    or
>>
>>  * there is a bug in the current code that is fixed by this patch,
>>    whose externally observable behaviour can be verified with a
>>    test.
>>
>> It is not quite clear to me which is the case here. The code with
>> the patch looks more logical than the original, but I am not sure
>> how to demonstrate the existing breakage (if any).
>
> Agreed, I also had a bit of a hard time to figure out whether this is an
> actual bug fix, a performance improvement or merely a refactoring.
>

I originally operated on the assumption that it was the first case, which is
why I didn't include a test in this patch. Commands like 'for-each-ref',
'show-ref', etc. either use an empty prefix or a directory prefix with a
trailing slash, which won't trigger this issue. I encountered the problem
while working on a builtin that filtered refs by a user-specified prefix -
the results included refs that should not have been matched, which led me to
this fix.

Scanning through the codebase again, though, I do see a way to replicate the
issue:

    $ git update-ref refs/bisect/b HEAD
    $ git rev-parse --abbrev-ref --bisect
    refs/bisect/b

Because 'rev-parse --bisect' uses the "refs/bisect/bad" prefix (no trailing
slash) and does no additional filtering in its 'for_each_fullref_in'
callback, refs like "refs/bisect/b" and "refs/bisect/ba" are (incorrectly)
matched. I'll re-roll with the added test.
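The expectation behind the reproduction above can be checked outside Git with a plain string-prefix test. The following is only a sketch of the intended semantics, not code from Git: `refname_has_prefix` and `count_matching_refs` are hypothetical helper names, and the list of ref names a caller passes in is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a Git API): true if the full ref name
 * begins with the given prefix, compared byte by byte.
 */
static bool refname_has_prefix(const char *refname, const char *prefix)
{
	assert(refname != NULL && prefix != NULL);
	return strncmp(refname, prefix, strlen(prefix)) == 0;
}

/*
 * Hypothetical helper: count how many of the n names in refs
 * begin with prefix.
 */
static size_t count_matching_refs(const char *const *refs, size_t n,
				  const char *prefix)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (refname_has_prefix(refs[i], prefix))
			count++;
	return count;
}
```

Under this reading, "refs/bisect/b" and "refs/bisect/ba" are shorter than the prefix "refs/bisect/bad" and so cannot match it, while "refs/bisect/bad" itself does.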
Victoria Dye <vdye@github.com> writes:

> I originally operated on the assumption that it was the first case, which is
> why I didn't include a test in this patch. Commands like 'for-each-ref',
> 'show-ref', etc. either use an empty prefix or a directory prefix with a
> trailing slash, which won't trigger this issue.

Ah, yes, I didn't mention it but I suspected as such (i.e. the code
is structured in such a way that this broken implementation does not
matter to the current callers).

> I encountered the problem
> while working on a builtin that filtered refs by a user-specified prefix -
> the results included refs that should not have been matched, which led me to
> this fix.

OK, perfectly understandable.

> Scanning through the codebase again, though, I do see a way to replicate the
> issue:
>
>     $ git update-ref refs/bisect/b HEAD
>     $ git rev-parse --abbrev-ref --bisect
>     refs/bisect/b
>
> Because 'rev-parse --bisect' uses the "refs/bisect/bad" prefix (no trailing
> slash) and does no additional filtering in its 'for_each_fullref_in'
> callback, refs like "refs/bisect/b" and "refs/bisect/ba" are (incorrectly)
> matched. I'll re-roll with the added test.

Good find. Thanks!
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 2294c4564fb..6e3b725245c 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -412,7 +412,8 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
 		if (level->prefix_state == PREFIX_WITHIN_DIR) {
 			entry_prefix_state = overlaps_prefix(entry->name, iter->prefix);
-			if (entry_prefix_state == PREFIX_EXCLUDES_DIR)
+			if (entry_prefix_state == PREFIX_EXCLUDES_DIR ||
+			    (entry_prefix_state == PREFIX_WITHIN_DIR && !(entry->flag & REF_DIR)))
 				continue;
 		} else {
 			entry_prefix_state = level->prefix_state;
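To make the three outcomes in the patch description concrete, here is a minimal standalone C model of the skip decision before and after the change. It is a simplified sketch, not the code in refs/ref-cache.c: the names `overlaps_prefix_model`, `skip_before_patch` and `skip_after_patch` are illustrative, and the `is_dir` parameter is an assumption standing in for the `entry->flag & REF_DIR` test in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three states named in the patch description. */
enum prefix_state {
	PREFIX_CONTAINS_DIR,	/* the whole prefix was matched by the name */
	PREFIX_WITHIN_DIR,	/* the name ran out before the prefix did */
	PREFIX_EXCLUDES_DIR	/* name and prefix diverge */
};

/*
 * Illustrative model: walk both strings while they agree, then
 * classify by which one (if either) was exhausted first.
 */
static enum prefix_state overlaps_prefix_model(const char *name,
					       const char *prefix)
{
	assert(name != NULL && prefix != NULL);
	while (*prefix && *name == *prefix) {
		name++;
		prefix++;
	}
	if (!*prefix)
		return PREFIX_CONTAINS_DIR;
	if (!*name)
		return PREFIX_WITHIN_DIR;
	return PREFIX_EXCLUDES_DIR;
}

/* Old condition: skip only when there is no overlap at all. */
static bool skip_before_patch(const char *name, bool is_dir,
			      const char *prefix)
{
	(void)is_dir; /* the old check ignored the entry type */
	return overlaps_prefix_model(name, prefix) == PREFIX_EXCLUDES_DIR;
}

/* New condition: also skip a partial match that is not a directory. */
static bool skip_after_patch(const char *name, bool is_dir,
			     const char *prefix)
{
	enum prefix_state state = overlaps_prefix_model(name, prefix);

	return state == PREFIX_EXCLUDES_DIR ||
	       (state == PREFIX_WITHIN_DIR && !is_dir);
}
```

With the prefix "refs/bisect/bad", the non-directory entry "refs/bisect/b" is classified as `PREFIX_WITHIN_DIR`, so the old condition keeps it and the new condition skips it. The directory entry "refs/bisect/" gets the same classification but is kept by both, because the iterator still has to descend into it.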