Message ID | 20181114122725.18659-1-szeder.dev@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | ref-filter: don't look for objects when outside of a repository | expand |
On Wed, Nov 14, 2018 at 01:27:25PM +0100, SZEDER Gábor wrote: > The command 'git ls-remote --sort=authordate <remote>' segfaults when > run outside of a repository, ever since the introduction of its > '--sort' option in 1fb20dfd8e (ls-remote: create '--sort' option, > 2018-04-09). > > While in general the 'git ls-remote' command can be run outside of a > repository just fine, its '--sort=<key>' option with certain keys does > require access to the referenced objects. This sorting is implemented > using the generic ref-filter sorting facility, which already handles > missing objects gracefully with the appropriate 'missing object > deadbeef for HEAD' message. However, being generic means that it > checks replace refs while trying to retrieve an object, and while > doing so it accesses the 'git_replace_ref_base' variable, which has > not been initialized and is still a NULL pointer when outside of a > repository, thus causing the segfault. > > Make ref-filter more careful upfront while parsing the format string, > and make it error out when encountering a format atom requiring object > access when we are not in a repository. Also add a test to ensure > that 'git ls-remote --sort' fails gracefully when executed outside of > a repository. Thanks for picking up this loose end. I like the general approach here, but... > diff --git a/ref-filter.c b/ref-filter.c > index 0c45ed9d94..a1290659af 100644 > --- a/ref-filter.c > +++ b/ref-filter.c > @@ -534,6 +534,10 @@ static int parse_ref_filter_atom(const struct ref_format *format, > if (ARRAY_SIZE(valid_atom) <= i) > return strbuf_addf_ret(err, -1, _("unknown field name: %.*s"), > (int)(ep-atom), atom); > + if (valid_atom[i].source != SOURCE_NONE && !have_git_dir()) > + return strbuf_addf_ret(err, -1, > + _("not a git repository, but the field '%.*s' requires access to object data"), > + (int)(ep-atom), atom); Is SOURCE_NONE a complete match for what we want? I see problems in both directions: - sorting by "objectname" works now, but it's marked with SOURCE_OBJ, and would be forbidden with your patch. I'm actually not sure if SOURCE_OBJ is accurate; we shouldn't need to access the object to show it (and we are probably wasting effort loading the full contents for tools like for-each-ref). However, that's not the full story. For objectname:short, it _does_ call find_unique_abbrev(). So we expect to have an object directory. - sorting by "HEAD" hits a BUG(), and would still be allowed with your patch. So I like the idea here that the particular atoms would tell us whether they're going to need to be in a repository or not, but I think the annotations have to be cleaned up first. > diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh > index 91ee6841c1..32e722db2e 100755 > --- a/t/t5512-ls-remote.sh > +++ b/t/t5512-ls-remote.sh > @@ -302,6 +302,12 @@ test_expect_success 'ls-remote works outside repository' ' > nongit git ls-remote dst.git > ' > > +test_expect_success 'ls-remote --sort fails gracefully outside repository' ' > + # Use a sort key that requires access to the referenced objects. > + nongit test_must_fail git ls-remote --sort=authordate "$TRASH_DIRECTORY" 2>err && > + test_i18ngrep "^fatal: not a git repository, but the field '\''authordate'\'' requires access to object data" err > +' Regardless of our solution, we probably want to add an extra test making sure that something vanilla like: nongit git ls-remote --sort=v:refname "$TRASH_DIRECTORY" continues to work (we do test ls-remote outside a repo already, but not with a sort specifier). -Peff
On Thu, Nov 15, 2018 at 04:38:44AM -0500, Jeff King wrote: > Is SOURCE_NONE a complete match for what we want? > > I see problems in both directions: > > - sorting by "objectname" works now, but it's marked with SOURCE_OBJ, > and would be forbidden with your patch. I'm actually not sure if > SOURCE_OBJ is accurate; we shouldn't need to access the object to > show it (and we are probably wasting effort loading the full contents > for tools like for-each-ref). > > However, that's not the full story. For objectname:short, it _does_ call > find_unique_abbrev(). So we expect to have an object directory. Oops, I'm apparently bad at reading. It is in fact SOURCE_OTHER, which makes sense (outside of this whole "--sort outside a repo thing"). But we'd ideally distinguish between "objectname" (which should be OK outside a repo) and "objectname:short" (which currently segfaults). -Peff
Jeff King <peff@peff.net> writes: > On Thu, Nov 15, 2018 at 04:38:44AM -0500, Jeff King wrote: > >> Is SOURCE_NONE a complete match for what we want? >> >> I see problems in both directions: >> >> - sorting by "objectname" works now, but it's marked with SOURCE_OBJ, >> and would be forbidden with your patch. I'm actually not sure if >> SOURCE_OBJ is accurate; we shouldn't need to access the object to >> show it (and we are probably wasting effort loading the full contents >> for tools like for-each-ref). >> >> However, that's not the full story. For objectname:short, it _does_ call >> find_unique_abbrev(). So we expect to have an object directory. > > Oops, I'm apparently bad at reading. It is in fact SOURCE_OTHER, which > makes sense (outside of this whole "--sort outside a repo thing"). > > But we'd ideally distinguish between "objectname" (which should be OK > outside a repo) and "objectname:short" (which currently segfaults). Arguably, use of ref-filter machinery in ls-remote, whether it is given from inside or outside a repo, was a mistake in 1fb20dfd ("ls-remote: create '--sort' option", 2018-04-09), as the whole point of "ls-remote" is to peek the list of refs and it is perfectly normal that the objects listed are not available. "ls-remote --sort=authorname" that is run in a repository may not segfault on a ref that points at a yet-to-be-fetched commit, but it cannot be doing anything sensible. Is it still better to silently produce a nonsense result than refusing to --sort no matter what the sort keys are, whether we are inside or outside a repository?
On Fri, Nov 16, 2018 at 02:09:07PM +0900, Junio C Hamano wrote: > >> I see problems in both directions: > >> > >> - sorting by "objectname" works now, but it's marked with SOURCE_OBJ, > >> and would be forbidden with your patch. I'm actually not sure if > >> SOURCE_OBJ is accurate; we shouldn't need to access the object to > >> show it (and we are probably wasting effort loading the full contents > >> for tools like for-each-ref). > >> > >> However, that's not the full story. For objectname:short, it _does_ call > >> find_unique_abbrev(). So we expect to have an object directory. > > > > Oops, I'm apparently bad at reading. It is in fact SOURCE_OTHER, which > > makes sense (outside of this whole "--sort outside a repo thing"). > > > > But we'd ideally distinguish between "objectname" (which should be OK > > outside a repo) and "objectname:short" (which currently segfaults). > > Arguably, use of ref-filter machinery in ls-remote, whether it is > given from inside or outside a repo, was a mistake in 1fb20dfd > ("ls-remote: create '--sort' option", 2018-04-09), as the whole > point of "ls-remote" is to peek the list of refs and it is perfectly > normal that the objects listed are not available. I think it's conceptually reasonable to use the ref-filter machinery. It's just that it was underprepared to handle this out-of-repo case. I think we're not too far off, though. > "ls-remote --sort=authorname" that is run in a repository may not > segfault on a ref that points at a yet-to-be-fetched commit, but it > cannot be doing anything sensible. Is it still better to silently > produce a nonsense result than refusing to --sort no matter what the > sort keys are, whether we are inside or outside a repository? I don't think we produce silent nonsense in the current code (or after any of the discussed solutions), either in a repo or out. We say "fatal: missing object ..." inside a repo if the request cannot be fulfilled. That's not incredibly illuminating, perhaps, but it means we fulfill whatever we _can_ on behalf of the user's request, and bail otherwise. If you are arguing that even in a repo we should reject "authorname" early (just as we would outside of a repo), I could buy that. Technically we can make it work sometimes (if we happen to have fetched everything the other side has), but behaving consistently (and with a decent error message) may trump that. -Peff
Jeff King <peff@peff.net> writes: > If you are arguing that even in a repo we should reject "authorname" > early (just as we would outside of a repo), I could buy that. Yup, that (and replace 'authorname' with anything that won't work with missing objects) for consistency was what I meant.
On Fri, Nov 16, 2018 at 02:09:07PM +0900, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > On Thu, Nov 15, 2018 at 04:38:44AM -0500, Jeff King wrote: > > > >> Is SOURCE_NONE a complete match for what we want? > >> > >> I see problems in both directions: > >> > >> - sorting by "objectname" works now, but it's marked with SOURCE_OBJ, > >> and would be forbidden with your patch. I'm actually not sure if > >> SOURCE_OBJ is accurate; we shouldn't need to access the object to > >> show it (and we are probably wasting effort loading the full contents > >> for tools like for-each-ref). > >> > >> However, that's not the full story. For objectname:short, it _does_ call > >> find_unique_abbrev(). So we expect to have an object directory. > > > > Oops, I'm apparently bad at reading. It is in fact SOURCE_OTHER, which > > makes sense (outside of this whole "--sort outside a repo thing"). > > > > But we'd ideally distinguish between "objectname" (which should be OK > > outside a repo) and "objectname:short" (which currently segfaults). > > Arguably, use of ref-filter machinery in ls-remote, whether it is > given from inside or outside a repo, was a mistake in 1fb20dfd > ("ls-remote: create '--sort' option", 2018-04-09), as the whole > point of "ls-remote" is to peek the list of refs and it is perfectly > normal that the objects listed are not available. I hope that one day 'git ls-remote' will learn to '--format=...' its output, and I think that (re)using the ref-filter machinery would be the right way to go to achive that. Sure, ref-filter supports a lot of format specifiers that don't at all make sense in the context of 'ls-remote' (perhaps we should have a dedicated set of valid_atoms for that), but I think it's perfectly reasonable to do something like: git ls-remote --format=%(refname:strip=2) remote A concrete use case for that could be to eliminate the last remaining shell loops from refs completion.
diff --git a/ref-filter.c b/ref-filter.c index 0c45ed9d94..a1290659af 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -534,6 +534,10 @@ static int parse_ref_filter_atom(const struct ref_format *format, if (ARRAY_SIZE(valid_atom) <= i) return strbuf_addf_ret(err, -1, _("unknown field name: %.*s"), (int)(ep-atom), atom); + if (valid_atom[i].source != SOURCE_NONE && !have_git_dir()) + return strbuf_addf_ret(err, -1, + _("not a git repository, but the field '%.*s' requires access to object data"), + (int)(ep-atom), atom); /* Add it in, including the deref prefix */ at = used_atom_cnt; diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh index 91ee6841c1..32e722db2e 100755 --- a/t/t5512-ls-remote.sh +++ b/t/t5512-ls-remote.sh @@ -302,6 +302,12 @@ test_expect_success 'ls-remote works outside repository' ' nongit git ls-remote dst.git ' +test_expect_success 'ls-remote --sort fails gracefully outside repository' ' + # Use a sort key that requires access to the referenced objects. + nongit test_must_fail git ls-remote --sort=authordate "$TRASH_DIRECTORY" 2>err && + test_i18ngrep "^fatal: not a git repository, but the field '\''authordate'\'' requires access to object data" err +' + test_expect_success 'ls-remote patterns work with all protocol versions' ' git for-each-ref --format="%(objectname) %(refname)" \ refs/heads/master refs/remotes/origin/master >expect &&