Message ID | 20230601174351.GC4165297@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Accepted |
Commit | 8260bc59023136edeaed1f1006a03f44cc849883 |
Headers | show |
Series | handling pathspec magic with --follow | expand |
Jeff King <peff@peff.net> writes: > The --follow code doesn't handle most forms of pathspec magic. We check > that no unexpected ones have made it to try_to_follow_renames() with a > runtime GUARD_PATHSPEC() check, which gives behavior like this: > > $ git log --follow ':(icase)makefile' >/dev/null > BUG: tree-diff.c:596: unsupported magic 10 > Aborted Stepping back a bit, isn't the real reason why follow mode is incompatible with the pathspec magic because it does not want to deal with anything but a single literal pathname? In other words, in "git log --follow ':(icase)makefile'", if the given pathspec matches a single path in the starting commit, we should be able to start from the commit inspecting that uniquely found path, compare the commit with its parent(s) at the path, and when the parent does not have the path, detect rename and continue following that the name before the rename as the single path starting from that parent. Perhaps the commit we start at, HEAD, has "Makefile" and not "makefile" (if we have two, that is an error), so we start traversal, following the pathname "Makefile". At some point, we may discover that the parent did not have "Makefile", but has "GNUMakefile" with a similar contents, at which point we swap the path we follow from "Makefile" to "GNUMakefile". At each step, tree-diff does not need to and does not want to do anything fancy with pathspec---the operation is always "this path in the parent, does it exist in the parent? If not, did it come from a different path?", and use of "literal pathspec" mode makes sense there. So why does try_to_follow_renames() even *need* to say anything other than "We are in follow mode; what we have is a single path; use it as a literal pathspec that matches the path literally without any funky magic"? GUARD_PATHSPEC() is "please inspect the pathspec and see what kind of special pattern matching it requests---we cannot allow it to ask for certain things but do allow it to ask for other things", that sounds terribly misguided here. The string is literal, so we do not even want to look at it to see what it asks---it is taken as nothing but a literal string and we do not let it ask for anything special. For example, if we started at HEAD to follow Makefile and in an ancestor commit we find that the Makefile came from a funny path ":(icase)foo", instead of allowing the pathspec match code to interprete it as a magic pathspec that matches any case permutations of 'f/o/o', we want to force the pathspec match code to match nothing but the funny path "colon parens icase close parens foo" by using literal pathspec magic. So perhaps having GUARD_PATHSPEC() there is the mistake, no? For the above idea to work, I think we need to resolve the pathspec early; "log --follow" should find its starting point, apply the given pathspec to its treeish to grab the literal pathname, and use that single literal pathname as the literal pathspec and continue, which I do not think it does right now. But with it done upfront, the pathspec limiting done during the history traversal, including try_to_follow_renames() bit, should be able to limit itself to "is the path literally the string given as the pathspec"? I dunno.
On Fri, Jun 02, 2023 at 04:27:05PM +0900, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > The --follow code doesn't handle most forms of pathspec magic. We check > > that no unexpected ones have made it to try_to_follow_renames() with a > > runtime GUARD_PATHSPEC() check, which gives behavior like this: > > > > $ git log --follow ':(icase)makefile' >/dev/null > > BUG: tree-diff.c:596: unsupported magic 10 > > Aborted > > Stepping back a bit, isn't the real reason why follow mode is > incompatible with the pathspec magic because it does not want to > deal with anything but a single literal pathname? Sorry, I'm a little behind on list reading; I know this topic has advanced almost to 'master' in the meantime, but I had a few thoughts worth getting out. Basically, yes, I agree with everything in your email. My series is more or less papering over deficiencies in the --follow code, and that code would be much better off thinking of single literal paths that it is following. I was trying with my series not to go down the rabbit hole of --follow deficiencies. So in that sense, I think it is still a good idea for the short-term. And though we might perhaps revert some of it if we actually improve how --follow works, I don't think it would be too hard to do so later. But of course if we are going to improve --follow _now_, then it could replace my series. And what you outlined here: > For the above idea to work, I think we need to resolve the pathspec > early; "log --follow" should find its starting point, apply the > given pathspec to its treeish to grab the literal pathname, and use > that single literal pathname as the literal pathspec and continue, > which I do not think it does right now. But with it done upfront, > the pathspec limiting done during the history traversal, including > try_to_follow_renames() bit, should be able to limit itself to "is > the path literally the string given as the pathspec"? seems mostly sensible to me, except that the notion of "starting point" is ambiguous. If I do: git log --follow branch1 branch2 -- foo* then we have two treeishes. Do we look at one? Both? What happens if the pathspec resolves differently? Off the top of my head, I'd say that we should look at all starting tips, resolve the pathspec in all of them, and complain if it doesn't resolve to the same single path in each one. Because it otherwise would produce garbled results. To be fair, this same garbled case already happens when traversing a merge sends us down two paths of history, as a followed rename might be present on one but not the other, and we keep only a single path. But as you know, that is a long-time deficiency of --follow. :) At least detecting it from the starting tips is easy and something the user can deal with by modifying the command invocation. So I dunno. That doesn't sound _too_ hard to do, though it probably also needs a lot of the same infrastructure I introduced in my series. Namely that log.follow needs to ask "hey, is this pathspec usable for following?" so that it can quietly turn the feature off, rather than triggering an error. So I'd suggest to let my series continue graduating, and then if anybody wants to work on more sensible pathspec handling, to do so on top. -Peff
Jeff King <peff@peff.net> writes: > ... except that the notion of "starting point" > is ambiguous. If I do: > > git log --follow branch1 branch2 -- foo* > > then we have two treeishes. Do we look at one? Both? What happens if the > pathspec resolves differently? If we consider a middle-ground where we do not keep a separate pathname that each history traversal is following but only keep a single pathname, then the answer is one of "ignore others and pick one" (which is essentially what needs to happen in the current implementation when a path gets renamed between a child commit and its parent while the other parents do not have renames), "ignore others and pick one, but give a warning that the result may be incomplete", or "barf". The latter two would need us to look at all treeishes. A false-positive-ridden "solution" that would slightly be better than that would be to still keep a global, single, pathspec that has more than one elements (all of which are literal pathspec elements). Every time we discover a rename, we add the newly discovered old name as an additional element, instead of replacing the single element. If we were to go that route, then starting from multiple points, each following a pathname, would be a simple matter of looking at all trees and finding which pathname each history traversal wants to follow and creating a pathspec with all these (literal) pathnames. But once we fix the underlying problem, and start tracking which path each history traversal trajectory is currently following, then it is not a problem for each starting point to have a pathname that is being followed. Each commit would _know_ which pathname it is interested in, and when a traversal from a commit to its parent sees a rename, the pathname being followed for the parent commit will be different from the child. When this happens, the parent may have a pathname it is already interested in, set by traversing from another child (i.e. one side of a fork renamed the path one way, while the other side of a fork kept the same name or renamed it to another way, and we traversed both forks down and are now at the fork point). It most likely should be the same name we discovered from the other child, but in weird cases it may be different (e.g. a child thinks its path A came from parent's path X while another child thinks its path A came from parent's path Y), in which case we may have to follow two pathnames in the parent commit, as if the pathname we have been following in the child had combined contents from multiple paths. > So I'd suggest to let my series continue graduating, and then if anybody > wants to work on more sensible pathspec handling, to do so on top. I think we are in agreement on this---otherwise I wouldn't have merged the patch as-is down to 'next' ;-) As I said multiple times in the past, the current "log --follow" is more of a checkbox item than a feature, and in that context, I think your "fix", which may be papering over just one obvious breakage, is what the codepath should be happy with, before it gets reworked more seriously. Thanks.
diff --git a/diff.c b/diff.c index 9f548f3471..8ca16fefe7 100644 --- a/diff.c +++ b/diff.c @@ -4753,11 +4753,26 @@ unsigned diff_filter_bit(char status) int diff_check_follow_pathspec(struct pathspec *ps, int die_on_error) { + unsigned forbidden_magic; + if (ps->nr != 1) { if (die_on_error) die(_("--follow requires exactly one pathspec")); return 0; } + + forbidden_magic = ps->items[0].magic & ~(PATHSPEC_FROMTOP | + PATHSPEC_LITERAL); + if (forbidden_magic) { + if (die_on_error) { + struct strbuf sb = STRBUF_INIT; + pathspec_magic_names(forbidden_magic, &sb); + die(_("pathspec magic not supported by --follow: %s"), + sb.buf); + } + return 0; + } + return 1; } diff --git a/t/t4202-log.sh b/t/t4202-log.sh index ae73aef922..5b54d7928e 100755 --- a/t/t4202-log.sh +++ b/t/t4202-log.sh @@ -187,6 +187,21 @@ test_expect_success 'git config log.follow does not die with no paths' ' git log -- ' +test_expect_success 'git log --follow rejects unsupported pathspec magic' ' + test_must_fail git log --follow ":(top,glob,icase)ichi" 2>stderr && + # check full error message; we want to be sure we mention both + # of the rejected types (glob,icase), but not the allowed one (top) + echo "fatal: pathspec magic not supported by --follow: ${SQ}glob${SQ}, ${SQ}icase${SQ}" >expect && + test_cmp expect stderr +' + +test_expect_success 'log.follow disabled with unsupported pathspec magic' ' + test_config log.follow true && + git log --format=%s ":(glob,icase)ichi" >actual && + echo third >expect && + test_cmp expect actual +' + test_expect_success 'git config log.follow is overridden by --no-follow' ' test_config log.follow true && git log --no-follow --pretty="format:%s" ichi >actual &&
The --follow code doesn't handle most forms of pathspec magic. We check that no unexpected ones have made it to try_to_follow_renames() with a runtime GUARD_PATHSPEC() check, which gives behavior like this: $ git log --follow ':(icase)makefile' >/dev/null BUG: tree-diff.c:596: unsupported magic 10 Aborted The same is true of ":(glob)", ":(attr)", and so on. It's good that we notice the problem rather than continuing and producing a wrong answer. But there are two non-ideal things: 1. The idea of GUARD_PATHSPEC() is to catch programming errors where low-level code gets unexpected pathspecs. We'd usually try to catch unsupported pathspecs by passing a magic_mask to parse_pathspec(), which would give the user a much better message like: pathspec magic not supported by this command: 'icase' That doesn't happen here because git-log usually _does_ support all types of pathspec magic, and so it passes "0" for the mask (this call actually happens in setup_revisions()). It needs to distinguish the normal case from the "--follow" one but currently doesn't. 2. In addition to --follow, we have the log.follow config option. When that is set, we try to turn on --follow mode only when there is a single pathspec (since --follow doesn't handle anything else). But really, that ought to be expanded to "use --follow when the pathspec supports it". Otherwise, we'd complain any time you use an exotic pathspec: $ git config log.follow true $ git log ':(icase)makefile' >/dev/null BUG: tree-diff.c:596: unsupported magic 10 Aborted We should instead just avoid enabling follow mode if it's not supported by this particular invocation. This patch expands our diff_check_follow_pathspec() function to cover pathspec magic, solving both problems. A few final notes: - we could also solve (1) by passing the appropriate mask to parse_pathspec(). But that's not great for two reasons. One is that the error message is less precise. It says "magic not supported by this command", but really it is not the command, but rather the --follow option which is the problem. The second is that it always calls die(). But for our log.follow code, we want to speculatively ask "is this pathspec OK?" and just get a boolean result. - This is obviously the right thing to do for ':(icase)' and most other magic options. But ':(glob)' is a bit odd here. The --follow code doesn't support wildcards, but we allow them anyway. From try_to_follow_renames(): #if 0 /* * We should reject wildcards as well. Unfortunately we * haven't got a reliable way to detect that 'foo\*bar' in * fact has no wildcards. nowildcard_len is merely a hint for * optimization. Let it slip for now until wildmatch is taught * about dry-run mode and returns wildcard info. */ if (opt->pathspec.has_wildcard) BUG("wildcards are not supported"); #endif So something like "git log --follow 'Make*'" is already doing the wrong thing, since ":(glob)" behavior is already the default (it is used only to countermand an earlier --noglob-pathspecs). So we _could_ loosen the guard to allow :(glob), since it just behaves the same as pathspecs do by default. But it seems like a backwards step to do so. It already doesn't work (it hits the BUG() case currently), and given that the user took an explicit step to say "this pathspec should glob", it is reasonable for us to say "no, --follow does not support globbing" (or in the case of log.follow, avoid turning on follow mode). Which is what happens after this patch. - The set of allowed pathspec magic is obviously the same as in GUARD_PATHSPEC(). We could perhaps factor these out to avoid repetition. The point of having separate masks and GUARD calls is that we don't necessarily know which parsed pathspecs will be used where. But in this case, the two are heavily correlated. Still, there may be some value in keeping them separate; it would make anyone think twice about adding new magic to the list in diff_check_follow_pathspec(). They'd need to touch try_to_follow_renames() as well, which is the code that would actually need to be updated to handle more exotic pathspecs. - The documentation for log.follow says that it enables --follow "...when a single <path> is given". We could possibly expand that to say "with no unsupported pathspec magic", but that raises the question of documenting which magic is supported. I think the existing wording of "single <path>" sufficiently encompasses the idea (the forbidden magic is stuff that might match multiple entries), and the spirit remains the same. Reported-by: Jim Pryor <dubiousjim@gmail.com> Signed-off-by: Jeff King <peff@peff.net> --- diff.c | 15 +++++++++++++++ t/t4202-log.sh | 15 +++++++++++++++ 2 files changed, 30 insertions(+)