Message ID | 20190424152215.16251-3-avarab@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] diffcore-pickaxe: refactor !one or !two case in diff_grep | expand |
On Wed, Apr 24 2019, Ævar Arnfjörð Bjarmason wrote: > Add the ability for the -G<regex> pickaxe to search only through added > or removed lines in the diff, or even through an arbitrary amount of > context lines when combined with -U<n>. > > This has been requested[1][2] a few times in the past, and isn't > currently possible. Instead users need to do -G<regex> and then write > their own post-parsing script to see if the <regex> matched added or > removed lines, or both. There was no way to match the adjacent context > lines other than running and grepping the equivalent of a "log -p -U<n>". > > 1. https://public-inbox.org/git/xmqqwoqrr8y2.fsf@gitster-ct.c.googlers.com/ > 2. https://public-inbox.org/git/20190424102609.GA19697@vmlxhi-102.adit-jv.com/ I see now once I actually read Eugeniu Rosca's E-Mail upthread instead of just knee-jerk sending out patches that this doesn't actually solve his particular problem fully. I.e. if you want some AND/OR matching support this --pickaxe-raw-diff won't give you that, but it *does* make it much easier to script up such an option. Run it twice with -G"\+<regex>" and -G"-<regex>", "sort | uniq -c" the commit list, and see which things occur once or twice. Of course that doesn't give you more complex nested and/or cases, but if git-log grew support for that like git-grep has the -G option could use that, although at that point we'd probably want to spend effort on making the underlying machinery smarter to avoid duplicate work. Furthermore, and quoting Eugeniu upthread: In the context of [1], I would like to find all Linux commits which replaced: 'devm_request_threaded_irq(* IRQF_SHARED *)' by: 'devm_request_threaded_irq(* IRQF_ONESHOT *)' Such AND/OR machinery would give you what you wanted *most* of the time, but it would also find removed/added pairs that were "unrelated" as well as "related". Solving *that* problem is more complex, but something the diff machinery could in principle expose. But the "-G<regex> --pickaxe-raw-diff" feature I have as-is is very useful, I've had at least two people off-list ask me about a problem that would be solved by it just in the last 1/2 year (unrelated to them having seen the WIP patch I sent last October). It's more general than Junio's suggested --pickaxe-ignore-{add,del} options[1], but those could be implemented in terms of this underlying code if anyone cared to have those as aliases. You'd just take the -G<regex> and prefix the <regex> with "^\+" or "^-" as appropriate and turn on the DIFF_PICKAXE_G_RAW_DIFF flag. 1. https://public-inbox.org/git/xmqqwoqrr8y2.fsf@gitster-ct.c.googlers.com/
Hi Ævar, Thanks for the amazingly fast reply and for the useful feature (yay!). On Wed, Apr 24, 2019 at 05:37:10PM +0200, Ævar Arnfjörð Bjarmason wrote: > > On Wed, Apr 24 2019, Ævar Arnfjörð Bjarmason wrote: > > > Add the ability for the -G<regex> pickaxe to search only through added > > or removed lines in the diff, or even through an arbitrary amount of > > context lines when combined with -U<n>. > > > > This has been requested[1][2] a few times in the past, and isn't > > currently possible. Instead users need to do -G<regex> and then write > > their own post-parsing script to see if the <regex> matched added or > > removed lines, or both. There was no way to match the adjacent context > > lines other than running and grepping the equivalent of a "log -p -U<n>". > > > > 1. https://public-inbox.org/git/xmqqwoqrr8y2.fsf@gitster-ct.c.googlers.com/ > > 2. https://public-inbox.org/git/20190424102609.GA19697@vmlxhi-102.adit-jv.com/ > > I see now once I actually read Eugeniu Rosca's E-Mail upthread instead > of just knee-jerk sending out patches that this doesn't actually solve > his particular problem fully. > > I.e. if you want some AND/OR matching support this --pickaxe-raw-diff > won't give you that, but it *does* make it much easier to script up such > an option. Run it twice with -G"\+<regex>" and -G"-<regex>", "sort | > uniq -c" the commit list, and see which things occur once or twice. > > Of course that doesn't give you more complex nested and/or cases, but if > git-log grew support for that like git-grep has the -G option could use > that, although at that point we'd probably want to spend effort on > making the underlying machinery smarter to avoid duplicate work. Purely from user's standpoint, I feel more comfortable with `git grep` and `git log --grep` particularly b/c they support '--all-match' [2], allowing more flexible multi-line searches. Based on your feedback, it looks to me that `git log -G/-S` did not have a chance to develop their features to the same level. > > Furthermore, and quoting Eugeniu upthread: > > In the context of [1], I would like to find all Linux commits which > replaced: > 'devm_request_threaded_irq(* IRQF_SHARED *)' > by: > 'devm_request_threaded_irq(* IRQF_ONESHOT *)' > > Such AND/OR machinery would give you what you wanted *most* of the time, > but it would also find removed/added pairs that were "unrelated" as well > as "related". Solving *that* problem is more complex, but something the > diff machinery could in principle expose. I expect some false positives, since git is agnostic on the language used to write the versioned files (the latter sounds like a research topic to me - I hope there is somebody willing to experiment with that in future). > > But the "-G<regex> --pickaxe-raw-diff" feature I have as-is is very > useful, I agree. I am a bit bothered by the fact that `git log --oneline -Ux -G<regex> --pickaxe-raw-diff` outputs the contents/patch of a commit. My expectation is that we have the `log -p` knob for that? > I've had at least two people off-list ask me about a problem > that would be solved by it just in the last 1/2 year (unrelated to them > having seen the WIP patch I sent last October). > > It's more general than Junio's suggested --pickaxe-ignore-{add,del} As a user, I would be happier to freely grep in the raw commit contents rather than learning a dozen of new options which provide small subsets of the same functionality. So, I personally vote for the approach taken by --pickaxe-raw-diff. This would also reduce the complexity of my current git aliases and/or allow dropping some of them altogether. Quite off topic, but I also needed to come up with a solution to get the C functions modified/touched by a git commit [3]. It is my understanding that --pickaxe-raw-diff can't help here and I still have to rely on parsing the output of `git log -p`? > options[1], but those could be implemented in terms of this underlying > code if anyone cared to have those as aliases. You'd just take the > -G<regex> and prefix the <regex> with "^\+" or "^-" as appropriate and > turn on the DIFF_PICKAXE_G_RAW_DIFF flag. > > 1. https://public-inbox.org/git/xmqqwoqrr8y2.fsf@gitster-ct.c.googlers.com/ Thanks! [2] https://gitster.livejournal.com/30195.html [3] https://stackoverflow.com/questions/50707171/how-to-get-all-c-functions-modified-by-a-git-commit
On Thu, Apr 25 2019, Eugeniu Rosca wrote: > Hi Ævar, > > Thanks for the amazingly fast reply and for the useful feature (yay!). > > On Wed, Apr 24, 2019 at 05:37:10PM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> On Wed, Apr 24 2019, Ævar Arnfjörð Bjarmason wrote: >> >> > Add the ability for the -G<regex> pickaxe to search only through added >> > or removed lines in the diff, or even through an arbitrary amount of >> > context lines when combined with -U<n>. >> > >> > This has been requested[1][2] a few times in the past, and isn't >> > currently possible. Instead users need to do -G<regex> and then write >> > their own post-parsing script to see if the <regex> matched added or >> > removed lines, or both. There was no way to match the adjacent context >> > lines other than running and grepping the equivalent of a "log -p -U<n>". >> > >> > 1. https://public-inbox.org/git/xmqqwoqrr8y2.fsf@gitster-ct.c.googlers.com/ >> > 2. https://public-inbox.org/git/20190424102609.GA19697@vmlxhi-102.adit-jv.com/ >> >> I see now once I actually read Eugeniu Rosca's E-Mail upthread instead >> of just knee-jerk sending out patches that this doesn't actually solve >> his particular problem fully. >> >> I.e. if you want some AND/OR matching support this --pickaxe-raw-diff >> won't give you that, but it *does* make it much easier to script up such >> an option. Run it twice with -G"\+<regex>" and -G"-<regex>", "sort | >> uniq -c" the commit list, and see which things occur once or twice. >> >> Of course that doesn't give you more complex nested and/or cases, but if >> git-log grew support for that like git-grep has the -G option could use >> that, although at that point we'd probably want to spend effort on >> making the underlying machinery smarter to avoid duplicate work. > > Purely from user's standpoint, I feel more comfortable with `git grep` > and `git log --grep` particularly b/c they support '--all-match' [2], > allowing more flexible multi-line searches. Based on your feedback, it > looks to me that `git log -G/-S` did not have a chance to develop their > features to the same level. > >> >> Furthermore, and quoting Eugeniu upthread: >> >> In the context of [1], I would like to find all Linux commits which >> replaced: >> 'devm_request_threaded_irq(* IRQF_SHARED *)' >> by: >> 'devm_request_threaded_irq(* IRQF_ONESHOT *)' >> >> Such AND/OR machinery would give you what you wanted *most* of the time, >> but it would also find removed/added pairs that were "unrelated" as well >> as "related". Solving *that* problem is more complex, but something the >> diff machinery could in principle expose. > > I expect some false positives, since git is agnostic on the language > used to write the versioned files (the latter sounds like a research > topic to me - I hope there is somebody willing to experiment with that > in future). I was thinking of something where the added/removed could be filtered to cases that occur in the same diff hunk. >> >> But the "-G<regex> --pickaxe-raw-diff" feature I have as-is is very >> useful, > > I agree. I am a bit bothered by the fact that > `git log --oneline -Ux -G<regex> --pickaxe-raw-diff` outputs the > contents/patch of a commit. My expectation is that we have the > `log -p` knob for that? This is unrelated to --pickaxe-raw-diff, -U<n> just implies -p in general. See e.g. "git log -U1". >> I've had at least two people off-list ask me about a problem >> that would be solved by it just in the last 1/2 year (unrelated to them >> having seen the WIP patch I sent last October). >> >> It's more general than Junio's suggested --pickaxe-ignore-{add,del} > > As a user, I would be happier to freely grep in the raw commit contents > rather than learning a dozen of new options which provide small subsets > of the same functionality. So, I personally vote for the approach taken > by --pickaxe-raw-diff. This would also reduce the complexity of my > current git aliases and/or allow dropping some of them altogether. > > Quite off topic, but I also needed to come up with a solution to get > the C functions modified/touched by a git commit [3]. It is my > understanding that --pickaxe-raw-diff can't help here and I still have > to rely on parsing the output of `git log -p`? Yeah, it doesn't help with that. When it runs we haven't generated the context line or the "@@" line yet, that's later. You can breakpoint on xdl_format_hunk_hdr and diffgrep_consume to see it in action. It's a waste of CPU to generate that for all possible hunks, most of which we won't show at all. But it's of course possible to do so by running the full diff machinery over every commit and matching on the result, the current pickaxe is just taking shortcuts and not doing that. >> options[1], but those could be implemented in terms of this underlying >> code if anyone cared to have those as aliases. You'd just take the >> -G<regex> and prefix the <regex> with "^\+" or "^-" as appropriate and >> turn on the DIFF_PICKAXE_G_RAW_DIFF flag. >> >> 1. https://public-inbox.org/git/xmqqwoqrr8y2.fsf@gitster-ct.c.googlers.com/ > > Thanks! > > [2] https://gitster.livejournal.com/30195.html > [3] https://stackoverflow.com/questions/50707171/how-to-get-all-c-functions-modified-by-a-git-commit
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> I agree. I am a bit bothered by the fact that >> `git log --oneline -Ux -G<regex> --pickaxe-raw-diff` outputs the >> contents/patch of a commit. My expectation is that we have the >> `log -p` knob for that? > > This is unrelated to --pickaxe-raw-diff, -U<n> just implies -p in > general. See e.g. "git log -U1". The reason why I found this exchange interesting is because I think it shows a noteworthy gap between end-user expectations and what the implementors know. Stepping back (or sideways) a bit, pretend for a while that there were no "pickaxe" feature in Git. Instead there is the "patch-grep" tool whose design is roughly: 1. It reads "git log -p" output from its standard input, and splits the lines into records, each of which consists of the header part (i.e. starting at the "commit <object name>" line, to the first blank line before the title), the log message part, and the patch part. 2. It takes command line arguments, which are, like "git grep", patterns to match and instructions on how to combine the match result. 3. It applies the match criteria only to the patch part of each record. A record without any match in the patch part is discarded. 4. It uses the surviving record's "commit <object name>" lines to decide what commits to show. It does the moral equivalent of invoking "git show" on each of them, and perhaps lets you affect how the commits are shown. Or perhaps it just lists the commit object names chosen for further processing by downstream tools that read from it. So the user would be able to say something like git log -Ux --since=6.months | git patch-grep \ --commit-names-only \ --all-match \ -e '+.*devm_request_threaded_irq(IRQF_SHARED)' \ -e '-.*devm_request_threaded_irq(IRQF_ONESHOT)' | xargs git show --oneline -s As an implementor, you know that is not how your -G<pattern> thing works, but coming from the end-user side, I think it is a reasonable mental model to expect a tool to work more like so. And I think the expectation from combining --oneline with -Ux was that the -U option would apply to step 1, not step 4 (as --oneline is a clear indication that the user wants a very concise final result). Personally, I think the _best_ match for the original wish would be to have that hypothetical "git patch-grep" read from "git log -L" that is limited to the C function in the source the user is interested in. And until "git patch-grep" becomes reality, I would probably have done git log -L<function of interest> -U<x> | less and asked "less" to skip to a match with /(IRQF_SHARED|IRQF_ONESHOT) and then kept hitting 'n' until I find what replaces them, as a stop-gap measure. By the way, I think your thing is interesting regardless, even if it does not match the use case in the original thread (it actually may match---I didn't think it through). Because in the context of diff/log family, however, the word "raw" has a specific connotation about the "--raw" format (as opposed to "--patch"), I would not call this "grep the patch output itself, instead of grepping the source (guided by the patch output to tell what lines are near the lines that got replaced)" feature anything "raw", by the way.
On Thu, Apr 25, 2019 at 01:24:56AM +0200, Ævar Arnfjörð Bjarmason wrote: > > On Thu, Apr 25 2019, Eugeniu Rosca wrote: > > > Hi Ævar, > > > > Thanks for the amazingly fast reply and for the useful feature (yay!). > > > > On Wed, Apr 24, 2019 at 05:37:10PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> > >> On Wed, Apr 24 2019, Ævar Arnfjörð Bjarmason wrote: > >> > >> > Add the ability for the -G<regex> pickaxe to search only through added > >> > or removed lines in the diff, or even through an arbitrary amount of > >> > context lines when combined with -U<n>. > >> > > >> > This has been requested[1][2] a few times in the past, and isn't > >> > currently possible. Instead users need to do -G<regex> and then write > >> > their own post-parsing script to see if the <regex> matched added or > >> > removed lines, or both. There was no way to match the adjacent context > >> > lines other than running and grepping the equivalent of a "log -p -U<n>". > >> > > >> > 1. https://public-inbox.org/git/xmqqwoqrr8y2.fsf@gitster-ct.c.googlers.com/ > >> > 2. https://public-inbox.org/git/20190424102609.GA19697@vmlxhi-102.adit-jv.com/ > >> > >> I see now once I actually read Eugeniu Rosca's E-Mail upthread instead > >> of just knee-jerk sending out patches that this doesn't actually solve > >> his particular problem fully. > >> > >> I.e. if you want some AND/OR matching support this --pickaxe-raw-diff > >> won't give you that, but it *does* make it much easier to script up such > >> an option. Run it twice with -G"\+<regex>" and -G"-<regex>", "sort | > >> uniq -c" the commit list, and see which things occur once or twice. > >> > >> Of course that doesn't give you more complex nested and/or cases, but if > >> git-log grew support for that like git-grep has the -G option could use > >> that, although at that point we'd probably want to spend effort on > >> making the underlying machinery smarter to avoid duplicate work. > > > > Purely from user's standpoint, I feel more comfortable with `git grep` > > and `git log --grep` particularly b/c they support '--all-match' [2], > > allowing more flexible multi-line searches. Based on your feedback, it > > looks to me that `git log -G/-S` did not have a chance to develop their > > features to the same level. > > > >> > >> Furthermore, and quoting Eugeniu upthread: > >> > >> In the context of [1], I would like to find all Linux commits which > >> replaced: > >> 'devm_request_threaded_irq(* IRQF_SHARED *)' > >> by: > >> 'devm_request_threaded_irq(* IRQF_ONESHOT *)' > >> > >> Such AND/OR machinery would give you what you wanted *most* of the time, > >> but it would also find removed/added pairs that were "unrelated" as well > >> as "related". Solving *that* problem is more complex, but something the > >> diff machinery could in principle expose. > > > > I expect some false positives, since git is agnostic on the language > > used to write the versioned files (the latter sounds like a research > > topic to me - I hope there is somebody willing to experiment with that > > in future). > > I was thinking of something where the added/removed could be filtered to > cases that occur in the same diff hunk. > > >> > >> But the "-G<regex> --pickaxe-raw-diff" feature I have as-is is very > >> useful, > > > > I agree. I am a bit bothered by the fact that > > `git log --oneline -Ux -G<regex> --pickaxe-raw-diff` outputs the > > contents/patch of a commit. My expectation is that we have the > > `log -p` knob for that? > > This is unrelated to --pickaxe-raw-diff, -U<n> just implies -p in > general. See e.g. "git log -U1". Oops. Since I use `-U<n>` mostly with `git show`, I missed the implication. You are right. Then, my question is how users are going to (quote from commit description): > >> > [..] search [..] through an arbitrary amount of > >> > context lines when combined with -U<n>. and achieve a `git log --oneline` report, given that -U<n> unfolds the commits? FTR, based on my quick experiments, --pickaxe-raw-diff does process several lines of context by default (it appears to default to -U3). > > >> I've had at least two people off-list ask me about a problem > >> that would be solved by it just in the last 1/2 year (unrelated to them > >> having seen the WIP patch I sent last October). > >> > >> It's more general than Junio's suggested --pickaxe-ignore-{add,del} > > > > As a user, I would be happier to freely grep in the raw commit contents > > rather than learning a dozen of new options which provide small subsets > > of the same functionality. So, I personally vote for the approach taken > > by --pickaxe-raw-diff. This would also reduce the complexity of my > > current git aliases and/or allow dropping some of them altogether. > > > > Quite off topic, but I also needed to come up with a solution to get > > the C functions modified/touched by a git commit [3]. It is my > > understanding that --pickaxe-raw-diff can't help here and I still have > > to rely on parsing the output of `git log -p`? > > Yeah, it doesn't help with that. When it runs we haven't generated the > context line or the "@@" line yet, that's later. You can breakpoint on > xdl_format_hunk_hdr and diffgrep_consume to see it in action. > > It's a waste of CPU to generate that for all possible hunks, most of > which we won't show at all. > > But it's of course possible to do so by running the full diff machinery > over every commit and matching on the result, the current pickaxe is > just taking shortcuts and not doing that. > > >> options[1], but those could be implemented in terms of this underlying > >> code if anyone cared to have those as aliases. You'd just take the > >> -G<regex> and prefix the <regex> with "^\+" or "^-" as appropriate and > >> turn on the DIFF_PICKAXE_G_RAW_DIFF flag. > >> > >> 1. https://public-inbox.org/git/xmqqwoqrr8y2.fsf@gitster-ct.c.googlers.com/ > > > > Thanks! > > > > [2] https://gitster.livejournal.com/30195.html > > [3] https://stackoverflow.com/questions/50707171/how-to-get-all-c-functions-modified-by-a-git-commit
On Thu, Apr 25 2019, Eugeniu Rosca wrote: > On Thu, Apr 25, 2019 at 01:24:56AM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> On Thu, Apr 25 2019, Eugeniu Rosca wrote: >> >> > Hi Ævar, >> > >> > Thanks for the amazingly fast reply and for the useful feature (yay!). >> > >> > On Wed, Apr 24, 2019 at 05:37:10PM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> >> >> On Wed, Apr 24 2019, Ævar Arnfjörð Bjarmason wrote: >> >> >> >> > Add the ability for the -G<regex> pickaxe to search only through added >> >> > or removed lines in the diff, or even through an arbitrary amount of >> >> > context lines when combined with -U<n>. >> >> > >> >> > This has been requested[1][2] a few times in the past, and isn't >> >> > currently possible. Instead users need to do -G<regex> and then write >> >> > their own post-parsing script to see if the <regex> matched added or >> >> > removed lines, or both. There was no way to match the adjacent context >> >> > lines other than running and grepping the equivalent of a "log -p -U<n>". >> >> > >> >> > 1. https://public-inbox.org/git/xmqqwoqrr8y2.fsf@gitster-ct.c.googlers.com/ >> >> > 2. https://public-inbox.org/git/20190424102609.GA19697@vmlxhi-102.adit-jv.com/ >> >> >> >> I see now once I actually read Eugeniu Rosca's E-Mail upthread instead >> >> of just knee-jerk sending out patches that this doesn't actually solve >> >> his particular problem fully. >> >> >> >> I.e. if you want some AND/OR matching support this --pickaxe-raw-diff >> >> won't give you that, but it *does* make it much easier to script up such >> >> an option. Run it twice with -G"\+<regex>" and -G"-<regex>", "sort | >> >> uniq -c" the commit list, and see which things occur once or twice. >> >> >> >> Of course that doesn't give you more complex nested and/or cases, but if >> >> git-log grew support for that like git-grep has the -G option could use >> >> that, although at that point we'd probably want to spend effort on >> >> making the underlying machinery smarter to avoid duplicate work. >> > >> > Purely from user's standpoint, I feel more comfortable with `git grep` >> > and `git log --grep` particularly b/c they support '--all-match' [2], >> > allowing more flexible multi-line searches. Based on your feedback, it >> > looks to me that `git log -G/-S` did not have a chance to develop their >> > features to the same level. >> > >> >> >> >> Furthermore, and quoting Eugeniu upthread: >> >> >> >> In the context of [1], I would like to find all Linux commits which >> >> replaced: >> >> 'devm_request_threaded_irq(* IRQF_SHARED *)' >> >> by: >> >> 'devm_request_threaded_irq(* IRQF_ONESHOT *)' >> >> >> >> Such AND/OR machinery would give you what you wanted *most* of the time, >> >> but it would also find removed/added pairs that were "unrelated" as well >> >> as "related". Solving *that* problem is more complex, but something the >> >> diff machinery could in principle expose. >> > >> > I expect some false positives, since git is agnostic on the language >> > used to write the versioned files (the latter sounds like a research >> > topic to me - I hope there is somebody willing to experiment with that >> > in future). >> >> I was thinking of something where the added/removed could be filtered to >> cases that occur in the same diff hunk. >> >> >> >> >> But the "-G<regex> --pickaxe-raw-diff" feature I have as-is is very >> >> useful, >> > >> > I agree. I am a bit bothered by the fact that >> > `git log --oneline -Ux -G<regex> --pickaxe-raw-diff` outputs the >> > contents/patch of a commit. My expectation is that we have the >> > `log -p` knob for that? >> >> This is unrelated to --pickaxe-raw-diff, -U<n> just implies -p in >> general. See e.g. "git log -U1". > > Oops. Since I use `-U<n>` mostly with `git show`, I missed the > implication. You are right. Then, my question is how users are > going to (quote from commit description): > >> >> > [..] search [..] through an arbitrary amount of >> >> > context lines when combined with -U<n>. > > and achieve a `git log --oneline` report, given that -U<n> unfolds > the commits? > > FTR, based on my quick experiments, --pickaxe-raw-diff does process > several lines of context by default (it appears to default to -U3). Yeah I should document this explicitly. We use the default diff context so if you just -G'foo.*bar' you'll find things in the 6x lines of context (3 before / 3 after), not just the "-" and "+" lines. It's a "feature", but we should be really clear about it, i.e. you need to anchor with "^[+-]" if you want the same thing that -G does for you now. I *do* find the default semantics really useful. Sometimes you can use -L, but I've often done manual greps with -U<n> for "let's find code changes anywhere in the project near places where we use some API", maybe we should pick -U0 with --pickaxe-raw-diff by default to avoid *that* particular surprise by default, but I think that would be even more confusing... >> >> >> I've had at least two people off-list ask me about a problem >> >> that would be solved by it just in the last 1/2 year (unrelated to them >> >> having seen the WIP patch I sent last October). >> >> >> >> It's more general than Junio's suggested --pickaxe-ignore-{add,del} >> > >> > As a user, I would be happier to freely grep in the raw commit contents >> > rather than learning a dozen of new options which provide small subsets >> > of the same functionality. So, I personally vote for the approach taken >> > by --pickaxe-raw-diff. This would also reduce the complexity of my >> > current git aliases and/or allow dropping some of them altogether. >> > >> > Quite off topic, but I also needed to come up with a solution to get >> > the C functions modified/touched by a git commit [3]. It is my >> > understanding that --pickaxe-raw-diff can't help here and I still have >> > to rely on parsing the output of `git log -p`? >> >> Yeah, it doesn't help with that. When it runs we haven't generated the >> context line or the "@@" line yet, that's later. You can breakpoint on >> xdl_format_hunk_hdr and diffgrep_consume to see it in action. >> >> It's a waste of CPU to generate that for all possible hunks, most of >> which we won't show at all. >> >> But it's of course possible to do so by running the full diff machinery >> over every commit and matching on the result, the current pickaxe is >> just taking shortcuts and not doing that. >> >> >> options[1], but those could be implemented in terms of this underlying >> >> code if anyone cared to have those as aliases. You'd just take the >> >> -G<regex> and prefix the <regex> with "^\+" or "^-" as appropriate and >> >> turn on the DIFF_PICKAXE_G_RAW_DIFF flag. >> >> >> >> 1. https://public-inbox.org/git/xmqqwoqrr8y2.fsf@gitster-ct.c.googlers.com/ >> > >> > Thanks! >> > >> > [2] https://gitster.livejournal.com/30195.html >> > [3] https://stackoverflow.com/questions/50707171/how-to-get-all-c-functions-modified-by-a-git-commit
On Thu, Apr 25 2019, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >>> I agree. I am a bit bothered by the fact that >>> `git log --oneline -Ux -G<regex> --pickaxe-raw-diff` outputs the >>> contents/patch of a commit. My expectation is that we have the >>> `log -p` knob for that? >> >> This is unrelated to --pickaxe-raw-diff, -U<n> just implies -p in >> general. See e.g. "git log -U1". > > The reason why I found this exchange interesting is because I think > it shows a noteworthy gap between end-user expectations and what the > implementors know. > > Stepping back (or sideways) a bit, pretend for a while that there > were no "pickaxe" feature in Git. Instead there is the "patch-grep" > tool whose design is roughly: > > 1. It reads "git log -p" output from its standard input, and > splits the lines into records, each of which consists of the > header part (i.e. starting at the "commit <object name>" line, > to the first blank line before the title), the log message > part, and the patch part. > > 2. It takes command line arguments, which are, like "git grep", > patterns to match and instructions on how to combine the match > result. > > 3. It applies the match criteria only to the patch part of each > record. A record without any match in the patch part is > discarded. > > 4. It uses the surviving record's "commit <object name>" lines > to decide what commits to show. It does the moral equivalent > of invoking "git show" on each of them, and perhaps lets you > affect how the commits are shown. > > Or perhaps it just lists the commit object names chosen for > further processing by downstream tools that read from it. > > > So the user would be able to say something like > > git log -Ux --since=6.months | > git patch-grep \ > --commit-names-only \ > --all-match \ > -e '+.*devm_request_threaded_irq(IRQF_SHARED)' \ > -e '-.*devm_request_threaded_irq(IRQF_ONESHOT)' | > xargs git show --oneline -s > > As an implementor, you know that is not how your -G<pattern> thing > works, but coming from the end-user side, I think it is a reasonable > mental model to expect a tool to work more like so. And I think the > expectation from combining --oneline with -Ux was that the -U option > would apply to step 1, not step 4 (as --oneline is a clear > indication that the user wants a very concise final result). > > Personally, I think the _best_ match for the original wish would be > to have that hypothetical "git patch-grep" read from "git log -L" > that is limited to the C function in the source the user is > interested in. > > And until "git patch-grep" becomes reality, I would probably have > done > > git log -L<function of interest> -U<x> | less > > and asked "less" to skip to a match with > > /(IRQF_SHARED|IRQF_ONESHOT) > > and then kept hitting 'n' until I find what replaces them, as a > stop-gap measure. > > By the way, I think your thing is interesting regardless, even if it > does not match the use case in the original thread (it actually may > match---I didn't think it through). Yeah it's definitely a bit orthagonal, should have sent it in reply to something else and actually read the E-Mail, but I think it's useful. > Because in the context of diff/log family, however, the word "raw" > has a specific connotation about the "--raw" format (as opposed to > "--patch"), I would not call this "grep the patch output itself, > instead of grepping the source (guided by the patch output to tell > what lines are near the lines that got replaced)" feature anything > "raw", by the way. I agree, brainfarted on not thinking about "raw". Do you or anyone have a suggestion for a better CLI option name? Maybe --pickaxe-patch or --pickaxe-patch-format (to go with git-diff's -u aka --patch (i.e. not --raw) default format)? Or --pickaxe-G-with-context or --pickaxe-with-context or --with-pickaxe-context or --pickaxe-context ? All of these suck, but I'm coming up blank on a better one :) Probably the least shitty of those shitty options is --pickaxe-patch, since we have --patch which triggers the same format, and we can document that the default is a -G search through --no-pickaxe-patch, and you can just tweak the format. It also leaves the door open (unlike having *-G-* in the option) to support this for -S if anyone cared...
On Thu, Apr 25, 2019 at 02:54:48AM +0200, Eugeniu Rosca wrote: > > This is unrelated to --pickaxe-raw-diff, -U<n> just implies -p in > > general. See e.g. "git log -U1". > > Oops. Since I use `-U<n>` mostly with `git show`, I missed the > implication. You are right. Then, my question is how users are > going to (quote from commit description): > > > >> > [..] search [..] through an arbitrary amount of > > >> > context lines when combined with -U<n>. > > and achieve a `git log --oneline` report, given that -U<n> unfolds > the commits? You can use "-s" to suppress patch output; as long as it comes after -U on the command-line, it will countermand the patch-format part. (Of course it doesn't matter until we have a raw-diff grep, since otherwise the context lines do not matter at all, and you should just omit -U entirely). -Peff
On Thu, Apr 25, 2019 at 09:44:40AM +0900, Junio C Hamano wrote: [..] > So the user would be able to say something like > > git log -Ux --since=6.months | > git patch-grep \ > --commit-names-only \ > --all-match \ > -e '+.*devm_request_threaded_irq(IRQF_SHARED)' \ > -e '-.*devm_request_threaded_irq(IRQF_ONESHOT)' | > xargs git show --oneline -s [..] JFTR/FWIW, this looks quite user friendly to me, but I believe users like me can (and likely do) already handcraft their own variants of 'git patch-grep', since the above model implies piping (as opposed to in-git processing done by --pickaxe-raw-diff).
On Thu, Apr 25, 2019 at 02:25:13PM +0200, Ævar Arnfjörð Bjarmason wrote: [..] > Do you or anyone have a suggestion for a better CLI option name? > > Maybe --pickaxe-patch or --pickaxe-patch-format (to go with git-diff's > -u aka --patch (i.e. not --raw) default format)? Or > --pickaxe-G-with-context or --pickaxe-with-context or > --with-pickaxe-context or --pickaxe-context ? All of these suck, but I'm > coming up blank on a better one :) '--pickaxe-patch' is shorter than '--pickaxe-raw-diff', hence more convenient to me. It looks like 'pickaxe-all' and 'pickaxe-regex' are the only --pickaxe-* options currently implemented. Both of them are two-worded only and easy to remember/type. I think '--pickaxe-patch' is more user-friendly, but I leave git people to say the final word. > > Probably the least shitty of those shitty options is --pickaxe-patch, > since we have --patch which triggers the same format, and we can > document that the default is a -G search through --no-pickaxe-patch, and > you can just tweak the format. > > It also leaves the door open (unlike having *-G-* in the option) to > support this for -S if anyone cared...
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index 09faee3b44..f367b40362 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -579,6 +579,9 @@ occurrences of that string did not change). Unless `--text` is supplied patches of binary files without a textconv filter will be ignored. + +When `--pickaxe-raw-diff` is supplied the whole diff is searched +instead of just added/removed lines. See below. ++ See the 'pickaxe' entry in linkgit:gitdiffcore[7] for more information. @@ -600,6 +603,20 @@ The object can be a blob or a submodule commit. It implies the `-t` option in Treat the <string> given to `-S` as an extended POSIX regular expression to match. +--pickaxe-raw-diff:: + When `-G` looks for a change a diff will be generated, and + only the added/removed lines will be matched against with the + "+" or "-" stripped. ++ +Supplying this option skips that pre-processing. This makes it +possible to match only lines that added or removed something matching +a <regex> with "\^\+<regex>" and "^-<regex>", respectively. ++ +It also allows for finding something in the diff context. E.g. "\^ +<regex>" will match the context lines (see `-U<n>` above) around the +added/removed lines, and doing an unanchored match will match any of +the the added/removed lines & diff context. + endif::git-format-patch[] -O<orderfile>:: diff --git a/diff.c b/diff.c index 4d3cf83a27..4cdc000ee5 100644 --- a/diff.c +++ b/diff.c @@ -5503,6 +5503,9 @@ static void prep_parse_options(struct diff_options *options) OPT_BIT_F(0, "pickaxe-regex", &options->pickaxe_opts, N_("treat <string> in -S as extended POSIX regular expression"), DIFF_PICKAXE_REGEX, PARSE_OPT_NONEG), + OPT_BIT_F(0, "pickaxe-raw-diff", &options->pickaxe_opts, + N_("have <string> in -G match the raw diff output"), + DIFF_PICKAXE_G_RAW_DIFF, PARSE_OPT_NONEG), OPT_FILENAME('O', NULL, &options->orderfile, N_("control the order in which files appear in the output")), OPT_CALLBACK_F(0, "find-object", options, N_("<object-id>"), diff --git a/diff.h b/diff.h index b20cbcc091..d431fbc602 100644 --- a/diff.h +++ b/diff.h @@ -370,6 +370,8 @@ int git_config_rename(const char *var, const char *value); DIFF_PICKAXE_KIND_OBJFIND) #define DIFF_PICKAXE_IGNORE_CASE 32 +#define DIFF_PICKAXE_G_RAW_DIFF 64 + void diffcore_std(struct diff_options *); void diffcore_fix_diff_index(void); diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c index 3c6416bfe2..e23f04b4f0 100644 --- a/diffcore-pickaxe.c +++ b/diffcore-pickaxe.c @@ -17,14 +17,18 @@ typedef int (*pickaxe_fn)(mmfile_t *one, mmfile_t *two, struct diffgrep_cb { regex_t *regexp; int hit; + int raw_diff; }; static void diffgrep_consume(void *priv, char *line, unsigned long len) { struct diffgrep_cb *data = priv; regmatch_t regmatch; + int raw_diff = data->raw_diff; + const char *string = raw_diff ? line : line + 1; + size_t size = raw_diff ? len : len - 1; - if (line[0] != '+' && line[0] != '-') + if (!raw_diff && line[0] != '+' && line[0] != '-') return; if (data->hit) /* @@ -32,7 +36,7 @@ static void diffgrep_consume(void *priv, char *line, unsigned long len) * caller early. */ return; - data->hit = !regexec_buf(data->regexp, line + 1, len - 1, 1, + data->hit = !regexec_buf(data->regexp, string, size, 1, ®match, 0); } @@ -44,15 +48,36 @@ static int diff_grep(mmfile_t *one, mmfile_t *two, struct diffgrep_cb ecbdata; xpparam_t xpp; xdemitconf_t xecfg; + int raw_diff = o->pickaxe_opts & DIFF_PICKAXE_G_RAW_DIFF; + struct strbuf sb = STRBUF_INIT; + char *string; + size_t size; if (!one || !two) { mmfile_t *which = one ? one : two; int ret; - char *string = which->ptr; - size_t size = which->size; assert(!(!one && !two)); + if (raw_diff) { + /* + * When we have created/deleted files with + * --pickaxe-raw-diff we need to fake up the + * "+" and "-" at the start of the lines, a + * plain -G without --pickaxe-raw-diff didn't + * care since it would indiscriminately search + * through both added and removed lines. + */ + strbuf_add_lines(&sb, !one ? "+" : "-", which->ptr, + which->size); + string = sb.buf; + size = sb.len; + } else { + string = which->ptr; + size = which->size; + } ret = !regexec_buf(regexp, string, size, 1, ®match, 0); + if (raw_diff) + strbuf_release(&sb); return ret; } @@ -64,6 +89,7 @@ static int diff_grep(mmfile_t *one, mmfile_t *two, memset(&xecfg, 0, sizeof(xecfg)); ecbdata.regexp = regexp; ecbdata.hit = 0; + ecbdata.raw_diff = raw_diff; xecfg.ctxlen = o->context; xecfg.interhunkctxlen = o->interhunkcontext; if (xdi_diff_outf(one, two, discard_hunk_line, diffgrep_consume, diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh index 9f8f0e84ad..39a1f6c230 100755 --- a/t/t4013-diff-various.sh +++ b/t/t4013-diff-various.sh @@ -276,6 +276,7 @@ log -SF master --max-count=1 log -SF master --max-count=2 log -GF master log -GF -p master +log -GF -p --pickaxe-raw-diff master log -GF -p --pickaxe-all master log --decorate --all log --decorate=full --all diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh index 5d06f5f45e..2d98318d23 100755 --- a/t/t4209-log-pickaxe.sh +++ b/t/t4209-log-pickaxe.sh @@ -141,4 +141,49 @@ test_expect_success 'log -S looks into binary files' ' test_cmp log full-log ' +test_expect_success 'setup log -G --pickaxe-raw-diff' ' + git checkout --orphan G-raw-diff && + test_write_lines A B C D E F G >file && + git add file && + git commit --allow-empty-message file && + sed -i -e "s/B/2/" file && + git add file && + git commit --allow-empty-message file && + sed -i -e "s/D/4/" file && + git add file && + git commit --allow-empty-message file && + git rm file && + git commit --allow-empty-message && + git log --oneline -1 HEAD~0 >file.fourth && + git log --oneline -1 HEAD~1 >file.third && + git log --oneline -1 HEAD~2 >file.second && + git log --oneline -1 HEAD~3 >file.first +' + +test_expect_success 'log -G --pickaxe-raw-diff skips header and range information' ' + git log --pickaxe-raw-diff -p -G"(@@|file)" >log && + test_must_be_empty log +' + +test_expect_success 'log -G --pickaxe-raw-diff searching in context' ' + git log --oneline --pickaxe-raw-diff -G"^ F" -U2 -s >log && + test_cmp file.third log && + git log --oneline --pickaxe-raw-diff -G"^ F" -U1 -s >log && + test_must_be_empty log +' + +test_expect_success 'log -G --pickaxe-raw-diff searching added / removed lines (skip create/delete)' ' + git log --oneline --pickaxe-raw-diff -G"^-[D2]" -s HEAD~1 >log && + test_cmp file.third log && + git log --oneline --pickaxe-raw-diff -G"^\+[D2]" -s -1 >log && + test_cmp file.second log +' + +test_expect_success 'log -G --pickaxe-raw-diff searching created / deleted files' ' + git log --oneline --pickaxe-raw-diff -G"^\+A" -s >log && + test_cmp file.first log && + git log --oneline --pickaxe-raw-diff -G"^\-A" -s >log && + test_cmp file.fourth log +' + test_done
Add the ability for the -G<regex> pickaxe to search only through added or removed lines in the diff, or even through an arbitrary amount of context lines when combined with -U<n>. This has been requested[1][2] a few times in the past, and isn't currently possible. Instead users need to do -G<regex> and then write their own post-parsing script to see if the <regex> matched added or removed lines, or both. There was no way to match the adjacent context lines other than running and grepping the equivalent of a "log -p -U<n>". 1. https://public-inbox.org/git/xmqqwoqrr8y2.fsf@gitster-ct.c.googlers.com/ 2. https://public-inbox.org/git/20190424102609.GA19697@vmlxhi-102.adit-jv.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- Documentation/diff-options.txt | 17 +++++++++++++ diff.c | 3 +++ diff.h | 2 ++ diffcore-pickaxe.c | 34 ++++++++++++++++++++++--- t/t4013-diff-various.sh | 1 + t/t4209-log-pickaxe.sh | 45 ++++++++++++++++++++++++++++++++++ 6 files changed, 98 insertions(+), 4 deletions(-)