Message ID | 20190107213013.231514-1-brho@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | blame: add the ability to ignore commits | expand |
Barret Rhoden <brho@google.com> writes: > Commits that make formatting changes or renames are often not > interesting when blaming a file. This commit, similar to > git-hyper-blame, allows one to specify certain revisions to ignore during > the blame process. > > To ignore a revision, put its committish in a file specified by > --ignore-file=<file> or use -i <rev>, which can be repeated. If I read it correctly, this gives a very limited form of -S, in the sense that anything this can do can be expressed by using -S but the reverse is not true, but is designed to be easier to use, in the sense that unlike -S, this does not have to describe the part of the history you do not have to lie about. The documentation should say something about these pros-and-cons to help readers decide which feature to use. I somehow feel that this is rare enough that it should not squat on short-and-sweet '-i'. We would want to reserve it to something like "--ignore-case", for example. > The file .git-blame-ignore-revs is checked by default. Giving the projects a way to easily help participants to share the same distorted view of the history is a good idea, but I do not think we should allow projects to do so to those who merely clone it without their consent. IOW, "by default" is a terrible idea. Perhaps paying attention to blame.skipRevsFile configuration variable that is set by the end user would be sufficient----the project can ship .blame-skip-revs (or any random filename of their choice) in-tree as tracked contents and tell its users that they can optionally use it. > It's useful to be alerted to the presence of an ignored commit in the > history of a line. Those lines will be marked with '*' in the > non-porcelain output. The '*' is attached to the line number to keep > from breaking tools that rely on the whitespace between columns. A policy decision like the above two shouldn't be hardcoded in the feature like this, but should be done as a separate option. By default, these shouldn't be marked with '*', as the same tools you said you are afraid of breaking would be expecting a word with only digits and no asterisk in the column where line numbers appear and will get broken by this change if done unconditionally. In general, I find this patch trying to change too many things at the same time, without sufficient justification. Perhaps do these different things as separate steps in a single series? > A blame_entry attributed to an ignored commit will get passed to its > parent. Obviously, an interesting consideration is what happens when a merge commit is skipped. Is it sufficient to change this description to "...will get passed to its parentS", or would the code do completely nonsensical things without further tweaks (a possible simple tweak could be to refuse skipping merges)? > If an ignored commit changed a line, an ancestor that changed > the line will get blamed. However, if an ignored commit added lines, a > commit changing a nearby line may get blamed. If no commit is found, > the original commit for the file will get blamed. The above somehow does not read as describing a carefully designed behaviour; rather, it sounds as if it is saying "the code does whatever random things it happens to do". For example, when there is a newly added line how is "A" commit changing a nearby line chosen (a line has lines before it and after it---they may be touched by different commits, and before and after that skipped commit, so there are multiple commits to choose from)? > diff --git a/Documentation/git-blame.txt b/Documentation/git-blame.txt > index 16323eb80e31..e41375374892 100644 > --- a/Documentation/git-blame.txt > +++ b/Documentation/git-blame.txt > @@ -10,6 +10,7 @@ SYNOPSIS > [verse] > 'git blame' [-c] [-b] [-l] [--root] [-t] [-f] [-n] [-s] [-e] [-p] [-w] [--incremental] > [-L <range>] [-S <revs-file>] [-M] [-C] [-C] [-C] [--since=<date>] > + [-i <rev>] [--no-default-ignores] [--ignore-file=<file>] > [--progress] [--abbrev=<n>] [<rev> | --contents <file> | --reverse <rev>..<rev>] > [--] <file> > > @@ -84,6 +85,20 @@ include::blame-options.txt[] > Ignore whitespace when comparing the parent's version and > the child's to find where the lines came from. > > +-i <rev>:: > + Ignore revision when assigning blame. Lines that were changed by an > + ignored commit will be marked with a `*` in the blame output. Lines > + that were added by an ignored commit may be attributed commits making > + nearby changes or to the first commit touching the file. It probably deserves to be told that this option can be given multiple times and used cumulatively (unlike usual "last one wins" rule). > +--no-default-ignores:: > + Do not automatically ignore revisions in the file > + `.git-blame-ignore-revs`. This should not be "opt-out" like this. > +--ignore-file=<file>:: > + Ignore revisions listed in `file`, one revision per line. Whitespace > + and comments beginning with `#` are ignored. Should it be capable to take two or more ignore-files? Or should we use the usual "the last one wins" rule? I think we should support blame.skipRevFile configuration variable so that the users do not have to constantly give the option from the command line. And with that, there is no need to have a hardcoded filename .git-blame-ignore-revs or anything like that. If we are to use configuration variable, however, we'd need a way to override its use from the command line, too. Perhaps a sane arrangement would be - if one or more --skip-revs-file=<file> are given from the command line, use all of them and ignore blame.skipRevsFile configuration variable. - if no --skip-revs-file=<file> is given from the command line, use blame.skipRevsFile configuration variable. - regardless of the above two, pay attention to --skip-rev=<rev> command line option(s). Somehow the damage to blame.c codebase looks way too noisy for what the code wants to do. If all we want is to pretend in a history, e.g. ---O---A---B---C---D that commit B does not exist, i.e. use a distorted view of the history ---O---A-------C---D wouldn't it be sufficient to modify pass_blame(), i.e. the only the caller of the pass_blame_to_parent(), where we find the parent commits of "C" and dig the history to pass blame to "B", and have it pretend as if "B" does not exist and pass blame directly to "A"? Thanks. I am personally not all that interested in this yet.
On Mon, Jan 07 2019, Barret Rhoden wrote: > +static int handle_ignore_file(const char *path, struct string_list *ignores) > +{ > + FILE *fp = fopen(path, "r"); > + struct strbuf sb = STRBUF_INIT; > + > + if (!fp) > + return -1; > + while (!strbuf_getline(&sb, fp)) { > + const char *hash; > + > + hash = strchr(sb.buf, '#'); > + if (hash) > + strbuf_setlen(&sb, hash - sb.buf); > + strbuf_trim(&sb); > + if (!sb.len) > + continue; > + string_list_append(ignores, sb.buf); > + } > + fclose(fp); > + strbuf_release(&sb); > + return 0; > +} Aside from other comments on this patch that Junio had either you mostly copy-pasted this from init_skiplist() or you've come up with almost the same code on your own. In any case, if we're going to integrate something like this patch let's split this "parse file with SHA-1s or comments/whitespace" into a utility function that both this and init_skiplist() can call. Then we could split up the description for the fsck.skipList config variable to reference that format, and say that both it and this new thing should consult those docs for how it's parsed.
On 2019-01-07 at 15:13 Junio C Hamano <gitster@pobox.com> wrote: > If I read it correctly, this gives a very limited form of -S, in the > sense that anything this can do can be expressed by using -S but the > reverse is not true, but is designed to be easier to use, in the > sense that unlike -S, this does not have to describe the part of the > history you do not have to lie about. The documentation should say > something about these pros-and-cons to help readers decide which > feature to use. Yeah, -S lists the revs to use, this lists the revs to *not* use. I'll add a note. > I somehow feel that this is rare enough that it should not squat on > short-and-sweet '-i'. We would want to reserve it to something like > "--ignore-case", for example. Can do. I'll change the interface to your suggestion from down below. > > The file .git-blame-ignore-revs is checked by default. > > Giving the projects a way to easily help participants to share the > same distorted view of the history is a good idea, but I do not > think we should allow projects to do so to those who merely clone it > without their consent. IOW, "by default" is a terrible idea. > > Perhaps paying attention to blame.skipRevsFile configuration > variable that is set by the end user would be sufficient----the > project can ship .blame-skip-revs (or any random filename of their > choice) in-tree as tracked contents and tell its users that they can > optionally use it. A blame config option works for me. I'd like the users/cloners of a repo to not need to do anything extravagant, but a one-time config would be fine. > > It's useful to be alerted to the presence of an ignored commit in the > > history of a line. Those lines will be marked with '*' in the > > non-porcelain output. The '*' is attached to the line number to keep > > from breaking tools that rely on the whitespace between columns. > > A policy decision like the above two shouldn't be hardcoded in the > feature like this, but should be done as a separate option. By > default, these shouldn't be marked with '*', as the same tools you > said you are afraid of breaking would be expecting a word with only > digits and no asterisk in the column where line numbers appear and > will get broken by this change if done unconditionally. Since users are already opting-in to the blame-ignore, do you also want them to opt-in to the annotation? I can make a separate config option to turn on the annotation. Any preference for how it is marked? > In general, I find this patch trying to change too many things at > the same time, without sufficient justification. Perhaps do these > different things as separate steps in a single series? > > > A blame_entry attributed to an ignored commit will get passed to its > > parent. > > Obviously, an interesting consideration is what happens when a merge > commit is skipped. Is it sufficient to change this description to > "...will get passed to its parentS", or would the code do completely > nonsensical things without further tweaks (a possible simple tweak > could be to refuse skipping merges)? If we skip a merge commit, it might pick the wrong parent. For example, this line was brought in via a merge: $ ~/src/git/git-blame include/linux/mm.h | grep VM_SYNC b6fb293f2497a (Jan Kara 2017-11-01 16:36:41 +0100 204) #define VM_SYNC It's from merge: a3841f94c7ec ("Merge tag 'libnvdimm-for-4.15', and if we ignore it: $ ~/src/git/git-blame -i a3841f94c7ecb include/linux/mm.h | grep VM_SYNC cc2383ec06be0 (Konstantin Khlebnikov 2012-10-08 16:28:37 -0700 204*) #define VM_SYNC The wrong commit is blamed. I can put a note in the doc about it and die if we're given a merge commit. Is there a convenient helper for detecting a merge, or can I just check for multiple parents? (something like commit->parents && commit->parents->next) > > If an ignored commit changed a line, an ancestor that changed > > the line will get blamed. However, if an ignored commit added lines, a > > commit changing a nearby line may get blamed. If no commit is found, > > the original commit for the file will get blamed. > > The above somehow does not read as describing a carefully designed > behaviour; rather, it sounds as if it is saying "the code does > whatever random things it happens to do". For example, when there > is a newly added line how is "A" commit changing a nearby line > chosen (a line has lines before it and after it---they may be > touched by different commits, and before and after that skipped > commit, so there are multiple commits to choose from)? This was more of a commentary about its behavior. If you ignore a commit that added lines, it'd be nice to get a hint of what might have caused it, and picking a commit that affected an adjacent line seemed fine. But yeah, it's not doing anything crazy. > > diff --git a/Documentation/git-blame.txt b/Documentation/git-blame.txt > > index 16323eb80e31..e41375374892 100644 > > --- a/Documentation/git-blame.txt > > +++ b/Documentation/git-blame.txt > > @@ -10,6 +10,7 @@ SYNOPSIS > > [verse] > > 'git blame' [-c] [-b] [-l] [--root] [-t] [-f] [-n] [-s] [-e] [-p] [-w] [--incremental] > > [-L <range>] [-S <revs-file>] [-M] [-C] [-C] [-C] [--since=<date>] > > + [-i <rev>] [--no-default-ignores] [--ignore-file=<file>] > > [--progress] [--abbrev=<n>] [<rev> | --contents <file> | --reverse <rev>..<rev>] > > [--] <file> > > > > @@ -84,6 +85,20 @@ include::blame-options.txt[] > > Ignore whitespace when comparing the parent's version and > > the child's to find where the lines came from. > > > > +-i <rev>:: > > + Ignore revision when assigning blame. Lines that were changed by an > > + ignored commit will be marked with a `*` in the blame output. Lines > > + that were added by an ignored commit may be attributed commits making > > + nearby changes or to the first commit touching the file. > > It probably deserves to be told that this option can be given > multiple times and used cumulatively (unlike usual "last one wins" > rule). > > > +--no-default-ignores:: > > + Do not automatically ignore revisions in the file > > + `.git-blame-ignore-revs`. > > This should not be "opt-out" like this. > > > +--ignore-file=<file>:: > > + Ignore revisions listed in `file`, one revision per line. Whitespace > > + and comments beginning with `#` are ignored. > > Should it be capable to take two or more ignore-files? Or should we > use the usual "the last one wins" rule? > > I think we should support blame.skipRevFile configuration variable > so that the users do not have to constantly give the option from the > command line. And with that, there is no need to have a hardcoded > filename .git-blame-ignore-revs or anything like that. > > If we are to use configuration variable, however, we'd need a way to > override its use from the command line, too. Perhaps a sane > arrangement would be > > - if one or more --skip-revs-file=<file> are given from the > command line, use all of them and ignore blame.skipRevsFile > configuration variable. > > - if no --skip-revs-file=<file> is given from the command line, > use blame.skipRevsFile configuration variable. > > - regardless of the above two, pay attention to --skip-rev=<rev> > command line option(s). Sounds fine to me. > Somehow the damage to blame.c codebase looks way too noisy for what > the code wants to do. If all we want is to pretend in a history, > e.g. > > ---O---A---B---C---D > > that commit B does not exist, i.e. use a distorted view of the > history > > ---O---A-------C---D > > wouldn't it be sufficient to modify pass_blame(), i.e. the only the > caller of the pass_blame_to_parent(), where we find the parent > commits of "C" and dig the history to pass blame to "B", and have it > pretend as if "B" does not exist and pass blame directly to "A"? I originally tried to skip 'B' in pass_blame() when B popped up as a scapegoat. That broke the offsets of the blame_entries in the parent. By running diff_hunks(), we get the opportunity to adjust the offsets. Also, when it comes to marking the blame_entries for marking the output, we want to know the specific diffs and to break them up at the boundaries of [tlno,same) in blame_chunk(). > Thanks. I am personally not all that interested in this yet. Thanks for taking a look. Barret
On 2019-01-08 at 14:12 Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > On Mon, Jan 07 2019, Barret Rhoden wrote: > > > +static int handle_ignore_file(const char *path, struct string_list *ignores) > > +{ > > + FILE *fp = fopen(path, "r"); > > + struct strbuf sb = STRBUF_INIT; > > + > > + if (!fp) > > + return -1; > > + while (!strbuf_getline(&sb, fp)) { > > + const char *hash; > > + > > + hash = strchr(sb.buf, '#'); > > + if (hash) > > + strbuf_setlen(&sb, hash - sb.buf); > > + strbuf_trim(&sb); > > + if (!sb.len) > > + continue; > > + string_list_append(ignores, sb.buf); > > + } > > + fclose(fp); > > + strbuf_release(&sb); > > + return 0; > > +} > > Aside from other comments on this patch that Junio had either you mostly > copy-pasted this from init_skiplist() or you've come up with almost the > same code on your own. > > In any case, if we're going to integrate something like this patch let's > split this "parse file with SHA-1s or comments/whitespace" into a > utility function that both this and init_skiplist() can call. One minor difference is that fsck wants an unabbreviated SHA-1, using parse_oid_hex() instead of get_oid_committish(). Would you be OK with also changing fsck to take a committish instead of a full SHA-1? Is there a good place for the common helper? Since it's an oidset, I could put it in oidset.c. oidset_parse_file() or something. > Then we could split up the description for the fsck.skipList config > variable to reference that format, and say that both it and this new > thing should consult those docs for how it's parsed. Is there a good spot for the generic skipList documentation? The only common text would be: ... comments ('#'), empty lines, and any leading and trailing whitespace is ignored Thanks, Barret
On 2019-01-08 at 11:41 Barret Rhoden <brho@google.com> wrote: > Would you be OK with > also changing fsck to take a committish instead of a full SHA-1? Actually, in retrospect, I can keep the unabbreviated SHA-1 for the file inputs and use get_oid_committish() for the one-off --skip-rev= cases. Thanks, Barret
Barret Rhoden <brho@google.com> writes: >> A policy decision like the above two shouldn't be hardcoded in the >> feature like this, but should be done as a separate option. By >> default, these shouldn't be marked with '*', as the same tools you >> said you are afraid of breaking would be expecting a word with only >> digits and no asterisk in the column where line numbers appear and >> will get broken by this change if done unconditionally. > > Since users are already opting-in to the blame-ignore, do you also want > them to opt-in to the annotation? Absolutely. After all, the users of a moral equivalent that is -S never needed such an extra annotation, which tells us two things. (1) the claim "It's useful to be alerted to the presence of an ignored commit" in the proposed log message is merely a personal preference and universal users' requirement; (2) if it is useful to mark a blame-source whose parents (used while blaming) do not match the actual parents, such an annotation would also be useful while running -S. So probably it should be a separate option that can be given when any of the --skip-commit=<rev>, --skip-commits-file=<file>, r -S<file> option is given. >> Obviously, an interesting consideration is what happens when a merge >> commit is skipped. Is it sufficient to change this description to >> "...will get passed to its parentS", or would the code do completely >> nonsensical things without further tweaks (a possible simple tweak >> could be to refuse skipping merges)? > > If we skip a merge commit, it might pick the wrong parent. Actually after thinking about it a bit more, I no longer think a merge is so special. In this topology (as usual, time flows from left to right), if you are skipping M, ---A---M---C---D / ----B you'd simply pretend that the ancestry is like this and you'd be fine, no? ---A-------C---D / --------B That is, when inspecting C, pass_blame() would enumerate its true parents, notice that M to be skipped, which would cause it to pretend as if C has M's parents instead of M itself as its parents. If M in the true history is the first parent of C, then M's first parent in the true history becomes C's first parent, M's second parent in the true history becomes C's second parent, etc. (if C were a merge in the true history, such a rewriting would make C an octopus in the distorted history) > The wrong commit is blamed. So... I still suspect that it is merely a bug in your implementation and there is nothing inherent that forces us to avoid skipping a merge. >> Somehow the damage to blame.c codebase looks way too noisy for what >> the code wants to do. If all we want is to pretend in a history, >> e.g. >> >> ---O---A---B---C---D >> >> that commit B does not exist, i.e. use a distorted view of the >> history >> >> ---O---A-------C---D >> >> wouldn't it be sufficient to modify pass_blame(), i.e. the only the >> caller of the pass_blame_to_parent(), where we find the parent >> commits of "C" and dig the history to pass blame to "B", and have it >> pretend as if "B" does not exist and pass blame directly to "A"? > > I originally tried to skip 'B' in pass_blame() when B popped up as a > scapegoat. That broke the offsets of the blame_entries in the > parent. Why? When inspecting C in order to exonerate it from as many lines as possible, we'd run "git diff $P C" for each parent of C, but in the distorted history (i.e. instead of using $P==B, we use $P==A). As long as the code reads from "git diff A C", I do not see why "the offsets in the parent" can be broken. Care to elaborate a bit more?
Hi - On 2019-01-08 at 10:26 Junio C Hamano <gitster@pobox.com> wrote: > >> Obviously, an interesting consideration is what happens when a merge > >> commit is skipped. Is it sufficient to change this description to > >> "...will get passed to its parentS", or would the code do completely > >> nonsensical things without further tweaks (a possible simple tweak > >> could be to refuse skipping merges)? > > > > If we skip a merge commit, it might pick the wrong parent. > > Actually after thinking about it a bit more, I no longer think a > merge is so special. In this topology (as usual, time flows from > left to right), if you are skipping M, > > ---A---M---C---D > / > ----B > > you'd simply pretend that the ancestry is like this and you'd be > fine, no? > > > ---A-------C---D > / > --------B > > That is, when inspecting C, pass_blame() would enumerate its true > parents, notice that M to be skipped, which would cause it to > pretend as if C has M's parents instead of M itself as its parents. > If M in the true history is the first parent of C, then M's first > parent in the true history becomes C's first parent, M's second > parent in the true history becomes C's second parent, etc. (if C > were a merge in the true history, such a rewriting would make C an > octopus in the distorted history) > > > The wrong commit is blamed. > > So... I still suspect that it is merely a bug in your implementation > and there is nothing inherent that forces us to avoid skipping a > merge. Probably! =) I tried a bunch of things initially, and might have had things broken for other reasons. One reason was not calling parse_commit() at the right time. I took another look and had first_scapegoat() scan the parent list, and for any parent in the list, replace it with its own parent list. e.g. when removing 'A': ---X---A---B---C / Y (B's parent was A, A is the skipped commit, and A has parents X and Y) becomes: ---X---B---C / Y But just yanking skipped commits from the scapegoat list doesn't help, since B gets blamed for the changes made by A, since the file B and {X,Y} differ, specifically due to the changes A made. You'd think that would have worked, since it sounds similar to what -S revs-file does, but it turns out I want different behavior than -S for the ignore/skip. With -S, you blame a nearby commit (at least based on my testing - what is the desired behavior there?). With ignore/skip, I want to blame the last commit that modified the line. For instance, commit X does this: -foo(x,y); +foo(x,y,z); Then commit Y comes along to reformat it: -foo(x,y,z); +foo(x, y, z); And the history / rev-list for the file looks like: ---O---A---X---B---C---D---Y---E---F I want to ignore/skip Y and see X in the blame output. Sorry that I wasn't clear about that from the beginning; I can see how one could have different expectations for what happens when a commit is skipped. > >> Somehow the damage to blame.c codebase looks way too noisy for what > >> the code wants to do. If all we want is to pretend in a history, > >> e.g. > >> > >> ---O---A---B---C---D > >> > >> that commit B does not exist, i.e. use a distorted view of the > >> history > >> > >> ---O---A-------C---D > >> > >> wouldn't it be sufficient to modify pass_blame(), i.e. the only the > >> caller of the pass_blame_to_parent(), where we find the parent > >> commits of "C" and dig the history to pass blame to "B", and have it > >> pretend as if "B" does not exist and pass blame directly to "A"? > > > > I originally tried to skip 'B' in pass_blame() when B popped up as a > > scapegoat. That broke the offsets of the blame_entries in the > > parent. > > Why? When inspecting C in order to exonerate it from as many lines > as possible, we'd run "git diff $P C" for each parent of C, but in > the distorted history (i.e. instead of using $P==B, we use $P==A). > As long as the code reads from "git diff A C", I do not see why "the > offsets in the parent" can be broken. Care to elaborate a bit more? When we run "git diff $P C" with $P == A, the diffs from B (the one we want to remove) will pop up and be attributed to C. This is what happened when I tried the "yank B from the history" approach from above. If we ever diff over a skipped commit like that, we'll blame the wrong one. (C, here). The first thing I tried was to simply move the blame_entry list from C to B to A without calling diff_hunks(). That's where the "file offsets were broken," since e->s_lno should be adjusted when passing an entry from the target blame_origin to its parent blame_origin. An example of this is in blame_chunk() when we hand every blame_entry before tlno to the parent in the "e->s_lno < tlno" block. Without diffing commit B, we can't propagate the blame_entries. And if we just diff A and C, we'll get the "wrong" commit. The other reason to run the diffs was to identify and mark the blame_entry that would have been blamed for the skipped commit. That's for the annotations during the output phase. If we don't track and taint the blame entries, you could just call blame_chunk() from blame_chunk_cb() with tlno = same, which in essence tells the blame code that "this diff belongs to the parent." But that loses the blame_entry boundaries, and is almost nearly as much of a change to blame.c. When it comes to skipping merges, my existing code couldn't do it, but I found a way to make it work: ---A---M---C---D / ---X---B Say the commit changing the line/blame_entry was X, and we try to skip M. 'A' made no changes to the line. If B is the first scapegoat, pass_blame_to_parent() will see no diff between B and M for that line, and the blame will get passed to B, and we're done. This is the same both with my change and without. However, if A is the first scapegoat, the line will have a diff, and from the perspective of the ---A---M---C graph, it looks like M introduced the change. (In fact it did - it was via B though). The existing code would keep the blame on M and then check the next scapegoat: B. With my change, it couldn't tell the difference between M introducing a change that we wanted to skip and M having another parent that introduced the change. Both look like a diff from A..M. I changed my version to handle that case. In pass_blame(), if it fails to find a scapegoat in the loop that calls pass_blame_to_parent(), it'll make another loop through the scapegoats, where it will pass the blame to the first scapegoat that has a diff. This way, we get a first attempt to pass the blame for real (e.g. to B), and then when we failed, we skip the commit. In other words, we only skip the commit when we need to. With that change, from my earlier example: This was wrong: $ git-blame --skip-rev a3841f94c7ecb include/linux/mm.h | grep VM_SYNC cc2383ec06be0 (Konstantin Khlebnikov 2012-10-08 16:28:37 -0700 204*) #define VM_SYNC This is right: $ git-blame --skip-rev a3841f94c7ecb include/linux/mm.h | grep VM_SYNC b6fb293f2497a (Jan Kara 2017-11-01 16:36:41 +0100 204) #define VM_SYNC Also note that in the 'right' case it didn't annotate it with the *. That's because we didn't need to ignore the commit. b6fb29 was the correct one, and we didn't need to ignore M. Anyway, thanks for reading and for the feedback. I'll clean things up and explain it better in another patchset. Thanks, Barret
On Tue, Jan 8, 2019 at 10:26 AM Junio C Hamano <gitster@pobox.com> wrote: > > Barret Rhoden <brho@google.com> writes: > > >> A policy decision like the above two shouldn't be hardcoded in the > >> feature like this, but should be done as a separate option. By > >> default, these shouldn't be marked with '*', as the same tools you > >> said you are afraid of breaking would be expecting a word with only > >> digits and no asterisk in the column where line numbers appear and > >> will get broken by this change if done unconditionally. > > > > Since users are already opting-in to the blame-ignore, do you also want > > them to opt-in to the annotation? > > Absolutely. > > After all, the users of a moral equivalent that is -S > never needed such an extra annotation, which tells us two things. > (1) the claim "It's useful to be alerted to the presence of an > ignored commit" in the proposed log message is merely a personal > preference and universal users' requirement; (2) if it is useful to > mark a blame-source whose parents (used while blaming) do not match > the actual parents, such an annotation would also be useful while > running -S. So probably it should be a separate option that can be > given when any of the --skip-commit=<rev>, --skip-commits-file=<file>, > r -S<file> option is given. From a users point of view it may be desirable to express all this in the grafts file, i.e. -S <file> where the syntax of that file is extended. For example we could introduce !<hash> to make it exclude that commit. Of course this could lead to confusion, as this puts 2 very different concepts into the same option/file. Speaking of the implementation: This patch proposes an oid-set that is handled specially in blame itself. I wonder if this could be generalized. Jonathan Tan (cc'd) refactored and extended revision walking for git-fetch and its negotiation leading to 7c85ee6c58 (Merge branch 'jt/fetch-negotiator-skipping', 2018-08-02), and 3390e42adb (fetch-pack: support negotiation tip whitelist, 2018-07-02) which implements another revision walking algorithm that can be used to fine-tune revisions walked when fetching. I wonder if that work could be generalized more to have "generic" revision walking algorithms and then making use of them in either fetch or blame. For git-fetch there is a new algorithm that increases step size between commits, which would be funny to try for blame here. It would give the wrong blamed commit, but would speed up blaming a lot. Omitting some revisions seems to be applicable to more than just blame/fetch, too. Stefan
Barret Rhoden <brho@google.com> writes: > For instance, commit X does this: > > -foo(x,y); > +foo(x,y,z); > > Then commit Y comes along to reformat it: > > -foo(x,y,z); > +foo(x, y, z); > > And the history / rev-list for the file looks like: > > ---O---A---X---B---C---D---Y---E---F > > I want to ignore/skip Y and see X in the blame output. If you skip Y, the altered history would have "foo(x, y, z)" in E, "foo(x,y,z)" in X, and "foo(x,y)" in A. If you start blaming from F, you'd get E as the commit that explains the latest state. If you do not skip Y, you'd get Y. I am not sure how you'd get X in either case.
On 2019-01-10 at 14:29 Junio C Hamano <gitster@pobox.com> wrote: > > For instance, commit X does this: > > > > -foo(x,y); > > +foo(x,y,z); > > > > Then commit Y comes along to reformat it: > > > > -foo(x,y,z); > > +foo(x, y, z); > > > > And the history / rev-list for the file looks like: > > > > ---O---A---X---B---C---D---Y---E---F > > > > I want to ignore/skip Y and see X in the blame output. > > If you skip Y, the altered history would have "foo(x, y, z)" in E, > "foo(x,y,z)" in X, and "foo(x,y)" in A. If you start blaming from > F, you'd get E as the commit that explains the latest state. If you > do not skip Y, you'd get Y. I am not sure how you'd get X in either > case. The way to do it is to let the blames get passed to Y - don't yank it from the graph. Then when trying to pass the blames from Y to its parent, when we get a diff chunk that Y is responsible for, instead of keeping it on Y's suspect list, we hand the blame_entry to D. That blame_entry will get passed all the way back to X, which also has a diff that touches that line. Basically we do the same blame processing as usual, but just don't let any blames stick to Y. Barret
Barret Rhoden <brho@google.com> writes: > On 2019-01-10 at 14:29 Junio C Hamano <gitster@pobox.com> wrote: >> > For instance, commit X does this: >> > >> > -foo(x,y); >> > +foo(x,y,z); >> > >> > Then commit Y comes along to reformat it: >> > >> > -foo(x,y,z); >> > +foo(x, y, z); >> > >> > And the history / rev-list for the file looks like: >> > >> > ---O---A---X---B---C---D---Y---E---F >> > >> > I want to ignore/skip Y and see X in the blame output. >> >> If you skip Y, the altered history would have "foo(x, y, z)" in E, >> "foo(x,y,z)" in X, and "foo(x,y)" in A. If you start blaming from >> F, you'd get E as the commit that explains the latest state. If you >> do not skip Y, you'd get Y. I am not sure how you'd get X in either >> case. > > The way to do it is ... Sorry, I made a too-fuzzy statement. What I meant was, that unless you are ignoring E, I do not know why you "would want to" attribute a line "foo(x, y, z)" that appears in F to X. Starting from X up to D (and to Y in real history, but you are ignoring Y), the line was "foo(x,y,z)", after E, it is "foo(x, y, z)". I didn't mean to ask how you "would show" such a result---as I do not yet understand why you would want such a result to begin with.
On January 14, 2019 12:46, Junio C Hamano wrote: > Barret Rhoden <brho@google.com> writes: > > > On 2019-01-10 at 14:29 Junio C Hamano <gitster@pobox.com> wrote: > >> > For instance, commit X does this: > >> > > >> > -foo(x,y); > >> > +foo(x,y,z); > >> > > >> > Then commit Y comes along to reformat it: > >> > > >> > -foo(x,y,z); > >> > +foo(x, y, z); > >> > > >> > And the history / rev-list for the file looks like: > >> > > >> > ---O---A---X---B---C---D---Y---E---F > >> > > >> > I want to ignore/skip Y and see X in the blame output. > >> > >> If you skip Y, the altered history would have "foo(x, y, z)" in E, > >> "foo(x,y,z)" in X, and "foo(x,y)" in A. If you start blaming from F, > >> you'd get E as the commit that explains the latest state. If you do > >> not skip Y, you'd get Y. I am not sure how you'd get X in either > >> case. > > > > The way to do it is ... > > Sorry, I made a too-fuzzy statement. What I meant was, that unless you are > ignoring E, I do not know why you "would want to" attribute a line "foo(x, y, > z)" that appears in F to X. Starting from X up to D (and to Y in real history, but > you are ignoring Y), the line was "foo(x,y,z)", after E, it is "foo(x, y, z)". I > didn't mean to ask how you "would show" such a result---as I do not yet > understand why you would want such a result to begin with. From my own community, this came up also. The intent was to show everyone who touched a particular line, throughout history, not just the current one. Perhaps that is what Barret is going for. Regards, Randall
On 2019-01-14 at 13:26 "Randall S. Becker" <rsbecker@nexbridge.com> wrote: > > > > Sorry, I made a too-fuzzy statement. What I meant was, that unless you > are > > ignoring E, I do not know why you "would want to" attribute a line "foo(x, > y, > > z)" that appears in F to X. Starting from X up to D (and to Y in real > history, but > > you are ignoring Y), the line was "foo(x,y,z)", after E, it is "foo(x, y, > z)". I > > didn't mean to ask how you "would show" such a result---as I do not yet > > understand why you would want such a result to begin with. > > From my own community, this came up also. The intent was to show everyone > who touched a particular line, throughout history, not just the current one. > Perhaps that is what Barret is going for. Yeah. I want to find the most recent commit that changed a line, minus a set of commits that are deemed 'not interesting' by the user. The primary reason for this is to not see a blame attributed to a commit that does nothing but reformatting the code base. One could also use this feature for one-off investigation with the command-line switch. Imagine a manually driven git-blame + git-log, where you do a git-blame, check the commit, decide you want to see the next one back, and rerun git-blame with --skip-rev=SHA1. The Chromium depot_tools has a python tool that can do this, called hyper-blame: (https://commondatastorage.googleapis.com/chrome-infra-docs/flat/depot_tools/docs/html/git-hyper-blame.html). I thought it was a useful feature for all of git's users. And I also didn't want people to have to install depot_tools. Barret
"Randall S. Becker" <rsbecker@nexbridge.com> writes: > On January 14, 2019 12:46, Junio C Hamano wrote: >> Barret Rhoden <brho@google.com> writes: >> >> > On 2019-01-10 at 14:29 Junio C Hamano <gitster@pobox.com> wrote: >> >> > For instance, commit X does this: >> >> > >> >> > -foo(x,y); >> >> > +foo(x,y,z); >> >> > >> >> > Then commit Y comes along to reformat it: >> >> > >> >> > -foo(x,y,z); >> >> > +foo(x, y, z); >> >> > >> >> > And the history / rev-list for the file looks like: >> >> > >> >> > ---O---A---X---B---C---D---Y---E---F >> >> > >> >> > I want to ignore/skip Y and see X in the blame output. >> >> >> >> If you skip Y, the altered history would have "foo(x, y, z)" in E, >> >> "foo(x,y,z)" in X, and "foo(x,y)" in A. If you start blaming from F, >> >> you'd get E as the commit that explains the latest state. If you do >> >> not skip Y, you'd get Y. I am not sure how you'd get X in either >> >> case. >> > >> ... > > From my own community, this came up also. The intent was to show everyone > who touched a particular line, throughout history, not just the current one. > Perhaps that is what Barret is going for. I think I now understand what is going on. In short, what Barret wanted is way more ambitious than "blame in a hypothetical history that lacks certain commits". There is no similiarity with existing "-S <revs>" option that I asked about in my initial review. This is quite a different beast (and I kind-a like it ;-). Instead of "skipping" Y, Barret wants to pretend that the effect of commit Y, in addition to the commit Y itself, never existed in the history. Even when blaming back from F (where it has "foo(x, y, z)", which is the same as E and down to Y), the algorithm wants to pretend as if E and F had that line in "foo(x,y,z)" form, i.e. before Y touched it, and also pretend that the user started blaming a line in that shape. As there is no general solution [*1*] to "how did this line looked like before this commit", such a blame may not have a meaningful answer. When Y added a block of text without removing anything (i.e. "git show Y" has no line that begins with '-'), and when the user asked to blame that block of text in F, there is no sensible answer to the question: "what got replaced by Y in this text?", so a blame that "ignores" Y would not be able to answer "where did these lines come from?" question. But in some cases (e.g. when "git show Y" has only one '-' line, immediately followed by '+' line, both with quite a similar contents and no other change in the patch---it is highly plausible that in such a change, the former became the latter, like the example in Barret's message I was responding to), we can say "this was how the line that was modified in Y looked like before Y" and blaming while ignoring Y can have a reasonable answer. [Footnote] *1* Here, I use the word "solution" in the same sense as "solution to a math equation", not in the sense to describe the steps to derive that "solution".
diff --git a/Documentation/git-blame.txt b/Documentation/git-blame.txt index 16323eb80e31..e41375374892 100644 --- a/Documentation/git-blame.txt +++ b/Documentation/git-blame.txt @@ -10,6 +10,7 @@ SYNOPSIS [verse] 'git blame' [-c] [-b] [-l] [--root] [-t] [-f] [-n] [-s] [-e] [-p] [-w] [--incremental] [-L <range>] [-S <revs-file>] [-M] [-C] [-C] [-C] [--since=<date>] + [-i <rev>] [--no-default-ignores] [--ignore-file=<file>] [--progress] [--abbrev=<n>] [<rev> | --contents <file> | --reverse <rev>..<rev>] [--] <file> @@ -84,6 +85,20 @@ include::blame-options.txt[] Ignore whitespace when comparing the parent's version and the child's to find where the lines came from. +-i <rev>:: + Ignore revision when assigning blame. Lines that were changed by an + ignored commit will be marked with a `*` in the blame output. Lines + that were added by an ignored commit may be attributed commits making + nearby changes or to the first commit touching the file. + +--no-default-ignores:: + Do not automatically ignore revisions in the file + `.git-blame-ignore-revs`. + +--ignore-file=<file>:: + Ignore revisions listed in `file`, one revision per line. Whitespace + and comments beginning with `#` are ignored. + --abbrev=<n>:: Instead of using the default 7+1 hexadecimal digits as the abbreviated object name, use <n>+1 digits. Note that 1 column diff --git a/blame.c b/blame.c index d84c93778080..9e338cfa83e3 100644 --- a/blame.c +++ b/blame.c @@ -474,7 +474,8 @@ void blame_coalesce(struct blame_scoreboard *sb) for (ent = sb->ent; ent && (next = ent->next); ent = next) { if (ent->suspect == next->suspect && - ent->s_lno + ent->num_lines == next->s_lno) { + ent->s_lno + ent->num_lines == next->s_lno && + ent->ignored == next->ignored) { ent->num_lines += next->num_lines; ent->next = next->next; blame_origin_decref(next->suspect); @@ -726,6 +727,8 @@ static void split_overlap(struct blame_entry *split, int chunk_end_lno; memset(split, 0, sizeof(struct blame_entry [3])); + split[0].ignored = split[1].ignored = split[2].ignored = e->ignored; + if (e->s_lno < tlno) { /* there is a pre-chunk part not blamed on parent */ split[0].suspect = blame_origin_incref(e->suspect); @@ -845,10 +848,10 @@ static struct blame_entry *reverse_blame(struct blame_entry *head, */ static void blame_chunk(struct blame_entry ***dstq, struct blame_entry ***srcq, int tlno, int offset, int same, - struct blame_origin *parent) + struct blame_origin *parent, int ignore_diffs) { struct blame_entry *e = **srcq; - struct blame_entry *samep = NULL, *diffp = NULL; + struct blame_entry *samep = NULL, *diffp = NULL, *ignoredp = NULL; while (e && e->s_lno < tlno) { struct blame_entry *next = e->next; @@ -862,6 +865,7 @@ static void blame_chunk(struct blame_entry ***dstq, struct blame_entry ***srcq, int len = tlno - e->s_lno; struct blame_entry *n = xcalloc(1, sizeof (struct blame_entry)); n->suspect = e->suspect; + n->ignored = e->ignored; n->lno = e->lno + len; n->s_lno = e->s_lno + len; n->num_lines = e->num_lines - len; @@ -916,6 +920,7 @@ static void blame_chunk(struct blame_entry ***dstq, struct blame_entry ***srcq, int len = same - e->s_lno; struct blame_entry *n = xcalloc(1, sizeof (struct blame_entry)); n->suspect = blame_origin_incref(e->suspect); + n->ignored = e->ignored; n->lno = e->lno + len; n->s_lno = e->s_lno + len; n->num_lines = e->num_lines - len; @@ -925,10 +930,24 @@ static void blame_chunk(struct blame_entry ***dstq, struct blame_entry ***srcq, n->next = samep; samep = n; } - e->next = diffp; - diffp = e; + if (ignore_diffs) { + /* These go to the parent, like the ones before tlno. */ + blame_origin_decref(e->suspect); + e->suspect = blame_origin_incref(parent); + e->s_lno += offset; + e->ignored = 1; + e->next = ignoredp; + ignoredp = e; + } else { + e->next = diffp; + diffp = e; + } e = next; } + if (ignoredp) { + **dstq = reverse_blame(ignoredp, **dstq); + *dstq = &ignoredp->next; + } **srcq = reverse_blame(diffp, reverse_blame(samep, e)); /* Move across elements that are in the unblamable portion */ if (diffp) @@ -938,6 +957,7 @@ static void blame_chunk(struct blame_entry ***dstq, struct blame_entry ***srcq, struct blame_chunk_cb_data { struct blame_origin *parent; long offset; + int ignore_diffs; struct blame_entry **dstq; struct blame_entry **srcq; }; @@ -950,7 +970,7 @@ static int blame_chunk_cb(long start_a, long count_a, if (start_a - start_b != d->offset) die("internal error in blame::blame_chunk_cb"); blame_chunk(&d->dstq, &d->srcq, start_b, start_a - start_b, - start_b + count_b, d->parent); + start_b + count_b, d->parent, d->ignore_diffs); d->offset = start_a + count_a - (start_b + count_b); return 0; } @@ -973,18 +993,22 @@ static void pass_blame_to_parent(struct blame_scoreboard *sb, d.parent = parent; d.offset = 0; + d.ignore_diffs = 0; d.dstq = &newdest; d.srcq = &target->suspects; fill_origin_blob(&sb->revs->diffopt, parent, &file_p, &sb->num_read_blob); fill_origin_blob(&sb->revs->diffopt, target, &file_o, &sb->num_read_blob); sb->num_get_patch++; + if (oidset_contains(&sb->ignores, &target->commit->object.oid)) + d.ignore_diffs = 1; + if (diff_hunks(&file_p, &file_o, blame_chunk_cb, &d, sb->xdl_opts)) die("unable to generate diff (%s -> %s)", oid_to_hex(&parent->commit->object.oid), oid_to_hex(&target->commit->object.oid)); /* The rest are the same as the parent */ - blame_chunk(&d.dstq, &d.srcq, INT_MAX, d.offset, INT_MAX, parent); + blame_chunk(&d.dstq, &d.srcq, INT_MAX, d.offset, INT_MAX, parent, 0); *d.dstq = NULL; queue_blames(sb, parent, newdest); diff --git a/blame.h b/blame.h index be3a895043e0..3fe71a59d372 100644 --- a/blame.h +++ b/blame.h @@ -92,6 +92,7 @@ struct blame_entry { * scanning the lines over and over. */ unsigned score; + int ignored; }; /* @@ -117,6 +118,8 @@ struct blame_scoreboard { /* linked list of blames */ struct blame_entry *ent; + struct oidset ignores; + /* look-up a line in the final buffer */ int num_lines; int *lineno; diff --git a/builtin/blame.c b/builtin/blame.c index 6d798f99392e..698834426771 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -516,8 +516,13 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int ci.author_tz.buf, show_raw_time)); } - printf(" %*d) ", - max_digits, ent->lno + 1 + cnt); + if (ent->ignored) { + printf(" %*d%c) ", max_digits - 1, + ent->lno + 1 + cnt, '*'); + } else { + printf(" %*d) ", max_digits, + ent->lno + 1 + cnt); + } } if (reset) fputs(reset, stdout); @@ -603,6 +608,7 @@ static void find_alignment(struct blame_scoreboard *sb, int *option) { int longest_src_lines = 0; int longest_dst_lines = 0; + int has_ignore = 0; unsigned largest_score = 0; struct blame_entry *e; int compute_auto_abbrev = (abbrev < 0); @@ -639,9 +645,11 @@ static void find_alignment(struct blame_scoreboard *sb, int *option) longest_dst_lines = num; if (largest_score < blame_entry_score(sb, e)) largest_score = blame_entry_score(sb, e); + if (e->ignored) + has_ignore = 1; } max_orig_digits = decimal_width(longest_src_lines); - max_digits = decimal_width(longest_dst_lines); + max_digits = decimal_width(longest_dst_lines) + has_ignore; max_score_digits = decimal_width(largest_score); if (compute_auto_abbrev) @@ -774,6 +782,43 @@ static int is_a_rev(const char *name) return OBJ_NONE < oid_object_info(the_repository, &oid, NULL); } +static void handle_ignore_list(struct blame_scoreboard *sb, + struct string_list *ignores) +{ + struct string_list_item *i; + struct object_id oid; + + oidset_init(&sb->ignores, 0); + for_each_string_list_item(i, ignores) { + if (get_oid_committish(i->string, &oid)) + die(_("Can't find revision '%s' to ignore"), i->string); + oidset_insert(&sb->ignores, &oid); + } +} + +static int handle_ignore_file(const char *path, struct string_list *ignores) +{ + FILE *fp = fopen(path, "r"); + struct strbuf sb = STRBUF_INIT; + + if (!fp) + return -1; + while (!strbuf_getline(&sb, fp)) { + const char *hash; + + hash = strchr(sb.buf, '#'); + if (hash) + strbuf_setlen(&sb, hash - sb.buf); + strbuf_trim(&sb); + if (!sb.len) + continue; + string_list_append(ignores, sb.buf); + } + fclose(fp); + strbuf_release(&sb); + return 0; +} + int cmd_blame(int argc, const char **argv, const char *prefix) { struct rev_info revs; @@ -785,8 +830,11 @@ int cmd_blame(int argc, const char **argv, const char *prefix) struct progress_info pi = { NULL, 0 }; struct string_list range_list = STRING_LIST_INIT_NODUP; + struct string_list ignore_list = STRING_LIST_INIT_DUP; int output_option = 0, opt = 0; int show_stats = 0; + int no_default_ignores = 0; + const char *ignore_file = NULL; const char *revs_file = NULL; const char *contents_from = NULL; const struct option options[] = { @@ -806,6 +854,9 @@ int cmd_blame(int argc, const char **argv, const char *prefix) OPT_BIT('s', NULL, &output_option, N_("Suppress author name and timestamp (Default: off)"), OUTPUT_NO_AUTHOR), OPT_BIT('e', "show-email", &output_option, N_("Show author email instead of name (Default: off)"), OUTPUT_SHOW_EMAIL), OPT_BIT('w', NULL, &xdl_opts, N_("Ignore whitespace differences"), XDF_IGNORE_WHITESPACE), + OPT_STRING_LIST('i', NULL, &ignore_list, N_("rev"), N_("Ignore <rev> when blaming")), + OPT_BOOL(0, "no-default-ignores", &no_default_ignores, N_("Do not ignore revisions from the .git-blame-ignore-revs file")), + OPT_STRING(0, "ignore-file", &ignore_file, N_("file"), N_("Ignore revisions from <file>")), OPT_BIT(0, "color-lines", &output_option, N_("color redundant metadata from previous line differently"), OUTPUT_COLOR_LINE), OPT_BIT(0, "color-by-age", &output_option, N_("color lines by age"), OUTPUT_SHOW_AGE_WITH_COLOR), @@ -987,6 +1038,13 @@ int cmd_blame(int argc, const char **argv, const char *prefix) argv[argc - 1] = "--"; } + if (!no_default_ignores) + handle_ignore_file(".git-blame-ignore-revs", &ignore_list); + if (ignore_file) { + if (handle_ignore_file(ignore_file, &ignore_list)) + die(_("Unable to open ignore-file '%s'"), ignore_file); + } + revs.disable_stdin = 1; setup_revisions(argc, argv, &revs, NULL); @@ -995,6 +1053,8 @@ int cmd_blame(int argc, const char **argv, const char *prefix) sb.contents_from = contents_from; sb.reverse = reverse; sb.repo = the_repository; + handle_ignore_list(&sb, &ignore_list); + string_list_clear(&ignore_list, 0); setup_scoreboard(&sb, path, &o); lno = sb.num_lines;
Commits that make formatting changes or renames are often not interesting when blaming a file. This commit, similar to git-hyper-blame, allows one to specify certain revisions to ignore during the blame process. To ignore a revision, put its committish in a file specified by --ignore-file=<file> or use -i <rev>, which can be repeated. The file .git-blame-ignore-revs is checked by default. It's useful to be alerted to the presence of an ignored commit in the history of a line. Those lines will be marked with '*' in the non-porcelain output. The '*' is attached to the line number to keep from breaking tools that rely on the whitespace between columns. A blame_entry attributed to an ignored commit will get passed to its parent. If an ignored commit changed a line, an ancestor that changed the line will get blamed. However, if an ignored commit added lines, a commit changing a nearby line may get blamed. If no commit is found, the original commit for the file will get blamed. Signed-off-by: Barret Rhoden <brho@google.com> --- Documentation/git-blame.txt | 15 +++++++++ blame.c | 38 +++++++++++++++++---- blame.h | 3 ++ builtin/blame.c | 66 +++++++++++++++++++++++++++++++++++-- 4 files changed, 112 insertions(+), 10 deletions(-)