diff mbox series

[RFC/WIP] range-diff: show old/new blob OIDs in comments

Message ID 20191017121045.GA15364@dcvr (mailing list archive)
State New, archived
Headers show
Series [RFC/WIP] range-diff: show old/new blob OIDs in comments | expand

Commit Message

Eric Wong Oct. 17, 2019, 12:10 p.m. UTC
(WIP, mostly stream-of-concious notes + reasoning)

When using "git format-patch --range-diff", the pre and
post-image blob OIDs are in each email, while the exact
commit OIDs are rarely shared via emails (only the tip
commit from "git request-pull").

These blob OIDs make it easy to search for or lookup the
full emails which create them, or the blob itself once
it's fetched via git.

public-inbox indexes and allows querying specifically for blob
OIDs via dfpre:/dfpost: since June 2017.  As of Jan 2019,
public-inbox also supports recreating blobs out of patch emails
(querying internally with dfpre:/dfpost: and doing "git apply")

Searching on these blob OIDs also makes it easier to find
previous versions of the patch sets using any mail search
engine.

Future changes to public-inbox may allow generating custom
diffs out of any blobs it can find or recreate.

Most of this is pretty public-inbox-specific and would've
made some future changes to public-inbox much easier....
(if we had this from the start of range-diff).

Unfortunately, it won't help with cases where range-diffs
are already published, but range-diff isn't too old.

I'm also still learning my way around git's C internals, but
using patch.{old,new}_oid_prefix seems alright...

FIXME: tests, t3206 needs updating

Not-signed-off-by: Eric Wong <e@80x24.org>
---
 range-diff.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

Comments

Johannes Schindelin Oct. 22, 2019, 7:18 p.m. UTC | #1
Hi Eric,


On Thu, 17 Oct 2019, Eric Wong wrote:

> (WIP, mostly stream-of-concious notes + reasoning)
>
> When using "git format-patch --range-diff", the pre and
> post-image blob OIDs are in each email, while the exact
> commit OIDs are rarely shared via emails (only the tip
> commit from "git request-pull").
>
> These blob OIDs make it easy to search for or lookup the
> full emails which create them, or the blob itself once
> it's fetched via git.
>
> public-inbox indexes and allows querying specifically for blob
> OIDs via dfpre:/dfpost: since June 2017.  As of Jan 2019,
> public-inbox also supports recreating blobs out of patch emails
> (querying internally with dfpre:/dfpost: and doing "git apply")
>
> Searching on these blob OIDs also makes it easier to find
> previous versions of the patch sets using any mail search
> engine.
>
> Future changes to public-inbox may allow generating custom
> diffs out of any blobs it can find or recreate.
>
> Most of this is pretty public-inbox-specific and would've
> made some future changes to public-inbox much easier....
> (if we had this from the start of range-diff).
>
> Unfortunately, it won't help with cases where range-diffs
> are already published, but range-diff isn't too old.

I guess your patch won't hurt.

As to recreating blobs from mails: Wow. That's quite a length you're
going, and I think it is a shame that you have to. If only every
contribution came accompanied with a pullable branch in a public
repository.

Instead, we will have to rely on your centralized, non-distributed
service...

Ciao,
Dscho

>
> I'm also still learning my way around git's C internals, but
> using patch.{old,new}_oid_prefix seems alright...
>
> FIXME: tests, t3206 needs updating
>
> Not-signed-off-by: Eric Wong <e@80x24.org>
> ---
>  range-diff.c | 19 +++++++++++++++----
>  1 file changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/range-diff.c b/range-diff.c
> index 7fed5a3b4b..85d2f1f58f 100644
> --- a/range-diff.c
> +++ b/range-diff.c
> @@ -118,13 +118,24 @@ static int read_patches(const char *range, struct string_list *list)
>  				die(_("could not parse git header '%.*s'"), (int)len, line);
>  			strbuf_addstr(&buf, " ## ");
>  			if (patch.is_new > 0)
> -				strbuf_addf(&buf, "%s (new)", patch.new_name);
> +				strbuf_addf(&buf, "%s (new %s)",
> +						patch.new_name,
> +						patch.new_oid_prefix);
>  			else if (patch.is_delete > 0)
> -				strbuf_addf(&buf, "%s (deleted)", patch.old_name);
> +				strbuf_addf(&buf, "%s (deleted %s)",
> +						patch.old_name,
> +						patch.old_oid_prefix);
>  			else if (patch.is_rename)
> -				strbuf_addf(&buf, "%s => %s", patch.old_name, patch.new_name);
> +				strbuf_addf(&buf, "%s => %s (%s..%s)",
> +						patch.old_name,
> +						patch.new_name,
> +						patch.old_oid_prefix,
> +						patch.new_oid_prefix);
>  			else
> -				strbuf_addstr(&buf, patch.new_name);
> +				strbuf_addf(&buf, "%s (%s..%s)",
> +						patch.new_name,
> +						patch.old_oid_prefix,
> +						patch.new_oid_prefix);
>
>  			free(current_filename);
>  			if (patch.is_delete > 0)
>
>
Konstantin Ryabitsev Oct. 22, 2019, 7:35 p.m. UTC | #2
On Tue, Oct 22, 2019 at 09:18:35PM +0200, Johannes Schindelin wrote:
>As to recreating blobs from mails: Wow. That's quite a length you're
>going, and I think it is a shame that you have to. If only every
>contribution came accompanied with a pullable branch in a public
>repository.

Trouble is, those public repositories are transient and may be gone soon 
after the pull request is performed. The goal is to be able to 
reconstruct full code history prior to when it is merged into the 
official repository.

>Instead, we will have to rely on your centralized, non-distributed
>service...

It's actually neither, because mirroring and distributing public-inbox 
repositories is already easy, and we hope to make it easier in the near 
future.

-K
Eric Wong Oct. 23, 2019, 1:56 a.m. UTC | #3
Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> Hi Eric,
> 
> 
> On Thu, 17 Oct 2019, Eric Wong wrote:
> 
> > (WIP, mostly stream-of-concious notes + reasoning)
> >
> > When using "git format-patch --range-diff", the pre and
> > post-image blob OIDs are in each email, while the exact
> > commit OIDs are rarely shared via emails (only the tip
> > commit from "git request-pull").
> >
> > These blob OIDs make it easy to search for or lookup the
> > full emails which create them, or the blob itself once
> > it's fetched via git.
> >
> > public-inbox indexes and allows querying specifically for blob
> > OIDs via dfpre:/dfpost: since June 2017.  As of Jan 2019,
> > public-inbox also supports recreating blobs out of patch emails
> > (querying internally with dfpre:/dfpost: and doing "git apply")
> >
> > Searching on these blob OIDs also makes it easier to find
> > previous versions of the patch sets using any mail search
> > engine.
> >
> > Future changes to public-inbox may allow generating custom
> > diffs out of any blobs it can find or recreate.
> >
> > Most of this is pretty public-inbox-specific and would've
> > made some future changes to public-inbox much easier....
> > (if we had this from the start of range-diff).
> >
> > Unfortunately, it won't help with cases where range-diffs
> > are already published, but range-diff isn't too old.
> 
> I guess your patch won't hurt.

Cool, will update tests and resend.

> As to recreating blobs from mails: Wow. That's quite a length you're
> going, and I think it is a shame that you have to. If only every
 contribution came accompanied with a pullable branch in a public
> repository.

What Konstantin said about git repos being transient.
It wasn't too much work to recreate those blobs from
scratch since git-apply has done it since 2005.
Just add search :)

We could get around transient repos with automatic mirroring
bots which never deletes or overwrites anything published.
That includes preserving pre-force-push data in case of
force pushes.

> Instead, we will have to rely on your centralized, non-distributed
> service...

I'm curious how you came to believe that, since that's the
opposite of what public-inbox has always been intended to be.

The only thing that's centralized and not reproducible by
mirrors is the domain name (and I also have Tor .onion mirrors
with no dependency on ICAAN).  Memorable naming is a tricky
problem in decentralized systems, though...
Junio C Hamano Oct. 23, 2019, 4:11 a.m. UTC | #4
Eric Wong <e@80x24.org> writes:

> What Konstantin said about git repos being transient.
> It wasn't too much work to recreate those blobs from
> scratch since git-apply has done it since 2005.

;-)

> We could get around transient repos with automatic mirroring
> bots which never deletes or overwrites anything published.
> That includes preserving pre-force-push data in case of
> force pushes.
>
>> Instead, we will have to rely on your centralized, non-distributed
>> service...
>
> I'm curious how you came to believe that, since that's the
> opposite of what public-inbox has always been intended to be.

I think the (mis)perception comes from the fact that the website and
the newsfeed you give are both too easy to use and directly attract
end users, instead of enticing them to keep their own mirrors for
offline use.

Thanks for injecting dose of sanity.
Eric Wong Oct. 23, 2019, 10:06 a.m. UTC | #5
Eric Wong <e@80x24.org> wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > I guess your patch won't hurt.
> 
> Cool, will update tests and resend.

Turns out the t3206 "trivial reordering" case is way too noisy.
I might need to change diff.c to support this case; will update
in a few days or week.
Johannes Schindelin Oct. 24, 2019, 10:16 p.m. UTC | #6
Hi,

On Wed, 23 Oct 2019, Junio C Hamano wrote:

> Eric Wong <e@80x24.org> writes:
>
> > What Konstantin said about git repos being transient.
> > It wasn't too much work to recreate those blobs from
> > scratch since git-apply has done it since 2005.
>
> ;-)
>
> > We could get around transient repos with automatic mirroring
> > bots which never deletes or overwrites anything published.
> > That includes preserving pre-force-push data in case of
> > force pushes.
> >
> >> Instead, we will have to rely on your centralized, non-distributed
> >> service...
> >
> > I'm curious how you came to believe that, since that's the
> > opposite of what public-inbox has always been intended to be.
>
> I think the (mis)perception comes from the fact that the website and
> the newsfeed you give are both too easy to use and directly attract
> end users, instead of enticing them to keep their own mirrors for
> offline use.
>
> Thanks for injecting dose of sanity.

Maybe your dose of sanity can inject a statement about the case when
public-inbox.org/git differs from a mirror, and not in a
fast-forwardable way? What is the authoritative source of truth, then?

Ciao,
Dscho
Eric Wong Oct. 25, 2019, 12:12 a.m. UTC | #7
Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> On Wed, 23 Oct 2019, Junio C Hamano wrote:
> > Eric Wong <e@80x24.org> writes:
> > > Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > >> Instead, we will have to rely on your centralized, non-distributed
> > >> service...
> > >
> > > I'm curious how you came to believe that, since that's the
> > > opposite of what public-inbox has always been intended to be.
> >
> > I think the (mis)perception comes from the fact that the website and
> > the newsfeed you give are both too easy to use and directly attract
> > end users, instead of enticing them to keep their own mirrors for
> > offline use.
> >
> > Thanks for injecting dose of sanity.
> 
> Maybe your dose of sanity can inject a statement about the case when
> public-inbox.org/git differs from a mirror, and not in a
> fast-forwardable way? What is the authoritative source of truth, then?

Why does authoritative source of truth matter?  My
anti-authoritarian ethos is what drew me to DVCS in the first
place.

If senders want to attest to the integrity of their messages;
they can sign, and/or publish a copy/log of their sent messages
on their homepage/social media/whatever.  That's up to THEM,
not anybody else.

If somebody wants to fork public-inbox.org/git and run
public-inbox-watch from their own Maildir, they're more than
welcome to.

If somebody wants to write their own importers since they don't
like the code I write, they are more than welcome to.  There's
already mail-archive.com, marc.info, news.gmane.org (which
public-inbox.org/git forked from) and some others.

Going farther, if people want to fork entire mailing lists and
communities, they should be able to do so.  I don't like mail
subscriber lists being centralized on any host, either.

I have never, ever asked anybody to trust me or public-inbox;
in fact, I've stated the opposite and will continue to do so.
Johannes Schindelin Oct. 25, 2019, 1:43 p.m. UTC | #8
Hi Eric,

On Fri, 25 Oct 2019, Eric Wong wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > On Wed, 23 Oct 2019, Junio C Hamano wrote:
> > > Eric Wong <e@80x24.org> writes:
> > > > Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > > >> Instead, we will have to rely on your centralized, non-distributed
> > > >> service...
> > > >
> > > > I'm curious how you came to believe that, since that's the
> > > > opposite of what public-inbox has always been intended to be.
> > >
> > > I think the (mis)perception comes from the fact that the website and
> > > the newsfeed you give are both too easy to use and directly attract
> > > end users, instead of enticing them to keep their own mirrors for
> > > offline use.
> > >
> > > Thanks for injecting dose of sanity.
> >
> > Maybe your dose of sanity can inject a statement about the case when
> > public-inbox.org/git differs from a mirror, and not in a
> > fast-forwardable way? What is the authoritative source of truth, then?
>
> Why does authoritative source of truth matter?  My
> anti-authoritarian ethos is what drew me to DVCS in the first
> place.
>
> If senders want to attest to the integrity of their messages;
> they can sign, and/or publish a copy/log of their sent messages
> on their homepage/social media/whatever.  That's up to THEM,
> not anybody else.
>
> If somebody wants to fork public-inbox.org/git and run
> public-inbox-watch from their own Maildir, they're more than
> welcome to.

I am _more_ than happy to rely on public-inbox.org/git. And I will never
kid myself about relying on a central service, is all.

> If somebody wants to write their own importers since they don't
> like the code I write, they are more than welcome to.  There's
> already mail-archive.com, marc.info, news.gmane.org (which
> public-inbox.org/git forked from) and some others.
>
> Going farther, if people want to fork entire mailing lists and
> communities, they should be able to do so.  I don't like mail
> subscriber lists being centralized on any host, either.
>
> I have never, ever asked anybody to trust me or public-inbox;
> in fact, I've stated the opposite and will continue to do so.

Well, too bad. I trust you, Eric. I do trust you and will probably
continue to trust you because I don't expect you to do anything, ever,
to break that trust. So far, you haven't disappointed me even a single
time, and we've concurrently been Git contributors for, sheesh, has it
already been almost 14 years? I have benefitted from your work greatly,
mostly via `git svn` in the olden days, and I hope that I could return
the favor every once in a while.

Without public-inbox.org/git, GitGitGadget would not be possible. My
scripts to map commits to mails and vice versa (mirrored to
https://github.com/gitgitgadget/git as `refs/notes/mail-to-commit` and
`commit-to-mail`) would remain a pipe dream of mine.

(Yes, yes, there are holes in that mapping, but even if I only have to
look up manually one out of 30 mails when I want to comment on a
specific commit, that already saves me so much time, not to mention
nerves.)

So please understand that I am deeply grateful that you came up with
these projects, in particular with public-inbox. It is a life saver. I
might not share all of your philosophy regarding centralized vs
decentralized, even so, what you did helps me multiple times every
single day.

Therefore: a heart-felt Thank You, I owe you more than one,
Dscho
diff mbox series

Patch

diff --git a/range-diff.c b/range-diff.c
index 7fed5a3b4b..85d2f1f58f 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -118,13 +118,24 @@  static int read_patches(const char *range, struct string_list *list)
 				die(_("could not parse git header '%.*s'"), (int)len, line);
 			strbuf_addstr(&buf, " ## ");
 			if (patch.is_new > 0)
-				strbuf_addf(&buf, "%s (new)", patch.new_name);
+				strbuf_addf(&buf, "%s (new %s)",
+						patch.new_name,
+						patch.new_oid_prefix);
 			else if (patch.is_delete > 0)
-				strbuf_addf(&buf, "%s (deleted)", patch.old_name);
+				strbuf_addf(&buf, "%s (deleted %s)",
+						patch.old_name,
+						patch.old_oid_prefix);
 			else if (patch.is_rename)
-				strbuf_addf(&buf, "%s => %s", patch.old_name, patch.new_name);
+				strbuf_addf(&buf, "%s => %s (%s..%s)",
+						patch.old_name,
+						patch.new_name,
+						patch.old_oid_prefix,
+						patch.new_oid_prefix);
 			else
-				strbuf_addstr(&buf, patch.new_name);
+				strbuf_addf(&buf, "%s (%s..%s)",
+						patch.new_name,
+						patch.old_oid_prefix,
+						patch.new_oid_prefix);
 
 			free(current_filename);
 			if (patch.is_delete > 0)