Message ID | b9fb52b8-8168-6bf0-9a72-1e6c44a281a5@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | email as a bona fide git transport | expand |
Hi Vegard, On Wed, Oct 16, 2019 at 12:22:54PM +0200, Vegard Nossum wrote: > (cross-posted to git, LKML, and the kernel workflows mailing lists.) > > Hi all, > > I've been following Konstantin Ryabitsev's quest for better development > and communication tools for the kernel [1][2][3], and I would like to > propose a relatively straightforward idea which I think could bring a > lot to the table. > > Step 1: > > * git send-email needs to include parent SHA1s and generally all the > information needed to perfectly recreate the commit when applied so > that all the SHA1s remain the same > > * git am (or an alternative command) needs to recreate the commit > perfectly when applied, including applying it to the correct parent > > Having these two will allow a perfect mapping between email and git; > essentially email just becomes a transport for git. There are a lot of > advantages to this, particularly that you have a stable way to refer to > a patch or commit (despite it appearing on a mailing list), and there > is no need for "changeset IDs" or whatever, since you can just use the > git SHA1 which is unique, unambiguous, and stable. I agree this would be great and have been missing this a number of times, eventhough I'm aware of git-send-pack/git-receive-pack. The text format is way more convenient for a lot of reasons. It could also help with Greg's idea of using the commit IDs to reference bugs, as such IDs could remain stable within a series before it is merged, and as such referenced in subsequent commit messages. It could also be useful to avoid losing notes related to a patch once it's merged. > Step 3: > > * Instead of describing a patchset in a separate introduction email, we > can create a merge commit between the parent of the first commit in > the series and the last and put the patchset description in the merge > commit [5]. This means the patchset description also gets to be part > of git history. > > (This would require support for git send-email/am to be able to send > and apply merge commits -- at least those which have the same tree as > one of the parents. This is _not_ yet supported in my proposed git > patches.) That's a good idea, as we've all seen long series with a very detailed description in patch 0 and much less context in subsequent patches, thus losing the context once merged. > * stable SHA1s means we can refer to previous versions of a patchset by > SHA1 rather than archive links. I propose a new changelog tag for > this, maybe "Previous:" or maybe even a full list of "v1:", "v2:", > etc. with a SHA1 or ref. Note that these SHA1s do *not* need to exist > in Linus's repo, but those who want can pull those branches from the > bot-maintained repo on git.kernel.org. For me this mainly brings the benefit of finally having a unique identifier for multiple iterations of a patchset. It then becomes easier to use this identifier to designate the functional work, regardless of the number of updates it gets. Of course it's never that black and white since such work may itself merge multiple other patchsets but for most use cases it can help. Willy
Hi Willy, Vegard. On Wed, Oct 16, 2019 at 01:10:09PM +0200, Willy Tarreau wrote: > Hi Vegard, > > On Wed, Oct 16, 2019 at 12:22:54PM +0200, Vegard Nossum wrote: > > (cross-posted to git, LKML, and the kernel workflows mailing lists.) > > > > Hi all, > > > > I've been following Konstantin Ryabitsev's quest for better development > > and communication tools for the kernel [1][2][3], and I would like to > > propose a relatively straightforward idea which I think could bring a > > lot to the table. > > > > Step 1: > > > > * git send-email needs to include parent SHA1s and generally all the > > information needed to perfectly recreate the commit when applied so > > that all the SHA1s remain the same > > > > * git am (or an alternative command) needs to recreate the commit > > perfectly when applied, including applying it to the correct parent > > > > Having these two will allow a perfect mapping between email and git; > > essentially email just becomes a transport for git. There are a lot of > > advantages to this, particularly that you have a stable way to refer to > > a patch or commit (despite it appearing on a mailing list), and there > > is no need for "changeset IDs" or whatever, since you can just use the > > git SHA1 which is unique, unambiguous, and stable. I wonder if it'd be also possible to then embed gpg signatures over send-mail payloads so as they can be transparently transferred to the commit. -Santiago
Hi Vegard, On 16/10/19 12:22PM, Vegard Nossum wrote: > (cross-posted to git, LKML, and the kernel workflows mailing lists.) > > Hi all, > > I've been following Konstantin Ryabitsev's quest for better development > and communication tools for the kernel [1][2][3], and I would like to > propose a relatively straightforward idea which I think could bring a > lot to the table. > > Step 1: > > * git send-email needs to include parent SHA1s and generally all the > information needed to perfectly recreate the commit when applied so > that all the SHA1s remain the same > > * git am (or an alternative command) needs to recreate the commit > perfectly when applied, including applying it to the correct parent > > Having these two will allow a perfect mapping between email and git; > essentially email just becomes a transport for git. There are a lot of > advantages to this, particularly that you have a stable way to refer to > a patch or commit (despite it appearing on a mailing list), and there > is no need for "changeset IDs" or whatever, since you can just use the > git SHA1 which is unique, unambiguous, and stable. FWIW, I like the idea. > As a rough proof of concept I've attached 3 git patches which implement > this. There are issues to work out like exact format, encodings, mail > mangling, error handling, etc., but hopefully the git community can > help out here. (Improvement suggestions are welcome!) > > Step 2: > > * A bot that follows LKML (and other lists) and imports patchsets into > a git repository hosted on git.kernel.org > > * The bot can add git notes with URLs to lore (and/or other mailing > list archives) and store them in e.g. refs/notes/lore, > refs/notes/lkml, etc. > > (For those who don't use git notes yet: they are essentially small > bits of information you can add to a commit without changing its SHA1, > and you can configure tools like 'git log' to show these at the bottom > of a commit. Notes can also exist in a repo completely separate from > the commits they attach data to, so there is _zero_ overhead for those > who don't want to use this.) > > * Maintainers can either pull patchsets directly from this bot- > maintained repo OR they can continue to apply patches from their inbox > (the result should be the same either way) OR they can continue in the > old-style process (at least for a while) and just not have the > benefits of the new process. > > Step 3: > > * Instead of describing a patchset in a separate introduction email, we > can create a merge commit between the parent of the first commit in > the series and the last and put the patchset description in the merge > commit [5]. This means the patchset description also gets to be part > of git history. > > (This would require support for git send-email/am to be able to send > and apply merge commits -- at least those which have the same tree as > one of the parents. This is _not_ yet supported in my proposed git > patches.) Can sending merge commits via email work with your proposed '--exact'? Say I'm the maintainer, and you fork off a feature branch off my master, add a few commits that introduce your new feature, and then merge it into my master, and then send those commits, including the merge. Now in that scenario, say the tip of your feature branch was X and the tip of my 'master' was Y when you sent your patches. Now while your patches are still being reviewed, I merge in some other branch creating a merge commit Z on my master. Now your merge's first parent was Y and second parent was X. But now the tip of my master is Z, so the first parent of your merge needs to be Z, not Y. Changing the first parent would mean a different commit hash. So, the way I see it, your proposed merge commits via email can't work with '--exact'. Do I understand this situation correctly? Am I missing something? Maybe a better idea would be to allow 'am' to create these merges locally when applying the patches. That would mean having to merge the separate branch along with applying the patches, otherwise the cover letter text is lost. This might not be something everyone wants. I for one don't. When I apply patches via 'am', I first keep them on a separate branch, test them out, and then merge them into 'master'. So a yet another alternative could be to save the cover letter as the branch description. This branch description can then be used to generate the merge message. IIRC, Denton Liu is working on generating the cover letter text from branch description, so this feature would be like its inverse. > * stable SHA1s means we can refer to previous versions of a patchset by > SHA1 rather than archive links. I propose a new changelog tag for > this, maybe "Previous:" or maybe even a full list of "v1:", "v2:", > etc. with a SHA1 or ref. Note that these SHA1s do *not* need to exist > in Linus's repo, but those who want can pull those branches from the > bot-maintained repo on git.kernel.org. > > Advantages: > > - we can keep using email to post patches/patchsets > > - the process is opt-in (but should be encouraged) for both authors and > maintainers, and the transition can happen over time > > - there is a central repo for convenience, but it is not necessary for > development to happen and is not a single point of failure -- it's > more like Linus's repo and can be moved or even replicated from > scratch by somebody else simply by having mailing list archives > > - allows quick lookup of patch/patchset <-> email discussion within git > > - allows diffing between versions of a single logical patchset > > - patchset descriptions naturally become part of the changelog that ends > up in Linus's tree > > Disadvantages: > > - requires patching git > > - requires a bot to continuously create branches for patchsets sent to > mailing lists > > - increased storage/bandwidth for git.kernel.org (?) > > - may need a couple of new wrapper scripts to automate patchset > construction/versioning Just to play the devil's advocate, even though I'm in favor of something like this, I'll add in another disadvantage: - The maintainer can't make small edits before pushing the changes out. I do that every now and then for git-gui, and Junio does that sometimes for Git. I don't know if the folks over at Linux do something like this, but using '--exact' would mean that contributors would have to send a re-roll for even minor changes. Its mostly an inconvenience instead of a problem, but I thought I'd point it out. > Thoughts? One more question, not strictly related to your proposal: right now, when I apply patches from contributors, I pass '-s' to 'am', so the applied commit would have my sign-off. The way I see it, that sign-off is supposed to signify that I have the right to push out the commit to the "main" repo, just like the author's sign-off means that they have the right to send me that commit. Looking at git.git, I notice that Junio does the same. The new '--exact' would be incompatible with '-s', correct (since the commit message has changed, the SHA1 would also change)? So firstly, make sure you account for something like that if you haven't already (I haven't found the time to read your patches yet). Secondly, is it all right for the maintainer to just not sign-off on the commits they push out?
Hi, A few small points. Vegard Nossum wrote: > * git am (or an alternative command) needs to recreate the commit > perfectly when applied, including applying it to the correct parent Interesting. "git format-patch" has a --base option to do some of what you're looking for, for the sake of snowpatch <https://github.com/ruscur/snowpatch>. Though it's not exactly the same thing you mean. We also discussed sending merge commits by mail recently in the virtual git committer summit[1]. Of course, the devil is in the details. It's straightforward to use "git bundle" to use mail as a Git transport today, but presumably you also want the ability to perform reviews along the way and that's not so easy with a binary format. Do you have more details on what you'd want the format to look like, particularly for merge commits? [...] > there > is no need for "changeset IDs" or whatever, since you can just use the > git SHA1 which is unique, unambiguous, and stable. In [2] the hope was for some identifier that is preserved by "git rebase" and "git commit --amend" (so that you can track the evolution of a change as the author improves it in response to reviews). Is that the conversation you're alluding to? [...] > Disadvantages: > > - requires patching git That's not a disadvantage. It means get to work with the Git project, which is a welcoming bunch of people, working on userspace (seeing how the other half lives), and improving the lives of everyone using Git. [...] > Date: Sat, 5 Oct 2019 16:15:59 +0200 > Subject: [PATCH 1/3] format-patch: add --complete > > Include the raw commit data between the changelog and the diffstat. Oh! I had missed this on first reading because it was in an attachment. I have mixed feelings. Can you say a bit more about the advantages and disadvantages relative to sending a git bundle? What happens if a mail client or a box along the way mangles whitespace in the commit message? Happy hacking, Jonathan [1] https://public-inbox.org/git/nycvar.QRO.7.76.6.1909261253400.15067@tvgsbejvaqbjf.bet/ [2] https://lore.kernel.org/ksummit-discuss/CAD=FV=UPjPpUyFTPjF-Ogzj_6LJLE4PTxMhCoCEDmH1LXSSmpQ@mail.gmail.com/
Vegard Nossum <vegard.nossum@oracle.com> writes: > Step 1: > > * git send-email needs to include parent SHA1s and generally all the > information needed to perfectly recreate the commit when applied so > that all the SHA1s remain the same > > * git am (or an alternative command) needs to recreate the commit > perfectly when applied, including applying it to the correct parent You can record and convey the commit object name a series is meant to be applied on top already, and it in general is a good way to give a wider context in order to explain and justify the series. On the other hand, "all the information needed to recreate..." is not very useful. If you want the commit object to be exactly what you want to see at the tip of the end result, you are better off asking your upstream to pull. Using e-mail for that makes you and project participants give up a lot of benefits the workflow based on e-mail gives you, the biggest of which is the ease of giving suggestions for improvements. Once you insist "perfectly recreate the commit", you are not willing to take any input from the sidelines---worse yet, you are even dictating when the upstream runs "git am" to turn them into commits, and do so without reading the patches (there is no point reviewing as the person who runs "git am" is not even allowed to fix typo or make obvious fixes to the code, which will fail to perfectly recreate the commit). In short, one should resist temptation to bring up "perfect reproduction" when one talks about e-mail workflow. > * Instead of describing a patchset in a separate introduction email, we > can create a merge commit between the parent of the first commit in > the series and the last and put the patchset description in the merge > commit [5]. This means the patchset description also gets to be part > of git history. This has been done with tools around git-core, and merits a more official support. When merging a topic, it is a good idea to explain in the merge commit that brings in the topic to the mainline what the topic is about, and at least in the past few years Linus and other maintainers both within and outside the kernel have been doing so. The cover-letter material in [PATCH 00/NN] obviously can help those integrators when they write the merge message. > (This would require support for git send-email/am to be able to send > and apply merge commits -- at least those which have the same tree as > one of the parents. This is _not_ yet supported in my proposed git > patches.) This does not require much from format-patch and am. All you need to do is to ensure that they can handle an empty commit. What you need more is a support in merge. The outline for the workflow would go like this: * The contributor prepares an N patch series 1/N..N/N on a single topic branch. * The summary of the series, the message that is meant to help the integrator, is recorded as (N+1)th commit at the tip of the topic branch, as an empty commit (i.e. a commit that records the same tree as its parent). * "git format-patch" is taught, when told to prepare the patch e-mails from such a topic branch, to notice the unusual "an empty commit at the tip" layout, and turn that into 0/N of the message. * "git am" is taught a new option to cap a topic branch made from patches 1/N..N/N from the incoming mbox with an extra empty commit, whose message is taken from the 0/N cover-letter material, to recreate what the contributor had in the second step above. * "git merge" is taught, when told to merge a topic branch, to notice the unusual "an empty commit at the tip" layout, and (1) merge topic~1 instead to excise the empty commit itself, (2) take the log message from the empty commit at the tip and use it to help prepare the log message of the merge.
On 10/16/19 5:00 PM, Pratyush Yadav wrote: > On 16/10/19 12:22PM, Vegard Nossum wrote: > Just to play the devil's advocate, even though I'm in favor of something > like this, I'll add in another disadvantage: > > - The maintainer can't make small edits before pushing the changes out. > > I do that every now and then for git-gui, and Junio does that sometimes > for Git. I don't know if the folks over at Linux do something like this, > but using '--exact' would mean that contributors would have to send a > re-roll for even minor changes. Its mostly an inconvenience instead of a > problem, but I thought I'd point it out. I don't think this is a problem. The point of 'git am --exact' is not for maintainers per se (although they should use it if they don't have any manual changes to make), but for the bot that keeps track of patchsets submitted via email. The important part is that there is a git reference to the patchset that was submitted in the patchset that was merged. You could see it as the maintainer rolling a new version of the patchset locally and merging that instead of merging what was submitted directly. Of course, this relies strongly on actually having (correct) sha1 references to previous versions inside the changelog. In my original idea, this reference would only appear inside the merge commit that binds the patchset together to minimise churn, although maybe it is feasible to also append it to each patch -- in that case, the "patchset" command from my first email is not sufficient to create a new version of a patchset. > One more question, not strictly related to your proposal: right now, > when I apply patches from contributors, I pass '-s' to 'am', so the > applied commit would have my sign-off. The way I see it, that sign-off > is supposed to signify that I have the right to push out the commit to > the "main" repo, just like the author's sign-off means that they have > the right to send me that commit. > > Looking at git.git, I notice that Junio does the same. The new '--exact' > would be incompatible with '-s', correct (since the commit message has > changed, the SHA1 would also change)? So firstly, make sure you account > for something like that if you haven't already (I haven't found the time > to read your patches yet). Secondly, is it all right for the maintainer > to just not sign-off on the commits they push out? In the Linux kernel at least, only the front-line maintainers add their signoffs; higher-level maintainers take pull requests and don't add their own sign-offs. Only Linus (and Greg, perhaps) has commit rights to the main repository, but he only signs off on the patches that he either writes himself or applies directly from email. In any case, I don't think this is a concern because of what I wrote above -- somebody who wants to add their signoff can do it by essentially rolling a new version of the patchset that has the signoff, but refers to the sha1 that was submitted to them by the patchset author so that you can still find the original commits (without the signoffs) and reviews/discussions. I don't want to create extra work for maintainers, so I think any solution should involve having the existing git tools/workflows do the right thing automatically. How about this? If 'git am' or 'git am -s' (without --exact) finds an email patch where the exact commit metadata is present, it automatically appends a line to the changelog saying where it was taken from: Submitted-as: 111122223333444455556666777788889999aaaa or Applied-from: 111122223333444455556666777788889999aaaa Although, again, this would modify the changelog of the patch itself rather than just the changelog of the merge commit... Maybe this is enough and we can have the "patchset" command also add references to previous versions of each patch rather than the patchset as a whole. Vegard
On 10/16/19 10:57 PM, Jonathan Nieder wrote: > Hi, > > A few small points. > > Vegard Nossum wrote: > >> * git am (or an alternative command) needs to recreate the commit >> perfectly when applied, including applying it to the correct parent > > Interesting. "git format-patch" has a --base option to do some of > what you're looking for, for the sake of snowpatch > <https://github.com/ruscur/snowpatch>. Though it's not exactly the > same thing you mean. Yes, --base is great for importing email patches into git, but does not allow resulting commits to have the same sha1 (because it doesn't have the complete author/committer information, mainly), so it doesn't do enough for what I want with this proposal. > We also discussed sending merge commits by mail recently in the > virtual git committer summit[1]. > > Of course, the devil is in the details. It's straightforward to use > "git bundle" to use mail as a Git transport today, but presumably you > also want the ability to perform reviews along the way and that's not > so easy with a binary format. Do you have more details on what you'd > want the format to look like, particularly for merge commits? Yes, one of the goals is to continue to use git-send-email and be able to view and review patches on a mailing list with minimal intrusion to existing processes. I did not envision supporting merge commits where the resulting tree is different from all of its parents, which means any merge commits run through 'git-format-patch --complete' would just have multiple "parent" lines and no diff where the diff would normally be. >> is no need for "changeset IDs" or whatever, since you can just use the >> git SHA1 which is unique, unambiguous, and stable. > > In [2] the hope was for some identifier that is preserved by "git > rebase" and "git commit --amend" (so that you can track the evolution > of a change as the author improves it in response to reviews). Is > that the conversation you're alluding to? Yes. (Not the specific email/thread you linked, but yes, this is the general conversation about having stable identifiers I'm referring to.) > > [...] >> Disadvantages: >> >> - requires patching git > > That's not a disadvantage. It means get to work with the Git project, > which is a welcoming bunch of people, working on userspace (seeing how > the other half lives), and improving the lives of everyone using Git. True! :-) >> Date: Sat, 5 Oct 2019 16:15:59 +0200 >> Subject: [PATCH 1/3] format-patch: add --complete >> >> Include the raw commit data between the changelog and the diffstat. > > Oh! I had missed this on first reading because it was in an > attachment. > > I have mixed feelings. Can you say a bit more about the advantages > and disadvantages relative to sending a git bundle? What happens if a > mail client or a box along the way mangles whitespace in the commit > message? Yes, as we both said above: git bundle is not human readable and does not lend itself to reviews in mail clients or on mailing lists. Any kind of mangling is a serious concern, and my thoughts are: - The _diff_ itself should already be safe from mangling. If this were not the case, then sending patches by email would be completely unsafe and would need to be fixed somehow anyway. - I think I remember seeing something in either 'git am' or 'git mailinfo' about format=flowed apparently allowing whitespace to disappear from the line endings. - Isolated problems like that could be preemptively fixed (or at least warned about) on the submitter side by e.g. warning about potential issues when committing or running format-patch/send-email) - If some mail software is susceptible to mangling patches, then I think it would great to have a mechanism for detecting that -- which "git-format-patch --complete" would be! :-) - It is true that the commit message itself (and perhaps particularly people's names) is especially vulnerable to mangling and/or reencoding (I noticed that 'git am' seems to convert to utf8 before committing). I honestly don't know if this would be a problem in practice, as I don't know too much about mail software and encodings. If most people use format-patch/send-email to send patches, then maybe it's not actually a problem because everything is either kept as utf-8 to start with or encoded/decoded consistently across the git userbase? - I was thinking of hex-encoding the raw bytes of the author and committer lines of the extra commit metadata in the final email to keep everything as ASCII and avoid character set conversions. It is true that this does not stop mangling of the changelog... - I suspect that if mangling was a problem in practice, then we would have seen it already and the solution to it is not specific to what I'm proposing here. I'd love for somebody who is more knowledgeable about email and encoding to chime in here. It is definitely one of the biggest potential problems and it would be good to approach it in a way that doesn't require lots of red tape after it has already been implemented. Vegard > > Happy hacking, > Jonathan > > [1] https://public-inbox.org/git/nycvar.QRO.7.76.6.1909261253400.15067@tvgsbejvaqbjf.bet/ > [2] https://lore.kernel.org/ksummit-discuss/CAD=FV=UPjPpUyFTPjF-Ogzj_6LJLE4PTxMhCoCEDmH1LXSSmpQ@mail.gmail.com/ >
On Thu, Oct 17, 2019 at 02:23:58PM +0200, Vegard Nossum wrote: > Of course, this relies strongly on actually having (correct) sha1 > references to previous versions inside the changelog. In my original > idea, this reference would only appear inside the merge commit that > binds the patchset together to minimise churn, although maybe it is > feasible to also append it to each patch -- in that case, the "patchset" > command from my first email is not sufficient to create a new version of > a patchset. This also relies on the base of the commit actually being a public SHA1. Sometimes developers will cherry-pick in a patch that they need so that the kernel will actually *boot* (or otherwise fix problems that have been fixed in other subsystems, but not yet landed in -rc2 or -rc3). Of course, we could tell people that they should always create their patches off of the last stable version (but then there may have been changes pulled in via the last merge window that makes their patch not apply), or they could be told to develop against -rc2 or -rc3, and then cherry pick the required fix-up patches on top of -rc2 and -rc3, but then they have to do a lot more rebuilding. So there are no perfect solutions here, and while in the ideal world, -rc2 and -rc3 should be perfectly stable enough for developers so that they never need to manually patch in stablization patches, we need to live in the real world. I believe that Darrick told me that in the previous development cycle, he had to wait until -rc4 before the tree was stable enough for him to start building xfs patches on top mainline. (This is also true for this development cycle if you enable CONFIG_KMEMLEAK, although fortunately, the workaround that worked for me was to just CONFIG_KMEMLEAK --- although of course, if I do have to run a KMEMLEAK test run, I'll need to cherry-pick the fix which landed this week on top of the ext4 git tree.) What this all might mean is that sometimes it will make sense to allow the user to override the base commit so such stablization patches can be elided. Of course, we could force the user to create a separate branch and rebase, but can be quite painful and slow --- and they won't be able to test the rebased branch anyway, unless we then want to tell them to cherry pick the stablization patches on top, and then remove them before running "git send-email". - Ted
On 10/17/19 5:17 AM, Junio C Hamano wrote: > Vegard Nossum <vegard.nossum@oracle.com> writes: > >> Step 1: >> >> * git send-email needs to include parent SHA1s and generally all the >> information needed to perfectly recreate the commit when applied so >> that all the SHA1s remain the same >> >> * git am (or an alternative command) needs to recreate the commit >> perfectly when applied, including applying it to the correct parent > > You can record and convey the commit object name a series is meant > to be applied on top already, and it in general is a good way to > give a wider context in order to explain and justify the series. > > On the other hand, "all the information needed to recreate..." is > not very useful. If you want the commit object to be exactly what > you want to see at the tip of the end result, you are better off > asking your upstream to pull. Using e-mail for that makes you and > project participants give up a lot of benefits the workflow based on > e-mail gives you, the biggest of which is the ease of giving > suggestions for improvements. Once you insist "perfectly recreate > the commit", you are not willing to take any input from the > sidelines---worse yet, you are even dictating when the upstream > runs "git am" to turn them into commits, and do so without reading > the patches (there is no point reviewing as the person who runs "git > am" is not even allowed to fix typo or make obvious fixes to the > code, which will fail to perfectly recreate the commit). > > In short, one should resist temptation to bring up "perfect > reproduction" when one talks about e-mail workflow. Please see what I wrote to Pratyush Yadav here: https://public-inbox.org/git/a1c33600-14e6-be37-c026-8d8b8e4bad92@oracle.com/ TL;DR: the goal is not necessarily for maintainers to be able to merge the patchset with the same SHA1 that the submitter had, but for the patchset to have a definite SHA1 that lives in git, and which can be used by all the participants -- submitter, reviewers, bots (including potentially testing/CI infrastructure), and maintainers. I am definitely not proposing to get rid of the email workflow -- on the contrary, this it the workflow I want to preserve! :-) The "workflows" mailing list was created for the purpose of discussing this topic (in the context of Linux kernel development) and right now there are many proposals that either completely cut out email or reduce it to something like pull requests. My proposal keeps almost everything the same, except for a few lines of extra metadata before the actual diff. (I will answer the rest of your email separately.) Vegard
On 10/17/19 3:11 PM, Theodore Y. Ts'o wrote: > On Thu, Oct 17, 2019 at 02:23:58PM +0200, Vegard Nossum wrote: >> Of course, this relies strongly on actually having (correct) sha1 >> references to previous versions inside the changelog. In my original >> idea, this reference would only appear inside the merge commit that >> binds the patchset together to minimise churn, although maybe it is >> feasible to also append it to each patch -- in that case, the "patchset" >> command from my first email is not sufficient to create a new version of >> a patchset. > > This also relies on the base of the commit actually being a public > SHA1. Sometimes developers will cherry-pick in a patch that they need > so that the kernel will actually *boot* (or otherwise fix problems > that have been fixed in other subsystems, but not yet landed in -rc2 > or -rc3). > > Of course, we could tell people that they should always create their > patches off of the last stable version (but then there may have been > changes pulled in via the last merge window that makes their patch not > apply), or they could be told to develop against -rc2 or -rc3, and > then cherry pick the required fix-up patches on top of -rc2 and -rc3, > but then they have to do a lot more rebuilding. > > So there are no perfect solutions here, and while in the ideal world, > -rc2 and -rc3 should be perfectly stable enough for developers so that > they never need to manually patch in stablization patches, we need to > live in the real world. I believe that Darrick told me that in the > previous development cycle, he had to wait until -rc4 before the tree > was stable enough for him to start building xfs patches on top > mainline. > > (This is also true for this development cycle if you enable > CONFIG_KMEMLEAK, although fortunately, the workaround that worked for > me was to just CONFIG_KMEMLEAK --- although of course, if I do have to > run a KMEMLEAK test run, I'll need to cherry-pick the fix which landed > this week on top of the ext4 git tree.) > > What this all might mean is that sometimes it will make sense to allow > the user to override the base commit so such stablization patches can > be elided. Of course, we could force the user to create a separate > branch and rebase, but can be quite painful and slow --- and they > won't be able to test the rebased branch anyway, unless we then want > to tell them to cherry pick the stablization patches on top, and then > remove them before running "git send-email". Good points. I suspect that you should almost always be able to find a good base revision to build and test your changes on. In your example, couldn't Darrick simply base his xfs work on the latest xfs branch that was pulled by Linus? That should be up to date with all things xfs without having any of the things that made Linus's tree not work for him. Otherwise, you could apply the stabilisation patches and then do your final testing in a branch that merges that with your patchset, like so: rc1 o -----> fixup A ------> fixup B ---->o merge (tested) (base) \ / \ / ---> patch 001 --> patch 002 -->o patchset (submitted) It does not seem too hard to me, and it should be pretty safe from a test-what-you-ship point of view assuming the fixups and your patches really are independent. I think the more difficult problem to solve might be how to ensure that the base commit is actually public/reachable when this is the intention. A bot watching the mailing list could always respond with a "Hey, I don't have that, could you rebase the series or push it somewhere?". But it would be even better if git could tell you when you're about to submit a patch. Maybe something like: git send-email --ensure-reachable-from [remote] rev^^.. In the worst case, I guess the base commit will just not be available -- the email will still have a sha1 on it, though, and which might still be usable as an identifier for the patch/patchset. If not, it's still not worse than the current workflow (which would still work). Vegard
On Thu, Oct 17, 2019 at 04:01:33PM +0200, Vegard Nossum wrote: > > In your example, couldn't Darrick simply base his xfs work on the latest > xfs branch that was pulled by Linus? That should be up to date with all > things xfs without having any of the things that made Linus's tree not > work for him. Sure, but sometimes there are changes in subsystems which the file system depends upon that were also merged by Linus. So for example, the ext4 branch might be based on v5.3-rc4, and gets pulled after v5.3 is released, along with a huge number of other subsystem trees, so the delta between v5.3 and v5.3-rc1 is ***huge***. So while I could base my development on my previous ext4.git branch (based off of v5.3-rc4), at *some* point I need to be able to sync up with upstream. And the usual way to do this is to start a new development branch based on (for example) v5.4-rc2, or in some cases v5.4-rc4. We could keep the development branch on based off of v5.3-rc4, and wait until things stablize, and *then* merge in v5.4-rcX, when v5.4-rcX is finally stable, but that makes for a more complex merge, and so it means that things like "git log origin..master" don't really work any more. So the preferred development practice is very much.... rc2 o --> patch 1 --> patch 2 --> ... --> patch N (origin) (master) Where the "master" branch gets merged into the rewinding "dev" branch (which works much like git's pu branch), and where the "master" branch is what Linus will merge at the next merge window. > Otherwise, you could apply the stabilisation patches and then do your > final testing in a branch that merges that with your patchset, like so: > > rc1 o -----> fixup A ------> fixup B ---->o merge (tested) > (base) \ / > \ / > ---> patch 001 --> patch 002 -->o patchset (submitted) I cloud do that, but remember that the checked out kernel tree is about a gigabyte (this assumes using git clone --shared, so it doesn't include the git pack files, and this is source only; the object files are another 2.6 GB). I could keep separate checked out trees, and separate build trees, but that burns a lot of disk/SSD space. Or I could switch back and forth by using "git checkout" between the development branch and the branch with the stablization patches, but then I'm constantly having to rebuild the object files, and ccache only helps so much. So it's much simpler to put the fixup patches at the on top of the origin, and then mail them out without having to play git branch rebasing gymnastics. When the patch series is finally ready to roll, then the maintainer will apply the patch series on a clean branch, since hopefully by then -rc3 or -rc4 is finally stable enough to use as an origin point. So the idea is that developer might be sending out revisions of their patches on top of -rc1 plus fixup patches, but then the final version of the patch series, after a few rounds of review, gets applied on top of -rc3 or -rc4. > I think the more difficult problem to solve might be how to ensure that > the base commit is actually public/reachable when this is the intention. > A bot watching the mailing list could always respond with a "Hey, I > don't have that, could you rebase the series or push it somewhere?". But > it would be even better if git could tell you when you're about to > submit a patch. Maybe something like: > > git send-email --ensure-reachable-from [remote] rev^^.. > > In the worst case, I guess the base commit will just not be available -- > the email will still have a sha1 on it, though, and which might still be > usable as an identifier for the patch/patchset. If not, it's still not > worse than the current workflow (which would still work). ... or what we can do is allow the developer to specify the intended base --- e.g., -rc1, even though his patchset was against "-rc1 plus fixups". - Ted
On Thu, 17 Oct 2019 16:01:33 +0200 Vegard Nossum <vegard.nossum@oracle.com> wrote: > In your example, couldn't Darrick simply base his xfs work on the latest > xfs branch that was pulled by Linus? That should be up to date with all > things xfs without having any of the things that made Linus's tree not > work for him. Sure, but why? I thought this whole exercise is to make the process easier. This seems to be making it more complex. Now we are going to be demanding submitters to be basing their work on a specific (older) commit. I always tell people that submit to me, to base off of one of Linus's latest tags. That's what I do. -- Steve
On Wed, Oct 16, 2019 at 10:45:19AM -0400, Santiago Torres Arias wrote: > Hi Willy, Vegard. > > On Wed, Oct 16, 2019 at 01:10:09PM +0200, Willy Tarreau wrote: > > Hi Vegard, > > > > On Wed, Oct 16, 2019 at 12:22:54PM +0200, Vegard Nossum wrote: > > > (cross-posted to git, LKML, and the kernel workflows mailing lists.) > > > > > > Hi all, > > > > > > I've been following Konstantin Ryabitsev's quest for better development > > > and communication tools for the kernel [1][2][3], and I would like to > > > propose a relatively straightforward idea which I think could bring a > > > lot to the table. > > > > > > Step 1: > > > > > > * git send-email needs to include parent SHA1s and generally all the > > > information needed to perfectly recreate the commit when applied so > > > that all the SHA1s remain the same > > > > > > * git am (or an alternative command) needs to recreate the commit > > > perfectly when applied, including applying it to the correct parent > > > > > > Having these two will allow a perfect mapping between email and git; > > > essentially email just becomes a transport for git. There are a lot of > > > advantages to this, particularly that you have a stable way to refer to > > > a patch or commit (despite it appearing on a mailing list), and there > > > is no need for "changeset IDs" or whatever, since you can just use the > > > git SHA1 which is unique, unambiguous, and stable. > > I wonder if it'd be also possible to then embed gpg signatures over > send-mail payloads so as they can be transparently transferred to the > commit. That's a crazy idea. It would be nice if we could do that, I like it :) greg k-h
On Thu, Oct 17, 2019 at 01:43:43PM -0700, Greg KH wrote: >> I wonder if it'd be also possible to then embed gpg signatures over >> send-mail payloads so as they can be transparently transferred to the >> commit. > >That's a crazy idea. It would be nice if we could do that, I like it >:) It could only possibly work if nobody ever adds their own "Signed-Off-By" or any other bylines. I expect this is a deal-breaker for most maintainers. -K
On Thu, Oct 17, 2019 at 04:45:32PM -0400, Konstantin Ryabitsev wrote: > On Thu, Oct 17, 2019 at 01:43:43PM -0700, Greg KH wrote: > > > I wonder if it'd be also possible to then embed gpg signatures over > > > send-mail payloads so as they can be transparently transferred to the > > > commit. > > > > That's a crazy idea. It would be nice if we could do that, I like it :) > > It could only possibly work if nobody ever adds their own "Signed-Off-By" or > any other bylines. I expect this is a deal-breaker for most maintainers. Yeah it is :( But, if we could just have the signature on the code change, not the changelog text, that would help with that issue. thanks, rgeg k-h
On Thu, Oct 17, 2019 at 06:30:29PM -0700, Greg KH wrote: >> It could only possibly work if nobody ever adds their own >> "Signed-Off-By" or >> any other bylines. I expect this is a deal-breaker for most maintainers. > >Yeah it is :( > >But, if we could just have the signature on the code change, not the >changelog text, that would help with that issue. We totally should, and I even mused on how we would do that here: https://public-inbox.org/git/20190910121324.GA6867@pure.paranoia.local/ However, since git's PGP signatures are made for the content in the actual commit record (tree hash, parent, author, commit message, etc), the only way we could preserve them between the email and the git tree is if we never modify any of that data. The SOB and other trailers would have to only be applied to the merge commit, or migrate into commit notes. -K
Vegard Nossum <vegard.nossum@oracle.com> wrote: <snip> > Disadvantages: > > - requires patching git The bigger disadvantage is this won't work with a historical patch series (and some folks stay on ancient git). But maybe that window for that is only a few years... The toughest part right now for public-inbox is trying to make sense of --range-diff (supporting --interdiff would be easy, I think...). Also, we've only had --range-diff for a year or so. Your proposal would make things 100% easier for public-inbox to deal with future --range-diff uses, however :) > - requires a bot to continuously create branches for patchsets sent to > mailing lists Not necessarily, being able to search on commit OIDs would be pretty handy itself for dealing with --range-diff output in public-inbox, so there's no real need to actually make the branch in git. I also have a parallel solution in the works to make --range-diff output more amenable for search engines like public-inbox by adding blob OIDs to its output: https://public-inbox.org/git/20191017121045.GA15364@dcvr/ I shall call myself an "SEO expert" from now on :> > Thoughts? Pretty much the same concerns others brought up around exactness and working on top of cherry-picks. > PS: Eric Wong described something that comes quite close to this idea, but > AFAICT without actually recreating commits exactly. I've included the link > for completeness. [4] > [4]: https://lore.kernel.org/workflows/20191008003931.y4rc2dp64gbhv5ju@dcvr/ My plan is to work on interdiff support in the next week or so once bugs are fixed and public-inbox v1.2 is out the door. Not sure about range-diff and reverse-mapping blobs -> trees -> commits, but searching on "git patch-id --stable" output is also on the table. PS: Attached patches: I have nothing against using MIME for those, (not speaking for anybody else). public-inbox needs to handle those better w.r.t search indexing linkification. And then I found some bugs for --reindex corner cases which I'm still working on :x
On Thu, Oct 17, 2019 at 09:54:47PM -0400, Konstantin Ryabitsev wrote: > On Thu, Oct 17, 2019 at 06:30:29PM -0700, Greg KH wrote: > > > It could only possibly work if nobody ever adds their own > > > "Signed-Off-By" or > > > any other bylines. I expect this is a deal-breaker for most maintainers. > > > > Yeah it is :( > > > > But, if we could just have the signature on the code change, not the > > changelog text, that would help with that issue. > > We totally should, and I even mused on how we would do that here: > https://public-inbox.org/git/20190910121324.GA6867@pure.paranoia.local/ > > However, since git's PGP signatures are made for the content in the actual > commit record (tree hash, parent, author, commit message, etc), the only way > we could preserve them between the email and the git tree is if we never > modify any of that data. The SOB and other trailers would have to only be > applied to the merge commit, or migrate into commit notes. There's also the possibility to handle this a bit like we do when adding comments before the SOB: a PGP signature would apply to the text *before* it only. We could then have long chains of SOB, PGP, SOB, PGP etc. Willy
On 10/18/19 4:52 AM, Willy Tarreau wrote: > On Thu, Oct 17, 2019 at 09:54:47PM -0400, Konstantin Ryabitsev wrote: >> On Thu, Oct 17, 2019 at 06:30:29PM -0700, Greg KH wrote: >>>> It could only possibly work if nobody ever adds their own >>>> "Signed-Off-By" or >>>> any other bylines. I expect this is a deal-breaker for most maintainers. >>> Yeah it is :( >>> >>> But, if we could just have the signature on the code change, not the >>> changelog text, that would help with that issue. >> We totally should, and I even mused on how we would do that here: >> https://public-inbox.org/git/20190910121324.GA6867@pure.paranoia.local/ >> >> However, since git's PGP signatures are made for the content in the actual >> commit record (tree hash, parent, author, commit message, etc), the only way >> we could preserve them between the email and the git tree is if we never >> modify any of that data. The SOB and other trailers would have to only be >> applied to the merge commit, or migrate into commit notes. > There's also the possibility to handle this a bit like we do when adding > comments before the SOB: a PGP signature would apply to the text *before* > it only. We could then have long chains of SOB, PGP, SOB, PGP etc. > > Willy I don't think it can work that easily as the signed content is not just the message. It would need git to support nesting signatures and to allow amending a commit without touching the signature and to allow adding one to cover the new content and to have a way to verify every step. Moreover you won't be able to reparent the commit as a maintainer (wich I think is also a deal-breaker) Nicolas
On 10/16/19 4:45 PM, Santiago Torres Arias wrote: > Hi Willy, Vegard. > > On Wed, Oct 16, 2019 at 01:10:09PM +0200, Willy Tarreau wrote: >> Hi Vegard, >> >> On Wed, Oct 16, 2019 at 12:22:54PM +0200, Vegard Nossum wrote: >>> (cross-posted to git, LKML, and the kernel workflows mailing lists.) >>> >>> Hi all, >>> >>> I've been following Konstantin Ryabitsev's quest for better development >>> and communication tools for the kernel [1][2][3], and I would like to >>> propose a relatively straightforward idea which I think could bring a >>> lot to the table. >>> >>> Step 1: >>> >>> * git send-email needs to include parent SHA1s and generally all the >>> information needed to perfectly recreate the commit when applied so >>> that all the SHA1s remain the same >>> >>> * git am (or an alternative command) needs to recreate the commit >>> perfectly when applied, including applying it to the correct parent >>> >>> Having these two will allow a perfect mapping between email and git; >>> essentially email just becomes a transport for git. There are a lot of >>> advantages to this, particularly that you have a stable way to refer to >>> a patch or commit (despite it appearing on a mailing list), and there >>> is no need for "changeset IDs" or whatever, since you can just use the >>> git SHA1 which is unique, unambiguous, and stable. > > I wonder if it'd be also possible to then embed gpg signatures over > send-mail payloads so as they can be transparently transferred to the > commit. > > -Santiago > I just played a bit with this and with my proposed patch for git-format-patch the signature is already part of the output: $ ./git-format-patch --complete HEAD^- 0001-format-patch-add-complete.patch $ cat 0001-format-patch-add-complete.patch From ac30b08065cd55362a7244a3bbc8df3563cefaaa Mon Sep 17 00:00:00 2001 From: Vegard Nossum <vegard.nossum@oracle.com> Date: Sat, 5 Oct 2019 16:15:59 +0200 Subject: [PATCH] format-patch: add --complete Include the raw commit data between the changelog and the diffstat. This will allow 'git am' to reconstruct the commit exactly to the point where the sha1 will be the same. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> --- commit ac30b08065cd55362a7244a3bbc8df3563cefaaa tree 8f09d9d6ed78f8617b2fe54fe9712990ba808546 parent 108b97dc372828f0e72e56bbb40cae8e1e83ece6 author Vegard Nossum <vegard.nossum@oracle.com> 1570284959 +0200 committer Vegard Nossum <vegard.nossum@oracle.com> 1571408340 +0200 gpgsig -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJdqcnVAAoJEAvO9Nj+mLpYPEMP/0qyUF6U9y6FMM3BQrjteGGY IEEwmkvfW8vMBdjXRjmSI1jxRUuW+xJs3kxezNuW79Gzkl63PlS3CdW50yLlWau6 2gU4R8oSNr7vxpgfAscELxaAuvUSp7Vb1FEPc5kPW06Sprg4PkLkUMD71ALRnGMV TxTVMDbMYg2xHpwBFs1ZyF2l0ElqOvRqoQqYfvRql1rgbs5LhF0RevkIN5xswj93 3Gz1CuB8MURX2lfglfYSTy/05Rx3w/QHxwNlbbPDtXrexySf+a70j/Z6i2/BIzR/ kxlZJ/k4ZPN931mxFcLPBsV/K51uP378oEH1QdaZyO2jz1rj+AZxXlgXe8J3ZAmt XYT/FMze5lukd7EQDO5vPZazp1dnJ6wnmrAd8shCSWe23vybDMCYnjXTuwAXwbA5 R7ffKxm3MwRn9LKsbHFiV0J8tS1/fHbOIEXQDJ+kFhKqys0RSXipDZU61LnogXaw 827TcsUYLvkYlQ+LdmSjZ537E+bUTo3Udb/UkGbgwSSm9LTjHnAI34S6dxSZ+1cl jBD54v8u9I1hEImWxGbXns7ET1fh17Z4PoTPpA4COt3puAQY7vB7inGY3/kWz+7z iRieHyD/W6lba4rqNYHBxacD4JTXN9S9Z7o6F4ijeGQThbA77RWD64SGjuJM0mC7 mGUqvHz0pn7zOl1ZOS26 =gCT0 -----END PGP SIGNATURE----- --- builtin/log.c | 12 ++++++++++++ log-tree.c | 17 +++++++++++++++++ revision.h | 3 ++- 3 files changed, 31 insertions(+), 1 deletion(-) diff --git a/builtin/log.c b/builtin/log.c index c4b35fdaf9..81c1164ae5 100644 --- a/builtin/log.c +++ b/builtin/log.c @@ -1545,6 +1545,7 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix) [...] There is no corresponding support in 'git am' yet, however. Seeing how large this signature is, I have to admit that I am partial to Konstantin's suggestion of using minisign. This seems like something that could be added to git as an alternative to gpg without too much trouble, I think. Vegard
On Fri, Oct 18, 2019 at 08:34:17AM +0200, Nicolas Belouin wrote: > On 10/18/19 4:52 AM, Willy Tarreau wrote: > > On Thu, Oct 17, 2019 at 09:54:47PM -0400, Konstantin Ryabitsev wrote: > >> On Thu, Oct 17, 2019 at 06:30:29PM -0700, Greg KH wrote: > >>>> It could only possibly work if nobody ever adds their own > >>>> "Signed-Off-By" or > >>>> any other bylines. I expect this is a deal-breaker for most maintainers. > >>> Yeah it is :( > >>> > >>> But, if we could just have the signature on the code change, not the > >>> changelog text, that would help with that issue. > >> We totally should, and I even mused on how we would do that here: > >> https://urldefense.proofpoint.com/v2/url?u=https-3A__public-2Dinbox.org_git_20190910121324.GA6867-40pure.paranoia.local_&d=DwICaQ&c=slrrB7dE8n7gBJbeO0g-IQ&r=yZMPY-APGKyVIX7HgQFZJA&m=-7NJMybpa_bV7Y1FxWmqo1cUHOsDXAsRR1vvpQmYhyI&s=iFHNwBfYAPr---qMdv0mvKQAxqjXxvf1mAiAYZG6DIE&e= > >> > >> However, since git's PGP signatures are made for the content in the actual > >> commit record (tree hash, parent, author, commit message, etc), the only way > >> we could preserve them between the email and the git tree is if we never > >> modify any of that data. The SOB and other trailers would have to only be > >> applied to the merge commit, or migrate into commit notes. > > There's also the possibility to handle this a bit like we do when adding > > comments before the SOB: a PGP signature would apply to the text *before* > > it only. We could then have long chains of SOB, PGP, SOB, PGP etc. > > > > Willy > > I don't think it can work that easily as the signed content is not just > the message. > It would need git to support nesting signatures and to allow amending a > commit without > touching the signature and to allow adding one to cover the new content > and to have a > way to verify every step. > Moreover you won't be able to reparent the commit as a maintainer (wich > I think is > also a deal-breaker) For reference, we did something similar here[1]. I'll acknowledge it's somewhat of a niche use, and there's a danger with multiple signature types that could mean many different things... I do wonder if an over-lying tool could probably provide with more granular verification over mutiple gpg payloads inside of a commit... Cheers! -Santiago. [1] https://dl.acm.org/citation.cfm?id=3196523
> Seeing how large this signature is, I have to admit that I am partial to > Konstantin's suggestion of using minisign. This seems like something > that could be added to git as an alternative to gpg without too much > trouble, I think. > > I wonder how big the pgp payload would be with ed25519 as the underlying algorithm. AFAICT, the payload of a minisign signature vs a signature packet have almost the same fields...
On Fri, Oct 18, 2019 at 11:54:09AM -0400, Santiago Torres Arias wrote: >> Seeing how large this signature is, I have to admit that I am partial to >> Konstantin's suggestion of using minisign. This seems like something >> that could be added to git as an alternative to gpg without too much >> trouble, I think. > >I wonder how big the pgp payload would be with ed25519 as the underlying >algorithm. AFAICT, the payload of a minisign signature vs a signature >packet have almost the same fields... It's smaller, but it's not a one-liner. Here's a comparison using ED25519 keys of the same length: minisign: RWQ4kF9UdFgeSt3LqnS3WnrLlx2EnuIFW7euw5JnLUHY/79ipftmj7A2ug7FiR2WmnFNoSacWr7llBuyInVmRL/VRovj1LFtvA0= pgp: -----BEGIN PGP SIGNATURE----- iHUEARYIAB0WIQR2vl2yUnHhSB5njDW2xBzjVmSZbAUCXaniFAAKCRC2xBzjVmSZ bHA5AP46sSPFJfL2tbXwswvj0v2DjLAQ9doxl9bfj9iPZu+3qwEAw5qAMbjw9teL L7+NbJ0WVniDWTgt+5ruQ2V9vyfYxAc= =B/St -----END PGP SIGNATURE----- -K
On Fri, Oct 18, 2019 at 12:03:43PM -0400, Konstantin Ryabitsev wrote: > On Fri, Oct 18, 2019 at 11:54:09AM -0400, Santiago Torres Arias wrote: > > > Seeing how large this signature is, I have to admit that I am partial to > > > Konstantin's suggestion of using minisign. This seems like something > > > that could be added to git as an alternative to gpg without too much > > > trouble, I think. > > > > I wonder how big the pgp payload would be with ed25519 as the underlying > > algorithm. AFAICT, the payload of a minisign signature vs a signature > > packet have almost the same fields... > > It's smaller, but it's not a one-liner. Here's a comparison using ED25519 > keys of the same length: > > minisign: > > RWQ4kF9UdFgeSt3LqnS3WnrLlx2EnuIFW7euw5JnLUHY/79ipftmj7A2ug7FiR2WmnFNoSacWr7llBuyInVmRL/VRovj1LFtvA0= > > pgp: > > -----BEGIN PGP SIGNATURE----- > > iHUEARYIAB0WIQR2vl2yUnHhSB5njDW2xBzjVmSZbAUCXaniFAAKCRC2xBzjVmSZ > bHA5AP46sSPFJfL2tbXwswvj0v2DjLAQ9doxl9bfj9iPZu+3qwEAw5qAMbjw9teL > L7+NbJ0WVniDWTgt+5ruQ2V9vyfYxAc= > =B/St Yeah, the discrepancy mostly comes from pgp embedding a timestamp and a longer keyid (+a full keyid fingerprint in pgp 2.1+). Minisign keyids are 8 random bytes, apparently. It doesn't seem like an amazing win in terms of succintness, imvho... Cheers! -Santiago.
On Fri, Oct 18, 2019 at 04:27:48PM +0200, Vegard Nossum wrote: > commit ac30b08065cd55362a7244a3bbc8df3563cefaaa > tree 8f09d9d6ed78f8617b2fe54fe9712990ba808546 > parent 108b97dc372828f0e72e56bbb40cae8e1e83ece6 > author Vegard Nossum <vegard.nossum@oracle.com> 1570284959 +0200 > committer Vegard Nossum <vegard.nossum@oracle.com> 1571408340 +0200 > gpgsig -----BEGIN PGP SIGNATURE----- ... Would it perhaps be possible to put some or all of these headers after the patch, as a set of "trailers"? That would make it easier for human readers of the e-mail to get the bits that they most care about.... namely, the patch itself. :-) If we move the PGP signature to the end, then the fact that it is so big and bulky becomes much less of an issue. A mini-sig might still be a cool thing, from a space savings perspective both in the mail archives, and in the git repo itself, if we start signing all commits. But that seems like a separable issue. Thanks, - Ted
On 10/18/19 6:15 PM, Theodore Y. Ts'o wrote: > On Fri, Oct 18, 2019 at 04:27:48PM +0200, Vegard Nossum wrote: >> commit ac30b08065cd55362a7244a3bbc8df3563cefaaa >> tree 8f09d9d6ed78f8617b2fe54fe9712990ba808546 >> parent 108b97dc372828f0e72e56bbb40cae8e1e83ece6 >> author Vegard Nossum <vegard.nossum@oracle.com> 1570284959 +0200 >> committer Vegard Nossum <vegard.nossum@oracle.com> 1571408340 +0200 >> gpgsig -----BEGIN PGP SIGNATURE----- > ... > > Would it perhaps be possible to put some or all of these headers after > the patch, as a set of "trailers"? That would make it easier for > human readers of the e-mail to get the bits that they most care > about.... namely, the patch itself. :-) > Yes, agreed. I started out using this approach, but I changed it because the implementation was a bit annoying: 'git am' runs 'git mailsplit', which just splits the email into two parts: 1) headers, changelog, and diffstat; 2) diff and signature. One of my PoC patches changes mailsplit to split the extra metadata into a third file. The problem I ran into with putting the metadata at the end was detecting where the diff ends. A comment in 'git apply' suggested that detecting the difference between "--" as a diff/signature separator and as part of the diff is nontrivial in the sense that you need to actually do some parsing and keep track of hunk sizes. I can try to put it at the end, but maybe the git people have some hints that would make the implementation easier? Is it okay to reimplement a simple diff parser in mailsplit? Thanks, Vegard
On Fri, Oct 18, 2019 at 12:11:22PM -0400, Santiago Torres Arias wrote: >> It's smaller, but it's not a one-liner. Here's a comparison using >> ED25519 >> keys of the same length: >> >> minisign: >> >> RWQ4kF9UdFgeSt3LqnS3WnrLlx2EnuIFW7euw5JnLUHY/79ipftmj7A2ug7FiR2WmnFNoSacWr7llBuyInVmRL/VRovj1LFtvA0= >> >> pgp: >> >> -----BEGIN PGP SIGNATURE----- >> >> iHUEARYIAB0WIQR2vl2yUnHhSB5njDW2xBzjVmSZbAUCXaniFAAKCRC2xBzjVmSZ >> bHA5AP46sSPFJfL2tbXwswvj0v2DjLAQ9doxl9bfj9iPZu+3qwEAw5qAMbjw9teL >> L7+NbJ0WVniDWTgt+5ruQ2V9vyfYxAc= >> =B/St > >Yeah, the discrepancy mostly comes from pgp embedding a timestamp and a >longer keyid (+a full keyid fingerprint in pgp 2.1+). Minisign keyids >are 8 random bytes, apparently. > >It doesn't seem like an amazing win in terms of succintness, imvho... There isn't, but ED25519 subkeys are still very rare among developers. Many have 4096-bit RSA subkeys, and you can imagine how large the sigs from those are. I want to underline that my use of minisign was specifically for patches sent via email, without the intent of preserving them in git history (which is why in my proposal they are put under the `---` cutoff). Git itself would continue to use PGP signing. (This also means that we don't necessarily need to make this a native part of git -- it can be accomplished by a combination of wrappers, git-format-patch parameters, and a pre-applypatch hook. However, the likelihood of adoption in this case would be very low.) -K
On Fri, Oct 18, 2019 at 06:50:51PM +0200, Vegard Nossum wrote: > I started out using this approach, but I changed it because the > implementation was a bit annoying: 'git am' runs 'git mailsplit', > which just splits the email into two parts: > > 1) headers, changelog, and diffstat; > 2) diff and signature. > > One of my PoC patches changes mailsplit to split the extra metadata into > a third file. > > The problem I ran into with putting the metadata at the end was > detecting where the diff ends. A comment in 'git apply' suggested that > detecting the difference between "--" as a diff/signature separator and > as part of the diff is nontrivial in the sense that you need to actually > do some parsing and keep track of hunk sizes. Could we cheat by having "git format-patch" add a "Diff-size" in the header which gives the number of lines in the diff so git am can just count lines to find the Trailer section? Thanks, - Ted
On Fri, Oct 18, 2019 at 03:14:56PM -0400, Theodore Y. Ts'o wrote: > On Fri, Oct 18, 2019 at 06:50:51PM +0200, Vegard Nossum wrote: > > The problem I ran into with putting the metadata at the end was > > detecting where the diff ends. A comment in 'git apply' suggested that > > detecting the difference between "--" as a diff/signature separator and > > as part of the diff is nontrivial in the sense that you need to actually > > do some parsing and keep track of hunk sizes. > > Could we cheat by having "git format-patch" add a "Diff-size" in the > header which gives the number of lines in the diff so git am can just > count lines to find the Trailer section? Be careful with this, it starts like this and ends up with non-editable patches. I'd rather have git-am use best-effort detection of the end. Also when dealing with stable backports, I've done a lot of "cat foo.diff >> bar.patch" to fixup some patches in which I just had to move some parts around. Having to count lines and edit a counter somewhere is going to become really painful. Just my two cents, Willy
On Thu, Oct 17, 2019 at 06:30:29PM -0700, Greg KH wrote: > On Thu, Oct 17, 2019 at 04:45:32PM -0400, Konstantin Ryabitsev wrote: > > On Thu, Oct 17, 2019 at 01:43:43PM -0700, Greg KH wrote: > >>> I wonder if it'd be also possible to then embed gpg signatures over > >>> send-mail payloads so as they can be transparently transferred to the > >>> commit. > >> > >> That's a crazy idea. It would be nice if we could do that, I like it :) > > > > It could only possibly work if nobody ever adds their own "Signed-Off-By" or > > any other bylines. I expect this is a deal-breaker for most maintainers. > > Yeah it is :( > > But, if we could just have the signature on the code change, not the > changelog text, that would help with that issue. I ran into a related issue recently when thinking about how to implement server-side workflows (for a non-kernel project). My goal is to ensure a patch can only be pushed to the master branch if it has received review. The easy way to do so it to check the Reviewed-by tags, but those can easily be forged. I was thus wondering if we should have a way to sign tags (as in commit message tags, not git tags).
On 10/20/19 5:17 AM, Willy Tarreau wrote: > On Fri, Oct 18, 2019 at 03:14:56PM -0400, Theodore Y. Ts'o wrote: >> On Fri, Oct 18, 2019 at 06:50:51PM +0200, Vegard Nossum wrote: >>> The problem I ran into with putting the metadata at the end was >>> detecting where the diff ends. A comment in 'git apply' suggested that >>> detecting the difference between "--" as a diff/signature separator and >>> as part of the diff is nontrivial in the sense that you need to actually >>> do some parsing and keep track of hunk sizes. >> >> Could we cheat by having "git format-patch" add a "Diff-size" in the >> header which gives the number of lines in the diff so git am can just >> count lines to find the Trailer section? > > Be careful with this, it starts like this and ends up with non-editable > patches. I'd rather have git-am use best-effort detection of the end. Expect filesystem developers to come up with a format that uses extents ;-) > Also when dealing with stable backports, I've done a lot of > "cat foo.diff >> bar.patch" to fixup some patches in which I just had > to move some parts around. Having to count lines and edit a counter > somewhere is going to become really painful. I almost have some new patches ready for putting the metadata after the patch using a very bare-bones diff parser (it's actually not that bad), I just need to fix a few corner cases that are causing breakage in the git test suite. Vegard
On 10/20/19 8:28 AM, Vegard Nossum wrote: > > On 10/20/19 5:17 AM, Willy Tarreau wrote: >> On Fri, Oct 18, 2019 at 03:14:56PM -0400, Theodore Y. Ts'o wrote: >>> On Fri, Oct 18, 2019 at 06:50:51PM +0200, Vegard Nossum wrote: >>>> The problem I ran into with putting the metadata at the end was >>>> detecting where the diff ends. A comment in 'git apply' suggested that >>>> detecting the difference between "--" as a diff/signature separator and >>>> as part of the diff is nontrivial in the sense that you need to >>>> actually >>>> do some parsing and keep track of hunk sizes. >>> >>> Could we cheat by having "git format-patch" add a "Diff-size" in the >>> header which gives the number of lines in the diff so git am can just >>> count lines to find the Trailer section? >> >> Be careful with this, it starts like this and ends up with non-editable >> patches. I'd rather have git-am use best-effort detection of the end. > > Expect filesystem developers to come up with a format that uses extents ;-) > >> Also when dealing with stable backports, I've done a lot of >> "cat foo.diff >> bar.patch" to fixup some patches in which I just had >> to move some parts around. Having to count lines and edit a counter >> somewhere is going to become really painful. > > I almost have some new patches ready for putting the metadata after the > patch using a very bare-bones diff parser (it's actually not that bad), > I just need to fix a few corner cases that are causing breakage in the > git test suite. I sent v2 of the patches (with metadata _after_ the diff) to the git list here: https://public-inbox.org/git/20191022114518.32055-1-vegard.nossum@oracle.com/T/#u As I wrote in there, we could already today start using git am --message-id when applying patches and this would provide something that a bot could annotate with git notes pointing to lore/LKML/LWN/whatever. I think that would already be a pretty nice improvement over today's situation. Sadly, since the beginning of 2018, this was only used for a measly ~0.14% of all non-merge commits in the kernel: $ git rev-list --count --no-merges --since='2018-01-01' --grep 'Message-Id: ' linus/master 178 $ git rev-list --count --no-merges --since='2018-01-01' linus/master 130777 So how can we spread the word about --message-id and get maintainers to actually use it? I don't suppose it's reasonable to change the 'git am' default setting? Vegard
On Tue, Oct 22, 2019 at 02:11:22PM +0200, Vegard Nossum wrote: > > As I wrote in there, we could already today start using > > git am --message-id > > when applying patches and this would provide something that a bot could > annotate with git notes pointing to lore/LKML/LWN/whatever. I think that > would already be a pretty nice improvement over today's situation. > > Sadly, since the beginning of 2018, this was only used for a measly > ~0.14% of all non-merge commits in the kernel: > > $ git rev-list --count --no-merges --since='2018-01-01' --grep 'Message-Id: > ' linus/master > 178 You might also want to count commits which have a link tag with a Message-Id: Link: https://lore.kernel.org/r/c3438dad66a34a7d4e7509a5dd64c2326340a52a.1571647180.git.mbobrowski@mbobrowski.org That's because some kernel developers have been using a hook script like this: #!/bin/sh # For .git/hooks/applypatch-msg # # You must have the following in .git/config: # [am] # messageid = true . git-sh-setup perl -pi -e 's|^Message-Id:\s*<?([^>]+)>?$|Link: https://lore.kernel.org/r/$1|g;' "$1" test -x "$GIT_DIR/hooks/commit-msg" && exec "$GIT_DIR/hooks/commit-msg" ${1+"$@"} : .... as we had reached rough consensus that this was the best way to incorprate the message id (since it could made to be a clickable link in tools like gitk, for example). This rough consensus has only been in place since around the time of the Maintainer's Summit in Lisbon, so uptake is still probably a bit slow. I'd expect to see a lot more of this in the next merge window, though. - Ted
On 10/22/19 3:53 PM, Theodore Y. Ts'o wrote: > On Tue, Oct 22, 2019 at 02:11:22PM +0200, Vegard Nossum wrote: >> >> As I wrote in there, we could already today start using >> >> git am --message-id >> >> when applying patches and this would provide something that a bot could >> annotate with git notes pointing to lore/LKML/LWN/whatever. I think that >> would already be a pretty nice improvement over today's situation. >> >> Sadly, since the beginning of 2018, this was only used for a measly >> ~0.14% of all non-merge commits in the kernel: >> >> $ git rev-list --count --no-merges --since='2018-01-01' --grep 'Message-Id: >> ' linus/master >> 178 > > You might also want to count commits which have a link tag with a > Message-Id: > > Link: https://lore.kernel.org/r/c3438dad66a34a7d4e7509a5dd64c2326340a52a.1571647180.git.mbobrowski@mbobrowski.org > > That's because some kernel developers have been using a hook script like this: > > #!/bin/sh > # For .git/hooks/applypatch-msg > # > # You must have the following in .git/config: > # [am] > # messageid = true > > . git-sh-setup > perl -pi -e 's|^Message-Id:\s*<?([^>]+)>?$|Link: https://lore.kernel.org/r/$1|g;' "$1" > test -x "$GIT_DIR/hooks/commit-msg" && > exec "$GIT_DIR/hooks/commit-msg" ${1+"$@"} > : > > .... as we had reached rough consensus that this was the best way to > incorprate the message id (since it could made to be a clickable link > in tools like gitk, for example). This rough consensus has only been > in place since around the time of the Maintainer's Summit in Lisbon, > so uptake is still probably a bit slow. I'd expect to see a lot more > of this in the next merge window, though. Thanks, I was not aware of this! Seems like something that should go in Documentation/maintainer/, right? The figure is much better, 16.7% on all non-merges since 2018-01-01. This should help and we can maybe already do some interesting things with git notes and lore/public-inbox. Vegard
Vegard Nossum <vegard.nossum@oracle.com> wrote: > I sent v2 of the patches (with metadata _after_ the diff) to the git > list here: > > https://public-inbox.org/git/20191022114518.32055-1-vegard.nossum@oracle.com/T/#u > > As I wrote in there, we could already today start using > > git am --message-id > > when applying patches and this would provide something that a bot could > annotate with git notes pointing to lore/LKML/LWN/whatever. I think that > would already be a pretty nice improvement over today's situation. > > Sadly, since the beginning of 2018, this was only used for a measly > ~0.14% of all non-merge commits in the kernel: --message-id helps provide a concrete reference, yes. However, being able to search for commit subjects in the mail archives is already implemented via cgit filter. An example is here: https://80x24.org/mirrors/git.git/commit/?id=8da56a484800023a545d7a7c022473f5aa9e720f The link at "userdiff: fix some corner cases in dts regex" makes a link to: https://public-inbox.org/git/?x=t&q=%22userdiff:+fix+some+corner+cases+in+dts+regex%22 (side note: not sure if that "x=t" to expand the whole message is good...) That link is generated by examples/cgit-commit-filter.lua in the public-inbox source: https://public-inbox.org/meta/1677253/s/?b=examples/cgit-commit-filter.lua My longer term plan is to be able to use the post-image blob OIDs from cgit to generate a search query for public-inbox such as: https://public-inbox.org/git/?q=dfpost:afc6b5b404+dfpost:072d58b69d+dfpost:4353b8220c+dfpost:333a625c70+dfpost:e187d356f6 Which finds all versions of the userdiff patch posted. But AFAIK there's no easy way to get at blob OIDs from cgit to a Lua filter...
From 3120370db888889f32e07a082edb4722db8feef1 Mon Sep 17 00:00:00 2001 From: Vegard Nossum <vegard.nossum@oracle.com> Date: Wed, 16 Oct 2019 02:36:18 +0200 Subject: [PATCH 3/3] am: add --exact This uses exact metadata when creating the commit object, hopefully reconstructing the commit with the exact same SHA1. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> --- commit 3120370db888889f32e07a082edb4722db8feef1 tree 61b7556f06fd6fcb0f4a43940ec0cbc29ccf1bcc parent 51bb531eb57320caf3761680ebf77c25b89b3719 author Vegard Nossum <vegard.nossum@oracle.com> 1571186178 +0200 committer Vegard Nossum <vegard.nossum@oracle.com> 1571219301 +0200 --- builtin/am.c | 103 ++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 94 insertions(+), 9 deletions(-) diff --git a/builtin/am.c b/builtin/am.c index 4190383bba..069a625895 100644 --- a/builtin/am.c +++ b/builtin/am.c @@ -118,6 +118,7 @@ struct am_state { int allow_rerere_autoupdate; const char *sign_commit; int rebasing; + int exact; }; /** @@ -399,6 +400,8 @@ static void am_load(struct am_state *state) state->rebasing = !!file_exists(am_path(state, "rebasing")); + state->exact = read_state_file(&sb, state, "exact", 1); + strbuf_release(&sb); } @@ -1005,6 +1008,8 @@ static void am_setup(struct am_state *state, enum patch_format patch_format, else write_state_text(state, "applying", ""); + write_state_bool(state, "exact", state->exact); + if (!get_oid("HEAD", &curr_head)) { write_state_text(state, "abort-safety", oid_to_hex(&curr_head)); if (!state->rebasing) @@ -1548,19 +1553,88 @@ static int fall_back_threeway(const struct am_state *state, const char *index_pa */ static void do_commit(const struct am_state *state) { + struct object_id meta_commit = {}; + struct object_id meta_tree = {}; + struct object_id tree, parent, commit; const struct object_id *old_oid; struct commit_list *parents = NULL; const char *reflog_msg, *author; struct strbuf sb = STRBUF_INIT; + if (state->exact) { + /* + * Scan meta file for parents + other data + */ + + struct strbuf line = STRBUF_INIT; + FILE *fp = xfopen(am_path(state, "meta"), "r"); + + while (!strbuf_getline_lf(&line, fp)) { + const char *rest; + + if (skip_prefix(line.buf, "commit ", &rest)) { + if (get_oid_hex(rest, &meta_commit)) + die("invalid exact metadata (commit)"); + } else if (skip_prefix(line.buf, "tree ", &rest)) { + if (get_oid_hex(rest, &meta_tree)) + die("invalid exact metadata (tree)"); + } else if (skip_prefix(line.buf, "parent ", &rest)) { + if (get_oid_hex(rest, &parent)) + die("invalid exact metadata (parent)"); + + commit_list_insert(lookup_commit(the_repository, &parent), &parents); + } else if (skip_prefix(line.buf, "author ", &rest)) { + author = strdup(rest); + } else if (skip_prefix(line.buf, "committer ", &rest)) { + char *name_copy; + char *email; + char *email_copy; + char *date; + + email = strstr(rest, " <"); + if (!email) + die("invalid exact metadata (committer name)"); + + name_copy = xstrndup(rest, email - rest); + email += 2; + setenv("GIT_COMMITTER_NAME", name_copy, 1); + free(name_copy); + + date = strstr(email, "> "); + if (!date) + die("invalid exact metadata (committer email)"); + + email_copy = xstrndup(email, date - email); + date += 2; + setenv("GIT_COMMITTER_EMAIL", email_copy, 1); + free(email_copy); + + setenv("GIT_COMMITTER_DATE", date, 1); + } else if (line.len == 0) { + break; + } else { + die("unknown exact metadata: %.*s", line.len, line.buf); + } + } + + fclose(fp); + } + if (run_hook_le(NULL, "pre-applypatch", NULL)) exit(1); if (write_cache_as_tree(&tree, 0, NULL)) die(_("git write-tree failed to write a tree")); - if (!get_oid_commit("HEAD", &parent)) { + if (state->exact && !oideq(&tree, &meta_tree)) + die("tree mismatch"); + + if (state->exact) { + /* + * Already got parents above. + */ + } else if (!get_oid_commit("HEAD", &parent)) { old_oid = &parent; commit_list_insert(lookup_commit(the_repository, &parent), &parents); @@ -1569,19 +1643,28 @@ static void do_commit(const struct am_state *state) say(state, stderr, _("applying to an empty history")); } - author = fmt_ident(state->author_name, state->author_email, - WANT_AUTHOR_IDENT, - state->ignore_date ? NULL : state->author_date, - IDENT_STRICT); - - if (state->committer_date_is_author_date) - setenv("GIT_COMMITTER_DATE", - state->ignore_date ? "" : state->author_date, 1); + if (state->exact) { + /* + * Already got author above. + */ + } else { + author = fmt_ident(state->author_name, state->author_email, + WANT_AUTHOR_IDENT, + state->ignore_date ? NULL : state->author_date, + IDENT_STRICT); + + if (state->committer_date_is_author_date) + setenv("GIT_COMMITTER_DATE", + state->ignore_date ? "" : state->author_date, 1); + } if (commit_tree(state->msg, state->msg_len, &tree, parents, &commit, author, state->sign_commit)) die(_("failed to write commit object")); + if (state->exact && !oideq(&commit, &meta_commit)) + die("sha1 mismatch"); + reflog_msg = getenv("GIT_REFLOG_ACTION"); if (!reflog_msg) reflog_msg = "am"; @@ -2182,6 +2265,8 @@ int cmd_am(int argc, const char **argv, const char *prefix) 0, PARSE_OPT_NONEG), OPT_BOOL('c', "scissors", &state.scissors, N_("strip everything before a scissors line")), + OPT_BOOL('e', "exact", &state.exact, + N_("preserve exact metadata, including sha1")), OPT_PASSTHRU_ARGV(0, "whitespace", &state.git_apply_opts, N_("action"), N_("pass it through git-apply"), 0), -- 2.23.0.718.g3120370db8