Message ID | pull.1489.git.1678111598.gitgitgadget@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | ahead-behind: new builtin for counting multiple commit ranges | expand |
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > These numbers can be computed by 'git rev-list --count B..C' and 'git > rev-list --count C..B', but there are common needs that benefit from having > the checks being done in the same process: This makes readers wonder if "git rev-list --count B...C" should be the end-user facing UI for this new feature, perhaps? Of course if you are checking how C0, C1, C2,... relate to a single B, the existing rev-list syntax would not work, and makes a totally new subcommand a possibilty. > 2. When a branch is updated, a background job checks if any pull requests > that target that branch should be closed because their branches were > merged implicitly by that update. These queries can e batched into 'git > ahead-behind' calls. > > In that second example, we don't need the full ahead/behind counts (although > it is sufficient to look for branches that are "zero commits ahead", meaning > they are reachable from the base), so this builtin has an extra '--contains' > mode that only checks reachability from the base to each of the tips. 'git > ahead-behind --contains' is sort of the reverse of 'git branch --contains'. I thought that the reverse of "git branch --contains" was "git branch --merged". "git branch --merged maint ??/\*" is how I cull topic branches that have already served their purpose. Isn't closing pull requests because they have been already merged the same idea? "git for-each-ref --merged main refs/pull/\*" or something, perhaps? All of the above are from only reading the cover letter. I'm sure I'll have more thoughts or even change my mind after reading the patches. Thanks.
On 3/6/2023 1:26 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> These numbers can be computed by 'git rev-list --count B..C' and 'git >> rev-list --count C..B', but there are common needs that benefit from having >> the checks being done in the same process: > > This makes readers wonder if "git rev-list --count B...C" should be > the end-user facing UI for this new feature, perhaps? > > Of course if you are checking how C0, C1, C2,... relate to a single > B, the existing rev-list syntax would not work, and makes a totally > new subcommand a possibilty. > >> 2. When a branch is updated, a background job checks if any pull requests >> that target that branch should be closed because their branches were >> merged implicitly by that update. These queries can e batched into 'git >> ahead-behind' calls. >> >> In that second example, we don't need the full ahead/behind counts (although >> it is sufficient to look for branches that are "zero commits ahead", meaning >> they are reachable from the base), so this builtin has an extra '--contains' >> mode that only checks reachability from the base to each of the tips. 'git >> ahead-behind --contains' is sort of the reverse of 'git branch --contains'. > > I thought that the reverse of "git branch --contains" was "git > branch --merged". "git branch --merged maint ??/\*" is how I cull > topic branches that have already served their purpose. > > Isn't closing pull requests because they have been already merged > the same idea? "git for-each-ref --merged main refs/pull/\*" or > something, perhaps? You are definitely on to something, and I was not aware of --merged as an option to either of these. 'git branch --merged' has some limitations that tags cannot be used. 'git for-each-ref --merged' is probably sufficient. The only difference being that it would be nice to specify the matching refs over stdin with --stdin to avoid long argument lists. With this in mind, I can update the performance test to look like this (after updating the setup step to add branches for each line in 'refs') test_perf 'batch reachability: git ahead-behind --contains' ' git ahead-behind --contains --base=HEAD --stdin <refs ' test_perf 'batch reachability: git branch --merged' ' xargs git branch --merged=HEAD <branches ' test_perf 'batch reachability: git for-each-ref --merged' ' xargs git for-each-ref --merged=HEAD <refs ' And get decent results on all cases with the Linux kernel repository: Test this tree ------------------------------------------------------------------------- 1500.2: ahead-behind counts: git ahead-behind 0.26(0.24+0.01) 1500.3: ahead-behind counts: git rev-list 4.46(3.91+0.54) 1500.4: batch reachability: git ahead-behind --contains 0.02(0.01+0.01) 1500.5: batch reachability: git branch --merged 0.14(0.13+0.00) 1500.6: batch reachability: git for-each-ref --merged 0.14(0.13+0.00) So, there is benefit in using this tips_reachable_from_base() method in the two existing 'git (branch|for-each-ref) --merged' computations. The API boundary selected in this series might not be the most appropriate for those builtins, so let's kick out patch 8 from this series for now and I'll revisit it separately. Thanks, -Stolee
Derrick Stolee <derrickstolee@github.com> writes: > 'git for-each-ref --merged' is probably sufficient. The only difference > being that it would be nice to specify the matching refs over stdin > with --stdin to avoid long argument lists. Yeah, if you have a list of concrete refs maintained externally (as opposed to the example I responded to, where you generate the refs by telling for-each-ref what pattern they should match), having --stdin would be a good thing. > So, there is benefit in using this tips_reachable_from_base() method in > the two existing 'git (branch|for-each-ref) --merged' computations. The > API boundary selected in this series might not be the most appropriate > for those builtins, so let's kick out patch 8 from this series for now > and I'll revisit it separately. Yup, if the reachability API refactoring in these patches can also help the "for-each-ref" listing (which "git branch" and "git tag" bases their listing behaviour), it would be very good. Thanks.
On Mon, Mar 06, 2023 at 02:06:30PM +0000, Derrick Stolee via GitGitGadget wrote: > This series introduces the 'git ahead-behind' builtin, which has been used > at $DAYJOB for many years, but took many forms before landing on the current > version. Thanks for a helpful summary of all of the details here. I am of course familiar with your use of this builtin at our common $DAYJOB, but it was nice to see a from-scratch explanation of what you're trying to do here. Now let's take a look at how the patches came together... ;-). Thanks, Taylor
On Mon, Mar 06, 2023 at 10:26:26AM -0800, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > These numbers can be computed by 'git rev-list --count B..C' and 'git > > rev-list --count C..B', but there are common needs that benefit from having > > the checks being done in the same process: > > This makes readers wonder if "git rev-list --count B...C" should be > the end-user facing UI for this new feature, perhaps? > > Of course if you are checking how C0, C1, C2,... relate to a single > B, the existing rev-list syntax would not work, and makes a totally > new subcommand a possibilty. Yeah. You could imagine that `rev-list --count` might do something fancy like coalescing git rev-list --count B...C1 B...C2 B...C3 into a single walk. But I am not sure that just because `rev-list --count` provides similar functionality that we should fold in the proposed `ahead-behind` interface into that flag. On the other hand, I could see a compelling argument for a slightly different syntax (maybe `--count-ahead-behind` or `--count=ahead-behind`) that would fold this functionality into `rev-list`. And that is the sort of thing that we would want to settle on sooner rather than later, since it's fairly baked in once we decide one way or another and then merge this up. My personal feeling is that we ought to avoid (further) overloading `rev-list` absent of a compelling reason to do so. But I am definitely open to other thoughts here. Thanks, Taylor
On Mon, Mar 06, 2023 at 07:36:23PM -0500, Taylor Blau wrote: > > This makes readers wonder if "git rev-list --count B...C" should be > > the end-user facing UI for this new feature, perhaps? > > > > Of course if you are checking how C0, C1, C2,... relate to a single > > B, the existing rev-list syntax would not work, and makes a totally > > new subcommand a possibilty. > > Yeah. You could imagine that `rev-list --count` might do something > fancy like coalescing > > git rev-list --count B...C1 B...C2 B...C3 > > into a single walk. But I am not sure that just because `rev-list > --count` provides similar functionality that we should fold in the > proposed `ahead-behind` interface into that flag. It does coalesce all of that into a single walk. The problem is somewhat the opposite: it only has a notion of two "sides" for a symmetric traversal: left and right. But in your example there are many sides, and we have to remember which is which. I think getting the answer from one walk would require an arbitrary number of bits to paint down each path. Certainly the ahead-behind that Vicent and I wrote long ago didn't do that (IIRC it mostly relied on doing multiple traversals in the same process, which amortized the cost of commit parsing; that's not really an issue these days with commit graphs). Peeking at patch 7 of Stolee's series...yep. That's exactly what it does. :) I wondered how much it would matter on top of a naive loop of single-traversals, now that we have commit graphs. It looks like there's still quite a nice speedup from the numbers in patch 7 (though the totally naive "loop of rev-list" is incurring extra startup overhead, too). > My personal feeling is that we ought to avoid (further) overloading > `rev-list` absent of a compelling reason to do so. But I am definitely > open to other thoughts here. So I think this actually is what "git rev-list --left-right --count old...new" does now. But extending it to multiple sets in one traversal means you need: - being able to ask for individual left-right markers for each pair, not treating all lefts and all rights together - don't stop traversing when you hit an UNINTERESTING commit if there are still bits to paint. In a single-pair traversal, those two are the same thing (we stop at the merge base), but with multiple pairs you may have to keep walking past a commit that is excluded from one pair, but not another. This _might_ be doable if you assume all of the left-hand bases are the same, but I didn't think hard enough to feel confident in that. But even so, that only solves cases like "how do these branches compare to HEAD" (which I think is what GitHub does). But it doesn't allow "how do these branches compare to to their respective @{upstream} refs". So I don't think it would be impossible to make this a mode of rev-list. And that mode might even provide flexibility for other similar operations, like a mass "git rev-list --cherry-mark"[1]. But it is a pretty big departure from the current rev-list traversal (to my mind, especially the "keep walking past UNINTERESTING part). I don't mind it as its own command. -Peff [1] The reason you might want a mass cherry-mark is basically doing something like the "branches" page, but in a workflow where upstream applies patches, like git.git. There you may want to ask about "origin/next...$branch" for all of your branches to see which ones have been merged where.
Jeff King <peff@peff.net> writes: >> Yeah. You could imagine that `rev-list --count` might do something >> fancy like coalescing >> >> git rev-list --count B...C1 B...C2 B...C3 >> >> into a single walk. But I am not sure that just because `rev-list >> --count` provides similar functionality that we should fold in the >> proposed `ahead-behind` interface into that flag. > > It does coalesce all of that into a single walk. The problem is somewhat > the opposite: it only has a notion of two "sides" for a symmetric > traversal: left and right. But in your example there are many sides, and > we have to remember which is which. Yeah, this reminds me of what I had to do in "show-branch", where each tip gets assigned a bit in the object->flags (which means it can only traverse from a very small limited number of tips, like 30 or so), which I once planned to extend to arbitrary number of tips by storing these bits in commit slab, but it never materialized. > So I don't think it would be impossible to make this a mode of rev-list. > And that mode might even provide flexibility for other similar > operations, like a mass "git rev-list --cherry-mark"[1]. But it is a > pretty big departure from the current rev-list traversal (to my mind, > especially the "keep walking past UNINTERESTING part). I don't mind it > as its own command. I agree this is not a good fit for the mental model of rev-list or the revision.c::get_revision() traversal. Thanks.