Message ID | a1ddae16bae563dd904555661e2e916536f388d8.1539298957.git.matvore@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | support for filtering trees and blobs based on depth | expand |
Matthew DeVore <matvore@google.com> writes: > -List commits that are reachable by following the `parent` links from the > -given commit(s), but exclude commits that are reachable from the one(s) > -given with a '{caret}' in front of them. The output is given in reverse > -chronological order by default. > +List objects that are reachable by following references from the given > +object(s), but exclude objects that are reachable from the one(s) given > +with a '{caret}' in front of them. > > +By default, only commit objects are shown, and the commits are shown in > +reverse chronological order. The '--object' flag causes non-commit objects > +to also be shown. > > +You can think of this as a set operation. Objects given on the command > +line form a set of objects that are reachable from any of them, and then > +objects reachable from any of the ones given with '{caret}' in front are > +subtracted from that set. The remaining objects are what come out in the > command's output. Various other options and paths parameters can be used > to further limit the result. I am not sure if this is a good rewrite. It gives a false impression as if you'd not see anything if I did this: git rev-list --objects ^master md/filter-trees:t/valgrind especially if you change the SYNOPSIS and make it claim that the command takes <object> as starting point. Updating <commit> to <object> for those that are _shown_ is OK, but the s/commit/object/ for those that are given as the starting points is not a good change at all, if I understand what the code is designed to do correctly. It is more like "this is a set operation across commits. We also show objects that are reachable from the commits in the resulting set and are not reachable from the commits in the set that were excluded when --objects option is given".
On Sun, Oct 14, 2018 at 4:25 PM Junio C Hamano <gitster@pobox.com> wrote: > > Matthew DeVore <matvore@google.com> writes: > > > -List commits that are reachable by following the `parent` links from the > > -given commit(s), but exclude commits that are reachable from the one(s) > > -given with a '{caret}' in front of them. The output is given in reverse > > -chronological order by default. > > +List objects that are reachable by following references from the given > > +object(s), but exclude objects that are reachable from the one(s) given > > +with a '{caret}' in front of them. > > > > +By default, only commit objects are shown, and the commits are shown in > > +reverse chronological order. The '--object' flag causes non-commit objects > > +to also be shown. > > > > +You can think of this as a set operation. Objects given on the command > > +line form a set of objects that are reachable from any of them, and then > > +objects reachable from any of the ones given with '{caret}' in front are > > +subtracted from that set. The remaining objects are what come out in the > > command's output. Various other options and paths parameters can be used > > to further limit the result. > > I am not sure if this is a good rewrite. It gives a false > impression as if you'd not see anything if I did this: > > git rev-list --objects ^master md/filter-trees:t/valgrind > Oh that's interesting. So my mental model conflicts with the command's behavior. It actually is surprising behavior because: # this shows all files that were modified in the HEAD commit git rev-list --objects ^HEAD~1^{tree} HEAD: # but this shows *all* files at HEAD git rev-list --objects ^HEAD~1 HEAD: Which means that ^commit and ^non-commit are treated inherently differently. Maybe I should fix this bug before clarifying this documentation... > It is more like "this is a set operation across commits. We also > show objects that are reachable from the commits in the resulting > set and are not reachable from the commits in the set that were > excluded when --objects option is given". > That would be correct though it wouldn't tell that you can use "--objects ^foo-tree bar-tree." Without fixing the above bug, I could add to your wording something to the effect of "You can also use trees to include and exclude sets of objects rather than commits." Which implies that mixing "--objects ^commit tree" on the command line is undefined.
Matthew DeVore <matvore@google.com> writes: >> It is more like "this is a set operation across commits. We also >> show objects that are reachable from the commits in the resulting >> set and are not reachable from the commits in the set that were >> excluded when --objects option is given". >> > That would be correct though it wouldn't tell that you can use > "--objects ^foo-tree bar-tree." Yeah, but quite honestly, I consider that it is working by accident, not by design, in the current code (iow, you are looking at a behaviour of whatever the code happens to do). "rev-list" is pretty much set operation across commits, and anything that deals with a non commit-ish given from the command line is an afterthought at best, and happenstance in reality. I do not mean to say that the code must stay that way, though.
On Sun, Oct 21, 2018 at 7:21 PM Junio C Hamano <gitster@pobox.com> wrote: > > Matthew DeVore <matvore@google.com> writes: > > >> It is more like "this is a set operation across commits. We also > >> show objects that are reachable from the commits in the resulting > >> set and are not reachable from the commits in the set that were > >> excluded when --objects option is given". > >> > > That would be correct though it wouldn't tell that you can use > > "--objects ^foo-tree bar-tree." > > Yeah, but quite honestly, I consider that it is working by accident, > not by design, in the current code (iow, you are looking at a > behaviour of whatever the code happens to do). "rev-list" is pretty > much set operation across commits, and anything that deals with a > non commit-ish given from the command line is an afterthought at > best, and happenstance in reality. > > I do not mean to say that the code must stay that way, though. I tried fixing the issue of "--objects ^commitobj treeobj" not properly excluding objects reachable from commitobj, but this ended up causing t5616-partial-clone.sh to fail. In the test labeled "manual prefetch of missing objects", we create a clone of srv.bare without blobs called "pc1", then push some new commits to srv.bare (via a separate "local" repo), and try to fetch missing blobs with this command: $ git -C pc1 fetch-pack --stdin "file://$(pwd)/srv.bare" <observed.oids Where observed.oids contains all the blobs that were missing. It tells the remote that it already has the "refs/heads/master" commit, which means it is excluded. Before, this worked fine, since it didn't mean the blobs were excluded, only the commit itself. With the fix I naively put together (and didn't share), because one of the blobs is reachable from "refs/heads/master" it wouldn't pull it. I guess I can change it so that items given directly to stdin or argv are always fetched, though I'm not sure how easy that will be, and I'm not as interested in fixing this as I once was. I just wanted the documentation to outline the object-enumeration capabilities. So I'll send a re-roll which makes much more modest changes to the documentation.
Matthew DeVore <matvore@google.com> writes: > $ git -C pc1 fetch-pack --stdin "file://$(pwd)/srv.bare" <observed.oids > > Where observed.oids contains all the blobs that were missing. It tells > the remote that it already has the "refs/heads/master" commit, which > means it is excluded. Before, this worked fine, since it didn't mean > the blobs were excluded, only the commit itself. So 'master' has some blob in its tree, which is missing from the repository, in this test? Which means that such a repository is "corrupt" and does not pass connectivity check by fsck. I am of two minds. If we claim that we have everything that is needed to complete the commit sitting at the tip of 'master', I think it is correct for the other side not to send a blob that is in 'master' (or its ancestors), so your "fix" may (at least technically) be more correct than the status quo. On the other hand, if possession of commit 'master' does not defeat an explicit request for a blob in it, that would actually be a good thing---it would be a very straight-forward way to recover from such form of repository corruption. Fetching isolated objects without walking is also something that would help backfill a lazily created clone, and I even vaguely recall an effort to allow objects explicitly requested to be always fetched regardless of the connectivity, if I am not mistaken (JTan?) Anyway, thanks for a thoughtful response.
diff --git a/Documentation/git-rev-list.txt b/Documentation/git-rev-list.txt index 88609ff43..b3357932c 100644 --- a/Documentation/git-rev-list.txt +++ b/Documentation/git-rev-list.txt @@ -60,20 +60,23 @@ SYNOPSIS [ --no-walk ] [ --do-walk ] [ --count ] [ --use-bitmap-index ] - <commit>... [ \-- <paths>... ] + <object>... [ \-- <paths>... ] DESCRIPTION ----------- -List commits that are reachable by following the `parent` links from the -given commit(s), but exclude commits that are reachable from the one(s) -given with a '{caret}' in front of them. The output is given in reverse -chronological order by default. +List objects that are reachable by following references from the given +object(s), but exclude objects that are reachable from the one(s) given +with a '{caret}' in front of them. -You can think of this as a set operation. Commits given on the command -line form a set of commits that are reachable from any of them, and then -commits reachable from any of the ones given with '{caret}' in front are -subtracted from that set. The remaining commits are what comes out in the +By default, only commit objects are shown, and the commits are shown in +reverse chronological order. The '--object' flag causes non-commit objects +to also be shown. + +You can think of this as a set operation. Objects given on the command +line form a set of objects that are reachable from any of them, and then +objects reachable from any of the ones given with '{caret}' in front are +subtracted from that set. The remaining objects are what come out in the command's output. Various other options and paths parameters can be used to further limit the result. diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index 5f1672913..c2c1c40e6 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -139,29 +139,29 @@ parents) and `--max-parents=-1` (negative numbers denote no upper limit). --all:: Pretend as if all the refs in `refs/`, along with `HEAD`, are - listed on the command line as '<commit>'. + listed on the command line as '<object>'. --branches[=<pattern>]:: Pretend as if all the refs in `refs/heads` are listed - on the command line as '<commit>'. If '<pattern>' is given, limit + on the command line as '<object>'. If '<pattern>' is given, limit branches to ones matching given shell glob. If pattern lacks '?', '{asterisk}', or '[', '/{asterisk}' at the end is implied. --tags[=<pattern>]:: Pretend as if all the refs in `refs/tags` are listed - on the command line as '<commit>'. If '<pattern>' is given, limit + on the command line as '<object>'. If '<pattern>' is given, limit tags to ones matching given shell glob. If pattern lacks '?', '{asterisk}', or '[', '/{asterisk}' at the end is implied. --remotes[=<pattern>]:: Pretend as if all the refs in `refs/remotes` are listed - on the command line as '<commit>'. If '<pattern>' is given, limit + on the command line as '<object>'. If '<pattern>' is given, limit remote-tracking branches to ones matching given shell glob. If pattern lacks '?', '{asterisk}', or '[', '/{asterisk}' at the end is implied. --glob=<glob-pattern>:: Pretend as if all the refs matching shell glob '<glob-pattern>' - are listed on the command line as '<commit>'. Leading 'refs/', + are listed on the command line as '<object>'. Leading 'refs/', is automatically prepended if missing. If pattern lacks '?', '{asterisk}', or '[', '/{asterisk}' at the end is implied. @@ -182,7 +182,7 @@ explicitly. --reflog:: Pretend as if all objects mentioned by reflogs are listed on the - command line as `<commit>`. + command line as `<object>`. --single-worktree:: By default, all working trees will be examined by the @@ -205,9 +205,9 @@ ifndef::git-rev-list[] endif::git-rev-list[] --stdin:: - In addition to the '<commit>' listed on the command + In addition to the '<object>' listed on the command line, read them from the standard input. If a `--` separator is - seen, stop reading commits and start reading paths to limit the + seen, stop reading objects and start reading paths to limit the result. ifdef::git-rev-list[] diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 49d6deed7..9817e6747 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -21,7 +21,7 @@ #include "object-store.h" static const char rev_list_usage[] = -"git rev-list [OPTION] <commit-id>... [ -- paths... ]\n" +"git rev-list [OPTION] <object-id>... [ -- paths... ]\n" " limiting output:\n" " --max-count=<n>\n" " --max-age=<epoch>\n"
git-rev-list has a mode where it works on the granularity of trees and blobs, rather than commits only. When discussing this mode in documenation, it can get awkward to refer to the list of arguments that may include blobs and trees as <commit>. It is especially awkward in a follow-up patch, so prepare for that patch by renaming the argument. In addition to simply renaming the argument, also reword documentation in some places such that we include non-commit objects in our terminology. In other words, s/commit/object/ in any prose where the context obviously applies to trees and blobs in a non-pathological way. Signed-off-by: Matthew DeVore <matvore@google.com> --- Documentation/git-rev-list.txt | 21 ++++++++++++--------- Documentation/rev-list-options.txt | 16 ++++++++-------- builtin/rev-list.c | 2 +- 3 files changed, 21 insertions(+), 18 deletions(-)