diff mbox series

[RFC,2/3] Documentation/git-rev-list: s/<commit>/<object>/

Message ID a1ddae16bae563dd904555661e2e916536f388d8.1539298957.git.matvore@google.com (mailing list archive)
State New, archived
Headers show
Series support for filtering trees and blobs based on depth | expand

Commit Message

Matthew DeVore Oct. 11, 2018, 11:09 p.m. UTC
git-rev-list has a mode where it works on the granularity of trees and
blobs, rather than commits only. When discussing this mode in
documenation, it can get awkward to refer to the list of arguments that
may include blobs and trees as <commit>. It is especially awkward in a
follow-up patch, so prepare for that patch by renaming the argument.

In addition to simply renaming the argument, also reword documentation
in some places such that we include non-commit objects in our
terminology. In other words, s/commit/object/ in any prose where the
context obviously applies to trees and blobs in a non-pathological way.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 Documentation/git-rev-list.txt     | 21 ++++++++++++---------
 Documentation/rev-list-options.txt | 16 ++++++++--------
 builtin/rev-list.c                 |  2 +-
 3 files changed, 21 insertions(+), 18 deletions(-)

Comments

Junio C Hamano Oct. 14, 2018, 11:25 p.m. UTC | #1
Matthew DeVore <matvore@google.com> writes:

> -List commits that are reachable by following the `parent` links from the
> -given commit(s), but exclude commits that are reachable from the one(s)
> -given with a '{caret}' in front of them.  The output is given in reverse
> -chronological order by default.
> +List objects that are reachable by following references from the given
> +object(s), but exclude objects that are reachable from the one(s) given
> +with a '{caret}' in front of them.
>  
> +By default, only commit objects are shown, and the commits are shown in
> +reverse chronological order. The '--object' flag causes non-commit objects
> +to also be shown.
>
> +You can think of this as a set operation.  Objects given on the command
> +line form a set of objects that are reachable from any of them, and then
> +objects reachable from any of the ones given with '{caret}' in front are
> +subtracted from that set.  The remaining objects are what come out in the
>  command's output.  Various other options and paths parameters can be used
>  to further limit the result.

I am not sure if this is a good rewrite.  It gives a false
impression as if you'd not see anything if I did this:

	git rev-list --objects ^master md/filter-trees:t/valgrind

especially if you change the SYNOPSIS and make it claim that the
command takes <object> as starting point.  Updating <commit> to
<object> for those that are _shown_ is OK, but the s/commit/object/
for those that are given as the starting points is not a good change
at all, if I understand what the code is designed to do correctly.

It is more like "this is a set operation across commits.  We also
show objects that are reachable from the commits in the resulting
set and are not reachable from the commits in the set that were
excluded when --objects option is given".
Matthew DeVore Oct. 20, 2018, 12:03 a.m. UTC | #2
On Sun, Oct 14, 2018 at 4:25 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matthew DeVore <matvore@google.com> writes:
>
> > -List commits that are reachable by following the `parent` links from the
> > -given commit(s), but exclude commits that are reachable from the one(s)
> > -given with a '{caret}' in front of them.  The output is given in reverse
> > -chronological order by default.
> > +List objects that are reachable by following references from the given
> > +object(s), but exclude objects that are reachable from the one(s) given
> > +with a '{caret}' in front of them.
> >
> > +By default, only commit objects are shown, and the commits are shown in
> > +reverse chronological order. The '--object' flag causes non-commit objects
> > +to also be shown.
> >
> > +You can think of this as a set operation.  Objects given on the command
> > +line form a set of objects that are reachable from any of them, and then
> > +objects reachable from any of the ones given with '{caret}' in front are
> > +subtracted from that set.  The remaining objects are what come out in the
> >  command's output.  Various other options and paths parameters can be used
> >  to further limit the result.
>
> I am not sure if this is a good rewrite.  It gives a false
> impression as if you'd not see anything if I did this:
>
>         git rev-list --objects ^master md/filter-trees:t/valgrind
>
Oh that's interesting. So my mental model conflicts with the command's
behavior. It actually is surprising behavior because:

# this shows all files that were modified in the HEAD commit
git rev-list --objects ^HEAD~1^{tree} HEAD:

# but this shows *all* files at HEAD
git rev-list --objects ^HEAD~1 HEAD:

Which means that ^commit and ^non-commit are treated inherently
differently. Maybe I should fix this bug before clarifying this
documentation...

> It is more like "this is a set operation across commits.  We also
> show objects that are reachable from the commits in the resulting
> set and are not reachable from the commits in the set that were
> excluded when --objects option is given".
>
That would be correct though it wouldn't tell that you can use
"--objects ^foo-tree bar-tree." Without fixing the above bug, I could
add to your wording something to the effect of "You can also use trees
to include and exclude sets of objects rather than commits." Which
implies that mixing "--objects ^commit tree" on the command line is
undefined.
Junio C Hamano Oct. 22, 2018, 2:21 a.m. UTC | #3
Matthew DeVore <matvore@google.com> writes:

>> It is more like "this is a set operation across commits.  We also
>> show objects that are reachable from the commits in the resulting
>> set and are not reachable from the commits in the set that were
>> excluded when --objects option is given".
>>
> That would be correct though it wouldn't tell that you can use
> "--objects ^foo-tree bar-tree."

Yeah, but quite honestly, I consider that it is working by accident,
not by design, in the current code (iow, you are looking at a
behaviour of whatever the code happens to do).  "rev-list" is pretty
much set operation across commits, and anything that deals with a
non commit-ish given from the command line is an afterthought at
best, and happenstance in reality.

I do not mean to say that the code must stay that way, though.
Matthew DeVore Dec. 7, 2018, 10:55 p.m. UTC | #4
On Sun, Oct 21, 2018 at 7:21 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matthew DeVore <matvore@google.com> writes:
>
> >> It is more like "this is a set operation across commits.  We also
> >> show objects that are reachable from the commits in the resulting
> >> set and are not reachable from the commits in the set that were
> >> excluded when --objects option is given".
> >>
> > That would be correct though it wouldn't tell that you can use
> > "--objects ^foo-tree bar-tree."
>
> Yeah, but quite honestly, I consider that it is working by accident,
> not by design, in the current code (iow, you are looking at a
> behaviour of whatever the code happens to do).  "rev-list" is pretty
> much set operation across commits, and anything that deals with a
> non commit-ish given from the command line is an afterthought at
> best, and happenstance in reality.
>
> I do not mean to say that the code must stay that way, though.

I tried fixing the issue of "--objects ^commitobj treeobj" not
properly excluding objects reachable from commitobj, but this ended up
causing t5616-partial-clone.sh to fail. In the test labeled "manual
prefetch of missing objects", we create a clone of srv.bare without
blobs called "pc1", then push some new commits to srv.bare (via a
separate "local" repo), and try to fetch missing blobs with this
command:

$ git -C pc1 fetch-pack --stdin "file://$(pwd)/srv.bare" <observed.oids

Where observed.oids contains all the blobs that were missing. It tells
the remote that it already has the "refs/heads/master" commit, which
means it is excluded. Before, this worked fine, since it didn't mean
the blobs were excluded, only the commit itself. With the fix I
naively put together (and didn't share), because one of the blobs is
reachable from "refs/heads/master" it wouldn't pull it. I guess I can
change it so that items given directly to stdin or argv are always
fetched, though I'm not sure how easy that will be, and I'm not as
interested in fixing this as I once was. I just wanted the
documentation to outline the object-enumeration capabilities. So I'll
send a re-roll which makes much more modest changes to the
documentation.
Junio C Hamano Dec. 8, 2018, 6:24 a.m. UTC | #5
Matthew DeVore <matvore@google.com> writes:

> $ git -C pc1 fetch-pack --stdin "file://$(pwd)/srv.bare" <observed.oids
>
> Where observed.oids contains all the blobs that were missing. It tells
> the remote that it already has the "refs/heads/master" commit, which
> means it is excluded. Before, this worked fine, since it didn't mean
> the blobs were excluded, only the commit itself.

So 'master' has some blob in its tree, which is missing from the
repository, in this test?  Which means that such a repository is
"corrupt" and does not pass connectivity check by fsck.

I am of two minds.  If we claim that we have everything that is
needed to complete the commit sitting at the tip of 'master', I
think it is correct for the other side not to send a blob that is in
'master' (or its ancestors), so your "fix" may (at least
technically) be more correct than the status quo.  On the other
hand, if possession of commit 'master' does not defeat an explicit
request for a blob in it, that would actually be a good thing---it
would be a very straight-forward way to recover from such form of
repository corruption.

Fetching isolated objects without walking is also something that
would help backfill a lazily created clone, and I even vaguely
recall an effort to allow objects explicitly requested to be always
fetched regardless of the connectivity, if I am not mistaken (JTan?)

Anyway, thanks for a thoughtful response.
diff mbox series

Patch

diff --git a/Documentation/git-rev-list.txt b/Documentation/git-rev-list.txt
index 88609ff43..b3357932c 100644
--- a/Documentation/git-rev-list.txt
+++ b/Documentation/git-rev-list.txt
@@ -60,20 +60,23 @@  SYNOPSIS
 	     [ --no-walk ] [ --do-walk ]
 	     [ --count ]
 	     [ --use-bitmap-index ]
-	     <commit>... [ \-- <paths>... ]
+	     <object>... [ \-- <paths>... ]
 
 DESCRIPTION
 -----------
 
-List commits that are reachable by following the `parent` links from the
-given commit(s), but exclude commits that are reachable from the one(s)
-given with a '{caret}' in front of them.  The output is given in reverse
-chronological order by default.
+List objects that are reachable by following references from the given
+object(s), but exclude objects that are reachable from the one(s) given
+with a '{caret}' in front of them.
 
-You can think of this as a set operation.  Commits given on the command
-line form a set of commits that are reachable from any of them, and then
-commits reachable from any of the ones given with '{caret}' in front are
-subtracted from that set.  The remaining commits are what comes out in the
+By default, only commit objects are shown, and the commits are shown in
+reverse chronological order. The '--object' flag causes non-commit objects
+to also be shown.
+
+You can think of this as a set operation.  Objects given on the command
+line form a set of objects that are reachable from any of them, and then
+objects reachable from any of the ones given with '{caret}' in front are
+subtracted from that set.  The remaining objects are what come out in the
 command's output.  Various other options and paths parameters can be used
 to further limit the result.
 
diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 5f1672913..c2c1c40e6 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -139,29 +139,29 @@  parents) and `--max-parents=-1` (negative numbers denote no upper limit).
 
 --all::
 	Pretend as if all the refs in `refs/`, along with `HEAD`, are
-	listed on the command line as '<commit>'.
+	listed on the command line as '<object>'.
 
 --branches[=<pattern>]::
 	Pretend as if all the refs in `refs/heads` are listed
-	on the command line as '<commit>'. If '<pattern>' is given, limit
+	on the command line as '<object>'. If '<pattern>' is given, limit
 	branches to ones matching given shell glob. If pattern lacks '?',
 	'{asterisk}', or '[', '/{asterisk}' at the end is implied.
 
 --tags[=<pattern>]::
 	Pretend as if all the refs in `refs/tags` are listed
-	on the command line as '<commit>'. If '<pattern>' is given, limit
+	on the command line as '<object>'. If '<pattern>' is given, limit
 	tags to ones matching given shell glob. If pattern lacks '?', '{asterisk}',
 	or '[', '/{asterisk}' at the end is implied.
 
 --remotes[=<pattern>]::
 	Pretend as if all the refs in `refs/remotes` are listed
-	on the command line as '<commit>'. If '<pattern>' is given, limit
+	on the command line as '<object>'. If '<pattern>' is given, limit
 	remote-tracking branches to ones matching given shell glob.
 	If pattern lacks '?', '{asterisk}', or '[', '/{asterisk}' at the end is implied.
 
 --glob=<glob-pattern>::
 	Pretend as if all the refs matching shell glob '<glob-pattern>'
-	are listed on the command line as '<commit>'. Leading 'refs/',
+	are listed on the command line as '<object>'. Leading 'refs/',
 	is automatically prepended if missing. If pattern lacks '?', '{asterisk}',
 	or '[', '/{asterisk}' at the end is implied.
 
@@ -182,7 +182,7 @@  explicitly.
 
 --reflog::
 	Pretend as if all objects mentioned by reflogs are listed on the
-	command line as `<commit>`.
+	command line as `<object>`.
 
 --single-worktree::
 	By default, all working trees will be examined by the
@@ -205,9 +205,9 @@  ifndef::git-rev-list[]
 endif::git-rev-list[]
 
 --stdin::
-	In addition to the '<commit>' listed on the command
+	In addition to the '<object>' listed on the command
 	line, read them from the standard input. If a `--` separator is
-	seen, stop reading commits and start reading paths to limit the
+	seen, stop reading objects and start reading paths to limit the
 	result.
 
 ifdef::git-rev-list[]
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 49d6deed7..9817e6747 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -21,7 +21,7 @@ 
 #include "object-store.h"
 
 static const char rev_list_usage[] =
-"git rev-list [OPTION] <commit-id>... [ -- paths... ]\n"
+"git rev-list [OPTION] <object-id>... [ -- paths... ]\n"
 "  limiting output:\n"
 "    --max-count=<n>\n"
 "    --max-age=<epoch>\n"