diff mbox series

[v2,2/8] revision: mark commit parents as NOT_USER_GIVEN

Message ID ddbec7598664bceee50213a41fefa248d249435e.1615813673.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series rev-parse: implement object type filter | expand

Commit Message

Patrick Steinhardt March 15, 2021, 1:14 p.m. UTC
The NOT_USER_GIVEN flag of an object marks whether a flag was explicitly
provided by the user or not. The most important use case for this is
when filtering objects: only objects that were not explicitly requested
will get filtered.

The flag is currently only set for blobs and trees, which has been fine
given that there are no filters for tags or commits currently. We're
about to extend filtering capabilities to add object type filter though,
which requires us to set up the NOT_USER_GIVEN flag correctly -- if it's
not set, the object wouldn't get filtered at all.

Mark unseen commit parents as NOT_USER_GIVEN when processing parents.
Like this, explicitly provided parents stay user-given and thus
unfiltered, while parents which get loaded as part of the graph walk
can be filtered.

This commit shouldn't have any user-visible impact yet as there is no
logic to filter commits yet.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 revision.c | 4 ++--
 revision.h | 3 ---
 2 files changed, 2 insertions(+), 5 deletions(-)

Comments

Jeff King April 6, 2021, 5:30 p.m. UTC | #1
On Mon, Mar 15, 2021 at 02:14:36PM +0100, Patrick Steinhardt wrote:

> The NOT_USER_GIVEN flag of an object marks whether a flag was explicitly
> provided by the user or not. The most important use case for this is
> when filtering objects: only objects that were not explicitly requested
> will get filtered.
> 
> The flag is currently only set for blobs and trees, which has been fine
> given that there are no filters for tags or commits currently. We're
> about to extend filtering capabilities to add object type filter though,
> which requires us to set up the NOT_USER_GIVEN flag correctly -- if it's
> not set, the object wouldn't get filtered at all.
> 
> Mark unseen commit parents as NOT_USER_GIVEN when processing parents.
> Like this, explicitly provided parents stay user-given and thus
> unfiltered, while parents which get loaded as part of the graph walk
> can be filtered.
> 
> This commit shouldn't have any user-visible impact yet as there is no
> logic to filter commits yet.

I'm still scratching my head a bit to understand how NOT_USER_GIVEN can
possibly be correct (as opposed to USER_GIVEN). If we visit the commit
in a not-user-given context and add the flag, how do we know it wasn't
_also_ visited in a user-given context?

Just guessing, but perhaps the SEEN flag is saving us here? If we visit
the user-given commit itself first, then we give it the SEEN flag. Then
if we try to visit it again via parent traversal, we've already
processed it and don't add the NOT_USER_GIVEN flag here.

That seems the opposite of the order we'd usually traverse, but I think
we set SEEN on each commit in prepare_revision_walk(), before we do any
traversing.

So I _think_ it all works even with your changes here, but I have to say
this NOT_USER_GIVEN thing seems really fragile to me. Not new in your
series, of course, but something we may want to look at.

Just grepping around, "rev-list -g" will happily remove SEEN flags, so I
suspect it interacts badly with --filter. Just trying "rev-list -g
--objects --filter=object:type=blob HEAD" shows that it produces quite a
lot of commits (which I think is a more fundamental problem: it is not
walking the parent chain at all to assign these NOT_USER_GIVEN flags).

-Peff
Patrick Steinhardt April 9, 2021, 10:19 a.m. UTC | #2
On Tue, Apr 06, 2021 at 01:30:57PM -0400, Jeff King wrote:
> On Mon, Mar 15, 2021 at 02:14:36PM +0100, Patrick Steinhardt wrote:
> 
> > The NOT_USER_GIVEN flag of an object marks whether a flag was explicitly
> > provided by the user or not. The most important use case for this is
> > when filtering objects: only objects that were not explicitly requested
> > will get filtered.
> > 
> > The flag is currently only set for blobs and trees, which has been fine
> > given that there are no filters for tags or commits currently. We're
> > about to extend filtering capabilities to add object type filter though,
> > which requires us to set up the NOT_USER_GIVEN flag correctly -- if it's
> > not set, the object wouldn't get filtered at all.
> > 
> > Mark unseen commit parents as NOT_USER_GIVEN when processing parents.
> > Like this, explicitly provided parents stay user-given and thus
> > unfiltered, while parents which get loaded as part of the graph walk
> > can be filtered.
> > 
> > This commit shouldn't have any user-visible impact yet as there is no
> > logic to filter commits yet.
> 
> I'm still scratching my head a bit to understand how NOT_USER_GIVEN can
> possibly be correct (as opposed to USER_GIVEN). If we visit the commit
> in a not-user-given context and add the flag, how do we know it wasn't
> _also_ visited in a user-given context?
> 
> Just guessing, but perhaps the SEEN flag is saving us here? If we visit
> the user-given commit itself first, then we give it the SEEN flag. Then
> if we try to visit it again via parent traversal, we've already
> processed it and don't add the NOT_USER_GIVEN flag here.

Yes, I think that's mostly it.

> That seems the opposite of the order we'd usually traverse, but I think
> we set SEEN on each commit in prepare_revision_walk(), before we do any
> traversing.
> 
> So I _think_ it all works even with your changes here, but I have to say
> this NOT_USER_GIVEN thing seems really fragile to me. Not new in your
> series, of course, but something we may want to look at.
> 
> Just grepping around, "rev-list -g" will happily remove SEEN flags, so I
> suspect it interacts badly with --filter. Just trying "rev-list -g
> --objects --filter=object:type=blob HEAD" shows that it produces quite a
> lot of commits (which I think is a more fundamental problem: it is not
> walking the parent chain at all to assign these NOT_USER_GIVEN flags).

I totally agree that this feels fragile, and developing this series with
NOT_USER_GIVEN wasn't the most enjoyable experience either. I wouldn't
love to be doing the conversion back to USER_GIVEN as part of this
series, but I wouldn't oppose doing that job either. Right now I don't
feel like I'm sufficiently sure that it's working for all cases, and
indeed your example with "rev-list -g" already shows one case where it's
breaking.

So let me know whether I should add the conversion as preparatory step.

Patrick
diff mbox series

Patch

diff --git a/revision.c b/revision.c
index b78733f508..26f422f50d 100644
--- a/revision.c
+++ b/revision.c
@@ -1123,7 +1123,7 @@  static int process_parents(struct rev_info *revs, struct commit *commit,
 				mark_parents_uninteresting(p);
 			if (p->object.flags & SEEN)
 				continue;
-			p->object.flags |= SEEN;
+			p->object.flags |= (SEEN | NOT_USER_GIVEN);
 			if (list)
 				commit_list_insert_by_date(p, list);
 			if (queue)
@@ -1165,7 +1165,7 @@  static int process_parents(struct rev_info *revs, struct commit *commit,
 		}
 		p->object.flags |= left_flag;
 		if (!(p->object.flags & SEEN)) {
-			p->object.flags |= SEEN;
+			p->object.flags |= (SEEN | NOT_USER_GIVEN);
 			if (list)
 				commit_list_insert_by_date(p, list);
 			if (queue)
diff --git a/revision.h b/revision.h
index e6be3c845e..f1f324a19b 100644
--- a/revision.h
+++ b/revision.h
@@ -44,9 +44,6 @@ 
 /*
  * Indicates object was reached by traversal. i.e. not given by user on
  * command-line or stdin.
- * NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only support
- * filtering trees and blobs, but it may be useful to support filtering commits
- * in the future.
  */
 #define NOT_USER_GIVEN	(1u<<25)
 #define TRACK_LINEAR	(1u<<26)