[1/9] refs: make _advance() check struct repo, not flag

Message ID	493fff7f4716d889da751b5f8c6740cc1e3aa360.1632242495.git.jonathantanmy@google.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <git-owner@kernel.org> Date: Tue, 21 Sep 2021 09:51:03 -0700 In-Reply-To: <cover.1632242495.git.jonathantanmy@google.com> Message-Id: <493fff7f4716d889da751b5f8c6740cc1e3aa360.1632242495.git.jonathantanmy@google.com> Mime-Version: 1.0 References: <cover.1632242495.git.jonathantanmy@google.com> Subject: [PATCH 1/9] refs: make _advance() check struct repo, not flag From: Jonathan Tan <jonathantanmy@google.com> To: git@vger.kernel.org Cc: Jonathan Tan <jonathantanmy@google.com> Content-Type: text/plain; charset="UTF-8" Precedence: bulk
Series	No more adding submodule ODB as alternate \| expand [0/9] No more adding submodule ODB as alternate [1/9] refs: make _advance() check struct repo, not flag [2/9] refs: add repo paramater to _iterator_peel() [3/9] refs iterator: support non-the_repository advance [4/9] refs: teach refs_for_each_ref() arbitrary repos [5/9] merge-{ort,recursive}: remove add_submodule_odb() [6/9] object-file: only register submodule ODB if needed [7/9] submodule: pass repo to check_has_commit() [8/9] refs: change refs_for_each_ref_in() to take repo [9/9] submodule: trace adding submodule ODB as alternate

Jonathan Tan Sept. 21, 2021, 4:51 p.m. UTC

Currently, ref iterators access the object store each time they advance
if and only if the boolean flag DO_FOR_EACH_INCLUDE_BROKEN is unset.
(The iterators access the object store because, if
DO_FOR_EACH_INCLUDE_BROKEN is unset, they need to attempt to resolve
each ref to determine that it is not broken.)

Also, the object store accessed is always that of the_repository, making
it impossible to iterate over a submodule's refs without
DO_FOR_EACH_INCLUDE_BROKEN (unless add_submodule_odb() is used).

As a first step in resolving both these problems, replace the
DO_FOR_EACH_INCLUDE_BROKEN flag with a struct repository pointer. This
commit is a mechanical conversion - whenever DO_FOR_EACH_INCLUDE_BROKEN
is set, a NULL repository (representing access to no object store) is
used instead, and whenever DO_FOR_EACH_INCLUDE_BROKEN is unset, a
non-NULL repository (representing access to that repository's object
store) is used instead. Right now, the locations in which
non-the_repository support needs to be added are marked with BUG()
statements - in a future patch, these will be replaced. (NEEDSWORK: in
this RFC patch set, this has not been done)

I have considered and rejected the following design alternative:

- Change the _advance() callback to also have a repository object
  parameter, and either skip or not skip depending on whether that
  parameter is NULL. This burdens callers to have to carry this
  information along with the iterator, and such calling code may be
  unclear as to why that parameter can be NULL in some cases and cannot
  in others.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 refs.c                | 46 +++++++++++++++++++++++--------------------
 refs/files-backend.c  | 14 ++++---------
 refs/iterator.c       | 18 ++++++++++++++++-
 refs/packed-backend.c | 10 +---------
 refs/refs-internal.h  | 27 +++++++++++++++----------
 5 files changed, 64 insertions(+), 51 deletions(-)

Junio C Hamano Sept. 23, 2021, 1 a.m. UTC | #1

Jonathan Tan <jonathantanmy@google.com> writes:

> As a first step in resolving both these problems, replace the
> DO_FOR_EACH_INCLUDE_BROKEN flag with a struct repository pointer. This
> commit is a mechanical conversion - whenever DO_FOR_EACH_INCLUDE_BROKEN
> is set, a NULL repository (representing access to no object store) is
> used instead, and whenever DO_FOR_EACH_INCLUDE_BROKEN is unset, a
> non-NULL repository (representing access to that repository's object
> store) is used instead.

Hmph, so the lack of "include broken" is a signal to validate the
object the ref points at, and the new parameter is "if this pointer
is not NULL, then expect to find the object in this repository and
validate it" that replaces the original "validate it" with a bit
more detailed instruction (i.e. "how to validate--use the object
store associated to this repository")?

> Right now, the locations in which
> non-the_repository support needs to be added are marked with BUG()
> statements - in a future patch, these will be replaced. (NEEDSWORK: in
> this RFC patch set, this has not been done)

> - Change the _advance() callback to also have a repository object
>   parameter, and either skip or not skip depending on whether that
>   parameter is NULL. This burdens callers to have to carry this
>   information along with the iterator, and such calling code may be
>   unclear as to why that parameter can be NULL in some cases and cannot
>   in others.

Hmph.  

> diff --git a/refs.c b/refs.c
> index 8b9f7c3a80..49ddcdac53 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1413,16 +1413,16 @@ int head_ref(each_ref_fn fn, void *cb_data)
>  
>  struct ref_iterator *refs_ref_iterator_begin(
>  		struct ref_store *refs,
> -		const char *prefix, int trim, int flags)
> +		const char *prefix, int trim, struct repository *repo,
> +		int flags)
>  {
>  	struct ref_iterator *iter;
>  
>  	if (ref_paranoia < 0)
>  		ref_paranoia = git_env_bool("GIT_REF_PARANOIA", 0);
> -	if (ref_paranoia)
> -		flags |= DO_FOR_EACH_INCLUDE_BROKEN;
>  
>  	iter = refs->be->iterator_begin(refs, prefix, flags);
> +	iter->repo = ref_paranoia ? NULL : repo;

OK.  "flags" is still kept because there are bits other than
"include broken" that need to be propagated.

> @@ -1442,13 +1442,16 @@ struct ref_iterator *refs_ref_iterator_begin(
>   * Call fn for each reference in the specified submodule for which the
>   * refname begins with prefix. If trim is non-zero, then trim that
>   * many characters off the beginning of each refname before passing
> - * the refname to fn. flags can be DO_FOR_EACH_INCLUDE_BROKEN to
> - * include broken references in the iteration. If fn ever returns a
> + * the refname to fn. If fn ever returns a
>   * non-zero value, stop the iteration and return that value;
>   * otherwise, return 0.
> + *
> + * See the documentation of refs_ref_iterator_begin() for more information on
> + * the repo parameter.
>   */
>  static int do_for_each_repo_ref(struct repository *r, const char *prefix,
> -				each_repo_ref_fn fn, int trim, int flags,
> +				each_repo_ref_fn fn, int trim,
> +				struct repository *repo, int flags,
>  				void *cb_data)

Confusing.  We are iterating refs that exists in the repository "r",
right?  Why do we need to have an extra "repo" parameter?  Can they
ever diverge (beyond repo could be NULL to signal now-lost "include
broken" bit wanted to convey)?  It's not like a valid caller can
pass the superproject in 'r' and a submodule in 'repo', right?

Enhancing an interface this way, and allowing an arbitrary
repository instance to be passed only to convey one bit of
information, by adding a "repo" smells like inviting bugs in the
future.

I have a feeling that the function signature for this one should
stay as before, and "repo" should be a local variable that is
initialized as

	struct repository *repo = (flags & DO_FOR_EACH_INCLUDE_BROKEN)
				? r
				: NULL;

to avoid such a future bug, but given that there is only one caller
to this helper, I do not mind

	if (repo && r != repo)
		BUG(...);

to catch any such mistake.

>  int for_each_replace_ref(struct repository *r, each_repo_ref_fn fn, void *cb_data)
>  {
>  	return do_for_each_repo_ref(r, git_replace_ref_base, fn,
>  				    strlen(git_replace_ref_base),
> -				    DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
> +				    NULL, 0, cb_data);
>  }

And this is the only such caller, if I am reading the code right.

Do we ever pass non-NULL "repo" to do_for_each_repo_ref() in future
steps?

If not, perhaps we do not even have to add "repo" as a new parameter
to do_for_each_repo_ref(), and instead always pass NULL down to
refs_ref_iterator_begin() from do_for_each_repo_ref()?

> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 677b7e4cdd..cd145301d0 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -744,12 +744,6 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  		    ref_type(iter->iter0->refname) != REF_TYPE_PER_WORKTREE)
>  			continue;
>  
> -		if (!(iter->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
> -		    !ref_resolves_to_object(iter->iter0->refname,
> -					    iter->iter0->oid,
> -					    iter->iter0->flags))
> -			continue;
> -
>  		iter->base.refname = iter->iter0->refname;
>  		iter->base.oid = iter->iter0->oid;
>  		iter->base.flags = iter->iter0->flags;
> @@ -801,9 +795,6 @@ static struct ref_iterator *files_ref_iterator_begin(
>  	struct ref_iterator *ref_iterator;
>  	unsigned int required_flags = REF_STORE_READ;
>  
> -	if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN))
> -		required_flags |= REF_STORE_ODB;
> -
>  	refs = files_downcast(ref_store, required_flags, "ref_iterator_begin");
>  
>  	/*

Hmph, I am not sure where the lossage in these two hunks are
compensated.  Perhaps in the backend independent layer in
refs/iterator.c?  Let's read on.

> @@ -836,10 +827,13 @@ static struct ref_iterator *files_ref_iterator_begin(
>  	 * references, and (if needed) do our own check for broken
>  	 * ones in files_ref_iterator_advance(), after we have merged
>  	 * the packed and loose references.
> +	 *
> +	 * Do this by not supplying any repo, regardless of whether a repo was
> +	 * supplied to files_ref_iterator_begin().
>  	 */
>  	packed_iter = refs_ref_iterator_begin(
>  			refs->packed_ref_store, prefix, 0,
> -			DO_FOR_EACH_INCLUDE_BROKEN);
> +			NULL, 0);

OK.

> diff --git a/refs/iterator.c b/refs/iterator.c
> index a89d132d4f..5af6554887 100644
> --- a/refs/iterator.c
> +++ b/refs/iterator.c
> @@ -10,7 +10,23 @@
>  
>  int ref_iterator_advance(struct ref_iterator *ref_iterator)
>  {
> -	return ref_iterator->vtable->advance(ref_iterator);
> +	int ok;
> +
> +	if (ref_iterator->repo && ref_iterator->repo != the_repository)

OK. refs_ref_interator_begin() assigned the "repo" parameter that
tells which repository to consult to validate the objects at the tip
of refs to the .repo member of the iterator object, and we check it
here.

It is a bit surprising that ref_iterator does not know which
repository it is working in (regardless of "include broken" bit).
Do you think it will stay that way?  I have this nagging feeling
that it won't, and having "struct repository *repository" pointer
that always points at the repository the ref-store belongs to in a
ref_iterator instance would become necessary in the longer run.

In which case, this .repo member this patch adds would become a big
problem, no?  If we were to validate objects at the tip of the refs
against object store, we will always use the object store that
belongs to the iterator->repository, so the only valid states for
iterator->repo are either NULL or iterator->repository.  That again
is the same problem I pointed out already about the parameter the
do_for_each_repo_ref() helper that is inviting future bugs, it seems
to me.  Wouldn't it make more sense to add

 * iterator->repository that points at the repository in which we
   are iterating the refs

 * a bit in iterator that chooses between "do not bother checking"
   and "do check the tip of refs against the object store of
   iterator->repository

to avoid such a mess?  Perhaps we already have such a bit in the
flags word in the ref_iterator but I didn't check.

Thanks.

Jonathan Tan Sept. 24, 2021, 5:56 p.m. UTC | #2

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > As a first step in resolving both these problems, replace the
> > DO_FOR_EACH_INCLUDE_BROKEN flag with a struct repository pointer. This
> > commit is a mechanical conversion - whenever DO_FOR_EACH_INCLUDE_BROKEN
> > is set, a NULL repository (representing access to no object store) is
> > used instead, and whenever DO_FOR_EACH_INCLUDE_BROKEN is unset, a
> > non-NULL repository (representing access to that repository's object
> > store) is used instead.
> 
> Hmph, so the lack of "include broken" is a signal to validate the
> object the ref points at, and the new parameter is "if this pointer
> is not NULL, then expect to find the object in this repository and
> validate it" that replaces the original "validate it" with a bit
> more detailed instruction (i.e. "how to validate--use the object
> store associated to this repository")?

Yes.

> > @@ -1442,13 +1442,16 @@ struct ref_iterator *refs_ref_iterator_begin(
> >   * Call fn for each reference in the specified submodule for which the
> >   * refname begins with prefix. If trim is non-zero, then trim that
> >   * many characters off the beginning of each refname before passing
> > - * the refname to fn. flags can be DO_FOR_EACH_INCLUDE_BROKEN to
> > - * include broken references in the iteration. If fn ever returns a
> > + * the refname to fn. If fn ever returns a
> >   * non-zero value, stop the iteration and return that value;
> >   * otherwise, return 0.
> > + *
> > + * See the documentation of refs_ref_iterator_begin() for more information on
> > + * the repo parameter.
> >   */
> >  static int do_for_each_repo_ref(struct repository *r, const char *prefix,
> > -				each_repo_ref_fn fn, int trim, int flags,
> > +				each_repo_ref_fn fn, int trim,
> > +				struct repository *repo, int flags,
> >  				void *cb_data)
> 
> Confusing.  We are iterating refs that exists in the repository "r",
> right?  Why do we need to have an extra "repo" parameter?  Can they
> ever diverge (beyond repo could be NULL to signal now-lost "include
> broken" bit wanted to convey)?  It's not like a valid caller can
> pass the superproject in 'r' and a submodule in 'repo', right?
> 
> Enhancing an interface this way, and allowing an arbitrary
> repository instance to be passed only to convey one bit of
> information, by adding a "repo" smells like inviting bugs in the
> future.
> 
> I have a feeling that the function signature for this one should
> stay as before, and "repo" should be a local variable that is
> initialized as
> 
> 	struct repository *repo = (flags & DO_FOR_EACH_INCLUDE_BROKEN)
> 				? r
> 				: NULL;
> 
> to avoid such a future bug, but given that there is only one caller
> to this helper, I do not mind
> 
> 	if (repo && r != repo)
> 		BUG(...);
> 
> to catch any such mistake.

(see next answer)

> >  int for_each_replace_ref(struct repository *r, each_repo_ref_fn fn, void *cb_data)
> >  {
> >  	return do_for_each_repo_ref(r, git_replace_ref_base, fn,
> >  				    strlen(git_replace_ref_base),
> > -				    DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
> > +				    NULL, 0, cb_data);
> >  }
> 
> And this is the only such caller, if I am reading the code right.
> 
> Do we ever pass non-NULL "repo" to do_for_each_repo_ref() in future
> steps?
> 
> If not, perhaps we do not even have to add "repo" as a new parameter
> to do_for_each_repo_ref(), and instead always pass NULL down to
> refs_ref_iterator_begin() from do_for_each_repo_ref()?

do_for_each_repo_ref() does not gain future callers in future steps (so
we never pass non-NULL "repo"). I'll do this (and add a comment to
do_for_each_repo_ref() explaining that we do not check for broken refs).

> > diff --git a/refs/files-backend.c b/refs/files-backend.c
> > index 677b7e4cdd..cd145301d0 100644
> > --- a/refs/files-backend.c
> > +++ b/refs/files-backend.c
> > @@ -744,12 +744,6 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
> >  		    ref_type(iter->iter0->refname) != REF_TYPE_PER_WORKTREE)
> >  			continue;
> >  
> > -		if (!(iter->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
> > -		    !ref_resolves_to_object(iter->iter0->refname,
> > -					    iter->iter0->oid,
> > -					    iter->iter0->flags))
> > -			continue;
> > -
> >  		iter->base.refname = iter->iter0->refname;
> >  		iter->base.oid = iter->iter0->oid;
> >  		iter->base.flags = iter->iter0->flags;
> > @@ -801,9 +795,6 @@ static struct ref_iterator *files_ref_iterator_begin(
> >  	struct ref_iterator *ref_iterator;
> >  	unsigned int required_flags = REF_STORE_READ;
> >  
> > -	if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN))
> > -		required_flags |= REF_STORE_ODB;
> > -
> >  	refs = files_downcast(ref_store, required_flags, "ref_iterator_begin");
> >  
> >  	/*
> 
> Hmph, I am not sure where the lossage in these two hunks are
> compensated.  Perhaps in the backend independent layer in
> refs/iterator.c?  Let's read on.

Yes - the first hunk is compensated in the backend independent layer,
and the second hunk was there to ensure that the first hunk would work
(and since the first hunk is removed, the second hunk no longer needs to
be there). I'll add a note in the commit message.

> > @@ -10,7 +10,23 @@
> >  
> >  int ref_iterator_advance(struct ref_iterator *ref_iterator)
> >  {
> > -	return ref_iterator->vtable->advance(ref_iterator);
> > +	int ok;
> > +
> > +	if (ref_iterator->repo && ref_iterator->repo != the_repository)
> 
> OK. refs_ref_interator_begin() assigned the "repo" parameter that
> tells which repository to consult to validate the objects at the tip
> of refs to the .repo member of the iterator object, and we check it
> here.
> 
> It is a bit surprising that ref_iterator does not know which
> repository it is working in (regardless of "include broken" bit).
> Do you think it will stay that way?  I have this nagging feeling
> that it won't, and having "struct repository *repository" pointer
> that always points at the repository the ref-store belongs to in a
> ref_iterator instance would become necessary in the longer run.

I think it's better if it stays that way, so that callers that don't
want to access the ODB (all the callers that currently use
DO_FOR_EACH_INCLUDE_BROKEN) can be assured that the iterator won't do
that. If we had a non-NULL "struct repository *repository" parameter, a
future code change might inadvertently use it, thus causing a bug.

Right now I think that this is possible, since the only other thing that
accesses the ODB is peeling, and that is handled by the next patch
(patch 2/9). If we think that it won't stay that way in the future,
though, then I agree with you.

> In which case, this .repo member this patch adds would become a big
> problem, no?  If we were to validate objects at the tip of the refs
> against object store, we will always use the object store that
> belongs to the iterator->repository, so the only valid states for
> iterator->repo are either NULL or iterator->repository.  That again
> is the same problem I pointed out already about the parameter the
> do_for_each_repo_ref() helper that is inviting future bugs, it seems
> to me.  Wouldn't it make more sense to add
> 
>  * iterator->repository that points at the repository in which we
>    are iterating the refs
> 
>  * a bit in iterator that chooses between "do not bother checking"
>    and "do check the tip of refs against the object store of
>    iterator->repository
> 
> to avoid such a mess?  Perhaps we already have such a bit in the
> flags word in the ref_iterator but I didn't check.

If we need iterator->repository, then I agree with you. The bit could
then be DO_FOR_EACH_INCLUDE_BROKEN (which currently exists, and which I
am removing in this patch, but if we think we should keep it, then we
should keep it).

Having said all that, it may be better to have a non-NULL repository
reference in the iterator and retain DO_FOR_EACH_INCLUDE_BROKEN - at the
very least, this is a more gradual change and still leaves open the
possibility of turning the repository reference into a nullable one.
Callers that use DO_FOR_EACH_INCLUDE_BROKEN will have to deal with an
API that is unclear about what is being done with the repository object
being passed in, but that is the same as the status quo. I'll try it and
see how it goes (it will probably simplify patch 2 too).

Jeff King Sept. 24, 2021, 6:13 p.m. UTC | #3

On Tue, Sep 21, 2021 at 09:51:03AM -0700, Jonathan Tan wrote:

> Currently, ref iterators access the object store each time they advance
> if and only if the boolean flag DO_FOR_EACH_INCLUDE_BROKEN is unset.
> (The iterators access the object store because, if
> DO_FOR_EACH_INCLUDE_BROKEN is unset, they need to attempt to resolve
> each ref to determine that it is not broken.)
> 
> Also, the object store accessed is always that of the_repository, making
> it impossible to iterate over a submodule's refs without
> DO_FOR_EACH_INCLUDE_BROKEN (unless add_submodule_odb() is used).
>
> As a first step in resolving both these problems, replace the
> DO_FOR_EACH_INCLUDE_BROKEN flag with a struct repository pointer. This
> commit is a mechanical conversion - whenever DO_FOR_EACH_INCLUDE_BROKEN
> is set, a NULL repository (representing access to no object store) is
> used instead, and whenever DO_FOR_EACH_INCLUDE_BROKEN is unset, a
> non-NULL repository (representing access to that repository's object
> store) is used instead. Right now, the locations in which
> non-the_repository support needs to be added are marked with BUG()
> statements - in a future patch, these will be replaced. (NEEDSWORK: in
> this RFC patch set, this has not been done)

I think your goal here of passing around a repository object is good.
But rolling the meaning of DO_FOR_EACH_INCLUDE_BROKEN into an implicit
"do we have a non-NULL repository" makes things awkward, I think.

As you noticed, we can't get rid of the flags parameter entirely. We
still have DO_FOR_EACH_PER_WORKTREE_ONLY. But I also have a series which
adds another flag which pairs with INCLUDE_BROKEN. Having half of the
logic implicit in the repository pointer and half in a flag would be
weird.

I'll post that series in a moment, but what I'm wondering here is: would
it be that big a deal to just pass the repository object around, and it
is simply not used if INCLUDE_BROKEN is passed?

-Peff

Jonathan Tan Sept. 24, 2021, 6:28 p.m. UTC | #4

> I think your goal here of passing around a repository object is good.
> But rolling the meaning of DO_FOR_EACH_INCLUDE_BROKEN into an implicit
> "do we have a non-NULL repository" makes things awkward, I think.
> 
> As you noticed, we can't get rid of the flags parameter entirely. We
> still have DO_FOR_EACH_PER_WORKTREE_ONLY. But I also have a series which
> adds another flag which pairs with INCLUDE_BROKEN. Having half of the
> logic implicit in the repository pointer and half in a flag would be
> weird.
> 
> I'll post that series in a moment, but what I'm wondering here is: would
> it be that big a deal to just pass the repository object around, and it
> is simply not used if INCLUDE_BROKEN is passed?

Quoting my response to Junio (sent a few minutes ago, so you might have
not seen it) [1]:

 so that callers that don't
 want to access the ODB (all the callers that currently use
 DO_FOR_EACH_INCLUDE_BROKEN) can be assured that the iterator won't do
 that. If we had a non-NULL "struct repository *repository" parameter, a
 future code change might inadvertently use it, thus causing a bug.

I'll take a look at your series when it comes out, but from what you
say, it looks like we should pass a non-nullable repository and keep the
DO_FOR_EACH_INCLUDE_BROKEN flag. I'll update this patch set to do that.

[1] https://lore.kernel.org/git/20210924175651.2918488-1-jonathantanmy@google.com/

Junio C Hamano Sept. 24, 2021, 7:55 p.m. UTC | #5

Jonathan Tan <jonathantanmy@google.com> writes:

>> It is a bit surprising that ref_iterator does not know which
>> repository it is working in (regardless of "include broken" bit).
>> Do you think it will stay that way?  I have this nagging feeling
>> that it won't, and having "struct repository *repository" pointer
>> that always points at the repository the ref-store belongs to in a
>> ref_iterator instance would become necessary in the longer run.
>
> I think it's better if it stays that way, so that callers that don't
> want to access the ODB (all the callers that currently use
> DO_FOR_EACH_INCLUDE_BROKEN) can be assured that the iterator won't do
> that. If we had a non-NULL "struct repository *repository" parameter, a
> future code change might inadvertently use it, thus causing a bug.

Hmph, I am unlikely to be the one who is doing such a future code
change, but anybody who equates having a pointer to the repository
structure and the need to always validate the tips of refs with the
object store associated to the repository would be poor future
developers we wouldn't want their hands on our codebase.

An in-core repository has a lot more than the object store.

Besides, if we really want to have choice between "do check them"
and "ignore broken" expressed cleanly in the code, it would be much
better to be explicit about it, and a member in the context struct
whose name is ".repo" is not it[*].  A reader would say "ok, I see a
repo.  What is that thing for?  Can I reliably use it to figure out
other things about the repository this reference enumeration is
going on from it?  Ah, no, it sometimes is NULL---what crazy thing
is going on here???".

	Side note.  If it were called
	.check_ref_tips_against_the_object_store_of_this_repository,
	it would be a different story.

>> In which case, this .repo member this patch adds would become a big
>> problem, no?  If we were to validate objects at the tip of the refs
>> against object store, we will always use the object store that
>> belongs to the iterator->repository, so the only valid states for
>> iterator->repo are either NULL or iterator->repository.  That again
>> is the same problem I pointed out already about the parameter the
>> do_for_each_repo_ref() helper that is inviting future bugs, it seems
>> to me.  Wouldn't it make more sense to add
>> 
>>  * iterator->repository that points at the repository in which we
>>    are iterating the refs
>> 
>>  * a bit in iterator that chooses between "do not bother checking"
>>    and "do check the tip of refs against the object store of
>>    iterator->repository
>> 
>> to avoid such a mess?  Perhaps we already have such a bit in the
>> flags word in the ref_iterator but I didn't check.
>
> If we need iterator->repository, then I agree with you. The bit could
> then be DO_FOR_EACH_INCLUDE_BROKEN (which currently exists, and which I
> am removing in this patch, but if we think we should keep it, then we
> should keep it).

I do not care too much about the bit itself.  I have huge trouble
with the idea that representing a single bit with an entire pointer
to a repository, which can cause confusion down the line.  Those who
want to have an access to the repository does not have a reliable
way to get to it, those who do set it can mistakenly set to point at
an unrelated repository.

> Having said all that, it may be better to have a non-NULL repository
> reference in the iterator and retain DO_FOR_EACH_INCLUDE_BROKEN - at the

Yes.

> very least, this is a more gradual change and still leaves open the
> possibility of turning the repository reference into a nullable one.
> Callers that use DO_FOR_EACH_INCLUDE_BROKEN will have to deal with an
> API that is unclear about what is being done with the repository object
> being passed in, but that is the same as the status quo. I'll try it and
> see how it goes (it will probably simplify patch 2 too).

I do not think a structure member of type "struct repository" that
signals anything but "we are not operating in a repository" by being
NULL is a sane interface, so I do not see any positive value in
leaving opent he possibility at all.  The next person who would want
to _optionally_ use a repository for some other purpose (other than
"checking the validity of the object name") may be tempted to add
another member .repo2 next to your .repo---and that would be insane,
given that ref iterator will be iterating over a ref store of a
single repo at any given time.  It is much saner to have a single
"this is the repository the refstore we are iterating over is
attached to" instance, with separate bits "please do validate" and
"do whatever check the second develoepr wanted to signal the need
for with the .repo2 member".

Thanks.

[1/9] refs: make _advance() check struct repo, not flag

Commit Message

Comments

Patch