diff mbox series

[v10,02/12] Iterate over the "refs/" namespace in for_each_[raw]ref

Message ID 45fd65f72e097dcabba6ea15b1d54c85e7271593.1588018418.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Reftable support git-core | expand

Commit Message

John Passaro via GitGitGadget April 27, 2020, 8:13 p.m. UTC
From: Han-Wen Nienhuys <hanwen@google.com>

This happens implicitly in the files/packed ref backend; making it
explicit simplifies adding alternate ref storage backends, such as
reftable.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Emily Shaffer April 30, 2020, 9:17 p.m. UTC | #1
On Mon, Apr 27, 2020 at 08:13:28PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:
> 
> 
> This happens implicitly in the files/packed ref backend; making it
> explicit simplifies adding alternate ref storage backends, such as
> reftable.

As an outsider to this part of the codebase, a little more explanation
could be handy in the commit message. I found the backends you mentioned
in refs, and it seems like they're the only two, but it's not obvious
how this delta is related to those backends. Furthermore, grepping looks
like this function whose behavior is changing is being called from
somewhere else, with no change to that function (and it looks like the
callsite's callback doesn't check whether a ref begins with refs/ or
not).

All this to say - it's hard to convince myself this is a safe change,
and I'd really like to read more to understand why you made it.

> 
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>  refs.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/refs.c b/refs.c
> index b8759116cd0..05e05579408 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1569,7 +1569,7 @@ static int do_for_each_ref(struct ref_store *refs, const char *prefix,
>  
>  int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
>  {
> -	return do_for_each_ref(refs, "", fn, 0, 0, cb_data);
> +	return do_for_each_ref(refs, "refs/", fn, 0, 0, cb_data);
>  }
>  
>  int for_each_ref(each_ref_fn fn, void *cb_data)
> @@ -1629,8 +1629,8 @@ int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
>  
>  int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
>  {
> -	return do_for_each_ref(refs, "", fn, 0,
> -			       DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
> +	return do_for_each_ref(refs, "refs/", fn, 0, DO_FOR_EACH_INCLUDE_BROKEN,
> +			       cb_data);
>  }
>  
>  int for_each_rawref(each_ref_fn fn, void *cb_data)
> -- 
> gitgitgadget
>
Han-Wen Nienhuys May 4, 2020, 6:03 p.m. UTC | #2
On Thu, Apr 30, 2020 at 11:17 PM Emily Shaffer <emilyshaffer@google.com> wrote:
>
> On Mon, Apr 27, 2020 at 08:13:28PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:
> >
> >
> > This happens implicitly in the files/packed ref backend; making it
> > explicit simplifies adding alternate ref storage backends, such as
> > reftable.
>
> As an outsider to this part of the codebase, a little more explanation
> could be handy in the commit message. I found the backends you mentioned
> in refs, and it seems like they're the only two, but it's not obvious
> how this delta is related to those backends. Furthermore, grepping looks
> like this function whose behavior is changing is being called from
> somewhere else, with no change to that function (and it looks like the
> callsite's callback doesn't check whether a ref begins with refs/ or
> not).
>
> All this to say - it's hard to convince myself this is a safe change,
> and I'd really like to read more to understand why you made it.

I'll be the first to admit that I'm on shaky ground here. However,
given that the test suite passes, if this is breaking some behavior,
it's probably not very well tested behavior.

Here is what I know:

Git stores refs in multiple places:
- normal refs, packed: .git/packed-refs
- normal refs, loose: under refs/*/
- special refs: HEAD, ORIG_HEAD, REBASE_HEAD.

Currently, the special refs can only be read with refs_read_raw_ref().
If you iterate over the refs in the files/packed backend, you can
never find HEAD, ORIG_HEAD etc.

Reftable does not have different classes of ref storage, so this means
that the whole space of refs (including HEAD) is managed by the
reftable backend, and if you iterate over the refs, reftable will also
produce HEAD. This is not spelled out in the spec, but JGit actually
stores HEAD in reftable too, and we want to be interoperable.

Without this patch, commands like "git show-ref" will produce an entry
"HEAD", which is a regression.

With this patch, the default iteration is limited to the "refs/"
prefix, so we don't produce HEAD in the reftable backend by default.

Now, I have several questions:

* how does this interact with worktrees? It seems that there is a
special worktree namespace?
* if you are doing a rebase, and have unreachable objects in
ORIG_HEAD, REBASE_HEAD, how does git-gc ensure that these objects stay
alive? I'd think that you need to iterate over all entries (including
REBASE_HEAD and friends), but I haven't been able to understand how
that works.
Han-Wen Nienhuys May 5, 2020, 6:26 p.m. UTC | #3
On Mon, May 4, 2020 at 8:03 PM Han-Wen Nienhuys <hanwen@google.com> wrote:
> >
> > All this to say - it's hard to convince myself this is a safe change,
> > and I'd really like to read more to understand why you made it.
>
> I'll be the first to admit that I'm on shaky ground here. However,
> given that the test suite passes, if this is breaking some behavior,
> it's probably not very well tested behavior.
>
> Here is what I know:
>
> Git stores refs in multiple places:
> - normal refs, packed: .git/packed-refs
> - normal refs, loose: under refs/*/
> - special refs: HEAD, ORIG_HEAD, REBASE_HEAD.
>
> Currently, the special refs can only be read with refs_read_raw_ref().
> If you iterate over the refs in the files/packed backend, you can
> never find HEAD, ORIG_HEAD etc.

A large source of test failures of the reftable patch series is the
handling of pseudo refs (HEAD, CHERRY_PICK_HEAD etc.)

These are written as

  if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
    assert(refs == get_main_ref_store(the_repository));
    ret = write_pseudoref(refname, new_oid, old_oid, &err);
  } else {
    t = ref_store_transaction_begin(refs, &err);
    ..

write_pseudoref writes a file directly to the .git/ directory.

However, on read (for example, CHERRY_PICK_HEAD), the pseudo ref is
given to lookup_commit_reference_by_name, which in the end calls
refs_resolve_ref_unsafe(). If reftable is active, the pseudo ref isn't
read from .git/CHERRY_PICK_HEAD.

If the pseudo refs are read through a ref backend, shouldn't they be
written through a ref backend too?

This causes all rebase and all cherry-pick tests to fail with reftable.
diff mbox series

Patch

diff --git a/refs.c b/refs.c
index b8759116cd0..05e05579408 100644
--- a/refs.c
+++ b/refs.c
@@ -1569,7 +1569,7 @@  static int do_for_each_ref(struct ref_store *refs, const char *prefix,
 
 int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0, 0, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, 0, cb_data);
 }
 
 int for_each_ref(each_ref_fn fn, void *cb_data)
@@ -1629,8 +1629,8 @@  int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
 
 int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0,
-			       DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, DO_FOR_EACH_INCLUDE_BROKEN,
+			       cb_data);
 }
 
 int for_each_rawref(each_ref_fn fn, void *cb_data)