[v2,0/2] refs: introduce reftable backend

Message ID	cover.1706862705.git.ps@pks.im (mailing list archive)
Headers	show Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7D0965FB8A for <git@vger.kernel.org>; Fri, 2 Feb 2024 08:38:30 +0000 (UTC) Feedback-ID: i197146af:Fastmail Date: Fri, 2 Feb 2024 09:38:25 +0100 From: Patrick Steinhardt <ps@pks.im> To: git@vger.kernel.org Cc: Junio C Hamano <gitster@pobox.com>, Han-Wen Nienhuys <hanwen@google.com>, Karthik Nayak <karthik.188@gmail.com> Subject: [PATCH v2 0/2] refs: introduce reftable backend Message-ID: <cover.1706862705.git.ps@pks.im> References: <cover.1706601199.git.ps@pks.im> Precedence: bulk MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="GuXJXIQdxPEWftCR" Content-Disposition: inline In-Reply-To: <cover.1706601199.git.ps@pks.im>
Series	refs: introduce reftable backend \| expand [v2,0/2] refs: introduce reftable backend [v2,1/2] refs: introduce reftable backend [v2,2/2] ci: add jobs to test with the reftable backend

Patrick Steinhardt Feb. 2, 2024, 8:38 a.m. UTC

Hi,

this is the second version of my patch series that introduces the
reftable backend.

This version addresses the feedback by Karthik. I don't think it really
makes sense to spell out every change here -- the range diff should
likely be easier to digest.

Thanks!

Patrick

Patrick Steinhardt (2):
  refs: introduce reftable backend
  ci: add jobs to test with the reftable backend

 .github/workflows/main.yml                    |    9 +
 .gitlab-ci.yml                                |    9 +
 Documentation/ref-storage-format.txt          |    2 +
 .../technical/repository-version.txt          |    5 +-
 Makefile                                      |    1 +
 ci/lib.sh                                     |    2 +-
 ci/run-build-and-tests.sh                     |    3 +
 contrib/workdir/git-new-workdir               |    2 +-
 path.c                                        |    2 +-
 path.h                                        |    1 +
 refs.c                                        |    1 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 2295 +++++++++++++++++
 repository.h                                  |    5 +-
 t/t0610-reftable-basics.sh                    |  887 +++++++
 t/t0611-reftable-httpd.sh                     |   26 +
 t/test-lib.sh                                 |    2 +
 17 files changed, 3246 insertions(+), 7 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0610-reftable-basics.sh
 create mode 100755 t/t0611-reftable-httpd.sh

Range-diff against v1:
1:  5598cd1307 ! 1:  d6548dcfad refs: introduce reftable backend
    @@ Commit message
             previously been deleting reflogs together with their refs to avoid
             file/directory conflicts, which is not necessary anymore.
     
    +      - We can properly enumerate all refs. With the "files" backend it is
    +        not easily possible to distinguish between refs and non-refs because
    +        they may live side by side in the gitdir.
    +
         Not all of these improvements are realized with the current "reftable"
         backend implementation. At this point, the new backend is supposed to be
         a drop-in replacement for the "files" backend that is used by basically
    @@ Commit message
             workloads we would likely also want to perform pack loose refs,
             which would likely change the picture.
     
    -            Benchmark 1: update-ref: create refs sequentially (refformat = files)
    +            Benchmark 1: update-ref: create refs sequentially (refformat = files, refcount = 1)
                   Time (mean ± σ):       2.1 ms ±   0.3 ms    [User: 0.6 ms, System: 1.7 ms]
                   Range (min … max):     1.8 ms …   4.3 ms    133 runs
     
    -            Benchmark 2: update-ref: create refs sequentially (refformat = reftable)
    +            Benchmark 2: update-ref: create refs sequentially (refformat = reftable, refcount = 1)
                   Time (mean ± σ):       2.7 ms ±   0.1 ms    [User: 0.6 ms, System: 2.2 ms]
                   Range (min … max):     2.4 ms …   2.9 ms    132 runs
     
    -            Benchmark 3: update-ref: create refs sequentially (refformat = files)
    +            Benchmark 3: update-ref: create refs sequentially (refformat = files, refcount = 1000)
                   Time (mean ± σ):      1.975 s ±  0.006 s    [User: 0.437 s, System: 1.535 s]
                   Range (min … max):    1.969 s …  1.980 s    3 runs
     
    -            Benchmark 4: update-ref: create refs sequentially (refformat = reftable)
    +            Benchmark 4: update-ref: create refs sequentially (refformat = reftable, refcount = 1000)
                   Time (mean ± σ):      2.611 s ±  0.013 s    [User: 0.782 s, System: 1.825 s]
                   Range (min … max):    2.597 s …  2.622 s    3 runs
     
    -            Benchmark 5: update-ref: create refs sequentially (refformat = files)
    +            Benchmark 5: update-ref: create refs sequentially (refformat = files, refcount = 100000)
                   Time (mean ± σ):     198.442 s ±  0.241 s    [User: 43.051 s, System: 155.250 s]
                   Range (min … max):   198.189 s … 198.670 s    3 runs
     
    -            Benchmark 6: update-ref: create refs sequentially (refformat = reftable)
    +            Benchmark 6: update-ref: create refs sequentially (refformat = reftable, refcount = 100000)
                   Time (mean ± σ):     294.509 s ±  4.269 s    [User: 104.046 s, System: 190.326 s]
                   Range (min … max):   290.223 s … 298.761 s    3 runs
     
    @@ refs/reftable-backend.c (new)
     +#include "../git-compat-util.h"
     +#include "../abspath.h"
     +#include "../chdir-notify.h"
    -+#include "../config.h"
     +#include "../environment.h"
     +#include "../gettext.h"
     +#include "../hash.h"
    @@ refs/reftable-backend.c (new)
     +#include "../reftable/reftable-stack.h"
     +#include "../reftable/reftable-record.h"
     +#include "../reftable/reftable-error.h"
    -+#include "../reftable/reftable-blocksource.h"
    -+#include "../reftable/reftable-reader.h"
     +#include "../reftable/reftable-iterator.h"
     +#include "../reftable/reftable-merged.h"
    -+#include "../reftable/reftable-generic.h"
     +#include "../setup.h"
     +#include "../strmap.h"
    -+#include "../worktree.h"
     +#include "refs-internal.h"
     +
     +/*
    @@ refs/reftable-backend.c (new)
     +struct reftable_ref_store {
     +	struct ref_store base;
     +
    ++	/*
    ++	 * The main stack refers to the common dir and thus contains common
    ++	 * refs as well as refs of the main repository.
    ++	 */
     +	struct reftable_stack *main_stack;
    ++	/*
    ++	 * The worktree stack refers to the gitdir in case the refdb is opened
    ++	 * via a worktree. It thus contains the per-worktree refs.
    ++	 */
     +	struct reftable_stack *worktree_stack;
    ++	/*
    ++	 * Map of worktree stacks by their respective worktree names. The map
    ++	 * is populated lazily when we try to resolve `worktrees/$worktree` refs.
    ++	 */
     +	struct strmap worktree_stacks;
     +	struct reftable_write_options write_options;
     +
    @@ refs/reftable-backend.c (new)
     +	/*
     +	 * Set up the main reftable stack that is hosted in GIT_COMMON_DIR.
     +	 * This stack contains both the shared and the main worktree refs.
    ++	 *
    ++	 * Note that we don't try to resolve the path in case we have a
    ++	 * worktree because `get_common_dir_noenv()` already does it for us.
     +	 */
     +	is_worktree = get_common_dir_noenv(&path, gitdir);
     +	if (!is_worktree) {
    @@ refs/reftable-backend.c (new)
     +	/*
     +	 * If we're in a worktree we also need to set up the worktree reftable
     +	 * stack that is contained in the per-worktree GIT_DIR.
    ++	 *
    ++	 * Ideally, we would also add the stack to our worktree stack map. But
    ++	 * we have no way to figure out the worktree name here and thus can't
    ++	 * do it efficiently.
     +	 */
     +	if (is_worktree) {
     +		strbuf_reset(&path);
    @@ refs/reftable-backend.c (new)
     +		}
     +
     +		 /*
    -+		  * Otherwise, if we either have no worktree refs anymore or if
    -+		  * the common ref sorts before the next worktree ref, we need
    -+		  * to figure out whether the next common ref belongs to the
    -+		  * main worktree. In that case, it should be ignored.
    ++		  * We now know that the lexicographically-next ref is a common
    ++		  * ref. When the common ref is a shared one we return it.
     +		  */
     +		if (parse_worktree_ref(iter_common->refname, NULL, NULL,
     +				       NULL) == REF_WORKTREE_SHARED)
     +			return ITER_SELECT_1;
     +
    ++		/*
    ++		 * Otherwise, if the common ref is a per-worktree ref we skip
    ++		 * it because it would belong to the main worktree, not ours.
    ++		 */
     +		return ITER_SKIP_1;
     +	} else {
     +		return ITER_DONE;
    @@ refs/reftable-backend.c (new)
     +		}
     +
     +		if (u->type & REF_ISSYMREF) {
    ++			/*
    ++			 * The reftable stack is locked at this point already,
    ++			 * so it is safe to call `refs_resolve_ref_unsafe()`
    ++			 * here without causing races.
    ++			 */
     +			const char *resolved = refs_resolve_ref_unsafe(&refs->base, u->refname, 0,
     +								       &current_oid, NULL);
     +
     +			if (u->flags & REF_NO_DEREF) {
    -+				/*
    -+				 * The reftable stack is locked at this point
    -+				 * already, so it should be safe to call
    -+				 * `refs_resolve_ref_unsafe()` here.
    -+				 */
     +				if (u->flags & REF_HAVE_OLD && !resolved) {
     +					strbuf_addf(err, _("cannot lock ref '%s': "
     +						    "error reading reference"), u->refname);
    @@ refs/reftable-backend.c (new)
     +				ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
     +				tombstone = &logs[logs_nr++];
     +				tombstone->refname = xstrdup(u->refname);
    -+				tombstone->value_type = REFTABLE_LOG_DELETION,
    ++				tombstone->value_type = REFTABLE_LOG_DELETION;
     +				tombstone->update_index = log.update_index;
     +			}
     +
    @@ refs/reftable-backend.c (new)
     +
     +	/*
     +	 * When deleting the old reference we have to use two update indices:
    -+	 * one to delete the old ref and its reflog, and once to create the new
    -+	 * ref and its reflog. They need to be staged with two separate indices
    -+	 * because the new reflog needs to encode both the deletion of the old
    -+	 * branch and the creation of the new branch, and we cannot do two
    -+	 * changes to a reflog in a single update.
    ++	 * once to delete the old ref and its reflog, and once to create the
    ++	 * new ref and its reflog. They need to be staged with two separate
    ++	 * indices because the new reflog needs to encode both the deletion of
    ++	 * the old branch and the creation of the new branch, and we cannot do
    ++	 * two changes to a reflog in a single update.
     +	 */
     +	deletion_ts = creation_ts = reftable_stack_next_update_index(arg->stack);
     +	if (arg->delete_old)
    @@ refs/reftable-backend.c (new)
     +	struct reftable_reflog_iterator *iter =
     +		(struct reftable_reflog_iterator *)ref_iterator;
     +
    -+	if (iter->err) {
    -+		ref_iterator_abort(ref_iterator);
    -+		return iter->err;
    -+	}
    -+
    -+	while (1) {
    -+		int flags, ret;
    ++	while (!iter->err) {
    ++		int flags;
     +
    -+		ret = reftable_iterator_next_log(&iter->iter, &iter->log);
    -+		if (ret < 0) {
    -+			ref_iterator_abort(ref_iterator);
    -+			return ITER_ERROR;
    -+		}
    -+		if (ret > 0) {
    -+			if (ref_iterator_abort(ref_iterator) != ITER_DONE)
    -+				return ITER_ERROR;
    -+			return ITER_DONE;
    -+		}
    ++		iter->err = reftable_iterator_next_log(&iter->iter, &iter->log);
    ++		if (iter->err)
    ++			break;
     +
     +		/*
     +		 * We want the refnames that we have reflogs for, so we skip if
    @@ refs/reftable-backend.c (new)
     +		iter->base.refname = iter->log.refname;
     +		iter->base.oid = &iter->oid;
     +		iter->base.flags = flags;
    -+		return ITER_OK;
    ++
    ++		break;
     +	}
    ++
    ++	if (iter->err > 0) {
    ++		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
    ++			return ITER_ERROR;
    ++		return ITER_DONE;
    ++	}
    ++
    ++	if (iter->err < 0) {
    ++		ref_iterator_abort(ref_iterator);
    ++		return ITER_ERROR;
    ++	}
    ++
    ++	return ITER_OK;
     +}
     +
     +static int reftable_reflog_iterator_peel(struct ref_iterator *ref_iterator,
    @@ refs/reftable-backend.c (new)
     +static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftable_ref_store *refs,
     +								  struct reftable_stack *stack)
     +{
    ++	struct reftable_merged_table *merged_table;
     +	struct reftable_reflog_iterator *iter;
    -+	struct reftable_merged_table *mt;
     +	int ret;
     +
     +	iter = xcalloc(1, sizeof(*iter));
    @@ refs/reftable-backend.c (new)
     +	iter->refs = refs;
     +	iter->base.oid = &iter->oid;
     +
    ++	ret = refs->err;
    ++	if (ret)
    ++		goto done;
    ++
     +	ret = reftable_stack_reload(refs->main_stack);
     +	if (ret < 0)
     +		goto done;
     +
    -+	mt = reftable_stack_merged_table(stack);
    -+	ret = reftable_merged_table_seek_log(mt, &iter->iter, "");
    ++	merged_table = reftable_stack_merged_table(stack);
    ++
    ++	ret = reftable_merged_table_seek_log(merged_table, &iter->iter, "");
     +	if (ret < 0)
     +		goto done;
     +
    @@ refs/reftable-backend.c (new)
     +					iterator_select, NULL);
     +}
     +
    ++static int yield_log_record(struct reftable_log_record *log,
    ++			    each_reflog_ent_fn fn,
    ++			    void *cb_data)
    ++{
    ++	struct object_id old_oid, new_oid;
    ++	const char *full_committer;
    ++
    ++	oidread(&old_oid, log->value.update.old_hash);
    ++	oidread(&new_oid, log->value.update.new_hash);
    ++
    ++	/*
    ++	 * When both the old object ID and the new object ID are null
    ++	 * then this is the reflog existence marker. The caller must
    ++	 * not be aware of it.
    ++	 */
    ++	if (is_null_oid(&old_oid) && is_null_oid(&new_oid))
    ++		return 0;
    ++
    ++	full_committer = fmt_ident(log->value.update.name, log->value.update.email,
    ++				   WANT_COMMITTER_IDENT, NULL, IDENT_NO_DATE);
    ++	return fn(&old_oid, &new_oid, full_committer,
    ++		  log->value.update.time, log->value.update.tz_offset,
    ++		  log->value.update.message, cb_data);
    ++}
    ++
     +static int reftable_be_for_each_reflog_ent_reverse(struct ref_store *ref_store,
     +						   const char *refname,
     +						   each_reflog_ent_fn fn,
    @@ refs/reftable-backend.c (new)
     +	mt = reftable_stack_merged_table(stack);
     +	ret = reftable_merged_table_seek_log(mt, &it, refname);
     +	while (!ret) {
    -+		struct object_id old_oid, new_oid;
    -+		const char *full_committer;
    -+
     +		ret = reftable_iterator_next_log(&it, &log);
     +		if (ret < 0)
     +			break;
    @@ refs/reftable-backend.c (new)
     +			break;
     +		}
     +
    -+		oidread(&old_oid, log.value.update.old_hash);
    -+		oidread(&new_oid, log.value.update.new_hash);
    -+
    -+		/*
    -+		 * When both the old object ID and the new object ID are null
    -+		 * then this is the reflog existence marker. The caller must
    -+		 * not be aware of it.
    -+		 */
    -+		if (is_null_oid(&old_oid) && is_null_oid(&new_oid))
    -+			continue;
    -+
    -+		full_committer = fmt_ident(log.value.update.name, log.value.update.email,
    -+					   WANT_COMMITTER_IDENT, NULL, IDENT_NO_DATE);
    -+		ret = fn(&old_oid, &new_oid, full_committer,
    -+			 log.value.update.time, log.value.update.tz_offset,
    -+			 log.value.update.message, cb_data);
    ++		ret = yield_log_record(&log, fn, cb_data);
     +		if (ret)
     +			break;
     +	}
    @@ refs/reftable-backend.c (new)
     +
     +		ret = reftable_iterator_next_log(&it, &log);
     +		if (ret < 0)
    -+			break;
    ++			goto done;
     +		if (ret > 0 || strcmp(log.refname, refname)) {
     +			reftable_log_record_release(&log);
     +			ret = 0;
    @@ refs/reftable-backend.c (new)
     +	}
     +
     +	for (i = logs_nr; i--;) {
    -+		struct reftable_log_record *log = &logs[i];
    -+		struct object_id old_oid, new_oid;
    -+		const char *full_committer = "";
    -+
    -+		oidread(&old_oid, log->value.update.old_hash);
    -+		oidread(&new_oid, log->value.update.new_hash);
    -+
    -+		/*
    -+		 * When both the old object ID and the new object ID are null
    -+		 * then this is the reflog existence marker. The caller must
    -+		 * not be aware of it.
    -+		 */
    -+		if (is_null_oid(&old_oid) && is_null_oid(&new_oid))
    -+			continue;
    -+
    -+		full_committer = fmt_ident(log->value.update.name, log->value.update.email,
    -+					   WANT_COMMITTER_IDENT, NULL, IDENT_NO_DATE);
    -+		ret = fn(&old_oid, &new_oid, full_committer,
    -+			 log->value.update.time, log->value.update.tz_offset,
    -+			 log->value.update.message, cb_data);
    ++		ret = yield_log_record(&logs[i], fn, cb_data);
     +		if (ret)
    -+			break;
    ++			goto done;
     +	}
     +
    ++done:
     +	reftable_iterator_destroy(&it);
     +	for (i = 0; i < logs_nr; i++)
     +		reftable_log_record_release(&logs[i]);
    @@ refs/reftable-backend.c (new)
     +		goto done;
     +
     +	ret = reftable_stack_reload(stack);
    -+	if (ret)
    ++	if (ret < 0)
     +		goto done;
     +
     +	ret = reftable_merged_table_seek_log(mt, &it, refname);
    -+	if (ret)
    ++	if (ret < 0)
     +		goto done;
     +
     +	/*
    -+	 * Seek the reflog to see whether it contains any reflog entries which
    -+	 * aren't marked for deletion.
    ++	 * Check whether we get at least one log record for the given ref name.
    ++	 * If so, the reflog exists, otherwise it doesn't.
     +	 */
    -+	while (1) {
    -+		ret = reftable_iterator_next_log(&it, &log);
    -+		if (ret < 0)
    -+			goto done;
    -+		if (ret > 0 || strcmp(log.refname, refname)) {
    -+			ret = 0;
    -+			goto done;
    -+		}
    -+
    -+		ret = 1;
    -+		break;
    ++	ret = reftable_iterator_next_log(&it, &log);
    ++	if (ret < 0)
    ++		goto done;
    ++	if (ret > 0) {
    ++		ret = 0;
    ++		goto done;
     +	}
     +
    ++	ret = strcmp(log.refname, refname) == 0;
    ++
     +done:
     +	reftable_iterator_destroy(&it);
     +	reftable_log_record_release(&log);
    @@ refs/reftable-backend.c (new)
     +			break;
     +		}
     +
    -+		tombstone.refname = (char *)arg->refname,
    -+		tombstone.value_type = REFTABLE_LOG_DELETION,
    ++		tombstone.refname = (char *)arg->refname;
    ++		tombstone.value_type = REFTABLE_LOG_DELETION;
     +		tombstone.update_index = log.update_index;
     +
     +		ret = reftable_writer_add_log(writer, &tombstone);
2:  bb694fa132 = 2:  63eafc9f5b ci: add jobs to test with the reftable backend

karthik nayak Feb. 2, 2024, 1:02 p.m. UTC | #1

Hello,

Patrick Steinhardt <ps@pks.im> writes:
> Hi,
>
> this is the second version of my patch series that introduces the
> reftable backend.
>

The diff looks good to, apart from the tests (which I didn't get around
to reviewing), everything else looks great!

Thanks,
Karthik

Junio C Hamano Feb. 3, 2024, 8:41 p.m. UTC | #2

Patrick Steinhardt <ps@pks.im> writes:

> this is the second version of my patch series that introduces the
> reftable backend.
>
> This version addresses the feedback by Karthik. I don't think it really
> makes sense to spell out every change here -- the range diff should
> likely be easier to digest.

This, when merged to 'seen' (which also has "for-each-ref now thinks
an empty string is a signal to show ref-like things outside the
refs/ hierarchy" topic, kn/for-all-refs), seems to break *-reftable
CI tests, cf.  https://github.com/git/git/actions/runs/7765401528

I'll tentatively eject the topic from 'seen', even though I have a
suspicion that it byitself would pass all the tests.

Thanks.

Patrick Steinhardt Feb. 4, 2024, 6 a.m. UTC | #3

On Sat, Feb 03, 2024 at 12:41:01PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > this is the second version of my patch series that introduces the
> > reftable backend.
> >
> > This version addresses the feedback by Karthik. I don't think it really
> > makes sense to spell out every change here -- the range diff should
> > likely be easier to digest.
> 
> This, when merged to 'seen' (which also has "for-each-ref now thinks
> an empty string is a signal to show ref-like things outside the
> refs/ hierarchy" topic, kn/for-all-refs), seems to break *-reftable
> CI tests, cf.  https://github.com/git/git/actions/runs/7765401528
> 
> I'll tentatively eject the topic from 'seen', even though I have a
> suspicion that it byitself would pass all the tests.
> 
> Thanks.

Yup, this is known due to the semantic conflict with kn/for-all-refs, as
you point out. Karthik already sent a single-line change in this thread
that fixes the issue. I'll rebase the patch series tomorrow and pull in
the topic as a dependency and adapt accordingly.

Thanks!

Patrick

[v2,0/2] refs: introduce reftable backend

Message

Comments