Message ID | ca3eab99f7ef86d1b7a5b4d4bdb8d2b0a55566e1.1726578382.git.ps@pks.im (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | reftable: graceful concurrent writes | expand |
Patrick Steinhardt <ps@pks.im> writes: [snip] > +reftable.lockTimeout:: > + Whenever the reftable backend appends a new table to the stack, it has > + to lock the central "tables.list" file before updating it. This config > + controls how long the process will wait to acquire the lock in case > + another process has already acquired it. Default is 1000 (i.e., retry > + for 1 second). Isn't the default 100ms? As that was what was mentioned in the commit message [snip] > +test_expect_success 'ref transaction: retry acquiring tables.list lock' ' > + test_when_finished "rm -rf repo" && > + git init repo && > + ( > + cd repo && > + test_commit initial && > + LOCK=.git/reftable/tables.list.lock && > + >$LOCK && > + { > + ( sleep 1 && rm -f $LOCK ) & > + } && > + git -c reftable.lockTimeout=5000 update-ref refs/heads/branch HEAD > + ) > +' Nit: This does stall the test for 1s. Which is slightly annoying when running single tests locally. Couldn't we achieve this by doing `sleep 0.1`?
On Tue, Sep 17, 2024 at 1:46 PM karthik nayak <karthik.188@gmail.com> wrote: > Patrick Steinhardt <ps@pks.im> writes: > > +test_expect_success 'ref transaction: retry acquiring tables.list lock' ' > > + test_when_finished "rm -rf repo" && > > + git init repo && > > + ( > > + cd repo && > > + test_commit initial && > > + LOCK=.git/reftable/tables.list.lock && > > + >$LOCK && > > + { > > + ( sleep 1 && rm -f $LOCK ) & > > + } && > > + git -c reftable.lockTimeout=5000 update-ref refs/heads/branch HEAD > > + ) > > +' > > Nit: This does stall the test for 1s. Which is slightly annoying when > running single tests locally. Couldn't we achieve this by doing `sleep > 0.1`? `sleep 0.1` is neither POSIX nor portable.
On Tue, Sep 17, 2024 at 01:50:27PM -0400, Eric Sunshine wrote: > On Tue, Sep 17, 2024 at 1:46 PM karthik nayak <karthik.188@gmail.com> wrote: > > Patrick Steinhardt <ps@pks.im> writes: > > > +test_expect_success 'ref transaction: retry acquiring tables.list lock' ' > > > + test_when_finished "rm -rf repo" && > > > + git init repo && > > > + ( > > > + cd repo && > > > + test_commit initial && > > > + LOCK=.git/reftable/tables.list.lock && > > > + >$LOCK && > > > + { > > > + ( sleep 1 && rm -f $LOCK ) & > > > + } && > > > + git -c reftable.lockTimeout=5000 update-ref refs/heads/branch HEAD > > > + ) > > > +' > > > > Nit: This does stall the test for 1s. Which is slightly annoying when > > running single tests locally. Couldn't we achieve this by doing `sleep > > 0.1`? > > `sleep 0.1` is neither POSIX nor portable. The above test also verifies that the timeout can in fact be overridden. If we only had `sleep 0.1` we wouldn't be able to reliably verify that, as the default is a timeout of 100ms. We also have essentially the same tests in t0601 for packed-refs, I mostly just copied the logic over from there. It's not exactly pretty, and I'm not a huge fan of introducing timing sensitive tests, but in the end I guess it's okayish. Patrick
On Tue, Sep 17, 2024 at 01:46:39PM -0400, karthik nayak wrote: > Patrick Steinhardt <ps@pks.im> writes: > > [snip] > > > +reftable.lockTimeout:: > > + Whenever the reftable backend appends a new table to the stack, it has > > + to lock the central "tables.list" file before updating it. This config > > + controls how long the process will wait to acquire the lock in case > > + another process has already acquired it. Default is 1000 (i.e., retry > > + for 1 second). > > Isn't the default 100ms? As that was what was mentioned in the commit > message Oh, yeah. I ended up changing it. 1 second is the default timeout used by packed-refs, 100ms is the default timeout used by loose refs. The reason why I ultimately picked 100ms over 1s is that our usecase is closer to loose refs, because it is our "normal" code path that we hit whenever we write reftables. Will fix. Patrick
diff --git a/Documentation/config/reftable.txt b/Documentation/config/reftable.txt index 05157279778..6c9449d0232 100644 --- a/Documentation/config/reftable.txt +++ b/Documentation/config/reftable.txt @@ -46,3 +46,10 @@ reftable.geometricFactor:: By default, the geometric sequence uses a factor of 2, meaning that for any table, the next-biggest table must at least be twice as big. A maximum factor of 256 is supported. + +reftable.lockTimeout:: + Whenever the reftable backend appends a new table to the stack, it has + to lock the central "tables.list" file before updating it. This config + controls how long the process will wait to acquire the lock in case + another process has already acquired it. Default is 1000 (i.e., retry + for 1 second). diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c index 1c4b19e737f..e90ddfb98dd 100644 --- a/refs/reftable-backend.c +++ b/refs/reftable-backend.c @@ -256,6 +256,9 @@ static int reftable_be_config(const char *var, const char *value, if (factor > UINT8_MAX) die("reftable geometric factor cannot exceed %u", (unsigned)UINT8_MAX); opts->auto_compaction_factor = factor; + } else if (!strcmp(var, "reftable.locktimeout")) { + unsigned long lock_timeout = git_config_ulong(var, value, ctx->kvi); + opts->lock_timeout_ms = lock_timeout; } return 0; @@ -281,6 +284,7 @@ static struct ref_store *reftable_be_init(struct repository *repo, refs->write_options.default_permissions = calc_shared_perm(0666 & ~mask); refs->write_options.disable_auto_compact = !git_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1); + refs->write_options.lock_timeout_ms = 100; git_config(reftable_be_config, &refs->write_options); diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h index 189b1f4144f..d5f2446220b 100644 --- a/reftable/reftable-writer.h +++ b/reftable/reftable-writer.h @@ -51,6 +51,14 @@ struct reftable_write_options { * tables to compact. Defaults to 2 if unset. */ uint8_t auto_compaction_factor; + + /* + * The number of milliseconds to wait when trying to lock "tables.list". + * Note that this does not apply to locking individual tables, as these + * should only ever be locked when already holding the "tables.list" + * lock. + */ + unsigned lock_timeout_ms; }; /* reftable_block_stats holds statistics for a single block type */ diff --git a/reftable/stack.c b/reftable/stack.c index ce0a35216ba..5ccad2cff31 100644 --- a/reftable/stack.c +++ b/reftable/stack.c @@ -603,8 +603,10 @@ static int reftable_stack_init_addition(struct reftable_addition *add, add->stack = st; - err = hold_lock_file_for_update(&add->tables_list_lock, st->list_file, - LOCK_NO_DEREF); + err = hold_lock_file_for_update_timeout(&add->tables_list_lock, + st->list_file, + LOCK_NO_DEREF, + st->opts.lock_timeout_ms); if (err < 0) { if (errno == EEXIST) { err = REFTABLE_LOCK_ERROR; @@ -1056,8 +1058,10 @@ static int stack_compact_range(struct reftable_stack *st, * Hold the lock so that we can read "tables.list" and lock all tables * which are part of the user-specified range. */ - err = hold_lock_file_for_update(&tables_list_lock, st->list_file, - LOCK_NO_DEREF); + err = hold_lock_file_for_update_timeout(&tables_list_lock, + st->list_file, + LOCK_NO_DEREF, + st->opts.lock_timeout_ms); if (err < 0) { if (errno == EEXIST) err = REFTABLE_LOCK_ERROR; @@ -1156,8 +1160,10 @@ static int stack_compact_range(struct reftable_stack *st, * "tables.list". We'll then replace the compacted range of tables with * the new table. */ - err = hold_lock_file_for_update(&tables_list_lock, st->list_file, - LOCK_NO_DEREF); + err = hold_lock_file_for_update_timeout(&tables_list_lock, + st->list_file, + LOCK_NO_DEREF, + st->opts.lock_timeout_ms); if (err < 0) { if (errno == EEXIST) err = REFTABLE_LOCK_ERROR; diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh index 37510c2b2a5..62da3e37823 100755 --- a/t/t0610-reftable-basics.sh +++ b/t/t0610-reftable-basics.sh @@ -423,6 +423,33 @@ test_expect_success 'ref transaction: fails gracefully when auto compaction fail ) ' +test_expect_success 'ref transaction: timeout acquiring tables.list lock' ' + test_when_finished "rm -rf repo" && + git init repo && + ( + cd repo && + test_commit initial && + >.git/reftable/tables.list.lock && + test_must_fail git update-ref refs/heads/branch HEAD 2>err && + test_grep "cannot lock references" err + ) +' + +test_expect_success 'ref transaction: retry acquiring tables.list lock' ' + test_when_finished "rm -rf repo" && + git init repo && + ( + cd repo && + test_commit initial && + LOCK=.git/reftable/tables.list.lock && + >$LOCK && + { + ( sleep 1 && rm -f $LOCK ) & + } && + git -c reftable.lockTimeout=5000 update-ref refs/heads/branch HEAD + ) +' + test_expect_success 'pack-refs: compacts tables' ' test_when_finished "rm -rf repo" && git init repo &&
When multiple concurrent processes try to update references in a repository they may try to lock the same lockfiles. This can happen even when the updates are non-conflicting and can both be applied, so it doesn't always make sense to abort the transaction immediately. Both the "loose" and "packed" backends thus have a grace period that they wait for the lock to be released that can be controlled via the config values "core.filesRefLockTimeout" and "core.packedRefsTimeout", respectively. The reftable backend doesn't have such a setting yet and instead fails immediately when it sees such a lock. But the exact same concepts apply here as they do apply to the other backends. Introduce a new "reftable.lockTimeout" config that controls how long we may wait for a "tables.list" lock to be released. The default value of this config is 100ms, which is the same default as we have it for the "loose" backend. Note that even though we also lock individual tables, this config really only applies to the "tables.list" file. This is because individual tables are only ever locked when we already hold the "tables.list" lock during compaction. When we observe such a lock we in fact do not want to compact the table at all because it is already in the process of being compacted by a concurrent process. So applying the same timeout here would not make any sense and only delay progress. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- Documentation/config/reftable.txt | 7 +++++++ refs/reftable-backend.c | 4 ++++ reftable/reftable-writer.h | 8 ++++++++ reftable/stack.c | 18 ++++++++++++------ t/t0610-reftable-basics.sh | 27 +++++++++++++++++++++++++++ 5 files changed, 58 insertions(+), 6 deletions(-)