Message ID | f1e8a7bb3bf0f4c0414819cb1d5579dc08fd2a4f.1646905589.git.ps@pks.im (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | A design for future-proofing fsync() configuration | expand |
Patrick Steinhardt <ps@pks.im> writes: > diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt > index 973805e8a9..b67d3c340e 100644 > --- a/Documentation/config/core.txt > +++ b/Documentation/config/core.txt > @@ -564,8 +564,10 @@ core.fsync:: > * `pack-metadata` hardens packfile bitmaps and indexes. > * `commit-graph` hardens the commit graph file. > * `index` hardens the index when it is modified. > +* `loose-ref` hardens references modified in the repo in loose-ref form. > * `objects` is an aggregate option that is equivalent to > `loose-object,pack`. > +* `refs` is an aggregate option that is equivalent to `loose-ref`. Aggregate of one feels strange. I do not see a strong reason to have two separate classes loose vs packed and allow them be protected to different robustness, but that aside, if we were to have two, this "aggregate" is better added when the second one is. Having said that, given that a separate ref backend that has no distinction between loose or packed is on horizen, I think we would rather prefer to see a single "ref" component that governs all backends. > @@ -1026,7 +1029,8 @@ enum fsync_component { > FSYNC_COMPONENT_PACK | \ > FSYNC_COMPONENT_PACK_METADATA | \ > FSYNC_COMPONENT_COMMIT_GRAPH | \ > - FSYNC_COMPONENT_INDEX) > + FSYNC_COMPONENT_INDEX | \ > + FSYNC_COMPONENT_LOOSE_REF) OK. > +static int files_sync_loose_ref(struct ref_lock *lock, struct strbuf *err) This file-scope static function will not be in the vtable (in other words, it is not like "sync_loose_ref" method must be defined across all ref backends and this function is called as the implementation of the method for the files backend), so we do not have to give it "files_" prefix if we do not want to. sync_loose_ref() may be easier to read, perhaps? > +{ > + int ret = fsync_component(FSYNC_COMPONENT_LOOSE_REF, get_lock_file_fd(&lock->lk)); > + if (ret) > + strbuf_addf(err, "could not sync loose ref '%s': %s", lock->ref_name, > + strerror(errno)); > + return ret; > +} OK. Good illustration how the new helper in 6/8 is useful. It would be nice if the reroll of the base topic by Neeraj reorders the patches to introduce fsync_component() much earlier, at the same time it introduces the fsync_component_or_die(). > @@ -1504,6 +1513,7 @@ static int files_copy_or_rename_ref(struct ref_store *ref_store, > oidcpy(&lock->old_oid, &orig_oid); > > if (write_ref_to_lockfile(lock, &orig_oid, 0, &err) || > + files_sync_loose_ref(lock, &err) || > commit_ref_update(refs, lock, &orig_oid, logmsg, &err)) { > error("unable to write current sha1 into %s: %s", newrefname, err.buf); > strbuf_release(&err); > @@ -1524,6 +1534,7 @@ static int files_copy_or_rename_ref(struct ref_store *ref_store, > flag = log_all_ref_updates; > log_all_ref_updates = LOG_REFS_NONE; > if (write_ref_to_lockfile(lock, &orig_oid, 0, &err) || > + files_sync_loose_ref(lock, &err) || > commit_ref_update(refs, lock, &orig_oid, NULL, &err)) { > error("unable to write current sha1 into %s: %s", oldrefname, err.buf); > strbuf_release(&err); We used to skip commit_ref_update() and gone to the error code path upon write failure, and we do the same upon fsync failure now. The error code path will do the rollback the same way as before. OK, these look sensible. > @@ -2819,6 +2830,24 @@ static int files_transaction_prepare(struct ref_store *ref_store, > } > } > > + /* > + * Sync all lockfiles to disk to ensure data consistency. We do this in > + * a separate step such that we can sync all modified refs in a single > + * step, which may be more efficient on some filesystems. > + */ > + for (i = 0; i < transaction->nr; i++) { > + struct ref_update *update = transaction->updates[i]; > + struct ref_lock *lock = update->backend_data; > + > + if (!(update->flags & REF_NEEDS_COMMIT)) > + continue; > + > + if (files_sync_loose_ref(lock, err)) { > + ret = TRANSACTION_GENERIC_ERROR; > + goto cleanup; > + } > + } An obvious alternative that naïvely comes to mind is to keep going after seeing the first failure to sync and attempt to sync all the rest, but remember that we had an error. But I think what this patch does makes a lot more sense, as the error code path will just cleans the transaction up. If any of them fails, there is no reason to spend more effort. Thanks. > cleanup: > free(head_ref); > string_list_clear(&affected_refnames, 0);
> Good illustration how the new helper in 6/8 is useful. It would be > nice if the reroll of the base topic by Neeraj reorders the patches > to introduce fsync_component() much earlier, at the same time it > introduces the fsync_component_or_die(). Will do.
On Thu, Mar 10, 2022 at 1:53 AM Patrick Steinhardt <ps@pks.im> wrote: > diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt > index 973805e8a9..b67d3c340e 100644 > --- a/Documentation/config/core.txt > +++ b/Documentation/config/core.txt > @@ -564,8 +564,10 @@ core.fsync:: > * `pack-metadata` hardens packfile bitmaps and indexes. > * `commit-graph` hardens the commit graph file. > * `index` hardens the index when it is modified. > +* `loose-ref` hardens references modified in the repo in loose-ref form. > * `objects` is an aggregate option that is equivalent to > `loose-object,pack`. > +* `refs` is an aggregate option that is equivalent to `loose-ref`. > * `derived-metadata` is an aggregate option that is equivalent to > `pack-metadata,commit-graph`. > * `default` is an aggregate option that is equivalent to > diff --git a/cache.h b/cache.h > index 63a95d1977..b56a56f539 100644 > --- a/cache.h > +++ b/cache.h > @@ -1005,11 +1005,14 @@ enum fsync_component { > FSYNC_COMPONENT_PACK_METADATA = 1 << 2, > FSYNC_COMPONENT_COMMIT_GRAPH = 1 << 3, > FSYNC_COMPONENT_INDEX = 1 << 4, > + FSYNC_COMPONENT_LOOSE_REF = 1 << 5, > }; > > #define FSYNC_COMPONENTS_OBJECTS (FSYNC_COMPONENT_LOOSE_OBJECT | \ > FSYNC_COMPONENT_PACK) > > +#define FSYNC_COMPONENTS_REFS (FSYNC_COMPONENT_LOOSE_REF) > + > #define FSYNC_COMPONENTS_DERIVED_METADATA (FSYNC_COMPONENT_PACK_METADATA | \ > FSYNC_COMPONENT_COMMIT_GRAPH) > > @@ -1026,7 +1029,8 @@ enum fsync_component { > FSYNC_COMPONENT_PACK | \ > FSYNC_COMPONENT_PACK_METADATA | \ > FSYNC_COMPONENT_COMMIT_GRAPH | \ > - FSYNC_COMPONENT_INDEX) > + FSYNC_COMPONENT_INDEX | \ > + FSYNC_COMPONENT_LOOSE_REF) > > /* > * A bitmask indicating which components of the repo should be fsynced. > diff --git a/config.c b/config.c > index f03f27c3de..b5d3e6e404 100644 > --- a/config.c > +++ b/config.c > @@ -1332,7 +1332,9 @@ static const struct fsync_component_entry { > { "pack-metadata", FSYNC_COMPONENT_PACK_METADATA }, > { "commit-graph", FSYNC_COMPONENT_COMMIT_GRAPH }, > { "index", FSYNC_COMPONENT_INDEX }, > + { "loose-ref", FSYNC_COMPONENT_LOOSE_REF }, > { "objects", FSYNC_COMPONENTS_OBJECTS }, > + { "refs", FSYNC_COMPONENTS_REFS }, > { "derived-metadata", FSYNC_COMPONENTS_DERIVED_METADATA }, > { "default", FSYNC_COMPONENTS_DEFAULT }, > { "committed", FSYNC_COMPONENTS_COMMITTED }, In terms of the 'preciousness-levels', refs should be included in FSYNC_COMPONENTS_COMMITTED, from which it will also be included in _ADDED.
Patrick Steinhardt <ps@pks.im> writes: > @@ -1504,6 +1513,7 @@ static int files_copy_or_rename_ref(struct ref_store *ref_store, > oidcpy(&lock->old_oid, &orig_oid); > > if (write_ref_to_lockfile(lock, &orig_oid, 0, &err) || > + files_sync_loose_ref(lock, &err) || > commit_ref_update(refs, lock, &orig_oid, logmsg, &err)) { > error("unable to write current sha1 into %s: %s", newrefname, err.buf); > strbuf_release(&err); Given that write_ref_to_lockfile() on the success code path does this: fd = get_lock_file_fd(&lock->lk); if (write_in_full(fd, oid_to_hex(oid), the_hash_algo->hexsz) < 0 || write_in_full(fd, &term, 1) < 0 || close_ref_gently(lock) < 0) { strbuf_addf(err, "couldn't write '%s'", get_lock_file_path(&lock->lk)); unlock_ref(lock); return -1; } return 0; the above unfortunately does not work. By the time the new call to files_sync_loose_ref() is made, lock->fd is closed by the call to close_lock_file_gently() made in close_ref_gently(), and because of that, you'll get an error like this: Writing objects: 100% (3/3), 279 bytes | 279.00 KiB/s, done. Total 3 (delta 0), reused 0 (delta 0), pack-reused 0 remote: error: could not sync loose ref 'refs/heads/client_branch': Bad file descriptor when running "make test" (the above is from t5702 but I wouldn't be surprised if this broke ALL ref updates). Just before write_ref_to_lockfile() calls close_ref_gently() would be a good place to make the fsync_loose_ref() call, perhaps? Thanks.
On Thu, Mar 10, 2022 at 10:40:07PM -0800, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > > > @@ -1504,6 +1513,7 @@ static int files_copy_or_rename_ref(struct ref_store *ref_store, > > oidcpy(&lock->old_oid, &orig_oid); > > > > if (write_ref_to_lockfile(lock, &orig_oid, 0, &err) || > > + files_sync_loose_ref(lock, &err) || > > commit_ref_update(refs, lock, &orig_oid, logmsg, &err)) { > > error("unable to write current sha1 into %s: %s", newrefname, err.buf); > > strbuf_release(&err); > > Given that write_ref_to_lockfile() on the success code path does this: > > fd = get_lock_file_fd(&lock->lk); > if (write_in_full(fd, oid_to_hex(oid), the_hash_algo->hexsz) < 0 || > write_in_full(fd, &term, 1) < 0 || > close_ref_gently(lock) < 0) { > strbuf_addf(err, > "couldn't write '%s'", get_lock_file_path(&lock->lk)); > unlock_ref(lock); > return -1; > } > return 0; > > the above unfortunately does not work. By the time the new call to > files_sync_loose_ref() is made, lock->fd is closed by the call to > close_lock_file_gently() made in close_ref_gently(), and because of > that, you'll get an error like this: > > Writing objects: 100% (3/3), 279 bytes | 279.00 KiB/s, done. > Total 3 (delta 0), reused 0 (delta 0), pack-reused 0 > remote: error: could not sync loose ref 'refs/heads/client_branch': > Bad file descriptor > > when running "make test" (the above is from t5702 but I wouldn't be > surprised if this broke ALL ref updates). > > Just before write_ref_to_lockfile() calls close_ref_gently() would > be a good place to make the fsync_loose_ref() call, perhaps? > > > Thanks. Yeah, that thought indeed occurred to me this night, too. I was hoping that I could fix this before anybody noticed ;) It's a bit unfortunate that we can't just defer this to a later place to hopefully implement this more efficiently, but so be it. The alternative would be to re-open all locked loose refs and then sync them to disk, but this would likely be a lot more painful than just syncing them to disk before closing it. Will fix, thanks. Patrick
On Fri, Mar 11 2022, Patrick Steinhardt wrote: > [[PGP Signed Part:Undecided]] > On Thu, Mar 10, 2022 at 10:40:07PM -0800, Junio C Hamano wrote: >> Patrick Steinhardt <ps@pks.im> writes: >> >> > @@ -1504,6 +1513,7 @@ static int files_copy_or_rename_ref(struct ref_store *ref_store, >> > oidcpy(&lock->old_oid, &orig_oid); >> > >> > if (write_ref_to_lockfile(lock, &orig_oid, 0, &err) || >> > + files_sync_loose_ref(lock, &err) || >> > commit_ref_update(refs, lock, &orig_oid, logmsg, &err)) { >> > error("unable to write current sha1 into %s: %s", newrefname, err.buf); >> > strbuf_release(&err); >> >> Given that write_ref_to_lockfile() on the success code path does this: >> >> fd = get_lock_file_fd(&lock->lk); >> if (write_in_full(fd, oid_to_hex(oid), the_hash_algo->hexsz) < 0 || >> write_in_full(fd, &term, 1) < 0 || >> close_ref_gently(lock) < 0) { >> strbuf_addf(err, >> "couldn't write '%s'", get_lock_file_path(&lock->lk)); >> unlock_ref(lock); >> return -1; >> } >> return 0; >> >> the above unfortunately does not work. By the time the new call to >> files_sync_loose_ref() is made, lock->fd is closed by the call to >> close_lock_file_gently() made in close_ref_gently(), and because of >> that, you'll get an error like this: >> >> Writing objects: 100% (3/3), 279 bytes | 279.00 KiB/s, done. >> Total 3 (delta 0), reused 0 (delta 0), pack-reused 0 >> remote: error: could not sync loose ref 'refs/heads/client_branch': >> Bad file descriptor >> >> when running "make test" (the above is from t5702 but I wouldn't be >> surprised if this broke ALL ref updates). >> >> Just before write_ref_to_lockfile() calls close_ref_gently() would >> be a good place to make the fsync_loose_ref() call, perhaps? >> >> >> Thanks. > > Yeah, that thought indeed occurred to me this night, too. I was hoping > that I could fix this before anybody noticed ;) > > It's a bit unfortunate that we can't just defer this to a later place to > hopefully implement this more efficiently, but so be it. The alternative > would be to re-open all locked loose refs and then sync them to disk, > but this would likely be a lot more painful than just syncing them to > disk before closing it. Aside: is open/write/close followed by open/fsync/close on the same file portably guaranteed to yield the same end result as a single open/write/fsync/close? I think in practice nobody would be insane enough to implement a system to do otherwise, but on the other hand I've seen some really insane behavior :) I could see it being different e.g. in some NFS cases/configurations where the fsync() for an open FD syncs to the remote storage, and the second open() might therefore get the old version and noop-sync that. Most implementations would guard against that in the common case by having a local cache of outstanding data to flush, but if you're talking to some sharded storage array for each request... Anyway, I *think* it should be OK, just an aside to check the assumption for any future work... :)
diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt index 973805e8a9..b67d3c340e 100644 --- a/Documentation/config/core.txt +++ b/Documentation/config/core.txt @@ -564,8 +564,10 @@ core.fsync:: * `pack-metadata` hardens packfile bitmaps and indexes. * `commit-graph` hardens the commit graph file. * `index` hardens the index when it is modified. +* `loose-ref` hardens references modified in the repo in loose-ref form. * `objects` is an aggregate option that is equivalent to `loose-object,pack`. +* `refs` is an aggregate option that is equivalent to `loose-ref`. * `derived-metadata` is an aggregate option that is equivalent to `pack-metadata,commit-graph`. * `default` is an aggregate option that is equivalent to diff --git a/cache.h b/cache.h index 63a95d1977..b56a56f539 100644 --- a/cache.h +++ b/cache.h @@ -1005,11 +1005,14 @@ enum fsync_component { FSYNC_COMPONENT_PACK_METADATA = 1 << 2, FSYNC_COMPONENT_COMMIT_GRAPH = 1 << 3, FSYNC_COMPONENT_INDEX = 1 << 4, + FSYNC_COMPONENT_LOOSE_REF = 1 << 5, }; #define FSYNC_COMPONENTS_OBJECTS (FSYNC_COMPONENT_LOOSE_OBJECT | \ FSYNC_COMPONENT_PACK) +#define FSYNC_COMPONENTS_REFS (FSYNC_COMPONENT_LOOSE_REF) + #define FSYNC_COMPONENTS_DERIVED_METADATA (FSYNC_COMPONENT_PACK_METADATA | \ FSYNC_COMPONENT_COMMIT_GRAPH) @@ -1026,7 +1029,8 @@ enum fsync_component { FSYNC_COMPONENT_PACK | \ FSYNC_COMPONENT_PACK_METADATA | \ FSYNC_COMPONENT_COMMIT_GRAPH | \ - FSYNC_COMPONENT_INDEX) + FSYNC_COMPONENT_INDEX | \ + FSYNC_COMPONENT_LOOSE_REF) /* * A bitmask indicating which components of the repo should be fsynced. diff --git a/config.c b/config.c index f03f27c3de..b5d3e6e404 100644 --- a/config.c +++ b/config.c @@ -1332,7 +1332,9 @@ static const struct fsync_component_entry { { "pack-metadata", FSYNC_COMPONENT_PACK_METADATA }, { "commit-graph", FSYNC_COMPONENT_COMMIT_GRAPH }, { "index", FSYNC_COMPONENT_INDEX }, + { "loose-ref", FSYNC_COMPONENT_LOOSE_REF }, { "objects", FSYNC_COMPONENTS_OBJECTS }, + { "refs", FSYNC_COMPONENTS_REFS }, { "derived-metadata", FSYNC_COMPONENTS_DERIVED_METADATA }, { "default", FSYNC_COMPONENTS_DEFAULT }, { "committed", FSYNC_COMPONENTS_COMMITTED }, diff --git a/refs/files-backend.c b/refs/files-backend.c index f59589d6cc..279316de45 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -1392,6 +1392,15 @@ static int refs_rename_ref_available(struct ref_store *refs, return ok; } +static int files_sync_loose_ref(struct ref_lock *lock, struct strbuf *err) +{ + int ret = fsync_component(FSYNC_COMPONENT_LOOSE_REF, get_lock_file_fd(&lock->lk)); + if (ret) + strbuf_addf(err, "could not sync loose ref '%s': %s", lock->ref_name, + strerror(errno)); + return ret; +} + static int files_copy_or_rename_ref(struct ref_store *ref_store, const char *oldrefname, const char *newrefname, const char *logmsg, int copy) @@ -1504,6 +1513,7 @@ static int files_copy_or_rename_ref(struct ref_store *ref_store, oidcpy(&lock->old_oid, &orig_oid); if (write_ref_to_lockfile(lock, &orig_oid, 0, &err) || + files_sync_loose_ref(lock, &err) || commit_ref_update(refs, lock, &orig_oid, logmsg, &err)) { error("unable to write current sha1 into %s: %s", newrefname, err.buf); strbuf_release(&err); @@ -1524,6 +1534,7 @@ static int files_copy_or_rename_ref(struct ref_store *ref_store, flag = log_all_ref_updates; log_all_ref_updates = LOG_REFS_NONE; if (write_ref_to_lockfile(lock, &orig_oid, 0, &err) || + files_sync_loose_ref(lock, &err) || commit_ref_update(refs, lock, &orig_oid, NULL, &err)) { error("unable to write current sha1 into %s: %s", oldrefname, err.buf); strbuf_release(&err); @@ -2819,6 +2830,24 @@ static int files_transaction_prepare(struct ref_store *ref_store, } } + /* + * Sync all lockfiles to disk to ensure data consistency. We do this in + * a separate step such that we can sync all modified refs in a single + * step, which may be more efficient on some filesystems. + */ + for (i = 0; i < transaction->nr; i++) { + struct ref_update *update = transaction->updates[i]; + struct ref_lock *lock = update->backend_data; + + if (!(update->flags & REF_NEEDS_COMMIT)) + continue; + + if (files_sync_loose_ref(lock, err)) { + ret = TRANSACTION_GENERIC_ERROR; + goto cleanup; + } + } + cleanup: free(head_ref); string_list_clear(&affected_refnames, 0);
When writing loose references to disk we first create a lockfile, write the updated value of the reference into that lockfile, and on commit we rename the file into place. According to filesystem developers, this behaviour is broken because applications should always sync data to disk before doing the final rename to ensure data consistency [1][2][3]. If applications fail to do this correctly, a hard crash of the machine can easily result in corrupted on-disk data. This kind of corruption can in fact be easily observed with Git when the machine hard-crashes shortly after writing loose references to disk. On machines with ext4, this will likely lead to the "empty files" problem: the file has been renamed, but its data has not been synced to disk. The result is that the references is corrupt, and in the worst case this can lead to data loss. Implement a new option to harden loose references so that users and admins can avoid this scenario by syncing locked loose references to disk before we rename them into place. [1]: https://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/ [2]: https://btrfs.wiki.kernel.org/index.php/FAQ (What are the crash guarantees of overwrite-by-rename) [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/ext4.rst (see auto_da_alloc) Signed-off-by: Patrick Steinhardt <ps@pks.im> --- Documentation/config/core.txt | 2 ++ cache.h | 6 +++++- config.c | 2 ++ refs/files-backend.c | 29 +++++++++++++++++++++++++++++ 4 files changed, 38 insertions(+), 1 deletion(-)