diff mbox series

[7/8] core.fsync: new option to harden loose references

Message ID f1e8a7bb3bf0f4c0414819cb1d5579dc08fd2a4f.1646905589.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series A design for future-proofing fsync() configuration | expand

Commit Message

Patrick Steinhardt March 10, 2022, 9:53 a.m. UTC
When writing loose references to disk we first create a lockfile, write
the updated value of the reference into that lockfile, and on commit we
rename the file into place. According to filesystem developers, this
behaviour is broken because applications should always sync data to disk
before doing the final rename to ensure data consistency [1][2][3]. If
applications fail to do this correctly, a hard crash of the machine can
easily result in corrupted on-disk data.

This kind of corruption can in fact be easily observed with Git when the
machine hard-crashes shortly after writing loose references to disk. On
machines with ext4, this will likely lead to the "empty files" problem:
the file has been renamed, but its data has not been synced to disk. The
result is that the references is corrupt, and in the worst case this can
lead to data loss.

Implement a new option to harden loose references so that users and
admins can avoid this scenario by syncing locked loose references to
disk before we rename them into place.

[1]: https://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/
[2]: https://btrfs.wiki.kernel.org/index.php/FAQ (What are the crash guarantees of overwrite-by-rename)
[3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/ext4.rst (see auto_da_alloc)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/core.txt |  2 ++
 cache.h                       |  6 +++++-
 config.c                      |  2 ++
 refs/files-backend.c          | 29 +++++++++++++++++++++++++++++
 4 files changed, 38 insertions(+), 1 deletion(-)

Comments

Junio C Hamano March 10, 2022, 6:25 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> index 973805e8a9..b67d3c340e 100644
> --- a/Documentation/config/core.txt
> +++ b/Documentation/config/core.txt
> @@ -564,8 +564,10 @@ core.fsync::
>  * `pack-metadata` hardens packfile bitmaps and indexes.
>  * `commit-graph` hardens the commit graph file.
>  * `index` hardens the index when it is modified.
> +* `loose-ref` hardens references modified in the repo in loose-ref form.
>  * `objects` is an aggregate option that is equivalent to
>    `loose-object,pack`.
> +* `refs` is an aggregate option that is equivalent to `loose-ref`.

Aggregate of one feels strange.  I do not see a strong reason to
have two separate classes loose vs packed and allow them be
protected to different robustness, but that aside, if we were to
have two, this "aggregate" is better added when the second one is.

Having said that, given that a separate ref backend that has no
distinction between loose or packed is on horizen, I think we would
rather prefer to see a single "ref" component that governs all
backends.

> @@ -1026,7 +1029,8 @@ enum fsync_component {
>  			      FSYNC_COMPONENT_PACK | \
>  			      FSYNC_COMPONENT_PACK_METADATA | \
>  			      FSYNC_COMPONENT_COMMIT_GRAPH | \
> -			      FSYNC_COMPONENT_INDEX)
> +			      FSYNC_COMPONENT_INDEX | \
> +			      FSYNC_COMPONENT_LOOSE_REF)

OK.

> +static int files_sync_loose_ref(struct ref_lock *lock, struct strbuf *err)

This file-scope static function will not be in the vtable (in other
words, it is not like "sync_loose_ref" method must be defined across
all ref backends and this function is called as the implementation
of the method for the files backend), so we do not have to give it
"files_" prefix if we do not want to.

sync_loose_ref() may be easier to read, perhaps?

> +{
> +	int ret = fsync_component(FSYNC_COMPONENT_LOOSE_REF, get_lock_file_fd(&lock->lk));
> +	if (ret)
> +		strbuf_addf(err, "could not sync loose ref '%s': %s", lock->ref_name,
> +			    strerror(errno));
> +	return ret;
> +}

OK.

Good illustration how the new helper in 6/8 is useful.  It would be
nice if the reroll of the base topic by Neeraj reorders the patches
to introduce fsync_component() much earlier, at the same time it
introduces the fsync_component_or_die().

> @@ -1504,6 +1513,7 @@ static int files_copy_or_rename_ref(struct ref_store *ref_store,
>  	oidcpy(&lock->old_oid, &orig_oid);
>  
>  	if (write_ref_to_lockfile(lock, &orig_oid, 0, &err) ||
> +	    files_sync_loose_ref(lock, &err) ||
>  	    commit_ref_update(refs, lock, &orig_oid, logmsg, &err)) {
>  		error("unable to write current sha1 into %s: %s", newrefname, err.buf);
>  		strbuf_release(&err);
> @@ -1524,6 +1534,7 @@ static int files_copy_or_rename_ref(struct ref_store *ref_store,
>  	flag = log_all_ref_updates;
>  	log_all_ref_updates = LOG_REFS_NONE;
>  	if (write_ref_to_lockfile(lock, &orig_oid, 0, &err) ||
> +	    files_sync_loose_ref(lock, &err) ||
>  	    commit_ref_update(refs, lock, &orig_oid, NULL, &err)) {
>  		error("unable to write current sha1 into %s: %s", oldrefname, err.buf);
>  		strbuf_release(&err);

We used to skip commit_ref_update() and gone to the error code path
upon write failure, and we do the same upon fsync failure now.  The
error code path will do the rollback the same way as before.

OK, these look sensible.

> @@ -2819,6 +2830,24 @@ static int files_transaction_prepare(struct ref_store *ref_store,
>  		}
>  	}
>  
> +	/*
> +	 * Sync all lockfiles to disk to ensure data consistency. We do this in
> +	 * a separate step such that we can sync all modified refs in a single
> +	 * step, which may be more efficient on some filesystems.
> +	 */
> +	for (i = 0; i < transaction->nr; i++) {
> +		struct ref_update *update = transaction->updates[i];
> +		struct ref_lock *lock = update->backend_data;
> +
> +		if (!(update->flags & REF_NEEDS_COMMIT))
> +			continue;
> +
> +		if (files_sync_loose_ref(lock, err)) {
> +			ret  = TRANSACTION_GENERIC_ERROR;
> +			goto cleanup;
> +		}
> +	}

An obvious alternative that naïvely comes to mind is to keep going
after seeing the first failure to sync and attempt to sync all the
rest, but remember that we had an error.  But I think what this
patch does makes a lot more sense, as the error code path will just
cleans the transaction up.  If any of them fails, there is no reason
to spend more effort.

Thanks.

>  cleanup:
>  	free(head_ref);
>  	string_list_clear(&affected_refnames, 0);
Neeraj Singh March 10, 2022, 7:03 p.m. UTC | #2
> Good illustration how the new helper in 6/8 is useful.  It would be
> nice if the reroll of the base topic by Neeraj reorders the patches
> to introduce fsync_component() much earlier, at the same time it
> introduces the fsync_component_or_die().

Will do.
Neeraj Singh March 10, 2022, 10:54 p.m. UTC | #3
On Thu, Mar 10, 2022 at 1:53 AM Patrick Steinhardt <ps@pks.im> wrote:
> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> index 973805e8a9..b67d3c340e 100644
> --- a/Documentation/config/core.txt
> +++ b/Documentation/config/core.txt
> @@ -564,8 +564,10 @@ core.fsync::
>  * `pack-metadata` hardens packfile bitmaps and indexes.
>  * `commit-graph` hardens the commit graph file.
>  * `index` hardens the index when it is modified.
> +* `loose-ref` hardens references modified in the repo in loose-ref form.
>  * `objects` is an aggregate option that is equivalent to
>    `loose-object,pack`.
> +* `refs` is an aggregate option that is equivalent to `loose-ref`.
>  * `derived-metadata` is an aggregate option that is equivalent to
>    `pack-metadata,commit-graph`.
>  * `default` is an aggregate option that is equivalent to
> diff --git a/cache.h b/cache.h
> index 63a95d1977..b56a56f539 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1005,11 +1005,14 @@ enum fsync_component {
>         FSYNC_COMPONENT_PACK_METADATA           = 1 << 2,
>         FSYNC_COMPONENT_COMMIT_GRAPH            = 1 << 3,
>         FSYNC_COMPONENT_INDEX                   = 1 << 4,
> +       FSYNC_COMPONENT_LOOSE_REF               = 1 << 5,
>  };
>
>  #define FSYNC_COMPONENTS_OBJECTS (FSYNC_COMPONENT_LOOSE_OBJECT | \
>                                   FSYNC_COMPONENT_PACK)
>
> +#define FSYNC_COMPONENTS_REFS (FSYNC_COMPONENT_LOOSE_REF)
> +
>  #define FSYNC_COMPONENTS_DERIVED_METADATA (FSYNC_COMPONENT_PACK_METADATA | \
>                                            FSYNC_COMPONENT_COMMIT_GRAPH)
>
> @@ -1026,7 +1029,8 @@ enum fsync_component {
>                               FSYNC_COMPONENT_PACK | \
>                               FSYNC_COMPONENT_PACK_METADATA | \
>                               FSYNC_COMPONENT_COMMIT_GRAPH | \
> -                             FSYNC_COMPONENT_INDEX)
> +                             FSYNC_COMPONENT_INDEX | \
> +                             FSYNC_COMPONENT_LOOSE_REF)
>
>  /*
>   * A bitmask indicating which components of the repo should be fsynced.
> diff --git a/config.c b/config.c
> index f03f27c3de..b5d3e6e404 100644
> --- a/config.c
> +++ b/config.c
> @@ -1332,7 +1332,9 @@ static const struct fsync_component_entry {
>         { "pack-metadata", FSYNC_COMPONENT_PACK_METADATA },
>         { "commit-graph", FSYNC_COMPONENT_COMMIT_GRAPH },
>         { "index", FSYNC_COMPONENT_INDEX },
> +       { "loose-ref", FSYNC_COMPONENT_LOOSE_REF },
>         { "objects", FSYNC_COMPONENTS_OBJECTS },
> +       { "refs", FSYNC_COMPONENTS_REFS },
>         { "derived-metadata", FSYNC_COMPONENTS_DERIVED_METADATA },
>         { "default", FSYNC_COMPONENTS_DEFAULT },
>         { "committed", FSYNC_COMPONENTS_COMMITTED },

In terms of the 'preciousness-levels', refs should be included in
FSYNC_COMPONENTS_COMMITTED,
from which it will also be included in _ADDED.
Junio C Hamano March 11, 2022, 6:40 a.m. UTC | #4
Patrick Steinhardt <ps@pks.im> writes:

> @@ -1504,6 +1513,7 @@ static int files_copy_or_rename_ref(struct ref_store *ref_store,
>  	oidcpy(&lock->old_oid, &orig_oid);
>  
>  	if (write_ref_to_lockfile(lock, &orig_oid, 0, &err) ||
> +	    files_sync_loose_ref(lock, &err) ||
>  	    commit_ref_update(refs, lock, &orig_oid, logmsg, &err)) {
>  		error("unable to write current sha1 into %s: %s", newrefname, err.buf);
>  		strbuf_release(&err);

Given that write_ref_to_lockfile() on the success code path does this:

	fd = get_lock_file_fd(&lock->lk);
	if (write_in_full(fd, oid_to_hex(oid), the_hash_algo->hexsz) < 0 ||
	    write_in_full(fd, &term, 1) < 0 ||
	    close_ref_gently(lock) < 0) {
		strbuf_addf(err,
			    "couldn't write '%s'", get_lock_file_path(&lock->lk));
		unlock_ref(lock);
		return -1;
	}
	return 0;

the above unfortunately does not work.  By the time the new call to
files_sync_loose_ref() is made, lock->fd is closed by the call to
close_lock_file_gently() made in close_ref_gently(), and because of
that, you'll get an error like this:

    Writing objects: 100% (3/3), 279 bytes | 279.00 KiB/s, done.
    Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
    remote: error: could not sync loose ref 'refs/heads/client_branch':
    Bad file descriptor     

when running "make test" (the above is from t5702 but I wouldn't be
surprised if this broke ALL ref updates).

Just before write_ref_to_lockfile() calls close_ref_gently() would
be a good place to make the fsync_loose_ref() call, perhaps?


Thanks.
Patrick Steinhardt March 11, 2022, 9:15 a.m. UTC | #5
On Thu, Mar 10, 2022 at 10:40:07PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > @@ -1504,6 +1513,7 @@ static int files_copy_or_rename_ref(struct ref_store *ref_store,
> >  	oidcpy(&lock->old_oid, &orig_oid);
> >  
> >  	if (write_ref_to_lockfile(lock, &orig_oid, 0, &err) ||
> > +	    files_sync_loose_ref(lock, &err) ||
> >  	    commit_ref_update(refs, lock, &orig_oid, logmsg, &err)) {
> >  		error("unable to write current sha1 into %s: %s", newrefname, err.buf);
> >  		strbuf_release(&err);
> 
> Given that write_ref_to_lockfile() on the success code path does this:
> 
> 	fd = get_lock_file_fd(&lock->lk);
> 	if (write_in_full(fd, oid_to_hex(oid), the_hash_algo->hexsz) < 0 ||
> 	    write_in_full(fd, &term, 1) < 0 ||
> 	    close_ref_gently(lock) < 0) {
> 		strbuf_addf(err,
> 			    "couldn't write '%s'", get_lock_file_path(&lock->lk));
> 		unlock_ref(lock);
> 		return -1;
> 	}
> 	return 0;
> 
> the above unfortunately does not work.  By the time the new call to
> files_sync_loose_ref() is made, lock->fd is closed by the call to
> close_lock_file_gently() made in close_ref_gently(), and because of
> that, you'll get an error like this:
> 
>     Writing objects: 100% (3/3), 279 bytes | 279.00 KiB/s, done.
>     Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
>     remote: error: could not sync loose ref 'refs/heads/client_branch':
>     Bad file descriptor     
> 
> when running "make test" (the above is from t5702 but I wouldn't be
> surprised if this broke ALL ref updates).
> 
> Just before write_ref_to_lockfile() calls close_ref_gently() would
> be a good place to make the fsync_loose_ref() call, perhaps?
> 
> 
> Thanks.

Yeah, that thought indeed occurred to me this night, too. I was hoping
that I could fix this before anybody noticed ;)

It's a bit unfortunate that we can't just defer this to a later place to
hopefully implement this more efficiently, but so be it. The alternative
would be to re-open all locked loose refs and then sync them to disk,
but this would likely be a lot more painful than just syncing them to
disk before closing it.

Will fix, thanks.

Patrick
Ævar Arnfjörð Bjarmason March 11, 2022, 9:36 a.m. UTC | #6
On Fri, Mar 11 2022, Patrick Steinhardt wrote:

> [[PGP Signed Part:Undecided]]
> On Thu, Mar 10, 2022 at 10:40:07PM -0800, Junio C Hamano wrote:
>> Patrick Steinhardt <ps@pks.im> writes:
>> 
>> > @@ -1504,6 +1513,7 @@ static int files_copy_or_rename_ref(struct ref_store *ref_store,
>> >  	oidcpy(&lock->old_oid, &orig_oid);
>> >  
>> >  	if (write_ref_to_lockfile(lock, &orig_oid, 0, &err) ||
>> > +	    files_sync_loose_ref(lock, &err) ||
>> >  	    commit_ref_update(refs, lock, &orig_oid, logmsg, &err)) {
>> >  		error("unable to write current sha1 into %s: %s", newrefname, err.buf);
>> >  		strbuf_release(&err);
>> 
>> Given that write_ref_to_lockfile() on the success code path does this:
>> 
>> 	fd = get_lock_file_fd(&lock->lk);
>> 	if (write_in_full(fd, oid_to_hex(oid), the_hash_algo->hexsz) < 0 ||
>> 	    write_in_full(fd, &term, 1) < 0 ||
>> 	    close_ref_gently(lock) < 0) {
>> 		strbuf_addf(err,
>> 			    "couldn't write '%s'", get_lock_file_path(&lock->lk));
>> 		unlock_ref(lock);
>> 		return -1;
>> 	}
>> 	return 0;
>> 
>> the above unfortunately does not work.  By the time the new call to
>> files_sync_loose_ref() is made, lock->fd is closed by the call to
>> close_lock_file_gently() made in close_ref_gently(), and because of
>> that, you'll get an error like this:
>> 
>>     Writing objects: 100% (3/3), 279 bytes | 279.00 KiB/s, done.
>>     Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
>>     remote: error: could not sync loose ref 'refs/heads/client_branch':
>>     Bad file descriptor     
>> 
>> when running "make test" (the above is from t5702 but I wouldn't be
>> surprised if this broke ALL ref updates).
>> 
>> Just before write_ref_to_lockfile() calls close_ref_gently() would
>> be a good place to make the fsync_loose_ref() call, perhaps?
>> 
>> 
>> Thanks.
>
> Yeah, that thought indeed occurred to me this night, too. I was hoping
> that I could fix this before anybody noticed ;)
>
> It's a bit unfortunate that we can't just defer this to a later place to
> hopefully implement this more efficiently, but so be it. The alternative
> would be to re-open all locked loose refs and then sync them to disk,
> but this would likely be a lot more painful than just syncing them to
> disk before closing it.

Aside: is open/write/close followed by open/fsync/close on the same file
portably guaranteed to yield the same end result as a single
open/write/fsync/close?

I think in practice nobody would be insane enough to implement a system
to do otherwise, but on the other hand I've seen some really insane
behavior :)

I could see it being different e.g. in some NFS cases/configurations
where the fsync() for an open FD syncs to the remote storage, and the
second open() might therefore get the old version and noop-sync that.

Most implementations would guard against that in the common case by
having a local cache of outstanding data to flush, but if you're talking
to some sharded storage array for each request...

Anyway, I *think* it should be OK, just an aside to check the assumption
for any future work... :)
diff mbox series

Patch

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index 973805e8a9..b67d3c340e 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -564,8 +564,10 @@  core.fsync::
 * `pack-metadata` hardens packfile bitmaps and indexes.
 * `commit-graph` hardens the commit graph file.
 * `index` hardens the index when it is modified.
+* `loose-ref` hardens references modified in the repo in loose-ref form.
 * `objects` is an aggregate option that is equivalent to
   `loose-object,pack`.
+* `refs` is an aggregate option that is equivalent to `loose-ref`.
 * `derived-metadata` is an aggregate option that is equivalent to
   `pack-metadata,commit-graph`.
 * `default` is an aggregate option that is equivalent to
diff --git a/cache.h b/cache.h
index 63a95d1977..b56a56f539 100644
--- a/cache.h
+++ b/cache.h
@@ -1005,11 +1005,14 @@  enum fsync_component {
 	FSYNC_COMPONENT_PACK_METADATA		= 1 << 2,
 	FSYNC_COMPONENT_COMMIT_GRAPH		= 1 << 3,
 	FSYNC_COMPONENT_INDEX			= 1 << 4,
+	FSYNC_COMPONENT_LOOSE_REF		= 1 << 5,
 };
 
 #define FSYNC_COMPONENTS_OBJECTS (FSYNC_COMPONENT_LOOSE_OBJECT | \
 				  FSYNC_COMPONENT_PACK)
 
+#define FSYNC_COMPONENTS_REFS (FSYNC_COMPONENT_LOOSE_REF)
+
 #define FSYNC_COMPONENTS_DERIVED_METADATA (FSYNC_COMPONENT_PACK_METADATA | \
 					   FSYNC_COMPONENT_COMMIT_GRAPH)
 
@@ -1026,7 +1029,8 @@  enum fsync_component {
 			      FSYNC_COMPONENT_PACK | \
 			      FSYNC_COMPONENT_PACK_METADATA | \
 			      FSYNC_COMPONENT_COMMIT_GRAPH | \
-			      FSYNC_COMPONENT_INDEX)
+			      FSYNC_COMPONENT_INDEX | \
+			      FSYNC_COMPONENT_LOOSE_REF)
 
 /*
  * A bitmask indicating which components of the repo should be fsynced.
diff --git a/config.c b/config.c
index f03f27c3de..b5d3e6e404 100644
--- a/config.c
+++ b/config.c
@@ -1332,7 +1332,9 @@  static const struct fsync_component_entry {
 	{ "pack-metadata", FSYNC_COMPONENT_PACK_METADATA },
 	{ "commit-graph", FSYNC_COMPONENT_COMMIT_GRAPH },
 	{ "index", FSYNC_COMPONENT_INDEX },
+	{ "loose-ref", FSYNC_COMPONENT_LOOSE_REF },
 	{ "objects", FSYNC_COMPONENTS_OBJECTS },
+	{ "refs", FSYNC_COMPONENTS_REFS },
 	{ "derived-metadata", FSYNC_COMPONENTS_DERIVED_METADATA },
 	{ "default", FSYNC_COMPONENTS_DEFAULT },
 	{ "committed", FSYNC_COMPONENTS_COMMITTED },
diff --git a/refs/files-backend.c b/refs/files-backend.c
index f59589d6cc..279316de45 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1392,6 +1392,15 @@  static int refs_rename_ref_available(struct ref_store *refs,
 	return ok;
 }
 
+static int files_sync_loose_ref(struct ref_lock *lock, struct strbuf *err)
+{
+	int ret = fsync_component(FSYNC_COMPONENT_LOOSE_REF, get_lock_file_fd(&lock->lk));
+	if (ret)
+		strbuf_addf(err, "could not sync loose ref '%s': %s", lock->ref_name,
+			    strerror(errno));
+	return ret;
+}
+
 static int files_copy_or_rename_ref(struct ref_store *ref_store,
 			    const char *oldrefname, const char *newrefname,
 			    const char *logmsg, int copy)
@@ -1504,6 +1513,7 @@  static int files_copy_or_rename_ref(struct ref_store *ref_store,
 	oidcpy(&lock->old_oid, &orig_oid);
 
 	if (write_ref_to_lockfile(lock, &orig_oid, 0, &err) ||
+	    files_sync_loose_ref(lock, &err) ||
 	    commit_ref_update(refs, lock, &orig_oid, logmsg, &err)) {
 		error("unable to write current sha1 into %s: %s", newrefname, err.buf);
 		strbuf_release(&err);
@@ -1524,6 +1534,7 @@  static int files_copy_or_rename_ref(struct ref_store *ref_store,
 	flag = log_all_ref_updates;
 	log_all_ref_updates = LOG_REFS_NONE;
 	if (write_ref_to_lockfile(lock, &orig_oid, 0, &err) ||
+	    files_sync_loose_ref(lock, &err) ||
 	    commit_ref_update(refs, lock, &orig_oid, NULL, &err)) {
 		error("unable to write current sha1 into %s: %s", oldrefname, err.buf);
 		strbuf_release(&err);
@@ -2819,6 +2830,24 @@  static int files_transaction_prepare(struct ref_store *ref_store,
 		}
 	}
 
+	/*
+	 * Sync all lockfiles to disk to ensure data consistency. We do this in
+	 * a separate step such that we can sync all modified refs in a single
+	 * step, which may be more efficient on some filesystems.
+	 */
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *update = transaction->updates[i];
+		struct ref_lock *lock = update->backend_data;
+
+		if (!(update->flags & REF_NEEDS_COMMIT))
+			continue;
+
+		if (files_sync_loose_ref(lock, err)) {
+			ret  = TRANSACTION_GENERIC_ERROR;
+			goto cleanup;
+		}
+	}
+
 cleanup:
 	free(head_ref);
 	string_list_clear(&affected_refnames, 0);