refs: fix corruption by not correctly syncing packed-refs to disk

Message ID: 8c8ecf8e3718cbca049ee7a283edd7b7887e464e.1671547905.git.ps@pks.im
State: Accepted
Commit: ce54672f9b017adf60d15bc7174994b63cb29d3a

Commit Message

Patrick Steinhardt Dec. 20, 2022, 2:52 p.m. UTC
At GitLab we recently received a report of a repository that was left
with a corrupted `packed-refs` file after the node hard-crashed, even
though `core.fsync=reference` was set. In theory this should not happen
if we correctly perform the atomic-rename dance:

    1. Write the data into a temporary file.

    2. Synchronize the temporary file to disk.

    3. Rename the temporary file into place.

So if we crash in the middle of writing the `packed-refs` file we should
only ever see either the old or the new state of the file.

And while we do the dance when writing the `packed-refs` file, there is
indeed one gotcha: we use a `FILE *` stream to write the temporary file,
but don't flush it before synchronizing it to disk. As a consequence any
data that is still buffered will not get synchronized and a crash of the
machine may cause corruption.

Fix this bug by flushing the file stream before we fsync.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/packed-backend.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
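
For illustration, the three-step dance from the commit message can be
sketched in plain POSIX C as below. This is a minimal sketch, not Git's
implementation: `atomic_write_file()` and its temporary-file naming are
hypothetical, and Git's real code goes through its tempfile API and
`fsync_component()`. What it is meant to show is the ordering in step 2:
`fflush()` hands stdio's buffer to the kernel, and only then can
`fsync()` push that data to disk.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Git): atomically replace `path`
 * with the NUL-terminated string `data`. Returns 0 on success and
 * -1 on error.
 */
static int atomic_write_file(const char *path, const char *data)
{
	char tmp[4096];
	FILE *out;
	int fd, n;

	n = snprintf(tmp, sizeof(tmp), "%s.XXXXXX", path);
	if (n < 0 || (size_t)n >= sizeof(tmp))
		return -1;

	fd = mkstemp(tmp);
	if (fd < 0)
		return -1;

	out = fdopen(fd, "w");
	if (!out) {
		close(fd);
		unlink(tmp);
		return -1;
	}

	/* 1. Write the data; it may still sit in stdio's user-space buffer. */
	if (fputs(data, out) == EOF)
		goto error;

	/*
	 * 2. Flush the stdio buffer to the kernel first, then ask the
	 *    kernel to sync the file to disk. Without the fflush(), the
	 *    fsync() would not cover data that is still buffered.
	 */
	if (fflush(out) || fsync(fileno(out)))
		goto error;

	if (fclose(out)) {
		unlink(tmp);
		return -1;
	}

	/* 3. Rename the fully synced temporary file into place. */
	if (rename(tmp, path)) {
		unlink(tmp);
		return -1;
	}
	return 0;

error:
	fclose(out);
	unlink(tmp);
	return -1;
}
```

A caller would invoke it as `atomic_write_file("some-file", contents)`.
A stricter version would likely also fsync the containing directory
after the rename; the sketch leaves that out to stay focused on the
flush-before-fsync ordering.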

Comments

Jeff King Dec. 20, 2022, 3:44 p.m. UTC | #1
On Tue, Dec 20, 2022 at 03:52:14PM +0100, Patrick Steinhardt wrote:

> And while we do the dance when writing the `packed-refs` file, there is
> indeed one gotcha: we use a `FILE *` stream to write the temporary file,
> but don't flush it before synchronizing it to disk. As a consequence any
> data that is still buffered will not get synchronized and a crash of the
> machine may cause corruption.

The problem description makes sense, and so does your fix.

Grepping for other uses of fsync_component(), this looks like the only
buggy case (loose refs use write() directly, and most other files go via
finalize_hashfile(), which does likewise).

> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index c1c71d183e..6f5a0709fb 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -1263,7 +1263,8 @@ static int write_with_updates(struct packed_ref_store *refs,
>  		goto error;
>  	}
>  
> -	if (fsync_component(FSYNC_COMPONENT_REFERENCE, get_tempfile_fd(refs->tempfile)) ||
> +	if (fflush(out) ||
> +	    fsync_component(FSYNC_COMPONENT_REFERENCE, get_tempfile_fd(refs->tempfile)) ||
>  	    close_tempfile_gently(refs->tempfile)) {

It kind of feels like this ought to be part of fsync_component() or
close_tempfile_gently(), but it would pollute those interfaces:

  - fsync_component() doesn't otherwise know about stdio

  - close_tempfile_gently() doesn't otherwise know about syncing (and it
    would have to learn about fsync_components to do it right).

So given that this is the only affected site, it makes sense to just fix
it for now and worry about a more generalized solution if we run into it
again.

-Peff
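
The distinction drawn above between write() and stdio can be observed
directly. The following is a small sketch under stated assumptions: it
targets a POSIX system, the names `kernel_file_size()` and
`demo_stdio_buffering()` are hypothetical, and it relies on the usual
libc behavior of fully buffering streams that refer to regular files.
It writes one short line through a `FILE *` and records the file size
the kernel reports before and after `fflush()`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Size of the file behind `fd` as the kernel sees it, or -1 on error. */
static long long kernel_file_size(int fd)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -1;
	return (long long)st.st_size;
}

/*
 * Hypothetical demo: write a 25-byte line via stdio and store the
 * kernel-visible file size before and after fflush(). Returns 0 on
 * success and -1 on error.
 */
static int demo_stdio_buffering(long long *before, long long *after)
{
	char path[] = "/tmp/stdio-demo.XXXXXX";
	int fd = mkstemp(path);
	FILE *out;
	int ret = -1;

	if (fd < 0)
		return -1;

	out = fdopen(fd, "w");
	if (!out) {
		close(fd);
		unlink(path);
		return -1;
	}

	if (fputs("deadbeef refs/heads/main\n", out) == EOF)
		goto out;

	/*
	 * With a fully buffered stream the line is usually still in
	 * user space here, so an fsync() at this point would have
	 * nothing of it to write out.
	 */
	*before = kernel_file_size(fd);

	if (fflush(out))
		goto out;

	/* After the flush the kernel has the data and fsync() can see it. */
	*after = kernel_file_size(fd);
	ret = 0;

out:
	fclose(out);
	unlink(path);
	return ret;
}
```

On a typical libc the "before" size is smaller than the "after" size,
which is the window in which the unpatched code called fsync().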

Junio C Hamano Dec. 25, 2022, 11:25 a.m. UTC | #2
Jeff King <peff@peff.net> writes:

> On Tue, Dec 20, 2022 at 03:52:14PM +0100, Patrick Steinhardt wrote:
>
>> And while we do the dance when writing the `packed-refs` file, there is
>> indeed one gotcha: we use a `FILE *` stream to write the temporary file,
>> but don't flush it before synchronizing it to disk. As a consequence any
>> data that is still buffered will not get synchronized and a crash of the
>> machine may cause corruption.
>
> The problem description makes sense, and so does your fix.
>
> Grepping for other uses of fsync_component(), this looks like the only
> buggy case (loose refs use write() directly, and most other files go via
> finalize_hashfile(), which does likewise).
> ...
> So given that this is the only affected site, it makes sense to just fix
> it for now and worry about a more generalized solution if we run into it
> again.

Sounds good.

This came from bc22d845 (core.fsync: new option to harden
references, 2022-03-11), before which we did not even fsync() the
file, so let me apply directly on top of that commit.  Those who are
stuck on older versions of Git can choose to merge the result,
although I will probably not bother merging it down to anything older
than the 2.39 maintenance track.

Thanks.
Patch

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index c1c71d183e..6f5a0709fb 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1263,7 +1263,8 @@  static int write_with_updates(struct packed_ref_store *refs,
 		goto error;
 	}
 
-	if (fsync_component(FSYNC_COMPONENT_REFERENCE, get_tempfile_fd(refs->tempfile)) ||
+	if (fflush(out) ||
+	    fsync_component(FSYNC_COMPONENT_REFERENCE, get_tempfile_fd(refs->tempfile)) ||
 	    close_tempfile_gently(refs->tempfile)) {
 		strbuf_addf(err, "error closing file %s: %s",
 			    get_tempfile_path(refs->tempfile),