Message ID | 8c8ecf8e3718cbca049ee7a283edd7b7887e464e.1671547905.git.ps@pks.im (mailing list archive) |
---|---|
State | Accepted |
Commit | ce54672f9b017adf60d15bc7174994b63cb29d3a |
Series | refs: fix corruption by not correctly syncing packed-refs to disk |
On Tue, Dec 20, 2022 at 03:52:14PM +0100, Patrick Steinhardt wrote:

> And while we do the dance when writing the `packed-refs` file, there is
> indeed one gotcha: we use a `FILE *` stream to write the temporary file,
> but don't flush it before synchronizing it to disk. As a consequence any
> data that is still buffered will not get synchronized and a crash of the
> machine may cause corruption.

The problem description makes sense, and so does your fix.

Grepping for other uses of fsync_component(), this looks like the only
buggy case (loose refs use write() directly, and most other files go via
finalize_hashfile(), which does likewise).

> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index c1c71d183e..6f5a0709fb 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -1263,7 +1263,8 @@ static int write_with_updates(struct packed_ref_store *refs,
>  		goto error;
>  	}
>
> -	if (fsync_component(FSYNC_COMPONENT_REFERENCE, get_tempfile_fd(refs->tempfile)) ||
> +	if (fflush(out) ||
> +	    fsync_component(FSYNC_COMPONENT_REFERENCE, get_tempfile_fd(refs->tempfile)) ||
>  	    close_tempfile_gently(refs->tempfile)) {

It kind of feels like this ought to be part of fsync_component() or
close_tempfile_gently(), but it would pollute those interfaces:

  - fsync_component() doesn't otherwise know about stdio

  - close_tempfile_gently() doesn't otherwise know about syncing (and it
    would have to learn about fsync_components to do it right).

So given that this is the only affected site, it makes sense to just fix
it for now and worry about a more generalized solution if we run into it
again.

-Peff
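The gotcha can be observed without crashing a machine: bytes written through a `FILE *` stay in the process's stdio buffer until fflush(), so the file descriptor returned by fileno() (the one an fsync() would act on) does not see them yet. The following is a minimal POSIX sketch, not Git code; the helper name `sizes_around_fflush()` is hypothetical, and it assumes the message is shorter than the stdio buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not part of Git): write `msg` through a FILE *
 * stream and record the file size visible through the underlying file
 * descriptor before and after fflush(). Returns 0 on success, -1 on error.
 */
static int sizes_around_fflush(const char *path, const char *msg,
                               long *before, long *after)
{
    struct stat st;
    FILE *out = fopen(path, "w"); /* "w" truncates, so the size starts at 0 */

    if (!out)
        return -1;

    /* Regular files are fully buffered by default; make that explicit. */
    if (setvbuf(out, NULL, _IOFBF, BUFSIZ) != 0 || fputs(msg, out) == EOF)
        goto fail;

    /* The bytes still sit in the stdio buffer: the fd sees none of them. */
    if (fstat(fileno(out), &st) != 0)
        goto fail;
    *before = (long)st.st_size;

    /* fflush() hands the buffered bytes to the kernel via write(). */
    if (fflush(out) != 0 || fstat(fileno(out), &st) != 0)
        goto fail;
    *after = (long)st.st_size;

    return fclose(out) == 0 ? 0 : -1;

fail:
    fclose(out);
    return -1;
}
```

Called with a short message such as `"hello\n"`, `before` is 0 and `after` is 6. An fsync() on the descriptor between those two measurements would have had none of the message to persist, which is the window the patch closes by calling fflush(out) first.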
Jeff King <peff@peff.net> writes:

> On Tue, Dec 20, 2022 at 03:52:14PM +0100, Patrick Steinhardt wrote:
>
>> And while we do the dance when writing the `packed-refs` file, there is
>> indeed one gotcha: we use a `FILE *` stream to write the temporary file,
>> but don't flush it before synchronizing it to disk. As a consequence any
>> data that is still buffered will not get synchronized and a crash of the
>> machine may cause corruption.
>
> The problem description makes sense, and so does your fix.
>
> Grepping for other uses of fsync_component(), this looks like the only
> buggy case (loose refs use write() directly, and most other files go via
> finalize_hashfile(), which does likewise).
> ...
> So given that this is the only affected site, it makes sense to just fix
> it for now and worry about a more generalized solution if we run into it
> again.

Sounds good. This came from bc22d845 (core.fsync: new option to
harden references, 2022-03-11), before which we did not even fsync()
the file, so let me apply directly on top of that commit. Those who
are stuck on older versions of Git can choose to merge the result,
even though I may probably not bother merging it down to anything
older than 2.39 maintenance track.

Thanks.
```diff
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index c1c71d183e..6f5a0709fb 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1263,7 +1263,8 @@ static int write_with_updates(struct packed_ref_store *refs,
 		goto error;
 	}
 
-	if (fsync_component(FSYNC_COMPONENT_REFERENCE, get_tempfile_fd(refs->tempfile)) ||
+	if (fflush(out) ||
+	    fsync_component(FSYNC_COMPONENT_REFERENCE, get_tempfile_fd(refs->tempfile)) ||
 	    close_tempfile_gently(refs->tempfile)) {
 		strbuf_addf(err, "error closing file %s: %s",
 			    get_tempfile_path(refs->tempfile),
```
At GitLab we have recently received a report where a repository was left
with a corrupted `packed-refs` file after the node hard-crashed even
though `core.fsync=reference` was set. This is something that in theory
should not happen if we correctly did the atomic-rename dance to:

1. Write the data into a temporary file.

2. Synchronize the temporary file to disk.

3. Rename the temporary file into place.

So if we crash in the middle of writing the `packed-refs` file we should
only ever see either the old or the new state of the file.

And while we do the dance when writing the `packed-refs` file, there is
indeed one gotcha: we use a `FILE *` stream to write the temporary file,
but don't flush it before synchronizing it to disk. As a consequence any
data that is still buffered will not get synchronized and a crash of the
machine may cause corruption.

Fix this bug by flushing the file stream before we fsync.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/packed-backend.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
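The three steps of the atomic-rename dance can be sketched in plain POSIX C. This is a minimal illustration under stated assumptions, not Git's code path: Git goes through its own tempfile API together with fsync_component() and close_tempfile_gently(), whereas the helper `replace_file_atomically()` and the `<path>.tmp` temporary name below are hypothetical. What the sketch is meant to show is the order inside step 2: the stream is flushed before its descriptor is synced.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not Git's implementation): replace `path` with
 * `contents` using the write/sync/rename sequence:
 *   1. write the data into a temporary file through a FILE * stream,
 *   2. fflush() the stream, then fsync() its file descriptor,
 *   3. rename() the temporary file into place.
 * The "<path>.tmp" temporary name is an assumption for illustration.
 */
static int replace_file_atomically(const char *path, const char *contents)
{
    char tmp[4096];
    size_t len = strlen(contents);
    FILE *out;
    int n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);

    if (n < 0 || (size_t)n >= sizeof(tmp))
        return -1;

    /* Step 1: write through stdio; the bytes may still be buffered. */
    out = fopen(tmp, "w");
    if (!out)
        return -1;
    if (fwrite(contents, 1, len, out) != len)
        goto fail;

    /* Step 2: flush first, otherwise fsync() misses the buffered data. */
    if (fflush(out) != 0 || fsync(fileno(out)) != 0)
        goto fail;
    if (fclose(out) != 0) {
        unlink(tmp);
        return -1;
    }

    /* Step 3: rename() swaps the file in; readers see old or new, never partial. */
    if (rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;

fail:
    fclose(out);
    unlink(tmp);
    return -1;
}
```

Unlike the `||` chain in the patch, this sketch always closes and unlinks the temporary file on failure so it does not leak. It also does not fsync() the containing directory after rename(); on some filesystems that extra step is needed for the rename itself to survive a crash.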