diff mbox series

[9/9] pack-objects: rename .idx files into place after .bitmap files

Message ID d8286cf1075dc85231128145c5abb0db3881032b.1631157880.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series packfile: avoid .idx rename races | expand

Commit Message

Taylor Blau Sept. 9, 2021, 3:25 a.m. UTC
From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

In preceding commits the race of renaming .idx files in place before
.rev files and other auxiliary files was fixed in pack-write.c's
finish_tmp_packfile(), builtin/repack.c's "struct exts", and
builtin/index-pack.c's final(). As noted in the change to pack-write.c
we left in place the issue of writing *.bitmap files after the *.idx,
let's fix that issue.

See 7cc8f971085 (pack-objects: implement bitmap writing, 2013-12-21)
for commentary at the time when *.bitmap was implemented about how
those files are written out, nothing in that commit contradicts what's
being done here.

Note that this commit and preceding ones only close any race condition
with *.idx files being written before their auxiliary files if we're
optimistic about our lack of fsync()-ing in this are not tripping us
over. See the thread at [1] for a rabbit hole of various discussions
about filesystem races in the face of doing and not doing fsync() (and
if doing fsync(), not doing it properly).

In particular, in this case of writing to ".git/objects/pack" we only
write and fsync() the individual files, but if we wanted to guarantee
that the metadata update was seen in that way by concurrent processes
we'd need to fsync() on the "fd" of the containing directory. That
concern is probably more theoretical than not, modern OS's tend to be
more on the forgiving side than the overly pedantic side of
implementing POSIX FS semantics.

1. https://lore.kernel.org/git/8735qgkvv1.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Ævar Arnfjörð Bjarmason Sept. 9, 2021, 7:54 a.m. UTC | #1
On Wed, Sep 08 2021, Taylor Blau wrote:

> From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>
> In preceding commits the race of renaming .idx files in place before
> .rev files and other auxiliary files was fixed in pack-write.c's
> finish_tmp_packfile(), builtin/repack.c's "struct exts", and
> builtin/index-pack.c's final(). As noted in the change to pack-write.c
> we left in place the issue of writing *.bitmap files after the *.idx,
> let's fix that issue.
>
> See 7cc8f971085 (pack-objects: implement bitmap writing, 2013-12-21)
> for commentary at the time when *.bitmap was implemented about how
> those files are written out, nothing in that commit contradicts what's
> being done here.
>
> Note that this commit and preceding ones only close any race condition
> with *.idx files being written before their auxiliary files if we're
> optimistic about our lack of fsync()-ing in this are not tripping us
> over. See the thread at [1] for a rabbit hole of various discussions
> about filesystem races in the face of doing and not doing fsync() (and
> if doing fsync(), not doing it properly).
>
> In particular, in this case of writing to ".git/objects/pack" we only
> write and fsync() the individual files, but if we wanted to guarantee
> that the metadata update was seen in that way by concurrent processes
> we'd need to fsync() on the "fd" of the containing directory. That
> concern is probably more theoretical than not, modern OS's tend to be
> more on the forgiving side than the overly pedantic side of
> implementing POSIX FS semantics.

Some weird gramma/phrasing of mine left over here, i.e.  the "[...] in
this are not tripping us over.". On reflection perhaps it's better to
replace these last two paragraphs with say:

    With this and preceding commits we've covered all the cases where we
    wrote another auxiliary file before the *.idx file, and should thus
    never have another concurrent process try to use an incomplete set
    of pack-OID.* files.

    This assumes that the renames we make here and elsewhere appear on
    the filesystem in the order we performed them. This assumption is
    known to be false in the face of pedantic POSIX FS semantics. See
    the thread around [1] for discussions on that point.

    We fsync() the file descriptors we're writing out for all the
    auxiliary files, but we don't yet fsync() the file descriptor for
    the containing directory. Therefore our data might have been written
    out, but it's anyone's guess what the state of the directory
    containing the file is after we write the *.idx.

    In practice modern OS's are known to be forgiving on that point, so
    this will probably solve races in practice for most users. It will
    almost certainly make them better than they were before when we
    didn't write *.idx files last. We should more generally improve our
    use of fsync() to cover containing directories, but that'll
    hopefully be addressed by some follow-up series.
Junio C Hamano Sept. 9, 2021, 7:52 p.m. UTC | #2
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>     We fsync() the file descriptors we're writing out for all the
>     auxiliary files, but we don't yet fsync() the file descriptor for
>     the containing directory. Therefore our data might have been written
>     out, but it's anyone's guess what the state of the directory
>     containing the file is after we write the *.idx.
>
>     In practice modern OS's are known to be forgiving on that point, so
>     this will probably solve races in practice for most users. It will
>     almost certainly make them better than they were before when we
>     didn't write *.idx files last. We should more generally improve our
>     use of fsync() to cover containing directories, but that'll
>     hopefully be addressed by some follow-up series.

I'd probably drop the last paragraph, and replace it with a single
sentence "we may want to fsync the containing directory once after
placing *.idx file in place, but it is outside of the scope of this
series", if I were doing this series.

Other than that (and I made a few comments on other patches), these
were pleasant read.

Thanks.
Taylor Blau Sept. 9, 2021, 9:13 p.m. UTC | #3
On Thu, Sep 09, 2021 at 12:52:03PM -0700, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
> >     We fsync() the file descriptors we're writing out for all the
> >     auxiliary files, but we don't yet fsync() the file descriptor for
> >     the containing directory. Therefore our data might have been written
> >     out, but it's anyone's guess what the state of the directory
> >     containing the file is after we write the *.idx.
> >
> >     In practice modern OS's are known to be forgiving on that point, so
> >     this will probably solve races in practice for most users. It will
> >     almost certainly make them better than they were before when we
> >     didn't write *.idx files last. We should more generally improve our
> >     use of fsync() to cover containing directories, but that'll
> >     hopefully be addressed by some follow-up series.
>
> I'd probably drop the last paragraph, and replace it with a single
> sentence "we may want to fsync the containing directory once after
> placing *.idx file in place, but it is outside of the scope of this
> series", if I were doing this series.

Yep, I think that this makes things clearer, and I prefer it to Ævar's
suggestion (which conveys more information than I think is necessary
here).

Thanks,
Taylor
diff mbox series

Patch

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 944134b6f2..a01767a384 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1250,7 +1250,6 @@  static void write_pack_file(void)
 			stage_tmp_packfiles(&tmpname, pack_tmp_name,
 					    written_list, nr_written,
 					    &pack_idx_opts, hash, &idx_tmp_name);
-			rename_tmp_packfile_idx(&tmpname, &idx_tmp_name);
 
 			if (write_bitmap_index) {
 				size_t tmpname_len = tmpname.len;
@@ -1267,6 +1266,8 @@  static void write_pack_file(void)
 				strbuf_setlen(&tmpname, tmpname_len);
 			}
 
+			rename_tmp_packfile_idx(&tmpname, &idx_tmp_name);
+
 			free(idx_tmp_name);
 			strbuf_release(&tmpname);
 			free(pack_tmp_name);