mbox series

[v2,0/3] prevent opening packs too early

Message ID cover.1631139433.git.me@ttaylorr.com (mailing list archive)
Headers show
Series prevent opening packs too early | expand

Message

Taylor Blau Sept. 8, 2021, 10:17 p.m. UTC
Here is a very small reroll of a couple of patches to make sure that packs are
not read before all of their auxiliary files exist and are moved into place, by
renaming the .idx file into place last.

New since the original version is a patch to apply similar treatment to
index-pack, as noticed by Peff in [1]. This should be queued before Ævar's
series on top.

[1]: https://lore.kernel.org/git/YS75P7r33NIBmFh2@coredump.intra.peff.net/

Taylor Blau (3):
  pack-write.c: rename `.idx` files into place last
  builtin/repack.c: move `.idx` files into place last
  builtin/index-pack.c: move `.idx` files into place last

 builtin/index-pack.c | 16 ++++++++--------
 builtin/repack.c     |  2 +-
 pack-write.c         | 12 ++++++------
 3 files changed, 15 insertions(+), 15 deletions(-)

Range-diff against v1:
1:  ea3b1a0d8e ! 1:  cb3a843994 pack-write.c: rename `.idx` file into place last
    @@ Metadata
     Author: Taylor Blau <me@ttaylorr.com>
     
      ## Commit message ##
    -    pack-write.c: rename `.idx` file into place last
    +    pack-write.c: rename `.idx` files into place last
     
         We treat the presence of an `.idx` file as the indicator of whether or
         not it's safe to use a packfile. But `finish_tmp_packfile()` (which is
2:  a6a4d2154e = 2:  925f9ada2a builtin/repack.c: move `.idx` files into place last
-:  ---------- > 3:  460b7b9151 builtin/index-pack.c: move `.idx` files into place last

Comments

Ævar Arnfjörð Bjarmason Sept. 8, 2021, 11:52 p.m. UTC | #1
On Wed, Sep 08 2021, Taylor Blau wrote:

> Here is a very small reroll of a couple of patches to make sure that packs are
> not read before all of their auxiliary files exist and are moved into place, by
> renaming the .idx file into place last.
>
> New since the original version is a patch to apply similar treatment to
> index-pack, as noticed by Peff in [1]. This should be queued before Ævar's
> series on top.

I read Junio's earlier <xmqq8s063m7m.fsf@gitster.g>[1] as a suggestion
that we combine the two into a singe series. I do think that yields a
better end-result, in particular your 1/3 is much more readable if the
refactoring in my 2/4.

I'm mot of the way done with such a rewrite, which also incorporates
your suggestion for how to manage memory in rename_tmp_packfile(), but
I'll hold of on what you & Junio have to say about next steps before
adding to the re-roll competition Junio mentioned...

1. https://lore.kernel.org/git/xmqq8s063m7m.fsf@gitster.g
Ævar Arnfjörð Bjarmason Sept. 9, 2021, 12:50 a.m. UTC | #2
On Thu, Sep 09 2021, Ævar Arnfjörð Bjarmason wrote:

> On Wed, Sep 08 2021, Taylor Blau wrote:
>
>> Here is a very small reroll of a couple of patches to make sure that packs are
>> not read before all of their auxiliary files exist and are moved into place, by
>> renaming the .idx file into place last.
>>
>> New since the original version is a patch to apply similar treatment to
>> index-pack, as noticed by Peff in [1]. This should be queued before Ævar's
>> series on top.
>
> I read Junio's earlier <xmqq8s063m7m.fsf@gitster.g>[1] as a suggestion
> that we combine the two into a singe series. I do think that yields a
> better end-result, in particular your 1/3 is much more readable if the
> refactoring in my 2/4.
>
> I'm mot of the way done with such a rewrite, which also incorporates
> your suggestion for how to manage memory in rename_tmp_packfile(), but
> I'll hold of on what you & Junio have to say about next steps before
> adding to the re-roll competition Junio mentioned...
>
> 1. https://lore.kernel.org/git/xmqq8s063m7m.fsf@gitster.g

I've got that at
https://github.com/git/git/compare/master...avar:avar-tb/idx-rename-race-3

Range-diff to my own v2 + your v1 follows (i.e. not my v2 + your v2
here). I can submit this as-is to the list, or something else. Junio
nominated you "team leader", so I'll go with your plan :)

I think the functional changes on top of the function refactoring make
it much easier to see what's going on, e.g.:

https://github.com/git/git/commit/832d9fa083d20abab35695037152dfeb1f9fe50a

I took the liberty of similarly creating a helper function in
index-pack.c, which turns your commit there into this:
https://github.com/git/git/commit/8c2dcbf6011c466a806491672c0d7308e8105a1d

The relevant commit messages were also adjusted to note that in the
earlier *.idx v.s. *.rev we're leaving the similar *.bitmap problem for
a later change.

3:  29f57876515 = 1:  4869f97408b pack.h: line-wrap the definition of finish_tmp_packfile()
4:  7b39f4599b1 ! 2:  c0b1ec90d43 pack-write: refactor renaming in finish_tmp_packfile()
    @@ Metadata
      ## Commit message ##
         pack-write: refactor renaming in finish_tmp_packfile()
     
    -    Refactor the renaming in finish_tmp_packfile() so that it takes a
    -    "const struct strbuf *" instead of a non-const, and refactor the
    -    repetitive renaming pattern in finish_tmp_packfile() to use a new
    -    static rename_tmp_packfile() helper function.
    +    Refactor the renaming in finish_tmp_packfile() into a helper
    +    function. The callers are now expected to pass a "name_buffer" ending
    +    in "pack-OID." instead of the previous "pack-", we then append "pack",
    +    "idx" or "rev" to it.
    +
    +    By doing the strbuf_setlen() in rename_tmp_packfile() we re-use the
    +    buffer and avoid the repeated allocations we'd get if that function
    +    had its own temporary "struct strbuf".
    +
    +    This approach of re-using the buffer does make the last user in
    +    pack-object.c's write_pack_file() slightly awkward, we needlessly do a
    +    strbuf_setlen() there just before the strbuf_release() for
    +    consistency. In subsequent changes we'll move that bitmap writing code
    +    around, so let's not skip the strbuf_setlen() now.
     
         The previous strbuf_reset() idiom originated with
         5889271114a (finish_tmp_packfile():use strbuf for pathname
    @@ Commit message
         pre-strbuf code added in 0e990530ae (finish_tmp_packfile(): a helper
         function, 2011-10-28).
     
    -    Since the memory allocation is not a bottleneck here we can afford a
    -    bit more readability at the cost of re-allocating this new "struct
    -    strbuf sb".
    -
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## builtin/pack-objects.c ##
    +@@ builtin/pack-objects.c: static void write_pack_file(void)
    + 					warning_errno(_("failed utime() on %s"), pack_tmp_name);
    + 			}
    + 
    +-			strbuf_addf(&tmpname, "%s-", base_name);
    ++			strbuf_addf(&tmpname, "%s-%s.", base_name,
    ++				    hash_to_hex(hash));
    + 
    + 			if (write_bitmap_index) {
    + 				bitmap_writer_set_checksum(hash);
     @@ builtin/pack-objects.c: static void write_pack_file(void)
      					    &pack_idx_opts, hash);
      
      			if (write_bitmap_index) {
     -				strbuf_addf(&tmpname, "%s.bitmap", hash_to_hex(hash));
    -+				struct strbuf sb = STRBUF_INIT;
    -+
    -+				strbuf_addf(&sb, "%s%s.bitmap", tmpname.buf,
    -+					    hash_to_hex(hash));
    ++				size_t tmpname_len = tmpname.len;
      
    ++				strbuf_addstr(&tmpname, "bitmap");
      				stop_progress(&progress_state);
      
    + 				bitmap_writer_show_progress(progress);
     @@ builtin/pack-objects.c: static void write_pack_file(void)
    - 				bitmap_writer_select_commits(indexed_commits, indexed_commits_nr, -1);
    - 				bitmap_writer_build(&to_pack);
      				bitmap_writer_finish(written_list, nr_written,
    --						     tmpname.buf, write_bitmap_options);
    -+						     sb.buf, write_bitmap_options);
    + 						     tmpname.buf, write_bitmap_options);
      				write_bitmap_index = 0;
    -+				strbuf_release(&sb);
    ++				strbuf_setlen(&tmpname, tmpname_len);
      			}
      
      			strbuf_release(&tmpname);
    @@ pack-write.c: struct hashfile *create_tmp_packfile(char **pack_tmp_name)
      	return hashfd(fd, *pack_tmp_name);
      }
      
    --void finish_tmp_packfile(struct strbuf *name_buffer,
    -+static void rename_tmp_packfile(const char *source,
    -+				 const struct strbuf *basename,
    -+				 const unsigned char hash[],
    -+				 const char *ext)
    ++static void rename_tmp_packfile(struct strbuf *nb, const char *source,
    ++				const char *ext)
     +{
    -+	struct strbuf sb = STRBUF_INIT;
    ++	size_t nb_len = nb->len;
     +
    -+	strbuf_addf(&sb, "%s%s.%s", basename->buf, hash_to_hex(hash), ext);
    -+	if (rename(source, sb.buf))
    ++	strbuf_addstr(nb, ext);
    ++	if (rename(source, nb->buf))
     +		die_errno("unable to rename temporary '*.%s' file to '%s'",
    -+			  ext, sb.buf);
    -+	strbuf_release(&sb);
    ++			  ext, nb->buf);
    ++	strbuf_setlen(nb, nb_len);
     +}
     +
    -+void finish_tmp_packfile(const struct strbuf *basename,
    + void finish_tmp_packfile(struct strbuf *name_buffer,
      			 const char *pack_tmp_name,
      			 struct pack_idx_entry **written_list,
    - 			 uint32_t nr_written,
     @@ pack-write.c: void finish_tmp_packfile(struct strbuf *name_buffer,
      			 unsigned char hash[])
      {
    @@ pack-write.c: void finish_tmp_packfile(struct strbuf *name_buffer,
     -
     -	strbuf_setlen(name_buffer, basename_len);
     -
    +-	strbuf_addf(name_buffer, "%s.idx", hash_to_hex(hash));
    +-	if (rename(idx_tmp_name, name_buffer->buf))
    +-		die_errno("unable to rename temporary index file");
    +-
    +-	strbuf_setlen(name_buffer, basename_len);
    +-
     -	if (rev_tmp_name) {
     -		strbuf_addf(name_buffer, "%s.rev", hash_to_hex(hash));
     -		if (rename(rev_tmp_name, name_buffer->buf))
     -			die_errno("unable to rename temporary reverse-index file");
    --
    --		strbuf_setlen(name_buffer, basename_len);
     -	}
     -
    --	strbuf_addf(name_buffer, "%s.idx", hash_to_hex(hash));
    --	if (rename(idx_tmp_name, name_buffer->buf))
    --		die_errno("unable to rename temporary index file");
    --
     -	strbuf_setlen(name_buffer, basename_len);
    -+	rename_tmp_packfile(pack_tmp_name, basename, hash, "pack");
    ++	rename_tmp_packfile(name_buffer, pack_tmp_name, "pack");
    ++	rename_tmp_packfile(name_buffer, idx_tmp_name, "idx");
     +	if (rev_tmp_name)
    -+		rename_tmp_packfile(rev_tmp_name, basename, hash, "rev");
    -+	rename_tmp_packfile(idx_tmp_name, basename, hash, "idx");
    ++		rename_tmp_packfile(name_buffer, rev_tmp_name, "rev");
      
      	free((void *)idx_tmp_name);
      }
    -
    - ## pack.h ##
    -@@ pack.h: int encode_in_pack_object_header(unsigned char *hdr, int hdr_len,
    - int read_pack_header(int fd, struct pack_header *);
    - 
    - struct hashfile *create_tmp_packfile(char **pack_tmp_name);
    --void finish_tmp_packfile(struct strbuf *name_buffer,
    -+void finish_tmp_packfile(const struct strbuf *basename,
    - 			 const char *pack_tmp_name,
    - 			 struct pack_idx_entry **written_list,
    - 			 uint32_t nr_written,
-:  ----------- > 3:  832d9fa083d pack-write.c: rename `.idx` files after `*.rev'
2:  a6a4d2154e8 ! 4:  9f7e005ba03 builtin/repack.c: move `.idx` files into place last
    @@ Commit message
         itself).
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## builtin/repack.c ##
     @@ builtin/repack.c: static struct {
1:  ea3b1a0d8ed ! 5:  457d4b9787c pack-write.c: rename `.idx` file into place last
    @@
      ## Metadata ##
    -Author: Taylor Blau <me@ttaylorr.com>
    +Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    pack-write.c: rename `.idx` file into place last
    +    index-pack: refactor renaming in final()
     
    -    We treat the presence of an `.idx` file as the indicator of whether or
    -    not it's safe to use a packfile. But `finish_tmp_packfile()` (which is
    -    responsible for writing the pack and moving the temporary versions of
    -    all of its auxiliary files into place) is inconsistent about the write
    -    order.
    +    Refactor the renaming in final() into a helper function, this is
    +    similar in spirit to a preceding refactoring of finish_tmp_packfile()
    +    in pack-write.c.
     
    -    But the `.rev` file is moved into place after the `.idx`, so it's
    -    possible for a process to open a pack in between the two (i.e., while
    -    the `.idx` file is in place but the `.rev` file is not) and mistakenly
    -    fall back to generating the pack's reverse index in memory. Though racy,
    -    this amounts to an unnecessary slow-down at worst, and doesn't affect
    -    the correctness of the resulting reverse index.
    +    Before e37d0b8730b (builtin/index-pack.c: write reverse indexes,
    +    2021-01-25) it probably wasn't worth it to have this sort of helper,
    +    due to the differing "else if" case for "pack" files v.s. "idx" files.
     
    -    Close this race by moving the .rev file into place before moving the
    -    .idx file into place.
    +    But since we've got "rev" as well now, let's do the renaming via a
    +    helper, this is both a net decrease in lines, and improves the
    +    readability, since we can easily see at a glance that the logic for
    +    writing these three types of files is exactly the same, aside from the
    +    obviously differing cases of "*final_xyz_name" being NULL, and
    +    "else_chmod_if" being different.
     
    -    While we're here, only call strbuf_setlen() if we actually modified the
    -    buffer by bringing it inside of the same if-statement that calls
    -    rename().
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    -
    - ## pack-write.c ##
    -@@ pack-write.c: void finish_tmp_packfile(struct strbuf *name_buffer,
    + ## builtin/index-pack.c ##
    +@@ builtin/index-pack.c: static void write_special_file(const char *suffix, const char *msg,
    + 	strbuf_release(&name_buf);
    + }
      
    - 	strbuf_setlen(name_buffer, basename_len);
    ++static void rename_tmp_packfile(const char **final_xyz_name,
    ++				const char *curr_xyz_name,
    ++				struct strbuf *xyz_name, unsigned char *hash,
    ++				const char *ext, int else_chmod_if)
    ++{
    ++	if (*final_xyz_name != curr_xyz_name) {
    ++		if (!*final_xyz_name)
    ++			*final_xyz_name = odb_pack_name(xyz_name, hash, ext);
    ++		if (finalize_object_file(curr_xyz_name, *final_xyz_name))
    ++			die(_("unable to rename temporary '*.%s' file to '%s"),
    ++			    ext, *final_xyz_name);
    ++	} else if (else_chmod_if) {
    ++		chmod(*final_xyz_name, 0444);
    ++	}
    ++}
    ++
    + static void final(const char *final_pack_name, const char *curr_pack_name,
    + 		  const char *final_index_name, const char *curr_index_name,
    + 		  const char *final_rev_index_name, const char *curr_rev_index_name,
    +@@ builtin/index-pack.c: static void final(const char *final_pack_name, const char *curr_pack_name,
    + 		write_special_file("promisor", promisor_msg, final_pack_name,
    + 				   hash, NULL);
      
    --	strbuf_addf(name_buffer, "%s.idx", hash_to_hex(hash));
    --	if (rename(idx_tmp_name, name_buffer->buf))
    --		die_errno("unable to rename temporary index file");
    +-	if (final_pack_name != curr_pack_name) {
    +-		if (!final_pack_name)
    +-			final_pack_name = odb_pack_name(&pack_name, hash, "pack");
    +-		if (finalize_object_file(curr_pack_name, final_pack_name))
    +-			die(_("cannot store pack file"));
    +-	} else if (from_stdin)
    +-		chmod(final_pack_name, 0444);
     -
    --	strbuf_setlen(name_buffer, basename_len);
    +-	if (final_index_name != curr_index_name) {
    +-		if (!final_index_name)
    +-			final_index_name = odb_pack_name(&index_name, hash, "idx");
    +-		if (finalize_object_file(curr_index_name, final_index_name))
    +-			die(_("cannot store index file"));
    +-	} else
    +-		chmod(final_index_name, 0444);
     -
    - 	if (rev_tmp_name) {
    - 		strbuf_addf(name_buffer, "%s.rev", hash_to_hex(hash));
    - 		if (rename(rev_tmp_name, name_buffer->buf))
    - 			die_errno("unable to rename temporary reverse-index file");
    -+
    -+		strbuf_setlen(name_buffer, basename_len);
    - 	}
    - 
    -+	strbuf_addf(name_buffer, "%s.idx", hash_to_hex(hash));
    -+	if (rename(idx_tmp_name, name_buffer->buf))
    -+		die_errno("unable to rename temporary index file");
    -+
    - 	strbuf_setlen(name_buffer, basename_len);
    +-	if (curr_rev_index_name) {
    +-		if (final_rev_index_name != curr_rev_index_name) {
    +-			if (!final_rev_index_name)
    +-				final_rev_index_name = odb_pack_name(&rev_index_name, hash, "rev");
    +-			if (finalize_object_file(curr_rev_index_name, final_rev_index_name))
    +-				die(_("cannot store reverse index file"));
    +-		} else
    +-			chmod(final_rev_index_name, 0444);
    +-	}
    ++	rename_tmp_packfile(&final_pack_name, curr_pack_name, &pack_name,
    ++			    hash, "pack", from_stdin);
    ++	rename_tmp_packfile(&final_index_name, curr_index_name, &index_name,
    ++			    hash, "idx", 1);
    ++	if (curr_rev_index_name)
    ++		rename_tmp_packfile(&final_rev_index_name, curr_rev_index_name,
    ++				    &rev_index_name, hash, "rev", 1);
      
    - 	free((void *)idx_tmp_name);
    + 	if (do_fsck_object) {
    + 		struct packed_git *p;
-:  ----------- > 6:  8c2dcbf6011 builtin/index-pack.c: move `.idx` files into place last
5:  1205f9d0c25 ! 7:  3a0ee7e4a99 pack-write: split up finish_tmp_packfile() function
    @@ builtin/pack-objects.c: static void write_pack_file(void)
      					    written_list, nr_written,
     -					    &pack_idx_opts, hash);
     +					    &pack_idx_opts, hash, &idx_tmp_name);
    -+			rename_tmp_packfile_idx(&tmpname, hash, &idx_tmp_name);
    ++			rename_tmp_packfile_idx(&tmpname, &idx_tmp_name);
      
      			if (write_bitmap_index) {
    - 				struct strbuf sb = STRBUF_INIT;
    + 				size_t tmpname_len = tmpname.len;
     @@ builtin/pack-objects.c: static void write_pack_file(void)
    - 				strbuf_release(&sb);
    + 				strbuf_setlen(&tmpname, tmpname_len);
      			}
      
     +			free(idx_tmp_name);
    @@ bulk-checkin.c: static struct bulk_checkin_state {
      	uint32_t nr_written;
      } state;
      
    -+static void finish_tmp_packfile(const struct strbuf *basename,
    ++static void finish_tmp_packfile(struct strbuf *basename,
     +				const char *pack_tmp_name,
     +				struct pack_idx_entry **written_list,
     +				uint32_t nr_written,
    @@ bulk-checkin.c: static struct bulk_checkin_state {
     +
     +	stage_tmp_packfiles(basename, pack_tmp_name, written_list, nr_written,
     +			    pack_idx_opts, hash, &idx_tmp_name);
    -+	rename_tmp_packfile_idx(basename, hash, &idx_tmp_name);
    ++	rename_tmp_packfile_idx(basename, &idx_tmp_name);
     +
     +	free(idx_tmp_name);
     +}
    @@ bulk-checkin.c: static struct bulk_checkin_state {
      static void finish_bulk_checkin(struct bulk_checkin_state *state)
      {
      	struct object_id oid;
    +@@ bulk-checkin.c: static void finish_bulk_checkin(struct bulk_checkin_state *state)
    + 		close(fd);
    + 	}
    + 
    +-	strbuf_addf(&packname, "%s/pack/pack-", get_object_directory());
    ++	strbuf_addf(&packname, "%s/pack/pack-%s.", get_object_directory(),
    ++		    oid_to_hex(&oid));
    + 	finish_tmp_packfile(&packname, state->pack_tmp_name,
    + 			    state->written, state->nr_written,
    + 			    &state->pack_idx_opts, oid.hash);
     
      ## pack-write.c ##
    -@@ pack-write.c: static void rename_tmp_packfile(const char *source,
    - 	strbuf_release(&sb);
    +@@ pack-write.c: static void rename_tmp_packfile(struct strbuf *nb, const char *source,
    + 	strbuf_setlen(nb, nb_len);
      }
      
    --void finish_tmp_packfile(const struct strbuf *basename,
    -+void rename_tmp_packfile_idx(const struct strbuf *basename,
    -+			      unsigned char hash[], char **idx_tmp_name)
    +-void finish_tmp_packfile(struct strbuf *name_buffer,
    ++void rename_tmp_packfile_idx(struct strbuf *name_buffer,
    ++			     char **idx_tmp_name)
     +{
    -+	rename_tmp_packfile(*idx_tmp_name, basename, hash, "idx");
    ++	rename_tmp_packfile(name_buffer, *idx_tmp_name, "idx");
     +}
     +
    -+void stage_tmp_packfiles(const struct strbuf *basename,
    ++void stage_tmp_packfiles(struct strbuf *name_buffer,
      			 const char *pack_tmp_name,
      			 struct pack_idx_entry **written_list,
      			 uint32_t nr_written,
    @@ pack-write.c: static void rename_tmp_packfile(const char *source,
      		die_errno("unable to make temporary index file readable");
      
      	rev_tmp_name = write_rev_file(NULL, written_list, nr_written, hash,
    -@@ pack-write.c: void finish_tmp_packfile(const struct strbuf *basename,
    - 	rename_tmp_packfile(pack_tmp_name, basename, hash, "pack");
    +@@ pack-write.c: void finish_tmp_packfile(struct strbuf *name_buffer,
    + 	rename_tmp_packfile(name_buffer, pack_tmp_name, "pack");
      	if (rev_tmp_name)
    - 		rename_tmp_packfile(rev_tmp_name, basename, hash, "rev");
    --	rename_tmp_packfile(idx_tmp_name, basename, hash, "idx");
    + 		rename_tmp_packfile(name_buffer, rev_tmp_name, "rev");
    +-	rename_tmp_packfile(name_buffer, idx_tmp_name, "idx");
     -
     -	free((void *)idx_tmp_name);
      }
    @@ pack.h: int encode_in_pack_object_header(unsigned char *hdr, int hdr_len,
      int read_pack_header(int fd, struct pack_header *);
      
      struct hashfile *create_tmp_packfile(char **pack_tmp_name);
    --void finish_tmp_packfile(const struct strbuf *basename,
    -+void stage_tmp_packfiles(const struct strbuf *basename,
    +-void finish_tmp_packfile(struct strbuf *name_buffer,
    ++void stage_tmp_packfiles(struct strbuf *name_buffer,
      			 const char *pack_tmp_name,
      			 struct pack_idx_entry **written_list,
      			 uint32_t nr_written,
    @@ pack.h: int encode_in_pack_object_header(unsigned char *hdr, int hdr_len,
     -			 unsigned char sha1[]);
     +			 unsigned char hash[],
     +			 char **idx_tmp_name);
    -+void rename_tmp_packfile_idx(const struct strbuf *tmp_basename,
    -+			     unsigned char hash[], char **idx_tmp_name);
    ++void rename_tmp_packfile_idx(struct strbuf *basename,
    ++			     char **idx_tmp_name);
      
      #endif
6:  70f4a9767d3 ! 8:  b1b232b80de pack-write: rename *.idx file into place last (really!)
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    pack-write: rename *.idx file into place last (really!)
    +    pack-objects: rename .idx files into place after .bitmap files
     
    -    Follow-up a preceding commit (pack-write.c: rename `.idx` file into
    -    place last, 2021-08-16)[1] and rename the *.idx file in-place after we
    -    write the *.bitmap. The preceding commit fixed the issue of *.idx
    -    being written before *.rev files, but did not do so for *.idx files.
    +    In preceding commits the race of renaming .idx files in place before
    +    .rev files and other auxiliary files was fixed in pack-write.c's
    +    finish_tmp_packfile(), builtin/repack.c's "struct exts", and
    +    builtin/index-pack.c's final(). As noted in the change to pack-write.c
    +    we left in place the issue of writing *.bitmap files after the *.idx,
    +    let's fix that issue.
     
         See 7cc8f971085 (pack-objects: implement bitmap writing, 2013-12-21)
         for commentary at the time when *.bitmap was implemented about how
         those files are written out, nothing in that commit contradicts what's
         being done here.
     
    -    Note that the referenced earlier commit[1] is overly optimistic about
    -    "clos[ing the] race", i.e. yes we'll now write the files in the right
    -    order, but we might still race due to our sloppy use of fsync(). See
    -    the thread at [2] for a rabbit hole of various discussions about
    -    filesystem races in the face of doing and not doing fsync() (and if
    -    doing fsync(), not doing it properly).
    +    Note that this commit and preceding ones only close any race condition
    +    with *.idx files being written before their auxiliary files if we're
    +    optimistic about our lack of fsync()-ing in this are not tripping us
    +    over. See the thread at [1] for a rabbit hole of various discussions
    +    about filesystem races in the face of doing and not doing fsync() (and
    +    if doing fsync(), not doing it properly).
     
    -    1. https://lore.kernel.org/git/a6a4d2154e83d41c10986c5f455279ab249a910c.1630461918.git.me@ttaylorr.com/
    -    2. https://lore.kernel.org/git/8735qgkvv1.fsf@evledraar.gmail.com/
    +    In particular, in this case of writing to ".git/objects/pack" we only
    +    write and fsync() the individual files, but if we wanted to guarantee
    +    that the metadata update was seen in that way by concurrent processes
    +    we'd need to fsync() on the "fd" of the containing directory. That
    +    concern is probably more theoretical than not, modern OS's tend to be
    +    more on the forgiving side than the overly pedantic side of
    +    implementing POSIX FS semantics.
    +
    +    1. https://lore.kernel.org/git/8735qgkvv1.fsf@evledraar.gmail.com/
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ builtin/pack-objects.c: static void write_pack_file(void)
      			stage_tmp_packfiles(&tmpname, pack_tmp_name,
      					    written_list, nr_written,
      					    &pack_idx_opts, hash, &idx_tmp_name);
    --			rename_tmp_packfile_idx(&tmpname, hash, &idx_tmp_name);
    +-			rename_tmp_packfile_idx(&tmpname, &idx_tmp_name);
      
      			if (write_bitmap_index) {
    - 				struct strbuf sb = STRBUF_INIT;
    + 				size_t tmpname_len = tmpname.len;
     @@ builtin/pack-objects.c: static void write_pack_file(void)
    - 				strbuf_release(&sb);
    + 				strbuf_setlen(&tmpname, tmpname_len);
      			}
      
    -+			rename_tmp_packfile_idx(&tmpname, hash, &idx_tmp_name);
    ++			rename_tmp_packfile_idx(&tmpname, &idx_tmp_name);
     +
      			free(idx_tmp_name);
      			strbuf_release(&tmpname);
Taylor Blau Sept. 9, 2021, 1:13 a.m. UTC | #3
On Thu, Sep 09, 2021 at 02:50:59AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Thu, Sep 09 2021, Ævar Arnfjörð Bjarmason wrote:
>
> > On Wed, Sep 08 2021, Taylor Blau wrote:
> >
> >> Here is a very small reroll of a couple of patches to make sure that packs are
> >> not read before all of their auxiliary files exist and are moved into place, by
> >> renaming the .idx file into place last.
> >>
> >> New since the original version is a patch to apply similar treatment to
> >> index-pack, as noticed by Peff in [1]. This should be queued before Ævar's
> >> series on top.
> >
> > I read Junio's earlier <xmqq8s063m7m.fsf@gitster.g>[1] as a suggestion
> > that we combine the two into a singe series. I do think that yields a
> > better end-result, in particular your 1/3 is much more readable if the
> > refactoring in my 2/4.
> >
> > I'm mot of the way done with such a rewrite, which also incorporates
> > your suggestion for how to manage memory in rename_tmp_packfile(), but
> > I'll hold of on what you & Junio have to say about next steps before
> > adding to the re-roll competition Junio mentioned...
> >
> > 1. https://lore.kernel.org/git/xmqq8s063m7m.fsf@gitster.g
>
> I've got that at
> https://github.com/git/git/compare/master...avar:avar-tb/idx-rename-race-3

Beautiful :-).

I mentioned in my response to [1] that I missed that message when
sending v2 of my series to fix a couple of these races. And I was even
happy to unify our two series, but you did all of the work for me while
I was eating dinner. Thank you!

I fetched your branch, reviewed, and am happy with the result. So I
would be content to apply my s-o-b to your patches and resubmit them as
a unified series.

But I did wonder if you wanted to include [2] in this series as well.
It's kind of different, but also related enough that it probably makes
sense to just pull it in. So I'm inclined to just do that, unless you
have any objections.

Thanks,
Taylor

[2]: https://lore.kernel.org/git/patch-1.1-366ba928bd-20210908T010743Z-avarab@gmail.com/
Ævar Arnfjörð Bjarmason Sept. 9, 2021, 1:33 a.m. UTC | #4
On Wed, Sep 08 2021, Taylor Blau wrote:

> On Thu, Sep 09, 2021 at 02:50:59AM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Thu, Sep 09 2021, Ævar Arnfjörð Bjarmason wrote:
>>
>> > On Wed, Sep 08 2021, Taylor Blau wrote:
>> >
>> >> Here is a very small reroll of a couple of patches to make sure that packs are
>> >> not read before all of their auxiliary files exist and are moved into place, by
>> >> renaming the .idx file into place last.
>> >>
>> >> New since the original version is a patch to apply similar treatment to
>> >> index-pack, as noticed by Peff in [1]. This should be queued before Ævar's
>> >> series on top.
>> >
>> > I read Junio's earlier <xmqq8s063m7m.fsf@gitster.g>[1] as a suggestion
>> > that we combine the two into a singe series. I do think that yields a
>> > better end-result, in particular your 1/3 is much more readable if the
>> > refactoring in my 2/4.
>> >
>> > I'm mot of the way done with such a rewrite, which also incorporates
>> > your suggestion for how to manage memory in rename_tmp_packfile(), but
>> > I'll hold of on what you & Junio have to say about next steps before
>> > adding to the re-roll competition Junio mentioned...
>> >
>> > 1. https://lore.kernel.org/git/xmqq8s063m7m.fsf@gitster.g
>>
>> I've got that at
>> https://github.com/git/git/compare/master...avar:avar-tb/idx-rename-race-3
>
> Beautiful :-).
>
> I mentioned in my response to [1] that I missed that message when
> sending v2 of my series to fix a couple of these races. And I was even
> happy to unify our two series, but you did all of the work for me while
> I was eating dinner. Thank you!
>
> I fetched your branch, reviewed, and am happy with the result. So I
> would be content to apply my s-o-b to your patches and resubmit them as
> a unified series.

Sounds good to me, also in particular any typo/grammar etc. fixes to my
changes are most welcome, I tend to sneak those in, and any code changes
you see fit of course.

> But I did wonder if you wanted to include [2] in this series as well.
> It's kind of different, but also related enough that it probably makes
> sense to just pull it in. So I'm inclined to just do that, unless you
> have any objections.

In this latest push-out of "next" Junio's merged that one down already,
see 1972c5931bb (Merge branch 'ab/reverse-midx-optim' into next,
2021-09-08).

So I think at this point it could be built on top of that, but given
that the two don't conflict in any way (textually or semantically) it's
probably better to base this larger topic on "master" and proceed with
them independently.

> [2]: https://lore.kernel.org/git/patch-1.1-366ba928bd-20210908T010743Z-avarab@gmail.com/
Ævar Arnfjörð Bjarmason Sept. 9, 2021, 2:36 a.m. UTC | #5
On Thu, Sep 09 2021, Ævar Arnfjörð Bjarmason wrote:

> On Wed, Sep 08 2021, Taylor Blau wrote:
>
>> On Thu, Sep 09, 2021 at 02:50:59AM +0200, Ævar Arnfjörð Bjarmason wrote:
>>>
>>> On Thu, Sep 09 2021, Ævar Arnfjörð Bjarmason wrote:
>>>
>>> > On Wed, Sep 08 2021, Taylor Blau wrote:
>>> >
>>> >> Here is a very small reroll of a couple of patches to make sure that packs are
>>> >> not read before all of their auxiliary files exist and are moved into place, by
>>> >> renaming the .idx file into place last.
>>> >>
>>> >> New since the original version is a patch to apply similar treatment to
>>> >> index-pack, as noticed by Peff in [1]. This should be queued before Ævar's
>>> >> series on top.
>>> >
>>> > I read Junio's earlier <xmqq8s063m7m.fsf@gitster.g>[1] as a suggestion
>>> > that we combine the two into a singe series. I do think that yields a
>>> > better end-result, in particular your 1/3 is much more readable if the
>>> > refactoring in my 2/4.
>>> >
>>> > I'm mot of the way done with such a rewrite, which also incorporates
>>> > your suggestion for how to manage memory in rename_tmp_packfile(), but
>>> > I'll hold of on what you & Junio have to say about next steps before
>>> > adding to the re-roll competition Junio mentioned...
>>> >
>>> > 1. https://lore.kernel.org/git/xmqq8s063m7m.fsf@gitster.g
>>>
>>> I've got that at
>>> https://github.com/git/git/compare/master...avar:avar-tb/idx-rename-race-3
>>
>> Beautiful :-).
>>
>> I mentioned in my response to [1] that I missed that message when
>> sending v2 of my series to fix a couple of these races. And I was even
>> happy to unify our two series, but you did all of the work for me while
>> I was eating dinner. Thank you!
>>
>> I fetched your branch, reviewed, and am happy with the result. So I
>> would be content to apply my s-o-b to your patches and resubmit them as
>> a unified series.
>
> Sounds good to me, also in particular any typo/grammar etc. fixes to my
> changes are most welcome, I tend to sneak those in, and any code changes
> you see fit of course.
>
>> But I did wonder if you wanted to include [2] in this series as well.
>> It's kind of different, but also related enough that it probably makes
>> sense to just pull it in. So I'm inclined to just do that, unless you
>> have any objections.
>
> In this latest push-out of "next" Junio's merged that one down already,
> see 1972c5931bb (Merge branch 'ab/reverse-midx-optim' into next,
> 2021-09-08).
>
> So I think at this point it could be built on top of that, but given
> that the two don't conflict in any way (textually or semantically) it's
> probably better to base this larger topic on "master" and proceed with
> them independently.
>
>> [2]: https://lore.kernel.org/git/patch-1.1-366ba928bd-20210908T010743Z-avarab@gmail.com/

Also: That avar-tb/idx-rename-race-3 has a bug where I didn't update the
bulk-checkin.c code accordingly, missed it because I was running a glob
of tests that included "*pack*,*rev*", but missed "*large*". I've got a
avar-tb/idx-rename-race-4 which fixes it. Sorry about that.

https://github.com/git/git/compare/master...avar:avar-tb/idx-rename-race-4
Taylor Blau Sept. 9, 2021, 2:49 a.m. UTC | #6
On Thu, Sep 09, 2021 at 04:36:17AM +0200, Ævar Arnfjörð Bjarmason wrote:
> Also: That avar-tb/idx-rename-race-3 has a bug where I didn't update the
> bulk-checkin.c code accordingly, missed it because I was running a glob
> of tests that included "*pack*,*rev*", but missed "*large*". I've got a
> avar-tb/idx-rename-race-4 which fixes it. Sorry about that.
>
> https://github.com/git/git/compare/master...avar:avar-tb/idx-rename-race-4

Yep, I noticed the same thing while running make test on the series
before sending it out. Your fix matches mine (which is to move the
change from 7/8 to 2/8).

While I was fixing the second patch, I inserted a new one just before it
to store the hash in a `unsigned char hash[GIT_MAX_RAWSZ]` instead of a
`struct object_id`.

Thanks,
Taylor