diff mbox series

packfile: delete .idx files before .pack files

Message ID pull.1547.git.1687287675248.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit 0dd1324a73e404a2b02356c77088ec1f71beec11
Headers show
Series packfile: delete .idx files before .pack files | expand

Commit Message

Derrick Stolee June 20, 2023, 7:01 p.m. UTC
From: Derrick Stolee <derrickstolee@github.com>

When installing a packfile, we place the .pack file before the .idx
file. The intention is that Git scans for .idx files in the pack
directory and then loads the .pack files from that list.

However, when we delete packfiles, we do not do this in the reverse
order as we should. The unlink_pack_path() method deletes the .pack
followed by the .idx.

This creates a window where the process could be interrupted between
the .pack deletion and the .idx deletion, leaving the repository in a
state that looks strange, but isn't actually too problematic if we
assume the pack was safe to delete. The .idx without a .pack will cause
some overhead, but will not interrupt other Git processes.

This ordering was introduced into the 'git repack' builtin by
a1bbc6c0176 (repack: rewrite the shell script in C, 2013-09-15), though
we must be careful to track history through the code move in 8434e85d5f9
(repack: refactor pack deletion for future use, 2019-06-10) to see that.

This became more important after 73320e49add (builtin/repack.c: only
collect fully-formed packs, 2023-06-07) changed how 'git repack' scanned
for packfiles for use in the cruft pack process. It previously looked
for .pack files, but that was problematic due to the order that packs
are installed: repacks between the creation of a .pack and the creation
of its .idx would result in hard failures.

There is an independent proposal about what to do in the case of a .idx
without a .pack during this 'git repack' scenario, but this change is
focused on deleting .pack files more safely.

Modify the order to delete the .idx before the .pack. The rest of the
modifiers on the .pack should still come after the .pack so we know all
of the presumed properties of the packfile as long as it exists in the
filesystem, in case we wish to reinstate it by re-indexing the .pack
file.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
    packfile: delete .idx files before .pack files
    
    I'm going to submit an RFC soon [1] that approaches this issue from
    another angle, but this should be a straightforward fix to avoid the
    problematic "an .idx exists without its .pack" case that has been seen
    in the wild.
    
    [1] https://github.com/gitgitgadget/git/pull/1546
    
    Thanks, -Stolee

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1547%2Fderrickstolee%2Fdelete-idx-first-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1547/derrickstolee/delete-idx-first-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1547

 packfile.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


base-commit: fe86abd7511a9a6862d5706c6fa1d9b57a63ba09

Comments

Junio C Hamano June 20, 2023, 8:39 p.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> However, when we delete packfiles, we do not do this in the reverse
> order as we should. The unlink_pack_path() method deletes the .pack
> followed by the .idx.

Makes sense.
Taylor Blau June 20, 2023, 10:02 p.m. UTC | #2
On Tue, Jun 20, 2023 at 07:01:15PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
>
> When installing a packfile, we place the .pack file before the .idx
> file. The intention is that Git scans for .idx files in the pack
> directory and then loads the .pack files from that list.
>
> However, when we delete packfiles, we do not do this in the reverse
> order as we should. The unlink_pack_path() method deletes the .pack
> followed by the .idx.
>
> This creates a window where the process could be interrupted between
> the .pack deletion and the .idx deletion, leaving the repository in a
> state that looks strange, but isn't actually too problematic if we
> assume the pack was safe to delete. The .idx without a .pack will cause
> some overhead, but will not interrupt other Git processes.

I think that this specific case wouldn't cause a problem since we were
about to delete that pack anyway, but it may be worth emphasizing that
having a missing pack with an extant .idx file is not always guaranteed
to be safe.

> This ordering was introduced into the 'git repack' builtin by
> a1bbc6c0176 (repack: rewrite the shell script in C, 2013-09-15), though
> we must be careful to track history through the code move in 8434e85d5f9
> (repack: refactor pack deletion for future use, 2019-06-10) to see that.

Thanks for calling this out, I agree that the "regression" dates back to
a1bbc6c0176, since we can see that in git-repack.sh, we do:

    # Remove the "old-" files
    for name in $names
    do
      rm -f "$PACKDIR/old-pack-$name.idx"
      rm -f "$PACKDIR/old-pack-$name.pack"
    done

...removing the .idx first, which is the right thing to do here.

> diff --git a/packfile.c b/packfile.c
> index fd083c86e00..6b591ddcccf 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -381,7 +381,7 @@ void close_object_store(struct raw_object_store *o)
>
>  void unlink_pack_path(const char *pack_name, int force_delete)
>  {
> -	static const char *exts[] = {".pack", ".idx", ".rev", ".keep", ".bitmap", ".promisor", ".mtimes"};
> +	static const char *exts[] = {".idx", ".pack", ".rev", ".keep", ".bitmap", ".promisor", ".mtimes"};
>  	int i;
>  	struct strbuf buf = STRBUF_INIT;
>  	size_t plen;

Looks good, thanks.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/packfile.c b/packfile.c
index fd083c86e00..6b591ddcccf 100644
--- a/packfile.c
+++ b/packfile.c
@@ -381,7 +381,7 @@  void close_object_store(struct raw_object_store *o)
 
 void unlink_pack_path(const char *pack_name, int force_delete)
 {
-	static const char *exts[] = {".pack", ".idx", ".rev", ".keep", ".bitmap", ".promisor", ".mtimes"};
+	static const char *exts[] = {".idx", ".pack", ".rev", ".keep", ".bitmap", ".promisor", ".mtimes"};
 	int i;
 	struct strbuf buf = STRBUF_INIT;
 	size_t plen;