
server-info: do not list unlinked packs

Message ID 20190523085959.4q76pokx2gy6wqq7@dcvr (mailing list archive)
State New, archived
Series server-info: do not list unlinked packs

Commit Message

Eric Wong May 23, 2019, 8:59 a.m. UTC
Jeff King <peff@peff.net> wrote:
> On Wed, May 15, 2019 at 08:38:39PM +0000, Eric Wong wrote:
> 
> > I've also noticed objects/info/packs contains stale entries
> > after repack/gc runs on current git.
> > 
> > Tried adding reprepare_packed_git before update_server_info,
> > but that didn't seem to work; so maybe something isn't cleared.
> > Might have time to investigate more this week, might not...
> 
> We never delete entries from the in-memory packed_git list; a reprepare
> only adds to the list. You'd need to teach update_server_info() to
> ignore packs which are no longer present (or switch to exec-ing a
> separate update-server-info binary).

Ah, checking file_exists() and setting a bit seems sufficient.

--------8<---------
Subject: [PATCH] server-info: do not list unlinked packs

Having non-existent packs in objects/info/packs causes
dumb HTTP clients to abort.

There remains a small window where the old objects/info/packs
file can refer to unlinked packs.  That's unavoidable even on a
local FS given the time-of-check-to-time-of-use window between
listing and retrieving files.

Signed-off-by: Eric Wong <e@80x24.org>
---
  I think the small window I refer to can be worked around by
  teaching the dumb HTTP client to reread objects/info/packs
  if it 404s while trying to GET a pack...

 object-store.h | 1 +
 server-info.c  | 7 ++++++-
 t/t6500-gc.sh  | 2 ++
 3 files changed, 9 insertions(+), 1 deletion(-)


base-commit: aa25c82427ae70aebf3b8f970f2afd54e9a2a8c6
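The note above about teaching the dumb HTTP client to re-read objects/info/packs after a 404 can be sketched as a small decision step. This is a hypothetical illustration, not git's http-walker code: the names `after_404`, `info_lists_pack()` and `decide_after_404()` are invented here. The sketch assumes the client has already re-fetched objects/info/packs as plain text made of `P <pack-name>` lines, the format the awk filter in the t6500 test reads.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical outcomes after GET objects/pack/<pack> returned 404. */
enum after_404 {
	PACK_GONE_SKIP,         /* no longer advertised: repacked away, skip it */
	PACK_STILL_LISTED_FAIL  /* still advertised yet missing: a real error */
};

/*
 * Return 1 if `info` (objects/info/packs text) contains a "P <pack>"
 * line naming exactly `pack`, else 0.
 */
static int info_lists_pack(const char *info, const char *pack)
{
	size_t len = strlen(pack);
	const char *line = info;

	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t line_len = eol ? (size_t)(eol - line) : strlen(line);

		/* Whole-line match: "P " followed by exactly the pack name. */
		if (line_len == len + 2 && !strncmp(line, "P ", 2) &&
		    !strncmp(line + 2, pack, len))
			return 1;
		if (!eol)
			break;
		line = eol + 1;
	}
	return 0;
}

/* Decide what to do after a 404, given a freshly re-read listing. */
static enum after_404 decide_after_404(const char *fresh_info, const char *pack)
{
	assert(fresh_info && pack);
	return info_lists_pack(fresh_info, pack) ? PACK_STILL_LISTED_FAIL
						 : PACK_GONE_SKIP;
}
```

On `PACK_GONE_SKIP`, a client along these lines would carry on with the packs named in the fresh listing instead of aborting; only `PACK_STILL_LISTED_FAIL` would point at a genuinely broken repository.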

Comments

Jeff King May 23, 2019, 10:24 a.m. UTC | #1
On Thu, May 23, 2019 at 08:59:59AM +0000, Eric Wong wrote:

> > We never delete entries from the in-memory packed_git list; a reprepare
> > only adds to the list. You'd need to teach update_server_info() to
> > ignore packs which are no longer present (or switch to exec-ing a
> > separate update-server-info binary).
> 
> Ah, checking file_exists() and setting a bit seems sufficient.

Yes, though do we even need to store the bit?

I.e.,

> @@ -199,12 +200,16 @@ static void init_pack_info(const char *infofile, int force)
>  		 */
>  		if (!p->pack_local)
>  			continue;
> +		if (!file_exists(p->pack_name)) {
> +			p->pack_unlinked = 1;
> +			continue;
> +		}
>  		i++;
>  	}
>  	num_pack = i;
>  	info = xcalloc(num_pack, sizeof(struct pack_info *));
>  	for (i = 0, p = get_all_packs(the_repository); p; p = p->next) {
> -		if (!p->pack_local)
> +		if (!p->pack_local || p->pack_unlinked)
>  			continue;
>  		assert(i < num_pack);
>  		info[i] = xcalloc(1, sizeof(struct pack_info));

If we just check file_exists() in the second loop, then this is entirely
local to update_server_info(). And other users of packed_git do not have
to wonder who is responsible for setting that flag in the global list.

It does mean you'd over-allocate the array (and num_pack would have to
be adjusted down to "i" after the second loop), but that's not a big
deal.  I do think the whole two-loop thing would be more readable if we
simply grew it on the fly with ALLOC_GROW().

-Peff
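The suggestion above (check file_exists() while filling the array, and grow the array on the fly instead of counting first) can be sketched outside git as follows. This is a standalone approximation, not git's code: `struct mock_pack` is a hypothetical stand-in for `struct packed_git`, `path_exists()` mirrors what git's file_exists() does (an lstat() check), and `MOCK_ALLOC_NR()` is modeled on the growth rule behind git's ALLOC_GROW().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical stand-in for git's struct packed_git (only the fields
 * used here). */
struct mock_pack {
	const char *pack_name;
	unsigned pack_local:1;
	struct mock_pack *next;
};

/* Same idea as git's file_exists(): true if lstat() succeeds. */
static int path_exists(const char *path)
{
	struct stat st;
	return !lstat(path, &st);
}

/* Growth rule modeled on git's alloc_nr(). */
#define MOCK_ALLOC_NR(x) (((x) + 16) * 3 / 2)

/*
 * One pass over the list: skip non-local packs and packs whose file
 * has been unlinked, growing the array as needed. Nothing is written
 * back to the shared list. Stores the array in *out (caller frees it)
 * and returns the number of entries.
 */
static size_t collect_live_packs(struct mock_pack *list,
				 struct mock_pack ***out)
{
	struct mock_pack **info = NULL;
	size_t nr = 0, alloc = 0;
	struct mock_pack *p;

	for (p = list; p; p = p->next) {
		if (!p->pack_local || !path_exists(p->pack_name))
			continue;
		if (nr + 1 > alloc) {
			struct mock_pack **grown;

			alloc = MOCK_ALLOC_NR(alloc);
			grown = realloc(info, alloc * sizeof(*grown));
			if (!grown)
				abort();
			info = grown;
		}
		assert(nr < alloc);
		info[nr++] = p;
	}
	*out = info;
	return nr;
}
```

Compared with the two-loop version in the patch, this keeps the check local to the function that writes the file, needs no `pack_unlinked` bit on the global list, and drops the separate counting pass; the only cost is some unused capacity in the array.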

Patch

diff --git a/object-store.h b/object-store.h
index 272e01e452..2c9facc8f2 100644
--- a/object-store.h
+++ b/object-store.h
@@ -77,6 +77,7 @@  struct packed_git {
 		 freshened:1,
 		 do_not_close:1,
 		 pack_promisor:1,
+		 pack_unlinked:1,
 		 multi_pack_index:1;
 	unsigned char hash[GIT_MAX_RAWSZ];
 	struct revindex_entry *revindex;
diff --git a/server-info.c b/server-info.c
index 41274d098b..69e2c5279b 100644
--- a/server-info.c
+++ b/server-info.c
@@ -1,4 +1,5 @@ 
 #include "cache.h"
+#include "dir.h"
 #include "repository.h"
 #include "refs.h"
 #include "object.h"
@@ -199,12 +200,16 @@  static void init_pack_info(const char *infofile, int force)
 		 */
 		if (!p->pack_local)
 			continue;
+		if (!file_exists(p->pack_name)) {
+			p->pack_unlinked = 1;
+			continue;
+		}
 		i++;
 	}
 	num_pack = i;
 	info = xcalloc(num_pack, sizeof(struct pack_info *));
 	for (i = 0, p = get_all_packs(the_repository); p; p = p->next) {
-		if (!p->pack_local)
+		if (!p->pack_local || p->pack_unlinked)
 			continue;
 		assert(i < num_pack);
 		info[i] = xcalloc(1, sizeof(struct pack_info));
diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index 515c6735e9..c0f04dc6b0 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -71,6 +71,8 @@  test_expect_success 'gc --keep-largest-pack' '
 		git gc --keep-largest-pack &&
 		( cd .git/objects/pack && ls *.pack ) >pack-list &&
 		test_line_count = 2 pack-list &&
+		awk "/^P /{print \$2}" <.git/objects/info/packs >pack-info &&
+		test_line_count = 2 pack-info &&
 		test_path_is_file $BASE_PACK &&
 		git fsck
 	)
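The test above only counts the `P` lines in objects/info/packs. A stricter check of the property the patch is after, that every advertised pack actually exists, could look like the following standalone sketch. `count_stale_entries()` is a hypothetical helper, not part of git or its test suite, and it assumes the `P <pack-name>` line format that the test's awk filter relies on.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Count "P <name>" lines in objects/info/packs text whose file is not
 * present in pack_dir (e.g. ".git/objects/pack"). A result of 0 means
 * every advertised pack exists at the moment of the check.
 */
static int count_stale_entries(const char *info, const char *pack_dir)
{
	int stale = 0;
	const char *line = info;

	assert(info && pack_dir);
	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t line_len = eol ? (size_t)(eol - line) : strlen(line);

		if (line_len > 2 && !strncmp(line, "P ", 2)) {
			char path[4096];
			struct stat st;
			int n = snprintf(path, sizeof(path), "%s/%.*s", pack_dir,
					 (int)(line_len - 2), line + 2);

			/* An over-long path cannot be checked; count it as stale. */
			if (n < 0 || (size_t)n >= sizeof(path) || lstat(path, &st))
				stale++;
		}
		if (!eol)
			break;
		line = eol + 1;
	}
	return stale;
}
```

Run right after `git gc`, a non-zero result would correspond to the stale entries reported at the start of this thread, although a concurrent repack can still produce a transient non-zero reading.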