diff mbox series

[2/3] midx: close multi-pack-index on repack

Message ID 725ebadc92a91469eed089eb501b705c2dd2c627.1539011820.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Add GIT_TEST_MULTI_PACK_INDEX environment variable | expand

Commit Message

Linus Arver via GitGitGadget Oct. 8, 2018, 3:17 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

When repacking, we may remove pack-files. This invalidates the
multi-pack-index (if it exists). Previously, we removed the
multi-pack-index file before removing any pack-file. In some cases,
the repack command may load the multi-pack-index into memory. This
may lead to later in-memory references to the non-existent pack-
files.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/repack.c | 4 ++++
 midx.c           | 6 +++++-
 midx.h           | 2 ++
 3 files changed, 11 insertions(+), 1 deletion(-)

Comments

Junio C Hamano Oct. 9, 2018, 9:10 a.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/builtin/repack.c b/builtin/repack.c
> index c6a7943d5c..7925bb976e 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -432,6 +432,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  
>  			if (!midx_cleared) {
>  				/* if we move a packfile, it will invalidated the midx */
> +				if (the_repository->objects) {
> +					close_midx(the_repository->objects->multi_pack_index);
> +					the_repository->objects->multi_pack_index = NULL;
> +				}
>  				clear_midx_file(get_object_directory());
>  				midx_cleared = 1;

It somehow looks like a bit of layering violation, doesn't it?  When
we are clearing a midx file, don't we always want to do this as well?
Derrick Stolee Oct. 9, 2018, 2:11 p.m. UTC | #2
On 10/9/2018 5:10 AM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> diff --git a/builtin/repack.c b/builtin/repack.c
>> index c6a7943d5c..7925bb976e 100644
>> --- a/builtin/repack.c
>> +++ b/builtin/repack.c
>> @@ -432,6 +432,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>>   
>>   			if (!midx_cleared) {
>>   				/* if we move a packfile, it will invalidated the midx */
>> +				if (the_repository->objects) {
>> +					close_midx(the_repository->objects->multi_pack_index);
>> +					the_repository->objects->multi_pack_index = NULL;
>> +				}
>>   				clear_midx_file(get_object_directory());
>>   				midx_cleared = 1;
> It somehow looks like a bit of layering violation, doesn't it?  When
> we are clearing a midx file, don't we always want to do this as well?

You're right. It did feel a bit wrong. In v2, I'll replace this commit 
with a refactor of clear_midx_file() to do that. One tricky part is that 
we need to clear the file even if the in-memory struct hasn't been 
initialized, but I think passing a repository will suffice for that.

CC Stefan: Is there a plan to make get_object_directory() take a 
repository parameter?

Thanks,

-Stolee
Stefan Beller Oct. 9, 2018, 6:15 p.m. UTC | #3
On Tue, Oct 9, 2018 at 7:11 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> CC Stefan: Is there a plan to make get_object_directory() take a
> repository parameter?

Not an immediate plan. Regarding the large refactorings I am mostly
waiting for nd/the-index to stabilize (which may be stable enough by
now) before proceeding.  Specifically 2abf350385 (revision.c: remove
implicit dependency on the_index, 2018-09-21) is a missing piece that
I want to build on.

My next step is to look into making use of the refactorings that we did
over the last year, so I'd rather look into more submodule code again.

To come back to your question:
    git grep get_object_directory |wc -l
    30
    git grep "objects->objectdir" |wc -l
    11
2 out of the 11 matches are from the implementation of get_object_directory

So either we'd actually teach get_object_directory to take a repository
and inspect the 11 occurrences of direct access to objects->objectdir
or we'd view get_object_directory() as a fancy wrapper, that we might want
to drop eventually.
diff mbox series

Patch

diff --git a/builtin/repack.c b/builtin/repack.c
index c6a7943d5c..7925bb976e 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -432,6 +432,10 @@  int cmd_repack(int argc, const char **argv, const char *prefix)
 
 			if (!midx_cleared) {
 				/* if we move a packfile, it will invalidated the midx */
+				if (the_repository->objects) {
+					close_midx(the_repository->objects->multi_pack_index);
+					the_repository->objects->multi_pack_index = NULL;
+				}
 				clear_midx_file(get_object_directory());
 				midx_cleared = 1;
 			}
diff --git a/midx.c b/midx.c
index 999717b96f..fe8532a9d1 100644
--- a/midx.c
+++ b/midx.c
@@ -180,9 +180,13 @@  cleanup_fail:
 	return NULL;
 }
 
-static void close_midx(struct multi_pack_index *m)
+void close_midx(struct multi_pack_index *m)
 {
 	uint32_t i;
+
+	if (!m)
+		return;
+
 	munmap((unsigned char *)m->data, m->data_len);
 	close(m->fd);
 	m->fd = -1;
diff --git a/midx.h b/midx.h
index a210f1af2a..af6b5cb58f 100644
--- a/midx.h
+++ b/midx.h
@@ -44,4 +44,6 @@  int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, i
 int write_midx_file(const char *object_dir);
 void clear_midx_file(const char *object_dir);
 
+void close_midx(struct multi_pack_index *m);
+
 #endif