diff mbox series

[v2,2/6] midx: add progress to write_midx_file Add progress to write_midx_file. Progress is displayed when the MIDX_PROGRESS flag is set.

Message ID 3bc8677ea7655a3706914f9753c0a3b79dbf7e1f.1568998427.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series multi-pack-index: add --no-progress | expand

Commit Message

Linus Arver via GitGitGadget Sept. 20, 2019, 4:53 p.m. UTC
From: William Baker <William.Baker@microsoft.com>

Signed-off-by: William Baker <William.Baker@microsoft.com>
---
 midx.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

Comments

Junio C Hamano Sept. 20, 2019, 8:10 p.m. UTC | #1
"William Baker via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/midx.c b/midx.c
> index b2673f52e8..54e4e93b2b 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -449,6 +449,8 @@ struct pack_list {
>  	uint32_t nr;
>  	uint32_t alloc;
>  	struct multi_pack_index *m;
> +	struct progress *progress;
> +	uint32_t pack_paths_checked;
>  };

What is the reason behind u32 here?  Do we want to be consistently
limited to 4G on all platforms?

I have a feeling that we do not care too deeply all that much for
the purpose of progress output, in which case, just an ordinary
"unsigned" may be much less misleading.

FWIW, the function that uses this field is display_progress(), which
takes uint64_t there.

> @@ -457,6 +459,7 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len,
>  	struct pack_list *packs = (struct pack_list *)data;
>  
>  	if (ends_with(file_name, ".idx")) {
> +		display_progress(packs->progress, ++packs->pack_paths_checked);

OK, "git grep display_progress" tells me that we are expected to
pass the value of the counter that is already incremented, so this
is also in line with that practice.

> +	if (flags & MIDX_PROGRESS)
> +		packs.progress = start_progress(_("Adding packfiles to multi-pack-index"), 0);
> +	else
> +		packs.progress = NULL;
>
>  	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &packs);
> +	stop_progress(&packs.progress);

OK.

> +	if (flags & MIDX_PROGRESS)
> +		progress = start_progress(_("Writing chunks to multi-pack-index"),
> +					  num_chunks);
> ...
> +
> +		display_progress(progress, i + 1);
>  	}
> +	stop_progress(&progress);
>  

OK.
William Baker Sept. 23, 2019, 9:12 p.m. UTC | #2
On 9/20/19 1:10 PM, Junio C Hamano wrote:
>> diff --git a/midx.c b/midx.c
>> index b2673f52e8..54e4e93b2b 100644
>> --- a/midx.c
>> +++ b/midx.c
>> @@ -449,6 +449,8 @@ struct pack_list {
>>  	uint32_t nr;
>>  	uint32_t alloc;
>>  	struct multi_pack_index *m;
>> +	struct progress *progress;
>> +	uint32_t pack_paths_checked;
>>  };
> 
> What is the reason behind u32 here?  Do we want to be consistently
> limited to 4G on all platforms?
> 
> I have a feeling that we do not care too deeply all that much for
> the purpose of progress output, in which case, just an ordinary
> "unsigned" may be much less misleading.

I went with u32 here to match the data type used to track how many
entries are in the pack_list ('nr' is a u32).

I could switch to this to an unsigned but we would run the (extremely
unlikely) risk of having more than 65k packs on a system where
unsigned is 16 bits.

> FWIW, the function that uses this field is display_progress(), which
> takes uint64_t there.

Thanks for pointing this out.  Given that display_progress() expects a
u64 using that type here for 'pack_paths_checked' could help make things
more straightforward.

-William
Junio C Hamano Sept. 28, 2019, 3:49 a.m. UTC | #3
William Baker <williamtbakeremail@gmail.com> writes:

> On 9/20/19 1:10 PM, Junio C Hamano wrote:
>>> diff --git a/midx.c b/midx.c
>>> index b2673f52e8..54e4e93b2b 100644
>>> --- a/midx.c
>>> +++ b/midx.c
>>> @@ -449,6 +449,8 @@ struct pack_list {
>>>  	uint32_t nr;
>>>  	uint32_t alloc;
>>>  	struct multi_pack_index *m;
>>> +	struct progress *progress;
>>> +	uint32_t pack_paths_checked;
>>>  };
>> 
>> What is the reason behind u32 here?  Do we want to be consistently
>> limited to 4G on all platforms?
>> 
>> I have a feeling that we do not care too deeply all that much for
>> the purpose of progress output, in which case, just an ordinary
>> "unsigned" may be much less misleading.
>
> I went with u32 here to match the data type used to track how many
> entries are in the pack_list ('nr' is a u32).

That kind of parallel is valid when you could compare nr with this
new thing (or put it differently, the existing nr and this new thing
counts the same).  Are they both about the number of packs?

> I could switch to this to an unsigned but we would run the (extremely
> unlikely) risk of having more than 65k packs on a system where
> unsigned is 16 bits.

Why?  If an arch is small enough that the natural integer size is 16-bit,
then limiting the total number of packs to 65k sound entirely
sensible.

The only reason why you'd want fixed (across platforms and
architectures) type is when you want to make sure that a file
storing the literal values taken from these fields are portable and
everybody is limited the same way.  If a platform's natural integer
is 64-bit, by artificially limiting the size of this field to u32,
you are making disservice to the platform users, and more
importantly, you are wasting developers' time by forcing them to
wonder if there is a reason behind the choice of u32 (does it really
need to be able to store up to 4G, even on a smaller machines?  Is
it necessary to refuse to store more than 4G?  What are the
reasons?), like me wondering about these questions and writing them
down here.

So, unless there is a reason why this must be of fixed type, I'd say
just an unsigned would be the most reasonable choice.
William Baker Sept. 30, 2019, 4:36 p.m. UTC | #4
On 9/27/19 8:49 PM, Junio C Hamano wrote:
>>>> diff --git a/midx.c b/midx.c
>>>> index b2673f52e8..54e4e93b2b 100644
>>>> --- a/midx.c
>>>> +++ b/midx.c
>>>> @@ -449,6 +449,8 @@ struct pack_list {
>>>>  	uint32_t nr;
>>>>  	uint32_t alloc;
>>>>  	struct multi_pack_index *m;
>>>> +	struct progress *progress;
>>>> +	uint32_t pack_paths_checked;
>>>>  };
>>
>> I went with u32 here to match the data type used to track how many
>> entries are in the pack_list ('nr' is a u32).
> 
> That kind of parallel is valid when you could compare nr with this
> new thing (or put it differently, the existing nr and this new thing
> counts the same).  Are they both about the number of packs?
> 

Both 'nr' and 'pack_paths_checked' are about the number of packs.  
'nr' tracks the number of packs in the multi-pack-index and it grows
as add_pack_to_midx() finds new packs to add.  'pack_paths_checked' is
the number of pack ".idx" files that have been checked by add_pack_to_midx().

>> I could switch to this to an unsigned but we would run the (extremely
>> unlikely) risk of having more than 65k packs on a system where
>> unsigned is 16 bits.
> 
> Why?  If an arch is small enough that the natural integer size is 16-bit,
> then limiting the total number of packs to 65k sound entirely
> sensible.> The only reason why you'd want fixed (across platforms and
> architectures) type is when you want to make sure that a file
> storing the literal values taken from these fields are portable and
> everybody is limited the same way.  If a platform's natural integer
> is 64-bit, by artificially limiting the size of this field to u32,
> you are making disservice to the platform users, and more
> importantly, you are wasting developers' time by forcing them to
> wonder if there is a reason behind the choice of u32 (does it really
> need to be able to store up to 4G, even on a smaller machines?  Is
> it necessary to refuse to store more than 4G?  What are the
> reasons?), like me wondering about these questions and writing them
> down here.
> 
> So, unless there is a reason why this must be of fixed type, I'd say
> just an unsigned would be the most reasonable choice.
> 

I agree that it's best to avoid using a fixed type here unless there's
a reason that it must be.  Do you think that both 'nr' and
'pack_paths_checked' being about the number of packs is a strong enough
reason to use u32 for 'pack_paths_checked'?  If not, I will update
'pack_paths_checked' in the next path series to be an unsigned int.

Thanks,
William
diff mbox series

Patch

diff --git a/midx.c b/midx.c
index b2673f52e8..54e4e93b2b 100644
--- a/midx.c
+++ b/midx.c
@@ -449,6 +449,8 @@  struct pack_list {
 	uint32_t nr;
 	uint32_t alloc;
 	struct multi_pack_index *m;
+	struct progress *progress;
+	uint32_t pack_paths_checked;
 };
 
 static void add_pack_to_midx(const char *full_path, size_t full_path_len,
@@ -457,6 +459,7 @@  static void add_pack_to_midx(const char *full_path, size_t full_path_len,
 	struct pack_list *packs = (struct pack_list *)data;
 
 	if (ends_with(file_name, ".idx")) {
+		display_progress(packs->progress, ++packs->pack_paths_checked);
 		if (packs->m && midx_contains_pack(packs->m, file_name))
 			return;
 
@@ -786,7 +789,7 @@  static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_large_off
 }
 
 static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
-			       struct string_list *packs_to_drop)
+			       struct string_list *packs_to_drop, unsigned flags)
 {
 	unsigned char cur_chunk, num_chunks = 0;
 	char *midx_name;
@@ -800,6 +803,7 @@  static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1];
 	uint32_t nr_entries, num_large_offsets = 0;
 	struct pack_midx_entry *entries = NULL;
+	struct progress *progress = NULL;
 	int large_offsets_needed = 0;
 	int pack_name_concat_len = 0;
 	int dropped_packs = 0;
@@ -833,8 +837,15 @@  static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 			packs.nr++;
 		}
 	}
+	
+	packs.pack_paths_checked = 0;
+	if (flags & MIDX_PROGRESS)
+		packs.progress = start_progress(_("Adding packfiles to multi-pack-index"), 0);
+	else
+		packs.progress = NULL;
 
 	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &packs);
+	stop_progress(&packs.progress);
 
 	if (packs.m && packs.nr == packs.m->num_packs && !packs_to_drop)
 		goto cleanup;
@@ -959,6 +970,9 @@  static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		written += MIDX_CHUNKLOOKUP_WIDTH;
 	}
 
+	if (flags & MIDX_PROGRESS)
+		progress = start_progress(_("Writing chunks to multi-pack-index"),
+					  num_chunks);
 	for (i = 0; i < num_chunks; i++) {
 		if (written != chunk_offsets[i])
 			BUG("incorrect chunk offset (%"PRIu64" != %"PRIu64") for chunk id %"PRIx32,
@@ -991,7 +1005,10 @@  static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 				BUG("trying to write unknown chunk id %"PRIx32,
 				    chunk_ids[i]);
 		}
+
+		display_progress(progress, i + 1);
 	}
+	stop_progress(&progress);
 
 	if (written != chunk_offsets[num_chunks])
 		BUG("incorrect final offset %"PRIu64" != %"PRIu64,
@@ -1019,7 +1036,7 @@  static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 int write_midx_file(const char *object_dir, unsigned flags)
 {
-	return write_midx_internal(object_dir, NULL, NULL);
+	return write_midx_internal(object_dir, NULL, NULL, flags);
 }
 
 void clear_midx_file(struct repository *r)
@@ -1222,7 +1239,7 @@  int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla
 	free(count);
 
 	if (packs_to_drop.nr)
-		result = write_midx_internal(object_dir, m, &packs_to_drop);
+		result = write_midx_internal(object_dir, m, &packs_to_drop, flags);
 
 	string_list_clear(&packs_to_drop, 0);
 	return result;
@@ -1371,7 +1388,7 @@  int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 		goto cleanup;
 	}
 
-	result = write_midx_internal(object_dir, m, NULL);
+	result = write_midx_internal(object_dir, m, NULL, flags);
 	m = NULL;
 
 cleanup: