mbox series

[v6,0/9] packfile: avoid using the 'the_repository' global variable

Message ID cover.1730976185.git.karthik.188@gmail.com (mailing list archive)
Headers show
Series packfile: avoid using the 'the_repository' global variable | expand

Message

karthik nayak Nov. 7, 2024, 2:10 p.m. UTC
The `packfile.c` file uses the global variable 'the_repository' extensively
throughout the code. Let's remove all usecases of this, by modifying the
required functions to accept a 'struct repository' instead. This is to clean up
usage of global state.

The first 3 patches are mostly internal to `packfile.c`, we add the repository
field to the `packed_git` struct and this is used to clear up some useages of
the global variables.

The next 3 patches are more disruptive, they modify the function definition of
`odb_pack_name`, `has_object[_kept]_pack` and `for_each_packed_object` to receive
a repository, helping remove other usages of 'the_repository' variable.

Finally, the last two patches deal with global config values. These values are
localized.

For v5/6 onwards, I've rebased the series off the new master: 8f8d6eee53 (The
seventh batch, 2024-11-01), as a dependency for this series 'jk/dumb-http-finalize'
was merged to master. I've found no conflicts while merging with seen & next. But
since this series does touch multiple files, there could be future conflicts.

Changes in v6:
- Lazy load repository settings in packfile.c. This ensures that the settings are
available for sure and we do not rely on callees setting it.
- Use `size_t` for `delta_base_cache_limit`.

Changes in v5:
- Move packed_git* settings to repo_settings to ensure we don't keep reparsing the
settings in `use_pack`.

Changes in v4:
- Renamed the repository field within `packed_git` and `multi_pack_index` from
`r` to `repo`, while keeping function parameters to be `r`.
- Fixed bad braces. 

Changes in v3:
- Improved commit messages. In the first commit to talk about how packed_git
struct could also be part of the alternates of a repository. In the 7th commit
to talk about the motive behind removing the global variable.
- Changed 'packed_git->repo' to 'packed_git->r' to keep it consistent with the
rest of the code base.
- Replaced 'the_repository' with locally available access to the repository
struct in multiple regions.
- Removed unecessary inclusion of the 'repository.h' header file by forward
declaring the 'repository' struct.
- Replace memcpy with hashcpy.
- Change the logic in the 7th patch to use if else statements.
- Added an extra commit to cleanup `pack-bitmap.c`. 

Karthik Nayak (9):
  packfile: add repository to struct `packed_git`
  packfile: use `repository` from `packed_git` directly
  packfile: pass `repository` to static function in the file
  packfile: pass down repository to `odb_pack_name`
  packfile: pass down repository to `has_object[_kept]_pack`
  packfile: pass down repository to `for_each_packed_object`
  config: make `delta_base_cache_limit` a non-global variable
  config: make `packed_git_(limit|window_size)` non-global variables
  midx: add repository to `multi_pack_index` struct

 builtin/cat-file.c       |   7 +-
 builtin/count-objects.c  |   2 +-
 builtin/fast-import.c    |  15 ++--
 builtin/fsck.c           |  20 +++---
 builtin/gc.c             |   8 ++-
 builtin/index-pack.c     |  20 ++++--
 builtin/pack-objects.c   |  11 +--
 builtin/pack-redundant.c |   2 +-
 builtin/repack.c         |   2 +-
 builtin/rev-list.c       |   2 +-
 commit-graph.c           |   4 +-
 config.c                 |  22 ------
 connected.c              |   3 +-
 diff.c                   |   3 +-
 environment.c            |   3 -
 environment.h            |   1 -
 fsck.c                   |   2 +-
 http.c                   |   4 +-
 list-objects.c           |   7 +-
 midx-write.c             |   2 +-
 midx.c                   |   3 +-
 midx.h                   |   3 +
 object-store-ll.h        |   9 ++-
 pack-bitmap.c            |  90 +++++++++++++++---------
 pack-objects.h           |   3 +-
 pack-write.c             |   1 +
 pack.h                   |   1 +
 packfile.c               | 148 +++++++++++++++++++++++----------------
 packfile.h               |  18 +++--
 promisor-remote.c        |   2 +-
 prune-packed.c           |   2 +-
 reachable.c              |   4 +-
 repo-settings.c          |  14 ++++
 repo-settings.h          |   5 ++
 revision.c               |  13 ++--
 tag.c                    |   2 +-
 36 files changed, 268 insertions(+), 190 deletions(-)

Range-diff against v5:
 -:  ---------- >  1:  6c00e25c86 packfile: add repository to struct `packed_git`
 -:  ---------- >  2:  70fc8a79af packfile: use `repository` from `packed_git` directly
 -:  ---------- >  3:  167a1f3a11 packfile: pass `repository` to static function in the file
 -:  ---------- >  4:  b7cfe78217 packfile: pass down repository to `odb_pack_name`
 -:  ---------- >  5:  5566f5554c packfile: pass down repository to `has_object[_kept]_pack`
 -:  ---------- >  6:  1b26e45a9b packfile: pass down repository to `for_each_packed_object`
 1:  7654bf5e7e !  7:  89313cfed4 config: make `delta_base_cache_limit` a non-global variable
    @@ Commit message
         this value from the repository config, since the value is only used once
         in the entire subsystem.
     
    -    The type of the value is changed from `size_t` to an `unsigned long`
    -    since the default value is small enough to fit inside the latter on all
    -    platforms.
    -
         These changes are made to remove the usage of `delta_base_cache_limit`
         as a global variable in `packfile.c`. This brings us one step closer to
         removing the `USE_THE_REPOSITORY_VARIABLE` definition in `packfile.c`
    @@ builtin/gc.c: struct gc_config {
      	char *repack_filter_to;
      	unsigned long big_pack_threshold;
      	unsigned long max_delta_cache_size;
    -+	unsigned long delta_base_cache_limit;
    ++	size_t delta_base_cache_limit;
      };
      
      #define GC_CONFIG_INIT { \
    @@ builtin/gc.c: struct gc_config {
      
      static void gc_config_release(struct gc_config *cfg)
     @@ builtin/gc.c: static void gc_config(struct gc_config *cfg)
    + {
    + 	const char *value;
    + 	char *owned = NULL;
    ++	unsigned long longval;
      
    + 	if (!git_config_get_value("gc.packrefs", &value)) {
    + 		if (value && !strcmp(value, "notbare"))
    +@@ builtin/gc.c: static void gc_config(struct gc_config *cfg)
      	git_config_get_ulong("gc.bigpackthreshold", &cfg->big_pack_threshold);
      	git_config_get_ulong("pack.deltacachesize", &cfg->max_delta_cache_size);
    -+	git_config_get_ulong("core.deltabasecachelimit", &cfg->delta_base_cache_limit);
      
    ++	if(!git_config_get_ulong("core.deltabasecachelimit", &longval))
    ++		cfg->delta_base_cache_limit = longval;
    ++
      	if (!git_config_get_string("gc.repackfilter", &owned)) {
      		free(cfg->repack_filter);
    + 		cfg->repack_filter = owned;
     @@ builtin/gc.c: static uint64_t estimate_repack_memory(struct gc_config *cfg,
      	 * read_sha1_file() (either at delta calculation phase, or
      	 * writing phase) also fills up the delta base cache
 2:  2730aacd8e !  8:  3a8e3b88df config: make `packed_git_(limit|window_size)` non-global variables
    @@ packfile.c: unsigned char *use_pack(struct packed_git *p,
     -			size_t window_align = packed_git_window_size / 2;
     +			size_t window_align;
      			off_t len;
    -+			struct repo_settings *settings = &p->repo->settings;
    ++			struct repo_settings *settings;
    ++
    ++			/* lazy load the settings incase it hasn't been setup */
    ++			prepare_repo_settings(p->repo);
    ++			settings = &p->repo->settings;
     +
     +			window_align = settings->packed_git_window_size / 2;
      
 3:  8e33d40077 =  9:  2f9a146978 midx: add repository to `multi_pack_index` struct

Comments

Junio C Hamano Nov. 8, 2024, 7:49 a.m. UTC | #1
Karthik Nayak <karthik.188@gmail.com> writes:

> Changes in v6:
> - Lazy load repository settings in packfile.c. This ensures that the settings are
> available for sure and we do not rely on callees setting it.
> - Use `size_t` for `delta_base_cache_limit`.

I'll trust the reviews made while I was gone and will comment only
on the differences between the last iteration.

> diff --git c/builtin/gc.c w/builtin/gc.c
> index 9a10eb58bc..ad80c3aed2 100644
> --- c/builtin/gc.c
> +++ w/builtin/gc.c
> @@ -138,7 +138,7 @@ struct gc_config {
>  	char *repack_filter_to;
>  	unsigned long big_pack_threshold;
>  	unsigned long max_delta_cache_size;
> -	unsigned long delta_base_cache_limit;
> +	size_t delta_base_cache_limit;
>  };

Makes sense.

> @@ -170,6 +170,7 @@ static void gc_config(struct gc_config *cfg)
>  {
>  	const char *value;
>  	char *owned = NULL;
> +	unsigned long longval;
>  
>  	if (!git_config_get_value("gc.packrefs", &value)) {
>  		if (value && !strcmp(value, "notbare"))
> @@ -207,7 +208,9 @@ static void gc_config(struct gc_config *cfg)
>  
>  	git_config_get_ulong("gc.bigpackthreshold", &cfg->big_pack_threshold);
>  	git_config_get_ulong("pack.deltacachesize", &cfg->max_delta_cache_size);
> -	git_config_get_ulong("core.deltabasecachelimit", &cfg->delta_base_cache_limit);
> +
> +	if(!git_config_get_ulong("core.deltabasecachelimit", &longval))
> +		cfg->delta_base_cache_limit = longval;

And this is a sensible way to fill size_t member with the value read
into a ulong.  Should "longval" be named after "unsigned long" instead
of "long", by the way?

There is a required SP missing inside "if(!".

> diff --git c/packfile.c w/packfile.c
> index e1b04a2a6a..46f5369173 100644
> --- c/packfile.c
> +++ w/packfile.c
> @@ -653,7 +653,11 @@ unsigned char *use_pack(struct packed_git *p,
>  		if (!win) {
>  			size_t window_align;
>  			off_t len;
> -			struct repo_settings *settings = &p->repo->settings;
> +			struct repo_settings *settings;
> +
> +			/* lazy load the settings incase it hasn't been setup */
> +			prepare_repo_settings(p->repo);
> +			settings = &p->repo->settings;

This is a bit curious.  I'll read the individual patch that has this
change before commenting on it.
karthik nayak Nov. 8, 2024, 9:28 a.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> Changes in v6:
>> - Lazy load repository settings in packfile.c. This ensures that the settings are
>> available for sure and we do not rely on callees setting it.
>> - Use `size_t` for `delta_base_cache_limit`.
>
> I'll trust the reviews made while I was gone and will comment only
> on the differences between the last iteration.
>
>> diff --git c/builtin/gc.c w/builtin/gc.c
>> index 9a10eb58bc..ad80c3aed2 100644
>> --- c/builtin/gc.c
>> +++ w/builtin/gc.c
>> @@ -138,7 +138,7 @@ struct gc_config {
>>  	char *repack_filter_to;
>>  	unsigned long big_pack_threshold;
>>  	unsigned long max_delta_cache_size;
>> -	unsigned long delta_base_cache_limit;
>> +	size_t delta_base_cache_limit;
>>  };
>
> Makes sense.
>
>> @@ -170,6 +170,7 @@ static void gc_config(struct gc_config *cfg)
>>  {
>>  	const char *value;
>>  	char *owned = NULL;
>> +	unsigned long longval;
>>
>>  	if (!git_config_get_value("gc.packrefs", &value)) {
>>  		if (value && !strcmp(value, "notbare"))
>> @@ -207,7 +208,9 @@ static void gc_config(struct gc_config *cfg)
>>
>>  	git_config_get_ulong("gc.bigpackthreshold", &cfg->big_pack_threshold);
>>  	git_config_get_ulong("pack.deltacachesize", &cfg->max_delta_cache_size);
>> -	git_config_get_ulong("core.deltabasecachelimit", &cfg->delta_base_cache_limit);
>> +
>> +	if(!git_config_get_ulong("core.deltabasecachelimit", &longval))
>> +		cfg->delta_base_cache_limit = longval;
>
> And this is a sensible way to fill size_t member with the value read
> into a ulong.  Should "longval" be named after "unsigned long" instead
> of "long", by the way?
>
> There is a required SP missing inside "if(!".
>

Agreed, will fix both and send in a new version.

>> diff --git c/packfile.c w/packfile.c
>> index e1b04a2a6a..46f5369173 100644
>> --- c/packfile.c
>> +++ w/packfile.c
>> @@ -653,7 +653,11 @@ unsigned char *use_pack(struct packed_git *p,
>>  		if (!win) {
>>  			size_t window_align;
>>  			off_t len;
>> -			struct repo_settings *settings = &p->repo->settings;
>> +			struct repo_settings *settings;
>> +
>> +			/* lazy load the settings incase it hasn't been setup */
>> +			prepare_repo_settings(p->repo);
>> +			settings = &p->repo->settings;
>
> This is a bit curious.  I'll read the individual patch that has this
> change before commenting on it.