Message ID | cover-v2-0.6-00000000000-20230112T124842Z-avarab@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | cache API: always have a "istate->repo" | expand |
On 1/12/2023 7:55 AM, Ævar Arnfjörð Bjarmason wrote: > * Derrick suggested in > https://lore.kernel.org/git/6b92fad2-6b74-fddb-679c-8c8735e7103d@github.com/ > that things might be nicer if we had an explicit initializer, which > was also the subject of the previous discussion at > https://lore.kernel.org/git/xmqqmt6vqo2w.fsf@gitster.g/; but > concluded that it was probably best to leave it for now. > > I tried it out, and I think it's worth just doing that now, which > is why there's a new 5/6 here: We start by adding an > INDEX_STATE_INIT macro, and corresponding function. > > There's a bit of churn in 6/6 as all of those now will take a > "repo" argument, but I think the end result is worth it, because > even if "repo" remains the only thing that we need to initialize > we're now able to use ALLOC_ARRAY() instead of CALLOC_ARRAY(). > > We'll thus be helped by analysis tools (which would show access to > un-init'd memory) if we miss properly init-ing not just the "repo" > field, but anything in the structure, so our test coverage will be > better. > > It also makes the code easier to follow and change, as it's now > more obvious where we initialize this, and it'll be easier to > change it in the future if e.g. we add a new member that has > mandatory initialization (e.g. a "struct strbuf"). I wasn't expecting the initializer idea to work as well as it has. Now, Patch 5 does all the heavy lifting and Patch 6 is an easy read. Outside of one question about the istate->initialized member (which might not need any change) I'm happy with this version. Thanks, -Stolee
Derrick Stolee <derrickstolee@github.com> writes: > I wasn't expecting the initializer idea to work as well as it has. > Now, Patch 5 does all the heavy lifting and Patch 6 is an easy read. > > Outside of one question about the istate->initialized member (which > might not need any change) I'm happy with this version. Thanks, both. And especially for finding 913e0e99 (unpack_trees(): protect the handcrafted in-core index from read_cache(), 2008-08-23). Will queue.
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > The "struct index_state" contains a "repo" member, which should be a > pointer to the repository for that index, but which due to us > constructing such structs on an ad-hoc basis in various places wasn't > always available. I'd exclude 6/6 for now, as it seems to depend on some changes only in 'next'. Feel free to resend only that step with reduced scope so that it applies to 'master', and send in incremental updates when each of these topics that are only in 'next' graduates. Thanks.
On Fri, Jan 13 2023, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> The "struct index_state" contains a "repo" member, which should be a >> pointer to the repository for that index, but which due to us >> constructing such structs on an ad-hoc basis in various places wasn't >> always available. > > I'd exclude 6/6 for now, as it seems to depend on some changes only > in 'next'. Feel free to resend only that step with reduced scope so > that it applies to 'master', and send in incremental updates when > each of these topics that are only in 'next' graduates. Okey, the 6/6 requires ds/omit-trailing-hash-in-index. As both it and the 1-5/6 of this are in "next" now I think it's best that I just submit the 6/6 stand-alone after both of those have graduated.
On 16/01/2023 13:38, Ævar Arnfjörð Bjarmason wrote: > On Fri, Jan 13 2023, Junio C Hamano wrote: > >> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> >>> The "struct index_state" contains a "repo" member, which should be a >>> pointer to the repository for that index, but which due to us >>> constructing such structs on an ad-hoc basis in various places wasn't >>> always available. >> I'd exclude 6/6 for now, as it seems to depend on some changes only >> in 'next'. Feel free to resend only that step with reduced scope so >> that it applies to 'master', and send in incremental updates when >> each of these topics that are only in 'next' graduates. > Okey, the 6/6 requires ds/omit-trailing-hash-in-index. As both it and > the 1-5/6 of this are in "next" now I think it's best that I just submit > the 6/6 stand-alone after both of those have graduated. > micro-nit: The commit message of 5/6 starts "As we'll see in a subsequent commit..", which may need a slight tweak if 6/6 becomes 'far away' in the commit tree. -- Philip
Philip Oakley <philipoakley@iee.email> writes: > On 16/01/2023 13:38, Ævar Arnfjörð Bjarmason wrote: >> On Fri, Jan 13 2023, Junio C Hamano wrote: >> >>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >>> >>>> The "struct index_state" contains a "repo" member, which should be a >>>> pointer to the repository for that index, but which due to us >>>> constructing such structs on an ad-hoc basis in various places wasn't >>>> always available. >>> I'd exclude 6/6 for now, as it seems to depend on some changes only >>> in 'next'. Feel free to resend only that step with reduced scope so >>> that it applies to 'master', and send in incremental updates when >>> each of these topics that are only in 'next' graduates. >> Okey, the 6/6 requires ds/omit-trailing-hash-in-index. As both it and >> the 1-5/6 of this are in "next" now I think it's best that I just submit >> the 6/6 stand-alone after both of those have graduated. >> > micro-nit: The commit message of 5/6 starts "As we'll see in a > subsequent commit..", which may need a slight tweak if 6/6 becomes 'far > away' in the commit tree. Indeed. I'd use Hopefully in some not so distant future, we'll get advantages from always initializing the "repo" member ... for now.
The "struct index_state" contains a "repo" member, which should be a pointer to the repository for that index, but which due to us constructing such structs on an ad-hoc basis in various places wasn't always available. We'd thus end up with code like this, in the recent ds/omit-trailing-hash-in-index topic: struct repository *r = istate->repo ? istate->repo : the_repository; Really we should be able to trust the "istate->repo", but were carrying those sorts of conditionals because our index might come from a manually constructed source, so we'd have to fall back to "the_repository". This series changes the relvant code so the "repo" field is always non-NULL, as 6/6 here shows we had various workarounds in place for that, which can now go away. For v1 and more general summary, see: https://lore.kernel.org/git/cover-0.5-00000000000-20230110T060340Z-avarab@gmail.com/ See https://github.com/avar/git/tree/avar/do-not-lazy-populate-istate-repo-2 for passing CI and a fetchable branch for this topic. Change since v1: * Typo/rewording/correcting of commit messages * Kept a "TODO" comment which Derrick notes I shouldn't have removed * Derrick suggested in https://lore.kernel.org/git/6b92fad2-6b74-fddb-679c-8c8735e7103d@github.com/ that things might be nicer if we had an explicit initializer, which was also the subject of the previous discussion at https://lore.kernel.org/git/xmqqmt6vqo2w.fsf@gitster.g/; but concluded that it was probably best to leave it for now. I tried it out, and I think it's worth just doing that now, which is why there's a new 5/6 here: We start by adding an INDEX_STATE_INIT macro, and corresponding function. There's a bit of churn in 6/6 as all of those now will take a "repo" argument, but I think the end result is worth it, because even if "repo" remains the only thing that we need to initialize we're now able to use ALLOC_ARRAY() instead of CALLOC_ARRAY(). We'll thus be helped by analysis tools (which would show access to un-init'd memory) if we miss properly init-ing not just the "repo" field, but anything in the structure, so our test coverage will be better. It also makes the code easier to follow and change, as it's now more obvious where we initialize this, and it'll be easier to change it in the future if e.g. we add a new member that has mandatory initialization (e.g. a "struct strbuf"). Ævar Arnfjörð Bjarmason (6): builtin/difftool.c: { 0 }-initialize rather than using memset() sparse-index.c: expand_to_path() can assume non-NULL "istate" sparse-index API: BUG() out on NULL ensure_full_index() read-cache.c: refactor set_new_index_sparsity() for subsequent commit cache API: add a "INDEX_STATE_INIT" macro/function, add release_index() treewide: always have a valid "index_state.repo" member apply.c | 2 +- builtin/difftool.c | 4 +--- builtin/sparse-checkout.c | 1 + builtin/stash.c | 16 ++++++------- builtin/worktree.c | 2 +- cache.h | 16 +++++++++++++ fsmonitor-settings.c | 14 ------------ fsmonitor.c | 2 +- merge-recursive.c | 2 +- read-cache.c | 48 +++++++++++++++++++-------------------- repository.c | 8 +++++-- revision.c | 2 +- sparse-index.c | 15 ++++-------- split-index.c | 3 ++- unpack-trees.c | 5 ++-- 15 files changed, 69 insertions(+), 71 deletions(-) Range-diff against v1: 1: 214fe7d3fc2 = 1: 214fe7d3fc2 builtin/difftool.c: { 0 }-initialize rather than using memset() 2: 6db74fe958f = 2: 6db74fe958f sparse-index.c: expand_to_path() can assume non-NULL "istate" 3: d96388acef6 ! 3: 25e9cff0e97 sparse-index API: fix TODO, BUG() out on NULL ensure_full_index() @@ Metadata Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com> ## Commit message ## - sparse-index API: fix TODO, BUG() out on NULL ensure_full_index() + sparse-index API: BUG() out on NULL ensure_full_index() Make the ensure_full_index() function stricter, and have it only accept a non-NULL "struct index_state". This function (and this behavior) was added in [1]. The only reason it needed to be this lax was due to interaction with - repo_index_has_changes(). See the addition of that code in . This - fixes one of the TODO comments added in 0c18c059a15, the other one was - already removed in [3]. + repo_index_has_changes(). See the addition of that code in [2]. The other reason for why this was needed dates back to interaction - with code added in [4]. In [5] we started calling ensure_full_index() + with code added in [3]. In [4] we started calling ensure_full_index() in unpack_trees(), but the caller added in 34110cd4e39 wants to pass us a NULL "dst_index". Let's instead do the NULL check in unpack_trees() itself. 1. 4300f8442a2 (sparse-index: implement ensure_full_index(), 2021-03-30) 2. 0c18c059a15 (read-cache: ensure full index, 2021-04-01) - 3. d76723ee531 (status: use sparse-index throughout, 2021-07-14). - 4. 34110cd4e39 (Make 'unpack_trees()' have a separate source and + 3. 34110cd4e39 (Make 'unpack_trees()' have a separate source and destination index, 2008-03-06) - 5. 6863df35503 (unpack-trees: ensure full index, 2021-03-30) + 4. 6863df35503 (unpack-trees: ensure full index, 2021-03-30) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> - ## read-cache.c ## -@@ read-cache.c: int repo_index_has_changes(struct repository *repo, - } - diff_flush(&opt); - return opt.flags.has_changes != 0; -- } else { -- /* TODO: audit for interaction with sparse-index. */ -+ } else if (istate) { - ensure_full_index(istate); - for (i = 0; sb && i < istate->cache_nr; i++) { - if (i) -@@ read-cache.c: int repo_index_has_changes(struct repository *repo, - strbuf_addstr(sb, istate->cache[i]->name); - } - return !!istate->cache_nr; -+ } else { -+ return 0; - } - } - - ## sparse-index.c ## @@ sparse-index.c: void expand_index(struct index_state *istate, struct pattern_list *pl) * If the index is already full, then keep it full. We will convert 4: e514a881e38 = 4: a75977bd37b read-cache.c: refactor set_new_index_sparsity() for subsequent commit -: ----------- > 5: ae256efe94a cache API: add a "INDEX_STATE_INIT" macro/function, add release_index() 5: b3b0e592101 ! 6: 291233b0420 treewide: always have a valid "index_state.repo" member @@ Commit message second-guessing is the "istate->repo ? istate->repo : the_repository" code in [2]. We can now simply use "istate->repo". + We're doing this by making use of the INDEX_STATE_INIT() macro (and + corresponding function) added in the preceding commit, which now have + mandatory "repo" arguments. As seen here there are cases where we set + it to NULL, but only if we're about to fill in the correct non-NULL + "repo" right afterwards. + For "fsmonitor-settings.c" we can remove the initialization of a NULL "r" argument to "the_repository". This was added back in [3], and was needed at the time for callers that would pass us the "r" from an @@ apply.c: static int preimage_oid_in_gitlink_patch(struct patch *p, struct object static int build_fake_ancestor(struct apply_state *state, struct patch *list) { struct patch *patch; -- struct index_state result = { NULL }; -+ struct index_state result = { .repo = state->repo }; +- struct index_state result = INDEX_STATE_INIT; ++ struct index_state result = INDEX_STATE_INIT(state->repo); struct lock_file lock = LOCK_INIT; int res; @@ builtin/difftool.c: static int run_dir_diff(const char *extcmd, int symlinks, co struct hashmap symlinks2 = HASHMAP_INIT(pair_cmp, NULL); struct hashmap_iter iter; struct pair_entry *entry; -- struct index_state wtindex = { 0 }; -+ struct index_state wtindex = { .repo = the_repository }; +- struct index_state wtindex = INDEX_STATE_INIT; ++ struct index_state wtindex = INDEX_STATE_INIT(the_repository); struct checkout lstate, rstate; int err = 0; struct child_process cmd = CHILD_PROCESS_INIT; @@ builtin/sparse-checkout.c: static int update_working_directory(struct pattern_li o.head_idx = -1; o.src_index = r->index; o.dst_index = r->index; -+ o.result.repo = r; +- index_state_init(&o.result); ++ index_state_init(&o.result, r); o.skip_sparse_checkout = 0; o.pl = pl; @@ builtin/stash.c: static int save_untracked_files(struct stash_info *info, struct int ret = 0; struct strbuf untracked_msg = STRBUF_INIT; struct child_process cp_upd_index = CHILD_PROCESS_INIT; -- struct index_state istate = { NULL }; -+ struct index_state istate = { .repo = the_repository }; +- struct index_state istate = INDEX_STATE_INIT; ++ struct index_state istate = INDEX_STATE_INIT(the_repository); cp_upd_index.git_cmd = 1; strvec_pushl(&cp_upd_index.args, "update-index", "-z", "--add", @@ builtin/stash.c: static int stash_staged(struct stash_info *info, struct strbuf { int ret = 0; struct child_process cp_diff_tree = CHILD_PROCESS_INIT; -- struct index_state istate = { NULL }; -+ struct index_state istate = { .repo = the_repository }; +- struct index_state istate = INDEX_STATE_INIT; ++ struct index_state istate = INDEX_STATE_INIT(the_repository); if (write_index_as_tree(&info->w_tree, &istate, the_repository->index_file, 0, NULL)) { @@ builtin/stash.c: static int stash_patch(struct stash_info *info, const struct pa int ret = 0; struct child_process cp_read_tree = CHILD_PROCESS_INIT; struct child_process cp_diff_tree = CHILD_PROCESS_INIT; -- struct index_state istate = { NULL }; -+ struct index_state istate = { .repo = the_repository }; +- struct index_state istate = INDEX_STATE_INIT; ++ struct index_state istate = INDEX_STATE_INIT(the_repository); char *old_index_env = NULL, *old_repo_index_file; remove_path(stash_index_path.buf); @@ builtin/stash.c: static int stash_working_tree(struct stash_info *info, const st struct rev_info rev; struct child_process cp_upd_index = CHILD_PROCESS_INIT; struct strbuf diff_output = STRBUF_INIT; -- struct index_state istate = { NULL }; -+ struct index_state istate = { .repo = the_repository }; +- struct index_state istate = INDEX_STATE_INIT; ++ struct index_state istate = INDEX_STATE_INIT(the_repository); init_revisions(&rev, NULL); copy_pathspec(&rev.prune_data, ps); @@ builtin/worktree.c: static int unlock_worktree(int ac, const char **av, const ch static void validate_no_submodules(const struct worktree *wt) { -- struct index_state istate = { NULL }; -+ struct index_state istate = { .repo = the_repository }; +- struct index_state istate = INDEX_STATE_INIT; ++ struct index_state istate = INDEX_STATE_INIT(the_repository); struct strbuf path = STRBUF_INIT; int i, found_submodules = 0; + ## cache.h ## +@@ cache.h: struct index_state { + * If the variable won't be used again, use release_index() to free() + * its resources. If it needs to be used again use discard_index(), + * which does the same thing, but will use use index_state_init() at +- * the end. ++ * the end. The discard_index() will use its own "istate->repo" as the ++ * "r" argument to index_state_init() in that case. + */ +-#define INDEX_STATE_INIT { 0 } +-void index_state_init(struct index_state *istate); ++#define INDEX_STATE_INIT(r) { \ ++ .repo = (r), \ ++} ++void index_state_init(struct index_state *istate, struct repository *r); + void release_index(struct index_state *istate); + + /* Name hashing */ + ## fsmonitor-settings.c ## @@ fsmonitor-settings.c: static void lookup_fsmonitor_settings(struct repository *r) @@ merge-recursive.c: static int unpack_trees_start(struct merge_options *opt, { int rc; struct tree_desc t[3]; -- struct index_state tmp_index = { NULL }; -+ struct index_state tmp_index = { .repo = opt->repo }; +- struct index_state tmp_index = INDEX_STATE_INIT; ++ struct index_state tmp_index = INDEX_STATE_INIT(opt->repo); memset(&opt->priv->unpack_opts, 0, sizeof(opt->priv->unpack_opts)); if (opt->priv->call_depth) @@ read-cache.c: int do_read_index(struct index_state *istate, const char *path, in * If the command explicitly requires a full index, force it * to be full. Otherwise, correct the sparsity based on repository @@ read-cache.c: int read_index_from(struct index_state *istate, const char *path, - discard_index(split_index->base); + release_index(split_index->base); else - CALLOC_ARRAY(split_index->base, 1); -+ split_index->base->repo = istate->repo; + ALLOC_ARRAY(split_index->base, 1); +- index_state_init(split_index->base); ++ index_state_init(split_index->base, istate->repo); base_oid_hex = oid_to_hex(&split_index->base_oid); base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_oid_hex); +@@ read-cache.c: int is_index_unborn(struct index_state *istate) + return (!istate->cache_nr && !istate->timestamp.sec); + } + +-void index_state_init(struct index_state *istate) ++void index_state_init(struct index_state *istate, struct repository *r) + { +- struct index_state blank = INDEX_STATE_INIT; ++ struct index_state blank = INDEX_STATE_INIT(r); + memcpy(istate, &blank, sizeof(*istate)); + } + +@@ read-cache.c: void release_index(struct index_state *istate) + void discard_index(struct index_state *istate) + { + release_index(istate); +- index_state_init(istate); ++ index_state_init(istate, istate->repo); + } + + /* @@ read-cache.c: static int do_write_index(struct index_state *istate, struct tempfile *tempfile, int ieot_entries = 1; struct index_entry_offset_table *ieot = NULL; @@ repository.c: void initialize_the_repository(void) the_repo.remote_state = remote_state_new(); the_repo.parsed_objects = parsed_object_pool_new(); -+ the_index.repo = the_repository; ++ index_state_init(&the_index, the_repository); + repo_set_hash_algo(&the_repo, GIT_HASH_SHA1); } +@@ repository.c: int repo_read_index(struct repository *repo) + + if (!repo->index) { + ALLOC_ARRAY(repo->index, 1); +- index_state_init(repo->index); ++ index_state_init(repo->index, NULL /* set below */); + } + + /* Complete the double-reference */ ## revision.c ## @@ revision.c: void add_index_objects_to_pending(struct rev_info *revs, unsigned int flags) worktrees = get_worktrees(); for (p = worktrees; *p; p++) { struct worktree *wt = *p; -- struct index_state istate = { NULL }; -+ struct index_state istate = { .repo = revs->repo }; +- struct index_state istate = INDEX_STATE_INIT; ++ struct index_state istate = INDEX_STATE_INIT(revs->repo); if (wt->is_current) continue; /* current index already taken care of */ @@ split-index.c @@ split-index.c: void move_cache_to_base_index(struct index_state *istate) } - CALLOC_ARRAY(si->base, 1); -+ si->base->repo = istate->repo; + ALLOC_ARRAY(si->base, 1); +- index_state_init(si->base); ++ index_state_init(si->base, istate->repo); si->base->version = istate->version; /* zero timestamp disables racy test in ce_write_index() */ si->base->timestamp = istate->timestamp; ## unpack-trees.c ## @@ unpack-trees.c: int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options + populate_from_existing_patterns(o, &pl); } - memset(&o->result, 0, sizeof(o->result)); -+ o->result.repo = o->src_index->repo; +- index_state_init(&o->result); ++ index_state_init(&o->result, o->src_index->repo); o->result.initialized = 1; o->result.timestamp.sec = o->src_index->timestamp.sec; o->result.timestamp.nsec = o->src_index->timestamp.nsec;