Message ID | f1897b880729b649ab24da14cbc3432d44b7c731.1679500859.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | Fix a few split-index bugs |
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Johannes Schindelin <johannes.schindelin@gmx.de> > > When a split-index is in effect, the `$GIT_DIR/index` file needs to > contain a "link" extension that contains all the information about the > split-index, including the information about the shared index. > ... > Let's stop zeroing out the `base_oid` to indicate that the "link" > extension should not be written. Nicely explained. > One might be tempted to simply call `discard_split_index()` instead, > under the assumption that Git decided to write a non-split index and > therefore the the `split_index` structure might no longer be wanted. "the the". > +enum strip_extensions { > + WRITE_ALL_EXTENSIONS = 0, > + STRIP_ALL_EXTENSIONS = 1, > + STRIP_LINK_EXTENSION_ONLY = 2 > +}; We do not need to spell out the specific values for this enum; the users' (i.e. the callers of do_write_index()) sole requirement is for these symbols to have different values. Also do we envision that (1) we would need to keep STRIP_LINK_ONLY to be with the largest value among the enum values, or (2) we would never add new value to the set? Otherwise let's end the last one with a trailing comma. Looking at the way strip_extensions variable is used in do_write_index(), an alternative design might be to make it a set of bits (e.g. unsigned write_extension) and give one bit to each extension. But such a clean-up is better left outside the topic, I would imagine, as we do not have any need to skip an arbitrary set of extensions right now. > +/* > + * Write the Git index into a `.lock` file > + * > + * If `strip_link_extension` is non-zero, avoid writing any "link" extension > + * (used by the split-index feature). > + */ Not exposing "enum strip_extensions" to the caller of this function, like this patch does, is probably a very safe and sensible thing to do. We do not have a reason to allow its callers to (perhaps mistakenly) pass STRIP_ALL_EXTENSIONS to it. 
> static int do_write_locked_index(struct index_state *istate, struct lock_file *lock, > - unsigned flags) > + unsigned flags, int strip_link_extension) > { > int ret; > int was_full = istate->sparse_index == INDEX_EXPANDED; > @@ -3185,7 +3197,7 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l > */ > trace2_region_enter_printf("index", "do_write_index", the_repository, > "%s", get_lock_file_path(lock)); > - ret = do_write_index(istate, lock->tempfile, 0, flags); > + ret = do_write_index(istate, lock->tempfile, strip_link_extension ? STRIP_LINK_EXTENSION_ONLY : 0, flags); > trace2_region_leave_printf("index", "do_write_index", the_repository, > "%s", get_lock_file_path(lock)); > OK. Very nicely done.
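
To make the two enum remarks in this reply concrete, here is a minimal sketch (not taken from the patch) of the declaration without explicit values and with a trailing comma after the last member. The helpers `writes_link_extension()` and `writes_other_extensions()` are hypothetical names; they only mirror the comparisons that do_write_index() performs, and they rely on nothing but the three symbols being distinct.

```c
#include <assert.h>

/*
 * Same three symbols as in the patch, but the compiler picks the values
 * and the last member ends with a trailing comma (valid since C99).
 */
enum strip_extensions {
	WRITE_ALL_EXTENSIONS,
	STRIP_ALL_EXTENSIONS,
	STRIP_LINK_EXTENSION_ONLY,
};

/* Hypothetical helper: mirrors the test that guards the "link" extension. */
static int writes_link_extension(enum strip_extensions mode)
{
	return mode == WRITE_ALL_EXTENSIONS;
}

/* Hypothetical helper: mirrors the test that guards every other extension. */
static int writes_other_extensions(enum strip_extensions mode)
{
	return mode != STRIP_ALL_EXTENSIONS;
}

/* Usage: both helpers depend only on the symbols comparing unequal. */
void check_strip_modes(void)
{
	assert(writes_link_extension(WRITE_ALL_EXTENSIONS));
	assert(!writes_link_extension(STRIP_LINK_EXTENSION_ONLY));
	assert(writes_other_extensions(STRIP_LINK_EXTENSION_ONLY));
	assert(!writes_other_extensions(STRIP_ALL_EXTENSIONS));
}
```

With the trailing comma in place, adding a fourth mode later would touch only the new line in a diff.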
On 3/22/23 5:24 PM, Junio C Hamano wrote:
> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
>> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>>
>> When a split-index is in effect, the `$GIT_DIR/index` file needs to
>> contain a "link" extension that contains all the information about the
>> split-index, including the information about the shared index.
>> ...
>> Let's stop zeroing out the `base_oid` to indicate that the "link"
>> extension should not be written.
>
> Nicely explained.
>
>> One might be tempted to simply call `discard_split_index()` instead,
>> under the assumption that Git decided to write a non-split index and
>> therefore the the `split_index` structure might no longer be wanted.
>
> "the the".
>
>> +enum strip_extensions {
>> +	WRITE_ALL_EXTENSIONS = 0,
>> +	STRIP_ALL_EXTENSIONS = 1,
>> +	STRIP_LINK_EXTENSION_ONLY = 2
>> +};
>
> We do not need to spell out the specific values for this enum; the
> users' (i.e. the callers of do_write_index()) sole requirement is
> for these symbols to have different values.

There are several calls to do_write_locked_index() that pass 0 or 1
as the new final arg.  If we update them to use these enum values,
then we don't need integer values here.

>
> Also do we envision that (1) we would need to keep STRIP_LINK_ONLY
> to be with the largest value among the enum values, or (2) we would
> never add new value to the set?  Otherwise let's end the last one
> with a trailing comma.
>
> Looking at the way strip_extensions variable is used in
> do_write_index(), an alternative design might be to make it a set of
> bits (e.g. unsigned write_extension) and give one bit to each
> extension.  But such a clean-up is better left outside the topic, I
> would imagine, as we do not have any need to skip an arbitrary set
> of extensions right now.

Agreed, I thought about suggesting a set of bits too, but right
now we only need to strip all of them or just this one.

>
>> +/*
>> + * Write the Git index into a `.lock` file
>> + *
>> + * If `strip_link_extension` is non-zero, avoid writing any "link" extension
>> + * (used by the split-index feature).
>> + */
>
> Not exposing "enum strip_extensions" to the caller of this function,
> like this patch does, is probably a very safe and sensible thing to
> do.  We do not have a reason to allow its callers to (perhaps
> mistakenly) pass STRIP_ALL_EXTENSIONS to it.
>
>>  static int do_write_locked_index(struct index_state *istate, struct lock_file *lock,
>> -				 unsigned flags)
>> +				 unsigned flags, int strip_link_extension)
>>  {
>>  	int ret;
>>  	int was_full = istate->sparse_index == INDEX_EXPANDED;
>> @@ -3185,7 +3197,7 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
>>  	 */
>>  	trace2_region_enter_printf("index", "do_write_index", the_repository,
>>  				   "%s", get_lock_file_path(lock));
>> -	ret = do_write_index(istate, lock->tempfile, 0, flags);
>> +	ret = do_write_index(istate, lock->tempfile, strip_link_extension ? STRIP_LINK_EXTENSION_ONLY : 0, flags);

In the else branch of the ?: operator, could we use
WRITE_ALL_EXTENSIONS instead of 0?

>>  	trace2_region_leave_printf("index", "do_write_index", the_repository,
>>  				   "%s", get_lock_file_path(lock));
>>
>
> OK.
>
> Very nicely done.
On 3/22/23 12:00 PM, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> When a split-index is in effect, the `$GIT_DIR/index` file needs to
> contain a "link" extension that contains all the information about the
> split-index, including the information about the shared index.
>
> However, in some cases Git needs to suppress writing that "link"
> extension (i.e. to fall back to writing a full index) even if the
> in-memory index structure _has_ a `split_index` configured. This is the
> case e.g. when "too many not shared" index entries exist.
>
> In such instances, the current code sets the `base_oid` field of said
> `split_index` structure to all-zero to indicate that `do_write_index()`
> should skip writing the "link" extension.
>
> This can lead to problems later on, when the in-memory index is still
> used to perform other operations and eventually wants to write a
> split-index, detects the presence of the `split_index` and reuses that,
> too (under the assumption that it has been initialized correctly and
> still has a non-null `base_oid`).
>
> Let's stop zeroing out the `base_oid` to indicate that the "link"
> extension should not be written.
>
> One might be tempted to simply call `discard_split_index()` instead,
> under the assumption that Git decided to write a non-split index and
> therefore the the `split_index` structure might no longer be wanted.
> However, that is not possible because that would release index entries
> in `split_index->base` that are likely to still be in use. Therefore we
> cannot do that.
>
> The next best thing we _can_ do is to introduce a flag, specifically
> indicating when the "link" extension should be skipped. So that's what
> we do here.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  read-cache.c                 | 37 ++++++++++++++++++++++--------------
>  t/t7527-builtin-fsmonitor.sh |  2 +-
>  2 files changed, 24 insertions(+), 15 deletions(-)
>
> diff --git a/read-cache.c b/read-cache.c
> index b09128b1884..8fcb2d54c05 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -2868,6 +2868,12 @@ static int record_ieot(void)
>  	return !git_config_get_index_threads(&val) && val != 1;
>  }
>
> +enum strip_extensions {
> +	WRITE_ALL_EXTENSIONS = 0,
> +	STRIP_ALL_EXTENSIONS = 1,
> +	STRIP_LINK_EXTENSION_ONLY = 2
> +};

Earlier (in a response to Junio's response on this commit) I said
that I didn't think we needed to make a bit set here, but I want to
re-think that or at least walk thru the change and talk out loud.
I'll explain in-line below.

> +
>  /*
>   * On success, `tempfile` is closed. If it is the temporary file
>   * of a `struct lock_file`, we will therefore effectively perform
> @@ -2876,7 +2882,7 @@ static int record_ieot(void)
>   * rely on it.
>   */
>  static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
> -			  int strip_extensions, unsigned flags)
> +			  enum strip_extensions strip_extensions, unsigned flags)
>  {
>  	uint64_t start = getnanotime();
>  	struct hashfile *f;
> @@ -3045,7 +3051,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
>  		return -1;
>  	}
>
> -	if (!strip_extensions && istate->split_index &&
> +	if (strip_extensions == WRITE_ALL_EXTENSIONS && istate->split_index &&
>  	    !is_null_oid(&istate->split_index->base_oid)) {

(I hate all of this double negative logic...)

Here we only want the extension if we have WRITE_ALL, so
that is NOT STRIP_ALL and NOT STRIP_LINK_ONLY, so that is OK.

>  		struct strbuf sb = STRBUF_INIT;
>
> @@ -3060,7 +3066,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
>  		if (err)
>  			return -1;
>  	}
> -	if (!strip_extensions && !drop_cache_tree && istate->cache_tree) {
> +	if (strip_extensions != STRIP_ALL_EXTENSIONS && !drop_cache_tree && istate->cache_tree) {

Here we only want the extension when NOT STRIP_ALL, so
this is either WRITE_ALL or STRIP_LINK_ONLY, so this is OK.

The rest are the same, so I'll omit them.

[...]

All of this looks correct, but I stumbled over things on my
first or second reading.  I wonder if it would simplify
things to define this as:

	enum strip_extensions {
		WRITE_ALL_EXTENSIONS = 0,
		STRIP_LINK_EXTENSION = (1<<0),
		STRIP_OTHER_EXTENSIONS = (1<<1),
		STRIP_ALL_EXTENSIONS = (STRIP_LINK_EXTENSION | STRIP_OTHER_EXTENSIONS),
	};

Then the link test becomes:

	if ( ! (strip_extensions & STRIP_LINK_EXTENSION) && istate->split_index && ...) {

and the others become:

	if ( ! (strip_extensions & STRIP_OTHER_EXTENSIONS) && ...) {

If we need to add the ability later to strip an individual
extension, we can easily add a bit to the enum and update the
_ALL_ mask and the corresponding `if` test.

In a later commit (probably in another series), we can invert
these double negatives to improve readability.

> +/*
> + * Write the Git index into a `.lock` file
> + *
> + * If `strip_link_extension` is non-zero, avoid writing any "link" extension
> + * (used by the split-index feature).
> + */
>  static int do_write_locked_index(struct index_state *istate, struct lock_file *lock,
> -				 unsigned flags)
> +				 unsigned flags, int strip_link_extension)
>  {
>  	int ret;
>  	int was_full = istate->sparse_index == INDEX_EXPANDED;
> @@ -3185,7 +3197,7 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
>  	 */
>  	trace2_region_enter_printf("index", "do_write_index", the_repository,
>  				   "%s", get_lock_file_path(lock));
> -	ret = do_write_index(istate, lock->tempfile, 0, flags);
> +	ret = do_write_index(istate, lock->tempfile, strip_link_extension ? STRIP_LINK_EXTENSION_ONLY : 0, flags);
>  	trace2_region_leave_printf("index", "do_write_index", the_repository,
>  				   "%s", get_lock_file_path(lock));
>
> @@ -3214,7 +3226,7 @@ static int write_split_index(struct index_state *istate,
>  {
>  	int ret;
>  	prepare_to_write_split_index(istate);
> -	ret = do_write_locked_index(istate, lock, flags);
> +	ret = do_write_locked_index(istate, lock, flags, 0);

Could we use the enum values here instead of 0?

>  	finish_writing_split_index(istate);
>  	return ret;
>  }
> @@ -3366,9 +3378,7 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
>  	if ((!si && !test_split_index_env) ||
>  	    alternate_index_output ||
>  	    (istate->cache_changed & ~EXTMASK)) {
> -		if (si)
> -			oidclr(&si->base_oid);
> -		ret = do_write_locked_index(istate, lock, flags);
> +		ret = do_write_locked_index(istate, lock, flags, 1);

and here

>  		goto out;
>  	}
>
> @@ -3394,8 +3404,7 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
>  	/* Same initial permissions as the main .git/index file */
>  	temp = mks_tempfile_sm(git_path("sharedindex_XXXXXX"), 0, 0666);
>  	if (!temp) {
> -		oidclr(&si->base_oid);
> -		ret = do_write_locked_index(istate, lock, flags);
> +		ret = do_write_locked_index(istate, lock, flags, 1);

and here

>  		goto out;
>  	}
>  	ret = write_shared_index(istate, &temp, flags);
> diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
> index cbafdd69602..9fab9a2ab38 100755
> --- a/t/t7527-builtin-fsmonitor.sh
> +++ b/t/t7527-builtin-fsmonitor.sh
> @@ -1003,7 +1003,7 @@ test_expect_success !UNICODE_COMPOSITION_SENSITIVE 'Unicode nfc/nfd' '
>  	egrep "^event: nfd/d_${utf8_nfc}/?$" ./unicode.trace
>  '
>
> -test_expect_failure 'split-index and FSMonitor work well together' '
> +test_expect_success 'split-index and FSMonitor work well together' '
>  	git init split-index &&
>  	test_when_finished "git -C \"$PWD/split-index\" \
>  		fsmonitor--daemon stop" &&

Thanks
Jeff
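
As a rough illustration of the bit-flag layout proposed in this mail, the sketch below spells out the enum and the two tests as small predicates. It is only a sketch of the idea, not the code in the patch; `should_write_link()` and `should_write_others()` are hypothetical names standing in for the `if` conditions inside do_write_index().

```c
#include <assert.h>

/* Bit-flag layout as proposed in the mail above; not what the patch uses. */
enum strip_extensions {
	WRITE_ALL_EXTENSIONS = 0,
	STRIP_LINK_EXTENSION = (1 << 0),
	STRIP_OTHER_EXTENSIONS = (1 << 1),
	STRIP_ALL_EXTENSIONS = (STRIP_LINK_EXTENSION | STRIP_OTHER_EXTENSIONS),
};

/* Hypothetical helper standing in for the "link" test in do_write_index(). */
static int should_write_link(unsigned strip)
{
	return !(strip & STRIP_LINK_EXTENSION);
}

/* Hypothetical helper standing in for the tests on the other extensions. */
static int should_write_others(unsigned strip)
{
	return !(strip & STRIP_OTHER_EXTENSIONS);
}

/* Usage: the combinations match the three-valued enum, plus one new case. */
void check_strip_bits(void)
{
	assert(should_write_link(WRITE_ALL_EXTENSIONS));
	assert(should_write_others(WRITE_ALL_EXTENSIONS));

	assert(!should_write_link(STRIP_LINK_EXTENSION));
	assert(should_write_others(STRIP_LINK_EXTENSION));

	assert(!should_write_link(STRIP_ALL_EXTENSIONS));
	assert(!should_write_others(STRIP_ALL_EXTENSIONS));

	/* Only expressible with bits: keep "link", strip everything else. */
	assert(should_write_link(STRIP_OTHER_EXTENSIONS));
	assert(!should_write_others(STRIP_OTHER_EXTENSIONS));
}
```

Each test reads a single bit, so the reader no longer has to work out which of the three enum values a `==` or `!=` comparison admits.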
Jeff Hostetler <git@jeffhostetler.com> writes:

>>> +enum strip_extensions {
>>> +	WRITE_ALL_EXTENSIONS = 0,
>>> +	STRIP_ALL_EXTENSIONS = 1,
>>> +	STRIP_LINK_EXTENSION_ONLY = 2
>>> +};
>> We do not need to spell out the specific values for this enum; the
>> users' (i.e. the callers of do_write_index()) sole requirement is
>> for these symbols to have different values.
>
> There are several calls to do_write_locked_index() that pass 0 or 1
> as the new final arg.  If we update them to use these enum values,
> then we don't need integer values here.

Good eyes.  Yes, the new caller that selectively passes
STRIP_LINK_EXTENSION_ONLY should pass WRITE_ALL_EXTENSIONS, not 0,
on the other side of ?: as you pointed out.

Thanks.
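
A minimal sketch of the point agreed on here: once the ?: expression names WRITE_ALL_EXTENSIONS instead of 0, nothing depends on the enum's numeric values. The patch performs this translation inline in do_write_locked_index(); the helper `mode_for_locked_write()` below is a hypothetical name used only so the snippet stands on its own.

```c
#include <assert.h>

enum strip_extensions {
	WRITE_ALL_EXTENSIONS,
	STRIP_ALL_EXTENSIONS,
	STRIP_LINK_EXTENSION_ONLY,
};

/*
 * Hypothetical helper: translate the int flag accepted by
 * do_write_locked_index() into the enum expected by do_write_index().
 * Callers only hand in an int, so they cannot request
 * STRIP_ALL_EXTENSIONS by mistake.
 */
static enum strip_extensions mode_for_locked_write(int strip_link_extension)
{
	return strip_link_extension ? STRIP_LINK_EXTENSION_ONLY
				    : WRITE_ALL_EXTENSIONS;
}

/* Usage: zero keeps every extension, any non-zero value strips "link". */
void check_mode_for_locked_write(void)
{
	assert(mode_for_locked_write(0) == WRITE_ALL_EXTENSIONS);
	assert(mode_for_locked_write(1) == STRIP_LINK_EXTENSION_ONLY);
}
```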
diff --git a/read-cache.c b/read-cache.c
index b09128b1884..8fcb2d54c05 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2868,6 +2868,12 @@ static int record_ieot(void)
 	return !git_config_get_index_threads(&val) && val != 1;
 }
 
+enum strip_extensions {
+	WRITE_ALL_EXTENSIONS = 0,
+	STRIP_ALL_EXTENSIONS = 1,
+	STRIP_LINK_EXTENSION_ONLY = 2
+};
+
 /*
  * On success, `tempfile` is closed. If it is the temporary file
  * of a `struct lock_file`, we will therefore effectively perform
@@ -2876,7 +2882,7 @@ static int record_ieot(void)
  * rely on it.
  */
 static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
-			  int strip_extensions, unsigned flags)
+			  enum strip_extensions strip_extensions, unsigned flags)
 {
 	uint64_t start = getnanotime();
 	struct hashfile *f;
@@ -3045,7 +3051,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		return -1;
 	}
 
-	if (!strip_extensions && istate->split_index &&
+	if (strip_extensions == WRITE_ALL_EXTENSIONS && istate->split_index &&
 	    !is_null_oid(&istate->split_index->base_oid)) {
 		struct strbuf sb = STRBUF_INIT;
 
@@ -3060,7 +3066,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		if (err)
 			return -1;
 	}
-	if (!strip_extensions && !drop_cache_tree && istate->cache_tree) {
+	if (strip_extensions != STRIP_ALL_EXTENSIONS && !drop_cache_tree && istate->cache_tree) {
 		struct strbuf sb = STRBUF_INIT;
 
 		cache_tree_write(&sb, istate->cache_tree);
@@ -3070,7 +3076,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		if (err)
 			return -1;
 	}
-	if (!strip_extensions && istate->resolve_undo) {
+	if (strip_extensions != STRIP_ALL_EXTENSIONS && istate->resolve_undo) {
 		struct strbuf sb = STRBUF_INIT;
 
 		resolve_undo_write(&sb, istate->resolve_undo);
@@ -3081,7 +3087,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		if (err)
 			return -1;
 	}
-	if (!strip_extensions && istate->untracked) {
+	if (strip_extensions != STRIP_ALL_EXTENSIONS && istate->untracked) {
 		struct strbuf sb = STRBUF_INIT;
 
 		write_untracked_extension(&sb, istate->untracked);
@@ -3092,7 +3098,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		if (err)
 			return -1;
 	}
-	if (!strip_extensions && istate->fsmonitor_last_update) {
+	if (strip_extensions != STRIP_ALL_EXTENSIONS && istate->fsmonitor_last_update) {
 		struct strbuf sb = STRBUF_INIT;
 
 		write_fsmonitor_extension(&sb, istate);
@@ -3166,8 +3172,14 @@ static int commit_locked_index(struct lock_file *lk)
 		return commit_lock_file(lk);
 }
 
+/*
+ * Write the Git index into a `.lock` file
+ *
+ * If `strip_link_extension` is non-zero, avoid writing any "link" extension
+ * (used by the split-index feature).
+ */
 static int do_write_locked_index(struct index_state *istate, struct lock_file *lock,
-				 unsigned flags)
+				 unsigned flags, int strip_link_extension)
 {
 	int ret;
 	int was_full = istate->sparse_index == INDEX_EXPANDED;
@@ -3185,7 +3197,7 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
 	 */
 	trace2_region_enter_printf("index", "do_write_index", the_repository,
 				   "%s", get_lock_file_path(lock));
-	ret = do_write_index(istate, lock->tempfile, 0, flags);
+	ret = do_write_index(istate, lock->tempfile, strip_link_extension ? STRIP_LINK_EXTENSION_ONLY : 0, flags);
 	trace2_region_leave_printf("index", "do_write_index", the_repository,
 				   "%s", get_lock_file_path(lock));
 
@@ -3214,7 +3226,7 @@ static int write_split_index(struct index_state *istate,
 {
 	int ret;
 	prepare_to_write_split_index(istate);
-	ret = do_write_locked_index(istate, lock, flags);
+	ret = do_write_locked_index(istate, lock, flags, 0);
 	finish_writing_split_index(istate);
 	return ret;
 }
@@ -3366,9 +3378,7 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 	if ((!si && !test_split_index_env) ||
 	    alternate_index_output ||
 	    (istate->cache_changed & ~EXTMASK)) {
-		if (si)
-			oidclr(&si->base_oid);
-		ret = do_write_locked_index(istate, lock, flags);
+		ret = do_write_locked_index(istate, lock, flags, 1);
 		goto out;
 	}
 
@@ -3394,8 +3404,7 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 	/* Same initial permissions as the main .git/index file */
 	temp = mks_tempfile_sm(git_path("sharedindex_XXXXXX"), 0, 0666);
 	if (!temp) {
-		oidclr(&si->base_oid);
-		ret = do_write_locked_index(istate, lock, flags);
+		ret = do_write_locked_index(istate, lock, flags, 1);
 		goto out;
 	}
 	ret = write_shared_index(istate, &temp, flags);
diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
index cbafdd69602..9fab9a2ab38 100755
--- a/t/t7527-builtin-fsmonitor.sh
+++ b/t/t7527-builtin-fsmonitor.sh
@@ -1003,7 +1003,7 @@ test_expect_success !UNICODE_COMPOSITION_SENSITIVE 'Unicode nfc/nfd' '
 	egrep "^event: nfd/d_${utf8_nfc}/?$" ./unicode.trace
 '
 
-test_expect_failure 'split-index and FSMonitor work well together' '
+test_expect_success 'split-index and FSMonitor work well together' '
 	git init split-index &&
 	test_when_finished "git -C \"$PWD/split-index\" \
 		fsmonitor--daemon stop" &&
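
The `oidclr()` calls removed by this patch used `base_oid` as a sentinel for "do not write the link extension". The toy model below is an assumption-laden sketch of why that is fragile: it uses a made-up `struct toy_split_index` with a 20-byte buffer rather than Git's real `struct split_index` and `struct object_id`, and the `toy_*` functions are hypothetical stand-ins for the write path. It only shows that the sentinel approach destroys in-memory state, while an explicit flag leaves it intact.

```c
#include <assert.h>
#include <string.h>

#define TOY_OID_LEN 20 /* toy stand-in; not Git's struct object_id */

struct toy_split_index {
	unsigned char base_oid[TOY_OID_LEN];
};

static int toy_is_null_oid(const unsigned char *oid)
{
	static const unsigned char zero[TOY_OID_LEN];
	return !memcmp(oid, zero, TOY_OID_LEN);
}

/*
 * Old convention: clobber base_oid to signal "skip the link extension".
 * Returns non-zero if the link extension would be written.
 */
static int toy_write_old(struct toy_split_index *si, int want_full_index)
{
	if (want_full_index)
		memset(si->base_oid, 0, TOY_OID_LEN);
	return !toy_is_null_oid(si->base_oid);
}

/* New convention: an explicit flag; the structure is left untouched. */
static int toy_write_new(const struct toy_split_index *si,
			 int strip_link_extension)
{
	return !strip_link_extension && !toy_is_null_oid(si->base_oid);
}

/* Usage: only the old convention loses the base_oid after a full write. */
void check_toy_writes(void)
{
	struct toy_split_index a, b;

	memset(a.base_oid, 0xab, TOY_OID_LEN);
	memset(b.base_oid, 0xab, TOY_OID_LEN);

	assert(!toy_write_old(&a, 1));
	assert(toy_is_null_oid(a.base_oid));	/* state was destroyed */

	assert(!toy_write_new(&b, 1));
	assert(!toy_is_null_oid(b.base_oid));	/* state survives */
	assert(toy_write_new(&b, 0));		/* later split write still works */
}
```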