Message ID | 20181113153235.25402-1-avarab@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC/PATCH] read-cache: write all indexes with the same permissions | expand |
On Tue, Nov 13, 2018 at 4:32 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > I won't have time to finish this today, as noted in > https://public-inbox.org/git/874lcl2e9t.fsf@evledraar.gmail.com/ > there's a pretty major bug here in that we're now writing out literal > sharedindex_XXXXXX files. It's not the end of the world because create_tempfile opens with O_EXCL so if two processes try to create it at the same time, one will fail, but no corruption or such. > Obviously that needs to be fixed, and the fix is trivial, I can use > another one of the mks_*() functions with the same mode we use to > create the index. > > But we really ought to have tests for the bug this patch introduces, > and as noted in the E-Mail linked above we don't. > > So hopefully Duy or someone with more knowledge of the split index > will chime in to say what's missing there... I don't have any bright idea how to catch the literal _XXXXX file. It's a temporary file and will not last long enough for us to verify unless we intercept open() calls with LD_PRELOAD.
On Tue, Nov 13 2018, Duy Nguyen wrote: > On Tue, Nov 13, 2018 at 4:32 PM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: >> I won't have time to finish this today, as noted in >> https://public-inbox.org/git/874lcl2e9t.fsf@evledraar.gmail.com/ >> there's a pretty major bug here in that we're now writing out literal >> sharedindex_XXXXXX files. > > It's not the end of the world because create_tempfile opens with > O_EXCL so if two processes try to create it at the same time, one will > fail, but no corruption or such. Right, no race there. >> Obviously that needs to be fixed, and the fix is trivial, I can use >> another one of the mks_*() functions with the same mode we use to >> create the index. >> >> But we really ought to have tests for the bug this patch introduces, >> and as noted in the E-Mail linked above we don't. >> >> So hopefully Duy or someone with more knowledge of the split index >> will chime in to say what's missing there... > > I don't have any bright idea how to catch the literal _XXXXX file. > It's a temporary file and will not last long enough for us to verify > unless we intercept open() calls with LD_PRELOAD. Sorry for being unclear. I don't mean how can we catch this specific bug, that would be uninteresting and hard to test for. I'm asking whether the bug in this patch isn't revealing an existing issue with us not having any tests for N number of sharedindex.* files. I.e. we have >1 of them, merge them and prune them, don't we?
On Tue, Nov 13, 2018 at 6:34 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > On Tue, Nov 13 2018, Duy Nguyen wrote: > > > On Tue, Nov 13, 2018 at 4:32 PM Ævar Arnfjörð Bjarmason > > <avarab@gmail.com> wrote: > > I don't have any bright idea how to catch the literal _XXXXX file. > > It's a temporary file and will not last long enough for us to verify > > unless we intercept open() calls with LD_PRELOAD. > > Sorry for being unclear. I don't mean how can we catch this specific > bug, that would be uninteresting and hard to test for. > > I'm asking whether the bug in this patch isn't revealing an existing > issue with us not having any tests for N number of sharedindex.* > files. I.e. we have >1 of them, merge them and prune them, don't we? I think we shouldn't have many of them. Usually we should have just one, though it's possible that switching the shared index files feature on and off several times or using temporary index files could create more than one. And there is clean_shared_index_files() which calls should_delete_shared_index() to make sure they are regularly cleaned up. Anyway it's a different topic and according to what we privately discussed I just sent https://public-inbox.org/git/20181116173105.21784-1-chriscool@tuxfamily.org/ to fix the current issue.
On Fri, Nov 16, 2018 at 06:41:43PM +0100, Christian Couder wrote: > On Tue, Nov 13, 2018 at 6:34 PM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: > > I'm asking whether the bug in this patch isn't revealing an existing > > issue with us not having any tests for N number of sharedindex.* > > files. I.e. we have >1 of them, merge them and prune them, don't we? We don't merge shared index files, but write a new one. > I think we shouldn't have many of them. Usually we should have just > one, though it's possible that switching the shared index files > feature on and off several times or using temporary index files could > create more than one. > > And there is clean_shared_index_files() which calls > should_delete_shared_index() to make sure they are regularly cleaned > up. I would think that it's more common to have more than one shared index files, because a new shared index is written when the number of entries in the split index reaches the threshold given in 'splitIndex.maxPercentChange'. The old ones are kept until they expire, and 'splitIndex.sharedIndexExpire' defaults to 2 weeks (and can even be be set to "never"). With the default 20% threshold a new shared index is written rather frequently with our usual small test-repos: $ git init $ git update-index --split-index $ ls -1 .git/*index* .git/index .git/sharedindex.4370042739b31cd17a5c5cd6043a77c9a00df113 $ echo 1 >file $ git add file $ git commit -q -m 1 $ echo 2 >file $ git commit -q -m 2 file $ echo 3 >file $ git commit -q -m 3 file $ ls -1 .git/*index* .git/index .git/sharedindex.4370042739b31cd17a5c5cd6043a77c9a00df113 .git/sharedindex.6aedbf71b1a6bdc0018078ec7571e1b21ba4b954 .git/sharedindex.b9106e9b82a818a0e4e9148224fc44ea98f488a1 .git/sharedindex.bad0b75d270a431b9e961cfc15df6ec935a67be8 > Anyway it's a different topic and according to what we privately > discussed I just sent > https://public-inbox.org/git/20181116173105.21784-1-chriscool@tuxfamily.org/ > to fix the current issue.
On Fri, Nov 16, 2018 at 8:07 PM SZEDER Gábor <szeder.dev@gmail.com> wrote: > > On Fri, Nov 16, 2018 at 06:41:43PM +0100, Christian Couder wrote: > > On Tue, Nov 13, 2018 at 6:34 PM Ævar Arnfjörð Bjarmason > > <avarab@gmail.com> wrote: > > > > I'm asking whether the bug in this patch isn't revealing an existing > > > issue with us not having any tests for N number of sharedindex.* > > > files. I.e. we have >1 of them, merge them and prune them, don't we? > > We don't merge shared index files, but write a new one. True. They are immutable like git objects. > With the default 20% threshold a new shared index is written rather > frequently with our usual small test-repos: Side note. Split index is definitely not meant for small repos. But maybe we should have a lower limit (in terms of absolute number of entries) that prevent splitting. This splitting seems excessive. > $ git init > $ git update-index --split-index > $ ls -1 .git/*index* > .git/index > .git/sharedindex.4370042739b31cd17a5c5cd6043a77c9a00df113 > $ echo 1 >file > $ git add file > $ git commit -q -m 1 > $ echo 2 >file > $ git commit -q -m 2 file > $ echo 3 >file > $ git commit -q -m 3 file > $ ls -1 .git/*index* > .git/index > .git/sharedindex.4370042739b31cd17a5c5cd6043a77c9a00df113 > .git/sharedindex.6aedbf71b1a6bdc0018078ec7571e1b21ba4b954 > .git/sharedindex.b9106e9b82a818a0e4e9148224fc44ea98f488a1 > .git/sharedindex.bad0b75d270a431b9e961cfc15df6ec935a67be8
On Fri, Nov 16, 2018 at 8:20 PM Duy Nguyen <pclouds@gmail.com> wrote: > > On Fri, Nov 16, 2018 at 8:07 PM SZEDER Gábor <szeder.dev@gmail.com> wrote: > > > With the default 20% threshold a new shared index is written rather > > frequently with our usual small test-repos: > > Side note. Split index is definitely not meant for small repos. I very much agree with that. It makes sense to use them only for big repos and big repos usually don't pass a 20% threshold very often. > But > maybe we should have a lower limit (in terms of absolute number of > entries) that prevent splitting. This splitting seems excessive. I would agree if split index was the default mode or if our goal was to eventually make it the default mode. Or it could be a new "mixed" mode for core.splitIndex (which might eventually become the default mode) to have no split-index as long as the repo stays under a lower limit and to automatically use split-index when the repo gets over the limit.
On Sat, Nov 17, 2018 at 07:52:30AM +0100, Christian Couder wrote: > On Fri, Nov 16, 2018 at 8:20 PM Duy Nguyen <pclouds@gmail.com> wrote: > > > > On Fri, Nov 16, 2018 at 8:07 PM SZEDER Gábor <szeder.dev@gmail.com> wrote: > > > > > With the default 20% threshold a new shared index is written rather > > > frequently with our usual small test-repos: > > > > Side note. Split index is definitely not meant for small repos. > > I very much agree with that. It makes sense to use them only for big > repos and big repos usually don't pass a 20% threshold very often. But our test suite does use very small repositories, so perhaps we have been already testing what Ævar wanted to test? (Though I didn't quite understood what that was; and we likely don't do so explicitly, but only by chance with GIT_TEST_SPLIT_INDEX=1.) > > But > > maybe we should have a lower limit (in terms of absolute number of > > entries) that prevent splitting. This splitting seems excessive. > > I would agree if split index was the default mode or if our goal was > to eventually make it the default mode. Same here. If you don't need split index, i.e. don't have huge repos, then don't enable it in the first place. And if it is enabled in a small repo, then the extra effort to write a new shared index is negligible, and the space wasted for those small files doesn't really matter (though arguably the output from a 'ls .git' would be surprising... which, at the same time, would be a good motivating factor to turn the feature off).
diff --git a/read-cache.c b/read-cache.c index f3a848d61c..7135537554 100644 --- a/read-cache.c +++ b/read-cache.c @@ -3074,11 +3074,6 @@ static int write_shared_index(struct index_state *istate, ret = do_write_index(si->base, *temp, 1); if (ret) return ret; - ret = adjust_shared_perm(get_tempfile_path(*temp)); - if (ret) { - error("cannot fix permission bits on %s", get_tempfile_path(*temp)); - return ret; - } ret = rename_tempfile(temp, git_path("sharedindex.%s", oid_to_hex(&si->base->oid))); if (!ret) { @@ -3159,7 +3154,7 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock, struct tempfile *temp; int saved_errno; - temp = mks_tempfile(git_path("sharedindex_XXXXXX")); + temp = create_tempfile(git_path("sharedindex_XXXXXX")); if (!temp) { oidclr(&si->base_oid); ret = do_write_locked_index(istate, lock, flags); diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh index 2ac47aa0e4..fa1d3d468b 100755 --- a/t/t1700-split-index.sh +++ b/t/t1700-split-index.sh @@ -381,6 +381,26 @@ test_expect_success 'check splitIndex.sharedIndexExpire set to "never" and "now" test $(ls .git/sharedindex.* | wc -l) -le 2 ' +test_expect_success POSIXPERM 'same mode for index & split index' ' + git init same-mode && + ( + cd same-mode && + test_commit A && + test_modebits .git/index >index_mode && + test_must_fail git config core.sharedRepository && + git -c core.splitIndex=true status && + shared=$(ls .git/sharedindex.*) && + case "$shared" in + *" "*) + # we have more than one??? + false ;; + *) + test_modebits "$shared" >split_index_mode && + test_cmp index_mode split_index_mode ;; + esac + ) +' + while read -r mode modebits do test_expect_success POSIXPERM "split index respects core.sharedrepository $mode" '
Change the code that writes out the shared index to use create_tempfile() instead of mks_tempfile(); The create_tempfile() function is used to write out the main .git/index (via .git/index.lock) using lock_file(). The create_tempfile() function respects the umask, whereas the mks_tempfile() function will create files with 0600 permissions. A bug related to this was spotted, fixed and tested for in df801f3f9f ("read-cache: use shared perms when writing shared index", 2017-06-25) and 3ee83f48e5 ("t1700: make sure split-index respects core.sharedrepository", 2017-06-25). However, as noted in those commits we'd still create the file as 0600, and would just re-chmod it depending on the setting of core.sharedRepository. So without core.splitIndex a system with e.g. the umask set to group writeability would work, but not with core.splitIndex set. Let's instead make the two consistent by using create_tempfile(). This allows us to remove the code added in df801f3f9f (subsequently modified in 59f9d2dd60 ("read-cache.c: move tempfile creation/cleanup out of write_shared_index", 2018-01-14)) as redundant. The create_tempfile() function itself calls adjust_shared_perm(). Now we're not leaking the implementation detail that we're using a mkstemp()-like API for something that's not really a mkstemp() use-case. See c18b80a0e8 ("update-index: new options to enable/disable split index mode", 2014-06-13) for the initial implementation which used mkstemp() without a wrapper. One thing I was paranoid about when making this change was not introducing a race condition where with e.g. core.sharedRepository=0600 we'd do something different for "index" v.s. "sharedindex.*", as the former has a *.lock file, not the latter. But I'm confident that we're exposing no such edge-case. With a user umask of e.g. 0022 and core.sharedRepository=0600 we initially create both "index' and "sharedindex.*" files that are globally readable, but re-chmod them while they're still empty. Ideally we'd split up the adjust_shared_perm() function to one that can give us the mode we want so we could just call open() instead of open() followed by chmod(), but that's an unrelated cleanup. We already have that minor issue with the "index" file #leftoverbits. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- I won't have time to finish this today, as noted in https://public-inbox.org/git/874lcl2e9t.fsf@evledraar.gmail.com/ there's a pretty major bug here in that we're now writing out literal sharedindex_XXXXXX files. Obviously that needs to be fixed, and the fix is trivial, I can use another one of the mks_*() functions with the same mode we use to create the index. But we really ought to have tests for the bug this patch introduces, and as noted in the E-Mail linked above we don't. So hopefully Duy or someone with more knowledge of the split index will chime in to say what's missing there... read-cache.c | 7 +------ t/t1700-split-index.sh | 20 ++++++++++++++++++++ 2 files changed, 21 insertions(+), 6 deletions(-)