diff mbox series

[RFC/PATCH] read-cache: write all indexes with the same permissions

Message ID 20181113153235.25402-1-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series [RFC/PATCH] read-cache: write all indexes with the same permissions | expand

Commit Message

Ævar Arnfjörð Bjarmason Nov. 13, 2018, 3:32 p.m. UTC
Change the code that writes out the shared index to use
create_tempfile() instead of mks_tempfile();

The create_tempfile() function is used to write out the main
.git/index (via .git/index.lock) using lock_file(). The
create_tempfile() function respects the umask, whereas the
mks_tempfile() function will create files with 0600 permissions.

A bug related to this was spotted, fixed and tested for in
df801f3f9f ("read-cache: use shared perms when writing shared index",
2017-06-25) and 3ee83f48e5 ("t1700: make sure split-index respects
core.sharedrepository", 2017-06-25).

However, as noted in those commits we'd still create the file as 0600,
and would just re-chmod it depending on the setting of
core.sharedRepository. So without core.splitIndex a system with
e.g. the umask set to group writeability would work, but not with
core.splitIndex set.

Let's instead make the two consistent by using create_tempfile(). This
allows us to remove the code added in df801f3f9f (subsequently
modified in 59f9d2dd60 ("read-cache.c: move tempfile creation/cleanup
out of write_shared_index", 2018-01-14)) as redundant. The
create_tempfile() function itself calls adjust_shared_perm().

Now we're not leaking the implementation detail that we're using a
mkstemp()-like API for something that's not really a mkstemp()
use-case. See c18b80a0e8 ("update-index: new options to enable/disable
split index mode", 2014-06-13) for the initial implementation which
used mkstemp() without a wrapper.

One thing I was paranoid about when making this change was not
introducing a race condition where with
e.g. core.sharedRepository=0600 we'd do something different for
"index" v.s. "sharedindex.*", as the former has a *.lock file, not the
latter.

But I'm confident that we're exposing no such edge-case. With a user
umask of e.g. 0022 and core.sharedRepository=0600 we initially create
both "index' and "sharedindex.*" files that are globally readable, but
re-chmod them while they're still empty.

Ideally we'd split up the adjust_shared_perm() function to one that
can give us the mode we want so we could just call open() instead of
open() followed by chmod(), but that's an unrelated cleanup. We
already have that minor issue with the "index" file #leftoverbits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

I won't have time to finish this today, as noted in
https://public-inbox.org/git/874lcl2e9t.fsf@evledraar.gmail.com/
there's a pretty major bug here in that we're now writing out literal
sharedindex_XXXXXX files.

Obviously that needs to be fixed, and the fix is trivial, I can use
another one of the mks_*() functions with the same mode we use to
create the index.

But we really ought to have tests for the bug this patch introduces,
and as noted in the E-Mail linked above we don't.

So hopefully Duy or someone with more knowledge of the split index
will chime in to say what's missing there...

 read-cache.c           |  7 +------
 t/t1700-split-index.sh | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+), 6 deletions(-)

Comments

Duy Nguyen Nov. 13, 2018, 4:55 p.m. UTC | #1
On Tue, Nov 13, 2018 at 4:32 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> I won't have time to finish this today, as noted in
> https://public-inbox.org/git/874lcl2e9t.fsf@evledraar.gmail.com/
> there's a pretty major bug here in that we're now writing out literal
> sharedindex_XXXXXX files.

It's not the end of the world because create_tempfile opens with
O_EXCL so if two processes try to create it at the same time, one will
fail, but no corruption or such.

> Obviously that needs to be fixed, and the fix is trivial, I can use
> another one of the mks_*() functions with the same mode we use to
> create the index.
>
> But we really ought to have tests for the bug this patch introduces,
> and as noted in the E-Mail linked above we don't.
>
> So hopefully Duy or someone with more knowledge of the split index
> will chime in to say what's missing there...

I don't have any bright idea how to catch the literal _XXXXX file.
It's a temporary file and will not last long enough for us to verify
unless we intercept open() calls with LD_PRELOAD.
Ævar Arnfjörð Bjarmason Nov. 13, 2018, 5:34 p.m. UTC | #2
On Tue, Nov 13 2018, Duy Nguyen wrote:

> On Tue, Nov 13, 2018 at 4:32 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> I won't have time to finish this today, as noted in
>> https://public-inbox.org/git/874lcl2e9t.fsf@evledraar.gmail.com/
>> there's a pretty major bug here in that we're now writing out literal
>> sharedindex_XXXXXX files.
>
> It's not the end of the world because create_tempfile opens with
> O_EXCL so if two processes try to create it at the same time, one will
> fail, but no corruption or such.

Right, no race there.

>> Obviously that needs to be fixed, and the fix is trivial, I can use
>> another one of the mks_*() functions with the same mode we use to
>> create the index.
>>
>> But we really ought to have tests for the bug this patch introduces,
>> and as noted in the E-Mail linked above we don't.
>>
>> So hopefully Duy or someone with more knowledge of the split index
>> will chime in to say what's missing there...
>
> I don't have any bright idea how to catch the literal _XXXXX file.
> It's a temporary file and will not last long enough for us to verify
> unless we intercept open() calls with LD_PRELOAD.

Sorry for being unclear. I don't mean how can we catch this specific
bug, that would be uninteresting and hard to test for.

I'm asking whether the bug in this patch isn't revealing an existing
issue with us not having any tests for N number of sharedindex.*
files. I.e. we have >1 of them, merge them and prune them, don't we?
Christian Couder Nov. 16, 2018, 5:41 p.m. UTC | #3
On Tue, Nov 13, 2018 at 6:34 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Tue, Nov 13 2018, Duy Nguyen wrote:
>
> > On Tue, Nov 13, 2018 at 4:32 PM Ævar Arnfjörð Bjarmason
> > <avarab@gmail.com> wrote:
> > I don't have any bright idea how to catch the literal _XXXXX file.
> > It's a temporary file and will not last long enough for us to verify
> > unless we intercept open() calls with LD_PRELOAD.
>
> Sorry for being unclear. I don't mean how can we catch this specific
> bug, that would be uninteresting and hard to test for.
>
> I'm asking whether the bug in this patch isn't revealing an existing
> issue with us not having any tests for N number of sharedindex.*
> files. I.e. we have >1 of them, merge them and prune them, don't we?

I think we shouldn't have many of them. Usually we should have just
one, though it's possible that switching the shared index files
feature on and off several times or using temporary index files could
create more than one.

And there is clean_shared_index_files() which calls
should_delete_shared_index() to make sure they are regularly cleaned
up.

Anyway it's a different topic and according to what we privately
discussed I just sent
https://public-inbox.org/git/20181116173105.21784-1-chriscool@tuxfamily.org/
to fix the current issue.
SZEDER Gábor Nov. 16, 2018, 7:07 p.m. UTC | #4
On Fri, Nov 16, 2018 at 06:41:43PM +0100, Christian Couder wrote:
> On Tue, Nov 13, 2018 at 6:34 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:

> > I'm asking whether the bug in this patch isn't revealing an existing
> > issue with us not having any tests for N number of sharedindex.*
> > files. I.e. we have >1 of them, merge them and prune them, don't we?

We don't merge shared index files, but write a new one.

> I think we shouldn't have many of them. Usually we should have just
> one, though it's possible that switching the shared index files
> feature on and off several times or using temporary index files could
> create more than one.
>
> And there is clean_shared_index_files() which calls
> should_delete_shared_index() to make sure they are regularly cleaned
> up.

I would think that it's more common to have more than one shared index
files, because a new shared index is written when the number of
entries in the split index reaches the threshold given in
'splitIndex.maxPercentChange'.  The old ones are kept until they
expire, and 'splitIndex.sharedIndexExpire' defaults to 2 weeks (and
can even be be set to "never").

With the default 20% threshold a new shared index is written rather
frequently with our usual small test-repos:

  $ git init
  $ git update-index --split-index
  $ ls -1 .git/*index*
  .git/index
  .git/sharedindex.4370042739b31cd17a5c5cd6043a77c9a00df113
  $ echo 1 >file
  $ git add file
  $ git commit -q -m 1
  $ echo 2 >file
  $ git commit -q -m 2 file
  $ echo 3 >file
  $ git commit -q -m 3 file
  $ ls -1 .git/*index*
  .git/index
  .git/sharedindex.4370042739b31cd17a5c5cd6043a77c9a00df113
  .git/sharedindex.6aedbf71b1a6bdc0018078ec7571e1b21ba4b954
  .git/sharedindex.b9106e9b82a818a0e4e9148224fc44ea98f488a1
  .git/sharedindex.bad0b75d270a431b9e961cfc15df6ec935a67be8

> Anyway it's a different topic and according to what we privately
> discussed I just sent
> https://public-inbox.org/git/20181116173105.21784-1-chriscool@tuxfamily.org/
> to fix the current issue.
Duy Nguyen Nov. 16, 2018, 7:19 p.m. UTC | #5
On Fri, Nov 16, 2018 at 8:07 PM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> On Fri, Nov 16, 2018 at 06:41:43PM +0100, Christian Couder wrote:
> > On Tue, Nov 13, 2018 at 6:34 PM Ævar Arnfjörð Bjarmason
> > <avarab@gmail.com> wrote:
>
> > > I'm asking whether the bug in this patch isn't revealing an existing
> > > issue with us not having any tests for N number of sharedindex.*
> > > files. I.e. we have >1 of them, merge them and prune them, don't we?
>
> We don't merge shared index files, but write a new one.

True. They are immutable like git objects.

> With the default 20% threshold a new shared index is written rather
> frequently with our usual small test-repos:

Side note. Split index is definitely not meant for small repos. But
maybe we should have a lower limit (in terms of absolute number of
entries) that prevent splitting. This splitting seems excessive.

>   $ git init
>   $ git update-index --split-index
>   $ ls -1 .git/*index*
>   .git/index
>   .git/sharedindex.4370042739b31cd17a5c5cd6043a77c9a00df113
>   $ echo 1 >file
>   $ git add file
>   $ git commit -q -m 1
>   $ echo 2 >file
>   $ git commit -q -m 2 file
>   $ echo 3 >file
>   $ git commit -q -m 3 file
>   $ ls -1 .git/*index*
>   .git/index
>   .git/sharedindex.4370042739b31cd17a5c5cd6043a77c9a00df113
>   .git/sharedindex.6aedbf71b1a6bdc0018078ec7571e1b21ba4b954
>   .git/sharedindex.b9106e9b82a818a0e4e9148224fc44ea98f488a1
>   .git/sharedindex.bad0b75d270a431b9e961cfc15df6ec935a67be8
Christian Couder Nov. 17, 2018, 6:52 a.m. UTC | #6
On Fri, Nov 16, 2018 at 8:20 PM Duy Nguyen <pclouds@gmail.com> wrote:
>
> On Fri, Nov 16, 2018 at 8:07 PM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> > With the default 20% threshold a new shared index is written rather
> > frequently with our usual small test-repos:
>
> Side note. Split index is definitely not meant for small repos.

I very much agree with that. It makes sense to use them only for big
repos and big repos usually don't pass a 20% threshold very often.

> But
> maybe we should have a lower limit (in terms of absolute number of
> entries) that prevent splitting. This splitting seems excessive.

I would agree if split index was the default mode or if our goal was
to eventually make it the default mode.

Or it could be a new "mixed" mode for core.splitIndex (which might
eventually become the default mode) to have no split-index as long as
the repo stays under a lower limit and to automatically use
split-index when the repo gets over the limit.
SZEDER Gábor Nov. 17, 2018, 12:38 p.m. UTC | #7
On Sat, Nov 17, 2018 at 07:52:30AM +0100, Christian Couder wrote:
> On Fri, Nov 16, 2018 at 8:20 PM Duy Nguyen <pclouds@gmail.com> wrote:
> >
> > On Fri, Nov 16, 2018 at 8:07 PM SZEDER Gábor <szeder.dev@gmail.com> wrote:
> >
> > > With the default 20% threshold a new shared index is written rather
> > > frequently with our usual small test-repos:
> >
> > Side note. Split index is definitely not meant for small repos.
> 
> I very much agree with that. It makes sense to use them only for big
> repos and big repos usually don't pass a 20% threshold very often.

But our test suite does use very small repositories, so perhaps we
have been already testing what Ævar wanted to test?  (Though I didn't
quite understood what that was; and we likely don't do so explicitly,
but only by chance with GIT_TEST_SPLIT_INDEX=1.)

> > But
> > maybe we should have a lower limit (in terms of absolute number of
> > entries) that prevent splitting. This splitting seems excessive.
> 
> I would agree if split index was the default mode or if our goal was
> to eventually make it the default mode.

Same here.  If you don't need split index, i.e. don't have huge repos,
then don't enable it in the first place.  And if it is enabled in a
small repo, then the extra effort to write a new shared index is
negligible, and the space wasted for those small files doesn't really
matter (though arguably the output from a 'ls .git' would be
surprising...  which, at the same time, would be a good motivating
factor to turn the feature off).
diff mbox series

Patch

diff --git a/read-cache.c b/read-cache.c
index f3a848d61c..7135537554 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -3074,11 +3074,6 @@  static int write_shared_index(struct index_state *istate,
 	ret = do_write_index(si->base, *temp, 1);
 	if (ret)
 		return ret;
-	ret = adjust_shared_perm(get_tempfile_path(*temp));
-	if (ret) {
-		error("cannot fix permission bits on %s", get_tempfile_path(*temp));
-		return ret;
-	}
 	ret = rename_tempfile(temp,
 			      git_path("sharedindex.%s", oid_to_hex(&si->base->oid)));
 	if (!ret) {
@@ -3159,7 +3154,7 @@  int write_locked_index(struct index_state *istate, struct lock_file *lock,
 		struct tempfile *temp;
 		int saved_errno;
 
-		temp = mks_tempfile(git_path("sharedindex_XXXXXX"));
+		temp = create_tempfile(git_path("sharedindex_XXXXXX"));
 		if (!temp) {
 			oidclr(&si->base_oid);
 			ret = do_write_locked_index(istate, lock, flags);
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 2ac47aa0e4..fa1d3d468b 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -381,6 +381,26 @@  test_expect_success 'check splitIndex.sharedIndexExpire set to "never" and "now"
 	test $(ls .git/sharedindex.* | wc -l) -le 2
 '
 
+test_expect_success POSIXPERM 'same mode for index & split index' '
+	git init same-mode &&
+	(
+		cd same-mode &&
+		test_commit A &&
+		test_modebits .git/index >index_mode &&
+		test_must_fail git config core.sharedRepository &&
+		git -c core.splitIndex=true status &&
+		shared=$(ls .git/sharedindex.*) &&
+		case "$shared" in
+		*" "*)
+			# we have more than one???
+			false ;;
+		*)
+			test_modebits "$shared" >split_index_mode &&
+			test_cmp index_mode split_index_mode ;;
+		esac
+	)
+'
+
 while read -r mode modebits
 do
 	test_expect_success POSIXPERM "split index respects core.sharedrepository $mode" '