[RFC] pack-refs: fail on falsely sorted packed-refs

Message ID	20190130231359.23978-1-max@max630.net (mailing list archive)
State	New, archived
Headers:
Return-Path: <git-owner@kernel.org>
From: Max Kirillov <max@max630.net>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: Max Kirillov <max@max630.net>, git@vger.kernel.org
Subject: [RFC PATCH] pack-refs: fail on falsely sorted packed-refs
Date: Thu, 31 Jan 2019 01:13:59 +0200
Message-Id: <20190130231359.23978-1-max@max630.net>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: git-owner@vger.kernel.org
Precedence: bulk

Max Kirillov Jan. 30, 2019, 11:13 p.m. UTC

If packed-refs is marked as sorted but is not actually sorted, reference
resolution misbehaves in a way that is very hard to understand: a
reference that exists is reported as not found.

As the scope of the issue is not clear, make it visible by failing the
pack-refs command when it encounters existing data that is not actually
sorted. pack-refs is the one command that can verify the sortedness
without suffering a performance penalty.

Signed-off-by: Max Kirillov <max@max630.net>
---
I happened to have a packed-refs file that was not really sorted. As you might
guess, it was quite a wtf-ing experience. It mostly worked, but there was one
branch which just did not resolve, despite existing and being present in
for-each-ref output.

I don't know where the corruption came from. I should admit it could even have
been a manual edit, but the last time I did that (in that repository) was
several years ago, so it is unlikely.

I am not sure what the proper fix should be. I did a minimal detection, so that
it does not go unnoticed. Probably the next step would be fixing it in a
`git fsck` call.

 refs/packed-backend.c               | 15 +++++++++++++++
 t/t3212-pack-refs-broken-sorting.sh | 26 ++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)
 create mode 100755 t/t3212-pack-refs-broken-sorting.sh

Eric Sunshine Jan. 30, 2019, 11:31 p.m. UTC | #1

On Wed, Jan 30, 2019 at 6:21 PM Max Kirillov <max@max630.net> wrote:
> If packed-refs is marked as sorted but not really sorted it causes
> very hard to comprehend misbehavior of reference resolving - a reference
> is reported as not found.
>
> As the scope of the issue is not clear, make it visible by failing
> pack-refs command - the one which would not suffer performance penalty
> to verify the sortedness - when it encounters not really sorted existing
> data.
>
> Signed-off-by: Max Kirillov <max@max630.net>
> ---
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> @@ -1088,6 +1088,7 @@ static int write_with_updates(struct packed_ref_store *refs,
> +       struct strbuf prev_ref = STRBUF_INIT;
> @@ -1137,6 +1138,20 @@ static int write_with_updates(struct packed_ref_store *refs,
> +               if (iter)
> +               {
> +                       if (prev_ref.len &&  strcmp(prev_ref.buf, iter->refname) > 0)
> +                       {
> +                               strbuf_addf(err, "broken sorting in packed-refs: '%s' > '%s'",
> +                                           prev_ref.buf,
> +                                           iter->refname);

strbuf_release(&prev_ref) either here or after the "error" label.

> +                               goto error;
> +                       }
> +
> +                       strbuf_init(&prev_ref, 0);
> +                       strbuf_addstr(&prev_ref, iter->refname);
> +               }
> diff --git a/t/t3212-pack-refs-broken-sorting.sh b/t/t3212-pack-refs-broken-sorting.sh
> @@ -0,0 +1,26 @@
> +test_expect_success 'setup' '
> +       git commit --allow-empty -m commit &&
> +       for num in $(test_seq 10)
> +       do
> +               git branch b$(printf "%02d" $num) || break

This should probably be "|| return 1" rather than "|| break" in order
to fail the test immediately.

> +       done &&
> +       git pack-refs --all &&
> +       head_object=$(git rev-parse HEAD) &&
> +       printf "$head_object refs/heads/b00\\n" >>.git/packed-refs &&
> +       git branch b11
> +'
> +
> +test_expect_success 'off-order branch not found' '
> +       ! git show-ref --verify --quiet refs/heads/b00
> +'

Use test_must_fail() rather than '!' when expecting a Git command to fail.

> +test_expect_success 'subsequent pack-refs fails' '
> +       ! git pack-refs --all
> +'

Ditto.

Max Kirillov Jan. 31, 2019, 8:21 a.m. UTC | #2

On Wed, Jan 30, 2019 at 06:31:34PM -0500, Eric Sunshine wrote:
> On Wed, Jan 30, 2019 at 6:21 PM Max Kirillov <max@max630.net> wrote:
>> +                               strbuf_addf(err, "broken sorting in packed-refs: '%s' > '%s'",
>> +                                           prev_ref.buf,
>> +                                           iter->refname);

> strbuf_release(&prev_ref) either here or after the "error" label.

Thanks! I seem to have forgotten about it.

> > +               git branch b$(printf "%02d" $num) || break

> This should probably be "|| return 1" rather than "|| break" in order
> to fail the test immediately.

I had been looking for the correct way, and had seen the
break somewhere. Now I see that "return 1" is what is mostly used.
Thanks, will fix.

> Use test_must_fail() rather than '!' when expecting a Git command to fail.

Will fix in both places.

Ævar Arnfjörð Bjarmason Feb. 13, 2019, 10:08 a.m. UTC | #3

On Thu, Jan 31 2019, Max Kirillov wrote:

> If packed-refs is marked as sorted but not really sorted it causes
> very hard to comprehend misbehavior of reference resolving - a reference
> is reported as not found.
>
> As the scope of the issue is not clear, make it visible by failing
> pack-refs command - the one which would not suffer performance penalty
> to verify the sortedness - when it encounters not really sorted existing
> data.
>
> Signed-off-by: Max Kirillov <max@max630.net>
> ---
> I happened to have a not really sorted packed-refs file. As you might guess,
> it was quite wtf-ing experience. It worked, mostly, but there was one branch
> which just did not resolve, regardless of existing and being presented in
> for-each-refs output.
>
> I don't know where the corruption came from. I should admit it could even be a manual
> editing but last time I did it (in that repository) was several years ago so it is unlikely.
>
> I am not sure what should be the proper fix. I did a minimal detection, so that
> it does not go unnoticed. Probably next step would be either fixing in `git fsck` call.
>
>  refs/packed-backend.c               | 15 +++++++++++++++
>  t/t3212-pack-refs-broken-sorting.sh | 26 ++++++++++++++++++++++++++
>  2 files changed, 41 insertions(+)
>  create mode 100755 t/t3212-pack-refs-broken-sorting.sh

This is not an area I'm very familiar with, so I'm mostly commenting on
cosmetic issues with the patch. FWIW the "years back" issue you had
could be one that didn't manifest until now, i.e. in a sorted file
format you can get lucky and not see corruption for a while after a
random insert.

> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index c01c7f5901..505f4535b5 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -1088,6 +1088,7 @@ static int write_with_updates(struct packed_ref_store *refs,
>  	FILE *out;
>  	struct strbuf sb = STRBUF_INIT;
>  	char *packed_refs_path;
> +	struct strbuf prev_ref = STRBUF_INIT;
>
>  	if (!is_lock_file_locked(&refs->lock))
>  		BUG("write_with_updates() called while unlocked");
> @@ -1137,6 +1138,20 @@ static int write_with_updates(struct packed_ref_store *refs,
>  		struct ref_update *update = NULL;
>  		int cmp;
>
> +		if (iter)
> +		{
> +			if (prev_ref.len &&  strcmp(prev_ref.buf, iter->refname) > 0)

You have an extra two whitespaces after "&&" there.

> +			{
> +				strbuf_addf(err, "broken sorting in packed-refs: '%s' > '%s'",
> +					    prev_ref.buf,
> +					    iter->refname);
> +				goto error;
> +			}
> +
> +			strbuf_init(&prev_ref, 0);
> +			strbuf_addstr(&prev_ref, iter->refname);
> +		}
> +
>  		if (i >= updates->nr) {
>  			cmp = -1;
>  		} else {
> diff --git a/t/t3212-pack-refs-broken-sorting.sh b/t/t3212-pack-refs-broken-sorting.sh
> new file mode 100755
> index 0000000000..37a98a6fb1
> --- /dev/null
> +++ b/t/t3212-pack-refs-broken-sorting.sh
> @@ -0,0 +1,26 @@
> +#!/bin/sh
> +
> +test_description='tests for the falsely sorted refs'
> +. ./test-lib.sh
> +
> +test_expect_success 'setup' '
> +	git commit --allow-empty -m commit &&

Looks like just "test_commit A" would do here.

> +	for num in $(test_seq 10)
> +	do
> +		git branch b$(printf "%02d" $num) || break
> +	done &&

Commands can fail inside these sorts of loops. There are a few ways to deal
with that. Doing it like this with "break" will still silently hide errors:

    $ for i in $(seq 1 3); do if test $i = 2; then false || break; else echo $i; fi; done && echo success
    1
    success

One way to deal with that is to e.g. before the loop say "had_fail=",
then set "had_fail=t" in that "||" case, and test for it after the loop.

But perhaps in this case we're better off e.g. running for-each-ref
after and either using test_cmp or test_line_count to see that we
created the refs successfully?
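
A minimal sketch of the flag approach described above, next to the plain
"break" form. The flaky() helper is a hypothetical stand-in for a command
that fails on one iteration; it is not part of the patch or of test-lib.

```shell
# Hypothetical stand-in for a command that fails for the argument "2".
flaky() {
	test "$1" != 2
}

# Plain "break": the loop exits with the status of "break" (0), so the
# failure is swallowed and the && chain carries on.
run_with_break() {
	for i in 1 2 3
	do
		flaky "$i" || break
	done &&
	echo success
}

# Flag variant: record the failure, then check the flag after the loop.
run_with_flag() {
	had_fail=
	for i in 1 2 3
	do
		flaky "$i" || {
			had_fail=t
			break
		}
	done &&
	test -z "$had_fail" &&
	echo success
}

flag_status=0
break_result=$(run_with_break)
flag_result=$(run_with_flag) || flag_status=$?
```

With these definitions the "break" variant still reports success, while the
flag variant returns non-zero and prints nothing.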

> +	git pack-refs --all &&
> +	head_object=$(git rev-parse HEAD) &&
> +	printf "$head_object refs/heads/b00\\n" >>.git/packed-refs &&

Looks like just "echo" here would be simpler since we only use printf to
add a newline.
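
As a rough illustration, the forms below write the same bytes in a POSIX
shell. The object name is a placeholder (the test takes it from
"git rev-parse HEAD"), and the trailing "echo x" is only a sentinel so that
command substitution does not strip the newline being compared.

```shell
# Placeholder object name, for illustration only.
head_object=0123456789abcdef0123456789abcdef01234567

# Form used in the patch: the variable is expanded inside the format string.
with_printf=$(printf "$head_object refs/heads/b00\\n"; echo x)

# Suggested form: echo appends the newline itself.
with_echo=$(echo "$head_object refs/heads/b00"; echo x)

# printf with the value passed as an argument, which keeps it out of the
# format string.
with_format=$(printf "%s refs/heads/b00\n" "$head_object"; echo x)
```

If printf is kept, the last form avoids surprises should the value ever
contain "%" or a backslash.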

> +	git branch b11
> +'
> +
> +test_expect_success 'off-order branch not found' '
> +	! git show-ref --verify --quiet refs/heads/b00
> +'
> +
> +test_expect_success 'subsequent pack-refs fails' '
> +	! git pack-refs --all
> +'

Instead of "! git ..." use "test_must_fail git ...". See t/README. A plain
"!" will hide e.g. segfaults, because it turns any non-zero exit into success.

Also, perhaps:

    test_must_fail git ... 2>stderr &&
    grep "broken sorting in packed-refs" stderr

Would make this more obvious/self-documenting so we know we failed due
to that issue in particular.
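
To illustrate why "!" hides crashes, here is a simplified sketch.
must_fail_sketch is a hypothetical stand-in, not the real test_must_fail
(which handles more cases), and crash() kills a child shell with SIGKILL to
imitate a signal death such as a segfault.

```shell
# Hypothetical, simplified stand-in for test_must_fail: accept an ordinary
# failure, reject success and reject death by signal (exit status > 128).
must_fail_sketch() {
	rc=0
	"$@" 2>/dev/null || rc=$?
	if test "$rc" -eq 0
	then
		return 1	# command unexpectedly succeeded
	elif test "$rc" -gt 128
	then
		return 1	# killed by a signal
	fi
	return 0		# ordinary, controlled failure
}

# Stand-ins for a command that fails cleanly and one that dies by signal.
clean_failure() { return 1; }
crash() { sh -c 'kill -s KILL $$'; }

if ! crash; then bang_hides_crash=yes; else bang_hides_crash=no; fi
if must_fail_sketch crash; then sketch_hides_crash=yes; else sketch_hides_crash=no; fi
if must_fail_sketch clean_failure; then sketch_accepts_clean=yes; else sketch_accepts_clean=no; fi
```

Here "! crash" counts the signal death as a pass, while the sketch accepts
only the clean failure.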

SZEDER Gábor Feb. 13, 2019, 10:56 a.m. UTC | #4

On Wed, Feb 13, 2019 at 11:08:01AM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Jan 31 2019, Max Kirillov wrote:

> >  refs/packed-backend.c               | 15 +++++++++++++++
> >  t/t3212-pack-refs-broken-sorting.sh | 26 ++++++++++++++++++++++++++
> >  2 files changed, 41 insertions(+)
> >  create mode 100755 t/t3212-pack-refs-broken-sorting.sh
> 
> This is not an area I'm very familiar with. So mostly commenting on
> cosmetic issues with the patch. 

Just two quick comments in addition to Ævar's:

> > @@ -1137,6 +1138,20 @@ static int write_with_updates(struct packed_ref_store *refs,
> >  		struct ref_update *update = NULL;
> >  		int cmp;
> >
> > +		if (iter)
> > +		{

According to our CodingGuidelines, the opening brace should go on
the same line as the condition, i.e.

  if (iter) {

> > diff --git a/t/t3212-pack-refs-broken-sorting.sh b/t/t3212-pack-refs-broken-sorting.sh
> > new file mode 100755
> > index 0000000000..37a98a6fb1
> > --- /dev/null
> > +++ b/t/t3212-pack-refs-broken-sorting.sh
> > @@ -0,0 +1,26 @@
> > +#!/bin/sh
> > +
> > +test_description='tests for the falsely sorted refs'
> > +. ./test-lib.sh
> > +
> > +test_expect_success 'setup' '
> > +	git commit --allow-empty -m commit &&
> 
> Looks like just "test_commit A" would do here.
> 
> > +	for num in $(test_seq 10)
> > +	do
> > +		git branch b$(printf "%02d" $num) || break
> > +	done &&
> 
> We can fail in these sorts of loops. There's a few ways to deal with
> that. Doing it like this with "break" will still silently hide errors:
> 
>     $ for i in $(seq 1 3); do if test $i = 2; then false || break; else echo $i; fi; done && echo success
>     1
>     success
> 
> One way to deal with that is to e.g. before the loop say "had_fail=",
> then set "had_fail=t" in that "||" case, and test for it after the loop.

No, you can simply do 'cmd1 && cmd2 || return 1' in the body of the
for loop; that's why we have a separate test_eval_inner_() helper
function in test-lib.
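
A small sketch of why this works: when the body is eval'ed inside a shell
function, "return 1" leaves that function immediately. run_body is a
hypothetical miniature for illustration, not the actual test-lib code.

```shell
# Hypothetical miniature of a harness: the body runs via eval inside a
# function, so "return" in the body returns from run_body.
run_body() {
	eval "$1"
}

body='
	for num in 1 2 3
	do
		echo "branch $num" &&
		test "$num" != 2 || return 1
	done &&
	echo "all branches created"
'

rc=0
output=$(run_body "$body") || rc=$?
```

The loop stops at the second iteration and the failure status reaches the
caller, with no flag variable needed.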

Jeff King Feb. 14, 2019, 6:06 a.m. UTC | #5

On Wed, Feb 13, 2019 at 11:08:01AM +0100, Ævar Arnfjörð Bjarmason wrote:

> > I happened to have a not really sorted packed-refs file. As you might guess,
> > it was quite wtf-ing experience. It worked, mostly, but there was one branch
> > which just did not resolve, regardless of existing and being presented in
> > for-each-refs output.
> >
> > I don't know where the corruption came from. I should admit it could even be a manual
> > editing but last time I did it (in that repository) was several years ago so it is unlikely.
> >
> > I am not sure what should be the proper fix. I did a minimal detection, so that
> > it does not go unnoticed. Probably next step would be either fixing in `git fsck` call.
> >
> >  refs/packed-backend.c               | 15 +++++++++++++++
> >  t/t3212-pack-refs-broken-sorting.sh | 26 ++++++++++++++++++++++++++
> >  2 files changed, 41 insertions(+)
> >  create mode 100755 t/t3212-pack-refs-broken-sorting.sh
> 
> This is not an area I'm very familiar with. So mostly commenting on
> cosmetic issues with the patch. FWIW the "years back" issue you had
> could be that an issue didn't manifest until now, i.e. in a sorted file
> format you can get lucky and not see corruption for a while with a
> random insert.

It actually shouldn't be that old a breakage. Until 02b920f3f7
(read_packed_refs(): ensure that references are ordered when read,
2017-09-25), we did not assume the file was sorted (even though we
always wrote it out sorted). And we continue to not assume the file is
sorted unless it is written out with an explicit "sorted" trait in the
header (which we started doing in that commit, too).

So a years-old manual edit would not have the "sorted" trait, and should
not have manifested as a problem, even now.  Likewise for a years-old
bug. It would have to be a bug in a _new_ writer which writes out the
sorted trait.  If there is such a bug in our implementation, this would
be the first report we've seen. Given the number of times pack-refs has
been run, without further evidence I'm inclined to think it was some
weird manual edit, or maybe an alternate implementation (though one
would _hope_ they would not write out the sorted trait without actually
sorting!).

I agree with all of the cosmetic issues you mentioned. As far as what
the patch itself does, I think it's OK. We could probably go further and
actually sort it (or even just write it out without a "sorted" trait,
which means the next read would load it all into memory and sort it).
That's a little friendlier, since just dying leaves the user to fix it
up themselves. But given that we expect this code to trigger
approximately never, it's probably not worth spending much time on a
fancy solution.
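
For anyone wanting to check a file by hand, a rough sketch of the same
adjacent-pair comparison outside of Git. check_sorted is a hypothetical
helper; the sample file and its header line are illustrative, and
byte-wise ordering under LC_ALL=C is assumed to match what strcmp() gives
in the patch.

```shell
# Report the first adjacent pair of refnames that is out of order.
# Header ("#") and peeled ("^") lines are skipped.
check_sorted() {
	LC_ALL=C awk '
		/^[#^]/ { next }
		{
			cur = $2 ""	# force string comparison
			if (prev != "" && prev > cur) {
				printf "broken sorting: %s > %s\n", prev, cur
				exit 1
			}
			prev = cur
		}
	' "$1"
}

# Illustrative sample: b00 was appended after the sorted entries.
tmp=$(mktemp)
cat >"$tmp" <<EOF
# pack-refs with: peeled fully-peeled sorted
1111111111111111111111111111111111111111 refs/heads/b01
1111111111111111111111111111111111111111 refs/heads/b02
1111111111111111111111111111111111111111 refs/heads/b00
EOF

rc=0
report=$(check_sorted "$tmp") || rc=$?
rm -f "$tmp"
```

On this sample the helper exits non-zero and names the b02/b00 pair.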

-Peff

Max Kirillov Feb. 23, 2019, 7:09 a.m. UTC | #6

On Wed, Feb 13, 2019 at 11:08:01AM +0100, Ævar Arnfjörð Bjarmason wrote:
> You have an extra two whitespaces after "&&" there.

Thanks, will check it.

>> +	git commit --allow-empty -m commit &&
> Looks like just "test_commit A" would do here.

About this I'm not sure. AFAIK test_commit does a lot of stuff,
so is it really "just" compared to "commit --allow-empty", or
the opposite? I could replace it with test_commit for
uniformity, though.

> We can fail in these sorts of loops. There's a few ways to deal with
> that. Doing it like this with "break" will still silently hide errors:

Thanks, this was already pointed out.

>> +	printf "$head_object refs/heads/b00\\n" >>.git/packed-refs &&
> 
> Looks like just "echo" here would be simpler since we only use printf to
> add a newline.

Could "echo" end up writing '\r\n' on Windows? If not, I
could use echo.

> Instead of "! git ..." use "test_must_fail git ...". See t/README. This
> will hide e.g. segfaults.

Thanks, this was already pointed out.

> Also, perhaps:
> 
>     test_must_fail git ... 2>stderr &&
>     grep "broken sorting in packed-refs" stderr
> 
> Would make this more obvious/self-documenting so we know we failed due
> to that issue in particular.

Thanks, will change it

Max Kirillov Feb. 23, 2019, 7:10 a.m. UTC | #7

On Wed, Feb 13, 2019 at 11:56:16AM +0100, SZEDER Gábor wrote:
>>> +		if (iter)
>>> +		{
> 
> According to our CodingGuidelines, the opening bracket should go on
> the same line as the condition, i.e.
> 
>   if (iter) {

Oh, thanks. I must have been professionally deformed.
