diff mbox series

[GSOC] ref-filter: solve bugs caused by enumeration

Message ID pull.949.git.1620228664666.gitgitgadget@gmail.com (mailing list archive)
State New
Headers show
Series [GSOC] ref-filter: solve bugs caused by enumeration | expand

Commit Message

ZheNing Hu May 5, 2021, 3:31 p.m. UTC
From: ZheNing Hu <adlternative@gmail.com>

Johannes Schindelin seems to have introduced a bug in
cc72385f(for-each-ref: let upstream/push optionally
report the remote name), it use `atom->u.remote_ref.option`
which is a member of enumeration in the judgment statement.
When we use other members in the enumeration `used_atom.u`,
and it happened to fill in `remote_ref.push`, this judgment
may still be established and produces errors. So replace the
judgment statement with `starts_with(name, "push")` to fix
the error.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
    [GSOC] ref-filter: solve bugs caused by enumeration
    
    This is the problem I found in the implementation of git for-each-ref
    --format="%(notes)".
    
    I added a new option to the enum used_atom.u , but the program seems to
    output the value of "%(push)".
    
    After research, it should be the problem of incorrect use of enumeration
    values here.
    
    Thanks, may the force be with you!

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-949%2Fadlternative%2Fref-filter-enum-bug-fix-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-949/adlternative/ref-filter-enum-bug-fix-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/949

 ref-filter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


base-commit: 311531c9de557d25ac087c1637818bd2aad6eb3a

Comments

Junio C Hamano May 6, 2021, 1:53 a.m. UTC | #1
"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: ZheNing Hu <adlternative@gmail.com>
>
> Johannes Schindelin seems to have introduced a bug in
> cc72385f(for-each-ref: let upstream/push optionally
> report the remote name), it use `atom->u.remote_ref.option`
> which is a member of enumeration in the judgment statement.

Sorry but I am not sure if our readers would understand what "a
member of enumeration in the judgment statement" is (I certainly do
not), and even more importantly, "bugs caused by enumeration" on the
title does not hint much about what problem the patch is trying to
solve.

> When we use other members in the enumeration `used_atom.u`,
> and it happened to fill in `remote_ref.push`, this judgment
> may still be established and produces errors. So replace the
> judgment statement with `starts_with(name, "push")` to fix
> the error.

And this paragraph does not enlighten all that much, unfortunately.

Is it that a check refers to one member of a union without making
sure that member is the one in effect in the union?  I am most
puzzled by the mention of "enumeration" when there does not appear
to be any enum involved.

> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-949%2Fadlternative%2Fref-filter-enum-bug-fix-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-949/adlternative/ref-filter-enum-bug-fix-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/949
>
>  ref-filter.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/ref-filter.c b/ref-filter.c
> index a0adb4551d87..f467f2fbbb73 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -1730,7 +1730,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
>  			else
>  				v->s = xstrdup("");
>  			continue;
> -		} else if (atom->u.remote_ref.push) {
> +		} else if (starts_with(name, "push")) {
>  			const char *branch_name;
>  			v->s = xstrdup("");
>  			if (!skip_prefix(ref->refname, "refs/heads/",
>
> base-commit: 311531c9de557d25ac087c1637818bd2aad6eb3a
ZheNing Hu May 6, 2021, 5:02 a.m. UTC | #2
Junio C Hamano <gitster@pobox.com> 于2021年5月6日周四 上午9:53写道:
>
> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > Johannes Schindelin seems to have introduced a bug in
> > cc72385f(for-each-ref: let upstream/push optionally
> > report the remote name), it use `atom->u.remote_ref.option`
> > which is a member of enumeration in the judgment statement.
>
> Sorry but I am not sure if our readers would understand what "a
> member of enumeration in the judgment statement" is (I certainly do
> not), and even more importantly, "bugs caused by enumeration" on the
> title does not hint much about what problem the patch is trying to
> solve.
>
> > When we use other members in the enumeration `used_atom.u`,
> > and it happened to fill in `remote_ref.push`, this judgment
> > may still be established and produces errors. So replace the
> > judgment statement with `starts_with(name, "push")` to fix
> > the error.
>
> And this paragraph does not enlighten all that much, unfortunately.
>
> Is it that a check refers to one member of a union without making
> sure that member is the one in effect in the union?  I am most
> puzzled by the mention of "enumeration" when there does not appear
> to be any enum involved.
>

Sorry, I didn't make it clear. I re-describe the problem first, and then
modify the commit messages.

Suppose we are dealing with "%(notes)", the name member of our
`used_atom` item at this time is "notes", and its member union `u` uses
a `struct notes_option`, we fill some values in `used_atom.u.notes_option`,

When we traverse in `used_atom` array in `populate_value()` and previous
judgement like "if (starts_with(name, "refname"))" will failed, because we
are dealing with atom "notes", but in judgement "else if
(atom->u.remote_ref.push)",
The value we fill in `used_atom.u.notes_option` just makes
`used_atom.u.remote_ref.push` non-zero. This leads us into the wrong case.

Is this clearer?

Thanks.
--
ZheNing Hu
Junio C Hamano May 6, 2021, 5:35 a.m. UTC | #3
ZheNing Hu <adlternative@gmail.com> writes:

>> Is it that a check refers to one member of a union without making
>> sure that member is the one in effect in the union?  I am most
>> puzzled by the mention of "enumeration" when there does not appear
>> to be any enum involved.
>
> Sorry, I didn't make it clear. I re-describe the problem first, and then
> modify the commit messages.
>
> Suppose we are dealing with "%(notes)", the name member of our
> `used_atom` item at this time is "notes", and its member union `u` uses
> a `struct notes_option`, we fill some values in `used_atom.u.notes_option`,
>
> When we traverse in `used_atom` array in `populate_value()` and previous
> judgement like "if (starts_with(name, "refname"))" will failed, because we
> are dealing with atom "notes", but in judgement "else if
> (atom->u.remote_ref.push)",
> The value we fill in `used_atom.u.notes_option` just makes
> `used_atom.u.remote_ref.push` non-zero. This leads us into the wrong case.
>
> Is this clearer?

This time you avoided the word enumeration, and it made it clearer.
The word commonly used is "condition" where you said "judgment", I
think, and wit that it would probably be even more clear.

used_atom.u is an union, and it has different members depending on
what atom the auxiliary data the union part of the "struct
used_atom" wants to record.  At most only one of the members can be
valid at any one time.  Since the code checks u.remote_ref without
even making sure if the atom is "push" or "push:" (which are only
two cases that u.remote_ref.push becomes valid), but u.remote_ref
shares the same storage for other members of the union, the check
was reading from an invalid member, which was the bug.

So the fix was to see what atom it is by checking its name member?
Is starts_with() a reliable test?  A fictitious atom "pushe" will be
different from "push" or "push:<something>", but will still pass
that test, so from the point of view of future-proofing the tests,
shouldn't it do the same check as the one at the beginning of
remote_ref_atom_parser()?

Thanks.
ZheNing Hu May 6, 2021, 10:39 a.m. UTC | #4
> This time you avoided the word enumeration, and it made it clearer.
> The word commonly used is "condition" where you said "judgment", I
> think, and wit that it would probably be even more clear.
>

OK.

> used_atom.u is an union, and it has different members depending on
> what atom the auxiliary data the union part of the "struct
> used_atom" wants to record.  At most only one of the members can be
> valid at any one time.  Since the code checks u.remote_ref without
> even making sure if the atom is "push" or "push:" (which are only
> two cases that u.remote_ref.push becomes valid), but u.remote_ref
> shares the same storage for other members of the union, the check
> was reading from an invalid member, which was the bug.
>

Yes, that's what it means. I got it wrong before, of course used_atom.u
is not a enum, it's union.

> So the fix was to see what atom it is by checking its name member?
> Is starts_with() a reliable test?  A fictitious atom "pushe" will be
> different from "push" or "push:<something>", but will still pass
> that test, so from the point of view of future-proofing the tests,
> shouldn't it do the same check as the one at the beginning of
> remote_ref_atom_parser()?
>

I think it's not necesssary. Before we call `populate_value()`,
`parse_ref_filter_atom()` only allow user use atom which match
 "valid_atom" entry fully, something like "pushe" will be rejected
and process die with "fatal: unknown field name: pushe", so it didn't
pass this "starts_with()" test.

        /* Is the atom a valid one? */
        for (i = 0; i < ARRAY_SIZE(valid_atom); i++) {
                int len = strlen(valid_atom[i].name);
                if (len == atom_len && !memcmp(valid_atom[i].name, sp, len))
                        break;
        }

        if (ARRAY_SIZE(valid_atom) <= i)
                return strbuf_addf_ret(err, -1, _("unknown field name: %.*s"),
                                       (int)(ep-atom), atom);

> Thanks.

Thanks.
--
ZheNing Hu
Junio C Hamano May 6, 2021, 11:20 a.m. UTC | #5
ZheNing Hu <adlternative@gmail.com> writes:

>> So the fix was to see what atom it is by checking its name member?
>> Is starts_with() a reliable test?  A fictitious atom "pushe" will be
>> different from "push" or "push:<something>", but will still pass
>> that test, so from the point of view of future-proofing the tests,
>> shouldn't it do the same check as the one at the beginning of
>> remote_ref_atom_parser()?
>>
>
> I think it's not necesssary. Before we call `populate_value()`, ...

I do not care if it is not needed with the "current" set of atoms we
accept.  The example "pushe", which obviously will not be accepted
by today's code, was all about future-proofing.
ZheNing Hu May 6, 2021, 11:52 a.m. UTC | #6
Junio C Hamano <gitster@pobox.com> 于2021年5月6日周四 下午7:20写道:
>
> ZheNing Hu <adlternative@gmail.com> writes:
>
> >> So the fix was to see what atom it is by checking its name member?
> >> Is starts_with() a reliable test?  A fictitious atom "pushe" will be
> >> different from "push" or "push:<something>", but will still pass
> >> that test, so from the point of view of future-proofing the tests,
> >> shouldn't it do the same check as the one at the beginning of
> >> remote_ref_atom_parser()?
> >>
> >
> > I think it's not necesssary. Before we call `populate_value()`, ...
>
> I do not care if it is not needed with the "current" set of atoms we
> accept.  The example "pushe", which obviously will not be accepted
> by today's code, was all about future-proofing.

Well, what you said makes sense, now I think something like this will be
more secure. In the future, other people may be grateful for such a choice. :)

@@ -1730,7 +1759,7 @@ static int populate_value(struct ref_array_item
*ref, struct strbuf *err)
                        else
                                v->s = xstrdup("");
                        continue;
-               } else if (starts_with(name, "push")) {
+               } else if (starts_with(name, "push") &&
atom->u.remote_ref.push) {


--
ZheNing Hu
Junio C Hamano May 6, 2021, 9:20 p.m. UTC | #7
ZheNing Hu <adlternative@gmail.com> writes:

> ... now I think something like this will be more secure. In the
> future, other people may be grateful for such a choice. :)
>
> @@ -1730,7 +1759,7 @@ static int populate_value(struct ref_array_item
> *ref, struct strbuf *err)
>                         else
>                                 v->s = xstrdup("");
>                         continue;
> -               } else if (starts_with(name, "push")) {
> +               } else if (starts_with(name, "push") &&
> atom->u.remote_ref.push) {

Hmph, I do no think that would be futureproof at all.  If a new atom
"pushe" gets added, it is not all that unlikely that it would add
its own member to the same union.  name here would be "pushe" and
starts with "push", and upon parsing "pushe", its own member in the
union, atom->u.X, would have been assigned to, but the code we see
here still accesses atom->u.remote_ref.*, so you still have the same
problem you started to solve, no?

The check we use in remote_ref_atom_parser() to see if the atom is
about pushing, i.e.

	if (!strcmp(atom->name, "push") || starts_with(atom->name, "push:"))

is unlikely to be invalidated in future changes, as this is very
much end-user facing and changing the condition will break existing
scripts, so that is what I was expecting you to use instead.
ZheNing Hu May 7, 2021, 4:32 a.m. UTC | #8
> Hmph, I do no think that would be futureproof at all.  If a new atom
> "pushe" gets added, it is not all that unlikely that it would add
> its own member to the same union.  name here would be "pushe" and
> starts with "push", and upon parsing "pushe", its own member in the
> union, atom->u.X, would have been assigned to, but the code we see
> here still accesses atom->u.remote_ref.*, so you still have the same
> problem you started to solve, no?
>

Well, this "pushe" has `atom->u.X` which is different from "push" and
"push:". This is indeed worth worrying about.

> The check we use in remote_ref_atom_parser() to see if the atom is
> about pushing, i.e.
>
>         if (!strcmp(atom->name, "push") || starts_with(atom->name, "push:"))
>
> is unlikely to be invalidated in future changes, as this is very
> much end-user facing and changing the condition will break existing
> scripts, so that is what I was expecting you to use instead.
>

But I am afraid that the cost we paid for string matching here is too high,
Since the string matching here is to illustrate that we use one of the members
of the union, why can't we directly add a "union_type" member to used_atom?
if we have this, we can just switch the "union_type" flag and also make the
program correct. Perhaps this would be better than the large number of string
matches have done here.

>
>

Thanks.
--
ZheNing Hu
Junio C Hamano May 7, 2021, 4:49 a.m. UTC | #9
ZheNing Hu <adlternative@gmail.com> writes:

> But I am afraid that the cost we paid for string matching here is too high,

If that is truly the concern (I do not know without measuring),
perhaps we should add a member next to the union to say which one of
the union members is valid, so that you can say

    if (atom->atom_type == ATOM_TYPE_REMOTE_REF &&
        atom->u.remote_ref.push)

(introduce an enum and define ATOM_TYPE_* after the member in the
union).

That would help futureproofing the code even further, as a new
synonym of "push" introduced laster [*] would not invalidate the check you are
adding there.


[Footnote]

* remote_ref_atom_parser() in the future may begin like so:

-	if (!strcmp(atom->name, "push") || starts_with(atom->name, "push:"))
+	if (!strcmp(atom->name, "push") || starts_with(atom->name, "push:") ||
+           !strcmp(atom->name, "a-synonym-for-push"))
		atom->u.remote_ref.push = 1;
ZheNing Hu May 7, 2021, 5:09 a.m. UTC | #10
> If that is truly the concern (I do not know without measuring),
> perhaps we should add a member next to the union to say which one of
> the union members is valid, so that you can say
>
>     if (atom->atom_type == ATOM_TYPE_REMOTE_REF &&
>         atom->u.remote_ref.push)
>
> (introduce an enum and define ATOM_TYPE_* after the member in the
> union).
>

Yes, I think so. Since the content of this part needs to be modified for
 the parsing of all atoms, I will put it in a separate topic to complete.

> That would help futureproofing the code even further, as a new
> synonym of "push" introduced laster [*] would not invalidate the check you are
> adding there.
>

Yes, this enhances its generalization ability.

>
> [Footnote]
>
> * remote_ref_atom_parser() in the future may begin like so:
>
> -       if (!strcmp(atom->name, "push") || starts_with(atom->name, "push:"))
> +       if (!strcmp(atom->name, "push") || starts_with(atom->name, "push:") ||
> +           !strcmp(atom->name, "a-synonym-for-push"))
>                 atom->u.remote_ref.push = 1;
>

--
ZheNing Hu
diff mbox series

Patch

diff --git a/ref-filter.c b/ref-filter.c
index a0adb4551d87..f467f2fbbb73 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1730,7 +1730,7 @@  static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 			else
 				v->s = xstrdup("");
 			continue;
-		} else if (atom->u.remote_ref.push) {
+		} else if (starts_with(name, "push")) {
 			const char *branch_name;
 			v->s = xstrdup("");
 			if (!skip_prefix(ref->refname, "refs/heads/",