Message ID | 20210323094025.3569441-9-armbru@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | qapi: Enforce naming rules | expand |
On 3/23/21 4:40 AM, Markus Armbruster wrote: > Flat union tag values get checked twice: as enum member name, and as > union branch name. The former accepts leading digits, the latter > doesn't. The restriction feels arbitrary. Skip the latter check. > > This can expose c_name() to input it can't handle: a name starting > with a digit. Improve it to return a valid C identifier for any > input. > > Signed-off-by: Markus Armbruster <armbru@redhat.com> > --- > scripts/qapi/common.py | 8 ++++---- > scripts/qapi/expr.py | 4 +++- > 2 files changed, 7 insertions(+), 5 deletions(-) > Reviewed-by: Eric Blake <eblake@redhat.com>
On 3/23/21 5:40 AM, Markus Armbruster wrote: > Flat union tag values get checked twice: as enum member name, and as > union branch name. The former accepts leading digits, the latter > doesn't. The restriction feels arbitrary. Skip the latter check. > > This can expose c_name() to input it can't handle: a name starting > with a digit. Improve it to return a valid C identifier for any > input. > > Signed-off-by: Markus Armbruster <armbru@redhat.com> Anything in particular inspire this? > --- > scripts/qapi/common.py | 8 ++++---- > scripts/qapi/expr.py | 4 +++- > 2 files changed, 7 insertions(+), 5 deletions(-) > > diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py > index 11b86beeab..cbd3fd81d3 100644 > --- a/scripts/qapi/common.py > +++ b/scripts/qapi/common.py > @@ -18,7 +18,6 @@ > #: Magic string that gets removed along with all space to its right. > EATSPACE = '\033EATSPACE.' > POINTER_SUFFIX = ' *' + EATSPACE > -_C_NAME_TRANS = str.maketrans('.-', '__') > > > def camel_to_upper(value: str) -> str: > @@ -109,9 +108,10 @@ def c_name(name: str, protect: bool = True) -> str: > 'not_eq', 'or', 'or_eq', 'xor', 'xor_eq']) > # namespace pollution: > polluted_words = set(['unix', 'errno', 'mips', 'sparc', 'i386']) > - name = name.translate(_C_NAME_TRANS) > - if protect and (name in c89_words | c99_words | c11_words | gcc_words > - | cpp_words | polluted_words): > + name = re.sub(r'[^A-Za-z0-9_]', '_', name) The regex gets a little more powerful now. Instead of just . and - we now translate *everything* that's not an alphanumeric to _. Does this have a visible impact anywhere, or are we constrained somewhere else? > + if protect and (name in (c89_words | c99_words | c11_words | gcc_words > + | cpp_words | polluted_words) > + or name[0].isdigit()): Worth touching the docstring? :param protect: If true, avoid returning certain ticklish identifiers (like C keywords) by prepending ``q_``. I know the formatting is a hot mess, but I still intend to get to that "all at once" with an "enable sphinx" pass later, so just touching the content so it gets included in that pass would be fine (to me, at least.) > return 'q_' + name > return name > > diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py > index cf09fa9fd3..507550c340 100644 > --- a/scripts/qapi/expr.py > +++ b/scripts/qapi/expr.py > @@ -246,7 +246,9 @@ def check_union(expr, info): > > for (key, value) in members.items(): > source = "'data' member '%s'" % key > - check_name_str(key, info, source) > + if discriminator is None: > + check_name_str(key, info, source) > + # else: name is in discriminator enum, which gets checked I suppose the alternative would be to have allowed check_name_str to take an 'allow_leading_digits' parameter (instead of 'enum_member') and set that to true here and just deal with the mild inefficiency. I might have a slight preference to just accept the inefficiency so that it's obvious that it's checked regardless of the discriminator condition, buuuuut not enough to be pushy about it, I think. > check_keys(value, info, source, ['type'], ['if']) > check_if(value, info, source) > check_type(value['type'], info, source, allow_array=not base) >
John Snow <jsnow@redhat.com> writes: > On 3/23/21 5:40 AM, Markus Armbruster wrote: >> Flat union tag values get checked twice: as enum member name, and as >> union branch name. The former accepts leading digits, the latter >> doesn't. The restriction feels arbitrary. Skip the latter check. >> This can expose c_name() to input it can't handle: a name starting >> with a digit. Improve it to return a valid C identifier for any >> input. >> Signed-off-by: Markus Armbruster <armbru@redhat.com> > > Anything in particular inspire this? Just a desire for keeping things simple. "Any enum type works as discriminator" is simpler than "any enum works, but branches corresponding to enum values starting with a digit cannot have members". Let me elaborate. This works: {'enum': 'Enu', 'data': ['0', 'eins', '2']} {'struct': 'St', 'data': {'s': 'str'}} {'union': 'Uni', 'base': {'type': 'Enu'}, 'discriminator': 'type', 'data': { 'eins': 'St'}} But if you change the last line to '0': 'St'}} you get told off: scripts/qapi-gen.py: /dev/stdin: In union 'Uni': /dev/stdin:3: 'data' member '0' has an invalid name > >> --- >> scripts/qapi/common.py | 8 ++++---- >> scripts/qapi/expr.py | 4 +++- >> 2 files changed, 7 insertions(+), 5 deletions(-) >> diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py >> index 11b86beeab..cbd3fd81d3 100644 >> --- a/scripts/qapi/common.py >> +++ b/scripts/qapi/common.py >> @@ -18,7 +18,6 @@ >> #: Magic string that gets removed along with all space to its right. >> EATSPACE = '\033EATSPACE.' >> POINTER_SUFFIX = ' *' + EATSPACE >> -_C_NAME_TRANS = str.maketrans('.-', '__') >> >> def camel_to_upper(value: str) -> str: >> @@ -109,9 +108,10 @@ def c_name(name: str, protect: bool = True) -> str: >> 'not_eq', 'or', 'or_eq', 'xor', 'xor_eq']) >> # namespace pollution: >> polluted_words = set(['unix', 'errno', 'mips', 'sparc', 'i386']) >> - name = name.translate(_C_NAME_TRANS) >> - if protect and (name in c89_words | c99_words | c11_words | gcc_words >> - | cpp_words | polluted_words): >> + name = re.sub(r'[^A-Za-z0-9_]', '_', name) > > The regex gets a little more powerful now. Instead of just . and - we > now translate *everything* that's not an alphanumeric to _. Yes. > Does this have a visible impact anywhere, or are we constrained > somewhere else? If it had, we'd generate C that doesn't compile. Except when we're unlucky. Then it compiles, but means something different than we want. I need to catch "C identifiers can't start with a digit" to make the remainder of the patch work (right below). Instead of doing just that, I chose to do a complete job, and ensure the function always returns a valid C identifier. >> + if protect and (name in (c89_words | c99_words | c11_words | gcc_words >> + | cpp_words | polluted_words) >> + or name[0].isdigit()): > > Worth touching the docstring? > > :param protect: If true, avoid returning certain ticklish identifiers > (like C keywords) by prepending ``q_``. It doesn't become wrong. Care to suggest an improvement? > I know the formatting is a hot mess, but I still intend to get to that > "all at once" with an "enable sphinx" pass later, so just touching the > content so it gets included in that pass would be fine (to me, at least.) > >> return 'q_' + name >> return name >> diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py >> index cf09fa9fd3..507550c340 100644 >> --- a/scripts/qapi/expr.py >> +++ b/scripts/qapi/expr.py >> @@ -246,7 +246,9 @@ def check_union(expr, info): >> for (key, value) in members.items(): >> source = "'data' member '%s'" % key >> - check_name_str(key, info, source) >> + if discriminator is None: >> + check_name_str(key, info, source) >> + # else: name is in discriminator enum, which gets checked > > I suppose the alternative would be to have allowed check_name_str to > take an 'allow_leading_digits' parameter (instead of 'enum_member') > and set that to true here and just deal with the mild inefficiency. > > I might have a slight preference to just accept the inefficiency so > that it's obvious that it's checked regardless of the discriminator > condition, buuuuut not enough to be pushy about it, I think. Sounds like a defensible idea, but this series is heading in the opposite direction; see the next few patches. >> check_keys(value, info, source, ['type'], ['if']) >> check_if(value, info, source) >> check_type(value['type'], info, source, allow_array=not base) >>
Markus Armbruster <armbru@redhat.com> writes: > John Snow <jsnow@redhat.com> writes: > >> On 3/23/21 5:40 AM, Markus Armbruster wrote: >>> Flat union tag values get checked twice: as enum member name, and as >>> union branch name. The former accepts leading digits, the latter >>> doesn't. The restriction feels arbitrary. Skip the latter check. >>> >>> This can expose c_name() to input it can't handle: a name starting >>> with a digit. Improve it to return a valid C identifier for any >>> input. >>> >>> Signed-off-by: Markus Armbruster <armbru@redhat.com> >> >> Anything in particular inspire this? > > Just a desire for keeping things simple. "Any enum type works as > discriminator" is simpler than "any enum works, but branches > corresponding to enum values starting with a digit cannot have members". > Let me elaborate. > > This works: > > {'enum': 'Enu', 'data': ['0', 'eins', '2']} > {'struct': 'St', 'data': {'s': 'str'}} > {'union': 'Uni', > 'base': {'type': 'Enu'}, > 'discriminator': 'type', > 'data': { > 'eins': 'St'}} > > But if you change the last line to > > '0': 'St'}} > > you get told off: > > scripts/qapi-gen.py: /dev/stdin: In union 'Uni': > /dev/stdin:3: 'data' member '0' has an invalid name Improved commit message: qapi: Permit flat union members for any tag value Flat union branch names match the tag enum's member names. Omitted branches default to "no members for this tag value". Branch names starting with a digit get rejected like "'data' member '0' has an invalid name". However, omitting the branch works. This is because flat union tag values get checked twice: as enum member name, and as union branch name. The former accepts leading digits, the latter doesn't. Branches whose names start with a digit therefore cannot have members. Feels wrong. Get rid of the restriction by skipping the latter check. This can expose c_name() to input it can't handle: a name starting with a digit. Improve it to return a valid C identifier for any input. Signed-off-by: Markus Armbruster <armbru@redhat.com> [...]
diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py index 11b86beeab..cbd3fd81d3 100644 --- a/scripts/qapi/common.py +++ b/scripts/qapi/common.py @@ -18,7 +18,6 @@ #: Magic string that gets removed along with all space to its right. EATSPACE = '\033EATSPACE.' POINTER_SUFFIX = ' *' + EATSPACE -_C_NAME_TRANS = str.maketrans('.-', '__') def camel_to_upper(value: str) -> str: @@ -109,9 +108,10 @@ def c_name(name: str, protect: bool = True) -> str: 'not_eq', 'or', 'or_eq', 'xor', 'xor_eq']) # namespace pollution: polluted_words = set(['unix', 'errno', 'mips', 'sparc', 'i386']) - name = name.translate(_C_NAME_TRANS) - if protect and (name in c89_words | c99_words | c11_words | gcc_words - | cpp_words | polluted_words): + name = re.sub(r'[^A-Za-z0-9_]', '_', name) + if protect and (name in (c89_words | c99_words | c11_words | gcc_words + | cpp_words | polluted_words) + or name[0].isdigit()): return 'q_' + name return name diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py index cf09fa9fd3..507550c340 100644 --- a/scripts/qapi/expr.py +++ b/scripts/qapi/expr.py @@ -246,7 +246,9 @@ def check_union(expr, info): for (key, value) in members.items(): source = "'data' member '%s'" % key - check_name_str(key, info, source) + if discriminator is None: + check_name_str(key, info, source) + # else: name is in discriminator enum, which gets checked check_keys(value, info, source, ['type'], ['if']) check_if(value, info, source) check_type(value['type'], info, source, allow_array=not base)
Flat union tag values get checked twice: as enum member name, and as union branch name. The former accepts leading digits, the latter doesn't. The restriction feels arbitrary. Skip the latter check. This can expose c_name() to input it can't handle: a name starting with a digit. Improve it to return a valid C identifier for any input. Signed-off-by: Markus Armbruster <armbru@redhat.com> --- scripts/qapi/common.py | 8 ++++---- scripts/qapi/expr.py | 4 +++- 2 files changed, 7 insertions(+), 5 deletions(-)