Message ID | 20190403153552.23461-3-ben.dooks@codethink.co.uk (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Series | [1/6] validation: ignore temporary ~ files | expand |
On Wed, Apr 03, 2019 at 04:35:48PM +0100, Ben Dooks wrote: > Add code to parse the __attribute__((format)) used to indicate that > a variadic function takes a printf-style format string and where > those are. Save the data in ctype ready for checking when such an > function is encoutered. s/.../encountered/ > Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> > --- > Fixes since v1: > - moved to using ctype in base_type to store infromation > - fixed formatting issues > - updated check for bad format arguments > - reduced the line count to unsigned short to save space > > Fixes since v2: > - correctly use the decl->ctype to store printf information > - fixed issues with checking format positions for va_list code > - parse but ignore scanf formatting for now > > Fixes since v4: > - deal with function pointers losing format info > --- > parse.c | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++- > symbol.h | 2 ++ > 2 files changed, 103 insertions(+), 1 deletion(-) > > diff --git a/parse.c b/parse.c > index f291e24..42ba457 100644 > --- a/parse.c > +++ b/parse.c > @@ -136,6 +136,10 @@ static void asm_modifier_inline(struct token *token, unsigned long *mods) > asm_modifier(token, mods, MOD_INLINE); > } > > +enum { > + FmtPrintf = 0, FmtScanf, It would be good to add a small comment explaining the purpose. Something like: "for the different kinds of 'format' attributes". > @@ -1172,6 +1194,74 @@ static struct token *attribute_address_space(struct token *token, struct symbol > return token; > } > > +static int invalid_printf_format_args(long long start, long long at) > +{ > + return start < 0 || at < 0 || (start == at && start > 0) || > + (start == 0 && at == 0); > +} > + > +static struct token *attribute_format(struct token *token, struct symbol *attr, struct decl_state *ctx) > +{ > + struct expression *args[3]; > + struct symbol *fmt_sym = NULL; > + int argc = 0; > + > + /* expecting format ( type, start, va_args at) */ > + > + token = expect(token, '(', "after format attribute"); > + while (!match_op(token, ')')) { > + struct expression *expr = NULL; > + > + if (argc == 0) { > + if (token_type(token) == TOKEN_IDENT) > + fmt_sym = lookup_keyword(token->ident, NS_KEYWORD); > + > + if (!fmt_sym || !fmt_sym->op || > + fmt_sym->op->type != KW_FORMAT) { > + sparse_error(token->pos, > + "unknown format type '%s'\n", > + show_ident(token->ident)); For now, it's better to just ignore unknown format type. > + fmt_sym = NULL; > + } > + } > + > + token = conditional_expression(token, &expr); > + if (!expr) > + break; > + if (argc < 3) > + args[argc++] = expr; > + if (!match_op(token, ',')) > + break; > + token = token->next; > + } > + > + if (argc != 3 || !fmt_sym) { > + warning(token->pos, "incorrect format attribute"); Since there is no optional argument, I think it would be clearer to not use a loop but instead do something like: token = expect(token, '(', "after format attribute"); ... check for the KW_FORMAT ... token = expect(token, ',', ... token = conditional_expression(token, &expr1); token = expect(token, ',', ... token = conditional_expression(token, &expr2); token = expect(token, ')', ... > @@ -2981,8 +3071,18 @@ struct token *external_declaration(struct token *token, struct symbol_list **lis > > if (!(decl->ctype.modifiers & MOD_STATIC)) > decl->ctype.modifiers |= MOD_EXTERN; > + > + base_type->ctype.printf_msg = decl->ctype.printf_msg; > + base_type->ctype.printf_va_start = decl->ctype.printf_va_start; > } else if (base_type == &void_ctype && !(decl->ctype.modifiers & MOD_EXTERN)) { > sparse_error(token->pos, "void declaration"); > + } else if (base_type && base_type->type == SYM_PTR) { > + // think this is correct to do // > + struct symbol *ptr_base = get_base_type(base_type); > + if (ptr_base) { > + ptr_base->ctype.printf_msg = decl->ctype.printf_msg; > + ptr_base->ctype.printf_va_start = decl->ctype.printf_va_start; > + } I'll need to check this. What situation is this supposed to cover? > diff --git a/symbol.h b/symbol.h > index ac43b31..7bb6f29 100644 > --- a/symbol.h > +++ b/symbol.h > @@ -103,6 +104,7 @@ struct ctype { > struct context_list *contexts; > struct ident *as; > struct symbol *base_type; > + unsigned short printf_va_start, printf_msg; > }; Note (mainly to myself): it's unfortunate that this will make struct ctype bigger but without some generic solution for attributes' paramaters, this is OK. -- Luc
On 03/04/2019 21:46, Luc Van Oostenryck wrote: > On Wed, Apr 03, 2019 at 04:35:48PM +0100, Ben Dooks wrote: >> Add code to parse the __attribute__((format)) used to indicate that >> a variadic function takes a printf-style format string and where >> those are. Save the data in ctype ready for checking when such an >> function is encoutered. > > s/.../encountered/ > >> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> >> --- >> Fixes since v1: >> - moved to using ctype in base_type to store infromation >> - fixed formatting issues >> - updated check for bad format arguments >> - reduced the line count to unsigned short to save space >> >> Fixes since v2: >> - correctly use the decl->ctype to store printf information >> - fixed issues with checking format positions for va_list code >> - parse but ignore scanf formatting for now >> >> Fixes since v4: >> - deal with function pointers losing format info >> --- >> parse.c | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++- >> symbol.h | 2 ++ >> 2 files changed, 103 insertions(+), 1 deletion(-) >> >> diff --git a/parse.c b/parse.c >> index f291e24..42ba457 100644 >> --- a/parse.c >> +++ b/parse.c >> @@ -136,6 +136,10 @@ static void asm_modifier_inline(struct token *token, unsigned long *mods) >> asm_modifier(token, mods, MOD_INLINE); >> } >> >> +enum { >> + FmtPrintf = 0, FmtScanf, > > It would be good to add a small comment explaining the purpose. > Something like: "for the different kinds of 'format' attributes". > ok, fixed > @@ -1172,6 +1194,74 @@ static struct token *attribute_address_space(struct token *token, struct symbol >> return token; >> } >> >> +static int invalid_printf_format_args(long long start, long long at) >> +{ >> + return start < 0 || at < 0 || (start == at && start > 0) || >> + (start == 0 && at == 0); >> +} >> + >> +static struct token *attribute_format(struct token *token, struct symbol *attr, struct decl_state *ctx) >> +{ >> + struct expression *args[3]; >> + struct symbol *fmt_sym = NULL; >> + int argc = 0; >> + >> + /* expecting format ( type, start, va_args at) */ >> + >> + token = expect(token, '(', "after format attribute"); >> + while (!match_op(token, ')')) { >> + struct expression *expr = NULL; >> + >> + if (argc == 0) { >> + if (token_type(token) == TOKEN_IDENT) >> + fmt_sym = lookup_keyword(token->ident, NS_KEYWORD); >> + >> + if (!fmt_sym || !fmt_sym->op || >> + fmt_sym->op->type != KW_FORMAT) { >> + sparse_error(token->pos, >> + "unknown format type '%s'\n", >> + show_ident(token->ident)); > > For now, it's better to just ignore unknown format type. > >> + fmt_sym = NULL; >> + } >> + } >> + >> + token = conditional_expression(token, &expr); >> + if (!expr) >> + break; >> + if (argc < 3) >> + args[argc++] = expr; >> + if (!match_op(token, ',')) >> + break; >> + token = token->next; >> + } >> + >> + if (argc != 3 || !fmt_sym) { >> + warning(token->pos, "incorrect format attribute"); > > Since there is no optional argument, I think it would be clearer > to not use a loop but instead do something like: > token = expect(token, '(', "after format attribute"); > ... check for the KW_FORMAT ... > token = expect(token, ',', ... > token = conditional_expression(token, &expr1); > token = expect(token, ',', ... > token = conditional_expression(token, &expr2); > token = expect(token, ')', ... ok, i'll change that >> @@ -2981,8 +3071,18 @@ struct token *external_declaration(struct token *token, struct symbol_list **lis >> >> if (!(decl->ctype.modifiers & MOD_STATIC)) >> decl->ctype.modifiers |= MOD_EXTERN; >> + >> + base_type->ctype.printf_msg = decl->ctype.printf_msg; >> + base_type->ctype.printf_va_start = decl->ctype.printf_va_start; >> } else if (base_type == &void_ctype && !(decl->ctype.modifiers & MOD_EXTERN)) { >> sparse_error(token->pos, "void declaration"); >> + } else if (base_type && base_type->type == SYM_PTR) { >> + // think this is correct to do // >> + struct symbol *ptr_base = get_base_type(base_type); >> + if (ptr_base) { >> + ptr_base->ctype.printf_msg = decl->ctype.printf_msg; >> + ptr_base->ctype.printf_va_start = decl->ctype.printf_va_start; >> + } > > I'll need to check this. > What situation is this supposed to cover? definition of pointers to functions, ie: int (*ptr)(args..) __attribute__((format...)); >> diff --git a/symbol.h b/symbol.h >> index ac43b31..7bb6f29 100644 >> --- a/symbol.h >> +++ b/symbol.h >> @@ -103,6 +104,7 @@ struct ctype { >> struct context_list *contexts; >> struct ident *as; >> struct symbol *base_type; >> + unsigned short printf_va_start, printf_msg; >> }; > > Note (mainly to myself): it's unfortunate that this will make > struct ctype bigger but without some generic solution for attributes' > paramaters, this is OK. ok, thanks.
On Thu, Apr 04, 2019 at 10:53:08AM +0100, Ben Dooks wrote: > On 03/04/2019 21:46, Luc Van Oostenryck wrote: > > On Wed, Apr 03, 2019 at 04:35:48PM +0100, Ben Dooks wrote: > > > @@ -2981,8 +3071,18 @@ struct token *external_declaration(struct token *token, struct symbol_list **lis > > > if (!(decl->ctype.modifiers & MOD_STATIC)) > > > decl->ctype.modifiers |= MOD_EXTERN; > > > + > > > + base_type->ctype.printf_msg = decl->ctype.printf_msg; > > > + base_type->ctype.printf_va_start = decl->ctype.printf_va_start; > > > } else if (base_type == &void_ctype && !(decl->ctype.modifiers & MOD_EXTERN)) { > > > sparse_error(token->pos, "void declaration"); > > > + } else if (base_type && base_type->type == SYM_PTR) { > > > + // think this is correct to do // > > > + struct symbol *ptr_base = get_base_type(base_type); > > > + if (ptr_base) { > > > + ptr_base->ctype.printf_msg = decl->ctype.printf_msg; > > > + ptr_base->ctype.printf_va_start = decl->ctype.printf_va_start; > > > + } > > > > I'll need to check this. > > What situation is this supposed to cover? > > definition of pointers to functions, ie: > > int (*ptr)(args..) __attribute__((format...)); But this is not the correct syntax as it applies the attribute to the declared object 'ptr', a pointer type and not to the underlying function type. The syntax used in the kernel is (cfr. kdb_printf_t): __attribute__((...)) int (*ptr)(args...); and GCC's manual also specifies: int (__attribute__((...)) *ptr)(args...); (but I'm not sure this one is correctly supported by sparse). It would be good to issue a warning if the 'format' attribute is applied to anything that is not a function type. -- Luc
iOn 04/04/2019 12:48, Luc Van Oostenryck wrote: > On Thu, Apr 04, 2019 at 10:53:08AM +0100, Ben Dooks wrote: >> On 03/04/2019 21:46, Luc Van Oostenryck wrote: >>> On Wed, Apr 03, 2019 at 04:35:48PM +0100, Ben Dooks wrote: >>>> @@ -2981,8 +3071,18 @@ struct token *external_declaration(struct token *token, struct symbol_list **lis >>>> if (!(decl->ctype.modifiers & MOD_STATIC)) >>>> decl->ctype.modifiers |= MOD_EXTERN; >>>> + >>>> + base_type->ctype.printf_msg = decl->ctype.printf_msg; >>>> + base_type->ctype.printf_va_start = decl->ctype.printf_va_start; >>>> } else if (base_type == &void_ctype && !(decl->ctype.modifiers & MOD_EXTERN)) { >>>> sparse_error(token->pos, "void declaration"); >>>> + } else if (base_type && base_type->type == SYM_PTR) { >>>> + // think this is correct to do // >>>> + struct symbol *ptr_base = get_base_type(base_type); >>>> + if (ptr_base) { >>>> + ptr_base->ctype.printf_msg = decl->ctype.printf_msg; >>>> + ptr_base->ctype.printf_va_start = decl->ctype.printf_va_start; >>>> + } >>> >>> I'll need to check this. >>> What situation is this supposed to cover? >> >> definition of pointers to functions, ie: >> >> int (*ptr)(args..) __attribute__((format...)); > > But this is not the correct syntax as it applies the attribute > to the declared object 'ptr', a pointer type and not to the > underlying function type. > > The syntax used in the kernel is (cfr. kdb_printf_t): > __attribute__((...)) int (*ptr)(args...); > and GCC's manual also specifies: > int (__attribute__((...)) *ptr)(args...); > (but I'm not sure this one is correctly supported by sparse). > > It would be good to issue a warning if the 'format' attribute is > applied to anything that is not a function type. ok, gcc doesn't seem to issue a warning about that. in fact, it seems happy about: int main(void) { int (*ptr)(const char *msg, ...) __attribute__((format(printf,1,2))); (ptr)("moose %s %d\n", 1, 2); return 0; } gcc -O2 -Wall test4.c test4.c: In function ‘main’: test4.c:6:17: warning: format ‘%s’ expects argument of type ‘char *’, but argument 2 has type ‘int’ [-Wformat=] (ptr)("moose %s %d\n", 1, 2); ~^ ~ %d
On Thu, Apr 04, 2019 at 01:05:29PM +0100, Ben Dooks wrote: > iOn 04/04/2019 12:48, Luc Van Oostenryck wrote: > > On Thu, Apr 04, 2019 at 10:53:08AM +0100, Ben Dooks wrote: > > > On 03/04/2019 21:46, Luc Van Oostenryck wrote: > > > > On Wed, Apr 03, 2019 at 04:35:48PM +0100, Ben Dooks wrote: > > > > > @@ -2981,8 +3071,18 @@ struct token *external_declaration(struct token *token, struct symbol_list **lis > > > > > if (!(decl->ctype.modifiers & MOD_STATIC)) > > > > > decl->ctype.modifiers |= MOD_EXTERN; > > > > > + > > > > > + base_type->ctype.printf_msg = decl->ctype.printf_msg; > > > > > + base_type->ctype.printf_va_start = decl->ctype.printf_va_start; > > > > > } else if (base_type == &void_ctype && !(decl->ctype.modifiers & MOD_EXTERN)) { > > > > > sparse_error(token->pos, "void declaration"); > > > > > + } else if (base_type && base_type->type == SYM_PTR) { > > > > > + // think this is correct to do // > > > > > + struct symbol *ptr_base = get_base_type(base_type); > > > > > + if (ptr_base) { > > > > > + ptr_base->ctype.printf_msg = decl->ctype.printf_msg; > > > > > + ptr_base->ctype.printf_va_start = decl->ctype.printf_va_start; > > > > > + } > > > > > > > > I'll need to check this. > > > > What situation is this supposed to cover? > > > > > > definition of pointers to functions, ie: > > > > > > int (*ptr)(args..) __attribute__((format...)); > > > > But this is not the correct syntax as it applies the attribute > > to the declared object 'ptr', a pointer type and not to the > > underlying function type. > > > > The syntax used in the kernel is (cfr. kdb_printf_t): > > __attribute__((...)) int (*ptr)(args...); > > and GCC's manual also specifies: > > int (__attribute__((...)) *ptr)(args...); > > (but I'm not sure this one is correctly supported by sparse). > > > > It would be good to issue a warning if the 'format' attribute is > > applied to anything that is not a function type. > > ok, gcc doesn't seem to issue a warning about that. > in fact, it seems happy about: > > int main(void) > { > int (*ptr)(const char *msg, ...) __attribute__((format(printf,1,2))); > > (ptr)("moose %s %d\n", 1, 2); > return 0; > } > > gcc -O2 -Wall test4.c > > test4.c: In function ‘main’: > test4.c:6:17: warning: format ‘%s’ expects argument of type > ‘char *’, but argument 2 has type ‘int’ [-Wformat=] > (ptr)("moose %s %d\n", 1, 2); > ~^ ~ > %d OK. I'm not very surprised, as the manual also says: "... at present the <x> attribute applies to 'f', which causes a warning since 'f' is not a function, but in future it may apply to the function '****f'. The precise semantics of what attribute in such cases will apply to are not yet specified." I propose for the moment to just support the simple case used for the kernel (some other changes in the handling of attributes in declarations are anyway needed). -- Luc
diff --git a/parse.c b/parse.c index f291e24..42ba457 100644 --- a/parse.c +++ b/parse.c @@ -87,7 +87,7 @@ static attr_t attribute_address_space, attribute_context, attribute_designated_init, attribute_transparent_union, ignore_attribute, - attribute_mode, attribute_force; + attribute_mode, attribute_force, attribute_format; typedef struct symbol *to_mode_t(struct symbol *); @@ -136,6 +136,10 @@ static void asm_modifier_inline(struct token *token, unsigned long *mods) asm_modifier(token, mods, MOD_INLINE); } +enum { + FmtPrintf = 0, FmtScanf, +}; + static struct symbol_op typedef_op = { .type = KW_MODIFIER, .declarator = typedef_specifier, @@ -386,6 +390,10 @@ static struct symbol_op attr_force_op = { .attribute = attribute_force, }; +static struct symbol_op attr_format = { + .attribute = attribute_format, +}; + static struct symbol_op address_space_op = { .attribute = attribute_address_space, }; @@ -445,6 +453,16 @@ static struct symbol_op mode_word_op = { .to_mode = to_word_mode }; +static struct symbol_op attr_printf_op = { + .type = KW_FORMAT, + .class = FmtPrintf, +}; + +static struct symbol_op attr_scanf_op = { + .type = KW_FORMAT, + .class = FmtScanf, +}; + /* Using NS_TYPEDEF will also make the keyword a reserved one */ static struct init_keyword { const char *name; @@ -570,6 +588,10 @@ static struct init_keyword { {"externally_visible", NS_KEYWORD, .op = &ext_visible_op }, {"__externally_visible__", NS_KEYWORD, .op = &ext_visible_op }, + { "format", NS_KEYWORD, .op = &attr_format }, + { "printf", NS_KEYWORD, .op = &attr_printf_op }, + { "scanf", NS_KEYWORD, .op = &attr_scanf_op }, + { "mode", NS_KEYWORD, .op = &mode_op }, { "__mode__", NS_KEYWORD, .op = &mode_op }, { "QI", NS_KEYWORD, .op = &mode_QI_op }, @@ -1172,6 +1194,74 @@ static struct token *attribute_address_space(struct token *token, struct symbol return token; } +static int invalid_printf_format_args(long long start, long long at) +{ + return start < 0 || at < 0 || (start == at && start > 0) || + (start == 0 && at == 0); +} + +static struct token *attribute_format(struct token *token, struct symbol *attr, struct decl_state *ctx) +{ + struct expression *args[3]; + struct symbol *fmt_sym = NULL; + int argc = 0; + + /* expecting format ( type, start, va_args at) */ + + token = expect(token, '(', "after format attribute"); + while (!match_op(token, ')')) { + struct expression *expr = NULL; + + if (argc == 0) { + if (token_type(token) == TOKEN_IDENT) + fmt_sym = lookup_keyword(token->ident, NS_KEYWORD); + + if (!fmt_sym || !fmt_sym->op || + fmt_sym->op->type != KW_FORMAT) { + sparse_error(token->pos, + "unknown format type '%s'\n", + show_ident(token->ident)); + fmt_sym = NULL; + } + } + + token = conditional_expression(token, &expr); + if (!expr) + break; + if (argc < 3) + args[argc++] = expr; + if (!match_op(token, ',')) + break; + token = token->next; + } + + if (argc != 3 || !fmt_sym) { + warning(token->pos, "incorrect format attribute"); + } else if (fmt_sym->op->class != FmtPrintf) { + /* skip anything that isn't printf for the moment */ + warning(token->pos, "only printf format attribute supported"); + } else { + long long start, at; + + start = get_expression_value(args[2]); + at = get_expression_value(args[1]); + + if (invalid_printf_format_args(start, at)) { + warning(token->pos, "bad format positions"); + } else if (start == 0) { + /* nothing to do here, is va_list function */ + } else if (start < at) { + warning(token->pos, "format cannot be after va_args"); + } else { + ctx->ctype.printf_va_start = start; + ctx->ctype.printf_msg = at; + } + } + + token = expect(token, ')', "after format attribute"); + return token; +} + static struct symbol *to_QI_mode(struct symbol *ctype) { if (ctype->ctype.base_type != &int_type) @@ -2981,8 +3071,18 @@ struct token *external_declaration(struct token *token, struct symbol_list **lis if (!(decl->ctype.modifiers & MOD_STATIC)) decl->ctype.modifiers |= MOD_EXTERN; + + base_type->ctype.printf_msg = decl->ctype.printf_msg; + base_type->ctype.printf_va_start = decl->ctype.printf_va_start; } else if (base_type == &void_ctype && !(decl->ctype.modifiers & MOD_EXTERN)) { sparse_error(token->pos, "void declaration"); + } else if (base_type && base_type->type == SYM_PTR) { + // think this is correct to do // + struct symbol *ptr_base = get_base_type(base_type); + if (ptr_base) { + ptr_base->ctype.printf_msg = decl->ctype.printf_msg; + ptr_base->ctype.printf_va_start = decl->ctype.printf_va_start; + } } if (base_type == &incomplete_ctype) { warning(decl->pos, "'%s' has implicit type", show_ident(decl->ident)); diff --git a/symbol.h b/symbol.h index ac43b31..7bb6f29 100644 --- a/symbol.h +++ b/symbol.h @@ -86,6 +86,7 @@ enum keyword { KW_SHORT = 1 << 7, KW_LONG = 1 << 8, KW_EXACT = 1 << 9, + KW_FORMAT = 1 << 10, }; struct context { @@ -103,6 +104,7 @@ struct ctype { struct context_list *contexts; struct ident *as; struct symbol *base_type; + unsigned short printf_va_start, printf_msg; }; struct decl_state {
Add code to parse the __attribute__((format)) used to indicate that a variadic function takes a printf-style format string and where those are. Save the data in ctype ready for checking when such an function is encoutered. Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> --- Fixes since v1: - moved to using ctype in base_type to store infromation - fixed formatting issues - updated check for bad format arguments - reduced the line count to unsigned short to save space Fixes since v2: - correctly use the decl->ctype to store printf information - fixed issues with checking format positions for va_list code - parse but ignore scanf formatting for now Fixes since v4: - deal with function pointers losing format info --- parse.c | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++- symbol.h | 2 ++ 2 files changed, 103 insertions(+), 1 deletion(-)