diff mbox series

docs/style: allow C99 mixed declarations

Message ID 20240205171819.474283-1-stefanha@redhat.com (mailing list archive)
State New, archived
Headers show
Series docs/style: allow C99 mixed declarations | expand

Commit Message

Stefan Hajnoczi Feb. 5, 2024, 5:18 p.m. UTC
C99 mixed declarations support interleaving of local variable
declarations and code.

The coding style "generally" forbids C99 mixed declarations with some
exceptions to the rule. This rule is not checked by checkpatch.pl and
naturally there are violations in the source tree.

While contemplating adding another exception, I came to the conclusion
that the best location for declarations depends on context. Let the
programmer declare variables where it is best for legibility. Don't try
to define all possible scenarios/exceptions.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 docs/devel/style.rst | 20 --------------------
 1 file changed, 20 deletions(-)

Comments

Daniel P. Berrangé Feb. 5, 2024, 5:41 p.m. UTC | #1
On Mon, Feb 05, 2024 at 12:18:19PM -0500, Stefan Hajnoczi wrote:
> C99 mixed declarations support interleaving of local variable
> declarations and code.
> 
> The coding style "generally" forbids C99 mixed declarations with some
> exceptions to the rule. This rule is not checked by checkpatch.pl and
> naturally there are violations in the source tree.
> 
> While contemplating adding another exception, I came to the conclusion
> that the best location for declarations depends on context. Let the
> programmer declare variables where it is best for legibility. Don't try
> to define all possible scenarios/exceptions.

IIRC, we had a discussion on this topic sometime last year, but can't
remember what the $SUBJECT was, so I'll just repeat what I said then.

Combining C99 mixed declarations with 'goto' is dangerous.

Consider this program:

$ cat jump.c
#include <stdlib.h>

int main(int argc, char**argv) {

  if (getenv("SKIP"))
    goto target;

  char *foo = malloc(30);

 target:
  free(foo);
}

$ gcc -Wall -Wuninitialized -o jump jump.c

$ SKIP=1 ./jump 
free(): invalid pointer
Aborted (core dumped)


 -> The programmer thinks they have initialized 'foo'
 -> GCC thinks the programmer has initialized 'foo'
 -> Yet 'foo' is not guaranteed to be initialized at 'target:'

Given that QEMU makes heavy use of 'goto', allowing C99 mixed
declarations exposes us to significant danger.

Full disclosure, GCC fails to diagnmose this mistake, even
with a decl at start of 'main', but at least the mistake is
now more visible to the programmer.

Fortunately with -fanalyzer GCC can diagnose this:

$ gcc -fanalyzer -Wall -o jump jump.c
jump.c: In function ‘main’:
jump.c:12:3: warning: use of uninitialized value ‘foo’ [CWE-457] [-Wanalyzer-use-of-uninitialized-value]
   12 |   free(foo);
      |   ^~~~~~~~~
  ‘main’: events 1-5
    |
    |    6 |   if (getenv("SKIP"))
    |      |      ~  
    |      |      |
    |      |      (3) following ‘true’ branch...
    |    7 |     goto target;
    |      |     ~~~~
    |      |     |
    |      |     (4) ...to here
    |    8 | 
    |    9 |  char *foo = malloc(30);
    |      |        ^~~
    |      |        |
    |      |        (1) region created on stack here
    |      |        (2) capacity: 8 bytes
    |......
    |   12 |   free(foo);
    |      |   ~~~~~~~~~
    |      |   |
    |      |   (5) use of uninitialized value ‘foo’ here


...but -fanalyzer isn't something we have enabled by default, it
is opt-in. I'm also not sure how comprehensive the flow control
analysis of -fanalyzer is ?  Can we be sure it'll catch these
mistakes in large complex functions with many code paths ?

Even if the compiler does reliably warn, I think the code pattern
remains misleading to contributors, as the flow control flaw is
very non-obvious.

Rather than accept the status quo and remove the coding guideline,
I think we should strengthen the guidelines, such that it is
explicitly forbidden in any method that uses 'goto'. Personally
I'd go all the way to -Werror=declaration-after-statement, as
while C99 mixed decl is appealing, it isn't exactly a game
changer in improving code maintainability.

> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  docs/devel/style.rst | 20 --------------------
>  1 file changed, 20 deletions(-)
> 
> diff --git a/docs/devel/style.rst b/docs/devel/style.rst
> index 2f68b50079..80c4e4df52 100644
> --- a/docs/devel/style.rst
> +++ b/docs/devel/style.rst
> @@ -199,26 +199,6 @@ Rationale: a consistent (except for functions...) bracing style reduces
>  ambiguity and avoids needless churn when lines are added or removed.
>  Furthermore, it is the QEMU coding style.
>  
> -Declarations
> -============
> -
> -Mixed declarations (interleaving statements and declarations within
> -blocks) are generally not allowed; declarations should be at the beginning
> -of blocks. To avoid accidental re-use it is permissible to declare
> -loop variables inside for loops:
> -
> -.. code-block:: c
> -
> -    for (int i = 0; i < ARRAY_SIZE(thing); i++) {
> -        /* do something loopy */
> -    }
> -
> -Every now and then, an exception is made for declarations inside a
> -#ifdef or #ifndef block: if the code looks nicer, such declarations can
> -be placed at the top of the block even if there are statements above.
> -On the other hand, however, it's often best to move that #ifdef/#ifndef
> -block to a separate function altogether.
> -
>  Conditional statements
>  ======================
>  
> -- 
> 2.43.0
> 

With regards,
Daniel
Peter Maydell Feb. 5, 2024, 5:55 p.m. UTC | #2
On Mon, 5 Feb 2024 at 17:41, Daniel P. Berrangé <berrange@redhat.com> wrote:
> Rather than accept the status quo and remove the coding guideline,
> I think we should strengthen the guidelines, such that it is
> explicitly forbidden in any method that uses 'goto'. Personally
> I'd go all the way to -Werror=declaration-after-statement, as
> while C99 mixed decl is appealing, it isn't exactly a game
> changer in improving code maintainability.

I definitely think we should either (a) say we're OK with
mixed declarations or (b) enforce it via the warning option.
(gcc -Wdeclaration-after-statement does not appear to warn
for the "declaration inside for()" syntax, so we could
enable it if we wanted to fix the existing style lapses.)

Personally I prefer declaration-at-top-of-block, but
more as an aesthetic choice than anything else.

thanks
-- PMM
Samuel Tardieu Feb. 5, 2024, 6:12 p.m. UTC | #3
Daniel P. Berrangé <berrange@redhat.com> writes:

> $ gcc -Wall -Wuninitialized -o jump jump.c

Note that many GCC warnings don't trigger if you don't enable 
optimizations. In the case you exhibit, adding -O is enough to get 
a sensible warning:

$ gcc -Wall -O -o jump jump.c
jump.c: In function ‘main’:
jump.c:11:3: warning: ‘foo’ may be used uninitialized 
[-Wmaybe-uninitialized]
   11 |   free(foo);
      |   ^~~~~~~~~
jump.c:8:9: note: ‘foo’ was declared here
    8 |   char *foo = malloc(30);
      |         ^~~

Best.

  Sam
Stefan Hajnoczi Feb. 5, 2024, 7:06 p.m. UTC | #4
On Mon, 5 Feb 2024 at 13:16, Samuel Tardieu <sam@rfc1149.net> wrote:
>
>
> Daniel P. Berrangé <berrange@redhat.com> writes:
>
> > $ gcc -Wall -Wuninitialized -o jump jump.c
>
> Note that many GCC warnings don't trigger if you don't enable
> optimizations. In the case you exhibit, adding -O is enough to get
> a sensible warning:
>
> $ gcc -Wall -O -o jump jump.c
> jump.c: In function ‘main’:
> jump.c:11:3: warning: ‘foo’ may be used uninitialized
> [-Wmaybe-uninitialized]
>    11 |   free(foo);
>       |   ^~~~~~~~~
> jump.c:8:9: note: ‘foo’ was declared here
>     8 |   char *foo = malloc(30);
>       |         ^~~

llvm also prints a warning:

  jump.c:5:7: warning: variable 'foo' is used uninitialized whenever
'if' condition is true [-Wsometimes-uninitialized]

I confirmed that QEMU's current compiler flags enable these warnings
so both gcc and llvm detect the issue that Daniel pointed out in QEMU
code.

Daniel: Does this address your concern about compiler warnings?


Stefan
Daniel P. Berrangé Feb. 5, 2024, 7:17 p.m. UTC | #5
On Mon, Feb 05, 2024 at 02:06:39PM -0500, Stefan Hajnoczi wrote:
> On Mon, 5 Feb 2024 at 13:16, Samuel Tardieu <sam@rfc1149.net> wrote:
> >
> >
> > Daniel P. Berrangé <berrange@redhat.com> writes:
> >
> > > $ gcc -Wall -Wuninitialized -o jump jump.c
> >
> > Note that many GCC warnings don't trigger if you don't enable
> > optimizations. In the case you exhibit, adding -O is enough to get
> > a sensible warning:
> >
> > $ gcc -Wall -O -o jump jump.c
> > jump.c: In function ‘main’:
> > jump.c:11:3: warning: ‘foo’ may be used uninitialized
> > [-Wmaybe-uninitialized]
> >    11 |   free(foo);
> >       |   ^~~~~~~~~
> > jump.c:8:9: note: ‘foo’ was declared here
> >     8 |   char *foo = malloc(30);
> >       |         ^~~
> 
> llvm also prints a warning:
> 
>   jump.c:5:7: warning: variable 'foo' is used uninitialized whenever
> 'if' condition is true [-Wsometimes-uninitialized]
> 
> I confirmed that QEMU's current compiler flags enable these warnings
> so both gcc and llvm detect the issue that Daniel pointed out in QEMU
> code.
> 
> Daniel: Does this address your concern about compiler warnings?

While it is good that its included when optimization is enabled, IME
GCC is not entirely reliable at detecting unintialized variables as
it has limits on extent of its flow control analysis.

Also consider the maint impact if a method includes C99 decls, and
a contributor later needs to add a 'goto' error cleanup path. They
might be forced to make a bunch of unrelated refactoring changes to
eliminate use of the C99 decls. This makes me not a fan of having a
situation where we tell contributors they can use C99 mixed decls,
except when they can't use them.

With regards,
Daniel
Alex Bennée Feb. 5, 2024, 11:17 p.m. UTC | #6
Stefan Hajnoczi <stefanha@redhat.com> writes:

> C99 mixed declarations support interleaving of local variable
> declarations and code.
>
> The coding style "generally" forbids C99 mixed declarations with some
> exceptions to the rule. This rule is not checked by checkpatch.pl and
> naturally there are violations in the source tree.
>
> While contemplating adding another exception, I came to the conclusion
> that the best location for declarations depends on context. Let the
> programmer declare variables where it is best for legibility. Don't try
> to define all possible scenarios/exceptions.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  docs/devel/style.rst | 20 --------------------
>  1 file changed, 20 deletions(-)

While I may be convinced there may be some cases where mixed
declarations could make things simpler/clearer I don't support removing
all the guidance and leaving it as a free for all as long as the
compiler is happy.
Markus Armbruster Feb. 6, 2024, 5:53 a.m. UTC | #7
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Mon, Feb 05, 2024 at 12:18:19PM -0500, Stefan Hajnoczi wrote:
>> C99 mixed declarations support interleaving of local variable
>> declarations and code.
>> 
>> The coding style "generally" forbids C99 mixed declarations with some
>> exceptions to the rule. This rule is not checked by checkpatch.pl and
>> naturally there are violations in the source tree.
>> 
>> While contemplating adding another exception, I came to the conclusion
>> that the best location for declarations depends on context. Let the
>> programmer declare variables where it is best for legibility. Don't try
>> to define all possible scenarios/exceptions.
>
> IIRC, we had a discussion on this topic sometime last year, but can't
> remember what the $SUBJECT was, so I'll just repeat what I said then.

From: Juan Quintela <quintela@redhat.com>
Subject: [PATCH] Change the default for Mixed declarations.
Date: Tue, 14 Feb 2023 17:07:38 +0100
Message-Id: <20230214160738.88614-1-quintela@redhat.com>
https://lore.kernel.org/qemu-devel/20230214160738.88614-1-quintela@redhat.com/

> Combining C99 mixed declarations with 'goto' is dangerous.
>
> Consider this program:
>
> $ cat jump.c
> #include <stdlib.h>
>
> int main(int argc, char**argv) {
>
>   if (getenv("SKIP"))
>     goto target;
>
>   char *foo = malloc(30);
>
>  target:
>   free(foo);
> }
>
> $ gcc -Wall -Wuninitialized -o jump jump.c
>
> $ SKIP=1 ./jump 
> free(): invalid pointer
> Aborted (core dumped)
>
>
>  -> The programmer thinks they have initialized 'foo'
>  -> GCC thinks the programmer has initialized 'foo'
>  -> Yet 'foo' is not guaranteed to be initialized at 'target:'
>
> Given that QEMU makes heavy use of 'goto', allowing C99 mixed
> declarations exposes us to significant danger.
>
> Full disclosure, GCC fails to diagnmose this mistake, even
> with a decl at start of 'main', but at least the mistake is
> now more visible to the programmer.
>
> Fortunately with -fanalyzer GCC can diagnose this:
>
> $ gcc -fanalyzer -Wall -o jump jump.c
> jump.c: In function ‘main’:
> jump.c:12:3: warning: use of uninitialized value ‘foo’ [CWE-457] [-Wanalyzer-use-of-uninitialized-value]
>    12 |   free(foo);
>       |   ^~~~~~~~~
>   ‘main’: events 1-5
>     |
>     |    6 |   if (getenv("SKIP"))
>     |      |      ~  
>     |      |      |
>     |      |      (3) following ‘true’ branch...
>     |    7 |     goto target;
>     |      |     ~~~~
>     |      |     |
>     |      |     (4) ...to here
>     |    8 | 
>     |    9 |  char *foo = malloc(30);
>     |      |        ^~~
>     |      |        |
>     |      |        (1) region created on stack here
>     |      |        (2) capacity: 8 bytes
>     |......
>     |   12 |   free(foo);
>     |      |   ~~~~~~~~~
>     |      |   |
>     |      |   (5) use of uninitialized value ‘foo’ here
>
>
> ...but -fanalyzer isn't something we have enabled by default, it
> is opt-in. I'm also not sure how comprehensive the flow control
> analysis of -fanalyzer is ?  Can we be sure it'll catch these
> mistakes in large complex functions with many code paths ?
>
> Even if the compiler does reliably warn, I think the code pattern
> remains misleading to contributors, as the flow control flaw is
> very non-obvious.

Yup.  Strong dislike.

> Rather than accept the status quo and remove the coding guideline,
> I think we should strengthen the guidelines, such that it is
> explicitly forbidden in any method that uses 'goto'. Personally
> I'd go all the way to -Werror=declaration-after-statement, as

I support this.

> while C99 mixed decl is appealing,

Not to me.

I much prefer declarations and statements to be visually distinct.
Putting declarations first and separating from statements them with a
blank line accomplishes that.  Less necessary in languages where
declarations are syntactically obvious.

>                                    it isn't exactly a game
> changer in improving code maintainability.


[...]
Stefan Hajnoczi Feb. 6, 2024, 1:48 p.m. UTC | #8
On Mon, 5 Feb 2024 at 12:19, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> C99 mixed declarations support interleaving of local variable
> declarations and code.
>
> The coding style "generally" forbids C99 mixed declarations with some
> exceptions to the rule. This rule is not checked by checkpatch.pl and
> naturally there are violations in the source tree.
>
> While contemplating adding another exception, I came to the conclusion
> that the best location for declarations depends on context. Let the
> programmer declare variables where it is best for legibility. Don't try
> to define all possible scenarios/exceptions.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  docs/devel/style.rst | 20 --------------------
>  1 file changed, 20 deletions(-)

I'm withdrawing this patch because it seems many of the other QEMU
maintainers prefer not to have C99 mixed declarations.

Stefan

> diff --git a/docs/devel/style.rst b/docs/devel/style.rst
> index 2f68b50079..80c4e4df52 100644
> --- a/docs/devel/style.rst
> +++ b/docs/devel/style.rst
> @@ -199,26 +199,6 @@ Rationale: a consistent (except for functions...) bracing style reduces
>  ambiguity and avoids needless churn when lines are added or removed.
>  Furthermore, it is the QEMU coding style.
>
> -Declarations
> -============
> -
> -Mixed declarations (interleaving statements and declarations within
> -blocks) are generally not allowed; declarations should be at the beginning
> -of blocks. To avoid accidental re-use it is permissible to declare
> -loop variables inside for loops:
> -
> -.. code-block:: c
> -
> -    for (int i = 0; i < ARRAY_SIZE(thing); i++) {
> -        /* do something loopy */
> -    }
> -
> -Every now and then, an exception is made for declarations inside a
> -#ifdef or #ifndef block: if the code looks nicer, such declarations can
> -be placed at the top of the block even if there are statements above.
> -On the other hand, however, it's often best to move that #ifdef/#ifndef
> -block to a separate function altogether.
> -
>  Conditional statements
>  ======================
>
> --
> 2.43.0
>
>
Philippe Mathieu-Daudé Feb. 7, 2024, 5:28 a.m. UTC | #9
On 6/2/24 06:53, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
>> On Mon, Feb 05, 2024 at 12:18:19PM -0500, Stefan Hajnoczi wrote:
>>> C99 mixed declarations support interleaving of local variable
>>> declarations and code.
>>>
>>> The coding style "generally" forbids C99 mixed declarations with some
>>> exceptions to the rule. This rule is not checked by checkpatch.pl and
>>> naturally there are violations in the source tree.
>>>
>>> While contemplating adding another exception, I came to the conclusion
>>> that the best location for declarations depends on context. Let the
>>> programmer declare variables where it is best for legibility. Don't try
>>> to define all possible scenarios/exceptions.
...

>> Even if the compiler does reliably warn, I think the code pattern
>> remains misleading to contributors, as the flow control flaw is
>> very non-obvious.
> 
> Yup.  Strong dislike.
> 
>> Rather than accept the status quo and remove the coding guideline,
>> I think we should strengthen the guidelines, such that it is
>> explicitly forbidden in any method that uses 'goto'. Personally
>> I'd go all the way to -Werror=declaration-after-statement, as
> 
> I support this.
> 
>> while C99 mixed decl is appealing,
> 
> Not to me.
> 
> I much prefer declarations and statements to be visually distinct.
> Putting declarations first and separating from statements them with a
> blank line accomplishes that.  Less necessary in languages where
> declarations are syntactically obvious.

But we already implicitly suggest C99, see commit ae7c80a7bd
("error: New macro ERRP_GUARD()"):

  * To use ERRP_GUARD(), add it right at the beginning of the function.
  * @errp can then be used without worrying about the argument being
  * NULL or &error_fatal.

  #define ERRP_GUARD()                                           \
     g_auto(ErrorPropagator) _auto_errp_prop = {.errp = errp};   \
     do {                                                        \
         if (!errp || errp == &error_fatal) {                    \
             errp = &_auto_errp_prop.local_err;                  \
         }                                                       \
     } while (0)

Or commit 5626f8c6d4 ("rcu: Add automatically released rcu_read_lock
variants") with WITH_RCU_READ*:

util/aio-posix.c:540:5: error: mixing declarations and code is 
incompatible with standards before C99 
[-Werror,-Wdeclaration-after-statement]
     RCU_READ_LOCK_GUARD();
     ^
include/qemu/rcu.h:189:28: note: expanded from macro 'RCU_READ_LOCK_GUARD'
     g_autoptr(RCUReadAuto) _rcu_read_auto __attribute__((unused)) = 
rcu_read_auto_lock()
                            ^
Markus Armbruster Feb. 7, 2024, 7:37 a.m. UTC | #10
Philippe Mathieu-Daudé <philmd@linaro.org> writes:

> On 6/2/24 06:53, Markus Armbruster wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>> 
>>> On Mon, Feb 05, 2024 at 12:18:19PM -0500, Stefan Hajnoczi wrote:
>>>> C99 mixed declarations support interleaving of local variable
>>>> declarations and code.
>>>>
>>>> The coding style "generally" forbids C99 mixed declarations with some
>>>> exceptions to the rule. This rule is not checked by checkpatch.pl and
>>>> naturally there are violations in the source tree.
>>>>
>>>> While contemplating adding another exception, I came to the conclusion
>>>> that the best location for declarations depends on context. Let the
>>>> programmer declare variables where it is best for legibility. Don't try
>>>> to define all possible scenarios/exceptions.
> ...
>
>>> Even if the compiler does reliably warn, I think the code pattern
>>> remains misleading to contributors, as the flow control flaw is
>>> very non-obvious.
>> 
>> Yup.  Strong dislike.
>> 
>>> Rather than accept the status quo and remove the coding guideline,
>>> I think we should strengthen the guidelines, such that it is
>>> explicitly forbidden in any method that uses 'goto'. Personally
>>> I'd go all the way to -Werror=declaration-after-statement, as
>> 
>> I support this.
>> 
>>> while C99 mixed decl is appealing,
>> 
>> Not to me.
>> 
>> I much prefer declarations and statements to be visually distinct.
>> Putting declarations first and separating from statements them with a
>> blank line accomplishes that.  Less necessary in languages where
>> declarations are syntactically obvious.
>
> But we already implicitly suggest C99, see commit ae7c80a7bd
> ("error: New macro ERRP_GUARD()"):
>
>   * To use ERRP_GUARD(), add it right at the beginning of the function.
>   * @errp can then be used without worrying about the argument being
>   * NULL or &error_fatal.
>
>   #define ERRP_GUARD()                                           \
>      g_auto(ErrorPropagator) _auto_errp_prop = {.errp = errp};   \
>      do {                                                        \
>          if (!errp || errp == &error_fatal) {                    \
>              errp = &_auto_errp_prop.local_err;                  \
>          }                                                       \
>      } while (0)

We can make ERRP_GUARD() expand into just a declaration:

    #define ERRP_GUARD()                                            \
        g_auto(ErrorPropagator) _auto_errp_prop = {                 \
            .errp = errp,                                           \
            .local_err = ((!errp || errp == &error_fatal            \
                          ? errp = &_auto_errp_prop.local_err       \
                          : NULL),                                  \
                          NULL) }

> Or commit 5626f8c6d4 ("rcu: Add automatically released rcu_read_lock
> variants") with WITH_RCU_READ*:
>
> util/aio-posix.c:540:5: error: mixing declarations and code is 
> incompatible with standards before C99 
> [-Werror,-Wdeclaration-after-statement]
>      RCU_READ_LOCK_GUARD();
>      ^
> include/qemu/rcu.h:189:28: note: expanded from macro 'RCU_READ_LOCK_GUARD'
>      g_autoptr(RCUReadAuto) _rcu_read_auto __attribute__((unused)) = 
> rcu_read_auto_lock()
>                             ^

Valid example; RCU_READ_LOCK_GUARD() expands into a declaration.

To enable -Wdeclaration-after-statement, we'd have to futz around with
_Pragma.
diff mbox series

Patch

diff --git a/docs/devel/style.rst b/docs/devel/style.rst
index 2f68b50079..80c4e4df52 100644
--- a/docs/devel/style.rst
+++ b/docs/devel/style.rst
@@ -199,26 +199,6 @@  Rationale: a consistent (except for functions...) bracing style reduces
 ambiguity and avoids needless churn when lines are added or removed.
 Furthermore, it is the QEMU coding style.
 
-Declarations
-============
-
-Mixed declarations (interleaving statements and declarations within
-blocks) are generally not allowed; declarations should be at the beginning
-of blocks. To avoid accidental re-use it is permissible to declare
-loop variables inside for loops:
-
-.. code-block:: c
-
-    for (int i = 0; i < ARRAY_SIZE(thing); i++) {
-        /* do something loopy */
-    }
-
-Every now and then, an exception is made for declarations inside a
-#ifdef or #ifndef block: if the code looks nicer, such declarations can
-be placed at the top of the block even if there are statements above.
-On the other hand, however, it's often best to move that #ifdef/#ifndef
-block to a separate function altogether.
-
 Conditional statements
 ======================