[01/22] qapi/parser: Don't try to handle file errors

Message ID	20210422030720.3685766-2-jsnow@redhat.com (mailing list archive)
State	New, archived
Series	qapi: static typing conversion, pt5a

John Snow April 22, 2021, 3:06 a.m. UTC

The short-ish version of what motivates this patch is:

- The parser initializer does not possess adequate context to write a
  good error message -- It tries to determine the caller's semantic
  context.
- We don't want to allow QAPISourceInfo(None, None, None) to exist.
- Errors made using such an object are currently incorrect.
- It's not technically a semantic error if we cannot open the schema.
- There are various typing constraints that make mixing these two cases
  undesirable for a single special case.
- The current open block in parser's initializer will leak file
  pointers, because it isn't using a with statement.


Here are the details on why this got written the way it did, and why a few
disparate issues are rolled into one commit. (They're hard to fix
separately without writing really weird stuff that'd be harder to
review.)

The error message string here is incorrect:

> python3 qapi-gen.py 'fake.json'
qapi-gen.py: qapi-gen.py: can't read schema file 'fake.json': No such file or directory

In pursuing it, we find that QAPISourceInfo has a special accommodation
for when there's no filename. Meanwhile, we intend to type info.fname as
str; something we always have.

To remove this, we need to not have a "fake" QAPISourceInfo object. We
also don't want to explicitly begin accommodating QAPISourceInfo being
None, because we actually want to eventually prove that this can never
happen -- We don't want to confuse "The file isn't open yet" with "This
error stems from a definition that wasn't defined in any file".

(An earlier series tried to create an official dummy object, but it was
tough to prove in review that it worked correctly without creating new
regressions. This patch avoids trying to re-litigate that discussion.

We would like to first prove that we never raise QAPISemError for any
built-in object before we relent and add "special" info objects. We
aren't ready to do that yet, so crashing is preferred.)

So, how to solve this mess?

Here's one way: Don't try to handle errors at a level with "mixed"
semantic levels; i.e. don't try to handle inclusion errors (should
report a source line where the include was triggered) with command line
errors (where we specified a file we couldn't read).

Simply remove the error handling from the initializer of the
parser. Pythonic! Now it's the caller's job to figure out what to do
about it. Handle the error in QAPISchemaParser._include() instead, where
we do have the correct semantic context to not need to play games with
the error message generation.

Next, to re-gain a nice error at the top level, add a new try/except
into qapi/main.generate(). Now the error looks sensible:

> python3 qapi-gen.py 'fake.json'
qapi-gen.py: can't read schema file 'fake.json': No such file or directory

Lastly, with this usage gone, we can remove the special type violation
from QAPISourceInfo, and all is well with the world.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 scripts/qapi/main.py   |  8 +++++++-
 scripts/qapi/parser.py | 18 +++++++++---------
 scripts/qapi/source.py |  3 ---
 3 files changed, 16 insertions(+), 13 deletions(-)

Markus Armbruster April 23, 2021, 3:46 p.m. UTC | #1

John Snow <jsnow@redhat.com> writes:

> The short-ish version of what motivates this patch is:
>
> - The parser initializer does not possess adequate context to write a
>   good error message -- It tries to determine the caller's semantic
>   context.

I'm not sure I get what you're trying to say here.

> - We don't want to allow QAPISourceInfo(None, None, None) to exist.
> - Errors made using such an object are currently incorrect.
> - It's not technically a semantic error if we cannot open the schema
> - There are various typing constraints that make mixing these two cases
>   undesirable for a single special case.

These I understand.

> - The current open block in parser's initializer will leak file
>   pointers, because it isn't using a with statement.

Uh, isn't the value returned by open() reference-counted?  @fp is the
only reference...

> Here's the details in why this got written the way it did, and why a few
> disparate issues are rolled into one commit. (They're hard to fix
> separately without writing really weird stuff that'd be harder to
> review.)
>
> The error message string here is incorrect:
>
>> python3 qapi-gen.py 'fake.json'
> qapi-gen.py: qapi-gen.py: can't read schema file 'fake.json': No such file or directory

Regressed in commit 52a474180a "qapi-gen: Separate arg-parsing from
generation" (v5.2.0).

Before commit c615550df3 "qapi: Improve source file read error handling"
(v4.2.0), it was differently bad (uncaught exception).

Commit c615550df3 explains why the funny QAPISourceInfo exists:

    Reporting open or read failure for the main schema file needs a
    QAPISourceInfo representing "no source".  Make QAPISourceInfo cope
    with fname=None.

The commit turned QAPISourceInfo into the equivalent of a disjoint union
of

1. A position in a source file (.fname is a str)

2. "Not in any source file" (.fname is None)

This is somewhat similar to struct Location in C, which has

1. LOC_FILE: a position in a source file

2. LOC_CMDLINE: a range of command line arguments

3. LOC_NONE: no location information

Abstracting locations this way lets error_report() do the right thing
whether it's complaining about the command line, a monitor command, or a
configuration file read with -readconfig.

Your patch demonstrates that qapi-gen has much less need for abstracting
sources: we use 2. "Not in any source file" only for reading the main
schema file.

> In pursuing it, we find that QAPISourceInfo has a special accommodation
> for when there's no filename.

Yes:

    def loc(self) -> str:
-->     if self.fname is None:
-->         return sys.argv[0]
        ret = self.fname
        if self.line is not None:
            ret += ':%d' % self.line
        return ret

>                               Meanwhile, we intend to type info.fname as
> str; something we always have.

Do you mean "as non-optional str"?

> To remove this, we need to not have a "fake" QAPISourceInfo object. We

We may well want to, but I doubt we *need* to.  There are almost
certainly other ways to fix the bug.  I don't see a need to explore
them, though.

> also don't want to explicitly begin accommodating QAPISourceInfo being
> None, because we actually want to eventually prove that this can never
> happen -- We don't want to confuse "The file isn't open yet" with "This
> error stems from a definition that wasn't defined in any file".

Yes, encoding both "poisoned source info not to be used with actual
errors" and "'fake' source info not pointing to a source file" as None
would be a mistake.

> (An earlier series tried to create an official dummy object, but it was
> tough to prove in review that it worked correctly without creating new
> regressions. This patch avoids trying to re-litigate that discussion.
>
> We would like to first prove that we never raise QAPISemError for any
> built-in object before we relent and add "special" info objects. We
> aren't ready to do that yet, so crashing is preferred.)
>
> So, how to solve this mess?
>
> Here's one way: Don't try to handle errors at a level with "mixed"
> semantic levels; i.e. don't try to handle inclusion errors (should
> report a source line where the include was triggered) with command line
> errors (where we specified a file we couldn't read).
>
> Simply remove the error handling from the initializer of the
> parser. Pythonic! Now it's the caller's job to figure out what to do
> about it. Handle the error in QAPISchemaParser._include() instead, where
> we do have the correct semantic context to not need to play games with
> the error message generation.
>
> Next, to re-gain a nice error at the top level, add a new try/except
> into qapi/main.generate(). Now the error looks sensible:

Missing "again" after "sensible" ;-P

>
>> python3 qapi-gen.py 'fake.json'
> qapi-gen.py: can't read schema file 'fake.json': No such file or directory
>
> Lastly, with this usage gone, we can remove the special type violation
> from QAPISourceInfo, and all is well with the world.
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  scripts/qapi/main.py   |  8 +++++++-
>  scripts/qapi/parser.py | 18 +++++++++---------
>  scripts/qapi/source.py |  3 ---
>  3 files changed, 16 insertions(+), 13 deletions(-)
>
> diff --git a/scripts/qapi/main.py b/scripts/qapi/main.py
> index 703e7ed1ed5..70f8aa86f37 100644
> --- a/scripts/qapi/main.py
> +++ b/scripts/qapi/main.py
> @@ -48,7 +48,13 @@ def generate(schema_file: str,
>      """
>      assert invalid_prefix_char(prefix) is None
>  
> -    schema = QAPISchema(schema_file)
> +    try:
> +        schema = QAPISchema(schema_file)
> +    except OSError as err:
> +        raise QAPIError(
> +            f"can't read schema file '{schema_file}': {err.strerror}"
> +        ) from err
> +
>      gen_types(schema, output_dir, prefix, builtins)
>      gen_visit(schema, output_dir, prefix, builtins)
>      gen_commands(schema, output_dir, prefix)
> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
> index ca5e8e18e00..b378fa33807 100644
> --- a/scripts/qapi/parser.py
> +++ b/scripts/qapi/parser.py
> @@ -40,15 +40,9 @@ def __init__(self, fname, previously_included=None, incl_info=None):
>          previously_included = previously_included or set()
>          previously_included.add(os.path.abspath(fname))
>  
> -        try:
> -            fp = open(fname, 'r', encoding='utf-8')
> +        # Allow the caller to catch this error.

"this error"?  I understand what you mean now, but I'm not sure I will
in three months, when I won't have the context I have now.

> +        with open(fname, 'r', encoding='utf-8') as fp:
>              self.src = fp.read()
> -        except IOError as e:
> -            raise QAPISemError(incl_info or QAPISourceInfo(None, None, None),
> -                               "can't read %s file '%s': %s"
> -                               % ("include" if incl_info else "schema",
> -                                  fname,
> -                                  e.strerror))
>  
>          if self.src == '' or self.src[-1] != '\n':
>              self.src += '\n'
> @@ -129,7 +123,13 @@ def _include(self, include, info, incl_fname, previously_included):
>          if incl_abs_fname in previously_included:
>              return None
>  
> -        return QAPISchemaParser(incl_fname, previously_included, info)
> +        try:
> +            return QAPISchemaParser(incl_fname, previously_included, info)
> +        except OSError as err:
> +            raise QAPISemError(
> +                info,
> +                f"can't read include file '{incl_fname}': {err.strerror}"
> +            ) from err
>  
>      def _check_pragma_list_of_str(self, name, value, info):
>          if (not isinstance(value, list)

Before the patch, only IOError from open() and .read() gets converted to
QAPISemError, and is therefore caught by main().

The patch widens this to anywhere in QAPISchemaParser.__init__().  Hmm.

> diff --git a/scripts/qapi/source.py b/scripts/qapi/source.py
> index 03b6ede0828..1ade864d7b9 100644
> --- a/scripts/qapi/source.py
> +++ b/scripts/qapi/source.py
> @@ -10,7 +10,6 @@
>  # See the COPYING file in the top-level directory.
>  
>  import copy
> -import sys
>  from typing import List, Optional, TypeVar
>  
>  
> @@ -53,8 +52,6 @@ def next_line(self: T) -> T:
>          return info
>  
>      def loc(self) -> str:
> -        if self.fname is None:
> -            return sys.argv[0]
>          ret = self.fname
>          if self.line is not None:
>              ret += ':%d' % self.line

tests/qapi-schema/test-qapi.py also needs an update.  Before the patch:

    $ PYTHONPATH=scripts python3 tests/qapi-schema/test-qapi.py nonexistent
    tests/qapi-schema/test-qapi.py: can't read schema file 'nonexistent.json': No such file or directory

After:

    Traceback (most recent call last):
      File "tests/qapi-schema/test-qapi.py", line 207, in <module>
        main(sys.argv)
      File "tests/qapi-schema/test-qapi.py", line 201, in main
        status |= test_and_diff(test_name, dir_name, args.update)
      File "tests/qapi-schema/test-qapi.py", line 129, in test_and_diff
        test_frontend(os.path.join(dir_name, test_name + '.json'))
      File "tests/qapi-schema/test-qapi.py", line 109, in test_frontend
        schema = QAPISchema(fname)
      File "/work/armbru/qemu/scripts/qapi/schema.py", line 852, in __init__
        parser = QAPISchemaParser(fname)
      File "/work/armbru/qemu/scripts/qapi/parser.py", line 44, in __init__
        with open(fname, 'r', encoding='utf-8') as fp:
    FileNotFoundError: [Errno 2] No such file or directory: 'nonexistent.json'

John Snow April 23, 2021, 7:20 p.m. UTC | #2

On 4/23/21 11:46 AM, Markus Armbruster wrote:
> John Snow <jsnow@redhat.com> writes:
> 
>> The short-ish version of what motivates this patch is:
>>
>> - The parser initializer does not possess adequate context to write a
>>    good error message -- It tries to determine the caller's semantic
>>    context.
> 
> I'm not sure I get what you're trying to say here.
> 

I mean: this __init__ method does not *know* who is calling it or why. 
Of course, *we* do, because the code base is finite and nobody else but 
us is calling into it.

I mean to point out that the initializer has to do extra work (Just a 
little) to determine what the calling context is and raise an error 
accordingly.

Example: If we have a parent info context, we raise an error in the 
context of the caller. If we don't, we have to create a new presumed 
context (using the weird None SourceInfo object).

So I just mean to say:

"Let the caller, who unambiguously always has the exactly correct 
context worry about what the error message ought to be."
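
A minimal sketch of that division of labour, for illustration only. The 
names `Parser`, `SchemaError`, `include` and `generate` are hypothetical 
stand-ins, not the real QAPI classes, and the `where` string merely 
imitates a source location:

```python
class SchemaError(Exception):
    """Hypothetical stand-in for QAPIError / QAPISemError."""


class Parser:
    """Hypothetical stand-in for QAPISchemaParser."""

    def __init__(self, fname: str):
        # May raise OSError; the caller knows the context and handles it.
        with open(fname, 'r', encoding='utf-8') as fp:
            self.src = fp.read()


def include(incl_fname: str, where: str) -> Parser:
    """Include context: report against the including source line."""
    try:
        return Parser(incl_fname)
    except OSError as err:
        raise SchemaError(
            f"{where}: can't read include file '{incl_fname}': {err.strerror}"
        ) from err


def generate(schema_file: str) -> Parser:
    """Command-line context: there is no source line to point at."""
    try:
        return Parser(schema_file)
    except OSError as err:
        raise SchemaError(
            f"can't read schema file '{schema_file}': {err.strerror}"
        ) from err
```

Each caller picks its own wording, so the initializer no longer has to 
guess which of the two situations it is in.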

>> - We don't want to allow QAPISourceInfo(None, None, None) to exist.
>> - Errors made using such an object are currently incorrect.
>> - It's not technically a semantic error if we cannot open the schema
>> - There are various typing constraints that make mixing these two cases
>>    undesirable for a single special case.
> 
> These I understand.
> 
>> - The current open block in parser's initializer will leak file
>>    pointers, because it isn't using a with statement.
> 
> Uh, isn't the value returned by open() reference-counted?  @fp is the
> only reference...
> 

Yeah, eventually. O:-)

Whenever the GC runs. OK, it's not really an apocalypse error, but it 
felt strange to rewrite a try/except and then write it using bad hygiene 
on purpose in the name of a more isolated commit.
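
To make the difference concrete, here is a small sketch. It only shows 
what `with` guarantees (the file is closed when the block exits); when an 
unreferenced file object gets reclaimed otherwise depends on the 
interpreter. The helper names are made up for illustration:

```python
import os
import tempfile


def read_without_with(path: str):
    # The file object stays open for as long as someone holds a reference.
    fp = open(path, 'r', encoding='utf-8')
    return fp.read(), fp


def read_with(path: str):
    # The with statement closes the file as soon as the block exits,
    # even if read() raises.
    with open(path, 'r', encoding='utf-8') as fp:
        src = fp.read()
    return src, fp


# Create an empty scratch file to read from.
fd, path = tempfile.mkstemp(suffix='.json')
os.close(fd)

src1, fp1 = read_without_with(path)
src2, fp2 = read_with(path)

still_open = not fp1.closed    # we still hold a reference to fp1
closed_by_with = fp2.closed    # closed when the with block exited

fp1.close()
os.remove(path)
```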

>> Here's the details in why this got written the way it did, and why a few
>> disparate issues are rolled into one commit. (They're hard to fix
>> separately without writing really weird stuff that'd be harder to
>> review.)
>>
>> The error message string here is incorrect:
>>
>>> python3 qapi-gen.py 'fake.json'
>> qapi-gen.py: qapi-gen.py: can't read schema file 'fake.json': No such file or directory
> 
> Regressed in commit 52a474180a "qapi-gen: Separate arg-parsing from
> generation" (v5.2.0).
> 

Mea Culpa. Didn't realize it wasn't tested, and I didn't realize at the 
time that the two kinds of errors here were treated differently.

> Before commit c615550df3 "qapi: Improve source file read error handling"
> (v4.2.0), it was differently bad (uncaught exception).
> 
> Commit c615550df3 explains why the funny QAPISourceInfo exists:
> 
>      Reporting open or read failure for the main schema file needs a
>      QAPISourceInfo representing "no source".  Make QAPISourceInfo cope
>      with fname=None.
> 

I am apparently not the first or the last person to dream of wanting a 
QAPISourceInfo that represents "Actually, there's no source location!"

> The commit turned QAPISourceInfo into the equivalent of a disjoint union
> of
> 
> 1. A position in a source file (.fname is a str)
> 
> 2. "Not in any source file" (.fname is None)
> 
> This is somewhat similar to struct Location in C, which has
> 
> 1. LOC_FILE: a position in a source file
> 
> 2. LOC_CMDLINE: a range of command line arguments
> 
> 3. LOC_NONE: no location information
> 
> Abstracting locations this way lets error_report() do the right thing
> whether its complaining about the command line, a monitor command, or a
> configuration file read with -readconfig.
> 
> Your patch demonstrates that qapi-gen has much less need for abstracting
> sources: we use 2. "Not in any source file" only for reading the main
> schema file.
> 

Yes. I got the impression that you didn't want to pursue more abstract 
QSI constructs based on earlier work, so going the other way and 
*removing* them seemed like the faster way to achieve a clean type 
system here.

>> In pursuing it, we find that QAPISourceInfo has a special accommodation
>> for when there's no filename.
> 
> Yes:
> 
>      def loc(self) -> str:
> -->     if self.fname is None:
> -->         return sys.argv[0]
>          ret = self.fname
>          if self.line is not None:
>              ret += ':%d' % self.line
>          return ret
> 
>>                                Meanwhile, we intend to type info.fname as
>> str; something we always have.
> 
> Do you mean "as non-optional str"?
> 

Yeah. I typed it originally as `str`, but the analyzer missed that we 
check the field to see if it's None, which is misleading.
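
Roughly what the non-optional typing buys us, as a sketch. `SourceInfo` 
here is a hypothetical, trimmed-down stand-in for QAPISourceInfo, with 
only the two fields that loc() looks at:

```python
from typing import Optional


class SourceInfo:
    """Hypothetical, trimmed-down stand-in for QAPISourceInfo."""

    def __init__(self, fname: str, line: Optional[int] = None):
        self.fname = fname      # always a real file name, never None
        self.line = line

    def loc(self) -> str:
        # No "fname is None" special case is needed once fname is a str.
        ret = self.fname
        if self.line is not None:
            ret += ':%d' % self.line
        return ret
```

With `fname: str`, a type checker should reject a call such as 
`SourceInfo(None)`, which is the point of dropping the "fake" object.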

>> To remove this, we need to not have a "fake" QAPISourceInfo object. We
> 
> We may well want to, but I doubt we *need* to.  There are almost
> certainly other ways to fix the bug.  I don't see a need to explore
> them, though.
> 

Either we build out the fake QSI into a proper subtype, or we remove it 
-- those are the two obvious options. Building it out is almost 
certainly more work than this patch.

>> also don't want to explicitly begin accommodating QAPISourceInfo being
>> None, because we actually want to eventually prove that this can never
>> happen -- We don't want to confuse "The file isn't open yet" with "This
>> error stems from a definition that wasn't defined in any file".
> 
> Yes, encoding both "poisoned source info not to be used with actual
> errors" and "'fake' source info not pointing to a source file" as None
> would be a mistake.
> 

:)

>> (An earlier series tried to create an official dummy object, but it was
>> tough to prove in review that it worked correctly without creating new
>> regressions. This patch avoids trying to re-litigate that discussion.
>>
>> We would like to first prove that we never raise QAPISemError for any
>> built-in object before we relent and add "special" info objects. We
>> aren't ready to do that yet, so crashing is preferred.)
>>
>> So, how to solve this mess?
>>
>> Here's one way: Don't try to handle errors at a level with "mixed"
>> semantic levels; i.e. don't try to handle inclusion errors (should
>> report a source line where the include was triggered) with command line
>> errors (where we specified a file we couldn't read).
>>
>> Simply remove the error handling from the initializer of the
>> parser. Pythonic! Now it's the caller's job to figure out what to do
>> about it. Handle the error in QAPISchemaParser._include() instead, where
>> we do have the correct semantic context to not need to play games with
>> the error message generation.
>>
>> Next, to re-gain a nice error at the top level, add a new try/except
>> into qapi/main.generate(). Now the error looks sensible:
> 
> Missing "again" after "sensible" ;-P
> 

okayokayokayfine

>>
>>> python3 qapi-gen.py 'fake.json'
>> qapi-gen.py: can't read schema file 'fake.json': No such file or directory
>>
>> Lastly, with this usage gone, we can remove the special type violation
>> from QAPISourceInfo, and all is well with the world.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>   scripts/qapi/main.py   |  8 +++++++-
>>   scripts/qapi/parser.py | 18 +++++++++---------
>>   scripts/qapi/source.py |  3 ---
>>   3 files changed, 16 insertions(+), 13 deletions(-)
>>
>> diff --git a/scripts/qapi/main.py b/scripts/qapi/main.py
>> index 703e7ed1ed5..70f8aa86f37 100644
>> --- a/scripts/qapi/main.py
>> +++ b/scripts/qapi/main.py
>> @@ -48,7 +48,13 @@ def generate(schema_file: str,
>>       """
>>       assert invalid_prefix_char(prefix) is None
>>   
>> -    schema = QAPISchema(schema_file)
>> +    try:
>> +        schema = QAPISchema(schema_file)
>> +    except OSError as err:
>> +        raise QAPIError(
>> +            f"can't read schema file '{schema_file}': {err.strerror}"
>> +        ) from err
>> +
>>       gen_types(schema, output_dir, prefix, builtins)
>>       gen_visit(schema, output_dir, prefix, builtins)
>>       gen_commands(schema, output_dir, prefix)
>> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>> index ca5e8e18e00..b378fa33807 100644
>> --- a/scripts/qapi/parser.py
>> +++ b/scripts/qapi/parser.py
>> @@ -40,15 +40,9 @@ def __init__(self, fname, previously_included=None, incl_info=None):
>>           previously_included = previously_included or set()
>>           previously_included.add(os.path.abspath(fname))
>>   
>> -        try:
>> -            fp = open(fname, 'r', encoding='utf-8')
>> +        # Allow the caller to catch this error.
> 
> "this error"?  I understand what you mean now, but I'm not sure I will
> in three months, when I won't have the context I have now.
> 

Yep, OK.

# May raise OSError, allow the caller to handle it.

>> +        with open(fname, 'r', encoding='utf-8') as fp:
>>               self.src = fp.read()
>> -        except IOError as e:
>> -            raise QAPISemError(incl_info or QAPISourceInfo(None, None, None),
>> -                               "can't read %s file '%s': %s"
>> -                               % ("include" if incl_info else "schema",
>> -                                  fname,
>> -                                  e.strerror))
>>   
>>           if self.src == '' or self.src[-1] != '\n':
>>               self.src += '\n'
>> @@ -129,7 +123,13 @@ def _include(self, include, info, incl_fname, previously_included):
>>           if incl_abs_fname in previously_included:
>>               return None
>>   
>> -        return QAPISchemaParser(incl_fname, previously_included, info)
>> +        try:
>> +            return QAPISchemaParser(incl_fname, previously_included, info)
>> +        except OSError as err:
>> +            raise QAPISemError(
>> +                info,
>> +                f"can't read include file '{incl_fname}': {err.strerror}"
>> +            ) from err
>>   
>>       def _check_pragma_list_of_str(self, name, value, info):
>>           if (not isinstance(value, list)
> 
> Before the patch, only IOError from open() and .read() get converted to
> QAPISemError, and therefore caught by main().
> 
> The patch widen this to anywhere in QAPISchemaParser.__init__().  Hmm.
> 

"Changed in version 3.3: EnvironmentError, IOError, WindowsError, 
socket.error, select.error and mmap.error have been merged into OSError, 
and the constructor may return a subclass."

 >>> OSError == IOError
True

(No, I didn't know this before I wrote it. I just intentionally wanted 
to catch everything that open() might raise, which I had simply assumed 
was not fully captured by IOError. Better to leave it as OSError now to 
avoid misleading anyone into thinking it's more narrow than it really is.)
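
A quick sketch to check this on a given Python 3 interpreter. The path is 
a throwaway built with tempfile, and the message format just mirrors the 
one used in the patch:

```python
import errno
import os
import tempfile

# Since Python 3.3, IOError is simply another name for OSError.
same_class = IOError is OSError

# A path that cannot exist: a fresh temporary directory has no files in it.
missing = os.path.join(tempfile.mkdtemp(), 'fake.json')

try:
    with open(missing, 'r', encoding='utf-8') as fp:
        fp.read()
except OSError as err:
    # open() raised a subclass (FileNotFoundError); OSError still matches.
    caught = err

is_not_found = isinstance(caught, FileNotFoundError)
message = f"can't read schema file '{missing}': {caught.strerror}"
```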

>> diff --git a/scripts/qapi/source.py b/scripts/qapi/source.py
>> index 03b6ede0828..1ade864d7b9 100644
>> --- a/scripts/qapi/source.py
>> +++ b/scripts/qapi/source.py
>> @@ -10,7 +10,6 @@
>>   # See the COPYING file in the top-level directory.
>>   
>>   import copy
>> -import sys
>>   from typing import List, Optional, TypeVar
>>   
>>   
>> @@ -53,8 +52,6 @@ def next_line(self: T) -> T:
>>           return info
>>   
>>       def loc(self) -> str:
>> -        if self.fname is None:
>> -            return sys.argv[0]
>>           ret = self.fname
>>           if self.line is not None:
>>               ret += ':%d' % self.line
> 
> tests/qapi-schema/test-qapi.py also needs an update.  Before the patch:
> 
>      $ PYTHONPATH=scripts python3 tests/qapi-schema/test-qapi.py nonexistent
>      tests/qapi-schema/test-qapi.py: can't read schema file 'nonexistent.json': No such file or directory
> 
> After:
> 
>      Traceback (most recent call last):
>        File "tests/qapi-schema/test-qapi.py", line 207, in <module>
>          main(sys.argv)
>        File "tests/qapi-schema/test-qapi.py", line 201, in main
>          status |= test_and_diff(test_name, dir_name, args.update)
>        File "tests/qapi-schema/test-qapi.py", line 129, in test_and_diff
>          test_frontend(os.path.join(dir_name, test_name + '.json'))
>        File "tests/qapi-schema/test-qapi.py", line 109, in test_frontend
>          schema = QAPISchema(fname)
>        File "/work/armbru/qemu/scripts/qapi/schema.py", line 852, in __init__
>          parser = QAPISchemaParser(fname)
>        File "/work/armbru/qemu/scripts/qapi/parser.py", line 44, in __init__
>          with open(fname, 'r', encoding='utf-8') as fp:
>      FileNotFoundError: [Errno 2] No such file or directory: 'nonexistent.json'
> 

Probably something that should be added to the actual battery of tests 
somehow, yeah? I can't prevent regressions in invocations that don't get 
run O:-)

--js

Markus Armbruster April 27, 2021, 1:47 p.m. UTC | #3

John Snow <jsnow@redhat.com> writes:

> On 4/23/21 11:46 AM, Markus Armbruster wrote:
>> John Snow <jsnow@redhat.com> writes:
>> 
>>> The short-ish version of what motivates this patch is:
>>>
>>> - The parser initializer does not possess adequate context to write a
>>>    good error message -- It tries to determine the caller's semantic
>>>    context.
>> 
>> I'm not sure I get what you're trying to say here.
>> 
>
> I mean: this __init__ method does not *know* who is calling it or why. 
> Of course, *we* do, because the code base is finite and nobody else but 
> us is calling into it.
>
> I mean to point out that the initializer has to do extra work (Just a 
> little) to determine what the calling context is and raise an error 
> accordingly.
>
> Example: If we have a parent info context, we raise an error in the 
> context of the caller. If we don't, we have to create a new presumed 
> context (using the weird None SourceInfo object).

I guess you mean

            raise QAPISemError(incl_info or QAPISourceInfo(None, None, None),

I can't see other instances of messing with context.

> So I just mean to say:
>
> "Let the caller, who unambiguously always has the exactly correct 
> context worry about what the error message ought to be."
>
>>> - We don't want to allow QAPISourceInfo(None, None, None) to exist.
>>> - Errors made using such an object are currently incorrect.
>>> - It's not technically a semantic error if we cannot open the schema
>>> - There are various typing constraints that make mixing these two cases
>>>    undesirable for a single special case.
>> 
>> These I understand.
>> 
>>> - The current open block in parser's initializer will leak file
>>>    pointers, because it isn't using a with statement.
>> 
>> Uh, isn't the value returned by open() reference-counted?  @fp is the
>> only reference...
>> 
>
> Yeah, eventually. O:-)
>
> Whenever the GC runs. OK, it's not really an apocalypse error, but it 
> felt strange to rewrite a try/except and then write it using bad hygiene 
> on purpose in the name of a more isolated commit.

I agree use of with is an improvement (it's idiomatic).  We shouldn't
call it a leak fix, though.

>>> Here's the details in why this got written the way it did, and why a few
>>> disparate issues are rolled into one commit. (They're hard to fix
>>> separately without writing really weird stuff that'd be harder to
>>> review.)
>>>
>>> The error message string here is incorrect:
>>>
>>>> python3 qapi-gen.py 'fake.json'
>>> qapi-gen.py: qapi-gen.py: can't read schema file 'fake.json': No such file or directory
>> 
>> Regressed in commit 52a474180a "qapi-gen: Separate arg-parsing from
>> generation" (v5.2.0).
>> 
>
> Mea Culpa. Didn't realize it wasn't tested, and I didn't realize at the 
> time that the two kinds of errors here were treated differently.

Our tests cover the schema language, not qapi-gen's CLI language.  The
gap feels tolerable.

>> Before commit c615550df3 "qapi: Improve source file read error handling"
>> (v4.2.0), it was differently bad (uncaught exception).
>> 
>> Commit c615550df3 explains why the funny QAPISourceInfo exists:
>> 
>>      Reporting open or read failure for the main schema file needs a
>>      QAPISourceInfo representing "no source".  Make QAPISourceInfo cope
>>      with fname=None.
>> 
>
> I am apparently not the first or the last person to dream of wanting a 
> QAPISourceInfo that represents "Actually, there's no source location!"
>
>> The commit turned QAPISourceInfo into the equivalent of a disjoint union
>> of
>> 
>> 1. A position in a source file (.fname is a str)
>> 
>> 2. "Not in any source file" (.fname is None)
>> 
>> This is somewhat similar to struct Location in C, which has
>> 
>> 1. LOC_FILE: a position in a source file
>> 
>> 2. LOC_CMDLINE: a range of command line arguments
>> 
>> 3. LOC_NONE: no location information
>> 
>> Abstracting locations this way lets error_report() do the right thing
>> whether it's complaining about the command line, a monitor command, or a
>> configuration file read with -readconfig.
>> 
>> Your patch demonstrates that qapi-gen has much less need for abstracting
>> sources: we use 2. "Not in any source file" only for reading the main
>> schema file.
>> 
>
> Yes. I got the impression that you didn't want to pursue more abstract 
> QSI constructs based on earlier work, so going the other way and 
> *removing* them seemed like the faster way to achieve a clean type 
> system here.
>
>>> In pursuing it, we find that QAPISourceInfo has a special accommodation
>>> for when there's no filename.
>> 
>> Yes:
>> 
>>      def loc(self) -> str:
>> -->     if self.fname is None:
>> -->         return sys.argv[0]
>>          ret = self.fname
>>          if self.line is not None:
>>              ret += ':%d' % self.line
>>          return ret
>> 
>>>                                Meanwhile, we intend to type info.fname as
>>> str; something we always have.
>> 
>> Do you mean "as non-optional str"?
>> 
>
> Yeah. I typed it originally as `str`, but the analyzer missed that we 
> check the field to see if it's None, which is misleading.
>
>>> To remove this, we need to not have a "fake" QAPISourceInfo object. We
>> 
>> We may well want to, but I doubt we *need* to.  There are almost
>> certainly other ways to fix the bug.  I don't see a need to explore
>> them, though.
>> 
>
> Either we build out the fake QSI into a proper subtype, or we remove it 
> -- those are the two obvious options. Building it out is almost 
> certainly more work than this patch.
>
>>> also don't want to explicitly begin accommodating QAPISourceInfo being
>>> None, because we actually want to eventually prove that this can never
>>> happen -- We don't want to confuse "The file isn't open yet" with "This
>>> error stems from a definition that wasn't defined in any file".
>> 
>> Yes, encoding both "poisoned source info not to be used with actual
>> errors" and "'fake' source info not pointing to a source file" as None
>> would be a mistake.
>> 
>
> :)
>
>>> (An earlier series tried to create an official dummy object, but it was
>>> tough to prove in review that it worked correctly without creating new
>>> regressions. This patch avoids trying to re-litigate that discussion.
>>>
>>> We would like to first prove that we never raise QAPISemError for any
>>> built-in object before we relent and add "special" info objects. We
>>> aren't ready to do that yet, so crashing is preferred.)
>>>
>>> So, how to solve this mess?
>>>
>>> Here's one way: Don't try to handle errors at a level with "mixed"
>>> semantic levels; i.e. don't try to handle inclusion errors (should
>>> report a source line where the include was triggered) with command line
>>> errors (where we specified a file we couldn't read).
>>>
>>> Simply remove the error handling from the initializer of the
>>> parser. Pythonic! Now it's the caller's job to figure out what to do
>>> about it. Handle the error in QAPISchemaParser._include() instead, where
>>> we do have the correct semantic context to not need to play games with
>>> the error message generation.
>>>
>>> Next, to re-gain a nice error at the top level, add a new try/except
>>> into qapi/main.generate(). Now the error looks sensible:
>> 
>> Missing "again" after "sensible" ;-P
>> 
>
> okayokayokayfine
>
>>>
>>>> python3 qapi-gen.py 'fake.json'
>>> qapi-gen.py: can't read schema file 'fake.json': No such file or directory
>>>
>>> Lastly, with this usage gone, we can remove the special type violation
>>> from QAPISourceInfo, and all is well with the world.
>>>
>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>> ---
>>>   scripts/qapi/main.py   |  8 +++++++-
>>>   scripts/qapi/parser.py | 18 +++++++++---------
>>>   scripts/qapi/source.py |  3 ---
>>>   3 files changed, 16 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/scripts/qapi/main.py b/scripts/qapi/main.py
>>> index 703e7ed1ed5..70f8aa86f37 100644
>>> --- a/scripts/qapi/main.py
>>> +++ b/scripts/qapi/main.py
>>> @@ -48,7 +48,13 @@ def generate(schema_file: str,
>>>       """
>>>       assert invalid_prefix_char(prefix) is None
>>>   
>>> -    schema = QAPISchema(schema_file)
>>> +    try:
>>> +        schema = QAPISchema(schema_file)
>>> +    except OSError as err:
>>> +        raise QAPIError(
>>> +            f"can't read schema file '{schema_file}': {err.strerror}"
>>> +        ) from err
>>> +
>>>       gen_types(schema, output_dir, prefix, builtins)
>>>       gen_visit(schema, output_dir, prefix, builtins)
>>>       gen_commands(schema, output_dir, prefix)
>>> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>>> index ca5e8e18e00..b378fa33807 100644
>>> --- a/scripts/qapi/parser.py
>>> +++ b/scripts/qapi/parser.py
>>> @@ -40,15 +40,9 @@ def __init__(self, fname, previously_included=None, incl_info=None):
>>>           previously_included = previously_included or set()
>>>           previously_included.add(os.path.abspath(fname))
>>>   
>>> -        try:
>>> -            fp = open(fname, 'r', encoding='utf-8')
>>> +        # Allow the caller to catch this error.
>> 
>> "this error"?  I understand what you mean now, but I'm not sure I will
>> in three months, when I won't have the context I have now.
>> 
>
> Yep, OK.
>
> # May raise OSError, allow the caller to handle it.

Okay.

>>> +        with open(fname, 'r', encoding='utf-8') as fp:
>>>               self.src = fp.read()
>>> -        except IOError as e:
>>> -            raise QAPISemError(incl_info or QAPISourceInfo(None, None, None),
>>> -                               "can't read %s file '%s': %s"
>>> -                               % ("include" if incl_info else "schema",
>>> -                                  fname,
>>> -                                  e.strerror))
>>>   
>>>           if self.src == '' or self.src[-1] != '\n':
>>>               self.src += '\n'
>>> @@ -129,7 +123,13 @@ def _include(self, include, info, incl_fname, previously_included):
>>>           if incl_abs_fname in previously_included:
>>>               return None
>>>   
>>> -        return QAPISchemaParser(incl_fname, previously_included, info)
>>> +        try:
>>> +            return QAPISchemaParser(incl_fname, previously_included, info)
>>> +        except OSError as err:
>>> +            raise QAPISemError(
>>> +                info,
>>> +                f"can't read include file '{incl_fname}': {err.strerror}"
>>> +            ) from err
>>>   
>>>       def _check_pragma_list_of_str(self, name, value, info):
>>>           if (not isinstance(value, list)
>> 
>> Before the patch, only IOError from open() and .read() gets converted to
>> QAPISemError, and therefore caught by main().
>> 
>> The patch widens this to anywhere in QAPISchemaParser.__init__().  Hmm.
>> 
>
> "Changed in version 3.3: EnvironmentError, IOError, WindowsError, 
> socket.error, select.error and mmap.error have been merged into OSError, 
> and the constructor may return a subclass."
>
>  >>> OSError == IOError
> True
>
> (No, I didn't know this before I wrote it. I just intentionally wanted 
> to catch everything that open() might return, which I had simply assumed 
> was not fully captured by IOError. Better to leave it as OSError now to 
> avoid misleading anyone into thinking it's more narrow than it really is.)

Good to know.

However, I was talking about the code covered by try ... except OSError
(or IOError, or whatever).  Before the patch, it's just open() and
.read().  Afterwards it's all of .__init__().

Could anything else in .__init__() possibly raise OSError?  Probably
not, but it's not trivially obvious.  Which makes me go "hmm."

"Hmm" isn't "no", it's just "hmm".
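
A minimal sketch of the scope difference, under loud assumptions:
SchemaError stands in for QAPISemError, and later_step() is a
hypothetical piece of later work in .__init__() that fails with OSError
for a reason unrelated to reading the file. Nothing here is the real
parser code.

```python
import errno
import os


class SchemaError(Exception):
    """Hypothetical stand-in for QAPISemError."""


def later_step(src):
    # Hypothetical later work in __init__() that happens to fail
    # with an OSError unrelated to reading the file.
    raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))


def init_narrow(read, fname):
    # Pre-patch shape: only the read is covered by the handler.
    try:
        src = read(fname)
    except OSError as err:
        raise SchemaError(
            f"can't read file '{fname}': {err.strerror}") from err
    return later_step(src)


def init_all(read, fname):
    # Stand-in for all of __init__(): the read plus everything after it.
    src = read(fname)
    return later_step(src)


def caller_broad(read, fname):
    # Post-patch shape: the caller's handler covers all of __init__().
    try:
        return init_all(read, fname)
    except OSError as err:
        raise SchemaError(
            f"can't read file '{fname}': {err.strerror}") from err
```

With a read() that succeeds, init_narrow() lets the unrelated OSError
through unchanged, while caller_broad() reports it as "can't read file".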

>>> diff --git a/scripts/qapi/source.py b/scripts/qapi/source.py
>>> index 03b6ede0828..1ade864d7b9 100644
>>> --- a/scripts/qapi/source.py
>>> +++ b/scripts/qapi/source.py
>>> @@ -10,7 +10,6 @@
>>>   # See the COPYING file in the top-level directory.
>>>   
>>>   import copy
>>> -import sys
>>>   from typing import List, Optional, TypeVar
>>>   
>>>   
>>> @@ -53,8 +52,6 @@ def next_line(self: T) -> T:
>>>           return info
>>>   
>>>       def loc(self) -> str:
>>> -        if self.fname is None:
>>> -            return sys.argv[0]
>>>           ret = self.fname
>>>           if self.line is not None:
>>>               ret += ':%d' % self.line
>> 
>> tests/qapi-schema/test-qapi.py also needs an update.  Before the patch:
>> 
>>      $ PYTHONPATH=scripts python3 tests/qapi-schema/test-qapi.py nonexistent
>>      tests/qapi-schema/test-qapi.py: can't read schema file 'nonexistent.json': No such file or directory
>> 
>> After:
>> 
>>      Traceback (most recent call last):
>>        File "tests/qapi-schema/test-qapi.py", line 207, in <module>
>>          main(sys.argv)
>>        File "tests/qapi-schema/test-qapi.py", line 201, in main
>>          status |= test_and_diff(test_name, dir_name, args.update)
>>        File "tests/qapi-schema/test-qapi.py", line 129, in test_and_diff
>>          test_frontend(os.path.join(dir_name, test_name + '.json'))
>>        File "tests/qapi-schema/test-qapi.py", line 109, in test_frontend
>>          schema = QAPISchema(fname)
>>        File "/work/armbru/qemu/scripts/qapi/schema.py", line 852, in __init__
>>          parser = QAPISchemaParser(fname)
>>        File "/work/armbru/qemu/scripts/qapi/parser.py", line 44, in __init__
>>          with open(fname, 'r', encoding='utf-8') as fp:
>>      FileNotFoundError: [Errno 2] No such file or directory: 'nonexistent.json'
>> 
>
> Probably something that should be added to the actual battery of tests 
> somehow, yeah? I can't prevent regressions in invocations that don't get 
> run O:-)

I can, by reviewing patches ;-P

You're welcome to contribute automated tests covering qapi-gen and
test-qapi invocations.  I never bothered automating this, and I wouldn't
bother now.

John Snow April 27, 2021, 5:58 p.m. UTC | #4

On 4/27/21 9:47 AM, Markus Armbruster wrote:
> John Snow <jsnow@redhat.com> writes:
> 
>> On 4/23/21 11:46 AM, Markus Armbruster wrote:
>>> John Snow <jsnow@redhat.com> writes:
>>>
>>>> The short-ish version of what motivates this patch is:
>>>>
>>>> - The parser initializer does not possess adequate context to write a
>>>>     good error message -- It tries to determine the caller's semantic
>>>>     context.
>>>
>>> I'm not sure I get what you're trying to say here.
>>>
>>
>> I mean: this __init__ method does not *know* who is calling it or why.
>> Of course, *we* do, because the code base is finite and nobody else but
>> us is calling into it.
>>
>> I mean to point out that the initializer has to do extra work (Just a
>> little) to determine what the calling context is and raise an error
>> accordingly.
>>
>> Example: If we have a parent info context, we raise an error in the
>> context of the caller. If we don't, we have to create a new presumed
>> context (using the weird None SourceInfo object).
> 
> I guess you mean
> 
>              raise QAPISemError(incl_info or QAPISourceInfo(None, None, None),
> 
> I can't see other instances of messing with context.
> 

Yes, and the string construction that follows, too. It's all about 
trying to understand who our caller is and raising an error appropriate 
for them on their behalf.

>> So I just mean to say:
>>
>> "Let the caller, who unambiguously always has the exactly correct
>> context worry about what the error message ought to be."
>>
>>>> - We don't want to allow QAPISourceInfo(None, None, None) to exist.
>>>> - Errors made using such an object are currently incorrect.
>>>> - It's not technically a semantic error if we cannot open the schema
>>>> - There are various typing constraints that make mixing these two cases
>>>>     undesirable for a single special case.
>>>
>>> These I understand.
>>>
>>>> - The current open block in parser's initializer will leak file
>>>>     pointers, because it isn't using a with statement.
>>>
>>> Uh, isn't the value returned by open() reference-counted?  @fp is the
>>> only reference...
>>>
>>
>> Yeah, eventually. O:-)
>>
>> Whenever the GC runs. OK, it's not really an apocalypse error, but it
>> felt strange to rewrite a try/except and then write it using bad hygiene
>> on purpose in the name of a more isolated commit.
> 
> I agree use of with is an improvement (it's idiomatic).  We shouldn't
> call it a leak fix, though.
> 

OK. I'll reword it.

>>>> Here's the details in why this got written the way it did, and why a few
>>>> disparate issues are rolled into one commit. (They're hard to fix
>>>> separately without writing really weird stuff that'd be harder to
>>>> review.)
>>>>
>>>> The error message string here is incorrect:
>>>>
>>>>> python3 qapi-gen.py 'fake.json'
>>>> qapi-gen.py: qapi-gen.py: can't read schema file 'fake.json': No such file or directory
>>>
>>> Regressed in commit 52a474180a "qapi-gen: Separate arg-parsing from
>>> generation" (v5.2.0).
>>>
>>
>> Mea Culpa. Didn't realize it wasn't tested, and I didn't realize at the
>> time that the two kinds of errors here were treated differently.
> 
> Our tests cover the schema language, not qapi-gen's CLI language.  The
> gap feels tolerable.
> 
>>> Before commit c615550df3 "qapi: Improve source file read error handling"
>>> (v4.2.0), it was differently bad (uncaught exception).
>>>
>>> Commit c615550df3 explains why the funny QAPISourceInfo exists:
>>>
>>>       Reporting open or read failure for the main schema file needs a
>>>       QAPISourceInfo representing "no source".  Make QAPISourceInfo cope
>>>       with fname=None.
>>>
>>
>> I am apparently not the first or the last person to dream of wanting a
>> QAPISourceInfo that represents "Actually, there's no source location!"
>>
>>> The commit turned QAPISourceInfo into the equivalent of a disjoint union
>>> of
>>>
>>> 1. A position in a source file (.fname is a str)
>>>
>>> 2. "Not in any source file" (.fname is None)
>>>
>>> This is somewhat similar to struct Location in C, which has
>>>
>>> 1. LOC_FILE: a position in a source file
>>>
>>> 2. LOC_CMDLINE: a range of command line arguments
>>>
>>> 3. LOC_NONE: no location information
>>>
>>> Abstracting locations this way lets error_report() do the right thing
>>> whether it's complaining about the command line, a monitor command, or a
>>> configuration file read with -readconfig.
>>>
>>> Your patch demonstrates that qapi-gen has much less need for abstracting
>>> sources: we use 2. "Not in any source file" only for reading the main
>>> schema file.
>>>
>>
>> Yes. I got the impression that you didn't want to pursue more abstract
>> QSI constructs based on earlier work, so going the other way and
>> *removing* them seemed like the faster way to achieve a clean type
>> system here.
>>
>>>> In pursuing it, we find that QAPISourceInfo has a special accommodation
>>>> for when there's no filename.
>>>
>>> Yes:
>>>
>>>       def loc(self) -> str:
>>> -->     if self.fname is None:
>>> -->         return sys.argv[0]
>>>           ret = self.fname
>>>           if self.line is not None:
>>>               ret += ':%d' % self.line
>>>           return ret
>>>
>>>>                                 Meanwhile, we intend to type info.fname as
>>>> str; something we always have.
>>>
>>> Do you mean "as non-optional str"?
>>>
>>
>> Yeah. I typed it originally as `str`, but the analyzer missed that we
>> check the field to see if it's None, which is misleading.
>>
>>>> To remove this, we need to not have a "fake" QAPISourceInfo object. We
>>>
>>> We may well want to, but I doubt we *need* to.  There are almost
>>> certainly other ways to fix the bug.  I don't see a need to explore
>>> them, though.
>>>
>>
>> Either we build out the fake QSI into a proper subtype, or we remove it
>> -- those are the two obvious options. Building it out is almost
>> certainly more work than this patch.
>>
>>>> also don't want to explicitly begin accommodating QAPISourceInfo being
>>>> None, because we actually want to eventually prove that this can never
>>>> happen -- We don't want to confuse "The file isn't open yet" with "This
>>>> error stems from a definition that wasn't defined in any file".
>>>
>>> Yes, encoding both "poisoned source info not to be used with actual
>>> errors" and "'fake' source info not pointing to a source file" as None
>>> would be a mistake.
>>>
>>
>> :)
>>
>>>> (An earlier series tried to create an official dummy object, but it was
>>>> tough to prove in review that it worked correctly without creating new
>>>> regressions. This patch avoids trying to re-litigate that discussion.
>>>>
>>>> We would like to first prove that we never raise QAPISemError for any
>>>> built-in object before we relent and add "special" info objects. We
>>>> aren't ready to do that yet, so crashing is preferred.)
>>>>
>>>> So, how to solve this mess?
>>>>
>>>> Here's one way: Don't try to handle errors at a level with "mixed"
>>>> semantic levels; i.e. don't try to handle inclusion errors (should
>>>> report a source line where the include was triggered) with command line
>>>> errors (where we specified a file we couldn't read).
>>>>
>>>> Simply remove the error handling from the initializer of the
>>>> parser. Pythonic! Now it's the caller's job to figure out what to do
>>>> about it. Handle the error in QAPISchemaParser._include() instead, where
>>>> we do have the correct semantic context to not need to play games with
>>>> the error message generation.
>>>>
>>>> Next, to re-gain a nice error at the top level, add a new try/except
>>>> into qapi/main.generate(). Now the error looks sensible:
>>>
>>> Missing "again" after "sensible" ;-P
>>>
>>
>> okayokayokayfine
>>
>>>>
>>>>> python3 qapi-gen.py 'fake.json'
>>>> qapi-gen.py: can't read schema file 'fake.json': No such file or directory
>>>>
>>>> Lastly, with this usage gone, we can remove the special type violation
>>>> from QAPISourceInfo, and all is well with the world.
>>>>
>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>> ---
>>>>    scripts/qapi/main.py   |  8 +++++++-
>>>>    scripts/qapi/parser.py | 18 +++++++++---------
>>>>    scripts/qapi/source.py |  3 ---
>>>>    3 files changed, 16 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/scripts/qapi/main.py b/scripts/qapi/main.py
>>>> index 703e7ed1ed5..70f8aa86f37 100644
>>>> --- a/scripts/qapi/main.py
>>>> +++ b/scripts/qapi/main.py
>>>> @@ -48,7 +48,13 @@ def generate(schema_file: str,
>>>>        """
>>>>        assert invalid_prefix_char(prefix) is None
>>>>    
>>>> -    schema = QAPISchema(schema_file)
>>>> +    try:
>>>> +        schema = QAPISchema(schema_file)
>>>> +    except OSError as err:
>>>> +        raise QAPIError(
>>>> +            f"can't read schema file '{schema_file}': {err.strerror}"
>>>> +        ) from err
>>>> +
>>>>        gen_types(schema, output_dir, prefix, builtins)
>>>>        gen_visit(schema, output_dir, prefix, builtins)
>>>>        gen_commands(schema, output_dir, prefix)
>>>> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>>>> index ca5e8e18e00..b378fa33807 100644
>>>> --- a/scripts/qapi/parser.py
>>>> +++ b/scripts/qapi/parser.py
>>>> @@ -40,15 +40,9 @@ def __init__(self, fname, previously_included=None, incl_info=None):
>>>>            previously_included = previously_included or set()
>>>>            previously_included.add(os.path.abspath(fname))
>>>>    
>>>> -        try:
>>>> -            fp = open(fname, 'r', encoding='utf-8')
>>>> +        # Allow the caller to catch this error.
>>>
>>> "this error"?  I understand what you mean now, but I'm not sure I will
>>> in three months, when I won't have the context I have now.
>>>
>>
>> Yep, OK.
>>
>> # May raise OSError, allow the caller to handle it.
> 
> Okay.
> 
>>>> +        with open(fname, 'r', encoding='utf-8') as fp:
>>>>                self.src = fp.read()
>>>> -        except IOError as e:
>>>> -            raise QAPISemError(incl_info or QAPISourceInfo(None, None, None),
>>>> -                               "can't read %s file '%s': %s"
>>>> -                               % ("include" if incl_info else "schema",
>>>> -                                  fname,
>>>> -                                  e.strerror))
>>>>    
>>>>            if self.src == '' or self.src[-1] != '\n':
>>>>                self.src += '\n'
>>>> @@ -129,7 +123,13 @@ def _include(self, include, info, incl_fname, previously_included):
>>>>            if incl_abs_fname in previously_included:
>>>>                return None
>>>>    
>>>> -        return QAPISchemaParser(incl_fname, previously_included, info)
>>>> +        try:
>>>> +            return QAPISchemaParser(incl_fname, previously_included, info)
>>>> +        except OSError as err:
>>>> +            raise QAPISemError(
>>>> +                info,
>>>> +                f"can't read include file '{incl_fname}': {err.strerror}"
>>>> +            ) from err
>>>>    
>>>>        def _check_pragma_list_of_str(self, name, value, info):
>>>>            if (not isinstance(value, list)
>>>
>>> Before the patch, only IOError from open() and .read() gets converted to
>>> QAPISemError, and therefore caught by main().
>>>
>>> The patch widens this to anywhere in QAPISchemaParser.__init__().  Hmm.
>>>
>>
>> "Changed in version 3.3: EnvironmentError, IOError, WindowsError,
>> socket.error, select.error and mmap.error have been merged into OSError,
>> and the constructor may return a subclass."
>>
>>   >>> OSError == IOError
>> True
>>
>> (No, I didn't know this before I wrote it. I just intentionally wanted
>> to catch everything that open() might return, which I had simply assumed
>> was not fully captured by IOError. Better to leave it as OSError now to
>> avoid misleading anyone into thinking it's more narrow than it really is.)
> 
> Good to know.
> 
> However, I was talking about the code covered by try ... except OSError
> (or IOError, or whatever).  Before the patch, it's just open() and
> .read().  Afterwards it's all of .__init__().
> 

Apologies, I misread.

> Could anything else in .__init__() possibly raise OSError?  Probably
> not, but it's not trivially obvious.  Which makes me go "hmm."
> 
> "Hmm" isn't "no", it's just "hmm".
> 

Yeah, it is rather broad. That is one of the perils of doing *so much* 
at init() time, in my opinion.

We don't make any other syscalls in the parser though, so it should be 
fine. The docstring patch later documents the errors we expect to see 
here, so it becomes a visible part of the interface.
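
Roughly what I have in mind, as a sketch only: read_source() and
describe_failure() are hypothetical names, and the docstring wording is
an assumption, not the text of the later docstring patch.

```python
def read_source(fname):
    """
    Read a schema file (hypothetical helper, not the real parser).

    :raise OSError: when the file cannot be opened or read.
    """
    # May raise OSError, allow the caller to handle it.
    with open(fname, 'r', encoding='utf-8') as fp:
        return fp.read()


def describe_failure(fname):
    # A caller that knows its own context words the message itself.
    try:
        read_source(fname)
    except OSError as err:
        return f"can't read schema file '{fname}': {err.strerror}"
    return None
```

Since IOError is an alias of OSError on Python 3.3 and later, catching
OSError also covers FileNotFoundError and the other subclasses open()
can raise.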

>>>> diff --git a/scripts/qapi/source.py b/scripts/qapi/source.py
>>>> index 03b6ede0828..1ade864d7b9 100644
>>>> --- a/scripts/qapi/source.py
>>>> +++ b/scripts/qapi/source.py
>>>> @@ -10,7 +10,6 @@
>>>>    # See the COPYING file in the top-level directory.
>>>>    
>>>>    import copy
>>>> -import sys
>>>>    from typing import List, Optional, TypeVar
>>>>    
>>>>    
>>>> @@ -53,8 +52,6 @@ def next_line(self: T) -> T:
>>>>            return info
>>>>    
>>>>        def loc(self) -> str:
>>>> -        if self.fname is None:
>>>> -            return sys.argv[0]
>>>>            ret = self.fname
>>>>            if self.line is not None:
>>>>                ret += ':%d' % self.line
>>>
>>> tests/qapi-schema/test-qapi.py also needs an update.  Before the patch:
>>>
>>>       $ PYTHONPATH=scripts python3 tests/qapi-schema/test-qapi.py nonexistent
>>>       tests/qapi-schema/test-qapi.py: can't read schema file 'nonexistent.json': No such file or directory
>>>
>>> After:
>>>
>>>       Traceback (most recent call last):
>>>         File "tests/qapi-schema/test-qapi.py", line 207, in <module>
>>>           main(sys.argv)
>>>         File "tests/qapi-schema/test-qapi.py", line 201, in main
>>>           status |= test_and_diff(test_name, dir_name, args.update)
>>>         File "tests/qapi-schema/test-qapi.py", line 129, in test_and_diff
>>>           test_frontend(os.path.join(dir_name, test_name + '.json'))
>>>         File "tests/qapi-schema/test-qapi.py", line 109, in test_frontend
>>>           schema = QAPISchema(fname)
>>>         File "/work/armbru/qemu/scripts/qapi/schema.py", line 852, in __init__
>>>           parser = QAPISchemaParser(fname)
>>>         File "/work/armbru/qemu/scripts/qapi/parser.py", line 44, in __init__
>>>           with open(fname, 'r', encoding='utf-8') as fp:
>>>       FileNotFoundError: [Errno 2] No such file or directory: 'nonexistent.json'
>>>
>>
>> Probably something that should be added to the actual battery of tests
>> somehow, yeah? I can't prevent regressions in invocations that don't get
>> run O:-)
> 
> I can, by reviewing patches ;-P
> 
> You're welcome to contribute automated tests covering qapi-gen and
> test-qapi invocations.  I never bothered automating this, and I wouldn't
> bother now.
> 

Yep, I am going to add a not-found test thanks to Paolo's help.

Markus Armbruster April 28, 2021, 5:48 a.m. UTC | #5

John Snow <jsnow@redhat.com> writes:

> On 4/27/21 9:47 AM, Markus Armbruster wrote:
>> John Snow <jsnow@redhat.com> writes:
>> 
>>> On 4/23/21 11:46 AM, Markus Armbruster wrote:
>>>> John Snow <jsnow@redhat.com> writes:
>>>>
>>>>> The short-ish version of what motivates this patch is:
>>>>>
>>>>> - The parser initializer does not possess adequate context to write a
>>>>>     good error message -- It tries to determine the caller's semantic
>>>>>     context.
>>>>
>>>> I'm not sure I get what you're trying to say here.
>>>>
>>>
>>> I mean: this __init__ method does not *know* who is calling it or why.
>>> Of course, *we* do, because the code base is finite and nobody else but
>>> us is calling into it.
>>>
>>> I mean to point out that the initializer has to do extra work (Just a
>>> little) to determine what the calling context is and raise an error
>>> accordingly.
>>>
>>> Example: If we have a parent info context, we raise an error in the
>>> context of the caller. If we don't, we have to create a new presumed
>>> context (using the weird None SourceInfo object).
>> 
>> I guess you mean
>> 
>>              raise QAPISemError(incl_info or QAPISourceInfo(None, None, None),
>> 
>> I can't see other instances of messing with context.
>> 
>
> Yes, and the string construction that follows, too. It's all about 
> trying to understand who our caller is and raising an error appropriate 
> for them on their behalf.

I guess you can view it that way.  I never did.  My thinking was

    @fname either comes from a schema file (@incl_info is not None) or
    somewhere else.  If schema file, make the exception's __str__()
    start with "SCHEMA-FILE:LINE: ", because that's how compilers report
    errors in source files.  Else, make it start with just "PROGNAME: ",
    because that's how compilers report errors unrelated to source
    files.

This assumes "incl_info is None implies unrelated to source file".  I
think that's fair.  I don't think it rises to the level of
"understanding who our caller is".
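
Spelled out as a sketch: error_prefix() and its progname parameter are
made up for illustration, and the body simply mirrors the pre-patch
loc() quoted earlier in this thread.

```python
from typing import Optional


def error_prefix(fname: Optional[str],
                 line: Optional[int],
                 progname: str) -> str:
    # No source file: report like an error unrelated to source files.
    if fname is None:
        return f"{progname}: "
    # Otherwise report SCHEMA-FILE:LINE, the way compilers do.
    loc = fname
    if line is not None:
        loc += ':%d' % line
    return f"{loc}: "
```

So error_prefix('a.json', 3, 'qapi-gen.py') gives "a.json:3: ", and
error_prefix(None, None, 'qapi-gen.py') gives "qapi-gen.py: ".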

>>> So I just mean to say:
>>>
>>> "Let the caller, who unambiguously always has the exactly correct
>>> context worry about what the error message ought to be."

[...]

>>>> Before the patch, only IOError from open() and .read() gets converted to
>>>> QAPISemError, and therefore caught by main().
>>>>
>>>> The patch widens this to anywhere in QAPISchemaParser.__init__().  Hmm.
>>>>
>>>
>>> "Changed in version 3.3: EnvironmentError, IOError, WindowsError,
>>> socket.error, select.error and mmap.error have been merged into OSError,
>>> and the constructor may return a subclass."
>>>
>>>   >>> OSError == IOError
>>> True
>>>
>>> (No, I didn't know this before I wrote it. I just intentionally wanted
>>> to catch everything that open() might return, which I had simply assumed
>>> was not fully captured by IOError. Better to leave it as OSError now to
>>> avoid misleading anyone into thinking it's more narrow than it really is.)
>> 
>> Good to know.
>> 
>> However, I was talking about the code covered by try ... except OSError
>> (or IOError, or whatever).  Before the patch, it's just open() and
>> .read().  Afterwards it's all of .__init__().
>> 
>
> Apologies, I misread.
>
>> Could anything else in .__init__() possibly raise OSError?  Probably
>> not, but it's not trivially obvious.  Which makes me go "hmm."
>> 
>> "Hmm" isn't "no", it's just "hmm".
>> 
>
> Yeah, it is rather broad. That is one of the perils of doing *so much* 
> at init() time, in my opinion.

It's not ideal, but then having to write something like

    parser = QAPISchemaParser(fname).parse()

or

    parser = QAPISchemaParser().parse(fname)

instead of just

    parser = QAPISchemaParser(fname)

would be less than ideal, too.
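
For concreteness, a sketch of the first form. TwoStepParser is a
hypothetical class, not a proposal for parser.py; it only shows where
the I/O would move to.

```python
class TwoStepParser:
    """Hypothetical parser that does no I/O in __init__()."""

    def __init__(self, fname):
        self.fname = fname
        self.src = None

    def parse(self):
        # All file access happens here, so a caller can wrap just
        # this call in try ... except OSError.
        with open(self.fname, 'r', encoding='utf-8') as fp:
            self.src = fp.read()
        if self.src == '' or self.src[-1] != '\n':
            self.src += '\n'
        # Returning self allows TwoStepParser(fname).parse().
        return self
```

Constructing the object can then no longer raise OSError, at the price
of the extra .parse() at every call site.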

> We don't make any other syscalls in the parser though, so it should be 
> fine. The docstring patch later documents the errors we expect to see 
> here, so it becomes a visible part of the interface.

[...]
