diff mbox series

[v2] kunit: tool: continue past invalid utf-8 output

Message ID 20211020232121.1748376-1-dlatypov@google.com (mailing list archive)
State Accepted
Commit 2ab5d5e67f7ab2d2ecf67b8855ac65691f4e4b4d
Delegated to: Brendan Higgins
Headers show
Series [v2] kunit: tool: continue past invalid utf-8 output | expand

Commit Message

Daniel Latypov Oct. 20, 2021, 11:21 p.m. UTC
kunit.py currently crashes and fails to parse kernel output if it's not
fully valid utf-8.

This can come from memory corruption or or just inadvertently printing
out binary data as strings.

E.g. adding this line into a kunit test
  pr_info("\x80")
will cause this exception
  UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 1961: invalid start byte

We can tell Python how to handle errors, see
https://docs.python.org/3/library/codecs.html#error-handlers

Unfortunately, it doesn't seem like there's a way to specify this in
just one location, so we need to repeat ourselves quite a bit.

Specify `errors='backslashreplace'` so we instead:
* print out the offending byte as '\x80'
* try and continue parsing the output.
  * as long as the TAP lines themselves are valid, we're fine.

Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
---
v1 -> v2: add comment to silence erroneous pytype error
---
 tools/testing/kunit/kunit.py        | 3 ++-
 tools/testing/kunit/kunit_kernel.py | 4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)


base-commit: 63b136c634a2bdffd78795bc33ac2d488152ffe8

Comments

David Gow Oct. 21, 2021, 2:32 a.m. UTC | #1
On Thu, Oct 21, 2021 at 7:21 AM 'Daniel Latypov' via KUnit Development
<kunit-dev@googlegroups.com> wrote:
>
> kunit.py currently crashes and fails to parse kernel output if it's not
> fully valid utf-8.
>
> This can come from memory corruption or or just inadvertently printing
> out binary data as strings.
>
> E.g. adding this line into a kunit test
>   pr_info("\x80")
> will cause this exception
>   UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 1961: invalid start byte
>
> We can tell Python how to handle errors, see
> https://docs.python.org/3/library/codecs.html#error-handlers
>
> Unfortunately, it doesn't seem like there's a way to specify this in
> just one location, so we need to repeat ourselves quite a bit.
>
> Specify `errors='backslashreplace'` so we instead:
> * print out the offending byte as '\x80'
> * try and continue parsing the output.
>   * as long as the TAP lines themselves are valid, we're fine.
>
> Signed-off-by: Daniel Latypov <dlatypov@google.com>
> Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
> ---
> v1 -> v2: add comment to silence erroneous pytype error
> ---

Thanks. I've tested this, and it works well for me. I don't mind the
pytype comment, even though I don't use pytype, so I'm glad it's
there.

Tested-by: David Gow <davidgow@google.com>

Cheers,
-- David

>  tools/testing/kunit/kunit.py        | 3 ++-
>  tools/testing/kunit/kunit_kernel.py | 4 ++--
>  2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
> index e1dd3180f0d1..68e6f461c758 100755
> --- a/tools/testing/kunit/kunit.py
> +++ b/tools/testing/kunit/kunit.py
> @@ -477,9 +477,10 @@ def main(argv, linux=None):
>                         sys.exit(1)
>         elif cli_args.subcommand == 'parse':
>                 if cli_args.file == None:
> +                       sys.stdin.reconfigure(errors='backslashreplace')  # pytype: disable=attribute-error
>                         kunit_output = sys.stdin
>                 else:
> -                       with open(cli_args.file, 'r') as f:
> +                       with open(cli_args.file, 'r', errors='backslashreplace') as f:
>                                 kunit_output = f.read().splitlines()
>                 request = KunitParseRequest(cli_args.raw_output,
>                                             None,
> diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
> index faa6320e900e..f08c6c36a947 100644
> --- a/tools/testing/kunit/kunit_kernel.py
> +++ b/tools/testing/kunit/kunit_kernel.py
> @@ -135,7 +135,7 @@ class LinuxSourceTreeOperationsQemu(LinuxSourceTreeOperations):
>                                            stdin=subprocess.PIPE,
>                                            stdout=subprocess.PIPE,
>                                            stderr=subprocess.STDOUT,
> -                                          text=True, shell=True)
> +                                          text=True, shell=True, errors='backslashreplace')
>
>  class LinuxSourceTreeOperationsUml(LinuxSourceTreeOperations):
>         """An abstraction over command line operations performed on a source tree."""
> @@ -172,7 +172,7 @@ class LinuxSourceTreeOperationsUml(LinuxSourceTreeOperations):
>                                            stdin=subprocess.PIPE,
>                                            stdout=subprocess.PIPE,
>                                            stderr=subprocess.STDOUT,
> -                                          text=True)
> +                                          text=True, errors='backslashreplace')
>
>  def get_kconfig_path(build_dir) -> str:
>         return get_file_path(build_dir, KCONFIG_PATH)
>
> base-commit: 63b136c634a2bdffd78795bc33ac2d488152ffe8
> --
> 2.33.0.1079.g6e70778dc9-goog
>
> --
> You received this message because you are subscribed to the Google Groups "KUnit Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to kunit-dev+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/kunit-dev/20211020232121.1748376-1-dlatypov%40google.com.
diff mbox series

Patch

diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index e1dd3180f0d1..68e6f461c758 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -477,9 +477,10 @@  def main(argv, linux=None):
 			sys.exit(1)
 	elif cli_args.subcommand == 'parse':
 		if cli_args.file == None:
+			sys.stdin.reconfigure(errors='backslashreplace')  # pytype: disable=attribute-error
 			kunit_output = sys.stdin
 		else:
-			with open(cli_args.file, 'r') as f:
+			with open(cli_args.file, 'r', errors='backslashreplace') as f:
 				kunit_output = f.read().splitlines()
 		request = KunitParseRequest(cli_args.raw_output,
 					    None,
diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
index faa6320e900e..f08c6c36a947 100644
--- a/tools/testing/kunit/kunit_kernel.py
+++ b/tools/testing/kunit/kunit_kernel.py
@@ -135,7 +135,7 @@  class LinuxSourceTreeOperationsQemu(LinuxSourceTreeOperations):
 					   stdin=subprocess.PIPE,
 					   stdout=subprocess.PIPE,
 					   stderr=subprocess.STDOUT,
-					   text=True, shell=True)
+					   text=True, shell=True, errors='backslashreplace')
 
 class LinuxSourceTreeOperationsUml(LinuxSourceTreeOperations):
 	"""An abstraction over command line operations performed on a source tree."""
@@ -172,7 +172,7 @@  class LinuxSourceTreeOperationsUml(LinuxSourceTreeOperations):
 					   stdin=subprocess.PIPE,
 					   stdout=subprocess.PIPE,
 					   stderr=subprocess.STDOUT,
-					   text=True)
+					   text=True, errors='backslashreplace')
 
 def get_kconfig_path(build_dir) -> str:
 	return get_file_path(build_dir, KCONFIG_PATH)