diff mbox

[OSSTEST,v13,19/24] TestSupport: Implement target_subunit_cmd a subunit stream parser into substeps

Message ID 20170725115759.21895-20-anthony.perard@citrix.com (mailing list archive)
State New, archived
Headers show

Commit Message

Anthony PERARD July 25, 2017, 11:57 a.m. UTC
target_subunit_cmd can be used like target_cmd, but the command would
needs to output a subunit v1 stream, which will be parsed and turned
into osstest substeps. The command can be `| subunit-2to1` in order to
turn a subunit v2 stream into v1.

Currently, time is not taken into account, and all substeps will have
bogus timestamp as the output of the command is parsed after it has
runned.

This is a description of the subunit v1 protocol, taken from
python-subunit README, or https://pypi.python.org/pypi/python-subunit

test|testing|test:|testing: test LABEL
success|success:|successful|successful: test LABEL
success|success:|successful|successful: test LABEL DETAILS
failure: test LABEL
failure: test LABEL DETAILS
error: test LABEL
error: test LABEL DETAILS
skip[:] test LABEL
skip[:] test LABEL DETAILS
xfail[:] test LABEL
xfail[:] test LABEL DETAILS
uxsuccess[:] test LABEL
uxsuccess[:] test LABEL DETAILS
progress: [+|-]X
progress: push
progress: pop
tags: [-]TAG ...
time: YYYY-MM-DD HH:MM:SSZ

LABEL: UTF8*
NAME: UTF8*
DETAILS ::= BRACKETED | MULTIPART
BRACKETED ::= '[' CR UTF8-lines ']' CR
MULTIPART ::= '[ multipart' CR PART* ']' CR
PART ::= PART_TYPE CR NAME CR PART_BYTES CR
PART_TYPE ::= Content-Type: type/sub-type(;parameter=value,parameter=value)
PART_BYTES ::= (DIGITS CR LF BYTE{DIGITS})* '0' CR LF

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
---

Notes:
    Changes in v13:
    - also parse multipart output
    - add every possible test result
    - use target_cmd_stashed

 Osstest/TestSupport.pm | 117 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 116 insertions(+), 1 deletion(-)

Comments

Ian Jackson July 25, 2017, 6 p.m. UTC | #1
Anthony PERARD writes ("[OSSTEST PATCH v13 19/24] TestSupport: Implement target_subunit_cmd a subunit stream parser into substeps"):
> target_subunit_cmd can be used like target_cmd, but the command would
> needs to output a subunit v1 stream, which will be parsed and turned
> into osstest substeps. The command can be `| subunit-2to1` in order to
> turn a subunit v2 stream into v1.
> 
> Currently, time is not taken into account, and all substeps will have
> bogus timestamp as the output of the command is parsed after it has
> runned.
> 
> This is a description of the subunit v1 protocol, taken from
> python-subunit README, or https://pypi.python.org/pypi/python-subunit

What a lot of code!

> +    while (<$stdout>) {
> +        if (/^time: (\d+)-(\d+)-(\d+) (\d+):(\d+):(\d+)(\.\d+)?Z$/) {
> +            # This is the timestamp for the next events

I'm not sure what your ( ) are doing here.

> +        } elsif (/^test(?:ing)?:? (.+)\n/) {
> +            # Start of a new test.
> +            $logfilename = subunit_sanitize_testname($1) . '.log';
> +            $fh = open_unique_stashfile(\$logfilename);

This name might clash with existing logfile names, which might be
generated later.  Can you put "subunit-" on the front maybe ?

> +            substep_start(subunit_sanitize_testname($1), $logfilename);

And here, I think you should start the parameter you pass to
substep_start with '/' so that it gets appended to the testid for the
whole script, for a similar reason.

I think it would be better to call subunit_sanitize_testname only
once.

> +        } elsif (/^(success(?:ful)?|failure|skip|error|xfail|uxsuccess):
> +                   \ (.+?)(\ \[(\ multipart)?)?$/x) {
> +            # Result of a test, with its output.
> +            my $result = $1;
> +            my $testname = $2;
> +            my $have_details = $3;
> +            my $is_multipart = $4;

I would normally write this:
               my ($result, $testname, $have_... ) = ($1,$2,$3,$4,$5)
although I don't really mind much that you have written it as you
have.

> +            if ($have_details) {
> +                if ($is_multipart) {
> +                    # Test output
> +                    while (<$stdout>) {
> +                        # part content-type
> +                        # from https://tools.ietf.org/html/rfc6838#section-4.2
> +                        my $restricted_name = qr'[a-zA-Z0-9][a-zA-Z0-9!#$&^_.+-]*';
> +                        if (m{ ^Content-Type:\s+
> +                                $restricted_name/$restricted_name # type/sub-type
> +                                # parameters
> +                                (?:\s*;\s*$restricted_name=[^,]+
> +                                  (?:,\s*$restricted_name=[^,]+)*)
> +                                \s*$
> +                            }xi) {

I don't understand why you are trying to match this Content-Type so
precisely.  AFAICT from the grammar, all you need to do is see whether
there is something vaguely like a c-t header.

> +                            print $fh $_ or die $!;
> +
> +                            # part name
> +                            my $line = <$stdout>;
> +                            print $fh $line or die $!;
> +
> +                            # Read chunks of a part
> +                            while (<$stdout>) {
> +                                if (/^([0-9A-F]+)\r$/i) {
> +                                    my $chunk_size = hex($1);

What makes you think the digits are in hex ?

Since you have to go to the effort of separating out all of this
stuff, it might be worth printing these multipart objects with one
object per logfile.  Although I won't insist on that because I suspect
that multipart results are rare.

> +                                } else {
> +                                    # Unexpected output
> +                                    chomp;
> +                                    logm("*** $_");

I guess the error recovery is to continue until you see "]"
and hope.  Fair enough.

Thanks,
Ian.
Anthony PERARD July 26, 2017, 3:03 p.m. UTC | #2
On Tue, Jul 25, 2017 at 07:00:47PM +0100, Ian Jackson wrote:
> Anthony PERARD writes ("[OSSTEST PATCH v13 19/24] TestSupport: Implement target_subunit_cmd a subunit stream parser into substeps"):
> > target_subunit_cmd can be used like target_cmd, but the command would
> > needs to output a subunit v1 stream, which will be parsed and turned
> > into osstest substeps. The command can be `| subunit-2to1` in order to
> > turn a subunit v2 stream into v1.
> > 
> > Currently, time is not taken into account, and all substeps will have
> > bogus timestamp as the output of the command is parsed after it has
> > runned.
> > 
> > This is a description of the subunit v1 protocol, taken from
> > python-subunit README, or https://pypi.python.org/pypi/python-subunit
> 
> What a lot of code!
> 
> > +    while (<$stdout>) {
> > +        if (/^time: (\d+)-(\d+)-(\d+) (\d+):(\d+):(\d+)(\.\d+)?Z$/) {
> > +            # This is the timestamp for the next events
> 
> I'm not sure what your ( ) are doing here.

No real reason. I'll remove them.

> > +        } elsif (/^test(?:ing)?:? (.+)\n/) {
> > +            # Start of a new test.
> > +            $logfilename = subunit_sanitize_testname($1) . '.log';
> > +            $fh = open_unique_stashfile(\$logfilename);
> 
> This name might clash with existing logfile names, which might be
> generated later.  Can you put "subunit-" on the front maybe ?

Will do.

> > +            substep_start(subunit_sanitize_testname($1), $logfilename);
> 
> And here, I think you should start the parameter you pass to
> substep_start with '/' so that it gets appended to the testid for the
> whole script, for a similar reason.

OK.

> I think it would be better to call subunit_sanitize_testname only
> once.

OK.

> > +        } elsif (/^(success(?:ful)?|failure|skip|error|xfail|uxsuccess):
> > +                   \ (.+?)(\ \[(\ multipart)?)?$/x) {
> > +            # Result of a test, with its output.
> > +            my $result = $1;
> > +            my $testname = $2;
> > +            my $have_details = $3;
> > +            my $is_multipart = $4;
> 
> I would normally write this:
>                my ($result, $testname, $have_... ) = ($1,$2,$3,$4,$5)
> although I don't really mind much that you have written it as you
> have.
> 
> > +            if ($have_details) {
> > +                if ($is_multipart) {
> > +                    # Test output
> > +                    while (<$stdout>) {
> > +                        # part content-type
> > +                        # from https://tools.ietf.org/html/rfc6838#section-4.2
> > +                        my $restricted_name = qr'[a-zA-Z0-9][a-zA-Z0-9!#$&^_.+-]*';
> > +                        if (m{ ^Content-Type:\s+
> > +                                $restricted_name/$restricted_name # type/sub-type
> > +                                # parameters
> > +                                (?:\s*;\s*$restricted_name=[^,]+
> > +                                  (?:,\s*$restricted_name=[^,]+)*)
> > +                                \s*$
> > +                            }xi) {
> 
> I don't understand why you are trying to match this Content-Type so
> precisely.  AFAICT from the grammar, all you need to do is see whether
> there is something vaguely like a c-t header.

I think I start by looking at what kind of characters could be part of
type and sub-type, and just start writing a more complicated regex.

So is the following would be enough for you?
m{^Content-Type: [^/ ]+/[^/ ]+(?:;.+)?$}


> > +                            print $fh $_ or die $!;
> > +
> > +                            # part name
> > +                            my $line = <$stdout>;
> > +                            print $fh $line or die $!;
> > +
> > +                            # Read chunks of a part
> > +                            while (<$stdout>) {
> > +                                if (/^([0-9A-F]+)\r$/i) {
> > +                                    my $chunk_size = hex($1);
> 
> What makes you think the digits are in hex ?

I tried with [0-9] (because DIGITS), but that was not enought. Then I've
check the subunit implementation, there are using "%X" which is hex.

> Since you have to go to the effort of separating out all of this
> stuff, it might be worth printing these multipart objects with one
> object per logfile.  Although I won't insist on that because I suspect
> that multipart results are rare.

There are usually 3 part per tests, with those names:
  pythonlogging:''
  stdout
  stderr
And sometime, there is also one name 'traceback'.
I think stdout and stderr are usually empty.

I think having one file per part will make it more complicated to
read logs of a failed test.

> > +                                } else {
> > +                                    # Unexpected output
> > +                                    chomp;
> > +                                    logm("*** $_");
> 
> I guess the error recovery is to continue until you see "]"
> and hope.  Fair enough.

That one of the reason for the subunit-v2, with a binary protocol,
better recovery.
Ian Jackson July 26, 2017, 3:41 p.m. UTC | #3
Anthony PERARD writes ("Re: [OSSTEST PATCH v13 19/24] TestSupport: Implement target_subunit_cmd a subunit stream parser into substeps"):
> I think I start by looking at what kind of characters could be part of
> type and sub-type, and just start writing a more complicated regex.
> 
> So is the following would be enough for you?
> m{^Content-Type: [^/ ]+/[^/ ]+(?:;.+)?$}

Why do you need to check the at all ?  I think, according to the spec,
that the only thing which can occur here is "Content-Type: something"
or "]".  What would be wrong with
   m{^content-type:}i
?

> > > +                            # Read chunks of a part
> > > +                            while (<$stdout>) {
> > > +                                if (/^([0-9A-F]+)\r$/i) {
> > > +                                    my $chunk_size = hex($1);
> > 
> > What makes you think the digits are in hex ?
> 
> I tried with [0-9] (because DIGITS), but that was not enought. Then I've
> check the subunit implementation, there are using "%X" which is hex.

Wow.  Can you put a comment next to this please ?  Something like

 # The chunk size is in hex, even though this does not seem to be
 # documented in the subunit specification.

perhaps.

> > Since you have to go to the effort of separating out all of this
> > stuff, it might be worth printing these multipart objects with one
> > object per logfile.  Although I won't insist on that because I suspect
> > that multipart results are rare.
> 
> There are usually 3 part per tests, with those names:
>   pythonlogging:''
>   stdout
>   stderr
> And sometime, there is also one name 'traceback'.
> I think stdout and stderr are usually empty.
> 
> I think having one file per part will make it more complicated to
> read logs of a failed test.

OK, leave it as-is then.  (Also, "pythonlogging:''" ?!)

> > I guess the error recovery is to continue until you see "]"
> > and hope.  Fair enough.
> 
> That one of the reason for the subunit-v2, with a binary protocol,
> better recovery.

I don't think that's a good reason.  But this ranting is quite
off-topic now :-).

Ian.
Anthony PERARD July 27, 2017, 2:42 p.m. UTC | #4
On Wed, Jul 26, 2017 at 04:41:58PM +0100, Ian Jackson wrote:
> Anthony PERARD writes ("Re: [OSSTEST PATCH v13 19/24] TestSupport: Implement target_subunit_cmd a subunit stream parser into substeps"):
> > I think I start by looking at what kind of characters could be part of
> > type and sub-type, and just start writing a more complicated regex.
> > 
> > So is the following would be enough for you?
> > m{^Content-Type: [^/ ]+/[^/ ]+(?:;.+)?$}
> 
> Why do you need to check the at all ?  I think, according to the spec,
> that the only thing which can occur here is "Content-Type: something"
> or "]".  What would be wrong with
>    m{^content-type:}i
> ?

Ok, will go with that.

> > > > +                            # Read chunks of a part
> > > > +                            while (<$stdout>) {
> > > > +                                if (/^([0-9A-F]+)\r$/i) {
> > > > +                                    my $chunk_size = hex($1);
> > > 
> > > What makes you think the digits are in hex ?
> > 
> > I tried with [0-9] (because DIGITS), but that was not enought. Then I've
> > check the subunit implementation, there are using "%X" which is hex.
> 
> Wow.  Can you put a comment next to this please ?  Something like
> 
>  # The chunk size is in hex, even though this does not seem to be
>  # documented in the subunit specification.
> 
> perhaps.

I don't think there is a specification for subunit, beside the source
code. In the README of the project, it is called a "description", and
there is even "When in doubt, refer the source".

But I guess I can add the comment.
Anthony PERARD July 27, 2017, 4:12 p.m. UTC | #5
On Tue, Jul 25, 2017 at 07:00:47PM +0100, Ian Jackson wrote:
> Anthony PERARD writes ("[OSSTEST PATCH v13 19/24] TestSupport: Implement target_subunit_cmd a subunit stream parser into substeps"):
> > +                            # Read chunks of a part
> > +                            while (<$stdout>) {
> > +                                if (/^([0-9A-F]+)\r$/i) {
> > +                                    my $chunk_size = hex($1);
[...]
> > +                                } else {
> > +                                    # Unexpected output
> > +                                    chomp;
> > +                                    logm("*** $_");
> 
> I guess the error recovery is to continue until you see "]"
> and hope.  Fair enough.

Actually, in the chunk parser, the only expected output is an hex number
for a chunk size. There is nothing that parse "]". So the only error
recovery would be to find a line that only contain a hex number, this
could be the next "0\r\n" if part ends with a "\n", which I think is
likely.

Only the loop that parse the multipart check for "]".

Maybe I could end the chunks parser loop and go back to the multipart
parser loop.
diff mbox

Patch

diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm
index 135289b..ba55967 100644
--- a/Osstest/TestSupport.pm
+++ b/Osstest/TestSupport.pm
@@ -55,7 +55,7 @@  BEGIN {
 
                       target_cmd_root_status target_cmd_output_root_status
                       target_cmd_root target_cmd target_cmd_build
-                      target_cmd_stashed
+                      target_cmd_stashed target_subunit_cmd
                       target_cmd_output_root target_cmd_output
                       target_cmd_inputfh_root sshuho
                       target_getfile target_getfile_root
@@ -775,6 +775,121 @@  sub target_cmd_stashed ($$$;$$) {
     return "$stash/$$leafref";
 }
 
+sub subunit_result_to_osstest_result ($) {
+    my ($ret) = @_;
+    return "pass" if $ret eq "success" or $ret eq "successful";
+    return "fail" if $ret eq "failure";
+    return "skip" if $ret eq "skip";
+    return "fail" if $ret eq "error";
+    # expected failure
+    return "pass" if $ret eq "xfail";
+    # unexpected success
+    return "fail" if $ret eq "uxsuccess";
+    die "subunit_result_to_osstest_result unexpected result $ret";
+}
+sub subunit_sanitize_testname ($) {
+    my ($testname) = @_;
+    $testname =~ s'[^_.()\[\]/~0-9a-zA-Z-]'_'g;
+    return $testname;
+}
+
+# Like target_cmd, but parse the command output as a subunit v1 stream and make
+# a substep out of each subunit test.
+sub target_subunit_cmd ($$;$$) {
+    my ($tho,$tcmd,$timeout,$extrasshopts) = @_;
+    my $filename = "subunit-output";
+    my $path = target_cmd_stashed($tho, \$filename, $tcmd, $timeout,
+        $extrasshopts);
+
+    open my $stdout, "$path" or die "$path: $!";
+
+    my $logfilename = undef;
+    my $fh = undef;
+
+    while (<$stdout>) {
+        if (/^time: (\d+)-(\d+)-(\d+) (\d+):(\d+):(\d+)(\.\d+)?Z$/) {
+            # This is the timestamp for the next events
+        } elsif (/^test(?:ing)?:? (.+)\n/) {
+            # Start of a new test.
+            $logfilename = subunit_sanitize_testname($1) . '.log';
+            $fh = open_unique_stashfile(\$logfilename);
+            substep_start(subunit_sanitize_testname($1), $logfilename);
+        } elsif (/^(success(?:ful)?|failure|skip|error|xfail|uxsuccess):
+                   \ (.+?)(\ \[(\ multipart)?)?$/x) {
+            # Result of a test, with its output.
+            my $result = $1;
+            my $testname = $2;
+            my $have_details = $3;
+            my $is_multipart = $4;
+
+            if ($have_details) {
+                if ($is_multipart) {
+                    # Test output
+                    while (<$stdout>) {
+                        # part content-type
+                        # from https://tools.ietf.org/html/rfc6838#section-4.2
+                        my $restricted_name = qr'[a-zA-Z0-9][a-zA-Z0-9!#$&^_.+-]*';
+                        if (m{ ^Content-Type:\s+
+                                $restricted_name/$restricted_name # type/sub-type
+                                # parameters
+                                (?:\s*;\s*$restricted_name=[^,]+
+                                  (?:,\s*$restricted_name=[^,]+)*)
+                                \s*$
+                            }xi) {
+                            print $fh $_ or die $!;
+
+                            # part name
+                            my $line = <$stdout>;
+                            print $fh $line or die $!;
+
+                            # Read chunks of a part
+                            while (<$stdout>) {
+                                if (/^([0-9A-F]+)\r$/i) {
+                                    my $chunk_size = hex($1);
+                                    my $chunk;
+
+                                    last if $chunk_size == 0;
+                                    read $stdout, $chunk, $chunk_size;
+                                    print $fh $chunk or die $!;
+                                } else {
+                                    # Unexpected output
+                                    chomp;
+                                    logm("*** $_");
+                                }
+                            }
+                        } elsif (/^\]$/) {
+                            last;
+                        } else {
+                            # Unexpected output
+                            chomp;
+                            logm("*** $_");
+                        }
+                    }
+                } else {
+                    # Simple non-multipart test output.
+                    while (<$stdout>) {
+                        last if (/^\]$/);
+                        print $fh $_ or die $!;
+                    }
+                }
+            }
+            close $fh or die $!;
+            substep_finish(subunit_sanitize_testname($testname),
+                subunit_result_to_osstest_result($result));
+        } elsif (/^tags: .+/) {
+            # unused
+        } elsif (/^progress: (?:[+-]?\d+|push|pop)$/) {
+            # unused
+        } else {
+            # Unexpected output
+            chomp;
+            logm("*** $_");
+        }
+    }
+
+    close $stdout or die $!;
+}
+
 sub poll_loop ($$$&) {
     my ($maxwait, $interval, $what, $code) = @_;
     # $code should return undef when all is well