Message ID | 20170725115759.21895-20-anthony.perard@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Anthony PERARD writes ("[OSSTEST PATCH v13 19/24] TestSupport: Implement target_subunit_cmd a subunit stream parser into substeps"): > target_subunit_cmd can be used like target_cmd, but the command would > needs to output a subunit v1 stream, which will be parsed and turned > into osstest substeps. The command can be `| subunit-2to1` in order to > turn a subunit v2 stream into v1. > > Currently, time is not taken into account, and all substeps will have > bogus timestamp as the output of the command is parsed after it has > runned. > > This is a description of the subunit v1 protocol, taken from > python-subunit README, or https://pypi.python.org/pypi/python-subunit What a lot of code! > + while (<$stdout>) { > + if (/^time: (\d+)-(\d+)-(\d+) (\d+):(\d+):(\d+)(\.\d+)?Z$/) { > + # This is the timestamp for the next events I'm not sure what your ( ) are doing here. > + } elsif (/^test(?:ing)?:? (.+)\n/) { > + # Start of a new test. > + $logfilename = subunit_sanitize_testname($1) . '.log'; > + $fh = open_unique_stashfile(\$logfilename); This name might clash with existing logfile names, which might be generated later. Can you put "subunit-" on the front maybe ? > + substep_start(subunit_sanitize_testname($1), $logfilename); And here, I think you should start the parameter you pass to substep_start with '/' so that it gets appended to the testid for the whole script, for a similar reason. I think it would be better to call subunit_sanitize_testname only once. > + } elsif (/^(success(?:ful)?|failure|skip|error|xfail|uxsuccess): > + \ (.+?)(\ \[(\ multipart)?)?$/x) { > + # Result of a test, with its output. > + my $result = $1; > + my $testname = $2; > + my $have_details = $3; > + my $is_multipart = $4; I would normally write this: my ($result, $testname, $have_... ) = ($1,$2,$3,$4,$5) although I don't really mind much that you have written it as you have. > + if ($have_details) { > + if ($is_multipart) { > + # Test output > + while (<$stdout>) { > + # part content-type > + # from https://tools.ietf.org/html/rfc6838#section-4.2 > + my $restricted_name = qr'[a-zA-Z0-9][a-zA-Z0-9!#$&^_.+-]*'; > + if (m{ ^Content-Type:\s+ > + $restricted_name/$restricted_name # type/sub-type > + # parameters > + (?:\s*;\s*$restricted_name=[^,]+ > + (?:,\s*$restricted_name=[^,]+)*) > + \s*$ > + }xi) { I don't understand why you are trying to match this Content-Type so precisely. AFAICT from the grammar, all you need to do is see whether there is something vaguely like a c-t header. > + print $fh $_ or die $!; > + > + # part name > + my $line = <$stdout>; > + print $fh $line or die $!; > + > + # Read chunks of a part > + while (<$stdout>) { > + if (/^([0-9A-F]+)\r$/i) { > + my $chunk_size = hex($1); What makes you think the digits are in hex ? Since you have to go to the effort of separating out all of this stuff, it might be worth printing these multipart objects with one object per logfile. Although I won't insist on that because I suspect that multipart results are rare. > + } else { > + # Unexpected output > + chomp; > + logm("*** $_"); I guess the error recovery is to continue until you see "]" and hope. Fair enough. Thanks, Ian.
On Tue, Jul 25, 2017 at 07:00:47PM +0100, Ian Jackson wrote: > Anthony PERARD writes ("[OSSTEST PATCH v13 19/24] TestSupport: Implement target_subunit_cmd a subunit stream parser into substeps"): > > target_subunit_cmd can be used like target_cmd, but the command would > > needs to output a subunit v1 stream, which will be parsed and turned > > into osstest substeps. The command can be `| subunit-2to1` in order to > > turn a subunit v2 stream into v1. > > > > Currently, time is not taken into account, and all substeps will have > > bogus timestamp as the output of the command is parsed after it has > > runned. > > > > This is a description of the subunit v1 protocol, taken from > > python-subunit README, or https://pypi.python.org/pypi/python-subunit > > What a lot of code! > > > + while (<$stdout>) { > > + if (/^time: (\d+)-(\d+)-(\d+) (\d+):(\d+):(\d+)(\.\d+)?Z$/) { > > + # This is the timestamp for the next events > > I'm not sure what your ( ) are doing here. No real reason. I'll remove them. > > + } elsif (/^test(?:ing)?:? (.+)\n/) { > > + # Start of a new test. > > + $logfilename = subunit_sanitize_testname($1) . '.log'; > > + $fh = open_unique_stashfile(\$logfilename); > > This name might clash with existing logfile names, which might be > generated later. Can you put "subunit-" on the front maybe ? Will do. > > + substep_start(subunit_sanitize_testname($1), $logfilename); > > And here, I think you should start the parameter you pass to > substep_start with '/' so that it gets appended to the testid for the > whole script, for a similar reason. OK. > I think it would be better to call subunit_sanitize_testname only > once. OK. > > + } elsif (/^(success(?:ful)?|failure|skip|error|xfail|uxsuccess): > > + \ (.+?)(\ \[(\ multipart)?)?$/x) { > > + # Result of a test, with its output. > > + my $result = $1; > > + my $testname = $2; > > + my $have_details = $3; > > + my $is_multipart = $4; > > I would normally write this: > my ($result, $testname, $have_... ) = ($1,$2,$3,$4,$5) > although I don't really mind much that you have written it as you > have. > > > + if ($have_details) { > > + if ($is_multipart) { > > + # Test output > > + while (<$stdout>) { > > + # part content-type > > + # from https://tools.ietf.org/html/rfc6838#section-4.2 > > + my $restricted_name = qr'[a-zA-Z0-9][a-zA-Z0-9!#$&^_.+-]*'; > > + if (m{ ^Content-Type:\s+ > > + $restricted_name/$restricted_name # type/sub-type > > + # parameters > > + (?:\s*;\s*$restricted_name=[^,]+ > > + (?:,\s*$restricted_name=[^,]+)*) > > + \s*$ > > + }xi) { > > I don't understand why you are trying to match this Content-Type so > precisely. AFAICT from the grammar, all you need to do is see whether > there is something vaguely like a c-t header. I think I start by looking at what kind of characters could be part of type and sub-type, and just start writing a more complicated regex. So is the following would be enough for you? m{^Content-Type: [^/ ]+/[^/ ]+(?:;.+)?$} > > + print $fh $_ or die $!; > > + > > + # part name > > + my $line = <$stdout>; > > + print $fh $line or die $!; > > + > > + # Read chunks of a part > > + while (<$stdout>) { > > + if (/^([0-9A-F]+)\r$/i) { > > + my $chunk_size = hex($1); > > What makes you think the digits are in hex ? I tried with [0-9] (because DIGITS), but that was not enought. Then I've check the subunit implementation, there are using "%X" which is hex. > Since you have to go to the effort of separating out all of this > stuff, it might be worth printing these multipart objects with one > object per logfile. Although I won't insist on that because I suspect > that multipart results are rare. There are usually 3 part per tests, with those names: pythonlogging:'' stdout stderr And sometime, there is also one name 'traceback'. I think stdout and stderr are usually empty. I think having one file per part will make it more complicated to read logs of a failed test. > > + } else { > > + # Unexpected output > > + chomp; > > + logm("*** $_"); > > I guess the error recovery is to continue until you see "]" > and hope. Fair enough. That one of the reason for the subunit-v2, with a binary protocol, better recovery.
Anthony PERARD writes ("Re: [OSSTEST PATCH v13 19/24] TestSupport: Implement target_subunit_cmd a subunit stream parser into substeps"): > I think I start by looking at what kind of characters could be part of > type and sub-type, and just start writing a more complicated regex. > > So is the following would be enough for you? > m{^Content-Type: [^/ ]+/[^/ ]+(?:;.+)?$} Why do you need to check the at all ? I think, according to the spec, that the only thing which can occur here is "Content-Type: something" or "]". What would be wrong with m{^content-type:}i ? > > > + # Read chunks of a part > > > + while (<$stdout>) { > > > + if (/^([0-9A-F]+)\r$/i) { > > > + my $chunk_size = hex($1); > > > > What makes you think the digits are in hex ? > > I tried with [0-9] (because DIGITS), but that was not enought. Then I've > check the subunit implementation, there are using "%X" which is hex. Wow. Can you put a comment next to this please ? Something like # The chunk size is in hex, even though this does not seem to be # documented in the subunit specification. perhaps. > > Since you have to go to the effort of separating out all of this > > stuff, it might be worth printing these multipart objects with one > > object per logfile. Although I won't insist on that because I suspect > > that multipart results are rare. > > There are usually 3 part per tests, with those names: > pythonlogging:'' > stdout > stderr > And sometime, there is also one name 'traceback'. > I think stdout and stderr are usually empty. > > I think having one file per part will make it more complicated to > read logs of a failed test. OK, leave it as-is then. (Also, "pythonlogging:''" ?!) > > I guess the error recovery is to continue until you see "]" > > and hope. Fair enough. > > That one of the reason for the subunit-v2, with a binary protocol, > better recovery. I don't think that's a good reason. But this ranting is quite off-topic now :-). Ian.
On Wed, Jul 26, 2017 at 04:41:58PM +0100, Ian Jackson wrote: > Anthony PERARD writes ("Re: [OSSTEST PATCH v13 19/24] TestSupport: Implement target_subunit_cmd a subunit stream parser into substeps"): > > I think I start by looking at what kind of characters could be part of > > type and sub-type, and just start writing a more complicated regex. > > > > So is the following would be enough for you? > > m{^Content-Type: [^/ ]+/[^/ ]+(?:;.+)?$} > > Why do you need to check the at all ? I think, according to the spec, > that the only thing which can occur here is "Content-Type: something" > or "]". What would be wrong with > m{^content-type:}i > ? Ok, will go with that. > > > > + # Read chunks of a part > > > > + while (<$stdout>) { > > > > + if (/^([0-9A-F]+)\r$/i) { > > > > + my $chunk_size = hex($1); > > > > > > What makes you think the digits are in hex ? > > > > I tried with [0-9] (because DIGITS), but that was not enought. Then I've > > check the subunit implementation, there are using "%X" which is hex. > > Wow. Can you put a comment next to this please ? Something like > > # The chunk size is in hex, even though this does not seem to be > # documented in the subunit specification. > > perhaps. I don't think there is a specification for subunit, beside the source code. In the README of the project, it is called a "description", and there is even "When in doubt, refer the source". But I guess I can add the comment.
On Tue, Jul 25, 2017 at 07:00:47PM +0100, Ian Jackson wrote: > Anthony PERARD writes ("[OSSTEST PATCH v13 19/24] TestSupport: Implement target_subunit_cmd a subunit stream parser into substeps"): > > + # Read chunks of a part > > + while (<$stdout>) { > > + if (/^([0-9A-F]+)\r$/i) { > > + my $chunk_size = hex($1); [...] > > + } else { > > + # Unexpected output > > + chomp; > > + logm("*** $_"); > > I guess the error recovery is to continue until you see "]" > and hope. Fair enough. Actually, in the chunk parser, the only expected output is an hex number for a chunk size. There is nothing that parse "]". So the only error recovery would be to find a line that only contain a hex number, this could be the next "0\r\n" if part ends with a "\n", which I think is likely. Only the loop that parse the multipart check for "]". Maybe I could end the chunks parser loop and go back to the multipart parser loop.
diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm index 135289b..ba55967 100644 --- a/Osstest/TestSupport.pm +++ b/Osstest/TestSupport.pm @@ -55,7 +55,7 @@ BEGIN { target_cmd_root_status target_cmd_output_root_status target_cmd_root target_cmd target_cmd_build - target_cmd_stashed + target_cmd_stashed target_subunit_cmd target_cmd_output_root target_cmd_output target_cmd_inputfh_root sshuho target_getfile target_getfile_root @@ -775,6 +775,121 @@ sub target_cmd_stashed ($$$;$$) { return "$stash/$$leafref"; } +sub subunit_result_to_osstest_result ($) { + my ($ret) = @_; + return "pass" if $ret eq "success" or $ret eq "successful"; + return "fail" if $ret eq "failure"; + return "skip" if $ret eq "skip"; + return "fail" if $ret eq "error"; + # expected failure + return "pass" if $ret eq "xfail"; + # unexpected success + return "fail" if $ret eq "uxsuccess"; + die "subunit_result_to_osstest_result unexpected result $ret"; +} +sub subunit_sanitize_testname ($) { + my ($testname) = @_; + $testname =~ s'[^_.()\[\]/~0-9a-zA-Z-]'_'g; + return $testname; +} + +# Like target_cmd, but parse the command output as a subunit v1 stream and make +# a substep out of each subunit test. +sub target_subunit_cmd ($$;$$) { + my ($tho,$tcmd,$timeout,$extrasshopts) = @_; + my $filename = "subunit-output"; + my $path = target_cmd_stashed($tho, \$filename, $tcmd, $timeout, + $extrasshopts); + + open my $stdout, "$path" or die "$path: $!"; + + my $logfilename = undef; + my $fh = undef; + + while (<$stdout>) { + if (/^time: (\d+)-(\d+)-(\d+) (\d+):(\d+):(\d+)(\.\d+)?Z$/) { + # This is the timestamp for the next events + } elsif (/^test(?:ing)?:? (.+)\n/) { + # Start of a new test. + $logfilename = subunit_sanitize_testname($1) . '.log'; + $fh = open_unique_stashfile(\$logfilename); + substep_start(subunit_sanitize_testname($1), $logfilename); + } elsif (/^(success(?:ful)?|failure|skip|error|xfail|uxsuccess): + \ (.+?)(\ \[(\ multipart)?)?$/x) { + # Result of a test, with its output. + my $result = $1; + my $testname = $2; + my $have_details = $3; + my $is_multipart = $4; + + if ($have_details) { + if ($is_multipart) { + # Test output + while (<$stdout>) { + # part content-type + # from https://tools.ietf.org/html/rfc6838#section-4.2 + my $restricted_name = qr'[a-zA-Z0-9][a-zA-Z0-9!#$&^_.+-]*'; + if (m{ ^Content-Type:\s+ + $restricted_name/$restricted_name # type/sub-type + # parameters + (?:\s*;\s*$restricted_name=[^,]+ + (?:,\s*$restricted_name=[^,]+)*) + \s*$ + }xi) { + print $fh $_ or die $!; + + # part name + my $line = <$stdout>; + print $fh $line or die $!; + + # Read chunks of a part + while (<$stdout>) { + if (/^([0-9A-F]+)\r$/i) { + my $chunk_size = hex($1); + my $chunk; + + last if $chunk_size == 0; + read $stdout, $chunk, $chunk_size; + print $fh $chunk or die $!; + } else { + # Unexpected output + chomp; + logm("*** $_"); + } + } + } elsif (/^\]$/) { + last; + } else { + # Unexpected output + chomp; + logm("*** $_"); + } + } + } else { + # Simple non-multipart test output. + while (<$stdout>) { + last if (/^\]$/); + print $fh $_ or die $!; + } + } + } + close $fh or die $!; + substep_finish(subunit_sanitize_testname($testname), + subunit_result_to_osstest_result($result)); + } elsif (/^tags: .+/) { + # unused + } elsif (/^progress: (?:[+-]?\d+|push|pop)$/) { + # unused + } else { + # Unexpected output + chomp; + logm("*** $_"); + } + } + + close $stdout or die $!; +} + sub poll_loop ($$$&) { my ($maxwait, $interval, $what, $code) = @_; # $code should return undef when all is well
target_subunit_cmd can be used like target_cmd, but the command would needs to output a subunit v1 stream, which will be parsed and turned into osstest substeps. The command can be `| subunit-2to1` in order to turn a subunit v2 stream into v1. Currently, time is not taken into account, and all substeps will have bogus timestamp as the output of the command is parsed after it has runned. This is a description of the subunit v1 protocol, taken from python-subunit README, or https://pypi.python.org/pypi/python-subunit test|testing|test:|testing: test LABEL success|success:|successful|successful: test LABEL success|success:|successful|successful: test LABEL DETAILS failure: test LABEL failure: test LABEL DETAILS error: test LABEL error: test LABEL DETAILS skip[:] test LABEL skip[:] test LABEL DETAILS xfail[:] test LABEL xfail[:] test LABEL DETAILS uxsuccess[:] test LABEL uxsuccess[:] test LABEL DETAILS progress: [+|-]X progress: push progress: pop tags: [-]TAG ... time: YYYY-MM-DD HH:MM:SSZ LABEL: UTF8* NAME: UTF8* DETAILS ::= BRACKETED | MULTIPART BRACKETED ::= '[' CR UTF8-lines ']' CR MULTIPART ::= '[ multipart' CR PART* ']' CR PART ::= PART_TYPE CR NAME CR PART_BYTES CR PART_TYPE ::= Content-Type: type/sub-type(;parameter=value,parameter=value) PART_BYTES ::= (DIGITS CR LF BYTE{DIGITS})* '0' CR LF Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> --- Notes: Changes in v13: - also parse multipart output - add every possible test result - use target_cmd_stashed Osstest/TestSupport.pm | 117 ++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 116 insertions(+), 1 deletion(-)