diff mbox series

[RFC,v2,2/3] trace2: add a schema validator for trace2 events

Message ID 3fa4e9eef84ba00c631c82fb3a2eacb8439df9e5.1562712943.git.steadmon@google.com (mailing list archive)
State New, archived
Headers show
Series Add a JSON Schema for trace2 events | expand

Commit Message

Josh Steadmon July 9, 2019, 11:05 p.m. UTC
trace_schema_validator can be used to verify that trace2 event output
conforms to the expectations set by the API documentation and codified
in event_schema.json (or strict_schema.json). This allows us to build a
regression test to verify that trace2 output does not change
unexpectedly.

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 t/trace_schema_validator/.gitignore           |  1 +
 t/trace_schema_validator/Makefile             | 10 +++
 .../trace_schema_validator.go                 | 78 +++++++++++++++++++
 3 files changed, 89 insertions(+)
 create mode 100644 t/trace_schema_validator/.gitignore
 create mode 100644 t/trace_schema_validator/Makefile
 create mode 100644 t/trace_schema_validator/trace_schema_validator.go

Comments

Jakub Narębski July 11, 2019, 1:35 p.m. UTC | #1
Josh Steadmon <steadmon@google.com> writes:

> trace_schema_validator can be used to verify that trace2 event output
> conforms to the expectations set by the API documentation and codified
> in event_schema.json (or strict_schema.json). This allows us to build a
> regression test to verify that trace2 output does not change
> unexpectedly.
>
> Signed-off-by: Josh Steadmon <steadmon@google.com>

Very nitpicky comments below.

> ---
>  t/trace_schema_validator/.gitignore           |  1 +
>  t/trace_schema_validator/Makefile             | 10 +++
>  .../trace_schema_validator.go                 | 78 +++++++++++++++++++
>  3 files changed, 89 insertions(+)
>  create mode 100644 t/trace_schema_validator/.gitignore
>  create mode 100644 t/trace_schema_validator/Makefile
>  create mode 100644 t/trace_schema_validator/trace_schema_validator.go
>
> diff --git a/t/trace_schema_validator/.gitignore b/t/trace_schema_validator/.gitignore
> new file mode 100644
> index 0000000000..c3f1e04e9e
> --- /dev/null
> +++ b/t/trace_schema_validator/.gitignore
> @@ -0,0 +1 @@
> +trace_schema_validator
> diff --git a/t/trace_schema_validator/Makefile b/t/trace_schema_validator/Makefile
> new file mode 100644
> index 0000000000..ed22675e5d
> --- /dev/null
> +++ b/t/trace_schema_validator/Makefile
> @@ -0,0 +1,10 @@
> +.PHONY: fetch_deps clean
> +
> +trace_schema_validator: fetch_deps trace_schema_validator.go
> +	go build

I don't know the Go build process, but shouldn't the name of target and
the name of actual source file passed to the command?

Though I don't think we would _need_ for example being able to configure
Go build process via Makefile variables, like e.g. $(GOBUILD) in
https://sohlich.github.io/post/go_makefile/

> +
> +fetch_deps:
> +	go get github.com/xeipuuv/gojsonschema
> +
> +clean:
> +	rm -f trace_schema_validator

In git Makefile we use

  clean:
  	$(RM) $(PROGRAMS)

I'm not sure if it is needed for operating system independence, but
using $(RM) is a standard way to create 'clean' targets...

> diff --git a/t/trace_schema_validator/trace_schema_validator.go
> b/t/trace_schema_validator/trace_schema_validator.go
> new file mode 100644
> index 0000000000..f779ac5ff5
> --- /dev/null
> +++ b/t/trace_schema_validator/trace_schema_validator.go
> @@ -0,0 +1,78 @@
> +// trace_schema_validator validates individual lines of an input file against a
> +// provided JSON-Schema for git trace2 event output.
> +//
> +// Traces can be collected by setting the GIT_TRACE2_EVENT environment variable
> +// to an absolute path and running any Git command; traces will be appended to
> +// the file.
> +//
> +// Traces can then be verified like so:
> +//   trace_schema_validator \
> +//     --trace2_event_file /path/to/trace/output \
> +//     --schema_file /path/to/schema
> +package main
> +
> +import (
> +	"bufio"
> +	"flag"
> +	"log"
> +	"os"
> +	"path/filepath"
> +
> +	"github.com/xeipuuv/gojsonschema"
> +)
> +
> +// Required flags
> +var schemaFile = flag.String("schema_file", "", "JSON-Schema filename")
> +var trace2EventFile = flag.String("trace2_event_file", "", "trace2 event filename")

The standard for long options is to use "kebab case", not "snake case"
for them, i.e. --schema-file not current --schema_file, etc.

> +
> +func main() {
> +	flag.Parse()
> +	if *schemaFile == "" || *trace2EventFile == "" {
> +		log.Fatal("Both --schema_file and --trace2_event_file are required.")
> +	}

I guess that you prefer required options with explicit arguments instead
of positional arguments (that is requiring the command to be called with
two arguments, first being schema file, second being event file to
validate).

> +	schemaURI, err := filepath.Abs(*schemaFile)
> +	if err != nil {
> +		log.Fatal("Can't get absolute path for schema file: ", err)
> +	}
> +	schemaURI = "file://" + schemaURI
> +
> +	schemaLoader := gojsonschema.NewReferenceLoader(schemaURI)
> +	schema, err := gojsonschema.NewSchema(schemaLoader)
> +	if err != nil {
> +		log.Fatal("Problem loading schema: ", err)
> +	}
> +
> +	tracesFile, err := os.Open(*trace2EventFile)
> +	if err != nil {
> +		log.Fatal("Problem opening trace file: ", err)
> +	}
> +	defer tracesFile.Close()
> +
> +	scanner := bufio.NewScanner(tracesFile)
> +
> +	count := 0
> +	for ; scanner.Scan(); count++ {

I see that you assume JSON-Lines format, i.e. one JSON object per line
in the file.

> +		if count%10000 == 0 {
> +			// Travis-CI expects regular output or it will time out.
> +			log.Print("Validated items: ", count)

I wonder if it wouldn't be better to provide --progress flag, which
Travis-CI job would turn on.

> +		}
> +		event := gojsonschema.NewStringLoader(scanner.Text())
> +		result, err := schema.Validate(event)
> +		if err != nil {
> +			log.Fatal(err)
> +		}
> +		if !result.Valid() {
> +			log.Print("Trace event is invalid: ", scanner.Text())

It might be good idea to print the line (i.e. the value of `count`
variable).

I guess that conforming to the <filename>:<line>: prefix (like e.g. gcc
uses for errors) would be unnecessary.

> +			for _, desc := range result.Errors() {
> +				log.Print("- ", desc)
> +			}
> +			os.Exit(1)
> +		}
> +	}
> +
> +	if err := scanner.Err(); err != nil {
> +		log.Fatal("Scanning error: ", err)
> +	}
> +
> +	log.Print("Validated events: ", count)
> +}
Josh Steadmon July 24, 2019, 10:47 p.m. UTC | #2
Thanks for the review, replies are inline below.

On 2019.07.11 15:35, Jakub Narebski wrote:
> Josh Steadmon <steadmon@google.com> writes:
> 
> > trace_schema_validator can be used to verify that trace2 event output
> > conforms to the expectations set by the API documentation and codified
> > in event_schema.json (or strict_schema.json). This allows us to build a
> > regression test to verify that trace2 output does not change
> > unexpectedly.
> >
> > Signed-off-by: Josh Steadmon <steadmon@google.com>
> 
> Very nitpicky comments below.
> 
> > ---
> >  t/trace_schema_validator/.gitignore           |  1 +
> >  t/trace_schema_validator/Makefile             | 10 +++
> >  .../trace_schema_validator.go                 | 78 +++++++++++++++++++
> >  3 files changed, 89 insertions(+)
> >  create mode 100644 t/trace_schema_validator/.gitignore
> >  create mode 100644 t/trace_schema_validator/Makefile
> >  create mode 100644 t/trace_schema_validator/trace_schema_validator.go
> >
> > diff --git a/t/trace_schema_validator/.gitignore b/t/trace_schema_validator/.gitignore
> > new file mode 100644
> > index 0000000000..c3f1e04e9e
> > --- /dev/null
> > +++ b/t/trace_schema_validator/.gitignore
> > @@ -0,0 +1 @@
> > +trace_schema_validator
> > diff --git a/t/trace_schema_validator/Makefile b/t/trace_schema_validator/Makefile
> > new file mode 100644
> > index 0000000000..ed22675e5d
> > --- /dev/null
> > +++ b/t/trace_schema_validator/Makefile
> > @@ -0,0 +1,10 @@
> > +.PHONY: fetch_deps clean
> > +
> > +trace_schema_validator: fetch_deps trace_schema_validator.go
> > +	go build
> 
> I don't know the Go build process, but shouldn't the name of target and
> the name of actual source file passed to the command?
> 
> Though I don't think we would _need_ for example being able to configure
> Go build process via Makefile variables, like e.g. $(GOBUILD) in
> https://sohlich.github.io/post/go_makefile/

Yeah it seems optional for Go, but fixed to make things more consistent
with the other makefiles.

> > +
> > +fetch_deps:
> > +	go get github.com/xeipuuv/gojsonschema
> > +
> > +clean:
> > +	rm -f trace_schema_validator
> 
> In git Makefile we use
> 
>   clean:
>   	$(RM) $(PROGRAMS)
> 
> I'm not sure if it is needed for operating system independence, but
> using $(RM) is a standard way to create 'clean' targets...

Fixed in V3.

> > diff --git a/t/trace_schema_validator/trace_schema_validator.go
> > b/t/trace_schema_validator/trace_schema_validator.go
> > new file mode 100644
> > index 0000000000..f779ac5ff5
> > --- /dev/null
> > +++ b/t/trace_schema_validator/trace_schema_validator.go
> > @@ -0,0 +1,78 @@
> > +// trace_schema_validator validates individual lines of an input file against a
> > +// provided JSON-Schema for git trace2 event output.
> > +//
> > +// Traces can be collected by setting the GIT_TRACE2_EVENT environment variable
> > +// to an absolute path and running any Git command; traces will be appended to
> > +// the file.
> > +//
> > +// Traces can then be verified like so:
> > +//   trace_schema_validator \
> > +//     --trace2_event_file /path/to/trace/output \
> > +//     --schema_file /path/to/schema
> > +package main
> > +
> > +import (
> > +	"bufio"
> > +	"flag"
> > +	"log"
> > +	"os"
> > +	"path/filepath"
> > +
> > +	"github.com/xeipuuv/gojsonschema"
> > +)
> > +
> > +// Required flags
> > +var schemaFile = flag.String("schema_file", "", "JSON-Schema filename")
> > +var trace2EventFile = flag.String("trace2_event_file", "", "trace2 event filename")
> 
> The standard for long options is to use "kebab case", not "snake case"
> for them, i.e. --schema-file not current --schema_file, etc.

Fixed in V3.

> > +
> > +func main() {
> > +	flag.Parse()
> > +	if *schemaFile == "" || *trace2EventFile == "" {
> > +		log.Fatal("Both --schema_file and --trace2_event_file are required.")
> > +	}
> 
> I guess that you prefer required options with explicit arguments instead
> of positional arguments (that is requiring the command to be called with
> two arguments, first being schema file, second being event file to
> validate).

Yeah, that's the style I'm used to at work, but I can change this if
positional args are more acceptable for Git.

> > +	schemaURI, err := filepath.Abs(*schemaFile)
> > +	if err != nil {
> > +		log.Fatal("Can't get absolute path for schema file: ", err)
> > +	}
> > +	schemaURI = "file://" + schemaURI
> > +
> > +	schemaLoader := gojsonschema.NewReferenceLoader(schemaURI)
> > +	schema, err := gojsonschema.NewSchema(schemaLoader)
> > +	if err != nil {
> > +		log.Fatal("Problem loading schema: ", err)
> > +	}
> > +
> > +	tracesFile, err := os.Open(*trace2EventFile)
> > +	if err != nil {
> > +		log.Fatal("Problem opening trace file: ", err)
> > +	}
> > +	defer tracesFile.Close()
> > +
> > +	scanner := bufio.NewScanner(tracesFile)
> > +
> > +	count := 0
> > +	for ; scanner.Scan(); count++ {
> 
> I see that you assume JSON-Lines format, i.e. one JSON object per line
> in the file.

Yeah, I'll add a comment noting this. This won't work with the provided
list schemas unless we reformat everything to be on a single line.

> 
> > +		if count%10000 == 0 {
> > +			// Travis-CI expects regular output or it will time out.
> > +			log.Print("Validated items: ", count)
> 
> I wonder if it wouldn't be better to provide --progress flag, which
> Travis-CI job would turn on.

Done, thanks for the suggestion.

> > +		}
> > +		event := gojsonschema.NewStringLoader(scanner.Text())
> > +		result, err := schema.Validate(event)
> > +		if err != nil {
> > +			log.Fatal(err)
> > +		}
> > +		if !result.Valid() {
> > +			log.Print("Trace event is invalid: ", scanner.Text())
> 
> It might be good idea to print the line (i.e. the value of `count`
> variable).
> 
> I guess that conforming to the <filename>:<line>: prefix (like e.g. gcc
> uses for errors) would be unnecessary.

Done in V3.

> > +			for _, desc := range result.Errors() {
> > +				log.Print("- ", desc)
> > +			}
> > +			os.Exit(1)
> > +		}
> > +	}
> > +
> > +	if err := scanner.Err(); err != nil {
> > +		log.Fatal("Scanning error: ", err)
> > +	}
> > +
> > +	log.Print("Validated events: ", count)
> > +}
diff mbox series

Patch

diff --git a/t/trace_schema_validator/.gitignore b/t/trace_schema_validator/.gitignore
new file mode 100644
index 0000000000..c3f1e04e9e
--- /dev/null
+++ b/t/trace_schema_validator/.gitignore
@@ -0,0 +1 @@ 
+trace_schema_validator
diff --git a/t/trace_schema_validator/Makefile b/t/trace_schema_validator/Makefile
new file mode 100644
index 0000000000..ed22675e5d
--- /dev/null
+++ b/t/trace_schema_validator/Makefile
@@ -0,0 +1,10 @@ 
+.PHONY: fetch_deps clean
+
+trace_schema_validator: fetch_deps trace_schema_validator.go
+	go build
+
+fetch_deps:
+	go get github.com/xeipuuv/gojsonschema
+
+clean:
+	rm -f trace_schema_validator
diff --git a/t/trace_schema_validator/trace_schema_validator.go b/t/trace_schema_validator/trace_schema_validator.go
new file mode 100644
index 0000000000..f779ac5ff5
--- /dev/null
+++ b/t/trace_schema_validator/trace_schema_validator.go
@@ -0,0 +1,78 @@ 
+// trace_schema_validator validates individual lines of an input file against a
+// provided JSON-Schema for git trace2 event output.
+//
+// Traces can be collected by setting the GIT_TRACE2_EVENT environment variable
+// to an absolute path and running any Git command; traces will be appended to
+// the file.
+//
+// Traces can then be verified like so:
+//   trace_schema_validator \
+//     --trace2_event_file /path/to/trace/output \
+//     --schema_file /path/to/schema
+package main
+
+import (
+	"bufio"
+	"flag"
+	"log"
+	"os"
+	"path/filepath"
+
+	"github.com/xeipuuv/gojsonschema"
+)
+
+// Required flags
+var schemaFile = flag.String("schema_file", "", "JSON-Schema filename")
+var trace2EventFile = flag.String("trace2_event_file", "", "trace2 event filename")
+
+func main() {
+	flag.Parse()
+	if *schemaFile == "" || *trace2EventFile == "" {
+		log.Fatal("Both --schema_file and --trace2_event_file are required.")
+	}
+	schemaURI, err := filepath.Abs(*schemaFile)
+	if err != nil {
+		log.Fatal("Can't get absolute path for schema file: ", err)
+	}
+	schemaURI = "file://" + schemaURI
+
+	schemaLoader := gojsonschema.NewReferenceLoader(schemaURI)
+	schema, err := gojsonschema.NewSchema(schemaLoader)
+	if err != nil {
+		log.Fatal("Problem loading schema: ", err)
+	}
+
+	tracesFile, err := os.Open(*trace2EventFile)
+	if err != nil {
+		log.Fatal("Problem opening trace file: ", err)
+	}
+	defer tracesFile.Close()
+
+	scanner := bufio.NewScanner(tracesFile)
+
+	count := 0
+	for ; scanner.Scan(); count++ {
+		if count%10000 == 0 {
+			// Travis-CI expects regular output or it will time out.
+			log.Print("Validated items: ", count)
+		}
+		event := gojsonschema.NewStringLoader(scanner.Text())
+		result, err := schema.Validate(event)
+		if err != nil {
+			log.Fatal(err)
+		}
+		if !result.Valid() {
+			log.Print("Trace event is invalid: ", scanner.Text())
+			for _, desc := range result.Errors() {
+				log.Print("- ", desc)
+			}
+			os.Exit(1)
+		}
+	}
+
+	if err := scanner.Err(); err != nil {
+		log.Fatal("Scanning error: ", err)
+	}
+
+	log.Print("Validated events: ", count)
+}