[1/2] xen/misra: add diff-report.py tool

Message ID: 20230519094613.2134153-2-luca.fancellu@arm.com
State: Superseded
Series: diff-report.py tool

Commit Message

Luca Fancellu May 19, 2023, 9:46 a.m. UTC
Add a new tool, diff-report.py, that can be used to diff the reports
generated by the xen-analysis.py tool.
Currently the tool supports the Xen cppcheck text report format.

The tool prints every finding that is in the report passed with -r
(check report) which is not in the report passed with -b (baseline).

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
---
Changes from v1:
 - removed 2 methods from class ReportEntry that landed there by
   mistake during a rebase.
 - Made the script compatible with python2 as well (Stefano)
---
 xen/scripts/diff-report.py                    |  80 ++++++++++++++
 .../xen_analysis/diff_tool/__init__.py        |   0
 .../xen_analysis/diff_tool/cppcheck_report.py |  44 ++++++++
 xen/scripts/xen_analysis/diff_tool/debug.py   |  40 +++++++
 xen/scripts/xen_analysis/diff_tool/report.py  | 100 ++++++++++++++++++
 5 files changed, 264 insertions(+)
 create mode 100755 xen/scripts/diff-report.py
 create mode 100644 xen/scripts/xen_analysis/diff_tool/__init__.py
 create mode 100644 xen/scripts/xen_analysis/diff_tool/cppcheck_report.py
 create mode 100644 xen/scripts/xen_analysis/diff_tool/debug.py
 create mode 100644 xen/scripts/xen_analysis/diff_tool/report.py
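
As a rough illustration of the behaviour described in the commit message, the sketch below keeps the findings of the check report whose (file path, line number) pair does not occur in the baseline. The regular expression is the one used in cppcheck_report.py; the sample report lines and the helper names `parse_report` and `new_findings` are made up for this example and are not part of the patch.

```python
import re

# Same pattern as in cppcheck_report.py: path(line,column):text
ENTRY_RE = re.compile(r'^(.*)\((\d+)(,\d+\):.*)$')


def parse_report(lines):
    """Return a list of (file_path, line_number, text) tuples."""
    entries = []
    for line in lines:
        match = ENTRY_RE.match(line)
        if not match:
            raise ValueError("Malformed report entry: {}".format(line))
        entries.append((match.group(1), int(match.group(2)), line))
    return entries


def new_findings(check, baseline):
    """Keep entries of 'check' whose (path, line) is not in 'baseline'."""
    known = {(path, num) for path, num, _ in baseline}
    return [e for e in check if (e[0], e[1]) not in known]


# Hypothetical report lines, only meant to show the format.
baseline = parse_report([
    "xen/common/a.c(10,5):style:example finding one",
])
check = parse_report([
    "xen/common/a.c(10,5):style:example finding one",
    "xen/common/a.c(42,1):warning:example finding two",
    "xen/arch/b.c(7,3):error:example finding three",
])

# Prints the two findings that are not in the baseline.
for _, _, text in new_findings(check, baseline):
    print(text)
```

As in the patch, entries are compared by file path and line number only, so a finding whose text changed but whose location did not is treated as already known.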

Comments

Jan Beulich May 19, 2023, 10:53 a.m. UTC | #1
On 19.05.2023 11:46, Luca Fancellu wrote:
> Add a new tool, diff-report.py that can be used to make diff between
> reports generated by xen-analysis.py tool.
> Currently this tool supports the Xen cppcheck text report format in
> its operations.
> 
> The tool prints every finding that is in the report passed with -r
> (check report) which is not in the report passed with -b (baseline).
> 
> Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
> ---
> Changes from v1:
>  - removed 2 method from class ReportEntry that landed there by a
>    mistake on rebase.
>  - Made the script compatible also with python2 (Stefano)
> ---
>  xen/scripts/diff-report.py                    |  80 ++++++++++++++
>  .../xen_analysis/diff_tool/__init__.py        |   0
>  .../xen_analysis/diff_tool/cppcheck_report.py |  44 ++++++++
>  xen/scripts/xen_analysis/diff_tool/debug.py   |  40 +++++++
>  xen/scripts/xen_analysis/diff_tool/report.py  | 100 ++++++++++++++++++
>  5 files changed, 264 insertions(+)
>  create mode 100755 xen/scripts/diff-report.py
>  create mode 100644 xen/scripts/xen_analysis/diff_tool/__init__.py
>  create mode 100644 xen/scripts/xen_analysis/diff_tool/cppcheck_report.py
>  create mode 100644 xen/scripts/xen_analysis/diff_tool/debug.py
>  create mode 100644 xen/scripts/xen_analysis/diff_tool/report.py

If I'm not mistaken Python has no issue with dashes in path names.
Hence it would once again be better if the underscores were avoided
in the new directory names.

Jan

Luca Fancellu May 19, 2023, 11:10 a.m. UTC | #2
> On 19 May 2023, at 11:53, Jan Beulich <jbeulich@suse.com> wrote:
> 
> On 19.05.2023 11:46, Luca Fancellu wrote:
>> Add a new tool, diff-report.py that can be used to make diff between
>> reports generated by xen-analysis.py tool.
>> Currently this tool supports the Xen cppcheck text report format in
>> its operations.
>> 
>> The tool prints every finding that is in the report passed with -r
>> (check report) which is not in the report passed with -b (baseline).
>> 
>> Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
>> ---
>> Changes from v1:
>> - removed 2 method from class ReportEntry that landed there by a
>>   mistake on rebase.
>> - Made the script compatible also with python2 (Stefano)
>> ---
>> xen/scripts/diff-report.py                    |  80 ++++++++++++++
>> .../xen_analysis/diff_tool/__init__.py        |   0
>> .../xen_analysis/diff_tool/cppcheck_report.py |  44 ++++++++
>> xen/scripts/xen_analysis/diff_tool/debug.py   |  40 +++++++
>> xen/scripts/xen_analysis/diff_tool/report.py  | 100 ++++++++++++++++++
>> 5 files changed, 264 insertions(+)
>> create mode 100755 xen/scripts/diff-report.py
>> create mode 100644 xen/scripts/xen_analysis/diff_tool/__init__.py
>> create mode 100644 xen/scripts/xen_analysis/diff_tool/cppcheck_report.py
>> create mode 100644 xen/scripts/xen_analysis/diff_tool/debug.py
>> create mode 100644 xen/scripts/xen_analysis/diff_tool/report.py
> 
> If I'm not mistaken Python has no issue with dashes in path names.
> Hence it would once again be better if the underscores were avoided
> in the new directory names.

Hi Jan,

As far as I know, Python can’t import a module with dashes in its name unless some
tricks are used, but if anyone knows more about this, please correct me.
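
A minimal sketch of the restriction and of one such trick, assuming a throwaway module file named `diff-tool.py` (a hypothetical name chosen for the example): the import statement rejects the dash at parse time, while `importlib.import_module` takes the name as a string and so does not need a valid identifier.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway directory holding a module whose file name has a dash.
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "diff-tool.py"), "w") as f:
    f.write("VALUE = 42\n")
sys.path.insert(0, tmp_dir)

# The import statement needs an identifier, so the dash is a syntax error
# (the parser reads it as a minus sign).
try:
    compile("import diff-tool", "<example>", "exec")
    statement_ok = True
except SyntaxError:
    statement_ok = False

# The "trick": importlib takes the module name as a string.
module = importlib.import_module("diff-tool")

print(statement_ok)   # False
print(module.VALUE)   # 42
```

The trick works, but every user of the module has to go through importlib, which is why underscores remain the usual separator for names that are meant to be imported.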

The style guide for Python (https://peps.python.org/pep-0008/#package-and-module-names)
says:

Modules should have short, all-lowercase names. Underscores can be used in the module
name if it improves readability. Python packages should also have short, all-lowercase names,
although the use of underscores is discouraged.

So, yes, the use is discouraged, but here I think it improves the readability. Unless we want
to use “difftool” instead of “diff_tool” and “cppcheckreport” instead of “cppcheck_report”.

Can I ask the reason why we need to avoid underscores in file names?

> 
> Jan

Jan Beulich May 19, 2023, 1:46 p.m. UTC | #3
On 19.05.2023 13:10, Luca Fancellu wrote:
> 
> 
>> On 19 May 2023, at 11:53, Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 19.05.2023 11:46, Luca Fancellu wrote:
>>> Add a new tool, diff-report.py that can be used to make diff between
>>> reports generated by xen-analysis.py tool.
>>> Currently this tool supports the Xen cppcheck text report format in
>>> its operations.
>>>
>>> The tool prints every finding that is in the report passed with -r
>>> (check report) which is not in the report passed with -b (baseline).
>>>
>>> Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
>>> ---
>>> Changes from v1:
>>> - removed 2 method from class ReportEntry that landed there by a
>>>   mistake on rebase.
>>> - Made the script compatible also with python2 (Stefano)
>>> ---
>>> xen/scripts/diff-report.py                    |  80 ++++++++++++++
>>> .../xen_analysis/diff_tool/__init__.py        |   0
>>> .../xen_analysis/diff_tool/cppcheck_report.py |  44 ++++++++
>>> xen/scripts/xen_analysis/diff_tool/debug.py   |  40 +++++++
>>> xen/scripts/xen_analysis/diff_tool/report.py  | 100 ++++++++++++++++++
>>> 5 files changed, 264 insertions(+)
>>> create mode 100755 xen/scripts/diff-report.py
>>> create mode 100644 xen/scripts/xen_analysis/diff_tool/__init__.py
>>> create mode 100644 xen/scripts/xen_analysis/diff_tool/cppcheck_report.py
>>> create mode 100644 xen/scripts/xen_analysis/diff_tool/debug.py
>>> create mode 100644 xen/scripts/xen_analysis/diff_tool/report.py
>>
>> If I'm not mistaken Python has no issue with dashes in path names.
>> Hence it would once again be better if the underscores were avoided
>> in the new directory names.
> 
> From what I know python can’t use import for module with dashes in the name, unless
> using some tricks, but if anyone knows more about that please correct me if I’m wrong.
> 
> The style guide for python (https://peps.python.org/pep-0008/#package-and-module-names)
> Says:
> 
> Modules should have short, all-lowercase names. Underscores can be used in the module
> name if it improves readability. Python packages should also have short, all-lowercase names,
> although the use of underscores is discouraged.

Hmm, I was initially thinking there might be such a restriction, but
then I checked a pretty old installation and found plat-linux2/ there
with several .py / .pyo / .pyc files underneath. Which suggested to
me that, for them to be of any use, such a path name must be permitted.
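
One possible explanation, offered as an assumption rather than something established in this thread: in Python 2 installations a directory such as plat-linux2/ is normally listed in sys.path itself, so the modules inside it are imported by their own names and the dashed directory name never appears in an import statement. A small sketch of that situation, using a hypothetical directory `plat-demo` and module `plat_demo_helper`:

```python
import os
import sys
import tempfile

# Hypothetical layout: <tmp>/plat-demo/plat_demo_helper.py
base = tempfile.mkdtemp()
dashed_dir = os.path.join(base, "plat-demo")
os.mkdir(dashed_dir)
with open(os.path.join(dashed_dir, "plat_demo_helper.py"), "w") as f:
    f.write("def answer():\n    return 42\n")

# The dashed name is only a search path entry; it is never used as a
# package name in an import statement.
sys.path.insert(0, dashed_dir)

import plat_demo_helper

print(plat_demo_helper.answer())  # 42
```

This differs from xen_analysis/diff_tool/, whose directory names are used as package names in the import statements of diff-report.py.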

But well, if underscores are required to be used if any separator
is wanted, so be it. Albeit ...

> So, yes, the use is discouraged, but here I think it improves the readability. Unless we want
> to use “difftool” instead of “diff_tool” and “cppcheckreport” instead of “cppcheck_report”.

... personally I'd like both shorter variants better, plus perhaps
xen_ dropped from xen_analysis, or some different name used there
altogether (to me this name doesn't really tell me what to expect
there, but maybe that's indeed just me).

> Can I ask the reason why we need to avoid underscores in file names?

First of all they're odd, a space or dash is simply more natural to
use. From my pov they ought to be used only when a visual separator
is wanted, but neither space nor dash fit the purpose (e.g. for
lexical reasons in programming languages). Plus typing them requires,
on all keyboards I'm aware of, <shift> to be used when dash doesn't.

Jan

Stefano Stabellini May 25, 2023, 1:08 a.m. UTC | #4
On Fri, 19 May 2023, Luca Fancellu wrote:
> Add a new tool, diff-report.py that can be used to make diff between
> reports generated by xen-analysis.py tool.
> Currently this tool supports the Xen cppcheck text report format in
> its operations.
> 
> The tool prints every finding that is in the report passed with -r
> (check report) which is not in the report passed with -b (baseline).
> 
> Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>

Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>



Patch

diff --git a/xen/scripts/diff-report.py b/xen/scripts/diff-report.py
new file mode 100755
index 000000000000..f97cb2355cc3
--- /dev/null
+++ b/xen/scripts/diff-report.py
@@ -0,0 +1,80 @@ 
+#!/usr/bin/env python3
+
+from __future__ import print_function
+import os
+import sys
+from argparse import ArgumentParser
+from xen_analysis.diff_tool.cppcheck_report import CppcheckReport
+from xen_analysis.diff_tool.debug import Debug
+from xen_analysis.diff_tool.report import ReportError
+
+
+def log_info(text, end='\n'):
+    # type: (str, str) -> None
+    global args
+    global file_out
+
+    if args.verbose:
+        print(text, end=end, file=file_out)
+
+
+def main(argv):
+    # type: (list) -> None
+    global args
+    global file_out
+
+    parser = ArgumentParser(prog="diff-report.py")
+    parser.add_argument("-b", "--baseline", required=True, type=str,
+                        help="Path to the baseline report.")
+    parser.add_argument("--debug", action='store_true',
+                        help="Produce intermediate reports during operations.")
+    parser.add_argument("-o", "--out", default="stdout", type=str,
+                        help="Where to print the tool output. Default is "
+                             "stdout")
+    parser.add_argument("-r", "--report", required=True, type=str,
+                        help="Path to the 'check report', the one checked "
+                             "against the baseline.")
+    parser.add_argument("-v", "--verbose", action='store_true',
+                        help="Print more information during the run.")
+
+    args = parser.parse_args(argv)
+
+    if args.out == "stdout":
+        file_out = sys.stdout
+    else:
+        try:
+            file_out = open(args.out, "wt")
+        except (IOError, OSError) as e:
+            print("ERROR: Issue opening file {}: {}".format(args.out, e))
+            sys.exit(1)
+
+    debug = Debug(args)
+
+    try:
+        baseline_path = os.path.realpath(args.baseline)
+        log_info("Loading baseline report {}".format(baseline_path), "")
+        baseline = CppcheckReport(baseline_path)
+        baseline.parse()
+        debug.debug_print_parsed_report(baseline)
+        log_info(" [OK]")
+        new_rep_path = os.path.realpath(args.report)
+        log_info("Loading check report {}".format(new_rep_path), "")
+        new_rep = CppcheckReport(new_rep_path)
+        new_rep.parse()
+        debug.debug_print_parsed_report(new_rep)
+        log_info(" [OK]")
+    except ReportError as e:
+        print("ERROR: {}".format(e))
+        sys.exit(1)
+
+    output = new_rep - baseline
+    print(output, end="", file=file_out)
+
+    if len(output) > 0:
+        sys.exit(1)
+
+    sys.exit(0)
+
+
+if __name__ == "__main__":
+    main(sys.argv[1:])
diff --git a/xen/scripts/xen_analysis/diff_tool/__init__.py b/xen/scripts/xen_analysis/diff_tool/__init__.py
new file mode 100644
index 000000000000..e69de29bb2d1
diff --git a/xen/scripts/xen_analysis/diff_tool/cppcheck_report.py b/xen/scripts/xen_analysis/diff_tool/cppcheck_report.py
new file mode 100644
index 000000000000..e7e80a9dde84
--- /dev/null
+++ b/xen/scripts/xen_analysis/diff_tool/cppcheck_report.py
@@ -0,0 +1,44 @@ 
+#!/usr/bin/env python3
+
+import re
+from .report import Report, ReportError
+
+
+class CppcheckReport(Report):
+    def __init__(self, report_path):
+        # type: (str) -> None
+        super(CppcheckReport, self).__init__(report_path)
+        # This matches a string like:
+        # path/to/file.c(<line number>,<digits>):<whatever>
+        # and captures file name path and line number
+        # the last capture group is used for text substitution in __str__
+        self.__report_entry_regex = re.compile(r'^(.*)\((\d+)(,\d+\):.*)$')
+
+    def parse(self):
+        # type: () -> None
+        report_path = self.get_report_path()
+        try:
+            with open(report_path, "rt") as infile:
+                report_lines = infile.readlines()
+        except (IOError, OSError) as e:
+            raise ReportError("Issue with reading file {}: {}"
+                              .format(report_path, e))
+        for line in report_lines:
+            entry = self.__report_entry_regex.match(line)
+            if entry and entry.group(1) and entry.group(2):
+                file_path = entry.group(1)
+                line_number = int(entry.group(2))
+                self.add_entry(file_path, line_number, line)
+            else:
+                raise ReportError("Malformed report entry in file {}:\n{}"
+                                  .format(report_path, line))
+
+    def __str__(self):
+        # type: () -> str
+        ret = ""
+        for entry in self.to_list():
+            ret += re.sub(self.__report_entry_regex,
+                          r'{}({}\3'.format(entry.file_path,
+                                            entry.line_number),
+                          entry.text)
+        return ret
diff --git a/xen/scripts/xen_analysis/diff_tool/debug.py b/xen/scripts/xen_analysis/diff_tool/debug.py
new file mode 100644
index 000000000000..65cca2464110
--- /dev/null
+++ b/xen/scripts/xen_analysis/diff_tool/debug.py
@@ -0,0 +1,40 @@ 
+#!/usr/bin/env python3
+
+from __future__ import print_function
+import os
+from .report import Report
+
+
+class Debug:
+    def __init__(self, args):
+        self.args = args
+
+    def __get_debug_out_filename(self, path, type):
+        # type: (str, str) -> str
+        # Take basename
+        file_name = os.path.basename(path)
+        # Split in name and extension
+        name, ext = os.path.splitext(file_name)
+        if self.args.out != "stdout":
+            out_folder = os.path.dirname(self.args.out)
+        else:
+            out_folder = "./"
+        dbg_report_path = os.path.join(out_folder, name + type + ext)
+
+        return dbg_report_path
+
+    def __debug_print_report(self, report, type):
+        # type: (Report, str) -> None
+        report_name = self.__get_debug_out_filename(report.get_report_path(),
+                                                    type)
+        try:
+            with open(report_name, "wt") as outfile:
+                print(report, end="", file=outfile)
+        except (IOError, OSError) as e:
+            print("ERROR: Issue opening file {}: {}".format(report_name, e))
+
+    def debug_print_parsed_report(self, report):
+        # type: (Report) -> None
+        if not self.args.debug:
+            return
+        self.__debug_print_report(report, ".parsed")
diff --git a/xen/scripts/xen_analysis/diff_tool/report.py b/xen/scripts/xen_analysis/diff_tool/report.py
new file mode 100644
index 000000000000..4a303d61b3ea
--- /dev/null
+++ b/xen/scripts/xen_analysis/diff_tool/report.py
@@ -0,0 +1,100 @@ 
+#!/usr/bin/env python3
+
+import os
+
+
+class ReportError(Exception):
+    pass
+
+
+class Report(object):
+    class ReportEntry:
+        def __init__(self, file_path, line_number, entry_text, line_id):
+            # type: (str, int, str, int) -> None
+            if not isinstance(line_number, int) or \
+               not isinstance(line_id, int):
+                raise ReportError("ReportEntry constructor wrong type args")
+            self.file_path = file_path
+            self.line_number = line_number
+            self.text = entry_text
+            self.line_id = line_id
+
+    def __init__(self, report_path):
+        # type: (str) -> None
+        self.__entries = {}
+        self.__path = report_path
+        self.__last_line_order = 0
+
+    def parse(self):
+        # type: () -> None
+        raise ReportError("Please create a specialised class from 'Report'.")
+
+    def get_report_path(self):
+        # type: () -> str
+        return self.__path
+
+    def get_report_entries(self):
+        # type: () -> dict
+        return self.__entries
+
+    def add_entry(self, entry_path, entry_line_number, entry_text):
+        # type: (str, int, str) -> None
+        entry = Report.ReportEntry(entry_path, entry_line_number, entry_text,
+                                   self.__last_line_order)
+        if entry_path in self.__entries.keys():
+            self.__entries[entry_path].append(entry)
+        else:
+            self.__entries[entry_path] = [entry]
+        self.__last_line_order += 1
+
+    def to_list(self):
+        # type: () -> list
+        report_list = []
+        for _, entries in self.__entries.items():
+            for entry in entries:
+                report_list.append(entry)
+
+        report_list.sort(key=lambda x: x.line_id)
+        return report_list
+
+    def __str__(self):
+        # type: () -> str
+        ret = ""
+        for entry in self.to_list():
+            ret += "{}:{}:".format(entry.file_path, entry.line_number)
+            ret += entry.text
+        return ret
+
+    def __len__(self):
+        # type: () -> int
+        return len(self.to_list())
+
+    def __sub__(self, report_b):
+        # type: (Report) -> Report
+        if self.__class__ != report_b.__class__:
+            raise ReportError("Diff of different type of report!")
+
+        filename, file_extension = os.path.splitext(self.__path)
+        diff_report = self.__class__(filename + ".diff" + file_extension)
+        # Put in the diff report only records of this report that are not
+        # present in the report_b.
+        for file_path, entries in self.__entries.items():
+            rep_b_entries = report_b.get_report_entries()
+            if file_path in rep_b_entries.keys():
+                # File path exists in report_b, so check what entries of that
+                # file path doesn't exist in report_b and add them to the diff
+                rep_b_entries_num = [
+                    x.line_number for x in rep_b_entries[file_path]
+                ]
+                for entry in entries:
+                    if entry.line_number not in rep_b_entries_num:
+                        diff_report.add_entry(file_path, entry.line_number,
+                                              entry.text)
+            else:
+                # File path doesn't exist in report_b, so add every entry
+                # of that file path to the diff
+                for entry in entries:
+                    diff_report.add_entry(file_path, entry.line_number,
+                                          entry.text)
+
+        return diff_report
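
To make the role of the three capture groups in cppcheck_report.py more concrete, here is a small sketch of how one report line is split by parse() and rebuilt by __str__(). The report line is a made-up example; only the regular expression and the substitution template are taken from the patch.

```python
import re

# Pattern copied from cppcheck_report.py.
ENTRY_RE = re.compile(r'^(.*)\((\d+)(,\d+\):.*)$')

# Hypothetical report line; readlines() keeps the trailing newline.
line = "xen/common/a.c(10,5):style:example finding\n"

match = ENTRY_RE.match(line)
file_path = match.group(1)         # everything before the "("
line_number = int(match.group(2))  # digits before the comma

# Rebuild the line the way CppcheckReport.__str__ does: path and line
# number come from the entry, the rest from the third capture group.
rebuilt = re.sub(ENTRY_RE, r'{}({}\3'.format(file_path, line_number), line)

print(file_path)        # xen/common/a.c
print(line_number)      # 10
print(rebuilt == line)  # True
```

The trailing newline survives the round trip because `.` does not match it and `$` matches just before it, so re.sub() leaves it in place.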