[1/1] contrib: add coverage-diff script
diff mbox series

Message ID e4124471e5494b737d99eceed25fb03e787d0b96.1536770746.git.gitgitgadget@gmail.com
State New
Headers show
Series
  • contrib: Add script to show uncovered "new" lines
Related show

Commit Message

erik chen via GitGitGadget Sept. 12, 2018, 4:45 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

We have coverage targets in our Makefile for using gcov to display line
coverage based on our test suite. The way I like to do it is to run:

    make coverage-test
    make coverage-report

This leaves the repo in a state where every X.c file that was covered has
an X.c.gcov file containing the coverage counts for every line, and "#####"
at every uncovered line.

There have been a few bugs in recent patches what would have been caught
if the test suite covered those blocks (including a few of mine). I want
to work towards a "sensible" amount of coverage on new topics. In my opinion,
this means that any logic should be covered, but the 'die()' blocks in error
cases do not need to be covered.

It is important to not measure the coverage of the codebase by what old code
is not covered. To help, I created the 'contrib/coverage-diff.sh' script.
After creating the coverage statistics at a version (say, 'topic') you can
then run

    contrib/coverage-diff.sh base topic

to see the lines added between 'base' and 'topic' that are not covered by the
test suite. The output uses 'git blame -c' format so you can find the commits
responsible and view the line numbers for quick access to the context.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 contrib/coverage-diff.sh | 70 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)
 create mode 100755 contrib/coverage-diff.sh

Comments

Junio C Hamano Sept. 12, 2018, 10:13 p.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  contrib/coverage-diff.sh | 70 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 70 insertions(+)
>  create mode 100755 contrib/coverage-diff.sh

I fully appreciate the motivation.  But it is a bit sad that this
begins with "#!/bin/bash" but it seems that the script is full of
bash-isms.  I haven't gone through the script to see if these are
inevitable or gratuitous yet, but I'd assume it made it easier for
you to write it to step outside the pure POSIX shell?

> +V1=$1
> +V2=$2
> +
> +diff-lines() {

Being able to use '-' in identifier is probably a bash-ism that you
did not have to use.

> +    local path=
> +    local line=
> +    while read; do

Being able to omit variable to be read into and can use the implicit
variable $REPLY also is.

> +	esc=$'\033'
> +	if [[ $REPLY =~ ---\ (a/)?.* ]]; then
> +	    continue
> +	elif [[ $REPLY =~ \+\+\+\ (b/)?([^[:blank:]$esc]+).* ]]; then
> +	    path=${BASH_REMATCH[2]}

OK, it probably is easier to write in bash than using expr if you
want to do regexp.  Where do these escape code come from in "git
diff" output, by the way?

> +	elif [[ $REPLY =~ @@\ -[0-9]+(,[0-9]+)?\ \+([0-9]+)(,[0-9]+)?\ @@.* ]]; then
> +	    line=${BASH_REMATCH[2]}
> +	elif [[ $REPLY =~ ^($esc\[[0-9;]+m)*([\ +-]) ]]; then
> +	    echo "$path:$line:$REPLY"
> +	    if [[ ${BASH_REMATCH[2]} != - ]]; then
> +		((line++))
> +	    fi
> +	fi
> +    done
> +}
> +
> +git diff --raw $V1 $V2 | grep \.c$ | awk 'NF>1{print $NF}' >files.txt

Hmph, not 

	git diff --name-only "$V1" "$V2" -- "*.c"

Do we (or do we not) want "--no-renames"?

> +for file in $(cat files.txt)
> +do
> +	hash_file=${file//\//\#}
> +
> +	git diff $V1 $V2 -- $file \
> +		| diff-lines \
> +		| grep ":+" \
> +		>"diff_file.txt"

Style:

	cmd1 |
	cmd2 |
	cmd3 >output

is easier to read without backslashes.

> +	cat diff_file.txt \
> +		| sed -E 's/:/ /g' \
> +		| awk '{print $2}' \
> +		| sort \
> +		>new_lines.txt
> +
> +	cat "$hash_file.gcov" \
> +		| grep \#\#\#\#\# \
> +		| sed 's/    #####: //g' \
> +		| sed 's/\:/ /g' \
> +		| awk '{print $1}' \
> +		| sort \
> +		>uncovered_lines.txt

OK, so we assume that we have run coverage in $V2 checkout so that
we can pick up the postimage line numbers in "diff $V1 $V2" and find
corresponding record in .gcov file in the filesystem.  I did not
realize the significance of 'topic' being the later argument to the
script in this part

    After creating the coverage statistics at a version (say,
    'topic') you can then run

        contrib/coverage-diff.sh base topic

of your description before I see this implementation.  Also the
comment at the beginning

    # Usage: 'contrib/coverage-diff.sh <version1> <version2>
    # Outputs a list of new lines in version2 compared to version1 that are
    # not covered by the test suite. Assumes you ran
    # 'make coverage-test coverage-report' from root first, so .gcov files exist.

would want to make it clear that we want coverage run from root
for version2 before using this script.

> +	comm -12 uncovered_lines.txt new_lines.txt \
> +		| sed -e 's/$/\)/' \
> +		| sed -e 's/^/\t/' \
> +		>uncovered_new_lines.txt
> +
> +	grep -q '[^[:space:]]' < uncovered_new_lines.txt && \

Style: when you end a line with && (or || or | for that matter), the
shell knows that you have not finished speaking, and will wait to
listen to you to finish the sentence.  No need for backslash there.

> +		echo $file && \
> +		git blame -c $file \
> +			| grep -f uncovered_new_lines.txt
> +
> +	rm -f diff_file.txt new_lines.txt \
> +		uncovered_lines.txt uncovered_new_lines.txt
> +done
> +
> +rm -rf files.txt
Junio C Hamano Sept. 12, 2018, 10:54 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>>  contrib/coverage-diff.sh | 70 ++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 70 insertions(+)
>>  create mode 100755 contrib/coverage-diff.sh
>
> I fully appreciate the motivation.  But it is a bit sad that this
> begins with "#!/bin/bash" but it seems that the script is full of
> bash-isms.  I haven't gone through the script to see if these are
> inevitable or gratuitous yet, but I'd assume it made it easier for
> you to write it to step outside the pure POSIX shell?
> ...
>> +	elif [[ $REPLY =~ \+\+\+\ (b/)?([^[:blank:]$esc]+).* ]]; then
>> +	    path=${BASH_REMATCH[2]}
>
> OK, it probably is easier to write in bash than using expr if you
> want to do regexp.

Just to clarify. I am saying that it is OK to give up writing in
pure POSIX and relying on bash-isms after seeing these lines.
Derrick Stolee Sept. 13, 2018, 12:21 p.m. UTC | #3
On 9/12/2018 6:54 PM, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>>   contrib/coverage-diff.sh | 70 ++++++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 70 insertions(+)
>>>   create mode 100755 contrib/coverage-diff.sh
>> I fully appreciate the motivation.  But it is a bit sad that this
>> begins with "#!/bin/bash" but it seems that the script is full of
>> bash-isms.  I haven't gone through the script to see if these are
>> inevitable or gratuitous yet, but I'd assume it made it easier for
>> you to write it to step outside the pure POSIX shell?

I completely forgot to avoid bash, as I wrote this first as an experiment.

>> ...
>>> +	elif [[ $REPLY =~ \+\+\+\ (b/)?([^[:blank:]$esc]+).* ]]; then
>>> +	    path=${BASH_REMATCH[2]}
>> OK, it probably is easier to write in bash than using expr if you
>> want to do regexp.
> Just to clarify. I am saying that it is OK to give up writing in
> pure POSIX and relying on bash-isms after seeing these lines.

I'll try rewriting it using POSIX shell and see how hard it is.

Thanks,

-Stolee
Junio C Hamano Sept. 13, 2018, 2:59 p.m. UTC | #4
Derrick Stolee <stolee@gmail.com> writes:

> On 9/12/2018 6:54 PM, Junio C Hamano wrote:
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>>
>>>>   contrib/coverage-diff.sh | 70 ++++++++++++++++++++++++++++++++++++++++
>>>>   1 file changed, 70 insertions(+)
>>>>   create mode 100755 contrib/coverage-diff.sh
>>> I fully appreciate the motivation.  But it is a bit sad that this
>>> begins with "#!/bin/bash" but it seems that the script is full of
>>> bash-isms.  I haven't gone through the script to see if these are
>>> inevitable or gratuitous yet, but I'd assume it made it easier for
>>> you to write it to step outside the pure POSIX shell?
>
> I completely forgot to avoid bash, as I wrote this first as an experiment.
>
>>> ...
>>>> +	elif [[ $REPLY =~ \+\+\+\ (b/)?([^[:blank:]$esc]+).* ]]; then
>>>> +	    path=${BASH_REMATCH[2]}
>>> OK, it probably is easier to write in bash than using expr if you
>>> want to do regexp.
>> Just to clarify. I am saying that it is OK to give up writing in
>> pure POSIX and relying on bash-isms after seeing these lines.
>
> I'll try rewriting it using POSIX shell and see how hard it is.

Thanks.  Don't waste too much time on it and try to bend backwards
too far, though.

Patch
diff mbox series

diff --git a/contrib/coverage-diff.sh b/contrib/coverage-diff.sh
new file mode 100755
index 0000000000..22acb13d38
--- /dev/null
+++ b/contrib/coverage-diff.sh
@@ -0,0 +1,70 @@ 
+#!/bin/bash
+
+# Usage: 'contrib/coverage-diff.sh <version1> <version2>
+# Outputs a list of new lines in version2 compared to version1 that are
+# not covered by the test suite. Assumes you ran
+# 'make coverage-test coverage-report' from root first, so .gcov files exist.
+
+V1=$1
+V2=$2
+
+diff-lines() {
+    local path=
+    local line=
+    while read; do
+	esc=$'\033'
+	if [[ $REPLY =~ ---\ (a/)?.* ]]; then
+	    continue
+	elif [[ $REPLY =~ \+\+\+\ (b/)?([^[:blank:]$esc]+).* ]]; then
+	    path=${BASH_REMATCH[2]}
+	elif [[ $REPLY =~ @@\ -[0-9]+(,[0-9]+)?\ \+([0-9]+)(,[0-9]+)?\ @@.* ]]; then
+	    line=${BASH_REMATCH[2]}
+	elif [[ $REPLY =~ ^($esc\[[0-9;]+m)*([\ +-]) ]]; then
+	    echo "$path:$line:$REPLY"
+	    if [[ ${BASH_REMATCH[2]} != - ]]; then
+		((line++))
+	    fi
+	fi
+    done
+}
+
+git diff --raw $V1 $V2 | grep \.c$ | awk 'NF>1{print $NF}' >files.txt
+
+for file in $(cat files.txt)
+do
+	hash_file=${file//\//\#}
+
+	git diff $V1 $V2 -- $file \
+		| diff-lines \
+		| grep ":+" \
+		>"diff_file.txt"
+
+	cat diff_file.txt \
+		| sed -E 's/:/ /g' \
+		| awk '{print $2}' \
+		| sort \
+		>new_lines.txt
+
+	cat "$hash_file.gcov" \
+		| grep \#\#\#\#\# \
+		| sed 's/    #####: //g' \
+		| sed 's/\:/ /g' \
+		| awk '{print $1}' \
+		| sort \
+		>uncovered_lines.txt
+
+	comm -12 uncovered_lines.txt new_lines.txt \
+		| sed -e 's/$/\)/' \
+		| sed -e 's/^/\t/' \
+		>uncovered_new_lines.txt
+
+	grep -q '[^[:space:]]' < uncovered_new_lines.txt && \
+		echo $file && \
+		git blame -c $file \
+			| grep -f uncovered_new_lines.txt
+
+	rm -f diff_file.txt new_lines.txt \
+		uncovered_lines.txt uncovered_new_lines.txt
+done
+
+rm -rf files.txt