diff mbox series

linear-assignment: fix potential out of bounds memory access (was: Re: Git 2.19 Segmentation fault 11 on macOS)

Message ID 20180912190108.GE4865@hank.intra.tgummerer.com (mailing list archive)
State New, archived
Headers show
Series linear-assignment: fix potential out of bounds memory access (was: Re: Git 2.19 Segmentation fault 11 on macOS) | expand

Commit Message

Thomas Gummerer Sept. 12, 2018, 7:01 p.m. UTC
On 09/11, Thomas Gummerer wrote:
> On 09/11, Thomas Gummerer wrote:
> > I think you're on the right track here.  I can not test this on Mac
> > OS, but on Linux, the following fails when running the test under
> > valgrind:
> > 
> >     diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> >     index 2237c7f4af..a8b0ef8c1d 100755
> >     --- a/t/t3206-range-diff.sh
> >     +++ b/t/t3206-range-diff.sh
> >     @@ -142,4 +142,9 @@ test_expect_success 'changed message' '
> >             test_cmp expected actual
> >      '
> >      
> >     +test_expect_success 'amend and check' '
> >     +       git commit --amend -m "new message" &&
> >     +       git range-diff master HEAD@{1} HEAD
> >     +'
> >     +
> >      test_done
> > 
> > valgrind gives me the following:
> > 
> > ==18232== Invalid read of size 4
> > ==18232==    at 0x34D7B5: compute_assignment (linear-assignment.c:54)
> > ==18232==    by 0x2A4253: get_correspondences (range-diff.c:245)
> > ==18232==    by 0x2A4BFB: show_range_diff (range-diff.c:427)
> > ==18232==    by 0x19D453: cmd_range_diff (range-diff.c:108)
> > ==18232==    by 0x122698: run_builtin (git.c:418)
> > ==18232==    by 0x1229D8: handle_builtin (git.c:637)
> > ==18232==    by 0x122BCC: run_argv (git.c:689)
> > ==18232==    by 0x122D90: cmd_main (git.c:766)
> > ==18232==    by 0x1D55A3: main (common-main.c:45)
> > ==18232==  Address 0x4f4d844 is 0 bytes after a block of size 4 alloc'd
> > ==18232==    at 0x483777F: malloc (vg_replace_malloc.c:299)
> > ==18232==    by 0x3381B0: do_xmalloc (wrapper.c:60)
> > ==18232==    by 0x338283: xmalloc (wrapper.c:87)
> > ==18232==    by 0x2A3F8C: get_correspondences (range-diff.c:207)
> > ==18232==    by 0x2A4BFB: show_range_diff (range-diff.c:427)
> > ==18232==    by 0x19D453: cmd_range_diff (range-diff.c:108)
> > ==18232==    by 0x122698: run_builtin (git.c:418)
> > ==18232==    by 0x1229D8: handle_builtin (git.c:637)
> > ==18232==    by 0x122BCC: run_argv (git.c:689)
> > ==18232==    by 0x122D90: cmd_main (git.c:766)
> > ==18232==    by 0x1D55A3: main (common-main.c:45)
> > ==18232== 
> > 
> > I'm looking into why that fails.  Also adding Dscho to Cc here as the
> > author of this code.
> 
> The diff below seems to fix it.  Not submitting this as a proper
> patch [...]

I found the time to actually have a look at the paper, so here's a
proper patch:

I'm still not entirely sure what the initial code tried to do here,
but I think staying as close as possible to the original is probably
our best option here, also for future readers of this code.

--- >8 ---

Subject: [PATCH] linear-assignment: fix potential out of bounds memory access

Currently the 'compute_assignment()' function can may read memory out
of bounds, even if used correctly.  Namely this happens when we only
have one column.  In that case we try to calculate the initial
minimum cost using '!j1' as column in the reduction transfer code.
That in turn causes us to try and get the cost from column 1 in the
cost matrix, which does not exist, and thus results in an out of
bounds memory read.

Instead of trying to intialize the minimum cost from another column,
just set it to INT_MAX.  This also matches what the example code in the
original paper for the algorithm [1] does (it initializes the value to
inf, for which INT_MAX is the closest match in C).

Note that the test only fails under valgrind on Linux, but the same
command has been reported to segfault on Mac OS.

Also start from 0 in the loop, which matches what the example code in
the original paper does as well.  Starting from 1 means we'd ignore
the first column during the reduction transfer phase.  Note that in
the original paper the loop does start from 1, but the implementation
is in Pascal, where arrays are 1 indexed.

[1]: Jonker, R., & Volgenant, A. (1987). A shortest augmenting path
     algorithm for dense and sparse linear assignment
     problems. Computing, 38(4), 325–340.

Reported-by: ryenus <ryenus@gmail.com>
Helped-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 linear-assignment.c   | 4 ++--
 t/t3206-range-diff.sh | 5 +++++
 2 files changed, 7 insertions(+), 2 deletions(-)

Comments

Junio C Hamano Sept. 12, 2018, 8:11 p.m. UTC | #1
Thomas Gummerer <t.gummerer@gmail.com> writes:

>> > I'm looking into why that fails.  Also adding Dscho to Cc here as the
>> > author of this code.
>> 
>> The diff below seems to fix it.  Not submitting this as a proper
>> patch [...]
>
> I found the time to actually have a look at the paper, so here's a
> proper patch:
>
> I'm still not entirely sure what the initial code tried to do here,

It looks to me an attempt to optimize (instead of starting from a
value that is too big to be minimum, pick the first value and start
from there, and all the "found even smaller one, so let's replace"
later would work the same way) that went wrong (just that the "first
one" was written incorrectly), but it is not absolutely necessary to
find out why the code was written in a particular way that happened
to be buggy.

> but I think staying as close as possible to the original is probably
> our best option here, also for future readers of this code.

Thanks for digging.

> --- >8 ---
>
> Subject: [PATCH] linear-assignment: fix potential out of bounds memory access
>
> Currently the 'compute_assignment()' function can may read memory out
> of bounds, even if used correctly.  Namely this happens when we only
> have one column.  In that case we try to calculate the initial
> minimum cost using '!j1' as column in the reduction transfer code.
> That in turn causes us to try and get the cost from column 1 in the
> cost matrix, which does not exist, and thus results in an out of
> bounds memory read.

This nicely explains what goes wrong.

> Instead of trying to intialize the minimum cost from another column,
> just set it to INT_MAX.  This also matches what the example code in the
> original paper for the algorithm [1] does (it initializes the value to
> inf, for which INT_MAX is the closest match in C).

Yeah, if we really want to avoid INT_MAX we could use another "have
we found any value yet?" boolean variable, but the caller in
get_correspondences() does not even worry about integer overflows
when stuffing diffsize to the cost[] array, and the other possible
value that can go to cost[] array is COST_MAX that is mere 65k, so
it would be OK to use INT_MAX as sentinel here, I guess.

> Note that the test only fails under valgrind on Linux, but the same
> command has been reported to segfault on Mac OS.
>
> Also start from 0 in the loop, which matches what the example code in
> the original paper does as well.  Starting from 1 means we'd ignore
> the first column during the reduction transfer phase.  Note that in
> the original paper the loop does start from 1, but the implementation
> is in Pascal, where arrays are 1 indexed.
>
> [1]: Jonker, R., & Volgenant, A. (1987). A shortest augmenting path
>      algorithm for dense and sparse linear assignment
>      problems. Computing, 38(4), 325–340.
>
> Reported-by: ryenus <ryenus@gmail.com>
> Helped-by: Derrick Stolee <stolee@gmail.com>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>  linear-assignment.c   | 4 ++--
>  t/t3206-range-diff.sh | 5 +++++
>  2 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/linear-assignment.c b/linear-assignment.c
> index 9b3e56e283..7700b80eeb 100644
> --- a/linear-assignment.c
> +++ b/linear-assignment.c
> @@ -51,8 +51,8 @@ void compute_assignment(int column_count, int row_count, int *cost,
>  		else if (j1 < -1)
>  			row2column[i] = -2 - j1;
>  		else {
> -			int min = COST(!j1, i) - v[!j1];
> -			for (j = 1; j < column_count; j++)
> +			int min = INT_MAX;
> +			for (j = 0; j < column_count; j++)
>  				if (j != j1 && min > COST(j, i) - v[j])
>  					min = COST(j, i) - v[j];
>  			v[j1] -= min;
> diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> index 2237c7f4af..fb4c13a84a 100755
> --- a/t/t3206-range-diff.sh
> +++ b/t/t3206-range-diff.sh
> @@ -142,4 +142,9 @@ test_expect_success 'changed message' '
>  	test_cmp expected actual
>  '
>  
> +test_expect_success 'no commits on one side' '
> +	git commit --amend -m "new message" &&
> +	git range-diff master HEAD@{1} HEAD
> +'
> +
>  test_done
Thomas Gummerer Sept. 12, 2018, 10:44 p.m. UTC | #2
On 09/12, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:

> > --- >8 ---
> >
> > Subject: [PATCH] linear-assignment: fix potential out of bounds memory access
> >
> > Currently the 'compute_assignment()' function can may read memory out
> > of bounds, even if used correctly.  Namely this happens when we only
> > have one column.  In that case we try to calculate the initial
> > minimum cost using '!j1' as column in the reduction transfer code.
> > That in turn causes us to try and get the cost from column 1 in the
> > cost matrix, which does not exist, and thus results in an out of
> > bounds memory read.
> 
> This nicely explains what goes wrong.
> 
> > Instead of trying to intialize the minimum cost from another column,
> > just set it to INT_MAX.  This also matches what the example code in the
> > original paper for the algorithm [1] does (it initializes the value to
> > inf, for which INT_MAX is the closest match in C).
> 
> Yeah, if we really want to avoid INT_MAX we could use another "have
> we found any value yet?" boolean variable, but the caller in
> get_correspondences() does not even worry about integer overflows
> when stuffing diffsize to the cost[] array, and the other possible
> value that can go to cost[] array is COST_MAX that is mere 65k, so
> it would be OK to use INT_MAX as sentinel here, I guess.

Right, I'm not sure it would be worth introducing another boolean
variable here.  In the normal case we'll always enter the if condition
inside the loop, and set a reasonable 'min' value.

That does not happen if we only have one column, and the 'min' will
remain 'INT_MAX'.  Now in that case it doesn't matter much, as having
only one column means there's no possibility to assign anything, so
the actual values shouldn't matter (at least that's my understanding
of the algorithm so far).

Another improvement we may be able to make here is to not even try to
compute the assignment if there's only one column for that reason, but
I'm out of time today and the rest of my week looks a bit busy, so I
probably won't get to do anything before the beginning of next week.

> > Note that the test only fails under valgrind on Linux, but the same
> > command has been reported to segfault on Mac OS.
> >
> > Also start from 0 in the loop, which matches what the example code in
> > the original paper does as well.  Starting from 1 means we'd ignore
> > the first column during the reduction transfer phase.  Note that in
> > the original paper the loop does start from 1, but the implementation
> > is in Pascal, where arrays are 1 indexed.
> >
> > [1]: Jonker, R., & Volgenant, A. (1987). A shortest augmenting path
> >      algorithm for dense and sparse linear assignment
> >      problems. Computing, 38(4), 325–340.
> >
> > Reported-by: ryenus <ryenus@gmail.com>
> > Helped-by: Derrick Stolee <stolee@gmail.com>
> > Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> > ---
> >  linear-assignment.c   | 4 ++--
> >  t/t3206-range-diff.sh | 5 +++++
> >  2 files changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/linear-assignment.c b/linear-assignment.c
> > index 9b3e56e283..7700b80eeb 100644
> > --- a/linear-assignment.c
> > +++ b/linear-assignment.c
> > @@ -51,8 +51,8 @@ void compute_assignment(int column_count, int row_count, int *cost,
> >  		else if (j1 < -1)
> >  			row2column[i] = -2 - j1;
> >  		else {
> > -			int min = COST(!j1, i) - v[!j1];
> > -			for (j = 1; j < column_count; j++)
> > +			int min = INT_MAX;
> > +			for (j = 0; j < column_count; j++)
> >  				if (j != j1 && min > COST(j, i) - v[j])
> >  					min = COST(j, i) - v[j];
> >  			v[j1] -= min;
> > diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> > index 2237c7f4af..fb4c13a84a 100755
> > --- a/t/t3206-range-diff.sh
> > +++ b/t/t3206-range-diff.sh
> > @@ -142,4 +142,9 @@ test_expect_success 'changed message' '
> >  	test_cmp expected actual
> >  '
> >  
> > +test_expect_success 'no commits on one side' '
> > +	git commit --amend -m "new message" &&
> > +	git range-diff master HEAD@{1} HEAD
> > +'
> > +
> >  test_done
Johannes Schindelin Sept. 13, 2018, 2:38 a.m. UTC | #3
Hi Thomas,

[quickly, as I will go back to a proper vacation after this]

On Wed, 12 Sep 2018, Thomas Gummerer wrote:

> diff --git a/linear-assignment.c b/linear-assignment.c
> index 9b3e56e283..7700b80eeb 100644
> --- a/linear-assignment.c
> +++ b/linear-assignment.c
> @@ -51,8 +51,8 @@ void compute_assignment(int column_count, int row_count, int *cost,
>  		else if (j1 < -1)
>  			row2column[i] = -2 - j1;
>  		else {
> -			int min = COST(!j1, i) - v[!j1];
> -			for (j = 1; j < column_count; j++)
> +			int min = INT_MAX;

I am worried about this, as I tried very hard to avoid integer overruns.

Wouldn't it be possible to replace the `else {` by an appropriate `else if
(...) { ... } else {`? E.g. `else if (column_count < 2)` or some such?

Ciao,
Dscho

> +			for (j = 0; j < column_count; j++)
>  				if (j != j1 && min > COST(j, i) - v[j])
>  					min = COST(j, i) - v[j];
>  			v[j1] -= min;
> diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> index 2237c7f4af..fb4c13a84a 100755
> --- a/t/t3206-range-diff.sh
> +++ b/t/t3206-range-diff.sh
> @@ -142,4 +142,9 @@ test_expect_success 'changed message' '
>  	test_cmp expected actual
>  '
>  
> +test_expect_success 'no commits on one side' '
> +	git commit --amend -m "new message" &&
> +	git range-diff master HEAD@{1} HEAD
> +'
> +
>  test_done
> -- 
> 2.19.0.397.gdd90340f6a
> 
>
Eric Sunshine Sept. 13, 2018, 10:14 a.m. UTC | #4
On Wed, Sep 12, 2018 at 3:01 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
> Subject: [PATCH] linear-assignment: fix potential out of bounds memory access
>
> Currently the 'compute_assignment()' function can may read memory out

"can may"?

> of bounds, even if used correctly.  Namely this happens when we only
> have one column.  In that case we try to calculate the initial
> minimum cost using '!j1' as column in the reduction transfer code.
> That in turn causes us to try and get the cost from column 1 in the
> cost matrix, which does not exist, and thus results in an out of
> bounds memory read.
Thomas Gummerer Sept. 13, 2018, 10:13 p.m. UTC | #5
On 09/12, Johannes Schindelin wrote:
> Hi Thomas,
> 
> [quickly, as I will go back to a proper vacation after this]

Sorry about interrupting your vacation, enjoy wherever you are! :)

> On Wed, 12 Sep 2018, Thomas Gummerer wrote:
> 
> > diff --git a/linear-assignment.c b/linear-assignment.c
> > index 9b3e56e283..7700b80eeb 100644
> > --- a/linear-assignment.c
> > +++ b/linear-assignment.c
> > @@ -51,8 +51,8 @@ void compute_assignment(int column_count, int row_count, int *cost,
> >  		else if (j1 < -1)
> >  			row2column[i] = -2 - j1;
> >  		else {
> > -			int min = COST(!j1, i) - v[!j1];
> > -			for (j = 1; j < column_count; j++)
> > +			int min = INT_MAX;
> 
> I am worried about this, as I tried very hard to avoid integer overruns.

Ah fair enough, now I think I understand where the calculation of the
initial value of min comes from, thanks!

> Wouldn't it be possible to replace the `else {` by an appropriate `else if
> (...) { ... } else {`? E.g. `else if (column_count < 2)` or some such?

Yes, I think that would be possible.  However if we're already special
casing "column_count < 2", I think we might as well just exit early
before running through the whole algorithm in that case.  If there's
only one column, there are no commits that can be assigned to
eachother, as there is only the one.

We could also just not run call 'compute_assignment' in the first
place if column_count == 1, however I'd rather make the function safer
to call, just in case we find it useful for something else in the
future.

Will send an updated patch in a bit.

> Ciao,
> Dscho
> 
> > +			for (j = 0; j < column_count; j++)
> >  				if (j != j1 && min > COST(j, i) - v[j])
> >  					min = COST(j, i) - v[j];
> >  			v[j1] -= min;
> > diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> > index 2237c7f4af..fb4c13a84a 100755
> > --- a/t/t3206-range-diff.sh
> > +++ b/t/t3206-range-diff.sh
> > @@ -142,4 +142,9 @@ test_expect_success 'changed message' '
> >  	test_cmp expected actual
> >  '
> >  
> > +test_expect_success 'no commits on one side' '
> > +	git commit --amend -m "new message" &&
> > +	git range-diff master HEAD@{1} HEAD
> > +'
> > +
> >  test_done
> > -- 
> > 2.19.0.397.gdd90340f6a
> > 
> >
diff mbox series

Patch

diff --git a/linear-assignment.c b/linear-assignment.c
index 9b3e56e283..7700b80eeb 100644
--- a/linear-assignment.c
+++ b/linear-assignment.c
@@ -51,8 +51,8 @@  void compute_assignment(int column_count, int row_count, int *cost,
 		else if (j1 < -1)
 			row2column[i] = -2 - j1;
 		else {
-			int min = COST(!j1, i) - v[!j1];
-			for (j = 1; j < column_count; j++)
+			int min = INT_MAX;
+			for (j = 0; j < column_count; j++)
 				if (j != j1 && min > COST(j, i) - v[j])
 					min = COST(j, i) - v[j];
 			v[j1] -= min;
diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
index 2237c7f4af..fb4c13a84a 100755
--- a/t/t3206-range-diff.sh
+++ b/t/t3206-range-diff.sh
@@ -142,4 +142,9 @@  test_expect_success 'changed message' '
 	test_cmp expected actual
 '
 
+test_expect_success 'no commits on one side' '
+	git commit --amend -m "new message" &&
+	git range-diff master HEAD@{1} HEAD
+'
+
 test_done