diff mbox series

[v2] builtin/blame: fix out-of-bounds read with excessive `--abbrev`

Message ID 20250109-b4-pks-blame-truncate-hash-length-v2-1-589c81a6ddb0@pks.im (mailing list archive)
State New
Headers show
Series [v2] builtin/blame: fix out-of-bounds read with excessive `--abbrev` | expand

Commit Message

Patrick Steinhardt Jan. 9, 2025, 11:48 a.m. UTC
In 6411a0a896 (builtin/blame: fix type of `length` variable when
emitting object ID, 2024-12-06) we have fixed the type of the `length`
variable. In order to avoid a cast from `size_t` to `int` in the call to
printf(3p) with the "%.*s" formatter we have converted the code to
instead use fwrite(3p), which accepts the length as a `size_t`.

It was reported though that this makes us read over the end of the OID
array when the provided `--abbrev=` length exceeds the length of the
object ID. This is because fwrite(3p) of course doesn't stop when it
sees a NUL byte, whereas printf(3p) does.

Fix the bug by reverting back to printf(3p) and culling the provided
length to `GIT_MAX_HEXSZ` to keep it from overflowing when cast to an
`int`.

Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
This fixes the issue reported in [1]. Thanks!

Changes in v2:

  - Take into account that we may strip ^, * and ? indicators by moving
    around the check.
  - Fix the testcase so that it actually fails without the fix.
  - Link to v1: https://lore.kernel.org/r/20250109-b4-pks-blame-truncate-hash-length-v1-1-9ad4bb09e059@pks.im

Patrick

[1]: <4d812802-afbc-4635-7a19-73896fcda625@gmx.de>
---
 builtin/blame.c  | 5 ++++-
 t/t8002-blame.sh | 4 ++++
 2 files changed, 8 insertions(+), 1 deletion(-)


---
base-commit: 14650065b76b28d3cfa9453356ac5669b19e706e
change-id: 20250109-b4-pks-blame-truncate-hash-length-c875cac66d71

Comments

shejialuo Jan. 9, 2025, 12:40 p.m. UTC | #1
On Thu, Jan 09, 2025 at 12:48:22PM +0100, Patrick Steinhardt wrote:
> In 6411a0a896 (builtin/blame: fix type of `length` variable when
> emitting object ID, 2024-12-06) we have fixed the type of the `length`
> variable. In order to avoid a cast from `size_t` to `int` in the call to
> printf(3p) with the "%.*s" formatter we have converted the code to
> instead use fwrite(3p), which accepts the length as a `size_t`.
> 
> It was reported though that this makes us read over the end of the OID
> array when the provided `--abbrev=` length exceeds the length of the
> object ID. This is because fwrite(3p) of course doesn't stop when it
> sees a NUL byte, whereas printf(3p) does.
> 
> Fix the bug by reverting back to printf(3p) and culling the provided
> length to `GIT_MAX_HEXSZ` to keep it from overflowing when cast to an
> `int`.
> 
> Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> This fixes the issue reported in [1]. Thanks!
> 
> Changes in v2:
> 
>   - Take into account that we may strip ^, * and ? indicators by moving
>     around the check.
>   - Fix the testcase so that it actually fails without the fix.
>   - Link to v1: https://lore.kernel.org/r/20250109-b4-pks-blame-truncate-hash-length-v1-1-9ad4bb09e059@pks.im
> 
> Patrick
> 
> [1]: <4d812802-afbc-4635-7a19-73896fcda625@gmx.de>
> ---
>  builtin/blame.c  | 5 ++++-
>  t/t8002-blame.sh | 4 ++++
>  2 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/blame.c b/builtin/blame.c
> index 867032e4c16878ffd56df8a73162b89ca4bd2694..f92e487bed22eec576a4716f2e654cb61efb9903 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -505,7 +505,10 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int
>  			length--;
>  			putchar('?');
>  		}
> -		fwrite(hex, 1, length, stdout);
> +
> +		if (length > GIT_MAX_HEXSZ)
> +			length = GIT_MAX_HEXSZ;

Here, we explicitly set the length if it exceeds the max hex of the
repo. Although `printf` will automatically hide the length, we should
explicitly do this. Make sense.

> +		printf("%.*s", (int)length, hex);
>  		if (opt & OUTPUT_ANNOTATE_COMPAT) {
>  			const char *name;
>  			if (opt & OUTPUT_SHOW_EMAIL)
> diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh
> index 0147de304b4d104cc7f05ea1f8d68f1a07ceb80d..7cf6e0253a5bbd4d6e438e627dc18b47eac4df66 100755
> --- a/t/t8002-blame.sh
> +++ b/t/t8002-blame.sh
> @@ -126,6 +126,10 @@ test_expect_success '--no-abbrev works like --abbrev with full length' '
>  	check_abbrev $hexsz --no-abbrev
>  '
>  
> +test_expect_success 'blame --abbrev gets truncated' '
> +	check_abbrev $hexsz --abbrev=9000 HEAD
> +'
> +

By the way, I feel this usage is a little strange as the user side. When
I received the report mail from Johannes today morning, I feel a little
funny that we allow the value of the `--abrrev` option exceeds the
`GIT_MAX_HEXSZ` in the first place.

>  test_expect_success '--exclude-promisor-objects does not BUG-crash' '
>  	test_must_fail git blame --exclude-promisor-objects one
>  '
> 
> ---
> base-commit: 14650065b76b28d3cfa9453356ac5669b19e706e
> change-id: 20250109-b4-pks-blame-truncate-hash-length-c875cac66d71
> 

Thanks,
Jialuo
Johannes Schindelin Jan. 9, 2025, 1:43 p.m. UTC | #2
Hi Patrick,

On Thu, 9 Jan 2025, Patrick Steinhardt wrote:

> diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh
> index 0147de304b4d104cc7f05ea1f8d68f1a07ceb80d..7cf6e0253a5bbd4d6e438e627dc18b47eac4df66 100755
> --- a/t/t8002-blame.sh
> +++ b/t/t8002-blame.sh
> @@ -126,6 +126,10 @@ test_expect_success '--no-abbrev works like --abbrev with full length' '
>  	check_abbrev $hexsz --no-abbrev
>  '
>
> +test_expect_success 'blame --abbrev gets truncated' '
> +	check_abbrev $hexsz --abbrev=9000 HEAD

Please note that the patch I proposed had `HEAD..` here, not `HEAD`. The
latter would not have surfaced the problem with boundary commits.

Ciao,
Johannes
Johannes Schindelin Jan. 9, 2025, 1:49 p.m. UTC | #3
Hi Jialuo,

On Thu, 9 Jan 2025, shejialuo wrote:

> On Thu, Jan 09, 2025 at 12:48:22PM +0100, Patrick Steinhardt wrote:
>
> > +		printf("%.*s", (int)length, hex);
> >  		if (opt & OUTPUT_ANNOTATE_COMPAT) {
> >  			const char *name;
> >  			if (opt & OUTPUT_SHOW_EMAIL)
> > diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh
> > index 0147de304b4d104cc7f05ea1f8d68f1a07ceb80d..7cf6e0253a5bbd4d6e438e627dc18b47eac4df66 100755
> > --- a/t/t8002-blame.sh
> > +++ b/t/t8002-blame.sh
> > @@ -126,6 +126,10 @@ test_expect_success '--no-abbrev works like --abbrev with full length' '
> >  	check_abbrev $hexsz --no-abbrev
> >  '
> >
> > +test_expect_success 'blame --abbrev gets truncated' '
> > +	check_abbrev $hexsz --abbrev=9000 HEAD
> > +'
> > +
>
> By the way, I feel this usage is a little strange as the user side. When
> I received the report mail from Johannes today morning, I feel a little
> funny that we allow the value of the `--abrrev` option exceeds the
> `GIT_MAX_HEXSZ` in the first place.

See the explanation I provided in
https://lore.kernel.org/git/c439fcaf-11af-7862-9c3c-18dc0842b57d@gmx.de/:
When calling `git blame --abbrev=40 HEAD.. -- <file>` (in a SHA-1-based
repository), the OIDs are prefixed with a `^` and then the last hex digit
will be cut. The reason? Git wants to align the text after the OID.

When calling it with `--abbrev=41`, the full OID is shown.

Ciao,
Johannes
Junio C Hamano Jan. 9, 2025, 2:59 p.m. UTC | #4
Patrick Steinhardt <ps@pks.im> writes:

> diff --git a/builtin/blame.c b/builtin/blame.c
> index 867032e4c16878ffd56df8a73162b89ca4bd2694..f92e487bed22eec576a4716f2e654cb61efb9903 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -505,7 +505,10 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int

Mental note: the "hex" in question that is used later is defined as

	char hex[GIT_MAX_HEXSZ + 1];

at the top of this scope.

>  			length--;
>  			putchar('?');
>  		}
> -		fwrite(hex, 1, length, stdout);
> +
> +		if (length > GIT_MAX_HEXSZ)
> +			length = GIT_MAX_HEXSZ;
> +		printf("%.*s", (int)length, hex);

This is not wrong per-se, but leaves a funny aftertaste after
reading it, especially if the reader knows the original *bug* was
about showing past the end of the string in "hex".  The question is
"what if the current hash function produces shorter than MAX_HEXSZ?"

The updated code is correct only because it reinstates the use of
printf() so the "length" being longer than the string itself no
longer matters, and the only requirement on "length" is not to
read beyond the end of hex[] array.  So feeding the length of the
array as the limit is not wrong, even though it may be feeding a
limit that is larger than the string stored in the array.

An obvious alternative would have been to base the limit on the
value of strlen(hex) and then we may even keep using fwrite(); then
we do not have worry about "what if MAX_HEXSZ is larger than the
current hash function?" and nonsense like that (I personally prefer
what is being reviewed much better; please do not take this "obvious
alternative" as a suggestion).

Limiting the "length" here does improve the resulting code, relative
to v1 iteration.  This printf is about showing the hexstring and we
are making sure we do not read past the end of the array that holds
the string, and doing it here means we do not have to worry about
leading prefix characters about boundary etc.  Very good
improvement.

Thanks.
shejialuo Jan. 10, 2025, 12:16 p.m. UTC | #5
On Thu, Jan 09, 2025 at 02:49:09PM +0100, Johannes Schindelin wrote:
> Hi Jialuo,
> 
> On Thu, 9 Jan 2025, shejialuo wrote:
> 
> > On Thu, Jan 09, 2025 at 12:48:22PM +0100, Patrick Steinhardt wrote:
> >
> > > +		printf("%.*s", (int)length, hex);
> > >  		if (opt & OUTPUT_ANNOTATE_COMPAT) {
> > >  			const char *name;
> > >  			if (opt & OUTPUT_SHOW_EMAIL)
> > > diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh
> > > index 0147de304b4d104cc7f05ea1f8d68f1a07ceb80d..7cf6e0253a5bbd4d6e438e627dc18b47eac4df66 100755
> > > --- a/t/t8002-blame.sh
> > > +++ b/t/t8002-blame.sh
> > > @@ -126,6 +126,10 @@ test_expect_success '--no-abbrev works like --abbrev with full length' '
> > >  	check_abbrev $hexsz --no-abbrev
> > >  '
> > >
> > > +test_expect_success 'blame --abbrev gets truncated' '
> > > +	check_abbrev $hexsz --abbrev=9000 HEAD
> > > +'
> > > +
> >
> > By the way, I feel this usage is a little strange as the user side. When
> > I received the report mail from Johannes today morning, I feel a little
> > funny that we allow the value of the `--abrrev` option exceeds the
> > `GIT_MAX_HEXSZ` in the first place.
> 
> See the explanation I provided in
> https://lore.kernel.org/git/c439fcaf-11af-7862-9c3c-18dc0842b57d@gmx.de/:
> When calling `git blame --abbrev=40 HEAD.. -- <file>` (in a SHA-1-based
> repository), the OIDs are prefixed with a `^` and then the last hex digit
> will be cut. The reason? Git wants to align the text after the OID.
> 

I have read through this, thanks for the detailed explanation.

Thanks,
Jialuo
diff mbox series

Patch

diff --git a/builtin/blame.c b/builtin/blame.c
index 867032e4c16878ffd56df8a73162b89ca4bd2694..f92e487bed22eec576a4716f2e654cb61efb9903 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -505,7 +505,10 @@  static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int
 			length--;
 			putchar('?');
 		}
-		fwrite(hex, 1, length, stdout);
+
+		if (length > GIT_MAX_HEXSZ)
+			length = GIT_MAX_HEXSZ;
+		printf("%.*s", (int)length, hex);
 		if (opt & OUTPUT_ANNOTATE_COMPAT) {
 			const char *name;
 			if (opt & OUTPUT_SHOW_EMAIL)
diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh
index 0147de304b4d104cc7f05ea1f8d68f1a07ceb80d..7cf6e0253a5bbd4d6e438e627dc18b47eac4df66 100755
--- a/t/t8002-blame.sh
+++ b/t/t8002-blame.sh
@@ -126,6 +126,10 @@  test_expect_success '--no-abbrev works like --abbrev with full length' '
 	check_abbrev $hexsz --no-abbrev
 '
 
+test_expect_success 'blame --abbrev gets truncated' '
+	check_abbrev $hexsz --abbrev=9000 HEAD
+'
+
 test_expect_success '--exclude-promisor-objects does not BUG-crash' '
 	test_must_fail git blame --exclude-promisor-objects one
 '