Message ID | 20250109-b4-pks-blame-truncate-hash-length-v2-1-589c81a6ddb0@pks.im (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v2] builtin/blame: fix out-of-bounds read with excessive `--abbrev` | expand |
On Thu, Jan 09, 2025 at 12:48:22PM +0100, Patrick Steinhardt wrote: > In 6411a0a896 (builtin/blame: fix type of `length` variable when > emitting object ID, 2024-12-06) we have fixed the type of the `length` > variable. In order to avoid a cast from `size_t` to `int` in the call to > printf(3p) with the "%.*s" formatter we have converted the code to > instead use fwrite(3p), which accepts the length as a `size_t`. > > It was reported though that this makes us read over the end of the OID > array when the provided `--abbrev=` length exceeds the length of the > object ID. This is because fwrite(3p) of course doesn't stop when it > sees a NUL byte, whereas printf(3p) does. > > Fix the bug by reverting back to printf(3p) and culling the provided > length to `GIT_MAX_HEXSZ` to keep it from overflowing when cast to an > `int`. > > Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> > Signed-off-by: Patrick Steinhardt <ps@pks.im> > --- > This fixes the issue reported in [1]. Thanks! > > Changes in v2: > > - Take into account that we may strip ^, * and ? indicators by moving > around the check. > - Fix the testcase so that it actually fails without the fix. > - Link to v1: https://lore.kernel.org/r/20250109-b4-pks-blame-truncate-hash-length-v1-1-9ad4bb09e059@pks.im > > Patrick > > [1]: <4d812802-afbc-4635-7a19-73896fcda625@gmx.de> > --- > builtin/blame.c | 5 ++++- > t/t8002-blame.sh | 4 ++++ > 2 files changed, 8 insertions(+), 1 deletion(-) > > diff --git a/builtin/blame.c b/builtin/blame.c > index 867032e4c16878ffd56df8a73162b89ca4bd2694..f92e487bed22eec576a4716f2e654cb61efb9903 100644 > --- a/builtin/blame.c > +++ b/builtin/blame.c > @@ -505,7 +505,10 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int > length--; > putchar('?'); > } > - fwrite(hex, 1, length, stdout); > + > + if (length > GIT_MAX_HEXSZ) > + length = GIT_MAX_HEXSZ; Here, we explicitly set the length if it exceeds the max hex of the repo. Although `printf` will automatically hide the length, we should explicitly do this. Make sense. > + printf("%.*s", (int)length, hex); > if (opt & OUTPUT_ANNOTATE_COMPAT) { > const char *name; > if (opt & OUTPUT_SHOW_EMAIL) > diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh > index 0147de304b4d104cc7f05ea1f8d68f1a07ceb80d..7cf6e0253a5bbd4d6e438e627dc18b47eac4df66 100755 > --- a/t/t8002-blame.sh > +++ b/t/t8002-blame.sh > @@ -126,6 +126,10 @@ test_expect_success '--no-abbrev works like --abbrev with full length' ' > check_abbrev $hexsz --no-abbrev > ' > > +test_expect_success 'blame --abbrev gets truncated' ' > + check_abbrev $hexsz --abbrev=9000 HEAD > +' > + By the way, I feel this usage is a little strange as the user side. When I received the report mail from Johannes today morning, I feel a little funny that we allow the value of the `--abrrev` option exceeds the `GIT_MAX_HEXSZ` in the first place. > test_expect_success '--exclude-promisor-objects does not BUG-crash' ' > test_must_fail git blame --exclude-promisor-objects one > ' > > --- > base-commit: 14650065b76b28d3cfa9453356ac5669b19e706e > change-id: 20250109-b4-pks-blame-truncate-hash-length-c875cac66d71 > Thanks, Jialuo
Hi Patrick, On Thu, 9 Jan 2025, Patrick Steinhardt wrote: > diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh > index 0147de304b4d104cc7f05ea1f8d68f1a07ceb80d..7cf6e0253a5bbd4d6e438e627dc18b47eac4df66 100755 > --- a/t/t8002-blame.sh > +++ b/t/t8002-blame.sh > @@ -126,6 +126,10 @@ test_expect_success '--no-abbrev works like --abbrev with full length' ' > check_abbrev $hexsz --no-abbrev > ' > > +test_expect_success 'blame --abbrev gets truncated' ' > + check_abbrev $hexsz --abbrev=9000 HEAD Please note that the patch I proposed had `HEAD..` here, not `HEAD`. The latter would not have surfaced the problem with boundary commits. Ciao, Johannes
Hi Jialuo, On Thu, 9 Jan 2025, shejialuo wrote: > On Thu, Jan 09, 2025 at 12:48:22PM +0100, Patrick Steinhardt wrote: > > > + printf("%.*s", (int)length, hex); > > if (opt & OUTPUT_ANNOTATE_COMPAT) { > > const char *name; > > if (opt & OUTPUT_SHOW_EMAIL) > > diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh > > index 0147de304b4d104cc7f05ea1f8d68f1a07ceb80d..7cf6e0253a5bbd4d6e438e627dc18b47eac4df66 100755 > > --- a/t/t8002-blame.sh > > +++ b/t/t8002-blame.sh > > @@ -126,6 +126,10 @@ test_expect_success '--no-abbrev works like --abbrev with full length' ' > > check_abbrev $hexsz --no-abbrev > > ' > > > > +test_expect_success 'blame --abbrev gets truncated' ' > > + check_abbrev $hexsz --abbrev=9000 HEAD > > +' > > + > > By the way, I feel this usage is a little strange as the user side. When > I received the report mail from Johannes today morning, I feel a little > funny that we allow the value of the `--abrrev` option exceeds the > `GIT_MAX_HEXSZ` in the first place. See the explanation I provided in https://lore.kernel.org/git/c439fcaf-11af-7862-9c3c-18dc0842b57d@gmx.de/: When calling `git blame --abbrev=40 HEAD.. -- <file>` (in a SHA-1-based repository), the OIDs are prefixed with a `^` and then the last hex digit will be cut. The reason? Git wants to align the text after the OID. When calling it with `--abbrev=41`, the full OID is shown. Ciao, Johannes
Patrick Steinhardt <ps@pks.im> writes: > diff --git a/builtin/blame.c b/builtin/blame.c > index 867032e4c16878ffd56df8a73162b89ca4bd2694..f92e487bed22eec576a4716f2e654cb61efb9903 100644 > --- a/builtin/blame.c > +++ b/builtin/blame.c > @@ -505,7 +505,10 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int Mental note: the "hex" in question that is used later is defined as char hex[GIT_MAX_HEXSZ + 1]; at the top of this scope. > length--; > putchar('?'); > } > - fwrite(hex, 1, length, stdout); > + > + if (length > GIT_MAX_HEXSZ) > + length = GIT_MAX_HEXSZ; > + printf("%.*s", (int)length, hex); This is not wrong per-se, but leaves a funny aftertaste after reading it, especially if the reader knows the original *bug* was about showing past the end of the string in "hex". The question is "what if the current hash function produces shorter than MAX_HEXSZ?" The updated code is correct only because it reinstates the use of printf() so the "length" being longer than the string itself no longer matters, and the only requirement on "length" is not to read beyond the end of hex[] array. So feeding the length of the array as the limit is not wrong, even though it may be feeding a limit that is larger than the string stored in the array. An obvious alternative would have been to base the limit on the value of strlen(hex) and then we may even keep using fwrite(); then we do not have worry about "what if MAX_HEXSZ is larger than the current hash function?" and nonsense like that (I personally prefer what is being reviewed much better; please do not take this "obvious alternative" as a suggestion). Limiting the "length" here does improve the resulting code, relative to v1 iteration. This printf is about showing the hexstring and we are making sure we do not read past the end of the array that holds the string, and doing it here means we do not have to worry about leading prefix characters about boundary etc. Very good improvement. Thanks.
On Thu, Jan 09, 2025 at 02:49:09PM +0100, Johannes Schindelin wrote: > Hi Jialuo, > > On Thu, 9 Jan 2025, shejialuo wrote: > > > On Thu, Jan 09, 2025 at 12:48:22PM +0100, Patrick Steinhardt wrote: > > > > > + printf("%.*s", (int)length, hex); > > > if (opt & OUTPUT_ANNOTATE_COMPAT) { > > > const char *name; > > > if (opt & OUTPUT_SHOW_EMAIL) > > > diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh > > > index 0147de304b4d104cc7f05ea1f8d68f1a07ceb80d..7cf6e0253a5bbd4d6e438e627dc18b47eac4df66 100755 > > > --- a/t/t8002-blame.sh > > > +++ b/t/t8002-blame.sh > > > @@ -126,6 +126,10 @@ test_expect_success '--no-abbrev works like --abbrev with full length' ' > > > check_abbrev $hexsz --no-abbrev > > > ' > > > > > > +test_expect_success 'blame --abbrev gets truncated' ' > > > + check_abbrev $hexsz --abbrev=9000 HEAD > > > +' > > > + > > > > By the way, I feel this usage is a little strange as the user side. When > > I received the report mail from Johannes today morning, I feel a little > > funny that we allow the value of the `--abrrev` option exceeds the > > `GIT_MAX_HEXSZ` in the first place. > > See the explanation I provided in > https://lore.kernel.org/git/c439fcaf-11af-7862-9c3c-18dc0842b57d@gmx.de/: > When calling `git blame --abbrev=40 HEAD.. -- <file>` (in a SHA-1-based > repository), the OIDs are prefixed with a `^` and then the last hex digit > will be cut. The reason? Git wants to align the text after the OID. > I have read through this, thanks for the detailed explanation. Thanks, Jialuo
diff --git a/builtin/blame.c b/builtin/blame.c index 867032e4c16878ffd56df8a73162b89ca4bd2694..f92e487bed22eec576a4716f2e654cb61efb9903 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -505,7 +505,10 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int length--; putchar('?'); } - fwrite(hex, 1, length, stdout); + + if (length > GIT_MAX_HEXSZ) + length = GIT_MAX_HEXSZ; + printf("%.*s", (int)length, hex); if (opt & OUTPUT_ANNOTATE_COMPAT) { const char *name; if (opt & OUTPUT_SHOW_EMAIL) diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh index 0147de304b4d104cc7f05ea1f8d68f1a07ceb80d..7cf6e0253a5bbd4d6e438e627dc18b47eac4df66 100755 --- a/t/t8002-blame.sh +++ b/t/t8002-blame.sh @@ -126,6 +126,10 @@ test_expect_success '--no-abbrev works like --abbrev with full length' ' check_abbrev $hexsz --no-abbrev ' +test_expect_success 'blame --abbrev gets truncated' ' + check_abbrev $hexsz --abbrev=9000 HEAD +' + test_expect_success '--exclude-promisor-objects does not BUG-crash' ' test_must_fail git blame --exclude-promisor-objects one '
In 6411a0a896 (builtin/blame: fix type of `length` variable when emitting object ID, 2024-12-06) we have fixed the type of the `length` variable. In order to avoid a cast from `size_t` to `int` in the call to printf(3p) with the "%.*s" formatter we have converted the code to instead use fwrite(3p), which accepts the length as a `size_t`. It was reported though that this makes us read over the end of the OID array when the provided `--abbrev=` length exceeds the length of the object ID. This is because fwrite(3p) of course doesn't stop when it sees a NUL byte, whereas printf(3p) does. Fix the bug by reverting back to printf(3p) and culling the provided length to `GIT_MAX_HEXSZ` to keep it from overflowing when cast to an `int`. Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Patrick Steinhardt <ps@pks.im> --- This fixes the issue reported in [1]. Thanks! Changes in v2: - Take into account that we may strip ^, * and ? indicators by moving around the check. - Fix the testcase so that it actually fails without the fix. - Link to v1: https://lore.kernel.org/r/20250109-b4-pks-blame-truncate-hash-length-v1-1-9ad4bb09e059@pks.im Patrick [1]: <4d812802-afbc-4635-7a19-73896fcda625@gmx.de> --- builtin/blame.c | 5 ++++- t/t8002-blame.sh | 4 ++++ 2 files changed, 8 insertions(+), 1 deletion(-) --- base-commit: 14650065b76b28d3cfa9453356ac5669b19e706e change-id: 20250109-b4-pks-blame-truncate-hash-length-c875cac66d71