[3/3] shorten_unambiguous_ref(): avoid sscanf()

To shorten a fully qualified ref (e.g., taking "refs/heads/foo" to just
"foo"), we munge the usual lookup rules ("refs/heads/%.*s", etc) to drop
the ".*" modifier (so "refs/heads/%s"), and then use sscanf() to match
that against the refname, pulling the "%s" content into a separate
buffer.

This has two downsides:

  - sscanf("%s") reportedly misbehaves on macOS with some input and
    locale combinations, returning a partial or garbled string. See
    this thread:

      https://lore.kernel.org/git/CAGF3oAcCi+fG12j-1U0hcrWwkF5K_9WhOi6ZPHBzUUzfkrZDxA@mail.gmail.com/

  - scanf in general is an error-prone interface. For example, scanning
    for "%s" will copy bytes into a destination string, which must have
    been correctly sized ahead of time to avoid a buffer overflow. In
    this case, the code is OK (the buffer is pessimistically sized to
    match the original string, which should give us a maximum). But in
    general, we do not want to encourage people to use scanf at all.
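
For reference, the sscanf()-based matching described above can be
sketched roughly as follows. This is only an illustration, not the code
in refs.c: the helper name shorten_with_sscanf() and its signature are
made up for this example, and the real function also loops over every
rule and checks the result for ambiguity.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not git's code): match "refname" against a
 * scanf-style pattern such as "refs/heads/%s". Returns a newly
 * allocated short name on success, or NULL if the pattern does not
 * match (or on allocation failure). The caller frees the result.
 */
char *shorten_with_sscanf(const char *refname, const char *scanf_fmt)
{
	/*
	 * "%s" has no length limit, so the buffer must be able to hold
	 * the longest possible match: the whole refname plus a NUL.
	 */
	char *short_name = malloc(strlen(refname) + 1);

	if (!short_name)
		return NULL;
	if (sscanf(refname, scanf_fmt, short_name) != 1) {
		free(short_name);
		return NULL;
	}
	return short_name;
}
```

Nothing in the sscanf() call itself ties the size of the buffer to the
format, which is what makes the interface easy to misuse.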

So instead, let's note that our lookup rules are not arbitrary format
strings, but all contain exactly one "%.*s" placeholder. We already rely
on this, both for lookup (we feed the lookup format along with exactly
one int/ptr combo to snprintf, etc) and for shortening (we munge "%.*s"
to "%s", and then insist that sscanf() finds exactly one result).

We can parse this manually by just matching the bytes that occur before
and after the "%.*s" placeholder. While we have a few extra lines of
parsing code, the result is arguably simpler, as we can skip the
preprocessing step and its tricky memory management entirely.
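
A minimal sketch of that idea, using only standard C, might look like
the following. The name match_rule() and its signature are assumptions
made for this illustration; the actual patch may structure the parser
differently (see the in-code comments there).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: match "refname" against a rule containing
 * exactly one "%.*s" placeholder, e.g. "refs/remotes/%.*s/HEAD".
 * On success, return a pointer into refname at the start of the
 * matched part and store its length in *len. Return NULL if the
 * rule does not match.
 */
const char *match_rule(const char *refname, const char *rule, size_t *len)
{
	static const char placeholder[] = "%.*s";
	const char *pct = strstr(rule, placeholder);
	const char *suffix;
	size_t prefix_len, suffix_len, ref_len;

	if (!pct)
		return NULL; /* not a rule of the expected shape */

	prefix_len = (size_t)(pct - rule);
	suffix = pct + strlen(placeholder);
	suffix_len = strlen(suffix);
	ref_len = strlen(refname);

	/* The refname must be long enough to hold both prefix and suffix. */
	if (ref_len < prefix_len + suffix_len)
		return NULL;
	/* Bytes before the placeholder must match the start of refname. */
	if (memcmp(refname, rule, prefix_len))
		return NULL;
	/* Bytes after the placeholder must match the end of refname. */
	if (memcmp(refname + ref_len - suffix_len, suffix, suffix_len))
		return NULL;

	*len = ref_len - prefix_len - suffix_len;
	return refname + prefix_len;
}
```

With this sketch, matching "refs/remotes/origin/HEAD" against
"refs/remotes/%.*s/HEAD" yields a pointer to "origin/HEAD" with a length
of 6, without allocating or rewriting the rule.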

The in-code comments should explain the parsing strategy, but there's
one subtle change here. The original code allocated a single buffer, and
then overwrote it in each loop iteration, since that's the only option
sscanf() gives us. But our parser can actually return a ptr/len combo
for the matched string, which is all we need (since we just feed it back
to the lookup rules with "%.*s"), and then copy it only when returning
to the caller.
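
To illustrate why a ptr/len pair is enough, here is a small sketch. Both
helper names (expand_rule() and copy_span()) are hypothetical and not
taken from refs.c, and the cast assumes the length fits in an int.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: expand a "%.*s" rule with a ptr/len pair. The
 * precision limits how many bytes are read from "name", so the span
 * does not need to be NUL-terminated at name[len].
 */
int expand_rule(char *out, size_t out_size, const char *rule,
		const char *name, size_t len)
{
	/* Assumes len fits in an int for the purposes of this sketch. */
	return snprintf(out, out_size, rule, (int)len, name);
}

/*
 * Hypothetical helper: copy the matched span into a fresh
 * NUL-terminated string, done once when returning to the caller.
 */
char *copy_span(const char *name, size_t len)
{
	char *ret = malloc(len + 1);

	if (!ret)
		return NULL;
	memcpy(ret, name, len);
	ret[len] = '\0';
	return ret;
}
```

Here a span pointing into the middle of the original refname can be
passed straight to expand_rule() for each rule, and copy_span() is only
needed for the final answer.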

Reported-by: 孟子易 <mengziyi540841@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Jeff King <peff@peff.net>
---
BTW, this diff is generated with --patience, which generates a _much_
nicer output in this case. Not important to this series, but since there
was discussion of switching the default in a nearby thread, it seemed
like an interesting example.

 refs.c | 77 ++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 42 insertions(+), 35 deletions(-)

Message ID	Y+vV8Ifkj1QV7KF0@coredump.intra.peff.net (mailing list archive)
State	Superseded
Headers	Return-Path: <git-owner@vger.kernel.org>
	Date: Tue, 14 Feb 2023 13:41:52 -0500
	From: Jeff King <peff@peff.net>
	To: Eric Sunshine <sunshine@sunshineco.com>
	Cc: Junio C Hamano <gitster@pobox.com>, 孟子易 <mengziyi540841@gmail.com>, git@vger.kernel.org
	Subject: [PATCH 3/3] shorten_unambiguous_ref(): avoid sscanf()
	Message-ID: <Y+vV8Ifkj1QV7KF0@coredump.intra.peff.net>
	References: <Y+vVFFCRem6t4IGM@coredump.intra.peff.net>
	MIME-Version: 1.0
	Content-Type: text/plain; charset=utf-8
	Content-Disposition: inline
	Content-Transfer-Encoding: 8bit
	In-Reply-To: <Y+vVFFCRem6t4IGM@coredump.intra.peff.net>
	Precedence: bulk
Series	get rid of sscanf() when shortening refs
	[0/3] get rid of sscanf() when shortening refs
	[1/3] shorten_unambiguous_ref(): avoid integer truncation
	[2/3] shorten_unambiguous_ref(): use NUM_REV_PARSE_RULES constant
	[3/3] shorten_unambiguous_ref(): avoid sscanf()
