mailinfo: use starts_with() when checking scissors

Message ID	20210608204841.2793591-1-rybak.a.v@gmail.com (mailing list archive)
State	Accepted
Commit	4184cbd635140e83c5f0d57c377eec93a9b6eedf
Headers	show Return-Path: <git-owner@kernel.org> From: Andrei Rybak <rybak.a.v@gmail.com> To: git@vger.kernel.org Cc: Jeff King <peff@peff.net> Subject: [PATCH] mailinfo: use starts_with() when checking scissors Date: Tue, 8 Jun 2021 22:48:41 +0200 Message-Id: <20210608204841.2793591-1-rybak.a.v@gmail.com> In-Reply-To: <20190401215334.18678-1-rybak.a.v@gmail.com> References: <20190401215334.18678-1-rybak.a.v@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	mailinfo: use starts_with() when checking scissors \| expand mailinfo: use starts_with() when checking scissors

Message ID

20210608204841.2793591-1-rybak.a.v@gmail.com (mailing list archive)

State

Accepted

Commit

4184cbd635140e83c5f0d57c377eec93a9b6eedf

Headers

From: Andrei Rybak <rybak.a.v@gmail.com>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>
Subject: [PATCH] mailinfo: use starts_with() when checking scissors
Date: Tue,  8 Jun 2021 22:48:41 +0200
Message-Id: <20210608204841.2793591-1-rybak.a.v@gmail.com>
In-Reply-To: <20190401215334.18678-1-rybak.a.v@gmail.com>
References: <20190401215334.18678-1-rybak.a.v@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

mailinfo: use starts_with() when checking scissors | expand

Commit Message

Andrei Rybak June 8, 2021, 8:48 p.m. UTC

Existing checks for scissors characters using memcmp(3) never read past
the end of the line, because all substrings we are interested in are two
characters long, and the outer loop guarantees we have at least one
character.  So at most we will look at the NUL.

However, this is too subtle and may lead to bugs in code which copies
this behavior without realizing substring length requirement.  So use
starts_with() instead, which will stop at NUL regardless of the length
of the prefix.  Remove extra pair of parentheses while we are here.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
---
 mailinfo.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

This patch was originally part of:
https://public-inbox.org/git/20190401215334.18678-1-rybak.a.v@gmail.com/
I've finally gotten around to sending it on its own.

Comments

Jeff King June 8, 2021, 9:57 p.m. UTC | #1

On Tue, Jun 08, 2021 at 10:48:41PM +0200, Andrei Rybak wrote:

> Existing checks for scissors characters using memcmp(3) never read past
> the end of the line, because all substrings we are interested in are two
> characters long, and the outer loop guarantees we have at least one
> character.  So at most we will look at the NUL.
> 
> However, this is too subtle and may lead to bugs in code which copies
> this behavior without realizing substring length requirement.  So use
> starts_with() instead, which will stop at NUL regardless of the length
> of the prefix.  Remove extra pair of parentheses while we are here.
> 
> Helped-by: Jeff King <peff@peff.net>
> Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
> ---
>  mailinfo.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> This patch was originally part of:
> https://public-inbox.org/git/20190401215334.18678-1-rybak.a.v@gmail.com/
> I've finally gotten around to sending it on its own.

Well, after seeing the date I don't feel too bad for not remembering it. :)

I think this is an improvement in safety and readability, and worth
applying. I'd also be fine going further and using skip_prefix(), which
would let us drop the magic-number "2", though:

  - as Junio said in that earlier thread, we hope people aren't
    encouraged to add more versions of scissors here anyway).

  - I am not 100% sure that the increment of "c" in the conditional
    is not also relying on that "2", as well (i.e., it is incrementing
    one, and then the for-loop increments one). There's a non-trivial
    risk of introducing off-by-one errors or even more subtlety here. :)

So I'm OK leaving that. Thanks for resurrecting this cleanup.

-Peff

Junio C Hamano June 9, 2021, 2:22 a.m. UTC | #2

Jeff King <peff@peff.net> writes:

> I think this is an improvement in safety and readability, and worth
> applying. I'd also be fine going further and using skip_prefix(), which
> would let us drop the magic-number "2", though:
>
>   - as Junio said in that earlier thread, we hope people aren't
>     encouraged to add more versions of scissors here anyway).
>
>   - I am not 100% sure that the increment of "c" in the conditional
>     is not also relying on that "2", as well (i.e., it is incrementing
>     one, and then the for-loop increments one). There's a non-trivial
>     risk of introducing off-by-one errors or even more subtlety here. :)
>
> So I'm OK leaving that. Thanks for resurrecting this cleanup.

You are correct to point out that the increment of "c" relies on 2.
We increment by one here, and let the loop control to increment
another, to skip over these two bytes.

The posted patch _might_ make it easier to read, but I do not think
it improves safety at all.  At the point of doing memcmp(), we know
that *c is not NUL and because we are dealing with NUL-terminated
string, we know we can check c[1] (otherwise we wouldn't even be
able to see if c is pointing at the end of string), so we check c[0]
and c[1] against four variants of two-byte scissors patterns.  The
current code uses memcmp() of 2 bytes, which is perfectly safe under
the condition, and starts_with() would also be equally safe.

If we were to teach new scissors sequence that is longer than two
bytes, then starts_with() would start becoming safer, but that will
not happen, so...

Jeff King June 9, 2021, 3:59 a.m. UTC | #3

On Wed, Jun 09, 2021 at 11:22:10AM +0900, Junio C Hamano wrote:

> The posted patch _might_ make it easier to read, but I do not think
> it improves safety at all.  At the point of doing memcmp(), we know
> that *c is not NUL and because we are dealing with NUL-terminated
> string, we know we can check c[1] (otherwise we wouldn't even be
> able to see if c is pointing at the end of string), so we check c[0]
> and c[1] against four variants of two-byte scissors patterns.  The
> current code uses memcmp() of 2 bytes, which is perfectly safe under
> the condition, and starts_with() would also be equally safe.
> 
> If we were to teach new scissors sequence that is longer than two
> bytes, then starts_with() would start becoming safer, but that will
> not happen, so...

Right, I agree the current code is safe. The main value would be trying
to make the correctness of the code more apparent. Perhaps a comment
would be better there, like:

diff --git a/mailinfo.c b/mailinfo.c
index ccc6beb27e..25b606db28 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -705,12 +705,17 @@ static int is_scissors_line(const char *line)
 			perforation++;
 			continue;
 		}
+		/*
+		 * passing 2 bytes to memcmp is safe here; we have at least
+		 * one non-NUL character from the loop condition, plus a
+		 * terminating NUL
+		 */
 		if ((!memcmp(c, ">8", 2) || !memcmp(c, "8<", 2) ||
 		     !memcmp(c, ">%", 2) || !memcmp(c, "%<", 2))) {
 			in_perforation = 1;
 			perforation += 2;
 			scissors += 2;
-			c++;
+			c++; /* only 1 here to account for loop increment */
 			continue;
 		}
 		in_perforation = 0;

Though since this is code we'd not plan to modify, and presumably
anybody touching it would have to fully grok the loop anyway, it might
not be that important.

I dunno. I offer it as an alternative (and am happy to add a commit
message). But I'm fine with leaving it as-is, too.

-Peff

Junio C Hamano June 9, 2021, 5:11 a.m. UTC | #4

Jeff King <peff@peff.net> writes:

> I dunno. I offer it as an alternative (and am happy to add a commit
> message). But I'm fine with leaving it as-is, too.

I am fine with Andrei's version as posted, which I've queued.

Thanks, both.

diff --git a/mailinfo.c b/mailinfo.c
index ccc6beb27e..8013e5c566 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -705,8 +705,8 @@  static int is_scissors_line(const char *line)
 			perforation++;
 			continue;
 		}
-		if ((!memcmp(c, ">8", 2) || !memcmp(c, "8<", 2) ||
-		     !memcmp(c, ">%", 2) || !memcmp(c, "%<", 2))) {
+		if (starts_with(c, ">8") || starts_with(c, "8<") ||
+		    starts_with(c, ">%") || starts_with(c, "%<")) {
 			in_perforation = 1;
 			perforation += 2;
 			scissors += 2;

mailinfo: use starts_with() when checking scissors

Commit Message

Comments

Patch