Message ID | 20210608204841.2793591-1-rybak.a.v@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 4184cbd635140e83c5f0d57c377eec93a9b6eedf |
Headers | show |
Series | mailinfo: use starts_with() when checking scissors | expand |
On Tue, Jun 08, 2021 at 10:48:41PM +0200, Andrei Rybak wrote: > Existing checks for scissors characters using memcmp(3) never read past > the end of the line, because all substrings we are interested in are two > characters long, and the outer loop guarantees we have at least one > character. So at most we will look at the NUL. > > However, this is too subtle and may lead to bugs in code which copies > this behavior without realizing substring length requirement. So use > starts_with() instead, which will stop at NUL regardless of the length > of the prefix. Remove extra pair of parentheses while we are here. > > Helped-by: Jeff King <peff@peff.net> > Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> > --- > mailinfo.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > This patch was originally part of: > https://public-inbox.org/git/20190401215334.18678-1-rybak.a.v@gmail.com/ > I've finally gotten around to sending it on its own. Well, after seeing the date I don't feel too bad for not remembering it. :) I think this is an improvement in safety and readability, and worth applying. I'd also be fine going further and using skip_prefix(), which would let us drop the magic-number "2", though: - as Junio said in that earlier thread, we hope people aren't encouraged to add more versions of scissors here anyway). - I am not 100% sure that the increment of "c" in the conditional is not also relying on that "2", as well (i.e., it is incrementing one, and then the for-loop increments one). There's a non-trivial risk of introducing off-by-one errors or even more subtlety here. :) So I'm OK leaving that. Thanks for resurrecting this cleanup. -Peff
Jeff King <peff@peff.net> writes: > I think this is an improvement in safety and readability, and worth > applying. I'd also be fine going further and using skip_prefix(), which > would let us drop the magic-number "2", though: > > - as Junio said in that earlier thread, we hope people aren't > encouraged to add more versions of scissors here anyway). > > - I am not 100% sure that the increment of "c" in the conditional > is not also relying on that "2", as well (i.e., it is incrementing > one, and then the for-loop increments one). There's a non-trivial > risk of introducing off-by-one errors or even more subtlety here. :) > > So I'm OK leaving that. Thanks for resurrecting this cleanup. You are correct to point out that the increment of "c" relies on 2. We increment by one here, and let the loop control to increment another, to skip over these two bytes. The posted patch _might_ make it easier to read, but I do not think it improves safety at all. At the point of doing memcmp(), we know that *c is not NUL and because we are dealing with NUL-terminated string, we know we can check c[1] (otherwise we wouldn't even be able to see if c is pointing at the end of string), so we check c[0] and c[1] against four variants of two-byte scissors patterns. The current code uses memcmp() of 2 bytes, which is perfectly safe under the condition, and starts_with() would also be equally safe. If we were to teach new scissors sequence that is longer than two bytes, then starts_with() would start becoming safer, but that will not happen, so...
On Wed, Jun 09, 2021 at 11:22:10AM +0900, Junio C Hamano wrote: > The posted patch _might_ make it easier to read, but I do not think > it improves safety at all. At the point of doing memcmp(), we know > that *c is not NUL and because we are dealing with NUL-terminated > string, we know we can check c[1] (otherwise we wouldn't even be > able to see if c is pointing at the end of string), so we check c[0] > and c[1] against four variants of two-byte scissors patterns. The > current code uses memcmp() of 2 bytes, which is perfectly safe under > the condition, and starts_with() would also be equally safe. > > If we were to teach new scissors sequence that is longer than two > bytes, then starts_with() would start becoming safer, but that will > not happen, so... Right, I agree the current code is safe. The main value would be trying to make the correctness of the code more apparent. Perhaps a comment would be better there, like: diff --git a/mailinfo.c b/mailinfo.c index ccc6beb27e..25b606db28 100644 --- a/mailinfo.c +++ b/mailinfo.c @@ -705,12 +705,17 @@ static int is_scissors_line(const char *line) perforation++; continue; } + /* + * passing 2 bytes to memcmp is safe here; we have at least + * one non-NUL character from the loop condition, plus a + * terminating NUL + */ if ((!memcmp(c, ">8", 2) || !memcmp(c, "8<", 2) || !memcmp(c, ">%", 2) || !memcmp(c, "%<", 2))) { in_perforation = 1; perforation += 2; scissors += 2; - c++; + c++; /* only 1 here to account for loop increment */ continue; } in_perforation = 0; Though since this is code we'd not plan to modify, and presumably anybody touching it would have to fully grok the loop anyway, it might not be that important. I dunno. I offer it as an alternative (and am happy to add a commit message). But I'm fine with leaving it as-is, too. -Peff
Jeff King <peff@peff.net> writes: > I dunno. I offer it as an alternative (and am happy to add a commit > message). But I'm fine with leaving it as-is, too. I am fine with Andrei's version as posted, which I've queued. Thanks, both.
diff --git a/mailinfo.c b/mailinfo.c index ccc6beb27e..8013e5c566 100644 --- a/mailinfo.c +++ b/mailinfo.c @@ -705,8 +705,8 @@ static int is_scissors_line(const char *line) perforation++; continue; } - if ((!memcmp(c, ">8", 2) || !memcmp(c, "8<", 2) || - !memcmp(c, ">%", 2) || !memcmp(c, "%<", 2))) { + if (starts_with(c, ">8") || starts_with(c, "8<") || + starts_with(c, ">%") || starts_with(c, "%<")) { in_perforation = 1; perforation += 2; scissors += 2;
Existing checks for scissors characters using memcmp(3) never read past the end of the line, because all substrings we are interested in are two characters long, and the outer loop guarantees we have at least one character. So at most we will look at the NUL. However, this is too subtle and may lead to bugs in code which copies this behavior without realizing substring length requirement. So use starts_with() instead, which will stop at NUL regardless of the length of the prefix. Remove extra pair of parentheses while we are here. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> --- mailinfo.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) This patch was originally part of: https://public-inbox.org/git/20190401215334.18678-1-rybak.a.v@gmail.com/ I've finally gotten around to sending it on its own.