diff mbox series

revision: parse integer arguments to --max-count, --skip, etc., more carefully

Message ID xmqq5y181fx0.fsf_-_@gitster.g (mailing list archive)
State New, archived
Headers show
Series revision: parse integer arguments to --max-count, --skip, etc., more carefully | expand

Commit Message

Junio C Hamano Dec. 8, 2023, 10:35 p.m. UTC
The "rev-list" and other commands in the "log" family, being the
oldest part of the system, use their own custom argument parsers,
and integer values of some options are parsed with atoi(), which
allows a non-digit after the number (e.g., "1q") to be silently
ignored.  As a natural consequence, an argument that does not begin
with a digit (e.g., "q") silently becomes zero, too.

Switch to use strtol_i() and parse_timestamp() appropriately to
catch bogus input.

Note that one may naïvely expect that --max-count, --skip, etc., to
only take non-negative values, but we must allow them to also take
negative values, as an escape hatch to countermand a limit set by an
earlier option on the command line; the underlying variables are
initialized to (-1) and "--max-count=-1", for example, is a
legitimate way to reinitialize the limit.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 revision.c                 | 41 ++++++++++++++++++++++++++++----------
 t/t6005-rev-list-count.sh  | 15 +++++++++++++-
 t/t6009-rev-list-parent.sh | 11 ++++++++++
 3 files changed, 55 insertions(+), 12 deletions(-)

Comments

Jeff King Dec. 12, 2023, 1:30 a.m. UTC | #1
On Sat, Dec 09, 2023 at 07:35:23AM +0900, Junio C Hamano wrote:

> The "rev-list" and other commands in the "log" family, being the
> oldest part of the system, use their own custom argument parsers,
> and integer values of some options are parsed with atoi(), which
> allows a non-digit after the number (e.g., "1q") to be silently
> ignored.  As a natural consequence, an argument that does not begin
> with a digit (e.g., "q") silently becomes zero, too.
> 
> Switch to use strtol_i() and parse_timestamp() appropriately to
> catch bogus input.
> 
> Note that one may naïvely expect that --max-count, --skip, etc., to
> only take non-negative values, but we must allow them to also take
> negative values, as an escape hatch to countermand a limit set by an
> earlier option on the command line; the underlying variables are
> initialized to (-1) and "--max-count=-1", for example, is a
> legitimate way to reinitialize the limit.

This all looks pretty reasonable to me.

I couldn't help but think, though, that surely we have some helpers for
this already? But the closest seems to be git_parse_int(), which also
allows unit factors. I'm not sure if allowing "-n 1k" would be a feature
or a bug. ;)

I guess "strtol_i()" maybe is that helper already, though I did not even
know it existed. Looks like it goes back to 2007, and is seldom used. I
wonder if there are more spots that could benefit.

I don't think there is any such helper for timestamps, but the checks in
your parser look good (strtol_i() checks for overflow as we cast to int,
but I don't think we need to do the same here since timestamp_t and
parse_timestamp() should be matched).

-Peff
Junio C Hamano Dec. 12, 2023, 3:09 p.m. UTC | #2
Jeff King <peff@peff.net> writes:

> This all looks pretty reasonable to me.
>
> I couldn't help but think, though, that surely we have some helpers for
> this already? But the closest seems to be git_parse_int(), which also
> allows unit factors. I'm not sure if allowing "-n 1k" would be a feature
> or a bug. ;)

The change in question is meant to be a pure fix to replace a careless
use of atoi().  I do not mind to see a separate patch to add such a
feature later on top.

> I guess "strtol_i()" maybe is that helper already, though I did not even
> know it existed. Looks like it goes back to 2007, and is seldom used.

I didn't know about it, either ;-) I used it only because there is
one use of it, among places that used atoi() I wanted to tighten.

> I wonder if there are more spots that could benefit.

"git grep -e 'atoi('" would give somebody motivated a decent set of
#microproject ideas, but many hits are not suited for strtol_i(),
which is designed to parse an integer at the end of a string.  Some
places use atoi() immediately followed by strspn() to skip over
digits, which means they are parsing an integer and want to continue
reading after the integer, which is incompatible with what
strtol_i() wants to do.  They need either a separate helper or an
updated strtol_i() that optionally allows you to parse the prefix
and report where the integer ended, e.g., something like:

#define strtol_i(s,b,r) strtol_i2((s), (b), (r), NULL)
static inline int strtol_i2(char const *s, int base, int *result, char **endp)
{
	long ul;
        char *dummy = NULL;

        if (!endp)
		endp = &dummy;
	errno = 0;
	ul = strtol(s, &endp, base);
	if (errno ||
	    /*
	     * if we are told to parse to the end of the string by
	     * passing NULL to endp, it is an error to have any
	     * remaining character after the digits.
	     */
            (dummy && *dummy) ||
	    endp == s || (int) ul != ul)
		return -1;
	*result = ul;
	return 0;
}

perhaps.
Jeff King Dec. 12, 2023, 10:05 p.m. UTC | #3
On Tue, Dec 12, 2023 at 07:09:02AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > This all looks pretty reasonable to me.
> >
> > I couldn't help but think, though, that surely we have some helpers for
> > this already? But the closest seems to be git_parse_int(), which also
> > allows unit factors. I'm not sure if allowing "-n 1k" would be a feature
> > or a bug. ;)
> 
> The change in question is meant to be a pure fix to replace a careless
> use of atoi().  I do not mind to see a separate patch to add such a
> feature later on top.

Oh, I mostly meant that I would have turned to git_parse_int() as that
already-existing helper, but it is not suitable because of the extra
unit-handling. I think your patch draws the line in the right place.

> > I wonder if there are more spots that could benefit.
> 
> "git grep -e 'atoi('" would give somebody motivated a decent set of
> #microproject ideas, but many hits are not suited for strtol_i(),
> which is designed to parse an integer at the end of a string.  Some
> places use atoi() immediately followed by strspn() to skip over
> digits, which means they are parsing an integer and want to continue
> reading after the integer, which is incompatible with what
> strtol_i() wants to do.  They need either a separate helper or an
> updated strtol_i() that optionally allows you to parse the prefix
> and report where the integer ended, e.g., something like:

Yeah, I agree this might be a good microproject (or leftoverbits) area,
and the semantics for the helper you propose make sense to me.

-Peff
diff mbox series

Patch

diff --git a/revision.c b/revision.c
index 0ae1c76db3..8cda6e43d4 100644
--- a/revision.c
+++ b/revision.c
@@ -2214,6 +2214,27 @@  static void add_message_grep(struct rev_info *revs, const char *pattern)
 	add_grep(revs, pattern, GREP_PATTERN_BODY);
 }
 
+static int parse_count(const char *arg)
+{
+	int count;
+
+	if (strtol_i(arg, 10, &count) < 0)
+		die("'%s': not an integer", arg);
+	return count;
+}
+
+static timestamp_t parse_age(const char *arg)
+{
+	timestamp_t num;
+	char *p;
+
+	errno = 0;
+	num = parse_timestamp(arg, &p, 10);
+	if (errno || *p || p == arg)
+		die("'%s': not a number of seconds since epoch", arg);
+	return num;
+}
+
 static int handle_revision_opt(struct rev_info *revs, int argc, const char **argv,
 			       int *unkc, const char **unkv,
 			       const struct setup_revision_opt* opt)
@@ -2240,29 +2261,27 @@  static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	}
 
 	if ((argcount = parse_long_opt("max-count", argv, &optarg))) {
-		revs->max_count = atoi(optarg);
+		revs->max_count = parse_count(optarg);
 		revs->no_walk = 0;
 		return argcount;
 	} else if ((argcount = parse_long_opt("skip", argv, &optarg))) {
-		revs->skip_count = atoi(optarg);
+		revs->skip_count = parse_count(optarg);
 		return argcount;
 	} else if ((*arg == '-') && isdigit(arg[1])) {
 		/* accept -<digit>, like traditional "head" */
-		if (strtol_i(arg + 1, 10, &revs->max_count) < 0 ||
-		    revs->max_count < 0)
-			die("'%s': not a non-negative integer", arg + 1);
+		revs->max_count = parse_count(arg + 1);
 		revs->no_walk = 0;
 	} else if (!strcmp(arg, "-n")) {
 		if (argc <= 1)
 			return error("-n requires an argument");
-		revs->max_count = atoi(argv[1]);
+		revs->max_count = parse_count(argv[1]);
 		revs->no_walk = 0;
 		return 2;
 	} else if (skip_prefix(arg, "-n", &optarg)) {
-		revs->max_count = atoi(optarg);
+		revs->max_count = parse_count(optarg);
 		revs->no_walk = 0;
 	} else if ((argcount = parse_long_opt("max-age", argv, &optarg))) {
-		revs->max_age = atoi(optarg);
+		revs->max_age = parse_age(optarg);
 		return argcount;
 	} else if ((argcount = parse_long_opt("since", argv, &optarg))) {
 		revs->max_age = approxidate(optarg);
@@ -2274,7 +2293,7 @@  static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->max_age = approxidate(optarg);
 		return argcount;
 	} else if ((argcount = parse_long_opt("min-age", argv, &optarg))) {
-		revs->min_age = atoi(optarg);
+		revs->min_age = parse_age(optarg);
 		return argcount;
 	} else if ((argcount = parse_long_opt("before", argv, &optarg))) {
 		revs->min_age = approxidate(optarg);
@@ -2362,11 +2381,11 @@  static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if (!strcmp(arg, "--no-merges")) {
 		revs->max_parents = 1;
 	} else if (skip_prefix(arg, "--min-parents=", &optarg)) {
-		revs->min_parents = atoi(optarg);
+		revs->min_parents = parse_count(optarg);
 	} else if (!strcmp(arg, "--no-min-parents")) {
 		revs->min_parents = 0;
 	} else if (skip_prefix(arg, "--max-parents=", &optarg)) {
-		revs->max_parents = atoi(optarg);
+		revs->max_parents = parse_count(optarg);
 	} else if (!strcmp(arg, "--no-max-parents")) {
 		revs->max_parents = -1;
 	} else if (!strcmp(arg, "--boundary")) {
diff --git a/t/t6005-rev-list-count.sh b/t/t6005-rev-list-count.sh
index 0729f800c3..62538f5a02 100755
--- a/t/t6005-rev-list-count.sh
+++ b/t/t6005-rev-list-count.sh
@@ -18,13 +18,23 @@  test_expect_success 'no options' '
 '
 
 test_expect_success '--max-count' '
+	test_must_fail git rev-list --max-count=1q HEAD 2>error &&
+	grep "not an integer" error &&
+
 	test_stdout_line_count = 0 git rev-list HEAD --max-count=0 &&
 	test_stdout_line_count = 3 git rev-list HEAD --max-count=3 &&
 	test_stdout_line_count = 5 git rev-list HEAD --max-count=5 &&
-	test_stdout_line_count = 5 git rev-list HEAD --max-count=10
+	test_stdout_line_count = 5 git rev-list HEAD --max-count=10 &&
+	test_stdout_line_count = 5 git rev-list HEAD --max-count=-1
 '
 
 test_expect_success '--max-count all forms' '
+	test_must_fail git rev-list -1q HEAD 2>error &&
+	grep "not an integer" error &&
+	test_must_fail git rev-list --1 HEAD &&
+	test_must_fail git rev-list -n 1q HEAD 2>error &&
+	grep "not an integer" error &&
+
 	test_stdout_line_count = 1 git rev-list HEAD --max-count=1 &&
 	test_stdout_line_count = 1 git rev-list HEAD -1 &&
 	test_stdout_line_count = 1 git rev-list HEAD -n1 &&
@@ -32,6 +42,9 @@  test_expect_success '--max-count all forms' '
 '
 
 test_expect_success '--skip' '
+	test_must_fail git rev-list --skip 1q HEAD 2>error &&
+	grep "not an integer" error &&
+
 	test_stdout_line_count = 5 git rev-list HEAD --skip=0 &&
 	test_stdout_line_count = 2 git rev-list HEAD --skip=3 &&
 	test_stdout_line_count = 0 git rev-list HEAD --skip=5 &&
diff --git a/t/t6009-rev-list-parent.sh b/t/t6009-rev-list-parent.sh
index 5a67bbc760..9c9a8459af 100755
--- a/t/t6009-rev-list-parent.sh
+++ b/t/t6009-rev-list-parent.sh
@@ -62,6 +62,17 @@  test_expect_success 'setup roots, merges and octopuses' '
 	git checkout main
 '
 
+test_expect_success 'parse --max-parents & --min-parents' '
+	test_must_fail git rev-list --max-parents=1q HEAD 2>error &&
+	grep "not an integer" error &&
+
+	test_must_fail git rev-list --min-parents=1q HEAD 2>error &&
+	grep "not an integer" error &&
+
+	git rev-list --max-parents=1 --min-parents=1 HEAD &&
+	git rev-list --max-parents=-1 --min-parents=-1 HEAD
+'
+
 test_expect_success 'rev-list roots' '
 
 	check_revlist "--max-parents=0" one five