diff mbox series

fuzz: add new oss-fuzz fuzzer for date.c / date.h

Message ID pull.1612.git.1699724379458.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series fuzz: add new oss-fuzz fuzzer for date.c / date.h | expand

Commit Message

Arthur Chan Nov. 11, 2023, 5:39 p.m. UTC
From: Arthur Chan <arthur.chan@adalogics.com>

Signed-off-by: Arthur Chan <arthur.chan@adalogics.com>
---
    fuzz: add new oss-fuzz fuzzer for date.c / date.h
    
    This patch is aimed to add a new oss-fuzz fuzzer to the oss-fuzz
    directory for fuzzing date.c / date.h in the base directory.
    
    The .gitignore of the oss-fuzz directory and the Makefile have been
    modified to accommodate the new fuzzer fuzz-date.c.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1612%2Farthurscchan%2Fnew-fuzzer-date-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1612/arthurscchan/new-fuzzer-date-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1612

 Makefile             |  1 +
 oss-fuzz/.gitignore  |  1 +
 oss-fuzz/fuzz-date.c | 75 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 77 insertions(+)
 create mode 100644 oss-fuzz/fuzz-date.c


base-commit: dadef801b365989099a9929e995589e455c51fed

Comments

Junio C Hamano Nov. 12, 2023, 5:59 a.m. UTC | #1
"Arthur Chan via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/Makefile b/Makefile
> index 03adcb5a480..c9fe99a8c88 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -752,6 +752,7 @@ ETAGS_TARGET = TAGS
>  FUZZ_OBJS += oss-fuzz/fuzz-commit-graph.o
>  FUZZ_OBJS += oss-fuzz/fuzz-pack-headers.o
>  FUZZ_OBJS += oss-fuzz/fuzz-pack-idx.o
> +FUZZ_OBJS += oss-fuzz/fuzz-date.o

The same comment applies to .gitignore but I think the existing
entries are sorted and fuzz-date should be added between
fuzz-commit-graph and fuzz-pack-headers.

> diff --git a/oss-fuzz/fuzz-date.c b/oss-fuzz/fuzz-date.c
> new file mode 100644
> index 00000000000..29bcaf595e4
> --- /dev/null
> +++ b/oss-fuzz/fuzz-date.c
> @@ -0,0 +1,75 @@
> +#include "git-compat-util.h"
> +#include "date.h"
> +
> +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);
> +
> +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
> +{
> +	int type;
> +	int time;
> +	int num;
> +	char *str;
> +	timestamp_t ts;
> +	enum date_mode_type dmtype;
> +	struct date_mode *dm;
> +
> +	if (size <= 8)
> +	{
> +		return 0;
> +	}

How much do we care about sticking to our coding style for source
files in this directory?  If we do (and I do not see a strong reason
not to), let's lose the {unneeded braces} around a single statement
block.

> +	type = (*((int *)data)) % 8;
> +	data += 4;
> +	size -= 4;

I'd prefer to avoid these hardcoded and unexplained constants.  I
think "8" relates to the number of case arms below?  I am not sure
if "4" is justifiable without "we assume everybody's int is four
bytes long", but if that is what is going on, perhaps use uint32_t
or something?

Also, is data[] guaranteed to be aligned well to allow us to do the
above?  As we only need to spread to DATE_UNIX-1 types (because we
exclude DATE_STRFTIME), it is sufficient to look at the lower nibble
of a single byte.  The upper nibble could be used to fuzz the .local
bit if you wanted to, e.g. so I wonder

	local_bit = !!(*data & 0x10);
	dmtype = (enum date_mode_type)(*data % DATE_UNIX);
	if (dmtype == DATE_STRFTIME)
		return 0;
	data++;
	size--;

> +	time = abs(*((int *)data));
> +	data += 4;
> +	size -= 4;

Ditto.  Rename "time" because the second parameter to show_date() is
*not* "time" but "tz".  The valid range of "tz" comfortably fits in
16-bit signed int, but note that there are valid negative values in
the range.

Are we assuming that the "tz" is attacker controlled?  Why are you
limiting its value to non-negative, yet you are not rejecting absurd
timezone offsets?  Good values lie in a range much narrower than
between -2400 and 2400.  Subjecting "tz" to fuzzer is perfectly
fine, but then limiting its value to non-negative contradicts with
it, so I am not sure what your intention is.

As I used the first byte to fuzz dmtype and .local, let's use the
next three bytes to allow feeding overly wild timezone values to the
machinery and see what breaks, perhaps like so:

	tz = *data++; /* int tz; */
	tz = (tz << 8) | *data++;
	tz = (tz << 8) | *data++;
	size -= 3;

Now the upfront length check needs to reject any input shorter than
4 bytes, so do so with a comment accordingly, perhaps like

	if (size < 4)
		/*
                 * we use the first byte to fuzz dmtype and local,
                 * then the next three bytes to fuzz tz	offset,
                 * and the remainder is fed as end-user input to
		 * approxidate().
		 */
		return 0;

before everything I wrote so far.

> +	str = (char *)malloc(size+1);

	(char *)malloc(size + 1);

> +	if (!str)
> +	{
> +		return 0;
> +	}

Ditto on {unnecessary braces}.

> +	memcpy(str, data, size);
> +	str[size] = '\0';
> +
> +	ts = approxidate_careful(str, &num);
> +	free(str);
> +
> +	switch(type)
> +	{
> +		case 0: default:
> +			dmtype = DATE_NORMAL;
> +			break;

Style.  In our codebase, "switch" and "case" align at the same
column, and case arms are written one per line, i.e.,

	switch (type) {
	case 0:
	default:
		...

The way dmtype is handled in a switch() below tells me that you do
not consider it is a potential attack vector (e.g., an attacker
cannot force us to use dmtype==DATE_STRFTIME without the format and
cause us to die).  Am I reading your intention correctly?

If so, I'd just do the "use the lower nibble of the first byte" as I
shown earlier, and this large switch statement will go away.

> +		case 1:
> +			dmtype = DATE_HUMAN;
> +			break;
> +		case 2:
> +			dmtype = DATE_SHORT;
> +			break;
> +		case 3:
> +			dmtype = DATE_ISO8601;
> +			break;
> +		case 4:
> +			dmtype = DATE_ISO8601_STRICT;
> +			break;
> +		case 5:
> +			dmtype = DATE_RFC2822;
> +			break;
> +		case 6:
> +			dmtype = DATE_RAW;
> +			break;
> +		case 7:
> +			dmtype = DATE_UNIX;
> +			break;
> +	}
> +
> +	dm = date_mode_from_type(dmtype);
> +	dm->local = 1;

Don't we want to allow the incoming data to fuzz the local bit, too?

> +	show_date(ts, time, dm);
> +
> +	date_mode_release(dm);
> +
> +	return 0;
> +}
>
> base-commit: dadef801b365989099a9929e995589e455c51fed
Junio C Hamano Nov. 12, 2023, 12:39 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> As I used the first byte to fuzz dmtype and .local, let's use the
> next three bytes to allow feeding overly wild timezone values to the
> machinery and see what breaks, perhaps like so:
>
> 	tz = *data++; /* int tz; */
> 	tz = (tz << 8) | *data++;
> 	tz = (tz << 8) | *data++;
> 	size -= 3;

Just this part.  As data points at unsigned char, the above would
not give us any negative number.  We'd need to sign-extend the
24-bit resulting value if we are going to adopt the above approach.
diff mbox series

Patch

diff --git a/Makefile b/Makefile
index 03adcb5a480..c9fe99a8c88 100644
--- a/Makefile
+++ b/Makefile
@@ -752,6 +752,7 @@  ETAGS_TARGET = TAGS
 FUZZ_OBJS += oss-fuzz/fuzz-commit-graph.o
 FUZZ_OBJS += oss-fuzz/fuzz-pack-headers.o
 FUZZ_OBJS += oss-fuzz/fuzz-pack-idx.o
+FUZZ_OBJS += oss-fuzz/fuzz-date.o
 .PHONY: fuzz-objs
 fuzz-objs: $(FUZZ_OBJS)
 
diff --git a/oss-fuzz/.gitignore b/oss-fuzz/.gitignore
index 9acb74412ef..2375e77b108 100644
--- a/oss-fuzz/.gitignore
+++ b/oss-fuzz/.gitignore
@@ -1,3 +1,4 @@ 
 fuzz-commit-graph
 fuzz-pack-headers
 fuzz-pack-idx
+fuzz-date
diff --git a/oss-fuzz/fuzz-date.c b/oss-fuzz/fuzz-date.c
new file mode 100644
index 00000000000..29bcaf595e4
--- /dev/null
+++ b/oss-fuzz/fuzz-date.c
@@ -0,0 +1,75 @@ 
+#include "git-compat-util.h"
+#include "date.h"
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
+{
+	int type;
+	int time;
+	int num;
+	char *str;
+	timestamp_t ts;
+	enum date_mode_type dmtype;
+	struct date_mode *dm;
+
+	if (size <= 8)
+	{
+		return 0;
+	}
+
+	type = (*((int *)data)) % 8;
+	data += 4;
+	size -= 4;
+
+	time = abs(*((int *)data));
+	data += 4;
+	size -= 4;
+
+	str = (char *)malloc(size+1);
+	if (!str)
+	{
+		return 0;
+	}
+	memcpy(str, data, size);
+	str[size] = '\0';
+
+	ts = approxidate_careful(str, &num);
+	free(str);
+
+	switch(type)
+	{
+		case 0: default:
+			dmtype = DATE_NORMAL;
+			break;
+		case 1:
+			dmtype = DATE_HUMAN;
+			break;
+		case 2:
+			dmtype = DATE_SHORT;
+			break;
+		case 3:
+			dmtype = DATE_ISO8601;
+			break;
+		case 4:
+			dmtype = DATE_ISO8601_STRICT;
+			break;
+		case 5:
+			dmtype = DATE_RFC2822;
+			break;
+		case 6:
+			dmtype = DATE_RAW;
+			break;
+		case 7:
+			dmtype = DATE_UNIX;
+			break;
+	}
+
+	dm = date_mode_from_type(dmtype);
+	dm->local = 1;
+	show_date(ts, time, dm);
+
+	date_mode_release(dm);
+
+	return 0;
+}