diff mbox series

date.c: allow ISO 8601 reduced precision times

Message ID 20221216033638.2582956-1-phil.hord@gmail.com (mailing list archive)
State New, archived
Headers show
Series date.c: allow ISO 8601 reduced precision times | expand

Commit Message

Phil Hord Dec. 16, 2022, 3:36 a.m. UTC
From: Phil Hord <phil.hord@gmail.com>

ISO 8601 permits "reduced precision" time representations to omit the
seconds value or both the minutes and the seconds values.  The
abbreviate times could look like 17:45 or 1745 to omit the seconds,
or simply as 17 to omit both the minutes and the seconds.

parse_date_basic accepts the 17:45 format but it rejects the other two.
Fix it to accept 4-digit and 2-digit time values when they follow a
recognized date but no time has yet been parsed.

Add tests for these formats and some others, with and without colons.

Before this change:

$ test-tool date approxidate 2022-12-13T23:00 2022-12-13T2300 2022-12-13T23
2022-12-13T23:00 -> 2022-12-14 07:00:00 +0000
2022-12-13T2300 -> 2022-12-14 03:00:55 +0000
2022-12-13T23 -> 2022-12-14 03:00:55 +0000

After this change:

$ test-tool date approxidate 2022-12-13T23:00 2022-12-13T2300 2022-12-13T23
2022-12-13T23:00 -> 2022-12-14 07:00:00 +0000
2022-12-13T2300 -> 2022-12-14 07:00:00 +0000
2022-12-13T23 -> 2022-12-14 07:00:00 +0000

Note: ISO 8601 also allows reduced precision date strings such as
"2022-12" and "2022". This patch does not attempt to address these.

Reported-by: Pat LaVarre <plavarre@purestorage.com>
Signed-off-by: Phil Hord <phil.hord@gmail.com>
---
 date.c          | 22 +++++++++++++++++++++-
 t/t0006-date.sh |  6 ++++++
 2 files changed, 27 insertions(+), 1 deletion(-)

Comments

Junio C Hamano Dec. 16, 2022, 4:23 a.m. UTC | #1
Phil Hord <phil.hord@gmail.com> writes:

> From: Phil Hord <phil.hord@gmail.com>
>
> ISO 8601 permits "reduced precision" time representations to omit the
> seconds value or both the minutes and the seconds values.  The
> abbreviate times could look like 17:45 or 1745 to omit the seconds,
> or simply as 17 to omit both the minutes and the seconds.
>
> parse_date_basic accepts the 17:45 format but it rejects the other two.
> Fix it to accept 4-digit and 2-digit time values when they follow a
> recognized date but no time has yet been parsed.

I worry a bit that this may conflict with other approxidate
heuristics.

> $ test-tool date approxidate 2022-12-13T23:00 2022-12-13T2300 2022-12-13T23
> 2022-12-13T23:00 -> 2022-12-14 07:00:00 +0000
> 2022-12-13T2300 -> 2022-12-14 07:00:00 +0000
> 2022-12-13T23 -> 2022-12-14 07:00:00 +0000

All of these may be obvious improvements, but the thing is that
there is nothing in the approxidate parsing code that insists on the
presence of "T" to loosen the rule only for ISO-8601 case.

For example, with only 6 digits, do we still recognise our internal
timestamp format (i.e. seconds since epoch) without the
disambiguating '@' prefix?
Phil Hord Dec. 16, 2022, 6:38 p.m. UTC | #2
On Thu, Dec 15, 2022 at 8:23 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Phil Hord <phil.hord@gmail.com> writes:
>
> > From: Phil Hord <phil.hord@gmail.com>
> >
> > ISO 8601 permits "reduced precision" time representations to omit the
> > seconds value or both the minutes and the seconds values.  The
> > abbreviate times could look like 17:45 or 1745 to omit the seconds,
> > or simply as 17 to omit both the minutes and the seconds.
> >
> > parse_date_basic accepts the 17:45 format but it rejects the other two.
> > Fix it to accept 4-digit and 2-digit time values when they follow a
> > recognized date but no time has yet been parsed.
>
> I worry a bit that this may conflict with other approxidate
> heuristics.

I share your concern. I tried to make the ISO matching code very
specific to the case where we have already matched a date, we have not
yet matched a time value, and we see a lone 2-digit or 4-digit number
show up.

> > $ test-tool date approxidate 2022-12-13T23:00 2022-12-13T2300 2022-12-13T23
> > 2022-12-13T23:00 -> 2022-12-14 07:00:00 +0000
> > 2022-12-13T2300 -> 2022-12-14 07:00:00 +0000
> > 2022-12-13T23 -> 2022-12-14 07:00:00 +0000
>
> All of these may be obvious improvements, but the thing is that
> there is nothing in the approxidate parsing code that insists on the
> presence of "T" to loosen the rule only for ISO-8601 case.

I considered making the T an explicit marker, but it didn't seem
necessary here. But the looseness of approxidate with regard to spaces
is worrisome. That's why I added the date/no-time constraints.

> For example, with only 6 digits, do we still recognise our internal
> timestamp format (i.e. seconds since epoch) without the
> disambiguating '@' prefix?

I don't grok your example.  This change should not affect the
interpretation of any 6-digit number.

Oh, do you mean if there was _no_ delimiter before the time field?
Like 2022-12-132300?  My change will not recognize this format, and I
believe it was explicitly rejected by ISO-8601-1:2019.

approxidate seems not to recognize fewer than 9 digits as an epoch
number, even with the @ prefix.  But this is not because of my change.

test-tool date approxidate 123456789 12345678
123456789 -> 1973-11-29 21:33:09 +0000
12345678 -> 2022-12-16 18:34:02 +0000

test-tool date approxidate @123456789 @12345678
@123456789 -> 1973-11-29 21:33:09 +0000
@12345678 -> 2022-12-16 18:36:35 +0000

test-tool date parse 123456789 12345678
123456789 -> 1973-11-29 13:33:09 -0800
12345678 -> bad

test-tool date parse @123456789 @12345678
@123456789 -> 1973-11-29 13:33:09 -0800
@12345678 -> bad
Phil Hord Jan. 9, 2023, 6:41 a.m. UTC | #3
> On Thu, Dec 15, 2022 at 8:23 PM Junio C Hamano <gitster@pobox.com> wrote:
> > All of these may be obvious improvements, but the thing is that
> > there is nothing in the approxidate parsing code that insists on the
> > presence of "T" to loosen the rule only for ISO-8601 case.
>
> I considered making the T an explicit marker, but it didn't seem
> necessary here. But the looseness of approxidate with regard to spaces
> is worrisome. That's why I added the date/no-time constraints.
>
> > For example, with only 6 digits, do we still recognise our internal
> > timestamp format (i.e. seconds since epoch) without the
> > disambiguating '@' prefix?
>
> I don't grok your example.  This change should not affect the
> interpretation of any 6-digit number.
>
> Oh, do you mean if there was _no_ delimiter before the time field?
> Like 2022-12-132300?  My change will not recognize this format, and I
> believe it was explicitly rejected by ISO-8601-1:2019.
>
> approxidate seems not to recognize fewer than 9 digits as an epoch
> number, even with the @ prefix.  But this is not because of my change.
>
> test-tool date approxidate 123456789 12345678
> 123456789 -> 1973-11-29 21:33:09 +0000
> 12345678 -> 2022-12-16 18:34:02 +0000
>
> test-tool date approxidate @123456789 @12345678
> @123456789 -> 1973-11-29 21:33:09 +0000
> @12345678 -> 2022-12-16 18:36:35 +0000
>
> test-tool date parse 123456789 12345678
> 123456789 -> 1973-11-29 13:33:09 -0800
> 12345678 -> bad
>
> test-tool date parse @123456789 @12345678
> @123456789 -> 1973-11-29 13:33:09 -0800
> @12345678 -> bad


Do you have any suggestions about how I can better alleviate your
concerns?  I don't think there are real regressions here and I tried
to explain why.
Junio C Hamano Jan. 9, 2023, 8:48 a.m. UTC | #4
Phil Hord <phil.hord@gmail.com> writes:

> Do you have any suggestions about how I can better alleviate your
> concerns?  I don't think there are real regressions here and I tried
> to explain why.

Other than "including it in a released version and waiting for
people to scream", I do not think there is.  The "next" branch was
meant to be a test ground for these new features by letting
volunteer users to use it in their everyday development, and the
hope was that we can catch regressions by cooking risky topics
longer than usual in there, but we haven't been very successful, I
have to say.

Thanks.  Let's queue it and see what happens.
Junio C Hamano Jan. 9, 2023, 9:16 a.m. UTC | #5
Junio C Hamano <gitster@pobox.com> writes:

> Phil Hord <phil.hord@gmail.com> writes:
>
>> Do you have any suggestions about how I can better alleviate your
>> concerns?  I don't think there are real regressions here and I tried
>> to explain why.
>
> Other than "including it in a released version and waiting for
> people to scream", I do not think there is.  The "next" branch was
> meant to be a test ground for these new features by letting
> volunteer users to use it in their everyday development, and the
> hope was that we can catch regressions by cooking risky topics
> longer than usual in there, but we haven't been very successful, I
> have to say.
>
> Thanks.  Let's queue it and see what happens.

Actually, let's not queue it as-is, because it seems to break many
tests for me.  I won't have time to take further look myself before
later in the week when I come back online again, though.

Test Summary Report
-------------------
t4255-am-submodule.sh                            (Wstat: 256 (exited 1) Tests: 33 Failed: 22)
  Failed tests:  1-6, 11-13, 15-20, 25-27, 30-33
  Non-zero exit status: 1
t4150-am.sh                                      (Wstat: 256 (exited 1) Tests: 87 Failed: 62)
  Failed tests:  3, 5-7, 11-13, 15-16, 18-22, 24-25, 27-32
                34-36, 38-46, 48, 50-52, 54, 57-61, 63-65
                67-77, 82-85
  Non-zero exit status: 1
t4014-format-patch.sh                            (Wstat: 256 (exited 1) Tests: 193 Failed: 20)
  Failed tests:  6-7, 10, 141-157
  Non-zero exit status: 1
t7512-status-help.sh                             (Wstat: 256 (exited 1) Tests: 44 Failed: 1)
  Failed test:  26
  Non-zero exit status: 1
t3901-i18n-patch.sh                              (Wstat: 256 (exited 1) Tests: 20 Failed: 5)
  Failed tests:  16-20
  Non-zero exit status: 1
t4151-am-abort.sh                                (Wstat: 256 (exited 1) Tests: 20 Failed: 7)
  Failed tests:  2-3, 5-6, 15, 19-20
  Non-zero exit status: 1
t4153-am-resume-override-opts.sh                 (Wstat: 256 (exited 1) Tests: 5 Failed: 2)
  Failed tests:  3-4
  Non-zero exit status: 1
t5607-clone-bundle.sh                            (Wstat: 256 (exited 1) Tests: 14 Failed: 1)
  Failed test:  3
  Non-zero exit status: 1
t4152-am-subjects.sh                             (Wstat: 256 (exited 1) Tests: 13 Failed: 9)
  Failed tests:  5-13
  Non-zero exit status: 1
t4253-am-keep-cr-dos.sh                          (Wstat: 256 (exited 1) Tests: 7 Failed: 4)
  Failed tests:  3-4, 6-7
  Non-zero exit status: 1
t4257-am-interactive.sh                          (Wstat: 256 (exited 1) Tests: 4 Failed: 3)
  Failed tests:  2-4
  Non-zero exit status: 1
t4258-am-quoted-cr.sh                            (Wstat: 256 (exited 1) Tests: 4 Failed: 2)
  Failed tests:  3-4
  Non-zero exit status: 1
t4254-am-corrupt.sh                              (Wstat: 256 (exited 1) Tests: 4 Failed: 1)
  Failed test:  3
  Non-zero exit status: 1
t0023-crlf-am.sh                                 (Wstat: 256 (exited 1) Tests: 2 Failed: 1)
  Failed test:  2
  Non-zero exit status: 1
t4256-am-format-flowed.sh                        (Wstat: 256 (exited 1) Tests: 2 Failed: 1)
  Failed test:  2
  Non-zero exit status: 1
Files=987, Tests=28346, 137 wallclock secs (12.84 usr  4.19 sys + 799.53 cusr 1017.07 csys = 1833.63 CPU)
Result: FAIL
gmake[1]: *** [Makefile:62: prove] Error 1
gmake[1]: Leaving directory '/home/gitster/w/buildfarm/seen/t'
gmake: *** [Makefile:3196: test] Error 2
rmdir: failed to remove '/dev/shm/testpen.2086794': Directory not empty
Phil Hord Jan. 9, 2023, 6:30 p.m. UTC | #6
On Mon, Jan 9, 2023 at 1:16 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Phil Hord <phil.hord@gmail.com> writes:
> >
> >> Do you have any suggestions about how I can better alleviate your
> >> concerns?  I don't think there are real regressions here and I tried
> >> to explain why.
> >
> > Other than "including it in a released version and waiting for
> > people to scream", I do not think there is.  The "next" branch was
> > meant to be a test ground for these new features by letting
> > volunteer users to use it in their everyday development, and the
> > hope was that we can catch regressions by cooking risky topics
> > longer than usual in there, but we haven't been very successful, I
> > have to say.
> >
> > Thanks.  Let's queue it and see what happens.
>
> Actually, let's not queue it as-is, because it seems to break many
> tests for me.  I won't have time to take further look myself before
> later in the week when I come back online again, though.

Oh, wow. For me as well.  I thought I ran all the tests before
finishing up, but I guess I was too focused on the single test module.
I apologize for the oversight.
diff mbox series

Patch

diff --git a/date.c b/date.c
index 53bd6a7932..b011b9d6b3 100644
--- a/date.c
+++ b/date.c
@@ -638,6 +638,18 @@  static inline int nodate(struct tm *tm)
 		tm->tm_sec) < 0;
 }
 
+/*
+ * Have we filled in any part of the time yet?
+ * We just do a binary 'and' to see if the sign bit
+ * is set in all the values.
+ */
+static inline int notime(struct tm *tm)
+{
+	return (tm->tm_hour &
+		tm->tm_min &
+		tm->tm_sec) < 0;
+}
+
 /*
  * We've seen a digit. Time? Year? Date?
  */
@@ -689,7 +701,11 @@  static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt
 
 	/* 8 digits, compact style of ISO-8601's date: YYYYmmDD */
 	/* 6 digits, compact style of ISO-8601's time: HHMMSS */
-	if (n == 8 || n == 6) {
+	/* 4 digits, compact style of ISO-8601's time: HHMM */
+	/* 2 digits, compact style of ISO-8601's time: HH */
+	if (n == 8 || n == 6 ||
+		(!nodate(tm) && notime(tm) &&
+		(n == 4 || n == 2))) {
 		unsigned int num1 = num / 10000;
 		unsigned int num2 = (num % 10000) / 100;
 		unsigned int num3 = num % 100;
@@ -698,6 +714,10 @@  static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt
 		else if (n == 6 && set_time(num1, num2, num3, tm) == 0 &&
 			 *end == '.' && isdigit(end[1]))
 			strtoul(end + 1, &end, 10);
+		else if (n == 4)
+			set_time(num2, num3, 0, tm);
+		else if (n == 2)
+			set_time(num3, 0, 0, tm);
 		return end - date;
 	}
 
diff --git a/t/t0006-date.sh b/t/t0006-date.sh
index 2490162071..16fb0bf4bd 100755
--- a/t/t0006-date.sh
+++ b/t/t0006-date.sh
@@ -88,6 +88,12 @@  check_parse 2008-02-14 bad
 check_parse '2008-02-14 20:30:45' '2008-02-14 20:30:45 +0000'
 check_parse '2008-02-14 20:30:45 -0500' '2008-02-14 20:30:45 -0500'
 check_parse '2008.02.14 20:30:45 -0500' '2008-02-14 20:30:45 -0500'
+check_parse '20080214T20:30:45' '2008-02-14 20:30:45 +0000'
+check_parse '20080214T20:30' '2008-02-14 20:30:00 +0000'
+check_parse '20080214T20' '2008-02-14 20:00:00 +0000'
+check_parse '20080214T203045' '2008-02-14 20:30:45 +0000'
+check_parse '20080214T2030' '2008-02-14 20:30:00 +0000'
+check_parse '20080214T20' '2008-02-14 20:00:00 +0000'
 check_parse '20080214T203045-04:00' '2008-02-14 20:30:45 -0400'
 check_parse '20080214T203045 -04:00' '2008-02-14 20:30:45 -0400'
 check_parse '20080214T203045.019-04:00' '2008-02-14 20:30:45 -0400'