Message ID | 20221216033638.2582956-1-phil.hord@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | date.c: allow ISO 8601 reduced precision times | expand |
Phil Hord <phil.hord@gmail.com> writes: > From: Phil Hord <phil.hord@gmail.com> > > ISO 8601 permits "reduced precision" time representations to omit the > seconds value or both the minutes and the seconds values. The > abbreviate times could look like 17:45 or 1745 to omit the seconds, > or simply as 17 to omit both the minutes and the seconds. > > parse_date_basic accepts the 17:45 format but it rejects the other two. > Fix it to accept 4-digit and 2-digit time values when they follow a > recognized date but no time has yet been parsed. I worry a bit that this may conflict with other approxidate heuristics. > $ test-tool date approxidate 2022-12-13T23:00 2022-12-13T2300 2022-12-13T23 > 2022-12-13T23:00 -> 2022-12-14 07:00:00 +0000 > 2022-12-13T2300 -> 2022-12-14 07:00:00 +0000 > 2022-12-13T23 -> 2022-12-14 07:00:00 +0000 All of these may be obvious improvements, but the thing is that there is nothing in the approxidate parsing code that insists on the presence of "T" to loosen the rule only for ISO-8601 case. For example, with only 6 digits, do we still recognise our internal timestamp format (i.e. seconds since epoch) without the disambiguating '@' prefix?
On Thu, Dec 15, 2022 at 8:23 PM Junio C Hamano <gitster@pobox.com> wrote: > > Phil Hord <phil.hord@gmail.com> writes: > > > From: Phil Hord <phil.hord@gmail.com> > > > > ISO 8601 permits "reduced precision" time representations to omit the > > seconds value or both the minutes and the seconds values. The > > abbreviate times could look like 17:45 or 1745 to omit the seconds, > > or simply as 17 to omit both the minutes and the seconds. > > > > parse_date_basic accepts the 17:45 format but it rejects the other two. > > Fix it to accept 4-digit and 2-digit time values when they follow a > > recognized date but no time has yet been parsed. > > I worry a bit that this may conflict with other approxidate > heuristics. I share your concern. I tried to make the ISO matching code very specific to the case where we have already matched a date, we have not yet matched a time value, and we see a lone 2-digit or 4-digit number show up. > > $ test-tool date approxidate 2022-12-13T23:00 2022-12-13T2300 2022-12-13T23 > > 2022-12-13T23:00 -> 2022-12-14 07:00:00 +0000 > > 2022-12-13T2300 -> 2022-12-14 07:00:00 +0000 > > 2022-12-13T23 -> 2022-12-14 07:00:00 +0000 > > All of these may be obvious improvements, but the thing is that > there is nothing in the approxidate parsing code that insists on the > presence of "T" to loosen the rule only for ISO-8601 case. I considered making the T an explicit marker, but it didn't seem necessary here. But the looseness of approxidate with regard to spaces is worrisome. That's why I added the date/no-time constraints. > For example, with only 6 digits, do we still recognise our internal > timestamp format (i.e. seconds since epoch) without the > disambiguating '@' prefix? I don't grok your example. This change should not affect the interpretation of any 6-digit number. Oh, do you mean if there was _no_ delimiter before the time field? Like 2022-12-132300? My change will not recognize this format, and I believe it was explicitly rejected by ISO-8601-1:2019. approxidate seems not to recognize fewer than 9 digits as an epoch number, even with the @ prefix. But this is not because of my change. test-tool date approxidate 123456789 12345678 123456789 -> 1973-11-29 21:33:09 +0000 12345678 -> 2022-12-16 18:34:02 +0000 test-tool date approxidate @123456789 @12345678 @123456789 -> 1973-11-29 21:33:09 +0000 @12345678 -> 2022-12-16 18:36:35 +0000 test-tool date parse 123456789 12345678 123456789 -> 1973-11-29 13:33:09 -0800 12345678 -> bad test-tool date parse @123456789 @12345678 @123456789 -> 1973-11-29 13:33:09 -0800 @12345678 -> bad
> On Thu, Dec 15, 2022 at 8:23 PM Junio C Hamano <gitster@pobox.com> wrote: > > All of these may be obvious improvements, but the thing is that > > there is nothing in the approxidate parsing code that insists on the > > presence of "T" to loosen the rule only for ISO-8601 case. > > I considered making the T an explicit marker, but it didn't seem > necessary here. But the looseness of approxidate with regard to spaces > is worrisome. That's why I added the date/no-time constraints. > > > For example, with only 6 digits, do we still recognise our internal > > timestamp format (i.e. seconds since epoch) without the > > disambiguating '@' prefix? > > I don't grok your example. This change should not affect the > interpretation of any 6-digit number. > > Oh, do you mean if there was _no_ delimiter before the time field? > Like 2022-12-132300? My change will not recognize this format, and I > believe it was explicitly rejected by ISO-8601-1:2019. > > approxidate seems not to recognize fewer than 9 digits as an epoch > number, even with the @ prefix. But this is not because of my change. > > test-tool date approxidate 123456789 12345678 > 123456789 -> 1973-11-29 21:33:09 +0000 > 12345678 -> 2022-12-16 18:34:02 +0000 > > test-tool date approxidate @123456789 @12345678 > @123456789 -> 1973-11-29 21:33:09 +0000 > @12345678 -> 2022-12-16 18:36:35 +0000 > > test-tool date parse 123456789 12345678 > 123456789 -> 1973-11-29 13:33:09 -0800 > 12345678 -> bad > > test-tool date parse @123456789 @12345678 > @123456789 -> 1973-11-29 13:33:09 -0800 > @12345678 -> bad Do you have any suggestions about how I can better alleviate your concerns? I don't think there are real regressions here and I tried to explain why.
Phil Hord <phil.hord@gmail.com> writes: > Do you have any suggestions about how I can better alleviate your > concerns? I don't think there are real regressions here and I tried > to explain why. Other than "including it in a released version and waiting for people to scream", I do not think there is. The "next" branch was meant to be a test ground for these new features by letting volunteer users to use it in their everyday development, and the hope was that we can catch regressions by cooking risky topics longer than usual in there, but we haven't been very successful, I have to say. Thanks. Let's queue it and see what happens.
Junio C Hamano <gitster@pobox.com> writes: > Phil Hord <phil.hord@gmail.com> writes: > >> Do you have any suggestions about how I can better alleviate your >> concerns? I don't think there are real regressions here and I tried >> to explain why. > > Other than "including it in a released version and waiting for > people to scream", I do not think there is. The "next" branch was > meant to be a test ground for these new features by letting > volunteer users to use it in their everyday development, and the > hope was that we can catch regressions by cooking risky topics > longer than usual in there, but we haven't been very successful, I > have to say. > > Thanks. Let's queue it and see what happens. Actually, let's not queue it as-is, because it seems to break many tests for me. I won't have time to take further look myself before later in the week when I come back online again, though. Test Summary Report ------------------- t4255-am-submodule.sh (Wstat: 256 (exited 1) Tests: 33 Failed: 22) Failed tests: 1-6, 11-13, 15-20, 25-27, 30-33 Non-zero exit status: 1 t4150-am.sh (Wstat: 256 (exited 1) Tests: 87 Failed: 62) Failed tests: 3, 5-7, 11-13, 15-16, 18-22, 24-25, 27-32 34-36, 38-46, 48, 50-52, 54, 57-61, 63-65 67-77, 82-85 Non-zero exit status: 1 t4014-format-patch.sh (Wstat: 256 (exited 1) Tests: 193 Failed: 20) Failed tests: 6-7, 10, 141-157 Non-zero exit status: 1 t7512-status-help.sh (Wstat: 256 (exited 1) Tests: 44 Failed: 1) Failed test: 26 Non-zero exit status: 1 t3901-i18n-patch.sh (Wstat: 256 (exited 1) Tests: 20 Failed: 5) Failed tests: 16-20 Non-zero exit status: 1 t4151-am-abort.sh (Wstat: 256 (exited 1) Tests: 20 Failed: 7) Failed tests: 2-3, 5-6, 15, 19-20 Non-zero exit status: 1 t4153-am-resume-override-opts.sh (Wstat: 256 (exited 1) Tests: 5 Failed: 2) Failed tests: 3-4 Non-zero exit status: 1 t5607-clone-bundle.sh (Wstat: 256 (exited 1) Tests: 14 Failed: 1) Failed test: 3 Non-zero exit status: 1 t4152-am-subjects.sh (Wstat: 256 (exited 1) Tests: 13 Failed: 9) Failed tests: 5-13 Non-zero exit status: 1 t4253-am-keep-cr-dos.sh (Wstat: 256 (exited 1) Tests: 7 Failed: 4) Failed tests: 3-4, 6-7 Non-zero exit status: 1 t4257-am-interactive.sh (Wstat: 256 (exited 1) Tests: 4 Failed: 3) Failed tests: 2-4 Non-zero exit status: 1 t4258-am-quoted-cr.sh (Wstat: 256 (exited 1) Tests: 4 Failed: 2) Failed tests: 3-4 Non-zero exit status: 1 t4254-am-corrupt.sh (Wstat: 256 (exited 1) Tests: 4 Failed: 1) Failed test: 3 Non-zero exit status: 1 t0023-crlf-am.sh (Wstat: 256 (exited 1) Tests: 2 Failed: 1) Failed test: 2 Non-zero exit status: 1 t4256-am-format-flowed.sh (Wstat: 256 (exited 1) Tests: 2 Failed: 1) Failed test: 2 Non-zero exit status: 1 Files=987, Tests=28346, 137 wallclock secs (12.84 usr 4.19 sys + 799.53 cusr 1017.07 csys = 1833.63 CPU) Result: FAIL gmake[1]: *** [Makefile:62: prove] Error 1 gmake[1]: Leaving directory '/home/gitster/w/buildfarm/seen/t' gmake: *** [Makefile:3196: test] Error 2 rmdir: failed to remove '/dev/shm/testpen.2086794': Directory not empty
On Mon, Jan 9, 2023 at 1:16 AM Junio C Hamano <gitster@pobox.com> wrote: > > Junio C Hamano <gitster@pobox.com> writes: > > > Phil Hord <phil.hord@gmail.com> writes: > > > >> Do you have any suggestions about how I can better alleviate your > >> concerns? I don't think there are real regressions here and I tried > >> to explain why. > > > > Other than "including it in a released version and waiting for > > people to scream", I do not think there is. The "next" branch was > > meant to be a test ground for these new features by letting > > volunteer users to use it in their everyday development, and the > > hope was that we can catch regressions by cooking risky topics > > longer than usual in there, but we haven't been very successful, I > > have to say. > > > > Thanks. Let's queue it and see what happens. > > Actually, let's not queue it as-is, because it seems to break many > tests for me. I won't have time to take further look myself before > later in the week when I come back online again, though. Oh, wow. For me as well. I thought I ran all the tests before finishing up, but I guess I was too focused on the single test module. I apologize for the oversight.
diff --git a/date.c b/date.c index 53bd6a7932..b011b9d6b3 100644 --- a/date.c +++ b/date.c @@ -638,6 +638,18 @@ static inline int nodate(struct tm *tm) tm->tm_sec) < 0; } +/* + * Have we filled in any part of the time yet? + * We just do a binary 'and' to see if the sign bit + * is set in all the values. + */ +static inline int notime(struct tm *tm) +{ + return (tm->tm_hour & + tm->tm_min & + tm->tm_sec) < 0; +} + /* * We've seen a digit. Time? Year? Date? */ @@ -689,7 +701,11 @@ static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt /* 8 digits, compact style of ISO-8601's date: YYYYmmDD */ /* 6 digits, compact style of ISO-8601's time: HHMMSS */ - if (n == 8 || n == 6) { + /* 4 digits, compact style of ISO-8601's time: HHMM */ + /* 2 digits, compact style of ISO-8601's time: HH */ + if (n == 8 || n == 6 || + (!nodate(tm) && notime(tm) && + (n == 4 || n == 2))) { unsigned int num1 = num / 10000; unsigned int num2 = (num % 10000) / 100; unsigned int num3 = num % 100; @@ -698,6 +714,10 @@ static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt else if (n == 6 && set_time(num1, num2, num3, tm) == 0 && *end == '.' && isdigit(end[1])) strtoul(end + 1, &end, 10); + else if (n == 4) + set_time(num2, num3, 0, tm); + else if (n == 2) + set_time(num3, 0, 0, tm); return end - date; } diff --git a/t/t0006-date.sh b/t/t0006-date.sh index 2490162071..16fb0bf4bd 100755 --- a/t/t0006-date.sh +++ b/t/t0006-date.sh @@ -88,6 +88,12 @@ check_parse 2008-02-14 bad check_parse '2008-02-14 20:30:45' '2008-02-14 20:30:45 +0000' check_parse '2008-02-14 20:30:45 -0500' '2008-02-14 20:30:45 -0500' check_parse '2008.02.14 20:30:45 -0500' '2008-02-14 20:30:45 -0500' +check_parse '20080214T20:30:45' '2008-02-14 20:30:45 +0000' +check_parse '20080214T20:30' '2008-02-14 20:30:00 +0000' +check_parse '20080214T20' '2008-02-14 20:00:00 +0000' +check_parse '20080214T203045' '2008-02-14 20:30:45 +0000' +check_parse '20080214T2030' '2008-02-14 20:30:00 +0000' +check_parse '20080214T20' '2008-02-14 20:00:00 +0000' check_parse '20080214T203045-04:00' '2008-02-14 20:30:45 -0400' check_parse '20080214T203045 -04:00' '2008-02-14 20:30:45 -0400' check_parse '20080214T203045.019-04:00' '2008-02-14 20:30:45 -0400'