Message ID | 20230719162424.70781-2-five231003@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Add new "describe" atom | expand |
Kousik Sanagavarapu <five231003@gmail.com> writes: > ref-filter.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 59 insertions(+) New helper functions that do not have any caller and no documentation to explain how they are supposed to be called (i.e. the expectation on the callers---what values they need to feed as parameters when they call these helpers, and the expectation by the callers---what they expect to get out of the helpers once they return) makes it impossible to evaluate if they are any good [*]. Side note. Those of you who are keen to add unit tests to the system (Cc:ed) , do you think a patch line this one that adds a new helper function to the system, would benefit from being able to add a few unit tests for these otherwise unused helper functions? The calls to the new functions that the unit test framework would make should serve as a good piece of interface documentation, showing what the callers are supposed to pass and what they expect, I guess. So whatever framework we choose, it should allow adding a test or two to this patch easily, without being too intrusive. Would that be a good and concrete evaluation criterion? Anyway, because of that, I had to read [2/2] first and then come back here to review this one. The following is my attempt to write down the contract between the callers and this new helper function---please give something like that to the final version. The the example below is there just to illustrate the level of information that would be desired to help future readers and programmers. Do not take the contents as-written as truth---I may have (deliberately) mixed in incorrect descriptions ;-). /* * The string "to_parse" is expected to be a comma-separated list * of "key" or "key=val". If your atom allows "key1" and "key2" * (possibly with their values) as options, make two calls to this * funtion, passing "key1" in candiate and then passing "key2" in * candidate. * * The function Returns true ONLY when the to_parse string begins * with the candidate key, possibly followed by its value (valueless * key-only entries are allowed in the comman-separated list). * Otherwise, *end, *valuestart and *valuelen are LEFT INTACT and * the function returns false. * * *valuestart will point at the byte after '=' (i.e. the beginning * of the value), and the number of bytes in the value will be set * to *valuelen. * A key-only entry results in *valuestart set to NULL and *valuelen * set to 0. * *end will point at the next key[=val] in the comma-separated list * or NULL when the list ran out. */ > +static int match_atom_arg_value(const char *to_parse, const char *candidate, > + const char **end, const char **valuestart, > + size_t *valuelen) > +{ > + const char *atom; > + > + if (!(skip_prefix(to_parse, candidate, &atom))) > + return 0; > + if (valuestart) { As far as I saw, no callers pass NULL to valuestart. Getting rid of this if() statement and always entering its body would clarify what is going on, I think. > + if (*atom == '=') { > + *valuestart = atom + 1; > + *valuelen = strcspn(*valuestart, ",\0"); > + atom = *valuestart + *valuelen; > + } else { > + if (*atom != ',' && *atom != '\0') > + return 0; > + *valuestart = NULL; > + *valuelen = 0; > + } > + } > + if (*atom == ',') { > + *end = atom + 1; > + return 1; > + } > + if (*atom == '\0') { > + *end = atom; > + return 1; > + } > + return 0; > +} /* * Write something similar to document the contract between the caller * and this function here. */ > +static int match_atom_bool_arg(const char *to_parse, const char *candidate, > + const char **end, int *val) > +{ Thanks.
Junio C Hamano <gitster@pobox.com> writes: >> +static int match_atom_arg_value(const char *to_parse, const char *candidate, >> + const char **end, const char **valuestart, >> + size_t *valuelen) >> +{ >> + const char *atom; >> + >> + if (!(skip_prefix(to_parse, candidate, &atom))) >> + return 0; >> + if (valuestart) { > > As far as I saw, no callers pass NULL to valuestart. Getting rid of > this if() statement and always entering its body would clarify what > is going on, I think. Specifically, ... >> + if (*atom == '=') { >> + *valuestart = atom + 1; >> + *valuelen = strcspn(*valuestart, ",\0"); >> + atom = *valuestart + *valuelen; >> + } else { >> + if (*atom != ',' && *atom != '\0') >> + return 0; >> + *valuestart = NULL; >> + *valuelen = 0; >> + } >> + } >> + if (*atom == ',') { >> + *end = atom + 1; >> + return 1; >> + } >> + if (*atom == '\0') { >> + *end = atom; >> + return 1; >> + } >> + return 0; >> +} ... I think the body of the function would become easier to read if written like so: if (!skip_prefix(to_parse, candidate, &atom)) return 0; /* definitely not "candidate" */ if (*atom == '=') { /* we just saw "candidate=" */ *valuestart = atom + 1; atom = strchrnul(*valuestart, ','); *valuelen = atom - *valuestart; } else if (*atom != ',' && *atom != '\0') { /* key begins with "candidate" but has more chars */ return 0; } else { /* just "candidate" without "=val" */ *valuestart = NULL; *valuelen = 0; } /* atom points at either the ',' or NUL after this key[=val] */ if (*atom == ',') atom++; else if (*atom) BUG("should not happen"); *end = atom; return 1; as it is clear that *valuestart, *valuelen, and *end are not touched when the function returns 0 and they are all filled when the function returns 1. Also, avoid passing ",\0" to strcspn(); its effect is exactly the same as passing ",", and at that point you are better off using strchnul(). Thanks.
On Wed, Jul 19, 2023 at 04:23:15PM -0700, Junio C Hamano wrote: > Kousik Sanagavarapu <five231003@gmail.com> writes: > > > ref-filter.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 59 insertions(+) > > New helper functions that do not have any caller and no > documentation to explain how they are supposed to be called > (i.e. the expectation on the callers---what values they need to feed > as parameters when they call these helpers, and the expectation by > the callers---what they expect to get out of the helpers once they > return) makes it impossible to evaluate if they are any good [*]. > > Side note. Those of you who are keen to add unit tests to > the system (Cc:ed) , do you think a patch line this one that > adds a new helper function to the system, would benefit from > being able to add a few unit tests for these otherwise > unused helper functions? > > The calls to the new functions that the unit test framework > would make should serve as a good piece of interface > documentation, showing what the callers are supposed to pass > and what they expect, I guess. > > So whatever framework we choose, it should allow adding a > test or two to this patch easily, without being too > intrusive. Would that be a good and concrete evaluation > criterion? > > Anyway, because of that, I had to read [2/2] first and then come > back here to review this one. > > The following is my attempt to write down the contract between the > callers and this new helper function---please give something like > that to the final version. The the example below is there just to > illustrate the level of information that would be desired to help > future readers and programmers. Do not take the contents as-written > as truth---I may have (deliberately) mixed in incorrect descriptions > ;-). I'll spot them---if there are any ;). > > /* > * The string "to_parse" is expected to be a comma-separated list > * of "key" or "key=val". If your atom allows "key1" and "key2" > * (possibly with their values) as options, make two calls to this > * funtion, passing "key1" in candiate and then passing "key2" in > * candidate. > * > * The function Returns true ONLY when the to_parse string begins > * with the candidate key, possibly followed by its value (valueless > * key-only entries are allowed in the comman-separated list). > * Otherwise, *end, *valuestart and *valuelen are LEFT INTACT and > * the function returns false. > * > * *valuestart will point at the byte after '=' (i.e. the beginning > * of the value), and the number of bytes in the value will be set > * to *valuelen. > * A key-only entry results in *valuestart set to NULL and *valuelen > * set to 0. > * *end will point at the next key[=val] in the comma-separated list > * or NULL when the list ran out. > */ > > > +static int match_atom_arg_value(const char *to_parse, const char *candidate, > > + const char **end, const char **valuestart, > > + size_t *valuelen) > > +{ > > + const char *atom; > > + > > + if (!(skip_prefix(to_parse, candidate, &atom))) > > + return 0; > > + if (valuestart) { > > As far as I saw, no callers pass NULL to valuestart. Getting rid of > this if() statement and always entering its body would clarify what > is going on, I think. > > > + if (*atom == '=') { > > + *valuestart = atom + 1; > > + *valuelen = strcspn(*valuestart, ",\0"); > > + atom = *valuestart + *valuelen; > > + } else { > > + if (*atom != ',' && *atom != '\0') > > + return 0; > > + *valuestart = NULL; > > + *valuelen = 0; > > + } > > + } > > + if (*atom == ',') { > > + *end = atom + 1; > > + return 1; > > + } > > + if (*atom == '\0') { > > + *end = atom; > > + return 1; > > + } > > + return 0; > > +} > > /* > * Write something similar to document the contract between the caller > * and this function here. > */ > > +static int match_atom_bool_arg(const char *to_parse, const char *candidate, > > + const char **end, int *val) > > +{ I'll make these changes in the re-rolled version. I've also read your reply to this email with the changes in `match_atom_arg_value()`. I'll add them too. Going off in a tangent here---In yesterday's review club which discussed the %(decorate:<options>) patch[1], Glen suggested the possibility of having a single kind of a framework (used this word very loosely here) for parsing these multiple options since we are beginning to see them so often (might also help new formats which maybe added in the future). The fact that this wasn't done already says something about its difficulty as Jacob mentioned yesterday. The difficulty being we don't exactly know which options to parse as they differ from format to format. Christian, Hariom and I had a similar discussion about refactoring these helper functions that are already there in pretty (`match_placeholder_*()`) so that they can be used here. One example of this usage of functions from pretty is done when making ref-filter support trailers. [1]: https://lore.kernel.org/git/20230715160730.4046-1-andy.koppe@gmail.com/ Thanks
Junio C Hamano <gitster@pobox.com> writes: > New helper functions that do not have any caller and no > documentation to explain how they are supposed to be called > (i.e. the expectation on the callers---what values they need to feed > as parameters when they call these helpers, and the expectation by > the callers---what they expect to get out of the helpers once they > return) makes it impossible to evaluate if they are any good [*]. Agreed. > Side note. Those of you who are keen to add unit tests to > the system (Cc:ed) , do you think a patch line this one that > adds a new helper function to the system, would benefit from > being able to add a few unit tests for these otherwise > unused helper functions? Absolutely. As a rule, we should strive to test all of our changes as they are introduced. With our current shell-based testing, this means that we have to add callers (either via a builtin or test-helper), but IMO a unit test framework would serve this purpose even better. > The calls to the new functions that the unit test framework > would make should serve as a good piece of interface > documentation, showing what the callers are supposed to pass > and what they expect, I guess. Agreed, and as documentation, unit tests can be easier to read, since they can include only the relevant details. > So whatever framework we choose, it should allow adding a > test or two to this patch easily, without being too > intrusive. Would that be a good and concrete evaluation > criterion? Perhaps, but the biggest blocker to adding a unit tests is whether the source file itself is amenable to being unit tested (e.g. does it depend on global state? does it compile easily?). Once that is in place, I can't imagine that there would be a sensible unit test framework that doesn't make it easy to add tests to a patch like this.
Kousik Sanagavarapu <five231003@gmail.com> writes: > Going off in a tangent here---In yesterday's review club which discussed > the %(decorate:<options>) patch[1], Glen suggested the possibility of > having a single kind of a framework (used this word very loosely here) > for parsing these multiple options since we are beginning to see them so > often (might also help new formats which maybe added in the future). The > fact that this wasn't done already says something about its difficulty > as Jacob mentioned yesterday. The difficulty being we don't exactly know > which options to parse as they differ from format to format. Yes, while writing the sample in-code documentation for the match_atom_arg_value() function, I found its interface to force the callers to do "parse if it is 'tags'; else parse if it is 'abbrev'; and so on" hard to use. I wondered if the primitive to parse these should be modeled after config.c:parse_config_key(), i.e. the helper function takes the input and returns the split point and lengths of the components it finds in the return parameter. As the contract between the caller and the callee is that the caller passes the beginning of "key1[=val1],key2[=val2],...", the interface may look like int parse_cskv(const char *to_parse, const char **key, size_t *keylen, const char **val, size_t *vallen, const char **end); and would be used like so: const char *cskv = "key1,key2=val2"; const char *key, *val; size_t keylen, vallen; while (parse_cskv(cskv, &key, &keylen, &val, &vallen, &cskv) { if (!val) printf("valueless key '%.*s'\n", (int)keylen, key); else printf("key-value pair '%.*s=%.*s'\n", (int)keylen, key, (int)vallen, val); } if (*cskv) fprintf(stderr, "error - trailing garbage seen '%s'\n", cskv); The helper's contract to the caller may look like this: - It expects the string to be "key" or "key=val" followed by ',' or '\0' (the latter ends the input, i.e. the last element of comma separated list). The out variables key, val, keylen, vallen are used to convey to the caller where key and val are found and how long they are. - If it is a valueless "key", the out variable val is set to NULL. - The out variable end is updated to point at one byte after the element that has just been parsed. The need for the caller to check against the list of keys it knows about in the loop still exists, but the parser may become simpler that way. I dunno.
Glen Choo <chooglen@google.com> writes: >> So whatever framework we choose, it should allow adding a >> test or two to this patch easily, without being too >> intrusive. Would that be a good and concrete evaluation >> criterion? > > Perhaps, but the biggest blocker to adding a unit tests is whether the > source file itself is amenable to being unit tested (e.g. does it depend > on global state? does it compile easily?). Perhaps. Now would this particular example, the change to ref-filter.c file, be a reasonable guinea-pig test case for candidate test frameworks to add tests for these two helper functions? They are pretty-much plain vanilla string manipulation functions that does not depend too many things that are specific to Git. They may use helpers we wrote, i.e. xstrndup(), skip_prefix(), and git_parse_maybe_bool(), but they shouldn't depend on the program start-up sequence, discovering repositories, installing at-exit handers, and other stuff. It was why I wondered if it can be used as a good evaluation criterion---if a test framework cannot easily add tests while this patch was being proposed in a non-intrusive way to demonstrate how these two functions are supposed to work and to protect their implementations from future breakage, it would not be all that useful, I would imagine.
Junio C Hamano <gitster@pobox.com> writes: > Glen Choo <chooglen@google.com> writes: > >>> So whatever framework we choose, it should allow adding a >>> test or two to this patch easily, without being too >>> intrusive. Would that be a good and concrete evaluation >>> criterion? >> >> Perhaps, but the biggest blocker to adding a unit tests is whether the >> source file itself is amenable to being unit tested (e.g. does it depend >> on global state? does it compile easily?). > > Now would this particular example, the change to ref-filter.c file, > be a reasonable guinea-pig test case for candidate test frameworks > to add tests for these two helper functions? They are pretty-much > plain vanilla string manipulation functions that does not depend too > many things that are specific to Git. Ah, yes. This would be close-to-ideal candidate then. > They may use helpers we > wrote, i.e. xstrndup(), skip_prefix(), and git_parse_maybe_bool(), > but they shouldn't depend on the program start-up sequence, > discovering repositories, installing at-exit handers, and other > stuff. It was why I wondered if it can be used as a good evaluation > criterion---if a test framework cannot easily add tests while this > patch was being proposed in a non-intrusive way to demonstrate how > these two functions are supposed to work and to protect their > implementations from future breakage, it would not be all that > useful, I would imagine. The current thinking among Googlers is that we won't remove the helpers from library code. Git will either provide them, e.g. via Calvin's git-std-lib RFC [1], or we will provide ways for callers to bring their own implementation (like trace2 or exit, since it doesn't necessarily make sense to use Git's implementation). So yes, the test framework should be able to support this sort of compilation pattern. I'm not sure how much test frameworks differ in this regard, maybe Josh has some insight here. [1] https://lore.kernel.org/git/20230627195251.1973421-1-calvinwan@google.com
diff --git a/ref-filter.c b/ref-filter.c index 60919f375f..f64437e781 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -255,6 +255,65 @@ static int err_bad_arg(struct strbuf *sb, const char *name, const char *arg) return -1; } +static int match_atom_arg_value(const char *to_parse, const char *candidate, + const char **end, const char **valuestart, + size_t *valuelen) +{ + const char *atom; + + if (!(skip_prefix(to_parse, candidate, &atom))) + return 0; + if (valuestart) { + if (*atom == '=') { + *valuestart = atom + 1; + *valuelen = strcspn(*valuestart, ",\0"); + atom = *valuestart + *valuelen; + } else { + if (*atom != ',' && *atom != '\0') + return 0; + *valuestart = NULL; + *valuelen = 0; + } + } + if (*atom == ',') { + *end = atom + 1; + return 1; + } + if (*atom == '\0') { + *end = atom; + return 1; + } + return 0; +} + +static int match_atom_bool_arg(const char *to_parse, const char *candidate, + const char **end, int *val) +{ + const char *argval; + char *strval; + size_t arglen; + int v; + + if (!match_atom_arg_value(to_parse, candidate, end, &argval, &arglen)) + return 0; + + if (!argval) { + *val = 1; + return 1; + } + + strval = xstrndup(argval, arglen); + v = git_parse_maybe_bool(strval); + free(strval); + + if (v == -1) + return 0; + + *val = v; + + return 1; +} + static int color_atom_parser(struct ref_format *format, struct used_atom *atom, const char *color_value, struct strbuf *err) {
The functions match_placeholder_arg_value() match_placeholder_bool_arg() were added in pretty 4f732e0fd7 (pretty: allow %(trailers) options with explicit value, 2019-01-29) to parse multiple options in an argument to --pretty. For example, git log --pretty="%(trailers:key=Signed-Off-By,separator=%x2C )" will output all the trailers matching the key and seperates them by a comma followed by a space per commit. Add similar functions, match_atom_arg_value() match_atom_bool_arg() in ref-filter. There is no atom yet that can use these functions in ref-filter, but we are going to add a new %(describe) atom in a subsequent commit where we parse options like tags=<bool-value> or match=<pattern> given to it. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com> --- ref-filter.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 59 insertions(+)