[v3,1/2] ref-filter: add multiple-option parsing functions

Message ID	20230719162424.70781-2-five231003@gmail.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <git-owner@vger.kernel.org> From: Kousik Sanagavarapu <five231003@gmail.com> To: git@vger.kernel.org Cc: Junio C Hamano <gitster@pobox.com>, Christian Couder <christian.couder@gmail.com>, Hariom Verma <hariom18599@gmail.com>, Kousik Sanagavarapu <five231003@gmail.com> Subject: [PATCH v3 1/2] ref-filter: add multiple-option parsing functions Date: Wed, 19 Jul 2023 21:45:05 +0530 Message-ID: <20230719162424.70781-2-five231003@gmail.com> In-Reply-To: <20230719162424.70781-1-five231003@gmail.com> References: <20230714194249.66862-1-five231003@gmail.com> <20230719162424.70781-1-five231003@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	Add new "describe" atom \| expand [v3,0/2] Add new "describe" atom [v3,1/2] ref-filter: add multiple-option parsing functions [v3,2/2] ref-filter: add new "describe" atom

Kousik Sanagavarapu July 19, 2023, 4:15 p.m. UTC

The functions

	match_placeholder_arg_value()
	match_placeholder_bool_arg()

were added in pretty 4f732e0fd7 (pretty: allow %(trailers) options
with explicit value, 2019-01-29) to parse multiple options in an
argument to --pretty. For example,

	git log --pretty="%(trailers:key=Signed-Off-By,separator=%x2C )"

will output all the trailers matching the key and seperates them by
a comma followed by a space per commit.

Add similar functions,

	match_atom_arg_value()
	match_atom_bool_arg()

in ref-filter.

There is no atom yet that can use these functions in ref-filter, but we
are going to add a new %(describe) atom in a subsequent commit where we
parse options like tags=<bool-value> or match=<pattern> given to it.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com>
---
 ref-filter.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

Junio C Hamano July 19, 2023, 11:23 p.m. UTC | #1

Kousik Sanagavarapu <five231003@gmail.com> writes:

>  ref-filter.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 59 insertions(+)

New helper functions that do not have any caller and no
documentation to explain how they are supposed to be called
(i.e. the expectation on the callers---what values they need to feed
as parameters when they call these helpers, and the expectation by
the callers---what they expect to get out of the helpers once they
return) makes it impossible to evaluate if they are any good [*].

	Side note.  Those of you who are keen to add unit tests to
	the system (Cc:ed) , do you think a patch line this one that
	adds a new helper function to the system, would benefit from
	being able to add a few unit tests for these otherwise
	unused helper functions?

	The calls to the new functions that the unit test framework
	would make should serve as a good piece of interface
	documentation, showing what the callers are supposed to pass
	and what they expect, I guess.

	So whatever framework we choose, it should allow adding a
	test or two to this patch easily, without being too
	intrusive.  Would that be a good and concrete evaluation
	criterion?

Anyway, because of that, I had to read [2/2] first and then come
back here to review this one.

The following is my attempt to write down the contract between the
callers and this new helper function---please give something like
that to the final version.  The the example below is there just to
illustrate the level of information that would be desired to help
future readers and programmers.  Do not take the contents as-written
as truth---I may have (deliberately) mixed in incorrect descriptions
;-).

/*
 * The string "to_parse" is expected to be a comma-separated list
 * of "key" or "key=val".  If your atom allows "key1" and "key2"
 * (possibly with their values) as options, make two calls to this
 * funtion, passing "key1" in candiate and then passing "key2" in
 * candidate.
 *
 * The function Returns true ONLY when the to_parse string begins
 * with the candidate key, possibly followed by its value (valueless
 * key-only entries are allowed in the comman-separated list).
 * Otherwise, *end, *valuestart and *valuelen are LEFT INTACT and
 * the function returns false.
 *
 * *valuestart will point at the byte after '=' (i.e. the beginning
 * of the value), and the number of bytes in the value will be set
 * to *valuelen.
 * A key-only entry results in *valuestart set to NULL and *valuelen
 * set to 0.
 * *end will point at the next key[=val] in the comma-separated list
 * or NULL when the list ran out.
 */

> +static int match_atom_arg_value(const char *to_parse, const char *candidate,
> +				const char **end, const char **valuestart,
> +				size_t *valuelen)
> +{
> +	const char *atom;
> +
> +	if (!(skip_prefix(to_parse, candidate, &atom)))
> +		return 0;
> +	if (valuestart) {

As far as I saw, no callers pass NULL to valuestart.  Getting rid of
this if() statement and always entering its body would clarify what
is going on, I think.

> +		if (*atom == '=') {
> +			*valuestart = atom + 1;
> +			*valuelen = strcspn(*valuestart, ",\0");
> +			atom = *valuestart + *valuelen;
> +		} else {
> +			if (*atom != ',' && *atom != '\0')
> +				return 0;
> +			*valuestart = NULL;
> +			*valuelen = 0;
> +		}
> +	}
> +	if (*atom == ',') {
> +		*end = atom + 1;
> +		return 1;
> +	}
> +	if (*atom == '\0') {
> +		*end = atom;
> +		return 1;
> +	}
> +	return 0;
> +}

/*
 * Write something similar to document the contract between the caller
 * and this function here.
 */
> +static int match_atom_bool_arg(const char *to_parse, const char *candidate,
> +				const char **end, int *val)
> +{

Thanks.

Junio C Hamano July 20, 2023, 5:21 a.m. UTC | #2

Junio C Hamano <gitster@pobox.com> writes:

>> +static int match_atom_arg_value(const char *to_parse, const char *candidate,
>> +				const char **end, const char **valuestart,
>> +				size_t *valuelen)
>> +{
>> +	const char *atom;
>> +
>> +	if (!(skip_prefix(to_parse, candidate, &atom)))
>> +		return 0;
>> +	if (valuestart) {
>
> As far as I saw, no callers pass NULL to valuestart.  Getting rid of
> this if() statement and always entering its body would clarify what
> is going on, I think.

Specifically, ...

>> +		if (*atom == '=') {
>> +			*valuestart = atom + 1;
>> +			*valuelen = strcspn(*valuestart, ",\0");
>> +			atom = *valuestart + *valuelen;
>> +		} else {
>> +			if (*atom != ',' && *atom != '\0')
>> +				return 0;
>> +			*valuestart = NULL;
>> +			*valuelen = 0;
>> +		}
>> +	}
>> +	if (*atom == ',') {
>> +		*end = atom + 1;
>> +		return 1;
>> +	}
>> +	if (*atom == '\0') {
>> +		*end = atom;
>> +		return 1;
>> +	}
>> +	return 0;
>> +}

... I think the body of the function would become easier to read if
written like so:

	if (!skip_prefix(to_parse, candidate, &atom))
		return 0; /* definitely not "candidate" */

	if (*atom == '=') {
		/* we just saw "candidate=" */
		*valuestart = atom + 1;
                atom = strchrnul(*valuestart, ',');
		*valuelen = atom - *valuestart;
	} else if (*atom != ',' && *atom != '\0') {
        	 /* key begins with "candidate" but has more chars */
		return 0;
	} else {
        	/* just "candidate" without "=val" */
		*valuestart = NULL;
		*valuelen = 0;
	}

        /* atom points at either the ',' or NUL after this key[=val] */
	if (*atom == ',')
		atom++;
	else if (*atom)
		BUG("should not happen");

	*end = atom;
	return 1;

as it is clear that *valuestart, *valuelen, and *end are not touched
when the function returns 0 and they are all filled when the function
returns 1.

Also, avoid passing ",\0" to strcspn(); its effect is exactly the
same as passing ",", and at that point you are better off using
strchnul().

Thanks.

Kousik Sanagavarapu July 20, 2023, 4:52 p.m. UTC | #3

On Wed, Jul 19, 2023 at 04:23:15PM -0700, Junio C Hamano wrote:
> Kousik Sanagavarapu <five231003@gmail.com> writes:
> 
> >  ref-filter.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 59 insertions(+)
> 
> New helper functions that do not have any caller and no
> documentation to explain how they are supposed to be called
> (i.e. the expectation on the callers---what values they need to feed
> as parameters when they call these helpers, and the expectation by
> the callers---what they expect to get out of the helpers once they
> return) makes it impossible to evaluate if they are any good [*].
> 
> 	Side note.  Those of you who are keen to add unit tests to
> 	the system (Cc:ed) , do you think a patch line this one that
> 	adds a new helper function to the system, would benefit from
> 	being able to add a few unit tests for these otherwise
> 	unused helper functions?
> 
> 	The calls to the new functions that the unit test framework
> 	would make should serve as a good piece of interface
> 	documentation, showing what the callers are supposed to pass
> 	and what they expect, I guess.
> 
> 	So whatever framework we choose, it should allow adding a
> 	test or two to this patch easily, without being too
> 	intrusive.  Would that be a good and concrete evaluation
> 	criterion?
> 
> Anyway, because of that, I had to read [2/2] first and then come
> back here to review this one.
> 
> The following is my attempt to write down the contract between the
> callers and this new helper function---please give something like
> that to the final version.  The the example below is there just to
> illustrate the level of information that would be desired to help
> future readers and programmers.  Do not take the contents as-written
> as truth---I may have (deliberately) mixed in incorrect descriptions
> ;-).

I'll spot them---if there are any ;).

> 
> /*
>  * The string "to_parse" is expected to be a comma-separated list
>  * of "key" or "key=val".  If your atom allows "key1" and "key2"
>  * (possibly with their values) as options, make two calls to this
>  * funtion, passing "key1" in candiate and then passing "key2" in
>  * candidate.
>  *
>  * The function Returns true ONLY when the to_parse string begins
>  * with the candidate key, possibly followed by its value (valueless
>  * key-only entries are allowed in the comman-separated list).
>  * Otherwise, *end, *valuestart and *valuelen are LEFT INTACT and
>  * the function returns false.
>  *
>  * *valuestart will point at the byte after '=' (i.e. the beginning
>  * of the value), and the number of bytes in the value will be set
>  * to *valuelen.
>  * A key-only entry results in *valuestart set to NULL and *valuelen
>  * set to 0.
>  * *end will point at the next key[=val] in the comma-separated list
>  * or NULL when the list ran out.
>  */
> 
> > +static int match_atom_arg_value(const char *to_parse, const char *candidate,
> > +				const char **end, const char **valuestart,
> > +				size_t *valuelen)
> > +{
> > +	const char *atom;
> > +
> > +	if (!(skip_prefix(to_parse, candidate, &atom)))
> > +		return 0;
> > +	if (valuestart) {
> 
> As far as I saw, no callers pass NULL to valuestart.  Getting rid of
> this if() statement and always entering its body would clarify what
> is going on, I think.
> 
> > +		if (*atom == '=') {
> > +			*valuestart = atom + 1;
> > +			*valuelen = strcspn(*valuestart, ",\0");
> > +			atom = *valuestart + *valuelen;
> > +		} else {
> > +			if (*atom != ',' && *atom != '\0')
> > +				return 0;
> > +			*valuestart = NULL;
> > +			*valuelen = 0;
> > +		}
> > +	}
> > +	if (*atom == ',') {
> > +		*end = atom + 1;
> > +		return 1;
> > +	}
> > +	if (*atom == '\0') {
> > +		*end = atom;
> > +		return 1;
> > +	}
> > +	return 0;
> > +}
> 
> /*
>  * Write something similar to document the contract between the caller
>  * and this function here.
>  */
> > +static int match_atom_bool_arg(const char *to_parse, const char *candidate,
> > +				const char **end, int *val)
> > +{

I'll make these changes in the re-rolled version. I've also read your
reply to this email with the changes in `match_atom_arg_value()`. I'll
add them too.

Going off in a tangent here---In yesterday's review club which discussed
the %(decorate:<options>) patch[1], Glen suggested the possibility of
having a single kind of a framework (used this word very loosely here)
for parsing these multiple options since we are beginning to see them so
often (might also help new formats which maybe added in the future). The
fact that this wasn't done already says something about its difficulty
as Jacob mentioned yesterday. The difficulty being we don't exactly know
which options to parse as they differ from format to format.

Christian, Hariom and I had a similar discussion about refactoring these
helper functions that are already there in pretty (`match_placeholder_*()`)
so that they can be used here.

One example of this usage of functions from pretty is done when making
ref-filter support trailers.

[1]: https://lore.kernel.org/git/20230715160730.4046-1-andy.koppe@gmail.com/

Thanks

Glen Choo July 20, 2023, 5:42 p.m. UTC | #4

Junio C Hamano <gitster@pobox.com> writes:

> New helper functions that do not have any caller and no
> documentation to explain how they are supposed to be called
> (i.e. the expectation on the callers---what values they need to feed
> as parameters when they call these helpers, and the expectation by
> the callers---what they expect to get out of the helpers once they
> return) makes it impossible to evaluate if they are any good [*].

Agreed.

> 	Side note.  Those of you who are keen to add unit tests to
> 	the system (Cc:ed) , do you think a patch line this one that
> 	adds a new helper function to the system, would benefit from
> 	being able to add a few unit tests for these otherwise
> 	unused helper functions?

Absolutely. As a rule, we should strive to test all of our changes as
they are introduced. With our current shell-based testing, this means
that we have to add callers (either via a builtin or test-helper), but
IMO a unit test framework would serve this purpose even better.

> 	The calls to the new functions that the unit test framework
> 	would make should serve as a good piece of interface
> 	documentation, showing what the callers are supposed to pass
> 	and what they expect, I guess.

Agreed, and as documentation, unit tests can be easier to read, since
they can include only the relevant details.

> 	So whatever framework we choose, it should allow adding a
> 	test or two to this patch easily, without being too
> 	intrusive.  Would that be a good and concrete evaluation
> 	criterion?

Perhaps, but the biggest blocker to adding a unit tests is whether the
source file itself is amenable to being unit tested (e.g. does it depend
on global state? does it compile easily?). Once that is in place, I
can't imagine that there would be a sensible unit test framework that
doesn't make it easy to add tests to a patch like this.

Junio C Hamano July 20, 2023, 5:59 p.m. UTC | #5

Kousik Sanagavarapu <five231003@gmail.com> writes:

> Going off in a tangent here---In yesterday's review club which discussed
> the %(decorate:<options>) patch[1], Glen suggested the possibility of
> having a single kind of a framework (used this word very loosely here)
> for parsing these multiple options since we are beginning to see them so
> often (might also help new formats which maybe added in the future). The
> fact that this wasn't done already says something about its difficulty
> as Jacob mentioned yesterday. The difficulty being we don't exactly know
> which options to parse as they differ from format to format.

Yes, while writing the sample in-code documentation for the
match_atom_arg_value() function, I found its interface to force the
callers to do "parse if it is 'tags'; else parse if it is 'abbrev';
and so on" hard to use.  I wondered if the primitive to parse these
should be modeled after config.c:parse_config_key(), i.e. the helper
function takes the input and returns the split point and lengths of
the components it finds in the return parameter.

As the contract between the caller and the callee is that the caller
passes the beginning of "key1[=val1],key2[=val2],...", the interface
may look like

	int parse_cskv(const char *to_parse,
		       const char **key, size_t *keylen,
		       const char **val, size_t *vallen,
		       const char **end);

and would be used like so:

	const char *cskv = "key1,key2=val2";
	const char *key, *val;
	size_t keylen, vallen;

	while (parse_cskv(cskv, &key, &keylen, &val, &vallen, &cskv) {
		if (!val)
			printf("valueless key '%.*s'\n",
			       (int)keylen, key);
		else
			printf("key-value pair '%.*s=%.*s'\n",
			       (int)keylen, key, (int)vallen, val);
	}
	if (*cskv)
		fprintf(stderr, "error - trailing garbage seen '%s'\n",
			cskv);

The helper's contract to the caller may look like this:

 - It expects the string to be "key" or "key=val" followed by ',' or
   '\0' (the latter ends the input, i.e. the last element of comma
   separated list).  The out variables key, val, keylen, vallen are
   used to convey to the caller where key and val are found and how
   long they are.

 - If it is a valueless "key", the out variable val is set to NULL.

 - The out variable end is updated to point at one byte after the
   element that has just been parsed.

The need for the caller to check against the list of keys it knows
about in the loop still exists, but the parser may become simpler
that way.  I dunno.

Junio C Hamano July 20, 2023, 8:30 p.m. UTC | #6

Glen Choo <chooglen@google.com> writes:

>> 	So whatever framework we choose, it should allow adding a
>> 	test or two to this patch easily, without being too
>> 	intrusive.  Would that be a good and concrete evaluation
>> 	criterion?
>
> Perhaps, but the biggest blocker to adding a unit tests is whether the
> source file itself is amenable to being unit tested (e.g. does it depend
> on global state? does it compile easily?).

Perhaps.  

Now would this particular example, the change to ref-filter.c file,
be a reasonable guinea-pig test case for candidate test frameworks
to add tests for these two helper functions?  They are pretty-much
plain vanilla string manipulation functions that does not depend too
many things that are specific to Git.  They may use helpers we
wrote, i.e. xstrndup(), skip_prefix(), and git_parse_maybe_bool(),
but they shouldn't depend on the program start-up sequence,
discovering repositories, installing at-exit handers, and other
stuff.  It was why I wondered if it can be used as a good evaluation
criterion---if a test framework cannot easily add tests while this
patch was being proposed in a non-intrusive way to demonstrate how
these two functions are supposed to work and to protect their
implementations from future breakage, it would not be all that
useful, I would imagine.

Glen Choo July 21, 2023, 6:26 p.m. UTC | #7

Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>>> 	So whatever framework we choose, it should allow adding a
>>> 	test or two to this patch easily, without being too
>>> 	intrusive.  Would that be a good and concrete evaluation
>>> 	criterion?
>>
>> Perhaps, but the biggest blocker to adding a unit tests is whether the
>> source file itself is amenable to being unit tested (e.g. does it depend
>> on global state? does it compile easily?).
>
> Now would this particular example, the change to ref-filter.c file,
> be a reasonable guinea-pig test case for candidate test frameworks
> to add tests for these two helper functions?  They are pretty-much
> plain vanilla string manipulation functions that does not depend too
> many things that are specific to Git.

Ah, yes. This would be close-to-ideal candidate then.

> They may use helpers we
> wrote, i.e. xstrndup(), skip_prefix(), and git_parse_maybe_bool(),
> but they shouldn't depend on the program start-up sequence,
> discovering repositories, installing at-exit handers, and other
> stuff.  It was why I wondered if it can be used as a good evaluation
> criterion---if a test framework cannot easily add tests while this
> patch was being proposed in a non-intrusive way to demonstrate how
> these two functions are supposed to work and to protect their
> implementations from future breakage, it would not be all that
> useful, I would imagine.

The current thinking among Googlers is that we won't remove the helpers
from library code. Git will either provide them, e.g. via Calvin's
git-std-lib RFC [1], or we will provide ways for callers to bring their
own implementation (like trace2 or exit, since it doesn't necessarily
make sense to use Git's implementation). So yes, the test framework
should be able to support this sort of compilation pattern. I'm not sure
how much test frameworks differ in this regard, maybe Josh has some
insight here.

[1] https://lore.kernel.org/git/20230627195251.1973421-1-calvinwan@google.com

[v3,1/2] ref-filter: add multiple-option parsing functions

Commit Message

Comments

Patch