[05/10] evolve: add the change-table structure

Message ID	2b3a00a6702eb8fb12e45b833ca74155939588ef.1663959325.git.gitgitgadget@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> Message-Id: <2b3a00a6702eb8fb12e45b833ca74155939588ef.1663959325.git.gitgitgadget@gmail.com> In-Reply-To: <pull.1356.git.1663959324.gitgitgadget@gmail.com> References: <pull.1356.git.1663959324.gitgitgadget@gmail.com> Date: Fri, 23 Sep 2022 18:55:19 +0000 Subject: [PATCH 05/10] evolve: add the change-table structure Fcc: Sent Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 To: git@vger.kernel.org Cc: Christophe Poucet <christophe.poucet@gmail.com>, Stefan Xenos <sxenos@google.com> Precedence: bulk From: Stefan Xenos <sxenos@google.com>
Series	Add the Git Change command \| expand [00/10] Add the Git Change command [01/10] technical doc: add a design doc for the evolve command [02/10] sha1-array: implement oid_array_readonly_contains [03/10] ref-filter: add the metas namespace to ref-filter [04/10] evolve: add support for parsing metacommits [05/10] evolve: add the change-table structure [06/10] evolve: add support for writing metacommits [07/10] evolve: implement the git change command [08/10] evolve: add the git change list command [09/10] evolve: add delete command [10/10] evolve: add documentation for `git change`

Stefan Xenos Sept. 23, 2022, 6:55 p.m. UTC

From: Stefan Xenos <sxenos@google.com>

A change table stores a list of changes, and supports efficient lookup
from a commit hash to the list of changes that reference that commit
directly.

It can be used to look up content commits or metacommits at the head
of a change, but does not support lookup of commits referenced as part
of the commit history.

Signed-off-by: Stefan Xenos <sxenos@google.com>
Signed-off-by: Chris Poucet <poucet@google.com>
---
 Makefile       |   1 +
 change-table.c | 179 +++++++++++++++++++++++++++++++++++++++++++++++++
 change-table.h | 132 ++++++++++++++++++++++++++++++++++++
 3 files changed, 312 insertions(+)
 create mode 100644 change-table.c
 create mode 100644 change-table.h

Phillip Wood Sept. 27, 2022, 1:27 p.m. UTC | #1

Hi Chris

On 23/09/2022 19:55, Stefan Xenos via GitGitGadget wrote:
> From: Stefan Xenos <sxenos@google.com>
> 
> A change table stores a list of changes, and supports efficient lookup
> from a commit hash to the list of changes that reference that commit
> directly.
> 
> It can be used to look up content commits or metacommits at the head
> of a change, but does not support lookup of commits referenced as part
> of the commit history.
> 
> Signed-off-by: Stefan Xenos <sxenos@google.com>
> Signed-off-by: Chris Poucet <poucet@google.com>

 > diff --git a/change-table.h b/change-table.h
 > new file mode 100644
 > index 00000000000..166b5ed8073
 > --- /dev/null
 > +++ b/change-table.h
 > @@ -0,0 +1,132 @@
 > +#ifndef CHANGE_TABLE_H
 > +#define CHANGE_TABLE_H
 > +
 > +#include "oidmap.h"
 > +
 > +struct commit;
 > +struct ref_filter;
 > +
 > +/**

We tend to just use '/*' rather than '/**'

 > + * This struct holds a list of change refs. The first element is 
stored inline,
 > + * to optimize for small lists.
 > + */
 > +struct change_list {
 > +	/**
 > +	 * Ref name for the first change in the list, or null if none.
 > +	 *
 > +	 * This field is private. Use for_each_change_in to read.
 > +	 */
 > +	const char* first_refname;
 > +	/**
 > +	 * List of additional change refs. Note that this is empty if the list
 > +	 * contains 0 or 1 elements.
 > +	 *
 > +	 * This field is private. Use for_each_change_in to read.
 > +	 */
 > +	struct string_list additional_refnames;

Splitting this feels like a premature optimization. We don't have any 
tests yet, let alone any real-world experience using this code. Also if 
we want to save memory for lists with a single entry why are we 
embedding the struct string_list rather than just storing a pointer to it?

I think it would be simpler to use a struct strset to hold the refnames 
as we don't need the util field offered by struct string_list.

 > +/**
 > + * Holds information about the head of a single change.
 > + */
 > +struct change_head {
 > +	/**
 > +	 * The location pointed to by the head of the change. May be a 
commit or a
 > +	 * metacommit.
 > +	 */
 > +	struct object_id head;

I found this duality between commits and metacommits rather confusing - 
why isn't the head always a metacommit?

 > +/**
 > + * Holds information about the heads of each change, and permits 
effecient

s/effecient/efficient/

 > + * lookup from a commit to the changes that reference it directly.
 > + *
 > + * All fields should be considered private. Use the change_table 
functions
 > + * to interact with this struct.
 > + */
 > +struct change_table {
 > +	/**
 > +	 * Memory pool for the objects allocated by the change table.
 > +	 */
 > +	struct mem_pool memory_pool;
 > +	/* Map object_id to commit_change_list_entry structs. */
 > +	struct oidmap oid_to_metadata_index;
 > +	/**
 > +	 * List of ref names. The util value points to a change_head structure
 > +	 * allocated from memory_pool.
 > +	 */
 > +	struct string_list refname_to_change_head;

I think these days we'd use a strmap for this for O(1) lookups.

 > +};
 > +
 > +extern void change_table_init(struct change_table *to_initialize);

The struct change_table argument to all these functions changes its name 
more often than a criminal on the run. I would find it much easier to 
follow the code if we consistently called this argument "table"

 > + * Adds all changes matching the given ref filter to the given 
change_table
 > + * struct.
 > + */
 > +extern void change_table_add_matching_filter(struct change_table 
*to_modify,
 > +	struct repository* repo, struct ref_filter *filter);

I can't see any callers outside of change-table.c so do we really need 
to export this function.

> diff --git a/change-table.c b/change-table.c
> new file mode 100644
> index 00000000000..c61ba29f1ed
> --- /dev/null
> +++ b/change-table.c
> @@ -0,0 +1,179 @@
> +#include "cache.h"
> +#include "change-table.h"
> +#include "commit.h"
> +#include "ref-filter.h"
> +#include "metacommit-parser.h"
> +
> +void change_table_init(struct change_table *to_initialize)
> +{
> +	memset(to_initialize, 0, sizeof(*to_initialize));
> +	mem_pool_init(&to_initialize->memory_pool, 0);
> +	to_initialize->memory_pool.block_alloc = 4*1024 - sizeof(struct mp_block);

If we're using a mempool to minimize the allocation overhead we should 
leave .block_alloc set to the default value of 1MB rather than changing 
it to 4kB

> +	oidmap_init(&to_initialize->oid_to_metadata_index, 0);
> +	string_list_init_dup(&to_initialize->refname_to_change_head);
> +}
> +
> +static void change_list_clear(struct change_list *to_clear) {
> +	string_list_clear(&to_clear->additional_refnames, 0);
> +}
> +
> +static void commit_change_list_entry_clear(
> +	struct commit_change_list_entry *to_clear) {
> +	change_list_clear(&to_clear->changes);
> +}
> +
> +void change_table_clear(struct change_table *to_clear)
> +{
> +	struct oidmap_iter iter;
> +	struct commit_change_list_entry *next;
> +	for (next = oidmap_iter_first(&to_clear->oid_to_metadata_index, &iter);
> +		next;
> +		next = oidmap_iter_next(&iter)) {
> +
> +		commit_change_list_entry_clear(next);
> +	}
> +
> +	oidmap_free(&to_clear->oid_to_metadata_index, 0);
> +	string_list_clear(&to_clear->refname_to_change_head, 0);
> +	mem_pool_discard(&to_clear->memory_pool, 0);
> +}
> +
> +static void add_head_to_commit(struct change_table *to_modify,
> +	const struct object_id *to_add, const char *refname)

I found the function and argument names rather confusing. If I've 
understood the code correctly then this function is adding an assoation 
between the commit "to_add" and "refname". Despite its name "to_add" may 
already exist in the change table.

The formatting is a bit off as well (as are most of the function 
declarations in this patch and the next), we'd write that as

static void add_head_to_commit(struct change_table *table,
			       const struct object_id *to_add,
			       const char *refname)

> +{
> +	struct commit_change_list_entry *entry;
> +
> +	/**
> +	 * Note: the indices in the map are 1-based. 0 is used to indicate a missing
> +	 * element.
> +	 */

I'm confused by this comment, what indices is it talking about?

> +	entry = oidmap_get(&to_modify->oid_to_metadata_index, to_add);
> +	if (!entry) {
> +		entry = mem_pool_calloc(&to_modify->memory_pool, 1,
> +			sizeof(*entry));
> +		oidcpy(&entry->entry.oid, to_add);
> +		oidmap_put(&to_modify->oid_to_metadata_index, entry);
> +		string_list_init_nodup(&entry->changes.additional_refnames);
> +	}
> +
> +	if (!entry->changes.first_refname)
> +		entry->changes.first_refname = refname;
> +	else
> +		string_list_insert(&entry->changes.additional_refnames, refname);

This is an example of the complexity added by the current definition of 
struct change_list.

> +void change_table_add(struct change_table *to_modify, const char *refname,
> +	struct commit *to_add)
> +{
> +	struct change_head *new_head;
> +	struct string_list_item *new_item;
> +	int metacommit_type;
> +
> +	new_head = mem_pool_calloc(&to_modify->memory_pool, 1,
> +		sizeof(*new_head));
> +
> +	oidcpy(&new_head->head, &to_add->object.oid);
> +
> +	metacommit_type = get_metacommit_content(to_add, &new_head->content);
> +	if (metacommit_type == METACOMMIT_TYPE_NONE)
> +		oidcpy(&new_head->content, &to_add->object.oid);

If to_add is not a metacommit then the content is to_add itself, 
otherwise it will have been set by the call to get_metacommit_content().

> +	new_head->abandoned = (metacommit_type == METACOMMIT_TYPE_ABANDONED);

Style: I don't think we normally bother with parentheses here

> +	new_head->remote = starts_with(refname, "refs/remote/");
> +	new_head->hidden = starts_with(refname, "refs/hiddenmetas/");
> +
> +	new_item = string_list_insert(&to_modify->refname_to_change_head, refname);
> +	new_item->util = new_head;
> +	/* Use pointers to the copy of the string we're retaining locally */

string_list_insert() copied the string and we're using that copy. Saying 
we're retaining it locally when it will outlive this function call is 
confusing.

> +	refname = new_item->string;
> +
> +	if (!oideq(&new_head->content, &new_head->head))
> +		add_head_to_commit(to_modify, &new_head->content, refname);

If to_add is a metacommit then we remember the link between refname and 
the content commit.

> +	add_head_to_commit(to_modify, &new_head->head, refname);

We also remember the link between refname and to_add

> +}
> +
> +void change_table_add_all_visible(struct change_table *to_modify,
> +	struct repository* repo)
> +{
> +	struct ref_filter filter;

rather than using memset we'd write (the same goes for all the other 
memset() calls in this series, unless they're operation on a heap 
allocation)

	struct ref_filter filter = { 0 };

> +	const char *name_patterns[] = {NULL};
> +	memset(&filter, 0, sizeof(filter));
> +	filter.kind = FILTER_REFS_CHANGES;
> +	filter.name_patterns = name_patterns;
> +
> +	change_table_add_matching_filter(to_modify, repo, &filter);
> +}
> +
> +void change_table_add_matching_filter(struct change_table *to_modify,
> +	struct repository* repo, struct ref_filter *filter)
> +{
> +	struct ref_array matching_refs;
> +	int i;
> +
> +	memset(&matching_refs, 0, sizeof(matching_refs));
> +	filter_refs(&matching_refs, filter, filter->kind);
> +
> +	/**
> +	 * Determine the object id for the latest content commit for each change.
> +	 * Fetch the commit at the head of each change ref. If it's a normal commit,
> +	 * that's the commit we want. If it's a metacommit, locate its content parent
> +	 * and use that.
> +	 */
> +
> +	for (i = 0; i < matching_refs.nr; i++) {
> +		struct ref_array_item *item = matching_refs.items[i];
> +		struct commit *commit = item->commit;
> +
> +		commit = lookup_commit_reference_gently(repo, &item->objectname, 1);

We're assigning commit twice - why do we need to look it up if 
filter_refs returns it?

There are a number of places where we call 
lookup_commit_reference_gently(..., 1) to silence the warning if the 
objectname does not dereference to a commit. It is not clear to me that 
we want to hide those errors. Indeed I think we should be doing

		commit = lookup_commit_reference(repo, oid)
		if (!commit)
			BUG("commit missing ...")

unless there is a good reason that the lookup can fail.

> +		if (commit)
> +			change_table_add(to_modify, item->refname, commit);
> +	}
> +
> +	ref_array_clear(&matching_refs);
> +}

> +int for_each_change_referencing(struct change_table *table,
> +	const struct object_id *referenced_commit_id, each_change_fn fn, void *cb_data)
> +{
> +	const struct change_list *changes;
> +	int i;
> +	int retvalue;

We normally use "ret" for this

> +	struct commit_change_list_entry *entry;
> +
> +	entry = oidmap_get(&table->oid_to_metadata_index,
> +		referenced_commit_id);

This should be indented to start below the '(' of the function call.

> +	/* If this commit isn't referenced by any changes, it won't be in the map */
> +	if (!entry)
> +		return 0;
> +	changes = &entry->changes;
> +	if (!changes->first_refname)
> +		return 0;
> +	retvalue = fn(changes->first_refname, cb_data);
> +	for (i = 0; retvalue == 0 && i < changes->additional_refnames.nr; i++)
> +		retvalue = fn(changes->additional_refnames.items[i].string, cb_data);

Using an strset for struct change_list would simplify this

> +	return retvalue;
> +}
> +
> +struct change_head* get_change_head(struct change_table *heads,
> +	const char* refname)
> +{
> +	struct string_list_item *item = string_list_lookup(
> +		&heads->refname_to_change_head, refname);
> +
> +	if (!item)
> +		return NULL;
> +
> +	return (struct change_head *)item->util;

We don't bother with casting void* pointers like this. In any case this 
whole function could become

	return strmap_get(table, refname)

if we used an strmap instead of a string_list.


Aside from the style issues and using api's that have been added since 
Stefan wrote these patches this looks pretty sound. The only thing I 
don't really get why the public api allows normal commits to be added to 
the change table (I can see why we might want to add the content commit 
as well when we add a metacommit but that should be done internally)

Best Wishes

Phillip

Ævar Arnfjörð Bjarmason Sept. 27, 2022, 1:50 p.m. UTC | #2

On Tue, Sep 27 2022, Phillip Wood wrote:

> On 23/09/2022 19:55, Stefan Xenos via GitGitGadget wrote:
>> From: Stefan Xenos <sxenos@google.com>
>> A change table stores a list of changes, and supports efficient
>> lookup
>> from a commit hash to the list of changes that reference that commit
>> directly.
>> It can be used to look up content commits or metacommits at the head
>> of a change, but does not support lookup of commits referenced as part
>> of the commit history.
>> Signed-off-by: Stefan Xenos <sxenos@google.com>
>> Signed-off-by: Chris Poucet <poucet@google.com>
>
>> diff --git a/change-table.h b/change-table.h
>> new file mode 100644
>> index 00000000000..166b5ed8073
>> --- /dev/null
>> +++ b/change-table.h
>> @@ -0,0 +1,132 @@
>> +#ifndef CHANGE_TABLE_H
>> +#define CHANGE_TABLE_H
>> +
>> +#include "oidmap.h"
>> +
>> +struct commit;
>> +struct ref_filter;
>> +
>> +/**
>
> We tend to just use '/*' rather than '/**'

No, we use both, and /** is correct here. It's an API-doc syntax, see
e.g. strbuf.h.

It's explicitly meant for this sort of thing, i.e. comments on public
structs in a header & the functions in a header (and struct members,
etc.).

>> + * This struct holds a list of change refs. The first element is
>   stored inline,
>> + * to optimize for small lists.
>> + */
>> +struct change_list {
>> +	/**
>> +	 * Ref name for the first change in the list, or null if none.
>> +	 *
>> +	 * This field is private. Use for_each_change_in to read.
>> +	 */
>> +	const char* first_refname;
>> +	/**

While we're on nits we tend to add an extra \n before the next API
comment...

Phillip Wood Sept. 27, 2022, 2:13 p.m. UTC | #3

On 27/09/2022 14:50, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Sep 27 2022, Phillip Wood wrote:
> 
>> On 23/09/2022 19:55, Stefan Xenos via GitGitGadget wrote:
>>> From: Stefan Xenos <sxenos@google.com>
>>> A change table stores a list of changes, and supports efficient
>>> lookup
>>> from a commit hash to the list of changes that reference that commit
>>> directly.
>>> It can be used to look up content commits or metacommits at the head
>>> of a change, but does not support lookup of commits referenced as part
>>> of the commit history.
>>> Signed-off-by: Stefan Xenos <sxenos@google.com>
>>> Signed-off-by: Chris Poucet <poucet@google.com>
>>
>>> diff --git a/change-table.h b/change-table.h
>>> new file mode 100644
>>> index 00000000000..166b5ed8073
>>> --- /dev/null
>>> +++ b/change-table.h
>>> @@ -0,0 +1,132 @@
>>> +#ifndef CHANGE_TABLE_H
>>> +#define CHANGE_TABLE_H
>>> +
>>> +#include "oidmap.h"
>>> +
>>> +struct commit;
>>> +struct ref_filter;
>>> +
>>> +/**
>>
>> We tend to just use '/*' rather than '/**'
> 
> No, we use both, and /** is correct here. It's an API-doc syntax, see
> e.g. strbuf.h.
> 
> It's explicitly meant for this sort of thing, i.e. comments on public
> structs in a header & the functions in a header (and struct members,
> etc.).

We don't do that consistently, we don't mention them in CodingGuidelines 
and we don't use anything that processes API-doc comments. It would be a 
lot simpler and it would be consistent with our coding guidelines just 
to use the same style everywhere. That would avoid problems where this 
series uses API-doc comments for in-code comments in .c files and where 
single line comments in header files do not use the API-doc syntax.

Best Wishes

Phillip

>>> + * This struct holds a list of change refs. The first element is
>>    stored inline,
>>> + * to optimize for small lists.
>>> + */
>>> +struct change_list {
>>> +	/**
>>> +	 * Ref name for the first change in the list, or null if none.
>>> +	 *
>>> +	 * This field is private. Use for_each_change_in to read.
>>> +	 */
>>> +	const char* first_refname;
>>> +	/**
> 
> While we're on nits we tend to add an extra \n before the next API
> comment...

Phillip Wood Sept. 27, 2022, 2:18 p.m. UTC | #4

On 27/09/2022 14:27, Phillip Wood wrote:
>> +    /**
>> +     * Determine the object id for the latest content commit for each 
>> change.
>> +     * Fetch the commit at the head of each change ref. If it's a 
>> normal commit,
>> +     * that's the commit we want. If it's a metacommit, locate its 
>> content parent
>> +     * and use that.
>> +     */
>> +
>> +    for (i = 0; i < matching_refs.nr; i++) {
>> +        struct ref_array_item *item = matching_refs.items[i];
>> +        struct commit *commit = item->commit;
>> +
>> +        commit = lookup_commit_reference_gently(repo, 
>> &item->objectname, 1);
> 
> We're assigning commit twice - why do we need to look it up if 
> filter_refs returns it?

I think we want to look it up to check that item->objectname is a 
commit. item->commit is not filled unless we specify the verbose flag 
and I'm not sure what the implications of setting that are. If we get an 
objectname that does not name a commit we should call BUG() as suggested 
below.

> There are a number of places where we call 
> lookup_commit_reference_gently(..., 1) to silence the warning if the 
> objectname does not dereference to a commit. It is not clear to me that 
> we want to hide those errors. Indeed I think we should be doing
> 
>          commit = lookup_commit_reference(repo, oid)
>          if (!commit)
>              BUG("commit missing ...")
> 
> unless there is a good reason that the lookup can fail.

Ævar Arnfjörð Bjarmason Sept. 27, 2022, 3:28 p.m. UTC | #5

On Tue, Sep 27 2022, Phillip Wood wrote:

> On 27/09/2022 14:50, Ævar Arnfjörð Bjarmason wrote:
>> On Tue, Sep 27 2022, Phillip Wood wrote:
>> 
>>> On 23/09/2022 19:55, Stefan Xenos via GitGitGadget wrote:
>>>> From: Stefan Xenos <sxenos@google.com>
>>>> A change table stores a list of changes, and supports efficient
>>>> lookup
>>>> from a commit hash to the list of changes that reference that commit
>>>> directly.
>>>> It can be used to look up content commits or metacommits at the head
>>>> of a change, but does not support lookup of commits referenced as part
>>>> of the commit history.
>>>> Signed-off-by: Stefan Xenos <sxenos@google.com>
>>>> Signed-off-by: Chris Poucet <poucet@google.com>
>>>
>>>> diff --git a/change-table.h b/change-table.h
>>>> new file mode 100644
>>>> index 00000000000..166b5ed8073
>>>> --- /dev/null
>>>> +++ b/change-table.h
>>>> @@ -0,0 +1,132 @@
>>>> +#ifndef CHANGE_TABLE_H
>>>> +#define CHANGE_TABLE_H
>>>> +
>>>> +#include "oidmap.h"
>>>> +
>>>> +struct commit;
>>>> +struct ref_filter;
>>>> +
>>>> +/**
>>>
>>> We tend to just use '/*' rather than '/**'
>> No, we use both, and /** is correct here. It's an API-doc syntax,
>> see
>> e.g. strbuf.h.
>> It's explicitly meant for this sort of thing, i.e. comments on
>> public
>> structs in a header & the functions in a header (and struct members,
>> etc.).
>
> We don't do that consistently, we don't mention them in
> CodingGuidelines and we don't use anything that processes API-doc
> comments. It would be a lot simpler and it would be consistent with
> our coding guidelines just to use the same style everywhere. That
> would avoid problems where this series uses API-doc comments for
> in-code comments in .c files and where single line comments in header
> files do not use the API-doc syntax.

Yes, this isn't documented in CodingGuidelines (but FWIW is in various
commit messages).

I'm pointing out that this isn't a mistake, but the preferred style for
new API docs.

At least Emacs knows how to highlight these differently, which is the
main use I personally get out of them, I don't know what other use-cases
there are for them...

Phillip Wood Sept. 28, 2022, 2:33 p.m. UTC | #6

On 27/09/2022 16:28, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Sep 27 2022, Phillip Wood wrote:
> 
>> On 27/09/2022 14:50, Ævar Arnfjörð Bjarmason wrote:
>>> On Tue, Sep 27 2022, Phillip Wood wrote:
>>>
>>>> On 23/09/2022 19:55, Stefan Xenos via GitGitGadget wrote:
>>>>> +/**
>>>>
>>>> We tend to just use '/*' rather than '/**'
>>> No, we use both, and /** is correct here. It's an API-doc syntax,
>>> see
>>> e.g. strbuf.h.
>>> It's explicitly meant for this sort of thing, i.e. comments on
>>> public
>>> structs in a header & the functions in a header (and struct members,
>>> etc.).
>>
>> We don't do that consistently, we don't mention them in
>> CodingGuidelines and we don't use anything that processes API-doc
>> comments. It would be a lot simpler and it would be consistent with
>> our coding guidelines just to use the same style everywhere. That
>> would avoid problems where this series uses API-doc comments for
>> in-code comments in .c files and where single line comments in header
>> files do not use the API-doc syntax.
> 
> Yes, this isn't documented in CodingGuidelines (but FWIW is in various
> commit messages).
> 
> I'm pointing out that this isn't a mistake, but the preferred style for
> new API docs.

It seems a bit a stretch to call it the preferred style, however I had 
thought all our uses were historic but that's not the case.

Chris if you want to use '/**' style comments for the API docs then 
please do so consistently and do not use them elsewhere.

> At least Emacs knows how to highlight these differently, which is the
> main use I personally get out of them, I don't know what other use-cases
> there are for them...

I've come across them in projects that use gtk-doc or other 
documentation generators where it is necessary to distinguish 
'documentation' from 'code comments'. I don't think they add much value 
if one is not generating documentation from the source, it is just one 
more thing for contributors to remember.

Best Wishes

Phillip

Ævar Arnfjörð Bjarmason Sept. 28, 2022, 3:14 p.m. UTC | #7

On Wed, Sep 28 2022, Phillip Wood wrote:

> On 27/09/2022 16:28, Ævar Arnfjörð Bjarmason wrote:
>> On Tue, Sep 27 2022, Phillip Wood wrote:
>> 
>>> On 27/09/2022 14:50, Ævar Arnfjörð Bjarmason wrote:
>>>> On Tue, Sep 27 2022, Phillip Wood wrote:
>>>>
>>>>> On 23/09/2022 19:55, Stefan Xenos via GitGitGadget wrote:
>>>>>> +/**
>>>>>
>>>>> We tend to just use '/*' rather than '/**'
>>>> No, we use both, and /** is correct here. It's an API-doc syntax,
>>>> see
>>>> e.g. strbuf.h.
>>>> It's explicitly meant for this sort of thing, i.e. comments on
>>>> public
>>>> structs in a header & the functions in a header (and struct members,
>>>> etc.).
>>>
>>> We don't do that consistently, we don't mention them in
>>> CodingGuidelines and we don't use anything that processes API-doc
>>> comments. It would be a lot simpler and it would be consistent with
>>> our coding guidelines just to use the same style everywhere. That
>>> would avoid problems where this series uses API-doc comments for
>>> in-code comments in .c files and where single line comments in header
>>> files do not use the API-doc syntax.
>> Yes, this isn't documented in CodingGuidelines (but FWIW is in
>> various
>> commit messages).
>> I'm pointing out that this isn't a mistake, but the preferred style
>> for
>> new API docs.
>
> It seems a bit a stretch to call it the preferred style, however I had
> thought all our uses were historic but that's not the case.

FWIW that claim of mine comes from my (admittedly fuzzy recollection of)
history, around the time that this was being worked out (e.g. where API
docs should live, they used to mostly be in Documentation/technical/,
now they're mostly in *.h) we ended up with this mention in CodingGudielines:

 - When you come up with an API, document its functions and structures
   in the header file that exposes the API to its callers. Use what is
   in "strbuf.h" as a model for the appropriate tone and level of
   detail.

I.e. at some point we decreed strbuf.h as a best-practice model to
follow. I think every one of my own use of "/**" has come after reading
that file...

But patches to make it more explicit are most welcome. FWIW I think I
looked into that once, but couldn't find a canonical reference for what
this syntax is even called.

Hrm, looking a bit more I see it's probably JavaDoc (at least Emacs
highlights it as such). A grep of:

	git grep '^ \* @' -- '*.[ch]'

Shows that we make almost no use of the meta-syntax (i.e. "tags":
https://en.wikipedia.org/wiki/Javadoc#Table_of_Javadoc_tags)

> Chris if you want to use '/**' style comments for the API docs then
> please do so consistently and do not use them elsewhere.
>
>> At least Emacs knows how to highlight these differently, which is the
>> main use I personally get out of them, I don't know what other use-cases
>> there are for them...
>
> I've come across them in projects that use gtk-doc or other
> documentation generators where it is necessary to distinguish 
> 'documentation' from 'code comments'. I don't think they add much
> value if one is not generating documentation from the source, it is
> just one more thing for contributors to remember.

I for one find it very useful these "API docs" are shown differently in
my editor, even if I'm not using some fancier (and hypothetical) "make
javadoc" target.

I.e. you can ideally skim the *.h file and see at a glance what's an
implementation comment v.s. API doc comment.

Except that even for the supposed role model of strbuf.h we forgot in
some cases, and should have the below applied to it, oh well... :)

diff --git a/strbuf.h b/strbuf.h
index 76965a17d44..46549986ae3 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -492,7 +492,7 @@ int strbuf_getline_lf(struct strbuf *sb, FILE *fp);
 /* Uses NUL as the line terminator */
 int strbuf_getline_nul(struct strbuf *sb, FILE *fp);
 
-/*
+/**
  * Similar to strbuf_getline_lf(), but additionally treats a CR that
  * comes immediately before the LF as part of the terminator.
  * This is the most friendly version to be used to read "text" files
@@ -610,7 +610,7 @@ static inline struct strbuf **strbuf_split(const struct strbuf *sb,
 	return strbuf_split_max(sb, terminator, 0);
 }
 
-/*
+/**
  * Adds all strings of a string list to the strbuf, separated by the given
  * separator.  For example, if sep is
  *   ', '
@@ -653,7 +653,7 @@ int launch_editor(const char *path, struct strbuf *buffer,
 int launch_sequence_editor(const char *path, struct strbuf *buffer,
 			   const char *const *env);
 
-/*
+/**
  * In contrast to `launch_editor()`, this function writes out the contents
  * of the specified file first, then clears the `buffer`, then launches
  * the editor and reads back in the file contents into the `buffer`.
@@ -693,7 +693,7 @@ static inline void strbuf_complete_line(struct strbuf *sb)
 	strbuf_complete(sb, '\n');
 }
 
-/*
+/**
  * Copy "name" to "sb", expanding any special @-marks as handled by
  * interpret_branch_name(). The result is a non-qualified branch name
  * (so "foo" or "origin/master" instead of "refs/heads/foo" or
@@ -707,7 +707,7 @@ static inline void strbuf_complete_line(struct strbuf *sb)
 void strbuf_branchname(struct strbuf *sb, const char *name,
 		       unsigned allowed);
 
-/*
+/**
  * Like strbuf_branchname() above, but confirm that the result is
  * syntactically valid to be used as a local branch name in refs/heads/.
  *

Junio C Hamano Sept. 28, 2022, 3:59 p.m. UTC | #8

Phillip Wood <phillip.wood123@gmail.com> writes:

> Chris if you want to use '/**' style comments for the API docs then
> please do so consistently and do not use them elsewhere.

Thanks.

> I've come across them in projects that use gtk-doc or other
> documentation generators where it is necessary to distinguish
> 'documentation' from 'code comments'. I don't think they add much
> value if one is not generating documentation from the source, it is
> just one more thing for contributors to remember.

Yes, I personally find them annoying, but has tolerated them so far,
hoping that something good (read: automated documentation out of
comments) emerge someday, simply because the first ones were added
by folks who were interested in that direction, which unfortunately
has never materialized.

Maybe it would eventually happen, but I think there are a lot of
clean-up to do before it happens.  I somehow suspect that the sooner
the mechanism to create the documentation set, however incomplete
and messy the result is with the current material, the more incentive
the contributors have to apply /** vs /* distinction properly, but
of course on the other hand, until the existing material gets
cleaned up, the care they need to take to make the distinction does
feel like a makework.  So, I dunno.

Thanks.

Chris P Oct. 4, 2022, 2:48 p.m. UTC | #9

>  > +/**
>
> We tend to just use '/*' rather than '/**'

It seems there's some disagreement on this. Regardless, I changed the
ones in the implementation to be "/*"

>
>  > + * This struct holds a list of change refs. The first element is
> stored inline,
>  > + * to optimize for small lists.
>  > + */
>  > +struct change_list {
>  > +    /**
>  > +     * Ref name for the first change in the list, or null if none.
>  > +     *
>  > +     * This field is private. Use for_each_change_in to read.
>  > +     */
>  > +    const char* first_refname;
>  > +    /**
>  > +     * List of additional change refs. Note that this is empty if the list
>  > +     * contains 0 or 1 elements.
>  > +     *
>  > +     * This field is private. Use for_each_change_in to read.
>  > +     */
>  > +    struct string_list additional_refnames;
>
> Splitting this feels like a premature optimization. We don't have any
> tests yet, let alone any real-world experience using this code. Also if
> we want to save memory for lists with a single entry why are we
> embedding the struct string_list rather than just storing a pointer to it?

Agreed, simplified to a strset. Thanks for the suggestion.

>
> I think it would be simpler to use a struct strset to hold the refnames
> as we don't need the util field offered by struct string_list.

Done.

>
>  > +/**
>  > + * Holds information about the head of a single change.
>  > + */
>  > +struct change_head {
>  > +    /**
>  > +     * The location pointed to by the head of the change. May be a
> commit or a
>  > +     * metacommit.
>  > +     */
>  > +    struct object_id head;
>
> I found this duality between commits and metacommits rather confusing -
> why isn't the head always a metacommit?

There is no reason to create a metacommit for the first commit you create.
You only need one if you're replacing a commit with another commit.

>
>  > +/**
>  > + * Holds information about the heads of each change, and permits
> effecient
>
> s/effecient/efficient/

Done.

>
>  > + * lookup from a commit to the changes that reference it directly.
>  > + *
>  > + * All fields should be considered private. Use the change_table
> functions
>  > + * to interact with this struct.
>  > + */
>  > +struct change_table {
>  > +    /**
>  > +     * Memory pool for the objects allocated by the change table.
>  > +     */
>  > +    struct mem_pool memory_pool;
>  > +    /* Map object_id to commit_change_list_entry structs. */
>  > +    struct oidmap oid_to_metadata_index;
>  > +    /**
>  > +     * List of ref names. The util value points to a change_head structure
>  > +     * allocated from memory_pool.
>  > +     */
>  > +    struct string_list refname_to_change_head;
>
> I think these days we'd use a strmap for this for O(1) lookups.

Way better!

>
>  > +};
>  > +
>  > +extern void change_table_init(struct change_table *to_initialize);
>
> The struct change_table argument to all these functions changes its name
> more often than a criminal on the run. I would find it much easier to
> follow the code if we consistently called this argument "table"

Agreed, changed them all to "table".

>
>  > + * Adds all changes matching the given ref filter to the given
> change_table
>  > + * struct.
>  > + */
>  > +extern void change_table_add_matching_filter(struct change_table
> *to_modify,
>  > +    struct repository* repo, struct ref_filter *filter);
>
> I can't see any callers outside of change-table.c so do we really need
> to export this function.

Thanks for verifying, done.

> > +
> > +void change_table_init(struct change_table *to_initialize)
> > +{
> > +     memset(to_initialize, 0, sizeof(*to_initialize));
> > +     mem_pool_init(&to_initialize->memory_pool, 0);
> > +     to_initialize->memory_pool.block_alloc = 4*1024 - sizeof(struct mp_block);
>
> If we're using a mempool to minimize the allocation overhead we should
> leave .block_alloc set to the default value of 1MB rather than changing
> it to 4kB

Good question, I don't know the typical sizes that we'll get for these,
so for now just sticking with the default seems sensible.

> > +
> > +static void add_head_to_commit(struct change_table *to_modify,
> > +     const struct object_id *to_add, const char *refname)
>
> I found the function and argument names rather confusing. If I've
> understood the code correctly then this function is adding an assoation
> between the commit "to_add" and "refname". Despite its name "to_add" may
> already exist in the change table.
>
> The formatting is a bit off as well (as are most of the function
> declarations in this patch and the next), we'd write that as
>
> static void add_head_to_commit(struct change_table *table,
>                                const struct object_id *to_add,
>                                const char *refname)

Thanks, I wasn't clear on the guidelines. I hope the new format makes
more sense.

>
> > +{
> > +     struct commit_change_list_entry *entry;
> > +
> > +     /**
> > +      * Note: the indices in the map are 1-based. 0 is used to indicate a missing
> > +      * element.
> > +      */
>
> I'm confused by this comment, what indices is it talking about?

No idea, removed.

> > +
> > +     if (!entry->changes.first_refname)
> > +             entry->changes.first_refname = refname;
> > +     else
> > +             string_list_insert(&entry->changes.additional_refnames, refname);
>
> This is an example of the complexity added by the current definition of
> struct change_list.

Yes, simplified.

>
> > +void change_table_add(struct change_table *to_modify, const char *refname,
> > +     struct commit *to_add)
> > +{
> > +     struct change_head *new_head;
> > +     struct string_list_item *new_item;
> > +     int metacommit_type;
> > +
> > +     new_head = mem_pool_calloc(&to_modify->memory_pool, 1,
> > +             sizeof(*new_head));
> > +
> > +     oidcpy(&new_head->head, &to_add->object.oid);
> > +
> > +     metacommit_type = get_metacommit_content(to_add, &new_head->content);
> > +     if (metacommit_type == METACOMMIT_TYPE_NONE)
> > +             oidcpy(&new_head->content, &to_add->object.oid);
>
> If to_add is not a metacommit then the content is to_add itself,
> otherwise it will have been set by the call to get_metacommit_content().

Yes, added the comment.

>
> > +     new_head->abandoned = (metacommit_type == METACOMMIT_TYPE_ABANDONED);
>
> Style: I don't think we normally bother with parentheses here

I admit I prefer it here because operator priority isn't always
obvious (it could be read as
(new_head->abandoned = metacommit_type) == METACOMMIT_TYPE_ABANDONED;

>
> > +     new_head->remote = starts_with(refname, "refs/remote/");
> > +     new_head->hidden = starts_with(refname, "refs/hiddenmetas/");
> > +
> > +     new_item = string_list_insert(&to_modify->refname_to_change_head, refname);
> > +     new_item->util = new_head;
> > +     /* Use pointers to the copy of the string we're retaining locally */
>
> string_list_insert() copied the string and we're using that copy. Saying
> we're retaining it locally when it will outlive this function call is
> confusing.

This is now obsolete with the move to strmap.

>
> > +     refname = new_item->string;
> > +
> > +     if (!oideq(&new_head->content, &new_head->head))
> > +             add_head_to_commit(to_modify, &new_head->content, refname);
>
> If to_add is a metacommit then we remember the link between refname and
> the content commit.
>
> > +     add_head_to_commit(to_modify, &new_head->head, refname);
>
> We also remember the link between refname and to_add

Thanks, added the comment.

>
> > +}
> > +
> > +void change_table_add_all_visible(struct change_table *to_modify,
> > +     struct repository* repo)
> > +{
> > +     struct ref_filter filter;
>
> rather than using memset we'd write (the same goes for all the other
> memset() calls in this series, unless they're operation on a heap
> allocation)
>
>         struct ref_filter filter = { 0 };

Thanks, I wasn't aware of that trick.

>
> > +     const char *name_patterns[] = {NULL};
> > +     memset(&filter, 0, sizeof(filter));
> > +     filter.kind = FILTER_REFS_CHANGES;
> > +     filter.name_patterns = name_patterns;
> > +
> > +     change_table_add_matching_filter(to_modify, repo, &filter);
> > +}
> > +
> > +void change_table_add_matching_filter(struct change_table *to_modify,
> > +     struct repository* repo, struct ref_filter *filter)
> > +{
> > +     struct ref_array matching_refs;
> > +     int i;
> > +
> > +     memset(&matching_refs, 0, sizeof(matching_refs));
> > +     filter_refs(&matching_refs, filter, filter->kind);
> > +
> > +     /**
> > +      * Determine the object id for the latest content commit for each change.
> > +      * Fetch the commit at the head of each change ref. If it's a normal commit,
> > +      * that's the commit we want. If it's a metacommit, locate its content parent
> > +      * and use that.
> > +      */
> > +
> > +     for (i = 0; i < matching_refs.nr; i++) {
> > +             struct ref_array_item *item = matching_refs.items[i];
> > +             struct commit *commit = item->commit;
> > +
> > +             commit = lookup_commit_reference_gently(repo, &item->objectname, 1);
>
> We're assigning commit twice - why do we need to look it up if
> filter_refs returns it?

I think this is a case of missing logic if you look at what the
comment above it says.

>
> There are a number of places where we call
> lookup_commit_reference_gently(..., 1) to silence the warning if the
> objectname does not dereference to a commit. It is not clear to me that
> we want to hide those errors. Indeed I think we should be doing

Agreed, move to this.

>
>                 commit = lookup_commit_reference(repo, oid)
>                 if (!commit)
>                         BUG("commit missing ...")
>
> unless there is a good reason that the lookup can fail.

I can't think of any but then I'm not the original author.

>
> > +             if (commit)
> > +                     change_table_add(to_modify, item->refname, commit);
> > +     }
> > +
> > +     ref_array_clear(&matching_refs);
> > +}
>
> > +int for_each_change_referencing(struct change_table *table,
> > +     const struct object_id *referenced_commit_id, each_change_fn fn, void *cb_data)
> > +{
> > +     const struct change_list *changes;
> > +     int i;
> > +     int retvalue;
>
> We normally use "ret" for this

Done.

>
> > +     struct commit_change_list_entry *entry;
> > +
> > +     entry = oidmap_get(&table->oid_to_metadata_index,
> > +             referenced_commit_id);
>
> This should be indented to start below the '(' of the function call.

Done.

>
> > +     /* If this commit isn't referenced by any changes, it won't be in the map */
> > +     if (!entry)
> > +             return 0;
> > +     changes = &entry->changes;
> > +     if (!changes->first_refname)
> > +             return 0;
> > +     retvalue = fn(changes->first_refname, cb_data);
> > +     for (i = 0; retvalue == 0 && i < changes->additional_refnames.nr; i++)
> > +             retvalue = fn(changes->additional_refnames.items[i].string, cb_data);
>
> Using an strset for struct change_list would simplify this

Agreed! Simplified.

>
> > +     return retvalue;
> > +}
> > +
> > +struct change_head* get_change_head(struct change_table *heads,
> > +     const char* refname)
> > +{
> > +     struct string_list_item *item = string_list_lookup(
> > +             &heads->refname_to_change_head, refname);
> > +
> > +     if (!item)
> > +             return NULL;
> > +
> > +     return (struct change_head *)item->util;
>
> We don't bother with casting void* pointers like this. In any case this
> whole function could become
>
>         return strmap_get(table, refname)
>
> if we used an strmap instead of a string_list.
>
>
> Aside from the style issues and using api's that have been added since
> Stefan wrote these patches this looks pretty sound. The only thing I
> don't really get why the public api allows normal commits to be added to
> the change table (I can see why we might want to add the content commit
> as well when we add a metacommit but that should be done internally)
>
> Best Wishes
>
> Phillip

[05/10] evolve: add the change-table structure

Commit Message

Comments

Patch