Message ID | 2b3a00a6702eb8fb12e45b833ca74155939588ef.1663959325.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Add the Git Change command | expand |
Hi Chris On 23/09/2022 19:55, Stefan Xenos via GitGitGadget wrote: > From: Stefan Xenos <sxenos@google.com> > > A change table stores a list of changes, and supports efficient lookup > from a commit hash to the list of changes that reference that commit > directly. > > It can be used to look up content commits or metacommits at the head > of a change, but does not support lookup of commits referenced as part > of the commit history. > > Signed-off-by: Stefan Xenos <sxenos@google.com> > Signed-off-by: Chris Poucet <poucet@google.com> > diff --git a/change-table.h b/change-table.h > new file mode 100644 > index 00000000000..166b5ed8073 > --- /dev/null > +++ b/change-table.h > @@ -0,0 +1,132 @@ > +#ifndef CHANGE_TABLE_H > +#define CHANGE_TABLE_H > + > +#include "oidmap.h" > + > +struct commit; > +struct ref_filter; > + > +/** We tend to just use '/*' rather than '/**' > + * This struct holds a list of change refs. The first element is stored inline, > + * to optimize for small lists. > + */ > +struct change_list { > + /** > + * Ref name for the first change in the list, or null if none. > + * > + * This field is private. Use for_each_change_in to read. > + */ > + const char* first_refname; > + /** > + * List of additional change refs. Note that this is empty if the list > + * contains 0 or 1 elements. > + * > + * This field is private. Use for_each_change_in to read. > + */ > + struct string_list additional_refnames; Splitting this feels like a premature optimization. We don't have any tests yet, let alone any real-world experience using this code. Also if we want to save memory for lists with a single entry why are we embedding the struct string_list rather than just storing a pointer to it? I think it would be simpler to use a struct strset to hold the refnames as we don't need the util field offered by struct string_list. > +/** > + * Holds information about the head of a single change. > + */ > +struct change_head { > + /** > + * The location pointed to by the head of the change. May be a commit or a > + * metacommit. > + */ > + struct object_id head; I found this duality between commits and metacommits rather confusing - why isn't the head always a metacommit? > +/** > + * Holds information about the heads of each change, and permits effecient s/effecient/efficient/ > + * lookup from a commit to the changes that reference it directly. > + * > + * All fields should be considered private. Use the change_table functions > + * to interact with this struct. > + */ > +struct change_table { > + /** > + * Memory pool for the objects allocated by the change table. > + */ > + struct mem_pool memory_pool; > + /* Map object_id to commit_change_list_entry structs. */ > + struct oidmap oid_to_metadata_index; > + /** > + * List of ref names. The util value points to a change_head structure > + * allocated from memory_pool. > + */ > + struct string_list refname_to_change_head; I think these days we'd use a strmap for this for O(1) lookups. > +}; > + > +extern void change_table_init(struct change_table *to_initialize); The struct change_table argument to all these functions changes its name more often than a criminal on the run. I would find it much easier to follow the code if we consistently called this argument "table" > + * Adds all changes matching the given ref filter to the given change_table > + * struct. > + */ > +extern void change_table_add_matching_filter(struct change_table *to_modify, > + struct repository* repo, struct ref_filter *filter); I can't see any callers outside of change-table.c so do we really need to export this function. > diff --git a/change-table.c b/change-table.c > new file mode 100644 > index 00000000000..c61ba29f1ed > --- /dev/null > +++ b/change-table.c > @@ -0,0 +1,179 @@ > +#include "cache.h" > +#include "change-table.h" > +#include "commit.h" > +#include "ref-filter.h" > +#include "metacommit-parser.h" > + > +void change_table_init(struct change_table *to_initialize) > +{ > + memset(to_initialize, 0, sizeof(*to_initialize)); > + mem_pool_init(&to_initialize->memory_pool, 0); > + to_initialize->memory_pool.block_alloc = 4*1024 - sizeof(struct mp_block); If we're using a mempool to minimize the allocation overhead we should leave .block_alloc set to the default value of 1MB rather than changing it to 4kB > + oidmap_init(&to_initialize->oid_to_metadata_index, 0); > + string_list_init_dup(&to_initialize->refname_to_change_head); > +} > + > +static void change_list_clear(struct change_list *to_clear) { > + string_list_clear(&to_clear->additional_refnames, 0); > +} > + > +static void commit_change_list_entry_clear( > + struct commit_change_list_entry *to_clear) { > + change_list_clear(&to_clear->changes); > +} > + > +void change_table_clear(struct change_table *to_clear) > +{ > + struct oidmap_iter iter; > + struct commit_change_list_entry *next; > + for (next = oidmap_iter_first(&to_clear->oid_to_metadata_index, &iter); > + next; > + next = oidmap_iter_next(&iter)) { > + > + commit_change_list_entry_clear(next); > + } > + > + oidmap_free(&to_clear->oid_to_metadata_index, 0); > + string_list_clear(&to_clear->refname_to_change_head, 0); > + mem_pool_discard(&to_clear->memory_pool, 0); > +} > + > +static void add_head_to_commit(struct change_table *to_modify, > + const struct object_id *to_add, const char *refname) I found the function and argument names rather confusing. If I've understood the code correctly then this function is adding an assoation between the commit "to_add" and "refname". Despite its name "to_add" may already exist in the change table. The formatting is a bit off as well (as are most of the function declarations in this patch and the next), we'd write that as static void add_head_to_commit(struct change_table *table, const struct object_id *to_add, const char *refname) > +{ > + struct commit_change_list_entry *entry; > + > + /** > + * Note: the indices in the map are 1-based. 0 is used to indicate a missing > + * element. > + */ I'm confused by this comment, what indices is it talking about? > + entry = oidmap_get(&to_modify->oid_to_metadata_index, to_add); > + if (!entry) { > + entry = mem_pool_calloc(&to_modify->memory_pool, 1, > + sizeof(*entry)); > + oidcpy(&entry->entry.oid, to_add); > + oidmap_put(&to_modify->oid_to_metadata_index, entry); > + string_list_init_nodup(&entry->changes.additional_refnames); > + } > + > + if (!entry->changes.first_refname) > + entry->changes.first_refname = refname; > + else > + string_list_insert(&entry->changes.additional_refnames, refname); This is an example of the complexity added by the current definition of struct change_list. > +void change_table_add(struct change_table *to_modify, const char *refname, > + struct commit *to_add) > +{ > + struct change_head *new_head; > + struct string_list_item *new_item; > + int metacommit_type; > + > + new_head = mem_pool_calloc(&to_modify->memory_pool, 1, > + sizeof(*new_head)); > + > + oidcpy(&new_head->head, &to_add->object.oid); > + > + metacommit_type = get_metacommit_content(to_add, &new_head->content); > + if (metacommit_type == METACOMMIT_TYPE_NONE) > + oidcpy(&new_head->content, &to_add->object.oid); If to_add is not a metacommit then the content is to_add itself, otherwise it will have been set by the call to get_metacommit_content(). > + new_head->abandoned = (metacommit_type == METACOMMIT_TYPE_ABANDONED); Style: I don't think we normally bother with parentheses here > + new_head->remote = starts_with(refname, "refs/remote/"); > + new_head->hidden = starts_with(refname, "refs/hiddenmetas/"); > + > + new_item = string_list_insert(&to_modify->refname_to_change_head, refname); > + new_item->util = new_head; > + /* Use pointers to the copy of the string we're retaining locally */ string_list_insert() copied the string and we're using that copy. Saying we're retaining it locally when it will outlive this function call is confusing. > + refname = new_item->string; > + > + if (!oideq(&new_head->content, &new_head->head)) > + add_head_to_commit(to_modify, &new_head->content, refname); If to_add is a metacommit then we remember the link between refname and the content commit. > + add_head_to_commit(to_modify, &new_head->head, refname); We also remember the link between refname and to_add > +} > + > +void change_table_add_all_visible(struct change_table *to_modify, > + struct repository* repo) > +{ > + struct ref_filter filter; rather than using memset we'd write (the same goes for all the other memset() calls in this series, unless they're operation on a heap allocation) struct ref_filter filter = { 0 }; > + const char *name_patterns[] = {NULL}; > + memset(&filter, 0, sizeof(filter)); > + filter.kind = FILTER_REFS_CHANGES; > + filter.name_patterns = name_patterns; > + > + change_table_add_matching_filter(to_modify, repo, &filter); > +} > + > +void change_table_add_matching_filter(struct change_table *to_modify, > + struct repository* repo, struct ref_filter *filter) > +{ > + struct ref_array matching_refs; > + int i; > + > + memset(&matching_refs, 0, sizeof(matching_refs)); > + filter_refs(&matching_refs, filter, filter->kind); > + > + /** > + * Determine the object id for the latest content commit for each change. > + * Fetch the commit at the head of each change ref. If it's a normal commit, > + * that's the commit we want. If it's a metacommit, locate its content parent > + * and use that. > + */ > + > + for (i = 0; i < matching_refs.nr; i++) { > + struct ref_array_item *item = matching_refs.items[i]; > + struct commit *commit = item->commit; > + > + commit = lookup_commit_reference_gently(repo, &item->objectname, 1); We're assigning commit twice - why do we need to look it up if filter_refs returns it? There are a number of places where we call lookup_commit_reference_gently(..., 1) to silence the warning if the objectname does not dereference to a commit. It is not clear to me that we want to hide those errors. Indeed I think we should be doing commit = lookup_commit_reference(repo, oid) if (!commit) BUG("commit missing ...") unless there is a good reason that the lookup can fail. > + if (commit) > + change_table_add(to_modify, item->refname, commit); > + } > + > + ref_array_clear(&matching_refs); > +} > +int for_each_change_referencing(struct change_table *table, > + const struct object_id *referenced_commit_id, each_change_fn fn, void *cb_data) > +{ > + const struct change_list *changes; > + int i; > + int retvalue; We normally use "ret" for this > + struct commit_change_list_entry *entry; > + > + entry = oidmap_get(&table->oid_to_metadata_index, > + referenced_commit_id); This should be indented to start below the '(' of the function call. > + /* If this commit isn't referenced by any changes, it won't be in the map */ > + if (!entry) > + return 0; > + changes = &entry->changes; > + if (!changes->first_refname) > + return 0; > + retvalue = fn(changes->first_refname, cb_data); > + for (i = 0; retvalue == 0 && i < changes->additional_refnames.nr; i++) > + retvalue = fn(changes->additional_refnames.items[i].string, cb_data); Using an strset for struct change_list would simplify this > + return retvalue; > +} > + > +struct change_head* get_change_head(struct change_table *heads, > + const char* refname) > +{ > + struct string_list_item *item = string_list_lookup( > + &heads->refname_to_change_head, refname); > + > + if (!item) > + return NULL; > + > + return (struct change_head *)item->util; We don't bother with casting void* pointers like this. In any case this whole function could become return strmap_get(table, refname) if we used an strmap instead of a string_list. Aside from the style issues and using api's that have been added since Stefan wrote these patches this looks pretty sound. The only thing I don't really get why the public api allows normal commits to be added to the change table (I can see why we might want to add the content commit as well when we add a metacommit but that should be done internally) Best Wishes Phillip
On Tue, Sep 27 2022, Phillip Wood wrote: > On 23/09/2022 19:55, Stefan Xenos via GitGitGadget wrote: >> From: Stefan Xenos <sxenos@google.com> >> A change table stores a list of changes, and supports efficient >> lookup >> from a commit hash to the list of changes that reference that commit >> directly. >> It can be used to look up content commits or metacommits at the head >> of a change, but does not support lookup of commits referenced as part >> of the commit history. >> Signed-off-by: Stefan Xenos <sxenos@google.com> >> Signed-off-by: Chris Poucet <poucet@google.com> > >> diff --git a/change-table.h b/change-table.h >> new file mode 100644 >> index 00000000000..166b5ed8073 >> --- /dev/null >> +++ b/change-table.h >> @@ -0,0 +1,132 @@ >> +#ifndef CHANGE_TABLE_H >> +#define CHANGE_TABLE_H >> + >> +#include "oidmap.h" >> + >> +struct commit; >> +struct ref_filter; >> + >> +/** > > We tend to just use '/*' rather than '/**' No, we use both, and /** is correct here. It's an API-doc syntax, see e.g. strbuf.h. It's explicitly meant for this sort of thing, i.e. comments on public structs in a header & the functions in a header (and struct members, etc.). >> + * This struct holds a list of change refs. The first element is > stored inline, >> + * to optimize for small lists. >> + */ >> +struct change_list { >> + /** >> + * Ref name for the first change in the list, or null if none. >> + * >> + * This field is private. Use for_each_change_in to read. >> + */ >> + const char* first_refname; >> + /** While we're on nits we tend to add an extra \n before the next API comment...
On 27/09/2022 14:50, Ævar Arnfjörð Bjarmason wrote: > > On Tue, Sep 27 2022, Phillip Wood wrote: > >> On 23/09/2022 19:55, Stefan Xenos via GitGitGadget wrote: >>> From: Stefan Xenos <sxenos@google.com> >>> A change table stores a list of changes, and supports efficient >>> lookup >>> from a commit hash to the list of changes that reference that commit >>> directly. >>> It can be used to look up content commits or metacommits at the head >>> of a change, but does not support lookup of commits referenced as part >>> of the commit history. >>> Signed-off-by: Stefan Xenos <sxenos@google.com> >>> Signed-off-by: Chris Poucet <poucet@google.com> >> >>> diff --git a/change-table.h b/change-table.h >>> new file mode 100644 >>> index 00000000000..166b5ed8073 >>> --- /dev/null >>> +++ b/change-table.h >>> @@ -0,0 +1,132 @@ >>> +#ifndef CHANGE_TABLE_H >>> +#define CHANGE_TABLE_H >>> + >>> +#include "oidmap.h" >>> + >>> +struct commit; >>> +struct ref_filter; >>> + >>> +/** >> >> We tend to just use '/*' rather than '/**' > > No, we use both, and /** is correct here. It's an API-doc syntax, see > e.g. strbuf.h. > > It's explicitly meant for this sort of thing, i.e. comments on public > structs in a header & the functions in a header (and struct members, > etc.). We don't do that consistently, we don't mention them in CodingGuidelines and we don't use anything that processes API-doc comments. It would be a lot simpler and it would be consistent with our coding guidelines just to use the same style everywhere. That would avoid problems where this series uses API-doc comments for in-code comments in .c files and where single line comments in header files do not use the API-doc syntax. Best Wishes Phillip >>> + * This struct holds a list of change refs. The first element is >> stored inline, >>> + * to optimize for small lists. >>> + */ >>> +struct change_list { >>> + /** >>> + * Ref name for the first change in the list, or null if none. >>> + * >>> + * This field is private. Use for_each_change_in to read. >>> + */ >>> + const char* first_refname; >>> + /** > > While we're on nits we tend to add an extra \n before the next API > comment...
On 27/09/2022 14:27, Phillip Wood wrote: >> + /** >> + * Determine the object id for the latest content commit for each >> change. >> + * Fetch the commit at the head of each change ref. If it's a >> normal commit, >> + * that's the commit we want. If it's a metacommit, locate its >> content parent >> + * and use that. >> + */ >> + >> + for (i = 0; i < matching_refs.nr; i++) { >> + struct ref_array_item *item = matching_refs.items[i]; >> + struct commit *commit = item->commit; >> + >> + commit = lookup_commit_reference_gently(repo, >> &item->objectname, 1); > > We're assigning commit twice - why do we need to look it up if > filter_refs returns it? I think we want to look it up to check that item->objectname is a commit. item->commit is not filled unless we specify the verbose flag and I'm not sure what the implications of setting that are. If we get an objectname that does not name a commit we should call BUG() as suggested below. > There are a number of places where we call > lookup_commit_reference_gently(..., 1) to silence the warning if the > objectname does not dereference to a commit. It is not clear to me that > we want to hide those errors. Indeed I think we should be doing > > commit = lookup_commit_reference(repo, oid) > if (!commit) > BUG("commit missing ...") > > unless there is a good reason that the lookup can fail.
On Tue, Sep 27 2022, Phillip Wood wrote: > On 27/09/2022 14:50, Ævar Arnfjörð Bjarmason wrote: >> On Tue, Sep 27 2022, Phillip Wood wrote: >> >>> On 23/09/2022 19:55, Stefan Xenos via GitGitGadget wrote: >>>> From: Stefan Xenos <sxenos@google.com> >>>> A change table stores a list of changes, and supports efficient >>>> lookup >>>> from a commit hash to the list of changes that reference that commit >>>> directly. >>>> It can be used to look up content commits or metacommits at the head >>>> of a change, but does not support lookup of commits referenced as part >>>> of the commit history. >>>> Signed-off-by: Stefan Xenos <sxenos@google.com> >>>> Signed-off-by: Chris Poucet <poucet@google.com> >>> >>>> diff --git a/change-table.h b/change-table.h >>>> new file mode 100644 >>>> index 00000000000..166b5ed8073 >>>> --- /dev/null >>>> +++ b/change-table.h >>>> @@ -0,0 +1,132 @@ >>>> +#ifndef CHANGE_TABLE_H >>>> +#define CHANGE_TABLE_H >>>> + >>>> +#include "oidmap.h" >>>> + >>>> +struct commit; >>>> +struct ref_filter; >>>> + >>>> +/** >>> >>> We tend to just use '/*' rather than '/**' >> No, we use both, and /** is correct here. It's an API-doc syntax, >> see >> e.g. strbuf.h. >> It's explicitly meant for this sort of thing, i.e. comments on >> public >> structs in a header & the functions in a header (and struct members, >> etc.). > > We don't do that consistently, we don't mention them in > CodingGuidelines and we don't use anything that processes API-doc > comments. It would be a lot simpler and it would be consistent with > our coding guidelines just to use the same style everywhere. That > would avoid problems where this series uses API-doc comments for > in-code comments in .c files and where single line comments in header > files do not use the API-doc syntax. Yes, this isn't documented in CodingGuidelines (but FWIW is in various commit messages). I'm pointing out that this isn't a mistake, but the preferred style for new API docs. At least Emacs knows how to highlight these differently, which is the main use I personally get out of them, I don't know what other use-cases there are for them...
On 27/09/2022 16:28, Ævar Arnfjörð Bjarmason wrote: > > On Tue, Sep 27 2022, Phillip Wood wrote: > >> On 27/09/2022 14:50, Ævar Arnfjörð Bjarmason wrote: >>> On Tue, Sep 27 2022, Phillip Wood wrote: >>> >>>> On 23/09/2022 19:55, Stefan Xenos via GitGitGadget wrote: >>>>> +/** >>>> >>>> We tend to just use '/*' rather than '/**' >>> No, we use both, and /** is correct here. It's an API-doc syntax, >>> see >>> e.g. strbuf.h. >>> It's explicitly meant for this sort of thing, i.e. comments on >>> public >>> structs in a header & the functions in a header (and struct members, >>> etc.). >> >> We don't do that consistently, we don't mention them in >> CodingGuidelines and we don't use anything that processes API-doc >> comments. It would be a lot simpler and it would be consistent with >> our coding guidelines just to use the same style everywhere. That >> would avoid problems where this series uses API-doc comments for >> in-code comments in .c files and where single line comments in header >> files do not use the API-doc syntax. > > Yes, this isn't documented in CodingGuidelines (but FWIW is in various > commit messages). > > I'm pointing out that this isn't a mistake, but the preferred style for > new API docs. It seems a bit a stretch to call it the preferred style, however I had thought all our uses were historic but that's not the case. Chris if you want to use '/**' style comments for the API docs then please do so consistently and do not use them elsewhere. > At least Emacs knows how to highlight these differently, which is the > main use I personally get out of them, I don't know what other use-cases > there are for them... I've come across them in projects that use gtk-doc or other documentation generators where it is necessary to distinguish 'documentation' from 'code comments'. I don't think they add much value if one is not generating documentation from the source, it is just one more thing for contributors to remember. Best Wishes Phillip
On Wed, Sep 28 2022, Phillip Wood wrote: > On 27/09/2022 16:28, Ævar Arnfjörð Bjarmason wrote: >> On Tue, Sep 27 2022, Phillip Wood wrote: >> >>> On 27/09/2022 14:50, Ævar Arnfjörð Bjarmason wrote: >>>> On Tue, Sep 27 2022, Phillip Wood wrote: >>>> >>>>> On 23/09/2022 19:55, Stefan Xenos via GitGitGadget wrote: >>>>>> +/** >>>>> >>>>> We tend to just use '/*' rather than '/**' >>>> No, we use both, and /** is correct here. It's an API-doc syntax, >>>> see >>>> e.g. strbuf.h. >>>> It's explicitly meant for this sort of thing, i.e. comments on >>>> public >>>> structs in a header & the functions in a header (and struct members, >>>> etc.). >>> >>> We don't do that consistently, we don't mention them in >>> CodingGuidelines and we don't use anything that processes API-doc >>> comments. It would be a lot simpler and it would be consistent with >>> our coding guidelines just to use the same style everywhere. That >>> would avoid problems where this series uses API-doc comments for >>> in-code comments in .c files and where single line comments in header >>> files do not use the API-doc syntax. >> Yes, this isn't documented in CodingGuidelines (but FWIW is in >> various >> commit messages). >> I'm pointing out that this isn't a mistake, but the preferred style >> for >> new API docs. > > It seems a bit a stretch to call it the preferred style, however I had > thought all our uses were historic but that's not the case. FWIW that claim of mine comes from my (admittedly fuzzy recollection of) history, around the time that this was being worked out (e.g. where API docs should live, they used to mostly be in Documentation/technical/, now they're mostly in *.h) we ended up with this mention in CodingGudielines: - When you come up with an API, document its functions and structures in the header file that exposes the API to its callers. Use what is in "strbuf.h" as a model for the appropriate tone and level of detail. I.e. at some point we decreed strbuf.h as a best-practice model to follow. I think every one of my own use of "/**" has come after reading that file... But patches to make it more explicit are most welcome. FWIW I think I looked into that once, but couldn't find a canonical reference for what this syntax is even called. Hrm, looking a bit more I see it's probably JavaDoc (at least Emacs highlights it as such). A grep of: git grep '^ \* @' -- '*.[ch]' Shows that we make almost no use of the meta-syntax (i.e. "tags": https://en.wikipedia.org/wiki/Javadoc#Table_of_Javadoc_tags) > Chris if you want to use '/**' style comments for the API docs then > please do so consistently and do not use them elsewhere. > >> At least Emacs knows how to highlight these differently, which is the >> main use I personally get out of them, I don't know what other use-cases >> there are for them... > > I've come across them in projects that use gtk-doc or other > documentation generators where it is necessary to distinguish > 'documentation' from 'code comments'. I don't think they add much > value if one is not generating documentation from the source, it is > just one more thing for contributors to remember. I for one find it very useful these "API docs" are shown differently in my editor, even if I'm not using some fancier (and hypothetical) "make javadoc" target. I.e. you can ideally skim the *.h file and see at a glance what's an implementation comment v.s. API doc comment. Except that even for the supposed role model of strbuf.h we forgot in some cases, and should have the below applied to it, oh well... :) diff --git a/strbuf.h b/strbuf.h index 76965a17d44..46549986ae3 100644 --- a/strbuf.h +++ b/strbuf.h @@ -492,7 +492,7 @@ int strbuf_getline_lf(struct strbuf *sb, FILE *fp); /* Uses NUL as the line terminator */ int strbuf_getline_nul(struct strbuf *sb, FILE *fp); -/* +/** * Similar to strbuf_getline_lf(), but additionally treats a CR that * comes immediately before the LF as part of the terminator. * This is the most friendly version to be used to read "text" files @@ -610,7 +610,7 @@ static inline struct strbuf **strbuf_split(const struct strbuf *sb, return strbuf_split_max(sb, terminator, 0); } -/* +/** * Adds all strings of a string list to the strbuf, separated by the given * separator. For example, if sep is * ', ' @@ -653,7 +653,7 @@ int launch_editor(const char *path, struct strbuf *buffer, int launch_sequence_editor(const char *path, struct strbuf *buffer, const char *const *env); -/* +/** * In contrast to `launch_editor()`, this function writes out the contents * of the specified file first, then clears the `buffer`, then launches * the editor and reads back in the file contents into the `buffer`. @@ -693,7 +693,7 @@ static inline void strbuf_complete_line(struct strbuf *sb) strbuf_complete(sb, '\n'); } -/* +/** * Copy "name" to "sb", expanding any special @-marks as handled by * interpret_branch_name(). The result is a non-qualified branch name * (so "foo" or "origin/master" instead of "refs/heads/foo" or @@ -707,7 +707,7 @@ static inline void strbuf_complete_line(struct strbuf *sb) void strbuf_branchname(struct strbuf *sb, const char *name, unsigned allowed); -/* +/** * Like strbuf_branchname() above, but confirm that the result is * syntactically valid to be used as a local branch name in refs/heads/. *
Phillip Wood <phillip.wood123@gmail.com> writes: > Chris if you want to use '/**' style comments for the API docs then > please do so consistently and do not use them elsewhere. Thanks. > I've come across them in projects that use gtk-doc or other > documentation generators where it is necessary to distinguish > 'documentation' from 'code comments'. I don't think they add much > value if one is not generating documentation from the source, it is > just one more thing for contributors to remember. Yes, I personally find them annoying, but has tolerated them so far, hoping that something good (read: automated documentation out of comments) emerge someday, simply because the first ones were added by folks who were interested in that direction, which unfortunately has never materialized. Maybe it would eventually happen, but I think there are a lot of clean-up to do before it happens. I somehow suspect that the sooner the mechanism to create the documentation set, however incomplete and messy the result is with the current material, the more incentive the contributors have to apply /** vs /* distinction properly, but of course on the other hand, until the existing material gets cleaned up, the care they need to take to make the distinction does feel like a makework. So, I dunno. Thanks.
> > +/** > > We tend to just use '/*' rather than '/**' It seems there's some disagreement on this. Regardless, I changed the ones in the implementation to be "/*" > > > + * This struct holds a list of change refs. The first element is > stored inline, > > + * to optimize for small lists. > > + */ > > +struct change_list { > > + /** > > + * Ref name for the first change in the list, or null if none. > > + * > > + * This field is private. Use for_each_change_in to read. > > + */ > > + const char* first_refname; > > + /** > > + * List of additional change refs. Note that this is empty if the list > > + * contains 0 or 1 elements. > > + * > > + * This field is private. Use for_each_change_in to read. > > + */ > > + struct string_list additional_refnames; > > Splitting this feels like a premature optimization. We don't have any > tests yet, let alone any real-world experience using this code. Also if > we want to save memory for lists with a single entry why are we > embedding the struct string_list rather than just storing a pointer to it? Agreed, simplified to a strset. Thanks for the suggestion. > > I think it would be simpler to use a struct strset to hold the refnames > as we don't need the util field offered by struct string_list. Done. > > > +/** > > + * Holds information about the head of a single change. > > + */ > > +struct change_head { > > + /** > > + * The location pointed to by the head of the change. May be a > commit or a > > + * metacommit. > > + */ > > + struct object_id head; > > I found this duality between commits and metacommits rather confusing - > why isn't the head always a metacommit? There is no reason to create a metacommit for the first commit you create. You only need one if you're replacing a commit with another commit. > > > +/** > > + * Holds information about the heads of each change, and permits > effecient > > s/effecient/efficient/ Done. > > > + * lookup from a commit to the changes that reference it directly. > > + * > > + * All fields should be considered private. Use the change_table > functions > > + * to interact with this struct. > > + */ > > +struct change_table { > > + /** > > + * Memory pool for the objects allocated by the change table. > > + */ > > + struct mem_pool memory_pool; > > + /* Map object_id to commit_change_list_entry structs. */ > > + struct oidmap oid_to_metadata_index; > > + /** > > + * List of ref names. The util value points to a change_head structure > > + * allocated from memory_pool. > > + */ > > + struct string_list refname_to_change_head; > > I think these days we'd use a strmap for this for O(1) lookups. Way better! > > > +}; > > + > > +extern void change_table_init(struct change_table *to_initialize); > > The struct change_table argument to all these functions changes its name > more often than a criminal on the run. I would find it much easier to > follow the code if we consistently called this argument "table" Agreed, changed them all to "table". > > > + * Adds all changes matching the given ref filter to the given > change_table > > + * struct. > > + */ > > +extern void change_table_add_matching_filter(struct change_table > *to_modify, > > + struct repository* repo, struct ref_filter *filter); > > I can't see any callers outside of change-table.c so do we really need > to export this function. Thanks for verifying, done. > > + > > +void change_table_init(struct change_table *to_initialize) > > +{ > > + memset(to_initialize, 0, sizeof(*to_initialize)); > > + mem_pool_init(&to_initialize->memory_pool, 0); > > + to_initialize->memory_pool.block_alloc = 4*1024 - sizeof(struct mp_block); > > If we're using a mempool to minimize the allocation overhead we should > leave .block_alloc set to the default value of 1MB rather than changing > it to 4kB Good question, I don't know the typical sizes that we'll get for these, so for now just sticking with the default seems sensible. > > + > > +static void add_head_to_commit(struct change_table *to_modify, > > + const struct object_id *to_add, const char *refname) > > I found the function and argument names rather confusing. If I've > understood the code correctly then this function is adding an assoation > between the commit "to_add" and "refname". Despite its name "to_add" may > already exist in the change table. > > The formatting is a bit off as well (as are most of the function > declarations in this patch and the next), we'd write that as > > static void add_head_to_commit(struct change_table *table, > const struct object_id *to_add, > const char *refname) Thanks, I wasn't clear on the guidelines. I hope the new format makes more sense. > > > +{ > > + struct commit_change_list_entry *entry; > > + > > + /** > > + * Note: the indices in the map are 1-based. 0 is used to indicate a missing > > + * element. > > + */ > > I'm confused by this comment, what indices is it talking about? No idea, removed. > > + > > + if (!entry->changes.first_refname) > > + entry->changes.first_refname = refname; > > + else > > + string_list_insert(&entry->changes.additional_refnames, refname); > > This is an example of the complexity added by the current definition of > struct change_list. Yes, simplified. > > > +void change_table_add(struct change_table *to_modify, const char *refname, > > + struct commit *to_add) > > +{ > > + struct change_head *new_head; > > + struct string_list_item *new_item; > > + int metacommit_type; > > + > > + new_head = mem_pool_calloc(&to_modify->memory_pool, 1, > > + sizeof(*new_head)); > > + > > + oidcpy(&new_head->head, &to_add->object.oid); > > + > > + metacommit_type = get_metacommit_content(to_add, &new_head->content); > > + if (metacommit_type == METACOMMIT_TYPE_NONE) > > + oidcpy(&new_head->content, &to_add->object.oid); > > If to_add is not a metacommit then the content is to_add itself, > otherwise it will have been set by the call to get_metacommit_content(). Yes, added the comment. > > > + new_head->abandoned = (metacommit_type == METACOMMIT_TYPE_ABANDONED); > > Style: I don't think we normally bother with parentheses here I admit I prefer it here because operator priority isn't always obvious (it could be read as (new_head->abandoned = metacommit_type) == METACOMMIT_TYPE_ABANDONED; > > > + new_head->remote = starts_with(refname, "refs/remote/"); > > + new_head->hidden = starts_with(refname, "refs/hiddenmetas/"); > > + > > + new_item = string_list_insert(&to_modify->refname_to_change_head, refname); > > + new_item->util = new_head; > > + /* Use pointers to the copy of the string we're retaining locally */ > > string_list_insert() copied the string and we're using that copy. Saying > we're retaining it locally when it will outlive this function call is > confusing. This is now obsolete with the move to strmap. > > > + refname = new_item->string; > > + > > + if (!oideq(&new_head->content, &new_head->head)) > > + add_head_to_commit(to_modify, &new_head->content, refname); > > If to_add is a metacommit then we remember the link between refname and > the content commit. > > > + add_head_to_commit(to_modify, &new_head->head, refname); > > We also remember the link between refname and to_add Thanks, added the comment. > > > +} > > + > > +void change_table_add_all_visible(struct change_table *to_modify, > > + struct repository* repo) > > +{ > > + struct ref_filter filter; > > rather than using memset we'd write (the same goes for all the other > memset() calls in this series, unless they're operation on a heap > allocation) > > struct ref_filter filter = { 0 }; Thanks, I wasn't aware of that trick. > > > + const char *name_patterns[] = {NULL}; > > + memset(&filter, 0, sizeof(filter)); > > + filter.kind = FILTER_REFS_CHANGES; > > + filter.name_patterns = name_patterns; > > + > > + change_table_add_matching_filter(to_modify, repo, &filter); > > +} > > + > > +void change_table_add_matching_filter(struct change_table *to_modify, > > + struct repository* repo, struct ref_filter *filter) > > +{ > > + struct ref_array matching_refs; > > + int i; > > + > > + memset(&matching_refs, 0, sizeof(matching_refs)); > > + filter_refs(&matching_refs, filter, filter->kind); > > + > > + /** > > + * Determine the object id for the latest content commit for each change. > > + * Fetch the commit at the head of each change ref. If it's a normal commit, > > + * that's the commit we want. If it's a metacommit, locate its content parent > > + * and use that. > > + */ > > + > > + for (i = 0; i < matching_refs.nr; i++) { > > + struct ref_array_item *item = matching_refs.items[i]; > > + struct commit *commit = item->commit; > > + > > + commit = lookup_commit_reference_gently(repo, &item->objectname, 1); > > We're assigning commit twice - why do we need to look it up if > filter_refs returns it? I think this is a case of missing logic if you look at what the comment above it says. > > There are a number of places where we call > lookup_commit_reference_gently(..., 1) to silence the warning if the > objectname does not dereference to a commit. It is not clear to me that > we want to hide those errors. Indeed I think we should be doing Agreed, move to this. > > commit = lookup_commit_reference(repo, oid) > if (!commit) > BUG("commit missing ...") > > unless there is a good reason that the lookup can fail. I can't think of any but then I'm not the original author. > > > + if (commit) > > + change_table_add(to_modify, item->refname, commit); > > + } > > + > > + ref_array_clear(&matching_refs); > > +} > > > +int for_each_change_referencing(struct change_table *table, > > + const struct object_id *referenced_commit_id, each_change_fn fn, void *cb_data) > > +{ > > + const struct change_list *changes; > > + int i; > > + int retvalue; > > We normally use "ret" for this Done. > > > + struct commit_change_list_entry *entry; > > + > > + entry = oidmap_get(&table->oid_to_metadata_index, > > + referenced_commit_id); > > This should be indented to start below the '(' of the function call. Done. > > > + /* If this commit isn't referenced by any changes, it won't be in the map */ > > + if (!entry) > > + return 0; > > + changes = &entry->changes; > > + if (!changes->first_refname) > > + return 0; > > + retvalue = fn(changes->first_refname, cb_data); > > + for (i = 0; retvalue == 0 && i < changes->additional_refnames.nr; i++) > > + retvalue = fn(changes->additional_refnames.items[i].string, cb_data); > > Using an strset for struct change_list would simplify this Agreed! Simplified. > > > + return retvalue; > > +} > > + > > +struct change_head* get_change_head(struct change_table *heads, > > + const char* refname) > > +{ > > + struct string_list_item *item = string_list_lookup( > > + &heads->refname_to_change_head, refname); > > + > > + if (!item) > > + return NULL; > > + > > + return (struct change_head *)item->util; > > We don't bother with casting void* pointers like this. In any case this > whole function could become > > return strmap_get(table, refname) > > if we used an strmap instead of a string_list. > > > Aside from the style issues and using api's that have been added since > Stefan wrote these patches this looks pretty sound. The only thing I > don't really get why the public api allows normal commits to be added to > the change table (I can see why we might want to add the content commit > as well when we add a metacommit but that should be done internally) > > Best Wishes > > Phillip
diff --git a/Makefile b/Makefile index b2bcc00c289..2b847e7e7de 100644 --- a/Makefile +++ b/Makefile @@ -913,6 +913,7 @@ LIB_OBJS += bulk-checkin.o LIB_OBJS += bundle-uri.o LIB_OBJS += bundle.o LIB_OBJS += cache-tree.o +LIB_OBJS += change-table.o LIB_OBJS += cbtree.o LIB_OBJS += chdir-notify.o LIB_OBJS += checkout.o diff --git a/change-table.c b/change-table.c new file mode 100644 index 00000000000..c61ba29f1ed --- /dev/null +++ b/change-table.c @@ -0,0 +1,179 @@ +#include "cache.h" +#include "change-table.h" +#include "commit.h" +#include "ref-filter.h" +#include "metacommit-parser.h" + +void change_table_init(struct change_table *to_initialize) +{ + memset(to_initialize, 0, sizeof(*to_initialize)); + mem_pool_init(&to_initialize->memory_pool, 0); + to_initialize->memory_pool.block_alloc = 4*1024 - sizeof(struct mp_block); + oidmap_init(&to_initialize->oid_to_metadata_index, 0); + string_list_init_dup(&to_initialize->refname_to_change_head); +} + +static void change_list_clear(struct change_list *to_clear) { + string_list_clear(&to_clear->additional_refnames, 0); +} + +static void commit_change_list_entry_clear( + struct commit_change_list_entry *to_clear) { + change_list_clear(&to_clear->changes); +} + +void change_table_clear(struct change_table *to_clear) +{ + struct oidmap_iter iter; + struct commit_change_list_entry *next; + for (next = oidmap_iter_first(&to_clear->oid_to_metadata_index, &iter); + next; + next = oidmap_iter_next(&iter)) { + + commit_change_list_entry_clear(next); + } + + oidmap_free(&to_clear->oid_to_metadata_index, 0); + string_list_clear(&to_clear->refname_to_change_head, 0); + mem_pool_discard(&to_clear->memory_pool, 0); +} + +static void add_head_to_commit(struct change_table *to_modify, + const struct object_id *to_add, const char *refname) +{ + struct commit_change_list_entry *entry; + + /** + * Note: the indices in the map are 1-based. 0 is used to indicate a missing + * element. + */ + entry = oidmap_get(&to_modify->oid_to_metadata_index, to_add); + if (!entry) { + entry = mem_pool_calloc(&to_modify->memory_pool, 1, + sizeof(*entry)); + oidcpy(&entry->entry.oid, to_add); + oidmap_put(&to_modify->oid_to_metadata_index, entry); + string_list_init_nodup(&entry->changes.additional_refnames); + } + + if (!entry->changes.first_refname) + entry->changes.first_refname = refname; + else + string_list_insert(&entry->changes.additional_refnames, refname); +} + +void change_table_add(struct change_table *to_modify, const char *refname, + struct commit *to_add) +{ + struct change_head *new_head; + struct string_list_item *new_item; + int metacommit_type; + + new_head = mem_pool_calloc(&to_modify->memory_pool, 1, + sizeof(*new_head)); + + oidcpy(&new_head->head, &to_add->object.oid); + + metacommit_type = get_metacommit_content(to_add, &new_head->content); + if (metacommit_type == METACOMMIT_TYPE_NONE) + oidcpy(&new_head->content, &to_add->object.oid); + new_head->abandoned = (metacommit_type == METACOMMIT_TYPE_ABANDONED); + new_head->remote = starts_with(refname, "refs/remote/"); + new_head->hidden = starts_with(refname, "refs/hiddenmetas/"); + + new_item = string_list_insert(&to_modify->refname_to_change_head, refname); + new_item->util = new_head; + /* Use pointers to the copy of the string we're retaining locally */ + refname = new_item->string; + + if (!oideq(&new_head->content, &new_head->head)) + add_head_to_commit(to_modify, &new_head->content, refname); + add_head_to_commit(to_modify, &new_head->head, refname); +} + +void change_table_add_all_visible(struct change_table *to_modify, + struct repository* repo) +{ + struct ref_filter filter; + const char *name_patterns[] = {NULL}; + memset(&filter, 0, sizeof(filter)); + filter.kind = FILTER_REFS_CHANGES; + filter.name_patterns = name_patterns; + + change_table_add_matching_filter(to_modify, repo, &filter); +} + +void change_table_add_matching_filter(struct change_table *to_modify, + struct repository* repo, struct ref_filter *filter) +{ + struct ref_array matching_refs; + int i; + + memset(&matching_refs, 0, sizeof(matching_refs)); + filter_refs(&matching_refs, filter, filter->kind); + + /** + * Determine the object id for the latest content commit for each change. + * Fetch the commit at the head of each change ref. If it's a normal commit, + * that's the commit we want. If it's a metacommit, locate its content parent + * and use that. + */ + + for (i = 0; i < matching_refs.nr; i++) { + struct ref_array_item *item = matching_refs.items[i]; + struct commit *commit = item->commit; + + commit = lookup_commit_reference_gently(repo, &item->objectname, 1); + + if (commit) + change_table_add(to_modify, item->refname, commit); + } + + ref_array_clear(&matching_refs); +} + +static int return_true_callback(const char *refname, void *cb_data) +{ + return 1; +} + +int change_table_has_change_referencing(struct change_table *changes, + const struct object_id *referenced_commit_id) +{ + return for_each_change_referencing(changes, referenced_commit_id, + return_true_callback, NULL); +} + +int for_each_change_referencing(struct change_table *table, + const struct object_id *referenced_commit_id, each_change_fn fn, void *cb_data) +{ + const struct change_list *changes; + int i; + int retvalue; + struct commit_change_list_entry *entry; + + entry = oidmap_get(&table->oid_to_metadata_index, + referenced_commit_id); + /* If this commit isn't referenced by any changes, it won't be in the map */ + if (!entry) + return 0; + changes = &entry->changes; + if (!changes->first_refname) + return 0; + retvalue = fn(changes->first_refname, cb_data); + for (i = 0; retvalue == 0 && i < changes->additional_refnames.nr; i++) + retvalue = fn(changes->additional_refnames.items[i].string, cb_data); + return retvalue; +} + +struct change_head* get_change_head(struct change_table *heads, + const char* refname) +{ + struct string_list_item *item = string_list_lookup( + &heads->refname_to_change_head, refname); + + if (!item) + return NULL; + + return (struct change_head *)item->util; +} diff --git a/change-table.h b/change-table.h new file mode 100644 index 00000000000..166b5ed8073 --- /dev/null +++ b/change-table.h @@ -0,0 +1,132 @@ +#ifndef CHANGE_TABLE_H +#define CHANGE_TABLE_H + +#include "oidmap.h" + +struct commit; +struct ref_filter; + +/** + * This struct holds a list of change refs. The first element is stored inline, + * to optimize for small lists. + */ +struct change_list { + /** + * Ref name for the first change in the list, or null if none. + * + * This field is private. Use for_each_change_in to read. + */ + const char* first_refname; + /** + * List of additional change refs. Note that this is empty if the list + * contains 0 or 1 elements. + * + * This field is private. Use for_each_change_in to read. + */ + struct string_list additional_refnames; +}; + +/** + * Holds information about the head of a single change. + */ +struct change_head { + /** + * The location pointed to by the head of the change. May be a commit or a + * metacommit. + */ + struct object_id head; + /** + * The content commit for the latest commit in the change. Always points to a + * real commit, never a metacommit. + */ + struct object_id content; + /** + * Abandoned: indicates that the content commit should be removed from the + * history. + * + * Hidden: indicates that the change is an inactive change from the + * hiddenmetas namespace. Such changes will be hidden from the user by + * default. + * + * Deleted: indicates that the change has been removed from the repository. + * That is the ref was deleted since the time this struct was created. Such + * entries should be ignored. + */ + unsigned int abandoned:1, + hidden:1, + remote:1, + deleted:1; +}; + +/** + * Holds the list of change refs whose content points to a particular content + * commit. + */ +struct commit_change_list_entry { + struct oidmap_entry entry; + struct change_list changes; +}; + +/** + * Holds information about the heads of each change, and permits effecient + * lookup from a commit to the changes that reference it directly. + * + * All fields should be considered private. Use the change_table functions + * to interact with this struct. + */ +struct change_table { + /** + * Memory pool for the objects allocated by the change table. + */ + struct mem_pool memory_pool; + /* Map object_id to commit_change_list_entry structs. */ + struct oidmap oid_to_metadata_index; + /** + * List of ref names. The util value points to a change_head structure + * allocated from memory_pool. + */ + struct string_list refname_to_change_head; +}; + +extern void change_table_init(struct change_table *to_initialize); +extern void change_table_clear(struct change_table *to_clear); + +/* Adds the given change head to the change_table struct */ +extern void change_table_add(struct change_table *to_modify, + const char *refname, struct commit *target); + +/** + * Adds the non-hidden local changes to the given change_table struct. + */ +extern void change_table_add_all_visible(struct change_table *to_modify, + struct repository *repo); + +/* + * Adds all changes matching the given ref filter to the given change_table + * struct. + */ +extern void change_table_add_matching_filter(struct change_table *to_modify, + struct repository* repo, struct ref_filter *filter); + +typedef int each_change_fn(const char *refname, void *cb_data); + +extern int change_table_has_change_referencing(struct change_table *changes, + const struct object_id *referenced_commit_id); + +/** + * Iterates over all changes that reference the given commit. For metacommits, + * this is the list of changes that point directly to that metacommit. + * For normal commits, this is the list of changes that have this commit as + * their latest content. + */ +extern int for_each_change_referencing(struct change_table *heads, + const struct object_id *referenced_commit_id, each_change_fn fn, void *cb_data); + +/** + * Returns the change head for the given refname. Returns NULL if no such change + * exists. + */ +extern struct change_head* get_change_head(struct change_table *heads, + const char* refname); + +#endif