Message ID | 91e6e97a66aff88e0b860e34659dddc3396c7f28.1595927632.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Implement Corrected Commit Date | expand |
On Tue, Jul 28, 2020 at 09:13:46AM +0000, Abhishek Kumar via GitGitGadget wrote: > From: Abhishek Kumar <abhishekkumar8222@gmail.com> > > With 3d112755 (commit-graph: examine commits by generation number), Git > knew to sort by generation number before examining the diff when not > using pack order. c49c82aa (commit: move members graph_pos, generation > to a slab, 2020-06-17) moved generation number into a slab and > introduced a helper which returns GENERATION_NUMBER_INFINITY when > writing the graph. Sorting is no longer useful and essentially reverts > the earlier commit. This last sentence is slightly confusing. Do you think it would be more clear if you said elaborated a bit? Perhaps something like: [...] commit_gen_cmp is used when writing a commit-graph to sort commits in generation order before computing Bloom filters. Since c49c82aa made it so that 'commit_graph_generation()' returns 'GENERATION_NUMBER_INFINITY' during writing, we cannot call it within this function. Instead, access the generation number directly through the slab (i.e., by calling 'commit_graph_data_at(c)->generation') in order to access it while writing. I think the above would be a good extra paragraph in the commit message provided that you remove the sentence beginning with "Sorting is no longer useful..." > Let's fix this by accessing generation number directly through the slab. > > Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> > --- > commit-graph.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/commit-graph.c b/commit-graph.c > index 1af68c297d..5d3c9bd23c 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -144,8 +144,9 @@ static int commit_gen_cmp(const void *va, const void *vb) > const struct commit *a = *(const struct commit **)va; > const struct commit *b = *(const struct commit **)vb; > > - uint32_t generation_a = commit_graph_generation(a); > - uint32_t generation_b = commit_graph_generation(b); > + uint32_t generation_a = commit_graph_data_at(a)->generation; > + uint32_t generation_b = commit_graph_data_at(b)->generation; > + Nit; this whitespace diff is extraneous, but it's not hurting anything either. Since it looks like you're rerolling anyway, it would be good to just get rid of it. Otherwise this fix makes sense to me. > /* lower generation commits first */ > if (generation_a < generation_b) > return -1; > -- > gitgitgadget Thanks, Taylor
On Tue, Jul 28, 2020 at 11:28:44AM -0400, Taylor Blau wrote: > On Tue, Jul 28, 2020 at 09:13:46AM +0000, Abhishek Kumar via GitGitGadget wrote: > > From: Abhishek Kumar <abhishekkumar8222@gmail.com> > > > > With 3d112755 (commit-graph: examine commits by generation number), Git > > knew to sort by generation number before examining the diff when not > > using pack order. c49c82aa (commit: move members graph_pos, generation > > to a slab, 2020-06-17) moved generation number into a slab and > > introduced a helper which returns GENERATION_NUMBER_INFINITY when > > writing the graph. Sorting is no longer useful and essentially reverts > > the earlier commit. > > This last sentence is slightly confusing. Do you think it would be more > clear if you said elaborated a bit? Perhaps something like: > > [...] > > commit_gen_cmp is used when writing a commit-graph to sort commits in > generation order before computing Bloom filters. Since c49c82aa made > it so that 'commit_graph_generation()' returns > 'GENERATION_NUMBER_INFINITY' during writing, we cannot call it within > this function. Instead, access the generation number directly through > the slab (i.e., by calling 'commit_graph_data_at(c)->generation') in > order to access it while writing. > Thanks! That is clearer. Will change. > I think the above would be a good extra paragraph in the commit message > provided that you remove the sentence beginning with "Sorting is no > longer useful..." > > > Let's fix this by accessing generation number directly through the slab. > > > > Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> > > --- > > commit-graph.c | 5 +++-- > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > diff --git a/commit-graph.c b/commit-graph.c > > index 1af68c297d..5d3c9bd23c 100644 > > --- a/commit-graph.c > > +++ b/commit-graph.c > > @@ -144,8 +144,9 @@ static int commit_gen_cmp(const void *va, const void *vb) > > const struct commit *a = *(const struct commit **)va; > > const struct commit *b = *(const struct commit **)vb; > > > > - uint32_t generation_a = commit_graph_generation(a); > > - uint32_t generation_b = commit_graph_generation(b); > > + uint32_t generation_a = commit_graph_data_at(a)->generation; > > + uint32_t generation_b = commit_graph_data_at(b)->generation; > > + > > Nit; this whitespace diff is extraneous, but it's not hurting anything > either. Since it looks like you're rerolling anyway, it would be good to > just get rid of it. > > Otherwise this fix makes sense to me. > > > /* lower generation commits first */ > > if (generation_a < generation_b) > > return -1; > > -- > > gitgitgadget > > Thanks, > Taylor
"Abhishek Kumar via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Abhishek Kumar <abhishekkumar8222@gmail.com> > > With 3d112755 (commit-graph: examine commits by generation number), Git > knew to sort by generation number before examining the diff when not > using pack order. c49c82aa (commit: move members graph_pos, generation > to a slab, 2020-06-17) moved generation number into a slab and > introduced a helper which returns GENERATION_NUMBER_INFINITY when > writing the graph. Sorting is no longer useful and essentially reverts > the earlier commit. > > Let's fix this by accessing generation number directly through the slab. It looks like unfortunate and unforeseen consequence of putting together graph position and generation number in the commit_graph_data struct. During writing of the commit-graph file generation number is computed, but graph position is undefined (yet), and commit_graph_generation() uses graph_pos field to find if the data for commit is initialized; in this case wrongly. Anyway, when writing the commit graph we first compute generation number, then (if requested) the changed-paths Bloom filter. Skipping the unnecessary check is a good thing... assuming that commit_gen_cmp() is used only when writing the commit graph, and not when traversing it (because then some commits may not have generation number set, and maybe even do not have any data on the commit slab) - which is the case. > > Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> > --- > commit-graph.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/commit-graph.c b/commit-graph.c > index 1af68c297d..5d3c9bd23c 100644 > --- a/commit-graph.c > +++ b/commit-graph.c We might want to add function comment either here or in the header that this comparisonn function is to be used only for `git commit-graph write`, and not for graph traversal (even if similar funnction exists in other modules). > @@ -144,8 +144,9 @@ static int commit_gen_cmp(const void *va, const void *vb) > const struct commit *a = *(const struct commit **)va; > const struct commit *b = *(const struct commit **)vb; > > - uint32_t generation_a = commit_graph_generation(a); > - uint32_t generation_b = commit_graph_generation(b); > + uint32_t generation_a = commit_graph_data_at(a)->generation; > + uint32_t generation_b = commit_graph_data_at(b)->generation; > + > /* lower generation commits first */ > if (generation_a < generation_b) > return -1; Best, -- Jakub Narębski
On Tue, Aug 04, 2020 at 02:46:55AM +0200, Jakub Narębski wrote: > "Abhishek Kumar via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: Abhishek Kumar <abhishekkumar8222@gmail.com> > > > > With 3d112755 (commit-graph: examine commits by generation number), Git > > knew to sort by generation number before examining the diff when not > > using pack order. c49c82aa (commit: move members graph_pos, generation > > to a slab, 2020-06-17) moved generation number into a slab and > > introduced a helper which returns GENERATION_NUMBER_INFINITY when > > writing the graph. Sorting is no longer useful and essentially reverts > > the earlier commit. > > > > Let's fix this by accessing generation number directly through the slab. > > It looks like unfortunate and unforeseen consequence of putting together > graph position and generation number in the commit_graph_data struct. > During writing of the commit-graph file generation number is computed, > but graph position is undefined (yet), and commit_graph_generation() > uses graph_pos field to find if the data for commit is initialized; > in this case wrongly. > > Anyway, when writing the commit graph we first compute generation > number, then (if requested) the changed-paths Bloom filter. Skipping > the unnecessary check is a good thing... assuming that commit_gen_cmp() > is used only when writing the commit graph, and not when traversing it > (because then some commits may not have generation number set, and maybe > even do not have any data on the commit slab) - which is the case. > > > > > Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> > > --- > > commit-graph.c | 5 +++-- > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > diff --git a/commit-graph.c b/commit-graph.c > > index 1af68c297d..5d3c9bd23c 100644 > > --- a/commit-graph.c > > +++ b/commit-graph.c > > We might want to add function comment either here or in the header that > this comparisonn function is to be used only for `git commit-graph > write`, and not for graph traversal (even if similar funnction exists in > other modules). I think that probably within the function is just fine, and that we can avoid touching commit-graph.h here. > > > @@ -144,8 +144,9 @@ static int commit_gen_cmp(const void *va, const void *vb) > > const struct commit *a = *(const struct commit **)va; > > const struct commit *b = *(const struct commit **)vb; Maybe something like: /* * Access the generation number directly with * 'commit_graph_data_at(...)->generation' instead of going through * the slab as usual to avoid accessing a yet-uncomputed value. */ Folks that are curious for more can blame this commit and read there. I'd err on the side of being brief in the code comment and verbose in the commit message than the other way around ;). > > > > - uint32_t generation_a = commit_graph_generation(a); > > - uint32_t generation_b = commit_graph_generation(b); > > + uint32_t generation_a = commit_graph_data_at(a)->generation; > > + uint32_t generation_b = commit_graph_data_at(b)->generation; > > + > > /* lower generation commits first */ > > if (generation_a < generation_b) > > return -1; > > Best, > -- > Jakub Narębski Thanks, Taylor
Jakub Narębski <jnareb@gmail.com> writes: [...] >> @@ -144,8 +144,9 @@ static int commit_gen_cmp(const void *va, const void *vb) >> const struct commit *a = *(const struct commit **)va; >> const struct commit *b = *(const struct commit **)vb; >> >> - uint32_t generation_a = commit_graph_generation(a); >> - uint32_t generation_b = commit_graph_generation(b); >> + uint32_t generation_a = commit_graph_data_at(a)->generation; >> + uint32_t generation_b = commit_graph_data_at(b)->generation; >> + >> /* lower generation commits first */ >> if (generation_a < generation_b) >> return -1; NOTE: One more thing: we would want to check if corrected commit date (generation number v2) or topological level (generation number v1) is better for this purpose, that is gives better performance. The commit 3d11275505 (commit-graph: examine commits by generation number) which introduced using commit_gen_cmp when writing commit graph when finding commits via `--reachable` flags describes the following performance improvement: On the Linux kernel repository, this change reduced the computation time for 'git commit-graph write --reachable --changed-paths' from 3m00s to 1m37s. We would probably want time for no sorting, for sorting by generation number v2, and for sorting by topological level (generation number v1) for the same or similar case. Best, -- Jakub Narębski
Taylor Blau <me@ttaylorr.com> writes: > On Tue, Aug 04, 2020 at 02:46:55AM +0200, Jakub Narębski wrote: >> "Abhishek Kumar via GitGitGadget" <gitgitgadget@gmail.com> writes: [...] >>> diff --git a/commit-graph.c b/commit-graph.c >>> index 1af68c297d..5d3c9bd23c 100644 >>> --- a/commit-graph.c >>> +++ b/commit-graph.c >> >> We might want to add function comment either here or in the header that >> this comparisonn function is to be used only for `git commit-graph >> write`, and not for graph traversal (even if similar funnction exists in >> other modules). > > I think that probably within the function is just fine, and that we can > avoid touching commit-graph.h here. > >> >>> @@ -144,8 +144,9 @@ static int commit_gen_cmp(const void *va, const void *vb) >>> const struct commit *a = *(const struct commit **)va; >>> const struct commit *b = *(const struct commit **)vb; > > Maybe something like: > > /* > * Access the generation number directly with > * 'commit_graph_data_at(...)->generation' instead of going through > * the slab as usual to avoid accessing a yet-uncomputed value. > */ I think the last part of this comment should read: [...] * 'commit_graph_data_at(...)->generation' instead of going through * the commit_graph_generation() helper function to access just * computed data [during `git commit-graph write --reachable --changed-paths`]. */ Or something like that (the part in square brackets is optional; I am not sure if adding it helps or not). > > Folks that are curious for more can blame this commit and read there. > I'd err on the side of being brief in the code comment and verbose in > the commit message than the other way around ;). I agree. >>> >>> - uint32_t generation_a = commit_graph_generation(a); >>> - uint32_t generation_b = commit_graph_generation(b); >>> + uint32_t generation_a = commit_graph_data_at(a)->generation; >>> + uint32_t generation_b = commit_graph_data_at(b)->generation; >>> + >>> /* lower generation commits first */ >>> if (generation_a < generation_b) >>> return -1; Best,
diff --git a/commit-graph.c b/commit-graph.c index 1af68c297d..5d3c9bd23c 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -144,8 +144,9 @@ static int commit_gen_cmp(const void *va, const void *vb) const struct commit *a = *(const struct commit **)va; const struct commit *b = *(const struct commit **)vb; - uint32_t generation_a = commit_graph_generation(a); - uint32_t generation_b = commit_graph_generation(b); + uint32_t generation_a = commit_graph_data_at(a)->generation; + uint32_t generation_b = commit_graph_data_at(b)->generation; + /* lower generation commits first */ if (generation_a < generation_b) return -1;