Message ID | a3436b92a32f7f6dd02ad61eb2337a4d088d5e9c.1645735117.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Commit-graph: Generation Number v2 Fixes, v3 implementation | expand |
On Thu, Feb 24, 2022 at 08:38:32PM +0000, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <derrickstolee@github.com> > > The 'read_generation_data' member of 'struct commit_graph' was > introduced by 1fdc383c5 (commit-graph: use generation v2 only if entire > chain does, 2021-01-16). The intention was to avoid using corrected > commit dates if not all layers of a commit-graph had that data stored. > The logic in validate_mixed_generation_chain() at that point incorrectly > initialized read_generation_data to 1 if and only if the tip > commit-graph contained the Corrected Commit Date chunk. > > This was "fixed" in 448a39e65 (commit-graph: validate layers for > generation data, 2021-02-02) to validate that read_generation_data was > either non-zero for all layers, or it would set read_generation_data to > zero for all layers. > > The problem here is that read_generation_data is not initialized to be > non-zero anywhere! > > This change initializes read_generation_data immediately after the chunk > is parsed, so each layer will have its value present as soon as > possible. > > The read_generation_data member is used in fill_commit_graph_info() to > determine if we should use the corrected commit date or the topological > levels stored in the Commit Data chunk. Due to this bug, all previous > versions of Git were defaulting to topological levels in all cases! > > This can be measured with some performance tests. Using the Linux kernel > as a testbed, I generated a complete commit-graph containing corrected > commit dates and tested the 'new' version against the previous, 'old' > version. > > First, rev-list with --topo-order demonstrates a 26% improvement using > corrected commit dates: > > hyperfine \ > -n "old" "$OLD_GIT rev-list --topo-order -1000 v3.6" \ > -n "new" "$NEW_GIT rev-list --topo-order -1000 v3.6" \ > --warmup=10 > > Benchmark 1: old > Time (mean ± σ): 57.1 ms ± 3.1 ms > Range (min … max): 52.9 ms … 62.0 ms 55 runs > > Benchmark 2: new > Time (mean ± σ): 45.5 ms ± 3.3 ms > Range (min … max): 39.9 ms … 51.7 ms 59 runs > > Summary > 'new' ran > 1.26 ± 0.11 times faster than 'old' > > These performance improvements are due to the algorithmic improvements > given by walking fewer commits due to the higher cutoffs from corrected > commit dates. > > However, this comes at a cost. The additional I/O cost of parsing the > corrected commit dates is visible in case of merge-base commands that do > not reduce the overall number of walked commits. > > hyperfine \ > -n "old" "$OLD_GIT merge-base v4.8 v4.9" \ > -n "new" "$NEW_GIT merge-base v4.8 v4.9" \ > --warmup=10 > > Benchmark 1: old > Time (mean ± σ): 110.4 ms ± 6.4 ms > Range (min … max): 96.0 ms … 118.3 ms 25 runs > > Benchmark 2: new > Time (mean ± σ): 150.7 ms ± 1.1 ms > Range (min … max): 149.3 ms … 153.4 ms 19 runs > > Summary > 'old' ran > 1.36 ± 0.08 times faster than 'new' > > Performance issues like this are what motivated 702110aac (commit-graph: > use config to specify generation type, 2021-02-25). > > In the future, we could fix this performance problem by inserting the > corrected commit date offsets into the Commit Date chunk instead of > having that data in an extra chunk. > > Signed-off-by: Derrick Stolee <derrickstolee@github.com> > --- > commit-graph.c | 3 +++ > t/t4216-log-bloom.sh | 2 +- > t/t5318-commit-graph.sh | 14 ++++++++++++-- > t/t5324-split-commit-graph.sh | 9 +++++++-- > 4 files changed, 23 insertions(+), 5 deletions(-) > > diff --git a/commit-graph.c b/commit-graph.c > index a19bd96c2ee..8e52bb09552 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -407,6 +407,9 @@ struct commit_graph *parse_commit_graph(struct repository *r, > &graph->chunk_generation_data); > pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW, > &graph->chunk_generation_data_overflow); > + > + if (graph->chunk_generation_data) > + graph->read_generation_data = 1; > } > > if (r->settings.commit_graph_read_changed_paths) { I wanted to test your changes because they seem quite exciting in the context of my work as well, but this commit seems to uncover a bug with how we handle overflows. I originally triggered the bug when trying to do a mirror-fetch, but as it turns it seems to trigger now whenever the commit-graph is being read: $ git commit-graph verify fatal: commit-graph requires overflow generation data but has none $ git commit-graph write --split Finding commits for commit graph among packed objects: 100% (10235119/10235119), done. fatal: commit-graph requires overflow generation data but has none $ git commit-graph write --split=replace Finding commits for commit graph among packed objects: 100% (10235119/10235119), done. fatal: commit-graph requires overflow generation data but has none I initially assumed this may be a bug with how we previously wrote the commit-graph, but removing all chains still reliably triggers it: $ rm -f objects/info/commit-graphs/* $ git commit-graph write --split Finding commits for commit graph among packed objects: 100% (10235119/10235119), done. fatal: commit-graph requires overflow generation data but has none I haven't yet found the time to dig deeper into why this is happening. While the repository is publicly accessible at [1], unfortunately the bug seems to be triggered by a commit that's only kept alive by an internal reference. Patrick [1]: https://gitlab.com/gitlab-com/www-gitlab-com.git
On 2/28/2022 10:18 AM, Patrick Steinhardt wrote: > On Thu, Feb 24, 2022 at 08:38:32PM +0000, Derrick Stolee via GitGitGadget wrote: >> From: Derrick Stolee <derrickstolee@github.com> ... >> diff --git a/commit-graph.c b/commit-graph.c >> index a19bd96c2ee..8e52bb09552 100644 >> --- a/commit-graph.c >> +++ b/commit-graph.c >> @@ -407,6 +407,9 @@ struct commit_graph *parse_commit_graph(struct repository *r, >> &graph->chunk_generation_data); >> pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW, >> &graph->chunk_generation_data_overflow); >> + >> + if (graph->chunk_generation_data) >> + graph->read_generation_data = 1; >> } >> >> if (r->settings.commit_graph_read_changed_paths) { > > I wanted to test your changes because they seem quite exciting in the > context of my work as well, but this commit seems to uncover a bug with > how we handle overflows. I originally triggered the bug when trying to > do a mirror-fetch, but as it turns it seems to trigger now whenever the > commit-graph is being read: > > $ git commit-graph verify > fatal: commit-graph requires overflow generation data but has none > > $ git commit-graph write --split > Finding commits for commit graph among packed objects: 100% (10235119/10235119), done. > fatal: commit-graph requires overflow generation data but has none > > $ git commit-graph write --split=replace > Finding commits for commit graph among packed objects: 100% (10235119/10235119), done. > fatal: commit-graph requires overflow generation data but has none > > I initially assumed this may be a bug with how we previously wrote the > commit-graph, but removing all chains still reliably triggers it: > > $ rm -f objects/info/commit-graphs/* > $ git commit-graph write --split > Finding commits for commit graph among packed objects: 100% (10235119/10235119), done. > fatal: commit-graph requires overflow generation data but has none > > I haven't yet found the time to dig deeper into why this is happening. > While the repository is publicly accessible at [1], unfortunately the > bug seems to be triggered by a commit that's only kept alive by an > internal reference. > > Patrick > > [1]: https://gitlab.com/gitlab-com/www-gitlab-com.git Thanks for including this information. Just to be clear: did you include patch 4 in your tests, or not? Patch 4 includes a fix related to overflow values, so it would be helpful to know if you found a _different_ bug or if it is the same one. Thanks, -Stolee
On Mon, Feb 28, 2022 at 11:23:38AM -0500, Derrick Stolee wrote: > On 2/28/2022 10:18 AM, Patrick Steinhardt wrote: > > On Thu, Feb 24, 2022 at 08:38:32PM +0000, Derrick Stolee via GitGitGadget wrote: > >> From: Derrick Stolee <derrickstolee@github.com> > ... > >> diff --git a/commit-graph.c b/commit-graph.c > >> index a19bd96c2ee..8e52bb09552 100644 > >> --- a/commit-graph.c > >> +++ b/commit-graph.c > >> @@ -407,6 +407,9 @@ struct commit_graph *parse_commit_graph(struct repository *r, > >> &graph->chunk_generation_data); > >> pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW, > >> &graph->chunk_generation_data_overflow); > >> + > >> + if (graph->chunk_generation_data) > >> + graph->read_generation_data = 1; > >> } > >> > >> if (r->settings.commit_graph_read_changed_paths) { > > > > I wanted to test your changes because they seem quite exciting in the > > context of my work as well, but this commit seems to uncover a bug with > > how we handle overflows. I originally triggered the bug when trying to > > do a mirror-fetch, but as it turns it seems to trigger now whenever the > > commit-graph is being read: > > > > $ git commit-graph verify > > fatal: commit-graph requires overflow generation data but has none > > > > $ git commit-graph write --split > > Finding commits for commit graph among packed objects: 100% (10235119/10235119), done. > > fatal: commit-graph requires overflow generation data but has none > > > > $ git commit-graph write --split=replace > > Finding commits for commit graph among packed objects: 100% (10235119/10235119), done. > > fatal: commit-graph requires overflow generation data but has none > > > > I initially assumed this may be a bug with how we previously wrote the > > commit-graph, but removing all chains still reliably triggers it: > > > > $ rm -f objects/info/commit-graphs/* > > $ git commit-graph write --split > > Finding commits for commit graph among packed objects: 100% (10235119/10235119), done. > > fatal: commit-graph requires overflow generation data but has none > > > > I haven't yet found the time to dig deeper into why this is happening. > > While the repository is publicly accessible at [1], unfortunately the > > bug seems to be triggered by a commit that's only kept alive by an > > internal reference. > > > > Patrick > > > > [1]: https://gitlab.com/gitlab-com/www-gitlab-com.git > > Thanks for including this information. Just to be clear: did you > include patch 4 in your tests, or not? Patch 4 includes a fix > related to overflow values, so it would be helpful to know if you > found a _different_ bug or if it is the same one. > > Thanks, > -Stolee I initially only applied the first three patches, but after having hit the fatal error I also applied the rest of this series to have a look at whether it is indeed fixed already by one of your later patches. The error remains the same though. Patrick
On 2/28/2022 11:59 AM, Patrick Steinhardt wrote: > On Mon, Feb 28, 2022 at 11:23:38AM -0500, Derrick Stolee wrote: >> On 2/28/2022 10:18 AM, Patrick Steinhardt wrote: >>> I haven't yet found the time to dig deeper into why this is happening. >>> While the repository is publicly accessible at [1], unfortunately the >>> bug seems to be triggered by a commit that's only kept alive by an >>> internal reference. >>> >>> Patrick >>> >>> [1]: https://gitlab.com/gitlab-com/www-gitlab-com.git >> >> Thanks for including this information. Just to be clear: did you >> include patch 4 in your tests, or not? Patch 4 includes a fix >> related to overflow values, so it would be helpful to know if you >> found a _different_ bug or if it is the same one. >> >> Thanks, >> -Stolee > > I initially only applied the first three patches, but after having hit > the fatal error I also applied the rest of this series to have a look at > whether it is indeed fixed already by one of your later patches. The > error remains the same though. Thanks for this extra context. Is this a commit-graph that you wrote with the first three patches and then you get an error when reading it? Do you get the same error when deleting that file and rewriting it with all patches included? Thanks, -Stolee
On Mon, Feb 28, 2022 at 01:44:01PM -0500, Derrick Stolee wrote: > On 2/28/2022 11:59 AM, Patrick Steinhardt wrote: > > On Mon, Feb 28, 2022 at 11:23:38AM -0500, Derrick Stolee wrote: > >> On 2/28/2022 10:18 AM, Patrick Steinhardt wrote: > >>> I haven't yet found the time to dig deeper into why this is happening. > >>> While the repository is publicly accessible at [1], unfortunately the > >>> bug seems to be triggered by a commit that's only kept alive by an > >>> internal reference. > >>> > >>> Patrick > >>> > >>> [1]: https://gitlab.com/gitlab-com/www-gitlab-com.git > >> > >> Thanks for including this information. Just to be clear: did you > >> include patch 4 in your tests, or not? Patch 4 includes a fix > >> related to overflow values, so it would be helpful to know if you > >> found a _different_ bug or if it is the same one. > >> > >> Thanks, > >> -Stolee > > > > I initially only applied the first three patches, but after having hit > > the fatal error I also applied the rest of this series to have a look at > > whether it is indeed fixed already by one of your later patches. The > > error remains the same though. > > Thanks for this extra context. Is this a commit-graph that you wrote > with the first three patches and then you get an error when reading it? > > Do you get the same error when deleting that file and rewriting it with > all patches included? > > Thanks, > -Stolee Yes, I do. I've applied all four patches from v2 on top of 715d08a9e5 (The eighth batch, 2022-02-25) and still get the same results: $ find objects/info/commit-graphs/ objects/info/commit-graphs/ objects/info/commit-graphs/graph-607e641165f3e83a82d5b14af4e611bf2a688f35.graph objects/info/commit-graphs/commit-graph-chain objects/info/commit-graphs/graph-5f357c7573c0075d42d82b28e660bc3eac01bfe8.graph objects/info/commit-graphs/graph-e0c12ead1b61c7c30720ae372e8a9f98d95dfb2d.graph objects/info/commit-graphs/graph-c96723b133c2d81106a01ecd7a8773bb2ef6c2e1.graph $ git commit-graph verify fatal: commit-graph requires overflow generation data but has none $ git commit-graph write Finding commits for commit graph among packed objects: 100% (10235119/10235119), done. Expanding reachable commits in commit graph: 2197197, done. Finding extra edges in commit graph: 100% (2197197/2197197), done. fatal: commit-graph requires overflow generation data but has none $ rm -rf objects/info/commit-graphs/ $ git commit-graph write Finding commits for commit graph among packed objects: 100% (10235119/10235119), done. Expanding reachable commits in commit graph: 2197197, done. Finding extra edges in commit graph: 100% (2197197/2197197), done. fatal: commit-graph requires overflow generation data but has none) So even generating them completely anew doesn't seem to generate the overflow generation data. Patrick
On Tue, Mar 01, 2022 at 10:46:14AM +0100, Patrick Steinhardt wrote: > On Mon, Feb 28, 2022 at 01:44:01PM -0500, Derrick Stolee wrote: > > On 2/28/2022 11:59 AM, Patrick Steinhardt wrote: > > > On Mon, Feb 28, 2022 at 11:23:38AM -0500, Derrick Stolee wrote: > > >> On 2/28/2022 10:18 AM, Patrick Steinhardt wrote: > > >>> I haven't yet found the time to dig deeper into why this is happening. > > >>> While the repository is publicly accessible at [1], unfortunately the > > >>> bug seems to be triggered by a commit that's only kept alive by an > > >>> internal reference. > > >>> > > >>> Patrick > > >>> > > >>> [1]: https://gitlab.com/gitlab-com/www-gitlab-com.git > > >> > > >> Thanks for including this information. Just to be clear: did you > > >> include patch 4 in your tests, or not? Patch 4 includes a fix > > >> related to overflow values, so it would be helpful to know if you > > >> found a _different_ bug or if it is the same one. > > >> > > >> Thanks, > > >> -Stolee > > > > > > I initially only applied the first three patches, but after having hit > > > the fatal error I also applied the rest of this series to have a look at > > > whether it is indeed fixed already by one of your later patches. The > > > error remains the same though. > > > > Thanks for this extra context. Is this a commit-graph that you wrote > > with the first three patches and then you get an error when reading it? > > > > Do you get the same error when deleting that file and rewriting it with > > all patches included? > > > > Thanks, > > -Stolee > > Yes, I do. I've applied all four patches from v2 on top of 715d08a9e5 > (The eighth batch, 2022-02-25) and still get the same results: > > $ find objects/info/commit-graphs/ > objects/info/commit-graphs/ > objects/info/commit-graphs/graph-607e641165f3e83a82d5b14af4e611bf2a688f35.graph > objects/info/commit-graphs/commit-graph-chain > objects/info/commit-graphs/graph-5f357c7573c0075d42d82b28e660bc3eac01bfe8.graph > objects/info/commit-graphs/graph-e0c12ead1b61c7c30720ae372e8a9f98d95dfb2d.graph > objects/info/commit-graphs/graph-c96723b133c2d81106a01ecd7a8773bb2ef6c2e1.graph > > $ git commit-graph verify > fatal: commit-graph requires overflow generation data but has none > > $ git commit-graph write > Finding commits for commit graph among packed objects: 100% (10235119/10235119), done. > Expanding reachable commits in commit graph: 2197197, done. > Finding extra edges in commit graph: 100% (2197197/2197197), done. > fatal: commit-graph requires overflow generation data but has none > > $ rm -rf objects/info/commit-graphs/ > > $ git commit-graph write > Finding commits for commit graph among packed objects: 100% (10235119/10235119), done. > Expanding reachable commits in commit graph: 2197197, done. > Finding extra edges in commit graph: 100% (2197197/2197197), done. > fatal: commit-graph requires overflow generation data but has none) > > So even generating them completely anew doesn't seem to generate the > overflow generation data. > > Patrick I stand corrected. I forgot that the repository at hand was connected to another one via `objects/info/alternates`. If I prune commit-graphs from that alternate, too, then it works alright with your patches. This makes me wonder how such a bugfix should be handled though. As this series is right now, users will be faced with repository corruption as soon as they upgrade their Git version to one that contains this patch series. This corruption needs manual action: they have to go into the repository, delete the commit-graphs and then optionally create new ones. This is not a good user experience, and it's worse on the server-side where we now have a timeframe where all commit-graphs are potentially corrupt. This effectively leads to us being unable to serve those repos at all until we have rewritten the commit-graphs because all commands which make use of the commit-graph will now die: $ git log fatal: commit-graph requires overflow generation data but has none So the question is whether this is a change that needs to be rolled out over multiple releases. First we'd get in the bug fix such that we write correct commit-graphs, and after this fix has been released we can also release the fix that starts to actually parse the generation. This ensures there's a grace period during which we can hopefully correct the data on-disk such that users are not faced with failures. The better alternative would probably be to just gracefully handle commit-graphs which are corrupted in such a way. Can we maybe just continue to not parse generations in case we find that the commit-graph doesn't have overflow generation data? This is more of a general issue though: commit-graphs are an auxiliary cache that is not required for proper operation at all. If we fail to parse it, then Git shouldn't die but instead fail gracefully just ignore it. Furthermore, if we notice that graphs are corrupt when we try to write new ones, we may just delete the corrupt versions automatically and generate completely new ones. Patrick
On 3/1/2022 5:35 AM, Patrick Steinhardt wrote: > On Tue, Mar 01, 2022 at 10:46:14AM +0100, Patrick Steinhardt wrote: >> On Mon, Feb 28, 2022 at 01:44:01PM -0500, Derrick Stolee wrote: >>> On 2/28/2022 11:59 AM, Patrick Steinhardt wrote: >>>> On Mon, Feb 28, 2022 at 11:23:38AM -0500, Derrick Stolee wrote: >>>>> On 2/28/2022 10:18 AM, Patrick Steinhardt wrote: >>>>>> I haven't yet found the time to dig deeper into why this is happening. >>>>>> While the repository is publicly accessible at [1], unfortunately the >>>>>> bug seems to be triggered by a commit that's only kept alive by an >>>>>> internal reference. >>>>>> >>>>>> Patrick >>>>>> >>>>>> [1]: https://gitlab.com/gitlab-com/www-gitlab-com.git >>>>> >>>>> Thanks for including this information. Just to be clear: did you >>>>> include patch 4 in your tests, or not? Patch 4 includes a fix >>>>> related to overflow values, so it would be helpful to know if you >>>>> found a _different_ bug or if it is the same one. >>>>> >>>>> Thanks, >>>>> -Stolee >>>> >>>> I initially only applied the first three patches, but after having hit >>>> the fatal error I also applied the rest of this series to have a look at >>>> whether it is indeed fixed already by one of your later patches. The >>>> error remains the same though. >>> >>> Thanks for this extra context. Is this a commit-graph that you wrote >>> with the first three patches and then you get an error when reading it? >>> >>> Do you get the same error when deleting that file and rewriting it with >>> all patches included? >>> >>> Thanks, >>> -Stolee >> >> Yes, I do. I've applied all four patches from v2 on top of 715d08a9e5 >> (The eighth batch, 2022-02-25) and still get the same results: >> >> $ find objects/info/commit-graphs/ >> objects/info/commit-graphs/ >> objects/info/commit-graphs/graph-607e641165f3e83a82d5b14af4e611bf2a688f35.graph >> objects/info/commit-graphs/commit-graph-chain >> objects/info/commit-graphs/graph-5f357c7573c0075d42d82b28e660bc3eac01bfe8.graph >> objects/info/commit-graphs/graph-e0c12ead1b61c7c30720ae372e8a9f98d95dfb2d.graph >> objects/info/commit-graphs/graph-c96723b133c2d81106a01ecd7a8773bb2ef6c2e1.graph >> >> $ git commit-graph verify >> fatal: commit-graph requires overflow generation data but has none >> >> $ git commit-graph write >> Finding commits for commit graph among packed objects: 100% (10235119/10235119), done. >> Expanding reachable commits in commit graph: 2197197, done. >> Finding extra edges in commit graph: 100% (2197197/2197197), done. >> fatal: commit-graph requires overflow generation data but has none >> >> $ rm -rf objects/info/commit-graphs/ >> >> $ git commit-graph write >> Finding commits for commit graph among packed objects: 100% (10235119/10235119), done. >> Expanding reachable commits in commit graph: 2197197, done. >> Finding extra edges in commit graph: 100% (2197197/2197197), done. >> fatal: commit-graph requires overflow generation data but has none) >> >> So even generating them completely anew doesn't seem to generate the >> overflow generation data. >> >> Patrick > > I stand corrected. I forgot that the repository at hand was connected to > another one via `objects/info/alternates`. If I prune commit-graphs from > that alternate, too, then it works alright with your patches. OK, thanks. That clarifies the situation. I ordered the patches such that the fix in patch 4 could be immediately testable, which is not the case without patch 3. However, it does leave this temporary state where information can be incorrect if only a subset of the series is applied. > This makes me wonder how such a bugfix should be handled though. As this > series is right now, users will be faced with repository corruption as > soon as they upgrade their Git version to one that contains this patch > series. This corruption needs manual action: they have to go into the > repository, delete the commit-graphs and then optionally create new > ones. > > This is not a good user experience, and it's worse on the server-side > where we now have a timeframe where all commit-graphs are potentially > corrupt. This effectively leads to us being unable to serve those repos > at all until we have rewritten the commit-graphs because all commands > which make use of the commit-graph will now die: > > $ git log > fatal: commit-graph requires overflow generation data but has none > > So the question is whether this is a change that needs to be rolled out > over multiple releases. First we'd get in the bug fix such that we write > correct commit-graphs, and after this fix has been released we can also > release the fix that starts to actually parse the generation. This > ensures there's a grace period during which we can hopefully correct the > data on-disk such that users are not faced with failures. You are right that we need to be careful here, but I also think that previous versions of Git always wrote the correct data. Here is my thought process: 1. To get this bug, we need to have parsed the corrected commit date from an existing commit-graph in order to under-count the number of overflow values. 2. Before this series, Git versions were not parsing the corrected commit date, so they recompute the corrected commit date every time the commit-graph is written, getting the proper count of overflow values. For these reasons, data written by previous versions of Git are correct and can be trusted without a staged release. Does this make sense? Or, do you experience a different result when you build commit-graphs with a released Git version and then when writing on top with all patches applied? > The better alternative would probably be to just gracefully handle > commit-graphs which are corrupted in such a way. Can we maybe just > continue to not parse generations in case we find that the commit-graph > doesn't have overflow generation data? > > This is more of a general issue though: commit-graphs are an auxiliary > cache that is not required for proper operation at all. If we fail to > parse it, then Git shouldn't die but instead fail gracefully just ignore > it. Furthermore, if we notice that graphs are corrupt when we try to > write new ones, we may just delete the corrupt versions automatically > and generate completely new ones. You are right that we can be better about failures here and report and error instead of a die(). Especially in this case, we could just revert to topological levels instead of throwing out the commit-graph entirely. This seems like something for another series, so we can be sure to audit all cases of fatal errors when parsing the commit-graph so we catch all of them and do the "best" thing in each case. Thanks, -Stolee
On Tue, Mar 01, 2022 at 09:06:44AM -0500, Derrick Stolee wrote: > On 3/1/2022 5:35 AM, Patrick Steinhardt wrote: > > On Tue, Mar 01, 2022 at 10:46:14AM +0100, Patrick Steinhardt wrote: > >> On Mon, Feb 28, 2022 at 01:44:01PM -0500, Derrick Stolee wrote: > >>> On 2/28/2022 11:59 AM, Patrick Steinhardt wrote: > >>>> On Mon, Feb 28, 2022 at 11:23:38AM -0500, Derrick Stolee wrote: > >>>>> On 2/28/2022 10:18 AM, Patrick Steinhardt wrote: > >>>>>> I haven't yet found the time to dig deeper into why this is happening. > >>>>>> While the repository is publicly accessible at [1], unfortunately the > >>>>>> bug seems to be triggered by a commit that's only kept alive by an > >>>>>> internal reference. > >>>>>> > >>>>>> Patrick > >>>>>> > >>>>>> [1]: https://gitlab.com/gitlab-com/www-gitlab-com.git > >>>>> > >>>>> Thanks for including this information. Just to be clear: did you > >>>>> include patch 4 in your tests, or not? Patch 4 includes a fix > >>>>> related to overflow values, so it would be helpful to know if you > >>>>> found a _different_ bug or if it is the same one. > >>>>> > >>>>> Thanks, > >>>>> -Stolee > >>>> > >>>> I initially only applied the first three patches, but after having hit > >>>> the fatal error I also applied the rest of this series to have a look at > >>>> whether it is indeed fixed already by one of your later patches. The > >>>> error remains the same though. > >>> > >>> Thanks for this extra context. Is this a commit-graph that you wrote > >>> with the first three patches and then you get an error when reading it? > >>> > >>> Do you get the same error when deleting that file and rewriting it with > >>> all patches included? > >>> > >>> Thanks, > >>> -Stolee > >> > >> Yes, I do. I've applied all four patches from v2 on top of 715d08a9e5 > >> (The eighth batch, 2022-02-25) and still get the same results: > >> > >> $ find objects/info/commit-graphs/ > >> objects/info/commit-graphs/ > >> objects/info/commit-graphs/graph-607e641165f3e83a82d5b14af4e611bf2a688f35.graph > >> objects/info/commit-graphs/commit-graph-chain > >> objects/info/commit-graphs/graph-5f357c7573c0075d42d82b28e660bc3eac01bfe8.graph > >> objects/info/commit-graphs/graph-e0c12ead1b61c7c30720ae372e8a9f98d95dfb2d.graph > >> objects/info/commit-graphs/graph-c96723b133c2d81106a01ecd7a8773bb2ef6c2e1.graph > >> > >> $ git commit-graph verify > >> fatal: commit-graph requires overflow generation data but has none > >> > >> $ git commit-graph write > >> Finding commits for commit graph among packed objects: 100% (10235119/10235119), done. > >> Expanding reachable commits in commit graph: 2197197, done. > >> Finding extra edges in commit graph: 100% (2197197/2197197), done. > >> fatal: commit-graph requires overflow generation data but has none > >> > >> $ rm -rf objects/info/commit-graphs/ > >> > >> $ git commit-graph write > >> Finding commits for commit graph among packed objects: 100% (10235119/10235119), done. > >> Expanding reachable commits in commit graph: 2197197, done. > >> Finding extra edges in commit graph: 100% (2197197/2197197), done. > >> fatal: commit-graph requires overflow generation data but has none) > >> > >> So even generating them completely anew doesn't seem to generate the > >> overflow generation data. > >> > >> Patrick > > > > I stand corrected. I forgot that the repository at hand was connected to > > another one via `objects/info/alternates`. If I prune commit-graphs from > > that alternate, too, then it works alright with your patches. > > OK, thanks. That clarifies the situation. > > I ordered the patches such that the fix in patch 4 could be immediately > testable, which is not the case without patch 3. However, it does leave > this temporary state where information can be incorrect if only a subset > of the series is applied. > > > This makes me wonder how such a bugfix should be handled though. As this > > series is right now, users will be faced with repository corruption as > > soon as they upgrade their Git version to one that contains this patch > > series. This corruption needs manual action: they have to go into the > > repository, delete the commit-graphs and then optionally create new > > ones. > > > > This is not a good user experience, and it's worse on the server-side > > where we now have a timeframe where all commit-graphs are potentially > > corrupt. This effectively leads to us being unable to serve those repos > > at all until we have rewritten the commit-graphs because all commands > > which make use of the commit-graph will now die: > > > > $ git log > > fatal: commit-graph requires overflow generation data but has none > > > > So the question is whether this is a change that needs to be rolled out > > over multiple releases. First we'd get in the bug fix such that we write > > correct commit-graphs, and after this fix has been released we can also > > release the fix that starts to actually parse the generation. This > > ensures there's a grace period during which we can hopefully correct the > > data on-disk such that users are not faced with failures. > > You are right that we need to be careful here, but I also think that > previous versions of Git always wrote the correct data. Here is my > thought process: > > 1. To get this bug, we need to have parsed the corrected commit date > from an existing commit-graph in order to under-count the number > of overflow values. > > 2. Before this series, Git versions were not parsing the corrected > commit date, so they recompute the corrected commit date every > time the commit-graph is written, getting the proper count of > overflow values. > > For these reasons, data written by previous versions of Git are > correct and can be trusted without a staged release. > > Does this make sense? Or, do you experience a different result when > you build commit-graphs with a released Git version and then when > writing on top with all patches applied? Just to verify my understanding: you claim that the bug I was hitting shouldn't be encountered in the wild when the release , but only if one were to write a commit-graph with the intermediate stafe until patch 3/4 of your patch series? Hum. I have re-verified, and this indeed seems to play out. So I must've accidentally ran all my testing with the state generated without the final patch which fixes the corruption. I do see lots of the following warnings, but overall I can verify and write the commit-graph just fine: commit-graph generation for commit c80a42de8803e2d77818d0c82f88e748d7f9425f is 1623362063 < 1623362139 Thanks for your patience, and sorry for the noise :) > > The better alternative would probably be to just gracefully handle > > commit-graphs which are corrupted in such a way. Can we maybe just > > continue to not parse generations in case we find that the commit-graph > > doesn't have overflow generation data? > > > > This is more of a general issue though: commit-graphs are an auxiliary > > cache that is not required for proper operation at all. If we fail to > > parse it, then Git shouldn't die but instead fail gracefully just ignore > > it. Furthermore, if we notice that graphs are corrupt when we try to > > write new ones, we may just delete the corrupt versions automatically > > and generate completely new ones. > > You are right that we can be better about failures here and report > and error instead of a die(). Especially in this case, we could just > revert to topological levels instead of throwing out the commit-graph > entirely. > > This seems like something for another series, so we can be sure to > audit all cases of fatal errors when parsing the commit-graph so we > catch all of them and do the "best" thing in each case. I agree. Patrick
On 3/1/2022 9:53 AM, Patrick Steinhardt wrote: > On Tue, Mar 01, 2022 at 09:06:44AM -0500, Derrick Stolee wrote: >> On 3/1/2022 5:35 AM, Patrick Steinhardt wrote: >>> On Tue, Mar 01, 2022 at 10:46:14AM +0100, Patrick Steinhardt wrote: >>>> On Mon, Feb 28, 2022 at 01:44:01PM -0500, Derrick Stolee wrote: >>>>> On 2/28/2022 11:59 AM, Patrick Steinhardt wrote: >>>>>> On Mon, Feb 28, 2022 at 11:23:38AM -0500, Derrick Stolee wrote: >>>>>>> On 2/28/2022 10:18 AM, Patrick Steinhardt wrote: >>>>>>>> [1]: https://gitlab.com/gitlab-com/www-gitlab-com.git ... >>> So the question is whether this is a change that needs to be rolled out >>> over multiple releases. First we'd get in the bug fix such that we write >>> correct commit-graphs, and after this fix has been released we can also >>> release the fix that starts to actually parse the generation. This >>> ensures there's a grace period during which we can hopefully correct the >>> data on-disk such that users are not faced with failures. >> >> You are right that we need to be careful here, but I also think that >> previous versions of Git always wrote the correct data. Here is my >> thought process: >> >> 1. To get this bug, we need to have parsed the corrected commit date >> from an existing commit-graph in order to under-count the number >> of overflow values. >> >> 2. Before this series, Git versions were not parsing the corrected >> commit date, so they recompute the corrected commit date every >> time the commit-graph is written, getting the proper count of >> overflow values. >> >> For these reasons, data written by previous versions of Git are >> correct and can be trusted without a staged release. >> >> Does this make sense? Or, do you experience a different result when >> you build commit-graphs with a released Git version and then when >> writing on top with all patches applied? > > Just to verify my understanding: you claim that the bug I was hitting > shouldn't be encountered in the wild when the release , but > only if one were to write a commit-graph with the intermediate stafe > until patch 3/4 of your patch series? That is my claim. And my testing of the repo at [1] has demonstrated that it works correctly in these cases. > Hum. I have re-verified, and this indeed seems to play out. So I must've > accidentally ran all my testing with the state generated without the > final patch which fixes the corruption. I do see lots of the following > warnings, but overall I can verify and write the commit-graph just fine: > > commit-graph generation for commit c80a42de8803e2d77818d0c82f88e748d7f9425f is 1623362063 < 1623362139 But I'm not able to generate these warnings from either version. I tried generating different levels of a split commit-graph, but could not reproduce it. If you have reproduction steps using current 'master' (or any released Git version) and the four patches here, then I would love to get a full understanding of your errors. Thanks, -Stolee
On Tue, Mar 01, 2022 at 10:25:46AM -0500, Derrick Stolee wrote: > On 3/1/2022 9:53 AM, Patrick Steinhardt wrote: > > On Tue, Mar 01, 2022 at 09:06:44AM -0500, Derrick Stolee wrote: > >> On 3/1/2022 5:35 AM, Patrick Steinhardt wrote: > >>> On Tue, Mar 01, 2022 at 10:46:14AM +0100, Patrick Steinhardt wrote: > >>>> On Mon, Feb 28, 2022 at 01:44:01PM -0500, Derrick Stolee wrote: > >>>>> On 2/28/2022 11:59 AM, Patrick Steinhardt wrote: > >>>>>> On Mon, Feb 28, 2022 at 11:23:38AM -0500, Derrick Stolee wrote: > >>>>>>> On 2/28/2022 10:18 AM, Patrick Steinhardt wrote: > >>>>>>>> [1]: https://gitlab.com/gitlab-com/www-gitlab-com.git > ... > >>> So the question is whether this is a change that needs to be rolled out > >>> over multiple releases. First we'd get in the bug fix such that we write > >>> correct commit-graphs, and after this fix has been released we can also > >>> release the fix that starts to actually parse the generation. This > >>> ensures there's a grace period during which we can hopefully correct the > >>> data on-disk such that users are not faced with failures. > >> > >> You are right that we need to be careful here, but I also think that > >> previous versions of Git always wrote the correct data. Here is my > >> thought process: > >> > >> 1. To get this bug, we need to have parsed the corrected commit date > >> from an existing commit-graph in order to under-count the number > >> of overflow values. > >> > >> 2. Before this series, Git versions were not parsing the corrected > >> commit date, so they recompute the corrected commit date every > >> time the commit-graph is written, getting the proper count of > >> overflow values. > >> > >> For these reasons, data written by previous versions of Git are > >> correct and can be trusted without a staged release. > >> > >> Does this make sense? Or, do you experience a different result when > >> you build commit-graphs with a released Git version and then when > >> writing on top with all patches applied? > > > > Just to verify my understanding: you claim that the bug I was hitting > > shouldn't be encountered in the wild when the release , but > > only if one were to write a commit-graph with the intermediate stafe > > until patch 3/4 of your patch series? > > That is my claim. And my testing of the repo at [1] has demonstrated > that it works correctly in these cases. > > > Hum. I have re-verified, and this indeed seems to play out. So I must've > > accidentally ran all my testing with the state generated without the > > final patch which fixes the corruption. I do see lots of the following > > warnings, but overall I can verify and write the commit-graph just fine: > > > > commit-graph generation for commit c80a42de8803e2d77818d0c82f88e748d7f9425f is 1623362063 < 1623362139 > > But I'm not able to generate these warnings from either version. I > tried generating different levels of a split commit-graph, but > could not reproduce it. If you have reproduction steps using current > 'master' (or any released Git version) and the four patches here, > then I would love to get a full understanding of your errors. > > Thanks, > -Stolee I haven't yet been able to reproduce it with publicly available data, but with the internal references I'm able to evoke the warnings reliably. It only works when I have two repositories connected via alternates, when generating the commit-graph in the linked-to repo first, and then generating the commit-graph in the linking repo. The following recipe allows me to reproduce, but rely on private data: $ git --version git version 2.35.1 # The pool repository is the one we're linked to from the fork. $ cd "$pool" $ rm -rf objects/info/commit-graph objects/info/commit-graph $ git commit-graph write --split $ cd "$fork" $ rm -rf objects/info/commit-graph objects/info/commit-graph $ git commit-graph write --split $ git commit-graph verify --no-progress $ echo $? 0 # This is 715d08a9e51251ad8290b181b6ac3b9e1f9719d7 with your full v2 # applied on top. $ ~/Development/git/bin-wrappers/git --version git version 2.35.1.358.g7ede1bea24 $ ~/Development/git/bin-wrappers/git commit-graph verify --no-progress commit-graph generation for commit 06a91bac00ed11128becd48d5ae77eacd8f24c97 is 1623273624 < 1623273710 commit-graph generation for commit 0ae91029f27238e8f8e109c6bb3907f864dda14f is 1622151146 < 1622151220 commit-graph generation for commit 0d4582a33d8c8e3eb01adbf564f5e1deeb3b56a2 is 1631045222 < 1631045225 commit-graph generation for commit 0daf8976439d7e0bb9710c5ee63b570580e0dc03 is 1620347739 < 1620347789 commit-graph generation for commit 0e0ee8ffb3fa22cee7d28e21cbd6df26454932cf is 1623783297 < 1623783380 commit-graph generation for commit 0f08ab3de6ec115ea8a956a1996cb9759e640e74 is 1621543278 < 1621543339 commit-graph generation for commit 133ed0319b5a66ae0c2be76e5a887b880452b111 is 1620949864 < 1620949915 commit-graph generation for commit 1341b3e6c63343ae94a8a473fa057126ddd4669a is 1637344364 < 1637344384 commit-graph generation for commit 15bdfc501c2c9f23e9353bf6e6a5facd9c32a07a is 1623348103 < 1623348133 ... $ echo $? 1 When generating commit-graphs with your patches applied the `verify` step works alright. I've also by accident stumbled over the original error again: fatal: commit-graph requires overflow generation data but has none This time it's definitely not caused by generating commit-graphs with an in-between state of your patch series because the data comes straight from production with no changes to the commit-graphs performed by myself. There we're running Git v2.33.1 with a couple of backported patches (see [1]). While those patches cause us to make more use of the commit-graph, none modify the way we generate them. Of note is that the commit-graph contains references to commits which don't exist in the ODB anymore. Patrick [1]: https://gitlab.com/gitlab-org/gitlab-git/-/commits/pks-v2.33.1.gl3
On 3/2/2022 8:57 AM, Patrick Steinhardt wrote: > On Tue, Mar 01, 2022 at 10:25:46AM -0500, Derrick Stolee wrote: >> On 3/1/2022 9:53 AM, Patrick Steinhardt wrote: >>> Hum. I have re-verified, and this indeed seems to play out. So I must've >>> accidentally ran all my testing with the state generated without the >>> final patch which fixes the corruption. I do see lots of the following >>> warnings, but overall I can verify and write the commit-graph just fine: >>> >>> commit-graph generation for commit c80a42de8803e2d77818d0c82f88e748d7f9425f is 1623362063 < 1623362139 >> >> But I'm not able to generate these warnings from either version. I >> tried generating different levels of a split commit-graph, but >> could not reproduce it. If you have reproduction steps using current >> 'master' (or any released Git version) and the four patches here, >> then I would love to get a full understanding of your errors. >> >> Thanks, >> -Stolee > > I haven't yet been able to reproduce it with publicly available data, > but with the internal references I'm able to evoke the warnings > reliably. It only works when I have two repositories connected via > alternates, when generating the commit-graph in the linked-to repo > first, and then generating the commit-graph in the linking repo. > > The following recipe allows me to reproduce, but rely on private data: > > $ git --version > git version 2.35.1 > > # The pool repository is the one we're linked to from the fork. > $ cd "$pool" > $ rm -rf objects/info/commit-graph objects/info/commit-graph > $ git commit-graph write --split > > $ cd "$fork" > $ rm -rf objects/info/commit-graph objects/info/commit-graph > $ git commit-graph write --split > > $ git commit-graph verify --no-progress > $ echo $? > 0 > > # This is 715d08a9e51251ad8290b181b6ac3b9e1f9719d7 with your full v2 > # applied on top. > $ ~/Development/git/bin-wrappers/git --version > git version 2.35.1.358.g7ede1bea24 > > $ ~/Development/git/bin-wrappers/git commit-graph verify --no-progress > commit-graph generation for commit 06a91bac00ed11128becd48d5ae77eacd8f24c97 is 1623273624 < 1623273710 > commit-graph generation for commit 0ae91029f27238e8f8e109c6bb3907f864dda14f is 1622151146 < 1622151220 > commit-graph generation for commit 0d4582a33d8c8e3eb01adbf564f5e1deeb3b56a2 is 1631045222 < 1631045225 > commit-graph generation for commit 0daf8976439d7e0bb9710c5ee63b570580e0dc03 is 1620347739 < 1620347789 > commit-graph generation for commit 0e0ee8ffb3fa22cee7d28e21cbd6df26454932cf is 1623783297 < 1623783380 > commit-graph generation for commit 0f08ab3de6ec115ea8a956a1996cb9759e640e74 is 1621543278 < 1621543339 > commit-graph generation for commit 133ed0319b5a66ae0c2be76e5a887b880452b111 is 1620949864 < 1620949915 > commit-graph generation for commit 1341b3e6c63343ae94a8a473fa057126ddd4669a is 1637344364 < 1637344384 > commit-graph generation for commit 15bdfc501c2c9f23e9353bf6e6a5facd9c32a07a is 1623348103 < 1623348133 > ... > $ echo $? > 1 > > When generating commit-graphs with your patches applied the `verify` > step works alright. > > I've also by accident stumbled over the original error again: > > fatal: commit-graph requires overflow generation data but has none > > This time it's definitely not caused by generating commit-graphs with an > in-between state of your patch series because the data comes straight > from production with no changes to the commit-graphs performed by > myself. There we're running Git v2.33.1 with a couple of backported > patches (see [1]). While those patches cause us to make more use of the > commit-graph, none modify the way we generate them. > > Of note is that the commit-graph contains references to commits which > don't exist in the ODB anymore. > > Patrick > > [1]: https://gitlab.com/gitlab-org/gitlab-git/-/commits/pks-v2.33.1.gl3 Thank you for your diligence here, Patrick. I really appreciate the work you're putting in to verify the situation. Since our repro relies on private information, but is consistent, I wonder if we should take the patch below, which starts to ignore the older generation number v2 data and only writes freshly-computed numbers. Thanks, -Stolee --- 8< --- From c53d8bd52bbcab3862e8a826ee75692edc7e4173 Mon Sep 17 00:00:00 2001 From: Derrick Stolee <derrickstolee@github.com> Date: Wed, 2 Mar 2022 09:45:13 -0500 Subject: [PATCH v3 5/4] commit-graph: declare bankruptcy on GDAT chunks The Generation Data (GDAT) and Generation Data Overflow (GDOV) chunks store corrected commit date offsets, used for generation number v2. Recent changes have demonstrated that previous versions of Git were incorrectly parsing data from these chunks, but might have also been writing them incorrectly. I asserted [1] that the previous fixes were sufficient because the known reasons for incorrectly writing generation number v2 data relied on parsing the information incorrectly out of a commit-graph file, but the previous versions of Git were not reading the generation number v2 data. However, Patrick demonstrated [2] a case where in split commit-graphs across an alternate boundary (and possibly some other special conditions) it was possible to have a commit-graph that was generated by a previous version of Git have incorrect generation number v2 data which results in errors like the following: commit-graph generation for commit <oid> is 1623273624 < 1623273710 [1] https://lore.kernel.org/git/f50e74f0-9ffa-f4f2-4663-269801495ed3@github.com/ [2] https://lore.kernel.org/git/Yh93vOkt2DkrGPh2@ncase/ Clearly, there is something else going on. The situation is not completely understood, but the errors do not reproduce if the commit-graphs are all generated by a Git version including these recent fixes. If we cannot trust the existing data in the GDAT and GDOV chunks, then we can alter the format to change the chunk IDs for these chunks. This causes the new version of Git to silently ignore the older chunks (and disabling generation number v2 in the process) while writing new commit-graph files with correct data in the GDA2 and GDO2 chunks. Update commit-graph-format.txt including a historical note about these deprecated chunks. Reported-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- Documentation/technical/commit-graph-format.txt | 12 ++++++++++-- commit-graph.c | 4 ++-- 2 files changed, 12 insertions(+), 4 deletions(-) diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt index 87971c27dd7..484b185ba98 100644 --- a/Documentation/technical/commit-graph-format.txt +++ b/Documentation/technical/commit-graph-format.txt @@ -93,7 +93,7 @@ CHUNK DATA: 2 bits of the lowest byte, storing the 33rd and 34th bit of the commit time. - Generation Data (ID: {'G', 'D', 'A', 'T' }) (N * 4 bytes) [Optional] + Generation Data (ID: {'G', 'D', 'A', '2' }) (N * 4 bytes) [Optional] * This list of 4-byte values store corrected commit date offsets for the commits, arranged in the same order as commit data chunk. * If the corrected commit date offset cannot be stored within 31 bits, @@ -104,7 +104,7 @@ CHUNK DATA: by compatible versions of Git and in case of split commit-graph chains, the topmost layer also has Generation Data chunk. - Generation Data Overflow (ID: {'G', 'D', 'O', 'V' }) [Optional] + Generation Data Overflow (ID: {'G', 'D', 'O', '2' }) [Optional] * This list of 8-byte values stores the corrected commit date offsets for commits with corrected commit date offsets that cannot be stored within 31 bits. @@ -156,3 +156,11 @@ CHUNK DATA: TRAILER: H-byte HASH-checksum of all of the above. + +== Historical Notes: + +The Generation Data (GDA2) and Generation Data Overflow (GDO2) chunks have +the number '2' in their chunk IDs because a previous version of Git wrote +possibly erroneous data in these chunks with the IDs "GDAT" and "GDOV". By +changing the IDs, newer versions of Git will silently ignore those older +chunks and write the new information without trusting the incorrect data. diff --git a/commit-graph.c b/commit-graph.c index b86a6a634fe..fb2ced0bd6d 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -39,8 +39,8 @@ void git_test_write_commit_graph_or_die(void) #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */ #define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */ #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */ -#define GRAPH_CHUNKID_GENERATION_DATA 0x47444154 /* "GDAT" */ -#define GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW 0x47444f56 /* "GDOV" */ +#define GRAPH_CHUNKID_GENERATION_DATA 0x47444132 /* "GDA2" */ +#define GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW 0x47444f32 /* "GDO2" */ #define GRAPH_CHUNKID_EXTRAEDGES 0x45444745 /* "EDGE" */ #define GRAPH_CHUNKID_BLOOMINDEXES 0x42494458 /* "BIDX" */ #define GRAPH_CHUNKID_BLOOMDATA 0x42444154 /* "BDAT" */
Derrick Stolee <derrickstolee@github.com> writes: > Since our repro relies on private information, but is consistent, I > wonder if we should take the patch below, which starts to ignore the > older generation number v2 data and only writes freshly-computed > numbers. ;-) > Clearly, there is something else going on. The situation is not > completely understood, but the errors do not reproduce if the > commit-graphs are all generated by a Git version including these recent > fixes. Do you mean "we know doing X and then Y and then Z on this particular private data with older version of Git without those two fixes will lead to a broken timestamp, but doing exactly the same with the two fixes, the breakage does not reproduce"? If so, that is quite encouraging news. Thanks for working well together. > If we cannot trust the existing data in the GDAT and GDOV chunks, then > we can alter the format to change the chunk IDs for these chunks. This > causes the new version of Git to silently ignore the older chunks (and > disabling generation number v2 in the process) while writing new > commit-graph files with correct data in the GDA2 and GDO2 chunks. > > Update commit-graph-format.txt including a historical note about these > deprecated chunks. Sensible. > @@ -156,3 +156,11 @@ CHUNK DATA: > TRAILER: > > H-byte HASH-checksum of all of the above. > + > +== Historical Notes: > + > +The Generation Data (GDA2) and Generation Data Overflow (GDO2) chunks have > +the number '2' in their chunk IDs because a previous version of Git wrote > +possibly erroneous data in these chunks with the IDs "GDAT" and "GDOV". By > +changing the IDs, newer versions of Git will silently ignore those older > +chunks and write the new information without trusting the incorrect data. Good. How does a new version of Git skip and ignore GDAT and GDOV in existing files? By not having any code to recognize what they are? I am wondering if there is some notion of "if you do not understand what this chunk is, you are incapable of handling this file correctly, so do not use it" kind of bit per chunks (similar to the index extensions where ones that begin with [A-Z] are optional) that may negatively affect this plan. Thanks. > diff --git a/commit-graph.c b/commit-graph.c > index b86a6a634fe..fb2ced0bd6d 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -39,8 +39,8 @@ void git_test_write_commit_graph_or_die(void) > #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */ > #define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */ > #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */ > -#define GRAPH_CHUNKID_GENERATION_DATA 0x47444154 /* "GDAT" */ > -#define GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW 0x47444f56 /* "GDOV" */ > +#define GRAPH_CHUNKID_GENERATION_DATA 0x47444132 /* "GDA2" */ > +#define GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW 0x47444f32 /* "GDO2" */ > #define GRAPH_CHUNKID_EXTRAEDGES 0x45444745 /* "EDGE" */ > #define GRAPH_CHUNKID_BLOOMINDEXES 0x42494458 /* "BIDX" */ > #define GRAPH_CHUNKID_BLOOMDATA 0x42444154 /* "BDAT" */
On 3/2/2022 1:15 PM, Junio C Hamano wrote: > Derrick Stolee <derrickstolee@github.com> writes: > >> Since our repro relies on private information, but is consistent, I >> wonder if we should take the patch below, which starts to ignore the >> older generation number v2 data and only writes freshly-computed >> numbers. > > ;-) > >> Clearly, there is something else going on. The situation is not >> completely understood, but the errors do not reproduce if the >> commit-graphs are all generated by a Git version including these recent >> fixes. > > Do you mean "we know doing X and then Y and then Z on this > particular private data with older version of Git without those two > fixes will lead to a broken timestamp, but doing exactly the same > with the two fixes, the breakage does not reproduce"? If so, that > is quite encouraging news. Thanks for working well together. Yes, that is my understanding. >> If we cannot trust the existing data in the GDAT and GDOV chunks, then >> we can alter the format to change the chunk IDs for these chunks. This >> causes the new version of Git to silently ignore the older chunks (and >> disabling generation number v2 in the process) while writing new >> commit-graph files with correct data in the GDA2 and GDO2 chunks. >> >> Update commit-graph-format.txt including a historical note about these >> deprecated chunks. > > Sensible. > >> @@ -156,3 +156,11 @@ CHUNK DATA: >> TRAILER: >> >> H-byte HASH-checksum of all of the above. >> + >> +== Historical Notes: >> + >> +The Generation Data (GDA2) and Generation Data Overflow (GDO2) chunks have >> +the number '2' in their chunk IDs because a previous version of Git wrote >> +possibly erroneous data in these chunks with the IDs "GDAT" and "GDOV". By >> +changing the IDs, newer versions of Git will silently ignore those older >> +chunks and write the new information without trusting the incorrect data. > > Good. How does a new version of Git skip and ignore GDAT and GDOV > in existing files? By not having any code to recognize what they > are? > > I am wondering if there is some notion of "if you do not understand > what this chunk is, you are incapable of handling this file > correctly, so do not use it" kind of bit per chunks (similar to the > index extensions where ones that begin with [A-Z] are optional) that > may negatively affect this plan. The chunk IDs do not have this special casing rule. This is a bit unfortunate for certain cases like adding something that _must_ be understood. Here, it works to our benefit that GDAT and GDOV are optional and can be safely ignored. Thus, clients with this patch will ignore GDAT and GDOV and continue using topological levels form the CDAT chunk. Older clients without this patch will ignore the new GDA2 and GDO2 chunks and continue using topological levels. For Git versions without this topic branch, this "continue using topological levels" means no change of behavior at all. Thanks, -Stolee
Derrick Stolee <derrickstolee@github.com> writes: >> I am wondering if there is some notion of "if you do not understand >> what this chunk is, you are incapable of handling this file >> correctly, so do not use it" kind of bit per chunks (similar to the >> index extensions where ones that begin with [A-Z] are optional) that >> may negatively affect this plan. > > The chunk IDs do not have this special casing rule. This is a > bit unfortunate for certain cases like adding something that _must_ > be understood. Here, it works to our benefit that GDAT and GDOV are > optional and can be safely ignored. Thus, clients with this patch > will ignore GDAT and GDOV and continue using topological levels > form the CDAT chunk. Older clients without this patch will ignore > the new GDA2 and GDO2 chunks and continue using topological levels. > > For Git versions without this topic branch, this "continue using > topological levels" means no change of behavior at all. Excellent. Thanks.
On Wed, Mar 02, 2022 at 09:57:17AM -0500, Derrick Stolee wrote: > On 3/2/2022 8:57 AM, Patrick Steinhardt wrote: > > On Tue, Mar 01, 2022 at 10:25:46AM -0500, Derrick Stolee wrote: > >> On 3/1/2022 9:53 AM, Patrick Steinhardt wrote: > > >>> Hum. I have re-verified, and this indeed seems to play out. So I must've > >>> accidentally ran all my testing with the state generated without the > >>> final patch which fixes the corruption. I do see lots of the following > >>> warnings, but overall I can verify and write the commit-graph just fine: > >>> > >>> commit-graph generation for commit c80a42de8803e2d77818d0c82f88e748d7f9425f is 1623362063 < 1623362139 > >> > >> But I'm not able to generate these warnings from either version. I > >> tried generating different levels of a split commit-graph, but > >> could not reproduce it. If you have reproduction steps using current > >> 'master' (or any released Git version) and the four patches here, > >> then I would love to get a full understanding of your errors. > >> > >> Thanks, > >> -Stolee > > > > I haven't yet been able to reproduce it with publicly available data, > > but with the internal references I'm able to evoke the warnings > > reliably. It only works when I have two repositories connected via > > alternates, when generating the commit-graph in the linked-to repo > > first, and then generating the commit-graph in the linking repo. > > > > The following recipe allows me to reproduce, but rely on private data: > > > > $ git --version > > git version 2.35.1 > > > > # The pool repository is the one we're linked to from the fork. > > $ cd "$pool" > > $ rm -rf objects/info/commit-graph objects/info/commit-graph > > $ git commit-graph write --split > > > > $ cd "$fork" > > $ rm -rf objects/info/commit-graph objects/info/commit-graph > > $ git commit-graph write --split > > > > $ git commit-graph verify --no-progress > > $ echo $? > > 0 > > > > # This is 715d08a9e51251ad8290b181b6ac3b9e1f9719d7 with your full v2 > > # applied on top. > > $ ~/Development/git/bin-wrappers/git --version > > git version 2.35.1.358.g7ede1bea24 > > > > $ ~/Development/git/bin-wrappers/git commit-graph verify --no-progress > > commit-graph generation for commit 06a91bac00ed11128becd48d5ae77eacd8f24c97 is 1623273624 < 1623273710 > > commit-graph generation for commit 0ae91029f27238e8f8e109c6bb3907f864dda14f is 1622151146 < 1622151220 > > commit-graph generation for commit 0d4582a33d8c8e3eb01adbf564f5e1deeb3b56a2 is 1631045222 < 1631045225 > > commit-graph generation for commit 0daf8976439d7e0bb9710c5ee63b570580e0dc03 is 1620347739 < 1620347789 > > commit-graph generation for commit 0e0ee8ffb3fa22cee7d28e21cbd6df26454932cf is 1623783297 < 1623783380 > > commit-graph generation for commit 0f08ab3de6ec115ea8a956a1996cb9759e640e74 is 1621543278 < 1621543339 > > commit-graph generation for commit 133ed0319b5a66ae0c2be76e5a887b880452b111 is 1620949864 < 1620949915 > > commit-graph generation for commit 1341b3e6c63343ae94a8a473fa057126ddd4669a is 1637344364 < 1637344384 > > commit-graph generation for commit 15bdfc501c2c9f23e9353bf6e6a5facd9c32a07a is 1623348103 < 1623348133 > > ... > > $ echo $? > > 1 > > > > When generating commit-graphs with your patches applied the `verify` > > step works alright. > > > > I've also by accident stumbled over the original error again: > > > > fatal: commit-graph requires overflow generation data but has none > > > > This time it's definitely not caused by generating commit-graphs with an > > in-between state of your patch series because the data comes straight > > from production with no changes to the commit-graphs performed by > > myself. There we're running Git v2.33.1 with a couple of backported > > patches (see [1]). While those patches cause us to make more use of the > > commit-graph, none modify the way we generate them. > > > > Of note is that the commit-graph contains references to commits which > > don't exist in the ODB anymore. > > > > Patrick > > > > [1]: https://gitlab.com/gitlab-org/gitlab-git/-/commits/pks-v2.33.1.gl3 > > Thank you for your diligence here, Patrick. I really appreciate the > work you're putting in to verify the situation. > > Since our repro relies on private information, but is consistent, I > wonder if we should take the patch below, which starts to ignore the > older generation number v2 data and only writes freshly-computed > numbers. > > Thanks, > -Stolee Thanks. With your patch below the `fatal:` error is gone, but I'm still seeing the same errors with regards to the commit-graph generations. So to summarize my findings: - This bug occurs when writing commit-graphs with v2.35.1, but reading them with your patches. - This bug occurs when I have two repositories connected via an alternates file. I haven't yet been able to reproduce it in a single repository that is not connected to a separate ODB. - This bug only occurs when I first generate the commit-graph in the repository I'm borrowing objects from. - This bug only occurs when I write commit-graphs with `--split` in both repositories. "Normal" commit-graphs don't have this issue, and neither can I see it with `--split=replace` or mixed-type commit-graphs. Beware, the following explanation is based on my very basic understanding of the commit-graph code and thus more likely to be wrong than right: With the old Git version, we've been mis-parsing the generation because `read_generation_data` wasn't ever set. As a result it can happen that the second split commit-graph we're generating computes its own generation numbers from the wrong starting point because it uses the mis-parsed generation numbers from the parent commit-graph. With your patches, we start to correctly account for overflows and would thus end up with a different value for the generation depending on where we parse the commit from: if we parse it from the first commit-graph it would be correct because it's contains the "root" of the generation numbers. But if we parse a commit from the second commit-graph we may have a mismatch because the generation numbers in there may have been derived from generation numbers mis-parsed from the first commit-graph. And because these would be wrong in case there was an overflow it is clear that the new corrected generation number may be wrong, as well. Patrick > --- 8< --- > > From c53d8bd52bbcab3862e8a826ee75692edc7e4173 Mon Sep 17 00:00:00 2001 > From: Derrick Stolee <derrickstolee@github.com> > Date: Wed, 2 Mar 2022 09:45:13 -0500 > Subject: [PATCH v3 5/4] commit-graph: declare bankruptcy on GDAT chunks > > The Generation Data (GDAT) and Generation Data Overflow (GDOV) chunks > store corrected commit date offsets, used for generation number v2. > Recent changes have demonstrated that previous versions of Git were > incorrectly parsing data from these chunks, but might have also been > writing them incorrectly. > > I asserted [1] that the previous fixes were sufficient because the known > reasons for incorrectly writing generation number v2 data relied on > parsing the information incorrectly out of a commit-graph file, but the > previous versions of Git were not reading the generation number v2 data. > > However, Patrick demonstrated [2] a case where in split commit-graphs > across an alternate boundary (and possibly some other special > conditions) it was possible to have a commit-graph that was generated by > a previous version of Git have incorrect generation number v2 data which > results in errors like the following: > > commit-graph generation for commit <oid> is 1623273624 < 1623273710 > > [1] https://lore.kernel.org/git/f50e74f0-9ffa-f4f2-4663-269801495ed3@github.com/ > [2] https://lore.kernel.org/git/Yh93vOkt2DkrGPh2@ncase/ > > Clearly, there is something else going on. The situation is not > completely understood, but the errors do not reproduce if the > commit-graphs are all generated by a Git version including these recent > fixes. > > If we cannot trust the existing data in the GDAT and GDOV chunks, then > we can alter the format to change the chunk IDs for these chunks. This > causes the new version of Git to silently ignore the older chunks (and > disabling generation number v2 in the process) while writing new > commit-graph files with correct data in the GDA2 and GDO2 chunks. > > Update commit-graph-format.txt including a historical note about these > deprecated chunks. > > Reported-by: Patrick Steinhardt <ps@pks.im> > Signed-off-by: Derrick Stolee <derrickstolee@github.com> > --- > Documentation/technical/commit-graph-format.txt | 12 ++++++++++-- > commit-graph.c | 4 ++-- > 2 files changed, 12 insertions(+), 4 deletions(-) > > diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt > index 87971c27dd7..484b185ba98 100644 > --- a/Documentation/technical/commit-graph-format.txt > +++ b/Documentation/technical/commit-graph-format.txt > @@ -93,7 +93,7 @@ CHUNK DATA: > 2 bits of the lowest byte, storing the 33rd and 34th bit of the > commit time. > > - Generation Data (ID: {'G', 'D', 'A', 'T' }) (N * 4 bytes) [Optional] > + Generation Data (ID: {'G', 'D', 'A', '2' }) (N * 4 bytes) [Optional] > * This list of 4-byte values store corrected commit date offsets for the > commits, arranged in the same order as commit data chunk. > * If the corrected commit date offset cannot be stored within 31 bits, > @@ -104,7 +104,7 @@ CHUNK DATA: > by compatible versions of Git and in case of split commit-graph chains, > the topmost layer also has Generation Data chunk. > > - Generation Data Overflow (ID: {'G', 'D', 'O', 'V' }) [Optional] > + Generation Data Overflow (ID: {'G', 'D', 'O', '2' }) [Optional] > * This list of 8-byte values stores the corrected commit date offsets > for commits with corrected commit date offsets that cannot be > stored within 31 bits. > @@ -156,3 +156,11 @@ CHUNK DATA: > TRAILER: > > H-byte HASH-checksum of all of the above. > + > +== Historical Notes: > + > +The Generation Data (GDA2) and Generation Data Overflow (GDO2) chunks have > +the number '2' in their chunk IDs because a previous version of Git wrote > +possibly erroneous data in these chunks with the IDs "GDAT" and "GDOV". By > +changing the IDs, newer versions of Git will silently ignore those older > +chunks and write the new information without trusting the incorrect data. > diff --git a/commit-graph.c b/commit-graph.c > index b86a6a634fe..fb2ced0bd6d 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -39,8 +39,8 @@ void git_test_write_commit_graph_or_die(void) > #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */ > #define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */ > #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */ > -#define GRAPH_CHUNKID_GENERATION_DATA 0x47444154 /* "GDAT" */ > -#define GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW 0x47444f56 /* "GDOV" */ > +#define GRAPH_CHUNKID_GENERATION_DATA 0x47444132 /* "GDA2" */ > +#define GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW 0x47444f32 /* "GDO2" */ > #define GRAPH_CHUNKID_EXTRAEDGES 0x45444745 /* "EDGE" */ > #define GRAPH_CHUNKID_BLOOMINDEXES 0x42494458 /* "BIDX" */ > #define GRAPH_CHUNKID_BLOOMDATA 0x42444154 /* "BDAT" */ > -- > 2.35.1.138.gfc5de29e9e6 > > > >
On 3/3/2022 6:19 AM, Patrick Steinhardt wrote: > On Wed, Mar 02, 2022 at 09:57:17AM -0500, Derrick Stolee wrote: >> On 3/2/2022 8:57 AM, Patrick Steinhardt wrote: >>> On Tue, Mar 01, 2022 at 10:25:46AM -0500, Derrick Stolee wrote: >>>> On 3/1/2022 9:53 AM, Patrick Steinhardt wrote: >> >>>>> Hum. I have re-verified, and this indeed seems to play out. So I must've >>>>> accidentally ran all my testing with the state generated without the >>>>> final patch which fixes the corruption. I do see lots of the following >>>>> warnings, but overall I can verify and write the commit-graph just fine: >>>>> >>>>> commit-graph generation for commit c80a42de8803e2d77818d0c82f88e748d7f9425f is 1623362063 < 1623362139 >>>> >>>> But I'm not able to generate these warnings from either version. I >>>> tried generating different levels of a split commit-graph, but >>>> could not reproduce it. If you have reproduction steps using current >>>> 'master' (or any released Git version) and the four patches here, >>>> then I would love to get a full understanding of your errors. >>>> >>>> Thanks, >>>> -Stolee >>> >>> I haven't yet been able to reproduce it with publicly available data, >>> but with the internal references I'm able to evoke the warnings >>> reliably. It only works when I have two repositories connected via >>> alternates, when generating the commit-graph in the linked-to repo >>> first, and then generating the commit-graph in the linking repo. >>> >>> The following recipe allows me to reproduce, but rely on private data: >>> >>> $ git --version >>> git version 2.35.1 >>> >>> # The pool repository is the one we're linked to from the fork. >>> $ cd "$pool" >>> $ rm -rf objects/info/commit-graph objects/info/commit-graph >>> $ git commit-graph write --split >>> >>> $ cd "$fork" >>> $ rm -rf objects/info/commit-graph objects/info/commit-graph >>> $ git commit-graph write --split >>> >>> $ git commit-graph verify --no-progress >>> $ echo $? >>> 0 >>> >>> # This is 715d08a9e51251ad8290b181b6ac3b9e1f9719d7 with your full v2 >>> # applied on top. >>> $ ~/Development/git/bin-wrappers/git --version >>> git version 2.35.1.358.g7ede1bea24 >>> >>> $ ~/Development/git/bin-wrappers/git commit-graph verify --no-progress >>> commit-graph generation for commit 06a91bac00ed11128becd48d5ae77eacd8f24c97 is 1623273624 < 1623273710 >>> commit-graph generation for commit 0ae91029f27238e8f8e109c6bb3907f864dda14f is 1622151146 < 1622151220 >>> commit-graph generation for commit 0d4582a33d8c8e3eb01adbf564f5e1deeb3b56a2 is 1631045222 < 1631045225 >>> commit-graph generation for commit 0daf8976439d7e0bb9710c5ee63b570580e0dc03 is 1620347739 < 1620347789 >>> commit-graph generation for commit 0e0ee8ffb3fa22cee7d28e21cbd6df26454932cf is 1623783297 < 1623783380 >>> commit-graph generation for commit 0f08ab3de6ec115ea8a956a1996cb9759e640e74 is 1621543278 < 1621543339 >>> commit-graph generation for commit 133ed0319b5a66ae0c2be76e5a887b880452b111 is 1620949864 < 1620949915 >>> commit-graph generation for commit 1341b3e6c63343ae94a8a473fa057126ddd4669a is 1637344364 < 1637344384 >>> commit-graph generation for commit 15bdfc501c2c9f23e9353bf6e6a5facd9c32a07a is 1623348103 < 1623348133 >>> ... >>> $ echo $? >>> 1 >>> >>> When generating commit-graphs with your patches applied the `verify` >>> step works alright. >>> >>> I've also by accident stumbled over the original error again: >>> >>> fatal: commit-graph requires overflow generation data but has none >>> >>> This time it's definitely not caused by generating commit-graphs with an >>> in-between state of your patch series because the data comes straight >>> from production with no changes to the commit-graphs performed by >>> myself. There we're running Git v2.33.1 with a couple of backported >>> patches (see [1]). While those patches cause us to make more use of the >>> commit-graph, none modify the way we generate them. >>> >>> Of note is that the commit-graph contains references to commits which >>> don't exist in the ODB anymore. >>> >>> Patrick >>> >>> [1]: https://gitlab.com/gitlab-org/gitlab-git/-/commits/pks-v2.33.1.gl3 >> >> Thank you for your diligence here, Patrick. I really appreciate the >> work you're putting in to verify the situation. >> >> Since our repro relies on private information, but is consistent, I >> wonder if we should take the patch below, which starts to ignore the >> older generation number v2 data and only writes freshly-computed >> numbers. >> >> Thanks, >> -Stolee > > Thanks. With your patch below the `fatal:` error is gone, but I'm still > seeing the same errors with regards to the commit-graph generations. This is disappointing and unexpected. Thanks for verifying. > So to summarize my findings: > > - This bug occurs when writing commit-graphs with v2.35.1, but > reading them with your patches. > > - This bug occurs when I have two repositories connected via an > alternates file. I haven't yet been able to reproduce it in a > single repository that is not connected to a separate ODB. This is an interesting distinction. One that I didn't think would matter, but I'll look into the code to see how that could affect things. > - This bug only occurs when I first generate the commit-graph in the > repository I'm borrowing objects from. > > - This bug only occurs when I write commit-graphs with `--split` in > both repositories. "Normal" commit-graphs don't have this issue, > and neither can I see it with `--split=replace` or mixed-type > commit-graphs. > > Beware, the following explanation is based on my very basic > understanding of the commit-graph code and thus more likely to be wrong > than right: > > With the old Git version, we've been mis-parsing the generation because > `read_generation_data` wasn't ever set. As a result it can happen that > the second split commit-graph we're generating computes its own > generation numbers from the wrong starting point because it uses the > mis-parsed generation numbers from the parent commit-graph. > > With your patches, we start to correctly account for overflows and would > thus end up with a different value for the generation depending on where > we parse the commit from: if we parse it from the first commit-graph it > would be correct because it's contains the "root" of the generation > numbers. But if we parse a commit from the second commit-graph we may > have a mismatch because the generation numbers in there may have been > derived from generation numbers mis-parsed from the first commit-graph. > And because these would be wrong in case there was an overflow it is > clear that the new corrected generation number may be wrong, as well. Hm. My expectation was that the older layers of the split commit-graph would have read_generation_data disabled (because the new Git version cannot read the GDAT chunk) and then the validate_mixed_generation_chain() method would remove read_generation_data from all of the graphs in the list. Combining this with your thoughts on cross-alternate split commit-graphs, this makes me think we should try this: --- >8 --- diff --git a/commit-graph.c b/commit-graph.c index fb2ced0bd6..74c6534f56 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -609,8 +609,6 @@ struct commit_graph *read_commit_graph_one(struct repository *r, if (!g) g = load_commit_graph_chain(r, odb); - validate_mixed_generation_chain(g); - return g; } @@ -668,7 +666,13 @@ static int prepare_commit_graph(struct repository *r) !r->objects->commit_graph && odb; odb = odb->next) prepare_commit_graph_one(r, odb); - return !!r->objects->commit_graph; + + if (r->objects->commit_graph) { + validate_mixed_generation_chain(r->objects->commit_graph); + return 1; + } + + return 0; } int generation_numbers_enabled(struct repository *r) --- >8 --- Notice that I'm moving the validate_mixed_generation_chain() call out of read_commit_graph_one() and into prepare_commit_graph(). To my understanding, this _should_ have an equivalent end state as the old code, but might be worth trying just as a quick check. I will continue investigating and try to reproduce with this additional constraint of working across an alternate. Thanks, -Stolee
On 3/3/2022 11:00 AM, Derrick Stolee wrote: > On 3/3/2022 6:19 AM, Patrick Steinhardt wrote: >> On Wed, Mar 02, 2022 at 09:57:17AM -0500, Derrick Stolee wrote: >>> On 3/2/2022 8:57 AM, Patrick Steinhardt wrote: >>>> On Tue, Mar 01, 2022 at 10:25:46AM -0500, Derrick Stolee wrote: >>>>> On 3/1/2022 9:53 AM, Patrick Steinhardt wrote: >>> >>>>>> Hum. I have re-verified, and this indeed seems to play out. So I must've >>>>>> accidentally ran all my testing with the state generated without the >>>>>> final patch which fixes the corruption. I do see lots of the following >>>>>> warnings, but overall I can verify and write the commit-graph just fine: >>>>>> >>>>>> commit-graph generation for commit c80a42de8803e2d77818d0c82f88e748d7f9425f is 1623362063 < 1623362139 >>>>> >>>>> But I'm not able to generate these warnings from either version. I >>>>> tried generating different levels of a split commit-graph, but >>>>> could not reproduce it. If you have reproduction steps using current >>>>> 'master' (or any released Git version) and the four patches here, >>>>> then I would love to get a full understanding of your errors. >>>>> >>>>> Thanks, >>>>> -Stolee >>>> >>>> I haven't yet been able to reproduce it with publicly available data, >>>> but with the internal references I'm able to evoke the warnings >>>> reliably. It only works when I have two repositories connected via >>>> alternates, when generating the commit-graph in the linked-to repo >>>> first, and then generating the commit-graph in the linking repo. >>>> >>>> The following recipe allows me to reproduce, but rely on private data: >>>> >>>> $ git --version >>>> git version 2.35.1 >>>> >>>> # The pool repository is the one we're linked to from the fork. >>>> $ cd "$pool" >>>> $ rm -rf objects/info/commit-graph objects/info/commit-graph >>>> $ git commit-graph write --split >>>> >>>> $ cd "$fork" >>>> $ rm -rf objects/info/commit-graph objects/info/commit-graph >>>> $ git commit-graph write --split >>>> >>>> $ git commit-graph verify --no-progress >>>> $ echo $? >>>> 0 >>>> >>>> # This is 715d08a9e51251ad8290b181b6ac3b9e1f9719d7 with your full v2 >>>> # applied on top. >>>> $ ~/Development/git/bin-wrappers/git --version >>>> git version 2.35.1.358.g7ede1bea24 >>>> >>>> $ ~/Development/git/bin-wrappers/git commit-graph verify --no-progress >>>> commit-graph generation for commit 06a91bac00ed11128becd48d5ae77eacd8f24c97 is 1623273624 < 1623273710 >>>> commit-graph generation for commit 0ae91029f27238e8f8e109c6bb3907f864dda14f is 1622151146 < 1622151220 >>>> commit-graph generation for commit 0d4582a33d8c8e3eb01adbf564f5e1deeb3b56a2 is 1631045222 < 1631045225 >>>> commit-graph generation for commit 0daf8976439d7e0bb9710c5ee63b570580e0dc03 is 1620347739 < 1620347789 >>>> commit-graph generation for commit 0e0ee8ffb3fa22cee7d28e21cbd6df26454932cf is 1623783297 < 1623783380 >>>> commit-graph generation for commit 0f08ab3de6ec115ea8a956a1996cb9759e640e74 is 1621543278 < 1621543339 >>>> commit-graph generation for commit 133ed0319b5a66ae0c2be76e5a887b880452b111 is 1620949864 < 1620949915 >>>> commit-graph generation for commit 1341b3e6c63343ae94a8a473fa057126ddd4669a is 1637344364 < 1637344384 >>>> commit-graph generation for commit 15bdfc501c2c9f23e9353bf6e6a5facd9c32a07a is 1623348103 < 1623348133 >>>> ... >>>> $ echo $? >>>> 1 >>>> >>>> When generating commit-graphs with your patches applied the `verify` >>>> step works alright. >>>> >>>> I've also by accident stumbled over the original error again: >>>> >>>> fatal: commit-graph requires overflow generation data but has none >>>> >>>> This time it's definitely not caused by generating commit-graphs with an >>>> in-between state of your patch series because the data comes straight >>>> from production with no changes to the commit-graphs performed by >>>> myself. There we're running Git v2.33.1 with a couple of backported >>>> patches (see [1]). While those patches cause us to make more use of the >>>> commit-graph, none modify the way we generate them. >>>> >>>> Of note is that the commit-graph contains references to commits which >>>> don't exist in the ODB anymore. >>>> >>>> Patrick >>>> >>>> [1]: https://gitlab.com/gitlab-org/gitlab-git/-/commits/pks-v2.33.1.gl3 >>> >>> Thank you for your diligence here, Patrick. I really appreciate the >>> work you're putting in to verify the situation. >>> >>> Since our repro relies on private information, but is consistent, I >>> wonder if we should take the patch below, which starts to ignore the >>> older generation number v2 data and only writes freshly-computed >>> numbers. >>> >>> Thanks, >>> -Stolee >> >> Thanks. With your patch below the `fatal:` error is gone, but I'm still >> seeing the same errors with regards to the commit-graph generations. > > This is disappointing and unexpected. Thanks for verifying. > >> So to summarize my findings: >> >> - This bug occurs when writing commit-graphs with v2.35.1, but >> reading them with your patches. >> >> - This bug occurs when I have two repositories connected via an >> alternates file. I haven't yet been able to reproduce it in a >> single repository that is not connected to a separate ODB. > > This is an interesting distinction. One that I didn't think would > matter, but I'll look into the code to see how that could affect > things. > >> - This bug only occurs when I first generate the commit-graph in the >> repository I'm borrowing objects from. >> >> - This bug only occurs when I write commit-graphs with `--split` in >> both repositories. "Normal" commit-graphs don't have this issue, >> and neither can I see it with `--split=replace` or mixed-type >> commit-graphs. >> >> Beware, the following explanation is based on my very basic >> understanding of the commit-graph code and thus more likely to be wrong >> than right: >> >> With the old Git version, we've been mis-parsing the generation because >> `read_generation_data` wasn't ever set. As a result it can happen that >> the second split commit-graph we're generating computes its own >> generation numbers from the wrong starting point because it uses the >> mis-parsed generation numbers from the parent commit-graph. >> >> With your patches, we start to correctly account for overflows and would >> thus end up with a different value for the generation depending on where >> we parse the commit from: if we parse it from the first commit-graph it >> would be correct because it's contains the "root" of the generation >> numbers. But if we parse a commit from the second commit-graph we may >> have a mismatch because the generation numbers in there may have been >> derived from generation numbers mis-parsed from the first commit-graph. >> And because these would be wrong in case there was an overflow it is >> clear that the new corrected generation number may be wrong, as well. > > Hm. My expectation was that the older layers of the split commit-graph > would have read_generation_data disabled (because the new Git version > cannot read the GDAT chunk) and then the validate_mixed_generation_chain() > method would remove read_generation_data from all of the graphs in the > list. > > Combining this with your thoughts on cross-alternate split commit-graphs, > this makes me think we should try this: > > --- >8 --- > > diff --git a/commit-graph.c b/commit-graph.c > index fb2ced0bd6..74c6534f56 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -609,8 +609,6 @@ struct commit_graph *read_commit_graph_one(struct repository *r, > if (!g) > g = load_commit_graph_chain(r, odb); > > - validate_mixed_generation_chain(g); > - > return g; > } > > @@ -668,7 +666,13 @@ static int prepare_commit_graph(struct repository *r) > !r->objects->commit_graph && odb; > odb = odb->next) > prepare_commit_graph_one(r, odb); > - return !!r->objects->commit_graph; > + > + if (r->objects->commit_graph) { > + validate_mixed_generation_chain(r->objects->commit_graph); > + return 1; > + } > + > + return 0; > } > > int generation_numbers_enabled(struct repository *r) > > > --- >8 --- > > Notice that I'm moving the validate_mixed_generation_chain() call > out of read_commit_graph_one() and into prepare_commit_graph(). To > my understanding, this _should_ have an equivalent end state as the > old code, but might be worth trying just as a quick check. > > I will continue investigating and try to reproduce with this > additional constraint of working across an alternate. My attempts to reproduce this across an alternate have failed. I tried running the following test against Git without these patches, then verify with the newer version of Git. (I also have generated a few new layers on top with these patches, and they correctly drop the GDA2 and GDO2 chunks when the lower layers "don't have gen v2".) test_description='commit-graph with offsets across alternates' . ./test-lib.sh if ! test_have_prereq TIME_IS_64BIT || ! test_have_prereq TIME_T_IS_64BIT then skip_all='skipping 64-bit timestamp tests' test_done fi UNIX_EPOCH_ZERO="@0 +0000" FUTURE_DATE="@4147483646 +0000" GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0 test_expect_success 'generate alternate split commit-graph' ' git init alternate && ( cd alternate && test_commit --date "$UNIX_EPOCH_ZERO" 1 && test_commit --date "$FUTURE_DATE" 2 && git commit-graph write --reachable && test_commit --date "$UNIX_EPOCH_ZERO" 3 && test_commit --date "$FUTURE_DATE" 4 && git commit-graph write --reachable --split=no-merge ) && git clone --shared alternate fork && ( cd fork && test_commit --date "$UNIX_EPOCH_ZERO" 5 && test_commit --date "$FUTURE_DATE" 6 && git commit-graph write --reachable --split=no-merge && test_commit --date "$UNIX_EPOCH_ZERO" 7 && test_commit --date "$FUTURE_DATE" 8 && git commit-graph write --reachable --split=no-merge ) ' test_done My testing after running this with -d allows me to reliably see these layers being created with GDAT and GDOV chunks. Running the 'git commit-graph verify' command with the new code does not show those errors, even after adding commits and another layer to the split commit-graph. I look forward to any additional insights you might have here. Thanks, -Stolee
On Fri, Mar 04, 2022 at 09:03:15AM -0500, Derrick Stolee wrote: > On 3/3/2022 11:00 AM, Derrick Stolee wrote: > > On 3/3/2022 6:19 AM, Patrick Steinhardt wrote: > >> On Wed, Mar 02, 2022 at 09:57:17AM -0500, Derrick Stolee wrote: > >>> On 3/2/2022 8:57 AM, Patrick Steinhardt wrote: > >>>> On Tue, Mar 01, 2022 at 10:25:46AM -0500, Derrick Stolee wrote: > >>>>> On 3/1/2022 9:53 AM, Patrick Steinhardt wrote: > >>> > >>>>>> Hum. I have re-verified, and this indeed seems to play out. So I must've > >>>>>> accidentally ran all my testing with the state generated without the > >>>>>> final patch which fixes the corruption. I do see lots of the following > >>>>>> warnings, but overall I can verify and write the commit-graph just fine: > >>>>>> > >>>>>> commit-graph generation for commit c80a42de8803e2d77818d0c82f88e748d7f9425f is 1623362063 < 1623362139 > >>>>> > >>>>> But I'm not able to generate these warnings from either version. I > >>>>> tried generating different levels of a split commit-graph, but > >>>>> could not reproduce it. If you have reproduction steps using current > >>>>> 'master' (or any released Git version) and the four patches here, > >>>>> then I would love to get a full understanding of your errors. > >>>>> > >>>>> Thanks, > >>>>> -Stolee > >>>> > >>>> I haven't yet been able to reproduce it with publicly available data, > >>>> but with the internal references I'm able to evoke the warnings > >>>> reliably. It only works when I have two repositories connected via > >>>> alternates, when generating the commit-graph in the linked-to repo > >>>> first, and then generating the commit-graph in the linking repo. > >>>> > >>>> The following recipe allows me to reproduce, but rely on private data: > >>>> > >>>> $ git --version > >>>> git version 2.35.1 > >>>> > >>>> # The pool repository is the one we're linked to from the fork. > >>>> $ cd "$pool" > >>>> $ rm -rf objects/info/commit-graph objects/info/commit-graph > >>>> $ git commit-graph write --split > >>>> > >>>> $ cd "$fork" > >>>> $ rm -rf objects/info/commit-graph objects/info/commit-graph > >>>> $ git commit-graph write --split > >>>> > >>>> $ git commit-graph verify --no-progress > >>>> $ echo $? > >>>> 0 > >>>> > >>>> # This is 715d08a9e51251ad8290b181b6ac3b9e1f9719d7 with your full v2 > >>>> # applied on top. > >>>> $ ~/Development/git/bin-wrappers/git --version > >>>> git version 2.35.1.358.g7ede1bea24 > >>>> > >>>> $ ~/Development/git/bin-wrappers/git commit-graph verify --no-progress > >>>> commit-graph generation for commit 06a91bac00ed11128becd48d5ae77eacd8f24c97 is 1623273624 < 1623273710 > >>>> commit-graph generation for commit 0ae91029f27238e8f8e109c6bb3907f864dda14f is 1622151146 < 1622151220 > >>>> commit-graph generation for commit 0d4582a33d8c8e3eb01adbf564f5e1deeb3b56a2 is 1631045222 < 1631045225 > >>>> commit-graph generation for commit 0daf8976439d7e0bb9710c5ee63b570580e0dc03 is 1620347739 < 1620347789 > >>>> commit-graph generation for commit 0e0ee8ffb3fa22cee7d28e21cbd6df26454932cf is 1623783297 < 1623783380 > >>>> commit-graph generation for commit 0f08ab3de6ec115ea8a956a1996cb9759e640e74 is 1621543278 < 1621543339 > >>>> commit-graph generation for commit 133ed0319b5a66ae0c2be76e5a887b880452b111 is 1620949864 < 1620949915 > >>>> commit-graph generation for commit 1341b3e6c63343ae94a8a473fa057126ddd4669a is 1637344364 < 1637344384 > >>>> commit-graph generation for commit 15bdfc501c2c9f23e9353bf6e6a5facd9c32a07a is 1623348103 < 1623348133 > >>>> ... > >>>> $ echo $? > >>>> 1 > >>>> > >>>> When generating commit-graphs with your patches applied the `verify` > >>>> step works alright. > >>>> > >>>> I've also by accident stumbled over the original error again: > >>>> > >>>> fatal: commit-graph requires overflow generation data but has none > >>>> > >>>> This time it's definitely not caused by generating commit-graphs with an > >>>> in-between state of your patch series because the data comes straight > >>>> from production with no changes to the commit-graphs performed by > >>>> myself. There we're running Git v2.33.1 with a couple of backported > >>>> patches (see [1]). While those patches cause us to make more use of the > >>>> commit-graph, none modify the way we generate them. > >>>> > >>>> Of note is that the commit-graph contains references to commits which > >>>> don't exist in the ODB anymore. > >>>> > >>>> Patrick > >>>> > >>>> [1]: https://gitlab.com/gitlab-org/gitlab-git/-/commits/pks-v2.33.1.gl3 > >>> > >>> Thank you for your diligence here, Patrick. I really appreciate the > >>> work you're putting in to verify the situation. > >>> > >>> Since our repro relies on private information, but is consistent, I > >>> wonder if we should take the patch below, which starts to ignore the > >>> older generation number v2 data and only writes freshly-computed > >>> numbers. > >>> > >>> Thanks, > >>> -Stolee > >> > >> Thanks. With your patch below the `fatal:` error is gone, but I'm still > >> seeing the same errors with regards to the commit-graph generations. > > > > This is disappointing and unexpected. Thanks for verifying. > > > >> So to summarize my findings: > >> > >> - This bug occurs when writing commit-graphs with v2.35.1, but > >> reading them with your patches. > >> > >> - This bug occurs when I have two repositories connected via an > >> alternates file. I haven't yet been able to reproduce it in a > >> single repository that is not connected to a separate ODB. > > > > This is an interesting distinction. One that I didn't think would > > matter, but I'll look into the code to see how that could affect > > things. > > > >> - This bug only occurs when I first generate the commit-graph in the > >> repository I'm borrowing objects from. > >> > >> - This bug only occurs when I write commit-graphs with `--split` in > >> both repositories. "Normal" commit-graphs don't have this issue, > >> and neither can I see it with `--split=replace` or mixed-type > >> commit-graphs. > >> > >> Beware, the following explanation is based on my very basic > >> understanding of the commit-graph code and thus more likely to be wrong > >> than right: > >> > >> With the old Git version, we've been mis-parsing the generation because > >> `read_generation_data` wasn't ever set. As a result it can happen that > >> the second split commit-graph we're generating computes its own > >> generation numbers from the wrong starting point because it uses the > >> mis-parsed generation numbers from the parent commit-graph. > >> > >> With your patches, we start to correctly account for overflows and would > >> thus end up with a different value for the generation depending on where > >> we parse the commit from: if we parse it from the first commit-graph it > >> would be correct because it's contains the "root" of the generation > >> numbers. But if we parse a commit from the second commit-graph we may > >> have a mismatch because the generation numbers in there may have been > >> derived from generation numbers mis-parsed from the first commit-graph. > >> And because these would be wrong in case there was an overflow it is > >> clear that the new corrected generation number may be wrong, as well. > > > > Hm. My expectation was that the older layers of the split commit-graph > > would have read_generation_data disabled (because the new Git version > > cannot read the GDAT chunk) and then the validate_mixed_generation_chain() > > method would remove read_generation_data from all of the graphs in the > > list. > > > > Combining this with your thoughts on cross-alternate split commit-graphs, > > this makes me think we should try this: > > > > --- >8 --- > > > > diff --git a/commit-graph.c b/commit-graph.c > > index fb2ced0bd6..74c6534f56 100644 > > --- a/commit-graph.c > > +++ b/commit-graph.c > > @@ -609,8 +609,6 @@ struct commit_graph *read_commit_graph_one(struct repository *r, > > if (!g) > > g = load_commit_graph_chain(r, odb); > > > > - validate_mixed_generation_chain(g); > > - > > return g; > > } > > > > @@ -668,7 +666,13 @@ static int prepare_commit_graph(struct repository *r) > > !r->objects->commit_graph && odb; > > odb = odb->next) > > prepare_commit_graph_one(r, odb); > > - return !!r->objects->commit_graph; > > + > > + if (r->objects->commit_graph) { > > + validate_mixed_generation_chain(r->objects->commit_graph); > > + return 1; > > + } > > + > > + return 0; > > } > > > > int generation_numbers_enabled(struct repository *r) > > > > > > --- >8 --- > > > > Notice that I'm moving the validate_mixed_generation_chain() call > > out of read_commit_graph_one() and into prepare_commit_graph(). To > > my understanding, this _should_ have an equivalent end state as the > > old code, but might be worth trying just as a quick check. > > > > I will continue investigating and try to reproduce with this > > additional constraint of working across an alternate. > > My attempts to reproduce this across an alternate have failed. I > tried running the following test against Git without these patches, > then verify with the newer version of Git. (I also have generated > a few new layers on top with these patches, and they correctly drop > the GDA2 and GDO2 chunks when the lower layers "don't have gen v2".) > > > test_description='commit-graph with offsets across alternates' > . ./test-lib.sh > > if ! test_have_prereq TIME_IS_64BIT || ! test_have_prereq TIME_T_IS_64BIT > then > skip_all='skipping 64-bit timestamp tests' > test_done > fi > > > UNIX_EPOCH_ZERO="@0 +0000" > FUTURE_DATE="@4147483646 +0000" > > GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0 > > test_expect_success 'generate alternate split commit-graph' ' > git init alternate && > ( > cd alternate && > test_commit --date "$UNIX_EPOCH_ZERO" 1 && > test_commit --date "$FUTURE_DATE" 2 && > git commit-graph write --reachable && > test_commit --date "$UNIX_EPOCH_ZERO" 3 && > test_commit --date "$FUTURE_DATE" 4 && > git commit-graph write --reachable --split=no-merge > ) && > git clone --shared alternate fork && > ( > cd fork && > test_commit --date "$UNIX_EPOCH_ZERO" 5 && > test_commit --date "$FUTURE_DATE" 6 && > git commit-graph write --reachable --split=no-merge && > test_commit --date "$UNIX_EPOCH_ZERO" 7 && > test_commit --date "$FUTURE_DATE" 8 && > git commit-graph write --reachable --split=no-merge > ) > ' > > test_done > > > My testing after running this with -d allows me to reliably see these > layers being created with GDAT and GDOV chunks. Running the 'git > commit-graph verify' command with the new code does not show those > errors, even after adding commits and another layer to the split > commit-graph. > > I look forward to any additional insights you might have here. I don't really know why, but now I've become unable to reproduce it again. I think we should just go with your patch 5/4 on top -- it does fix the most important issue, which is the `die()` I saw on almost all commands. The second part about the warnings I'm just not sure about, but I don't think it should stop this patch series given my own uncertainty. Patrick
On 3/7/2022 5:34 AM, Patrick Steinhardt wrote: > On Fri, Mar 04, 2022 at 09:03:15AM -0500, Derrick Stolee wrote: >> On 3/3/2022 11:00 AM, Derrick Stolee wrote: ... >>> I will continue investigating and try to reproduce with this >>> additional constraint of working across an alternate. >> >> My attempts to reproduce this across an alternate have failed. I >> tried running the following test against Git without these patches, >> then verify with the newer version of Git. (I also have generated >> a few new layers on top with these patches, and they correctly drop >> the GDA2 and GDO2 chunks when the lower layers "don't have gen v2".) >> >> >> test_description='commit-graph with offsets across alternates' >> . ./test-lib.sh >> >> if ! test_have_prereq TIME_IS_64BIT || ! test_have_prereq TIME_T_IS_64BIT >> then >> skip_all='skipping 64-bit timestamp tests' >> test_done >> fi >> >> >> UNIX_EPOCH_ZERO="@0 +0000" >> FUTURE_DATE="@4147483646 +0000" >> >> GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0 >> >> test_expect_success 'generate alternate split commit-graph' ' >> git init alternate && >> ( >> cd alternate && >> test_commit --date "$UNIX_EPOCH_ZERO" 1 && >> test_commit --date "$FUTURE_DATE" 2 && >> git commit-graph write --reachable && >> test_commit --date "$UNIX_EPOCH_ZERO" 3 && >> test_commit --date "$FUTURE_DATE" 4 && >> git commit-graph write --reachable --split=no-merge >> ) && >> git clone --shared alternate fork && >> ( >> cd fork && >> test_commit --date "$UNIX_EPOCH_ZERO" 5 && >> test_commit --date "$FUTURE_DATE" 6 && >> git commit-graph write --reachable --split=no-merge && >> test_commit --date "$UNIX_EPOCH_ZERO" 7 && >> test_commit --date "$FUTURE_DATE" 8 && >> git commit-graph write --reachable --split=no-merge >> ) >> ' >> >> test_done >> >> >> My testing after running this with -d allows me to reliably see these >> layers being created with GDAT and GDOV chunks. Running the 'git >> commit-graph verify' command with the new code does not show those >> errors, even after adding commits and another layer to the split >> commit-graph. >> >> I look forward to any additional insights you might have here. > > I don't really know why, but now I've become unable to reproduce it > again. I think we should just go with your patch 5/4 on top -- it does > fix the most important issue, which is the `die()` I saw on almost all > commands. The second part about the warnings I'm just not sure about, > but I don't think it should stop this patch series given my own > uncertainty. Thanks for following up. I agree that with 5/4 we should be safe. I'll remain available to quickly respond if anything else surprising comes up in this area. Thanks! -Stolee
Derrick Stolee <derrickstolee@github.com> writes: > Thanks for following up. I agree that with 5/4 we should be safe. > > I'll remain available to quickly respond if anything else surprising > comes up in this area. Thanks. I just picked up the "bankruptcy" step, so we should be good to go.
On Mon, Mar 07, 2022 at 08:45:07AM -0500, Derrick Stolee wrote: > On 3/7/2022 5:34 AM, Patrick Steinhardt wrote: > > On Fri, Mar 04, 2022 at 09:03:15AM -0500, Derrick Stolee wrote: > >> On 3/3/2022 11:00 AM, Derrick Stolee wrote: > ... > >>> I will continue investigating and try to reproduce with this > >>> additional constraint of working across an alternate. > >> > >> My attempts to reproduce this across an alternate have failed. I > >> tried running the following test against Git without these patches, > >> then verify with the newer version of Git. (I also have generated > >> a few new layers on top with these patches, and they correctly drop > >> the GDA2 and GDO2 chunks when the lower layers "don't have gen v2".) > >> > >> > >> test_description='commit-graph with offsets across alternates' > >> . ./test-lib.sh > >> > >> if ! test_have_prereq TIME_IS_64BIT || ! test_have_prereq TIME_T_IS_64BIT > >> then > >> skip_all='skipping 64-bit timestamp tests' > >> test_done > >> fi > >> > >> > >> UNIX_EPOCH_ZERO="@0 +0000" > >> FUTURE_DATE="@4147483646 +0000" > >> > >> GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0 > >> > >> test_expect_success 'generate alternate split commit-graph' ' > >> git init alternate && > >> ( > >> cd alternate && > >> test_commit --date "$UNIX_EPOCH_ZERO" 1 && > >> test_commit --date "$FUTURE_DATE" 2 && > >> git commit-graph write --reachable && > >> test_commit --date "$UNIX_EPOCH_ZERO" 3 && > >> test_commit --date "$FUTURE_DATE" 4 && > >> git commit-graph write --reachable --split=no-merge > >> ) && > >> git clone --shared alternate fork && > >> ( > >> cd fork && > >> test_commit --date "$UNIX_EPOCH_ZERO" 5 && > >> test_commit --date "$FUTURE_DATE" 6 && > >> git commit-graph write --reachable --split=no-merge && > >> test_commit --date "$UNIX_EPOCH_ZERO" 7 && > >> test_commit --date "$FUTURE_DATE" 8 && > >> git commit-graph write --reachable --split=no-merge > >> ) > >> ' > >> > >> test_done > >> > >> > >> My testing after running this with -d allows me to reliably see these > >> layers being created with GDAT and GDOV chunks. Running the 'git > >> commit-graph verify' command with the new code does not show those > >> errors, even after adding commits and another layer to the split > >> commit-graph. > >> > >> I look forward to any additional insights you might have here. > > > > I don't really know why, but now I've become unable to reproduce it > > again. I think we should just go with your patch 5/4 on top -- it does > > fix the most important issue, which is the `die()` I saw on almost all > > commands. The second part about the warnings I'm just not sure about, > > but I don't think it should stop this patch series given my own > > uncertainty. > > Thanks for following up. I agree that with 5/4 we should be safe. > > I'll remain available to quickly respond if anything else surprising > comes up in this area. > > Thanks! > -Stolee There is another surprise I hit today in the context of generation numbers. In production, I found the following bug: signal: aborted (core dumped): BUG: chunk-format.c:88: expected to write 8 bytes to chunk 47444f56, but wrote 168304 instead 47444f56 is the GENERATION_DATA_OVERFLOW chunk ID, and seemingly the precomputed size we intended to write was mismatching the data we have actually been writing to disk. And I think this stems from a mismatch in how we precompute the number of generation data overflows compared to how we're actually writing the data to disk: - We precompute how many generation number overflows there are in `compute_generation_numbers()`. Here we only increment the number of overflows in case all parents of a given commit have a non-zero generation number and if the generation is bigger than OFFSET_MAX. Seemingly we have found only a single commit which matches this criteria because we pass `sizeof(timestamp_t) * overflows` as expected size, and `sizeof(timestamp_t) == 8`. - On the other hand, when we write generation numbers to disk in `write_graph_chunk_generation_data_overflow()`, we always write a chunk in case its offset is bigger than OFFSET_MAX. So we don't care about the parents here, and this seems to extend the number of commits which match this criteria to 21038 commits we write into the file. The result is that the sanity check we do where we compare that the actually written amount of data matches what we expect fails because of the different ways we count this data. This time I don't have access to the repository myself, I only tried to combine what's happening based on the bug message and the code. Patrick
On 3/10/2022 8:58 AM, Patrick Steinhardt wrote: > On Mon, Mar 07, 2022 at 08:45:07AM -0500, Derrick Stolee wrote: >> On 3/7/2022 5:34 AM, Patrick Steinhardt wrote: >>> On Fri, Mar 04, 2022 at 09:03:15AM -0500, Derrick Stolee wrote: >>>> On 3/3/2022 11:00 AM, Derrick Stolee wrote: >> ... >>>>> I will continue investigating and try to reproduce with this >>>>> additional constraint of working across an alternate. >>>> >>>> My attempts to reproduce this across an alternate have failed. I >>>> tried running the following test against Git without these patches, >>>> then verify with the newer version of Git. (I also have generated >>>> a few new layers on top with these patches, and they correctly drop >>>> the GDA2 and GDO2 chunks when the lower layers "don't have gen v2".) >>>> >>>> >>>> test_description='commit-graph with offsets across alternates' >>>> . ./test-lib.sh >>>> >>>> if ! test_have_prereq TIME_IS_64BIT || ! test_have_prereq TIME_T_IS_64BIT >>>> then >>>> skip_all='skipping 64-bit timestamp tests' >>>> test_done >>>> fi >>>> >>>> >>>> UNIX_EPOCH_ZERO="@0 +0000" >>>> FUTURE_DATE="@4147483646 +0000" >>>> >>>> GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0 >>>> >>>> test_expect_success 'generate alternate split commit-graph' ' >>>> git init alternate && >>>> ( >>>> cd alternate && >>>> test_commit --date "$UNIX_EPOCH_ZERO" 1 && >>>> test_commit --date "$FUTURE_DATE" 2 && >>>> git commit-graph write --reachable && >>>> test_commit --date "$UNIX_EPOCH_ZERO" 3 && >>>> test_commit --date "$FUTURE_DATE" 4 && >>>> git commit-graph write --reachable --split=no-merge >>>> ) && >>>> git clone --shared alternate fork && >>>> ( >>>> cd fork && >>>> test_commit --date "$UNIX_EPOCH_ZERO" 5 && >>>> test_commit --date "$FUTURE_DATE" 6 && >>>> git commit-graph write --reachable --split=no-merge && >>>> test_commit --date "$UNIX_EPOCH_ZERO" 7 && >>>> test_commit --date "$FUTURE_DATE" 8 && >>>> git commit-graph write --reachable --split=no-merge >>>> ) >>>> ' >>>> >>>> test_done >>>> >>>> >>>> My testing after running this with -d allows me to reliably see these >>>> layers being created with GDAT and GDOV chunks. Running the 'git >>>> commit-graph verify' command with the new code does not show those >>>> errors, even after adding commits and another layer to the split >>>> commit-graph. >>>> >>>> I look forward to any additional insights you might have here. >>> >>> I don't really know why, but now I've become unable to reproduce it >>> again. I think we should just go with your patch 5/4 on top -- it does >>> fix the most important issue, which is the `die()` I saw on almost all >>> commands. The second part about the warnings I'm just not sure about, >>> but I don't think it should stop this patch series given my own >>> uncertainty. >> >> Thanks for following up. I agree that with 5/4 we should be safe. >> >> I'll remain available to quickly respond if anything else surprising >> comes up in this area. >> >> Thanks! >> -Stolee > > There is another surprise I hit today in the context of generation > numbers. In production, I found the following bug: "In production" makes me think this is on a version of Git without these patches. Am I right? > signal: aborted (core dumped): BUG: chunk-format.c:88: expected to write 8 bytes to chunk 47444f56, but wrote 168304 instead > > 47444f56 is the GENERATION_DATA_OVERFLOW chunk ID, and seemingly the > precomputed size we intended to write was mismatching the data we have > actually been writing to disk. And I think this stems from a mismatch in > how we precompute the number of generation data overflows compared to > how we're actually writing the data to disk: Yes, and this count was supposed to be fixed in patch v3 3/5. The issue is that we skip the increment if the commit was already parsed, so we would undercount. The fix was to do a loop at the end focused on counting these overflows. If you have v3 applied and you still get this error, then we need to look closely at this issue. For your production's sake, you might want to set "commitGraph.generationVersion=1" in your config until you have the fixes from this series. Thanks, -Stolee
diff --git a/commit-graph.c b/commit-graph.c index a19bd96c2ee..8e52bb09552 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -407,6 +407,9 @@ struct commit_graph *parse_commit_graph(struct repository *r, &graph->chunk_generation_data); pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW, &graph->chunk_generation_data_overflow); + + if (graph->chunk_generation_data) + graph->read_generation_data = 1; } if (r->settings.commit_graph_read_changed_paths) { diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 5ed6d2a21c1..fa9d32facfb 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -48,7 +48,7 @@ graph_read_expect () { header: 43475048 1 $(test_oid oid_version) $NUM_CHUNKS 0 num_commits: $1 chunks: oid_fanout oid_lookup commit_metadata generation_data bloom_indexes bloom_data - options: bloom(1,10,7) + options: bloom(1,10,7) read_generation_data EOF test-tool read-graph >actual && test_cmp expect actual diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index f9bffe38013..1afee1c2705 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -100,11 +100,21 @@ graph_read_expect() { OPTIONAL=" $2" NUM_CHUNKS=$((3 + $(echo "$2" | wc -w))) fi + GENERATION_VERSION=2 + if test ! -z "$3" + then + GENERATION_VERSION=$3 + fi + OPTIONS= + if test $GENERATION_VERSION -gt 1 + then + OPTIONS=" read_generation_data" + fi cat >expect <<- EOF header: 43475048 1 $(test_oid oid_version) $NUM_CHUNKS 0 num_commits: $1 chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL - options: + options:$OPTIONS EOF test-tool read-graph >output && test_cmp expect output @@ -498,7 +508,7 @@ test_expect_success 'git commit-graph verify' ' cd "$TRASH_DIRECTORY/full" && git rev-parse commits/8 | git -c commitGraph.generationVersion=1 commit-graph write --stdin-commits && git commit-graph verify >output && - graph_read_expect 9 extra_edges + graph_read_expect 9 extra_edges 1 ' NUM_COMMITS=9 diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 778fa418de2..669ddc645fa 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -30,11 +30,16 @@ graph_read_expect() { then NUM_BASE=$2 fi + OPTIONS= + if test -z "$3" + then + OPTIONS=" read_generation_data" + fi cat >expect <<- EOF header: 43475048 1 $(test_oid oid_version) 4 $NUM_BASE num_commits: $1 chunks: oid_fanout oid_lookup commit_metadata generation_data - options: + options:$OPTIONS EOF test-tool read-graph >output && test_cmp expect output @@ -624,7 +629,7 @@ test_expect_success 'write generation data chunk if topmost remaining layer has header: 43475048 1 $(test_oid oid_version) 5 1 num_commits: $(($NUM_SECOND_LAYER_COMMITS + $NUM_THIRD_LAYER_COMMITS + $NUM_FOURTH_LAYER_COMMITS + $NUM_FIFTH_LAYER_COMMITS)) chunks: oid_fanout oid_lookup commit_metadata generation_data - options: + options: read_generation_data EOF test_cmp expect output )