Message ID | cover.1686677910.git.jonathantanmy@google.com (mailing list archive) |
---|---|
Headers | show |
Series | Changed path filter hash fix and version bump | expand |
Jonathan Tan <jonathantanmy@google.com> writes: > Thanks Ramsay for spotting the errors and mentioning that I can use > octal escapes. Here's an update taking into account their comments. The changes look good. Will queue. Stolee, you had comments on an earlier round---how does this one look? Thanks.
On 6/13/2023 3:21 PM, Junio C Hamano wrote: > Jonathan Tan <jonathantanmy@google.com> writes: > >> Thanks Ramsay for spotting the errors and mentioning that I can use >> octal escapes. Here's an update taking into account their comments. > > The changes look good. Will queue. > > Stolee, you had comments on an earlier round---how does this one > look? I'm sorry I'm so late to this. I've been meaning to get to it, but it's been a crazy couple of weeks. This version is not ready. The backwards compatibility story is incomplete. When commitGraph.changedPathsVersion is set, it does not allow reading a previous filter version, leaving us in a poor performance state until the commit-graph file can be rewritten. While I was reviewing, it seemed reasonable to deprecate commitGraph.readChangedPaths, but this use of "also restrict writes to this version" didn't make sense to me at the time. Instead, it would be good to have this clarity between the config options: commitGraph.readChangedPaths: should we read and use the filters that exist on disk? Defaults to 'true'. commitGraph.changedPathsVersion: Which version should we _write_ when writing a new commit-graph? Defaults to '1' but will default to '2' in the next major verion, then '1' will no longer be an accepted value in the version after that. The tricky part is that during the commit-graph write, you will need to check the existing filter value to see if it matches. If not, the filters will need to be recomputed from scratch. This will change patch 4 a bit, but it's the right thing to do. Thanks, -Stolee
Derrick Stolee <derrickstolee@github.com> writes: > When commitGraph.changedPathsVersion is set, it does not allow > reading a previous filter version, leaving us in a poor performance > state until the commit-graph file can be rewritten. > > While I was reviewing, it seemed reasonable to deprecate > commitGraph.readChangedPaths, but this use of "also restrict writes > to this version" didn't make sense to me at the time. Instead, it > would be good to have this clarity between the config options: > > commitGraph.readChangedPaths: should we read and use the filters > that exist on disk? Defaults to 'true'. > > commitGraph.changedPathsVersion: Which version should we _write_ > when writing a new commit-graph? Defaults to '1' but will default > to '2' in the next major verion, then '1' will no longer be an > accepted value in the version after that. > > The tricky part is that during the commit-graph write, you will > need to check the existing filter value to see if it matches. If > not, the filters will need to be recomputed from scratch. This > will change patch 4 a bit, but it's the right thing to do. > > Thanks, > -Stolee OK - this sounds reasonable. I'll take a look.
On Tue, Jun 20, 2023 at 09:43:46AM -0400, Derrick Stolee wrote: > This version is not ready. The backwards compatibility story is > incomplete. I'm also late to the party, but I agree with Stolee here, having come to the same conclusion about needing to support reading older (corrupt) Bloom filters when possible (i.e. when paths contain no bytes which have their high-order bits set), and assuming the filter contains all paths otherwise. > commitGraph.changedPathsVersion: Which version should we _write_ > when writing a new commit-graph? Defaults to '1' but will default > to '2' in the next major verion, then '1' will no longer be an > accepted value in the version after that. I am not sure if there's a situation where we'd ever want to not write the newer versions when starting a new commit-graph (or chain) from scratch. I think that follows from what you and I are both suggesting w.r.t backwards compatibility. If that's the case, I think that we could in theory drop this configuration setting altogether. Or, at the very least, we should be able to change it change only what version we *write*, not read. I think this is what you are suggesting above, but I am not 100% sure, so apologies if I'm just repeating what you've already suggested. > The tricky part is that during the commit-graph write, you will > need to check the existing filter value to see if it matches. If > not, the filters will need to be recomputed from scratch. This > will change patch 4 a bit, but it's the right thing to do. Yup, good suggestion. Thanks, Taylor
On 6/21/2023 8:19 AM, Taylor Blau wrote: > On Tue, Jun 20, 2023 at 09:43:46AM -0400, Derrick Stolee wrote: >> commitGraph.changedPathsVersion: Which version should we _write_ >> when writing a new commit-graph? Defaults to '1' but will default >> to '2' in the next major verion, then '1' will no longer be an >> accepted value in the version after that. > > I am not sure if there's a situation where we'd ever want to not write > the newer versions when starting a new commit-graph (or chain) from > scratch. I'd rather have the choice to start writing the new filter mode be made by config rather than a change to the Git binary. Makes for a more gradual rollout to be sure there aren't issues with the new version. So please keep the configuration value, but have it indicate the mode used when writing filters. Thanks, -Stolee
Derrick Stolee <derrickstolee@github.com> writes: > On 6/21/2023 8:19 AM, Taylor Blau wrote: > > On Tue, Jun 20, 2023 at 09:43:46AM -0400, Derrick Stolee wrote: > > >> commitGraph.changedPathsVersion: Which version should we _write_ > >> when writing a new commit-graph? Defaults to '1' but will default > >> to '2' in the next major verion, then '1' will no longer be an > >> accepted value in the version after that. > > > > I am not sure if there's a situation where we'd ever want to not write > > the newer versions when starting a new commit-graph (or chain) from > > scratch. > > I'd rather have the choice to start writing the new filter mode be > made by config rather than a change to the Git binary. Makes for a > more gradual rollout to be sure there aren't issues with the new > version. > > So please keep the configuration value, but have it indicate the > mode used when writing filters. > > Thanks, > -Stolee It looks like we can't avoid writing both versions (we need to write version 1 so that we can reuse existing Bloom filters when writing, if the repo has version 1 Bloom filters) so a config that tells us which to write sounds doable. I'll take a look.
On 6/22/2023 6:27 PM, Jonathan Tan wrote: > It looks like we can't avoid writing both versions (we need to write > version 1 so that we can reuse existing Bloom filters when writing, if > the repo has version 1 Bloom filters) so a config that tells us which to > write sounds doable. I'll take a look. I don't fully understand what you're saying here. We need to be able to write both versions (not simultaneously, but toggled via config) so we can roll out this change carefully instead of suddenly due to the Git executable changing. But we don't need to be able to write version 1 just so we can reuse version 1 filters. In fact, we should be able to upgrade to version 2 if the config points at that, but we should _not_ re-use the filters in that case. This does present an interesting challenge for the upgrade: we have the commitGraph.maxNewFilters option to limit the amount of new filters we write at a given time. When shifting to version 2, we will start from scratch, so that could have some effect. I will consider how to handle this, perhaps by raising the number temporarily. Thanks, -Stolee