mbox series

[0/2] prevent `repack` to unpack and delete promisor objects

Message ID 20210414191403.4387-1-rafaeloliveira.cs@gmail.com (mailing list archive)
Headers show
Series prevent `repack` to unpack and delete promisor objects | expand

Message

Rafael Silva April 14, 2021, 7:14 p.m. UTC
This series is built on top of jk/promisor-optim. It conflicts with
changes on p5600 otherwise.

The following patches fixes the issue where we unnecessarily turn loose
all the promisor objects and deletes them right after when running
`repack -A -d ..` (via `git gc) for a partial repository. 

Special thanks to Peff, for proposing a better approach for managing
the situation and for Jonathan Tan for earlier interaction on the
solution. Previously, I thought we should skip the promisor objects
by just adding a check in loosened_object_can_be_discarded(). However,
Peff pointed out that we can do better by realizing much sooner that
we should not even consider loosening the objects for the _old_ promisor
packs.

It took me a bit to come up with the test because it seems `repack`
doesn't offer an option to skip the "deletion of unpacked objects",
so this series adds a new option to `repack` for skip the
`git prune-packed` execution thus allowing us to easily inspect the
unpacked objects before they are removed and simplification of our
test suite. Furthermore, The test will now test the `repack` code
path instead of performing the operations by calling
`pack-objects`.

Rafael Silva (2):
  repack: teach --no-prune-packed to skip `git prune-packed`
  repack: avoid loosening promisor pack objects in partial clones

 Documentation/git-repack.txt  |  5 +++++
 builtin/repack.c              | 15 ++++++++++++---
 t/perf/p5600-partial-clone.sh |  4 ++++
 t/t5616-partial-clone.sh      |  9 +++++++++
 t/t7700-repack.sh             | 23 +++++++++++------------
 5 files changed, 41 insertions(+), 15 deletions(-)

Comments

Junio C Hamano April 14, 2021, 10:10 p.m. UTC | #1
Rafael Silva <rafaeloliveira.cs@gmail.com> writes:

> This series is built on top of jk/promisor-optim. It conflicts with
> changes on p5600 otherwise.
>
> The following patches fixes the issue where we unnecessarily turn loose
> all the promisor objects and deletes them right after when running
> `repack -A -d ..` (via `git gc) for a partial repository. 
>
> Special thanks to Peff, for proposing a better approach for managing
> the situation and for Jonathan Tan for earlier interaction on the
> solution. Previously, I thought we should skip the promisor objects
> by just adding a check in loosened_object_can_be_discarded(). However,
> Peff pointed out that we can do better by realizing much sooner that
> we should not even consider loosening the objects for the _old_ promisor
> packs.
>
> It took me a bit to come up with the test because it seems `repack`
> doesn't offer an option to skip the "deletion of unpacked objects",
> so this series adds a new option to `repack` for skip the
> `git prune-packed` execution thus allowing us to easily inspect the
> unpacked objects before they are removed and simplification of our
> test suite. Furthermore, The test will now test the `repack` code
> path instead of performing the operations by calling
> `pack-objects`.

Beautiful.  Thanks for working so well together, all of you.


> Rafael Silva (2):
>   repack: teach --no-prune-packed to skip `git prune-packed`
>   repack: avoid loosening promisor pack objects in partial clones
>
>  Documentation/git-repack.txt  |  5 +++++
>  builtin/repack.c              | 15 ++++++++++++---
>  t/perf/p5600-partial-clone.sh |  4 ++++
>  t/t5616-partial-clone.sh      |  9 +++++++++
>  t/t7700-repack.sh             | 23 +++++++++++------------
>  5 files changed, 41 insertions(+), 15 deletions(-)
Jeff King April 15, 2021, 9:15 a.m. UTC | #2
On Wed, Apr 14, 2021 at 09:14:01PM +0200, Rafael Silva wrote:

> It took me a bit to come up with the test because it seems `repack`
> doesn't offer an option to skip the "deletion of unpacked objects",
> so this series adds a new option to `repack` for skip the
> `git prune-packed` execution thus allowing us to easily inspect the
> unpacked objects before they are removed and simplification of our
> test suite. Furthermore, The test will now test the `repack` code
> path instead of performing the operations by calling
> `pack-objects`.

Thanks for working on this. Overall the patches seem sane, though I
think Jonathan's comments (especially about the confusion in the commit
message of 2/2) are worth addressing.

I have mixed feelings on the "--no-prune-packed" option, just because
it's user-visible and I don't think it's something a normal user would
ever really want.

In the new test (and I think in the old ones you modified, though I
didn't look carefully) the main thing we care about is whether we write
out loose objects. So another solution would be to improve the debug
logging inside pack-objects to tell us more about what it's doing.

The fork of Git we use at GitHub has something similar; when we discard
objects or force them loose, we write their sha1 values to a log file.
This has come in handy for a lot of after-the-fact debugging ("oops,
this repo is corrupted; did we intentionally delete object X?").

I wonder if we could do something similar with the trace2 facility. I
know it can be turned on via config, but I don't know how good the
support is for enabling just one segment of data (and this may generate
a lot of entries, so people using trace2 for telemetry probably wouldn't
want it on).

For the purposes of the tests, though just a normal GIT_TRACE_PACK_DEBUG
would be plenty. I dunno. I don't want to open up a can of worms on
logging that would hold up getting this quite-substantial fix in place.
But once we add --no-prune-packed, it will be hard to take away.

-Peff
Rafael Silva April 18, 2021, 8:20 a.m. UTC | #3
Jeff King <peff@peff.net> writes:

> On Wed, Apr 14, 2021 at 09:14:01PM +0200, Rafael Silva wrote:
>
>> It took me a bit to come up with the test because it seems `repack`
>> doesn't offer an option to skip the "deletion of unpacked objects",
>> so this series adds a new option to `repack` for skip the
>> `git prune-packed` execution thus allowing us to easily inspect the
>> unpacked objects before they are removed and simplification of our
>> test suite. Furthermore, The test will now test the `repack` code
>> path instead of performing the operations by calling
>> `pack-objects`.
>
> Thanks for working on this. Overall the patches seem sane, though I
> think Jonathan's comments (especially about the confusion in the commit
> message of 2/2) are worth addressing.
>

Thanks for the review. Indeed, I will address Jonathan's comments on
the next revision to remove the confused and misleading commit message.

Sorry for the confusion.

>
> I have mixed feelings on the "--no-prune-packed" option, just because
> it's user-visible and I don't think it's something a normal user would
> ever really want.
>
> In the new test (and I think in the old ones you modified, though I
> didn't look carefully) the main thing we care about is whether we write
> out loose objects. So another solution would be to improve the debug
> logging inside pack-objects to tell us more about what it's doing.
>

Honestly, I'm also not happy adding an end user-visible variable just
for the sake of testing it, specially because it's unclear whether the
user will actually use this option.

Initially, what convinced myself for proposing the changes was the
additional cleanup in our test suite, but giving a second thought I'm
not sure now whether this is a strong argument either.

> The fork of Git we use at GitHub has something similar; when we discard
> objects or force them loose, we write their sha1 values to a log file.
> This has come in handy for a lot of after-the-fact debugging ("oops,
> this repo is corrupted; did we intentionally delete object X?").
>
> I wonder if we could do something similar with the trace2 facility. I
> know it can be turned on via config, but I don't know how good the
> support is for enabling just one segment of data (and this may generate
> a lot of entries, so people using trace2 for telemetry probably wouldn't
> want it on).
>
> For the purposes of the tests, though just a normal GIT_TRACE_PACK_DEBUG
> would be plenty. I dunno. I don't want to open up a can of worms on
> logging that would hold up getting this quite-substantial fix in place.
> But once we add --no-prune-packed, it will be hard to take away.
>
> -Peff

After reading this comment and investigating a bit more, I believe
increasing the debug logging of `pack-objects` will help drop the first
patch, at least for now, and allowing the fix to progress without the
"controversial" user option.  Later, we can revisit adding the
`--no-prune-packed` (or a better named) option in case we think this
will be useful for the users.

I'll be addressing this in v2.