mbox series

[00/11] Partial bundles

Message ID pull.1159.git.1645638911.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Partial bundles | expand

Message

Derrick Stolee via GitGitGadget Feb. 23, 2022, 5:55 p.m. UTC
While discussing bundle-URIs [1], it came to my attention that bundles have
no way to specify an object filter, so bundles cannot be used with partial
clones.

[1]
https://lore.kernel.org/git/7fab28bf-54e7-d0e9-110a-53fad6244cc9@gmail.com/

This series provides a way to fix that by adding a 'filter' capability to
the bundle file format and allowing one to create a partial bundle with 'git
bundle create --filter=blob:none '.

There are a couple things that I want to point out about this implementation
that could use some high-level feedback:

 1. I moved the '--filter' parsing into setup_revisions() instead of adding
    another place to parse it. This works for 'git bundle' but it also
    allows it to be parsed successfully in commands such as 'git diff' which
    doesn't make sense. Options such as '--objects' are already being parsed
    there, and they don't make sense either, so I want some thoughts on
    this.

 2. If someone uses 'git clone partial.bdl partial' where 'partial.bdl' is a
    filtered bundle, then the clone will fail with a message such as

fatal: missing blob object '9444604d515c0b162e37e59accd54a0bac50ed2e' fatal:
remote did not send all necessary objects

This might be fine. We don't expect users to clone partial bundles or fetch
partial bundles into an unfiltered repo and these failures are expected. It
is possible that we could put in custom logic to fail faster by reading the
bundle header for a filter.

Generally, the idea is to open this up as a potential way to bootstrap a
clone of a partial clone using a set of precomputed partial bundles.

Thanks, -Stolee

Derrick Stolee (11):
  index-pack: document and test the --promisor option
  revision: put object filter into struct rev_info
  pack-objects: use rev.filter when possible
  pack-bitmap: drop filter in prepare_bitmap_walk()
  list-objects: consolidate traverse_commit_list[_filtered]
  MyFirstObjectWalk: update recommended usage
  bundle: safely handle --objects option
  bundle: parse filter capability
  rev-list: move --filter parsing into revision.c
  bundle: create filtered bundles
  bundle: unbundle promisor packs

 Documentation/MyFirstObjectWalk.txt | 44 ++++++---------
 Documentation/git-index-pack.txt    |  8 +++
 builtin/pack-objects.c              |  9 +--
 builtin/rev-list.c                  | 29 +++-------
 bundle.c                            | 87 ++++++++++++++++++++++++-----
 bundle.h                            |  3 +
 list-objects-filter-options.c       |  2 +-
 list-objects-filter-options.h       |  5 ++
 list-objects.c                      | 25 +++------
 list-objects.h                      | 12 +++-
 pack-bitmap.c                       | 24 ++++----
 pack-bitmap.h                       |  2 -
 reachable.c                         |  2 +-
 revision.c                          | 11 ++++
 revision.h                          |  4 ++
 t/t5300-pack-object.sh              |  4 +-
 t/t6020-bundle-misc.sh              | 48 ++++++++++++++++
 17 files changed, 215 insertions(+), 104 deletions(-)


base-commit: 45fe28c951c3e70666ee4ef8379772851a8e4d32
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1159%2Fderrickstolee%2Fbundle%2Fpartial-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1159/derrickstolee/bundle/partial-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1159

Comments

Jeff Hostetler Feb. 28, 2022, 5 p.m. UTC | #1
On 2/23/22 12:55 PM, Derrick Stolee via GitGitGadget wrote:
> While discussing bundle-URIs [1], it came to my attention that bundles have
> no way to specify an object filter, so bundles cannot be used with partial
> clones.
> 
> [1]
> https://lore.kernel.org/git/7fab28bf-54e7-d0e9-110a-53fad6244cc9@gmail.com/
> 
> This series provides a way to fix that by adding a 'filter' capability to
> the bundle file format and allowing one to create a partial bundle with 'git
> bundle create --filter=blob:none '.

Nicely done.  There's a lot of refactoring here to move the
filtering code into a more usable place and get rid of some
of the awkward limitations of my original code.  Sorry that
you had to slog thru all of that.

> 
> There are a couple things that I want to point out about this implementation
> that could use some high-level feedback:
> 
>   1. I moved the '--filter' parsing into setup_revisions() instead of adding
>      another place to parse it. This works for 'git bundle' but it also
>      allows it to be parsed successfully in commands such as 'git diff' which
>      doesn't make sense. Options such as '--objects' are already being parsed
>      there, and they don't make sense either, so I want some thoughts on
>      this.

This feels like something that can wait for another task.
Let's keep this series focused on adding partial bundles.

> 
>   2. If someone uses 'git clone partial.bdl partial' where 'partial.bdl' is a
>      filtered bundle, then the clone will fail with a message such as
> 
> fatal: missing blob object '9444604d515c0b162e37e59accd54a0bac50ed2e' fatal:
> remote did not send all necessary objects
> 
> This might be fine. We don't expect users to clone partial bundles or fetch
> partial bundles into an unfiltered repo and these failures are expected. It
> is possible that we could put in custom logic to fail faster by reading the
> bundle header for a filter.
> 
> Generally, the idea is to open this up as a potential way to bootstrap a
> clone of a partial clone using a set of precomputed partial bundles.

I think this is to be expected.

Would it help to have Git do a no-checkout clone when cloning
from a partial bundle?  Maybe that would give the user a chance to set
a real remote (and maybe set the partial clone/fetch config settings)
and then backfill their local clone??   (That might be functional, but
not very user-friendly....)

Or should we just consider this limitation as a placeholder while we
wait for the Bundle URI effort?

Jeff

> 
> Thanks, -Stolee
> 
> Derrick Stolee (11):
>    index-pack: document and test the --promisor option
>    revision: put object filter into struct rev_info
>    pack-objects: use rev.filter when possible
>    pack-bitmap: drop filter in prepare_bitmap_walk()
>    list-objects: consolidate traverse_commit_list[_filtered]
>    MyFirstObjectWalk: update recommended usage
>    bundle: safely handle --objects option
>    bundle: parse filter capability
>    rev-list: move --filter parsing into revision.c
>    bundle: create filtered bundles
>    bundle: unbundle promisor packs
> 
>   Documentation/MyFirstObjectWalk.txt | 44 ++++++---------
>   Documentation/git-index-pack.txt    |  8 +++
>   builtin/pack-objects.c              |  9 +--
>   builtin/rev-list.c                  | 29 +++-------
>   bundle.c                            | 87 ++++++++++++++++++++++++-----
>   bundle.h                            |  3 +
>   list-objects-filter-options.c       |  2 +-
>   list-objects-filter-options.h       |  5 ++
>   list-objects.c                      | 25 +++------
>   list-objects.h                      | 12 +++-
>   pack-bitmap.c                       | 24 ++++----
>   pack-bitmap.h                       |  2 -
>   reachable.c                         |  2 +-
>   revision.c                          | 11 ++++
>   revision.h                          |  4 ++
>   t/t5300-pack-object.sh              |  4 +-
>   t/t6020-bundle-misc.sh              | 48 ++++++++++++++++
>   17 files changed, 215 insertions(+), 104 deletions(-)
> 
> 
> base-commit: 45fe28c951c3e70666ee4ef8379772851a8e4d32
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1159%2Fderrickstolee%2Fbundle%2Fpartial-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1159/derrickstolee/bundle/partial-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1159
>
Derrick Stolee Feb. 28, 2022, 5:54 p.m. UTC | #2
On 2/28/2022 12:00 PM, Jeff Hostetler wrote:
> On 2/23/22 12:55 PM, Derrick Stolee via GitGitGadget wrote:
>> While discussing bundle-URIs [1], it came to my attention that bundles have
>> no way to specify an object filter, so bundles cannot be used with partial
>> clones.
>>
>> [1]
>> https://lore.kernel.org/git/7fab28bf-54e7-d0e9-110a-53fad6244cc9@gmail.com/
>>
>> This series provides a way to fix that by adding a 'filter' capability to
>> the bundle file format and allowing one to create a partial bundle with 'git
>> bundle create --filter=blob:none '.
> 
> Nicely done.  There's a lot of refactoring here to move the
> filtering code into a more usable place and get rid of some
> of the awkward limitations of my original code.  Sorry that
> you had to slog thru all of that.
> 
>>
>> There are a couple things that I want to point out about this implementation
>> that could use some high-level feedback:
>>
>>   1. I moved the '--filter' parsing into setup_revisions() instead of adding
>>      another place to parse it. This works for 'git bundle' but it also
>>      allows it to be parsed successfully in commands such as 'git diff' which
>>      doesn't make sense. Options such as '--objects' are already being parsed
>>      there, and they don't make sense either, so I want some thoughts on
>>      this.
> 
> This feels like something that can wait for another task.
> Let's keep this series focused on adding partial bundles.

What do you mean "can wait"? Do you recommend that I _don't_ do this
refactor and instead implement filter parsing directly in bundles?

Or, are you saying that we should not worry about these potential
side-effects of allowing (then ignoring) certain options in other
commands, at least until a later series?
 
>>   2. If someone uses 'git clone partial.bdl partial' where 'partial.bdl' is a
>>      filtered bundle, then the clone will fail with a message such as
>>
>> fatal: missing blob object '9444604d515c0b162e37e59accd54a0bac50ed2e' fatal:
>> remote did not send all necessary objects
>>
>> This might be fine. We don't expect users to clone partial bundles or fetch
>> partial bundles into an unfiltered repo and these failures are expected. It
>> is possible that we could put in custom logic to fail faster by reading the
>> bundle header for a filter.
>>
>> Generally, the idea is to open this up as a potential way to bootstrap a
>> clone of a partial clone using a set of precomputed partial bundles.
> 
> I think this is to be expected.
> 
> Would it help to have Git do a no-checkout clone when cloning
> from a partial bundle?  Maybe that would give the user a chance to set
> a real remote (and maybe set the partial clone/fetch config settings)
> and then backfill their local clone??   (That might be functional, but
> not very user-friendly....)
> 
> Or should we just consider this limitation as a placeholder while we
> wait for the Bundle URI effort?

It would be interesting to have another application of partial
bundles, such as cloning directly from a bundle, then allowing
a remote to be configured. It provides a "build-it-yourself"
approach to bundle URIs for partial clones.

I'm not sure if such an application is required for this series,
or if it could be delayed until after. I'm open to suggestions.

Thanks,
-Stolee
Jeff Hostetler March 1, 2022, 6:03 p.m. UTC | #3
On 2/28/22 12:54 PM, Derrick Stolee wrote:
> On 2/28/2022 12:00 PM, Jeff Hostetler wrote:
>> On 2/23/22 12:55 PM, Derrick Stolee via GitGitGadget wrote:
>>> While discussing bundle-URIs [1], it came to my attention that bundles have
>>> no way to specify an object filter, so bundles cannot be used with partial
>>> clones.
>>>
>>> [1]
>>> https://lore.kernel.org/git/7fab28bf-54e7-d0e9-110a-53fad6244cc9@gmail.com/
>>>
>>> This series provides a way to fix that by adding a 'filter' capability to
>>> the bundle file format and allowing one to create a partial bundle with 'git
>>> bundle create --filter=blob:none '.
>>
>> Nicely done.  There's a lot of refactoring here to move the
>> filtering code into a more usable place and get rid of some
>> of the awkward limitations of my original code.  Sorry that
>> you had to slog thru all of that.
>>
>>>
>>> There are a couple things that I want to point out about this implementation
>>> that could use some high-level feedback:
>>>
>>>    1. I moved the '--filter' parsing into setup_revisions() instead of adding
>>>       another place to parse it. This works for 'git bundle' but it also
>>>       allows it to be parsed successfully in commands such as 'git diff' which
>>>       doesn't make sense. Options such as '--objects' are already being parsed
>>>       there, and they don't make sense either, so I want some thoughts on
>>>       this.
>>
>> This feels like something that can wait for another task.
>> Let's keep this series focused on adding partial bundles.
> 
> What do you mean "can wait"? Do you recommend that I _don't_ do this
> refactor and instead implement filter parsing directly in bundles?
> 
> Or, are you saying that we should not worry about these potential
> side-effects of allowing (then ignoring) certain options in other
> commands, at least until a later series?

Sorry to be confusing here.  I just meant that the refactoring looks
good and we should continue with it.  And for now let's not worry
about those odd non-sense combinations that may now be possible.

That at some point later we can address the non-sense combinations
with some control flags on the call to the common parser or something.
I started thinking about that while looking at the code and just thought
that it would be a bit of a mess.  And that we didn't need to do any
thing about it now.  Such tweaking would be better done in a later
series.

>   
>>>    2. If someone uses 'git clone partial.bdl partial' where 'partial.bdl' is a
>>>       filtered bundle, then the clone will fail with a message such as
>>>
>>> fatal: missing blob object '9444604d515c0b162e37e59accd54a0bac50ed2e' fatal:
>>> remote did not send all necessary objects
>>>
>>> This might be fine. We don't expect users to clone partial bundles or fetch
>>> partial bundles into an unfiltered repo and these failures are expected. It
>>> is possible that we could put in custom logic to fail faster by reading the
>>> bundle header for a filter.
>>>
>>> Generally, the idea is to open this up as a potential way to bootstrap a
>>> clone of a partial clone using a set of precomputed partial bundles.
>>
>> I think this is to be expected.
>>
>> Would it help to have Git do a no-checkout clone when cloning
>> from a partial bundle?  Maybe that would give the user a chance to set
>> a real remote (and maybe set the partial clone/fetch config settings)
>> and then backfill their local clone??   (That might be functional, but
>> not very user-friendly....)
>>
>> Or should we just consider this limitation as a placeholder while we
>> wait for the Bundle URI effort?
> 
> It would be interesting to have another application of partial
> bundles, such as cloning directly from a bundle, then allowing
> a remote to be configured. It provides a "build-it-yourself"
> approach to bundle URIs for partial clones.
> 
> I'm not sure if such an application is required for this series,
> or if it could be delayed until after. I'm open to suggestions.

I think what you have here is nice and self-contained.  That is,
supporting partial bundles is a nice stop point.  Then have 1 or 2
series that consumes them.

Jeff
Derrick Stolee March 4, 2022, 7:19 p.m. UTC | #4
On 2/23/2022 12:55 PM, Derrick Stolee via GitGitGadget wrote:
> While discussing bundle-URIs [1], it came to my attention that bundles have
> no way to specify an object filter, so bundles cannot be used with partial
> clones.
> 
> [1]
> https://lore.kernel.org/git/7fab28bf-54e7-d0e9-110a-53fad6244cc9@gmail.com/
> 
> This series provides a way to fix that by adding a 'filter' capability to
> the bundle file format and allowing one to create a partial bundle with 'git
> bundle create --filter=blob:none '.
> 
> There are a couple things that I want to point out about this implementation
> that could use some high-level feedback:
> 
>  1. I moved the '--filter' parsing into setup_revisions() instead of adding
>     another place to parse it. This works for 'git bundle' but it also
>     allows it to be parsed successfully in commands such as 'git diff' which
>     doesn't make sense. Options such as '--objects' are already being parsed
>     there, and they don't make sense either, so I want some thoughts on
>     this.
> 
>  2. If someone uses 'git clone partial.bdl partial' where 'partial.bdl' is a
>     filtered bundle, then the clone will fail with a message such as
> 
> fatal: missing blob object '9444604d515c0b162e37e59accd54a0bac50ed2e' fatal:
> remote did not send all necessary objects
> 
> This might be fine. We don't expect users to clone partial bundles or fetch
> partial bundles into an unfiltered repo and these failures are expected. It
> is possible that we could put in custom logic to fail faster by reading the
> bundle header for a filter.
> 
> Generally, the idea is to open this up as a potential way to bootstrap a
> clone of a partial clone using a set of precomputed partial bundles.

Thanks Jeff, for providing a review of this series. I hope that at
least one other reviewer could take a look sometime.

Thanks,
-Stolee
Ævar Arnfjörð Bjarmason March 7, 2022, 2:55 p.m. UTC | #5
On Wed, Feb 23 2022, Derrick Stolee via GitGitGadget wrote:

Just on:

>  Documentation/MyFirstObjectWalk.txt | 44 ++++++---------
>  Documentation/git-index-pack.txt    |  8 +++
>  builtin/pack-objects.c              |  9 +--
>  builtin/rev-list.c                  | 29 +++-------
>  bundle.c                            | 87 ++++++++++++++++++++++++-----
>  bundle.h                            |  3 +
>  list-objects-filter-options.c       |  2 +-
>  list-objects-filter-options.h       |  5 ++
>  list-objects.c                      | 25 +++------
>  list-objects.h                      | 12 +++-
>  pack-bitmap.c                       | 24 ++++----
>  pack-bitmap.h                       |  2 -
>  reachable.c                         |  2 +-
>  revision.c                          | 11 ++++
>  revision.h                          |  4 ++
>  t/t5300-pack-object.sh              |  4 +-
>  t/t6020-bundle-misc.sh              | 48 ++++++++++++++++
>  17 files changed, 215 insertions(+), 104 deletions(-)

This is missing a corresponding change to
Documentation/technical/bundle-format.txt.
Derrick Stolee March 7, 2022, 2:59 p.m. UTC | #6
On 3/7/2022 9:55 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Feb 23 2022, Derrick Stolee via GitGitGadget wrote:
> 
> Just on:
> 
>>  Documentation/MyFirstObjectWalk.txt | 44 ++++++---------
>>  Documentation/git-index-pack.txt    |  8 +++
>>  builtin/pack-objects.c              |  9 +--
>>  builtin/rev-list.c                  | 29 +++-------
>>  bundle.c                            | 87 ++++++++++++++++++++++++-----
>>  bundle.h                            |  3 +
>>  list-objects-filter-options.c       |  2 +-
>>  list-objects-filter-options.h       |  5 ++
>>  list-objects.c                      | 25 +++------
>>  list-objects.h                      | 12 +++-
>>  pack-bitmap.c                       | 24 ++++----
>>  pack-bitmap.h                       |  2 -
>>  reachable.c                         |  2 +-
>>  revision.c                          | 11 ++++
>>  revision.h                          |  4 ++
>>  t/t5300-pack-object.sh              |  4 +-
>>  t/t6020-bundle-misc.sh              | 48 ++++++++++++++++
>>  17 files changed, 215 insertions(+), 104 deletions(-)
> 
> This is missing a corresponding change to
> Documentation/technical/bundle-format.txt.

Thanks. I don't know how I missed that.

-Stolee