[0/5] Fix and extend encoding handling in fast export/import

Message ID 20190425155118.7918-1-newren@gmail.com (mailing list archive)

Message

Elijah Newren April 25, 2019, 3:51 p.m. UTC
While stress testing `git filter-repo`, I noticed an issue with
encoding; further digging led to the fixes and features in this series.
See the individual commit messages for details.

Elijah Newren (5):
  t9350: fix encoding test to actually test reencoding
  fast-import: support 'encoding' commit header
  fast-export: avoid stripping encoding header if we cannot reencode
  fast-export: differentiate between explicitly utf-8 and implicitly
    utf-8
  fast-export: do automatic reencoding of commit messages only if
    requested

 Documentation/git-fast-import.txt |  7 ++++
 builtin/fast-export.c             | 44 ++++++++++++++++++++----
 fast-import.c                     | 12 +++++--
 t/t9300-fast-import.sh            | 20 +++++++++++
 t/t9350-fast-export.sh            | 57 ++++++++++++++++++++++++++-----
 5 files changed, 123 insertions(+), 17 deletions(-)

Comments

Elijah Newren April 25, 2019, 3:55 p.m. UTC | #1
On Thu, Apr 25, 2019 at 9:51 AM Elijah Newren <newren@gmail.com> wrote:
>
> While stress testing `git filter-repo`, I noticed an issue with
> encoding; further digging led to the fixes and features in this series.
> See the individual commit messages for details.

Whoops, forgot to cc Brian; I'm curious if my understanding is correct
about the sha256sum transition plans that the intent in the short term
is using fast-export & fast-import to transition to-and-from a
sha256sum repo on the fly; if so, I believe that transition work
should use the new --reencode=yes option in patch five.

Elijah Newren April 25, 2019, 3:57 p.m. UTC | #2
On Thu, Apr 25, 2019 at 9:55 AM Elijah Newren <newren@gmail.com> wrote:
>
> On Thu, Apr 25, 2019 at 9:51 AM Elijah Newren <newren@gmail.com> wrote:
> >
> > While stress testing `git filter-repo`, I noticed an issue with
> > encoding; further digging led to the fixes and features in this series.
> > See the individual commit messages for details.
>
> Whoops, forgot to cc Brian; I'm curious if my understanding is correct
> about the sha256sum transition plans that the intent in the short term
> is using fast-export & fast-import to transition to-and-from a
> sha256sum repo on the fly; if so, I believe that transition work
> should use the new --reencode=yes option in patch five.

I seem to be struggling with distractions this morning; I mean the
`--reencode=no` option from patch 5/5.

brian m. carlson April 26, 2019, 9:32 p.m. UTC | #3
On Thu, Apr 25, 2019 at 09:55:11AM -0600, Elijah Newren wrote:
> On Thu, Apr 25, 2019 at 9:51 AM Elijah Newren <newren@gmail.com> wrote:
> >
> > While stress testing `git filter-repo`, I noticed an issue with
> > encoding; further digging led to the fixes and features in this series.
> > See the individual commit messages for details.
> 
> Whoops, forgot to cc Brian; I'm curious if my understanding is correct
> about the sha256sum transition plans that the intent in the short term
> is using fast-export & fast-import to transition to-and-from a
> sha256sum repo on the fly; if so, I believe that transition work
> should use the new --reencode=yes option in patch five.

The plan is to convert using fast-import and fast-export, yes, but
on the fly, no. You'll convert your repository up front using
fast-import and fast-export and then conversion will happen on the fly
as needed internally. The latter is a thing I'm working on.

So individual users will want to use the --reencode option, but
internally we probably won't get as far as actually decoding most of the
commit object, so we'll keep the bytes in place.

I do appreciate the CC, though.
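
To illustrate why keeping the bytes in place matters for a hash
transition, here is a minimal sketch. It is not code from this series.
It assumes a hypothetical commit message stored in ISO-8859-1 and hashes
only that message as the object payload, which is a simplification: a
real commit object also carries tree, parent, author, and committer
lines. The helper name `git_object_id` is made up for this example.

```python
import hashlib


def git_object_id(kind: str, payload: bytes, algo: str = "sha1") -> str:
    """Hash a payload the way Git names objects: '<kind> <size>\\0' + payload."""
    header = f"{kind} {len(payload)}".encode("ascii") + b"\0"
    return hashlib.new(algo, header + payload).hexdigest()


# Hypothetical commit message, as it would be stored with an
# 'encoding ISO-8859-1' header (assumption for illustration only).
message = "Füge Umlaute hinzu\n"
latin1_bytes = message.encode("iso-8859-1")

# What automatic reencoding to UTF-8 would produce for the same text.
utf8_bytes = latin1_bytes.decode("iso-8859-1").encode("utf-8")

# The text is identical after decoding, but the stored bytes are not.
same_text = latin1_bytes.decode("iso-8859-1") == utf8_bytes.decode("utf-8")
same_bytes = latin1_bytes == utf8_bytes

# Different bytes give a different object name under either hash.
for algo in ("sha1", "sha256"):
    original_id = git_object_id("commit", latin1_bytes, algo)
    reencoded_id = git_object_id("commit", utf8_bytes, algo)
    print(algo, "ids match:", original_id == reencoded_id)

print("same text:", same_text, "| same bytes:", same_bytes)
```

Because the object name is computed over the raw bytes, any reencoding
step changes the resulting IDs even though the decoded text is the same.
That is the reason a conversion that wants to preserve content exactly
would avoid reencoding, as with the `--reencode=no` option discussed
above.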