mbox series

[00/38] SHA-256, part 3/3

Message ID 20200710024728.3100527-1-sandals@crustytoothpaste.net (mailing list archive)
Headers show
Series SHA-256, part 3/3 | expand

Message

brian m. carlson July 10, 2020, 2:46 a.m. UTC
This is the final part required for the stage 4 implementation of
SHA-256.

The beginning part of this series is mostly final test fixes, and should
be straightforward.  Next are two patches that are required to fix the
remaining SHA-256 incompatibilities in our codebase, both of which were
necessitated by recent changes.

After that, the meat of the series: support for reading
extensions.objectFormat, removal of the compile-time option, and some
tests.  We introduce a knob to control the testsuite and a wire-in for
CI so we don't regress things in the future, and at the end, some docs.

The final patch here is the promised cleanup for test_oid_init in
various places in our codebase.

Note that SHA-256 repositories don't currently interoperate with SHA-1
repositories.  I'm working on additional patches for the pieces required
to implement that and full stage 1, 2, and 3 support, which will be in
future series.  However, it is possible to have a full SHA-256
repository that is fully functional after this series.  I refer the
reader to Documentation/technical/hash-function-transition.txt for the
details of the different stages.

I do plan to provide a shell-based tool for easier conversion of
repositories with submodules at some point in the future, but all the
pieces required are already built into fast-import and fast-export.

I would like to thank everyone who's contributed to the SHA-256 work
over the years with patches, review, documentation, suggestions, and
other contributions.  I could not possibly name all of the contributors
here, so I won't try.  None of this work could have been done without
all your help.

I plan to take a brief hiatus from more SHA-256 work after this and work
on some other Git- and non-Git-related projects, but I'll probably
resume at some point before year's end.  I'm happy to provide the
in-progress work for next steps if people are interested.

I fully expect this series won't be picked up until after the release,
so please feel free to focus on release-related activities for the
moment.

Johannes Schindelin (1):
  t3404: prepare 'short SHA-1 collision' tests for SHA-256

brian m. carlson (37):
  t: make test-bloom initialize repository
  t1001: use $ZERO_OID
  t3305: make hash agnostic
  t6100: make hash size independent
  t6101: make hash size independent
  t6301: make hash size independent
  t6500: specify test values for SHA-256
  t6501: avoid hard-coded objects
  t7003: compute appropriate length constant
  t7063: make hash size independent
  t7201: abstract away SHA-1-specific constants
  t7102: abstract away SHA-1-specific constants
  t7400: make hash size independent
  t7405: make hash size independent
  t7506: avoid checking for SHA-1-specific constants
  t7508: use $ZERO_OID instead of hard-coded constant
  t8002: make hash size independent
  t8003: make hash size independent
  t8011: make hash size independent
  t9300: abstract away SHA-1-specific constants
  t9300: use $ZERO_OID instead of hard-coded object ID
  t9301: make hash size independent
  t9350: make hash size independent
  t9500: ensure that algorithm info is preserved in config
  t9700: make hash size independent
  t5308: make test work with SHA-256
  t0410: mark test with SHA1 prerequisite
  http-fetch: set up git directory before parsing pack hashes
  builtin/verify-pack: implement an --object-format option
  setup: add support for reading extensions.objectformat
  Enable SHA-256 support by default
  t: add test_oid option to select hash algorithm
  t: allow testing different hash algorithms via environment
  t: make SHA1 prerequisite depend on default hash
  ci: run tests with SHA-256
  docs: add documentation for extensions.objectFormat
  t: remove test_oid_init in tests

 Documentation/config.txt               |   2 +
 Documentation/config/extensions.txt    |   7 ++
 builtin/init-db.c                      |   5 -
 builtin/verify-pack.c                  |  25 ++--
 ci/run-build-and-tests.sh              |   1 +
 config.mak.dev                         |   2 -
 http-fetch.c                           |   4 +-
 repository.c                           |   4 -
 setup.c                                |  12 +-
 t/helper/test-bloom.c                  |   2 +
 t/lib-pack.sh                          |  11 +-
 t/lib-submodule-update.sh              |   1 -
 t/t0000-basic.sh                       |  15 ++-
 t/t0001-init.sh                        |  31 +++++
 t/t0410-partial-clone.sh               |   2 +-
 t/t1006-cat-file.sh                    |   2 -
 t/t1050-large.sh                       |   1 -
 t/t1091-sparse-checkout-builtin.sh     |   4 +-
 t/t1410-reflog.sh                      |   1 -
 t/t1450-fsck.sh                        |   1 -
 t/t1500-rev-parse.sh                   |   1 -
 t/t3305-notes-fanout.sh                |   2 +-
 t/t3308-notes-merge.sh                 |   1 -
 t/t3404-rebase-interactive.sh          |  49 ++++++--
 t/t3600-rm.sh                          |   1 -
 t/t3800-mktag.sh                       |   1 -
 t/t4002-diff-basic.sh                  |   2 -
 t/t4027-diff-submodule.sh              |   1 -
 t/t4134-apply-submodule.sh             |   1 -
 t/t4200-rerere.sh                      |   1 -
 t/t4211-line-log.sh                    |   1 -
 t/t5300-pack-object.sh                 |   3 +-
 t/t5302-pack-index.sh                  |   1 -
 t/t5308-pack-detect-duplicates.sh      |  20 ++--
 t/t5313-pack-bounds-checks.sh          |   1 -
 t/t5318-commit-graph.sh                |   3 +-
 t/t5319-multi-pack-index.sh            |   1 -
 t/t5324-split-commit-graph.sh          |   1 -
 t/t5504-fetch-receive-strict.sh        |   1 -
 t/t5530-upload-pack-error.sh           |   1 -
 t/t5562-http-backend-content-length.sh |   1 -
 t/t5702-protocol-v2.sh                 |   3 +-
 t/t5703-upload-pack-ref-in-want.sh     |   1 -
 t/t6006-rev-list-format.sh             |   1 -
 t/t6100-rev-list-in-order.sh           |   4 +-
 t/t6101-rev-parse-parents.sh           |   2 +-
 t/t6301-for-each-ref-errors.sh         |   2 +-
 t/t6500-gc.sh                          |  27 ++++-
 t/t6501-freshen-objects.sh             |  14 +--
 t/t7003-filter-branch.sh               |   3 +-
 t/t7063-status-untracked-cache.sh      | 155 +++++++++++++------------
 t/t7102-reset.sh                       |  93 ++++++++-------
 t/t7201-co.sh                          |   5 +-
 t/t7400-submodule-basic.sh             |  26 ++---
 t/t7405-submodule-merge.sh             |   4 +-
 t/t7506-status-submodule.sh            |  12 +-
 t/t7508-status.sh                      |   2 +-
 t/t8002-blame.sh                       |  15 +--
 t/t8003-blame-corner-cases.sh          |   3 +-
 t/t8011-blame-split-file.sh            |   2 +-
 t/t9300-fast-import.sh                 | 117 ++++++++++---------
 t/t9301-fast-import-notes.sh           |  13 +--
 t/t9350-fast-export.sh                 |  15 ++-
 t/t9500-gitweb-standalone-no-errors.sh |  22 +++-
 t/t9700/test.pl                        |   6 +-
 t/test-lib-functions.sh                |  16 ++-
 t/test-lib.sh                          |   6 +-
 67 files changed, 472 insertions(+), 324 deletions(-)
 create mode 100644 Documentation/config/extensions.txt

Comments

Derrick Stolee July 10, 2020, 3:14 p.m. UTC | #1
On 7/9/2020 10:46 PM, brian m. carlson wrote:
> This is the final part required for the stage 4 implementation of
> SHA-256.

WOOHOO! What a milestone!

As usual, your commits are excellently organized and clear. I could
not find any fault in any of them.

The proof is really in the pudding: does this pass the test suite
when GIT_TEST_DEFAULT_HASH=sha256? You add that as a mode to the
CI scripts, so we will know.

I made a recommendation for a different model with how to do the CI,
but it's super minor and can be done later. Basically, if we create
a new job for SHA-256 mode, then we can more quickly identify when
a test failure is due to that toggle and not other optional GIT_TEST_*
variables.

I hope to play around with SHA256-enabled repos a bit later, to see
if I can find any issues poking around on my own. I doubt I will,
with how thoroughly you modified the test suite.

Thanks for this incredible achievement!

-Stolee
brian m. carlson July 10, 2020, 7:55 p.m. UTC | #2
On 2020-07-10 at 15:14:37, Derrick Stolee wrote:
> On 7/9/2020 10:46 PM, brian m. carlson wrote:
> > This is the final part required for the stage 4 implementation of
> > SHA-256.
> 
> WOOHOO! What a milestone!

I'm also excited about this.  It's been a lot of work, but we're finally
here.

> As usual, your commits are excellently organized and clear. I could
> not find any fault in any of them.

Thanks.

> The proof is really in the pudding: does this pass the test suite
> when GIT_TEST_DEFAULT_HASH=sha256? You add that as a mode to the
> CI scripts, so we will know.

I've seen several cases where we've accidentally regressed things with
SHA-256, so it seemed only prudent to set up CI.  I've run it locally on
my system and it works for me, but we'll see how it fares on the CI
system.

> I made a recommendation for a different model with how to do the CI,
> but it's super minor and can be done later. Basically, if we create
> a new job for SHA-256 mode, then we can more quickly identify when
> a test failure is due to that toggle and not other optional GIT_TEST_*
> variables.

I think that's a good suggestion, and I'm familiar enough with GitHub
Actions that I think I can set up an additional job.  If I reroll, I'll
try to squash such a change in.

> I hope to play around with SHA256-enabled repos a bit later, to see
> if I can find any issues poking around on my own. I doubt I will,
> with how thoroughly you modified the test suite.

For folks who are looking for a more convenient way to get patches,
you're welcome to grab them from the transition-stage-4 branch on
https://github.com/bk2204/git.git or
https://git.crustytoothpaste.net/git/bmc/git.git.  The GitHub URL has
slightly more bandwidth and a generally better uptime, since I don't
live in a datacenter.
Junio C Hamano July 10, 2020, 8:15 p.m. UTC | #3
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2020-07-10 at 15:14:37, Derrick Stolee wrote:
>> On 7/9/2020 10:46 PM, brian m. carlson wrote:
>> > This is the final part required for the stage 4 implementation of
>> > SHA-256.
>> 
>> WOOHOO! What a milestone!
>
> I'm also excited about this.  It's been a lot of work, but we're finally
> here.

Thanks.
Junio C Hamano July 11, 2020, 12:37 a.m. UTC | #4
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2020-07-10 at 15:14:37, Derrick Stolee wrote:
>> On 7/9/2020 10:46 PM, brian m. carlson wrote:
>> > This is the final part required for the stage 4 implementation of
>> > SHA-256.
>> 
>> WOOHOO! What a milestone!
>
> I'm also excited about this.  It's been a lot of work, but we're finally
> here.

This topic sits at the tip of 'seen' (formerly known as 'pu'), and

    https://travis-ci.org/github/git/git/jobs/707050671

shows that t7063 is broken at the tip of 'seen'.

 - At the tip of this topic, t7063 passes.
 - There is no other topic that touches t7063 in flight.
 - seen^1, i.e. everything other than this topic merged, passes t7063.

Ahh, this is an easy one.  It is an interaction between this one and
the dl/test-must-fail-fixes-6 topic.

There are a few hunks like this in this topic.

-	test_cmp ../expect ../actual
+	test_might_fail test_cmp ../expect ../actual

and the other series tightens test_must/might_fail so that these
test helpers can only be used on "git" (other users should just use
"! cmd" or "cmd || :" instead).

I do not think it was an explicit objective for Denton's series to
catch the use of test_might_fail with test_cmp specifically, but I
offhand do not think of a good use case for saying "expect and
actual may sometimes be the same, but they may be different", so in
that sense, it contributed to find a nonsensical code.  I haven't
read thru all the 38 patches of this series, so there may be an
obvious reason why we may want to have such a thing expressed that I
am missing, though...

Thanks.
brian m. carlson July 11, 2020, 1:06 a.m. UTC | #5
On 2020-07-11 at 00:37:14, Junio C Hamano wrote:
> I do not think it was an explicit objective for Denton's series to
> catch the use of test_might_fail with test_cmp specifically, but I
> offhand do not think of a good use case for saying "expect and
> actual may sometimes be the same, but they may be different", so in
> that sense, it contributed to find a nonsensical code.  I haven't
> read thru all the 38 patches of this series, so there may be an
> obvious reason why we may want to have such a thing expressed that I
> am missing, though...

As mentioned upthread, my patch is definitely not correct.  I've
squashed in a fix to remove the test_might_fail and will send out a
reroll later this weekend.  I want to wait a little bit in case anyone
has immediate comments on things so as not to send out patches too
frequently.

If the breakage is bothersome for you, please feel free to just remove
those test_might_fail entries in the meantime, and the series should
function correctly.