mbox series

[v4,00/13] Stop relying on SHA1 fallback for `the_hash_algo`

Message ID cover.1715057362.git.ps@pks.im (mailing list archive)
Headers show
Series Stop relying on SHA1 fallback for `the_hash_algo` | expand

Message

Patrick Steinhardt May 7, 2024, 4:52 a.m. UTC
Hi,

this is the third version of my patch series that stops relying on the
SHA1 fallback configured for `the_hash_algo`. Instead, any callers that
access `the_hash_algo` without it being initialized will now crash hard.
This is designed to catch various subtle bugs going forward.

The only change compared to v3 of this patch series is a rebase on top
of the following two topics:

  - ps/the-index-is-no-more, which causes a conflict in the final patch.

  - jc/no-default-attr-tree-in-bare, which causes a conflict in the
    fourth patch.

Interestingly, the range diff does not surface the second resolved
conflict. This is because the context is actually the same, but when
applying the patch in question you would get a conflict. To aid with
this I decided to generate patches with `--unified=4`, which surfaces
the previously-conflicting hunk in `compute_default_attr_source()`.

Patrick

Patrick Steinhardt (13):
  path: harden validation of HEAD with non-standard hashes
  path: move `validate_headref()` to its only user
  parse-options-cb: only abbreviate hashes when hash algo is known
  attr: don't recompute default attribute source
  attr: fix BUG() when parsing attrs outside of repo
  remote-curl: fix parsing of detached SHA256 heads
  builtin/rev-parse: allow shortening to more than 40 hex characters
  builtin/blame: don't access potentially unitialized `the_hash_algo`
  builtin/bundle: abort "verify" early when there is no repository
  builtin/diff: explicitly set hash algo when there is no repo
  builtin/shortlog: don't set up revisions without repo
  oss-fuzz/commit-graph: set up hash algorithm
  repository: stop setting SHA1 as the default object hash

 attr.c                       | 31 +++++++++++++++------
 builtin/blame.c              |  5 ++--
 builtin/bundle.c             |  5 ++++
 builtin/diff.c               |  9 ++++++
 builtin/rev-parse.c          |  5 ++--
 builtin/shortlog.c           |  2 +-
 oss-fuzz/fuzz-commit-graph.c |  1 +
 parse-options-cb.c           |  3 +-
 path.c                       | 53 ------------------------------------
 path.h                       |  1 -
 remote-curl.c                | 19 ++++++++++++-
 repository.c                 | 20 --------------
 setup.c                      | 53 ++++++++++++++++++++++++++++++++++++
 t/t0003-attributes.sh        | 15 ++++++++++
 t/t0040-parse-options.sh     | 17 ++++++++++++
 t/t1500-rev-parse.sh         |  6 ++++
 t/t5550-http-fetch-dumb.sh   | 15 ++++++++++
 17 files changed, 168 insertions(+), 92 deletions(-)

Range-diff against v3:
 1:  5134f35cda =  1:  5ee372c2d8 path: harden validation of HEAD with non-standard hashes
 2:  589b6a99ef =  2:  ece0ab94a8 path: move `validate_headref()` to its only user
 3:  9a63c445d2 =  3:  1a0859eaf1 parse-options-cb: only abbreviate hashes when hash algo is known
 4:  929bacbfce =  4:  1204a34216 attr: don't recompute default attribute source
 5:  8f20aec1ee =  5:  f098ce9690 attr: fix BUG() when parsing attrs outside of repo
 6:  53439067a1 =  6:  e8876052ad remote-curl: fix parsing of detached SHA256 heads
 7:  1f74960760 =  7:  6116030310 builtin/rev-parse: allow shortening to more than 40 hex characters
 8:  2d985abca1 =  8:  872ded113e builtin/blame: don't access potentially unitialized `the_hash_algo`
 9:  f3b23d28aa =  9:  5b4a21d2ce builtin/bundle: abort "verify" early when there is no repository
10:  7577b6b96c = 10:  e7254ae757 builtin/diff: explicitly set hash algo when there is no repo
11:  509c79d1d3 = 11:  c21b15ba17 builtin/shortlog: don't set up revisions without repo
12:  660f976129 = 12:  f37beb0e89 oss-fuzz/commit-graph: set up hash algorithm
13:  95909c2da5 ! 13:  950b08bc78 repository: stop setting SHA1 as the default object hash
    @@ Commit message
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
      ## repository.c ##
    -@@ repository.c: void initialize_the_repository(void)
    - 	the_repo.parsed_objects = parsed_object_pool_new();
    - 
    - 	index_state_init(&the_index, the_repository);
    +@@ repository.c: void initialize_repository(struct repository *repo)
    + 	repo->parsed_objects = parsed_object_pool_new();
    + 	ALLOC_ARRAY(repo->index, 1);
    + 	index_state_init(repo->index, repo);
     -
    --	repo_set_hash_algo(&the_repo, GIT_HASH_SHA1);
    +-	/*
    +-	 * Unfortunately, we need to keep this hack around for the time being:
    +-	 *
    +-	 *   - Not setting up the hash algorithm for `the_repository` leads to
    +-	 *     crashes because `the_hash_algo` is a macro that expands to
    +-	 *     `the_repository->hash_algo`. So if Git commands try to access
    +-	 *     `the_hash_algo` without a Git directory we crash.
    +-	 *
    +-	 *   - Setting up the hash algorithm to be SHA1 by default breaks other
    +-	 *     commands when running with SHA256.
    +-	 *
    +-	 * This is another point in case why having global state is a bad idea.
    +-	 * Eventually, we should remove this hack and stop setting the hash
    +-	 * algorithm in this function altogether. Instead, it should only ever
    +-	 * be set via our repository setup procedures. But that requires more
    +-	 * work.
    +-	 */
    +-	if (repo == the_repository)
    +-		repo_set_hash_algo(repo, GIT_HASH_SHA1);
      }
      
      static void expand_base_dir(char **out, const char *in,