Message ID | 00738c81a1212970910da6f29fe3ecef87c2ec3a.1671116820.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Optionally skip hashing index on write | expand |
On Thu, Dec 15 2022, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <derrickstolee@github.com> > + end = (unsigned char *)hdr + size; Commentary: The "unsigned char *" cast has moved up here from the below, that's needed (we're casting from the struct), and we should get rid of it from below, good. > + start = end - the_hash_algo->rawsz; Okey, so here we mark the start, which is the end minus the rawsz, but... > + oidread(&oid, start); > + if (oideq(&oid, null_oid())) > + return 0; > + > the_hash_algo->init_fn(&c); > the_hash_algo->update_fn(&c, hdr, size - the_hash_algo->rawsz); > the_hash_algo->final_fn(hash, &c); > - if (!hasheq(hash, (unsigned char *)hdr + size - the_hash_algo->rawsz)) > + if (!hasheq(hash, end - the_hash_algo->rawsz)) ...here we got rid of the cast, which is good, but let's not use "end - the_hash_algo->rawsz" here, let's use "start", which you already computed as "end - the_hash_algo->rawsz". This is just repeating it. I wondered if I just missed it being modified in the interim before carefully re-reading this, but we pass your tests with: - if (!hasheq(hash, end - the_hash_algo->rawsz)) + assert((end - the_hash_algo->rawsz) == start); + if (!hasheq(hash, start)) So, we can indeed juse the simpler "start" here, and it makes it easier to read, as we're assured that it didn't move in the interim. > + git_config_get_maybe_bool("index.skiphash", (int *)&f->skip_hash); Aside from the question of whether we use the repo_*() variant here, which I noted in my reply to the CL. The cast is suspicious. So, in the 1/4 we added this as *unsigned*: + * If set to 1, skip_hash indicates that we should + * not actually compute the hash for this hashfile and + * instead only use it as a buffered write. + */ + unsigned int skip_hash; But you need the cast here since the config API can and will set the "dest" to -1. See the "*dest == -1" test in git_configset_get_value(). So, here we're relying on a "unsigned int" cast'd to "int" correctly doing the right thing on a "-1" assignment. I'm not sure if that's portable or leads to undefined behavior, but in any case, won't such a -1 value be read back as ~0 from that "unsigned int" variable on most modern platforms? Just bypassing that entirely and making it "int" seems better here, or having an intermediate variable. I also wondered if this was all my fault, in your original version you were doing: int skip_hash; [...] if (!git_config_get_maybe_bool("index.skiphash", &skip_hash)) f->skip_hash = skip_hash; And I suggested that this was redundant, and that you could just write to "f->skip_hash" directly. But I didn't notice it was "unsigned", and in any case your original version had the same issue of assigning a -1 to the unsigned variable...
diff --git a/Documentation/config/index.txt b/Documentation/config/index.txt index 75f3a2d1054..23c7985eb40 100644 --- a/Documentation/config/index.txt +++ b/Documentation/config/index.txt @@ -30,3 +30,14 @@ index.version:: Specify the version with which new index files should be initialized. This does not affect existing repositories. If `feature.manyFiles` is enabled, then the default is 4. + +index.skipHash:: + When enabled, do not compute the trailing hash for the index file. + This accelerates Git commands that manipulate the index, such as + `git add`, `git commit`, or `git status`. Instead of storing the + checksum, write a trailing set of bytes with value zero, indicating + that the computation was skipped. ++ +If you enable `index.skipHash`, then Git clients older than 2.13.0 will +refuse to parse the index and Git clients older than 2.40.0 will report an +error during `git fsck`. diff --git a/read-cache.c b/read-cache.c index 46f5e497b14..3f7de8b2e20 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1817,6 +1817,8 @@ static int verify_hdr(const struct cache_header *hdr, unsigned long size) git_hash_ctx c; unsigned char hash[GIT_MAX_RAWSZ]; int hdr_version; + unsigned char *start, *end; + struct object_id oid; if (hdr->hdr_signature != htonl(CACHE_SIGNATURE)) return error(_("bad signature 0x%08x"), hdr->hdr_signature); @@ -1827,10 +1829,16 @@ static int verify_hdr(const struct cache_header *hdr, unsigned long size) if (!verify_index_checksum) return 0; + end = (unsigned char *)hdr + size; + start = end - the_hash_algo->rawsz; + oidread(&oid, start); + if (oideq(&oid, null_oid())) + return 0; + the_hash_algo->init_fn(&c); the_hash_algo->update_fn(&c, hdr, size - the_hash_algo->rawsz); the_hash_algo->final_fn(hash, &c); - if (!hasheq(hash, (unsigned char *)hdr + size - the_hash_algo->rawsz)) + if (!hasheq(hash, end - the_hash_algo->rawsz)) return error(_("bad index file sha1 signature")); return 0; } @@ -2918,6 +2926,8 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, f = hashfd(tempfile->fd, tempfile->filename.buf); + git_config_get_maybe_bool("index.skiphash", (int *)&f->skip_hash); + for (i = removed = extended = 0; i < entries; i++) { if (cache[i]->ce_flags & CE_REMOVE) removed++; diff --git a/t/t1600-index.sh b/t/t1600-index.sh index 010989f90e6..45feb0fc5d8 100755 --- a/t/t1600-index.sh +++ b/t/t1600-index.sh @@ -65,6 +65,12 @@ test_expect_success 'out of bounds index.version issues warning' ' ) ' +test_expect_success 'index.skipHash config option' ' + rm -f .git/index && + git -c index.skipHash=true add a && + git fsck +' + test_index_version () { INDEX_VERSION_CONFIG=$1 && FEATURE_MANY_FILES=$2 &&