diff mbox series

[v6,3/7] eoie: add End of Index Entry (EOIE) extension

Message ID 20180926195442.1380-4-benpeart@microsoft.com (mailing list archive)
State New, archived
Headers show
Series speed up index load through parallelization | expand

Commit Message

Ben Peart Sept. 26, 2018, 7:54 p.m. UTC
The End of Index Entry (EOIE) is used to locate the end of the variable
length index entries and the beginning of the extensions. Code can take
advantage of this to quickly locate the index extensions without having
to parse through all of the index entries.

Because it must be able to be loaded before the variable length cache
entries and other index extensions, this extension must be written last.
The signature for this extension is { 'E', 'O', 'I', 'E' }.

The extension consists of:

- 32-bit offset to the end of the index entries

- 160-bit SHA-1 over the extension types and their sizes (but not
their contents).  E.g. if we have "TREE" extension that is N-bytes
long, "REUC" extension that is M-bytes long, followed by "EOIE",
then the hash would be:

SHA-1("TREE" + <binary representation of N> +
	"REUC" + <binary representation of M>)

Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
---
 Documentation/technical/index-format.txt |  23 ++++
 read-cache.c                             | 151 +++++++++++++++++++++--
 t/README                                 |   5 +
 t/t1700-split-index.sh                   |   1 +
 4 files changed, 172 insertions(+), 8 deletions(-)

Comments

SZEDER Gábor Sept. 28, 2018, 12:19 a.m. UTC | #1
On Wed, Sep 26, 2018 at 03:54:38PM -0400, Ben Peart wrote:
> The End of Index Entry (EOIE) is used to locate the end of the variable

Nit: perhaps start with: 

  The End of Index Entry (EOIE) optional extension can be used to ...

to make it clearer for those who don't immediately realize the
significance of the upper case 'E' in the extension's signature.

> length index entries and the beginning of the extensions. Code can take
> advantage of this to quickly locate the index extensions without having
> to parse through all of the index entries.
> 
> Because it must be able to be loaded before the variable length cache
> entries and other index extensions, this extension must be written last.
> The signature for this extension is { 'E', 'O', 'I', 'E' }.
> 
> The extension consists of:
> 
> - 32-bit offset to the end of the index entries
> 
> - 160-bit SHA-1 over the extension types and their sizes (but not
> their contents).  E.g. if we have "TREE" extension that is N-bytes
> long, "REUC" extension that is M-bytes long, followed by "EOIE",
> then the hash would be:
> 
> SHA-1("TREE" + <binary representation of N> +
> 	"REUC" + <binary representation of M>)
> 
> Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
> ---
>  Documentation/technical/index-format.txt |  23 ++++
>  read-cache.c                             | 151 +++++++++++++++++++++--
>  t/README                                 |   5 +
>  t/t1700-split-index.sh                   |   1 +
>  4 files changed, 172 insertions(+), 8 deletions(-)
> 

> diff --git a/t/README b/t/README
> index 3ea6c85460..aa33ac4f26 100644
> --- a/t/README
> +++ b/t/README
> @@ -327,6 +327,11 @@ GIT_TEST_COMMIT_GRAPH=<boolean>, when true, forces the commit-graph to
>  be written after every 'git commit' command, and overrides the
>  'core.commitGraph' setting to true.
>  
> +GIT_TEST_DISABLE_EOIE=<boolean> disables writing the EOIE extension.
> +This is used to allow tests 1, 4-9 in t1700-split-index.sh to succeed
> +as they currently hard code SHA values for the index which are no longer
> +valid due to the addition of the EOIE extension.

Is this extension enabled by default?  The commit message doesn't
explicitly say so, but I don't see any way to turn it on or off, while
there is this new GIT_TEST environment variable to disable it for one
particular test, so it seems so.  If that's indeed the case, then
wouldn't it be better to update those hard-coded SHA1 values in t1700
instead?

>  Naming Tests
>  ------------
>  
> diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
> index be22398a85..1f168378c8 100755
> --- a/t/t1700-split-index.sh
> +++ b/t/t1700-split-index.sh
> @@ -7,6 +7,7 @@ test_description='split index mode tests'
>  # We need total control of index splitting here
>  sane_unset GIT_TEST_SPLIT_INDEX
>  sane_unset GIT_FSMONITOR_TEST
> +GIT_TEST_DISABLE_EOIE=true; export GIT_TEST_DISABLE_EOIE
>  
>  test_expect_success 'enable split index' '
>  	git config splitIndex.maxPercentChange 100 &&
> -- 
> 2.18.0.windows.1
>
Ben Peart Sept. 28, 2018, 6:38 p.m. UTC | #2
On 9/27/2018 8:19 PM, SZEDER Gábor wrote:
> On Wed, Sep 26, 2018 at 03:54:38PM -0400, Ben Peart wrote:
>> The End of Index Entry (EOIE) is used to locate the end of the variable
> 
> Nit: perhaps start with:
> 
>    The End of Index Entry (EOIE) optional extension can be used to ...
> 
> to make it clearer for those who don't immediately realize the
> significance of the upper case 'E' in the extension's signature.
> 
>> length index entries and the beginning of the extensions. Code can take
>> advantage of this to quickly locate the index extensions without having
>> to parse through all of the index entries.
>>
>> Because it must be able to be loaded before the variable length cache
>> entries and other index extensions, this extension must be written last.
>> The signature for this extension is { 'E', 'O', 'I', 'E' }.
>>
>> The extension consists of:
>>
>> - 32-bit offset to the end of the index entries
>>
>> - 160-bit SHA-1 over the extension types and their sizes (but not
>> their contents).  E.g. if we have "TREE" extension that is N-bytes
>> long, "REUC" extension that is M-bytes long, followed by "EOIE",
>> then the hash would be:
>>
>> SHA-1("TREE" + <binary representation of N> +
>> 	"REUC" + <binary representation of M>)
>>
>> Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
>> ---
>>   Documentation/technical/index-format.txt |  23 ++++
>>   read-cache.c                             | 151 +++++++++++++++++++++--
>>   t/README                                 |   5 +
>>   t/t1700-split-index.sh                   |   1 +
>>   4 files changed, 172 insertions(+), 8 deletions(-)
>>
> 
>> diff --git a/t/README b/t/README
>> index 3ea6c85460..aa33ac4f26 100644
>> --- a/t/README
>> +++ b/t/README
>> @@ -327,6 +327,11 @@ GIT_TEST_COMMIT_GRAPH=<boolean>, when true, forces the commit-graph to
>>   be written after every 'git commit' command, and overrides the
>>   'core.commitGraph' setting to true.
>>   
>> +GIT_TEST_DISABLE_EOIE=<boolean> disables writing the EOIE extension.
>> +This is used to allow tests 1, 4-9 in t1700-split-index.sh to succeed
>> +as they currently hard code SHA values for the index which are no longer
>> +valid due to the addition of the EOIE extension.
> 
> Is this extension enabled by default?  The commit message doesn't
> explicitly say so, but I don't see any way to turn it on or off, while
> there is this new GIT_TEST environment variable to disable it for one
> particular test, so it seems so.  If that's indeed the case, then
> wouldn't it be better to update those hard-coded SHA1 values in t1700
> instead?
> 

Yes, it is enabled by default and the only way to disable it is the 
GIT_TEST_DISABLE_EOIE environment variable.

The tests in t1700-split-index.sh assume that there are no extensions in 
the index file so anything that adds an extension, will break one or 
more of the tests.

First in 'enable split index', they hard code SHA values assuming there 
are no extensions. If some option adds an extension, these hard coded 
values no longer match and the test fails.

Later in 'disable split index' they save off the SHA of the index with 
split-index turned off and then in later tests, compare it to the SHA of 
the shared index.  Because extensions are stripped when the shared index 
is written out this only works if there were not extensions in the 
original index.

I'll document this behavior and reasoning in the test directly.

This did cause me to reexamine how EOIE and IEOT behave when split index 
is turned on.  These two extensions help most with a large index.  When 
split index is turned on, the large index is actually the shared index 
as the index is now the smaller set of deltas.

Currently, the extensions are stripped out of the shared index which 
means they are not available when they are needed to quickly load the 
shared index.  I'll see if I can update the patch so that these 
extensions are still written out and available in the shared index to 
speed up when it is loaded.

Thanks!

>>   Naming Tests
>>   ------------
>>   
>> diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
>> index be22398a85..1f168378c8 100755
>> --- a/t/t1700-split-index.sh
>> +++ b/t/t1700-split-index.sh
>> @@ -7,6 +7,7 @@ test_description='split index mode tests'
>>   # We need total control of index splitting here
>>   sane_unset GIT_TEST_SPLIT_INDEX
>>   sane_unset GIT_FSMONITOR_TEST
>> +GIT_TEST_DISABLE_EOIE=true; export GIT_TEST_DISABLE_EOIE
>>   
>>   test_expect_success 'enable split index' '
>>   	git config splitIndex.maxPercentChange 100 &&
>> -- 
>> 2.18.0.windows.1
>>
SZEDER Gábor Sept. 29, 2018, 12:51 a.m. UTC | #3
On Wed, Sep 26, 2018 at 03:54:38PM -0400, Ben Peart wrote:
> diff --git a/read-cache.c b/read-cache.c
> index 6ba99e2c96..80255d3088 100644
> --- a/read-cache.c
> +++ b/read-cache.c

> +static size_t read_eoie_extension(const char *mmap, size_t mmap_size)
> +{

<....>

> +	the_hash_algo->final_fn(hash, &c);
> +	if (hashcmp(hash, (const unsigned char *)index))
> +		return 0;

Please use !hasheq() instead of hashcmp().
Duy Nguyen Sept. 29, 2018, 5:45 a.m. UTC | #4
On Wed, Sep 26, 2018 at 03:54:38PM -0400, Ben Peart wrote:
> +
> +#define EOIE_SIZE (4 + GIT_SHA1_RAWSZ) /* <4-byte offset> + <20-byte hash> */
> +#define EOIE_SIZE_WITH_HEADER (4 + 4 + EOIE_SIZE) /* <4-byte signature> + <4-byte length> + EOIE_SIZE */

If you make these variables instead of macros, you can use
the_hash_algo, which makes this code sha256-friendlier and probably
can explain less, e.g. ...

> +
> +static size_t read_eoie_extension(const char *mmap, size_t mmap_size)
> +{
> +	/*
> +	 * The end of index entries (EOIE) extension is guaranteed to be last
> +	 * so that it can be found by scanning backwards from the EOF.
> +	 *
> +	 * "EOIE"
> +	 * <4-byte length>
> +	 * <4-byte offset>
> +	 * <20-byte hash>
> +	 */

	uint32_t EOIE_SIZE = 4 + the_hash_algo->rawsz;
	uint32_t EOIE_SIZE_WITH_HEADER = 4 + 4 + EOIE_SIZE;

> +	const char *index, *eoie;
> +	uint32_t extsize;
> +	size_t offset, src_offset;
> +	unsigned char hash[GIT_MAX_RAWSZ];
> +	git_hash_ctx c;
--
Duy
Junio C Hamano Sept. 29, 2018, 6:24 p.m. UTC | #5
Duy Nguyen <pclouds@gmail.com> writes:

> On Wed, Sep 26, 2018 at 03:54:38PM -0400, Ben Peart wrote:
>> +
>> +#define EOIE_SIZE (4 + GIT_SHA1_RAWSZ) /* <4-byte offset> + <20-byte hash> */
>> +#define EOIE_SIZE_WITH_HEADER (4 + 4 + EOIE_SIZE) /* <4-byte signature> + <4-byte length> + EOIE_SIZE */
>
> If you make these variables instead of macros, you can use
> the_hash_algo, which makes this code sha256-friendlier and probably
> can explain less, e.g. ...
>
>> +
>> +static size_t read_eoie_extension(const char *mmap, size_t mmap_size)
>> +{
>> +	/*
>> +	 * The end of index entries (EOIE) extension is guaranteed to be last
>> +	 * so that it can be found by scanning backwards from the EOF.
>> +	 *
>> +	 * "EOIE"
>> +	 * <4-byte length>
>> +	 * <4-byte offset>
>> +	 * <20-byte hash>

20? ;-)

>> +	 */
>
> 	uint32_t EOIE_SIZE = 4 + the_hash_algo->rawsz;
> 	uint32_t EOIE_SIZE_WITH_HEADER = 4 + 4 + EOIE_SIZE;
>
>> +	const char *index, *eoie;
>> +	uint32_t extsize;
>> +	size_t offset, src_offset;
>> +	unsigned char hash[GIT_MAX_RAWSZ];
>> +	git_hash_ctx c;
> --
> Duy
diff mbox series

Patch

diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index db3572626b..6bc2d90f7f 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -314,3 +314,26 @@  The remaining data of each directory block is grouped by type:
 
   - An ewah bitmap, the n-th bit indicates whether the n-th index entry
     is not CE_FSMONITOR_VALID.
+
+== End of Index Entry
+
+  The End of Index Entry (EOIE) is used to locate the end of the variable
+  length index entries and the begining of the extensions. Code can take
+  advantage of this to quickly locate the index extensions without having
+  to parse through all of the index entries.
+
+  Because it must be able to be loaded before the variable length cache
+  entries and other index extensions, this extension must be written last.
+  The signature for this extension is { 'E', 'O', 'I', 'E' }.
+
+  The extension consists of:
+
+  - 32-bit offset to the end of the index entries
+
+  - 160-bit SHA-1 over the extension types and their sizes (but not
+	their contents).  E.g. if we have "TREE" extension that is N-bytes
+	long, "REUC" extension that is M-bytes long, followed by "EOIE",
+	then the hash would be:
+
+	SHA-1("TREE" + <binary representation of N> +
+		"REUC" + <binary representation of M>)
diff --git a/read-cache.c b/read-cache.c
index 6ba99e2c96..80255d3088 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -43,6 +43,7 @@ 
 #define CACHE_EXT_LINK 0x6c696e6b	  /* "link" */
 #define CACHE_EXT_UNTRACKED 0x554E5452	  /* "UNTR" */
 #define CACHE_EXT_FSMONITOR 0x46534D4E	  /* "FSMN" */
+#define CACHE_EXT_ENDOFINDEXENTRIES 0x454F4945	/* "EOIE" */
 
 /* changes that can be kept in $GIT_DIR/index (basically all extensions) */
 #define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \
@@ -1693,6 +1694,9 @@  static int read_index_extension(struct index_state *istate,
 	case CACHE_EXT_FSMONITOR:
 		read_fsmonitor_extension(istate, data, sz);
 		break;
+	case CACHE_EXT_ENDOFINDEXENTRIES:
+		/* already handled in do_read_index() */
+		break;
 	default:
 		if (*ext < 'A' || 'Z' < *ext)
 			return error("index uses %.4s extension, which we do not understand",
@@ -1883,6 +1887,9 @@  static size_t estimate_cache_size(size_t ondisk_size, unsigned int entries)
 	return ondisk_size + entries * per_entry;
 }
 
+static size_t read_eoie_extension(const char *mmap, size_t mmap_size);
+static void write_eoie_extension(struct strbuf *sb, git_hash_ctx *eoie_context, size_t offset);
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
@@ -2190,11 +2197,15 @@  static int ce_write(git_hash_ctx *context, int fd, void *data, unsigned int len)
 	return 0;
 }
 
-static int write_index_ext_header(git_hash_ctx *context, int fd,
-				  unsigned int ext, unsigned int sz)
+static int write_index_ext_header(git_hash_ctx *context, git_hash_ctx *eoie_context,
+				  int fd, unsigned int ext, unsigned int sz)
 {
 	ext = htonl(ext);
 	sz = htonl(sz);
+	if (eoie_context) {
+		the_hash_algo->update_fn(eoie_context, &ext, 4);
+		the_hash_algo->update_fn(eoie_context, &sz, 4);
+	}
 	return ((ce_write(context, fd, &ext, 4) < 0) ||
 		(ce_write(context, fd, &sz, 4) < 0)) ? -1 : 0;
 }
@@ -2437,7 +2448,7 @@  static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 {
 	uint64_t start = getnanotime();
 	int newfd = tempfile->fd;
-	git_hash_ctx c;
+	git_hash_ctx c, eoie_c;
 	struct cache_header hdr;
 	int i, err = 0, removed, extended, hdr_version;
 	struct cache_entry **cache = istate->cache;
@@ -2446,6 +2457,7 @@  static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 	struct ondisk_cache_entry_extended ondisk;
 	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
 	int drop_cache_tree = istate->drop_cache_tree;
+	off_t offset;
 
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
@@ -2479,6 +2491,7 @@  static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 	if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0)
 		return -1;
 
+	offset = lseek(newfd, 0, SEEK_CUR) + write_buffer_len;
 	previous_name = (hdr_version == 4) ? &previous_name_buf : NULL;
 
 	for (i = 0; i < entries; i++) {
@@ -2512,11 +2525,14 @@  static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		return err;
 
 	/* Write extension data here */
+	offset = lseek(newfd, 0, SEEK_CUR) + write_buffer_len;
+	the_hash_algo->init_fn(&eoie_c);
+
 	if (!strip_extensions && istate->split_index) {
 		struct strbuf sb = STRBUF_INIT;
 
 		err = write_link_extension(&sb, istate) < 0 ||
-			write_index_ext_header(&c, newfd, CACHE_EXT_LINK,
+			write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_LINK,
 					       sb.len) < 0 ||
 			ce_write(&c, newfd, sb.buf, sb.len) < 0;
 		strbuf_release(&sb);
@@ -2527,7 +2543,7 @@  static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		struct strbuf sb = STRBUF_INIT;
 
 		cache_tree_write(&sb, istate->cache_tree);
-		err = write_index_ext_header(&c, newfd, CACHE_EXT_TREE, sb.len) < 0
+		err = write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_TREE, sb.len) < 0
 			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
 		strbuf_release(&sb);
 		if (err)
@@ -2537,7 +2553,7 @@  static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		struct strbuf sb = STRBUF_INIT;
 
 		resolve_undo_write(&sb, istate->resolve_undo);
-		err = write_index_ext_header(&c, newfd, CACHE_EXT_RESOLVE_UNDO,
+		err = write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_RESOLVE_UNDO,
 					     sb.len) < 0
 			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
 		strbuf_release(&sb);
@@ -2548,7 +2564,7 @@  static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		struct strbuf sb = STRBUF_INIT;
 
 		write_untracked_extension(&sb, istate->untracked);
-		err = write_index_ext_header(&c, newfd, CACHE_EXT_UNTRACKED,
+		err = write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_UNTRACKED,
 					     sb.len) < 0 ||
 			ce_write(&c, newfd, sb.buf, sb.len) < 0;
 		strbuf_release(&sb);
@@ -2559,7 +2575,23 @@  static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		struct strbuf sb = STRBUF_INIT;
 
 		write_fsmonitor_extension(&sb, istate);
-		err = write_index_ext_header(&c, newfd, CACHE_EXT_FSMONITOR, sb.len) < 0
+		err = write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_FSMONITOR, sb.len) < 0
+			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
+		strbuf_release(&sb);
+		if (err)
+			return -1;
+	}
+
+	/*
+	 * CACHE_EXT_ENDOFINDEXENTRIES must be written as the last entry before the SHA1
+	 * so that it can be found and processed before all the index entries are
+	 * read.
+	 */
+	if (!strip_extensions && offset && !git_env_bool("GIT_TEST_DISABLE_EOIE", 0)) {
+		struct strbuf sb = STRBUF_INIT;
+
+		write_eoie_extension(&sb, &eoie_c, offset);
+		err = write_index_ext_header(&c, NULL, newfd, CACHE_EXT_ENDOFINDEXENTRIES, sb.len) < 0
 			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
 		strbuf_release(&sb);
 		if (err)
@@ -2975,3 +3007,106 @@  int should_validate_cache_entries(void)
 
 	return validate_index_cache_entries;
 }
+
+#define EOIE_SIZE (4 + GIT_SHA1_RAWSZ) /* <4-byte offset> + <20-byte hash> */
+#define EOIE_SIZE_WITH_HEADER (4 + 4 + EOIE_SIZE) /* <4-byte signature> + <4-byte length> + EOIE_SIZE */
+
+static size_t read_eoie_extension(const char *mmap, size_t mmap_size)
+{
+	/*
+	 * The end of index entries (EOIE) extension is guaranteed to be last
+	 * so that it can be found by scanning backwards from the EOF.
+	 *
+	 * "EOIE"
+	 * <4-byte length>
+	 * <4-byte offset>
+	 * <20-byte hash>
+	 */
+	const char *index, *eoie;
+	uint32_t extsize;
+	size_t offset, src_offset;
+	unsigned char hash[GIT_MAX_RAWSZ];
+	git_hash_ctx c;
+
+	/* ensure we have an index big enough to contain an EOIE extension */
+	if (mmap_size < sizeof(struct cache_header) + EOIE_SIZE_WITH_HEADER + the_hash_algo->rawsz)
+		return 0;
+
+	/* validate the extension signature */
+	index = eoie = mmap + mmap_size - EOIE_SIZE_WITH_HEADER - the_hash_algo->rawsz;
+	if (CACHE_EXT(index) != CACHE_EXT_ENDOFINDEXENTRIES)
+		return 0;
+	index += sizeof(uint32_t);
+
+	/* validate the extension size */
+	extsize = get_be32(index);
+	if (extsize != EOIE_SIZE)
+		return 0;
+	index += sizeof(uint32_t);
+
+	/*
+	 * Validate the offset we're going to look for the first extension
+	 * signature is after the index header and before the eoie extension.
+	 */
+	offset = get_be32(index);
+	if (mmap + offset < mmap + sizeof(struct cache_header))
+		return 0;
+	if (mmap + offset >= eoie)
+		return 0;
+	index += sizeof(uint32_t);
+
+	/*
+	 * The hash is computed over extension types and their sizes (but not
+	 * their contents).  E.g. if we have "TREE" extension that is N-bytes
+	 * long, "REUC" extension that is M-bytes long, followed by "EOIE",
+	 * then the hash would be:
+	 *
+	 * SHA-1("TREE" + <binary representation of N> +
+	 *	 "REUC" + <binary representation of M>)
+	 */
+	src_offset = offset;
+	the_hash_algo->init_fn(&c);
+	while (src_offset < mmap_size - the_hash_algo->rawsz - EOIE_SIZE_WITH_HEADER) {
+		/* After an array of active_nr index entries,
+		 * there can be arbitrary number of extended
+		 * sections, each of which is prefixed with
+		 * extension name (4-byte) and section length
+		 * in 4-byte network byte order.
+		 */
+		uint32_t extsize;
+		memcpy(&extsize, mmap + src_offset + 4, 4);
+		extsize = ntohl(extsize);
+
+		/* verify the extension size isn't so large it will wrap around */
+		if (src_offset + 8 + extsize < src_offset)
+			return 0;
+
+		the_hash_algo->update_fn(&c, mmap + src_offset, 8);
+
+		src_offset += 8;
+		src_offset += extsize;
+	}
+	the_hash_algo->final_fn(hash, &c);
+	if (hashcmp(hash, (const unsigned char *)index))
+		return 0;
+
+	/* Validate that the extension offsets returned us back to the eoie extension. */
+	if (src_offset != mmap_size - the_hash_algo->rawsz - EOIE_SIZE_WITH_HEADER)
+		return 0;
+
+	return offset;
+}
+
+static void write_eoie_extension(struct strbuf *sb, git_hash_ctx *eoie_context, size_t offset)
+{
+	uint32_t buffer;
+	unsigned char hash[GIT_MAX_RAWSZ];
+
+	/* offset */
+	put_be32(&buffer, offset);
+	strbuf_add(sb, &buffer, sizeof(uint32_t));
+
+	/* hash */
+	the_hash_algo->final_fn(hash, eoie_context);
+	strbuf_add(sb, hash, the_hash_algo->rawsz);
+}
diff --git a/t/README b/t/README
index 3ea6c85460..aa33ac4f26 100644
--- a/t/README
+++ b/t/README
@@ -327,6 +327,11 @@  GIT_TEST_COMMIT_GRAPH=<boolean>, when true, forces the commit-graph to
 be written after every 'git commit' command, and overrides the
 'core.commitGraph' setting to true.
 
+GIT_TEST_DISABLE_EOIE=<boolean> disables writing the EOIE extension.
+This is used to allow tests 1, 4-9 in t1700-split-index.sh to succeed
+as they currently hard code SHA values for the index which are no longer
+valid due to the addition of the EOIE extension.
+
 Naming Tests
 ------------
 
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index be22398a85..1f168378c8 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -7,6 +7,7 @@  test_description='split index mode tests'
 # We need total control of index splitting here
 sane_unset GIT_TEST_SPLIT_INDEX
 sane_unset GIT_FSMONITOR_TEST
+GIT_TEST_DISABLE_EOIE=true; export GIT_TEST_DISABLE_EOIE
 
 test_expect_success 'enable split index' '
 	git config splitIndex.maxPercentChange 100 &&