[11/30] chunk-format: parse trailing table of contents

Message ID	ebc719f92dd99bb6f5ae92104e87a05e520664d2.1667846164.git.gitgitgadget@gmail.com (mailing list archive)
State	New, archived
Headers	Return-Path: <git-owner@kernel.org>
	Message-Id: <ebc719f92dd99bb6f5ae92104e87a05e520664d2.1667846164.git.gitgitgadget@gmail.com>
	In-Reply-To: <pull.1408.git.1667846164.gitgitgadget@gmail.com>
	References: <pull.1408.git.1667846164.gitgitgadget@gmail.com>
	Date: Mon, 07 Nov 2022 18:35:45 +0000
	Subject: [PATCH 11/30] chunk-format: parse trailing table of contents
	Fcc: Sent
	Content-Type: text/plain; charset=UTF-8
	Content-Transfer-Encoding: 8bit
	MIME-Version: 1.0
	To: git@vger.kernel.org
	Cc: jrnieder@gmail.com, Derrick Stolee <derrickstolee@github.com>, Derrick Stolee <derrickstolee@github.com>
	Precedence: bulk
	From: Derrick Stolee <derrickstolee@github.com>
Series	extensions.refFormat and packed-refs v2 file format
	[00/30,RFC] extensions.refFormat and packed-refs v2 file format
	[01/30] hashfile: allow skipping the hash function
	[02/30] read-cache: add index.computeHash config option
	[03/30] extensions: add refFormat extension
	[04/30] config: fix multi-level bulleted list
	[05/30] repository: wire ref extensions to ref backends
	[06/30] refs: allow loose files without packed-refs
	[07/30] chunk-format: number of chunks is optional
	[08/30] chunk-format: document trailing table of contents
	[09/30] chunk-format: store chunk offset during write
	[10/30] chunk-format: allow trailing table of contents
	[11/30] chunk-format: parse trailing table of contents
	[12/30] refs: extract packfile format to new file
	[13/30] packed-backend: extract add_write_error()
	[14/30] packed-backend: extract iterator/updates merge
	[15/30] packed-backend: create abstraction for writing refs
	[16/30] config: add config values for packed-refs v2
	[17/30] packed-backend: create shell of v2 writes
	[18/30] packed-refs: write file format version 2
	[19/30] packed-refs: read file format v2
	[20/30] packed-refs: read optional prefix chunks
	[21/30] packed-refs: write prefix chunks
	[22/30] packed-backend: create GIT_TEST_PACKED_REFS_VERSION
	[23/30] t1409: test with packed-refs v2
	[24/30] t5312: allow packed-refs v2 format
	[25/30] t5502: add PACKED_REFS_V1 prerequisite
	[26/30] t3210: require packed-refs v1 for some tests
	[27/30] t*: skip packed-refs v2 over http tests
	[28/30] ci: run GIT_TEST_PACKED_REFS_VERSION=2 in some builds
	[29/30] p1401: create performance test for ref operations
	[30/30] refs: skip hashing when writing packed-refs v2

Commit Message

Derrick Stolee Nov. 7, 2022, 6:35 p.m. UTC

From: Derrick Stolee <derrickstolee@github.com>

The new read_trailing_table_of_contents() mimics
read_table_of_contents() except that it reads the table of contents in
reverse from the end of the given hashfile. The file is given as a
memory-mapped region and its size. Compute the start of the trailing
hash from that size and read the table of contents in reverse from
that position.
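
As an illustration only, the reverse walk can be sketched outside of Git
as below. This is a simplified stand-in, not the code in this patch: the
names parse_trailing_toc(), build_sample(), struct chunk, HASH_SIZE and
MAX_CHUNKS are hypothetical, the hash is assumed to be 20 bytes, and a
fixed-size array replaces ALLOC_GROW(). Unlike the patch, the sketch also
refuses to step in front of the start of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOC_ENTRY_SIZE 12 /* 4-byte ID + 8-byte offset, big-endian */
#define HASH_SIZE 20      /* assumption: SHA-1-sized trailing hash */
#define MAX_CHUNKS 8      /* hypothetical fixed capacity */

struct chunk {
	uint32_t id;
	uint64_t offset;
	uint64_t size;
};

static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Write one big-endian table-of-contents entry at p. */
static void write_entry(unsigned char *p, uint32_t id, uint64_t offset)
{
	int i;

	for (i = 0; i < 4; i++)
		p[i] = (unsigned char)(id >> (24 - 8 * i));
	for (i = 0; i < 8; i++)
		p[4 + i] = (unsigned char)(offset >> (56 - 8 * i));
}

/* Returns the number of chunks found, or -1 on malformed input. */
static int parse_trailing_toc(const unsigned char *file, size_t size,
			      struct chunk *chunks, int max_chunks)
{
	size_t pos;
	int nr = 0;

	assert(file && chunks);
	if (size < HASH_SIZE)
		return -1;
	pos = size - HASH_SIZE; /* start of the trailing hash */

	for (;;) {
		uint32_t id;
		uint64_t offset;

		if (pos < TOC_ENTRY_SIZE)
			return -1; /* no terminator before start of file */
		pos -= TOC_ENTRY_SIZE;
		id = read_be32(file + pos);
		offset = read_be64(file + pos + 4);

		if (nr) {
			/* This entry's offset closes the previous chunk. */
			if (offset < chunks[nr - 1].offset || offset > pos)
				return -1;
			chunks[nr - 1].size = offset - chunks[nr - 1].offset;
		}
		if (!id)
			return nr; /* null chunk: only needed for last size */
		if (nr == max_chunks)
			return -1;
		chunks[nr].id = id;
		chunks[nr].offset = offset;
		chunks[nr].size = 0;
		nr++;
	}
}

/*
 * Build a 72-byte sample: chunk A at [0, 10), chunk B at [10, 16), then
 * the terminator entry, the entry for B, the entry for A, and a zeroed
 * 20-byte stand-in for the hash.
 */
static size_t build_sample(unsigned char *buf)
{
	memset(buf, 0, 72);
	write_entry(buf + 16, 0, 16);          /* terminator: end of B */
	write_entry(buf + 28, 0x42424242, 10); /* "BBBB" */
	write_entry(buf + 40, 0x41414141, 0);  /* "AAAA" */
	return 72;
}
```

In this sketch the entry closest to the hash describes the first chunk,
and the null-ID entry is the one farthest from the hash, so the sizes
fall out of consecutive offsets exactly as in the forward reader.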

The error conditions mirror those in read_table_of_contents(). The one
exception is that chunk_offset cannot be checked against the start of
the table of contents, since the length of the table is not known up
front. That may have some surprising results for some narrow forms of
corruption. However, each computed chunk size is still bounded: the
closing offset may not exceed the position of the table-of-contents
entry being read, which excludes the trailing hash and the entries read
so far. At minimum, the given sizes can be used to limit parsing within
the file itself.
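
To make the gap concrete, here is a small sketch of the range check in
isolation. The helper offset_in_range() and the 100-byte layout in
demo_offset_checks() are hypothetical and exist only for illustration;
the layout assumes a 20-byte hash at bytes 80..99 and three 12-byte
entries at bytes 68, 56 and 44.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the range check described above: an
 * offset read from the entry at byte position entry_pos must not go
 * backwards and must not point past that entry.
 */
static int offset_in_range(uint64_t previous_offset, uint64_t offset,
			   size_t entry_pos)
{
	return offset >= previous_offset && offset <= entry_pos;
}

/*
 * Assumed layout: table of contents at bytes 44..79, hash at 80..99.
 * The checks below are for the entry read at byte position 56.
 */
static void demo_offset_checks(void)
{
	/* A sane offset inside the chunk data is accepted. */
	assert(offset_in_range(0, 40, 56));
	/* An offset past the entry being read is rejected. */
	assert(!offset_in_range(0, 60, 56));
	/* An offset that goes backwards is rejected. */
	assert(!offset_in_range(30, 20, 56));
	/*
	 * Offset 50 lies inside the table of contents (which starts at
	 * byte 44), yet it is accepted because the start of the table
	 * is not known while reading the entry at byte 56.
	 */
	assert(offset_in_range(0, 50, 56));
}
```

The last case is the kind of narrow corruption that slips through: the
offset stays inside the file, but points into the not-yet-read part of
the table of contents.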

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 chunk-format.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++
 chunk-format.h |  9 +++++++++
 2 files changed, 62 insertions(+)

diff --git a/chunk-format.c b/chunk-format.c
index 3f5cc9b5ddf..e836a121c5c 100644
--- a/chunk-format.c
+++ b/chunk-format.c
@@ -173,6 +173,59 @@  int read_table_of_contents(struct chunkfile *cf,
 	return 0;
 }
 
+int read_trailing_table_of_contents(struct chunkfile *cf,
+				    const unsigned char *mfile,
+				    size_t mfile_size)
+{
+	int i;
+	uint32_t chunk_id;
+	const unsigned char *table_of_contents = mfile + mfile_size - the_hash_algo->rawsz;
+
+	while (1) {
+		uint64_t chunk_offset;
+
+		table_of_contents -= CHUNK_TOC_ENTRY_SIZE;
+
+		chunk_id = get_be32(table_of_contents);
+		chunk_offset = get_be64(table_of_contents + 4);
+
+		/* Calculate the previous chunk size, if it exists. */
+		if (cf->chunks_nr) {
+			off_t previous_offset = cf->chunks[cf->chunks_nr - 1].offset;
+
+			if (chunk_offset < previous_offset ||
+			    chunk_offset > table_of_contents - mfile) {
+				error(_("improper chunk offset(s) %"PRIx64" and %"PRIx64""),
+				      (uint64_t)previous_offset, chunk_offset);
+				return -1;
+			}
+
+			cf->chunks[cf->chunks_nr - 1].size = chunk_offset - previous_offset;
+		}
+
+		/* Stop at the null chunk. We only need it for the last size. */
+		if (!chunk_id)
+			break;
+
+		for (i = 0; i < cf->chunks_nr; i++) {
+			if (cf->chunks[i].id == chunk_id) {
+				error(_("duplicate chunk ID %"PRIx32" found"),
+					chunk_id);
+				return -1;
+			}
+		}
+
+		ALLOC_GROW(cf->chunks, cf->chunks_nr + 1, cf->chunks_alloc);
+
+		cf->chunks[cf->chunks_nr].id = chunk_id;
+		cf->chunks[cf->chunks_nr].start = mfile + chunk_offset;
+		cf->chunks[cf->chunks_nr].offset = chunk_offset;
+		cf->chunks_nr++;
+	}
+
+	return 0;
+}
+
 static int pair_chunk_fn(const unsigned char *chunk_start,
 			 size_t chunk_size,
 			 void *data)
diff --git a/chunk-format.h b/chunk-format.h
index 39e8967e950..acb8dfbce80 100644
--- a/chunk-format.h
+++ b/chunk-format.h
@@ -46,6 +46,15 @@  int read_table_of_contents(struct chunkfile *cf,
 			   uint64_t toc_offset,
 			   int toc_length);
 
+/**
+ * Read the given chunkfile, but read the table of contents from the
+ * end of the given mfile. The file is expected to be a hashfile with
+ * the_hash_algo->rawsz bytes at the end storing the hash.
+ */
+int read_trailing_table_of_contents(struct chunkfile *cf,
+				    const unsigned char *mfile,
+				    size_t mfile_size);
+
 #define CHUNK_NOT_FOUND (-2)
 
 /*