From patchwork Thu Feb 18 14:07:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12093667 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 218E6C433E0 for ; Thu, 18 Feb 2021 16:48:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BE1B76146D for ; Thu, 18 Feb 2021 16:48:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233395AbhBRQrY (ORCPT ); Thu, 18 Feb 2021 11:47:24 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41420 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233530AbhBROI3 (ORCPT ); Thu, 18 Feb 2021 09:08:29 -0500 Received: from mail-wr1-x433.google.com (mail-wr1-x433.google.com [IPv6:2a00:1450:4864:20::433]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 10DF2C061786 for ; Thu, 18 Feb 2021 06:07:44 -0800 (PST) Received: by mail-wr1-x433.google.com with SMTP id l12so3146665wry.2 for ; Thu, 18 Feb 2021 06:07:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=5AF9N0vk+cHk5hTxJy9cJp3P1lCV7muF3iRhjAsxxzE=; b=aTaoCZMBr/g5yfFWKjXObLP02VejX5q/uwJSeGnd1wBAMlbzyDsGKmNOst0I0vME3S YZEHGhRuDw0SqoYRr7+4YWndT/IYdu6yegyA2TFw0jwFSy3vehxOnDXU0AsqvMg0vgpm 3KYsj6KtP4baGIA+ssLjgwL1Q1xOLs1YPllnWm9Aoc1rfbBTPmZVYLdsFYtz4b+sVga1 MQPyrYR5+grfpgsg0leEFbCG35pDTV6c7uJ1PE34u6Xq2NxLHF7Gs7+dIiAGWvFjXiJA l9JSaI+EygRaioGaNk3uqFmuijUWf1q6Zu1tzMICcIBth6XIPwqGiZ4sVm4bDe+tsePd /x+Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=5AF9N0vk+cHk5hTxJy9cJp3P1lCV7muF3iRhjAsxxzE=; b=IYPu8jFpnIHf0JL/aizEFnLWyWiuEi5mvfSgY/4nRAaFlh0KnD1O9nQ/DmmccyvvIV mmlpr7Jarewe7+dTaj+4JzZmLUQnn4F/Wwp9YZuL9Srcfr7s9cPMdJOsx3CgeFH9tllk qGN0ibm7xH9mpOdeAy0PlH981QV9hICILI0p69dcV7KiBl/IuHuInSq4devEy76cH3RZ SkAJqlRjuz/hTRJD57xCGEkbFoDUmz6E/RNbBsZ6+t8O0QsZMxeoAfVSh+gv3tNHojFq 0ZHEI8B1etjap8zSpew/NhtL3StKjBXSSrI6DFgcDAoNDACt561iCTtah6DPaKE/RYca LELQ== X-Gm-Message-State: AOAM533X+847K4RnuunEDhhnAfuq8vSNYo7rLmq5e20fG7szCamd/s7B S/64beAANJ8J5Wn+AW17e8jmwdUL3MM= X-Google-Smtp-Source: ABdhPJxqzDn7PlBaI3zjDCYm33GoEHDnGPuP8X3J69oWVbq143tFABkzMB2HXY6xhOJOPm6zLdukKg== X-Received: by 2002:a5d:448c:: with SMTP id j12mr4674484wrq.63.1613657262696; Thu, 18 Feb 2021 06:07:42 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id t2sm8291419wru.53.2021.02.18.06.07.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Feb 2021 06:07:42 -0800 (PST) Message-Id: <243dcec9436853ff8d1bf2580e76ab909b7cb324.1613657259.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 18 Feb 2021 14:07:23 +0000 Subject: [PATCH v4 01/17] commit-graph: anonymize data in chunk_write_fn Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee In preparation for creating an API around file formats using chunks and tables of contents, prepare the commit-graph write code to use prototypes that will match this new API. Specifically, convert chunk_write_fn to take a "void *data" parameter instead of the commit-graph-specific "struct write_commit_graph_context" pointer. Signed-off-by: Derrick Stolee --- commit-graph.c | 29 +++++++++++++++++++---------- 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index f3bde2ad95a1..fae7d1b63931 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1040,8 +1040,9 @@ struct write_commit_graph_context { }; static int write_graph_chunk_fanout(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; int i, count = 0; struct commit **list = ctx->commits.list; @@ -1066,8 +1067,9 @@ static int write_graph_chunk_fanout(struct hashfile *f, } static int write_graph_chunk_oids(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; struct commit **list = ctx->commits.list; int count; for (count = 0; count < ctx->commits.nr; count++, list++) { @@ -1085,8 +1087,9 @@ static const unsigned char *commit_to_sha1(size_t index, void *table) } static int write_graph_chunk_data(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; struct commit **list = ctx->commits.list; struct commit **last = ctx->commits.list + ctx->commits.nr; uint32_t num_extra_edges = 0; @@ -1187,8 +1190,9 @@ static int write_graph_chunk_data(struct hashfile *f, } static int write_graph_chunk_generation_data(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; int i, num_generation_data_overflows = 0; for (i = 0; i < ctx->commits.nr; i++) { @@ -1208,8 +1212,9 @@ static int write_graph_chunk_generation_data(struct hashfile *f, } static int write_graph_chunk_generation_data_overflow(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; int i; for (i = 0; i < ctx->commits.nr; i++) { struct commit *c = ctx->commits.list[i]; @@ -1226,8 +1231,9 @@ static int write_graph_chunk_generation_data_overflow(struct hashfile *f, } static int write_graph_chunk_extra_edges(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; struct commit **list = ctx->commits.list; struct commit **last = ctx->commits.list + ctx->commits.nr; struct commit_list *parent; @@ -1280,8 +1286,9 @@ static int write_graph_chunk_extra_edges(struct hashfile *f, } static int write_graph_chunk_bloom_indexes(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; struct commit **list = ctx->commits.list; struct commit **last = ctx->commits.list + ctx->commits.nr; uint32_t cur_pos = 0; @@ -1315,8 +1322,9 @@ static void trace2_bloom_filter_settings(struct write_commit_graph_context *ctx) } static int write_graph_chunk_bloom_data(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; struct commit **list = ctx->commits.list; struct commit **last = ctx->commits.list + ctx->commits.nr; @@ -1737,8 +1745,9 @@ static int write_graph_chunk_base_1(struct hashfile *f, } static int write_graph_chunk_base(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; int num = write_graph_chunk_base_1(f, ctx->new_base_graph); if (num != ctx->num_commit_graphs_after - 1) { @@ -1750,7 +1759,7 @@ static int write_graph_chunk_base(struct hashfile *f, } typedef int (*chunk_write_fn)(struct hashfile *f, - struct write_commit_graph_context *ctx); + void *data); struct chunk_info { uint32_t id; From patchwork Thu Feb 18 14:07:24 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12093665 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 88A0EC433DB for ; Thu, 18 Feb 2021 16:46:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4748B6146D for ; Thu, 18 Feb 2021 16:46:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232929AbhBRQq1 (ORCPT ); Thu, 18 Feb 2021 11:46:27 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41424 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233550AbhBROI3 (ORCPT ); Thu, 18 Feb 2021 09:08:29 -0500 Received: from mail-wm1-x32f.google.com (mail-wm1-x32f.google.com [IPv6:2a00:1450:4864:20::32f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F3349C061788 for ; Thu, 18 Feb 2021 06:07:44 -0800 (PST) Received: by mail-wm1-x32f.google.com with SMTP id o10so2856282wmc.1 for ; Thu, 18 Feb 2021 06:07:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=eVXJhiJktjVMhH9E9QDbeeZ0spv2V4GDVPN9nLs5T2c=; b=uXal/FHBlp1xXfV70FZV9A/d/dmbSEvrTQa/JjOsZgcb1S+6//Cd+vcr7VeOmotzCt DVwkrutOyP6VNvLXPazVw/KNwmxUr8Kzj+AXUTFyul/IOlX2Dwi3Nb3Z+kwY8dhKq6Qp 5Q5StIWhpOzEQOEo8lTubnyHd4WIKebJX78De8qhPMDOZ8iUUIbNdYd5AcYIOKoAiN3X Q3u+rpnHoUAMaUcalXtYDBhC6OL695wuEJfd68RKVHWhssvMhQ7egPImXb5P7FqcYrjs uOCnrw/5EHaz5FA4V1lyBDTVYIvK+zdVEpQ70ECqnFBMFh5ADr3cdVFDsWecDJ4SmeMz 5dmQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=eVXJhiJktjVMhH9E9QDbeeZ0spv2V4GDVPN9nLs5T2c=; b=VxILc0MhfnnKQlBWPNypB2rXlBHmSlT9B3Nmoxvyrht+kkUxeA2EBg+SfmODp1uG7H Hjaycf9Mkh+IVrP9h798Ii/dFWliT7uq7rvY3bwR0t53hGXHYjPwvUVVfkkvHxyTnZot dIxXjqx4TQqhu5bAydKekR1EdJyorOhoehzRR6Wda1bI8XJIPzyY0bXnHvyYBe1+8U1O TRa3nx7PiFpFmuFZsuSNfdLTsSwJR64b7isLSjQKOwVQ24pivDxifnwFO+VSI/SskqFQ VHQ/I8F6QUC9Ps36PtT4fVD4nN4tpKs+O5NJchJGrNDdDJXwQJhMWd5ydM7DH+gWm7kT uweg== X-Gm-Message-State: AOAM533TjKv3NSCsm4LQCg/dYoU/xrYmCJFYc9G1g60w6488kRi026Ui rPf1Za76DlSQMDZ1UXeL6cnlIZFe2tg= X-Google-Smtp-Source: ABdhPJyP5cEQZKT+JFXMGerqG5r86GnGFhUEmbA1rkQK/Vs2tcvhQyfzDoKY6wioYFM4YK3cYk9DXQ== X-Received: by 2002:a7b:ce12:: with SMTP id m18mr3881999wmc.148.1613657263708; Thu, 18 Feb 2021 06:07:43 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id r17sm9395146wrx.82.2021.02.18.06.07.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Feb 2021 06:07:43 -0800 (PST) Message-Id: In-Reply-To: References: Date: Thu, 18 Feb 2021 14:07:24 +0000 Subject: [PATCH v4 02/17] chunk-format: create chunk format write API Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee In anticipation of combining the logic from the commit-graph and multi-pack-index file formats, create a new chunk-format API. Use a 'struct chunkfile' pointer to keep track of data that has been registered for writes. This struct is anonymous outside of chunk-format.c to ensure no user attempts to interfere with the data. The next change will use this API in commit-graph.c, but the general approach is: 1. initialize the chunkfile with init_chunkfile(f). 2. add chunks in the intended writing order with add_chunk(). 3. write any header information to the hashfile f. 4. write the chunkfile data using write_chunkfile(). 5. free the chunkfile struct using free_chunkfile(). Helped-by: Taylor Blau Helped-by: Junio C Hamano Signed-off-by: Derrick Stolee --- Makefile | 1 + chunk-format.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++ chunk-format.h | 21 ++++++++++++ 3 files changed, 112 insertions(+) create mode 100644 chunk-format.c create mode 100644 chunk-format.h diff --git a/Makefile b/Makefile index 7b64106930a6..50a7663841e9 100644 --- a/Makefile +++ b/Makefile @@ -854,6 +854,7 @@ LIB_OBJS += bundle.o LIB_OBJS += cache-tree.o LIB_OBJS += chdir-notify.o LIB_OBJS += checkout.o +LIB_OBJS += chunk-format.o LIB_OBJS += color.o LIB_OBJS += column.o LIB_OBJS += combine-diff.o diff --git a/chunk-format.c b/chunk-format.c new file mode 100644 index 000000000000..6c9b52b70c10 --- /dev/null +++ b/chunk-format.c @@ -0,0 +1,90 @@ +#include "cache.h" +#include "chunk-format.h" +#include "csum-file.h" + +/* + * When writing a chunk-based file format, collect the chunks in + * an array of chunk_info structs. The size stores the _expected_ + * amount of data that will be written by write_fn. + */ +struct chunk_info { + uint32_t id; + uint64_t size; + chunk_write_fn write_fn; +}; + +struct chunkfile { + struct hashfile *f; + + struct chunk_info *chunks; + size_t chunks_nr; + size_t chunks_alloc; +}; + +struct chunkfile *init_chunkfile(struct hashfile *f) +{ + struct chunkfile *cf = xcalloc(1, sizeof(*cf)); + cf->f = f; + return cf; +} + +void free_chunkfile(struct chunkfile *cf) +{ + if (!cf) + return; + free(cf->chunks); + free(cf); +} + +int get_num_chunks(struct chunkfile *cf) +{ + return cf->chunks_nr; +} + +void add_chunk(struct chunkfile *cf, + uint32_t id, + size_t size, + chunk_write_fn fn) +{ + ALLOC_GROW(cf->chunks, cf->chunks_nr + 1, cf->chunks_alloc); + + cf->chunks[cf->chunks_nr].id = id; + cf->chunks[cf->chunks_nr].write_fn = fn; + cf->chunks[cf->chunks_nr].size = size; + cf->chunks_nr++; +} + +int write_chunkfile(struct chunkfile *cf, void *data) +{ + int i; + uint64_t cur_offset = hashfile_total(cf->f); + + /* Add the table of contents to the current offset */ + cur_offset += (cf->chunks_nr + 1) * CHUNK_TOC_ENTRY_SIZE; + + for (i = 0; i < cf->chunks_nr; i++) { + hashwrite_be32(cf->f, cf->chunks[i].id); + hashwrite_be64(cf->f, cur_offset); + + cur_offset += cf->chunks[i].size; + } + + /* Trailing entry marks the end of the chunks */ + hashwrite_be32(cf->f, 0); + hashwrite_be64(cf->f, cur_offset); + + for (i = 0; i < cf->chunks_nr; i++) { + off_t start_offset = hashfile_total(cf->f); + int result = cf->chunks[i].write_fn(cf->f, data); + + if (result) + return result; + + if (hashfile_total(cf->f) - start_offset != cf->chunks[i].size) + BUG("expected to write %"PRId64" bytes to chunk %"PRIx32", but wrote %"PRId64" instead", + cf->chunks[i].size, cf->chunks[i].id, + hashfile_total(cf->f) - start_offset); + } + + return 0; +} diff --git a/chunk-format.h b/chunk-format.h new file mode 100644 index 000000000000..ce598b66d9f8 --- /dev/null +++ b/chunk-format.h @@ -0,0 +1,21 @@ +#ifndef CHUNK_FORMAT_H +#define CHUNK_FORMAT_H + +#include "git-compat-util.h" + +struct hashfile; +struct chunkfile; + +#define CHUNK_TOC_ENTRY_SIZE (sizeof(uint32_t) + sizeof(uint64_t)) + +struct chunkfile *init_chunkfile(struct hashfile *f); +void free_chunkfile(struct chunkfile *cf); +int get_num_chunks(struct chunkfile *cf); +typedef int (*chunk_write_fn)(struct hashfile *f, void *data); +void add_chunk(struct chunkfile *cf, + uint32_t id, + size_t size, + chunk_write_fn fn); +int write_chunkfile(struct chunkfile *cf, void *data); + +#endif From patchwork Thu Feb 18 14:07:25 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12093659 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 14894C43219 for ; Thu, 18 Feb 2021 16:45:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C87C164EB8 for ; Thu, 18 Feb 2021 16:45:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233864AbhBRQon (ORCPT ); Thu, 18 Feb 2021 11:44:43 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41432 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233568AbhBROI3 (ORCPT ); Thu, 18 Feb 2021 09:08:29 -0500 Received: from mail-wr1-x42e.google.com (mail-wr1-x42e.google.com [IPv6:2a00:1450:4864:20::42e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0DC8FC06178C for ; Thu, 18 Feb 2021 06:07:46 -0800 (PST) Received: by mail-wr1-x42e.google.com with SMTP id a4so900426wro.8 for ; Thu, 18 Feb 2021 06:07:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=A0mjRAGwx9qw7No/M0C4XtAhKDwaCLfcwX8f2+rRN6o=; b=ovwkXL1KYK9BOvFqq4w7tspwhdv8imP5JIgReKugwJa8tbDTQmC56sucfNnhoLkvuf onXlm1TbcvlLSEKfIvGCuZzmDCc+g+LVoFNK7unzAnk1A5yEhp3ztsWb7QJttHvoqZbO fPQcz+S3Ub4hEttiJ8xM+d8Zz8pNXGkiW4AvbQzhq3fhUViEBj+fKsfWerjerbYdi9J0 lLPFTTdh1H2949qk425nYizVBBrRaBn5+4CmFz1V7L14Z3TwAS0VbqCXWcOarC/lZaPD LFAK9b9QdKM6wNqCrkMjlnv5h3+x7z3hx4eZf9OcD4MIlF0N6coLFXdssgFtbgJTqnMF Yz4A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=A0mjRAGwx9qw7No/M0C4XtAhKDwaCLfcwX8f2+rRN6o=; b=X3viQ7+tW76hCk+5OzaVMChzIxBrdXJ//n2Wj+FDM5jHhOu64fyCezQxL5Z4z8vPWn WrM4CgaTN9BwD9Ho03x97Bvt++LoRxFAs2lah31/3UnGlK5DJLdFoV0vBU/Ea5FmNMF/ 2SY3Nc3dbW7zbXQbs9hxuIGKqaGTUHj9cA+AxAq2i0qSD5OQzESG0tK98mZ4zP82kES4 6F+0GeKty+owZcd3z0xu+JwjOMj2Z84HMgBf73CJNQ2rNUEzX9YMf5qFjjhX+2+ide6B 5FdL38iel66NH9IR23KBS+SktIvo2cJmGAlVH3srjcJxcTbEPdnqWqn7RIRcitF4NZum cnOA== X-Gm-Message-State: AOAM530rz0qsIO/z3i5v0TlkWkFlDM9qD/sMD2RmAvblr+lWEsWotsiA h2niphmc2g43ffviqIIravRBswfCyjc= X-Google-Smtp-Source: ABdhPJwPs0EUua+MnAETVYK6zcXLEg1Gf0PqMneS0bqE9HbTjGACb7jt6PxceaANJTZCGAq3DtKTZQ== X-Received: by 2002:a5d:6803:: with SMTP id w3mr4598726wru.300.1613657264702; Thu, 18 Feb 2021 06:07:44 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id b2sm9593682wrn.2.2021.02.18.06.07.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Feb 2021 06:07:44 -0800 (PST) Message-Id: In-Reply-To: References: Date: Thu, 18 Feb 2021 14:07:25 +0000 Subject: [PATCH v4 03/17] commit-graph: use chunk-format write API Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The commit-graph write logic is ready to make use of the chunk-format write API. Each chunk write method is already in the correct prototype. We only need to use the 'struct chunkfile' pointer and the correct API calls. Signed-off-by: Derrick Stolee Reported-by: SZEDER Gábor Signed-off-by: Taylor Blau --- commit-graph.c | 119 +++++++++++++++---------------------------------- 1 file changed, 37 insertions(+), 82 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index fae7d1b63931..a889130cc849 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -19,6 +19,7 @@ #include "shallow.h" #include "json-writer.h" #include "trace2.h" +#include "chunk-format.h" void git_test_write_commit_graph_or_die(void) { @@ -44,7 +45,6 @@ void git_test_write_commit_graph_or_die(void) #define GRAPH_CHUNKID_BLOOMINDEXES 0x42494458 /* "BIDX" */ #define GRAPH_CHUNKID_BLOOMDATA 0x42444154 /* "BDAT" */ #define GRAPH_CHUNKID_BASE 0x42415345 /* "BASE" */ -#define MAX_NUM_CHUNKS 9 #define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16) @@ -1758,27 +1758,17 @@ static int write_graph_chunk_base(struct hashfile *f, return 0; } -typedef int (*chunk_write_fn)(struct hashfile *f, - void *data); - -struct chunk_info { - uint32_t id; - uint64_t size; - chunk_write_fn write_fn; -}; - static int write_commit_graph_file(struct write_commit_graph_context *ctx) { uint32_t i; int fd; struct hashfile *f; struct lock_file lk = LOCK_INIT; - struct chunk_info chunks[MAX_NUM_CHUNKS + 1]; const unsigned hashsz = the_hash_algo->rawsz; struct strbuf progress_title = STRBUF_INIT; int num_chunks = 3; - uint64_t chunk_offset; struct object_id file_hash; + struct chunkfile *cf; if (ctx->split) { struct strbuf tmp_file = STRBUF_INIT; @@ -1824,76 +1814,50 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf); } - chunks[0].id = GRAPH_CHUNKID_OIDFANOUT; - chunks[0].size = GRAPH_FANOUT_SIZE; - chunks[0].write_fn = write_graph_chunk_fanout; - chunks[1].id = GRAPH_CHUNKID_OIDLOOKUP; - chunks[1].size = hashsz * ctx->commits.nr; - chunks[1].write_fn = write_graph_chunk_oids; - chunks[2].id = GRAPH_CHUNKID_DATA; - chunks[2].size = (hashsz + 16) * ctx->commits.nr; - chunks[2].write_fn = write_graph_chunk_data; + cf = init_chunkfile(f); + + add_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, GRAPH_FANOUT_SIZE, + write_graph_chunk_fanout); + add_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, hashsz * ctx->commits.nr, + write_graph_chunk_oids); + add_chunk(cf, GRAPH_CHUNKID_DATA, (hashsz + 16) * ctx->commits.nr, + write_graph_chunk_data); if (git_env_bool(GIT_TEST_COMMIT_GRAPH_NO_GDAT, 0)) ctx->write_generation_data = 0; - if (ctx->write_generation_data) { - chunks[num_chunks].id = GRAPH_CHUNKID_GENERATION_DATA; - chunks[num_chunks].size = sizeof(uint32_t) * ctx->commits.nr; - chunks[num_chunks].write_fn = write_graph_chunk_generation_data; - num_chunks++; - } - if (ctx->num_generation_data_overflows) { - chunks[num_chunks].id = GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW; - chunks[num_chunks].size = sizeof(timestamp_t) * ctx->num_generation_data_overflows; - chunks[num_chunks].write_fn = write_graph_chunk_generation_data_overflow; - num_chunks++; - } - if (ctx->num_extra_edges) { - chunks[num_chunks].id = GRAPH_CHUNKID_EXTRAEDGES; - chunks[num_chunks].size = 4 * ctx->num_extra_edges; - chunks[num_chunks].write_fn = write_graph_chunk_extra_edges; - num_chunks++; - } + if (ctx->write_generation_data) + add_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA, + sizeof(uint32_t) * ctx->commits.nr, + write_graph_chunk_generation_data); + if (ctx->num_generation_data_overflows) + add_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW, + sizeof(timestamp_t) * ctx->num_generation_data_overflows, + write_graph_chunk_generation_data_overflow); + if (ctx->num_extra_edges) + add_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, + 4 * ctx->num_extra_edges, + write_graph_chunk_extra_edges); if (ctx->changed_paths) { - chunks[num_chunks].id = GRAPH_CHUNKID_BLOOMINDEXES; - chunks[num_chunks].size = sizeof(uint32_t) * ctx->commits.nr; - chunks[num_chunks].write_fn = write_graph_chunk_bloom_indexes; - num_chunks++; - chunks[num_chunks].id = GRAPH_CHUNKID_BLOOMDATA; - chunks[num_chunks].size = sizeof(uint32_t) * 3 - + ctx->total_bloom_filter_data_size; - chunks[num_chunks].write_fn = write_graph_chunk_bloom_data; - num_chunks++; - } - if (ctx->num_commit_graphs_after > 1) { - chunks[num_chunks].id = GRAPH_CHUNKID_BASE; - chunks[num_chunks].size = hashsz * (ctx->num_commit_graphs_after - 1); - chunks[num_chunks].write_fn = write_graph_chunk_base; - num_chunks++; - } - - chunks[num_chunks].id = 0; - chunks[num_chunks].size = 0; + add_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES, + sizeof(uint32_t) * ctx->commits.nr, + write_graph_chunk_bloom_indexes); + add_chunk(cf, GRAPH_CHUNKID_BLOOMDATA, + sizeof(uint32_t) * 3 + + ctx->total_bloom_filter_data_size, + write_graph_chunk_bloom_data); + } + if (ctx->num_commit_graphs_after > 1) + add_chunk(cf, GRAPH_CHUNKID_BASE, + hashsz * (ctx->num_commit_graphs_after - 1), + write_graph_chunk_base); hashwrite_be32(f, GRAPH_SIGNATURE); hashwrite_u8(f, GRAPH_VERSION); hashwrite_u8(f, oid_version()); - hashwrite_u8(f, num_chunks); + hashwrite_u8(f, get_num_chunks(cf)); hashwrite_u8(f, ctx->num_commit_graphs_after - 1); - chunk_offset = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH; - for (i = 0; i <= num_chunks; i++) { - uint32_t chunk_write[3]; - - chunk_write[0] = htonl(chunks[i].id); - chunk_write[1] = htonl(chunk_offset >> 32); - chunk_write[2] = htonl(chunk_offset & 0xffffffff); - hashwrite(f, chunk_write, 12); - - chunk_offset += chunks[i].size; - } - if (ctx->report_progress) { strbuf_addf(&progress_title, Q_("Writing out commit graph in %d pass", @@ -1905,17 +1869,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) num_chunks * ctx->commits.nr); } - for (i = 0; i < num_chunks; i++) { - uint64_t start_offset = f->total + f->offset; - - if (chunks[i].write_fn(f, ctx)) - return -1; - - if (f->total + f->offset != start_offset + chunks[i].size) - BUG("expected to write %"PRId64" bytes to chunk %"PRIx32", but wrote %"PRId64" instead", - chunks[i].size, chunks[i].id, - f->total + f->offset - start_offset); - } + write_chunkfile(cf, ctx); stop_progress(&ctx->progress); strbuf_release(&progress_title); @@ -1932,6 +1886,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) close_commit_graph(ctx->r->objects); finalize_hashfile(f, file_hash.hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC); + free_chunkfile(cf); if (ctx->split) { FILE *chainf = fdopen_lock_file(&lk, "w"); From patchwork Thu Feb 18 14:07:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12093661 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 26579C43217 for ; Thu, 18 Feb 2021 16:45:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F38F561494 for ; Thu, 18 Feb 2021 16:45:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233876AbhBRQox (ORCPT ); Thu, 18 Feb 2021 11:44:53 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41434 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233559AbhBROI3 (ORCPT ); Thu, 18 Feb 2021 09:08:29 -0500 Received: from mail-wm1-x32d.google.com (mail-wm1-x32d.google.com [IPv6:2a00:1450:4864:20::32d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EB7B9C061793 for ; Thu, 18 Feb 2021 06:07:46 -0800 (PST) Received: by mail-wm1-x32d.google.com with SMTP id n10so4084739wmq.0 for ; Thu, 18 Feb 2021 06:07:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=sgUJGixoYfN4DpB6eRKdzHk/vq3xeR3NlBM/Tpw0GqY=; b=T4PQvpqzCgx7JLeP30/BBFmi7XC0D6iL/5NE8Ne1Gjg6WK3gc76N196sgSXO2Blt7a ++Nr370EMEy2zHKsnCF8qX1eU0V9W4oQ7vIHL2HvEm+cpsXleF6nWQjTAF+HslHur/bj tsq9H1qEYjt9zrrA7Ab2SlwNBF9xrAxZAAB96sG+zdj//PfU0pX8Z2jOoZQ98Sd3Nh6v SS10Um5reaOAwtWGV3wtzpaWzlHx6CQgosi2Z9y3gUk5uuL/uTdMl85wmoJ1d2M9zULk vsUJvfLLv5/zrdxu0PGQoqnA3BF4u6UM0ecfzGPSLDxhWIcSMgonKdL9/MfQDaykNqz4 DAbw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=sgUJGixoYfN4DpB6eRKdzHk/vq3xeR3NlBM/Tpw0GqY=; b=GirKvNlX2fWT6LS7ssCh6f7Cir++QTSJTzUxZSDeYC6lLAH2rWyIKPjyBZMLGtUqQV TgTboeEIEpoevLuM7MtOTq43PTpmNwRIaLQgLMx5NAUKqWQOIwPFur114+QMbJ8rCuFY TdKO4KehQtgGq66b11VHQCFBF2ftvchgOAp34lv7YYjKkrRj83t8dTFWW67lxZynBKj5 9jVAf/Pu53Olu5DWLhcq9NuHIpM4jp1Y2lZi9p7x3pjsEhS+ADWEA8gwY7l28wyOtT8+ Dgb+Knslnj3OI0GWXW24ZFUIxrk4Lbo4MYZkM7veZ67nus+xBDlJH6f+VKVbDK/HrPnW cEtw== X-Gm-Message-State: AOAM5303O5mrWKy45h4eKRh66deryTkBaeJutkFFe8uiCisMmhL/MymQ XMrJmK/rExPPVIE/7Y1p41NI+bjdvHA= X-Google-Smtp-Source: ABdhPJx09cZt21RU9y5Sgj8PFDs30fBJeNc6nzKjTNFcYUFRQn3yUssyMEB63C9HE5H/VIfE+qxz3w== X-Received: by 2002:a1c:4805:: with SMTP id v5mr1362014wma.177.1613657265620; Thu, 18 Feb 2021 06:07:45 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id v18sm3212363wrf.75.2021.02.18.06.07.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Feb 2021 06:07:45 -0800 (PST) Message-Id: <8fdaae017dab92249b4bd1230212a67c486cb622.1613657259.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 18 Feb 2021 14:07:26 +0000 Subject: [PATCH v4 04/17] midx: rename pack_info to write_midx_context Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee In an effort to streamline our chunk-based file formats, align some of the code structure in write_midx_internal() to be similar to the patterns in write_commit_graph_file(). Specifically, let's create a "struct write_midx_context" that can be used as a data parameter to abstract function types. This change only renames "struct pack_info" to "struct write_midx_context" and the names of instances from "packs" to "ctx". In future changes, we will expand the data inside "struct write_midx_context" and align our chunk-writing method with the chunk-format API. Signed-off-by: Derrick Stolee --- midx.c | 130 ++++++++++++++++++++++++++++----------------------------- 1 file changed, 65 insertions(+), 65 deletions(-) diff --git a/midx.c b/midx.c index 79c282b070d2..561f65a63a5b 100644 --- a/midx.c +++ b/midx.c @@ -451,7 +451,7 @@ static int pack_info_compare(const void *_a, const void *_b) return strcmp(a->pack_name, b->pack_name); } -struct pack_list { +struct write_midx_context { struct pack_info *info; uint32_t nr; uint32_t alloc; @@ -463,37 +463,37 @@ struct pack_list { static void add_pack_to_midx(const char *full_path, size_t full_path_len, const char *file_name, void *data) { - struct pack_list *packs = (struct pack_list *)data; + struct write_midx_context *ctx = data; if (ends_with(file_name, ".idx")) { - display_progress(packs->progress, ++packs->pack_paths_checked); - if (packs->m && midx_contains_pack(packs->m, file_name)) + display_progress(ctx->progress, ++ctx->pack_paths_checked); + if (ctx->m && midx_contains_pack(ctx->m, file_name)) return; - ALLOC_GROW(packs->info, packs->nr + 1, packs->alloc); + ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc); - packs->info[packs->nr].p = add_packed_git(full_path, - full_path_len, - 0); + ctx->info[ctx->nr].p = add_packed_git(full_path, + full_path_len, + 0); - if (!packs->info[packs->nr].p) { + if (!ctx->info[ctx->nr].p) { warning(_("failed to add packfile '%s'"), full_path); return; } - if (open_pack_index(packs->info[packs->nr].p)) { + if (open_pack_index(ctx->info[ctx->nr].p)) { warning(_("failed to open pack-index '%s'"), full_path); - close_pack(packs->info[packs->nr].p); - FREE_AND_NULL(packs->info[packs->nr].p); + close_pack(ctx->info[ctx->nr].p); + FREE_AND_NULL(ctx->info[ctx->nr].p); return; } - packs->info[packs->nr].pack_name = xstrdup(file_name); - packs->info[packs->nr].orig_pack_int_id = packs->nr; - packs->info[packs->nr].expired = 0; - packs->nr++; + ctx->info[ctx->nr].pack_name = xstrdup(file_name); + ctx->info[ctx->nr].orig_pack_int_id = ctx->nr; + ctx->info[ctx->nr].expired = 0; + ctx->nr++; } } @@ -801,7 +801,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * uint32_t i; struct hashfile *f = NULL; struct lock_file lk; - struct pack_list packs; + struct write_midx_context ctx = { 0 }; uint32_t *pack_perm = NULL; uint64_t written = 0; uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; @@ -820,40 +820,40 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * midx_name); if (m) - packs.m = m; + ctx.m = m; else - packs.m = load_multi_pack_index(object_dir, 1); - - packs.nr = 0; - packs.alloc = packs.m ? packs.m->num_packs : 16; - packs.info = NULL; - ALLOC_ARRAY(packs.info, packs.alloc); - - if (packs.m) { - for (i = 0; i < packs.m->num_packs; i++) { - ALLOC_GROW(packs.info, packs.nr + 1, packs.alloc); - - packs.info[packs.nr].orig_pack_int_id = i; - packs.info[packs.nr].pack_name = xstrdup(packs.m->pack_names[i]); - packs.info[packs.nr].p = NULL; - packs.info[packs.nr].expired = 0; - packs.nr++; + ctx.m = load_multi_pack_index(object_dir, 1); + + ctx.nr = 0; + ctx.alloc = ctx.m ? ctx.m->num_packs : 16; + ctx.info = NULL; + ALLOC_ARRAY(ctx.info, ctx.alloc); + + if (ctx.m) { + for (i = 0; i < ctx.m->num_packs; i++) { + ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc); + + ctx.info[ctx.nr].orig_pack_int_id = i; + ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]); + ctx.info[ctx.nr].p = NULL; + ctx.info[ctx.nr].expired = 0; + ctx.nr++; } } - packs.pack_paths_checked = 0; + ctx.pack_paths_checked = 0; if (flags & MIDX_PROGRESS) - packs.progress = start_delayed_progress(_("Adding packfiles to multi-pack-index"), 0); + ctx.progress = start_delayed_progress(_("Adding packfiles to multi-pack-index"), 0); else - packs.progress = NULL; + ctx.progress = NULL; - for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &packs); - stop_progress(&packs.progress); + for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx); + stop_progress(&ctx.progress); - if (packs.m && packs.nr == packs.m->num_packs && !packs_to_drop) + if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop) goto cleanup; - entries = get_sorted_entries(packs.m, packs.info, packs.nr, &nr_entries); + entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &nr_entries); for (i = 0; i < nr_entries; i++) { if (entries[i].offset > 0x7fffffff) @@ -862,19 +862,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * large_offsets_needed = 1; } - QSORT(packs.info, packs.nr, pack_info_compare); + QSORT(ctx.info, ctx.nr, pack_info_compare); if (packs_to_drop && packs_to_drop->nr) { int drop_index = 0; int missing_drops = 0; - for (i = 0; i < packs.nr && drop_index < packs_to_drop->nr; i++) { - int cmp = strcmp(packs.info[i].pack_name, + for (i = 0; i < ctx.nr && drop_index < packs_to_drop->nr; i++) { + int cmp = strcmp(ctx.info[i].pack_name, packs_to_drop->items[drop_index].string); if (!cmp) { drop_index++; - packs.info[i].expired = 1; + ctx.info[i].expired = 1; } else if (cmp > 0) { error(_("did not see pack-file %s to drop"), packs_to_drop->items[drop_index].string); @@ -882,7 +882,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * missing_drops++; i--; } else { - packs.info[i].expired = 0; + ctx.info[i].expired = 0; } } @@ -898,19 +898,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * * * pack_perm[old_id] = new_id */ - ALLOC_ARRAY(pack_perm, packs.nr); - for (i = 0; i < packs.nr; i++) { - if (packs.info[i].expired) { + ALLOC_ARRAY(pack_perm, ctx.nr); + for (i = 0; i < ctx.nr; i++) { + if (ctx.info[i].expired) { dropped_packs++; - pack_perm[packs.info[i].orig_pack_int_id] = PACK_EXPIRED; + pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED; } else { - pack_perm[packs.info[i].orig_pack_int_id] = i - dropped_packs; + pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs; } } - for (i = 0; i < packs.nr; i++) { - if (!packs.info[i].expired) - pack_name_concat_len += strlen(packs.info[i].pack_name) + 1; + for (i = 0; i < ctx.nr; i++) { + if (!ctx.info[i].expired) + pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1; } if (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT) @@ -921,19 +921,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf); FREE_AND_NULL(midx_name); - if (packs.m) - close_midx(packs.m); + if (ctx.m) + close_midx(ctx.m); cur_chunk = 0; num_chunks = large_offsets_needed ? 5 : 4; - if (packs.nr - dropped_packs == 0) { + if (ctx.nr - dropped_packs == 0) { error(_("no pack files to index.")); result = 1; goto cleanup; } - written = write_midx_header(f, num_chunks, packs.nr - dropped_packs); + written = write_midx_header(f, num_chunks, ctx.nr - dropped_packs); chunk_ids[cur_chunk] = MIDX_CHUNKID_PACKNAMES; chunk_offsets[cur_chunk] = written + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH; @@ -990,7 +990,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * switch (chunk_ids[i]) { case MIDX_CHUNKID_PACKNAMES: - written += write_midx_pack_names(f, packs.info, packs.nr); + written += write_midx_pack_names(f, ctx.info, ctx.nr); break; case MIDX_CHUNKID_OIDFANOUT: @@ -1027,15 +1027,15 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * commit_lock_file(&lk); cleanup: - for (i = 0; i < packs.nr; i++) { - if (packs.info[i].p) { - close_pack(packs.info[i].p); - free(packs.info[i].p); + for (i = 0; i < ctx.nr; i++) { + if (ctx.info[i].p) { + close_pack(ctx.info[i].p); + free(ctx.info[i].p); } - free(packs.info[i].pack_name); + free(ctx.info[i].pack_name); } - free(packs.info); + free(ctx.info); free(entries); free(pack_perm); free(midx_name); From patchwork Thu Feb 18 14:07:27 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12093657 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AA381C43333 for ; Thu, 18 Feb 2021 16:45:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 86A1964EB1 for ; Thu, 18 Feb 2021 16:45:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233853AbhBRQoW (ORCPT ); Thu, 18 Feb 2021 11:44:22 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40814 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233553AbhBROI3 (ORCPT ); Thu, 18 Feb 2021 09:08:29 -0500 Received: from mail-wr1-x433.google.com (mail-wr1-x433.google.com [IPv6:2a00:1450:4864:20::433]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CE354C061794 for ; Thu, 18 Feb 2021 06:07:47 -0800 (PST) Received: by mail-wr1-x433.google.com with SMTP id n4so3156552wrx.1 for ; Thu, 18 Feb 2021 06:07:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=kGVHM52oWgqXnIlb4yOoAhhJoKcEoCuXj3Nru4t5efI=; b=fMyNDODKs+OFEeRBiqcGXEz83qEsrbV5S/IBSy5Iakb0uwIWC7ediSyBHMBTzAIZRu UqKoBtiRrUV3Wa6bR5yqmcnCZ5zi5IxM9JUNnp53SUERj0mAg5M8qukc58nM/mLrGfYb 7yKwLpmLn2ktU6C2d4L1dSLmWoirH2gkndLXYAnOZ/FyZSGF6IGz+vWjMv1boDbRj/Ro sNNvsHywt2h4PnKnSucLmqIgoCClyH8/rBqA1zWYuBUEvK26GqVh4RQRaLX1M475CnFT VzNliDmtgS+2D/FyAP31uDb8pduUhQHVdHR9OcTYrRfSSoj7oTkg7NIjXsWW35woLZ30 G5mg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=kGVHM52oWgqXnIlb4yOoAhhJoKcEoCuXj3Nru4t5efI=; b=YbnGWsF9rXYmJohhr2otydgihEoMAZy+G1wlxz0QznmilNraOrOy9jfnNn+OGif6Fc Pd3QQ3wyh1PUKxJDWl7zR3g4rGG7eXPoFgV8o7I7Hgf2G2r/c90Y7KJDi6z4nZt/EEvq SwR5Qf+rjvP1WDUtj6KxT1OP0VNpkwRModZFpy0FLRDJ7Z+7WIcDVraq1Od37IVzxcAw R2e+OxmM+wItXufutBP36SFulLE0hm0+9dd/85nVVguRyDWxzH4Wm/ZME+aaGj7i6KR9 IBWZpK871BUSG31wi4YY6dzLcl0li+9I1KsdUKtkqTzV0KzIBZfoZnG90IRZRb+/FGsB iF7A== X-Gm-Message-State: AOAM530jMAm4XPWRoQw5F0IegLbiFPVbKVMIpXF9BMpfFeX3PjkL8rg+ 1C3x9ukBszz8PKu9LAhgM9ThSpktft8= X-Google-Smtp-Source: ABdhPJyR0cjaAAv4+pWar8Js7uqpT8U/YSdHIuEYuACw/8gDbjF5A0GFa62KVFZYRvYT0GQRVOtNEQ== X-Received: by 2002:a5d:6b45:: with SMTP id x5mr4363542wrw.415.1613657266541; Thu, 18 Feb 2021 06:07:46 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id l22sm2486598wrb.4.2021.02.18.06.07.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Feb 2021 06:07:46 -0800 (PST) Message-Id: In-Reply-To: References: Date: Thu, 18 Feb 2021 14:07:27 +0000 Subject: [PATCH v4 05/17] midx: use context in write_midx_pack_names() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee In an effort to align the write_midx_internal() to use the chunk-format API, start converting chunk writing methods to match chunk_write_fn. The first case is to convert write_midx_pack_names() to take "void *data". We already have the necessary data in "struct write_midx_context", so this conversion is rather mechanical. Signed-off-by: Derrick Stolee --- midx.c | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/midx.c b/midx.c index 561f65a63a5b..88452b044337 100644 --- a/midx.c +++ b/midx.c @@ -643,27 +643,26 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, return deduplicated_entries; } -static size_t write_midx_pack_names(struct hashfile *f, - struct pack_info *info, - uint32_t num_packs) +static size_t write_midx_pack_names(struct hashfile *f, void *data) { + struct write_midx_context *ctx = data; uint32_t i; unsigned char padding[MIDX_CHUNK_ALIGNMENT]; size_t written = 0; - for (i = 0; i < num_packs; i++) { + for (i = 0; i < ctx->nr; i++) { size_t writelen; - if (info[i].expired) + if (ctx->info[i].expired) continue; - if (i && strcmp(info[i].pack_name, info[i - 1].pack_name) <= 0) + if (i && strcmp(ctx->info[i].pack_name, ctx->info[i - 1].pack_name) <= 0) BUG("incorrect pack-file order: %s before %s", - info[i - 1].pack_name, - info[i].pack_name); + ctx->info[i - 1].pack_name, + ctx->info[i].pack_name); - writelen = strlen(info[i].pack_name) + 1; - hashwrite(f, info[i].pack_name, writelen); + writelen = strlen(ctx->info[i].pack_name) + 1; + hashwrite(f, ctx->info[i].pack_name, writelen); written += writelen; } @@ -990,7 +989,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * switch (chunk_ids[i]) { case MIDX_CHUNKID_PACKNAMES: - written += write_midx_pack_names(f, ctx.info, ctx.nr); + written += write_midx_pack_names(f, &ctx); break; case MIDX_CHUNKID_OIDFANOUT: From patchwork Thu Feb 18 14:07:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12093663 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7ADDDC433E0 for ; Thu, 18 Feb 2021 16:46:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 18DA464EB1 for ; Thu, 18 Feb 2021 16:46:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233886AbhBRQpN (ORCPT ); Thu, 18 Feb 2021 11:45:13 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40692 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233252AbhBROI3 (ORCPT ); Thu, 18 Feb 2021 09:08:29 -0500 Received: from mail-wr1-x432.google.com (mail-wr1-x432.google.com [IPv6:2a00:1450:4864:20::432]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D44D9C061797 for ; Thu, 18 Feb 2021 06:07:48 -0800 (PST) Received: by mail-wr1-x432.google.com with SMTP id l12so3147009wry.2 for ; Thu, 18 Feb 2021 06:07:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=SmEIaXCinREEUX3RTyO7nxVqzBVoWKR9j4KM2/Wy8VQ=; b=qeumvM77id041hhsPC/jo6Ut28t/DE0HBseO0LrqSy4pP1oHF/8N9vwmH9suubdlKF cSs0lh76NGEyaz5EmrX5/y0UsUWIgH92Qk5MNxa7LoZ2fwPToIlFJ5n+xO9+U1XX9a+y f8MhDM/s23+rZVJo6I2HrbukghpimPomYItwPJTpQy23oy/jgyQroYjTw1jfUiu67PaF DibNGdsL2fyzhhD99NlBZkPky5L3hsrAnHv4VDuZNB7WZgLmOCuxMCepn7Q9ibqCaH5x n4K8Xc/Aa6bOGQsziUqxUodj/O3ef9FGi7WK/zTQIKx4ZQBN5kxovFMVEX3PFJHoBUOa Xlxg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=SmEIaXCinREEUX3RTyO7nxVqzBVoWKR9j4KM2/Wy8VQ=; b=Y3tn/sYR5rZVu/Ea8ISkt2pZnOskilg1cMVHU+wcF9BKd7sQHLmBS0QM1la2qBXcCm S294/7BXQJ8xviy3diaBpFJimW6/R1ZyCfGIS5euzQvP8ALL2TrR8EIBsst56Ipqj4EE wWYu5KWykwyVy0WUtQJ2ncERZ9L+hHqrOOy5kOxbK4CPHlfOZG4Hud8PUvsQnEMyiX00 jAWGwJyzYnLP4vyD/kJL4cy5CXqsHRsqOJCC2YgOcP+jIv3gZ/rhmC+thtd4kiJvYFiH aWhn1ab04Vh9o8DaJiaSeGz3c6qTKuYsBYfb70eL4Zgnj1wNWqUmZP/ZvV9U+F4++0Cs fBLw== X-Gm-Message-State: AOAM532kHKtei3oJQ5mIWcOq/XINZ1hDUzmZxPMLrDthWm/SRy97I9VP XXJM3Sl2gNAEywobIWkFmlfFmtCYqi8= X-Google-Smtp-Source: ABdhPJzUxWyzxZlT+dOBiMscUqWhEe+0zBvpAF1j2stU6ZTKJ0gozdFu0SFYXxnzYO+vXMNwvg4XCw== X-Received: by 2002:a5d:4850:: with SMTP id n16mr4607953wrs.296.1613657267392; Thu, 18 Feb 2021 06:07:47 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id o13sm11189417wrs.45.2021.02.18.06.07.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Feb 2021 06:07:46 -0800 (PST) Message-Id: <2bb2847aad6edba152f1fa249e8e7d0401b0a185.1613657259.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 18 Feb 2021 14:07:28 +0000 Subject: [PATCH v4 06/17] midx: add entries to write_midx_context Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee In an effort to align write_midx_internal() with the chunk-format API, continue to group necessary data into "struct write_midx_context". This change collects the "struct pack_midx_entry *entries" list and its count into the context. Update write_midx_oid_fanout() and write_midx_oid_lookup() to take the context directly, as these are easy conversions with this new data. Only the callers of write_midx_object_offsets() and write_midx_large_offsets() are updated here, since additional data in the context before those methods can match chunk_write_fn. Signed-off-by: Derrick Stolee --- midx.c | 49 ++++++++++++++++++++++++++----------------------- 1 file changed, 26 insertions(+), 23 deletions(-) diff --git a/midx.c b/midx.c index 88452b044337..4520ef82b91b 100644 --- a/midx.c +++ b/midx.c @@ -458,6 +458,9 @@ struct write_midx_context { struct multi_pack_index *m; struct progress *progress; unsigned pack_paths_checked; + + struct pack_midx_entry *entries; + uint32_t entries_nr; }; static void add_pack_to_midx(const char *full_path, size_t full_path_len, @@ -678,11 +681,11 @@ static size_t write_midx_pack_names(struct hashfile *f, void *data) } static size_t write_midx_oid_fanout(struct hashfile *f, - struct pack_midx_entry *objects, - uint32_t nr_objects) + void *data) { - struct pack_midx_entry *list = objects; - struct pack_midx_entry *last = objects + nr_objects; + struct write_midx_context *ctx = data; + struct pack_midx_entry *list = ctx->entries; + struct pack_midx_entry *last = ctx->entries + ctx->entries_nr; uint32_t count = 0; uint32_t i; @@ -706,18 +709,19 @@ static size_t write_midx_oid_fanout(struct hashfile *f, return MIDX_CHUNK_FANOUT_SIZE; } -static size_t write_midx_oid_lookup(struct hashfile *f, unsigned char hash_len, - struct pack_midx_entry *objects, - uint32_t nr_objects) +static size_t write_midx_oid_lookup(struct hashfile *f, + void *data) { - struct pack_midx_entry *list = objects; + struct write_midx_context *ctx = data; + unsigned char hash_len = the_hash_algo->rawsz; + struct pack_midx_entry *list = ctx->entries; uint32_t i; size_t written = 0; - for (i = 0; i < nr_objects; i++) { + for (i = 0; i < ctx->entries_nr; i++) { struct pack_midx_entry *obj = list++; - if (i < nr_objects - 1) { + if (i < ctx->entries_nr - 1) { struct pack_midx_entry *next = list; if (oidcmp(&obj->oid, &next->oid) >= 0) BUG("OIDs not in order: %s >= %s", @@ -805,8 +809,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * uint64_t written = 0; uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1]; - uint32_t nr_entries, num_large_offsets = 0; - struct pack_midx_entry *entries = NULL; + uint32_t num_large_offsets = 0; struct progress *progress = NULL; int large_offsets_needed = 0; int pack_name_concat_len = 0; @@ -852,12 +855,12 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop) goto cleanup; - entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &nr_entries); + ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr); - for (i = 0; i < nr_entries; i++) { - if (entries[i].offset > 0x7fffffff) + for (i = 0; i < ctx.entries_nr; i++) { + if (ctx.entries[i].offset > 0x7fffffff) num_large_offsets++; - if (entries[i].offset > 0xffffffff) + if (ctx.entries[i].offset > 0xffffffff) large_offsets_needed = 1; } @@ -947,10 +950,10 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * cur_chunk++; chunk_ids[cur_chunk] = MIDX_CHUNKID_OBJECTOFFSETS; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * the_hash_algo->rawsz; + chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * the_hash_algo->rawsz; cur_chunk++; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * MIDX_CHUNK_OFFSET_WIDTH; + chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH; if (large_offsets_needed) { chunk_ids[cur_chunk] = MIDX_CHUNKID_LARGEOFFSETS; @@ -993,19 +996,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * break; case MIDX_CHUNKID_OIDFANOUT: - written += write_midx_oid_fanout(f, entries, nr_entries); + written += write_midx_oid_fanout(f, &ctx); break; case MIDX_CHUNKID_OIDLOOKUP: - written += write_midx_oid_lookup(f, the_hash_algo->rawsz, entries, nr_entries); + written += write_midx_oid_lookup(f, &ctx); break; case MIDX_CHUNKID_OBJECTOFFSETS: - written += write_midx_object_offsets(f, large_offsets_needed, pack_perm, entries, nr_entries); + written += write_midx_object_offsets(f, large_offsets_needed, pack_perm, ctx.entries, ctx.entries_nr); break; case MIDX_CHUNKID_LARGEOFFSETS: - written += write_midx_large_offsets(f, num_large_offsets, entries, nr_entries); + written += write_midx_large_offsets(f, num_large_offsets, ctx.entries, ctx.entries_nr); break; default: @@ -1035,7 +1038,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * } free(ctx.info); - free(entries); + free(ctx.entries); free(pack_perm); free(midx_name); return result; From patchwork Thu Feb 18 14:07:29 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12093655 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 16218C4332E for ; Thu, 18 Feb 2021 16:45:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DBB5264ECC for ; Thu, 18 Feb 2021 16:45:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233836AbhBRQne (ORCPT ); Thu, 18 Feb 2021 11:43:34 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40690 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233555AbhBROI3 (ORCPT ); Thu, 18 Feb 2021 09:08:29 -0500 Received: from mail-wm1-x329.google.com (mail-wm1-x329.google.com [IPv6:2a00:1450:4864:20::329]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A54AAC0617A7 for ; Thu, 18 Feb 2021 06:07:49 -0800 (PST) Received: by mail-wm1-x329.google.com with SMTP id o82so3941322wme.1 for ; Thu, 18 Feb 2021 06:07:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=Cur1jwyThUvHjWQICFEilcRtaj5W7he6jefrT/ISo+0=; b=H+krDQsr2a6Mg+AOmEOy03psccReK1y16g5VyN2uFw58Xaw/AvTYbQpcxHhvtXPOXV 2FnpAkN3hDs6ak9WXGtzMO+4Awqp01gt5Ee2bRu4BUw/ZHsXfasxy7ZaGjHZCim2c9LZ JVPatJKZoeW1eUxH4wm+639vqVqc/hvmWqnQh/xznu9xAYrr26s+KrhoeWbDwF76MylB tguiFSeXl5dR7S8no6Rc//zd8T+Diw3B4AMdqYsHAGMw6AGk8VPF/shCXPEsmTbictjE w/KtyTxJ9rj7NM4M0mEACjcsWPL9fwu2yjFX20kc5LeLUBrVjM4KbNAfy7hgItqA72PM Amcg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=Cur1jwyThUvHjWQICFEilcRtaj5W7he6jefrT/ISo+0=; b=JGJP7taoZLj7HWgej4XqNjr5UIkIl/kwAA5JCDxoqgIs7Jq54S3PQunj2cN13+0OEc evP2J0hud8IQ7q7UL3Xl1IGxZQy66dgIJ3LfPQd4aZZddeJX5yCEHSDA4Mjc0az1cO6k /9Vy6FjjEptv+/mCrFPq6JLbSdnGG4FQPjCbvA2iaR/xH+JNDS35wWbk1+dHr4j0Y3I+ NL/jkH27ymvBaIVrFMjZfvQxC/E6/0IEfs0RG50Uw9FIR2J18CZzU2rzMN4aB0v6hV1B hUwaMxoeK2RynyoQLQKBy9EGT9Tb4yMEDbhtIC2Ag27Afsw4vN34uyVQ5I4s360YBQJb beTA== X-Gm-Message-State: AOAM532q9vK/G1zKtb4p0AUMdlHwcI+iH8YU5A85BDXoRDjucZ6TqKiw u8QQ6vBkkzGrkAuhRt6mGxqVs7tfF/Y= X-Google-Smtp-Source: ABdhPJyHAGmPDqiOXWdieBzcf/lxl5oCyyrLNd66ewLPAHVSbxaNp3n+6pPMm4wQo9+gk+RbuDBdVA== X-Received: by 2002:a1c:f60b:: with SMTP id w11mr3937785wmc.134.1613657268351; Thu, 18 Feb 2021 06:07:48 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id p12sm7097917wmq.1.2021.02.18.06.07.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Feb 2021 06:07:47 -0800 (PST) Message-Id: In-Reply-To: References: Date: Thu, 18 Feb 2021 14:07:29 +0000 Subject: [PATCH v4 07/17] midx: add pack_perm to write_midx_context Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee In an effort to align write_midx_internal() with the chunk-format API, continue to group necessary data into "struct write_midx_context". This change collects the "uint32_t *pack_perm" and large_offsets_needed bit into the context. Update write_midx_object_offsets() to match chunk_write_fn. Signed-off-by: Derrick Stolee --- midx.c | 40 +++++++++++++++++++++------------------- 1 file changed, 21 insertions(+), 19 deletions(-) diff --git a/midx.c b/midx.c index 4520ef82b91b..cd994e333ecb 100644 --- a/midx.c +++ b/midx.c @@ -461,6 +461,9 @@ struct write_midx_context { struct pack_midx_entry *entries; uint32_t entries_nr; + + uint32_t *pack_perm; + unsigned large_offsets_needed:1; }; static void add_pack_to_midx(const char *full_path, size_t full_path_len, @@ -736,27 +739,27 @@ static size_t write_midx_oid_lookup(struct hashfile *f, return written; } -static size_t write_midx_object_offsets(struct hashfile *f, int large_offset_needed, - uint32_t *perm, - struct pack_midx_entry *objects, uint32_t nr_objects) +static size_t write_midx_object_offsets(struct hashfile *f, + void *data) { - struct pack_midx_entry *list = objects; + struct write_midx_context *ctx = data; + struct pack_midx_entry *list = ctx->entries; uint32_t i, nr_large_offset = 0; size_t written = 0; - for (i = 0; i < nr_objects; i++) { + for (i = 0; i < ctx->entries_nr; i++) { struct pack_midx_entry *obj = list++; - if (perm[obj->pack_int_id] == PACK_EXPIRED) + if (ctx->pack_perm[obj->pack_int_id] == PACK_EXPIRED) BUG("object %s is in an expired pack with int-id %d", oid_to_hex(&obj->oid), obj->pack_int_id); - hashwrite_be32(f, perm[obj->pack_int_id]); + hashwrite_be32(f, ctx->pack_perm[obj->pack_int_id]); - if (large_offset_needed && obj->offset >> 31) + if (ctx->large_offsets_needed && obj->offset >> 31) hashwrite_be32(f, MIDX_LARGE_OFFSET_NEEDED | nr_large_offset++); - else if (!large_offset_needed && obj->offset >> 32) + else if (!ctx->large_offsets_needed && obj->offset >> 32) BUG("object %s requires a large offset (%"PRIx64") but the MIDX is not writing large offsets!", oid_to_hex(&obj->oid), obj->offset); @@ -805,13 +808,11 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * struct hashfile *f = NULL; struct lock_file lk; struct write_midx_context ctx = { 0 }; - uint32_t *pack_perm = NULL; uint64_t written = 0; uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1]; uint32_t num_large_offsets = 0; struct progress *progress = NULL; - int large_offsets_needed = 0; int pack_name_concat_len = 0; int dropped_packs = 0; int result = 0; @@ -857,11 +858,12 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr); + ctx.large_offsets_needed = 0; for (i = 0; i < ctx.entries_nr; i++) { if (ctx.entries[i].offset > 0x7fffffff) num_large_offsets++; if (ctx.entries[i].offset > 0xffffffff) - large_offsets_needed = 1; + ctx.large_offsets_needed = 1; } QSORT(ctx.info, ctx.nr, pack_info_compare); @@ -900,13 +902,13 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * * * pack_perm[old_id] = new_id */ - ALLOC_ARRAY(pack_perm, ctx.nr); + ALLOC_ARRAY(ctx.pack_perm, ctx.nr); for (i = 0; i < ctx.nr; i++) { if (ctx.info[i].expired) { dropped_packs++; - pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED; + ctx.pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED; } else { - pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs; + ctx.pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs; } } @@ -927,7 +929,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * close_midx(ctx.m); cur_chunk = 0; - num_chunks = large_offsets_needed ? 5 : 4; + num_chunks = ctx.large_offsets_needed ? 5 : 4; if (ctx.nr - dropped_packs == 0) { error(_("no pack files to index.")); @@ -954,7 +956,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * cur_chunk++; chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH; - if (large_offsets_needed) { + if (ctx.large_offsets_needed) { chunk_ids[cur_chunk] = MIDX_CHUNKID_LARGEOFFSETS; cur_chunk++; @@ -1004,7 +1006,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * break; case MIDX_CHUNKID_OBJECTOFFSETS: - written += write_midx_object_offsets(f, large_offsets_needed, pack_perm, ctx.entries, ctx.entries_nr); + written += write_midx_object_offsets(f, &ctx); break; case MIDX_CHUNKID_LARGEOFFSETS: @@ -1039,7 +1041,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * free(ctx.info); free(ctx.entries); - free(pack_perm); + free(ctx.pack_perm); free(midx_name); return result; } From patchwork Thu Feb 18 14:07:30 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12093671 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D61BAC433E0 for ; Thu, 18 Feb 2021 16:49:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8D7E864EAF for ; Thu, 18 Feb 2021 16:49:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233829AbhBRQs2 (ORCPT ); Thu, 18 Feb 2021 11:48:28 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41520 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233582AbhBROIx (ORCPT ); Thu, 18 Feb 2021 09:08:53 -0500 Received: from mail-wr1-x436.google.com (mail-wr1-x436.google.com [IPv6:2a00:1450:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6AE86C0617A9 for ; Thu, 18 Feb 2021 06:07:50 -0800 (PST) Received: by mail-wr1-x436.google.com with SMTP id r21so3106696wrr.9 for ; Thu, 18 Feb 2021 06:07:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=NkNbJX5oDO/3vy/Iuipy5TUY0HICiEs4cK5g77JnPHI=; b=gubnnsDeIjJjBLPNdK3BnyajDqspJUygtce8Lb6Dzomrt9A7cuoYJ4aiHEkwb4DHJB ao0hOkKjnsD14TmPmuJb0zfRwTQOJMwoJJm0RgxbtB5pxIURaPVrbx9tFudWdm3Vud8V 5qW3PcR64PtEJcoSXgZhE1AIdZ6P/yUvQQt2uKqJOBDw3BhYEY2T9R4W3asCbF7eB7RU CKzwLbsDdmiGHfhv4UR7AuRHQ1aI1m0680HKPQ6JJeJAXiN0osDqr5heFwuMlbkIhnzo v6dH007uzkuEC6iRlZsmGym6f+uPpum7NnQAQsENcZ13Jl6oGmo+ebdGaFUNW7nfzxAN VKPQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=NkNbJX5oDO/3vy/Iuipy5TUY0HICiEs4cK5g77JnPHI=; b=iiSNC6P2JXWdcSekXCml7ej2N+fLd+XfB3LhFJ0K6japjlRRZ4zWar7PfMmwjn7uKN HCzwv7PHCIeHAW3PbUJuecHVatdiC1nFUMBJmLKYi47eInFClytDscP+aH91Ne7ICuJ2 FGqKEAdlF8U2yRKNrzddfr3ENIzx11s3S91s+HYnJVbSbPKtSw9u/Rfz7tmw/zplyQdw q6tmrgd5oaMNsvXp0TLRuwzNpOr7WwJDLoQGgFTJQlcvDD492/jjkL1sTfpxyeezRro4 cQLabfN7K0lAd5hxOrHHV01bGmlLtoRQCCFy2zSHkknir+d5gGiLgbee+/P3uoVNzb55 E8ug== X-Gm-Message-State: AOAM531Y0awyQm7bssInRw7xPrB33HWcGMxQCgnclX5CbqHiU5XrO/m2 M4eI9JS5UEeWjkvIu4EoroRdiESGoFU= X-Google-Smtp-Source: ABdhPJxxK8BDDyxACEZgMaQqTzEBW+BI6oJLCa9zxTRU/PsxmYfZdz3RIexeF2SReI8zHG5Pq/631w== X-Received: by 2002:a5d:6b45:: with SMTP id x5mr4363763wrw.415.1613657269100; Thu, 18 Feb 2021 06:07:49 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a21sm9459489wmb.5.2021.02.18.06.07.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Feb 2021 06:07:48 -0800 (PST) Message-Id: In-Reply-To: References: Date: Thu, 18 Feb 2021 14:07:30 +0000 Subject: [PATCH v4 08/17] midx: add num_large_offsets to write_midx_context Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee In an effort to align write_midx_internal() with the chunk-format API, continue to group necessary data into "struct write_midx_context". This change collects the "uint32_t num_large_offsets" into the context. With this new data, write_midx_large_offsets() now matches the chunk_write_fn type. Signed-off-by: Derrick Stolee --- midx.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/midx.c b/midx.c index cd994e333ecb..5be081f229ad 100644 --- a/midx.c +++ b/midx.c @@ -464,6 +464,7 @@ struct write_midx_context { uint32_t *pack_perm; unsigned large_offsets_needed:1; + uint32_t num_large_offsets; }; static void add_pack_to_midx(const char *full_path, size_t full_path_len, @@ -772,11 +773,14 @@ static size_t write_midx_object_offsets(struct hashfile *f, return written; } -static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_large_offset, - struct pack_midx_entry *objects, uint32_t nr_objects) +static size_t write_midx_large_offsets(struct hashfile *f, + void *data) { - struct pack_midx_entry *list = objects, *end = objects + nr_objects; + struct write_midx_context *ctx = data; + struct pack_midx_entry *list = ctx->entries; + struct pack_midx_entry *end = ctx->entries + ctx->entries_nr; size_t written = 0; + uint32_t nr_large_offset = ctx->num_large_offsets; while (nr_large_offset) { struct pack_midx_entry *obj; @@ -811,7 +815,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * uint64_t written = 0; uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1]; - uint32_t num_large_offsets = 0; struct progress *progress = NULL; int pack_name_concat_len = 0; int dropped_packs = 0; @@ -861,7 +864,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * ctx.large_offsets_needed = 0; for (i = 0; i < ctx.entries_nr; i++) { if (ctx.entries[i].offset > 0x7fffffff) - num_large_offsets++; + ctx.num_large_offsets++; if (ctx.entries[i].offset > 0xffffffff) ctx.large_offsets_needed = 1; } @@ -961,7 +964,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * cur_chunk++; chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + - num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH; + ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH; } chunk_ids[cur_chunk] = 0; @@ -1010,7 +1013,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * break; case MIDX_CHUNKID_LARGEOFFSETS: - written += write_midx_large_offsets(f, num_large_offsets, ctx.entries, ctx.entries_nr); + written += write_midx_large_offsets(f, &ctx); break; default: From patchwork Thu Feb 18 14:07:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12093673 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 336E7C433DB for ; Thu, 18 Feb 2021 16:49:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CA3F764EB1 for ; Thu, 18 Feb 2021 16:49:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233894AbhBRQtN (ORCPT ); Thu, 18 Feb 2021 11:49:13 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41528 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233581AbhBROIx (ORCPT ); Thu, 18 Feb 2021 09:08:53 -0500 Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com [IPv6:2a00:1450:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 83C96C0617AA for ; Thu, 18 Feb 2021 06:07:51 -0800 (PST) Received: by mail-wr1-x42b.google.com with SMTP id v14so3122178wro.7 for ; Thu, 18 Feb 2021 06:07:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=9rbRQ+4gSMEK2h+y1E/hg18dGvaKEJCIVpGJN9anMY4=; b=LdUnTSUSWpfbO+aUOF6oSZbwmImcMTL5/EJD5NUmTWgl2Vtnk1KKy7CtHPKib/GJIb bWt3MPOL2AMH/B3Yo8TqS2jlU9h4staUnNl5oxSxyORGjBztR7YaiGXqtEaNLLuzv1PJ ciJXNXzn5PgoSmzom48VU5vViY2CsIPr3qcrBcwQNh3hAK2B3ZWVhS9amYXjEKKgsmz7 rOODlK+nrNfpAkUumniBBguZm+FtbuKV4WK3qKdHsuxJmUh2JTKeN/bXceA7IzyDpJIQ iH4s2hVBWrAD2K9kTG6NvzDYAGzA/pHYxS6q+dWhJVjuQjj4dZFF/PCL8jxw0TH5e3Lp faHA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=9rbRQ+4gSMEK2h+y1E/hg18dGvaKEJCIVpGJN9anMY4=; b=tqYtj3Y/ZDyRhWOczAVsIIWf5FKcHAeZEpzRXp3eb/otXOxZho6Ai2jivox/xXfaUH HEuX5EWWy97yufdyWUFucBfSqlyPBv/PMfLNoBcyk8lJmE0skprdKc5t0kFuptXIpPAz yWIUlu6SMyKY4j5tPTXqBvu1bg3PTT2B9v7eR/v2JTVIs8Ohq3tBJls/RORdmWR/B6P1 bscBz+Qyoecee/wUfHBq97uAPMZmIcp8tGy+b/Ogicj+Nly4zIB2ZbFXhGUnENXS76Es i1eUhkWRQC3FhwHlb9Bf1IL2A03xcGS34xVj+SgonoMpKZy65IrbLT9eWkq6ZnOJJaeL /Ayw== X-Gm-Message-State: AOAM533xe9YXEM/+87U0GmKayRPkzcY5qb/SIKg+ABWhNMUGnpKZylQw KDpTGHP33x7+FKlU+shWRyYi8NrVxiI= X-Google-Smtp-Source: ABdhPJyfRwzw/3zkMtHgRbvc3u2Z0EWVCMP360gOjVvfZ6Zt2wRePoGHaV674qC9NWzlIi3cNvuwZg== X-Received: by 2002:a5d:6881:: with SMTP id h1mr4481976wru.384.1613657270206; Thu, 18 Feb 2021 06:07:50 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id r7sm7582156wmh.38.2021.02.18.06.07.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Feb 2021 06:07:49 -0800 (PST) Message-Id: In-Reply-To: References: Date: Thu, 18 Feb 2021 14:07:31 +0000 Subject: [PATCH v4 09/17] midx: return success/failure in chunk write methods Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee Historically, the chunk-writing methods in midx.c have returned the amount of data written so the writer method could compare this with the table of contents. This presents with some interesting issues: 1. If a chunk writing method has a bug that miscalculates the written bytes, then we can satisfy the table of contents without actually writing the right amount of data to the hashfile. The commit-graph writing code checks the hashfile struct directly for a more robust verification. 2. There is no way for a chunk writing method to gracefully fail. Returning an int presents an opportunity to fail without a die(). 3. The current pattern doesn't match chunk_write_fn type exactly, so we cannot share code with commit-graph.c For these reasons, convert the midx chunk writer methods to return an 'int'. Since none of them fail at the moment, they all return 0. Signed-off-by: Derrick Stolee --- midx.c | 63 +++++++++++++++++++++++++--------------------------------- 1 file changed, 27 insertions(+), 36 deletions(-) diff --git a/midx.c b/midx.c index 5be081f229ad..c92a6c47be01 100644 --- a/midx.c +++ b/midx.c @@ -650,7 +650,7 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, return deduplicated_entries; } -static size_t write_midx_pack_names(struct hashfile *f, void *data) +static int write_midx_pack_names(struct hashfile *f, void *data) { struct write_midx_context *ctx = data; uint32_t i; @@ -678,14 +678,13 @@ static size_t write_midx_pack_names(struct hashfile *f, void *data) if (i < MIDX_CHUNK_ALIGNMENT) { memset(padding, 0, sizeof(padding)); hashwrite(f, padding, i); - written += i; } - return written; + return 0; } -static size_t write_midx_oid_fanout(struct hashfile *f, - void *data) +static int write_midx_oid_fanout(struct hashfile *f, + void *data) { struct write_midx_context *ctx = data; struct pack_midx_entry *list = ctx->entries; @@ -710,17 +709,16 @@ static size_t write_midx_oid_fanout(struct hashfile *f, list = next; } - return MIDX_CHUNK_FANOUT_SIZE; + return 0; } -static size_t write_midx_oid_lookup(struct hashfile *f, - void *data) +static int write_midx_oid_lookup(struct hashfile *f, + void *data) { struct write_midx_context *ctx = data; unsigned char hash_len = the_hash_algo->rawsz; struct pack_midx_entry *list = ctx->entries; uint32_t i; - size_t written = 0; for (i = 0; i < ctx->entries_nr; i++) { struct pack_midx_entry *obj = list++; @@ -734,19 +732,17 @@ static size_t write_midx_oid_lookup(struct hashfile *f, } hashwrite(f, obj->oid.hash, (int)hash_len); - written += hash_len; } - return written; + return 0; } -static size_t write_midx_object_offsets(struct hashfile *f, - void *data) +static int write_midx_object_offsets(struct hashfile *f, + void *data) { struct write_midx_context *ctx = data; struct pack_midx_entry *list = ctx->entries; uint32_t i, nr_large_offset = 0; - size_t written = 0; for (i = 0; i < ctx->entries_nr; i++) { struct pack_midx_entry *obj = list++; @@ -766,20 +762,17 @@ static size_t write_midx_object_offsets(struct hashfile *f, obj->offset); else hashwrite_be32(f, (uint32_t)obj->offset); - - written += MIDX_CHUNK_OFFSET_WIDTH; } - return written; + return 0; } -static size_t write_midx_large_offsets(struct hashfile *f, - void *data) +static int write_midx_large_offsets(struct hashfile *f, + void *data) { struct write_midx_context *ctx = data; struct pack_midx_entry *list = ctx->entries; struct pack_midx_entry *end = ctx->entries + ctx->entries_nr; - size_t written = 0; uint32_t nr_large_offset = ctx->num_large_offsets; while (nr_large_offset) { @@ -795,12 +788,12 @@ static size_t write_midx_large_offsets(struct hashfile *f, if (!(offset >> 31)) continue; - written += hashwrite_be64(f, offset); + hashwrite_be64(f, offset); nr_large_offset--; } - return written; + return 0; } static int write_midx_internal(const char *object_dir, struct multi_pack_index *m, @@ -812,7 +805,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * struct hashfile *f = NULL; struct lock_file lk; struct write_midx_context ctx = { 0 }; - uint64_t written = 0; + uint64_t header_size = 0; uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1]; struct progress *progress = NULL; @@ -940,10 +933,10 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * goto cleanup; } - written = write_midx_header(f, num_chunks, ctx.nr - dropped_packs); + header_size = write_midx_header(f, num_chunks, ctx.nr - dropped_packs); chunk_ids[cur_chunk] = MIDX_CHUNKID_PACKNAMES; - chunk_offsets[cur_chunk] = written + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH; + chunk_offsets[cur_chunk] = header_size + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH; cur_chunk++; chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDFANOUT; @@ -981,39 +974,37 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * hashwrite_be32(f, chunk_ids[i]); hashwrite_be64(f, chunk_offsets[i]); - - written += MIDX_CHUNKLOOKUP_WIDTH; } if (flags & MIDX_PROGRESS) progress = start_delayed_progress(_("Writing chunks to multi-pack-index"), num_chunks); for (i = 0; i < num_chunks; i++) { - if (written != chunk_offsets[i]) + if (f->total + f->offset != chunk_offsets[i]) BUG("incorrect chunk offset (%"PRIu64" != %"PRIu64") for chunk id %"PRIx32, chunk_offsets[i], - written, + f->total + f->offset, chunk_ids[i]); switch (chunk_ids[i]) { case MIDX_CHUNKID_PACKNAMES: - written += write_midx_pack_names(f, &ctx); + write_midx_pack_names(f, &ctx); break; case MIDX_CHUNKID_OIDFANOUT: - written += write_midx_oid_fanout(f, &ctx); + write_midx_oid_fanout(f, &ctx); break; case MIDX_CHUNKID_OIDLOOKUP: - written += write_midx_oid_lookup(f, &ctx); + write_midx_oid_lookup(f, &ctx); break; case MIDX_CHUNKID_OBJECTOFFSETS: - written += write_midx_object_offsets(f, &ctx); + write_midx_object_offsets(f, &ctx); break; case MIDX_CHUNKID_LARGEOFFSETS: - written += write_midx_large_offsets(f, &ctx); + write_midx_large_offsets(f, &ctx); break; default: @@ -1025,9 +1016,9 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * } stop_progress(&progress); - if (written != chunk_offsets[num_chunks]) + if (hashfile_total(f) != chunk_offsets[num_chunks]) BUG("incorrect final offset %"PRIu64" != %"PRIu64, - written, + hashfile_total(f), chunk_offsets[num_chunks]); finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM); From patchwork Thu Feb 18 14:07:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12093675 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 93CA6C433E0 for ; Thu, 18 Feb 2021 16:51:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 46A1D64EB4 for ; Thu, 18 Feb 2021 16:51:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229652AbhBRQtv (ORCPT ); Thu, 18 Feb 2021 11:49:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41530 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233580AbhBROIx (ORCPT ); Thu, 18 Feb 2021 09:08:53 -0500 Received: from mail-wr1-x436.google.com (mail-wr1-x436.google.com [IPv6:2a00:1450:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3514BC0617AB for ; Thu, 18 Feb 2021 06:07:53 -0800 (PST) Received: by mail-wr1-x436.google.com with SMTP id g6so3096243wrs.11 for ; Thu, 18 Feb 2021 06:07:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=u7s4MNAI8c5xo75w4IpgbLg/3f46uKnaaEAr+lPMDu4=; b=smNBRVlk1m7JLlQnKviszfDOAitWCMOFUjXSpVWZfpm+Pi8bfcdmAHXuMfkdiNbl3G qttOY8OW36qUwCUWPdSc5rjRYDg3AAVKJw06IV31BGwYGb8ilkN/oJWU03jrFF/EMOAL v41ysnqJZNWRVxFUS0h6mXB/+ljIKuPXfwlZUzvzTGFGW3WLBXT9SjwN7KLe6yQ/7uS4 T1OlsKoKgez0xqAV1LxejogXCEwhml8sQqjp7XmV4q319Eh9sbowGPay+ziZWJrYI0yU wuw/SH6qRxo6LMPqgBzNLW8O7bqP92mNvP190Gm/GVVLtHfVH8anB8Gi3joHssL9Yd09 +60g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=u7s4MNAI8c5xo75w4IpgbLg/3f46uKnaaEAr+lPMDu4=; b=AmjUp1/x19V7+GM97U/Ar/uIclsoJl4zfveMbZ/g+qn37DZPhk+MZP99s9IJ3d5154 yYqb371sgkGYJkEiHworqniJmF/Yj16OufcymxTI+GV96i+G7lbS4mLOgenw0I/mdDFb 6QCeXpI+wxTvsNCF1MtGBe2SB502UXx/OTvVi3FEKRGk/foxf7blODGYr1XeMKvR4nUd 5CKhrYoObIJVvAZ8wU7T8INroy3mABSne/OwC57cZsej6H063VCKjCx4QH0rmo5ugnUR fuvazN1vpa4d21V1FtSr91Gvue7Mdy5EEksAFkhAgQj7klDWnrQsWy161bsNXOBxx90k vjRg== X-Gm-Message-State: AOAM533Q14RrgSUJQDtx+dQ9eBGmEaoxYr82Jh97RBr+7FSokvA6bX29 DaA8NZOVV4mLtYyzakmsBzsVQqh1Osk= X-Google-Smtp-Source: ABdhPJy0KRre2SEbbE5qMg0M3ZNJqjvk8XoAlUkXG+sipw30Jq5ufa84TbNem++W6z0FEw1IwxALvg== X-Received: by 2002:adf:aa08:: with SMTP id p8mr4442625wrd.232.1613657271919; Thu, 18 Feb 2021 06:07:51 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id f14sm7705001wmc.32.2021.02.18.06.07.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Feb 2021 06:07:50 -0800 (PST) Message-Id: <60e77e45180052cd859893cadc9dbe39ada5d06b.1613657259.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 18 Feb 2021 14:07:32 +0000 Subject: [PATCH v4 10/17] midx: drop chunk progress during write Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee Most expensive operations in write_midx_internal() use the context struct's progress member, and these indicate the process of the expensive operations within the chunk writing methods. However, there is a competing progress struct that counts the progress over all chunks. This is not very helpful compared to the others, so drop it. This also reduces our barriers to combining the chunk writing code with chunk-format.c. Signed-off-by: Derrick Stolee --- midx.c | 7 ------- 1 file changed, 7 deletions(-) diff --git a/midx.c b/midx.c index c92a6c47be01..4f4aa351e60e 100644 --- a/midx.c +++ b/midx.c @@ -808,7 +808,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * uint64_t header_size = 0; uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1]; - struct progress *progress = NULL; int pack_name_concat_len = 0; int dropped_packs = 0; int result = 0; @@ -976,9 +975,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * hashwrite_be64(f, chunk_offsets[i]); } - if (flags & MIDX_PROGRESS) - progress = start_delayed_progress(_("Writing chunks to multi-pack-index"), - num_chunks); for (i = 0; i < num_chunks; i++) { if (f->total + f->offset != chunk_offsets[i]) BUG("incorrect chunk offset (%"PRIu64" != %"PRIu64") for chunk id %"PRIx32, @@ -1011,10 +1007,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * BUG("trying to write unknown chunk id %"PRIx32, chunk_ids[i]); } - - display_progress(progress, i + 1); } - stop_progress(&progress); if (hashfile_total(f) != chunk_offsets[num_chunks]) BUG("incorrect final offset %"PRIu64" != %"PRIu64, From patchwork Thu Feb 18 14:07:33 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12093687 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6D3DAC433DB for ; Thu, 18 Feb 2021 16:52:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1F48864EAF for ; Thu, 18 Feb 2021 16:52:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233924AbhBRQv1 (ORCPT ); Thu, 18 Feb 2021 11:51:27 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41154 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233116AbhBROJd (ORCPT ); Thu, 18 Feb 2021 09:09:33 -0500 Received: from mail-wr1-x42a.google.com (mail-wr1-x42a.google.com [IPv6:2a00:1450:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 35A18C061356 for ; Thu, 18 Feb 2021 06:07:54 -0800 (PST) Received: by mail-wr1-x42a.google.com with SMTP id u14so3136676wri.3 for ; Thu, 18 Feb 2021 06:07:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=bw5kYP5uPIt0cxb9pH4WJLJrsVSKjkp+3wz43GO4kqw=; b=sVLeEkxrkuUiB6MqH7WI/XBilhAs4Ck0i6hl919BwyV4JI5nkElQyPsbA+R0Eu5UtF VKfJNP+kY01+Pck/BMZCJh7IM1gQVYBS6q/6q750i4BdWJJTdnzQJF32ujfN9WrSMCj6 pLhyWZUqSt3RN3iszQhPm6Ndz9BmXcvEfaidtPlg2TVtFD/avGH5BNe16ProE5DGPiAE elnDhP4144uMyivv8FjbIOPJyu4S8OGOH1nWurtW5DXO9MIhnrqw5ydXWT2HJMm+4xi2 oS7d2L1WKrjp+yY6MB6ecQd9e4S/Z8DgApVRmHectyhC6xBotQJPrKl76OSFebciDUZa NyPw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=bw5kYP5uPIt0cxb9pH4WJLJrsVSKjkp+3wz43GO4kqw=; b=mzVwPDmwMbo5+hcA7diJQxW6GLSqPNa9BZJ1l6CaVA60WR47C0FMnTOUOz8OloEcSA tBCKJLmzTLAxL3Lmb/ghMOFeHdY/3CK+1udBP+u8Eu0qTBY5fKaCK1Ev0VDgCslMcqoO v6JIfuUWPUMQArfNJXFhSLswJoNAidgUYlMFjCCHgeushlfnB75+mII29RYM9gzYXRZM 2Ahm28RVAeveghtfuBWA4ZL8bgNeJ8Zc3VV61fEYJEhohsPeQBElbmZrU924QAP9gIDc VOnt1xuqZmTZcTiLys4Rla78GcCk9Ot0nHXON3huyAE1swDXvx+R/XrZvL1tOlOTn0db KUdg== X-Gm-Message-State: AOAM531okV6gbagxer/AoBQUqbTYI2uFimQid2m6FneyOvO1JV4p1Ccw ZxXNpjqNLtO6CefZoT+wDmjb1Smxzfo= X-Google-Smtp-Source: ABdhPJzn7H9Z0fZMzzDhPbqdWP9FsetPrlhFu2jUE8sfhB20lUv5w54E7QNJswVCyfpZEVM74DYLsA== X-Received: by 2002:adf:fb91:: with SMTP id a17mr4445785wrr.93.1613657272962; Thu, 18 Feb 2021 06:07:52 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id y16sm8761850wrw.46.2021.02.18.06.07.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Feb 2021 06:07:52 -0800 (PST) Message-Id: <0b95aa7e152f01b4c33c79858d8979ffcc4eb420.1613657259.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 18 Feb 2021 14:07:33 +0000 Subject: [PATCH v4 11/17] midx: use chunk-format API in write_midx_internal() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The chunk-format API allows writing the table of contents and all chunks using the anonymous 'struct chunkfile' type. We only need to convert our local chunk logic to this API for the multi-pack-index writes to share that logic with the commit-graph file writes. Signed-off-by: Derrick Stolee --- midx.c | 106 +++++++++++---------------------------------------------- 1 file changed, 20 insertions(+), 86 deletions(-) diff --git a/midx.c b/midx.c index 4f4aa351e60e..d2fd9c10feea 100644 --- a/midx.c +++ b/midx.c @@ -11,6 +11,7 @@ #include "trace2.h" #include "run-command.h" #include "repository.h" +#include "chunk-format.h" #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */ #define MIDX_VERSION 1 @@ -21,7 +22,6 @@ #define MIDX_HEADER_SIZE 12 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + the_hash_algo->rawsz) -#define MIDX_MAX_CHUNKS 5 #define MIDX_CHUNK_ALIGNMENT 4 #define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */ #define MIDX_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */ @@ -799,18 +799,15 @@ static int write_midx_large_offsets(struct hashfile *f, static int write_midx_internal(const char *object_dir, struct multi_pack_index *m, struct string_list *packs_to_drop, unsigned flags) { - unsigned char cur_chunk, num_chunks = 0; char *midx_name; uint32_t i; struct hashfile *f = NULL; struct lock_file lk; struct write_midx_context ctx = { 0 }; - uint64_t header_size = 0; - uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; - uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1]; int pack_name_concat_len = 0; int dropped_packs = 0; int result = 0; + struct chunkfile *cf; midx_name = get_midx_filename(object_dir); if (safe_create_leading_directories(midx_name)) @@ -923,98 +920,35 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * if (ctx.m) close_midx(ctx.m); - cur_chunk = 0; - num_chunks = ctx.large_offsets_needed ? 5 : 4; - if (ctx.nr - dropped_packs == 0) { error(_("no pack files to index.")); result = 1; goto cleanup; } - header_size = write_midx_header(f, num_chunks, ctx.nr - dropped_packs); - - chunk_ids[cur_chunk] = MIDX_CHUNKID_PACKNAMES; - chunk_offsets[cur_chunk] = header_size + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH; - - cur_chunk++; - chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDFANOUT; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + pack_name_concat_len; - - cur_chunk++; - chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + MIDX_CHUNK_FANOUT_SIZE; - - cur_chunk++; - chunk_ids[cur_chunk] = MIDX_CHUNKID_OBJECTOFFSETS; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * the_hash_algo->rawsz; - - cur_chunk++; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH; - if (ctx.large_offsets_needed) { - chunk_ids[cur_chunk] = MIDX_CHUNKID_LARGEOFFSETS; - - cur_chunk++; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + - ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH; - } - - chunk_ids[cur_chunk] = 0; - - for (i = 0; i <= num_chunks; i++) { - if (i && chunk_offsets[i] < chunk_offsets[i - 1]) - BUG("incorrect chunk offsets: %"PRIu64" before %"PRIu64, - chunk_offsets[i - 1], - chunk_offsets[i]); - - if (chunk_offsets[i] % MIDX_CHUNK_ALIGNMENT) - BUG("chunk offset %"PRIu64" is not properly aligned", - chunk_offsets[i]); - - hashwrite_be32(f, chunk_ids[i]); - hashwrite_be64(f, chunk_offsets[i]); - } - - for (i = 0; i < num_chunks; i++) { - if (f->total + f->offset != chunk_offsets[i]) - BUG("incorrect chunk offset (%"PRIu64" != %"PRIu64") for chunk id %"PRIx32, - chunk_offsets[i], - f->total + f->offset, - chunk_ids[i]); + cf = init_chunkfile(f); - switch (chunk_ids[i]) { - case MIDX_CHUNKID_PACKNAMES: - write_midx_pack_names(f, &ctx); - break; + add_chunk(cf, MIDX_CHUNKID_PACKNAMES, pack_name_concat_len, + write_midx_pack_names); + add_chunk(cf, MIDX_CHUNKID_OIDFANOUT, MIDX_CHUNK_FANOUT_SIZE, + write_midx_oid_fanout); + add_chunk(cf, MIDX_CHUNKID_OIDLOOKUP, + ctx.entries_nr * the_hash_algo->rawsz, + write_midx_oid_lookup); + add_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS, + ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH, + write_midx_object_offsets); - case MIDX_CHUNKID_OIDFANOUT: - write_midx_oid_fanout(f, &ctx); - break; - - case MIDX_CHUNKID_OIDLOOKUP: - write_midx_oid_lookup(f, &ctx); - break; - - case MIDX_CHUNKID_OBJECTOFFSETS: - write_midx_object_offsets(f, &ctx); - break; - - case MIDX_CHUNKID_LARGEOFFSETS: - write_midx_large_offsets(f, &ctx); - break; - - default: - BUG("trying to write unknown chunk id %"PRIx32, - chunk_ids[i]); - } - } + if (ctx.large_offsets_needed) + add_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, + ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH, + write_midx_large_offsets); - if (hashfile_total(f) != chunk_offsets[num_chunks]) - BUG("incorrect final offset %"PRIu64" != %"PRIu64, - hashfile_total(f), - chunk_offsets[num_chunks]); + write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs); + write_chunkfile(cf, &ctx); finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM); + free_chunkfile(cf); commit_lock_file(&lk); cleanup: From patchwork Thu Feb 18 14:07:34 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12093689 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8E1F1C433E0 for ; Thu, 18 Feb 2021 16:52:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4489A64EAD for ; Thu, 18 Feb 2021 16:52:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233501AbhBRQwN (ORCPT ); Thu, 18 Feb 2021 11:52:13 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41750 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230108AbhBROJ5 (ORCPT ); Thu, 18 Feb 2021 09:09:57 -0500 Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com [IPv6:2a00:1450:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5BABBC06121C for ; Thu, 18 Feb 2021 06:07:55 -0800 (PST) Received: by mail-wr1-x42b.google.com with SMTP id v1so3124680wrd.6 for ; Thu, 18 Feb 2021 06:07:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=L7txn7xvbO69aI6sRhsV1+Kb5QylibPZGK1hohssVBA=; b=PE/rKbXroSCE4qp2rZ/mBozrWr7zTetWF1l4uT1jAuLgQ6SRkVWzgS4s1x1g7ItOxb 9Nj8pC4cM3rF8i9CbX9qjuM60oPd5aj0DWbKbIE3oFCAGpOWzEONzWkSaJXRIgKg4J/l OdDwt/dSLl1zgdEO0XS/JGX6wAOUTieTcBsrHJYRNYtLR4JvLSjLn6Y9xBv9ZJjHVenW ctt0ywzHlHQnlK8wsPQvG97Cy6WhzkFcSy16t4blgLd0rACl1B9N/sLhkCdYiSj6BzDa Ol133xwSU33PWpJTeFSaxo7EcEEq9kN62LmagVrslQa0U0SaOogB101R5HDgtoM4EA2c qvyQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=L7txn7xvbO69aI6sRhsV1+Kb5QylibPZGK1hohssVBA=; b=Uhy8fZDfvIDvuEwKLhM4FGsAcSkluNT3sXbUNxua3sofTSvdkAWkfoeYLmy+k9NBRs amzSQBZ4CyUMc43ENOgIhV4ob2P+/0rdWJxQx+fPXxrJcDuO2sg1y34NkRufqJYZDriZ hAvWw+86xz6F+fF+7q5U2fVu0Z+AfYJoil2y+71mKzXY7vikh4H87DjCirLffyGwsrA3 c/hilDWB+bSpfckFwTtFu7FKJkv6zKdx0vS+kFIEguM7YIvq0PczZrrvn6/co2UmbWBb 4M2Yau22yBAjp4MKaB3JfOWqqcOD3ppUpz7kxn8SdedU1D14d22l715Zw+wqhlYBOPDZ AOuQ== X-Gm-Message-State: AOAM5320AktSGz2/xne7+MAAJx5Xn+uhzY3t8vlOxR6LhJfqRgbsVEgx E58w5CwgwPD1JJg07XOuWVikqnm+iuw= X-Google-Smtp-Source: ABdhPJz09O/DUqAARdUa+HkWuMdOQ6/iNb+CKieWfLoP46dfswqRP4memvTOyIWXISLiApFdyI9scA== X-Received: by 2002:a5d:4582:: with SMTP id p2mr4536074wrq.126.1613657274042; Thu, 18 Feb 2021 06:07:54 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id z2sm7461715wmf.46.2021.02.18.06.07.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Feb 2021 06:07:53 -0800 (PST) Message-Id: <3e0dbc45ce7ff5f1bda63ae8b8b45343b790417b.1613657259.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 18 Feb 2021 14:07:34 +0000 Subject: [PATCH v4 12/17] chunk-format: create read chunk API Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee Add the capability to read the table of contents, then pair the chunks with necessary logic using read_chunk_fn pointers. Callers will be added in future changes, but the typical outline will be: 1. initialize a 'struct chunkfile' with init_chunkfile(NULL). 2. call read_table_of_contents(). 3. for each chunk to parse, a. call pair_chunk() to assign a pointer with the chunk position, or b. call read_chunk() to run a callback on the chunk start and size. 4. call free_chunkfile() to clear the 'struct chunkfile' data. We are re-using the anonymous 'struct chunkfile' data, as it is internal to the chunk-format API. This gives it essentially two modes: write and read. If the same struct instance was used for both reads and writes, then there would be failures. Helped-by: Junio C Hamano Signed-off-by: Derrick Stolee --- chunk-format.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++ chunk-format.h | 47 +++++++++++++++++++++++++++++ 2 files changed, 127 insertions(+) diff --git a/chunk-format.c b/chunk-format.c index 6c9b52b70c10..2c1fecf1c3e5 100644 --- a/chunk-format.c +++ b/chunk-format.c @@ -11,6 +11,8 @@ struct chunk_info { uint32_t id; uint64_t size; chunk_write_fn write_fn; + + const void *start; }; struct chunkfile { @@ -88,3 +90,81 @@ int write_chunkfile(struct chunkfile *cf, void *data) return 0; } + +int read_table_of_contents(struct chunkfile *cf, + const unsigned char *mfile, + size_t mfile_size, + uint64_t toc_offset, + int toc_length) +{ + uint32_t chunk_id; + const unsigned char *table_of_contents = mfile + toc_offset; + + ALLOC_GROW(cf->chunks, toc_length, cf->chunks_alloc); + + while (toc_length--) { + uint64_t chunk_offset, next_chunk_offset; + + chunk_id = get_be32(table_of_contents); + chunk_offset = get_be64(table_of_contents + 4); + + if (!chunk_id) { + error(_("terminating chunk id appears earlier than expected")); + return 1; + } + + table_of_contents += CHUNK_TOC_ENTRY_SIZE; + next_chunk_offset = get_be64(table_of_contents + 4); + + if (next_chunk_offset < chunk_offset || + next_chunk_offset > mfile_size - the_hash_algo->rawsz) { + error(_("improper chunk offset(s) %"PRIx64" and %"PRIx64""), + chunk_offset, next_chunk_offset); + return -1; + } + + cf->chunks[cf->chunks_nr].id = chunk_id; + cf->chunks[cf->chunks_nr].start = mfile + chunk_offset; + cf->chunks[cf->chunks_nr].size = next_chunk_offset - chunk_offset; + cf->chunks_nr++; + } + + chunk_id = get_be32(table_of_contents); + if (chunk_id) { + error(_("final chunk has non-zero id %"PRIx32""), chunk_id); + return -1; + } + + return 0; +} + +static int pair_chunk_fn(const unsigned char *chunk_start, + size_t chunk_size, + void *data) +{ + const unsigned char **p = data; + *p = chunk_start; + return 0; +} + +int pair_chunk(struct chunkfile *cf, + uint32_t chunk_id, + const unsigned char **p) +{ + return read_chunk(cf, chunk_id, pair_chunk_fn, p); +} + +int read_chunk(struct chunkfile *cf, + uint32_t chunk_id, + chunk_read_fn fn, + void *data) +{ + int i; + + for (i = 0; i < cf->chunks_nr; i++) { + if (cf->chunks[i].id == chunk_id) + return fn(cf->chunks[i].start, cf->chunks[i].size, data); + } + + return CHUNK_NOT_FOUND; +} diff --git a/chunk-format.h b/chunk-format.h index ce598b66d9f8..9ccbe0037792 100644 --- a/chunk-format.h +++ b/chunk-format.h @@ -8,6 +8,20 @@ struct chunkfile; #define CHUNK_TOC_ENTRY_SIZE (sizeof(uint32_t) + sizeof(uint64_t)) +/* + * Initialize a 'struct chunkfile' for writing _or_ reading a file + * with the chunk format. + * + * If writing a file, supply a non-NULL 'struct hashfile *' that will + * be used to write. + * + * If reading a file, use a NULL 'struct hashfile *' and then call + * read_table_of_contents(). Supply the memory-mapped data to the + * pair_chunk() or read_chunk() methods, as appropriate. + * + * DO NOT MIX THESE MODES. Use different 'struct chunkfile' instances + * for reading and writing. + */ struct chunkfile *init_chunkfile(struct hashfile *f); void free_chunkfile(struct chunkfile *cf); int get_num_chunks(struct chunkfile *cf); @@ -18,4 +32,37 @@ void add_chunk(struct chunkfile *cf, chunk_write_fn fn); int write_chunkfile(struct chunkfile *cf, void *data); +int read_table_of_contents(struct chunkfile *cf, + const unsigned char *mfile, + size_t mfile_size, + uint64_t toc_offset, + int toc_length); + +#define CHUNK_NOT_FOUND (-2) + +/* + * Find 'chunk_id' in the given chunkfile and assign the + * given pointer to the position in the mmap'd file where + * that chunk begins. + * + * Returns CHUNK_NOT_FOUND if the chunk does not exist. + */ +int pair_chunk(struct chunkfile *cf, + uint32_t chunk_id, + const unsigned char **p); + +typedef int (*chunk_read_fn)(const unsigned char *chunk_start, + size_t chunk_size, void *data); +/* + * Find 'chunk_id' in the given chunkfile and call the + * given chunk_read_fn method with the information for + * that chunk. + * + * Returns CHUNK_NOT_FOUND if the chunk does not exist. + */ +int read_chunk(struct chunkfile *cf, + uint32_t chunk_id, + chunk_read_fn fn, + void *data); + #endif From patchwork Thu Feb 18 14:07:35 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12093697 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B5EC3C433E0 for ; Thu, 18 Feb 2021 16:54:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 69A4064EAD for ; Thu, 18 Feb 2021 16:54:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233969AbhBRQx0 (ORCPT ); Thu, 18 Feb 2021 11:53:26 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41412 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232842AbhBROKo (ORCPT ); Thu, 18 Feb 2021 09:10:44 -0500 Received: from mail-wm1-x330.google.com (mail-wm1-x330.google.com [IPv6:2a00:1450:4864:20::330]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9E07FC06121D for ; Thu, 18 Feb 2021 06:07:56 -0800 (PST) Received: by mail-wm1-x330.google.com with SMTP id n10so4085354wmq.0 for ; Thu, 18 Feb 2021 06:07:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=vzmS8yzoSmi05koRWtYpOHFxzzJ97VIl5onOob4HIG8=; b=oTt7/7fqFkLr6R3w9lNgVmOxDFXEm3LovTIHeKIsmZ3Jgmtb2/QoH1XlYIdnB9Zoil 8FwqepURG9jCpFP44XpSzmYiYoXcEu4YbmgmzrfZvC4nTwR//jimvy0qK/4Mk3r0HZ19 ykbHIBfKuKu/g77kqcTUN45Mc+qireQx+p7s4VSzUS+fMsGv0ESnKh5lsoiRngHQ9ALW /t3dgon4bJDKlhACDShg8rTgKMPnbMIJCbnFl7EKd3n4MR8rNe1ezbnesd/5KUbRy3o2 67PsIFAOojOW0qeDD+HutLiOv6mbUup5ZttvR1hJAtgqO4KITxbyey+LbIjcAvlSMbIa ssEw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=vzmS8yzoSmi05koRWtYpOHFxzzJ97VIl5onOob4HIG8=; b=lC6QkAcZJw92B3cIv2wGziy9Isx29W0KzWbidaO8RzbDxMzw5vyfWydFulmEGwY7uT hpKc4lQvfiR7jK0O/Y+9llvqtyHUonAH96Ol6ABoWhUVndnHn5+1oCkyJgyisqlzhNcD +BR2uOBLtMB/4s8EqsC/K7WVWf1wbQF5ah7os/qc/pFnOPGS6RQXrcwTnQ0kbV7tkkyU q8FH46mh9X6pr7rFP57sCT94jIMTRkZ18wCwx30fW8D8u+qz4Lum0f1HrOWEAtFA4Bbm ek2NOYw+Qny0SJs5uC04YhscngF1kqVvcO//knS6jsUsc9wrOLSFZB5Ouvhl6pdNZWWs gIyg== X-Gm-Message-State: AOAM531146Op2ypj6qu2jUA8+UoDow//9wC195LQGdlGqvuebhvwENId v1PHb3LTAlVBDKrAuv23Ndkgbz7Ennc= X-Google-Smtp-Source: ABdhPJwxZsMNHXHJ6COoWtzC9qomKiH5Iyhgjrr6LpCzHFRP/PHDKmooC0Ats14MRb+UeK7HQwYY+w== X-Received: by 2002:a1c:7301:: with SMTP id d1mr3766857wmb.33.1613657275307; Thu, 18 Feb 2021 06:07:55 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id r7sm7582451wmh.38.2021.02.18.06.07.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Feb 2021 06:07:54 -0800 (PST) Message-Id: <87f441fc5e154c95494f3f1219fb0e0c7d9b3244.1613657259.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 18 Feb 2021 14:07:35 +0000 Subject: [PATCH v4 13/17] commit-graph: use chunk-format read API Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee Instead of parsing the table of contents directly, use the chunk-format API methods read_table_of_contents() and pair_chunk(). While the current implementation loses the duplicate-chunk detection, that will be added in a future change. Signed-off-by: Derrick Stolee --- commit-graph.c | 159 ++++++++++++++-------------------------- t/t5318-commit-graph.sh | 2 +- 2 files changed, 55 insertions(+), 106 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index a889130cc849..76514a879e88 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -59,8 +59,7 @@ void git_test_write_commit_graph_or_die(void) #define GRAPH_HEADER_SIZE 8 #define GRAPH_FANOUT_SIZE (4 * 256) -#define GRAPH_CHUNKLOOKUP_WIDTH 12 -#define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \ +#define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * CHUNK_TOC_ENTRY_SIZE \ + GRAPH_FANOUT_SIZE + the_hash_algo->rawsz) #define CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW (1ULL << 31) @@ -298,15 +297,43 @@ static int verify_commit_graph_lite(struct commit_graph *g) return 0; } +static int graph_read_oid_lookup(const unsigned char *chunk_start, + size_t chunk_size, void *data) +{ + struct commit_graph *g = data; + g->chunk_oid_lookup = chunk_start; + g->num_commits = chunk_size / g->hash_len; + return 0; +} + +static int graph_read_bloom_data(const unsigned char *chunk_start, + size_t chunk_size, void *data) +{ + struct commit_graph *g = data; + uint32_t hash_version; + g->chunk_bloom_data = chunk_start; + hash_version = get_be32(chunk_start); + + if (hash_version != 1) + return 0; + + g->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings)); + g->bloom_filter_settings->hash_version = hash_version; + g->bloom_filter_settings->num_hashes = get_be32(chunk_start + 4); + g->bloom_filter_settings->bits_per_entry = get_be32(chunk_start + 8); + g->bloom_filter_settings->max_changed_paths = DEFAULT_BLOOM_MAX_CHANGES; + + return 0; +} + struct commit_graph *parse_commit_graph(struct repository *r, void *graph_map, size_t graph_size) { - const unsigned char *data, *chunk_lookup; - uint32_t i; + const unsigned char *data; struct commit_graph *graph; - uint64_t next_chunk_offset; uint32_t graph_signature; unsigned char graph_version, hash_version; + struct chunkfile *cf = NULL; if (!graph_map) return NULL; @@ -347,7 +374,7 @@ struct commit_graph *parse_commit_graph(struct repository *r, graph->data_len = graph_size; if (graph_size < GRAPH_HEADER_SIZE + - (graph->num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH + + (graph->num_chunks + 1) * CHUNK_TOC_ENTRY_SIZE + GRAPH_FANOUT_SIZE + the_hash_algo->rawsz) { error(_("commit-graph file is too small to hold %u chunks"), graph->num_chunks); @@ -355,108 +382,28 @@ struct commit_graph *parse_commit_graph(struct repository *r, return NULL; } - chunk_lookup = data + 8; - next_chunk_offset = get_be64(chunk_lookup + 4); - for (i = 0; i < graph->num_chunks; i++) { - uint32_t chunk_id; - uint64_t chunk_offset = next_chunk_offset; - int chunk_repeated = 0; - - chunk_id = get_be32(chunk_lookup + 0); - - chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH; - next_chunk_offset = get_be64(chunk_lookup + 4); - - if (chunk_offset > graph_size - the_hash_algo->rawsz) { - error(_("commit-graph improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32), - (uint32_t)chunk_offset); - goto free_and_return; - } - - switch (chunk_id) { - case GRAPH_CHUNKID_OIDFANOUT: - if (graph->chunk_oid_fanout) - chunk_repeated = 1; - else - graph->chunk_oid_fanout = (uint32_t*)(data + chunk_offset); - break; - - case GRAPH_CHUNKID_OIDLOOKUP: - if (graph->chunk_oid_lookup) - chunk_repeated = 1; - else { - graph->chunk_oid_lookup = data + chunk_offset; - graph->num_commits = (next_chunk_offset - chunk_offset) - / graph->hash_len; - } - break; + cf = init_chunkfile(NULL); - case GRAPH_CHUNKID_DATA: - if (graph->chunk_commit_data) - chunk_repeated = 1; - else - graph->chunk_commit_data = data + chunk_offset; - break; - - case GRAPH_CHUNKID_GENERATION_DATA: - if (graph->chunk_generation_data) - chunk_repeated = 1; - else - graph->chunk_generation_data = data + chunk_offset; - break; - - case GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW: - if (graph->chunk_generation_data_overflow) - chunk_repeated = 1; - else - graph->chunk_generation_data_overflow = data + chunk_offset; - break; - - case GRAPH_CHUNKID_EXTRAEDGES: - if (graph->chunk_extra_edges) - chunk_repeated = 1; - else - graph->chunk_extra_edges = data + chunk_offset; - break; - - case GRAPH_CHUNKID_BASE: - if (graph->chunk_base_graphs) - chunk_repeated = 1; - else - graph->chunk_base_graphs = data + chunk_offset; - break; - - case GRAPH_CHUNKID_BLOOMINDEXES: - if (graph->chunk_bloom_indexes) - chunk_repeated = 1; - else if (r->settings.commit_graph_read_changed_paths) - graph->chunk_bloom_indexes = data + chunk_offset; - break; - - case GRAPH_CHUNKID_BLOOMDATA: - if (graph->chunk_bloom_data) - chunk_repeated = 1; - else if (r->settings.commit_graph_read_changed_paths) { - uint32_t hash_version; - graph->chunk_bloom_data = data + chunk_offset; - hash_version = get_be32(data + chunk_offset); - - if (hash_version != 1) - break; + if (read_table_of_contents(cf, graph->data, graph_size, + GRAPH_HEADER_SIZE, graph->num_chunks)) + goto free_and_return; - graph->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings)); - graph->bloom_filter_settings->hash_version = hash_version; - graph->bloom_filter_settings->num_hashes = get_be32(data + chunk_offset + 4); - graph->bloom_filter_settings->bits_per_entry = get_be32(data + chunk_offset + 8); - graph->bloom_filter_settings->max_changed_paths = DEFAULT_BLOOM_MAX_CHANGES; - } - break; - } + pair_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, + (const unsigned char **)&graph->chunk_oid_fanout); + read_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph); + pair_chunk(cf, GRAPH_CHUNKID_DATA, &graph->chunk_commit_data); + pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges); + pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs); + pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA, + &graph->chunk_generation_data); + pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW, + &graph->chunk_generation_data_overflow); - if (chunk_repeated) { - error(_("commit-graph chunk id %08x appears multiple times"), chunk_id); - goto free_and_return; - } + if (r->settings.commit_graph_read_changed_paths) { + pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES, + &graph->chunk_bloom_indexes); + read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA, + graph_read_bloom_data, graph); } if (graph->chunk_bloom_indexes && graph->chunk_bloom_data) { @@ -473,9 +420,11 @@ struct commit_graph *parse_commit_graph(struct repository *r, if (verify_commit_graph_lite(graph)) goto free_and_return; + free_chunkfile(cf); return graph; free_and_return: + free_chunkfile(cf); free(graph->bloom_filter_settings); free(graph); return NULL; diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index fa27df579a57..c7da741284e5 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -564,7 +564,7 @@ test_expect_success 'detect bad hash version' ' test_expect_success 'detect low chunk count' ' corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\01" \ - "missing the .* chunk" + "final chunk has non-zero id" ' test_expect_success 'detect missing OID fanout chunk' ' From patchwork Thu Feb 18 14:07:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12093695 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0801FC433DB for ; Thu, 18 Feb 2021 16:54:14 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A9EA664DFF for ; Thu, 18 Feb 2021 16:54:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233958AbhBRQxI (ORCPT ); Thu, 18 Feb 2021 11:53:08 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41420 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233045AbhBROKo (ORCPT ); Thu, 18 Feb 2021 09:10:44 -0500 Received: from mail-wm1-x335.google.com (mail-wm1-x335.google.com [IPv6:2a00:1450:4864:20::335]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5702BC06121E for ; Thu, 18 Feb 2021 06:07:57 -0800 (PST) Received: by mail-wm1-x335.google.com with SMTP id o10so2856651wmc.1 for ; Thu, 18 Feb 2021 06:07:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=1PgZT6EaPlfpce1Ua+iT0ZDtcNGb1Q1tQze+1iKn424=; b=SKzem4ODfNr3fEkwSuN1E9DXYMXwVMgraB3eOYSDtYCXQHeyWLUuRlNX81P3KbuX9M MaOsJUGPsOdzy3k29xF3YG55fsxjjNFrw/Ckah1cu6VFw3DBOAY67qkb+Zk8F/h2G8yL y9CaunQiQ6a0k0GNLOdA4qim85iM5jnQBoiwzAPRCpnEyVdaUu8BNc/pdks+MCZXaLmK 1zJy6YuoOGFjlL7PBWvNMuTS73gUHX1OnAGYbdOiVDxGiILK7f6FEenyY5cPwgof+b31 VkEtIdV0KLXtxiTmR+EXZeJmKXucxESUkEH/nxpz7RD81rJRHqy170dc9fiNxPckNkPW KijA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=1PgZT6EaPlfpce1Ua+iT0ZDtcNGb1Q1tQze+1iKn424=; b=jqIKIULVLWyrkP9dPPel6aSfc6Rr85FPBYBT4FJK8DLf/LSYmK9GpBO6zDniJNISTJ h7Ob3bwaFyDGsOm9UST2eoBVEnMjKGzD2IMa0saA11FdQkDr+WFFva+ateanupM0hEH6 GwmOC2+yenaGePa9GvLxAHvhA7CtAwgTECw0f2fxgEkj0j7EwggBE2XCJKtOED9/7XHO r92GDm7E/SKoaf3kuKnDJueLEhkfMdPUAFvaHga2bvLNhruXD0vE/37vPAbpV5mifNEK 1Jm4Ol/lWV4kSA8R+aOYdEZmyvog2T5kK1zKGarHmvoBMvGCTd2IwVhZ/x3NJ+APJ7oa mY+w== X-Gm-Message-State: AOAM532+zlJDqly2WoPDUdf0ZtPuQtYULTP9j1rumvWI57LxEs0wu/DP kHzf3b2DQk1G35ZiBlLj4QdzXAgXK/Q= X-Google-Smtp-Source: ABdhPJxNa3L0xcRE7g9Inde45ov1IzJxVBhqs1lkvlOUcMMdTHJwlUx9esUlaSO28lsqsJHR1ZbseA== X-Received: by 2002:a1c:4b05:: with SMTP id y5mr3800275wma.37.1613657276032; Thu, 18 Feb 2021 06:07:56 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id z8sm8786971wrr.55.2021.02.18.06.07.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Feb 2021 06:07:55 -0800 (PST) Message-Id: <64010031db0e44c8c040fa1a53f568574dd5a940.1613657259.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 18 Feb 2021 14:07:36 +0000 Subject: [PATCH v4 14/17] midx: use chunk-format read API Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee Instead of parsing the table of contents directly, use the chunk-format API methods read_table_of_contents() and pair_chunk(). In particular, we can use the return value of pair_chunk() to generate an error when a required chunk is missing. Signed-off-by: Derrick Stolee --- midx.c | 73 +++++++++++++------------------------ t/t5319-multi-pack-index.sh | 6 +-- 2 files changed, 29 insertions(+), 50 deletions(-) diff --git a/midx.c b/midx.c index d2fd9c10feea..d7ea0d1375fa 100644 --- a/midx.c +++ b/midx.c @@ -28,7 +28,6 @@ #define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */ #define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */ #define MIDX_CHUNKID_LARGEOFFSETS 0x4c4f4646 /* "LOFF" */ -#define MIDX_CHUNKLOOKUP_WIDTH (sizeof(uint32_t) + sizeof(uint64_t)) #define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256) #define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t)) #define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t)) @@ -53,6 +52,19 @@ static char *get_midx_filename(const char *object_dir) return xstrfmt("%s/pack/multi-pack-index", object_dir); } +static int midx_read_oid_fanout(const unsigned char *chunk_start, + size_t chunk_size, void *data) +{ + struct multi_pack_index *m = data; + m->chunk_oid_fanout = (uint32_t *)chunk_start; + + if (chunk_size != 4 * 256) { + error(_("multi-pack-index OID fanout is of the wrong size")); + return 1; + } + return 0; +} + struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local) { struct multi_pack_index *m = NULL; @@ -64,6 +76,7 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local char *midx_name = get_midx_filename(object_dir); uint32_t i; const char *cur_pack_name; + struct chunkfile *cf = NULL; fd = git_open(midx_name); @@ -113,58 +126,23 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local m->num_packs = get_be32(m->data + MIDX_BYTE_NUM_PACKS); - for (i = 0; i < m->num_chunks; i++) { - uint32_t chunk_id = get_be32(m->data + MIDX_HEADER_SIZE + - MIDX_CHUNKLOOKUP_WIDTH * i); - uint64_t chunk_offset = get_be64(m->data + MIDX_HEADER_SIZE + 4 + - MIDX_CHUNKLOOKUP_WIDTH * i); - - if (chunk_offset >= m->data_len) - die(_("invalid chunk offset (too large)")); - - switch (chunk_id) { - case MIDX_CHUNKID_PACKNAMES: - m->chunk_pack_names = m->data + chunk_offset; - break; - - case MIDX_CHUNKID_OIDFANOUT: - m->chunk_oid_fanout = (uint32_t *)(m->data + chunk_offset); - break; - - case MIDX_CHUNKID_OIDLOOKUP: - m->chunk_oid_lookup = m->data + chunk_offset; - break; - - case MIDX_CHUNKID_OBJECTOFFSETS: - m->chunk_object_offsets = m->data + chunk_offset; - break; - - case MIDX_CHUNKID_LARGEOFFSETS: - m->chunk_large_offsets = m->data + chunk_offset; - break; - - case 0: - die(_("terminating multi-pack-index chunk id appears earlier than expected")); - break; - - default: - /* - * Do nothing on unrecognized chunks, allowing future - * extensions to add optional chunks. - */ - break; - } - } + cf = init_chunkfile(NULL); - if (!m->chunk_pack_names) + if (read_table_of_contents(cf, m->data, midx_size, + MIDX_HEADER_SIZE, m->num_chunks)) + goto cleanup_fail; + + if (pair_chunk(cf, MIDX_CHUNKID_PACKNAMES, &m->chunk_pack_names) == CHUNK_NOT_FOUND) die(_("multi-pack-index missing required pack-name chunk")); - if (!m->chunk_oid_fanout) + if (read_chunk(cf, MIDX_CHUNKID_OIDFANOUT, midx_read_oid_fanout, m) == CHUNK_NOT_FOUND) die(_("multi-pack-index missing required OID fanout chunk")); - if (!m->chunk_oid_lookup) + if (pair_chunk(cf, MIDX_CHUNKID_OIDLOOKUP, &m->chunk_oid_lookup) == CHUNK_NOT_FOUND) die(_("multi-pack-index missing required OID lookup chunk")); - if (!m->chunk_object_offsets) + if (pair_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS, &m->chunk_object_offsets) == CHUNK_NOT_FOUND) die(_("multi-pack-index missing required object offsets chunk")); + pair_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, &m->chunk_large_offsets); + m->num_objects = ntohl(m->chunk_oid_fanout[255]); m->pack_names = xcalloc(m->num_packs, sizeof(*m->pack_names)); @@ -190,6 +168,7 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local cleanup_fail: free(m); free(midx_name); + free(cf); if (midx_map) munmap(midx_map, midx_size); if (0 <= fd) diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 297de502a94f..ad4e878b65b8 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -314,12 +314,12 @@ test_expect_success 'verify bad OID version' ' test_expect_success 'verify truncated chunk count' ' corrupt_midx_and_verify $MIDX_BYTE_CHUNK_COUNT "\01" $objdir \ - "missing required" + "final chunk has non-zero id" ' test_expect_success 'verify extended chunk count' ' corrupt_midx_and_verify $MIDX_BYTE_CHUNK_COUNT "\07" $objdir \ - "terminating multi-pack-index chunk id appears earlier than expected" + "terminating chunk id appears earlier than expected" ' test_expect_success 'verify missing required chunk' ' @@ -329,7 +329,7 @@ test_expect_success 'verify missing required chunk' ' test_expect_success 'verify invalid chunk offset' ' corrupt_midx_and_verify $MIDX_BYTE_CHUNK_OFFSET "\01" $objdir \ - "invalid chunk offset (too large)" + "improper chunk offset(s)" ' test_expect_success 'verify packnames out of order' ' From patchwork Thu Feb 18 14:07:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12093693 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AD942C433DB for ; Thu, 18 Feb 2021 16:54:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5A2FA64DFF for ; Thu, 18 Feb 2021 16:54:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233945AbhBRQwx (ORCPT ); Thu, 18 Feb 2021 11:52:53 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41424 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233213AbhBROKo (ORCPT ); Thu, 18 Feb 2021 09:10:44 -0500 Received: from mail-wr1-x429.google.com (mail-wr1-x429.google.com [IPv6:2a00:1450:4864:20::429]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 978F2C06121F for ; Thu, 18 Feb 2021 06:07:58 -0800 (PST) Received: by mail-wr1-x429.google.com with SMTP id t15so3062453wrx.13 for ; Thu, 18 Feb 2021 06:07:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=yscnHumNb1Jj70TIpFXz3x1VXJQ6sM+kbPFEjpzuA5k=; b=UYg/1F5DauVyr0r0lVjTB58OqFj35k2QCm07WseMaXB1sFrhw08lJT+lJAfd05Z9hh NoArpIf79UlUtmkWW5c0PTvS2UPKVtx6A1XWv7d5sYgp9lo5wbOqIYbph4Iza/SdyL+L kWCnaz/J+y9eUMhuL1Vf8xOX1gheMM7uvO08Be1YOW5oU1rL+E7enWJY+HZZwFQ0DUPD hjY1jdh5b4P+g5Zek+8c0jNirGdXqdRG+ibnTEDOpUo5dG/kYm/2t7xOV3LPiDZ+wk4f nB3L/D/w07WLkggSpllVcDBz2GTXxRYZoEm5DLD/qh1mxyEu+IN6q0AGbKYvGY3sDx6L p8nw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=yscnHumNb1Jj70TIpFXz3x1VXJQ6sM+kbPFEjpzuA5k=; b=IBhsKSBoGPsjFfdwhgVLCRG/RxJPiYjlbUMNoyq3UT61E66Uip9gaMd+nOBVCf1Pbm Kr3ymDVVhc+pko9VKvRb/L+7ZsiDt7/3mAMhJs1Y+yg4b6RrIUiCo82769wgcV51criH 6DXF8HjeOSn0oMtH+ofUkTt34FP/MOUVvPLvdsG2CAaTs8IVbQKo9/WGNUsy2C6/+J41 x7TroC7hdsHq28sT7DQtmPQzdqv9I+M+VoTxwb9L4y5CAfijqUnb/gzJCZYxGZjnakCw KCn55PzXZvmUMZVNx70mTb6aKpivcNwbDVpNgMTIFtYNGVzm0/YPPB4TGOATkrwK30D8 KrXQ== X-Gm-Message-State: AOAM531mLj696Hw8FMbm+aES7Is32pmK/X03xPveitIvGadyDmBmVeq0 UnackcLlPb+dUn6fI9vBa1lmQF9J9Vs= X-Google-Smtp-Source: ABdhPJxIqQcFzCugH6R8ukUiH40zF4KxwkykU8Vh3LR9J2C7bI6QJONTDtkw1cB/XvX805X/kuCGow== X-Received: by 2002:adf:b749:: with SMTP id n9mr4462127wre.267.1613657277333; Thu, 18 Feb 2021 06:07:57 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id f7sm10582879wrw.81.2021.02.18.06.07.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Feb 2021 06:07:56 -0800 (PST) Message-Id: <16205d8f8c408ffe17da366812483e1575700d7d.1613657260.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 18 Feb 2021 14:07:37 +0000 Subject: [PATCH v4 15/17] midx: use 64-bit multiplication for chunk sizes Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee When calculating the sizes of certain chunks, we should use 64-bit multiplication always. This allows us to properly predict the chunk sizes without risk of overflow. Other possible overflows were discovered by evaluating each multiplication in midx.c and ensuring that at least one side of the operator was of type size_t or off_t. Signed-off-by: Derrick Stolee --- midx.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/midx.c b/midx.c index d7ea0d1375fa..5c7f2ed23330 100644 --- a/midx.c +++ b/midx.c @@ -244,7 +244,7 @@ static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos) const unsigned char *offset_data; uint32_t offset32; - offset_data = m->chunk_object_offsets + pos * MIDX_CHUNK_OFFSET_WIDTH; + offset_data = m->chunk_object_offsets + (off_t)pos * MIDX_CHUNK_OFFSET_WIDTH; offset32 = get_be32(offset_data + sizeof(uint32_t)); if (m->chunk_large_offsets && offset32 & MIDX_LARGE_OFFSET_NEEDED) { @@ -260,7 +260,8 @@ static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos) static uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos) { - return get_be32(m->chunk_object_offsets + pos * MIDX_CHUNK_OFFSET_WIDTH); + return get_be32(m->chunk_object_offsets + + (off_t)pos * MIDX_CHUNK_OFFSET_WIDTH); } static int nth_midxed_pack_entry(struct repository *r, @@ -912,15 +913,15 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * add_chunk(cf, MIDX_CHUNKID_OIDFANOUT, MIDX_CHUNK_FANOUT_SIZE, write_midx_oid_fanout); add_chunk(cf, MIDX_CHUNKID_OIDLOOKUP, - ctx.entries_nr * the_hash_algo->rawsz, + (size_t)ctx.entries_nr * the_hash_algo->rawsz, write_midx_oid_lookup); add_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS, - ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH, + (size_t)ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH, write_midx_object_offsets); if (ctx.large_offsets_needed) add_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, - ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH, + (size_t)ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH, write_midx_large_offsets); write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs); From patchwork Thu Feb 18 14:07:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12093699 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 540BDC433DB for ; Thu, 18 Feb 2021 16:54:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0E95B6146D for ; Thu, 18 Feb 2021 16:54:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233982AbhBRQxf (ORCPT ); Thu, 18 Feb 2021 11:53:35 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42012 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233641AbhBROLL (ORCPT ); Thu, 18 Feb 2021 09:11:11 -0500 Received: from mail-wm1-x330.google.com (mail-wm1-x330.google.com [IPv6:2a00:1450:4864:20::330]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A35CCC061221 for ; Thu, 18 Feb 2021 06:07:59 -0800 (PST) Received: by mail-wm1-x330.google.com with SMTP id m1so4073300wml.2 for ; Thu, 18 Feb 2021 06:07:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=XXFb2RG+oHUGoAOF85mt8265szVQ1fzxkOUTQ8bMNVY=; b=FMU8GlkdlqlVtuxkzoNroRFRXyH88+1h6K9yo8YY9wZETFCpz6y9NVuD+ixHyU2osW CQHI2wNsa0muFE1BqKbsvbDVtEZmAI2gmxwf1S5DH9LAUmuRDY+vzaGi2ACJjR3xR6aw NHxVYAGsA9H/e0OSVL+tEiwaXGu0bXMrusQP7v8/NZI6Y/dH5TP/fg7fbQi+9cz8oIKd ewN3/HD94O7vnq+BqIY/dsQ5qscqO3aZmikLHnssqqU3BBNg+g4evI2eqoxq04IcKT6H XXH4sQOmAw28bi5++fwSrh1GWYvOw/a1mTUKL9orB8JySTFUD5pQWmOkeRa89WJkQVxz 43vQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=XXFb2RG+oHUGoAOF85mt8265szVQ1fzxkOUTQ8bMNVY=; b=He0El8fcScZd0K/DGWtAPBRhklwTe1KsaueFQdoMve9hrPlep8oTqz9FrNntfB9XkC scNfA+9+azQ50tj/0K/oLc6Cyf+UaBvQ/5lY+0BG42jxOeyTMM2aPnwc+p2pjxQJQtgt thdlM4WdZM+KHG5/KUfdL7GlAfEM3y1IP+ozOOGfEWsBCFyy4MO+zHhb1Y65JFj7yo7O lkllgCN4lr1+4klqccdF/FAaAZuhVY3/QfwVgPkP1Jm2p9u9m6Nj02ENjouiBoc531pk 4kclRpyqw7Tlg6zzHx2ORY1XWrabaMRySS22KLQ6mxDZGY3nbHouf/6Z+lnu5JPwbZGW rBTg== X-Gm-Message-State: AOAM533BuHXVoFl1gQkQ82eD3XY1SkL6m+lvN92J6N9nhxPIUK1/bfco W3lTchJQBjgCU7McrDBW89/0twKNYlU= X-Google-Smtp-Source: ABdhPJyIU6xg8uLPR4CrewIFr1m6tvHHuhTYRnO9ZAX2EUYfnDKkM7CoOplNn3RPCgqBWA7ZQEDp7w== X-Received: by 2002:a7b:ce95:: with SMTP id q21mr3779720wmj.178.1613657278303; Thu, 18 Feb 2021 06:07:58 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id q20sm7529547wmc.14.2021.02.18.06.07.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Feb 2021 06:07:57 -0800 (PST) Message-Id: <0e58218be4185b428aec48a9169134517330bcde.1613657260.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 18 Feb 2021 14:07:38 +0000 Subject: [PATCH v4 16/17] chunk-format: restore duplicate chunk checks Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee Before refactoring into the chunk-format API, the commit-graph parsing logic included checks for duplicate chunks. It is unlikely that we would desire a chunk-based file format that allows duplicate chunk IDs in the table of contents, so add duplicate checks into read_table_of_contents(). Signed-off-by: Derrick Stolee --- chunk-format.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/chunk-format.c b/chunk-format.c index 2c1fecf1c3e5..da191e59a29d 100644 --- a/chunk-format.c +++ b/chunk-format.c @@ -97,6 +97,7 @@ int read_table_of_contents(struct chunkfile *cf, uint64_t toc_offset, int toc_length) { + int i; uint32_t chunk_id; const unsigned char *table_of_contents = mfile + toc_offset; @@ -123,6 +124,14 @@ int read_table_of_contents(struct chunkfile *cf, return -1; } + for (i = 0; i < cf->chunks_nr; i++) { + if (cf->chunks[i].id == chunk_id) { + error(_("duplicate chunk ID %"PRIx32" found"), + chunk_id); + return -1; + } + } + cf->chunks[cf->chunks_nr].id = chunk_id; cf->chunks[cf->chunks_nr].start = mfile + chunk_offset; cf->chunks[cf->chunks_nr].size = next_chunk_offset - chunk_offset; From patchwork Thu Feb 18 14:07:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12093691 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8FF24C433DB for ; Thu, 18 Feb 2021 16:54:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3C5A664DFF for ; Thu, 18 Feb 2021 16:54:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233915AbhBRQwW (ORCPT ); Thu, 18 Feb 2021 11:52:22 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41434 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233631AbhBROKo (ORCPT ); Thu, 18 Feb 2021 09:10:44 -0500 Received: from mail-wm1-x32d.google.com (mail-wm1-x32d.google.com [IPv6:2a00:1450:4864:20::32d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D10B7C061222 for ; Thu, 18 Feb 2021 06:08:00 -0800 (PST) Received: by mail-wm1-x32d.google.com with SMTP id n10so4085602wmq.0 for ; Thu, 18 Feb 2021 06:08:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=VY70YxpMWWO9iYCYRFM9nL7VSofDIl8bmBKvouWjznc=; b=UVODMOet7OPq8pKAvMYzThysATmdtHRvt4XwQ22zdU5VsQzHpXV87ZWmJfmwVF7FGZ ENKmMKqGKXqAB1bA8jcbzgIO47VV0dC2wm8a6R8MnVQ8Tx17AciIAdPxnem0pcZ07z6d 0AUTbx3jXIJj91qyWy8S1rSDvEl/jVNX5znExak41WatySQNmqSymNCJOYtr1n5lX8uI 1DPlpovIcZRuU/cZIiekQNHZ+2AJLfv2xRB5QIM0KQD7il5B67fej+oKzH4UVFU7OYdK 7czLvUqwmCr8d+NOE0c8IVS+dyEVDyDWtwSfP1duBnvRGkCvrrqX6SgAZ/uEi8iO2C1U 5zlg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=VY70YxpMWWO9iYCYRFM9nL7VSofDIl8bmBKvouWjznc=; b=Fl2Jq0rdu3EYECUAOfHx6XsZOa8RULvyDrKrWKFd6tlJ5iAbbpFIbPxdodwSYGe/a4 f5kiZ0kzOcwQ0/Ey4R6o6Wfij3LhgTlY3sosGmjBsGNFrln4/ndbN5Ab4u6vBbdSSQgU dh9YBSaWjvHl5TLAouPfnr3i4dCIbAUPx6AOjSW4kzCFxuzoaCe92k71ZAq8vpeB3g/5 Z+fCiMHDGEbs3a09fYvLUANIe3QMVGtUi6g+bYGN2WiFSScHwtjVu+1aNXRWvb5rHA95 ZDiUeCOCoQkF9HbzS1+d8W44C+7RMpQSNC8zwqVo3EiB4aGmZvOQELbtDq/WqUwXa6AQ jUqg== X-Gm-Message-State: AOAM532eI2b9+JFgJi5DJAv9nI60A7rC3BHLyvd+idgiBLHuyb4IFihY hFKQGZmroQ2KsvzxsIpRDAn+jBKroZw= X-Google-Smtp-Source: ABdhPJx4G2a0TuSzWgkxO/FbgV2vq8M2XP0D2kGFwWvHiYxTjkOziy2nCxw/SnXiq0K0x4bUQE1heA== X-Received: by 2002:a05:600c:4f46:: with SMTP id m6mr3743457wmq.160.1613657279456; Thu, 18 Feb 2021 06:07:59 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id q2sm9041779wrx.79.2021.02.18.06.07.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Feb 2021 06:07:59 -0800 (PST) Message-Id: <84bf6506dc12163b37f46192b3742c8fb234322f.1613657260.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 18 Feb 2021 14:07:39 +0000 Subject: [PATCH v4 17/17] chunk-format: add technical docs Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The chunk-based file format is now an API in the code, but we should also take time to document it as a file format. Specifically, it matches the CHUNK LOOKUP sections of the commit-graph and multi-pack-index files, but there are some commonalities that should be grouped in this document. Signed-off-by: Derrick Stolee --- Documentation/technical/chunk-format.txt | 116 ++++++++++++++++++ .../technical/commit-graph-format.txt | 3 + Documentation/technical/pack-format.txt | 3 + 3 files changed, 122 insertions(+) create mode 100644 Documentation/technical/chunk-format.txt diff --git a/Documentation/technical/chunk-format.txt b/Documentation/technical/chunk-format.txt new file mode 100644 index 000000000000..593614fcedab --- /dev/null +++ b/Documentation/technical/chunk-format.txt @@ -0,0 +1,116 @@ +Chunk-based file formats +======================== + +Some file formats in Git use a common concept of "chunks" to describe +sections of the file. This allows structured access to a large file by +scanning a small "table of contents" for the remaining data. This common +format is used by the `commit-graph` and `multi-pack-index` files. See +link:technical/pack-format.html[the `multi-pack-index` format] and +link:technical/commit-graph-format.html[the `commit-graph` format] for +how they use the chunks to describe structured data. + +A chunk-based file format begins with some header information custom to +that format. That header should include enough information to identify +the file type, format version, and number of chunks in the file. From this +information, that file can determine the start of the chunk-based region. + +The chunk-based region starts with a table of contents describing where +each chunk starts and ends. This consists of (C+1) rows of 12 bytes each, +where C is the number of chunks. Consider the following table: + + | Chunk ID (4 bytes) | Chunk Offset (8 bytes) | + |--------------------|------------------------| + | ID[0] | OFFSET[0] | + | ... | ... | + | ID[C] | OFFSET[C] | + | 0x0000 | OFFSET[C+1] | + +Each row consists of a 4-byte chunk identifier (ID) and an 8-byte offset. +Each integer is stored in network-byte order. + +The chunk identifier `ID[i]` is a label for the data stored within this +fill from `OFFSET[i]` (inclusive) to `OFFSET[i+1]` (exclusive). Thus, the +size of the `i`th chunk is equal to the difference between `OFFSET[i+1]` +and `OFFSET[i]`. This requires that the chunk data appears contiguously +in the same order as the table of contents. + +The final entry in the table of contents must be four zero bytes. This +confirms that the table of contents is ending and provides the offset for +the end of the chunk-based data. + +Note: The chunk-based format expects that the file contains _at least_ a +trailing hash after `OFFSET[C+1]`. + +Functions for working with chunk-based file formats are declared in +`chunk-format.h`. Using these methods provide extra checks that assist +developers when creating new file formats. + +Writing chunk-based file formats +-------------------------------- + +To write a chunk-based file format, create a `struct chunkfile` by +calling `init_chunkfile()` and pass a `struct hashfile` pointer. The +caller is responsible for opening the `hashfile` and writing header +information so the file format is identifiable before the chunk-based +format begins. + +Then, call `add_chunk()` for each chunk that is intended for write. This +populates the `chunkfile` with information about the order and size of +each chunk to write. Provide a `chunk_write_fn` function pointer to +perform the write of the chunk data upon request. + +Call `write_chunkfile()` to write the table of contents to the `hashfile` +followed by each of the chunks. This will verify that each chunk wrote +the expected amount of data so the table of contents is correct. + +Finally, call `free_chunkfile()` to clear the `struct chunkfile` data. The +caller is responsible for finalizing the `hashfile` by writing the trailing +hash and closing the file. + +Reading chunk-based file formats +-------------------------------- + +To read a chunk-based file format, the file must be opened as a +memory-mapped region. The chunk-format API expects that the entire file +is mapped as a contiguous memory region. + +Initialize a `struct chunkfile` pointer with `init_chunkfile(NULL)`. + +After reading the header information from the beginning of the file, +including the chunk count, call `read_table_of_contents()` to populate +the `struct chunkfile` with the list of chunks, their offsets, and their +sizes. + +Extract the data information for each chunk using `pair_chunk()` or +`read_chunk()`: + +* `pair_chunk()` assigns a given pointer with the location inside the + memory-mapped file corresponding to that chunk's offset. If the chunk + does not exist, then the pointer is not modified. + +* `read_chunk()` takes a `chunk_read_fn` function pointer and calls it + with the appropriate initial pointer and size information. The function + is not called if the chunk does not exist. Use this method to read chunks + if you need to perform immediate parsing or if you need to execute logic + based on the size of the chunk. + +After calling these methods, call `free_chunkfile()` to clear the +`struct chunkfile` data. This will not close the memory-mapped region. +Callers are expected to own that data for the timeframe the pointers into +the region are needed. + +Examples +-------- + +These file formats use the chunk-format API, and can be used as examples +for future formats: + +* *commit-graph:* see `write_commit_graph_file()` and `parse_commit_graph()` + in `commit-graph.c` for how the chunk-format API is used to write and + parse the commit-graph file format documented in + link:technical/commit-graph-format.html[the commit-graph file format]. + +* *multi-pack-index:* see `write_midx_internal()` and `load_multi_pack_index()` + in `midx.c` for how the chunk-format API is used to write and + parse the multi-pack-index file format documented in + link:technical/pack-format.html[the multi-pack-index file format]. diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt index b6658eff1882..87971c27dd73 100644 --- a/Documentation/technical/commit-graph-format.txt +++ b/Documentation/technical/commit-graph-format.txt @@ -61,6 +61,9 @@ CHUNK LOOKUP: the length using the next chunk position if necessary.) Each chunk ID appears at most once. + The CHUNK LOOKUP matches the table of contents from + link:technical/chunk-format.html[the chunk-based file format]. + The remaining data in the body is described one chunk at a time, and these chunks may be given in any order. Chunks are required unless otherwise specified. diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index f96b2e605f34..2fb1e60d29ec 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -301,6 +301,9 @@ CHUNK LOOKUP: (Chunks are provided in file-order, so you can infer the length using the next chunk position if necessary.) + The CHUNK LOOKUP matches the table of contents from + link:technical/chunk-format.html[the chunk-based file format]. + The remaining data in the body is described one chunk at a time, and these chunks may be given in any order. Chunks are required unless otherwise specified.