From patchwork Tue Jan 26 16:01:10 2021
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12047279
Message-Id: <09b32829e4ff2384cb35afaf1a34385e25bac8d8.1611676886.git.gitgitgadget@gmail.com>
Date: Tue, 26 Jan 2021 16:01:10 +0000
Subject: [PATCH 01/17] commit-graph: anonymize data in chunk_write_fn
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Derrick Stolee
From: Derrick Stolee

In preparation for creating an API around file formats using chunks and
tables of contents, prepare the commit-graph write code to
use prototypes that will match this new API.

Specifically, convert chunk_write_fn to take a "void *data" parameter
instead of the commit-graph-specific "struct write_commit_graph_context"
pointer.

Signed-off-by: Derrick Stolee
---
 commit-graph.c | 38 ++++++++++++++++++++++++++++----------
 1 file changed, 28 insertions(+), 10 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index f3bde2ad95a..b26ed72396e 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1040,8 +1040,10 @@ struct write_commit_graph_context {
 };
 
 static int write_graph_chunk_fanout(struct hashfile *f,
-				    struct write_commit_graph_context *ctx)
+				    void *data)
 {
+	struct write_commit_graph_context *ctx =
+		(struct write_commit_graph_context *)data;
 	int i, count = 0;
 	struct commit **list = ctx->commits.list;
 
@@ -1066,8 +1068,10 @@ static int write_graph_chunk_fanout(struct hashfile *f,
 }
 
 static int write_graph_chunk_oids(struct hashfile *f,
-				  struct write_commit_graph_context *ctx)
+				  void *data)
 {
+	struct write_commit_graph_context *ctx =
+		(struct write_commit_graph_context *)data;
 	struct commit **list = ctx->commits.list;
 	int count;
 	for (count = 0; count < ctx->commits.nr; count++, list++) {
@@ -1085,8 +1089,10 @@ static const unsigned char *commit_to_sha1(size_t index, void *table)
 }
 
 static int write_graph_chunk_data(struct hashfile *f,
-				  struct write_commit_graph_context *ctx)
+				  void *data)
 {
+	struct write_commit_graph_context *ctx =
+		(struct write_commit_graph_context *)data;
 	struct commit **list = ctx->commits.list;
 	struct commit **last = ctx->commits.list + ctx->commits.nr;
 	uint32_t num_extra_edges = 0;
@@ -1187,8 +1193,10 @@ static int write_graph_chunk_data(struct hashfile *f,
 }
 
 static int write_graph_chunk_generation_data(struct hashfile *f,
-					     struct write_commit_graph_context *ctx)
+					     void *data)
 {
+	struct write_commit_graph_context *ctx =
+		(struct write_commit_graph_context *)data;
 	int i, num_generation_data_overflows = 0;
 
 	for (i = 0; i < ctx->commits.nr; i++) {
@@ -1208,8 +1216,10 @@ static int write_graph_chunk_generation_data(struct hashfile *f,
 }
 
 static int write_graph_chunk_generation_data_overflow(struct hashfile *f,
-						      struct write_commit_graph_context *ctx)
+						      void *data)
 {
+	struct write_commit_graph_context *ctx =
+		(struct write_commit_graph_context *)data;
 	int i;
 	for (i = 0; i < ctx->commits.nr; i++) {
 		struct commit *c = ctx->commits.list[i];
@@ -1226,8 +1236,10 @@ static int write_graph_chunk_generation_data_overflow(struct hashfile *f,
 }
 
 static int write_graph_chunk_extra_edges(struct hashfile *f,
-					 struct write_commit_graph_context *ctx)
+					 void *data)
 {
+	struct write_commit_graph_context *ctx =
+		(struct write_commit_graph_context *)data;
 	struct commit **list = ctx->commits.list;
 	struct commit **last = ctx->commits.list + ctx->commits.nr;
 	struct commit_list *parent;
@@ -1280,8 +1292,10 @@ static int write_graph_chunk_extra_edges(struct hashfile *f,
 }
 
 static int write_graph_chunk_bloom_indexes(struct hashfile *f,
-					   struct write_commit_graph_context *ctx)
+					   void *data)
 {
+	struct write_commit_graph_context *ctx =
+		(struct write_commit_graph_context *)data;
 	struct commit **list = ctx->commits.list;
 	struct commit **last = ctx->commits.list + ctx->commits.nr;
 	uint32_t cur_pos = 0;
@@ -1315,8 +1329,10 @@ static void trace2_bloom_filter_settings(struct write_commit_graph_context *ctx)
 }
 
 static int write_graph_chunk_bloom_data(struct hashfile *f,
-					struct write_commit_graph_context *ctx)
+					void *data)
 {
+	struct write_commit_graph_context *ctx =
+		(struct write_commit_graph_context *)data;
 	struct commit **list = ctx->commits.list;
 	struct commit **last = ctx->commits.list + ctx->commits.nr;
 
@@ -1737,8 +1753,10 @@ static int write_graph_chunk_base_1(struct hashfile *f,
 }
 
 static int write_graph_chunk_base(struct hashfile *f,
-				  struct write_commit_graph_context *ctx)
+				  void *data)
 {
+	struct write_commit_graph_context *ctx =
+		(struct write_commit_graph_context *)data;
 	int num = write_graph_chunk_base_1(f, ctx->new_base_graph);
 
 	if (num != ctx->num_commit_graphs_after - 1) {
@@ -1750,7 +1768,7 @@ static int write_graph_chunk_base(struct hashfile *f,
 }
 
 typedef int (*chunk_write_fn)(struct hashfile *f,
-			      struct write_commit_graph_context *ctx);
+			      void *data);
 
 struct chunk_info {
 	uint32_t id;

From patchwork Tue Jan 26 16:01:11 2021
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12047275
Message-Id: <9bd273f8c94fdb0c3adf8aedef3480ff5f4232b8.1611676886.git.gitgitgadget@gmail.com>
Date: Tue, 26 Jan 2021 16:01:11 +0000
Subject: [PATCH 02/17] chunk-format: create chunk format write API
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Derrick Stolee
From: Derrick Stolee

In anticipation of combining the logic from the commit-graph and
multi-pack-index file formats, create a new chunk-format API. Use a
'struct chunkfile' pointer to keep track of data that has been
registered for writes. This struct is anonymous outside of
chunk-format.c to ensure no user attempts to interfere with the data.

The next change will use this API in commit-graph.c, but the general
approach is:

 1. initialize the chunkfile with init_chunkfile(f).
 2. add chunks in the intended writing order with add_chunk().
 3. write any header information to the hashfile f.
 4. write the chunkfile data using write_chunkfile().
 5. free the chunkfile struct using free_chunkfile().

Helped-by: Taylor Blau
Signed-off-by: Derrick Stolee
---
 Makefile       |  1 +
 chunk-format.c | 91 ++++++++++++++++++++++++++++++++++++++++++++++++++
 chunk-format.h | 20 +++++++++++
 3 files changed, 112 insertions(+)
 create mode 100644 chunk-format.c
 create mode 100644 chunk-format.h

diff --git a/Makefile b/Makefile
index 7b64106930a..50a7663841e 100644
--- a/Makefile
+++ b/Makefile
@@ -854,6 +854,7 @@ LIB_OBJS += bundle.o
 LIB_OBJS += cache-tree.o
 LIB_OBJS += chdir-notify.o
 LIB_OBJS += checkout.o
+LIB_OBJS += chunk-format.o
 LIB_OBJS += color.o
 LIB_OBJS += column.o
 LIB_OBJS += combine-diff.o
diff --git a/chunk-format.c b/chunk-format.c
new file mode 100644
index 00000000000..2ce37ecc6bb
--- /dev/null
+++ b/chunk-format.c
@@ -0,0 +1,91 @@
+#include "cache.h"
+#include "chunk-format.h"
+#include "csum-file.h"
+#define CHUNK_LOOKUP_WIDTH 12
+
+/*
+ * When writing a chunk-based file format, collect the chunks in
+ * an array of chunk_info structs. The size stores the _expected_
+ * amount of data that will be written by write_fn.
+ */
+struct chunk_info {
+	uint32_t id;
+	uint64_t size;
+	chunk_write_fn write_fn;
+};
+
+struct chunkfile {
+	struct hashfile *f;
+
+	struct chunk_info *chunks;
+	size_t chunks_nr;
+	size_t chunks_alloc;
+};
+
+struct chunkfile *init_chunkfile(struct hashfile *f)
+{
+	struct chunkfile *cf = xcalloc(1, sizeof(*cf));
+	cf->f = f;
+	return cf;
+}
+
+void free_chunkfile(struct chunkfile *cf)
+{
+	if (!cf)
+		return;
+	free(cf->chunks);
+	free(cf);
+}
+
+int get_num_chunks(struct chunkfile *cf)
+{
+	return cf->chunks_nr;
+}
+
+void add_chunk(struct chunkfile *cf,
+	       uint64_t id,
+	       chunk_write_fn fn,
+	       size_t size)
+{
+	ALLOC_GROW(cf->chunks, cf->chunks_nr + 1, cf->chunks_alloc);
+
+	cf->chunks[cf->chunks_nr].id = id;
+	cf->chunks[cf->chunks_nr].write_fn = fn;
+	cf->chunks[cf->chunks_nr].size = size;
+	cf->chunks_nr++;
+}
+
+int write_chunkfile(struct chunkfile *cf, void *data)
+{
+	int i;
+	size_t cur_offset = cf->f->offset + cf->f->total;
+
+	/* Add the table of contents to the current offset */
+	cur_offset += (cf->chunks_nr + 1) * CHUNK_LOOKUP_WIDTH;
+
+	for (i = 0; i < cf->chunks_nr; i++) {
+		hashwrite_be32(cf->f, cf->chunks[i].id);
+		hashwrite_be64(cf->f, cur_offset);
+
+		cur_offset += cf->chunks[i].size;
+	}
+
+	/* Trailing entry marks the end of the chunks */
+	hashwrite_be32(cf->f, 0);
+	hashwrite_be64(cf->f, cur_offset);
+
+	for (i = 0; i < cf->chunks_nr; i++) {
+		uint64_t start_offset = cf->f->total + cf->f->offset;
+		int result = cf->chunks[i].write_fn(cf->f, data);
+
+		if (result)
+			return result;
+
+		if (cf->f->total + cf->f->offset != start_offset + cf->chunks[i].size)
+			BUG("expected to write %"PRId64" bytes to chunk %"PRIx32", but wrote %"PRId64" instead",
+			    cf->chunks[i].size, cf->chunks[i].id,
+			    cf->f->total + cf->f->offset - start_offset);
+	}
+
+	return 0;
+}
diff --git a/chunk-format.h b/chunk-format.h
new file mode 100644
index 00000000000..bfaed672813
--- /dev/null
+++ b/chunk-format.h
@@ -0,0 +1,20 @@
+#ifndef CHUNK_FORMAT_H
+#define CHUNK_FORMAT_H
+
+#include "git-compat-util.h"
+
+struct hashfile;
+struct chunkfile;
+
+struct chunkfile *init_chunkfile(struct hashfile *f);
+void free_chunkfile(struct chunkfile *cf);
+int get_num_chunks(struct chunkfile *cf);
+typedef int (*chunk_write_fn)(struct hashfile *f,
+			      void *data);
+void add_chunk(struct chunkfile *cf,
+	       uint64_t id,
+	       chunk_write_fn fn,
+	       size_t size);
+int write_chunkfile(struct chunkfile *cf, void *data);
+
+#endif

From patchwork Tue Jan 26 16:01:12 2021
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12047287
Date: Tue, 26 Jan 2021 16:01:12 +0000
Subject: [PATCH 03/17] commit-graph: use chunk-format write API
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Derrick Stolee
From: Derrick Stolee

The commit-graph write logic is ready to make use of the chunk-format
write API. Each chunk write method is already in the correct prototype.
We only need to use the 'struct chunkfile' pointer and the correct API
calls.

Signed-off-by: Derrick Stolee
---
 commit-graph.c | 118 ++++++++++++++++---------------------------------
 1 file changed, 37 insertions(+), 81 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index b26ed72396e..b2c0f233eab 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -19,6 +19,7 @@
 #include "shallow.h"
 #include "json-writer.h"
 #include "trace2.h"
+#include "chunk-format.h"
 
 void git_test_write_commit_graph_or_die(void)
 {
@@ -1767,27 +1768,17 @@ static int write_graph_chunk_base(struct hashfile *f,
 	return 0;
 }
 
-typedef int (*chunk_write_fn)(struct hashfile *f,
-			      void *data);
-
-struct chunk_info {
-	uint32_t id;
-	uint64_t size;
-	chunk_write_fn write_fn;
-};
-
 static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 {
 	uint32_t i;
 	int fd;
 	struct hashfile *f;
 	struct lock_file lk = LOCK_INIT;
-	struct chunk_info chunks[MAX_NUM_CHUNKS + 1];
 	const unsigned hashsz = the_hash_algo->rawsz;
 	struct strbuf progress_title = STRBUF_INIT;
 	int num_chunks = 3;
-	uint64_t chunk_offset;
 	struct object_id file_hash;
+	struct chunkfile *cf;
 
 	if (ctx->split) {
 		struct strbuf tmp_file = STRBUF_INIT;
@@ -1833,76 +1824,50 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 		f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
 	}
 
-	chunks[0].id = GRAPH_CHUNKID_OIDFANOUT;
-	chunks[0].size = GRAPH_FANOUT_SIZE;
-	chunks[0].write_fn = write_graph_chunk_fanout;
-	chunks[1].id = GRAPH_CHUNKID_OIDLOOKUP;
-	chunks[1].size = hashsz * ctx->commits.nr;
-	chunks[1].write_fn = write_graph_chunk_oids;
-	chunks[2].id = GRAPH_CHUNKID_DATA;
-	chunks[2].size = (hashsz + 16) * ctx->commits.nr;
-	chunks[2].write_fn = write_graph_chunk_data;
+	cf = init_chunkfile(f);
+
+	add_chunk(cf, GRAPH_CHUNKID_OIDFANOUT,
+		  write_graph_chunk_fanout, GRAPH_FANOUT_SIZE);
+	add_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP,
+		  write_graph_chunk_oids, hashsz * ctx->commits.nr);
+	add_chunk(cf, GRAPH_CHUNKID_DATA,
+		  write_graph_chunk_data, (hashsz + 16) * ctx->commits.nr);
 
 	if (git_env_bool(GIT_TEST_COMMIT_GRAPH_NO_GDAT, 0))
 		ctx->write_generation_data = 0;
-	if (ctx->write_generation_data) {
-		chunks[num_chunks].id = GRAPH_CHUNKID_GENERATION_DATA;
-		chunks[num_chunks].size = sizeof(uint32_t) * ctx->commits.nr;
-		chunks[num_chunks].write_fn = write_graph_chunk_generation_data;
-		num_chunks++;
-	}
-	if (ctx->num_generation_data_overflows) {
-		chunks[num_chunks].id = GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW;
-		chunks[num_chunks].size = sizeof(timestamp_t) * ctx->num_generation_data_overflows;
-		chunks[num_chunks].write_fn = write_graph_chunk_generation_data_overflow;
-		num_chunks++;
-	}
-	if (ctx->num_extra_edges) {
-		chunks[num_chunks].id = GRAPH_CHUNKID_EXTRAEDGES;
-		chunks[num_chunks].size = 4 * ctx->num_extra_edges;
-		chunks[num_chunks].write_fn = write_graph_chunk_extra_edges;
-		num_chunks++;
-	}
+	if (ctx->write_generation_data)
+		add_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA,
+			  write_graph_chunk_generation_data,
+			  sizeof(uint32_t) * ctx->commits.nr);
+	if (ctx->num_generation_data_overflows)
+		add_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW,
+			  write_graph_chunk_generation_data_overflow,
+			  sizeof(timestamp_t) * ctx->num_generation_data_overflows);
+	if (ctx->num_extra_edges)
+		add_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES,
+			  write_graph_chunk_extra_edges,
+			  4 * ctx->num_extra_edges);
 	if (ctx->changed_paths) {
-		chunks[num_chunks].id = GRAPH_CHUNKID_BLOOMINDEXES;
-		chunks[num_chunks].size = sizeof(uint32_t) * ctx->commits.nr;
-		chunks[num_chunks].write_fn = write_graph_chunk_bloom_indexes;
-		num_chunks++;
-		chunks[num_chunks].id = GRAPH_CHUNKID_BLOOMDATA;
-		chunks[num_chunks].size = sizeof(uint32_t) * 3
-					  + ctx->total_bloom_filter_data_size;
-		chunks[num_chunks].write_fn = write_graph_chunk_bloom_data;
-		num_chunks++;
-	}
-	if (ctx->num_commit_graphs_after > 1) {
-		chunks[num_chunks].id = GRAPH_CHUNKID_BASE;
-		chunks[num_chunks].size = hashsz * (ctx->num_commit_graphs_after - 1);
-		chunks[num_chunks].write_fn = write_graph_chunk_base;
-		num_chunks++;
-	}
-
-	chunks[num_chunks].id = 0;
-	chunks[num_chunks].size = 0;
+		add_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
+			  write_graph_chunk_bloom_indexes,
+			  sizeof(uint32_t) * ctx->commits.nr);
+		add_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
+			  write_graph_chunk_bloom_data,
+			  sizeof(uint32_t) * 3
+				+ ctx->total_bloom_filter_data_size);
+	}
+	if (ctx->num_commit_graphs_after > 1)
+		add_chunk(cf, GRAPH_CHUNKID_BASE,
+			  write_graph_chunk_base,
+			  hashsz * (ctx->num_commit_graphs_after - 1));
 
 	hashwrite_be32(f, GRAPH_SIGNATURE);
 
 	hashwrite_u8(f, GRAPH_VERSION);
 	hashwrite_u8(f, oid_version());
-	hashwrite_u8(f, num_chunks);
+	hashwrite_u8(f, get_num_chunks(cf));
 	hashwrite_u8(f, ctx->num_commit_graphs_after - 1);
 
-	chunk_offset = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
-	for (i = 0; i <= num_chunks; i++) {
-		uint32_t chunk_write[3];
-
-		chunk_write[0] = htonl(chunks[i].id);
-		chunk_write[1] = htonl(chunk_offset >> 32);
-		chunk_write[2] = htonl(chunk_offset & 0xffffffff);
-		hashwrite(f, chunk_write, 12);
-
-		chunk_offset += chunks[i].size;
-	}
-
 	if (ctx->report_progress) {
 		strbuf_addf(&progress_title,
 			    Q_("Writing out commit graph in %d pass",
@@ -1914,17 +1879,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 			num_chunks * ctx->commits.nr);
 	}
 
-	for (i = 0; i < num_chunks; i++) {
-		uint64_t start_offset = f->total + f->offset;
-
-		if (chunks[i].write_fn(f, ctx))
-			return -1;
-
-		if (f->total + f->offset != start_offset + chunks[i].size)
-			BUG("expected to write %"PRId64" bytes to chunk %"PRIx32", but wrote %"PRId64" instead",
-			    chunks[i].size, chunks[i].id,
-			    f->total + f->offset - start_offset);
-	}
+	write_chunkfile(cf, ctx);
 
 	stop_progress(&ctx->progress);
 	strbuf_release(&progress_title);
@@ -1941,6 +1896,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 
 	close_commit_graph(ctx->r->objects);
 	finalize_hashfile(f, file_hash.hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
+	free_chunkfile(cf);
 
 	if (ctx->split) {
 		FILE *chainf = fdopen_lock_file(&lk, "w");

From patchwork Tue Jan 26 16:01:13 2021
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12047285
Message-Id: <9fe5ee8611c3a18daaa9701bb5eefc8c408a7e76.1611676886.git.gitgitgadget@gmail.com>
Date: Tue, 26 Jan 2021 16:01:13 +0000
Subject: [PATCH 04/17] midx: rename pack_info to write_midx_context
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Derrick Stolee
From: Derrick Stolee

In an effort to streamline our chunk-based file formats, align some of
the code structure in write_midx_internal() to be similar to the
patterns in write_commit_graph_file().

Specifically, let's create a "struct write_midx_context" that can be
used as a data parameter to abstract function types.

This change only renames "struct pack_list" to "struct
write_midx_context" and the names of instances from "packs" to "ctx".
("struct pack_info" itself is left unchanged.) In future changes, we
will expand the data inside "struct write_midx_context" and align our
chunk-writing method with the chunk-format API.

Signed-off-by: Derrick Stolee
---
 midx.c | 130 ++++++++++++++++++++++++++++-----------------------------
 1 file changed, 65 insertions(+), 65 deletions(-)

diff --git a/midx.c b/midx.c
index 79c282b070d..dfc1a289246 100644
--- a/midx.c
+++ b/midx.c
@@ -451,7 +451,7 @@ static int pack_info_compare(const void *_a, const void *_b)
 	return strcmp(a->pack_name, b->pack_name);
 }
 
-struct pack_list {
+struct write_midx_context {
 	struct pack_info *info;
 	uint32_t nr;
 	uint32_t alloc;
@@ -463,37 +463,37 @@ struct pack_list {
 static void add_pack_to_midx(const char *full_path, size_t full_path_len,
 			     const char *file_name, void *data)
 {
-	struct pack_list *packs = (struct pack_list *)data;
+	struct write_midx_context *ctx = (struct write_midx_context *)data;
 
 	if (ends_with(file_name, ".idx")) {
-		display_progress(packs->progress, ++packs->pack_paths_checked);
-		if (packs->m && midx_contains_pack(packs->m, file_name))
+		display_progress(ctx->progress, ++ctx->pack_paths_checked);
+		if (ctx->m && midx_contains_pack(ctx->m, file_name))
 			return;
 
-		ALLOC_GROW(packs->info, packs->nr + 1, packs->alloc);
+		ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
 
-		packs->info[packs->nr].p = add_packed_git(full_path,
-							  full_path_len,
-							  0);
+		ctx->info[ctx->nr].p = add_packed_git(full_path,
+						      full_path_len,
+						      0);
 
-		if (!packs->info[packs->nr].p) {
+		if (!ctx->info[ctx->nr].p) {
 			warning(_("failed to add packfile '%s'"),
 				full_path);
 			return;
 		}
 
-		if (open_pack_index(packs->info[packs->nr].p)) {
+		if (open_pack_index(ctx->info[ctx->nr].p)) {
 			warning(_("failed to open pack-index '%s'"),
 				full_path);
-			close_pack(packs->info[packs->nr].p);
-			FREE_AND_NULL(packs->info[packs->nr].p);
+			close_pack(ctx->info[ctx->nr].p);
+			FREE_AND_NULL(ctx->info[ctx->nr].p);
 			return;
 		}
 
-		packs->info[packs->nr].pack_name = xstrdup(file_name);
-		packs->info[packs->nr].orig_pack_int_id = packs->nr;
-		packs->info[packs->nr].expired = 0;
-		packs->nr++;
+		ctx->info[ctx->nr].pack_name = xstrdup(file_name);
+		ctx->info[ctx->nr].orig_pack_int_id = ctx->nr;
+		ctx->info[ctx->nr].expired = 0;
+		ctx->nr++;
 	}
 }
 
@@ -801,7 +801,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	uint32_t i;
 	struct hashfile *f = NULL;
 	struct lock_file lk;
-	struct pack_list packs;
+	struct write_midx_context ctx = { 0 };
 	uint32_t *pack_perm = NULL;
 	uint64_t written = 0;
 	uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1];
@@ -820,40 +820,40 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		    midx_name);
 
 	if (m)
-		packs.m = m;
+		ctx.m = m;
 	else
-		packs.m = load_multi_pack_index(object_dir, 1);
-
-	packs.nr = 0;
-	packs.alloc = packs.m ? packs.m->num_packs : 16;
-	packs.info = NULL;
-	ALLOC_ARRAY(packs.info, packs.alloc);
-
-	if (packs.m) {
-		for (i = 0; i < packs.m->num_packs; i++) {
-			ALLOC_GROW(packs.info, packs.nr + 1, packs.alloc);
-
-			packs.info[packs.nr].orig_pack_int_id = i;
-			packs.info[packs.nr].pack_name = xstrdup(packs.m->pack_names[i]);
-			packs.info[packs.nr].p = NULL;
-			packs.info[packs.nr].expired = 0;
-			packs.nr++;
+		ctx.m = load_multi_pack_index(object_dir, 1);
+
+	ctx.nr = 0;
+	ctx.alloc = ctx.m ? ctx.m->num_packs : 16;
+	ctx.info = NULL;
+	ALLOC_ARRAY(ctx.info, ctx.alloc);
+
+	if (ctx.m) {
+		for (i = 0; i < ctx.m->num_packs; i++) {
+			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
+
+			ctx.info[ctx.nr].orig_pack_int_id = i;
+			ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]);
+			ctx.info[ctx.nr].p = NULL;
+			ctx.info[ctx.nr].expired = 0;
+			ctx.nr++;
 		}
 	}
 
-	packs.pack_paths_checked = 0;
+	ctx.pack_paths_checked = 0;
 	if (flags & MIDX_PROGRESS)
-		packs.progress = start_delayed_progress(_("Adding packfiles to multi-pack-index"), 0);
+		ctx.progress = start_delayed_progress(_("Adding packfiles to multi-pack-index"), 0);
 	else
-		packs.progress = NULL;
+		ctx.progress = NULL;
 
-	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &packs);
-	stop_progress(&packs.progress);
+	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
+	stop_progress(&ctx.progress);
 
-	if (packs.m && packs.nr == packs.m->num_packs && !packs_to_drop)
+	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
 		goto cleanup;
 
-	entries = get_sorted_entries(packs.m, packs.info, packs.nr, &nr_entries);
+	entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &nr_entries);
 
 	for (i = 0; i < nr_entries; i++) {
 		if (entries[i].offset > 0x7fffffff)
@@ -862,19 +862,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 			large_offsets_needed = 1;
 	}
 
-	QSORT(packs.info, packs.nr, pack_info_compare);
+	QSORT(ctx.info, ctx.nr, pack_info_compare);
 
 	if (packs_to_drop && packs_to_drop->nr) {
int drop_index = 0; int missing_drops = 0; - for (i = 0; i < packs.nr && drop_index < packs_to_drop->nr; i++) { - int cmp = strcmp(packs.info[i].pack_name, + for (i = 0; i < ctx.nr && drop_index < packs_to_drop->nr; i++) { + int cmp = strcmp(ctx.info[i].pack_name, packs_to_drop->items[drop_index].string); if (!cmp) { drop_index++; - packs.info[i].expired = 1; + ctx.info[i].expired = 1; } else if (cmp > 0) { error(_("did not see pack-file %s to drop"), packs_to_drop->items[drop_index].string); @@ -882,7 +882,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * missing_drops++; i--; } else { - packs.info[i].expired = 0; + ctx.info[i].expired = 0; } } @@ -898,19 +898,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * * * pack_perm[old_id] = new_id */ - ALLOC_ARRAY(pack_perm, packs.nr); - for (i = 0; i < packs.nr; i++) { - if (packs.info[i].expired) { + ALLOC_ARRAY(pack_perm, ctx.nr); + for (i = 0; i < ctx.nr; i++) { + if (ctx.info[i].expired) { dropped_packs++; - pack_perm[packs.info[i].orig_pack_int_id] = PACK_EXPIRED; + pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED; } else { - pack_perm[packs.info[i].orig_pack_int_id] = i - dropped_packs; + pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs; } } - for (i = 0; i < packs.nr; i++) { - if (!packs.info[i].expired) - pack_name_concat_len += strlen(packs.info[i].pack_name) + 1; + for (i = 0; i < ctx.nr; i++) { + if (!ctx.info[i].expired) + pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1; } if (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT) @@ -921,19 +921,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf); FREE_AND_NULL(midx_name); - if (packs.m) - close_midx(packs.m); + if (ctx.m) + close_midx(ctx.m); cur_chunk = 0; num_chunks = large_offsets_needed ? 
5 : 4; - if (packs.nr - dropped_packs == 0) { + if (ctx.nr - dropped_packs == 0) { error(_("no pack files to index.")); result = 1; goto cleanup; } - written = write_midx_header(f, num_chunks, packs.nr - dropped_packs); + written = write_midx_header(f, num_chunks, ctx.nr - dropped_packs); chunk_ids[cur_chunk] = MIDX_CHUNKID_PACKNAMES; chunk_offsets[cur_chunk] = written + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH; @@ -990,7 +990,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * switch (chunk_ids[i]) { case MIDX_CHUNKID_PACKNAMES: - written += write_midx_pack_names(f, packs.info, packs.nr); + written += write_midx_pack_names(f, ctx.info, ctx.nr); break; case MIDX_CHUNKID_OIDFANOUT: @@ -1027,15 +1027,15 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * commit_lock_file(&lk); cleanup: - for (i = 0; i < packs.nr; i++) { - if (packs.info[i].p) { - close_pack(packs.info[i].p); - free(packs.info[i].p); + for (i = 0; i < ctx.nr; i++) { + if (ctx.info[i].p) { + close_pack(ctx.info[i].p); + free(ctx.info[i].p); } - free(packs.info[i].pack_name); + free(ctx.info[i].pack_name); } - free(packs.info); + free(ctx.info); free(entries); free(pack_perm); free(midx_name); From patchwork Tue Jan 26 16:01:14 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12047281 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4EFD3C433DB for ; Tue, 26 
Jan 2021 16:03:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2A0F22229C for ; Tue, 26 Jan 2021 16:03:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2404579AbhAZQCw (ORCPT ); Tue, 26 Jan 2021 11:02:52 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37606 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2404477AbhAZQC2 (ORCPT ); Tue, 26 Jan 2021 11:02:28 -0500 Received: from mail-wm1-x330.google.com (mail-wm1-x330.google.com [IPv6:2a00:1450:4864:20::330]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3E785C061D7F for ; Tue, 26 Jan 2021 08:01:34 -0800 (PST) Received: by mail-wm1-x330.google.com with SMTP id f16so2881388wmq.5 for ; Tue, 26 Jan 2021 08:01:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=DVeZznH2fU1xeM9jKJdLFnNSw1D+YZp6mUDWZr+uTgA=; b=XLJyWmyCeayUpDX/A134m0k27Uh3HMV3pmeWy3DDaWL2rTz1YWGeTIl+WLhjwJFJlE L9C+qQ2AFH+GuDfiEPMXgkd1UH3suiTKTFicHnAadMpXJBVh36H1JViJZmOph5LUf3Wq TR/S/oOlGWwheBDKPPe/XsOK10Gc1KOYgJemNROeWolg38uWBKSzOreTCgXG8E0F58i5 UnqwLu/JwC2Uvn0UYkigS6689evby+Vvh9HS2r8wEXrzKP4Uz50IeCFpf/N2X6c9Z3y/ fabnDxnYZIQsTQPG9sWGoaHHVaPGYrmbQGeN3hl0+xq+6aMkmPwCbIe+fFAmXECIFAJX fbxw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=DVeZznH2fU1xeM9jKJdLFnNSw1D+YZp6mUDWZr+uTgA=; b=ScRzIeCxKdUEgnJTQ3HaSTHDK9fsiI/P88FTdzfWkGIxLpmJGtaUEAjpSS+j+K8rko W1xQtSs/VUKaZXqMWQpRQtS7oUa0a6KsIkP/NRPZttF7ycLqzTD6k3i8/aO6yjQbMg/j drkW1xvC9W1Bnm8zycdyQa/HqVMGVEZnuRmFAevJkQKUXGkyQQ58LwPfJdG96fzOMzZg RLM3ydYM3A1V3f/u2x1rsJd1Wu1Iz5VRhWUl2WVmcPmWz5GxreaTKcJghZYKd6ILCSeF 
NUmEtf5h5+AEJT+goZYNj0pl0POgwb30KCYHlgKHxuLF9DsWQn/xE6Hzb0BlBW0SlmCj A7kg== X-Gm-Message-State: AOAM530U7t+lhDZ54JFi82zeitzpI6OdEsbSbB3/NJ3OANMFk5lEkesT 5/iGrGEAEp6YQ1kngHt5Ttj5EOdPOuE= X-Google-Smtp-Source: ABdhPJxKb3hMCss6ta64D8c+3Xtswl9bnHLQR6nPYE8RkahG8C6UvHdgx4HB/beuxQfeQDWYLdVDiQ== X-Received: by 2002:a7b:cde1:: with SMTP id p1mr353095wmj.111.1611676892880; Tue, 26 Jan 2021 08:01:32 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id n15sm9284738wrx.2.2021.01.26.08.01.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 26 Jan 2021 08:01:32 -0800 (PST) Message-Id: <14a0246b98257cf0eb00de88b1a04409fc60138f.1611676886.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 26 Jan 2021 16:01:14 +0000 Subject: [PATCH 05/17] midx: use context in write_midx_pack_names() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee In an effort to align write_midx_internal() with the chunk-format API, start converting chunk-writing methods to match chunk_write_fn. The first case is to convert write_midx_pack_names() to take "void *data". We already have the necessary data in "struct write_midx_context", so this conversion is rather mechanical. 
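As a rough illustration of the conversion described above (a sketch only, not the midx.c code): a writer that used to receive its inputs as separate arguments instead receives an opaque "void *data" and casts it back to its context struct. The names demo_sink, demo_context, demo_write_fn, demo_write_names and demo_run are hypothetical stand-ins for "struct hashfile", "struct write_midx_context", chunk_write_fn, write_midx_pack_names() and the eventual caller; the sink only counts bytes instead of hashing and writing them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for "struct hashfile": it only counts bytes. */
struct demo_sink {
	size_t bytes;
};

/* Hypothetical, reduced stand-in for "struct write_midx_context". */
struct demo_context {
	const char **names;
	uint32_t nr;
};

/* Assumed callback shape: an output target plus opaque data. */
typedef size_t (*demo_write_fn)(struct demo_sink *f, void *data);

/* The writer recovers its typed context from the opaque pointer. */
size_t demo_write_names(struct demo_sink *f, void *data)
{
	struct demo_context *ctx = (struct demo_context *)data;
	size_t written = 0;
	uint32_t i;

	assert(f && ctx);
	for (i = 0; i < ctx->nr; i++) {
		/* Each name is emitted together with its terminating NUL. */
		size_t len = strlen(ctx->names[i]) + 1;

		f->bytes += len;
		written += len;
	}
	return written;
}

/* A generic caller needs only the function pointer and the context. */
size_t demo_run(struct demo_sink *f, demo_write_fn fn, void *data)
{
	return fn(f, data);
}
```

With this shape, a caller such as demo_run() can invoke any writer through the same two arguments, which is what lets a table of chunk writers share a single function type.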
Signed-off-by: Derrick Stolee --- midx.c | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/midx.c b/midx.c index dfc1a289246..f348a70e018 100644 --- a/midx.c +++ b/midx.c @@ -643,27 +643,26 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, return deduplicated_entries; } -static size_t write_midx_pack_names(struct hashfile *f, - struct pack_info *info, - uint32_t num_packs) +static size_t write_midx_pack_names(struct hashfile *f, void *data) { + struct write_midx_context *ctx = (struct write_midx_context *)data; uint32_t i; unsigned char padding[MIDX_CHUNK_ALIGNMENT]; size_t written = 0; - for (i = 0; i < num_packs; i++) { + for (i = 0; i < ctx->nr; i++) { size_t writelen; - if (info[i].expired) + if (ctx->info[i].expired) continue; - if (i && strcmp(info[i].pack_name, info[i - 1].pack_name) <= 0) + if (i && strcmp(ctx->info[i].pack_name, ctx->info[i - 1].pack_name) <= 0) BUG("incorrect pack-file order: %s before %s", - info[i - 1].pack_name, - info[i].pack_name); + ctx->info[i - 1].pack_name, + ctx->info[i].pack_name); - writelen = strlen(info[i].pack_name) + 1; - hashwrite(f, info[i].pack_name, writelen); + writelen = strlen(ctx->info[i].pack_name) + 1; + hashwrite(f, ctx->info[i].pack_name, writelen); written += writelen; } @@ -990,7 +989,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * switch (chunk_ids[i]) { case MIDX_CHUNKID_PACKNAMES: - written += write_midx_pack_names(f, ctx.info, ctx.nr); + written += write_midx_pack_names(f, &ctx); break; case MIDX_CHUNKID_OIDFANOUT: From patchwork Tue Jan 26 16:01:15 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12047283 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 
tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 853A4C4332B for ; Tue, 26 Jan 2021 16:03:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 59ECC2220B for ; Tue, 26 Jan 2021 16:03:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2404647AbhAZQC7 (ORCPT ); Tue, 26 Jan 2021 11:02:59 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37608 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2404480AbhAZQC2 (ORCPT ); Tue, 26 Jan 2021 11:02:28 -0500 Received: from mail-wr1-x436.google.com (mail-wr1-x436.google.com [IPv6:2a00:1450:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4396FC0698C0 for ; Tue, 26 Jan 2021 08:01:35 -0800 (PST) Received: by mail-wr1-x436.google.com with SMTP id c12so17040991wrc.7 for ; Tue, 26 Jan 2021 08:01:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=QU5COrCpqL9ajTbjR4BrbM9r/eRoyXPU2H69oqo7fLQ=; b=nCVW/uu+ojLaypJs/JLOv+46rSJQgTKX2s4pIxD/O1lGKaJ2omaWRzHtKQjIMjBpHB 7wN8pfjiy1VbUPtahLHUL7ZNwy9yQ85eXObpEBg4Fdgo0ImIflSPb2y2CSnpcEgHfvjI JRm/wd+YW3zNduK999nsaGDl014kraEobwhHCGv0wVa6CCEdmsp3j9HQEo7ZuMkItK87 geapAVTGWouzv/NacADDlkmrFKzlZWpJBpGuuD95s18RrEWYPrNY0/q0J9NHr9RIxtjr 5IdEqORQcue+a5a0ssee4YDruLp2NaRsc7XPtKzUVxjU4Z7XdVEJ30jjB1ZWl5ddCAoB J0Ww== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; 
h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=QU5COrCpqL9ajTbjR4BrbM9r/eRoyXPU2H69oqo7fLQ=; b=bgMhIMeseGZA/pBZKlNEVggu19OBtQM26+XBl+SBU47kG6SJVvWl4DqSizQUiqWaBG S0LTUNdlauX9ingUkyRmykvSEaBTl+wlnCRJejyxZPe8mKPXPQTVtnBHZfuU3MrjrBPz DGaCNcP0uWj6x6qlHGRwFtoubCvh2lVn+BRnjPKTPAkr1XsHPR8orTBAUxbwJ0v/hCAr r97cc3/m4LGzlt8YZ80TaFXJ3a+c3hurWQ2BHKbAyeZhF4qzYug0dfxwbbUZxtHWKWtj kWaL7tp6JIJHH7aaIS3DK5dDl1T8IT9bGRGneS+Feus218Nfuo1ugdB90EIABpG0zKr1 4Sxw== X-Gm-Message-State: AOAM533GY9dVT2+isWb4TqyfI1tvTJJXpb5Iyt7X3D51RLqKycAaluZq s6DNgIm77NbgCt4h7rjFxJz0f+2Rieo= X-Google-Smtp-Source: ABdhPJymUQn6eSjwD8vrbQmQcuuYAjJGwUm/PIVjhTxedBiTtZTokgmqNjXkdDm9hx5/FNs881Ufzg== X-Received: by 2002:a5d:664c:: with SMTP id f12mr6765504wrw.61.1611676893750; Tue, 26 Jan 2021 08:01:33 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id l14sm13639750wrq.87.2021.01.26.08.01.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 26 Jan 2021 08:01:33 -0800 (PST) Message-Id: <79f479ef7d16bd5fd4be058eed5fe9d06291fd65.1611676886.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 26 Jan 2021 16:01:15 +0000 Subject: [PATCH 06/17] midx: add entries to write_midx_context Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee In an effort to align write_midx_internal() with the chunk-format API, continue to group necessary data into "struct write_midx_context". This change collects the "struct pack_midx_entry *entries" list and its count into the context. Update write_midx_oid_fanout() and write_midx_oid_lookup() to take the context directly, as these are easy conversions with this new data. 
Only the callers of write_midx_object_offsets() and write_midx_large_offsets() are updated here, since additional data is needed in the context before those methods can match chunk_write_fn. Signed-off-by: Derrick Stolee --- midx.c | 49 ++++++++++++++++++++++++++----------------------- 1 file changed, 26 insertions(+), 23 deletions(-) diff --git a/midx.c b/midx.c index f348a70e018..ee6f3504c6a 100644 --- a/midx.c +++ b/midx.c @@ -458,6 +458,9 @@ struct write_midx_context { struct multi_pack_index *m; struct progress *progress; unsigned pack_paths_checked; + + struct pack_midx_entry *entries; + uint32_t entries_nr; }; static void add_pack_to_midx(const char *full_path, size_t full_path_len, @@ -678,11 +681,11 @@ static size_t write_midx_pack_names(struct hashfile *f, void *data) } static size_t write_midx_oid_fanout(struct hashfile *f, - struct pack_midx_entry *objects, - uint32_t nr_objects) + void *data) { - struct pack_midx_entry *list = objects; - struct pack_midx_entry *last = objects + nr_objects; + struct write_midx_context *ctx = (struct write_midx_context *)data; + struct pack_midx_entry *list = ctx->entries; + struct pack_midx_entry *last = ctx->entries + ctx->entries_nr; uint32_t count = 0; uint32_t i; @@ -706,18 +709,19 @@ static size_t write_midx_oid_fanout(struct hashfile *f, return MIDX_CHUNK_FANOUT_SIZE; } -static size_t write_midx_oid_lookup(struct hashfile *f, unsigned char hash_len, - struct pack_midx_entry *objects, - uint32_t nr_objects) +static size_t write_midx_oid_lookup(struct hashfile *f, + void *data) { - struct pack_midx_entry *list = objects; + struct write_midx_context *ctx = (struct write_midx_context *)data; + unsigned char hash_len = the_hash_algo->rawsz; + struct pack_midx_entry *list = ctx->entries; uint32_t i; size_t written = 0; - for (i = 0; i < nr_objects; i++) { + for (i = 0; i < ctx->entries_nr; i++) { struct pack_midx_entry *obj = list++; - if (i < nr_objects - 1) { + if (i < ctx->entries_nr - 1) { struct pack_midx_entry *next = list; 
if (oidcmp(&obj->oid, &next->oid) >= 0) BUG("OIDs not in order: %s >= %s", @@ -805,8 +809,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * uint64_t written = 0; uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1]; - uint32_t nr_entries, num_large_offsets = 0; - struct pack_midx_entry *entries = NULL; + uint32_t num_large_offsets = 0; struct progress *progress = NULL; int large_offsets_needed = 0; int pack_name_concat_len = 0; @@ -852,12 +855,12 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop) goto cleanup; - entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &nr_entries); + ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr); - for (i = 0; i < nr_entries; i++) { - if (entries[i].offset > 0x7fffffff) + for (i = 0; i < ctx.entries_nr; i++) { + if (ctx.entries[i].offset > 0x7fffffff) num_large_offsets++; - if (entries[i].offset > 0xffffffff) + if (ctx.entries[i].offset > 0xffffffff) large_offsets_needed = 1; } @@ -947,10 +950,10 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * cur_chunk++; chunk_ids[cur_chunk] = MIDX_CHUNKID_OBJECTOFFSETS; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * the_hash_algo->rawsz; + chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * the_hash_algo->rawsz; cur_chunk++; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * MIDX_CHUNK_OFFSET_WIDTH; + chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH; if (large_offsets_needed) { chunk_ids[cur_chunk] = MIDX_CHUNKID_LARGEOFFSETS; @@ -993,19 +996,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * break; case MIDX_CHUNKID_OIDFANOUT: - written += write_midx_oid_fanout(f, entries, nr_entries); + written += write_midx_oid_fanout(f, 
&ctx); break; case MIDX_CHUNKID_OIDLOOKUP: - written += write_midx_oid_lookup(f, the_hash_algo->rawsz, entries, nr_entries); + written += write_midx_oid_lookup(f, &ctx); break; case MIDX_CHUNKID_OBJECTOFFSETS: - written += write_midx_object_offsets(f, large_offsets_needed, pack_perm, entries, nr_entries); + written += write_midx_object_offsets(f, large_offsets_needed, pack_perm, ctx.entries, ctx.entries_nr); break; case MIDX_CHUNKID_LARGEOFFSETS: - written += write_midx_large_offsets(f, num_large_offsets, entries, nr_entries); + written += write_midx_large_offsets(f, num_large_offsets, ctx.entries, ctx.entries_nr); break; default: @@ -1035,7 +1038,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * } free(ctx.info); - free(entries); + free(ctx.entries); free(pack_perm); free(midx_name); return result; From patchwork Tue Jan 26 16:01:16 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12047299 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B807BC433DB for ; Tue, 26 Jan 2021 16:06:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7F06E2220B for ; Tue, 26 Jan 2021 16:06:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2404355AbhAZQGC (ORCPT ); Tue, 26 Jan 2021 11:06:02 -0500 Received: from lindbergh.monkeyblade.net 
([23.128.96.19]:37718 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2404602AbhAZQC7 (ORCPT ); Tue, 26 Jan 2021 11:02:59 -0500 Received: from mail-wr1-x42a.google.com (mail-wr1-x42a.google.com [IPv6:2a00:1450:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4961EC0698C1 for ; Tue, 26 Jan 2021 08:01:36 -0800 (PST) Received: by mail-wr1-x42a.google.com with SMTP id v15so17006847wrx.4 for ; Tue, 26 Jan 2021 08:01:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=kAGVOi9Eji1v1ZSuzjXUMHK+pCNbc5PVfTo1bOBO/M0=; b=tZ8772shu5bI5wxfiEVYh+wLBgUhqlzYeQSaSAdNlzwDwPb/pNAFIs1CID45R6N+uo /mVLdYVCB/kHMwPISDBMrbqypwrnb3YLM1gQv4AuMcMCjjnjttSTHyLcELSMX7n1bvD+ LGxhLG51YL3TqRZjK4TnlPHl03C4pxXr7FBPkYbu0TwQjIq1wTWZzEXDbRdzmwiOT1eg KCwL41m8P93dBDdUOJSxAI1+ksCiSrUWy8xzCR6E8/aLgFmOYBPJ4MAmIh4kTcmQZ8w2 CfaxShegPSK7gzxKniF+Z4NepvQ3qrKeReRL4NUBmw3BlgqIc5A7FjP2ZkLGBAaV89zN dn4A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=kAGVOi9Eji1v1ZSuzjXUMHK+pCNbc5PVfTo1bOBO/M0=; b=N6b9wLleumBkm3ahs7bvSUBKSmW+1mks7panZCI6u+iz4e1P0fORa1VEYyqvFF2apL WWRUlp9lXdAv6gg7OQBREApkzHNKIV1C53tGk/z26NAhsxKtU9ah+4l6GArj0iCiWtP5 GaBTXT097zgnQo6b8bZZI0tu/LSDHwEgbqbzNDEkmnDCuPhweV5Kt4T3VNldayyWesVu +1hYmWtd7qbFXHKwO5yCnRZUuz1v6cRc+cnL7PRulroI7ekotH7mq46sV+yBJc6jN7f9 lSBYySYtf2WQ6dEFt3Sai5/aoehEvL/e8iMr+MsbjnYROIrQ8UamFTLPhdofxtlaCsIN rgtw== X-Gm-Message-State: AOAM5325Jjft8XAwUo6UVeU4HHHE2t8Nh9UtPM978q6mstx9k26eT5Lk DDoUXUJHbHlYNY59OjYPBzK04v65uSE= X-Google-Smtp-Source: ABdhPJyqB8WnI7IcovXXBoAdZLbulf2f/bBnQYQzA+jFJ5LEq2rrQZ56nbrR+7Jsi4vnjKM8nMWp7w== X-Received: by 2002:adf:8464:: with SMTP id 91mr6641404wrf.188.1611676894765; Tue, 26 Jan 2021 
08:01:34 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id o12sm6848197wrx.82.2021.01.26.08.01.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 26 Jan 2021 08:01:34 -0800 (PST) Message-Id: <0b4ce3f1732a6b7297473ac3bae035f555a9268a.1611676886.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 26 Jan 2021 16:01:16 +0000 Subject: [PATCH 07/17] midx: add pack_perm to write_midx_context Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee In an effort to align write_midx_internal() with the chunk-format API, continue to group necessary data into "struct write_midx_context". This change collects the "uint32_t *pack_perm" and large_offsets_needed bit into the context. Update write_midx_object_offsets() to match chunk_write_fn. 
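For readers unfamiliar with pack_perm, the mapping it holds ("pack_perm[old_id] = new_id") can be sketched in isolation as follows. This is an illustration under assumptions, not the midx.c code: demo_pack is a hypothetical, reduced form of "struct pack_info", DEMO_PACK_EXPIRED is an assumed sentinel (the real PACK_EXPIRED value is not shown in this patch), and every orig_pack_int_id is assumed to be smaller than nr.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sentinel marking a dropped pack. */
#define DEMO_PACK_EXPIRED UINT32_MAX

/* Hypothetical, reduced form of "struct pack_info". */
struct demo_pack {
	uint32_t orig_pack_int_id;
	unsigned expired:1;
};

/*
 * Fill perm[old_id] with the new id of each kept pack, or with the
 * sentinel for each expired pack. "packs" must already be in its final
 * (sorted) order. Returns the number of dropped packs.
 */
uint32_t demo_build_perm(const struct demo_pack *packs, uint32_t nr,
			 uint32_t *perm)
{
	uint32_t i, dropped = 0;

	assert(packs && perm);
	for (i = 0; i < nr; i++) {
		if (packs[i].expired) {
			dropped++;
			perm[packs[i].orig_pack_int_id] = DEMO_PACK_EXPIRED;
		} else {
			/* Kept packs shift down past every dropped pack. */
			perm[packs[i].orig_pack_int_id] = i - dropped;
		}
	}
	return dropped;
}
```

Keeping the array in the context means a writer that receives only "void *data" can still translate each entry's old pack id while emitting object offsets.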
Signed-off-by: Derrick Stolee --- midx.c | 40 +++++++++++++++++++++------------------- 1 file changed, 21 insertions(+), 19 deletions(-) diff --git a/midx.c b/midx.c index ee6f3504c6a..66feff096e8 100644 --- a/midx.c +++ b/midx.c @@ -461,6 +461,9 @@ struct write_midx_context { struct pack_midx_entry *entries; uint32_t entries_nr; + + uint32_t *pack_perm; + unsigned large_offsets_needed:1; }; static void add_pack_to_midx(const char *full_path, size_t full_path_len, @@ -736,27 +739,27 @@ static size_t write_midx_oid_lookup(struct hashfile *f, return written; } -static size_t write_midx_object_offsets(struct hashfile *f, int large_offset_needed, - uint32_t *perm, - struct pack_midx_entry *objects, uint32_t nr_objects) +static size_t write_midx_object_offsets(struct hashfile *f, + void *data) { - struct pack_midx_entry *list = objects; + struct write_midx_context *ctx = (struct write_midx_context *)data; + struct pack_midx_entry *list = ctx->entries; uint32_t i, nr_large_offset = 0; size_t written = 0; - for (i = 0; i < nr_objects; i++) { + for (i = 0; i < ctx->entries_nr; i++) { struct pack_midx_entry *obj = list++; - if (perm[obj->pack_int_id] == PACK_EXPIRED) + if (ctx->pack_perm[obj->pack_int_id] == PACK_EXPIRED) BUG("object %s is in an expired pack with int-id %d", oid_to_hex(&obj->oid), obj->pack_int_id); - hashwrite_be32(f, perm[obj->pack_int_id]); + hashwrite_be32(f, ctx->pack_perm[obj->pack_int_id]); - if (large_offset_needed && obj->offset >> 31) + if (ctx->large_offsets_needed && obj->offset >> 31) hashwrite_be32(f, MIDX_LARGE_OFFSET_NEEDED | nr_large_offset++); - else if (!large_offset_needed && obj->offset >> 32) + else if (!ctx->large_offsets_needed && obj->offset >> 32) BUG("object %s requires a large offset (%"PRIx64") but the MIDX is not writing large offsets!", oid_to_hex(&obj->oid), obj->offset); @@ -805,13 +808,11 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * struct hashfile *f = NULL; struct lock_file lk; 
struct write_midx_context ctx = { 0 }; - uint32_t *pack_perm = NULL; uint64_t written = 0; uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1]; uint32_t num_large_offsets = 0; struct progress *progress = NULL; - int large_offsets_needed = 0; int pack_name_concat_len = 0; int dropped_packs = 0; int result = 0; @@ -857,11 +858,12 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr); + ctx.large_offsets_needed = 0; for (i = 0; i < ctx.entries_nr; i++) { if (ctx.entries[i].offset > 0x7fffffff) num_large_offsets++; if (ctx.entries[i].offset > 0xffffffff) - large_offsets_needed = 1; + ctx.large_offsets_needed = 1; } QSORT(ctx.info, ctx.nr, pack_info_compare); @@ -900,13 +902,13 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * * * pack_perm[old_id] = new_id */ - ALLOC_ARRAY(pack_perm, ctx.nr); + ALLOC_ARRAY(ctx.pack_perm, ctx.nr); for (i = 0; i < ctx.nr; i++) { if (ctx.info[i].expired) { dropped_packs++; - pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED; + ctx.pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED; } else { - pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs; + ctx.pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs; } } @@ -927,7 +929,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * close_midx(ctx.m); cur_chunk = 0; - num_chunks = large_offsets_needed ? 5 : 4; + num_chunks = ctx.large_offsets_needed ? 
5 : 4; if (ctx.nr - dropped_packs == 0) { error(_("no pack files to index.")); @@ -954,7 +956,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * cur_chunk++; chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH; - if (large_offsets_needed) { + if (ctx.large_offsets_needed) { chunk_ids[cur_chunk] = MIDX_CHUNKID_LARGEOFFSETS; cur_chunk++; @@ -1004,7 +1006,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * break; case MIDX_CHUNKID_OBJECTOFFSETS: - written += write_midx_object_offsets(f, large_offsets_needed, pack_perm, ctx.entries, ctx.entries_nr); + written += write_midx_object_offsets(f, &ctx); break; case MIDX_CHUNKID_LARGEOFFSETS: @@ -1039,7 +1041,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * free(ctx.info); free(ctx.entries); - free(pack_perm); + free(ctx.pack_perm); free(midx_name); return result; } From patchwork Tue Jan 26 16:01:17 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12047307 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 72FB0C433E0 for ; Tue, 26 Jan 2021 16:07:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 382E7207B3 for ; Tue, 26 Jan 2021 16:07:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id 
S2404868AbhAZQGa (ORCPT ); Tue, 26 Jan 2021 11:06:30 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37716 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2404616AbhAZQC7 (ORCPT ); Tue, 26 Jan 2021 11:02:59 -0500 Received: from mail-wm1-x32a.google.com (mail-wm1-x32a.google.com [IPv6:2a00:1450:4864:20::32a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 51204C0698C2 for ; Tue, 26 Jan 2021 08:01:37 -0800 (PST) Received: by mail-wm1-x32a.google.com with SMTP id u14so3219430wmq.4 for ; Tue, 26 Jan 2021 08:01:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=SDf53TnuEVi5SZUAg22vP1yHKIDBDQSZUZC6stgTJ+Q=; b=tTW+CqggnwgJAdrQajTqsDwBdefKmh8XqfyQhmfRZMvBUgvEvxUT0OyIdrz2rbjnIV klC9NP+gH7LOBKnNgGf0dqaEbeOFRIbqHra8Byxfcev2S3GFc8noyn++2mA+EnIXQrmj 3/+8xMWD3VTmz+pbsP6yEKSTX8xSm3TwTmtEkpZDi6kPY6cOOOBMc7Ps1pTdtYTzpYFV hVhF1u8GMwzoh8kKhFhyMTY9fuGIQTjz3B27hiMBlVLfqJFHj2sD/HXEhwziTHWuXmHP FjtPhAGg3EClL+pvJZkVlxB8wxVxpGoXVYC5Cq5W35lrBYJJ05tW+/IS/UP4xsSDFDSY 3cRQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=SDf53TnuEVi5SZUAg22vP1yHKIDBDQSZUZC6stgTJ+Q=; b=D5hcZz4exyLqsOrLMbKqas19ZFainwYvLP7P8iaHRZzIZDHy41qii4No9MUv36agDb L6F0wL0n7v1WrQ81wdqWyDGD70smzbhHfUwifkm8fjx88KoISkOtdzGxN5GIR5MRBKMD qOej481i/lNN+SR+ON5+2qhMgR9Ppu3AiqTDVtxLN5VlA4cnjJHC0kOIBrTQVEd87Ue+ 04nv3BR5kaStcW5ZzSKtcaGPAh8jX5J4Od/khBr6bbEcApKksVYaTbsryeo8jj5ZwIxf tsn6y/ovI9dpv7U226k/o4PFIj/fKSAHs+S+RUc1qQl4K3EIdWLhoh3z1ypt9HiteGcN JBWA== X-Gm-Message-State: AOAM533FupJqiu4/WPmPsV8stve/iAWKFabT0yPz7jr09ByfVAzm6UQz /bpGAwZn82Y7LWnrkmuFUs2/2PVFDEA= X-Google-Smtp-Source: 
ABdhPJzixmcOQxHRP15QSeqAIMot9ZVbCMP6svp6a3V9IlUZoJ1VYcWyVJlITX50fSr/IAppL1dNWQ== X-Received: by 2002:a1c:7c06:: with SMTP id x6mr374018wmc.67.1611676895887; Tue, 26 Jan 2021 08:01:35 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id j4sm18253603wru.20.2021.01.26.08.01.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 26 Jan 2021 08:01:35 -0800 (PST) Message-Id: In-Reply-To: References: Date: Tue, 26 Jan 2021 16:01:17 +0000 Subject: [PATCH 08/17] midx: add num_large_offsets to write_midx_context Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee In an effort to align write_midx_internal() with the chunk-format API, continue to group necessary data into "struct write_midx_context". This change collects the "uint32_t num_large_offsets" into the context. With this new data, write_midx_large_offsets() now matches the chunk_write_fn type. 
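The two values now stored in the context come from a single scan over the entry offsets, which can be sketched on its own as below. This is a simplified illustration, not the midx.c code: demo_offset_stats and demo_scan_offsets are hypothetical names, and the input is a plain array of offsets rather than "struct pack_midx_entry" items. The thresholds are the ones used in write_midx_internal().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the two values kept in the context. */
struct demo_offset_stats {
	uint32_t num_large_offsets;      /* offsets above 0x7fffffff */
	unsigned large_offsets_needed:1; /* set if any offset exceeds 0xffffffff */
};

/* Scan the offsets once and collect both values. */
struct demo_offset_stats demo_scan_offsets(const uint64_t *offsets, size_t nr)
{
	struct demo_offset_stats stats = { 0, 0 };
	size_t i;

	assert(offsets || !nr);
	for (i = 0; i < nr; i++) {
		if (offsets[i] > 0x7fffffff)
			stats.num_large_offsets++;
		if (offsets[i] > 0xffffffff)
			stats.large_offsets_needed = 1;
	}
	return stats;
}
```

Note that the count uses the 31-bit threshold while the flag uses the 32-bit one, so a file can have a nonzero count and still not need the large-offset chunk.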
Signed-off-by: Derrick Stolee
---
 midx.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/midx.c b/midx.c
index 66feff096e8..40b815f8877 100644
--- a/midx.c
+++ b/midx.c
@@ -464,6 +464,7 @@ struct write_midx_context {

 	uint32_t *pack_perm;
 	unsigned large_offsets_needed:1;
+	uint32_t num_large_offsets;
 };

 static void add_pack_to_midx(const char *full_path, size_t full_path_len,
@@ -772,11 +773,14 @@ static size_t write_midx_object_offsets(struct hashfile *f,
 	return written;
 }

-static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_large_offset,
-				       struct pack_midx_entry *objects, uint32_t nr_objects)
+static size_t write_midx_large_offsets(struct hashfile *f,
+				       void *data)
 {
-	struct pack_midx_entry *list = objects, *end = objects + nr_objects;
+	struct write_midx_context *ctx = (struct write_midx_context *)data;
+	struct pack_midx_entry *list = ctx->entries;
+	struct pack_midx_entry *end = ctx->entries + ctx->entries_nr;
 	size_t written = 0;
+	uint32_t nr_large_offset = ctx->num_large_offsets;

 	while (nr_large_offset) {
 		struct pack_midx_entry *obj;
@@ -811,7 +815,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	uint64_t written = 0;
 	uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1];
 	uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1];
-	uint32_t num_large_offsets = 0;
 	struct progress *progress = NULL;
 	int pack_name_concat_len = 0;
 	int dropped_packs = 0;
@@ -861,7 +864,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	ctx.large_offsets_needed = 0;
 	for (i = 0; i < ctx.entries_nr; i++) {
 		if (ctx.entries[i].offset > 0x7fffffff)
-			num_large_offsets++;
+			ctx.num_large_offsets++;
 		if (ctx.entries[i].offset > 0xffffffff)
 			ctx.large_offsets_needed = 1;
 	}
@@ -961,7 +964,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *

 		cur_chunk++;
 		chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] +
-					   num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH;
+					   ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH;
 	}

 	chunk_ids[cur_chunk] = 0;
@@ -1010,7 +1013,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 				break;

 			case MIDX_CHUNKID_LARGEOFFSETS:
-				written += write_midx_large_offsets(f, num_large_offsets, ctx.entries, ctx.entries_nr);
+				written += write_midx_large_offsets(f, &ctx);
 				break;

 			default:

From patchwork Tue Jan 26 16:01:18 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12047303 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 13EACC43381 for ; Tue, 26 Jan 2021 16:06:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DE1252220B for ; Tue, 26 Jan 2021 16:06:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2404792AbhAZQGZ (ORCPT ); Tue, 26 Jan 2021 11:06:25 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37720 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2404624AbhAZQC7 (ORCPT ); Tue, 26 Jan 2021 11:02:59 -0500 Received: from mail-wr1-x431.google.com (mail-wr1-x431.google.com [IPv6:2a00:1450:4864:20::431]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3322FC0698C3 for ; Tue, 26 Jan 2021 08:01:38 -0800 (PST) Received: by mail-wr1-x431.google.com with SMTP id v15so17006982wrx.4 for ; Tue, 26 Jan 2021
08:01:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=9k+3UJIdPZccdofYs3E0eSplegaTV9/5BA3fJLqMwsw=; b=vZb+xK5FbTsMSajicO8QB3KQY7XTNKiAOC9xAu0Am0k7iDO7k9mQiEM4f3LkhN0pyS xMtqZEB23t98wzYsXwwk2S2IkpAPkGzHEoA9GwPsxX3Ra5uzJcweS+6X5Bpcuswzm0mU /lNdd+96OtAJ0zybzE3FemymOst1NIr26jEP5NWn4LlRqYv55j8DuZq00oMv2KEICqrE CJU/ePZ6cgAb7m7EFo6wozGmTi3HU8srPEoxpWNMxNNYykQjtgxCeigqIpnIoGU+o8EP TvHRDLoIePNq0Ip/QiNQw4VbrvP3WJ3aBRF5DslXsmi0rS7YlI6cMj5g81Hr/eywWMe8 uang== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=9k+3UJIdPZccdofYs3E0eSplegaTV9/5BA3fJLqMwsw=; b=MrxnaQYsYivO8rdpzm7vX2SOFKJsu7aonaPGJe3RzWWdUEa8GgVaclo6uwTf+eqK1w idRGsHrfworOrSVdtAM6l3m2J944pqM8jHpsc4YEg7RMUooIxts29H0Y9fWGyGfAu0gT FCe+sh/RJ/qeyKuKfvfjSvW4A2DShXRSRlFefMI90ZXL6jwNNCeoeJjHYEysTNjQTbRw qvt4YjRFpzO4jrh0PtYD/1ZOCY6V6TUSK0RUMCX9CyQpuxVgEfxdN8abdGFOiofOhYU8 H9Kth0dYEuGOp8ffqLC3AQyfoJvusP89v0ic3xua9T1yQkbuimWNOCjOBeW6MWtZN5II VYAQ== X-Gm-Message-State: AOAM532y4mfmNG8Wrb2OGQcPKoabyQriHYL9eGV3lgbi/vCb7lGeWhIz qyzFAqnAndHkuLgBE111tsEfjsJtc50= X-Google-Smtp-Source: ABdhPJwEtqnzbEy+UfVJrxcvby7Cl1eoPo5Bo7VZdVD90J5GCCAf3KuvACge5DQ0YdnKBAVDu4RuZQ== X-Received: by 2002:adf:f452:: with SMTP id f18mr6657767wrp.11.1611676896769; Tue, 26 Jan 2021 08:01:36 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id n15sm9285011wrx.2.2021.01.26.08.01.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 26 Jan 2021 08:01:36 -0800 (PST) Message-Id: <909ca28e0ba23bed307a01e8851f9132581417b7.1611676886.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 26 Jan 2021 16:01:18 +0000 Subject: [PATCH 09/17] midx: return success/failure in chunk write methods Fcc: 
Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee

Historically, the chunk-writing methods in midx.c have returned the
amount of data written so the writer method could compare this with the
table of contents. This presents some interesting issues:

1. If a chunk writing method has a bug that miscalculates the written
   bytes, then we can satisfy the table of contents without actually
   writing the right amount of data to the hashfile. The commit-graph
   writing code checks the hashfile struct directly for a more robust
   verification.

2. There is no way for a chunk writing method to gracefully fail.
   Returning an int presents an opportunity to fail without a die().

3. The current pattern doesn't match the chunk_write_fn type exactly,
   so we cannot share code with commit-graph.c.

For these reasons, convert the midx chunk writer methods to return an
'int'. Since none of them fail at the moment, they all return 0.
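The first two points can be illustrated with a small model. This is a sketch under stated assumptions, not the hashfile implementation: "struct toy_file", toy_write() and toy_write_chunks() are hypothetical, the 16-byte buffer is arbitrary, and the driver returns -1 on a mismatch where the real code calls BUG(). The idea it models is that the current position is derived from the file state (bytes flushed plus bytes buffered) instead of from a value the writer reports about itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical miniature of a buffered writer: "total" counts bytes
 * already flushed, "offset" counts bytes still sitting in the buffer.
 */
struct toy_file {
	uint64_t total;
	uint32_t offset;
	unsigned char buf[16];
};

static void toy_write(struct toy_file *f, const void *p, uint32_t len)
{
	const unsigned char *src = p;

	while (len) {
		uint32_t room = (uint32_t)sizeof(f->buf) - f->offset;
		uint32_t n = len < room ? len : room;

		memcpy(f->buf + f->offset, src, n);
		f->offset += n;
		src += n;
		len -= n;
		if (f->offset == sizeof(f->buf)) {
			/* pretend to flush the full buffer */
			f->total += f->offset;
			f->offset = 0;
		}
	}
}

/* Writers report success (0) or failure (non-zero), not a byte count. */
typedef int (*toy_write_fn)(struct toy_file *f, void *data);

struct toy_chunk {
	toy_write_fn fn;
	uint64_t size;	/* size promised in the table of contents */
};

/*
 * Returns 0 on success, or -1 if a writer fails or the file position
 * did not advance by exactly the promised size.
 */
static int toy_write_chunks(struct toy_file *f,
			    const struct toy_chunk *chunks, int nr,
			    void *data)
{
	int i;

	for (i = 0; i < nr; i++) {
		uint64_t start = f->total + f->offset;

		if (chunks[i].fn(f, data))
			return -1;
		if (f->total + f->offset - start != chunks[i].size)
			return -1;
	}
	return 0;
}

/* Example writer: always emits 20 zero bytes. */
static int toy_write_20(struct toy_file *f, void *data)
{
	unsigned char zeros[20] = { 0 };

	(void)data;
	toy_write(f, zeros, sizeof(zeros));
	return 0;
}
```

A writer that promises 24 bytes but emits 20 is caught by the position check even though it returned 0, which is the kind of mismatch a self-reported byte count can hide.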
Signed-off-by: Derrick Stolee
---
 midx.c | 63 +++++++++++++++++++++++++---------------------------------
 1 file changed, 27 insertions(+), 36 deletions(-)

diff --git a/midx.c b/midx.c
index 40b815f8877..852dd5b776e 100644
--- a/midx.c
+++ b/midx.c
@@ -650,7 +650,7 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 	return deduplicated_entries;
 }

-static size_t write_midx_pack_names(struct hashfile *f, void *data)
+static int write_midx_pack_names(struct hashfile *f, void *data)
 {
 	struct write_midx_context *ctx = (struct write_midx_context *)data;
 	uint32_t i;
@@ -678,14 +678,13 @@ static size_t write_midx_pack_names(struct hashfile *f, void *data)
 	if (i < MIDX_CHUNK_ALIGNMENT) {
 		memset(padding, 0, sizeof(padding));
 		hashwrite(f, padding, i);
-		written += i;
 	}

-	return written;
+	return 0;
 }

-static size_t write_midx_oid_fanout(struct hashfile *f,
-				    void *data)
+static int write_midx_oid_fanout(struct hashfile *f,
+				 void *data)
 {
 	struct write_midx_context *ctx = (struct write_midx_context *)data;
 	struct pack_midx_entry *list = ctx->entries;
@@ -710,17 +709,16 @@ static size_t write_midx_oid_fanout(struct hashfile *f,
 		list = next;
 	}

-	return MIDX_CHUNK_FANOUT_SIZE;
+	return 0;
 }

-static size_t write_midx_oid_lookup(struct hashfile *f,
-				    void *data)
+static int write_midx_oid_lookup(struct hashfile *f,
+				 void *data)
 {
 	struct write_midx_context *ctx = (struct write_midx_context *)data;
 	unsigned char hash_len = the_hash_algo->rawsz;
 	struct pack_midx_entry *list = ctx->entries;
 	uint32_t i;
-	size_t written = 0;

 	for (i = 0; i < ctx->entries_nr; i++) {
 		struct pack_midx_entry *obj = list++;
@@ -734,19 +732,17 @@ static size_t write_midx_oid_lookup(struct hashfile *f,
 		}

 		hashwrite(f, obj->oid.hash, (int)hash_len);
-		written += hash_len;
 	}

-	return written;
+	return 0;
 }

-static size_t write_midx_object_offsets(struct hashfile *f,
-					void *data)
+static int write_midx_object_offsets(struct hashfile *f,
+				     void *data)
 {
 	struct write_midx_context *ctx = (struct write_midx_context *)data;
 	struct pack_midx_entry *list = ctx->entries;
 	uint32_t i, nr_large_offset = 0;
-	size_t written = 0;

 	for (i = 0; i < ctx->entries_nr; i++) {
 		struct pack_midx_entry *obj = list++;
@@ -766,20 +762,17 @@ static size_t write_midx_object_offsets(struct hashfile *f,
 			    obj->offset);
 		else
 			hashwrite_be32(f, (uint32_t)obj->offset);
-
-		written += MIDX_CHUNK_OFFSET_WIDTH;
 	}

-	return written;
+	return 0;
 }

-static size_t write_midx_large_offsets(struct hashfile *f,
-				       void *data)
+static int write_midx_large_offsets(struct hashfile *f,
+				    void *data)
 {
 	struct write_midx_context *ctx = (struct write_midx_context *)data;
 	struct pack_midx_entry *list = ctx->entries;
 	struct pack_midx_entry *end = ctx->entries + ctx->entries_nr;
-	size_t written = 0;
 	uint32_t nr_large_offset = ctx->num_large_offsets;

 	while (nr_large_offset) {
@@ -795,12 +788,12 @@ static size_t write_midx_large_offsets(struct hashfile *f,
 		if (!(offset >> 31))
 			continue;

-		written += hashwrite_be64(f, offset);
+		hashwrite_be64(f, offset);

 		nr_large_offset--;
 	}

-	return written;
+	return 0;
 }

 static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
@@ -812,7 +805,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	struct hashfile *f = NULL;
 	struct lock_file lk;
 	struct write_midx_context ctx = { 0 };
-	uint64_t written = 0;
+	uint64_t header_size = 0;
 	uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1];
 	uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1];
 	struct progress *progress = NULL;
@@ -940,10 +933,10 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		goto cleanup;
 	}

-	written = write_midx_header(f, num_chunks, ctx.nr - dropped_packs);
+	header_size = write_midx_header(f, num_chunks, ctx.nr - dropped_packs);

 	chunk_ids[cur_chunk] = MIDX_CHUNKID_PACKNAMES;
-	chunk_offsets[cur_chunk] = written + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH;
+	chunk_offsets[cur_chunk] = header_size + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH;

 	cur_chunk++;
 	chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDFANOUT;
@@ -981,39 +974,37 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *

 		hashwrite_be32(f, chunk_ids[i]);
 		hashwrite_be64(f, chunk_offsets[i]);
-
-		written += MIDX_CHUNKLOOKUP_WIDTH;
 	}

 	if (flags & MIDX_PROGRESS)
 		progress = start_delayed_progress(_("Writing chunks to multi-pack-index"),
 					  num_chunks);
 	for (i = 0; i < num_chunks; i++) {
-		if (written != chunk_offsets[i])
+		if (f->total + f->offset != chunk_offsets[i])
 			BUG("incorrect chunk offset (%"PRIu64" != %"PRIu64") for chunk id %"PRIx32,
 			    chunk_offsets[i],
-			    written,
+			    f->total + f->offset,
 			    chunk_ids[i]);

 		switch (chunk_ids[i]) {
 			case MIDX_CHUNKID_PACKNAMES:
-				written += write_midx_pack_names(f, &ctx);
+				write_midx_pack_names(f, &ctx);
 				break;

 			case MIDX_CHUNKID_OIDFANOUT:
-				written += write_midx_oid_fanout(f, &ctx);
+				write_midx_oid_fanout(f, &ctx);
 				break;

 			case MIDX_CHUNKID_OIDLOOKUP:
-				written += write_midx_oid_lookup(f, &ctx);
+				write_midx_oid_lookup(f, &ctx);
 				break;

 			case MIDX_CHUNKID_OBJECTOFFSETS:
-				written += write_midx_object_offsets(f, &ctx);
+				write_midx_object_offsets(f, &ctx);
 				break;

 			case MIDX_CHUNKID_LARGEOFFSETS:
-				written += write_midx_large_offsets(f, &ctx);
+				write_midx_large_offsets(f, &ctx);
 				break;

 			default:
@@ -1025,9 +1016,9 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	}
 	stop_progress(&progress);

-	if (written != chunk_offsets[num_chunks])
+	if (f->total + f->offset != chunk_offsets[num_chunks])
 		BUG("incorrect final offset %"PRIu64" != %"PRIu64,
-		    written,
+		    f->total + f->offset,
 		    chunk_offsets[num_chunks]);

 	finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM);

From patchwork Tue Jan 26 16:01:19 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12047301 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F38F7C433E9 for ; Tue, 26 Jan 2021 16:06:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BACD32229C for ; Tue, 26 Jan 2021 16:06:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2404633AbhAZQGQ (ORCPT ); Tue, 26 Jan 2021 11:06:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37722 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2404626AbhAZQC7 (ORCPT ); Tue, 26 Jan 2021 11:02:59 -0500 Received: from mail-wr1-x430.google.com (mail-wr1-x430.google.com [IPv6:2a00:1450:4864:20::430]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F3FEBC0698C4 for ; Tue, 26 Jan 2021 08:01:38 -0800 (PST) Received: by mail-wr1-x430.google.com with SMTP id 6so17019518wri.3 for ; Tue, 26 Jan 2021 08:01:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=K4ZTH2mLu/dyy12r/5xQIKPI71QJPC6RYXCaS+e8/EQ=; b=rXgUGnBhw3765Up8gtwAx7APSHm8Ie/1NGZUoGyAqFtM6gC4XmJR9UGWK/VTmXJQ93 kgusDOyueuRhuHLDB/brUP7lyyNVBbOJ+w2jUb5RCrbdIK3jW+0zvs9FgCDuoltvKtjx DvmzR5jIjKeqKA5oBz5MS3BerqHXNk+SvJAXK+A5i/eOZH3IpInwcfQyqKEeYh6VbcYS SBAw2EmJK3QMyVirxtQp3ZqGtGcHEAFLTt5+GtQY0e/rE4bIVQhHhT+Is6FA3dd6iPqk geNp/RjPvRP2VRp5NlJapdHm30LwiE/xNm16wycFWmN9J+DjvmT4f/fXcqVYCyl9fs2I GMWg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=K4ZTH2mLu/dyy12r/5xQIKPI71QJPC6RYXCaS+e8/EQ=; b=fSu5fpmQfLEtjOIkjOUmONpYLV+/MNpoG4gTilWiynKWK6PXiZ1WZVvT+K4B+ElRRM enyZAQXbvkcNNczfvE+K3Tl1Tyga/QVizn0yqj3hy8sdqb1rD3vD5xc3VLhFdm0XAYHb ag9X6xMiHoi35fyAejWeLSb9HSlH6z8XAw942xuu0wQuhRDN+Tcym+3F1/En1CO4p209 wGLKM/lIZn44vonV3S6ujLDqymoJzclHJKOpEVc/95LZf+FL1pre84p+1ELZ9Bj1asaD jekooQNndTmCiV1upWzWft1vRPhv/DBlX1EYqA++VDHADEEr6+EuQho822CFPuGX+XvE uRoA== X-Gm-Message-State: AOAM533wr1mdtKZXVHB/SkC6TwZxe9QHBnQF4OtMhFk3hLBgEalcTASZ kj++ZgUK308DbfBm0AwLbSokjovhoJM= X-Google-Smtp-Source: ABdhPJxC314SVajJ7YkPkqHPOBf1df5v4+u7mGJmvo7DmDuUS+06uabIC29oWKLTSjXp6H0m2PdOOw== X-Received: by 2002:adf:efc2:: with SMTP id i2mr6621398wrp.422.1611676897635; Tue, 26 Jan 2021 08:01:37 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id m184sm3518530wmf.12.2021.01.26.08.01.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 26 Jan 2021 08:01:37 -0800 (PST) Message-Id: In-Reply-To: References: Date: Tue, 26 Jan 2021 16:01:19 +0000 Subject: [PATCH 10/17] midx: drop chunk progress during write Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee Most expensive operations in write_midx_internal() use the context struct's progress member, and these indicate the process of the expensive operations within the chunk writing methods. However, there is a competing progress struct that counts the progress over all chunks. This is not very helpful compared to the others, so drop it. This also reduces our barriers to combining the chunk writing code with chunk-format.c. 
Signed-off-by: Derrick Stolee
---
 midx.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/midx.c b/midx.c
index 852dd5b776e..145c6bd0913 100644
--- a/midx.c
+++ b/midx.c
@@ -808,7 +808,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	uint64_t header_size = 0;
 	uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1];
 	uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1];
-	struct progress *progress = NULL;
 	int pack_name_concat_len = 0;
 	int dropped_packs = 0;
 	int result = 0;
@@ -976,9 +975,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		hashwrite_be64(f, chunk_offsets[i]);
 	}

-	if (flags & MIDX_PROGRESS)
-		progress = start_delayed_progress(_("Writing chunks to multi-pack-index"),
-					  num_chunks);
 	for (i = 0; i < num_chunks; i++) {
 		if (f->total + f->offset != chunk_offsets[i])
 			BUG("incorrect chunk offset (%"PRIu64" != %"PRIu64") for chunk id %"PRIx32,
@@ -1011,10 +1007,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 				BUG("trying to write unknown chunk id %"PRIx32,
 				    chunk_ids[i]);
 		}
-
-		display_progress(progress, i + 1);
 	}
-	stop_progress(&progress);

 	if (f->total + f->offset != chunk_offsets[num_chunks])
 		BUG("incorrect final offset %"PRIu64" != %"PRIu64,

From patchwork Tue Jan 26 16:01:20 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12047293 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with
ESMTP id 6303CC433DB for ; Tue, 26 Jan 2021 16:05:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2B8AE2220B for ; Tue, 26 Jan 2021 16:05:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391403AbhAZQFW (ORCPT ); Tue, 26 Jan 2021 11:05:22 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37724 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2404631AbhAZQC7 (ORCPT ); Tue, 26 Jan 2021 11:02:59 -0500 Received: from mail-wr1-x42a.google.com (mail-wr1-x42a.google.com [IPv6:2a00:1450:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EB6EBC0698C5 for ; Tue, 26 Jan 2021 08:01:39 -0800 (PST) Received: by mail-wr1-x42a.google.com with SMTP id b5so17009326wrr.10 for ; Tue, 26 Jan 2021 08:01:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=fPA9Pt/h9TBeyQYflA9V7ZzZ1iz8PP16eANqeD9X+5E=; b=YWioCQX9FLdSdP85feDEr0ZAvzJCwQvAHlqrZy4M7WcBrejTne0l5NapI2qd9ivL1X ZietkkecbKUjOykzonUVzu2hIk8gThxCmzEP83EdaaH8PElHlnrX7Pu7f6wpjiPVWO+4 4ge6Ocl+EpIdf6J4y4QlxsPk8hEL8y/UmHqLypxnXjWzFGiLhjSklJDhxlFVb+ojO1Ib ic4gox9dcHvCsCJymQEXb0NV+mxkkS36B0TSB6W1LX7i3LtyHmr7h03tenOL2KsK1JDo 0g64yzls0fgsxE8mkuZhCKb+XmzBf7YvSUNJ5RWKtgSDKK76Hh5iMg3GF8ydLCQi7u/t WxLw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=fPA9Pt/h9TBeyQYflA9V7ZzZ1iz8PP16eANqeD9X+5E=; b=T/KTD+b2zeneK3QucHbeGj5Pi49vsxQ0CIPJa7Zn3jjpMva6uJR13TTlkDmb2nW8Hj +N2Qnj2LkVjd81cK4BNrSgSid0P5lnZ7Fq9Frop0fXN9oXlR87hKOjnbVHHaeN65WXTG yasPeC+fqgNqUBFgV7PcHYDvvzxXKjaPIbnjk0bkhck/WVwP2NIV2oCv9pIcjaNfktRo 
XZtiwI6Zk5keZcAJCluonK/+ep3IV+kqyu4l/hnws5S7CukMpJXC9WH2ZYXe4F6KrLiv qoc7WZYgmVAJChQGrGfg6DuwNzS3GXElA6/l7SOLC5MTNRL5WbGVE+vFO3hpyyaPQ8wB LRvQ== X-Gm-Message-State: AOAM5301PeYcUpcuPomBBnHkjGrrTGuitd2uPhcIrRgAROGJJEeFs2r4 rOq4iRLo4udx/hNCVHyRmOa1FnJzEcM= X-Google-Smtp-Source: ABdhPJz67AspyJ9vQ/8uQ6n/Dx8KV2s0IhKFMOfpnTvtZy9Zj77TFZ6EdpSOBn6dRAnNQTjYpUVD4Q== X-Received: by 2002:a5d:6282:: with SMTP id k2mr6973561wru.159.1611676898508; Tue, 26 Jan 2021 08:01:38 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id 15sm3693929wmk.3.2021.01.26.08.01.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 26 Jan 2021 08:01:38 -0800 (PST) Message-Id: <49cfb4f63e275ce70b20dd6d3f156971b33ddcec.1611676886.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 26 Jan 2021 16:01:20 +0000 Subject: [PATCH 11/17] midx: use chunk-format API in write_midx_internal() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The chunk-format API allows writing the table of contents and all chunks using the anonymous 'struct chunkfile' type. We only need to convert our local chunk logic to this API for the multi-pack-index writes to share that logic with the commit-graph file writes. 
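The offset arithmetic that this conversion stops doing by hand can be sketched on its own. This is a minimal sketch, not the chunk-format.c implementation: "struct toy_chunkfile", toy_add_chunk() and toy_table_of_contents() are hypothetical names, the fixed capacity of 8 is arbitrary, and the 12-byte entry width assumes a 4-byte id followed by an 8-byte offset, as written by hashwrite_be32() and hashwrite_be64() in the removed loop. Each chunk is registered with its size, and the offsets fall out of a running sum that starts after the header and the table of contents itself.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_CHUNKS 8
#define TOY_TOC_ENTRY_WIDTH 12	/* assumed: 4-byte id + 8-byte offset */

/* Hypothetical stand-in for the opaque 'struct chunkfile'. */
struct toy_chunkfile {
	uint32_t ids[TOY_MAX_CHUNKS];
	uint64_t sizes[TOY_MAX_CHUNKS];
	int nr;
};

/* Register a chunk; returns -1 if the toy capacity is exhausted. */
static int toy_add_chunk(struct toy_chunkfile *cf, uint32_t id, uint64_t size)
{
	if (cf->nr >= TOY_MAX_CHUNKS)
		return -1;
	cf->ids[cf->nr] = id;
	cf->sizes[cf->nr] = size;
	cf->nr++;
	return 0;
}

/*
 * Fill toc_ids/toc_offsets with cf->nr + 1 entries. The last entry is
 * the terminator: id 0 and the offset where the final chunk ends.
 */
static void toy_table_of_contents(const struct toy_chunkfile *cf,
				  uint64_t header_size,
				  uint32_t *toc_ids, uint64_t *toc_offsets)
{
	uint64_t cur = header_size +
		       (uint64_t)(cf->nr + 1) * TOY_TOC_ENTRY_WIDTH;
	int i;

	for (i = 0; i < cf->nr; i++) {
		toc_ids[i] = cf->ids[i];
		toc_offsets[i] = cur;
		cur += cf->sizes[i];
	}
	toc_ids[cf->nr] = 0;
	toc_offsets[cf->nr] = cur;
}
```

Because the number of chunks is only known after every add call, the header (which records the chunk count) has to be written after the chunks are registered, which matches the order used with get_num_chunks() in the diff.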
Signed-off-by: Derrick Stolee
---
 midx.c | 104 +++++++++++----------------------------------------------
 1 file changed, 19 insertions(+), 85 deletions(-)

diff --git a/midx.c b/midx.c
index 145c6bd0913..0bfd2d802b6 100644
--- a/midx.c
+++ b/midx.c
@@ -11,6 +11,7 @@
 #include "trace2.h"
 #include "run-command.h"
 #include "repository.h"
+#include "chunk-format.h"

 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
 #define MIDX_VERSION 1
@@ -799,18 +800,15 @@ static int write_midx_large_offsets(struct hashfile *f,
 static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
 			       struct string_list *packs_to_drop, unsigned flags)
 {
-	unsigned char cur_chunk, num_chunks = 0;
 	char *midx_name;
 	uint32_t i;
 	struct hashfile *f = NULL;
 	struct lock_file lk;
 	struct write_midx_context ctx = { 0 };
-	uint64_t header_size = 0;
-	uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1];
-	uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1];
 	int pack_name_concat_len = 0;
 	int dropped_packs = 0;
 	int result = 0;
+	struct chunkfile *cf;

 	midx_name = get_midx_filename(object_dir);
 	if (safe_create_leading_directories(midx_name))
@@ -923,98 +921,34 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	if (ctx.m)
 		close_midx(ctx.m);

-	cur_chunk = 0;
-	num_chunks = ctx.large_offsets_needed ? 5 : 4;
-
 	if (ctx.nr - dropped_packs == 0) {
 		error(_("no pack files to index."));
 		result = 1;
 		goto cleanup;
 	}

-	header_size = write_midx_header(f, num_chunks, ctx.nr - dropped_packs);
-
-	chunk_ids[cur_chunk] = MIDX_CHUNKID_PACKNAMES;
-	chunk_offsets[cur_chunk] = header_size + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH;
-
-	cur_chunk++;
-	chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDFANOUT;
-	chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + pack_name_concat_len;
-
-	cur_chunk++;
-	chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP;
-	chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + MIDX_CHUNK_FANOUT_SIZE;
-
-	cur_chunk++;
-	chunk_ids[cur_chunk] = MIDX_CHUNKID_OBJECTOFFSETS;
-	chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * the_hash_algo->rawsz;
-
-	cur_chunk++;
-	chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH;
-	if (ctx.large_offsets_needed) {
-		chunk_ids[cur_chunk] = MIDX_CHUNKID_LARGEOFFSETS;
-
-		cur_chunk++;
-		chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] +
-					   ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH;
-	}
-
-	chunk_ids[cur_chunk] = 0;
-
-	for (i = 0; i <= num_chunks; i++) {
-		if (i && chunk_offsets[i] < chunk_offsets[i - 1])
-			BUG("incorrect chunk offsets: %"PRIu64" before %"PRIu64,
-			    chunk_offsets[i - 1],
-			    chunk_offsets[i]);
-
-		if (chunk_offsets[i] % MIDX_CHUNK_ALIGNMENT)
-			BUG("chunk offset %"PRIu64" is not properly aligned",
-			    chunk_offsets[i]);
-
-		hashwrite_be32(f, chunk_ids[i]);
-		hashwrite_be64(f, chunk_offsets[i]);
-	}
-
-	for (i = 0; i < num_chunks; i++) {
-		if (f->total + f->offset != chunk_offsets[i])
-			BUG("incorrect chunk offset (%"PRIu64" != %"PRIu64") for chunk id %"PRIx32,
-			    chunk_offsets[i],
-			    f->total + f->offset,
-			    chunk_ids[i]);
+	cf = init_chunkfile(f);

-		switch (chunk_ids[i]) {
-			case MIDX_CHUNKID_PACKNAMES:
-				write_midx_pack_names(f, &ctx);
-				break;
+	add_chunk(cf, MIDX_CHUNKID_PACKNAMES,
+		  write_midx_pack_names, pack_name_concat_len);
+	add_chunk(cf, MIDX_CHUNKID_OIDFANOUT,
+		  write_midx_oid_fanout, MIDX_CHUNK_FANOUT_SIZE);
+	add_chunk(cf, MIDX_CHUNKID_OIDLOOKUP,
+		  write_midx_oid_lookup, ctx.entries_nr * the_hash_algo->rawsz);
+	add_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS,
+		  write_midx_object_offsets,
+		  ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH);

-			case MIDX_CHUNKID_OIDFANOUT:
-				write_midx_oid_fanout(f, &ctx);
-				break;
-
-			case MIDX_CHUNKID_OIDLOOKUP:
-				write_midx_oid_lookup(f, &ctx);
-				break;
-
-			case MIDX_CHUNKID_OBJECTOFFSETS:
-				write_midx_object_offsets(f, &ctx);
-				break;
-
-			case MIDX_CHUNKID_LARGEOFFSETS:
-				write_midx_large_offsets(f, &ctx);
-				break;
-
-			default:
-				BUG("trying to write unknown chunk id %"PRIx32,
-				    chunk_ids[i]);
-		}
-	}
+	if (ctx.large_offsets_needed)
+		add_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS,
+			  write_midx_large_offsets,
+			  ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH);

-	if (f->total + f->offset != chunk_offsets[num_chunks])
-		BUG("incorrect final offset %"PRIu64" != %"PRIu64,
-		    f->total + f->offset,
-		    chunk_offsets[num_chunks]);
+	write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
+	write_chunkfile(cf, &ctx);

 	finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
+	free_chunkfile(cf);
 	commit_lock_file(&lk);

 cleanup:

From patchwork Tue Jan 26 16:01:21 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12047311 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix)
with ESMTP id E4B8EC433E0 for ; Tue, 26 Jan 2021 16:09:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A85C12082D for ; Tue, 26 Jan 2021 16:09:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2404380AbhAZQGI (ORCPT ); Tue, 26 Jan 2021 11:06:08 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37728 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2404633AbhAZQC7 (ORCPT ); Tue, 26 Jan 2021 11:02:59 -0500 Received: from mail-wr1-x42c.google.com (mail-wr1-x42c.google.com [IPv6:2a00:1450:4864:20::42c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D624CC0698C6 for ; Tue, 26 Jan 2021 08:01:40 -0800 (PST) Received: by mail-wr1-x42c.google.com with SMTP id 7so17030706wrz.0 for ; Tue, 26 Jan 2021 08:01:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=IVzb1Uw8OscK/sUGg6fnMfB7FooZ5hvv1wQJ6LJPxtc=; b=E/bnB1tABI+6hzYIT3mOfJbaiqWTVa6bSJE4vg3MxIEEh0c3yMDOtAjP+lqprmfy3K mkOnHC9P7X22DkKlQLQz/358o+SFwj6oOKoJeOPcCTmVIVRZ/sEm9jjkOsO26nsfXPDz zPXkL4zT5ann1AqYKXT0jDZBt1SI6Ppqk32hT5zC+mwgsg4phLh+B3qmZPgN/S6t5YL6 zoaY5Q56wTyJHkR5tIh8qce/QHwHw7zkLqRN2WbjkTRHrU96pNqoW9ivup7iYSIGuwHe 5hhcXWOZi5ssd7VDXI+I+YLlebHajiq4iPWOoMZe5BVB4yfmASI3cBVBgmfy6W+78As+ 7mmA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=IVzb1Uw8OscK/sUGg6fnMfB7FooZ5hvv1wQJ6LJPxtc=; b=OpdrjBAaDa4dTX5RargIZQ9YcpuA6tdBwWa38mun0q0667MJ/9TvA+sZWVwH7/cszr bQ7YmK7xIWG/rZd+YbRKAfi4mZtBA75ZVzSSbO1j3/cVLtp/aHw/kunbVimUPXhPs0Hk FojQFg0RGNgOoPJRzlj+IDnY3ogPkjQWfSh30icMRT4/Sl26u/KQEM9Gw2/TQ0RCH2Xg 
qPV9qXR2ClUS2bTL26XnnGkOvdSh8t6I7YSOKBqFR4SLSxwZN7X3boQq8IXg+TEUqSVi 2GpywKILofHX//IOZ2kVaXYb+e/kmPC3yPXmxjKr9mFZ8HDTCHrn7gw93kyq7buWDnJB Y39Q== X-Gm-Message-State: AOAM530c2xAxnAwUAL8TaZKhC+jmpbFMU6epXaozL0phnFS3igDaMHXY Q5Otul9vxzpl3I4lurzZZsXOfnHi9PM= X-Google-Smtp-Source: ABdhPJwtWWc57xHBhOlce1m/GovXvyMHlLMWqlOAC+HRsQjbkX16hZd/pRe7pYVOdyY83vOja4B8Xw== X-Received: by 2002:adf:d1cb:: with SMTP id b11mr6984727wrd.118.1611676899395; Tue, 26 Jan 2021 08:01:39 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id w25sm3136289wmc.42.2021.01.26.08.01.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 26 Jan 2021 08:01:38 -0800 (PST) Message-Id: In-Reply-To: References: Date: Tue, 26 Jan 2021 16:01:21 +0000 Subject: [PATCH 12/17] chunk-format: create read chunk API Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee

Add the capability to read the table of contents, then pair the chunks
with necessary logic using read_chunk_fn pointers. Callers will be added
in future changes, but the typical outline will be:

 1. initialize a 'struct chunkfile' with init_chunkfile(NULL).
 2. call read_table_of_contents().
 3. for each chunk to parse, call pair_chunk() with appropriate
    pointers.
 4. call free_chunkfile() to clear the 'struct chunkfile' data.

We are re-using the anonymous 'struct chunkfile' data, as it is internal
to the chunk-format API. This gives it essentially two modes: write and
read. If the same struct instance was used for both reads and writes,
then there would be failures.
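Steps 2 and 3 of the outline above can be sketched against an in-memory buffer. This is a simplified model, not the chunk-format.c code: every toy_* name is hypothetical, the table is capped at 8 chunks instead of growing dynamically, every error returns -1, and the bounds check compares against the whole buffer size without reserving room for a trailing checksum. It assumes each table entry is a 4-byte big-endian id followed by an 8-byte big-endian offset, with a final entry whose id is 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TOC_ENTRY_WIDTH 12	/* assumed: 4-byte id + 8-byte offset */
#define TOY_MAX_CHUNKS 8
#define TOY_CHUNK_NOT_FOUND (-2)

static uint32_t toy_get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t toy_get_be64(const unsigned char *p)
{
	return ((uint64_t)toy_get_be32(p) << 32) | toy_get_be32(p + 4);
}

struct toy_chunk {
	uint32_t id;
	const unsigned char *start;
	uint64_t size;
};

struct toy_toc {
	struct toy_chunk chunks[TOY_MAX_CHUNKS];
	int nr;
};

typedef int (*toy_read_fn)(const unsigned char *start, size_t size,
			   void *data);

/* Parse toc_length entries plus the terminator; 0 on success. */
static int toy_read_toc(struct toy_toc *toc, const unsigned char *file,
			size_t file_size, uint64_t toc_offset, int toc_length)
{
	const unsigned char *p = file + toc_offset;

	toc->nr = 0;
	if (toc_length < 0 || toc_length > TOY_MAX_CHUNKS)
		return -1;
	if (toc_offset + (uint64_t)(toc_length + 1) * TOY_TOC_ENTRY_WIDTH >
	    file_size)
		return -1;

	while (toc_length--) {
		uint32_t id = toy_get_be32(p);
		uint64_t off = toy_get_be64(p + 4), next;

		if (!id)
			return -1;	/* terminator came too early */
		p += TOY_TOC_ENTRY_WIDTH;
		next = toy_get_be64(p + 4);
		if (next < off || next > file_size)
			return -1;	/* offsets out of order or range */

		toc->chunks[toc->nr].id = id;
		toc->chunks[toc->nr].start = file + off;
		toc->chunks[toc->nr].size = next - off;
		toc->nr++;
	}
	return toy_get_be32(p) ? -1 : 0;
}

/* Hand the matching chunk to fn, or report that it is absent. */
static int toy_pair_chunk(const struct toy_toc *toc, uint32_t id,
			  toy_read_fn fn, void *data)
{
	int i;

	for (i = 0; i < toc->nr; i++)
		if (toc->chunks[i].id == id)
			return fn(toc->chunks[i].start,
				  (size_t)toc->chunks[i].size, data);
	return TOY_CHUNK_NOT_FOUND;
}

/* Example callback: record the chunk size in a uint64_t. */
static int toy_record_size(const unsigned char *start, size_t size,
			   void *data)
{
	(void)start;
	*(uint64_t *)data = size;
	return 0;
}
```

Each chunk's size is derived from the next entry's offset, which is why the terminating entry carries a meaningful offset even though its id is zero. A distinct "not found" value lets a caller treat an optional chunk differently from a callback failure.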
Signed-off-by: Derrick Stolee
---
 chunk-format.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++
 chunk-format.h | 21 +++++++++++++++++
 2 files changed, 85 insertions(+)

diff --git a/chunk-format.c b/chunk-format.c
index 2ce37ecc6bb..674d31d5e58 100644
--- a/chunk-format.c
+++ b/chunk-format.c
@@ -12,6 +12,8 @@ struct chunk_info {
 	uint32_t id;
 	uint64_t size;
 	chunk_write_fn write_fn;
+
+	const void *start;
 };

 struct chunkfile {
@@ -89,3 +91,65 @@ int write_chunkfile(struct chunkfile *cf, void *data)

 	return 0;
 }
+
+int read_table_of_contents(struct chunkfile *cf,
+			   const unsigned char *mfile,
+			   size_t mfile_size,
+			   uint64_t toc_offset,
+			   int toc_length)
+{
+	uint32_t chunk_id;
+	const unsigned char *table_of_contents = mfile + toc_offset;
+
+	ALLOC_GROW(cf->chunks, toc_length, cf->chunks_alloc);
+
+	while (toc_length--) {
+		uint64_t chunk_offset, next_chunk_offset;
+
+		chunk_id = get_be32(table_of_contents);
+		chunk_offset = get_be64(table_of_contents + 4);
+
+		if (!chunk_id) {
+			error(_("terminating chunk id appears earlier than expected"));
+			return 1;
+		}
+
+		table_of_contents += CHUNK_LOOKUP_WIDTH;
+		next_chunk_offset = get_be64(table_of_contents + 4);
+
+		if (next_chunk_offset < chunk_offset ||
+		    next_chunk_offset > mfile_size - the_hash_algo->rawsz) {
+			error(_("improper chunk offset(s) %"PRIx64" and %"PRIx64""),
+			      chunk_offset, next_chunk_offset);
+			return -1;
+		}
+
+		cf->chunks[cf->chunks_nr].id = chunk_id;
+		cf->chunks[cf->chunks_nr].start = mfile + chunk_offset;
+		cf->chunks[cf->chunks_nr].size = next_chunk_offset - chunk_offset;
+		cf->chunks_nr++;
+	}
+
+	chunk_id = get_be32(table_of_contents);
+	if (chunk_id) {
+		error(_("final chunk has non-zero id %"PRIx32""), chunk_id);
+		return -1;
+	}
+
+	return 0;
+}
+
+int pair_chunk(struct chunkfile *cf,
+	       uint32_t chunk_id,
+	       chunk_read_fn fn,
+	       void *data)
+{
+	int i;
+
+	for (i = 0; i < cf->chunks_nr; i++) {
+		if (cf->chunks[i].id == chunk_id)
+			return fn(cf->chunks[i].start, cf->chunks[i].size, data);
+	}
+
+	return CHUNK_NOT_FOUND;
+}
diff --git a/chunk-format.h b/chunk-format.h
index bfaed672813..250e08b8e6a 100644
--- a/chunk-format.h
+++ b/chunk-format.h
@@ -17,4 +17,25 @@ void add_chunk(struct chunkfile *cf,
 	       size_t size);
 int write_chunkfile(struct chunkfile *cf, void *data);

+int read_table_of_contents(struct chunkfile *cf,
+			   const unsigned char *mfile,
+			   size_t mfile_size,
+			   uint64_t toc_offset,
+			   int toc_length);
+
+/*
+ * When reading a table of contents, we find the chunk with matching 'id'
+ * then call its read_fn to populate the necessary 'data' based on the
+ * chunk start and size.
+ */
+typedef int (*chunk_read_fn)(const unsigned char *chunk_start,
+			     size_t chunk_size, void *data);
+
+
+#define CHUNK_NOT_FOUND (-2)
+int pair_chunk(struct chunkfile *cf,
+	       uint32_t chunk_id,
+	       chunk_read_fn fn,
+	       void *data);
+
 #endif

From patchwork Tue Jan 26 16:01:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12047309 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 82CDAC433DB for ; Tue, 26 Jan 2021 16:07:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4E9492220B for ; Tue, 26 Jan 2021 16:07:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2404951AbhAZQGj (ORCPT ); Tue, 26 Jan 2021 11:06:39 -0500 Received: from lindbergh.monkeyblade.net
([23.128.96.19]:37602 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2404563AbhAZQCr (ORCPT ); Tue, 26 Jan 2021 11:02:47 -0500 Received: from mail-wm1-x32f.google.com (mail-wm1-x32f.google.com [IPv6:2a00:1450:4864:20::32f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 01C8AC0698C7 for ; Tue, 26 Jan 2021 08:01:42 -0800 (PST) Received: by mail-wm1-x32f.google.com with SMTP id c128so3246029wme.2 for ; Tue, 26 Jan 2021 08:01:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=g54d1hmA9HIwJVQpYPG3GOux5lOAMXVYgzBb/0D3228=; b=Pa/E98WgKQV5ShCJmWoM6qwvsUOSZwwz4T1yIhINJgssIWsYBbkmPnCDyEXfpsPWbY IUwwhngHrqHWmA8R/c3RonlKuhudLkTjMrJ2e7jPjFqjHeL4sYKBLdlccFABhehjScSt tGYaBKM+jh1KjZY7FAsKugiPJwEAknWQvb0LUjl3jFgwlQPATzMIPVMkZbPNEeGJ9xx8 pmlDrk6em8xXfo0miY0Hht4QBoLI68wmlNMkwtt3+sZfX15GVaAVf1DP2QWSxTnwpYOO APnpQjJZsnUgyGs8W5eM05ij1S71TzAy2eaOi+3o4eZ0/i2yI6UwNaYZ4+lLebxBnNw6 F4Nw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=g54d1hmA9HIwJVQpYPG3GOux5lOAMXVYgzBb/0D3228=; b=DGY/p6v5Tl7gJVL0gUq2tg882+64ZJ2wrWgi2paA8/j8fYPVWTtLICqv7htp5KaFcg YJ6ZwIF6q7d0nXmuelGD3KOqgVOU0Do1X1dRrm5Z2O3Bg5xutBG6gVRZKJayXZcTnx2Y fYKtrogHQJUKUBL+Zrs1DzAFlcymfIOc529x+KzK7FkZpTO7qehwx5mAhm9672tNk/dt fuWVEJ72fe81zEdI002wciepi//6vAFigt0BAxwHbytilEae76RpiE8T70ahnlU186b6 jf58+2AtXT7LvqibPwHYsmzfTWu80f7CiXn2e9g8vQgLvaITEB/70FjoDKYXQS7wkCw9 pW0A== X-Gm-Message-State: AOAM530KmJK4BnKhJWJvSV+lFzKIf1v8JHJOwdzMPy6uptMEMjoo43c7 0dVU+hgoAitpjj0MCA9tJ8ftDwXFJ5k= X-Google-Smtp-Source: ABdhPJzJfaO5xt3d8+luOstoZ3th1/bL4jVT1GNoOoHq1U+AP4jKAnc7qGY4gSE5iHZBcdC0oXBddA== X-Received: by 2002:a1c:808d:: with SMTP id b135mr343500wmd.157.1611676900537; Tue, 26 Jan 
Message-Id: <7339990f07db81897a30cb12e2d33ac1ef855d20.1611676886.git.gitgitgadget@gmail.com>
Date: Tue, 26 Jan 2021 16:01:22 +0000
Subject: [PATCH 13/17] commit-graph: use chunk-format read API
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

Instead of parsing the table of contents directly, use the chunk-format
API methods read_table_of_contents() and pair_chunk(). While the current
implementation loses the duplicate-chunk detection, that will be added
in a future change.

Signed-off-by: Derrick Stolee
---
 commit-graph.c          | 209 ++++++++++++++++++++--------------------
 t/t5318-commit-graph.sh |   2 +-
 2 files changed, 108 insertions(+), 103 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index b2c0f233eab..44c06d0fb67 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -299,15 +299,99 @@ static int verify_commit_graph_lite(struct commit_graph *g)
 	return 0;
 }
 
+static int graph_read_oid_fanout(const unsigned char *chunk_start,
+				 size_t chunk_size, void *data)
+{
+	struct commit_graph *g = (struct commit_graph *)data;
+	g->chunk_oid_fanout = (uint32_t*)chunk_start;
+	return 0;
+}
+
+static int graph_read_oid_lookup(const unsigned char *chunk_start,
+				 size_t chunk_size, void *data)
+{
+	struct commit_graph *g = (struct commit_graph *)data;
+	g->chunk_oid_lookup = chunk_start;
+	g->num_commits = chunk_size / g->hash_len;
+	return 0;
+}
+
+static int graph_read_data(const unsigned char *chunk_start,
+			   size_t chunk_size, void *data)
+{
+	struct commit_graph *g = (struct commit_graph *)data;
+	g->chunk_commit_data = chunk_start;
+	return 0;
+}
+
+static int graph_read_extra_edges(const unsigned char *chunk_start,
+				  size_t chunk_size, void *data)
+{
+	struct commit_graph *g = (struct commit_graph *)data;
+	g->chunk_extra_edges = chunk_start;
+	return 0;
+}
+
+static int graph_read_base_graphs(const unsigned char *chunk_start,
+				  size_t chunk_size, void *data)
+{
+	struct commit_graph *g = (struct commit_graph *)data;
+	g->chunk_base_graphs = chunk_start;
+	return 0;
+}
+
+static int graph_read_generation_data(const unsigned char *chunk_start,
+				      size_t chunk_size, void *data)
+{
+	struct commit_graph *g = (struct commit_graph *)data;
+	g->chunk_generation_data = chunk_start;
+	return 0;
+}
+
+static int graph_read_generation_overflow(const unsigned char *chunk_start,
+					  size_t chunk_size, void *data)
+{
+	struct commit_graph *g = (struct commit_graph *)data;
+	g->chunk_generation_data_overflow = chunk_start;
+	return 0;
+}
+
+static int graph_read_bloom_indices(const unsigned char *chunk_start,
+				    size_t chunk_size, void *data)
+{
+	struct commit_graph *g = (struct commit_graph *)data;
+	g->chunk_bloom_indexes = chunk_start;
+	return 0;
+}
+
+static int graph_read_bloom_data(const unsigned char *chunk_start,
+				 size_t chunk_size, void *data)
+{
+	struct commit_graph *g = (struct commit_graph *)data;
+	uint32_t hash_version;
+	g->chunk_bloom_data = chunk_start;
+	hash_version = get_be32(chunk_start);
+
+	if (hash_version != 1)
+		return 0;
+
+	g->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings));
+	g->bloom_filter_settings->hash_version = hash_version;
+	g->bloom_filter_settings->num_hashes = get_be32(chunk_start + 4);
+	g->bloom_filter_settings->bits_per_entry = get_be32(chunk_start + 8);
+	g->bloom_filter_settings->max_changed_paths = DEFAULT_BLOOM_MAX_CHANGES;
+
+	return 0;
+}
+
 struct commit_graph *parse_commit_graph(struct repository *r,
 					void *graph_map, size_t graph_size)
 {
-	const unsigned char *data, *chunk_lookup;
-	uint32_t i;
+	const unsigned char *data;
 	struct commit_graph *graph;
-	uint64_t next_chunk_offset;
 	uint32_t graph_signature;
 	unsigned char graph_version, hash_version;
+	struct chunkfile *cf = NULL;
 
 	if (!graph_map)
 		return NULL;
@@ -356,108 +440,27 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 		return NULL;
 	}
 
-	chunk_lookup = data + 8;
-	next_chunk_offset = get_be64(chunk_lookup + 4);
-	for (i = 0; i < graph->num_chunks; i++) {
-		uint32_t chunk_id;
-		uint64_t chunk_offset = next_chunk_offset;
-		int chunk_repeated = 0;
-
-		chunk_id = get_be32(chunk_lookup + 0);
-
-		chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
-		next_chunk_offset = get_be64(chunk_lookup + 4);
-
-		if (chunk_offset > graph_size - the_hash_algo->rawsz) {
-			error(_("commit-graph improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32),
-			      (uint32_t)chunk_offset);
-			goto free_and_return;
-		}
-
-		switch (chunk_id) {
-		case GRAPH_CHUNKID_OIDFANOUT:
-			if (graph->chunk_oid_fanout)
-				chunk_repeated = 1;
-			else
-				graph->chunk_oid_fanout = (uint32_t*)(data + chunk_offset);
-			break;
-
-		case GRAPH_CHUNKID_OIDLOOKUP:
-			if (graph->chunk_oid_lookup)
-				chunk_repeated = 1;
-			else {
-				graph->chunk_oid_lookup = data + chunk_offset;
-				graph->num_commits = (next_chunk_offset - chunk_offset)
-						     / graph->hash_len;
-			}
-			break;
-
-		case GRAPH_CHUNKID_DATA:
-			if (graph->chunk_commit_data)
-				chunk_repeated = 1;
-			else
-				graph->chunk_commit_data = data + chunk_offset;
-			break;
-
-		case GRAPH_CHUNKID_GENERATION_DATA:
-			if (graph->chunk_generation_data)
-				chunk_repeated = 1;
-			else
-				graph->chunk_generation_data = data + chunk_offset;
-			break;
+	cf = init_chunkfile(NULL);
 
-		case GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW:
-			if (graph->chunk_generation_data_overflow)
-				chunk_repeated = 1;
-			else
-				graph->chunk_generation_data_overflow = data + chunk_offset;
-			break;
-
-		case GRAPH_CHUNKID_EXTRAEDGES:
-			if (graph->chunk_extra_edges)
-				chunk_repeated = 1;
-			else
-				graph->chunk_extra_edges = data + chunk_offset;
-			break;
-
-		case GRAPH_CHUNKID_BASE:
-			if (graph->chunk_base_graphs)
-				chunk_repeated = 1;
-			else
-				graph->chunk_base_graphs = data + chunk_offset;
-			break;
-
-		case GRAPH_CHUNKID_BLOOMINDEXES:
-			if (graph->chunk_bloom_indexes)
-				chunk_repeated = 1;
-			else if (r->settings.commit_graph_read_changed_paths)
-				graph->chunk_bloom_indexes = data + chunk_offset;
-			break;
-
-		case GRAPH_CHUNKID_BLOOMDATA:
-			if (graph->chunk_bloom_data)
-				chunk_repeated = 1;
-			else if (r->settings.commit_graph_read_changed_paths) {
-				uint32_t hash_version;
-				graph->chunk_bloom_data = data + chunk_offset;
-				hash_version = get_be32(data + chunk_offset);
-
-				if (hash_version != 1)
-					break;
+	if (read_table_of_contents(cf, graph->data, graph_size,
+				   GRAPH_HEADER_SIZE, graph->num_chunks))
+		goto free_and_return;
 
-				graph->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings));
-				graph->bloom_filter_settings->hash_version = hash_version;
-				graph->bloom_filter_settings->num_hashes = get_be32(data + chunk_offset + 4);
-				graph->bloom_filter_settings->bits_per_entry = get_be32(data + chunk_offset + 8);
-				graph->bloom_filter_settings->max_changed_paths = DEFAULT_BLOOM_MAX_CHANGES;
-			}
-			break;
-		}
+	pair_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, graph_read_oid_fanout, graph);
+	pair_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph);
+	pair_chunk(cf, GRAPH_CHUNKID_DATA, graph_read_data, graph);
+	pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, graph_read_extra_edges, graph);
+	pair_chunk(cf, GRAPH_CHUNKID_BASE, graph_read_base_graphs, graph);
+	pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA,
+		graph_read_generation_data, graph);
+	pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW,
+		graph_read_generation_overflow, graph);
 
-		if (chunk_repeated) {
-			error(_("commit-graph chunk id %08x appears multiple times"), chunk_id);
-			goto free_and_return;
-		}
+	if (r->settings.commit_graph_read_changed_paths) {
+		pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
+			graph_read_bloom_indices, graph);
+		pair_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
+			graph_read_bloom_data, graph);
 	}
 
 	if (graph->chunk_bloom_indexes && graph->chunk_bloom_data) {
@@ -474,9 +477,11 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 	if (verify_commit_graph_lite(graph))
 		goto free_and_return;
 
+	free_chunkfile(cf);
 	return graph;
 
 free_and_return:
+	free_chunkfile(cf);
 	free(graph->bloom_filter_settings);
 	free(graph);
 	return NULL;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index fa27df579a5..c7da741284e 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -564,7 +564,7 @@ test_expect_success 'detect bad hash version' '
 
 test_expect_success 'detect low chunk count' '
 	corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\01" \
-		"missing the .* chunk"
+		"final chunk has non-zero id"
 '
 
 test_expect_success 'detect missing OID fanout chunk' '

From patchwork Tue Jan 26 16:01:23 2021
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12047289
Date: Tue, 26 Jan 2021 16:01:23 +0000
Subject: [PATCH 14/17] midx: use chunk-format read API
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

Instead of parsing the table of contents directly, use the chunk-format
API methods read_table_of_contents() and pair_chunk(). In particular, we
can use the return value of pair_chunk() to generate an error when a
required chunk is missing.

Signed-off-by: Derrick Stolee
---
 midx.c                      | 103 ++++++++++++++++++++----------------
 t/t5319-multi-pack-index.sh |   6 +--
 2 files changed, 60 insertions(+), 49 deletions(-)

diff --git a/midx.c b/midx.c
index 0bfd2d802b6..dd019c00795 100644
--- a/midx.c
+++ b/midx.c
@@ -54,6 +54,51 @@ static char *get_midx_filename(const char *object_dir)
 	return xstrfmt("%s/pack/multi-pack-index", object_dir);
 }
 
+static int midx_read_pack_names(const unsigned char *chunk_start,
+				size_t chunk_size, void *data)
+{
+	struct multi_pack_index *m = (struct multi_pack_index *)data;
+	m->chunk_pack_names = chunk_start;
+	return 0;
+}
+
+static int midx_read_oid_fanout(const unsigned char *chunk_start,
+				size_t chunk_size, void *data)
+{
+	struct multi_pack_index *m = (struct multi_pack_index *)data;
+	m->chunk_oid_fanout = (uint32_t *)chunk_start;
+
+	if (chunk_size != 4 * 256) {
+		error(_("multi-pack-index OID fanout is of the wrong size"));
+		return 1;
+	}
+	return 0;
+}
+
+static int midx_read_oid_lookup(const unsigned char *chunk_start,
+				size_t chunk_size, void *data)
+{
+	struct multi_pack_index *m = (struct multi_pack_index *)data;
+	m->chunk_oid_lookup = chunk_start;
+	return 0;
+}
+
+static int midx_read_offsets(const unsigned char *chunk_start,
+			     size_t chunk_size, void *data)
+{
+	struct multi_pack_index *m = (struct multi_pack_index *)data;
+	m->chunk_object_offsets = chunk_start;
+	return 0;
+}
+
+static int midx_read_large_offsets(const unsigned char *chunk_start,
+				   size_t chunk_size, void *data)
+{
+	struct multi_pack_index *m = (struct multi_pack_index *)data;
+	m->chunk_large_offsets = chunk_start;
+	return 0;
+}
+
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local)
 {
 	struct multi_pack_index *m = NULL;
@@ -65,6 +110,7 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 	char *midx_name = get_midx_filename(object_dir);
 	uint32_t i;
 	const char *cur_pack_name;
+	struct chunkfile *cf = NULL;
 
 	fd = git_open(midx_name);
 
@@ -114,58 +160,23 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 
 	m->num_packs = get_be32(m->data + MIDX_BYTE_NUM_PACKS);
 
-	for (i = 0; i < m->num_chunks; i++) {
-		uint32_t chunk_id = get_be32(m->data + MIDX_HEADER_SIZE +
-					     MIDX_CHUNKLOOKUP_WIDTH * i);
-		uint64_t chunk_offset = get_be64(m->data + MIDX_HEADER_SIZE + 4 +
-						 MIDX_CHUNKLOOKUP_WIDTH * i);
-
-		if (chunk_offset >= m->data_len)
-			die(_("invalid chunk offset (too large)"));
-
-		switch (chunk_id) {
-			case MIDX_CHUNKID_PACKNAMES:
-				m->chunk_pack_names = m->data + chunk_offset;
-				break;
-
-			case MIDX_CHUNKID_OIDFANOUT:
-				m->chunk_oid_fanout = (uint32_t *)(m->data + chunk_offset);
-				break;
-
-			case MIDX_CHUNKID_OIDLOOKUP:
-				m->chunk_oid_lookup = m->data + chunk_offset;
-				break;
-
-			case MIDX_CHUNKID_OBJECTOFFSETS:
-				m->chunk_object_offsets = m->data + chunk_offset;
-				break;
-
-			case MIDX_CHUNKID_LARGEOFFSETS:
-				m->chunk_large_offsets = m->data + chunk_offset;
-				break;
-
-			case 0:
-				die(_("terminating multi-pack-index chunk id appears earlier than expected"));
-				break;
-
-			default:
-				/*
-				 * Do nothing on unrecognized chunks, allowing future
-				 * extensions to add optional chunks.
-				 */
-				break;
-		}
-	}
+	cf = init_chunkfile(NULL);
 
-	if (!m->chunk_pack_names)
+	if (read_table_of_contents(cf, m->data, midx_size,
+				   MIDX_HEADER_SIZE, m->num_chunks))
+		goto cleanup_fail;
+
+	if (pair_chunk(cf, MIDX_CHUNKID_PACKNAMES, midx_read_pack_names, m) == CHUNK_NOT_FOUND)
 		die(_("multi-pack-index missing required pack-name chunk"));
-	if (!m->chunk_oid_fanout)
+	if (pair_chunk(cf, MIDX_CHUNKID_OIDFANOUT, midx_read_oid_fanout, m) == CHUNK_NOT_FOUND)
 		die(_("multi-pack-index missing required OID fanout chunk"));
-	if (!m->chunk_oid_lookup)
+	if (pair_chunk(cf, MIDX_CHUNKID_OIDLOOKUP, midx_read_oid_lookup, m) == CHUNK_NOT_FOUND)
 		die(_("multi-pack-index missing required OID lookup chunk"));
-	if (!m->chunk_object_offsets)
+	if (pair_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS, midx_read_offsets, m) == CHUNK_NOT_FOUND)
 		die(_("multi-pack-index missing required object offsets chunk"));
 
+	pair_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, midx_read_large_offsets, m);
+
 	m->num_objects = ntohl(m->chunk_oid_fanout[255]);
 
 	m->pack_names = xcalloc(m->num_packs, sizeof(*m->pack_names));
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 297de502a94..ad4e878b65b 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -314,12 +314,12 @@ test_expect_success 'verify bad OID version' '
 
 test_expect_success 'verify truncated chunk count' '
 	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_COUNT "\01" $objdir \
-		"missing required"
+		"final chunk has non-zero id"
 '
 
 test_expect_success 'verify extended chunk count' '
 	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_COUNT "\07" $objdir \
-		"terminating multi-pack-index chunk id appears earlier than expected"
+		"terminating chunk id appears earlier than expected"
 '
 
 test_expect_success 'verify missing required chunk' '
@@ -329,7 +329,7 @@ test_expect_success 'verify missing required chunk' '
 
 test_expect_success 'verify invalid chunk offset' '
 	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_OFFSET "\01" $objdir \
-		"invalid chunk offset (too large)"
+		"improper chunk offset(s)"
 '
 
 test_expect_success 'verify packnames out of order' '

From patchwork Tue Jan 26 16:01:24 2021
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12047297
Date: Tue, 26 Jan 2021 16:01:24 +0000
Subject: [PATCH 15/17] midx: use 64-bit multiplication for chunk sizes
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

When calculating the sizes of certain chunks, we should use 64-bit
multiplication always. This allows us to properly predict the chunk
sizes without risk of overflow.

Signed-off-by: Derrick Stolee
---
 midx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index dd019c00795..47aaeb804b8 100644
--- a/midx.c
+++ b/midx.c
@@ -945,7 +945,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	add_chunk(cf, MIDX_CHUNKID_OIDFANOUT,
 		  write_midx_oid_fanout, MIDX_CHUNK_FANOUT_SIZE);
 	add_chunk(cf, MIDX_CHUNKID_OIDLOOKUP,
-		  write_midx_oid_lookup, ctx.entries_nr * the_hash_algo->rawsz);
+		  write_midx_oid_lookup, (uint64_t)ctx.entries_nr * the_hash_algo->rawsz);
 	add_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS,
 		  write_midx_object_offsets,
 		  ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH);
@@ -953,7 +953,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	if (ctx.large_offsets_needed)
 		add_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS,
 			  write_midx_large_offsets,
-			  ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH);
+			  (uint64_t)ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH);
 
 	write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
 	write_chunkfile(cf, &ctx);

From patchwork Tue Jan 26 16:01:25 2021
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12047295
/k1vm6PK2wXrXtXOKWGt9M5z0xoKohcxjA/jz+npX5N764Cg+DLVzqNXVm8/WKwhDviS RGadxR0PphTQ8a8H5UrQW+Yc+DS0WTwRlFxMGFwTnaZCQDOEzk4Ei6O+ET9IsWko4hLJ xj1w== X-Gm-Message-State: AOAM532iS4S/13ugEs9RZ/RScgncS/l/hjfeY2O7JS+t8Xkw927wEJCT DnNN1nNTW/NM/SGb2JplPltEbNuErkU= X-Google-Smtp-Source: ABdhPJzLH4Eu7dwC3odxb8etyzr9bUDezaUqrVNOuLBPb3AWDKOfSlv+8FvM9pHmWp8zq9BixfzoWQ== X-Received: by 2002:a5d:65cd:: with SMTP id e13mr6733279wrw.120.1611676903237; Tue, 26 Jan 2021 08:01:43 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id g1sm27408790wrq.30.2021.01.26.08.01.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 26 Jan 2021 08:01:42 -0800 (PST) Message-Id: <62a23842aa650d6f56d5d258cc76fa56a547c088.1611676886.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 26 Jan 2021 16:01:25 +0000 Subject: [PATCH 16/17] chunk-format: restore duplicate chunk checks Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee Before refactoring into the chunk-format API, the commit-graph parsing logic included checks for duplicate chunks. It is unlikely that we would desire a chunk-based file format that allows duplicate chunk IDs in the table of contents, so add duplicate checks into read_table_of_contents(). 
Signed-off-by: Derrick Stolee --- chunk-format.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/chunk-format.c b/chunk-format.c index 674d31d5e58..3c833038096 100644 --- a/chunk-format.c +++ b/chunk-format.c @@ -14,6 +14,7 @@ struct chunk_info { chunk_write_fn write_fn; const void *start; + unsigned found:1; }; struct chunkfile { @@ -98,6 +99,7 @@ int read_table_of_contents(struct chunkfile *cf, uint64_t toc_offset, int toc_length) { + int i; uint32_t chunk_id; const unsigned char *table_of_contents = mfile + toc_offset; @@ -124,6 +126,14 @@ int read_table_of_contents(struct chunkfile *cf, return -1; } + for (i = 0; i < cf->chunks_nr; i++) { + if (cf->chunks[i].id == chunk_id) { + error(_("duplicate chunk ID %"PRIx32" found"), + chunk_id); + return -1; + } + } + cf->chunks[cf->chunks_nr].id = chunk_id; cf->chunks[cf->chunks_nr].start = mfile + chunk_offset; cf->chunks[cf->chunks_nr].size = next_chunk_offset - chunk_offset; From patchwork Tue Jan 26 16:01:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12047291 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3E777C433DB for ; Tue, 26 Jan 2021 16:04:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0AE882245C for ; Tue, 26 Jan 2021 16:04:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2404310AbhAZQDv 
Message-Id: <05cbd0a8d93a3e54d868a549fe76e16cb75ba6d6.1611676886.git.gitgitgadget@gmail.com>
Date: Tue, 26 Jan 2021 16:01:26 +0000
Subject: [PATCH 17/17] chunk-format: add technical docs
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Derrick Stolee, Derrick Stolee
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

From: Derrick Stolee

The chunk-based file format is now an API in the code, but we should
also take time to document it as a file format. Specifically, it matches
the CHUNK LOOKUP sections of the commit-graph and multi-pack-index
files, but there are some commonalities that should be grouped in this
document.

Signed-off-by: Derrick Stolee
---
 Documentation/technical/chunk-format.txt      | 54 +++++++++++++++++++
 .../technical/commit-graph-format.txt         |  3 ++
 Documentation/technical/pack-format.txt       |  3 ++
 3 files changed, 60 insertions(+)
 create mode 100644 Documentation/technical/chunk-format.txt

diff --git a/Documentation/technical/chunk-format.txt b/Documentation/technical/chunk-format.txt
new file mode 100644
index 00000000000..3db3792dea2
--- /dev/null
+++ b/Documentation/technical/chunk-format.txt
@@ -0,0 +1,54 @@
+Chunk-based file formats
+========================
+
+Some file formats in Git use a common concept of "chunks" to describe
+sections of the file. This allows structured access to a large file by
+scanning a small "table of contents" for the remaining data. This common
+format is used by the `commit-graph` and `multi-pack-index` files. See
+link:technical/pack-format.html[the `multi-pack-index` format] and
+link:technical/commit-graph-format.html[the `commit-graph` format] for
+how they use the chunks to describe structured data.
+
+A chunk-based file format begins with some header information custom to
+that format. That header should include enough information to identify
+the file type, format version, and number of chunks in the file. From this
+information, a reader can determine the start of the chunk-based region.
+
+The chunk-based region starts with a table of contents describing where
+each chunk starts and ends. This consists of (C+1) rows of 12 bytes each,
+where C is the number of chunks. Consider the following table:
+
+  | Chunk ID (4 bytes) | Chunk Offset (8 bytes) |
+  |--------------------|------------------------|
+  | ID[0]              | OFFSET[0]              |
+  | ...                | ...                    |
+  | ID[C-1]            | OFFSET[C-1]            |
+  | 0x00000000         | OFFSET[C]              |
+
+Each row consists of a 4-byte chunk identifier (ID) and an 8-byte offset.
+Each integer is stored in network byte order.
+
+The chunk identifier `ID[i]` is a label for the data stored within this
+file from `OFFSET[i]` (inclusive) to `OFFSET[i+1]` (exclusive). Thus, the
+size of the `i`th chunk is equal to the difference between `OFFSET[i+1]`
+and `OFFSET[i]`. This requires that the chunk data appears contiguously
+in the same order as the table of contents.
+
+The final entry in the table of contents must have a chunk ID of four
+zero bytes. This confirms that the table of contents is ending and
+provides the offset for the end of the chunk-based data.
+
+Note: The chunk-based format expects that the file contains _at least_ a
+trailing hash after `OFFSET[C]`.
+
+Functions for working with chunk-based file formats are declared in
+`chunk-format.h`. Using these functions provides extra checks that assist
+developers when creating new file formats, including:
+
+ 1. Writing and reading the table of contents.
+
+ 2. Verifying that the data written in a chunk matches the expected size
+    that was recorded in the table of contents.
+
+ 3. Checking that a table of contents describes offsets properly within
+    the file boundaries.
diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
index b6658eff188..87971c27dd7 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -61,6 +61,9 @@ CHUNK LOOKUP:
       the length using the next chunk position if necessary.) Each chunk
       ID appears at most once.
 
+  The CHUNK LOOKUP matches the table of contents from
+  link:technical/chunk-format.html[the chunk-based file format].
+
   The remaining data in the body is described one chunk at a time, and
   these chunks may be given in any order. Chunks are required unless
   otherwise specified.
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index f96b2e605f3..2fb1e60d29e 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -301,6 +301,9 @@ CHUNK LOOKUP:
 	    (Chunks are provided in file-order, so you can infer the length
 	    using the next chunk position if necessary.)
 
+	The CHUNK LOOKUP matches the table of contents from
+	link:technical/chunk-format.html[the chunk-based file format].
+
 	The remaining data in the body is described one chunk at a time, and
 	these chunks may be given in any order. Chunks are required unless
 	otherwise specified.
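
For illustration only, the table-of-contents layout described in
chunk-format.txt (rows of a 4-byte ID and an 8-byte offset in network byte
order, closed by a row with a zero ID) together with the duplicate-ID check
from patch 16 can be sketched as a standalone reader. This is not the code
in chunk-format.c: the names `toc_entry`, `read_be`, and `parse_toc` are
hypothetical, and the sketch assumes the caller already knows the number of
chunks and the file size from the format-specific header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOC_ROW_WIDTH 12 /* 4-byte chunk ID + 8-byte offset */

/* Hypothetical result type; Git's own struct chunk_info differs. */
struct toc_entry {
	uint32_t id;
	uint64_t offset;
	uint64_t size;
};

/* Read a big-endian (network byte order) integer of len bytes. */
static uint64_t read_be(const unsigned char *p, size_t len)
{
	uint64_t v = 0;
	size_t i;

	for (i = 0; i < len; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Parse nr chunk rows plus the terminating row starting at toc.
 * Fills out[0..nr-1] and returns 0, or returns -1 if the table is
 * malformed (early terminator, bad offsets, duplicate ID, or a
 * missing terminator).
 */
static int parse_toc(const unsigned char *toc, size_t nr,
		     uint64_t file_size, struct toc_entry *out)
{
	size_t i, j;

	assert(toc != NULL && out != NULL);

	for (i = 0; i < nr; i++) {
		const unsigned char *row = toc + i * TOC_ROW_WIDTH;
		uint32_t id = (uint32_t)read_be(row, 4);
		uint64_t start = read_be(row + 4, 8);
		/* The end of chunk i is the offset stored in row i+1. */
		uint64_t end = read_be(row + TOC_ROW_WIDTH + 4, 8);

		if (!id)
			return -1; /* terminating row appeared too early */
		if (start > end || end > file_size)
			return -1; /* offsets out of order or out of bounds */
		for (j = 0; j < i; j++)
			if (out[j].id == id)
				return -1; /* duplicate chunk ID */

		out[i].id = id;
		out[i].offset = start;
		out[i].size = end - start;
	}

	/* The row after the last chunk must carry the all-zero ID. */
	if (read_be(toc + nr * TOC_ROW_WIDTH, 4) != 0)
		return -1;
	return 0;
}
```

A caller would pass a pointer to the first table row, the chunk count from
the header, and the total file size; the linear scan for duplicates mirrors
the loop added to read_table_of_contents() and is cheap because these
formats hold only a handful of chunks.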