From patchwork Fri Feb 5 14:30:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12071099 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EE93CC433E9 for ; Fri, 5 Feb 2021 22:17:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C691164EE4 for ; Fri, 5 Feb 2021 22:17:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232932AbhBEWRC (ORCPT ); Fri, 5 Feb 2021 17:17:02 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44422 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232746AbhBEOf7 (ORCPT ); Fri, 5 Feb 2021 09:35:59 -0500 Received: from mail-wm1-x32a.google.com (mail-wm1-x32a.google.com [IPv6:2a00:1450:4864:20::32a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 11509C061223 for ; Fri, 5 Feb 2021 08:14:05 -0800 (PST) Received: by mail-wm1-x32a.google.com with SMTP id u14so6316467wmq.4 for ; Fri, 05 Feb 2021 08:14:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=5AF9N0vk+cHk5hTxJy9cJp3P1lCV7muF3iRhjAsxxzE=; b=VXY+7lLHwLTI2ONwyxAS920n4M+9TeyreMdheNEGge8PwpybENAj6mTq+vKSyFQw81 MJrmK49Krg0piPBdqNKCwahbbf+wCXUhBhd5iA4xvFihldizCK6p8TSla9JEpntfwH5I M/UNS3jggJa+1zhclJaLVHsBolNoxRre/iOlB7ZGM4lsM7rgulonjOJrX1k0dOWwtemB LX1kOWDohm3EWmHHp0Rg0e09J7+ruqsVAbFNl/kFB7FcheXLBwoFKD001iVE0gpcaHB/ sqMsU8j7UDJu8IygwtVT9WBMQZMU45+dXvWbcVQnKE1pCa+cl7nMSo+clzTMLS71t/Z9 oAeQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=5AF9N0vk+cHk5hTxJy9cJp3P1lCV7muF3iRhjAsxxzE=; b=qYmMr42gJvfvWopsrWgFvQHnhYQfoKeD5naKGRGoJteT3D1/VjtBQwH3t4TwPqXTW9 silIjy0Iu7NW5bysxcqraovBsp4eDY+gLkGbEJhC7M/bolIxckbziv0ab0dSMXzFWhzl TISlwpyoVLe+bOciQMfh9gIeB9sqjC5U63e5VbYo0uhmme3mpCLSfdsZsW49Lqpiw3DM bIqzGi7VI9czmGmaLydsonWA9k8qzb/8RNxsJmR7Qa8uo7L0ThdL/8sxhjqR5U+gYKvR 0/s6hVzVNtbtyYCjGhFdYqPsL5TE2fQbbfR5gDnOY+SxhqMfnTUO/JtUqwMVJo38DBZG UnSQ== X-Gm-Message-State: AOAM530ozdvG8d37/k/uN0slXi0VVaiVRz+3B40zsKdrDjjq4Jt3HfM2 33S/T/OJR6mnP0SyWQbYPx/gejwfi4s= X-Google-Smtp-Source: ABdhPJx986be56j5ZZ9nOBz7DopezkzuO47Urd5u4wO0azdb582vCMG6ntPgHyEK8wTfoDe+emmuQQ== X-Received: by 2002:a1c:b782:: with SMTP id h124mr3887150wmf.67.1612535455469; Fri, 05 Feb 2021 06:30:55 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id u70sm7600558wmu.20.2021.02.05.06.30.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Feb 2021 06:30:54 -0800 (PST) Message-Id: <243dcec9436853ff8d1bf2580e76ab909b7cb324.1612535453.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Fri, 05 Feb 2021 14:30:36 +0000 Subject: [PATCH v3 01/17] commit-graph: anonymize data in chunk_write_fn Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee In preparation for creating an API around file formats using chunks and tables of contents, prepare the commit-graph write code to use prototypes that will match this new API. Specifically, convert chunk_write_fn to take a "void *data" parameter instead of the commit-graph-specific "struct write_commit_graph_context" pointer. Signed-off-by: Derrick Stolee --- commit-graph.c | 29 +++++++++++++++++++---------- 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index f3bde2ad95a1..fae7d1b63931 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1040,8 +1040,9 @@ struct write_commit_graph_context { }; static int write_graph_chunk_fanout(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; int i, count = 0; struct commit **list = ctx->commits.list; @@ -1066,8 +1067,9 @@ static int write_graph_chunk_fanout(struct hashfile *f, } static int write_graph_chunk_oids(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; struct commit **list = ctx->commits.list; int count; for (count = 0; count < ctx->commits.nr; count++, list++) { @@ -1085,8 +1087,9 @@ static const unsigned char *commit_to_sha1(size_t index, void *table) } static int write_graph_chunk_data(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; struct commit **list = ctx->commits.list; struct commit **last = ctx->commits.list + ctx->commits.nr; uint32_t num_extra_edges = 0; @@ -1187,8 +1190,9 @@ static int write_graph_chunk_data(struct hashfile *f, } static int write_graph_chunk_generation_data(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; int i, num_generation_data_overflows = 0; for (i = 0; i < ctx->commits.nr; i++) { @@ -1208,8 +1212,9 @@ static int write_graph_chunk_generation_data(struct hashfile *f, } static int write_graph_chunk_generation_data_overflow(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; int i; for (i = 0; i < ctx->commits.nr; i++) { struct commit *c = ctx->commits.list[i]; @@ -1226,8 +1231,9 @@ static int write_graph_chunk_generation_data_overflow(struct hashfile *f, } static int write_graph_chunk_extra_edges(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; struct commit **list = ctx->commits.list; struct commit **last = ctx->commits.list + ctx->commits.nr; struct commit_list *parent; @@ -1280,8 +1286,9 @@ static int write_graph_chunk_extra_edges(struct hashfile *f, } static int write_graph_chunk_bloom_indexes(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; struct commit **list = ctx->commits.list; struct commit **last = ctx->commits.list + ctx->commits.nr; uint32_t cur_pos = 0; @@ -1315,8 +1322,9 @@ static void trace2_bloom_filter_settings(struct write_commit_graph_context *ctx) } static int write_graph_chunk_bloom_data(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; struct commit **list = ctx->commits.list; struct commit **last = ctx->commits.list + ctx->commits.nr; @@ -1737,8 +1745,9 @@ static int write_graph_chunk_base_1(struct hashfile *f, } static int write_graph_chunk_base(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; int num = write_graph_chunk_base_1(f, ctx->new_base_graph); if (num != ctx->num_commit_graphs_after - 1) { @@ -1750,7 +1759,7 @@ static int write_graph_chunk_base(struct hashfile *f, } typedef int (*chunk_write_fn)(struct hashfile *f, - struct write_commit_graph_context *ctx); + void *data); struct chunk_info { uint32_t id; From patchwork Fri Feb 5 14:30:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12071031 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8628DC433DB for ; Fri, 5 Feb 2021 22:01:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3F85B64E4E for ; Fri, 5 Feb 2021 22:01:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233002AbhBEWBL (ORCPT ); Fri, 5 Feb 2021 17:01:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49206 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232970AbhBEO5n (ORCPT ); Fri, 5 Feb 2021 09:57:43 -0500 Received: from mail-wm1-x32a.google.com (mail-wm1-x32a.google.com [IPv6:2a00:1450:4864:20::32a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 72909C06121E for ; Fri, 5 Feb 2021 08:28:40 -0800 (PST) Received: by mail-wm1-x32a.google.com with SMTP id u14so6353519wmq.4 for ; Fri, 05 Feb 2021 08:28:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=FESohs3eVG/Vtak1cJSjlpSe3VyQTAQuJDbcHz3Ibp8=; b=pLwXNUiq8fNt7rjt7OrDo1fI3amhyhC49xPMMIH2XoKREnVg2AZ1cABK7ztRWAVq3k 5JGNNu/ZF3lrH1T9G0sb7s/1gPC0BksgDZS7gCgLLMO1XihYGWcGyjImaMsUeSYJglKw o/95mVT5DTHRFv0VHDS6big6qc0I5UG32gHf0RaMFdm/ryPFRjc2o2wZw0QF7pWik1s5 lu42JN2EixLKTxnxw3FkMD00GDw4+JPY5zzk3+K/2H8F0jeyVXAt853kNs7zVDl/btZA kdk5tRhKVi7TDU/w94TT/gBbzk7XKKL6LNOacR7wV9tcMzCblYxXaQ7GPw0oeikbI0lg Pvwg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=FESohs3eVG/Vtak1cJSjlpSe3VyQTAQuJDbcHz3Ibp8=; b=awJe0lMoboDgf90TEaMRnMYxz7Yy04uS7ua6lmb0nOaSBeE18LdCgdMygJk6LZ8M90 6ApcEfedMXvGO03ueu7ePHVCIpmAfzu/WY4B+g0x2sez5aX+2cZslAbC/Umm4w3wqXkX 6JfDI4fZe/bAFfRv/He9BG0fcxRd8ul1ycV8nR9bDHGQrDE7d3+ab/p8Muymy/drj17G 8iy3LFdfmenEUhCjRuCAvl2Oei5SkY7KaWQ040IQoee2o/EYIUul85iixTjW7g+xG/rN 8iN45vld5GK3RLitkQ+clrEQHBtRJwuxsTPYcGljIU8B7/Ju36Y0t//m/d2tg3ANBm3v dSSw== X-Gm-Message-State: AOAM530EF11OgW6Sg0k422CqUz6aOSqF4E9wxi4EFhqPBFq16QoxYsJf ZrxPzF+VISZTGOXJWIo/vsHOvWAp8oA= X-Google-Smtp-Source: ABdhPJxkh4nyD5a9ia2nvNDegEdgfBJcqXm3IHeYtl4FG1GB+I3of7kiezL/Jc0brtsqtyLoLd+Wyg== X-Received: by 2002:a05:600c:20f:: with SMTP id 15mr3883348wmi.148.1612535456551; Fri, 05 Feb 2021 06:30:56 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id i6sm11491851wrs.71.2021.02.05.06.30.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Feb 2021 06:30:55 -0800 (PST) Message-Id: <16c37d2370cf4fd5fc309ac6dc8aa6443ffcf3d7.1612535453.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Fri, 05 Feb 2021 14:30:37 +0000 Subject: [PATCH v3 02/17] chunk-format: create chunk format write API Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee In anticipation of combining the logic from the commit-graph and multi-pack-index file formats, create a new chunk-format API. Use a 'struct chunkfile' pointer to keep track of data that has been registered for writes. This struct is anonymous outside of chunk-format.c to ensure no user attempts to interfere with the data. The next change will use this API in commit-graph.c, but the general approach is: 1. initialize the chunkfile with init_chunkfile(f). 2. add chunks in the intended writing order with add_chunk(). 3. write any header information to the hashfile f. 4. write the chunkfile data using write_chunkfile(). 5. free the chunkfile struct using free_chunkfile(). Helped-by: Taylor Blau Helped-by: Junio C Hamano Signed-off-by: Derrick Stolee Reported-by: SZEDER Gábor Signed-off-by: Derrick Stolee --- Makefile | 1 + chunk-format.c | 91 ++++++++++++++++++++++++++++++++++++++++++++++++++ chunk-format.h | 19 +++++++++++ 3 files changed, 111 insertions(+) create mode 100644 chunk-format.c create mode 100644 chunk-format.h diff --git a/Makefile b/Makefile index 7b64106930a6..50a7663841e9 100644 --- a/Makefile +++ b/Makefile @@ -854,6 +854,7 @@ LIB_OBJS += bundle.o LIB_OBJS += cache-tree.o LIB_OBJS += chdir-notify.o LIB_OBJS += checkout.o +LIB_OBJS += chunk-format.o LIB_OBJS += color.o LIB_OBJS += column.o LIB_OBJS += combine-diff.o diff --git a/chunk-format.c b/chunk-format.c new file mode 100644 index 000000000000..6e0f1900213e --- /dev/null +++ b/chunk-format.c @@ -0,0 +1,91 @@ +#include "cache.h" +#include "chunk-format.h" +#include "csum-file.h" +#define CHUNK_LOOKUP_WIDTH 12 + +/* + * When writing a chunk-based file format, collect the chunks in + * an array of chunk_info structs. The size stores the _expected_ + * amount of data that will be written by write_fn. + */ +struct chunk_info { + uint32_t id; + uint64_t size; + chunk_write_fn write_fn; +}; + +struct chunkfile { + struct hashfile *f; + + struct chunk_info *chunks; + size_t chunks_nr; + size_t chunks_alloc; +}; + +struct chunkfile *init_chunkfile(struct hashfile *f) +{ + struct chunkfile *cf = xcalloc(1, sizeof(*cf)); + cf->f = f; + return cf; +} + +void free_chunkfile(struct chunkfile *cf) +{ + if (!cf) + return; + free(cf->chunks); + free(cf); +} + +int get_num_chunks(struct chunkfile *cf) +{ + return cf->chunks_nr; +} + +void add_chunk(struct chunkfile *cf, + uint32_t id, + size_t size, + chunk_write_fn fn) +{ + ALLOC_GROW(cf->chunks, cf->chunks_nr + 1, cf->chunks_alloc); + + cf->chunks[cf->chunks_nr].id = id; + cf->chunks[cf->chunks_nr].write_fn = fn; + cf->chunks[cf->chunks_nr].size = size; + cf->chunks_nr++; +} + +int write_chunkfile(struct chunkfile *cf, void *data) +{ + int i; + uint64_t cur_offset = hashfile_total(cf->f); + + /* Add the table of contents to the current offset */ + cur_offset += (cf->chunks_nr + 1) * CHUNK_LOOKUP_WIDTH; + + for (i = 0; i < cf->chunks_nr; i++) { + hashwrite_be32(cf->f, cf->chunks[i].id); + hashwrite_be64(cf->f, cur_offset); + + cur_offset += cf->chunks[i].size; + } + + /* Trailing entry marks the end of the chunks */ + hashwrite_be32(cf->f, 0); + hashwrite_be64(cf->f, cur_offset); + + for (i = 0; i < cf->chunks_nr; i++) { + off_t start_offset = hashfile_total(cf->f); + int result = cf->chunks[i].write_fn(cf->f, data); + + if (result) + return result; + + if (hashfile_total(cf->f) - start_offset != cf->chunks[i].size) + BUG("expected to write %"PRId64" bytes to chunk %"PRIx32", but wrote %"PRId64" instead", + cf->chunks[i].size, cf->chunks[i].id, + hashfile_total(cf->f) - start_offset); + } + + return 0; +} diff --git a/chunk-format.h b/chunk-format.h new file mode 100644 index 000000000000..9a1d770accec --- /dev/null +++ b/chunk-format.h @@ -0,0 +1,19 @@ +#ifndef CHUNK_FORMAT_H +#define CHUNK_FORMAT_H + +#include "git-compat-util.h" + +struct hashfile; +struct chunkfile; + +struct chunkfile *init_chunkfile(struct hashfile *f); +void free_chunkfile(struct chunkfile *cf); +int get_num_chunks(struct chunkfile *cf); +typedef int (*chunk_write_fn)(struct hashfile *f, void *data); +void add_chunk(struct chunkfile *cf, + uint32_t id, + size_t size, + chunk_write_fn fn); +int write_chunkfile(struct chunkfile *cf, void *data); + +#endif From patchwork Fri Feb 5 14:30:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12071101 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B2E8AC43381 for ; Fri, 5 Feb 2021 22:18:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6924A61492 for ; Fri, 5 Feb 2021 22:18:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233102AbhBEWRW (ORCPT ); Fri, 5 Feb 2021 17:17:22 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43960 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232753AbhBEOgR (ORCPT ); Fri, 5 Feb 2021 09:36:17 -0500 Received: from mail-wm1-x32f.google.com (mail-wm1-x32f.google.com [IPv6:2a00:1450:4864:20::32f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 93E9EC06178A for ; Fri, 5 Feb 2021 08:14:47 -0800 (PST) Received: by mail-wm1-x32f.google.com with SMTP id j21so3625246wmj.0 for ; Fri, 05 Feb 2021 08:14:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=67ONBvc7e2kPQE1Xj9b/hJ6Ar4wdTpJCdz0ZQWHbmZo=; b=IMYNcard6atHkqqRJ4MpeimAUzA0dIpvZYJ3nJ3QwwmuvoGY0BCovDl/6btj61dUBB MEdOaQf+aU+r2uw3N58jMyZOX/t6egmZUlQQeTcAo3mjsi7un1np90bSV7dIEwb1BdRq 97ZUa8QxyKFNB35B5TA3BhEVqfcNRUisFCsu+WjevwpSHc3tTNaVjku4iZ+OrfBO9ARK xtccZRC+Gt4vtSPdMJsiMa2iOd2p939h2X0l9BhB/t8P1KrfbTADLgjBr2s3BUGlI/GH 2wWf09FScCbu5IGWvT/8g3mi1JpD02HPkCKt6S9IwjaKQkc9YF1M5fF0PcAikdzUBOXy YVQA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=67ONBvc7e2kPQE1Xj9b/hJ6Ar4wdTpJCdz0ZQWHbmZo=; b=T++Skdapzaikrm0pcRfYI0G2BZc9BjUuwvsnI0wx/pQGdLW3HPdhnVU+O0xPdkBsLq l2G2J5RHXmjtfdAXjx+6zl7zWYffsiamHW+AFyO6UQb72KyNJHLpRtc/qr8glO/rj7ee aFrl0usBli4eynrHmhBXCVHvxr2KdzkDdWjp/8E4A7vmPuO9zoJDTusw6aIW5qGw9Zlu dyMeVkg95HRUt/7yADMDaHHegsTpu3FJ4fow/5+MF8DJFT6jTIb9ccMwPOquOnGtc3Na 2ArrQO5HFwjEju73YD8NSCMFq+mG0VLpLZRsOZRUu5A/HcbRqvkA1Bd6aWehbGd5Tx1x 7Z1A== X-Gm-Message-State: AOAM532pOzXz6SV9gxeDlGr/BH+wR6yEsOj9vmen/GugaCtfY5ZozW03 6X7T22uwj/RG40ExcMVwcsMJwfVuYus= X-Google-Smtp-Source: ABdhPJy0RpTBizCfi0d1MH3YgkYXpAq39bTQK7vrWVM3y5N5g0+D+QVmoS8OnEuoLFuUzLh8S4bK+g== X-Received: by 2002:a1c:7913:: with SMTP id l19mr4001316wme.112.1612535457526; Fri, 05 Feb 2021 06:30:57 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id z1sm12229757wrp.62.2021.02.05.06.30.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Feb 2021 06:30:57 -0800 (PST) Message-Id: In-Reply-To: References: Date: Fri, 05 Feb 2021 14:30:38 +0000 Subject: [PATCH v3 03/17] commit-graph: use chunk-format write API Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The commit-graph write logic is ready to make use of the chunk-format write API. Each chunk write method is already in the correct prototype. We only need to use the 'struct chunkfile' pointer and the correct API calls. Signed-off-by: Derrick Stolee --- commit-graph.c | 118 ++++++++++++++++--------------------------------- 1 file changed, 37 insertions(+), 81 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index fae7d1b63931..7c607d23b29f 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -19,6 +19,7 @@ #include "shallow.h" #include "json-writer.h" #include "trace2.h" +#include "chunk-format.h" void git_test_write_commit_graph_or_die(void) { @@ -1758,27 +1759,17 @@ static int write_graph_chunk_base(struct hashfile *f, return 0; } -typedef int (*chunk_write_fn)(struct hashfile *f, - void *data); - -struct chunk_info { - uint32_t id; - uint64_t size; - chunk_write_fn write_fn; -}; - static int write_commit_graph_file(struct write_commit_graph_context *ctx) { uint32_t i; int fd; struct hashfile *f; struct lock_file lk = LOCK_INIT; - struct chunk_info chunks[MAX_NUM_CHUNKS + 1]; const unsigned hashsz = the_hash_algo->rawsz; struct strbuf progress_title = STRBUF_INIT; int num_chunks = 3; - uint64_t chunk_offset; struct object_id file_hash; + struct chunkfile *cf; if (ctx->split) { struct strbuf tmp_file = STRBUF_INIT; @@ -1824,76 +1815,50 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf); } - chunks[0].id = GRAPH_CHUNKID_OIDFANOUT; - chunks[0].size = GRAPH_FANOUT_SIZE; - chunks[0].write_fn = write_graph_chunk_fanout; - chunks[1].id = GRAPH_CHUNKID_OIDLOOKUP; - chunks[1].size = hashsz * ctx->commits.nr; - chunks[1].write_fn = write_graph_chunk_oids; - chunks[2].id = GRAPH_CHUNKID_DATA; - chunks[2].size = (hashsz + 16) * ctx->commits.nr; - chunks[2].write_fn = write_graph_chunk_data; + cf = init_chunkfile(f); + + add_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, GRAPH_FANOUT_SIZE, + write_graph_chunk_fanout); + add_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, hashsz * ctx->commits.nr, + write_graph_chunk_oids); + add_chunk(cf, GRAPH_CHUNKID_DATA, (hashsz + 16) * ctx->commits.nr, + write_graph_chunk_data); if (git_env_bool(GIT_TEST_COMMIT_GRAPH_NO_GDAT, 0)) ctx->write_generation_data = 0; - if (ctx->write_generation_data) { - chunks[num_chunks].id = GRAPH_CHUNKID_GENERATION_DATA; - chunks[num_chunks].size = sizeof(uint32_t) * ctx->commits.nr; - chunks[num_chunks].write_fn = write_graph_chunk_generation_data; - num_chunks++; - } - if (ctx->num_generation_data_overflows) { - chunks[num_chunks].id = GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW; - chunks[num_chunks].size = sizeof(timestamp_t) * ctx->num_generation_data_overflows; - chunks[num_chunks].write_fn = write_graph_chunk_generation_data_overflow; - num_chunks++; - } - if (ctx->num_extra_edges) { - chunks[num_chunks].id = GRAPH_CHUNKID_EXTRAEDGES; - chunks[num_chunks].size = 4 * ctx->num_extra_edges; - chunks[num_chunks].write_fn = write_graph_chunk_extra_edges; - num_chunks++; - } + if (ctx->write_generation_data) + add_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA, + sizeof(uint32_t) * ctx->commits.nr, + write_graph_chunk_generation_data); + if (ctx->num_generation_data_overflows) + add_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW, + sizeof(timestamp_t) * ctx->num_generation_data_overflows, + write_graph_chunk_generation_data_overflow); + if (ctx->num_extra_edges) + add_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, + 4 * ctx->num_extra_edges, + write_graph_chunk_extra_edges); if (ctx->changed_paths) { - chunks[num_chunks].id = GRAPH_CHUNKID_BLOOMINDEXES; - chunks[num_chunks].size = sizeof(uint32_t) * ctx->commits.nr; - chunks[num_chunks].write_fn = write_graph_chunk_bloom_indexes; - num_chunks++; - chunks[num_chunks].id = GRAPH_CHUNKID_BLOOMDATA; - chunks[num_chunks].size = sizeof(uint32_t) * 3 - + ctx->total_bloom_filter_data_size; - chunks[num_chunks].write_fn = write_graph_chunk_bloom_data; - num_chunks++; - } - if (ctx->num_commit_graphs_after > 1) { - chunks[num_chunks].id = GRAPH_CHUNKID_BASE; - chunks[num_chunks].size = hashsz * (ctx->num_commit_graphs_after - 1); - chunks[num_chunks].write_fn = write_graph_chunk_base; - num_chunks++; - } - - chunks[num_chunks].id = 0; - chunks[num_chunks].size = 0; + add_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES, + sizeof(uint32_t) * ctx->commits.nr, + write_graph_chunk_bloom_indexes); + add_chunk(cf, GRAPH_CHUNKID_BLOOMDATA, + sizeof(uint32_t) * 3 + + ctx->total_bloom_filter_data_size, + write_graph_chunk_bloom_data); + } + if (ctx->num_commit_graphs_after > 1) + add_chunk(cf, GRAPH_CHUNKID_BASE, + hashsz * (ctx->num_commit_graphs_after - 1), + write_graph_chunk_base); hashwrite_be32(f, GRAPH_SIGNATURE); hashwrite_u8(f, GRAPH_VERSION); hashwrite_u8(f, oid_version()); - hashwrite_u8(f, num_chunks); + hashwrite_u8(f, get_num_chunks(cf)); hashwrite_u8(f, ctx->num_commit_graphs_after - 1); - chunk_offset = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH; - for (i = 0; i <= num_chunks; i++) { - uint32_t chunk_write[3]; - - chunk_write[0] = htonl(chunks[i].id); - chunk_write[1] = htonl(chunk_offset >> 32); - chunk_write[2] = htonl(chunk_offset & 0xffffffff); - hashwrite(f, chunk_write, 12); - - chunk_offset += chunks[i].size; - } - if (ctx->report_progress) { strbuf_addf(&progress_title, Q_("Writing out commit graph in %d pass", @@ -1905,17 +1870,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) num_chunks * ctx->commits.nr); } - for (i = 0; i < num_chunks; i++) { - uint64_t start_offset = f->total + f->offset; - - if (chunks[i].write_fn(f, ctx)) - return -1; - - if (f->total + f->offset != start_offset + chunks[i].size) - BUG("expected to write %"PRId64" bytes to chunk %"PRIx32", but wrote %"PRId64" instead", - chunks[i].size, chunks[i].id, - f->total + f->offset - start_offset); - } + write_chunkfile(cf, ctx); stop_progress(&ctx->progress); strbuf_release(&progress_title); @@ -1932,6 +1887,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) close_commit_graph(ctx->r->objects); finalize_hashfile(f, file_hash.hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC); + free_chunkfile(cf); if (ctx->split) { FILE *chainf = fdopen_lock_file(&lk, "w"); From patchwork Fri Feb 5 14:30:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12071103 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 67ED1C4332B for ; Fri, 5 Feb 2021 22:18:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2CC0D64EE4 for ; Fri, 5 Feb 2021 22:18:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233061AbhBEWRe (ORCPT ); Fri, 5 Feb 2021 17:17:34 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44552 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232694AbhBEOgf (ORCPT ); Fri, 5 Feb 2021 09:36:35 -0500 Received: from mail-wm1-x333.google.com (mail-wm1-x333.google.com [IPv6:2a00:1450:4864:20::333]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 495EDC061786 for ; Fri, 5 Feb 2021 08:14:40 -0800 (PST) Received: by mail-wm1-x333.google.com with SMTP id c127so6322160wmf.5 for ; Fri, 05 Feb 2021 08:14:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=sgUJGixoYfN4DpB6eRKdzHk/vq3xeR3NlBM/Tpw0GqY=; b=P/sCYx7IDQQ0Y6rOCAB7Klb0XUQQbzOSiSDY6s8o8H4F3imNljCjKiLfMD8dDlv42a zsmEQ1AhncyrhxjnSJ3q+7/FLYXKJZXjMGCxqRaFt/FJbyTdbTLM60ehRzb4hCpWpPTq bng/tIUZTQp5s0ZdTSJh7HpThO6KkXVRjBxnR35muCI2JAO7dX0QiwEgaghv7Yd6jZcS DZf9f9pN0DoQeJDLzFdGKis8OVtRjjSM9T0xYzklRdD7yuXFRx19po8gI6KEnzOYBwsk MxNuIRraJEYxGYrnNGagxiKCTDyIblvkruboZDAwukNNHLD/bgO46htTY2Mu1rbXI3OR JS2Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=sgUJGixoYfN4DpB6eRKdzHk/vq3xeR3NlBM/Tpw0GqY=; b=CC2LBfVS6eqbc/nc1ZiBCKRPBK6ItWTs2cXjaL9OQjEeZI+ypTvMWjz7zn+p4DsN/a bqMWLNkTvj/JzCLKepH2CPXW4e2Ra3MEfs9lRXj4Sgy2zu7vbaEXytT4RnvzYM9coJG4 kijNuBStgnvRDH5eC0KkTPN1vJwGfeKnDbRFM9YxaHU1jsT9O+YJaEQgQUvwYM9AapZ0 Wv3L53dOyeZEB8ycnq8wpRfoVnc59VK4ZVFGAwDB+kzhu+flnvzhaeu1e/RfG5i1WAdA ensKQpo7N7mTen7aj2VqHvR/AuqkUZL70uGfS0XX2IKtdZSn+9J0nBZwR8I7oVcvqtw2 jePQ== X-Gm-Message-State: AOAM533R73mrew7Ui0t20+r7IK/cAAOsrRQGWpjeK1HWIRbnebHaMX4Y 5N8vc0c9f7Th4NKMmGu1Mudhk0Tch6Y= X-Google-Smtp-Source: ABdhPJxunNCFWgu536Sjo1dyb14BMp28YhWoGjciLcFmzy1PETNYX3TS4qsDxkCqTvGrxZkicdXknw== X-Received: by 2002:a7b:cb05:: with SMTP id u5mr3831402wmj.140.1612535458560; Fri, 05 Feb 2021 06:30:58 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id 2sm8022770wmn.10.2021.02.05.06.30.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Feb 2021 06:30:58 -0800 (PST) Message-Id: <66ff49ed93091ea34cb21eff60b3114eba006ff3.1612535453.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Fri, 05 Feb 2021 14:30:39 +0000 Subject: [PATCH v3 04/17] midx: rename pack_info to write_midx_context Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee In an effort to streamline our chunk-based file formats, align some of the code structure in write_midx_internal() to be similar to the patterns in write_commit_graph_file(). Specifically, let's create a "struct write_midx_context" that can be used as a data parameter to abstract function types. This change only renames "struct pack_info" to "struct write_midx_context" and the names of instances from "packs" to "ctx". In future changes, we will expand the data inside "struct write_midx_context" and align our chunk-writing method with the chunk-format API. Signed-off-by: Derrick Stolee --- midx.c | 130 ++++++++++++++++++++++++++++----------------------------- 1 file changed, 65 insertions(+), 65 deletions(-) diff --git a/midx.c b/midx.c index 79c282b070d2..561f65a63a5b 100644 --- a/midx.c +++ b/midx.c @@ -451,7 +451,7 @@ static int pack_info_compare(const void *_a, const void *_b) return strcmp(a->pack_name, b->pack_name); } -struct pack_list { +struct write_midx_context { struct pack_info *info; uint32_t nr; uint32_t alloc; @@ -463,37 +463,37 @@ struct pack_list { static void add_pack_to_midx(const char *full_path, size_t full_path_len, const char *file_name, void *data) { - struct pack_list *packs = (struct pack_list *)data; + struct write_midx_context *ctx = data; if (ends_with(file_name, ".idx")) { - display_progress(packs->progress, ++packs->pack_paths_checked); - if (packs->m && midx_contains_pack(packs->m, file_name)) + display_progress(ctx->progress, ++ctx->pack_paths_checked); + if (ctx->m && midx_contains_pack(ctx->m, file_name)) return; - ALLOC_GROW(packs->info, packs->nr + 1, packs->alloc); + ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc); - packs->info[packs->nr].p = add_packed_git(full_path, - full_path_len, - 0); + ctx->info[ctx->nr].p = add_packed_git(full_path, + full_path_len, + 0); - if (!packs->info[packs->nr].p) { + if (!ctx->info[ctx->nr].p) { warning(_("failed to add packfile '%s'"), full_path); return; } - if (open_pack_index(packs->info[packs->nr].p)) { + if (open_pack_index(ctx->info[ctx->nr].p)) { warning(_("failed to open pack-index '%s'"), full_path); - close_pack(packs->info[packs->nr].p); - FREE_AND_NULL(packs->info[packs->nr].p); + close_pack(ctx->info[ctx->nr].p); + FREE_AND_NULL(ctx->info[ctx->nr].p); return; } - packs->info[packs->nr].pack_name = xstrdup(file_name); - packs->info[packs->nr].orig_pack_int_id = packs->nr; - packs->info[packs->nr].expired = 0; - packs->nr++; + ctx->info[ctx->nr].pack_name = xstrdup(file_name); + ctx->info[ctx->nr].orig_pack_int_id = ctx->nr; + ctx->info[ctx->nr].expired = 0; + ctx->nr++; } } @@ -801,7 +801,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * uint32_t i; struct hashfile *f = NULL; struct lock_file lk; - struct pack_list packs; + struct write_midx_context ctx = { 0 }; uint32_t *pack_perm = NULL; uint64_t written = 0; uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; @@ -820,40 +820,40 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * midx_name); if (m) - packs.m = m; + ctx.m = m; else - packs.m = load_multi_pack_index(object_dir, 1); - - packs.nr = 0; - packs.alloc = packs.m ? packs.m->num_packs : 16; - packs.info = NULL; - ALLOC_ARRAY(packs.info, packs.alloc); - - if (packs.m) { - for (i = 0; i < packs.m->num_packs; i++) { - ALLOC_GROW(packs.info, packs.nr + 1, packs.alloc); - - packs.info[packs.nr].orig_pack_int_id = i; - packs.info[packs.nr].pack_name = xstrdup(packs.m->pack_names[i]); - packs.info[packs.nr].p = NULL; - packs.info[packs.nr].expired = 0; - packs.nr++; + ctx.m = load_multi_pack_index(object_dir, 1); + + ctx.nr = 0; + ctx.alloc = ctx.m ? ctx.m->num_packs : 16; + ctx.info = NULL; + ALLOC_ARRAY(ctx.info, ctx.alloc); + + if (ctx.m) { + for (i = 0; i < ctx.m->num_packs; i++) { + ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc); + + ctx.info[ctx.nr].orig_pack_int_id = i; + ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]); + ctx.info[ctx.nr].p = NULL; + ctx.info[ctx.nr].expired = 0; + ctx.nr++; } } - packs.pack_paths_checked = 0; + ctx.pack_paths_checked = 0; if (flags & MIDX_PROGRESS) - packs.progress = start_delayed_progress(_("Adding packfiles to multi-pack-index"), 0); + ctx.progress = start_delayed_progress(_("Adding packfiles to multi-pack-index"), 0); else - packs.progress = NULL; + ctx.progress = NULL; - for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &packs); - stop_progress(&packs.progress); + for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx); + stop_progress(&ctx.progress); - if (packs.m && packs.nr == packs.m->num_packs && !packs_to_drop) + if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop) goto cleanup; - entries = get_sorted_entries(packs.m, packs.info, packs.nr, &nr_entries); + entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &nr_entries); for (i = 0; i < nr_entries; i++) { if (entries[i].offset > 0x7fffffff) @@ -862,19 +862,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * large_offsets_needed = 1; } - QSORT(packs.info, packs.nr, pack_info_compare); + QSORT(ctx.info, ctx.nr, pack_info_compare); if (packs_to_drop && packs_to_drop->nr) { int drop_index = 0; int missing_drops = 0; - for (i = 0; i < packs.nr && drop_index < packs_to_drop->nr; i++) { - int cmp = strcmp(packs.info[i].pack_name, + for (i = 0; i < ctx.nr && drop_index < packs_to_drop->nr; i++) { + int cmp = strcmp(ctx.info[i].pack_name, packs_to_drop->items[drop_index].string); if (!cmp) { drop_index++; - packs.info[i].expired = 1; + ctx.info[i].expired = 1; } else if (cmp > 0) { error(_("did not see pack-file %s to drop"), packs_to_drop->items[drop_index].string); @@ -882,7 +882,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * missing_drops++; i--; } else { - packs.info[i].expired = 0; + ctx.info[i].expired = 0; } } @@ -898,19 +898,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * * * pack_perm[old_id] = new_id */ - ALLOC_ARRAY(pack_perm, packs.nr); - for (i = 0; i < packs.nr; i++) { - if (packs.info[i].expired) { + ALLOC_ARRAY(pack_perm, ctx.nr); + for (i = 0; i < ctx.nr; i++) { + if (ctx.info[i].expired) { dropped_packs++; - pack_perm[packs.info[i].orig_pack_int_id] = PACK_EXPIRED; + pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED; } else { - pack_perm[packs.info[i].orig_pack_int_id] = i - dropped_packs; + pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs; } } - for (i = 0; i < packs.nr; i++) { - if (!packs.info[i].expired) - pack_name_concat_len += strlen(packs.info[i].pack_name) + 1; + for (i = 0; i < ctx.nr; i++) { + if (!ctx.info[i].expired) + pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1; } if (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT) @@ -921,19 +921,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf); FREE_AND_NULL(midx_name); - if (packs.m) - close_midx(packs.m); + if (ctx.m) + close_midx(ctx.m); cur_chunk = 0; num_chunks = large_offsets_needed ? 5 : 4; - if (packs.nr - dropped_packs == 0) { + if (ctx.nr - dropped_packs == 0) { error(_("no pack files to index.")); result = 1; goto cleanup; } - written = write_midx_header(f, num_chunks, packs.nr - dropped_packs); + written = write_midx_header(f, num_chunks, ctx.nr - dropped_packs); chunk_ids[cur_chunk] = MIDX_CHUNKID_PACKNAMES; chunk_offsets[cur_chunk] = written + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH; @@ -990,7 +990,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * switch (chunk_ids[i]) { case MIDX_CHUNKID_PACKNAMES: - written += write_midx_pack_names(f, packs.info, packs.nr); + written += write_midx_pack_names(f, ctx.info, ctx.nr); break; case MIDX_CHUNKID_OIDFANOUT: @@ -1027,15 +1027,15 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * commit_lock_file(&lk); cleanup: - for (i = 0; i < packs.nr; i++) { - if (packs.info[i].p) { - close_pack(packs.info[i].p); - free(packs.info[i].p); + for (i = 0; i < ctx.nr; i++) { + if (ctx.info[i].p) { + close_pack(ctx.info[i].p); + free(ctx.info[i].p); } - free(packs.info[i].pack_name); + free(ctx.info[i].pack_name); } - free(packs.info); + free(ctx.info); free(entries); free(pack_perm); free(midx_name); From patchwork Fri Feb 5 14:30:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12071057 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3CB9BC43381 for ; Fri, 5 Feb 2021 22:07:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EDEB264F92 for ; Fri, 5 Feb 2021 22:07:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232968AbhBEWGv (ORCPT ); Fri, 5 Feb 2021 17:06:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47862 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232888AbhBEOwL (ORCPT ); Fri, 5 Feb 2021 09:52:11 -0500 Received: from mail-wr1-x42c.google.com (mail-wr1-x42c.google.com [IPv6:2a00:1450:4864:20::42c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5CD60C0611C2 for ; Fri, 5 Feb 2021 08:29:40 -0800 (PST) Received: by mail-wr1-x42c.google.com with SMTP id u14so8376644wri.3 for ; Fri, 05 Feb 2021 08:29:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=kGVHM52oWgqXnIlb4yOoAhhJoKcEoCuXj3Nru4t5efI=; b=DRQxkmGrYtgIk7zO9oRTdAeuBxlUo4sQ8/+hUkzQEIidLef7MlxGFmFOriJ5SkzWhx f9v/JMHliU4S1SAEWrXGiX9hc/sbGkK73wLq+Id34vXxyR6Zqess+1htgMyVULLIsi+J i1kmbLE2zxkkp+TY8Kr7S7Q6oPTm5CLEKhfCFEj7g5gTcGFDV4ecQGB9XiXGGSAUzNmB 74dy1pE8Gp/Kq3yJzsym/RQhQj2AGQgClr3/X3LN5XR1QdnpavMAZB/Pb5G/SnTuUtOx 8SxyFB4x1UmRgQWSrYO3y5hBLmheHEU90mL/hYQEzsmBepDLCGj8MQvaFKINltSMI8nw bGHg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=kGVHM52oWgqXnIlb4yOoAhhJoKcEoCuXj3Nru4t5efI=; b=qy/bfFz12YYULnS/BWIp3iSRHszZpUddDM2ugMi5AEuKkGGKIFg4HkIoN4jR83VRkp BT82D7qXj88uBW8yar+D2LfC2MtMUsr9eKDasnK4BEBgADo+GbLDtzyCLMstnOlEBIJJ 9uplBkSquRtdxjzcCSk6wtc/zCgMfQBxUhYkemJ4gFKO5BjFWaxl0yRd54zNUzDJrUwc +yMxTZN4QTYVcrq7uc1N0yVEin0htQ7PVnbEWyY7b4jPe/iOW/ALQpjKgQwJobzZHlF5 WlSHDWuXCpk6HHHgBZ4qxnmFmwUHLu43Z2ESkDnrTbtG773ki/n+muqYveHUGe+0jrNg nr6g== X-Gm-Message-State: AOAM531KGpFGMsN0x6O6VX0coQY2gIPlgAqWhVBDmCo/0u1O6XOwqpu/ NzNW6D+9Wm8ktu62yP5fjs/4vKmZbGY= X-Google-Smtp-Source: ABdhPJwqR6uXSP7kLq7w1B8X+4M6Any9mfoEE8FyRL27DG+AoL4IkWysmoSAc7f/NKRQltfE/NatBQ== X-Received: by 2002:a5d:4492:: with SMTP id j18mr5239798wrq.403.1612535459602; Fri, 05 Feb 2021 06:30:59 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id i20sm8587743wmq.7.2021.02.05.06.30.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Feb 2021 06:30:59 -0800 (PST) Message-Id: <1d7484c0cffa1a316a1f6bdbcd986e98b0fe2b53.1612535453.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Fri, 05 Feb 2021 14:30:40 +0000 Subject: [PATCH v3 05/17] midx: use context in write_midx_pack_names() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee In an effort to align the write_midx_internal() to use the chunk-format API, start converting chunk writing methods to match chunk_write_fn. The first case is to convert write_midx_pack_names() to take "void *data". We already have the necessary data in "struct write_midx_context", so this conversion is rather mechanical. Signed-off-by: Derrick Stolee --- midx.c | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/midx.c b/midx.c index 561f65a63a5b..88452b044337 100644 --- a/midx.c +++ b/midx.c @@ -643,27 +643,26 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, return deduplicated_entries; } -static size_t write_midx_pack_names(struct hashfile *f, - struct pack_info *info, - uint32_t num_packs) +static size_t write_midx_pack_names(struct hashfile *f, void *data) { + struct write_midx_context *ctx = data; uint32_t i; unsigned char padding[MIDX_CHUNK_ALIGNMENT]; size_t written = 0; - for (i = 0; i < num_packs; i++) { + for (i = 0; i < ctx->nr; i++) { size_t writelen; - if (info[i].expired) + if (ctx->info[i].expired) continue; - if (i && strcmp(info[i].pack_name, info[i - 1].pack_name) <= 0) + if (i && strcmp(ctx->info[i].pack_name, ctx->info[i - 1].pack_name) <= 0) BUG("incorrect pack-file order: %s before %s", - info[i - 1].pack_name, - info[i].pack_name); + ctx->info[i - 1].pack_name, + ctx->info[i].pack_name); - writelen = strlen(info[i].pack_name) + 1; - hashwrite(f, info[i].pack_name, writelen); + writelen = strlen(ctx->info[i].pack_name) + 1; + hashwrite(f, ctx->info[i].pack_name, writelen); written += writelen; } @@ -990,7 +989,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * switch (chunk_ids[i]) { case MIDX_CHUNKID_PACKNAMES: - written += write_midx_pack_names(f, ctx.info, ctx.nr); + written += write_midx_pack_names(f, &ctx); break; case MIDX_CHUNKID_OIDFANOUT: From patchwork Fri Feb 5 14:30:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12071075 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E10D7C4332D for ; Fri, 5 Feb 2021 22:12:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A9C4664EE4 for ; Fri, 5 Feb 2021 22:12:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232942AbhBEWM2 (ORCPT ); Fri, 5 Feb 2021 17:12:28 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45656 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232800AbhBEOlh (ORCPT ); Fri, 5 Feb 2021 09:41:37 -0500 Received: from mail-ed1-x52b.google.com (mail-ed1-x52b.google.com [IPv6:2a00:1450:4864:20::52b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 17024C061356 for ; Fri, 5 Feb 2021 08:19:43 -0800 (PST) Received: by mail-ed1-x52b.google.com with SMTP id s26so3645149edt.10 for ; Fri, 05 Feb 2021 08:19:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=SmEIaXCinREEUX3RTyO7nxVqzBVoWKR9j4KM2/Wy8VQ=; b=NFgSCcuWazlvFPpqaXdvSVtu+HE9dAxZj27Lv7Q7rIgj4/LYxohV17HNZb0LdAXS5g 1OQ5QdreREbBoLGI0S1LyEWIm0CPglCcP6ar+8RjqgqeqAXf8bErmD/3k4F6wZcO3qny 4La6jRHC/rsLWhkrqR8hOwTg52RmSiKgzJ82UAw0RVFHv1Ugp2c/gyvpFXfGw/WKSNww wDkHH7fZdPw+VimOcNFZEb7XOeJIiPloncBizRDknXRdQ6hoIwqByuMy2aQAGV5YLE5o qkZUTcpm1LNzJISKO0vZZE9WR8u1qiLZVL+u596+1WgCdtXlbnVvKJrEXcWT6YVFnbpx DFRQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=SmEIaXCinREEUX3RTyO7nxVqzBVoWKR9j4KM2/Wy8VQ=; b=Xl6XRwVzUEsB3TmdV9tObtOwgPMTAQL7XZWTf3vJjtGl809VFGO7yP9CwJr/0Efkbi w1x7aDYQQW6yKKMJpkI24QbWoE81huycPqJGvbkhLPcv0dBsgJEyp+h4JG8JhQ+O6++3 zBnV8WfwR8keD+vqKTXAZ4GciwawusiolZAeklZgaSit5P9itU4wjluraV0/GXh5bZ72 lymMhFkUfH/zCagitCNJoPQWhI/CjSCjS/6v59puiCy+JJHBvruE/2qsgRzMqP26h3iE 0PWlZvl9XW3QYVKNyh840LNppLvOfI04F6tIMgSGccLDCgfK5TGVOeelRhR7WcuKsUpM nmtw== X-Gm-Message-State: AOAM531dsT7LRJCe11c6oSZVRSTeLoCBO6LeYIiMz1u8y1WrCk1slJxi SXGedxeznmra1wr6+ciEQHiMaOkh8AY= X-Google-Smtp-Source: ABdhPJwqjDhV87ysY0FRp3EYMT6xplol5W8+TbIwJMb8i3EpcHIcXONtVDKJjk/uENuKuaLW+1z/dg== X-Received: by 2002:a5d:6752:: with SMTP id l18mr5351294wrw.209.1612535460579; Fri, 05 Feb 2021 06:31:00 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id z18sm12786681wro.91.2021.02.05.06.30.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Feb 2021 06:30:59 -0800 (PST) Message-Id: In-Reply-To: References: Date: Fri, 05 Feb 2021 14:30:41 +0000 Subject: [PATCH v3 06/17] midx: add entries to write_midx_context Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee In an effort to align write_midx_internal() with the chunk-format API, continue to group necessary data into "struct write_midx_context". This change collects the "struct pack_midx_entry *entries" list and its count into the context. Update write_midx_oid_fanout() and write_midx_oid_lookup() to take the context directly, as these are easy conversions with this new data. Only the callers of write_midx_object_offsets() and write_midx_large_offsets() are updated here, since additional data in the context before those methods can match chunk_write_fn. Signed-off-by: Derrick Stolee --- midx.c | 49 ++++++++++++++++++++++++++----------------------- 1 file changed, 26 insertions(+), 23 deletions(-) diff --git a/midx.c b/midx.c index 88452b044337..4520ef82b91b 100644 --- a/midx.c +++ b/midx.c @@ -458,6 +458,9 @@ struct write_midx_context { struct multi_pack_index *m; struct progress *progress; unsigned pack_paths_checked; + + struct pack_midx_entry *entries; + uint32_t entries_nr; }; static void add_pack_to_midx(const char *full_path, size_t full_path_len, @@ -678,11 +681,11 @@ static size_t write_midx_pack_names(struct hashfile *f, void *data) } static size_t write_midx_oid_fanout(struct hashfile *f, - struct pack_midx_entry *objects, - uint32_t nr_objects) + void *data) { - struct pack_midx_entry *list = objects; - struct pack_midx_entry *last = objects + nr_objects; + struct write_midx_context *ctx = data; + struct pack_midx_entry *list = ctx->entries; + struct pack_midx_entry *last = ctx->entries + ctx->entries_nr; uint32_t count = 0; uint32_t i; @@ -706,18 +709,19 @@ static size_t write_midx_oid_fanout(struct hashfile *f, return MIDX_CHUNK_FANOUT_SIZE; } -static size_t write_midx_oid_lookup(struct hashfile *f, unsigned char hash_len, - struct pack_midx_entry *objects, - uint32_t nr_objects) +static size_t write_midx_oid_lookup(struct hashfile *f, + void *data) { - struct pack_midx_entry *list = objects; + struct write_midx_context *ctx = data; + unsigned char hash_len = the_hash_algo->rawsz; + struct pack_midx_entry *list = ctx->entries; uint32_t i; size_t written = 0; - for (i = 0; i < nr_objects; i++) { + for (i = 0; i < ctx->entries_nr; i++) { struct pack_midx_entry *obj = list++; - if (i < nr_objects - 1) { + if (i < ctx->entries_nr - 1) { struct pack_midx_entry *next = list; if (oidcmp(&obj->oid, &next->oid) >= 0) BUG("OIDs not in order: %s >= %s", @@ -805,8 +809,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * uint64_t written = 0; uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1]; - uint32_t nr_entries, num_large_offsets = 0; - struct pack_midx_entry *entries = NULL; + uint32_t num_large_offsets = 0; struct progress *progress = NULL; int large_offsets_needed = 0; int pack_name_concat_len = 0; @@ -852,12 +855,12 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop) goto cleanup; - entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &nr_entries); + ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr); - for (i = 0; i < nr_entries; i++) { - if (entries[i].offset > 0x7fffffff) + for (i = 0; i < ctx.entries_nr; i++) { + if (ctx.entries[i].offset > 0x7fffffff) num_large_offsets++; - if (entries[i].offset > 0xffffffff) + if (ctx.entries[i].offset > 0xffffffff) large_offsets_needed = 1; } @@ -947,10 +950,10 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * cur_chunk++; chunk_ids[cur_chunk] = MIDX_CHUNKID_OBJECTOFFSETS; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * the_hash_algo->rawsz; + chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * the_hash_algo->rawsz; cur_chunk++; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * MIDX_CHUNK_OFFSET_WIDTH; + chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH; if (large_offsets_needed) { chunk_ids[cur_chunk] = MIDX_CHUNKID_LARGEOFFSETS; @@ -993,19 +996,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * break; case MIDX_CHUNKID_OIDFANOUT: - written += write_midx_oid_fanout(f, entries, nr_entries); + written += write_midx_oid_fanout(f, &ctx); break; case MIDX_CHUNKID_OIDLOOKUP: - written += write_midx_oid_lookup(f, the_hash_algo->rawsz, entries, nr_entries); + written += write_midx_oid_lookup(f, &ctx); break; case MIDX_CHUNKID_OBJECTOFFSETS: - written += write_midx_object_offsets(f, large_offsets_needed, pack_perm, entries, nr_entries); + written += write_midx_object_offsets(f, large_offsets_needed, pack_perm, ctx.entries, ctx.entries_nr); break; case MIDX_CHUNKID_LARGEOFFSETS: - written += write_midx_large_offsets(f, num_large_offsets, entries, nr_entries); + written += write_midx_large_offsets(f, num_large_offsets, ctx.entries, ctx.entries_nr); break; default: @@ -1035,7 +1038,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * } free(ctx.info); - free(entries); + free(ctx.entries); free(pack_perm); free(midx_name); return result; From patchwork Fri Feb 5 14:30:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12071073 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5A81FC433E9 for ; Fri, 5 Feb 2021 22:12:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1DC6864FB4 for ; Fri, 5 Feb 2021 22:12:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232756AbhBEWMA (ORCPT ); Fri, 5 Feb 2021 17:12:00 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44798 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232788AbhBEOkc (ORCPT ); Fri, 5 Feb 2021 09:40:32 -0500 Received: from mail-wm1-x335.google.com (mail-wm1-x335.google.com [IPv6:2a00:1450:4864:20::335]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 88DF2C06178C for ; Fri, 5 Feb 2021 08:18:57 -0800 (PST) Received: by mail-wm1-x335.google.com with SMTP id c127so6333313wmf.5 for ; Fri, 05 Feb 2021 08:18:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=Cur1jwyThUvHjWQICFEilcRtaj5W7he6jefrT/ISo+0=; b=Pel3msMWP5GDa0bMitz9xo2ps/GwKw7TAnTk5Siea19XNWa4MvYxhiAfj9FlvjZUzc OSFSJuW8sjBrxrFDFG1iXFcKyZT0TqBKSS0jAGNkxGN1caUN0twinb7lhKgLErDz2eSN dF9vknvQgTuZHqLmDbnj5u4CM+M5kn2RKLcMvgjYrWjhRWazo8UX6emUh7CAxynQtv5r 5GSoa3DVO3yMklKwd75Xogdwp78y1AiBJKMo1/B7mPV6x99B1M31+q/x9GmUkyU5tSqE PKMoVQwRoVD7A4X5zicfQtsv5x4YNuymufI+Q5NXKpZR/iCx9NNGJ+ePtYqHPeut2PYV IElA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=Cur1jwyThUvHjWQICFEilcRtaj5W7he6jefrT/ISo+0=; b=G9/VcmZ7qNu9yjNkKTg69oOzMeqQOzCtzKJIhUtJgnbLal7qYaqsH2I71xB9XJX7As jvTjU936QHxd9EhUQfci3RNRT23YllBhOZ5VeYoiPBaWna+77IXlfVW7COvSAiW+z3JV vrrns3bZT4c6hNrEJw6L5yT2/A041NuYdrpxEEIclbBTOSpj2kw0CeGyz1OdfPpVS+xm Kh3qj9f/PsY4DUO2YdmoZXuEhGZCprSKVp0C9dahyWo72d0tYTrivZ8d1T08zscVpQZO bqsWjT8fut0EFZEonah8S6H+wS0mSJ0vDHe9gxLgdrb1A/rGdzb2Af43bU6ZEI78tUV6 S8Sw== X-Gm-Message-State: AOAM533kT/7Q/qdt4uxYt7TstfVfpxdbvf6pcHCmrlMLzMGY9gmgA6VP K6+lIdcJ+4IbZuEZD1eclH9OI76YSsU= X-Google-Smtp-Source: ABdhPJxuj8MD3EBOl8QnUDRC1VUBiNx7wdoM3I5Pkp/x8wsrKExXVJD5+cwVdjUNKrYJm29npHpRKg== X-Received: by 2002:a05:600c:1986:: with SMTP id t6mr3865559wmq.92.1612535461624; Fri, 05 Feb 2021 06:31:01 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id c5sm12306702wrn.77.2021.02.05.06.31.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Feb 2021 06:31:01 -0800 (PST) Message-Id: In-Reply-To: References: Date: Fri, 05 Feb 2021 14:30:42 +0000 Subject: [PATCH v3 07/17] midx: add pack_perm to write_midx_context Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee In an effort to align write_midx_internal() with the chunk-format API, continue to group necessary data into "struct write_midx_context". This change collects the "uint32_t *pack_perm" and large_offsets_needed bit into the context. Update write_midx_object_offsets() to match chunk_write_fn. Signed-off-by: Derrick Stolee --- midx.c | 40 +++++++++++++++++++++------------------- 1 file changed, 21 insertions(+), 19 deletions(-) diff --git a/midx.c b/midx.c index 4520ef82b91b..cd994e333ecb 100644 --- a/midx.c +++ b/midx.c @@ -461,6 +461,9 @@ struct write_midx_context { struct pack_midx_entry *entries; uint32_t entries_nr; + + uint32_t *pack_perm; + unsigned large_offsets_needed:1; }; static void add_pack_to_midx(const char *full_path, size_t full_path_len, @@ -736,27 +739,27 @@ static size_t write_midx_oid_lookup(struct hashfile *f, return written; } -static size_t write_midx_object_offsets(struct hashfile *f, int large_offset_needed, - uint32_t *perm, - struct pack_midx_entry *objects, uint32_t nr_objects) +static size_t write_midx_object_offsets(struct hashfile *f, + void *data) { - struct pack_midx_entry *list = objects; + struct write_midx_context *ctx = data; + struct pack_midx_entry *list = ctx->entries; uint32_t i, nr_large_offset = 0; size_t written = 0; - for (i = 0; i < nr_objects; i++) { + for (i = 0; i < ctx->entries_nr; i++) { struct pack_midx_entry *obj = list++; - if (perm[obj->pack_int_id] == PACK_EXPIRED) + if (ctx->pack_perm[obj->pack_int_id] == PACK_EXPIRED) BUG("object %s is in an expired pack with int-id %d", oid_to_hex(&obj->oid), obj->pack_int_id); - hashwrite_be32(f, perm[obj->pack_int_id]); + hashwrite_be32(f, ctx->pack_perm[obj->pack_int_id]); - if (large_offset_needed && obj->offset >> 31) + if (ctx->large_offsets_needed && obj->offset >> 31) hashwrite_be32(f, MIDX_LARGE_OFFSET_NEEDED | nr_large_offset++); - else if (!large_offset_needed && obj->offset >> 32) + else if (!ctx->large_offsets_needed && obj->offset >> 32) BUG("object %s requires a large offset (%"PRIx64") but the MIDX is not writing large offsets!", oid_to_hex(&obj->oid), obj->offset); @@ -805,13 +808,11 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * struct hashfile *f = NULL; struct lock_file lk; struct write_midx_context ctx = { 0 }; - uint32_t *pack_perm = NULL; uint64_t written = 0; uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1]; uint32_t num_large_offsets = 0; struct progress *progress = NULL; - int large_offsets_needed = 0; int pack_name_concat_len = 0; int dropped_packs = 0; int result = 0; @@ -857,11 +858,12 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr); + ctx.large_offsets_needed = 0; for (i = 0; i < ctx.entries_nr; i++) { if (ctx.entries[i].offset > 0x7fffffff) num_large_offsets++; if (ctx.entries[i].offset > 0xffffffff) - large_offsets_needed = 1; + ctx.large_offsets_needed = 1; } QSORT(ctx.info, ctx.nr, pack_info_compare); @@ -900,13 +902,13 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * * * pack_perm[old_id] = new_id */ - ALLOC_ARRAY(pack_perm, ctx.nr); + ALLOC_ARRAY(ctx.pack_perm, ctx.nr); for (i = 0; i < ctx.nr; i++) { if (ctx.info[i].expired) { dropped_packs++; - pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED; + ctx.pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED; } else { - pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs; + ctx.pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs; } } @@ -927,7 +929,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * close_midx(ctx.m); cur_chunk = 0; - num_chunks = large_offsets_needed ? 5 : 4; + num_chunks = ctx.large_offsets_needed ? 5 : 4; if (ctx.nr - dropped_packs == 0) { error(_("no pack files to index.")); @@ -954,7 +956,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * cur_chunk++; chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH; - if (large_offsets_needed) { + if (ctx.large_offsets_needed) { chunk_ids[cur_chunk] = MIDX_CHUNKID_LARGEOFFSETS; cur_chunk++; @@ -1004,7 +1006,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * break; case MIDX_CHUNKID_OBJECTOFFSETS: - written += write_midx_object_offsets(f, large_offsets_needed, pack_perm, ctx.entries, ctx.entries_nr); + written += write_midx_object_offsets(f, &ctx); break; case MIDX_CHUNKID_LARGEOFFSETS: @@ -1039,7 +1041,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * free(ctx.info); free(ctx.entries); - free(pack_perm); + free(ctx.pack_perm); free(midx_name); return result; } From patchwork Fri Feb 5 14:30:43 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12071043 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D2969C43381 for ; Fri, 5 Feb 2021 22:02:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AF9F064F92 for ; Fri, 5 Feb 2021 22:02:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231274AbhBEWB3 (ORCPT ); Fri, 5 Feb 2021 17:01:29 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48752 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232901AbhBEO4S (ORCPT ); Fri, 5 Feb 2021 09:56:18 -0500 Received: from mail-lf1-x12c.google.com (mail-lf1-x12c.google.com [IPv6:2a00:1450:4864:20::12c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1F44FC06174A for ; Fri, 5 Feb 2021 08:33:48 -0800 (PST) Received: by mail-lf1-x12c.google.com with SMTP id h12so10695019lfp.9 for ; Fri, 05 Feb 2021 08:33:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=NkNbJX5oDO/3vy/Iuipy5TUY0HICiEs4cK5g77JnPHI=; b=gJXTbB76gyUOT7evytkclMOdSMZkV68OiyizaRHclyJX4r3EJD72RsXerrbXMbjGIg JvqvftiKrVvyYVcAn0Vcqjrov5gLMjUbxlSvVk6SgPf8ohRKmCNYip12e4AsCwEKUxU6 18ysmBanZjk113Faw+KeI7KmjoPO7OEqXtmyoGjAVbqn9lxYR4ELYCD4ix6WXCf/3Bng DkTTD6NfDo9cGvShNhMYpp7Yjf/LUAn08x4y4MllJweT+zy5z6GFGZV6PnrWL9SOocZ+ ujQMoqVgqFvcFjPgSEh+Suw0zBQY8roTatmQxpL41XSoFr0VmwtIbHrBEg9szYGDCUOF T15A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=NkNbJX5oDO/3vy/Iuipy5TUY0HICiEs4cK5g77JnPHI=; b=rq4/hsNGom34D1ez0lF7xpqbfGLpdqFAtYytO7bIG0JcyBthBCQ/7AhgkmzrsHcJQu Lb05oixh4b9nsfE4GoacWblxYUSklDt4MYhM+FnZ2gwfT20F+FCYCzpg9VYnZegejEuF 4OVFh8oq2/UvLRgwTZ5DYe2PcRYMUCzzeoG75wDR0bm8N3DcjKQljOnj2A4MA1DJQeuC O5XT0hHq7HZAnA83iQte+YV/dazfQuY5dsQAbgNliOYIsyWweUYqpd3XXeJHZIqdjG7a ZymTZB51M0UbKtz3jUXbOpM0Z9pBM2TBfeeFWlOtC+pxLrP3bNT+A2qEtSFm/abHSf7b 9zeA== X-Gm-Message-State: AOAM531MKHpn9LFd+IXFHOQC+ifFwZTYwPnWp9Cf8pag7HLPdjKt+/O9 of7QtpepaN3kfzTqZDBEnX+7g36+2sQ= X-Google-Smtp-Source: ABdhPJwknqFr5O5T+91Ig6VUQstcZf3Z2oNWshHIm7jKnNJUJmYwLkgDngqy+cStomYP6KAley7XPA== X-Received: by 2002:a5d:6a8f:: with SMTP id s15mr5400245wru.252.1612535462753; Fri, 05 Feb 2021 06:31:02 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id t197sm18934186wmt.3.2021.02.05.06.31.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Feb 2021 06:31:02 -0800 (PST) Message-Id: In-Reply-To: References: Date: Fri, 05 Feb 2021 14:30:43 +0000 Subject: [PATCH v3 08/17] midx: add num_large_offsets to write_midx_context Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee In an effort to align write_midx_internal() with the chunk-format API, continue to group necessary data into "struct write_midx_context". This change collects the "uint32_t num_large_offsets" into the context. With this new data, write_midx_large_offsets() now matches the chunk_write_fn type. Signed-off-by: Derrick Stolee --- midx.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/midx.c b/midx.c index cd994e333ecb..5be081f229ad 100644 --- a/midx.c +++ b/midx.c @@ -464,6 +464,7 @@ struct write_midx_context { uint32_t *pack_perm; unsigned large_offsets_needed:1; + uint32_t num_large_offsets; }; static void add_pack_to_midx(const char *full_path, size_t full_path_len, @@ -772,11 +773,14 @@ static size_t write_midx_object_offsets(struct hashfile *f, return written; } -static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_large_offset, - struct pack_midx_entry *objects, uint32_t nr_objects) +static size_t write_midx_large_offsets(struct hashfile *f, + void *data) { - struct pack_midx_entry *list = objects, *end = objects + nr_objects; + struct write_midx_context *ctx = data; + struct pack_midx_entry *list = ctx->entries; + struct pack_midx_entry *end = ctx->entries + ctx->entries_nr; size_t written = 0; + uint32_t nr_large_offset = ctx->num_large_offsets; while (nr_large_offset) { struct pack_midx_entry *obj; @@ -811,7 +815,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * uint64_t written = 0; uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1]; - uint32_t num_large_offsets = 0; struct progress *progress = NULL; int pack_name_concat_len = 0; int dropped_packs = 0; @@ -861,7 +864,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * ctx.large_offsets_needed = 0; for (i = 0; i < ctx.entries_nr; i++) { if (ctx.entries[i].offset > 0x7fffffff) - num_large_offsets++; + ctx.num_large_offsets++; if (ctx.entries[i].offset > 0xffffffff) ctx.large_offsets_needed = 1; } @@ -961,7 +964,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * cur_chunk++; chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + - num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH; + ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH; } chunk_ids[cur_chunk] = 0; @@ -1010,7 +1013,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * break; case MIDX_CHUNKID_LARGEOFFSETS: - written += write_midx_large_offsets(f, num_large_offsets, ctx.entries, ctx.entries_nr); + written += write_midx_large_offsets(f, &ctx); break; default: From patchwork Fri Feb 5 14:30:44 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12070425 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4E21BC433DB for ; Fri, 5 Feb 2021 16:41:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0084764EE0 for ; Fri, 5 Feb 2021 16:41:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233037AbhBEPB5 (ORCPT ); Fri, 5 Feb 2021 10:01:57 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48888 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232983AbhBEO6g (ORCPT ); Fri, 5 Feb 2021 09:58:36 -0500 Received: from mail-ed1-x52d.google.com (mail-ed1-x52d.google.com [IPv6:2a00:1450:4864:20::52d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1E7ACC061223 for ; Fri, 5 Feb 2021 08:28:55 -0800 (PST) Received: by mail-ed1-x52d.google.com with SMTP id s3so9443478edi.7 for ; Fri, 05 Feb 2021 08:28:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=9rbRQ+4gSMEK2h+y1E/hg18dGvaKEJCIVpGJN9anMY4=; b=nlG04GI+vgooir4Do+O/vtLMs1B8wOMSbxCsU37UwC75po/SXjdoNxWy1zUDTDONZq VtReksXt1VEfcNL7X/0Fx/YakFwyh8HCLBsvH41avybjwz1i/HXT73LBQfT8ScOz1ixh pmM4AGK+gmgSoziCUbM1RJUeMUXV2KwcHO2CgXzVj5WRYWJgH7lmrJ5nenKAyqbP8xKr C1P4F119RjUxg+HGlnq9sqjpHaThOHpYfHsFhOOrY67ZianBpTdKuuNFsqNdVqiZU0/F WHJ7e1/Zt+hZsgdR2sp95sLKt7Mdri9R0PTLFDa3oVR0f0+FwMAlz+DOWXLmA8r/iuvR /AAA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=9rbRQ+4gSMEK2h+y1E/hg18dGvaKEJCIVpGJN9anMY4=; b=fMTUjpZXs/3V5lIz2K9vKNh/dl9HdDi3fSFnp3ZlKAwuZ7cJKu6wHrO8H6t4b5LTAy 54pV5IPZY7vghUmU67quPSesCJds1vNk4JQOP2UrtJwrILHeC5JfQb2N59Yqo1ANwoMd HlvT6E2mSkmkdgG/ZLJB9VDswLuN+t12TGAWLV5vUXFfiHJkHfsPzGzFhJro9ZpYiYBW ehh2oiGHG2/CKUtI4LqMllCQn9BG+xywssHHBogy2BGU+SbfU7fzpmodz8qVbEAN5m5s zQ1JGDCypBObI7FVqwAG3epgglF1TxZ2un5eNLfsdrZmnGyzKLR5Atr8b01jsZv2SyZk KfHw== X-Gm-Message-State: AOAM533+SPy63DM3t7Q4HNPbRh1IRsWWsxG5ZbeJ6uFg+2U4Tgh3eAjK sFSv8mxjppwv2A5ynT//lRRPdJXMvJU= X-Google-Smtp-Source: ABdhPJxUi25h6Q/xkWSJjfs2LI4DXyhEGXoxZMiL05v/fVj+11MJ1wVRhui3Ihu6H+R+amOQBgufUg== X-Received: by 2002:adf:e610:: with SMTP id p16mr5289561wrm.169.1612535463728; Fri, 05 Feb 2021 06:31:03 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id u4sm11928802wrr.37.2021.02.05.06.31.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Feb 2021 06:31:03 -0800 (PST) Message-Id: <7aa3242e15b7a0cca7b7d4394311697ed84bc05a.1612535453.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Fri, 05 Feb 2021 14:30:44 +0000 Subject: [PATCH v3 09/17] midx: return success/failure in chunk write methods Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee Historically, the chunk-writing methods in midx.c have returned the amount of data written so the writer method could compare this with the table of contents. This presents with some interesting issues: 1. If a chunk writing method has a bug that miscalculates the written bytes, then we can satisfy the table of contents without actually writing the right amount of data to the hashfile. The commit-graph writing code checks the hashfile struct directly for a more robust verification. 2. There is no way for a chunk writing method to gracefully fail. Returning an int presents an opportunity to fail without a die(). 3. The current pattern doesn't match chunk_write_fn type exactly, so we cannot share code with commit-graph.c For these reasons, convert the midx chunk writer methods to return an 'int'. Since none of them fail at the moment, they all return 0. Signed-off-by: Derrick Stolee --- midx.c | 63 +++++++++++++++++++++++++--------------------------------- 1 file changed, 27 insertions(+), 36 deletions(-) diff --git a/midx.c b/midx.c index 5be081f229ad..c92a6c47be01 100644 --- a/midx.c +++ b/midx.c @@ -650,7 +650,7 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, return deduplicated_entries; } -static size_t write_midx_pack_names(struct hashfile *f, void *data) +static int write_midx_pack_names(struct hashfile *f, void *data) { struct write_midx_context *ctx = data; uint32_t i; @@ -678,14 +678,13 @@ static size_t write_midx_pack_names(struct hashfile *f, void *data) if (i < MIDX_CHUNK_ALIGNMENT) { memset(padding, 0, sizeof(padding)); hashwrite(f, padding, i); - written += i; } - return written; + return 0; } -static size_t write_midx_oid_fanout(struct hashfile *f, - void *data) +static int write_midx_oid_fanout(struct hashfile *f, + void *data) { struct write_midx_context *ctx = data; struct pack_midx_entry *list = ctx->entries; @@ -710,17 +709,16 @@ static size_t write_midx_oid_fanout(struct hashfile *f, list = next; } - return MIDX_CHUNK_FANOUT_SIZE; + return 0; } -static size_t write_midx_oid_lookup(struct hashfile *f, - void *data) +static int write_midx_oid_lookup(struct hashfile *f, + void *data) { struct write_midx_context *ctx = data; unsigned char hash_len = the_hash_algo->rawsz; struct pack_midx_entry *list = ctx->entries; uint32_t i; - size_t written = 0; for (i = 0; i < ctx->entries_nr; i++) { struct pack_midx_entry *obj = list++; @@ -734,19 +732,17 @@ static size_t write_midx_oid_lookup(struct hashfile *f, } hashwrite(f, obj->oid.hash, (int)hash_len); - written += hash_len; } - return written; + return 0; } -static size_t write_midx_object_offsets(struct hashfile *f, - void *data) +static int write_midx_object_offsets(struct hashfile *f, + void *data) { struct write_midx_context *ctx = data; struct pack_midx_entry *list = ctx->entries; uint32_t i, nr_large_offset = 0; - size_t written = 0; for (i = 0; i < ctx->entries_nr; i++) { struct pack_midx_entry *obj = list++; @@ -766,20 +762,17 @@ static size_t write_midx_object_offsets(struct hashfile *f, obj->offset); else hashwrite_be32(f, (uint32_t)obj->offset); - - written += MIDX_CHUNK_OFFSET_WIDTH; } - return written; + return 0; } -static size_t write_midx_large_offsets(struct hashfile *f, - void *data) +static int write_midx_large_offsets(struct hashfile *f, + void *data) { struct write_midx_context *ctx = data; struct pack_midx_entry *list = ctx->entries; struct pack_midx_entry *end = ctx->entries + ctx->entries_nr; - size_t written = 0; uint32_t nr_large_offset = ctx->num_large_offsets; while (nr_large_offset) { @@ -795,12 +788,12 @@ static size_t write_midx_large_offsets(struct hashfile *f, if (!(offset >> 31)) continue; - written += hashwrite_be64(f, offset); + hashwrite_be64(f, offset); nr_large_offset--; } - return written; + return 0; } static int write_midx_internal(const char *object_dir, struct multi_pack_index *m, @@ -812,7 +805,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * struct hashfile *f = NULL; struct lock_file lk; struct write_midx_context ctx = { 0 }; - uint64_t written = 0; + uint64_t header_size = 0; uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1]; struct progress *progress = NULL; @@ -940,10 +933,10 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * goto cleanup; } - written = write_midx_header(f, num_chunks, ctx.nr - dropped_packs); + header_size = write_midx_header(f, num_chunks, ctx.nr - dropped_packs); chunk_ids[cur_chunk] = MIDX_CHUNKID_PACKNAMES; - chunk_offsets[cur_chunk] = written + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH; + chunk_offsets[cur_chunk] = header_size + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH; cur_chunk++; chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDFANOUT; @@ -981,39 +974,37 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * hashwrite_be32(f, chunk_ids[i]); hashwrite_be64(f, chunk_offsets[i]); - - written += MIDX_CHUNKLOOKUP_WIDTH; } if (flags & MIDX_PROGRESS) progress = start_delayed_progress(_("Writing chunks to multi-pack-index"), num_chunks); for (i = 0; i < num_chunks; i++) { - if (written != chunk_offsets[i]) + if (f->total + f->offset != chunk_offsets[i]) BUG("incorrect chunk offset (%"PRIu64" != %"PRIu64") for chunk id %"PRIx32, chunk_offsets[i], - written, + f->total + f->offset, chunk_ids[i]); switch (chunk_ids[i]) { case MIDX_CHUNKID_PACKNAMES: - written += write_midx_pack_names(f, &ctx); + write_midx_pack_names(f, &ctx); break; case MIDX_CHUNKID_OIDFANOUT: - written += write_midx_oid_fanout(f, &ctx); + write_midx_oid_fanout(f, &ctx); break; case MIDX_CHUNKID_OIDLOOKUP: - written += write_midx_oid_lookup(f, &ctx); + write_midx_oid_lookup(f, &ctx); break; case MIDX_CHUNKID_OBJECTOFFSETS: - written += write_midx_object_offsets(f, &ctx); + write_midx_object_offsets(f, &ctx); break; case MIDX_CHUNKID_LARGEOFFSETS: - written += write_midx_large_offsets(f, &ctx); + write_midx_large_offsets(f, &ctx); break; default: @@ -1025,9 +1016,9 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * } stop_progress(&progress); - if (written != chunk_offsets[num_chunks]) + if (hashfile_total(f) != chunk_offsets[num_chunks]) BUG("incorrect final offset %"PRIu64" != %"PRIu64, - written, + hashfile_total(f), chunk_offsets[num_chunks]); finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM); From patchwork Fri Feb 5 14:30:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12070411 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 86DB5C433DB for ; Fri, 5 Feb 2021 16:26:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2A78464DB2 for ; Fri, 5 Feb 2021 16:26:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232819AbhBEOqq (ORCPT ); Fri, 5 Feb 2021 09:46:46 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45204 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232072AbhBEOje (ORCPT ); Fri, 5 Feb 2021 09:39:34 -0500 Received: from mail-ej1-x62f.google.com (mail-ej1-x62f.google.com [IPv6:2a00:1450:4864:20::62f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 432F9C0613D6 for ; Fri, 5 Feb 2021 08:17:38 -0800 (PST) Received: by mail-ej1-x62f.google.com with SMTP id f14so12762836ejc.8 for ; Fri, 05 Feb 2021 08:17:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=u7s4MNAI8c5xo75w4IpgbLg/3f46uKnaaEAr+lPMDu4=; b=Jgh46hPt3J+ykBRb/PDCNFRwzlWg/PnkVMoCscsXERfhJ6hfM5NFU+RYOR8s7odHl/ B9GZ2Yb/qmIx8c3EHWiIkt91TNmWNdU/W1YiyndsBIV5sApm4NAKZFiS0qUjN9wI4l0N p6drSdv28HgCT4Bgah1mTj7vGXJMTP4txJjsURXsruDFsjJAxXUcQMru9zgFBKMrpWpH PiYaNd2UsZKJosRuaczh9nNdMHe91ZXRHejeaji39ZsKVe+mMwm3cir9zMpNXyOnbbzp 8v7Y9gZLYlk/WAFwTW3rRM4JPQ8H59jAOT6LpTWEyrYHkpjH5Z4itHP/x1K7El162H9b 950A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=u7s4MNAI8c5xo75w4IpgbLg/3f46uKnaaEAr+lPMDu4=; b=b/GN136EgL5J+m4jEm4vA6Cny1bwcuUpO79AbXcWjmLaUU3UAKkrz6Vzz6rtcU9XwE lTL/TC6xDWM70CB+YKLpwFyKdrIfIpRqlYF39i5Kh+fje0NUakVb1jTp3v8tKPrj9hCb nWTmhbxVHRfn1KHXM2NXRiVI359u+c9jkNQ7Ae5yabI/8kw5TiWhuU2RKITQ/+9Bd0b+ hYbke/NOmjU57LZsmZRDyrOthxMShOMZByuOMgxKVHSIqcgTpgtqngC0yL8WrAP0bcpv RLTFWIeRf6QDMv/MfVdRiyJzS1YUARLMpWDzV4ZkYVQN6O9WO3oGU7gs4R11/O68Zajs clOQ== X-Gm-Message-State: AOAM531Xz8oywK9FU2AN7rV5NnRCgOvhXLdgArovzZhNsiNMn5BsW2Nj QzU1Gu4oQ1eUIKXrDwVec9t5oVXCG78= X-Google-Smtp-Source: ABdhPJwVfuDnbVKeFdvnjhYTSuRw9xQW/3WZxq0/doKcrbws5L6zGHa8VO3cERtCsCT+GReArDPQ2A== X-Received: by 2002:adf:e48f:: with SMTP id i15mr5183549wrm.135.1612535464867; Fri, 05 Feb 2021 06:31:04 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id c62sm9114100wmd.43.2021.02.05.06.31.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Feb 2021 06:31:04 -0800 (PST) Message-Id: <70f68c95e479b9d6476e070922514a4a1f05fd3b.1612535453.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Fri, 05 Feb 2021 14:30:45 +0000 Subject: [PATCH v3 10/17] midx: drop chunk progress during write Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee Most expensive operations in write_midx_internal() use the context struct's progress member, and these indicate the process of the expensive operations within the chunk writing methods. However, there is a competing progress struct that counts the progress over all chunks. This is not very helpful compared to the others, so drop it. This also reduces our barriers to combining the chunk writing code with chunk-format.c. Signed-off-by: Derrick Stolee --- midx.c | 7 ------- 1 file changed, 7 deletions(-) diff --git a/midx.c b/midx.c index c92a6c47be01..4f4aa351e60e 100644 --- a/midx.c +++ b/midx.c @@ -808,7 +808,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * uint64_t header_size = 0; uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1]; - struct progress *progress = NULL; int pack_name_concat_len = 0; int dropped_packs = 0; int result = 0; @@ -976,9 +975,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * hashwrite_be64(f, chunk_offsets[i]); } - if (flags & MIDX_PROGRESS) - progress = start_delayed_progress(_("Writing chunks to multi-pack-index"), - num_chunks); for (i = 0; i < num_chunks; i++) { if (f->total + f->offset != chunk_offsets[i]) BUG("incorrect chunk offset (%"PRIu64" != %"PRIu64") for chunk id %"PRIx32, @@ -1011,10 +1007,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * BUG("trying to write unknown chunk id %"PRIx32, chunk_ids[i]); } - - display_progress(progress, i + 1); } - stop_progress(&progress); if (hashfile_total(f) != chunk_offsets[num_chunks]) BUG("incorrect final offset %"PRIu64" != %"PRIu64, From patchwork Fri Feb 5 14:30:46 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12071105 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C84D1C432C3 for ; Fri, 5 Feb 2021 22:18:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9C65A61492 for ; Fri, 5 Feb 2021 22:18:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233146AbhBEWRx (ORCPT ); Fri, 5 Feb 2021 17:17:53 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43120 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232654AbhBEOap (ORCPT ); Fri, 5 Feb 2021 09:30:45 -0500 Received: from mail-ed1-x536.google.com (mail-ed1-x536.google.com [IPv6:2a00:1450:4864:20::536]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EEFD1C061356 for ; Fri, 5 Feb 2021 08:08:11 -0800 (PST) Received: by mail-ed1-x536.google.com with SMTP id z22so9324930edb.9 for ; Fri, 05 Feb 2021 08:08:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=At4JRoTc29IW16Y+2nXz7iB3R7DOXkEhdA+z+7Y3wdk=; b=Zp4CjUxANaxxlf7BoraQvf7Ss4KR4rf0LJ6XbYVPiTcZBbhDnzlZWArEk/0mWVHA6M CVVPh2ufSvh6358zim38FtdalnizharqGjC8Adf/VXCbJ6GV85L+E/JfVMAKH1aLz0Sx B69+ayltN3eyJ91cSvzKTKp6N3cQVoiEOnrReodYNU3wCh+EboHLjCm40l52jRgjc9gP QJ/gUOsAwpbLeyANMCZn0+t+BI3Ycrhc+ZkhjpVEcZMfQS1jxy+uClIQnig5HSLxZIYC lfzkNv778/quUGKGv4jgGKn1XacSGpELb+dv5ksJCo6jldMZ4VjmujaKjONYpY9UW8sr l0Ig== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=At4JRoTc29IW16Y+2nXz7iB3R7DOXkEhdA+z+7Y3wdk=; b=MOAn1jJUvoz0AyTVZGFzvAG9sXldxsJCwFL9ykqU0hkBY8EZRi8yNijhr6zMvmFPTt KQ9Ze+RMcyKQ3k6MuYc90qIBwJZZlcI2GYp4qolTXkwyPvI4Ia3uXNbnQr9t8imbltlq tTAiPhE0w5bKDnAletAzjE8t70C3N+0OgSPlTh3WUES2jg/mxAkiKBf+dEMVYQRApZy8 EvGWqhVM8ndTndRc0oRdU7Z6a9nQKx7SPgogZm9TFooVXb2ElKPZY5/jfS+61JSAceGw lRzRJXW935Mmo3AiL1W0OEWqISPf/hCSOP33FmKHzWQxkGOUvuIPF3aOZ+P7niP2tAiC 12KQ== X-Gm-Message-State: AOAM531unaooDcglWv/iqPFBMw09HfGt2Bv2tK9JT7YXiFrKGJG9LZb/ UW5OhjpfUXh9DJoUAPXlnlWy9n3Oh2E= X-Google-Smtp-Source: ABdhPJxpmmpH5iHjs78+gKiwCp3Dy+ABdN1WD8+LcbOcrMed1w4EuK0MRqtORu4fSFNva4z5SnaD1w== X-Received: by 2002:a5d:414f:: with SMTP id c15mr5427739wrq.42.1612535465874; Fri, 05 Feb 2021 06:31:05 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id r15sm12592299wrj.61.2021.02.05.06.31.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Feb 2021 06:31:05 -0800 (PST) Message-Id: <787cd7f18d2e47d975e10bf0f56188bc35589124.1612535453.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Fri, 05 Feb 2021 14:30:46 +0000 Subject: [PATCH v3 11/17] midx: use chunk-format API in write_midx_internal() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The chunk-format API allows writing the table of contents and all chunks using the anonymous 'struct chunkfile' type. We only need to convert our local chunk logic to this API for the multi-pack-index writes to share that logic with the commit-graph file writes. Signed-off-by: Derrick Stolee --- midx.c | 105 +++++++++++---------------------------------------------- 1 file changed, 20 insertions(+), 85 deletions(-) diff --git a/midx.c b/midx.c index 4f4aa351e60e..d9c7411b083b 100644 --- a/midx.c +++ b/midx.c @@ -11,6 +11,7 @@ #include "trace2.h" #include "run-command.h" #include "repository.h" +#include "chunk-format.h" #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */ #define MIDX_VERSION 1 @@ -799,18 +800,15 @@ static int write_midx_large_offsets(struct hashfile *f, static int write_midx_internal(const char *object_dir, struct multi_pack_index *m, struct string_list *packs_to_drop, unsigned flags) { - unsigned char cur_chunk, num_chunks = 0; char *midx_name; uint32_t i; struct hashfile *f = NULL; struct lock_file lk; struct write_midx_context ctx = { 0 }; - uint64_t header_size = 0; - uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; - uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1]; int pack_name_concat_len = 0; int dropped_packs = 0; int result = 0; + struct chunkfile *cf; midx_name = get_midx_filename(object_dir); if (safe_create_leading_directories(midx_name)) @@ -923,98 +921,35 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * if (ctx.m) close_midx(ctx.m); - cur_chunk = 0; - num_chunks = ctx.large_offsets_needed ? 5 : 4; - if (ctx.nr - dropped_packs == 0) { error(_("no pack files to index.")); result = 1; goto cleanup; } - header_size = write_midx_header(f, num_chunks, ctx.nr - dropped_packs); - - chunk_ids[cur_chunk] = MIDX_CHUNKID_PACKNAMES; - chunk_offsets[cur_chunk] = header_size + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH; - - cur_chunk++; - chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDFANOUT; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + pack_name_concat_len; - - cur_chunk++; - chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + MIDX_CHUNK_FANOUT_SIZE; - - cur_chunk++; - chunk_ids[cur_chunk] = MIDX_CHUNKID_OBJECTOFFSETS; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * the_hash_algo->rawsz; - - cur_chunk++; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH; - if (ctx.large_offsets_needed) { - chunk_ids[cur_chunk] = MIDX_CHUNKID_LARGEOFFSETS; - - cur_chunk++; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + - ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH; - } - - chunk_ids[cur_chunk] = 0; - - for (i = 0; i <= num_chunks; i++) { - if (i && chunk_offsets[i] < chunk_offsets[i - 1]) - BUG("incorrect chunk offsets: %"PRIu64" before %"PRIu64, - chunk_offsets[i - 1], - chunk_offsets[i]); - - if (chunk_offsets[i] % MIDX_CHUNK_ALIGNMENT) - BUG("chunk offset %"PRIu64" is not properly aligned", - chunk_offsets[i]); - - hashwrite_be32(f, chunk_ids[i]); - hashwrite_be64(f, chunk_offsets[i]); - } - - for (i = 0; i < num_chunks; i++) { - if (f->total + f->offset != chunk_offsets[i]) - BUG("incorrect chunk offset (%"PRIu64" != %"PRIu64") for chunk id %"PRIx32, - chunk_offsets[i], - f->total + f->offset, - chunk_ids[i]); + cf = init_chunkfile(f); - switch (chunk_ids[i]) { - case MIDX_CHUNKID_PACKNAMES: - write_midx_pack_names(f, &ctx); - break; + add_chunk(cf, MIDX_CHUNKID_PACKNAMES, pack_name_concat_len, + write_midx_pack_names); + add_chunk(cf, MIDX_CHUNKID_OIDFANOUT, MIDX_CHUNK_FANOUT_SIZE, + write_midx_oid_fanout); + add_chunk(cf, MIDX_CHUNKID_OIDLOOKUP, + ctx.entries_nr * the_hash_algo->rawsz, + write_midx_oid_lookup); + add_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS, + ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH, + write_midx_object_offsets); - case MIDX_CHUNKID_OIDFANOUT: - write_midx_oid_fanout(f, &ctx); - break; - - case MIDX_CHUNKID_OIDLOOKUP: - write_midx_oid_lookup(f, &ctx); - break; - - case MIDX_CHUNKID_OBJECTOFFSETS: - write_midx_object_offsets(f, &ctx); - break; - - case MIDX_CHUNKID_LARGEOFFSETS: - write_midx_large_offsets(f, &ctx); - break; - - default: - BUG("trying to write unknown chunk id %"PRIx32, - chunk_ids[i]); - } - } + if (ctx.large_offsets_needed) + add_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, + ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH, + write_midx_large_offsets); - if (hashfile_total(f) != chunk_offsets[num_chunks]) - BUG("incorrect final offset %"PRIu64" != %"PRIu64, - hashfile_total(f), - chunk_offsets[num_chunks]); + write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs); + write_chunkfile(cf, &ctx); finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM); + free_chunkfile(cf); commit_lock_file(&lk); cleanup: From patchwork Fri Feb 5 14:30:47 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12071127 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 31EE6C433E6 for ; Fri, 5 Feb 2021 22:24:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E6AFA64F9F for ; Fri, 5 Feb 2021 22:24:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232010AbhBEWYO (ORCPT ); Fri, 5 Feb 2021 17:24:14 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42296 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232573AbhBEO0c (ORCPT ); Fri, 5 Feb 2021 09:26:32 -0500 Received: from mail-wm1-x32d.google.com (mail-wm1-x32d.google.com [IPv6:2a00:1450:4864:20::32d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B4C41C0617AB for ; Fri, 5 Feb 2021 08:04:15 -0800 (PST) Received: by mail-wm1-x32d.google.com with SMTP id w4so6429675wmi.4 for ; Fri, 05 Feb 2021 08:04:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=KXPAB9q+fGgT3wZ6Ax8vxxMhYx8Zk4iTFycG5IttMms=; b=ZoYmN+WLQb3rGDPHiD/BNsyEoW7OT9w5vsPVtGlKARSi9pteJDRR7f2vNZ+CKNm8Om sDlzO3vXgv178paYdBWcshGau6Qwzqfg7oS1yoy5GJad6+31HJ1kHZDeXB1M4Ii4bTQ/ mQr502q/qD0GK7UbW8nTEWKdMq78NjStgb3snwEd0R3W5mPIr+WUJe2oYmxIrt2FlaEt GzY5FPlxBrZdu1bCIAHCAWBo7fOocPmZAJ7mdY4jSgn71aeY6LTFpVn+zuCbbfwJWKX5 7e60S1HgSVTigXsE6RKiMgzdmwGHHD3rtItrylJPpUq2ZczE+yHTJjKaKGYX4KBSSIYJ RaAw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=KXPAB9q+fGgT3wZ6Ax8vxxMhYx8Zk4iTFycG5IttMms=; b=ucN6lt6yxkPYa3uqZLDWSFeTbNwB6WbT3w2dk9h4dve8YQKpj0NUapYxnLInLqbLTh jGgNpx8kyXiqyCydU0rEQD67gCPriAqZOkdQnew10lJWp0UOcAzdp1RbGxZHEYeDIXwW DHKfqJIRKwql/D0dqEH7zZrbvYNo1fcCqwD+cPvAmrvvRScU/WaD3LLzO0GhlRCcUJsk FxVCcSqJFjWn2mQrTHjCpvE1zmXCymlcEh9ciMiE3vMehXRvgWv1hFl8Sy8lJVi7Ql8t lROoVyA1jXcWcVkG+1l3fJAmfwxYflsedN1gE7fGwcJYvpu/pg+H2t1FavC2YldGA/3E djFA== X-Gm-Message-State: AOAM5327p7/9nQaiUOgoM2U++wX9EJX8DOddCZw8DIz3nr0mbq+KXPlG avGMrPye/P0EwYFCUF/Go70x62CF3k8= X-Google-Smtp-Source: ABdhPJy1DGeOUcLxTtAY7HuSNTFRbkjJdy+mQ7BUvbcDHjvkEf5GVrXmg7y+l0k56Ng512/ygFVV+Q== X-Received: by 2002:a1c:9cd8:: with SMTP id f207mr3930450wme.155.1612535466758; Fri, 05 Feb 2021 06:31:06 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id 9sm13359065wra.80.2021.02.05.06.31.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Feb 2021 06:31:06 -0800 (PST) Message-Id: <366eb2afee839feccdfd2244b231d2ad718c76d4.1612535453.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Fri, 05 Feb 2021 14:30:47 +0000 Subject: [PATCH v3 12/17] chunk-format: create read chunk API Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee Add the capability to read the table of contents, then pair the chunks with necessary logic using read_chunk_fn pointers. Callers will be added in future changes, but the typical outline will be: 1. initialize a 'struct chunkfile' with init_chunkfile(NULL). 2. call read_table_of_contents(). 3. for each chunk to parse, a. call pair_chunk() to assign a pointer with the chunk position, or b. call read_chunk() to run a callback on the chunk start and size. 4. call free_chunkfile() to clear the 'struct chunkfile' data. We are re-using the anonymous 'struct chunkfile' data, as it is internal to the chunk-format API. This gives it essentially two modes: write and read. If the same struct instance was used for both reads and writes, then there would be failures. Helped-by: Junio C Hamano Signed-off-by: Derrick Stolee --- chunk-format.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++ chunk-format.h | 46 +++++++++++++++++++++++++++++ 2 files changed, 126 insertions(+) diff --git a/chunk-format.c b/chunk-format.c index 6e0f1900213e..bc9d4caf7276 100644 --- a/chunk-format.c +++ b/chunk-format.c @@ -12,6 +12,8 @@ struct chunk_info { uint32_t id; uint64_t size; chunk_write_fn write_fn; + + const void *start; }; struct chunkfile { @@ -89,3 +91,81 @@ int write_chunkfile(struct chunkfile *cf, void *data) return 0; } + +int read_table_of_contents(struct chunkfile *cf, + const unsigned char *mfile, + size_t mfile_size, + uint64_t toc_offset, + int toc_length) +{ + uint32_t chunk_id; + const unsigned char *table_of_contents = mfile + toc_offset; + + ALLOC_GROW(cf->chunks, toc_length, cf->chunks_alloc); + + while (toc_length--) { + uint64_t chunk_offset, next_chunk_offset; + + chunk_id = get_be32(table_of_contents); + chunk_offset = get_be64(table_of_contents + 4); + + if (!chunk_id) { + error(_("terminating chunk id appears earlier than expected")); + return 1; + } + + table_of_contents += CHUNK_LOOKUP_WIDTH; + next_chunk_offset = get_be64(table_of_contents + 4); + + if (next_chunk_offset < chunk_offset || + next_chunk_offset > mfile_size - the_hash_algo->rawsz) { + error(_("improper chunk offset(s) %"PRIx64" and %"PRIx64""), + chunk_offset, next_chunk_offset); + return -1; + } + + cf->chunks[cf->chunks_nr].id = chunk_id; + cf->chunks[cf->chunks_nr].start = mfile + chunk_offset; + cf->chunks[cf->chunks_nr].size = next_chunk_offset - chunk_offset; + cf->chunks_nr++; + } + + chunk_id = get_be32(table_of_contents); + if (chunk_id) { + error(_("final chunk has non-zero id %"PRIx32""), chunk_id); + return -1; + } + + return 0; +} + +static int pair_chunk_fn(const unsigned char *chunk_start, + size_t chunk_size, + void *data) +{ + const unsigned char **p = data; + *p = chunk_start; + return 0; +} + +int pair_chunk(struct chunkfile *cf, + uint32_t chunk_id, + const unsigned char **p) +{ + return read_chunk(cf, chunk_id, pair_chunk_fn, p); +} + +int read_chunk(struct chunkfile *cf, + uint32_t chunk_id, + chunk_read_fn fn, + void *data) +{ + int i; + + for (i = 0; i < cf->chunks_nr; i++) { + if (cf->chunks[i].id == chunk_id) + return fn(cf->chunks[i].start, cf->chunks[i].size, data); + } + + return CHUNK_NOT_FOUND; +} diff --git a/chunk-format.h b/chunk-format.h index 9a1d770accec..0edcc57db4e7 100644 --- a/chunk-format.h +++ b/chunk-format.h @@ -6,6 +6,19 @@ struct hashfile; struct chunkfile; +/* + * Initialize a 'struct chunkfile' for writing _or_ reading a file + * with the chunk format. + * + * If writing a file, supply a non-NULL 'struct hashfile *' that will + * be used to write. + * + * If reading a file, then supply the memory-mapped data to the + * pair_chunk() or read_chunk() methods, as appropriate. + * + * DO NOT MIX THESE MODES. Use different 'struct chunkfile' instances + * for reading and writing. + */ struct chunkfile *init_chunkfile(struct hashfile *f); void free_chunkfile(struct chunkfile *cf); int get_num_chunks(struct chunkfile *cf); @@ -16,4 +29,37 @@ void add_chunk(struct chunkfile *cf, chunk_write_fn fn); int write_chunkfile(struct chunkfile *cf, void *data); +int read_table_of_contents(struct chunkfile *cf, + const unsigned char *mfile, + size_t mfile_size, + uint64_t toc_offset, + int toc_length); + +#define CHUNK_NOT_FOUND (-2) + +/* + * Find 'chunk_id' in the given chunkfile and assign the + * given pointer to the position in the mmap'd file where + * that chunk begins. + * + * Returns CHUNK_NOT_FOUND if the chunk does not exist. + */ +int pair_chunk(struct chunkfile *cf, + uint32_t chunk_id, + const unsigned char **p); + +typedef int (*chunk_read_fn)(const unsigned char *chunk_start, + size_t chunk_size, void *data); +/* + * Find 'chunk_id' in the given chunkfile and call the + * given chunk_read_fn method with the information for + * that chunk. + * + * Returns CHUNK_NOT_FOUND if the chunk does not exist. + */ +int read_chunk(struct chunkfile *cf, + uint32_t chunk_id, + chunk_read_fn fn, + void *data); + #endif From patchwork Fri Feb 5 14:30:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12071081 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 89384C433E9 for ; Fri, 5 Feb 2021 22:16:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 44FAC61492 for ; Fri, 5 Feb 2021 22:16:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232724AbhBEWQ1 (ORCPT ); Fri, 5 Feb 2021 17:16:27 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43960 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232575AbhBEOeB (ORCPT ); Fri, 5 Feb 2021 09:34:01 -0500 Received: from mail-wm1-x32f.google.com (mail-wm1-x32f.google.com [IPv6:2a00:1450:4864:20::32f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DEC0FC06178B for ; Fri, 5 Feb 2021 08:11:59 -0800 (PST) Received: by mail-wm1-x32f.google.com with SMTP id w4so6448121wmi.4 for ; Fri, 05 Feb 2021 08:11:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=69vd5Y2CPpd1hYEmKNSzHwI+erEl0uQVGxpZhdzj/sc=; b=G6yFjBQWoWJWPQ+dp8nllKn195T1jwhSBznM9Bd4o21WZxvMhIWlNMQoL6joQg5ojL WH98+joT7ctmEX9gmXnzLVw5+8rx3DjwnpovJQI3ZDOisKNLnwYnGkmtD6yMl9L2ah+B XcAzRlhbbA3Uo4d6WJpI6clLZjywHsxvJtjSXIEbHlyY/mcgFXhiKZAnOT5AwFXLKH8t EuGrO+HZO4a3j8SHSvR96fZ0/6oS6+AUuXVIxsauQzkZsSnDyd84nbOHUHOnWn/pcFMJ Z9g2rdFMeRklk319YhUjELQyJsscT6F5SwKyJRaHZVvD6Ab4C78ipkdbxCa74H1uTEwY GRxA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=69vd5Y2CPpd1hYEmKNSzHwI+erEl0uQVGxpZhdzj/sc=; b=dRYHdukH7Qq+UELYQ/w5Shiasm/Q5S9UZ30PxNj9WkCtr1vFLFOGdcdbl5YKAZN1+c zGDZbXQVYmC5JXFI69IcC+LMvYEPodhE17l6douprvMlTt8rtGhgyR2Xko1VfQ/pw6/Y VTbxboKG4QUM1SKQ/RjMuUy8VTZxYTq4TJLHqNFLjqUx0qvN0v2GDxWlraQPHPuEOZHz 8cvQAer+GTxdgQALu6O5zfl9Pb4r7wAodBUnqIzR4e4s7ecjT/gaS9JzkCm7Dvu9RaX6 WwwY3I8iKyO52jW+xk8BIZlXxebR6vukYY6i8J/efOMoPcqbpMxda5it8mKpJm5WFmpq Q3Vw== X-Gm-Message-State: AOAM5330WHWNZB2/fIL4sC6/Jyv8WZXbmca3NPZo/2ta548QEHUDD78/ xHMhpL+u/IkwTYuqmLGpN8Qnr6iPg6c= X-Google-Smtp-Source: ABdhPJz3e7Ftrjq4DHzAE06XWLhmDPfmMOxd387fUzHqf1T5+I5SmlyvcollKK05zckJu0vBXgOf2w== X-Received: by 2002:a7b:cf33:: with SMTP id m19mr3865297wmg.53.1612535467891; Fri, 05 Feb 2021 06:31:07 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id u10sm8619315wmj.40.2021.02.05.06.31.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Feb 2021 06:31:07 -0800 (PST) Message-Id: <7838ad32e2e0b2e5941f5027d873f2453e08f7cb.1612535453.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Fri, 05 Feb 2021 14:30:48 +0000 Subject: [PATCH v3 13/17] commit-graph: use chunk-format read API Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee Instead of parsing the table of contents directly, use the chunk-format API methods read_table_of_contents() and pair_chunk(). While the current implementation loses the duplicate-chunk detection, that will be added in a future change. Signed-off-by: Derrick Stolee --- commit-graph.c | 154 ++++++++++++++-------------------------- t/t5318-commit-graph.sh | 2 +- 2 files changed, 53 insertions(+), 103 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 7c607d23b29f..32cf5091d2fb 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -299,15 +299,43 @@ static int verify_commit_graph_lite(struct commit_graph *g) return 0; } +static int graph_read_oid_lookup(const unsigned char *chunk_start, + size_t chunk_size, void *data) +{ + struct commit_graph *g = data; + g->chunk_oid_lookup = chunk_start; + g->num_commits = chunk_size / g->hash_len; + return 0; +} + +static int graph_read_bloom_data(const unsigned char *chunk_start, + size_t chunk_size, void *data) +{ + struct commit_graph *g = data; + uint32_t hash_version; + g->chunk_bloom_data = chunk_start; + hash_version = get_be32(chunk_start); + + if (hash_version != 1) + return 0; + + g->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings)); + g->bloom_filter_settings->hash_version = hash_version; + g->bloom_filter_settings->num_hashes = get_be32(chunk_start + 4); + g->bloom_filter_settings->bits_per_entry = get_be32(chunk_start + 8); + g->bloom_filter_settings->max_changed_paths = DEFAULT_BLOOM_MAX_CHANGES; + + return 0; +} + struct commit_graph *parse_commit_graph(struct repository *r, void *graph_map, size_t graph_size) { - const unsigned char *data, *chunk_lookup; - uint32_t i; + const unsigned char *data; struct commit_graph *graph; - uint64_t next_chunk_offset; uint32_t graph_signature; unsigned char graph_version, hash_version; + struct chunkfile *cf = NULL; if (!graph_map) return NULL; @@ -356,108 +384,28 @@ struct commit_graph *parse_commit_graph(struct repository *r, return NULL; } - chunk_lookup = data + 8; - next_chunk_offset = get_be64(chunk_lookup + 4); - for (i = 0; i < graph->num_chunks; i++) { - uint32_t chunk_id; - uint64_t chunk_offset = next_chunk_offset; - int chunk_repeated = 0; - - chunk_id = get_be32(chunk_lookup + 0); - - chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH; - next_chunk_offset = get_be64(chunk_lookup + 4); - - if (chunk_offset > graph_size - the_hash_algo->rawsz) { - error(_("commit-graph improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32), - (uint32_t)chunk_offset); - goto free_and_return; - } - - switch (chunk_id) { - case GRAPH_CHUNKID_OIDFANOUT: - if (graph->chunk_oid_fanout) - chunk_repeated = 1; - else - graph->chunk_oid_fanout = (uint32_t*)(data + chunk_offset); - break; - - case GRAPH_CHUNKID_OIDLOOKUP: - if (graph->chunk_oid_lookup) - chunk_repeated = 1; - else { - graph->chunk_oid_lookup = data + chunk_offset; - graph->num_commits = (next_chunk_offset - chunk_offset) - / graph->hash_len; - } - break; + cf = init_chunkfile(NULL); - case GRAPH_CHUNKID_DATA: - if (graph->chunk_commit_data) - chunk_repeated = 1; - else - graph->chunk_commit_data = data + chunk_offset; - break; - - case GRAPH_CHUNKID_GENERATION_DATA: - if (graph->chunk_generation_data) - chunk_repeated = 1; - else - graph->chunk_generation_data = data + chunk_offset; - break; - - case GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW: - if (graph->chunk_generation_data_overflow) - chunk_repeated = 1; - else - graph->chunk_generation_data_overflow = data + chunk_offset; - break; - - case GRAPH_CHUNKID_EXTRAEDGES: - if (graph->chunk_extra_edges) - chunk_repeated = 1; - else - graph->chunk_extra_edges = data + chunk_offset; - break; - - case GRAPH_CHUNKID_BASE: - if (graph->chunk_base_graphs) - chunk_repeated = 1; - else - graph->chunk_base_graphs = data + chunk_offset; - break; - - case GRAPH_CHUNKID_BLOOMINDEXES: - if (graph->chunk_bloom_indexes) - chunk_repeated = 1; - else if (r->settings.commit_graph_read_changed_paths) - graph->chunk_bloom_indexes = data + chunk_offset; - break; - - case GRAPH_CHUNKID_BLOOMDATA: - if (graph->chunk_bloom_data) - chunk_repeated = 1; - else if (r->settings.commit_graph_read_changed_paths) { - uint32_t hash_version; - graph->chunk_bloom_data = data + chunk_offset; - hash_version = get_be32(data + chunk_offset); - - if (hash_version != 1) - break; + if (read_table_of_contents(cf, graph->data, graph_size, + GRAPH_HEADER_SIZE, graph->num_chunks)) + goto free_and_return; - graph->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings)); - graph->bloom_filter_settings->hash_version = hash_version; - graph->bloom_filter_settings->num_hashes = get_be32(data + chunk_offset + 4); - graph->bloom_filter_settings->bits_per_entry = get_be32(data + chunk_offset + 8); - graph->bloom_filter_settings->max_changed_paths = DEFAULT_BLOOM_MAX_CHANGES; - } - break; - } + pair_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, + (const unsigned char **)&graph->chunk_oid_fanout); + read_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph); + pair_chunk(cf, GRAPH_CHUNKID_DATA, &graph->chunk_commit_data); + pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges); + pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs); + pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA, + &graph->chunk_generation_data); + pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW, + &graph->chunk_generation_data_overflow); - if (chunk_repeated) { - error(_("commit-graph chunk id %08x appears multiple times"), chunk_id); - goto free_and_return; - } + if (r->settings.commit_graph_read_changed_paths) { + pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES, + &graph->chunk_bloom_indexes); + read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA, + graph_read_bloom_data, graph); } if (graph->chunk_bloom_indexes && graph->chunk_bloom_data) { @@ -474,9 +422,11 @@ struct commit_graph *parse_commit_graph(struct repository *r, if (verify_commit_graph_lite(graph)) goto free_and_return; + free_chunkfile(cf); return graph; free_and_return: + free_chunkfile(cf); free(graph->bloom_filter_settings); free(graph); return NULL; diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index fa27df579a57..c7da741284e5 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -564,7 +564,7 @@ test_expect_success 'detect bad hash version' ' test_expect_success 'detect low chunk count' ' corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\01" \ - "missing the .* chunk" + "final chunk has non-zero id" ' test_expect_success 'detect missing OID fanout chunk' ' From patchwork Fri Feb 5 14:30:49 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12071061 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 004EEC433E6 for ; Fri, 5 Feb 2021 22:08:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D70D764F9E for ; Fri, 5 Feb 2021 22:08:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233006AbhBEWHL (ORCPT ); Fri, 5 Feb 2021 17:07:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47886 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231321AbhBEOwM (ORCPT ); Fri, 5 Feb 2021 09:52:12 -0500 Received: from mail-wm1-x32f.google.com (mail-wm1-x32f.google.com [IPv6:2a00:1450:4864:20::32f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2431EC0611C3 for ; Fri, 5 Feb 2021 08:29:46 -0800 (PST) Received: by mail-wm1-x32f.google.com with SMTP id 190so6510567wmz.0 for ; Fri, 05 Feb 2021 08:29:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=hZrt/9Ddj4gx/b89hC5HBX3Cwr8bZTYyGFdAEwA+J8c=; b=ZBPorRCzy/rHnwUcZP8M4VMnyOUQ8UjluKfz0C27J7Ho74MnwXvs0hQZRZaKBKU7SZ IzSeV3lcmTi/QrlOMij2pIyWz8hAuLh9mfoCtg861D7xu+hqUcOxhr07vd6mH7jMZM4F MNfYUOQCxQ0c/pCsfxV3KeN3XWR5eGkrBAEyvB0ACNtseBUT3h+pdzGuSw4yXzXAJ8Zn oJRCN7Ert/Q6e/0j2sZKQZ3XW2Ivncd3NnIJ9SB4CFnADulZaSZhGAPgNDiqMQd/Gmp+ wab+a2c+L3a+IlE4KwZXd9gZmv4rk5oCuSsSGtMKVdVG/YdaLHxqujrhthlWBB2VfGqj RxIA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=hZrt/9Ddj4gx/b89hC5HBX3Cwr8bZTYyGFdAEwA+J8c=; b=B8Q6qhq58HVJTAqEeawb8HgnlALvil/mnED9ZKaHqhu2QqC+VgxF6KjqWDsuZis1BQ qPlihaWwg0Z4DALF9EKFO3sJrd3VpHPhCTtu55XLEIC6UpMG+VUSco48uqiBYegXaUDC Kncmm+1dm4/iOdWXyWFC4w6CvTea6K6vsXy28La8CEpQsDGTa3cjEML3G/OcHQI45jOw H7FKQ3WY0iMxyFTU2OWYvAdaNOJ8j6A/hfzUQgqWB3beCAjVodCwAPinZE/zRa5djl4H Y7w6SS+KXQrgliyd2ATB3ncyQaK4ohg4mHgMuNCjmG+1/aZzZE2yorB30nqstInn7lO+ V5UA== X-Gm-Message-State: AOAM5330szVYa0emal2xzbWsD/8EBQzV3+QBbRZdV4kI1QF6h13OZrYK cNteeuVyv7b6mOHYlSr9OuhHpvAtU1s= X-Google-Smtp-Source: ABdhPJzma14EFStLzGzTGttDB/vdLFylxt1qlFU+IVtRi5CBkDKAbRrF8Pe/BJO2zhuuiqEduR09eA== X-Received: by 2002:a7b:c305:: with SMTP id k5mr3796200wmj.57.1612535468915; Fri, 05 Feb 2021 06:31:08 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id d23sm9045545wmd.11.2021.02.05.06.31.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Feb 2021 06:31:08 -0800 (PST) Message-Id: <6bddd9e63b9b41d196db1bb74c60ad66566f66c7.1612535453.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Fri, 05 Feb 2021 14:30:49 +0000 Subject: [PATCH v3 14/17] midx: use chunk-format read API Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee Instead of parsing the table of contents directly, use the chunk-format API methods read_table_of_contents() and pair_chunk(). In particular, we can use the return value of pair_chunk() to generate an error when a required chunk is missing. Signed-off-by: Derrick Stolee --- midx.c | 72 ++++++++++++++----------------------- t/t5319-multi-pack-index.sh | 6 ++-- 2 files changed, 29 insertions(+), 49 deletions(-) diff --git a/midx.c b/midx.c index d9c7411b083b..aee9ed832d52 100644 --- a/midx.c +++ b/midx.c @@ -54,6 +54,19 @@ static char *get_midx_filename(const char *object_dir) return xstrfmt("%s/pack/multi-pack-index", object_dir); } +static int midx_read_oid_fanout(const unsigned char *chunk_start, + size_t chunk_size, void *data) +{ + struct multi_pack_index *m = data; + m->chunk_oid_fanout = (uint32_t *)chunk_start; + + if (chunk_size != 4 * 256) { + error(_("multi-pack-index OID fanout is of the wrong size")); + return 1; + } + return 0; +} + struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local) { struct multi_pack_index *m = NULL; @@ -65,6 +78,7 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local char *midx_name = get_midx_filename(object_dir); uint32_t i; const char *cur_pack_name; + struct chunkfile *cf = NULL; fd = git_open(midx_name); @@ -114,58 +128,23 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local m->num_packs = get_be32(m->data + MIDX_BYTE_NUM_PACKS); - for (i = 0; i < m->num_chunks; i++) { - uint32_t chunk_id = get_be32(m->data + MIDX_HEADER_SIZE + - MIDX_CHUNKLOOKUP_WIDTH * i); - uint64_t chunk_offset = get_be64(m->data + MIDX_HEADER_SIZE + 4 + - MIDX_CHUNKLOOKUP_WIDTH * i); - - if (chunk_offset >= m->data_len) - die(_("invalid chunk offset (too large)")); - - switch (chunk_id) { - case MIDX_CHUNKID_PACKNAMES: - m->chunk_pack_names = m->data + chunk_offset; - break; - - case MIDX_CHUNKID_OIDFANOUT: - m->chunk_oid_fanout = (uint32_t *)(m->data + chunk_offset); - break; - - case MIDX_CHUNKID_OIDLOOKUP: - m->chunk_oid_lookup = m->data + chunk_offset; - break; - - case MIDX_CHUNKID_OBJECTOFFSETS: - m->chunk_object_offsets = m->data + chunk_offset; - break; - - case MIDX_CHUNKID_LARGEOFFSETS: - m->chunk_large_offsets = m->data + chunk_offset; - break; - - case 0: - die(_("terminating multi-pack-index chunk id appears earlier than expected")); - break; - - default: - /* - * Do nothing on unrecognized chunks, allowing future - * extensions to add optional chunks. - */ - break; - } - } + cf = init_chunkfile(NULL); - if (!m->chunk_pack_names) + if (read_table_of_contents(cf, m->data, midx_size, + MIDX_HEADER_SIZE, m->num_chunks)) + goto cleanup_fail; + + if (pair_chunk(cf, MIDX_CHUNKID_PACKNAMES, &m->chunk_pack_names) == CHUNK_NOT_FOUND) die(_("multi-pack-index missing required pack-name chunk")); - if (!m->chunk_oid_fanout) + if (read_chunk(cf, MIDX_CHUNKID_OIDFANOUT, midx_read_oid_fanout, m) == CHUNK_NOT_FOUND) die(_("multi-pack-index missing required OID fanout chunk")); - if (!m->chunk_oid_lookup) + if (pair_chunk(cf, MIDX_CHUNKID_OIDLOOKUP, &m->chunk_oid_lookup) == CHUNK_NOT_FOUND) die(_("multi-pack-index missing required OID lookup chunk")); - if (!m->chunk_object_offsets) + if (pair_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS, &m->chunk_object_offsets) == CHUNK_NOT_FOUND) die(_("multi-pack-index missing required object offsets chunk")); + pair_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, &m->chunk_large_offsets); + m->num_objects = ntohl(m->chunk_oid_fanout[255]); m->pack_names = xcalloc(m->num_packs, sizeof(*m->pack_names)); @@ -191,6 +170,7 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local cleanup_fail: free(m); free(midx_name); + free(cf); if (midx_map) munmap(midx_map, midx_size); if (0 <= fd) diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 297de502a94f..ad4e878b65b8 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -314,12 +314,12 @@ test_expect_success 'verify bad OID version' ' test_expect_success 'verify truncated chunk count' ' corrupt_midx_and_verify $MIDX_BYTE_CHUNK_COUNT "\01" $objdir \ - "missing required" + "final chunk has non-zero id" ' test_expect_success 'verify extended chunk count' ' corrupt_midx_and_verify $MIDX_BYTE_CHUNK_COUNT "\07" $objdir \ - "terminating multi-pack-index chunk id appears earlier than expected" + "terminating chunk id appears earlier than expected" ' test_expect_success 'verify missing required chunk' ' @@ -329,7 +329,7 @@ test_expect_success 'verify missing required chunk' ' test_expect_success 'verify invalid chunk offset' ' corrupt_midx_and_verify $MIDX_BYTE_CHUNK_OFFSET "\01" $objdir \ - "invalid chunk offset (too large)" + "improper chunk offset(s)" ' test_expect_success 'verify packnames out of order' ' From patchwork Fri Feb 5 14:30:50 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12070981 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E1692C433E0 for ; Fri, 5 Feb 2021 21:44:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 95D4264E33 for ; Fri, 5 Feb 2021 21:44:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231430AbhBEVnl (ORCPT ); Fri, 5 Feb 2021 16:43:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49208 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233015AbhBEO76 (ORCPT ); Fri, 5 Feb 2021 09:59:58 -0500 Received: from mail-lj1-x231.google.com (mail-lj1-x231.google.com [IPv6:2a00:1450:4864:20::231]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B5422C061574 for ; Fri, 5 Feb 2021 08:37:01 -0800 (PST) Received: by mail-lj1-x231.google.com with SMTP id f2so8441631ljp.11 for ; Fri, 05 Feb 2021 08:37:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=852UmEtFFsQyc5cH8Yvo3zgvKqaSAOmNSIfxoBcHgA0=; b=YycI25X2fj09ZTuRsnoMy3Mm01yPJJkPv0pHfLGYDNt/VWHUGkvPc69DnqbZwV7Ug7 o/O2UDWn8njuPNQQdOpxEGTZuvRTuYt2fEkPUohK2HppDSc3aKqCxPMjMDTcBaqN6wUz LyMWB50f5SndRMZi6k97x29L6RHAm1mgeG4qa8cwY4LNHU9P/tJyjZUJ94twee6Heomz 4Vr+TT+t/dM8rcX6HsIv0nok0rhjZHTW1q2cSFJQgcmQ34bxfUxyp1PGejmQi5ZtpXHS F6h9l6VyjvAIIJ54X4ScBGgaY7uBzgw0rVYLH11qPqvTNPQgK5fHSif4Q5PXnARm9Atl +QOw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=852UmEtFFsQyc5cH8Yvo3zgvKqaSAOmNSIfxoBcHgA0=; b=J1iveZbxPKUkcdn1YzbKoKquYCxxESVbsaXiFzgGr3uIwaGNxYMj2rkaNfc0eDjMN8 nfyDrBXWQZrFcBkDv9T5tfut0b1iA5tgt68R7WYesqOCp3bl82X4p2wieMDJ2JfmMCYt 7kmS3ewvWKjf5TsRw917YBd65MjH4aZJoVmkn4ujciLhp3RPu+6BCY20fbz2IBY5L3p2 Jali/XxFg6e87GRikjZy8iQcSnKZ63Omn2DW+MpGgv7ToWJR1tFKrFJX8OMOJGkjQb2w BF8nVJ3w/fgrS4KgdTUvTRask+3AYBCP/ng+yiyRfamjFVh2PxlNnv++LT2CG7mpr1Gp bqqQ== X-Gm-Message-State: AOAM532kHxiD/mz+L5Y2jlbs2eelMJ/E40rhABXwrR5EHAAEk9aO1Byu fgQhI/4lsRkEHJv9c2QCRTRXIFP/kck= X-Google-Smtp-Source: ABdhPJwZWebbwB2nEfKGVX0Q3bqi007jvm6gZroKfJ8v6ck4n6Mm92ccLD7dqj8y/bJ7kTWhLWsB8A== X-Received: by 2002:adf:e384:: with SMTP id e4mr5413921wrm.13.1612535470128; Fri, 05 Feb 2021 06:31:10 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id s23sm8815310wmc.35.2021.02.05.06.31.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Feb 2021 06:31:09 -0800 (PST) Message-Id: <3cd97f389f1f59566a85e4496321ee8486ac894c.1612535453.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Fri, 05 Feb 2021 14:30:50 +0000 Subject: [PATCH v3 15/17] midx: use 64-bit multiplication for chunk sizes Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee When calculating the sizes of certain chunks, we should use 64-bit multiplication always. This allows us to properly predict the chunk sizes without risk of overflow. Other possible overflows were discovered by evaluating each multiplication in midx.c and ensuring that at least one side of the operator was of type size_t or off_t. Signed-off-by: Derrick Stolee --- midx.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/midx.c b/midx.c index aee9ed832d52..95648a1f368f 100644 --- a/midx.c +++ b/midx.c @@ -246,7 +246,7 @@ static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos) const unsigned char *offset_data; uint32_t offset32; - offset_data = m->chunk_object_offsets + pos * MIDX_CHUNK_OFFSET_WIDTH; + offset_data = m->chunk_object_offsets + (off_t)pos * MIDX_CHUNK_OFFSET_WIDTH; offset32 = get_be32(offset_data + sizeof(uint32_t)); if (m->chunk_large_offsets && offset32 & MIDX_LARGE_OFFSET_NEEDED) { @@ -262,7 +262,8 @@ static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos) static uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos) { - return get_be32(m->chunk_object_offsets + pos * MIDX_CHUNK_OFFSET_WIDTH); + return get_be32(m->chunk_object_offsets + + (off_t)pos * MIDX_CHUNK_OFFSET_WIDTH); } static int nth_midxed_pack_entry(struct repository *r, @@ -914,15 +915,15 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * add_chunk(cf, MIDX_CHUNKID_OIDFANOUT, MIDX_CHUNK_FANOUT_SIZE, write_midx_oid_fanout); add_chunk(cf, MIDX_CHUNKID_OIDLOOKUP, - ctx.entries_nr * the_hash_algo->rawsz, + (size_t)ctx.entries_nr * the_hash_algo->rawsz, write_midx_oid_lookup); add_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS, - ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH, + (size_t)ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH, write_midx_object_offsets); if (ctx.large_offsets_needed) add_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, - ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH, + (size_t)ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH, write_midx_large_offsets); write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs); From patchwork Fri Feb 5 14:30:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12071077 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3150DC43332 for ; Fri, 5 Feb 2021 22:12:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 09F5864FA8 for ; Fri, 5 Feb 2021 22:12:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232995AbhBEWMg (ORCPT ); Fri, 5 Feb 2021 17:12:36 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45808 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230124AbhBEOmy (ORCPT ); Fri, 5 Feb 2021 09:42:54 -0500 Received: from mail-wm1-x32b.google.com (mail-wm1-x32b.google.com [IPv6:2a00:1450:4864:20::32b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5406BC061797 for ; Fri, 5 Feb 2021 08:20:20 -0800 (PST) Received: by mail-wm1-x32b.google.com with SMTP id m1so6352295wml.2 for ; Fri, 05 Feb 2021 08:20:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=IcnHLWkBF0/Re8l3DLlTqp0+tCWhKAAetGgWdWH9LR0=; b=hDzq7RbxzJrcyNa7yXuhyovnx1sPxmpk5lEo5taAi+RY8Z2h7lLCvyyd7ezZCR/ZEE 5G+xtrcmjVHZjCJsnFrI8zwH+7Kn6agXUoVabBMaSptjBlXoE/lbKO77w9+YTcYDTAlm 8avvKRBzuSuGinJQa0wa3Pop8tFMn7nkTHlkb2WIBXe87htKMSlUnda4u9YfGwMmbAeL fnLbFqWMjSR1EJzXnhfZ2+94HxeN6gFnwYAH77GsrjpHXqPQQ3L5GJRRLruGDlerpQU5 HHicsJBzUUIgbwKkZsyCX27fLNDoTi3s8sJUSivDl+cqBJ5YefFvpOGUlQ7bFWVFgPdm 4Mzw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=IcnHLWkBF0/Re8l3DLlTqp0+tCWhKAAetGgWdWH9LR0=; b=m4Byp1Ukpq3pPyqi742x55BVsisiIxm8KRh6L3MZ3iFUQDwXWETiKz9bp2zKq5LPhw V7iuTkLoCLxZhNKiSV3P90oliVSo2xHDah4v4nSAgZc2OtyGFTmfKUJq9iIFB7Ifctil nrRRw4ES5nNzZCFDSEN1cRgRcoYFejSDCAwIaGGDg/hyi54EkQ9IoRYkxcJraHP4f/rU +afowGkdBMqf2hUUkHdObdADiBqs1eAJOZDWDq/Th5g+WuOUz1wCPUrplzbx5xDGJ85R efxsB+7KJv5z0Ubnlia0arSCmoQETKXbVyGOJ3Wn51bxfAggig94m4r4c8ZM8z+7eFqN tFnA== X-Gm-Message-State: AOAM531a7Pq/Uglrqjm6XZj/aFk1HsTzdw/j/PxqS1gXTPSdXAIgpqzK G1i4eUCTbRKJDqsvt0zbaBhu3GlED18= X-Google-Smtp-Source: ABdhPJxfww7TplHF+vlen6Rh6QcagWs5bgILwtDUGjj3tJHWaTz5drBt8ChjxWxQad232kYriZ0fiQ== X-Received: by 2002:a1c:e402:: with SMTP id b2mr3919587wmh.122.1612535471162; Fri, 05 Feb 2021 06:31:11 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id q19sm9452189wmj.23.2021.02.05.06.31.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Feb 2021 06:31:10 -0800 (PST) Message-Id: In-Reply-To: References: Date: Fri, 05 Feb 2021 14:30:51 +0000 Subject: [PATCH v3 16/17] chunk-format: restore duplicate chunk checks Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee Before refactoring into the chunk-format API, the commit-graph parsing logic included checks for duplicate chunks. It is unlikely that we would desire a chunk-based file format that allows duplicate chunk IDs in the table of contents, so add duplicate checks into read_table_of_contents(). Signed-off-by: Derrick Stolee --- chunk-format.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/chunk-format.c b/chunk-format.c index bc9d4caf7276..e4889d9efcd1 100644 --- a/chunk-format.c +++ b/chunk-format.c @@ -98,6 +98,7 @@ int read_table_of_contents(struct chunkfile *cf, uint64_t toc_offset, int toc_length) { + int i; uint32_t chunk_id; const unsigned char *table_of_contents = mfile + toc_offset; @@ -124,6 +125,14 @@ int read_table_of_contents(struct chunkfile *cf, return -1; } + for (i = 0; i < cf->chunks_nr; i++) { + if (cf->chunks[i].id == chunk_id) { + error(_("duplicate chunk ID %"PRIx32" found"), + chunk_id); + return -1; + } + } + cf->chunks[cf->chunks_nr].id = chunk_id; cf->chunks[cf->chunks_nr].start = mfile + chunk_offset; cf->chunks[cf->chunks_nr].size = next_chunk_offset - chunk_offset; From patchwork Fri Feb 5 14:30:52 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12070417 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5A89EC433E0 for ; Fri, 5 Feb 2021 16:33:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 09BAD64E93 for ; Fri, 5 Feb 2021 16:33:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231377AbhBEOxe (ORCPT ); Fri, 5 Feb 2021 09:53:34 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46706 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232807AbhBEOqq (ORCPT ); Fri, 5 Feb 2021 09:46:46 -0500 Received: from mail-lf1-x135.google.com (mail-lf1-x135.google.com [IPv6:2a00:1450:4864:20::135]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 111E2C061786 for ; Fri, 5 Feb 2021 08:24:29 -0800 (PST) Received: by mail-lf1-x135.google.com with SMTP id f1so10706568lfu.3 for ; Fri, 05 Feb 2021 08:24:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=VY70YxpMWWO9iYCYRFM9nL7VSofDIl8bmBKvouWjznc=; b=BboFbZoqaVlMV3pgrGxs7zsiFQVgk8pRkScgJUIi0u5dhesok22k7ys16eZR8v6qOI FEUgeYyyErk2V8x8+NJA/reHoHAhXq0Y/eANyBrZoB4ygE9E88U+VaJM+nfdFpPrtdw2 hRmcYQp9hLlAhGprnqzM12KjrgFAxRhOdPgGo5W/3lzHI+NYZGXg5NM3Np+/+SIzKlom pv0RHorOOo0/EM2SwqdiBQpp0Sgm7bwmA+IsOFZ0Nw/hA+1zU2avu3MFNoFRxjSYzw8V aTRVLDWbhNn9qdk+7vL9t+L40MhLBa38aGyZzgZxPfvL9vlASbmyz1D6XTRBG5QY3Fl6 UFgA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=VY70YxpMWWO9iYCYRFM9nL7VSofDIl8bmBKvouWjznc=; b=mcD8pCXavSLfjHXLPdLa3NXodl/dWw+EE4Xljgo1dS7cMbc1R8DeBMcMOViAcBuCK2 x5QH3Yz/Jvfe7Sz1VICLnY8xeZlkOTVUD6MwLYF9jAvyYtcsHi/YxvCfJFOmBrxBTNIB /XbrIgTKp0AxpXU1yxu7C0/6RCT1M+kKJC0ZrPeRyI7GMEBZir86NS6LKjmHtA2dEVQ2 6WZ86qH3jbpPNLBrv3V2dIzy6oq9txIeNRC6epgpL1R7uzoURpONy/f8TsAzprQ21b0d qwbLfVZnywmjPSpQ3yyCtbDLBsLQkw3SLSYFCPuPtFS3LNIMnhaD7DqL05F02idU2ANy HlYg== X-Gm-Message-State: AOAM532Vm71o6IihO1q6ZeVTI7n3UK+4sa1sXUvZ1BaZtTPpQa3EUa73 YEzv5WNPFc3W7K0s8c66QzLAO3MAJTE= X-Google-Smtp-Source: ABdhPJwjKz5FAC1DiVv94/jzrSe9s7jCgI6DiD/OJ8x6U8Zv2DwJ8+ODCf7o/gXDzex+nUBhPs7rPg== X-Received: by 2002:a05:6000:108b:: with SMTP id y11mr5234772wrw.379.1612535472209; Fri, 05 Feb 2021 06:31:12 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id q7sm12219440wrx.18.2021.02.05.06.31.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Feb 2021 06:31:11 -0800 (PST) Message-Id: <4c7d751f1e39df30fe9ba3b3e9e0829ddead0340.1612535453.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Fri, 05 Feb 2021 14:30:52 +0000 Subject: [PATCH v3 17/17] chunk-format: add technical docs Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The chunk-based file format is now an API in the code, but we should also take time to document it as a file format. Specifically, it matches the CHUNK LOOKUP sections of the commit-graph and multi-pack-index files, but there are some commonalities that should be grouped in this document. Signed-off-by: Derrick Stolee --- Documentation/technical/chunk-format.txt | 116 ++++++++++++++++++ .../technical/commit-graph-format.txt | 3 + Documentation/technical/pack-format.txt | 3 + 3 files changed, 122 insertions(+) create mode 100644 Documentation/technical/chunk-format.txt diff --git a/Documentation/technical/chunk-format.txt b/Documentation/technical/chunk-format.txt new file mode 100644 index 000000000000..593614fcedab --- /dev/null +++ b/Documentation/technical/chunk-format.txt @@ -0,0 +1,116 @@ +Chunk-based file formats +======================== + +Some file formats in Git use a common concept of "chunks" to describe +sections of the file. This allows structured access to a large file by +scanning a small "table of contents" for the remaining data. This common +format is used by the `commit-graph` and `multi-pack-index` files. See +link:technical/pack-format.html[the `multi-pack-index` format] and +link:technical/commit-graph-format.html[the `commit-graph` format] for +how they use the chunks to describe structured data. + +A chunk-based file format begins with some header information custom to +that format. That header should include enough information to identify +the file type, format version, and number of chunks in the file. From this +information, that file can determine the start of the chunk-based region. + +The chunk-based region starts with a table of contents describing where +each chunk starts and ends. This consists of (C+1) rows of 12 bytes each, +where C is the number of chunks. Consider the following table: + + | Chunk ID (4 bytes) | Chunk Offset (8 bytes) | + |--------------------|------------------------| + | ID[0] | OFFSET[0] | + | ... | ... | + | ID[C] | OFFSET[C] | + | 0x0000 | OFFSET[C+1] | + +Each row consists of a 4-byte chunk identifier (ID) and an 8-byte offset. +Each integer is stored in network-byte order. + +The chunk identifier `ID[i]` is a label for the data stored within this +fill from `OFFSET[i]` (inclusive) to `OFFSET[i+1]` (exclusive). Thus, the +size of the `i`th chunk is equal to the difference between `OFFSET[i+1]` +and `OFFSET[i]`. This requires that the chunk data appears contiguously +in the same order as the table of contents. + +The final entry in the table of contents must be four zero bytes. This +confirms that the table of contents is ending and provides the offset for +the end of the chunk-based data. + +Note: The chunk-based format expects that the file contains _at least_ a +trailing hash after `OFFSET[C+1]`. + +Functions for working with chunk-based file formats are declared in +`chunk-format.h`. Using these methods provide extra checks that assist +developers when creating new file formats. + +Writing chunk-based file formats +-------------------------------- + +To write a chunk-based file format, create a `struct chunkfile` by +calling `init_chunkfile()` and pass a `struct hashfile` pointer. The +caller is responsible for opening the `hashfile` and writing header +information so the file format is identifiable before the chunk-based +format begins. + +Then, call `add_chunk()` for each chunk that is intended for write. This +populates the `chunkfile` with information about the order and size of +each chunk to write. Provide a `chunk_write_fn` function pointer to +perform the write of the chunk data upon request. + +Call `write_chunkfile()` to write the table of contents to the `hashfile` +followed by each of the chunks. This will verify that each chunk wrote +the expected amount of data so the table of contents is correct. + +Finally, call `free_chunkfile()` to clear the `struct chunkfile` data. The +caller is responsible for finalizing the `hashfile` by writing the trailing +hash and closing the file. + +Reading chunk-based file formats +-------------------------------- + +To read a chunk-based file format, the file must be opened as a +memory-mapped region. The chunk-format API expects that the entire file +is mapped as a contiguous memory region. + +Initialize a `struct chunkfile` pointer with `init_chunkfile(NULL)`. + +After reading the header information from the beginning of the file, +including the chunk count, call `read_table_of_contents()` to populate +the `struct chunkfile` with the list of chunks, their offsets, and their +sizes. + +Extract the data information for each chunk using `pair_chunk()` or +`read_chunk()`: + +* `pair_chunk()` assigns a given pointer with the location inside the + memory-mapped file corresponding to that chunk's offset. If the chunk + does not exist, then the pointer is not modified. + +* `read_chunk()` takes a `chunk_read_fn` function pointer and calls it + with the appropriate initial pointer and size information. The function + is not called if the chunk does not exist. Use this method to read chunks + if you need to perform immediate parsing or if you need to execute logic + based on the size of the chunk. + +After calling these methods, call `free_chunkfile()` to clear the +`struct chunkfile` data. This will not close the memory-mapped region. +Callers are expected to own that data for the timeframe the pointers into +the region are needed. + +Examples +-------- + +These file formats use the chunk-format API, and can be used as examples +for future formats: + +* *commit-graph:* see `write_commit_graph_file()` and `parse_commit_graph()` + in `commit-graph.c` for how the chunk-format API is used to write and + parse the commit-graph file format documented in + link:technical/commit-graph-format.html[the commit-graph file format]. + +* *multi-pack-index:* see `write_midx_internal()` and `load_multi_pack_index()` + in `midx.c` for how the chunk-format API is used to write and + parse the multi-pack-index file format documented in + link:technical/pack-format.html[the multi-pack-index file format]. diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt index b6658eff1882..87971c27dd73 100644 --- a/Documentation/technical/commit-graph-format.txt +++ b/Documentation/technical/commit-graph-format.txt @@ -61,6 +61,9 @@ CHUNK LOOKUP: the length using the next chunk position if necessary.) Each chunk ID appears at most once. + The CHUNK LOOKUP matches the table of contents from + link:technical/chunk-format.html[the chunk-based file format]. + The remaining data in the body is described one chunk at a time, and these chunks may be given in any order. Chunks are required unless otherwise specified. diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index f96b2e605f34..2fb1e60d29ec 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -301,6 +301,9 @@ CHUNK LOOKUP: (Chunks are provided in file-order, so you can infer the length using the next chunk position if necessary.) + The CHUNK LOOKUP matches the table of contents from + link:technical/chunk-format.html[the chunk-based file format]. + The remaining data in the body is described one chunk at a time, and these chunks may be given in any order. Chunks are required unless otherwise specified.