From patchwork Fri Sep 8 23:10:18 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. Biederman" X-Patchwork-Id: 13377913 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 026B7EEB570 for ; Fri, 8 Sep 2023 23:31:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345265AbjIHXbC (ORCPT ); Fri, 8 Sep 2023 19:31:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48408 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345272AbjIHXa6 (ORCPT ); Fri, 8 Sep 2023 19:30:58 -0400 Received: from out01.mta.xmission.com (out01.mta.xmission.com [166.70.13.231]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 077A22105 for ; Fri, 8 Sep 2023 16:30:38 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:34378) by out01.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdJ-007Qvl-UW; Fri, 08 Sep 2023 17:11:25 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdI-009u13-Te; Fri, 08 Sep 2023 17:11:25 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. 
Biederman" Date: Fri, 8 Sep 2023 18:10:18 -0500 Message-Id: <20230908231049.2035003-1-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekdI-009u13-Te;;;mid=<20230908231049.2035003-1-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX1/3QLOVx/ujlrJXiqC6Y+iBSe6gzFXACJU= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 01/32] doc hash-file-transition: A map file for mapping between sha1 and sha256 X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The v3 pack index file as documented has a lot of complexity, making it difficult to implement correctly. I worked with brian's preliminary implementation and it took several passes to get the bugs out. The complexity also requires multiple table look-ups to find all of the information that is needed to translate from one kind of oid to another, which can't be good for cache locality. Even worse, coming up with a new index file version requires making changes that have the potential to break anything that uses the index of a pack file. Instead of continuing to deal with the chance of breaking things besides the oid mapping functionality, the additional complexity in the file format, and worrying whether the performance would be reasonable, I stripped the problem down to its fundamental complexity and came up with a file format that is exactly about mapping one kind of oid to another, and only supports two kinds of oids. Signed-off-by: "Eric W. 
Biederman" --- .../technical/hash-function-transition.txt | 40 +++++++++++++++++++ 1 file changed, 40 insertions(+) diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt index ed574810891c..4b937480848a 100644 --- a/Documentation/technical/hash-function-transition.txt +++ b/Documentation/technical/hash-function-transition.txt @@ -209,6 +209,46 @@ format described in linkgit:gitformat-pack[5], just like today. The content that is compressed and stored uses SHA-256 content instead of SHA-1 content. +Per Pack Mapping Table +~~~~~~~~~~~~~~~~~~~~~~ +Pack compat map (.compat) files have the following format: + +HEADER: + 4-byte signature: + The signature is: {'C', 'M', 'A', 'P'} + 1-byte version number: + Git only writes or recognizes version 1. + 1-byte First Object Id Version + We infer the length of object IDs (OIDs) from this value: + 1 => SHA-1 + 2 => SHA-256 + 1-byte Second Object Id Version + We infer the length of object IDs (OIDs) from this value: + 1 => SHA-1 + 2 => SHA-256 + 1-byte reserved (must be zero) + 4-byte number of object names contained in this mapping + 1-byte length in bytes of shortened object names for the first object id. + This is the shortest possible length needed to make the + first object names unambiguous. + 1-byte reserved (must be zero) + 1-byte length in bytes of shortened object names for the second object id. + This is the shortest possible length needed to make the + second object names unambiguous. + 1-byte reserved (must be zero) + +OBJECT NAME TABLES: + [Object name raw length + 4]*Number of object names + This table is sorted by object name + Each entry in the table is formatted as: + [20 or 32 byte] Object name + 4-byte index into the other object name table + +TRAILER: + checksum of the corresponding packfile, and + + checksum of all of the above. 
+ Pack index ~~~~~~~~~~ Pack index (.idx) files use a new v3 format that supports multiple From patchwork Fri Sep 8 23:10:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. Biederman" X-Patchwork-Id: 13377895 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 91CACEEB570 for ; Fri, 8 Sep 2023 23:11:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245331AbjIHXLg (ORCPT ); Fri, 8 Sep 2023 19:11:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36254 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242771AbjIHXLd (ORCPT ); Fri, 8 Sep 2023 19:11:33 -0400 Received: from out02.mta.xmission.com (out02.mta.xmission.com [166.70.13.232]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 138DE133 for ; Fri, 8 Sep 2023 16:11:29 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:39270) by out02.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdM-006MmZ-1J; Fri, 08 Sep 2023 17:11:28 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdL-009u13-0g; Fri, 08 Sep 2023 17:11:27 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. 
Biederman" Date: Fri, 8 Sep 2023 18:10:19 -0500 Message-Id: <20230908231049.2035003-2-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekdL-009u13-0g;;;mid=<20230908231049.2035003-2-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX1+LTVjOby0KHTZEKTR3lI9ZSCXb/K5ndBg= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 02/32] doc hash-function-transition: Replace compatObjectFormat with compatMap X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org It makes a lot of sense for the hash algorithm that determines how all of the objects in the repository are named to be an extension, so that versions of git that don't know about it won't even try to use the repository. For implementing the compatibility maps that really is not the case. A version of git that does not recognize the option won't care and will continue to use the repository as is. The mapping functionality simply won't be present. Similarly, if not all of the objects are mapped, this could cause some practical difficulties, but it will not cause anything to perform the wrong actions on the repository. Some commands just won't work. In the worst case all that needs to happen is for the compatibility maps to be rebuilt. So let's use an option that does not force unnecessary breakage of existing tools. Signed-off-by: "Eric W. 
Biederman" --- .../technical/hash-function-transition.txt | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt index 4b937480848a..10572c5794f9 100644 --- a/Documentation/technical/hash-function-transition.txt +++ b/Documentation/technical/hash-function-transition.txt @@ -148,14 +148,14 @@ Detailed Design Repository format extension ~~~~~~~~~~~~~~~~~~~~~~~~~~~ A SHA-256 repository uses repository format version `1` (see -Documentation/technical/repository-version.txt) with extensions -`objectFormat` and `compatObjectFormat`: +Documentation/technical/repository-version.txt) with the extension +`objectFormat`, and an optional core.compatMap configuration. [core] repositoryFormatVersion = 1 + compatMap = on [extensions] objectFormat = sha256 - compatObjectFormat = sha1 The combination of setting `core.repositoryFormatVersion=1` and populating `extensions.*` ensures that all versions of Git later than @@ -682,7 +682,7 @@ Some initial steps can be implemented independently of one another: - adding support for the PSRC field and safer object pruning The first user-visible change is the introduction of the objectFormat -extension (without compatObjectFormat). This requires: +extension. This requires: - teaching fsck about this mode of operation - using the hash function API (vtable) when computing object names @@ -690,7 +690,7 @@ extension (without compatObjectFormat). This requires: - rejecting attempts to fetch from or push to an incompatible repository -Next comes introduction of compatObjectFormat: +Next comes introduction of compatMap: - implementing the loose-object-idx - translating object names between object formats @@ -724,9 +724,9 @@ Over time projects would encourage their users to adopt the "early transition" and then "late transition" modes to take advantage of the new, more futureproof SHA-256 object names. 
-When objectFormat and compatObjectFormat are both set, commands -generating signatures would generate both SHA-1 and SHA-256 signatures -by default to support both new and old users. +When objectFormat and compatMap are both set, commands generating +signatures would generate both SHA-1 and SHA-256 signatures by default +to support both new and old users. In projects using SHA-256 heavily, users could be encouraged to adopt the "post-transition" mode to avoid accidentally making implicit use From patchwork Fri Sep 8 23:10:20 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. Biederman" X-Patchwork-Id: 13377911 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4C349EEB571 for ; Fri, 8 Sep 2023 23:30:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243950AbjIHXak (ORCPT ); Fri, 8 Sep 2023 19:30:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59222 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233421AbjIHXag (ORCPT ); Fri, 8 Sep 2023 19:30:36 -0400 Received: from out03.mta.xmission.com (out03.mta.xmission.com [166.70.13.233]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 66838E46 for ; Fri, 8 Sep 2023 16:30:31 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:36982) by out03.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdO-00FHH8-Ds; Fri, 08 Sep 2023 17:11:30 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdN-009u13-31; Fri, 08 Sep 2023 
17:11:30 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. Biederman" Date: Fri, 8 Sep 2023 18:10:20 -0500 Message-Id: <20230908231049.2035003-3-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekdN-009u13-31;;;mid=<20230908231049.2035003-3-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX18cfkxQQNfPQVEKp6L0RzH8tHxUxqYf3xs= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 03/32] object-file-convert: Stubs for converting from one object format to another X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Two basic functions are provided: - convert_object_file Takes an object file, its type and hash algorithm, and converts it into the equivalent object file that would have been generated with hash algorithm "to". For blob objects there is no conversion to be done and it is an error to use this function on them. For commit, tree and tag objects the embedded oids are replaced by the oids that the referenced objects would have if they had been generated with the hash "to". - repo_oid_to_algop which takes an oid that refers to an object file and returns the oid of the equivalent object file generated with the target hash algorithm. One core function is modified: - oid_object_info_extended is updated to detect an oid encoding that does not match the current repository, use repo_oid_to_algop to find the corresponding oid in the current repository and to return the data for the oid. 
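As a rough illustration of the behaviour described above, the following standalone C sketch models the stub mapping function and the dispatch check. The names sketch_oid, sketch_oid_to_algo and sketch_needs_conversion are hypothetical stand-ins and are not part of this patch; the real code works on struct object_id and struct git_hash_algo.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for git's hash ids and struct object_id. */
enum sketch_algo { SKETCH_UNKNOWN = 0, SKETCH_SHA1 = 1, SKETCH_SHA256 = 2 };

struct sketch_oid {
	unsigned char hash[32];
	int algo;	/* SKETCH_UNKNOWN means "the repository's algorithm" */
};

/*
 * Models the stub behaviour of repo_oid_to_algop in this patch:
 * succeed (0) only when no translation is needed, fail (-1) otherwise,
 * because no mapping table exists yet.
 */
static int sketch_oid_to_algo(int repo_algo, const struct sketch_oid *src,
			      int to, struct sketch_oid *dest)
{
	/* An unset source algorithm falls back to the repository's. */
	int from = src->algo ? src->algo : repo_algo;

	assert(dest != NULL);
	if (from != to)
		return -1;
	if (src != dest)
		memcpy(dest, src, sizeof(*dest));
	return 0;
}

/*
 * Models the check added to oid_object_info_extended: returns 1 when the
 * lookup has to go through the conversion path, 0 when the oid can be
 * looked up directly.
 */
static int sketch_needs_conversion(int repo_algo, const struct sketch_oid *oid)
{
	return oid->algo && oid->algo != repo_algo;
}
```

With a stub like this, any lookup that needs a real translation fails until a mapping table is implemented; only the same-algorithm case succeeds.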
The pair of files object-file-convert.c and object-file-convert.h is introduced to hold as much of this logic as possible, to keep this conversion logic cleanly separated from everything else, and in the hope that someday the code will be clean enough that git can support compiling out support for sha1 and the various conversion functions. Signed-off-by: "Eric W. Biederman" --- Makefile | 1 + object-file-convert.c | 55 ++++++++++++++++++++++++++++ object-file-convert.h | 24 +++++++++++++ object-file.c | 83 +++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 163 insertions(+) create mode 100644 object-file-convert.c create mode 100644 object-file-convert.h diff --git a/Makefile b/Makefile index 577630936535..f7e824f25cda 100644 --- a/Makefile +++ b/Makefile @@ -1073,6 +1073,7 @@ LIB_OBJS += notes-cache.o LIB_OBJS += notes-merge.o LIB_OBJS += notes-utils.o LIB_OBJS += notes.o +LIB_OBJS += object-file-convert.o LIB_OBJS += object-file.o LIB_OBJS += object-name.o LIB_OBJS += object.o diff --git a/object-file-convert.c b/object-file-convert.c new file mode 100644 index 000000000000..9f4d5b354f5f --- /dev/null +++ b/object-file-convert.c @@ -0,0 +1,55 @@ +#include "git-compat-util.h" +#include "gettext.h" +#include "strbuf.h" +#include "repository.h" +#include "hash-ll.h" +#include "object.h" +#include "object-file-convert.h" + +int repo_oid_to_algop(struct repository *repo, const struct object_id *src, + const struct git_hash_algo *to, struct object_id *dest) +{ + /* + * If the source algorithm is not set, then we're using the + * default hash algorithm for that object. + */ + const struct git_hash_algo *from = + src->algo ? 
&hash_algos[src->algo] : repo->hash_algo; + + if (from == to) { + if (src != dest) + oidcpy(dest, src); + return 0; + } + return -1; +} + +int convert_object_file(struct strbuf *outbuf, + const struct git_hash_algo *from, + const struct git_hash_algo *to, + const void *buf, size_t len, + enum object_type type, + int gentle) +{ + int ret; + + /* Don't call this function when no conversion is necessary */ + if ((from == to) || (type == OBJ_BLOB)) + die("Refusing noop object file conversion"); + + switch (type) { + case OBJ_COMMIT: + case OBJ_TREE: + case OBJ_TAG: + default: + /* Not implemented yet, so fail. */ + ret = -1; + break; + } + if (!ret) + return 0; + if (gentle) + return ret; + die(_("Failed to convert object from %s to %s"), + from->name, to->name); +} diff --git a/object-file-convert.h b/object-file-convert.h new file mode 100644 index 000000000000..a4f802aa8eea --- /dev/null +++ b/object-file-convert.h @@ -0,0 +1,24 @@ +#ifndef OBJECT_CONVERT_H +#define OBJECT_CONVERT_H + +struct repository; +struct object_id; +struct git_hash_algo; +struct strbuf; +#include "object.h" + +int repo_oid_to_algop(struct repository *repo, const struct object_id *src, + const struct git_hash_algo *to, struct object_id *dest); + +/* + * Convert an object file from one hash algorithm to another algorithm. + * Return -1 on failure, 0 on success. 
+ */ +int convert_object_file(struct strbuf *outbuf, + const struct git_hash_algo *from, + const struct git_hash_algo *to, + const void *buf, size_t len, + enum object_type type, + int gentle); + +#endif /* OBJECT_CONVERT_H */ diff --git a/object-file.c b/object-file.c index 7dc0c4bfbba8..7f24f19b8a68 100644 --- a/object-file.c +++ b/object-file.c @@ -36,6 +36,7 @@ #include "quote.h" #include "packfile.h" #include "object-file.h" +#include "object-file-convert.h" #include "object-store.h" #include "oidtree.h" #include "path.h" @@ -1660,10 +1661,92 @@ static int do_oid_object_info_extended(struct repository *r, return 0; } +static int oid_object_info_convert(struct repository *r, + const struct object_id *input_oid, + struct object_info *input_oi, unsigned flags) +{ + const struct git_hash_algo *input_algo = &hash_algos[input_oid->algo]; + int do_die = flags & OBJECT_INFO_DIE_IF_CORRUPT; + struct strbuf type_name = STRBUF_INIT; + struct object_info oi = *input_oi; + struct object_id oid, delta_base_oid; + unsigned long size; + void *content; + int ret; + + if (repo_oid_to_algop(r, input_oid, the_hash_algo, &oid)) { + if (do_die) + die(_("missing mapping of %s to %s"), + oid_to_hex(input_oid), the_hash_algo->name); + return -1; + } + + /* Do we need to convert the delta base oid? */ + if (oi.delta_base_oid) + oi.delta_base_oid = &delta_base_oid; + + /* Do we need attributes that differ when converted? 
*/ + if (oi.sizep || oi.contentp) { + oi.contentp = &content; + oi.sizep = &size; + oi.type_name = &type_name; + } + + ret = oid_object_info_extended(r, &oid, &oi, flags); + if (ret) + return -1; + + if (oi.contentp == &content) { + struct strbuf outbuf = STRBUF_INIT; + enum object_type type; + + type = type_from_string_gently(type_name.buf, type_name.len, + !do_die); + if (type == -1) + return -1; + if (type != OBJ_BLOB) { + ret = convert_object_file(&outbuf, + the_hash_algo, input_algo, + content, size, type, !do_die); + if (ret == -1) + return -1; + size = outbuf.len; + content = strbuf_detach(&outbuf, NULL); + } + if (input_oi->sizep) + *input_oi->sizep = size; + if (input_oi->contentp) + *input_oi->contentp = content; + else + free(content); + if (input_oi->type_name) + *input_oi->type_name = type_name; + else + strbuf_release(&type_name); + } + if (oi.delta_base_oid == &delta_base_oid) { + if (repo_oid_to_algop(r, &delta_base_oid, input_algo, + input_oi->delta_base_oid)) { + if (do_die) + die(_("missing mapping of %s to %s"), + oid_to_hex(&delta_base_oid), + input_algo->name); + return -1; + } + } + input_oi->whence = oi.whence; + input_oi->u = oi.u; + return ret; +} + int oid_object_info_extended(struct repository *r, const struct object_id *oid, struct object_info *oi, unsigned flags) { int ret; + + if (oid->algo && (hash_algo_by_ptr(r->hash_algo) != oid->algo)) + return oid_object_info_convert(r, oid, oi, flags); + obj_read_lock(); ret = do_oid_object_info_extended(r, oid, oi, flags); obj_read_unlock(); From patchwork Fri Sep 8 23:10:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. 
Biederman" X-Patchwork-Id: 13377896 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B543AEEB56E for ; Fri, 8 Sep 2023 23:11:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240190AbjIHXLq (ORCPT ); Fri, 8 Sep 2023 19:11:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36400 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244403AbjIHXLk (ORCPT ); Fri, 8 Sep 2023 19:11:40 -0400 Received: from out02.mta.xmission.com (out02.mta.xmission.com [166.70.13.232]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 04B6F1705 for ; Fri, 8 Sep 2023 16:11:34 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:39318) by out02.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdR-006Mmt-5F; Fri, 08 Sep 2023 17:11:33 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdP-009u13-GA; Fri, 08 Sep 2023 17:11:32 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. 
Biederman" Date: Fri, 8 Sep 2023 18:10:21 -0500 Message-Id: <20230908231049.2035003-4-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekdP-009u13-GA;;;mid=<20230908231049.2035003-4-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX1+noybTYetKZVPWNmFnWPMZwPRygUOVSGo= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 04/32] object-name: Initial support for ^{sha1} and ^{sha256} X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Documentation/technical/hash-function-transition.txt suggests supporting references like abac87a^{sha1} and f787cac^{sha256}. This change goes a step further and supports a short oid in any algorithm, and just ensures enough of the oid is present to disambiguate between all possible oids in any algorithm. Support for suffixes of ^{sha1} and ^{sha256} is implemented as it is easy, and can be handy for testing. To support this mode of operation two flags are added: GET_OID_SHA1 and GET_OID_SHA256. By default, when an oid is specified in an algorithm that does not match the algorithm of the repository, the oid is translated to the oid that matches the hash algorithm of the repository. This ensures oids that don't match the repository hash algorithm can be used everywhere oids can currently be used. A new flag, GET_OID_UNTRANSLATED, is added that suppresses the translation of an oid into the repository's hash algorithm. This is useful for testing and raw tools like git cat-file. Signed-off-by: "Eric W. 
Biederman" --- hash-ll.h | 3 +++ object-name.c | 59 +++++++++++++++++++++++++++++++++++++++++++++------ 2 files changed, 55 insertions(+), 7 deletions(-) diff --git a/hash-ll.h b/hash-ll.h index 10d84cc20888..2a4f72d70c3f 100644 --- a/hash-ll.h +++ b/hash-ll.h @@ -143,8 +143,11 @@ struct object_id { #define GET_OID_BLOB 040 #define GET_OID_FOLLOW_SYMLINKS 0100 #define GET_OID_RECORD_PATH 0200 +#define GET_OID_SHA1 01000 +#define GET_OID_SHA256 02000 #define GET_OID_ONLY_TO_DIE 04000 #define GET_OID_REQUIRE_PATH 010000 +#define GET_OID_UNTRANSLATED 020000 #define GET_OID_DISAMBIGUATORS \ (GET_OID_COMMIT | GET_OID_COMMITTISH | \ diff --git a/object-name.c b/object-name.c index 0bfa29dbbfe9..ebe87f5c4fdd 100644 --- a/object-name.c +++ b/object-name.c @@ -25,6 +25,7 @@ #include "midx.h" #include "commit-reach.h" #include "date.h" +#include "object-file-convert.h" static int get_oid_oneline(struct repository *r, const char *, struct object_id *, struct commit_list *); @@ -32,6 +33,7 @@ typedef int (*disambiguate_hint_fn)(struct repository *, const struct object_id struct disambiguate_state { int len; /* length of prefix in hex chars */ + int algo; char hex_pfx[GIT_MAX_HEXSZ + 1]; struct object_id bin_pfx; @@ -49,6 +51,10 @@ struct disambiguate_state { static void update_candidates(struct disambiguate_state *ds, const struct object_id *current) { + /* Is the oid encoded in the desired algo? */ + if (ds->algo && (current->algo != ds->algo)) + return; + if (ds->always_call_fn) { ds->ambiguous = ds->fn(ds->repo, current, ds->cb_data) ? 1 : 0; return; @@ -134,6 +140,8 @@ static void unique_in_midx(struct multi_pack_index *m, { uint32_t num, i, first = 0; const struct object_id *current = NULL; + int len = ds->len > ds->repo->hash_algo->hexsz ? 
+ ds->repo->hash_algo->hexsz : ds->len; num = m->num_objects; if (!num) @@ -149,7 +157,7 @@ static void unique_in_midx(struct multi_pack_index *m, for (i = first; i < num && !ds->ambiguous; i++) { struct object_id oid; current = nth_midxed_object_oid(&oid, m, i); - if (!match_hash(ds->len, ds->bin_pfx.hash, current->hash)) + if (!match_hash(len, ds->bin_pfx.hash, current->hash)) break; update_candidates(ds, current); } @@ -159,6 +167,8 @@ static void unique_in_pack(struct packed_git *p, struct disambiguate_state *ds) { uint32_t num, i, first = 0; + int len = ds->len > ds->repo->hash_algo->hexsz ? + ds->repo->hash_algo->hexsz : ds->len; if (p->multi_pack_index) return; @@ -177,7 +187,7 @@ static void unique_in_pack(struct packed_git *p, for (i = first; i < num && !ds->ambiguous; i++) { struct object_id oid; nth_packed_object_id(&oid, p, i); - if (!match_hash(ds->len, ds->bin_pfx.hash, oid.hash)) + if (!match_hash(len, ds->bin_pfx.hash, oid.hash)) break; update_candidates(ds, &oid); } @@ -188,6 +198,10 @@ static void find_short_packed_object(struct disambiguate_state *ds) struct multi_pack_index *m; struct packed_git *p; + /* Skip, unless oids from the repository algorithm are wanted */ + if (ds->algo && (&hash_algos[ds->algo] != ds->repo->hash_algo)) + return; + for (m = get_multi_pack_index(ds->repo); m && !ds->ambiguous; m = m->next) unique_in_midx(m, ds); @@ -330,7 +344,7 @@ static int init_object_disambiguation(struct repository *r, { int i; - if (len < MINIMUM_ABBREV || len > the_hash_algo->hexsz) + if (len < MINIMUM_ABBREV || len > GIT_MAX_HEXSZ) return -1; memset(ds, 0, sizeof(*ds)); @@ -357,6 +371,7 @@ static int init_object_disambiguation(struct repository *r, ds->len = len; ds->hex_pfx[len] = '\0'; ds->repo = r; + ds->algo = GIT_HASH_UNKNOWN; prepare_alt_odb(r); return 0; } @@ -491,9 +506,10 @@ static int repo_collect_ambiguous(struct repository *r UNUSED, return collect_ambiguous(oid, data); } -static int sort_ambiguous(const void *a, const void *b, void 
*ctx) +static int sort_ambiguous(const void *va, const void *vb, void *ctx) { struct repository *sort_ambiguous_repo = ctx; + const struct object_id *a = va, *b = vb; int a_type = oid_object_info(sort_ambiguous_repo, a, NULL); int b_type = oid_object_info(sort_ambiguous_repo, b, NULL); int a_type_sort; @@ -503,8 +519,13 @@ static int sort_ambiguous(const void *a, const void *b, void *ctx) * Sorts by hash within the same object type, just as * oid_array_for_each_unique() would do. */ - if (a_type == b_type) - return oidcmp(a, b); + if (a_type == b_type) { + /* Is the hash algorithm the same? */ + if (a->algo == b->algo) + return oidcmp(a, b); + else + return a->algo < b->algo ? -1 : 1; + } /* * Between object types show tags, then commits, and finally @@ -553,6 +574,11 @@ static enum get_oid_result get_short_oid(struct repository *r, else ds.fn = default_disambiguate_hint; + if (flags & GET_OID_SHA1) + ds.algo = GIT_HASH_SHA1; + else if (flags & GET_OID_SHA256) + ds.algo = GIT_HASH_SHA256; + find_short_object_filename(&ds); find_short_packed_object(&ds); status = finish_object_disambiguation(&ds, oid); @@ -606,6 +632,15 @@ static enum get_oid_result get_short_oid(struct repository *r, strbuf_release(&out.sb); } + /* Ensure oid->algo is set */ + if (oid->algo == GIT_HASH_UNKNOWN) + oid->algo = hash_algo_by_ptr(r->hash_algo); + + /* Return oids using the repository's hash algorithm */ + if ((&hash_algos[oid->algo] != r->hash_algo) && + !(flags & GET_OID_UNTRANSLATED)) + repo_oid_to_algop(r, oid, r->hash_algo, oid); + return status; } @@ -787,10 +822,12 @@ void strbuf_add_unique_abbrev(struct strbuf *sb, const struct object_id *oid, int repo_find_unique_abbrev_r(struct repository *r, char *hex, const struct object_id *oid, int len) { + const struct git_hash_algo *algo = + oid->algo ? 
&hash_algos[oid->algo] : r->hash_algo; struct disambiguate_state ds; struct min_abbrev_data mad; struct object_id oid_ret; - const unsigned hexsz = r->hash_algo->hexsz; + const unsigned hexsz = algo->hexsz; if (len < 0) { unsigned long count = repo_approximate_object_count(r); @@ -1158,6 +1195,14 @@ static int peel_onion(struct repository *r, const char *name, int len, return -1; sp++; /* beginning of type name, or closing brace for empty */ + + if (starts_with(sp, "sha1}")) + return get_short_oid(r, name, len - 7, oid, + lookup_flags | GET_OID_SHA1); + else if (starts_with(sp, "sha256}")) + return get_short_oid(r, name, len - 9, oid, + lookup_flags | GET_OID_SHA256); + if (starts_with(sp, "commit}")) expected_type = OBJ_COMMIT; else if (starts_with(sp, "tag}")) From patchwork Fri Sep 8 23:10:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. Biederman" X-Patchwork-Id: 13377929 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CB2B9EEB565 for ; Sat, 9 Sep 2023 00:19:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345363AbjIIATT (ORCPT ); Fri, 8 Sep 2023 20:19:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48726 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243171AbjIIATT (ORCPT ); Fri, 8 Sep 2023 20:19:19 -0400 Received: from out03.mta.xmission.com (out03.mta.xmission.com [166.70.13.233]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0E48A133 for ; Fri, 8 Sep 2023 17:19:15 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:37030) by out03.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdT-00FHHN-9m; Fri, 08 Sep 2023 
17:11:35 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdS-009u13-7Y; Fri, 08 Sep 2023 17:11:34 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. Biederman" Date: Fri, 8 Sep 2023 18:10:22 -0500 Message-Id: <20230908231049.2035003-5-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekdS-009u13-7Y;;;mid=<20230908231049.2035003-5-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX196bS0tNajwF/IzzBF9S/OlMhU3CpFLmUA= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 05/32] repository: add a compatibility hash algorithm X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org We currently have support for using a full stage 4 SHA-256 implementation. However, we'd like to support interoperability with SHA-1 repositories as well. The transition plan anticipates a compatibility hash algorithm configuration option that we can use to implement support for this. Let's add an element to the repository structure that indicates the compatibility hash algorithm so we can use it when we need to consider interoperability between algorithms. For now, we always set it to NULL, but we'll initialize it differently in the future. Inspired-by: brian m. carlson Signed-off-by: Eric W. 
Biederman --- repository.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/repository.h b/repository.h index 5f18486f6465..6c4130f0c36e 100644 --- a/repository.h +++ b/repository.h @@ -160,6 +160,9 @@ struct repository { /* Repository's current hash algorithm, as serialized on disk. */ const struct git_hash_algo *hash_algo; + /* Repository's compatibility hash algorithm. */ + const struct git_hash_algo *compat_hash_algo; + /* A unique-id for tracing purposes. */ int trace2_repo_id; From patchwork Fri Sep 8 23:10:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. Biederman" X-Patchwork-Id: 13377897 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 480E3EEB571 for ; Fri, 8 Sep 2023 23:11:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245481AbjIHXLr (ORCPT ); Fri, 8 Sep 2023 19:11:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36520 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344737AbjIHXLp (ORCPT ); Fri, 8 Sep 2023 19:11:45 -0400 Received: from out02.mta.xmission.com (out02.mta.xmission.com [166.70.13.232]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 538881FE5 for ; Fri, 8 Sep 2023 16:11:38 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:39356) by out02.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdV-006MnC-Gf; Fri, 08 Sep 2023 17:11:37 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdU-009u13-CB; Fri, 08 Sep 2023 17:11:37 
-0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. Biederman" Date: Fri, 8 Sep 2023 18:10:23 -0500 Message-Id: <20230908231049.2035003-6-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekdU-009u13-CB;;;mid=<20230908231049.2035003-6-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX1+LSpB3Tq+4y9fRsFAzEsFiNdFIvEiqSJk= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 06/32] repository: Implement core.compatMap X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: "brian m. carlson" Add a configuration option to enable updating and reading from compatibility hash maps when git accesses the repository. Add a helper function repo_enable_compat_map that, when passed false, disables the compatibility hash algorithm and, when passed true, computes the compatibility hash algorithm and sets "repo->compat_hash_algo". For now the option is limited to being specified in ".git/config". Perhaps in the future we can allow specifying it in ".gitconfig" as well. Signed-off-by: "Eric W. Biederman" --- Documentation/config/core.txt | 6 ++++++ repository.c | 11 +++++++++++ repository.h | 1 + setup.c | 5 +++++ setup.h | 1 + 5 files changed, 24 insertions(+) diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt index dfbdaf00b8bc..a9eb2006cc32 100644 --- a/Documentation/config/core.txt +++ b/Documentation/config/core.txt @@ -736,3 +736,9 @@ core.abbrev:: If set to "no", no abbreviation is made and the object names are shown in their full length. The minimum length is 4.
+ +core.compatMap:: + Enables the use of a compat map to record the hash in the + other object format. This allows repositories in different + object formats to interoperate. It allows looking up old oids + in a repository that has been converted from sha1 to sha256. diff --git a/repository.c b/repository.c index a7679ceeaa45..de620d82bfc6 100644 --- a/repository.c +++ b/repository.c @@ -104,6 +104,16 @@ void repo_set_hash_algo(struct repository *repo, int hash_algo) repo->hash_algo = &hash_algos[hash_algo]; } +void repo_enable_compat_map(struct repository *repo, int enable_compat) +{ + const struct git_hash_algo *other_algo = + &hash_algos[(hash_algo_by_ptr(repo->hash_algo) == GIT_HASH_SHA1) ? + GIT_HASH_SHA256 : + GIT_HASH_SHA1]; + + repo->compat_hash_algo = enable_compat ? other_algo : NULL; +} + /* * Attempt to resolve and set the provided 'gitdir' for repository 'repo'. * Return 0 upon success and a non-zero value upon failure. @@ -184,6 +194,7 @@ int repo_init(struct repository *repo, goto error; repo_set_hash_algo(repo, format.hash_algo); + repo_enable_compat_map(repo, format.use_compat_map); repo->repository_format_worktree_config = format.worktree_config; /* take ownership of format.partial_clone */ diff --git a/repository.h b/repository.h index 6c4130f0c36e..03cadf6d9a98 100644 --- a/repository.h +++ b/repository.h @@ -202,6 +202,7 @@ void repo_set_gitdir(struct repository *repo, const char *root, const struct set_gitdir_args *extra_args); void repo_set_worktree(struct repository *repo, const char *path); void repo_set_hash_algo(struct repository *repo, int algo); +void repo_enable_compat_map(struct repository *repo, int enable_compat); void initialize_the_repository(void); RESULT_MUST_BE_USED int repo_init(struct repository *r, const char *gitdir, const char *worktree); diff --git a/setup.c b/setup.c index 18927a847b86..b4d32bd820f1 100644 --- a/setup.c +++ b/setup.c @@ -623,6 +623,8 @@ static int check_repo_format(const char *var, const char *value,
return 0; } } + else if (strcmp(var, "core.compatmap") == 0) + data->use_compat_map = git_config_bool(var, value); return read_worktree_config(var, value, ctx, vdata); } @@ -1564,8 +1566,10 @@ const char *setup_git_directory_gently(int *nongit_ok) } if (startup_info->have_repository) { repo_set_hash_algo(the_repository, repo_fmt.hash_algo); + repo_enable_compat_map(the_repository, repo_fmt.use_compat_map); the_repository->repository_format_worktree_config = repo_fmt.worktree_config; + /* take ownership of repo_fmt.partial_clone */ the_repository->repository_format_partial_clone = repo_fmt.partial_clone; @@ -1657,6 +1661,7 @@ void check_repository_format(struct repository_format *fmt) check_repository_format_gently(get_git_dir(), fmt, NULL); startup_info->have_repository = 1; repo_set_hash_algo(the_repository, fmt->hash_algo); + repo_enable_compat_map(the_repository, fmt->use_compat_map); the_repository->repository_format_worktree_config = fmt->worktree_config; the_repository->repository_format_partial_clone = diff --git a/setup.h b/setup.h index 58fd2605dd26..afa05b2b64f3 100644 --- a/setup.h +++ b/setup.h @@ -86,6 +86,7 @@ struct repository_format { int worktree_config; int is_bare; int hash_algo; + int use_compat_map; int sparse_index; char *work_tree; struct string_list unknown_extensions; From patchwork Fri Sep 8 23:10:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. 
Biederman" X-Patchwork-Id: 13377898 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 00452EEB570 for ; Fri, 8 Sep 2023 23:11:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245707AbjIHXLz (ORCPT ); Fri, 8 Sep 2023 19:11:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36350 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245588AbjIHXLx (ORCPT ); Fri, 8 Sep 2023 19:11:53 -0400 Received: from out02.mta.xmission.com (out02.mta.xmission.com [166.70.13.232]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CE6AA1FFD for ; Fri, 8 Sep 2023 16:11:40 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:39378) by out02.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdY-006MnX-0B; Fri, 08 Sep 2023 17:11:40 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdW-009u13-J3; Fri, 08 Sep 2023 17:11:39 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W . 
Biederman" Date: Fri, 8 Sep 2023 18:10:24 -0500 Message-Id: <20230908231049.2035003-7-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekdW-009u13-J3;;;mid=<20230908231049.2035003-7-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX1/UrenUi31HcNm9AzhjB6F0oB4iJg56YJ4= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 07/32] loose: add a mapping between SHA-1 and SHA-256 for loose objects X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: "brian m. carlson" As part of the transition plan, we'd like to add a file in the .git directory that maps loose objects between SHA-1 and SHA-256. Let's implement the specification in the transition plan and store this data on a per-repository basis in struct repository. **** - split repo_object_map between repo_loose_object_map_oid and repo_oid_to_algop. - Verified the loose_map is set in repo_loose_object_map_oid -- EWB Signed-off-by: brian m. carlson Signed-off-by: Eric W. 
Biederman --- Makefile | 1 + loose.c | 243 ++++++++++++++++++++++++++++++++++++++++++ loose.h | 20 ++++ object-file-convert.c | 14 ++- object-store-ll.h | 3 + object.c | 2 + repository.c | 6 ++ 7 files changed, 288 insertions(+), 1 deletion(-) create mode 100644 loose.c create mode 100644 loose.h diff --git a/Makefile b/Makefile index f7e824f25cda..3c18664def9a 100644 --- a/Makefile +++ b/Makefile @@ -1053,6 +1053,7 @@ LIB_OBJS += list-objects-filter.o LIB_OBJS += list-objects.o LIB_OBJS += lockfile.o LIB_OBJS += log-tree.o +LIB_OBJS += loose.o LIB_OBJS += ls-refs.o LIB_OBJS += mailinfo.o LIB_OBJS += mailmap.o diff --git a/loose.c b/loose.c new file mode 100644 index 000000000000..8ddb7112a541 --- /dev/null +++ b/loose.c @@ -0,0 +1,243 @@ +#include "git-compat-util.h" +#include "hash.h" +#include "path.h" +#include "object-store.h" +#include "hex.h" +#include "wrapper.h" +#include "gettext.h" +#include "loose.h" +#include "lockfile.h" + +static const char *loose_object_header = "# loose-object-idx\n"; + +static inline int should_use_loose_object_map(struct repository *repo) +{ + return repo->compat_hash_algo && repo->gitdir; +} + +void loose_object_map_init(struct loose_object_map **map) +{ + struct loose_object_map *m; + m = xmalloc(sizeof(**map)); + m->to_compat = kh_init_oid_map(); + m->to_storage = kh_init_oid_map(); + *map = m; +} + +static int insert_oid_pair(kh_oid_map_t *map, const struct object_id *key, const struct object_id *value) +{ + khiter_t pos; + int ret; + struct object_id *stored; + + pos = kh_put_oid_map(map, *key, &ret); + + /* This item already exists in the map. 
*/ + if (ret == 0) + return 0; + + stored = xmalloc(sizeof(*stored)); + oidcpy(stored, value); + kh_value(map, pos) = stored; + return 1; +} + +static int load_one_loose_object_map(struct repository *repo, struct object_directory *dir) +{ + struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT; + FILE *fp; + + if (!dir->loose_map) + loose_object_map_init(&dir->loose_map); + + insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree); + insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_tree, repo->hash_algo->empty_tree); + + insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_blob, repo->compat_hash_algo->empty_blob); + insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_blob, repo->hash_algo->empty_blob); + + insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->null_oid, repo->compat_hash_algo->null_oid); + insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->null_oid, repo->hash_algo->null_oid); + + strbuf_git_common_path(&path, repo, "objects/loose-object-idx"); + fp = fopen(path.buf, "rb"); + if (!fp) + return 0; + + errno = 0; + if (strbuf_getwholeline(&buf, fp, '\n') || strcmp(buf.buf, loose_object_header)) + goto err; + while (!strbuf_getline_lf(&buf, fp)) { + const char *p; + struct object_id oid, compat_oid; + if (parse_oid_hex_algop(buf.buf, &oid, &p, repo->hash_algo) || + *p++ != ' ' || + parse_oid_hex_algop(p, &compat_oid, &p, repo->compat_hash_algo) || + p != buf.buf + buf.len) + goto err; + insert_oid_pair(dir->loose_map->to_compat, &oid, &compat_oid); + insert_oid_pair(dir->loose_map->to_storage, &compat_oid, &oid); + } + + strbuf_release(&buf); + strbuf_release(&path); + return errno ? 
-1 : 0; +err: + strbuf_release(&buf); + strbuf_release(&path); + return -1; +} + +int repo_read_loose_object_map(struct repository *repo) +{ + struct object_directory *dir; + + if (!should_use_loose_object_map(repo)) + return 0; + + prepare_alt_odb(repo); + + for (dir = repo->objects->odb; dir; dir = dir->next) { + if (load_one_loose_object_map(repo, dir) < 0) { + return -1; + } + } + return 0; +} + +int repo_write_loose_object_map(struct repository *repo) +{ + kh_oid_map_t *map = repo->objects->odb->loose_map->to_compat; + struct lock_file lock; + int fd; + khiter_t iter; + struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT; + + if (!should_use_loose_object_map(repo)) + return 0; + + strbuf_git_common_path(&path, repo, "objects/loose-object-idx"); + fd = hold_lock_file_for_update_timeout(&lock, path.buf, LOCK_DIE_ON_ERROR, -1); + iter = kh_begin(map); + if (write_in_full(fd, loose_object_header, strlen(loose_object_header)) < 0) + goto errout; + + for (; iter != kh_end(map); iter++) { + if (kh_exist(map, iter)) { + if (oideq(&kh_key(map, iter), the_hash_algo->empty_tree) || + oideq(&kh_key(map, iter), the_hash_algo->empty_blob)) + continue; + strbuf_addf(&buf, "%s %s\n", oid_to_hex(&kh_key(map, iter)), oid_to_hex(kh_value(map, iter))); + if (write_in_full(fd, buf.buf, buf.len) < 0) + goto errout; + strbuf_reset(&buf); + } + } + strbuf_release(&buf); + if (commit_lock_file(&lock) < 0) { + error_errno(_("could not write loose object index %s"), path.buf); + strbuf_release(&path); + return -1; + } + strbuf_release(&path); + return 0; +errout: + rollback_lock_file(&lock); + strbuf_release(&buf); + error_errno(_("failed to write loose object index %s\n"), path.buf); + strbuf_release(&path); + return -1; +} + +static int write_one_object(struct repository *repo, const struct object_id *oid, + const struct object_id *compat_oid) +{ + struct lock_file lock; + int fd; + struct stat st; + struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT; + + 
strbuf_git_common_path(&path, repo, "objects/loose-object-idx"); + hold_lock_file_for_update_timeout(&lock, path.buf, LOCK_DIE_ON_ERROR, -1); + + fd = open(path.buf, O_WRONLY | O_CREAT | O_APPEND, 0666); + if (fd < 0) + goto errout; + if (fstat(fd, &st) < 0) + goto errout; + if (!st.st_size && write_in_full(fd, loose_object_header, strlen(loose_object_header)) < 0) + goto errout; + + strbuf_addf(&buf, "%s %s\n", oid_to_hex(oid), oid_to_hex(compat_oid)); + if (write_in_full(fd, buf.buf, buf.len) < 0) + goto errout; + if (close(fd)) + goto errout; + adjust_shared_perm(path.buf); + rollback_lock_file(&lock); + strbuf_release(&buf); + strbuf_release(&path); + return 0; +errout: + error_errno(_("failed to write loose object index %s\n"), path.buf); + close(fd); + rollback_lock_file(&lock); + strbuf_release(&buf); + strbuf_release(&path); + return -1; +} + +int repo_add_loose_object_map(struct repository *repo, const struct object_id *oid, + const struct object_id *compat_oid) +{ + int inserted = 0; + + if (!should_use_loose_object_map(repo)) + return 0; + + inserted |= insert_oid_pair(repo->objects->odb->loose_map->to_compat, oid, compat_oid); + inserted |= insert_oid_pair(repo->objects->odb->loose_map->to_storage, compat_oid, oid); + if (inserted) + return write_one_object(repo, oid, compat_oid); + return 0; +} + +int repo_loose_object_map_oid(struct repository *repo, struct object_id *dest, + const struct git_hash_algo *to, + const struct object_id *src) +{ + struct object_directory *dir; + kh_oid_map_t *map; + khiter_t pos; + + for (dir = repo->objects->odb; dir; dir = dir->next) { + struct loose_object_map *loose_map = dir->loose_map; + if (!loose_map) + continue; + map = (to == repo->compat_hash_algo) ? 
+ loose_map->to_compat : + loose_map->to_storage; + pos = kh_get_oid_map(map, *src); + if (pos < kh_end(map)) { + oidcpy(dest, kh_value(map, pos)); + return 0; + } + } + return -1; +} + +void loose_object_map_clear(struct loose_object_map **map) +{ + struct loose_object_map *m = *map; + struct object_id *oid; + + if (!m) + return; + + kh_foreach_value(m->to_compat, oid, free(oid)); + kh_foreach_value(m->to_storage, oid, free(oid)); + kh_destroy_oid_map(m->to_compat); + kh_destroy_oid_map(m->to_storage); + free(m); + *map = NULL; +} diff --git a/loose.h b/loose.h new file mode 100644 index 000000000000..061c6937aead --- /dev/null +++ b/loose.h @@ -0,0 +1,20 @@ +#ifndef LOOSE_H +#define LOOSE_H + +#include "khash.h" + +struct loose_object_map { + kh_oid_map_t *to_compat; + kh_oid_map_t *to_storage; +}; + +void loose_object_map_init(struct loose_object_map **map); +void loose_object_map_clear(struct loose_object_map **map); +int repo_loose_object_map_oid(struct repository *repo, struct object_id *dest, + const struct git_hash_algo *dest_algo, const struct object_id *src); +int repo_add_loose_object_map(struct repository *repo, const struct object_id *oid, + const struct object_id *compat_oid); +int repo_read_loose_object_map(struct repository *repo); +int repo_write_loose_object_map(struct repository *repo); + +#endif diff --git a/object-file-convert.c b/object-file-convert.c index 9f4d5b354f5f..e7c62434016d 100644 --- a/object-file-convert.c +++ b/object-file-convert.c @@ -4,6 +4,7 @@ #include "repository.h" #include "hash-ll.h" #include "object.h" +#include "loose.h" #include "object-file-convert.h" int repo_oid_to_algop(struct repository *repo, const struct object_id *src, @@ -21,7 +22,18 @@ int repo_oid_to_algop(struct repository *repo, const struct object_id *src, oidcpy(dest, src); return 0; } - return -1; + if (repo_loose_object_map_oid(repo, dest, to, src)) { + /* + * We may have loaded the object map at repo initialization but + * another process (perhaps 
upstream of a pipe from us) may have + * written a new object into the map. If the object is missing, + * let's reload the map to see if the object has appeared. + */ + repo_read_loose_object_map(repo); + if (repo_loose_object_map_oid(repo, dest, to, src)) + return -1; + } + return 0; } int convert_object_file(struct strbuf *outbuf, diff --git a/object-store-ll.h b/object-store-ll.h index 26a3895c821c..bc76d6bec80d 100644 --- a/object-store-ll.h +++ b/object-store-ll.h @@ -26,6 +26,9 @@ struct object_directory { uint32_t loose_objects_subdir_seen[8]; /* 256 bits */ struct oidtree *loose_objects_cache; + /* Map between object IDs for loose objects. */ + struct loose_object_map *loose_map; + /* * This is a temporary object store created by the tmp_objdir * facility. Disable ref updates since the objects in the store diff --git a/object.c b/object.c index 2c61e4c86217..186a0a47c0fb 100644 --- a/object.c +++ b/object.c @@ -13,6 +13,7 @@ #include "alloc.h" #include "packfile.h" #include "commit-graph.h" +#include "loose.h" unsigned int get_max_object_index(void) { @@ -540,6 +541,7 @@ void free_object_directory(struct object_directory *odb) { free(odb->path); odb_clear_loose_cache(odb); + loose_object_map_clear(&odb->loose_map); free(odb); } diff --git a/repository.c b/repository.c index de620d82bfc6..4ab44d3b0344 100644 --- a/repository.c +++ b/repository.c @@ -14,6 +14,7 @@ #include "read-cache-ll.h" #include "remote.h" #include "setup.h" +#include "loose.h" #include "submodule-config.h" #include "sparse-index.h" #include "trace2.h" @@ -112,6 +113,8 @@ void repo_enable_compat_map(struct repository *repo, int enable_compat) GIT_HASH_SHA1]; repo->compat_hash_algo = enable_compat ? 
other_algo : NULL; + if (enable_compat) + repo_read_loose_object_map(repo); } /* @@ -204,6 +207,9 @@ int repo_init(struct repository *repo, if (worktree) repo_set_worktree(repo, worktree); + if (repo->compat_hash_algo) + repo_read_loose_object_map(repo); + clear_repository_format(&format); return 0; From patchwork Fri Sep 8 23:10:25 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. Biederman" X-Patchwork-Id: 13377912 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9A579EEB571 for ; Fri, 8 Sep 2023 23:30:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244457AbjIHXao (ORCPT ); Fri, 8 Sep 2023 19:30:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57536 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231519AbjIHXam (ORCPT ); Fri, 8 Sep 2023 19:30:42 -0400 Received: from out03.mta.xmission.com (out03.mta.xmission.com [166.70.13.233]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E3F762109 for ; Fri, 8 Sep 2023 16:30:33 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:37092) by out03.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekda-00FHHe-1O; Fri, 08 Sep 2023 17:11:42 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdZ-009u13-27; Fri, 08 Sep 2023 17:11:41 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. 
Biederman" Date: Fri, 8 Sep 2023 18:10:25 -0500 Message-Id: <20230908231049.2035003-8-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekdZ-009u13-27;;;mid=<20230908231049.2035003-8-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX19XaInyDXSvzVDmwkrLv21VeYrPIC9tzp0= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 08/32] loose: Compatibility short name support X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Update loose_objects_cache when updating the loose object map. This oidtree is used to discover which oids are candidates when resolving short names, and it can support a mixture of sha1 and sha256 oids. With this, any oid recorded in objects/loose-object-idx is usable for resolving an oid to an object. To make this maintainable, a helper insert_loose_map is factored out of load_one_loose_object_map and repo_add_loose_object_map, and then modified to also update the loose_objects_cache. Signed-off-by: "Eric W. 
Biederman" --- loose.c | 37 +++++++++++++++++++++++++------------ 1 file changed, 25 insertions(+), 12 deletions(-) diff --git a/loose.c b/loose.c index 8ddb7112a541..81ed0e6b0c1e 100644 --- a/loose.c +++ b/loose.c @@ -7,6 +7,7 @@ #include "gettext.h" #include "loose.h" #include "lockfile.h" +#include "oidtree.h" static const char *loose_object_header = "# loose-object-idx\n"; @@ -42,6 +43,21 @@ static int insert_oid_pair(kh_oid_map_t *map, const struct object_id *key, const return 1; } +static int insert_loose_map(struct object_directory *odb, + const struct object_id *oid, + const struct object_id *compat_oid) +{ + struct loose_object_map *map = odb->loose_map; + int inserted = 0; + + inserted |= insert_oid_pair(map->to_compat, oid, compat_oid); + inserted |= insert_oid_pair(map->to_storage, compat_oid, oid); + if (inserted) + oidtree_insert(odb->loose_objects_cache, compat_oid); + + return inserted; +} + static int load_one_loose_object_map(struct repository *repo, struct object_directory *dir) { struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT; @@ -49,15 +65,14 @@ static int load_one_loose_object_map(struct repository *repo, struct object_dire if (!dir->loose_map) loose_object_map_init(&dir->loose_map); + if (!dir->loose_objects_cache) { + ALLOC_ARRAY(dir->loose_objects_cache, 1); + oidtree_init(dir->loose_objects_cache); + } - insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree); - insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_tree, repo->hash_algo->empty_tree); - - insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_blob, repo->compat_hash_algo->empty_blob); - insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_blob, repo->hash_algo->empty_blob); - - insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->null_oid, repo->compat_hash_algo->null_oid); - insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->null_oid, 
repo->hash_algo->null_oid); + insert_loose_map(dir, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree); + insert_loose_map(dir, repo->hash_algo->empty_blob, repo->compat_hash_algo->empty_blob); + insert_loose_map(dir, repo->hash_algo->null_oid, repo->compat_hash_algo->null_oid); strbuf_git_common_path(&path, repo, "objects/loose-object-idx"); fp = fopen(path.buf, "rb"); @@ -75,8 +90,7 @@ static int load_one_loose_object_map(struct repository *repo, struct object_dire parse_oid_hex_algop(p, &compat_oid, &p, repo->compat_hash_algo) || p != buf.buf + buf.len) goto err; - insert_oid_pair(dir->loose_map->to_compat, &oid, &compat_oid); - insert_oid_pair(dir->loose_map->to_storage, &compat_oid, &oid); + insert_loose_map(dir, &oid, &compat_oid); } strbuf_release(&buf); @@ -195,8 +209,7 @@ int repo_add_loose_object_map(struct repository *repo, const struct object_id *o if (!should_use_loose_object_map(repo)) return 0; - inserted |= insert_oid_pair(repo->objects->odb->loose_map->to_compat, oid, compat_oid); - inserted |= insert_oid_pair(repo->objects->odb->loose_map->to_storage, compat_oid, oid); + inserted = insert_loose_map(repo->objects->odb, oid, compat_oid); if (inserted) return write_one_object(repo, oid, compat_oid); return 0; From patchwork Fri Sep 8 23:10:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. 
Biederman" X-Patchwork-Id: 13377924 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 506DFEEB565 for ; Sat, 9 Sep 2023 00:02:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241374AbjIIACv (ORCPT ); Fri, 8 Sep 2023 20:02:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34064 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234904AbjIIACu (ORCPT ); Fri, 8 Sep 2023 20:02:50 -0400 Received: from out01.mta.xmission.com (out01.mta.xmission.com [166.70.13.231]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2EE2EE46 for ; Fri, 8 Sep 2023 17:02:46 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:34542) by out01.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdc-007QwF-H6; Fri, 08 Sep 2023 17:11:44 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdb-009u13-3x; Fri, 08 Sep 2023 17:11:44 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. 
Biederman" Date: Fri, 8 Sep 2023 18:10:26 -0500 Message-Id: <20230908231049.2035003-9-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekdb-009u13-3x;;;mid=<20230908231049.2035003-9-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX1+SGm9p8oywgrtyp1HYOxvR3NVpzYRm+7s= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 09/32] object-file: Update the loose object map when writing loose objects X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org To implement SHA1 compatibility on SHA256 repositories, the loose object map needs to be updated whenever a loose object is written. Updating the loose object map this way allows git to support the old hash algorithm in constant time. The functions write_loose_object and stream_loose_object are the only two functions that write to the loose object store. Update stream_loose_object to compute the compatibility hash, update the loose object, and then call repo_add_loose_object_map to update the loose object map. Update write_object_file_flags to convert the object into its compatibility encoding, hash the compatibility encoding, write the object, and then update the loose object map. Update force_object_loose to look up the hash of the compatibility encoding, write the loose object, and then update the loose object map. Update write_object_file_literally to refuse to write any objects when a compatibility encoding is enabled. The problem is that write_object_file_literally is frequently used to write ill-formed objects. 
Especially when the type of those objects is changed, there is by definition no possible way to convert them, as no conversion has been defined. Since a compatibility encoding cannot be found and a compatibility mapping cannot be written, the cleanest behavior is to simply disallow write_object_file_literally from writing files. Other than noting that the loose objects are updated before the loose object map, I have not done any analysis to see how robust this scheme is in the event of failure. Signed-off-by: "Eric W. Biederman" --- object-file.c | 94 +++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 76 insertions(+), 18 deletions(-) diff --git a/object-file.c b/object-file.c index 7f24f19b8a68..6a14b8875343 100644 --- a/object-file.c +++ b/object-file.c @@ -44,6 +44,7 @@ #include "setup.h" #include "submodule.h" #include "fsck.h" +#include "loose.h" /* The maximum size for an object header. */ #define MAX_HEADER_LEN 32 @@ -2035,9 +2036,12 @@ static int start_loose_object_common(struct strbuf *tmp_file, const char *filename, unsigned flags, git_zstream *stream, unsigned char *buf, size_t buflen, - git_hash_ctx *c, + git_hash_ctx *c, git_hash_ctx *compat_c, char *hdr, int hdrlen) { + struct repository *repo = the_repository; + const struct git_hash_algo *algo = repo->hash_algo; + const struct git_hash_algo *compat = repo->compat_hash_algo; int fd; fd = create_tmpfile(tmp_file, filename); @@ -2057,14 +2061,18 @@ static int start_loose_object_common(struct strbuf *tmp_file, git_deflate_init(stream, zlib_compression_level); stream->next_out = buf; stream->avail_out = buflen; - the_hash_algo->init_fn(c); + algo->init_fn(c); + if (compat && compat_c) + compat->init_fn(compat_c); /* Start to feed header to zlib stream */ stream->next_in = (unsigned char *)hdr; stream->avail_in = hdrlen; while (git_deflate(stream, 0) == Z_OK) ; /* nothing */ - the_hash_algo->update_fn(c, hdr, hdrlen); + algo->update_fn(c, hdr, hdrlen); + if (compat && compat_c) + 
compat->update_fn(compat_c, hdr, hdrlen); return fd; } @@ -2073,16 +2081,21 @@ static int start_loose_object_common(struct strbuf *tmp_file, * Common steps for the inner git_deflate() loop for writing loose * objects. Returns what git_deflate() returns. */ -static int write_loose_object_common(git_hash_ctx *c, +static int write_loose_object_common(git_hash_ctx *c, git_hash_ctx *compat_c, git_zstream *stream, const int flush, unsigned char *in0, const int fd, unsigned char *compressed, const size_t compressed_len) { + struct repository *repo = the_repository; + const struct git_hash_algo *algo = repo->hash_algo; + const struct git_hash_algo *compat = repo->compat_hash_algo; int ret; ret = git_deflate(stream, flush ? Z_FINISH : 0); - the_hash_algo->update_fn(c, in0, stream->next_in - in0); + algo->update_fn(c, in0, stream->next_in - in0); + if (compat && compat_c) + compat->update_fn(compat_c, in0, stream->next_in - in0); if (write_in_full(fd, compressed, stream->next_out - compressed) < 0) die_errno(_("unable to write loose object file")); stream->next_out = compressed; @@ -2097,15 +2110,21 @@ static int write_loose_object_common(git_hash_ctx *c, * - End the compression of zlib stream. * - Get the calculated oid to "oid". 
 */ -static int end_loose_object_common(git_hash_ctx *c, git_zstream *stream, - struct object_id *oid) +static int end_loose_object_common(git_hash_ctx *c, git_hash_ctx *compat_c, + git_zstream *stream, struct object_id *oid, + struct object_id *compat_oid) { + struct repository *repo = the_repository; + const struct git_hash_algo *algo = repo->hash_algo; + const struct git_hash_algo *compat = repo->compat_hash_algo; int ret; ret = git_deflate_end_gently(stream); if (ret != Z_OK) return ret; - the_hash_algo->final_oid_fn(oid, c); + algo->final_oid_fn(oid, c); + if (compat && compat_c) + compat->final_oid_fn(compat_oid, compat_c); return Z_OK; } @@ -2129,7 +2148,7 @@ static int write_loose_object(const struct object_id *oid, char *hdr, fd = start_loose_object_common(&tmp_file, filename.buf, flags, &stream, compressed, sizeof(compressed), - &c, hdr, hdrlen); + &c, NULL, hdr, hdrlen); if (fd < 0) return -1; @@ -2139,14 +2158,14 @@ static int write_loose_object(const struct object_id *oid, char *hdr, do { unsigned char *in0 = stream.next_in; - ret = write_loose_object_common(&c, &stream, 1, in0, fd, + ret = write_loose_object_common(&c, NULL, &stream, 1, in0, fd, compressed, sizeof(compressed)); } while (ret == Z_OK); if (ret != Z_STREAM_END) die(_("unable to deflate new object %s (%d)"), oid_to_hex(oid), ret); - ret = end_loose_object_common(&c, &stream, &parano_oid); + ret = end_loose_object_common(&c, NULL, &stream, &parano_oid, NULL); if (ret != Z_OK) die(_("deflateEnd on object %s failed (%d)"), oid_to_hex(oid), ret); @@ -2191,10 +2210,12 @@ static int freshen_packed_object(const struct object_id *oid) int stream_loose_object(struct input_stream *in_stream, size_t len, struct object_id *oid) { + const struct git_hash_algo *compat = the_repository->compat_hash_algo; + struct object_id compat_oid; int fd, ret, err = 0, flush = 0; unsigned char compressed[4096]; git_zstream stream; - git_hash_ctx c; + git_hash_ctx c, compat_c; struct strbuf tmp_file = STRBUF_INIT; struct
strbuf filename = STRBUF_INIT; int dirlen; @@ -2218,7 +2239,7 @@ int stream_loose_object(struct input_stream *in_stream, size_t len, */ fd = start_loose_object_common(&tmp_file, filename.buf, 0, &stream, compressed, sizeof(compressed), - &c, hdr, hdrlen); + &c, &compat_c, hdr, hdrlen); if (fd < 0) { err = -1; goto cleanup; @@ -2236,7 +2257,7 @@ int stream_loose_object(struct input_stream *in_stream, size_t len, if (in_stream->is_finished) flush = 1; } - ret = write_loose_object_common(&c, &stream, flush, in0, fd, + ret = write_loose_object_common(&c, &compat_c, &stream, flush, in0, fd, compressed, sizeof(compressed)); /* * Unlike write_loose_object(), we do not have the entire @@ -2259,7 +2280,7 @@ int stream_loose_object(struct input_stream *in_stream, size_t len, */ if (ret != Z_STREAM_END) die(_("unable to stream deflate new object (%d)"), ret); - ret = end_loose_object_common(&c, &stream, oid); + ret = end_loose_object_common(&c, &compat_c, &stream, oid, &compat_oid); if (ret != Z_OK) die(_("deflateEnd on stream object failed (%d)"), ret); close_loose_object(fd, tmp_file.buf); @@ -2286,6 +2307,8 @@ int stream_loose_object(struct input_stream *in_stream, size_t len, } err = finalize_object_file(tmp_file.buf, filename.buf); + if (!err && compat) + err = repo_add_loose_object_map(the_repository, oid, &compat_oid); cleanup: strbuf_release(&tmp_file); strbuf_release(&filename); @@ -2296,17 +2319,38 @@ int write_object_file_flags(const void *buf, unsigned long len, enum object_type type, struct object_id *oid, unsigned flags) { + struct repository *repo = the_repository; + const struct git_hash_algo *algo = repo->hash_algo; + const struct git_hash_algo *compat = repo->compat_hash_algo; + struct object_id compat_oid; char hdr[MAX_HEADER_LEN]; int hdrlen = sizeof(hdr); + /* Generate compat_oid */ + if (compat) { + if (type == OBJ_BLOB) + hash_object_file(compat, buf, len, type, &compat_oid); + else { + struct strbuf converted = STRBUF_INIT; + 
convert_object_file(&converted, algo, compat, + buf, len, type, 0); + hash_object_file(compat, converted.buf, converted.len, + type, &compat_oid); + strbuf_release(&converted); + } + } + /* Normally if we have it in the pack then we do not bother writing * it out into .git/objects/??/?{38} file. */ - write_object_file_prepare(the_hash_algo, buf, len, type, oid, hdr, - &hdrlen); + write_object_file_prepare(algo, buf, len, type, oid, hdr, &hdrlen); if (freshen_packed_object(oid) || freshen_loose_object(oid)) return 0; - return write_loose_object(oid, hdr, hdrlen, buf, len, 0, flags); + if (write_loose_object(oid, hdr, hdrlen, buf, len, 0, flags)) + return -1; + if (compat) + return repo_add_loose_object_map(repo, oid, &compat_oid); + return 0; } int write_object_file_literally(const void *buf, unsigned long len, @@ -2324,6 +2368,10 @@ int write_object_file_literally(const void *buf, unsigned long len, if (!(flags & HASH_WRITE_OBJECT)) goto cleanup; + else if (the_repository->compat_hash_algo) { + status = -1; + goto cleanup; + } if (freshen_packed_object(oid) || freshen_loose_object(oid)) goto cleanup; status = write_loose_object(oid, header, hdrlen, buf, len, 0, 0); @@ -2335,9 +2383,12 @@ int write_object_file_literally(const void *buf, unsigned long len, int force_object_loose(const struct object_id *oid, time_t mtime) { + struct repository *repo = the_repository; + const struct git_hash_algo *compat = repo->compat_hash_algo; void *buf; unsigned long len; struct object_info oi = OBJECT_INFO_INIT; + struct object_id compat_oid; enum object_type type; char hdr[MAX_HEADER_LEN]; int hdrlen; @@ -2350,8 +2401,15 @@ int force_object_loose(const struct object_id *oid, time_t mtime) oi.contentp = &buf; if (oid_object_info_extended(the_repository, oid, &oi, 0)) return error(_("cannot read object for %s"), oid_to_hex(oid)); + if (compat) { + if (repo_oid_to_algop(repo, oid, compat, &compat_oid)) + return error(_("cannot map object %s to %s"), + oid_to_hex(oid), compat->name); 
+ } hdrlen = format_object_header(hdr, sizeof(hdr), type, len); ret = write_loose_object(oid, hdr, hdrlen, buf, len, mtime, 0); + if (!ret && compat) + ret = repo_add_loose_object_map(the_repository, oid, &compat_oid); free(buf); return ret; From patchwork Fri Sep 8 23:10:27 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. Biederman" X-Patchwork-Id: 13377907 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D2B7DEEB571 for ; Fri, 8 Sep 2023 23:21:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345173AbjIHXVm (ORCPT ); Fri, 8 Sep 2023 19:21:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55636 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345166AbjIHXVl (ORCPT ); Fri, 8 Sep 2023 19:21:41 -0400 Received: from out01.mta.xmission.com (out01.mta.xmission.com [166.70.13.231]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6B59C133 for ; Fri, 8 Sep 2023 16:21:37 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:34564) by out01.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekde-007QwO-Pg; Fri, 08 Sep 2023 17:11:46 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdd-009u13-Jj; Fri, 08 Sep 2023 17:11:46 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. 
Biederman" Date: Fri, 8 Sep 2023 18:10:27 -0500 Message-Id: <20230908231049.2035003-10-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekdd-009u13-Jj;;;mid=<20230908231049.2035003-10-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX19hb3NqikBjoXtD6PxGAiK3Dbiy+vxXvZ8= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 10/32] bulk-checkin: Only accept blobs X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org As the code is written today, bulk_checkin only accepts blobs. When dealing with multiple hash algorithms it is necessary to distinguish between blobs and object types that have embedded oids. For objects that embed oids, a completely new object needs to be generated to compute the compatibility hash on. For blobs, however, all that is needed is to compute the compatibility hash on the same blob as the default hash. As the code will soon need the compatibility hash from a bulk checkin, remove support for bulk checkin of anything except blobs. Signed-off-by: "Eric W. Biederman" --- bulk-checkin.c | 35 +++++++++++++++++------------------ bulk-checkin.h | 6 +++--- object-file.c | 12 ++++++------ 3 files changed, 26 insertions(+), 27 deletions(-) diff --git a/bulk-checkin.c b/bulk-checkin.c index 73bff3a23d27..223562b4e748 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -155,10 +155,10 @@ static int already_written(struct bulk_checkin_packfile *state, struct object_id * status before calling us just in case we ask it to call us again * with a new pack.
*/ -static int stream_to_pack(struct bulk_checkin_packfile *state, - git_hash_ctx *ctx, off_t *already_hashed_to, - int fd, size_t size, enum object_type type, - const char *path, unsigned flags) +static int stream_blob_to_pack(struct bulk_checkin_packfile *state, + git_hash_ctx *ctx, off_t *already_hashed_to, + int fd, size_t size, const char *path, + unsigned flags) { git_zstream s; unsigned char ibuf[16384]; @@ -170,7 +170,7 @@ static int stream_to_pack(struct bulk_checkin_packfile *state, git_deflate_init(&s, pack_compression_level); - hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), type, size); + hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), OBJ_BLOB, size); s.next_out = obuf + hdrlen; s.avail_out = sizeof(obuf) - hdrlen; @@ -247,11 +247,10 @@ static void prepare_to_stream(struct bulk_checkin_packfile *state, die_errno("unable to write pack header"); } -static int deflate_to_pack(struct bulk_checkin_packfile *state, - struct object_id *result_oid, - int fd, size_t size, - enum object_type type, const char *path, - unsigned flags) +static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, + struct object_id *result_oid, + int fd, size_t size, + const char *path, unsigned flags) { off_t seekback, already_hashed_to; git_hash_ctx ctx; @@ -265,7 +264,7 @@ static int deflate_to_pack(struct bulk_checkin_packfile *state, return error("cannot find the current offset"); header_len = format_object_header((char *)obuf, sizeof(obuf), - type, size); + OBJ_BLOB, size); the_hash_algo->init_fn(&ctx); the_hash_algo->update_fn(&ctx, obuf, header_len); @@ -282,8 +281,8 @@ static int deflate_to_pack(struct bulk_checkin_packfile *state, idx->offset = state->offset; crc32_begin(state->f); } - if (!stream_to_pack(state, &ctx, &already_hashed_to, - fd, size, type, path, flags)) + if (!stream_blob_to_pack(state, &ctx, &already_hashed_to, + fd, size, path, flags)) break; /* * Writing this object to the current pack will make @@ -350,12 +349,12 @@ void 
fsync_loose_object_bulk_checkin(int fd, const char *filename) } } -int index_bulk_checkin(struct object_id *oid, - int fd, size_t size, enum object_type type, - const char *path, unsigned flags) +int index_blob_bulk_checkin(struct object_id *oid, + int fd, size_t size, + const char *path, unsigned flags) { - int status = deflate_to_pack(&bulk_checkin_packfile, oid, fd, size, type, - path, flags); + int status = deflate_blob_to_pack(&bulk_checkin_packfile, oid, fd, size, + path, flags); if (!odb_transaction_nesting) flush_bulk_checkin_packfile(&bulk_checkin_packfile); return status; diff --git a/bulk-checkin.h b/bulk-checkin.h index 48fe9a6e9171..aa7286a7b3e1 100644 --- a/bulk-checkin.h +++ b/bulk-checkin.h @@ -9,9 +9,9 @@ void prepare_loose_object_bulk_checkin(void); void fsync_loose_object_bulk_checkin(int fd, const char *filename); -int index_bulk_checkin(struct object_id *oid, - int fd, size_t size, enum object_type type, - const char *path, unsigned flags); +int index_blob_bulk_checkin(struct object_id *oid, + int fd, size_t size, + const char *path, unsigned flags); /* * Tell the object database to optimize for adding diff --git a/object-file.c b/object-file.c index 6a14b8875343..6cc4ae1fd957 100644 --- a/object-file.c +++ b/object-file.c @@ -2587,11 +2587,11 @@ static int index_core(struct index_state *istate, * binary blobs, they generally do not want to get any conversion, and * callers should avoid this code path when filters are requested. 
*/ -static int index_stream(struct object_id *oid, int fd, size_t size, - enum object_type type, const char *path, - unsigned flags) +static int index_blob_stream(struct object_id *oid, int fd, size_t size, + const char *path, + unsigned flags) { - return index_bulk_checkin(oid, fd, size, type, path, flags); + return index_blob_bulk_checkin(oid, fd, size, path, flags); } int index_fd(struct index_state *istate, struct object_id *oid, @@ -2613,8 +2613,8 @@ int index_fd(struct index_state *istate, struct object_id *oid, ret = index_core(istate, oid, fd, xsize_t(st->st_size), type, path, flags); else - ret = index_stream(oid, fd, xsize_t(st->st_size), type, path, - flags); + ret = index_blob_stream(oid, fd, xsize_t(st->st_size), path, + flags); close(fd); return ret; } From patchwork Fri Sep 8 23:10:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. Biederman" X-Patchwork-Id: 13377926 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8E19FEEB565 for ; Sat, 9 Sep 2023 00:05:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235722AbjIIAFW (ORCPT ); Fri, 8 Sep 2023 20:05:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38790 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231316AbjIIAFU (ORCPT ); Fri, 8 Sep 2023 20:05:20 -0400 Received: from out01.mta.xmission.com (out01.mta.xmission.com [166.70.13.231]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9077A18E for ; Fri, 8 Sep 2023 17:05:16 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:34576) by out01.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdg-007QwX-SO; Fri, 08 Sep 
2023 17:11:48 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdf-009u13-SV; Fri, 08 Sep 2023 17:11:48 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. Biederman" Date: Fri, 8 Sep 2023 18:10:28 -0500 Message-Id: <20230908231049.2035003-11-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekdf-009u13-SV;;;mid=<20230908231049.2035003-11-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX1/pOzUKcnpz8GqE4sTVYlTGrUob60OrQRI= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 11/32] pack: Communicate the compat_oid through struct pack_idx_entry X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Add compat_oid into struct pack_idx_entry to allow communicating the compat hash value of the objects being indexed to the code that builds the indexes for a pack. Having a mechanism that communicates the compat_oid from the code building the pack is necessary for bulk-checkin, fast-import, and index-pack. Only pack-objects could rely on the existing compatibility mappings, but there is no point since the other creators of indexes can't. Unfortunately this adds a 4 byte hole into struct pack_idx_entry. Signed-off-by: "Eric W.
Biederman" --- pack.h | 1 + 1 file changed, 1 insertion(+) diff --git a/pack.h b/pack.h index 3ab9e3f60c0b..321d38374f70 100644 --- a/pack.h +++ b/pack.h @@ -75,6 +75,7 @@ struct pack_idx_header { */ struct pack_idx_entry { struct object_id oid; + struct object_id compat_oid; uint32_t crc32; off_t offset; }; From patchwork Fri Sep 8 23:10:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. Biederman" X-Patchwork-Id: 13377909 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 54DABEEB571 for ; Fri, 8 Sep 2023 23:30:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232614AbjIHXac (ORCPT ); Fri, 8 Sep 2023 19:30:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56458 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345268AbjIHXab (ORCPT ); Fri, 8 Sep 2023 19:30:31 -0400 Received: from out03.mta.xmission.com (out03.mta.xmission.com [166.70.13.233]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0C7E91FCD for ; Fri, 8 Sep 2023 16:30:27 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:37160) by out03.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdj-00FHI6-0m; Fri, 08 Sep 2023 17:11:51 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdh-009u13-Ul; Fri, 08 Sep 2023 17:11:50 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. 
Biederman" Date: Fri, 8 Sep 2023 18:10:29 -0500 Message-Id: <20230908231049.2035003-12-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekdh-009u13-Ul;;;mid=<20230908231049.2035003-12-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX1/LPmC0Nn0+hQWPa2fNSTXarTtI+MHhI9Y= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 12/32] bulk-checkin: hash object with compatibility algorithm X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: "brian m. carlson" Any time we write an object into the repository when we're in dual hash mode, we need to compute the hash with both algorithms. We already do this when we write a loose object into the repository, but we also need to do so in the other case where we write an object, which is the bulk check-in code. **** Write the compatibility hash into idx->compat_oid so it is available for code that generates indexes that include the compatibility mappings. --EWB Signed-off-by: brian m. carlson Signed-off-by: "Eric W. Biederman" --- bulk-checkin.c | 24 ++++++++++++++++++++---- 1 file changed, 20 insertions(+), 4 deletions(-) diff --git a/bulk-checkin.c b/bulk-checkin.c index 223562b4e748..3206412a19e0 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -156,7 +156,8 @@ static int already_written(struct bulk_checkin_packfile *state, struct object_id * with a new pack.
*/ static int stream_blob_to_pack(struct bulk_checkin_packfile *state, - git_hash_ctx *ctx, off_t *already_hashed_to, + git_hash_ctx *ctx, git_hash_ctx *compat_ctx, + off_t *already_hashed_to, int fd, size_t size, const char *path, unsigned flags) { @@ -167,6 +168,7 @@ static int stream_blob_to_pack(struct bulk_checkin_packfile *state, int status = Z_OK; int write_object = (flags & HASH_WRITE_OBJECT); off_t offset = 0; + const struct git_hash_algo *compat = the_repository->compat_hash_algo; git_deflate_init(&s, pack_compression_level); @@ -188,8 +190,11 @@ static int stream_blob_to_pack(struct bulk_checkin_packfile *state, size_t hsize = offset - *already_hashed_to; if (rsize < hsize) hsize = rsize; - if (hsize) + if (hsize) { the_hash_algo->update_fn(ctx, ibuf, hsize); + if (compat) + compat->update_fn(compat_ctx, ibuf, hsize); + } *already_hashed_to = offset; } s.next_in = ibuf; @@ -253,11 +258,13 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, const char *path, unsigned flags) { off_t seekback, already_hashed_to; - git_hash_ctx ctx; + git_hash_ctx ctx, compat_ctx; unsigned char obuf[16384]; unsigned header_len; struct hashfile_checkpoint checkpoint = {0}; struct pack_idx_entry *idx = NULL; + const struct git_hash_algo *compat = the_repository->compat_hash_algo; + struct object_id compat_oid = {}; seekback = lseek(fd, 0, SEEK_CUR); if (seekback == (off_t) -1) @@ -267,6 +274,10 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, OBJ_BLOB, size); the_hash_algo->init_fn(&ctx); the_hash_algo->update_fn(&ctx, obuf, header_len); + if (compat) { + compat->init_fn(&compat_ctx); + compat->update_fn(&compat_ctx, obuf, header_len); + } /* Note: idx is non-NULL when we are writing */ if ((flags & HASH_WRITE_OBJECT) != 0) @@ -281,7 +292,8 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, idx->offset = state->offset; crc32_begin(state->f); } - if (!stream_blob_to_pack(state, &ctx, &already_hashed_to, + if 
(!stream_blob_to_pack(state, &ctx, &compat_ctx, + &already_hashed_to, fd, size, path, flags)) break; /* @@ -298,6 +310,8 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, return error("cannot seek back"); } the_hash_algo->final_oid_fn(result_oid, &ctx); + if (compat) + compat->final_oid_fn(&compat_oid, &compat_ctx); if (!idx) return 0; @@ -308,6 +322,8 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, free(idx); } else { oidcpy(&idx->oid, result_oid); + if (compat) + oidcpy(&idx->compat_oid, &compat_oid); ALLOC_GROW(state->written, state->nr_written + 1, state->alloc_written); From patchwork Fri Sep 8 23:10:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. Biederman" X-Patchwork-Id: 13377930 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0D378EEB57A for ; Sat, 9 Sep 2023 00:40:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345938AbjIIAkt (ORCPT ); Fri, 8 Sep 2023 20:40:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55744 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345811AbjIIAjv (ORCPT ); Fri, 8 Sep 2023 20:39:51 -0400 Received: from out03.mta.xmission.com (out03.mta.xmission.com [166.70.13.233]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 742432D44 for ; Fri, 8 Sep 2023 17:39:16 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:37192) by out03.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdl-00FHIL-1z; Fri, 08 Sep 2023 17:11:53 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com 
with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdk-009u13-3N; Fri, 08 Sep 2023 17:11:52 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. Biederman" Date: Fri, 8 Sep 2023 18:10:30 -0500 Message-Id: <20230908231049.2035003-13-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekdk-009u13-3N;;;mid=<20230908231049.2035003-13-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX18UggUqgxl2MkhJFTn5zai+9uiPM0Ud96E= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 13/32] object-file: Add a compat_oid_in parameter to write_object_file_flags X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org To create the proper signatures for commit objects, both versions of the commit object need to be generated and signed. After that it is a waste to throw away the work of generating the compatibility hash, so update write_object_file_flags to take a compatibility hash input parameter that it can use to skip the work of generating the compatibility hash. Update the places that don't generate the compatibility hash to pass NULL so it is easy to tell write_object_file_flags should not attempt to use their compatibility hash. Signed-off-by: "Eric W.
Biederman" --- cache-tree.c | 2 +- object-file.c | 6 ++++-- object-store-ll.h | 4 ++-- 3 files changed, 7 insertions(+), 5 deletions(-) diff --git a/cache-tree.c b/cache-tree.c index 641427ed410a..ddc7d3d86959 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -448,7 +448,7 @@ static int update_one(struct cache_tree *it, hash_object_file(the_hash_algo, buffer.buf, buffer.len, OBJ_TREE, &it->oid); } else if (write_object_file_flags(buffer.buf, buffer.len, OBJ_TREE, - &it->oid, flags & WRITE_TREE_SILENT + &it->oid, NULL, flags & WRITE_TREE_SILENT ? HASH_SILENT : 0)) { strbuf_release(&buffer); return -1; diff --git a/object-file.c b/object-file.c index 6cc4ae1fd957..fd420dd303df 100644 --- a/object-file.c +++ b/object-file.c @@ -2317,7 +2317,7 @@ int stream_loose_object(struct input_stream *in_stream, size_t len, int write_object_file_flags(const void *buf, unsigned long len, enum object_type type, struct object_id *oid, - unsigned flags) + struct object_id *compat_oid_in, unsigned flags) { struct repository *repo = the_repository; const struct git_hash_algo *algo = repo->hash_algo; @@ -2328,7 +2328,9 @@ int write_object_file_flags(const void *buf, unsigned long len, /* Generate compat_oid */ if (compat) { - if (type == OBJ_BLOB) + if (compat_oid_in) + oidcpy(&compat_oid, compat_oid_in); + else if (type == OBJ_BLOB) hash_object_file(compat, buf, len, type, &compat_oid); else { struct strbuf converted = STRBUF_INIT; diff --git a/object-store-ll.h b/object-store-ll.h index bc76d6bec80d..c5f2bb2fc2fe 100644 --- a/object-store-ll.h +++ b/object-store-ll.h @@ -255,11 +255,11 @@ void hash_object_file(const struct git_hash_algo *algo, const void *buf, int write_object_file_flags(const void *buf, unsigned long len, enum object_type type, struct object_id *oid, - unsigned flags); + struct object_id *comapt_oid_in, unsigned flags); static inline int write_object_file(const void *buf, unsigned long len, enum object_type type, struct object_id *oid) { - return 
write_object_file_flags(buf, len, type, oid, 0); + return write_object_file_flags(buf, len, type, oid, NULL, 0); } int write_object_file_literally(const void *buf, unsigned long len, From patchwork Fri Sep 8 23:10:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. Biederman" X-Patchwork-Id: 13377910 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4210DEEB570 for ; Fri, 8 Sep 2023 23:30:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345273AbjIHXaf (ORCPT ); Fri, 8 Sep 2023 19:30:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59152 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243950AbjIHXad (ORCPT ); Fri, 8 Sep 2023 19:30:33 -0400 Received: from out03.mta.xmission.com (out03.mta.xmission.com [166.70.13.233]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AAF3CE6B for ; Fri, 8 Sep 2023 16:30:28 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:37218) by out03.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdn-00FHId-Gi; Fri, 08 Sep 2023 17:11:55 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdm-009u13-3z; Fri, 08 Sep 2023 17:11:55 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W . 
Biederman" Date: Fri, 8 Sep 2023 18:10:31 -0500 Message-Id: <20230908231049.2035003-14-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekdm-009u13-3z;;;mid=<20230908231049.2035003-14-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX1+TURJkKvTmTTd9AaHY8TxdfarsgiuBkYg= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 14/32] commit: write commits for both hashes X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: "brian m. carlson" When we write a commit, we include data that is specific to the hash algorithm, such as parents and the root tree. In order to write both a SHA-1 commit and a SHA-256 version, we need to convert between them. However, a straightforward conversion isn't necessarily what we want. When we sign a commit, we sign its data, so if we create a commit for SHA-256 and then write a SHA-1 version, we'll still have only signed the SHA-256 data. While this is valid, it would be better to sign both forms of data so people using SHA-1 can verify the signatures as well. Consequently, we don't want to use the standard mapping that occurs when we write an object. Instead, let's move most of the writing of the commit into a separate function which is agnostic of the hash algorithm and which simply writes into a buffer and specify both versions of the object ourselves. We can then call this function twice: once with the SHA-256 contents, and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing the commit, we then sign both versions and append both signatures to both buffers. 
To produce a consistent hash, we always append the signatures in the order in which Git implemented them: first SHA-1, then SHA-256. In order to make this signing code work, we split the commit signing code into two functions, one which signs the buffer, and one which appends the signature. ***** Updated to use write_object_file_flags and repo_oid_to_algop -- EWB Signed-off-by: brian m. carlson Signed-off-by: Eric W. Biederman --- commit.c | 176 +++++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 131 insertions(+), 45 deletions(-) diff --git a/commit.c b/commit.c index b3223478bc2a..522ebb4b3002 100644 --- a/commit.c +++ b/commit.c @@ -28,6 +28,7 @@ #include "shallow.h" #include "tree.h" #include "hook.h" +#include "object-file-convert.h" static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **); @@ -1100,12 +1101,11 @@ static const char *gpg_sig_headers[] = { "gpgsig-sha256", }; -int sign_with_header(struct strbuf *buf, const char *keyid) +static int add_commit_signature(struct strbuf *buf, struct strbuf *sig, const struct git_hash_algo *algo) { - struct strbuf sig = STRBUF_INIT; int inspos, copypos; const char *eoh; - const char *gpg_sig_header = gpg_sig_headers[hash_algo_by_ptr(the_hash_algo)]; + const char *gpg_sig_header = gpg_sig_headers[hash_algo_by_ptr(algo)]; int gpg_sig_header_len = strlen(gpg_sig_header); /* find the end of the header */ @@ -1115,15 +1115,8 @@ int sign_with_header(struct strbuf *buf, const char *keyid) else inspos = eoh - buf->buf + 1; - if (!keyid || !*keyid) - keyid = get_signing_key(); - if (sign_buffer(buf, &sig, keyid)) { - strbuf_release(&sig); - return -1; - } - - for (copypos = 0; sig.buf[copypos]; ) { - const char *bol = sig.buf + copypos; + for (copypos = 0; sig->buf[copypos]; ) { + const char *bol = sig->buf + copypos; const char *eol = strchrnul(bol, '\n'); int len = (eol - bol) + !!*eol; @@ -1136,11 +1129,17 @@ int sign_with_header(struct strbuf *buf, 
const char *keyid) inspos += len; copypos += len; } - strbuf_release(&sig); return 0; } - +static int sign_commit_to_strbuf(struct strbuf *sig, struct strbuf *buf, const char *keyid) +{ + if (!keyid || !*keyid) + keyid = get_signing_key(); + if (sign_buffer(buf, sig, keyid)) + return -1; + return 0; +} int parse_signed_commit(const struct commit *commit, struct strbuf *payload, struct strbuf *signature, @@ -1599,70 +1598,157 @@ N_("Warning: commit message did not conform to UTF-8.\n" "You may want to amend it after fixing the message, or set the config\n" "variable i18n.commitEncoding to the encoding your project uses.\n"); -int commit_tree_extended(const char *msg, size_t msg_len, - const struct object_id *tree, - struct commit_list *parents, struct object_id *ret, - const char *author, const char *committer, - const char *sign_commit, - struct commit_extra_header *extra) +static void write_commit_tree(struct strbuf *buffer, const char *msg, size_t msg_len, + const struct object_id *tree, + const struct object_id *parents, size_t parents_len, + const char *author, const char *committer, + struct commit_extra_header *extra) { - int result; int encoding_is_utf8; - struct strbuf buffer; - - assert_oid_type(tree, OBJ_TREE); - - if (memchr(msg, '\0', msg_len)) - return error("a NUL byte in commit log message not allowed."); + size_t i; /* Not having i18n.commitencoding is the same as having utf-8 */ encoding_is_utf8 = is_encoding_utf8(git_commit_encoding); - strbuf_init(&buffer, 8192); /* should avoid reallocs for the headers */ - strbuf_addf(&buffer, "tree %s\n", oid_to_hex(tree)); + strbuf_init(buffer, 8192); /* should avoid reallocs for the headers */ + strbuf_addf(buffer, "tree %s\n", oid_to_hex(tree)); /* * NOTE! This ordering means that the same exact tree merged with a * different order of parents will be a _different_ changeset even * if everything else stays the same. 
*/ - while (parents) { - struct commit *parent = pop_commit(&parents); - strbuf_addf(&buffer, "parent %s\n", - oid_to_hex(&parent->object.oid)); - } + for (i = 0; i < parents_len; i++) + strbuf_addf(buffer, "parent %s\n", oid_to_hex(&parents[i])); /* Person/date information */ if (!author) author = git_author_info(IDENT_STRICT); - strbuf_addf(&buffer, "author %s\n", author); + strbuf_addf(buffer, "author %s\n", author); if (!committer) committer = git_committer_info(IDENT_STRICT); - strbuf_addf(&buffer, "committer %s\n", committer); + strbuf_addf(buffer, "committer %s\n", committer); if (!encoding_is_utf8) - strbuf_addf(&buffer, "encoding %s\n", git_commit_encoding); + strbuf_addf(buffer, "encoding %s\n", git_commit_encoding); while (extra) { - add_extra_header(&buffer, extra); + add_extra_header(buffer, extra); extra = extra->next; } - strbuf_addch(&buffer, '\n'); + strbuf_addch(buffer, '\n'); /* And add the comment */ - strbuf_add(&buffer, msg, msg_len); + strbuf_add(buffer, msg, msg_len); +} - /* And check the encoding */ - if (encoding_is_utf8 && !verify_utf8(&buffer)) - fprintf(stderr, _(commit_utf8_warn)); +int commit_tree_extended(const char *msg, size_t msg_len, + const struct object_id *tree, + struct commit_list *parents, struct object_id *ret, + const char *author, const char *committer, + const char *sign_commit, + struct commit_extra_header *extra) +{ + struct repository *r = the_repository; + int result = 0; + int encoding_is_utf8; + struct strbuf buffer, compat_buffer; + struct strbuf sig = STRBUF_INIT, compat_sig = STRBUF_INIT; + struct object_id *parent_buf = NULL; + struct object_id compat_oid = {}; + size_t i, nparents; + + /* Not having i18n.commitencoding is the same as having utf-8 */ + encoding_is_utf8 = is_encoding_utf8(git_commit_encoding); + + assert_oid_type(tree, OBJ_TREE); + + if (memchr(msg, '\0', msg_len)) + return error("a NUL byte in commit log message not allowed."); + + nparents = commit_list_count(parents); + parent_buf = 
xcalloc(nparents, sizeof(*parent_buf)); + for (i = 0; i < nparents; i++) { + struct commit *parent = pop_commit(&parents); + oidcpy(&parent_buf[i], &parent->object.oid); + } - if (sign_commit && sign_with_header(&buffer, sign_commit)) { + /* should avoid reallocs for the headers */ + strbuf_init(&buffer, 8192); + strbuf_init(&compat_buffer, 8192); + + write_commit_tree(&buffer, msg, msg_len, tree, parent_buf, nparents, author, committer, extra); + if (sign_commit && sign_commit_to_strbuf(&sig, &buffer, sign_commit)) { result = -1; goto out; } + if (r->compat_hash_algo) { + struct object_id mapped_tree; + struct object_id *mapped_parents = xcalloc(nparents, sizeof(*mapped_parents)); + if (repo_oid_to_algop(r, tree, r->compat_hash_algo, &mapped_tree)) { + result = -1; + free(mapped_parents); + goto out; + } + for (i = 0; i < nparents; i++) + if (repo_oid_to_algop(r, &parent_buf[i], r->compat_hash_algo, &mapped_parents[i])) { + result = -1; + free(mapped_parents); + goto out; + } + write_commit_tree(&compat_buffer, msg, msg_len, &mapped_tree, + mapped_parents, nparents, author, committer, extra); + + hash_object_file(r->compat_hash_algo, compat_buffer.buf, compat_buffer.len, + OBJ_COMMIT, &compat_oid); - result = write_object_file(buffer.buf, buffer.len, OBJ_COMMIT, ret); + if (sign_commit && sign_commit_to_strbuf(&compat_sig, &compat_buffer, sign_commit)) { + result = -1; + goto out; + } + } + + if (sign_commit) { + struct sig_pairs { + struct strbuf *sig; + const struct git_hash_algo *algo; + } bufs [2] = { + { &compat_sig, r->compat_hash_algo }, + { &sig, r->hash_algo }, + }; + int i; + + /* + * We write algorithms in the order they were implemented in + * Git to produce a stable hash when multiple algorithms are + * used. + */ + if (r->compat_hash_algo && hash_algo_by_ptr(bufs[0].algo) > hash_algo_by_ptr(bufs[1].algo)) + SWAP(bufs[0], bufs[1]); + + /* + * We traverse each algorithm in order, and apply the signature + * to each buffer. 
+ */ + for (i = 0; i < ARRAY_SIZE(bufs); i++) { + if (!bufs[i].algo) + continue; + add_commit_signature(&buffer, bufs[i].sig, bufs[i].algo); + if (r->compat_hash_algo) + add_commit_signature(&compat_buffer, bufs[i].sig, bufs[i].algo); + } + } + + /* And check the encoding. */ + if (encoding_is_utf8 && (!verify_utf8(&buffer) || !verify_utf8(&compat_buffer))) + fprintf(stderr, _(commit_utf8_warn)); + + result = write_object_file_flags(buffer.buf, buffer.len, OBJ_COMMIT, + ret, &compat_oid, 0); out: strbuf_release(&buffer); + strbuf_release(&compat_buffer); + strbuf_release(&sig); + strbuf_release(&compat_sig); return result; } From patchwork Fri Sep 8 23:10:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. Biederman" X-Patchwork-Id: 13377914 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2AE27EEB571 for ; Fri, 8 Sep 2023 23:31:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345276AbjIHXbD (ORCPT ); Fri, 8 Sep 2023 19:31:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48428 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345284AbjIHXa7 (ORCPT ); Fri, 8 Sep 2023 19:30:59 -0400 Received: from out03.mta.xmission.com (out03.mta.xmission.com [166.70.13.233]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0CA141FE5 for ; Fri, 8 Sep 2023 16:30:40 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:37242) by out03.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdp-00FHIn-IC; Fri, 08 Sep 2023 17:11:57 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by 
in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdo-009u13-JH; Fri, 08 Sep 2023 17:11:57 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. Biederman" Date: Fri, 8 Sep 2023 18:10:32 -0500 Message-Id: <20230908231049.2035003-15-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekdo-009u13-JH;;;mid=<20230908231049.2035003-15-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX19qsKm9mb9BAhBD2V7I5lbTWl/aQUj34I8= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 15/32] cache: add a function to read an OID of a specific algorithm X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: "brian m. carlson" Currently, we always read an object ID of the current algorithm with oidread. However, once we start converting objects, we'll need to consider what happens when we want to read an object ID of a specific algorithm, such as the compatibility algorithm. To make this easier, let's define oidread_algop, which specifies which algorithm we should use for our object ID, and define oidread in terms of it. Signed-off-by: brian m. carlson Signed-off-by: "Eric W. 
Biederman" --- hash.h | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/hash.h b/hash.h index 615ae0691d07..e064807c1733 100644 --- a/hash.h +++ b/hash.h @@ -73,10 +73,15 @@ static inline void oidclr(struct object_id *oid) oid->algo = hash_algo_by_ptr(the_hash_algo); } +static inline void oidread_algop(struct object_id *oid, const unsigned char *hash, const struct git_hash_algo *algop) +{ + memcpy(oid->hash, hash, algop->rawsz); + oid->algo = hash_algo_by_ptr(algop); +} + static inline void oidread(struct object_id *oid, const unsigned char *hash) { - memcpy(oid->hash, hash, the_hash_algo->rawsz); - oid->algo = hash_algo_by_ptr(the_hash_algo); + oidread_algop(oid, hash, the_hash_algo); } static inline int is_empty_blob_sha1(const unsigned char *sha1) From patchwork Fri Sep 8 23:10:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. Biederman" X-Patchwork-Id: 13377906 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 47BB0EEB570 for ; Fri, 8 Sep 2023 23:16:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344953AbjIHXQI (ORCPT ); Fri, 8 Sep 2023 19:16:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58462 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344945AbjIHXQH (ORCPT ); Fri, 8 Sep 2023 19:16:07 -0400 Received: from out01.mta.xmission.com (out01.mta.xmission.com [166.70.13.231]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C6D30133 for ; Fri, 8 Sep 2023 16:16:02 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:34686) by out01.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdr-007QxK-PB; 
Fri, 08 Sep 2023 17:11:59 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdq-009u13-KR; Fri, 08 Sep 2023 17:11:59 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. Biederman" Date: Fri, 8 Sep 2023 18:10:33 -0500 Message-Id: <20230908231049.2035003-16-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekdq-009u13-KR;;;mid=<20230908231049.2035003-16-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX1/wPHPM15oeeW8kgr97bzrp0nz7c5WXJQw= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 16/32] object: Factor out parse_mode out of fast-import and tree-walk into in object.h X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org builtin/fast-import.c and tree-walk.c have almost identical versions of get_mode. The two functions started out the same but have diverged slightly. The version in fast-import changed mode to a uint16_t to save memory. The version in tree-walk started erroring if no mode was present. As far as I can tell both of these changes are valid for both of the callers, so add both changes and place the common parsing helper in object.h. Rename the helper from get_mode to parse_mode so it does not conflict with another helper named get_mode in diff-no-index.c. This will be used shortly in a new helper, decode_tree_entry_raw, which is used to compute compatibility objects as part of the sha256 transition. Signed-off-by: "Eric W. 
Biederman" --- builtin/fast-import.c | 18 ++---------------- object.h | 18 ++++++++++++++++++ tree-walk.c | 22 +++------------------- 3 files changed, 23 insertions(+), 35 deletions(-) diff --git a/builtin/fast-import.c b/builtin/fast-import.c index 4dbb10aff3da..2c645fcfbe3f 100644 --- a/builtin/fast-import.c +++ b/builtin/fast-import.c @@ -1235,20 +1235,6 @@ static void *gfi_unpack_entry( return unpack_entry(the_repository, p, oe->idx.offset, &type, sizep); } -static const char *get_mode(const char *str, uint16_t *modep) -{ - unsigned char c; - uint16_t mode = 0; - - while ((c = *str++) != ' ') { - if (c < '0' || c > '7') - return NULL; - mode = (mode << 3) + (c - '0'); - } - *modep = mode; - return str; -} - static void load_tree(struct tree_entry *root) { struct object_id *oid = &root->versions[1].oid; @@ -1286,7 +1272,7 @@ static void load_tree(struct tree_entry *root) t->entries[t->entry_count++] = e; e->tree = NULL; - c = get_mode(c, &e->versions[1].mode); + c = parse_mode(c, &e->versions[1].mode); if (!c) die("Corrupt mode in %s", oid_to_hex(oid)); e->versions[0].mode = e->versions[1].mode; @@ -2275,7 +2261,7 @@ static void file_change_m(const char *p, struct branch *b) struct object_id oid; uint16_t mode, inline_data = 0; - p = get_mode(p, &mode); + p = parse_mode(p, &mode); if (!p) die("Corrupt mode: %s", command_buf.buf); switch (mode) { diff --git a/object.h b/object.h index 114d45954d08..70c8d4ae63dc 100644 --- a/object.h +++ b/object.h @@ -190,6 +190,24 @@ void *create_object(struct repository *r, const struct object_id *oid, void *obj void *object_as_type(struct object *obj, enum object_type type, int quiet); + +static inline const char *parse_mode(const char *str, uint16_t *modep) +{ + unsigned char c; + unsigned int mode = 0; + + if (*str == ' ') + return NULL; + + while ((c = *str++) != ' ') { + if (c < '0' || c > '7') + return NULL; + mode = (mode << 3) + (c - '0'); + } + *modep = mode; + return str; +} + /* * Returns the object, having parsed it 
to find out what it is. * diff --git a/tree-walk.c b/tree-walk.c index 29ead71be173..3af50a01c2c7 100644 --- a/tree-walk.c +++ b/tree-walk.c @@ -10,27 +10,11 @@ #include "pathspec.h" #include "json-writer.h" -static const char *get_mode(const char *str, unsigned int *modep) -{ - unsigned char c; - unsigned int mode = 0; - - if (*str == ' ') - return NULL; - - while ((c = *str++) != ' ') { - if (c < '0' || c > '7') - return NULL; - mode = (mode << 3) + (c - '0'); - } - *modep = mode; - return str; -} - static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned long size, struct strbuf *err) { const char *path; - unsigned int mode, len; + unsigned int len; + uint16_t mode; const unsigned hashsz = the_hash_algo->rawsz; if (size < hashsz + 3 || buf[size - (hashsz + 1)]) { @@ -38,7 +22,7 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l return -1; } - path = get_mode(buf, &mode); + path = parse_mode(buf, &mode); if (!path) { strbuf_addstr(err, _("malformed mode in tree entry")); return -1; From patchwork Fri Sep 8 23:10:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. 
Biederman" X-Patchwork-Id: 13377922 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 84373EEB573 for ; Fri, 8 Sep 2023 23:31:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345286AbjIHXbt (ORCPT ); Fri, 8 Sep 2023 19:31:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39900 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345277AbjIHXbq (ORCPT ); Fri, 8 Sep 2023 19:31:46 -0400 Received: from out03.mta.xmission.com (out03.mta.xmission.com [166.70.13.233]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 746681FFE for ; Fri, 8 Sep 2023 16:31:39 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:37272) by out03.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdu-00FHJ1-3T; Fri, 08 Sep 2023 17:12:02 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekds-009u13-Ru; Fri, 08 Sep 2023 17:12:01 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W . 
Biederman" Date: Fri, 8 Sep 2023 18:10:34 -0500 Message-Id: <20230908231049.2035003-17-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekds-009u13-Ru;;;mid=<20230908231049.2035003-17-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX19FBbcEN4AXtN6TA+x5luMNeTcnvcLO1vg= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 17/32] object-file-convert: add a function to convert trees between algorithms X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: "brian m. carlson" In the future, we're going to want to provide SHA-256 repositories that have compatibility support for SHA-1 as well. In order to do so, we'll need to be able to convert tree objects from SHA-256 to SHA-1 by writing a tree with each SHA-256 object ID mapped to a SHA-1 object ID. We implement a function, convert_tree_object, that takes an existing tree buffer and writes it to a new strbuf, converting between algorithms. Let's make this function generic, because while we only need it to convert from the main algorithm to the compatibility algorithm now, we may need to do the other way around in the future, such as for transport. We avoid reusing the code in decode_tree_entry because that code normalizes data, and we don't want that here. We want to produce a complete round trip of data, so if, for example, the old entry had a wrongly zero-padded mode, we'd want to preserve that when converting to ensure a stable hash value. 
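As a rough illustration of the round-trip requirement above (not the code in this patch), the sketch below copies each entry's "<mode> <path>\0" prefix byte for byte and swaps only the raw hash that follows it. The names convert_tree_sketch, map_oid_fn and toy_map are hypothetical, the mode is not validated the way parse_mode does, and the toy mapper merely stands in for a real lookup between the two algorithms.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: map one raw object ID to the other algorithm; 0 on success. */
typedef int (*map_oid_fn)(const unsigned char *src, size_t src_len,
			  unsigned char *dst, size_t dst_len);

/*
 * Copy every "<mode> <path>\0" prefix verbatim and replace only the raw
 * hash after it. Returns the number of bytes written, or -1 on malformed
 * input, a failed mapping, or insufficient output space.
 */
static long convert_tree_sketch(const unsigned char *in, size_t in_len,
				size_t from_rawsz, size_t to_rawsz,
				map_oid_fn map,
				unsigned char *out, size_t out_cap)
{
	size_t pos = 0, written = 0;

	assert(in && out && map);

	while (pos < in_len) {
		/* The prefix ends at the NUL that terminates the path. */
		const unsigned char *nul = memchr(in + pos, '\0', in_len - pos);
		size_t prefix_len;

		if (!nul)
			return -1;
		prefix_len = (size_t)(nul - (in + pos)) + 1;
		if (in_len - pos - prefix_len < from_rawsz)
			return -1; /* truncated hash */
		if (out_cap - written < prefix_len + to_rawsz)
			return -1; /* output buffer too small */

		/* Mode and path are never re-encoded, so odd padding survives. */
		memcpy(out + written, in + pos, prefix_len);
		written += prefix_len;

		if (map(in + pos + prefix_len, from_rawsz, out + written, to_rawsz))
			return -1;
		written += to_rawsz;
		pos += prefix_len + from_rawsz;
	}
	return (long)written;
}

/* Toy stand-in for a real mapping table: repeats the first source byte. */
static int toy_map(const unsigned char *src, size_t src_len,
		   unsigned char *dst, size_t dst_len)
{
	if (!src_len)
		return -1;
	memset(dst, src[0], dst_len);
	return 0;
}
```

With a 4-byte source hash and a 2-byte target hash, an entry such as "040000 d\0" followed by four hash bytes comes out as the same nine prefix bytes followed by two mapped bytes, so the zero-padded mode is preserved.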
**** - Removed the repository parameter to convert_tree_object - Removed setting from and to defaults in convert_tree_object - Replaced repo_map_object with oid_to_algop - Replaced get_mode with parse_mode - Made convert_tree_object static. - Called convert_tree_object from convert_object_file. -- EWB Signed-off-by: brian m. carlson Signed-off-by: Eric W. Biederman --- object-file-convert.c | 51 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 50 insertions(+), 1 deletion(-) diff --git a/object-file-convert.c b/object-file-convert.c index e7c62434016d..f266c8c6cc95 100644 --- a/object-file-convert.c +++ b/object-file-convert.c @@ -1,8 +1,10 @@ #include "git-compat-util.h" #include "gettext.h" #include "strbuf.h" +#include "hex.h" #include "repository.h" #include "hash-ll.h" +#include "hash.h" #include "object.h" #include "loose.h" #include "object-file-convert.h" @@ -36,6 +38,51 @@ int repo_oid_to_algop(struct repository *repo, const struct object_id *src, return 0; } +static int decode_tree_entry_raw(struct object_id *oid, const char **path, + size_t *len, const struct git_hash_algo *algo, + const char *buf, unsigned long size) +{ + uint16_t mode; + const unsigned hashsz = algo->rawsz; + + if (size < hashsz + 3 || buf[size - (hashsz + 1)]) { + return -1; + } + + *path = parse_mode(buf, &mode); + if (!*path || !**path) + return -1; + *len = strlen(*path) + 1; + + oidread_algop(oid, (const unsigned char *)*path + *len, algo); + return 0; +} + +static int convert_tree_object(struct strbuf *out, + const struct git_hash_algo *from, + const struct git_hash_algo *to, + const char *buffer, size_t size) +{ + const char *p = buffer, *end = buffer + size; + + while (p < end) { + struct object_id entry_oid, mapped_oid; + const char *path = NULL; + size_t pathlen; + + if (decode_tree_entry_raw(&entry_oid, &path, &pathlen, from, p, + end - p)) + return error(_("failed to decode tree entry")); + if (repo_oid_to_algop(the_repository, &entry_oid, to, &mapped_oid)) + 
return error(_("failed to map tree entry for %s"), oid_to_hex(&entry_oid)); + strbuf_add(out, p, path - p); + strbuf_add(out, path, pathlen); + strbuf_add(out, mapped_oid.hash, to->rawsz); + p = path + pathlen + from->rawsz; + } + return 0; +} + int convert_object_file(struct strbuf *outbuf, const struct git_hash_algo *from, const struct git_hash_algo *to, @@ -50,8 +97,10 @@ int convert_object_file(struct strbuf *outbuf, die("Refusing noop object file conversion"); switch (type) { - case OBJ_COMMIT: case OBJ_TREE: + ret = convert_tree_object(outbuf, from, to, buf, len); + break; + case OBJ_COMMIT: case OBJ_TAG: default: /* Not implemented yet, so fail. */ From patchwork Fri Sep 8 23:10:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. Biederman" X-Patchwork-Id: 13377921 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D3B67EEB571 for ; Fri, 8 Sep 2023 23:31:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345283AbjIHXbr (ORCPT ); Fri, 8 Sep 2023 19:31:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39870 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343698AbjIHXbl (ORCPT ); Fri, 8 Sep 2023 19:31:41 -0400 Received: from out03.mta.xmission.com (out03.mta.xmission.com [166.70.13.233]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CDA0E2122 for ; Fri, 8 Sep 2023 16:31:33 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:37304) by out03.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdw-00FHJA-Ik; Fri, 08 Sep 2023 17:12:04 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 
helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdv-009u13-5r; Fri, 08 Sep 2023 17:12:03 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W . Biederman" Date: Fri, 8 Sep 2023 18:10:35 -0500 Message-Id: <20230908231049.2035003-18-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekdv-009u13-5r;;;mid=<20230908231049.2035003-18-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX19w69wNEceLJoODGas/yjuMTHX1rXQzKsk= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 18/32] object-file-convert: convert commit objects when writing X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: "brian m. carlson" When writing a commit object in a repository with both SHA-1 and SHA-256, we'll need to convert our commit objects so that we can write the hash values for both into the repository. To do so, let's add a function to convert commit objects. Read the commit object and map the tree value and any of the parent values, and copy the rest of the commit through unmodified. Note that we don't need to modify the signature headers, because they are the same under both algorithms. **** - made static and moved to object-file-convert.c - Renamed the variable compat_oid to mapped_oid for clarity - Replaced repo_map_object with oid_to_algop -- EWB Signed-off-by: brian m. carlson Signed-off-by: Eric W. 
Biederman --- object-file-convert.c | 44 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+) diff --git a/object-file-convert.c b/object-file-convert.c index f266c8c6cc95..9c715a9864d5 100644 --- a/object-file-convert.c +++ b/object-file-convert.c @@ -83,6 +83,48 @@ static int convert_tree_object(struct strbuf *out, return 0; } +static int convert_commit_object(struct strbuf *out, + const struct git_hash_algo *from, + const struct git_hash_algo *to, + const char *buffer, size_t size) +{ + const char *tail = buffer; + const char *bufptr = buffer; + const int tree_entry_len = from->hexsz + 5; + const int parent_entry_len = from->hexsz + 7; + struct object_id oid, mapped_oid; + const char *p; + + tail += size; + if (tail <= bufptr + tree_entry_len + 1 || memcmp(bufptr, "tree ", 5) || + bufptr[tree_entry_len] != '\n') + return error("bogus commit object"); + if (parse_oid_hex_algop(bufptr + 5, &oid, &p, from) < 0) + return error("bad tree pointer"); + + if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid)) + return error("unable to map tree %s in commit object", + oid_to_hex(&oid)); + strbuf_addf(out, "tree %s\n", oid_to_hex(&mapped_oid)); + bufptr = p + 1; + + while (bufptr + parent_entry_len < tail && !memcmp(bufptr, "parent ", 7)) { + if (tail <= bufptr + parent_entry_len + 1 || + parse_oid_hex_algop(bufptr + 7, &oid, &p, from) || + *p != '\n') + return error("bad parents in commit"); + + if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid)) + return error("unable to map parent %s in commit object", + oid_to_hex(&oid)); + + strbuf_addf(out, "parent %s\n", oid_to_hex(&mapped_oid)); + bufptr = p + 1; + } + strbuf_add(out, bufptr, tail - bufptr); + return 0; +} + int convert_object_file(struct strbuf *outbuf, const struct git_hash_algo *from, const struct git_hash_algo *to, @@ -101,6 +143,8 @@ int convert_object_file(struct strbuf *outbuf, ret = convert_tree_object(outbuf, from, to, buf, len); break; case OBJ_COMMIT: + ret = 
convert_commit_object(outbuf, from, to, buf, len); + break; case OBJ_TAG: default: /* Not implemented yet, so fail. */ From patchwork Fri Sep 8 23:10:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. Biederman" X-Patchwork-Id: 13377899 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 11EFBEEB56E for ; Fri, 8 Sep 2023 23:12:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344516AbjIHXMU (ORCPT ); Fri, 8 Sep 2023 19:12:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60316 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344177AbjIHXMT (ORCPT ); Fri, 8 Sep 2023 19:12:19 -0400 Received: from out02.mta.xmission.com (out02.mta.xmission.com [166.70.13.232]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 904491FF5 for ; Fri, 8 Sep 2023 16:12:07 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:39630) by out02.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdy-006MpA-PX; Fri, 08 Sep 2023 17:12:06 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdx-009u13-L5; Fri, 08 Sep 2023 17:12:06 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W . 
Biederman" Date: Fri, 8 Sep 2023 18:10:36 -0500 Message-Id: <20230908231049.2035003-19-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekdx-009u13-L5;;;mid=<20230908231049.2035003-19-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX19Ckvsc6tmnaFuCKseftF4jBaOl9gyVK8A= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 19/32] object-file-convert: convert tag commits when writing X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: "brian m. carlson" When writing a tag object in a repository with both SHA-1 and SHA-256, we'll need to convert our tag objects so that we can write the hash values for both into the repository. To do so, let's add a function to convert tag objects. Note that signatures for tag objects in the current algorithm trail the message, and those for the alternate algorithm are in headers. Therefore, we parse the tag object for both a trailing signature and a header and then, when writing the other format, swap the two around. We expose the add_commit_signature function, which we rename to add_header_signature now that it is useful for tags as well, and use it to add the header. **** - Moved convert_tag_object into object-file-convert.c and made it static - Adjusted how convert_object_file calls convert_tag_object --EWB Signed-off-by: brian m. carlson Signed-off-by: Eric W. 
Biederman --- commit.c | 6 +++--- commit.h | 1 + object-file-convert.c | 50 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 54 insertions(+), 3 deletions(-) diff --git a/commit.c b/commit.c index 522ebb4b3002..54f19ed0328c 100644 --- a/commit.c +++ b/commit.c @@ -1101,7 +1101,7 @@ static const char *gpg_sig_headers[] = { "gpgsig-sha256", }; -static int add_commit_signature(struct strbuf *buf, struct strbuf *sig, const struct git_hash_algo *algo) +int add_header_signature(struct strbuf *buf, struct strbuf *sig, const struct git_hash_algo *algo) { int inspos, copypos; const char *eoh; @@ -1732,9 +1732,9 @@ int commit_tree_extended(const char *msg, size_t msg_len, for (i = 0; i < ARRAY_SIZE(bufs); i++) { if (!bufs[i].algo) continue; - add_commit_signature(&buffer, bufs[i].sig, bufs[i].algo); + add_header_signature(&buffer, bufs[i].sig, bufs[i].algo); if (r->compat_hash_algo) - add_commit_signature(&compat_buffer, bufs[i].sig, bufs[i].algo); + add_header_signature(&compat_buffer, bufs[i].sig, bufs[i].algo); } } diff --git a/commit.h b/commit.h index 28928833c544..03edcec0129f 100644 --- a/commit.h +++ b/commit.h @@ -370,5 +370,6 @@ int parse_buffer_signed_by_header(const char *buffer, struct strbuf *payload, struct strbuf *signature, const struct git_hash_algo *algop); +int add_header_signature(struct strbuf *buf, struct strbuf *sig, const struct git_hash_algo *algo); #endif /* COMMIT_H */ diff --git a/object-file-convert.c b/object-file-convert.c index 9c715a9864d5..d381d3d2ea65 100644 --- a/object-file-convert.c +++ b/object-file-convert.c @@ -7,6 +7,8 @@ #include "hash.h" #include "object.h" #include "loose.h" +#include "commit.h" +#include "gpg-interface.h" #include "object-file-convert.h" int repo_oid_to_algop(struct repository *repo, const struct object_id *src, @@ -125,6 +127,52 @@ static int convert_commit_object(struct strbuf *out, return 0; } +static int convert_tag_object(struct strbuf *out, + const struct git_hash_algo *from, + const struct 
git_hash_algo *to, + const char *buffer, size_t size) +{ + struct strbuf payload = STRBUF_INIT, temp = STRBUF_INIT, oursig = STRBUF_INIT, othersig = STRBUF_INIT; + size_t payload_size; + struct object_id oid, mapped_oid; + const char *p; + + /* Add some slop for longer signature header in the new algorithm. */ + strbuf_grow(out, size + 7); + + /* Is there a signature for our algorithm? */ + payload_size = parse_signed_buffer(buffer, size); + strbuf_add(&payload, buffer, payload_size); + if (payload_size != size) { + /* Yes, there is. */ + strbuf_add(&oursig, buffer + payload_size, size - payload_size); + } + /* Now, is there a signature for the other algorithm? */ + if (parse_buffer_signed_by_header(payload.buf, payload.len, &temp, &othersig, to)) { + /* Yes, there is. */ + strbuf_swap(&payload, &temp); + strbuf_release(&temp); + } + + /* + * Our payload is now in payload and we may have up to two signatures + * in oursig and othersig. + */ + if (strncmp(payload.buf, "object ", 7) || payload.buf[from->hexsz + 7] != '\n') + return error("bogus tag object"); + if (parse_oid_hex_algop(payload.buf + 7, &oid, &p, from) < 0) + return error("bad tag object ID"); + if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid)) + return error("unable to map tree %s in tag object", + oid_to_hex(&oid)); + strbuf_addf(out, "object %s\n", oid_to_hex(&mapped_oid)); + strbuf_add(out, p, payload.len - (p - payload.buf)); + strbuf_addbuf(out, &othersig); + if (oursig.len) + add_header_signature(out, &oursig, from); + return 0; +} + int convert_object_file(struct strbuf *outbuf, const struct git_hash_algo *from, const struct git_hash_algo *to, @@ -146,6 +194,8 @@ int convert_object_file(struct strbuf *outbuf, ret = convert_commit_object(outbuf, from, to, buf, len); break; case OBJ_TAG: + ret = convert_tag_object(outbuf, from, to, buf, len); + break; default: /* Not implemented yet, so fail. 
*/ ret = -1; From patchwork Fri Sep 8 23:10:37 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. Biederman" X-Patchwork-Id: 13377900 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A38A8EEB570 for ; Fri, 8 Sep 2023 23:12:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344211AbjIHXMV (ORCPT ); Fri, 8 Sep 2023 19:12:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60296 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344196AbjIHXMT (ORCPT ); Fri, 8 Sep 2023 19:12:19 -0400 Received: from out02.mta.xmission.com (out02.mta.xmission.com [166.70.13.232]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8B0BF133 for ; Fri, 8 Sep 2023 16:12:09 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:39650) by out02.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qeke0-006MqE-NR; Fri, 08 Sep 2023 17:12:08 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekdz-009u13-S0; Fri, 08 Sep 2023 17:12:08 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. 
Biederman" Date: Fri, 8 Sep 2023 18:10:37 -0500 Message-Id: <20230908231049.2035003-20-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekdz-009u13-S0;;;mid=<20230908231049.2035003-20-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX1+RJWlS2GgRzOCq66q0KRWdzt652tQUF2c= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 20/32] builtin/cat-file: Let the oid determine the output algorithm X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Use GET_OID_UNTRANSLATED when calling get_oid_with_context. This implements the semi-obvious behaviour that specifying a sha1 oid shows the output for a sha1 encoded object, and specifying a sha256 oid shows the output for a sha256 encoded object. This is useful for testing the conversion of an object to an equivalent object encoded with a different hash function. Signed-off-by: "Eric W. 
Biederman" --- builtin/cat-file.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 694c8538df2f..7c9600292376 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -107,7 +107,10 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name, struct object_info oi = OBJECT_INFO_INIT; struct strbuf sb = STRBUF_INIT; unsigned flags = OBJECT_INFO_LOOKUP_REPLACE; - unsigned get_oid_flags = GET_OID_RECORD_PATH | GET_OID_ONLY_TO_DIE; + unsigned get_oid_flags = + GET_OID_RECORD_PATH | + GET_OID_ONLY_TO_DIE | + GET_OID_UNTRANSLATED; const char *path = force_path; const int opt_cw = (opt == 'c' || opt == 'w'); if (!path && opt_cw) @@ -223,7 +226,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name, &size); const char *target; if (!skip_prefix(buffer, "object ", &target) || - get_oid_hex(target, &blob_oid)) + get_oid_hex_algop(target, &blob_oid, + &hash_algos[oid.algo])) die("%s not a valid tag", oid_to_hex(&oid)); free(buffer); } else From patchwork Fri Sep 8 23:10:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. 
Biederman" X-Patchwork-Id: 13377919 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1B81CEEB570 for ; Fri, 8 Sep 2023 23:31:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239379AbjIHXbj (ORCPT ); Fri, 8 Sep 2023 19:31:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49160 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245341AbjIHXbh (ORCPT ); Fri, 8 Sep 2023 19:31:37 -0400 Received: from out03.mta.xmission.com (out03.mta.xmission.com [166.70.13.233]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 54FE51FFE for ; Fri, 8 Sep 2023 16:31:25 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:37380) by out03.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qeke4-00FHJZ-FO; Fri, 08 Sep 2023 17:12:12 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qeke1-009u13-Pu; Fri, 08 Sep 2023 17:12:12 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. 
Biederman" Date: Fri, 8 Sep 2023 18:10:38 -0500 Message-Id: <20230908231049.2035003-21-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qeke1-009u13-Pu;;;mid=<20230908231049.2035003-21-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX18oCVTiYH9Zq+p/uD5FthA0m2048PlADPY= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 21/32] tree-walk: init_tree_desc take an oid to get the hash algorithm X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org To make it possible for git ls-tree to display the tree encoded in the hash algorithm of the oid specified to git ls-tree, update init_tree_desc to take as a parameter the oid of the tree object. Update all callers of init_tree_desc and init_tree_desc_gently to pass the oid of the tree object. Use the oid of the tree object to discover the hash algorithm of the oid and store that hash algorithm in struct tree_desc. Use the hash algorithm in decode_tree_entry and update_tree_entry_internal to handle reading a tree object encoded in a hash algorithm that differs from the repository's hash algorithm. Signed-off-by: "Eric W. 
Biederman" --- archive.c | 3 ++- builtin/am.c | 6 +++--- builtin/checkout.c | 8 +++++--- builtin/clone.c | 2 +- builtin/commit.c | 2 +- builtin/grep.c | 8 ++++---- builtin/merge.c | 3 ++- builtin/pack-objects.c | 6 ++++-- builtin/read-tree.c | 2 +- builtin/stash.c | 5 +++-- cache-tree.c | 2 +- delta-islands.c | 2 +- diff-lib.c | 2 +- fsck.c | 6 ++++-- http-push.c | 2 +- list-objects.c | 2 +- match-trees.c | 4 ++-- merge-ort.c | 11 ++++++----- merge-recursive.c | 2 +- merge.c | 3 ++- pack-bitmap-write.c | 2 +- packfile.c | 3 ++- reflog.c | 2 +- revision.c | 4 ++-- tree-walk.c | 36 +++++++++++++++++++++--------------- tree-walk.h | 7 +++++-- tree.c | 2 +- walker.c | 2 +- 28 files changed, 80 insertions(+), 59 deletions(-) diff --git a/archive.c b/archive.c index ca11db185b15..b10269aee7be 100644 --- a/archive.c +++ b/archive.c @@ -339,7 +339,8 @@ int write_archive_entries(struct archiver_args *args, opts.src_index = args->repo->index; opts.dst_index = args->repo->index; opts.fn = oneway_merge; - init_tree_desc(&t, args->tree->buffer, args->tree->size); + init_tree_desc(&t, &args->tree->object.oid, + args->tree->buffer, args->tree->size); if (unpack_trees(1, &t, &opts)) return -1; git_attr_set_direction(GIT_ATTR_INDEX); diff --git a/builtin/am.c b/builtin/am.c index 8bde034fae68..4dfd714b910e 100644 --- a/builtin/am.c +++ b/builtin/am.c @@ -1991,8 +1991,8 @@ static int fast_forward_to(struct tree *head, struct tree *remote, int reset) opts.reset = reset ? 
UNPACK_RESET_PROTECT_UNTRACKED : 0; opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */ opts.fn = twoway_merge; - init_tree_desc(&t[0], head->buffer, head->size); - init_tree_desc(&t[1], remote->buffer, remote->size); + init_tree_desc(&t[0], &head->object.oid, head->buffer, head->size); + init_tree_desc(&t[1], &remote->object.oid, remote->buffer, remote->size); if (unpack_trees(2, t, &opts)) { rollback_lock_file(&lock_file); @@ -2026,7 +2026,7 @@ static int merge_tree(struct tree *tree) opts.dst_index = &the_index; opts.merge = 1; opts.fn = oneway_merge; - init_tree_desc(&t[0], tree->buffer, tree->size); + init_tree_desc(&t[0], &tree->object.oid, tree->buffer, tree->size); if (unpack_trees(1, t, &opts)) { rollback_lock_file(&lock_file); diff --git a/builtin/checkout.c b/builtin/checkout.c index f53612f46870..03eff73fd031 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -701,7 +701,7 @@ static int reset_tree(struct tree *tree, const struct checkout_opts *o, info->commit ? 
&info->commit->object.oid : null_oid(), NULL); parse_tree(tree); - init_tree_desc(&tree_desc, tree->buffer, tree->size); + init_tree_desc(&tree_desc, &tree->object.oid, tree->buffer, tree->size); switch (unpack_trees(1, &tree_desc, &opts)) { case -2: *writeout_error = 1; @@ -815,10 +815,12 @@ static int merge_working_tree(const struct checkout_opts *opts, die(_("unable to parse commit %s"), oid_to_hex(old_commit_oid)); - init_tree_desc(&trees[0], tree->buffer, tree->size); + init_tree_desc(&trees[0], &tree->object.oid, + tree->buffer, tree->size); parse_tree(new_tree); tree = new_tree; - init_tree_desc(&trees[1], tree->buffer, tree->size); + init_tree_desc(&trees[1], &tree->object.oid, + tree->buffer, tree->size); ret = unpack_trees(2, trees, &topts); clear_unpack_trees_porcelain(&topts); diff --git a/builtin/clone.c b/builtin/clone.c index c6357af94989..79ceefb93995 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -737,7 +737,7 @@ static int checkout(int submodule_progress, int filter_submodules) if (!tree) die(_("unable to parse commit %s"), oid_to_hex(&oid)); parse_tree(tree); - init_tree_desc(&t, tree->buffer, tree->size); + init_tree_desc(&t, &tree->object.oid, tree->buffer, tree->size); if (unpack_trees(1, &t, &opts) < 0) die(_("unable to checkout working tree")); diff --git a/builtin/commit.c b/builtin/commit.c index 7da5f924484d..537319932b65 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -340,7 +340,7 @@ static void create_base_index(const struct commit *current_head) if (!tree) die(_("failed to unpack HEAD tree object")); parse_tree(tree); - init_tree_desc(&t, tree->buffer, tree->size); + init_tree_desc(&t, &tree->object.oid, tree->buffer, tree->size); if (unpack_trees(1, &t, &opts)) exit(128); /* We've already reported the error, finish dying */ } diff --git a/builtin/grep.c b/builtin/grep.c index 50e712a18479..0c2b8a376f8e 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -530,7 +530,7 @@ static int grep_submodule(struct grep_opt *opt, 
strbuf_addstr(&base, filename); strbuf_addch(&base, '/'); - init_tree_desc(&tree, data, size); + init_tree_desc(&tree, oid, data, size); hit = grep_tree(&subopt, pathspec, &tree, &base, base.len, object_type == OBJ_COMMIT); strbuf_release(&base); @@ -574,7 +574,7 @@ static int grep_cache(struct grep_opt *opt, data = repo_read_object_file(the_repository, &ce->oid, &type, &size); - init_tree_desc(&tree, data, size); + init_tree_desc(&tree, &ce->oid, data, size); hit |= grep_tree(opt, pathspec, &tree, &name, 0, 0); strbuf_setlen(&name, name_base_len); @@ -670,7 +670,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, oid_to_hex(&entry.oid)); strbuf_addch(base, '/'); - init_tree_desc(&sub, data, size); + init_tree_desc(&sub, &entry.oid, data, size); hit |= grep_tree(opt, pathspec, &sub, base, tn_len, check_attr); free(data); @@ -714,7 +714,7 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec, strbuf_add(&base, name, len); strbuf_addch(&base, ':'); } - init_tree_desc(&tree, data, size); + init_tree_desc(&tree, &obj->oid, data, size); hit = grep_tree(opt, pathspec, &tree, &base, base.len, obj->type == OBJ_COMMIT); strbuf_release(&base); diff --git a/builtin/merge.c b/builtin/merge.c index de68910177fb..718165d45917 100644 --- a/builtin/merge.c +++ b/builtin/merge.c @@ -704,7 +704,8 @@ static int read_tree_trivial(struct object_id *common, struct object_id *head, cache_tree_free(&the_index.cache_tree); for (i = 0; i < nr_trees; i++) { parse_tree(trees[i]); - init_tree_desc(t+i, trees[i]->buffer, trees[i]->size); + init_tree_desc(t+i, &trees[i]->object.oid, + trees[i]->buffer, trees[i]->size); } if (unpack_trees(nr_trees, t, &opts)) return -1; diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index d2a162d52804..d34902002656 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1756,7 +1756,8 @@ static void add_pbase_object(struct tree_desc *tree, tree = pbase_tree_get(&entry.oid); if 
(!tree) return; - init_tree_desc(&sub, tree->tree_data, tree->tree_size); + init_tree_desc(&sub, &tree->oid, + tree->tree_data, tree->tree_size); add_pbase_object(&sub, down, downlen, fullname); pbase_tree_put(tree); @@ -1816,7 +1817,8 @@ static void add_preferred_base_object(const char *name) } else { struct tree_desc tree; - init_tree_desc(&tree, it->pcache.tree_data, it->pcache.tree_size); + init_tree_desc(&tree, &it->pcache.oid, + it->pcache.tree_data, it->pcache.tree_size); add_pbase_object(&tree, name, cmplen, name); } } diff --git a/builtin/read-tree.c b/builtin/read-tree.c index 1fec702a04fa..24d6d156d3a2 100644 --- a/builtin/read-tree.c +++ b/builtin/read-tree.c @@ -264,7 +264,7 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix) for (i = 0; i < nr_trees; i++) { struct tree *tree = trees[i]; parse_tree(tree); - init_tree_desc(t+i, tree->buffer, tree->size); + init_tree_desc(t+i, &tree->object.oid, tree->buffer, tree->size); } if (unpack_trees(nr_trees, t, &opts)) return 128; diff --git a/builtin/stash.c b/builtin/stash.c index fe64cde9ce30..9ee52af4d28e 100644 --- a/builtin/stash.c +++ b/builtin/stash.c @@ -285,7 +285,7 @@ static int reset_tree(struct object_id *i_tree, int update, int reset) if (parse_tree(tree)) return -1; - init_tree_desc(t, tree->buffer, tree->size); + init_tree_desc(t, &tree->object.oid, tree->buffer, tree->size); opts.head_idx = 1; opts.src_index = &the_index; @@ -871,7 +871,8 @@ static void diff_include_untracked(const struct stash_info *info, struct diff_op tree[i] = parse_tree_indirect(oid[i]); if (parse_tree(tree[i]) < 0) die(_("failed to parse tree")); - init_tree_desc(&tree_desc[i], tree[i]->buffer, tree[i]->size); + init_tree_desc(&tree_desc[i], &tree[i]->object.oid, + tree[i]->buffer, tree[i]->size); } unpack_tree_opt.head_idx = -1; diff --git a/cache-tree.c b/cache-tree.c index ddc7d3d86959..334973a01cee 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -770,7 +770,7 @@ static void 
prime_cache_tree_rec(struct repository *r, oidcpy(&it->oid, &tree->object.oid); - init_tree_desc(&desc, tree->buffer, tree->size); + init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size); cnt = 0; while (tree_entry(&desc, &entry)) { if (!S_ISDIR(entry.mode)) diff --git a/delta-islands.c b/delta-islands.c index 5de5759f3f13..1ff3506b10f2 100644 --- a/delta-islands.c +++ b/delta-islands.c @@ -289,7 +289,7 @@ void resolve_tree_islands(struct repository *r, if (!tree || parse_tree(tree) < 0) die(_("bad tree object %s"), oid_to_hex(&ent->idx.oid)); - init_tree_desc(&desc, tree->buffer, tree->size); + init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size); while (tree_entry(&desc, &entry)) { struct object *obj; diff --git a/diff-lib.c b/diff-lib.c index 6b0c6a7180cc..add323f5628d 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -558,7 +558,7 @@ static int diff_cache(struct rev_info *revs, opts.pathspec = &revs->diffopt.pathspec; opts.pathspec->recursive = 1; - init_tree_desc(&t, tree->buffer, tree->size); + init_tree_desc(&t, &tree->object.oid, tree->buffer, tree->size); return unpack_trees(1, &t, &opts); } diff --git a/fsck.c b/fsck.c index 2b1e348005b7..6b492a48da82 100644 --- a/fsck.c +++ b/fsck.c @@ -313,7 +313,8 @@ static int fsck_walk_tree(struct tree *tree, void *data, struct fsck_options *op return -1; name = fsck_get_object_name(options, &tree->object.oid); - if (init_tree_desc_gently(&desc, tree->buffer, tree->size, 0)) + if (init_tree_desc_gently(&desc, &tree->object.oid, + tree->buffer, tree->size, 0)) return -1; while (tree_entry_gently(&desc, &entry)) { struct object *obj; @@ -583,7 +584,8 @@ static int fsck_tree(const struct object_id *tree_oid, const char *o_name; struct name_stack df_dup_candidates = { NULL }; - if (init_tree_desc_gently(&desc, buffer, size, TREE_DESC_RAW_MODES)) { + if (init_tree_desc_gently(&desc, tree_oid, buffer, size, + TREE_DESC_RAW_MODES)) { retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_TREE, 
"cannot be parsed as a tree"); diff --git a/http-push.c b/http-push.c index a704f490fdb2..81c35b5e96f7 100644 --- a/http-push.c +++ b/http-push.c @@ -1308,7 +1308,7 @@ static struct object_list **process_tree(struct tree *tree, obj->flags |= SEEN; p = add_one_object(obj, p); - init_tree_desc(&desc, tree->buffer, tree->size); + init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size); while (tree_entry(&desc, &entry)) switch (object_type(entry.mode)) { diff --git a/list-objects.c b/list-objects.c index e60a6cd5b46e..312335c8a7f2 100644 --- a/list-objects.c +++ b/list-objects.c @@ -97,7 +97,7 @@ static void process_tree_contents(struct traversal_context *ctx, enum interesting match = ctx->revs->diffopt.pathspec.nr == 0 ? all_entries_interesting : entry_not_interesting; - init_tree_desc(&desc, tree->buffer, tree->size); + init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size); while (tree_entry(&desc, &entry)) { if (match != all_entries_interesting) { diff --git a/match-trees.c b/match-trees.c index 0885ac681cd5..3412b6a1401d 100644 --- a/match-trees.c +++ b/match-trees.c @@ -63,7 +63,7 @@ static void *fill_tree_desc_strict(struct tree_desc *desc, die("unable to read tree (%s)", oid_to_hex(hash)); if (type != OBJ_TREE) die("%s is not a tree", oid_to_hex(hash)); - init_tree_desc(desc, buffer, size); + init_tree_desc(desc, hash, buffer, size); return buffer; } @@ -194,7 +194,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix, buf = repo_read_object_file(the_repository, oid1, &type, &sz); if (!buf) die("cannot read tree %s", oid_to_hex(oid1)); - init_tree_desc(&desc, buf, sz); + init_tree_desc(&desc, oid1, buf, sz); rewrite_here = NULL; while (desc.size) { diff --git a/merge-ort.c b/merge-ort.c index 8631c997002d..3a5729c91e48 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -1679,9 +1679,10 @@ static int collect_merge_info(struct merge_options *opt, parse_tree(merge_base); parse_tree(side1); parse_tree(side2); - 
init_tree_desc(t + 0, merge_base->buffer, merge_base->size); - init_tree_desc(t + 1, side1->buffer, side1->size); - init_tree_desc(t + 2, side2->buffer, side2->size); + init_tree_desc(t + 0, &merge_base->object.oid, + merge_base->buffer, merge_base->size); + init_tree_desc(t + 1, &side1->object.oid, side1->buffer, side1->size); + init_tree_desc(t + 2, &side2->object.oid, side2->buffer, side2->size); trace2_region_enter("merge", "traverse_trees", opt->repo); ret = traverse_trees(NULL, 3, t, &info); @@ -4400,9 +4401,9 @@ static int checkout(struct merge_options *opt, unpack_opts.fn = twoway_merge; unpack_opts.preserve_ignored = 0; /* FIXME: !opts->overwrite_ignore */ parse_tree(prev); - init_tree_desc(&trees[0], prev->buffer, prev->size); + init_tree_desc(&trees[0], &prev->object.oid, prev->buffer, prev->size); parse_tree(next); - init_tree_desc(&trees[1], next->buffer, next->size); + init_tree_desc(&trees[1], &next->object.oid, next->buffer, next->size); ret = unpack_trees(2, trees, &unpack_opts); clear_unpack_trees_porcelain(&unpack_opts); diff --git a/merge-recursive.c b/merge-recursive.c index 6a4081bb0f52..93df9eecdd95 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -411,7 +411,7 @@ static inline int merge_detect_rename(struct merge_options *opt) static void init_tree_desc_from_tree(struct tree_desc *desc, struct tree *tree) { parse_tree(tree); - init_tree_desc(desc, tree->buffer, tree->size); + init_tree_desc(desc, &tree->object.oid, tree->buffer, tree->size); } static int unpack_trees_start(struct merge_options *opt, diff --git a/merge.c b/merge.c index b60925459c29..86179c34102d 100644 --- a/merge.c +++ b/merge.c @@ -81,7 +81,8 @@ int checkout_fast_forward(struct repository *r, } for (i = 0; i < nr_trees; i++) { parse_tree(trees[i]); - init_tree_desc(t+i, trees[i]->buffer, trees[i]->size); + init_tree_desc(t+i, &trees[i]->object.oid, + trees[i]->buffer, trees[i]->size); } memset(&opts, 0, sizeof(opts)); diff --git a/pack-bitmap-write.c 
b/pack-bitmap-write.c index f6757c3cbf20..9211e08f0127 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -366,7 +366,7 @@ static int fill_bitmap_tree(struct bitmap *bitmap, if (parse_tree(tree) < 0) die("unable to load tree object %s", oid_to_hex(&tree->object.oid)); - init_tree_desc(&desc, tree->buffer, tree->size); + init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size); while (tree_entry(&desc, &entry)) { switch (object_type(entry.mode)) { diff --git a/packfile.c b/packfile.c index 9cc0a2e37a83..1fae0fcdd9e7 100644 --- a/packfile.c +++ b/packfile.c @@ -2250,7 +2250,8 @@ static int add_promisor_object(const struct object_id *oid, struct tree *tree = (struct tree *)obj; struct tree_desc desc; struct name_entry entry; - if (init_tree_desc_gently(&desc, tree->buffer, tree->size, 0)) + if (init_tree_desc_gently(&desc, &tree->object.oid, + tree->buffer, tree->size, 0)) /* * Error messages are given when packs are * verified, so do not print any here. diff --git a/reflog.c b/reflog.c index 9ad50e7d93e4..c6992a19268f 100644 --- a/reflog.c +++ b/reflog.c @@ -40,7 +40,7 @@ static int tree_is_complete(const struct object_id *oid) tree->buffer = data; tree->size = size; } - init_tree_desc(&desc, tree->buffer, tree->size); + init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size); complete = 1; while (tree_entry(&desc, &entry)) { if (!repo_has_object_file(the_repository, &entry.oid) || diff --git a/revision.c b/revision.c index 2f4c53ea207b..a60dfc23a2a5 100644 --- a/revision.c +++ b/revision.c @@ -82,7 +82,7 @@ static void mark_tree_contents_uninteresting(struct repository *r, if (parse_tree_gently(tree, 1) < 0) return; - init_tree_desc(&desc, tree->buffer, tree->size); + init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size); while (tree_entry(&desc, &entry)) { switch (object_type(entry.mode)) { case OBJ_TREE: @@ -189,7 +189,7 @@ static void add_children_by_path(struct repository *r, if (parse_tree_gently(tree, 1) < 0) 
return; - init_tree_desc(&desc, tree->buffer, tree->size); + init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size); while (tree_entry(&desc, &entry)) { switch (object_type(entry.mode)) { case OBJ_TREE: diff --git a/tree-walk.c b/tree-walk.c index 3af50a01c2c7..0b44ec7c75ff 100644 --- a/tree-walk.c +++ b/tree-walk.c @@ -15,7 +15,7 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l const char *path; unsigned int len; uint16_t mode; - const unsigned hashsz = the_hash_algo->rawsz; + const unsigned hashsz = desc->algo->rawsz; if (size < hashsz + 3 || buf[size - (hashsz + 1)]) { strbuf_addstr(err, _("too-short tree object")); @@ -37,15 +37,19 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l desc->entry.path = path; desc->entry.mode = (desc->flags & TREE_DESC_RAW_MODES) ? mode : canon_mode(mode); desc->entry.pathlen = len - 1; - oidread(&desc->entry.oid, (const unsigned char *)path + len); + oidread_algop(&desc->entry.oid, (const unsigned char *)path + len, + desc->algo); return 0; } -static int init_tree_desc_internal(struct tree_desc *desc, const void *buffer, - unsigned long size, struct strbuf *err, +static int init_tree_desc_internal(struct tree_desc *desc, + const struct object_id *oid, + const void *buffer, unsigned long size, + struct strbuf *err, enum tree_desc_flags flags) { + desc->algo = (oid && oid->algo) ? 
&hash_algos[oid->algo] : the_hash_algo; desc->buffer = buffer; desc->size = size; desc->flags = flags; @@ -54,19 +58,21 @@ static int init_tree_desc_internal(struct tree_desc *desc, const void *buffer, return 0; } -void init_tree_desc(struct tree_desc *desc, const void *buffer, unsigned long size) +void init_tree_desc(struct tree_desc *desc, const struct object_id *tree_oid, + const void *buffer, unsigned long size) { struct strbuf err = STRBUF_INIT; - if (init_tree_desc_internal(desc, buffer, size, &err, 0)) + if (init_tree_desc_internal(desc, tree_oid, buffer, size, &err, 0)) die("%s", err.buf); strbuf_release(&err); } -int init_tree_desc_gently(struct tree_desc *desc, const void *buffer, unsigned long size, +int init_tree_desc_gently(struct tree_desc *desc, const struct object_id *oid, + const void *buffer, unsigned long size, enum tree_desc_flags flags) { struct strbuf err = STRBUF_INIT; - int result = init_tree_desc_internal(desc, buffer, size, &err, flags); + int result = init_tree_desc_internal(desc, oid, buffer, size, &err, flags); if (result) error("%s", err.buf); strbuf_release(&err); @@ -85,7 +91,7 @@ void *fill_tree_descriptor(struct repository *r, if (!buf) die("unable to read tree %s", oid_to_hex(oid)); } - init_tree_desc(desc, buf, size); + init_tree_desc(desc, oid, buf, size); return buf; } @@ -102,7 +108,7 @@ static void entry_extract(struct tree_desc *t, struct name_entry *a) static int update_tree_entry_internal(struct tree_desc *desc, struct strbuf *err) { const void *buf = desc->buffer; - const unsigned char *end = (const unsigned char *)desc->entry.path + desc->entry.pathlen + 1 + the_hash_algo->rawsz; + const unsigned char *end = (const unsigned char *)desc->entry.path + desc->entry.pathlen + 1 + desc->algo->rawsz; unsigned long size = desc->size; unsigned long len = end - (const unsigned char *)buf; @@ -611,7 +617,7 @@ int get_tree_entry(struct repository *r, retval = -1; } else { struct tree_desc t; - init_tree_desc(&t, tree, size); + 
init_tree_desc(&t, tree_oid, tree, size); retval = find_tree_entry(r, &t, name, oid, mode); } free(tree); @@ -654,7 +660,7 @@ enum get_oid_result get_tree_entry_follow_symlinks(struct repository *r, struct tree_desc t; int follows_remaining = GET_TREE_ENTRY_FOLLOW_SYMLINKS_MAX_LINKS; - init_tree_desc(&t, NULL, 0UL); + init_tree_desc(&t, NULL, NULL, 0UL); strbuf_addstr(&namebuf, name); oidcpy(&current_tree_oid, tree_oid); @@ -690,7 +696,7 @@ enum get_oid_result get_tree_entry_follow_symlinks(struct repository *r, goto done; /* descend */ - init_tree_desc(&t, tree, size); + init_tree_desc(&t, &current_tree_oid, tree, size); } /* Handle symlinks to e.g. a//b by removing leading slashes */ @@ -724,7 +730,7 @@ enum get_oid_result get_tree_entry_follow_symlinks(struct repository *r, free(parent->tree); parents_nr--; parent = &parents[parents_nr - 1]; - init_tree_desc(&t, parent->tree, parent->size); + init_tree_desc(&t, &parent->oid, parent->tree, parent->size); strbuf_remove(&namebuf, 0, remainder ? 3 : 2); continue; } @@ -804,7 +810,7 @@ enum get_oid_result get_tree_entry_follow_symlinks(struct repository *r, contents_start = contents; parent = &parents[parents_nr - 1]; - init_tree_desc(&t, parent->tree, parent->size); + init_tree_desc(&t, &parent->oid, parent->tree, parent->size); strbuf_splice(&namebuf, 0, len, contents_start, link_len); if (remainder) diff --git a/tree-walk.h b/tree-walk.h index 74cdceb3fed2..cf54d01019e9 100644 --- a/tree-walk.h +++ b/tree-walk.h @@ -26,6 +26,7 @@ struct name_entry { * A semi-opaque data structure used to maintain the current state of the walk. */ struct tree_desc { + const struct git_hash_algo *algo; /* * pointer into the memory representation of the tree. It always * points at the current entry being visited. @@ -85,9 +86,11 @@ int update_tree_entry_gently(struct tree_desc *); * size parameters are assumed to be the same as the buffer and size * members of `struct tree`. 
*/ -void init_tree_desc(struct tree_desc *desc, const void *buf, unsigned long size); +void init_tree_desc(struct tree_desc *desc, const struct object_id *tree_oid, + const void *buf, unsigned long size); -int init_tree_desc_gently(struct tree_desc *desc, const void *buf, unsigned long size, +int init_tree_desc_gently(struct tree_desc *desc, const struct object_id *oid, + const void *buf, unsigned long size, enum tree_desc_flags flags); /* diff --git a/tree.c b/tree.c index c745462f968e..44bcf728f10a 100644 --- a/tree.c +++ b/tree.c @@ -27,7 +27,7 @@ int read_tree_at(struct repository *r, if (parse_tree(tree)) return -1; - init_tree_desc(&desc, tree->buffer, tree->size); + init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size); while (tree_entry(&desc, &entry)) { if (retval != all_entries_interesting) { diff --git a/walker.c b/walker.c index 65002a7220ad..c0fd632d921c 100644 --- a/walker.c +++ b/walker.c @@ -45,7 +45,7 @@ static int process_tree(struct walker *walker, struct tree *tree) if (parse_tree(tree)) return -1; - init_tree_desc(&desc, tree->buffer, tree->size); + init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size); while (tree_entry(&desc, &entry)) { struct object *obj = NULL; From patchwork Fri Sep 8 23:10:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. 
Biederman" X-Patchwork-Id: 13377901 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C3CA3EEB571 for ; Fri, 8 Sep 2023 23:12:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344727AbjIHXMW (ORCPT ); Fri, 8 Sep 2023 19:12:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60322 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343515AbjIHXMU (ORCPT ); Fri, 8 Sep 2023 19:12:20 -0400 Received: from out02.mta.xmission.com (out02.mta.xmission.com [166.70.13.232]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 537031FE5 for ; Fri, 8 Sep 2023 16:12:15 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:39708) by out02.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qeke6-006Mxp-Fj; Fri, 08 Sep 2023 17:12:14 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qeke5-009u13-Hi; Fri, 08 Sep 2023 17:12:14 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. 
Biederman" Date: Fri, 8 Sep 2023 18:10:39 -0500 Message-Id: <20230908231049.2035003-22-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qeke5-009u13-Hi;;;mid=<20230908231049.2035003-22-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX199OyHtnthml3Bm7RYHck+AOyBiu60wlsY= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 22/32] object-file: Handle compat objects in check_object_signature X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Update check_object_signature to find the hash algorithm the existing signature uses, and to use the same hash algorithm when recomputing it to check that the signature is valid. This will be useful when teaching git ls-tree to display objects encoded with the compat hash algorithm. Signed-off-by: "Eric W. Biederman" --- object-file.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/object-file.c b/object-file.c index fd420dd303df..d6140ebccaf1 100644 --- a/object-file.c +++ b/object-file.c @@ -1094,9 +1094,11 @@ int check_object_signature(struct repository *r, const struct object_id *oid, void *buf, unsigned long size, enum object_type type) { + const struct git_hash_algo *algo = + oid->algo ? &hash_algos[oid->algo] : r->hash_algo; struct object_id real_oid; - hash_object_file(r->hash_algo, buf, size, type, &real_oid); + hash_object_file(algo, buf, size, type, &real_oid); return !oideq(oid, &real_oid) ? -1 : 0; } From patchwork Fri Sep 8 23:10:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. 
Biederman" X-Patchwork-Id: 13377908 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 62787EEB570 for ; Fri, 8 Sep 2023 23:22:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345240AbjIHXWq (ORCPT ); Fri, 8 Sep 2023 19:22:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53462 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345209AbjIHXWm (ORCPT ); Fri, 8 Sep 2023 19:22:42 -0400 Received: from out01.mta.xmission.com (out01.mta.xmission.com [166.70.13.231]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E1C6C18E for ; Fri, 8 Sep 2023 16:22:38 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:34858) by out01.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qeke8-007R3v-GP; Fri, 08 Sep 2023 17:12:16 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qeke7-009u13-Hq; Fri, 08 Sep 2023 17:12:16 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. 
Biederman" Date: Fri, 8 Sep 2023 18:10:40 -0500 Message-Id: <20230908231049.2035003-23-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qeke7-009u13-Hq;;;mid=<20230908231049.2035003-23-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX18ET2hP4skw7r+DIIXWhRGh6nCdeaE1x0c= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 23/32] builtin/ls-tree: Let the oid determine the output algorithm X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Update cmd_ls_tree to call get_oid_with_context and pass GET_OID_UNTRANSLATED instead of calling the simpler repo_get_oid. This implements in ls-tree the behavior that asking to display a sha1 hash displays the corresponding sha1 encoded object and asking to display a sha256 hash displays the corresponding sha256 encoded object. This is useful for testing the conversion of an object to an equivalent object encoded with a different hash function. Signed-off-by: "Eric W. 
Biederman" --- builtin/ls-tree.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c index f558db5f3b80..346e3fd812eb 100644 --- a/builtin/ls-tree.c +++ b/builtin/ls-tree.c @@ -376,6 +376,7 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix) OPT_END() }; struct ls_tree_cmdmode_to_fmt *m2f = ls_tree_cmdmode_format; + struct object_context obj_context; int ret; git_config(git_default_config, NULL); @@ -407,7 +408,9 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix) ls_tree_usage, ls_tree_options); if (argc < 1) usage_with_options(ls_tree_usage, ls_tree_options); - if (repo_get_oid(the_repository, argv[0], &oid)) + if (get_oid_with_context(the_repository, argv[0], + GET_OID_UNTRANSLATED, &oid, + &obj_context)) die("Not a valid object name %s", argv[0]); /* From patchwork Fri Sep 8 23:10:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. 
Biederman" X-Patchwork-Id: 13377920 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A0446EEB570 for ; Fri, 8 Sep 2023 23:31:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345274AbjIHXbq (ORCPT ); Fri, 8 Sep 2023 19:31:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49338 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243209AbjIHXbl (ORCPT ); Fri, 8 Sep 2023 19:31:41 -0400 Received: from out01.mta.xmission.com (out01.mta.xmission.com [166.70.13.231]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CBDB92121 for ; Fri, 8 Sep 2023 16:31:32 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:34892) by out01.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekeA-007R4a-Eh; Fri, 08 Sep 2023 17:12:18 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qeke9-009u13-It; Fri, 08 Sep 2023 17:12:18 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. 
Biederman" Date: Fri, 8 Sep 2023 18:10:41 -0500 Message-Id: <20230908231049.2035003-24-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qeke9-009u13-It;;;mid=<20230908231049.2035003-24-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX1+5H1npDdUAtKBB5i7WVOopukoiZ92XGAU= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 24/32] builtin/pack-objects: Communicate the compatibility hash through struct pack_idx_entry X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When pack-objects is run, all objects in the repository should already have a compatibility hash computed, so it is only necessary to read the existing mappings and store the value in struct pack_idx_entry. Signed-off-by: "Eric W. Biederman" --- builtin/pack-objects.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index d34902002656..ff04660a18fd 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -42,6 +42,7 @@ #include "promisor-remote.h" #include "pack-mtimes.h" #include "parse-options.h" +#include "object-file-convert.h" /* * Objects we are going to pack are collected in the `to_pack` structure. 
@@ -1547,10 +1548,16 @@ static struct object_entry *create_object_entry(const struct object_id *oid, struct packed_git *found_pack, off_t found_offset) { + struct repository *repo = the_repository; + const struct git_hash_algo *compat = repo->compat_hash_algo; struct object_entry *entry; entry = packlist_alloc(&to_pack, oid); entry->hash = hash; + if (compat && + repo_oid_to_algop(repo, &entry->idx.oid, compat, + &entry->idx.compat_oid)) + die(_("can't map object %s while writing pack"), oid_to_hex(oid)); oe_set_type(entry, type); if (exclude) entry->preferred_base = 1; From patchwork Fri Sep 8 23:10:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. Biederman" X-Patchwork-Id: 13377918 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3F2ADEEB570 for ; Fri, 8 Sep 2023 23:31:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344066AbjIHXb3 (ORCPT ); Fri, 8 Sep 2023 19:31:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47618 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238478AbjIHXb2 (ORCPT ); Fri, 8 Sep 2023 19:31:28 -0400 Received: from out03.mta.xmission.com (out03.mta.xmission.com [166.70.13.233]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D27CAE46 for ; Fri, 8 Sep 2023 16:31:18 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:37494) by out03.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekeD-00FHSJ-Q9; Fri, 08 Sep 2023 17:12:21 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls 
TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekeB-009u13-H9; Fri, 08 Sep 2023 17:12:21 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. Biederman" Date: Fri, 8 Sep 2023 18:10:42 -0500 Message-Id: <20230908231049.2035003-25-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekeB-009u13-H9;;;mid=<20230908231049.2035003-25-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX1+QfoEvKjfNPeVzLMTWQpEc7tGBrVWRgxY= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 25/32] pack-compat-map: Add support for .compat files of a packfile X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org These .compat files hold a bidirectional mapping between the sha1 and sha256 names of stored objects. Care has been taken so that index-pack --verify can be supported to validate that an existing compat map file is not corrupted. Signed-off-by: "Eric W. 
Biederman" --- Makefile | 2 + builtin.h | 1 + builtin/show-compat-map.c | 139 ++++++++++++++++ git.c | 1 + object-file-convert.c | 7 + object-name.c | 18 ++ object-store-ll.h | 6 + pack-compat-map.c | 334 ++++++++++++++++++++++++++++++++++++++ pack-compat-map.h | 27 +++ pack-write.c | 158 ++++++++++++++++++ packfile.c | 12 ++ 11 files changed, 705 insertions(+) create mode 100644 builtin/show-compat-map.c create mode 100644 pack-compat-map.c create mode 100644 pack-compat-map.h diff --git a/Makefile b/Makefile index 3c18664def9a..b3f3dbe7bfeb 100644 --- a/Makefile +++ b/Makefile @@ -1088,6 +1088,7 @@ LIB_OBJS += pack-check.o LIB_OBJS += pack-mtimes.o LIB_OBJS += pack-objects.o LIB_OBJS += pack-revindex.o +LIB_OBJS += pack-compat-map.o LIB_OBJS += pack-write.o LIB_OBJS += packfile.o LIB_OBJS += pager.o @@ -1299,6 +1300,7 @@ BUILTIN_OBJS += builtin/send-pack.o BUILTIN_OBJS += builtin/shortlog.o BUILTIN_OBJS += builtin/show-branch.o BUILTIN_OBJS += builtin/show-index.o +BUILTIN_OBJS += builtin/show-compat-map.o BUILTIN_OBJS += builtin/show-ref.o BUILTIN_OBJS += builtin/sparse-checkout.o BUILTIN_OBJS += builtin/stash.o diff --git a/builtin.h b/builtin.h index d560baa6618a..25882d281dd2 100644 --- a/builtin.h +++ b/builtin.h @@ -223,6 +223,7 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix); int cmd_show(int argc, const char **argv, const char *prefix); int cmd_show_branch(int argc, const char **argv, const char *prefix); int cmd_show_index(int argc, const char **argv, const char *prefix); +int cmd_show_compat_map(int argc, const char **argv, const char *prefix); int cmd_sparse_checkout(int argc, const char **argv, const char *prefix); int cmd_status(int argc, const char **argv, const char *prefix); int cmd_stash(int argc, const char **argv, const char *prefix); diff --git a/builtin/show-compat-map.c b/builtin/show-compat-map.c new file mode 100644 index 000000000000..8cc10bdaab61 --- /dev/null +++ b/builtin/show-compat-map.c @@ -0,0 +1,139 @@ 
+#include "builtin.h" +#include "gettext.h" +#include "hash.h" +#include "hex.h" +#include "pack.h" +#include "parse-options.h" +#include "repository.h" + +static const char *const show_compat_map_usage[] = { + "git show-compat-map [--verbose] ", + NULL +}; + +struct pack_compat_map_header { + uint8_t sig[4]; + uint8_t version; + uint8_t first_oid_version; + uint8_t second_oid_version; + uint8_t mbz1; + uint32_t nr_objects; + uint8_t first_abbrev_len; + uint8_t mbz2; + uint8_t second_abbrev_len; + uint8_t mbz3; +}; + +struct map_entry { + struct object_id oid; + uint32_t index; +}; + +static const struct git_hash_algo *from_oid_version(unsigned oid_version) +{ + if (oid_version == 1) { + return &hash_algos[GIT_HASH_SHA1]; + } else if (oid_version == 2) { + return &hash_algos[GIT_HASH_SHA256]; + } + die("unknown oid version %u\n", oid_version); +} + +static void read_half_map(struct map_entry *map, unsigned nr, + const struct git_hash_algo *algo) +{ + unsigned i; + for (i = 0; i < nr; i++) { + uint32_t index; + if (fread(map[i].oid.hash, algo->rawsz, 1, stdin) != 1) + die("unable to read hash of %s entry %u/%u", + algo->name, i, nr); + if (fread(&index, 4, 1, stdin) != 1) + die("unable to read index of %s entry %u/%u", + algo->name, i, nr); + map[i].oid.algo = hash_algo_by_ptr(algo); + map[i].index = ntohl(index); + } +} + +static void print_half_map(const struct map_entry *map, + unsigned nr) +{ + unsigned i; + for (i = 0; i < nr; i++) { + printf("%s %"PRIu32"\n", + oid_to_hex(&map[i].oid), + map[i].index); + } +} + +static void print_map(const struct map_entry *map, + const struct map_entry *compat_map, + unsigned nr) +{ + unsigned i; + for (i = 0; i < nr; i++) { + printf("%s ", + oid_to_hex(&map[i].oid)); + printf("%s\n", + oid_to_hex(&compat_map[map[i].index].oid)); + } +} + +int cmd_show_compat_map(int argc, const char **argv, const char *prefix) +{ + const struct git_hash_algo *algo = NULL, *compat = NULL; + unsigned nr; + struct pack_compat_map_header hdr; + 
struct map_entry *map, *compat_map; + int verbose = 0; + const struct option show_comapt_map_options[] = { + OPT_BOOL(0, "verbose", &verbose, + N_("print implementation details of the map file")), + OPT_END() + }; + + argc = parse_options(argc, argv, prefix, show_comapt_map_options, + show_compat_map_usage, 0); + + if (fread(&hdr, sizeof(hdr), 1, stdin) != 1) + die("unable to read header"); + if ((hdr.sig[0] != 'C') || + (hdr.sig[1] != 'M') || + (hdr.sig[2] != 'A') || + (hdr.sig[3] != 'P')) + die("Missing map signature"); + if (hdr.version != 1) + die("Unknown map version"); + if ((hdr.mbz1 != 0) || + (hdr.mbz2 != 0) || + (hdr.mbz3 != 0)) + die("Must be zero fields non-zero"); + + nr = ntohl(hdr.nr_objects); + + algo = from_oid_version(hdr.first_oid_version); + compat = from_oid_version(hdr.second_oid_version); + + + if (verbose) { + printf("Map v%u for %u objects from %s to %s abbrevs (%u:%u)\n", + hdr.version, + nr, + algo->name, compat->name, + hdr.first_abbrev_len, + hdr.second_abbrev_len); + } + ALLOC_ARRAY(map, nr); + ALLOC_ARRAY(compat_map, nr); + read_half_map(map, nr, algo); + read_half_map(compat_map, nr, compat); + if (verbose) { + print_half_map(map, nr); + print_half_map(compat_map, nr); + } + print_map(map, compat_map, nr); + free(compat_map); + free(map); + return 0; +} diff --git a/git.c b/git.c index c67e44dd82d2..bfaeece5ae0e 100644 --- a/git.c +++ b/git.c @@ -606,6 +606,7 @@ static struct cmd_struct commands[] = { { "show", cmd_show, RUN_SETUP }, { "show-branch", cmd_show_branch, RUN_SETUP }, { "show-index", cmd_show_index, RUN_SETUP_GENTLY }, + { "show-compat-map", cmd_show_compat_map, RUN_SETUP_GENTLY }, { "show-ref", cmd_show_ref, RUN_SETUP }, { "sparse-checkout", cmd_sparse_checkout, RUN_SETUP }, { "stage", cmd_add, RUN_SETUP | NEED_WORK_TREE }, diff --git a/object-file-convert.c b/object-file-convert.c index d381d3d2ea65..7978aa63dfa9 100644 --- a/object-file-convert.c +++ b/object-file-convert.c @@ -9,6 +9,7 @@ #include "loose.h" #include 
"commit.h" #include "gpg-interface.h" +#include "pack-compat-map.h" #include "object-file-convert.h" int repo_oid_to_algop(struct repository *repo, const struct object_id *src, @@ -27,6 +28,12 @@ int repo_oid_to_algop(struct repository *repo, const struct object_id *src, return 0; } if (repo_loose_object_map_oid(repo, dest, to, src)) { + /* + * It's not in the loose object map, so let's see if it's in a + * pack. + */ + if (!repo_packed_oid_to_algop(repo, src, to, dest)) + return 0; /* * We may have loaded the object map at repo initialization but * another process (perhaps upstream of a pipe from us) may have diff --git a/object-name.c b/object-name.c index ebe87f5c4fdd..d33c82bc96ba 100644 --- a/object-name.c +++ b/object-name.c @@ -26,6 +26,7 @@ #include "commit-reach.h" #include "date.h" #include "object-file-convert.h" +#include "pack-compat-map.h" static int get_oid_oneline(struct repository *r, const char *, struct object_id *, struct commit_list *); @@ -210,6 +211,19 @@ static void find_short_packed_object(struct disambiguate_state *ds) unique_in_pack(p, ds); } +static void find_short_packed_compat_object(struct disambiguate_state *ds) +{ + struct packed_git *p; + + /* Skip, unless compatibility oids are wanted */ + if (!ds->algo && (&hash_algos[ds->algo] != ds->repo->compat_hash_algo)) + return; + + for (p = get_packed_git(ds->repo); p && !ds->ambiguous; p = p->next) + pack_compat_map_each(ds->repo, p, ds->bin_pfx.hash, ds->len, + match_prefix, ds); +} + static int finish_object_disambiguation(struct disambiguate_state *ds, struct object_id *oid) { @@ -581,6 +595,7 @@ static enum get_oid_result get_short_oid(struct repository *r, find_short_object_filename(&ds); find_short_packed_object(&ds); + find_short_packed_compat_object(&ds); status = finish_object_disambiguation(&ds, oid); /* @@ -592,6 +607,7 @@ static enum get_oid_result get_short_oid(struct repository *r, reprepare_packed_git(r); find_short_object_filename(&ds); find_short_packed_object(&ds); + 
find_short_packed_compat_object(&ds); status = finish_object_disambiguation(&ds, oid); } @@ -659,6 +675,7 @@ int repo_for_each_abbrev(struct repository *r, const char *prefix, ds.cb_data = &collect; find_short_object_filename(&ds); find_short_packed_object(&ds); + find_short_packed_compat_object(&ds); ret = oid_array_for_each_unique(&collect, fn, cb_data); oid_array_clear(&collect); @@ -871,6 +888,7 @@ int repo_find_unique_abbrev_r(struct repository *r, char *hex, ds.cb_data = (void *)&mad; find_short_object_filename(&ds); + find_short_packed_compat_object(&ds); (void)finish_object_disambiguation(&ds, &oid_ret); hex[mad.cur_len] = 0; diff --git a/object-store-ll.h b/object-store-ll.h index c5f2bb2fc2fe..c37c19ada0c3 100644 --- a/object-store-ll.h +++ b/object-store-ll.h @@ -135,6 +135,12 @@ struct packed_git { */ const uint32_t *mtimes_map; size_t mtimes_size; + + const void *compat_mapping; + size_t compat_mapping_size; + const uint8_t *hash_map; + const uint8_t *compat_hash_map; + /* something like ".git/objects/pack/xxxxx.pack" */ char pack_name[FLEX_ARRAY]; /* more */ }; diff --git a/pack-compat-map.c b/pack-compat-map.c new file mode 100644 index 000000000000..3a992095ebe3 --- /dev/null +++ b/pack-compat-map.c @@ -0,0 +1,334 @@ +#include "git-compat-util.h" +#include "gettext.h" +#include "hex.h" +#include "hash-ll.h" +#include "hash.h" +#include "object-store.h" +#include "object-file.h" +#include "packfile.h" +#include "pack-compat-map.h" +#include "packfile.h" + +struct pack_compat_map_header { + uint8_t sig[4]; + uint8_t version; + uint8_t first_oid_version; + uint8_t second_oid_version; + uint8_t mbz1; + uint32_t nr_objects; + uint8_t first_abbrev_len; + uint8_t mbz2; + uint8_t second_abbrev_len; + uint8_t mbz3; +}; + +static char *pack_compat_map_filename(struct packed_git *p) +{ + size_t len; + if (!strip_suffix(p->pack_name, ".pack", &len)) + BUG("pack_name does not end in .pack"); + return xstrfmt("%.*s.compat", (int)len, p->pack_name); +} + +static 
int oid_version_match(const char *filename, + unsigned oid_version, + const struct git_hash_algo *algo) +{ + const struct git_hash_algo *found = NULL; + int ret = 0; + + if (oid_version == 1) { + found = &hash_algos[GIT_HASH_SHA1]; + } else if (oid_version == 2) { + found = &hash_algos[GIT_HASH_SHA256]; + } + if (found == NULL) { + ret = error(_("compat map file %s hash version %u unknown"), + filename, oid_version); + } + else if (found != algo) { + ret = error(_("compat map file %s found hash %s expected hash %s"), + filename, found->name, algo->name); + } + return ret; +} + + +static int load_pack_compat_map_file(char *compat_map_file, + struct repository *repo, + struct packed_git *p) +{ + const struct pack_compat_map_header *hdr; + unsigned compat_map_objects = 0; + const uint8_t *data = NULL; + const uint8_t *packs_hash = NULL; + int fd, ret = 0; + struct stat st; + size_t size, map1sz, map2sz, expected_size; + + fd = git_open(compat_map_file); + + if (fd < 0) { + ret = -1; + goto cleanup; + } + if (fstat(fd, &st)) { + ret = error_errno(_("failed to read %s"), compat_map_file); + goto cleanup; + } + + size = xsize_t(st.st_size); + + if (size < sizeof(struct pack_compat_map_header)) { + ret = error(_("compat map file %s is too small"), compat_map_file); + goto cleanup; + } + + data = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0); + + hdr = (const struct pack_compat_map_header *)data; + if ((hdr->sig[0] != 'C') || + (hdr->sig[1] != 'M') || + (hdr->sig[2] != 'A') || + (hdr->sig[3] != 'P')) { + ret = error(_("compat map file %s has unknown signature"), + compat_map_file); + goto cleanup; + } + + if (hdr->version != 1) { + ret = error(_("compat map file %s has unsupported version %"PRIu8), + compat_map_file, hdr->version); + goto cleanup; + } + + ret = oid_version_match(compat_map_file, hdr->first_oid_version, repo->hash_algo); + if (ret) + goto cleanup; + ret = oid_version_match(compat_map_file, hdr->second_oid_version, repo->compat_hash_algo); + if (ret) + 
goto cleanup; + compat_map_objects = ntohl(hdr->nr_objects); + if (compat_map_objects != p->num_objects) { + ret = error(_("compat map file %s number of objects found %u wanted %u"), + compat_map_file, compat_map_objects, p->num_objects); + goto cleanup; + } + + map1sz = st_mult(repo->hash_algo->rawsz + 4, compat_map_objects); + map2sz = st_mult(repo->compat_hash_algo->rawsz + 4, compat_map_objects); + + expected_size = sizeof(struct pack_compat_map_header); + expected_size = st_add(expected_size, map1sz); + expected_size = st_add(expected_size, map2sz); + expected_size = st_add(expected_size, 2 * repo->hash_algo->rawsz); + + if (size != expected_size) { + ret = error(_("compat map file %s is corrupt size %zu expected %zu objects %u sz1 %zu sz2 %zu"), + compat_map_file, size, expected_size, compat_map_objects, + map1sz, map2sz + ); + goto cleanup; + } + + packs_hash = data + sizeof(struct pack_compat_map_header) + map1sz + map2sz; + if (hashcmp(packs_hash, p->hash)) { + ret = error(_("compat map file %s does not match pack %s\n"), + compat_map_file, hash_to_hex(p->hash)); + } + + + p->compat_mapping = data; + p->compat_mapping_size = size; + + p->hash_map = data + sizeof(struct pack_compat_map_header); + p->compat_hash_map = p->hash_map + map1sz; + +cleanup: + if (ret) { + if (data) { + munmap((void *)data, size); + } + } + if (fd >= 0) + close(fd); + return ret; +} + +int load_pack_compat_map(struct repository *repo, struct packed_git *p) +{ + char *compat_map_name = NULL; + int ret = 0; + + if (p->compat_mapping) + return ret; /* already loaded */ + + if (!repo->compat_hash_algo) + return 1; /* Nothing to do */ + + ret = open_pack_index(p); + if (ret < 0) + goto cleanup; + + compat_map_name = pack_compat_map_filename(p); + ret = load_pack_compat_map_file(compat_map_name, repo, p); +cleanup: + free(compat_map_name); + return ret; +} + +static int keycmp(const unsigned char *a, const unsigned char *b, + size_t key_hex_size) +{ + size_t key_byte_size = key_hex_size 
/ 2; + unsigned a_last, b_last, mask = (key_hex_size & 1) ? 0xf0 : 0; + int cmp = memcmp(a, b, key_byte_size); + if (cmp) + return cmp; + + a_last = a[key_byte_size] & mask; + b_last = b[key_byte_size] & mask; + + if (a_last == b_last) + cmp = 0; + else if (a_last < b_last) + cmp = -1; + else + cmp = 1; + + return cmp; +} + +static const uint8_t *bsearch_map(const unsigned char *hash, + const uint8_t *table, unsigned nr, + size_t entry_size, size_t key_hex_size) +{ + uint32_t hi, lo; + + hi = nr - 1; + lo = 0; + while (lo < hi) { + unsigned mi = lo + ((hi - lo) / 2); + const unsigned char *entry = table + (mi * entry_size); + int cmp = keycmp(entry, hash, key_hex_size); + if (!cmp) + return entry; + if (cmp > 0) + hi = mi; + else + lo = mi + 1; + } + if (lo == hi) { + const unsigned char *entry = table + (lo * entry_size); + int cmp = keycmp(entry, hash, key_hex_size); + if (!cmp) + return entry; + } + return NULL; +} + +static void map_each(const struct git_hash_algo *compat, + const unsigned char *prefix, size_t prefix_hexsz, + const uint8_t *table, unsigned nr, size_t entry_bytes, + compat_map_iter_t iter, void *data) +{ + const uint8_t *found, *last = table + (entry_bytes * nr); + + found = bsearch_map(prefix, table, nr, entry_bytes, prefix_hexsz); + if (!found) + return; + + /* Visit each matching key */ + do { + struct object_id oid; + + if (keycmp(found, prefix, prefix_hexsz) != 0) + break; + + oidread_algop(&oid, found, compat); + if (iter(&oid, data) == CB_BREAK) + break; + + found = found + entry_bytes; + } while (found < last); +} + +void pack_compat_map_each(struct repository *repo, struct packed_git *p, + const unsigned char *prefix, size_t prefix_hexsz, + compat_map_iter_t iter, void *data) +{ + const struct git_hash_algo *compat = repo->compat_hash_algo; + + if (!p->num_objects || + (!p->compat_mapping && load_pack_compat_map(repo, p))) + return; + + if (prefix_hexsz > compat->hexsz) + prefix_hexsz = compat->hexsz; + + map_each(compat, prefix, 
prefix_hexsz, + p->compat_hash_map, p->num_objects, compat->rawsz + 4, + iter, data); +} + +static int compat_map_to_algop(const struct object_id *src, + const struct git_hash_algo *to, + const struct git_hash_algo *from, + const uint8_t *to_table, + const uint8_t *from_table, + unsigned nr, + struct object_id *dest) +{ + const uint8_t *found; + uint32_t index; + + if (src->algo != hash_algo_by_ptr(from)) + return -1; + + found = bsearch_map(src->hash, + from_table, nr, + from->rawsz + 4, + from->hexsz); + if (!found) + return -1; + + index = ntohl(*(uint32_t *)(found + from->rawsz)); + oidread_algop(dest, to_table + index * (to->rawsz + 4), to); + return 0; +} + +static int pack_to_algop(struct repository *repo, struct packed_git *p, + const struct object_id *src, + const struct git_hash_algo *to, struct object_id *dest) +{ + if (!p->compat_mapping && load_pack_compat_map(repo, p)) + return -1; + + if (to == repo->hash_algo) { + return compat_map_to_algop(src, to, repo->compat_hash_algo, + p->hash_map, + p->compat_hash_map, + p->num_objects, dest); + } + else if (to == repo->compat_hash_algo) { + return compat_map_to_algop(src, to, repo->hash_algo, + p->compat_hash_map, + p->hash_map, + p->num_objects, dest); + } + else + return -1; +} + +int repo_packed_oid_to_algop(struct repository *repo, + const struct object_id *src, + const struct git_hash_algo *to, + struct object_id *dest) +{ + struct packed_git *p; + for (p = get_packed_git(repo); p; p = p->next) { + if (!pack_to_algop(repo, p, src, to, dest)) + return 0; + } + return -1; +} diff --git a/pack-compat-map.h b/pack-compat-map.h new file mode 100644 index 000000000000..2a4561ffdff6 --- /dev/null +++ b/pack-compat-map.h @@ -0,0 +1,27 @@ +#ifndef PACK_COMPAT_MAP_H +#define PACK_COMPAT_MAP_H + +#include "cbtree.h" +struct repository; +struct packed_git; +struct object_id; +struct git_hash_algo; +struct pack_idx_entry; + +int load_pack_compat_map(struct repository *repo, struct packed_git *p); + +typedef enum 
cb_next (*compat_map_iter_t)(const struct object_id *, void *data); +void pack_compat_map_each(struct repository *repo, struct packed_git *p, + const unsigned char *prefix, size_t prefix_hexsz, + compat_map_iter_t, void *data); + +int repo_packed_oid_to_algop(struct repository *repo, + const struct object_id *src, + const struct git_hash_algo *to, + struct object_id *dest); + +const char *write_compat_map_file(const char *compat_map_name, + struct pack_idx_entry **objects, + int nr_objects, const unsigned char *hash); + +#endif /* PACK_COMPAT_MAP_H */ diff --git a/pack-write.c b/pack-write.c index b19ddf15b284..f22eea964f77 100644 --- a/pack-write.c +++ b/pack-write.c @@ -12,6 +12,7 @@ #include "pack-revindex.h" #include "path.h" #include "strbuf.h" +#include "object-file-convert.h" void reset_pack_idx_option(struct pack_idx_option *opts) { @@ -345,6 +346,157 @@ static char *write_mtimes_file(struct packing_data *to_pack, return mtimes_name; } +struct map_entry { + const struct pack_idx_entry *idx; + uint32_t oid_index; + uint32_t compat_oid_index; +}; + +static int map_oid_cmp(const void *_a, const void *_b) +{ + struct map_entry *a = *(struct map_entry **)_a; + struct map_entry *b = *(struct map_entry **)_b; + return oidcmp(&a->idx->oid, &b->idx->oid); +} + +static int map_compat_oid_cmp(const void *_a, const void *_b) +{ + struct map_entry *a = *(struct map_entry **)_a; + struct map_entry *b = *(struct map_entry **)_b; + return oidcmp(&a->idx->compat_oid, &b->idx->compat_oid); +} + +struct pack_compat_map_header { + uint8_t sig[4]; + uint8_t version; + uint8_t first_oid_version; + uint8_t second_oid_version; + uint8_t mbz1; + uint32_t nr_objects; + uint8_t first_abbrev_len; + uint8_t mbz2; + uint8_t second_abbrev_len; + uint8_t mbz3; +}; + +static inline unsigned last_matching_offset(const struct object_id *a, + const struct object_id *b, + const struct git_hash_algo *algop) +{ + unsigned i; + for (i = 0; i < algop->rawsz; i++) + if (a->hash[i] != b->hash[i]) + 
return i; + /* We should never hit this case. */ + return i; +} + +/* + * The *hash contains the pack content hash. + * The objects array is passed in sorted. + */ +const char *write_compat_map_file(const char *compat_map_name, + struct pack_idx_entry **objects, + int nr_objects, const unsigned char *hash) +{ + struct repository *repo = the_repository; + const struct git_hash_algo *algo = repo->hash_algo; + const struct git_hash_algo *compat = repo->compat_hash_algo; + unsigned short_name_len, compat_short_name_len; + struct hashfile *f; + struct map_entry *map_entries, **map; + struct pack_compat_map_header hdr; + unsigned i; + int fd; + + if (!compat || !nr_objects) + return NULL; + + ALLOC_ARRAY(map_entries, nr_objects); + ALLOC_ARRAY(map, nr_objects); + short_name_len = 1; + for (i = 0; i < nr_objects; ++i) { + unsigned offset; + + map[i] = &map_entries[i]; + map_entries[i].idx = objects[i]; + if (!objects[i]->compat_oid.algo) + BUG("No mapping from %s to %s\n", + oid_to_hex(&objects[i]->oid), + compat->name); + + map_entries[i].oid_index = i; + map_entries[i].compat_oid_index = 0; + if (i == 0) + continue; + + offset = last_matching_offset(&map_entries[i].idx->oid, + &map_entries[i - 1].idx->oid, + algo); + if (offset > short_name_len) + short_name_len = offset; + } + QSORT(map, nr_objects, map_compat_oid_cmp); + compat_short_name_len = 1; + for (i = 0; i < nr_objects; ++i) { + unsigned offset; + + map[i]->compat_oid_index = i; + + if (i == 0) + continue; + + offset = last_matching_offset(&map[i]->idx->compat_oid, + &map[i - 1]->idx->compat_oid, + compat); + if (offset > compat_short_name_len) + compat_short_name_len = offset; + } + + if (compat_map_name) { + /* Verify an existing compat map file */ + f = hashfd_check(compat_map_name); + } else { + struct strbuf tmp_file = STRBUF_INIT; + fd = odb_mkstemp(&tmp_file, "pack/tmp_compat_map_XXXXXX"); + compat_map_name = strbuf_detach(&tmp_file, NULL); + f = hashfd(fd, compat_map_name); + } + + hdr.sig[0] = 'C'; + 
hdr.sig[1] = 'M'; + hdr.sig[2] = 'A'; + hdr.sig[3] = 'P'; + hdr.version = 1; + hdr.first_oid_version = oid_version(algo); + hdr.second_oid_version = oid_version(compat); + hdr.mbz1 = 0; + hdr.nr_objects = htonl(nr_objects); + hdr.first_abbrev_len = short_name_len; + hdr.mbz2 = 0; + hdr.second_abbrev_len = compat_short_name_len; + hdr.mbz3 = 0; + hashwrite(f, &hdr, sizeof(hdr)); + + QSORT(map, nr_objects, map_oid_cmp); + for (i = 0; i < nr_objects; i++) { + hashwrite(f, map[i]->idx->oid.hash, algo->rawsz); + hashwrite_be32(f, map[i]->compat_oid_index); + } + QSORT(map, nr_objects, map_compat_oid_cmp); + for (i = 0; i < nr_objects; i++) { + hashwrite(f, map[i]->idx->compat_oid.hash, compat->rawsz); + hashwrite_be32(f, map[i]->oid_index); + } + + hashwrite(f, hash, algo->rawsz); + finalize_hashfile(f, NULL, FSYNC_COMPONENT_PACK_METADATA, + CSUM_HASH_IN_STREAM | CSUM_CLOSE | CSUM_FSYNC); + free(map); + free(map_entries); + return compat_map_name; +} + off_t write_pack_header(struct hashfile *f, uint32_t nr_entries) { struct pack_header hdr; @@ -548,6 +700,7 @@ void stage_tmp_packfiles(struct strbuf *name_buffer, { const char *rev_tmp_name = NULL; char *mtimes_tmp_name = NULL; + const char *compat_map_tmp_name = NULL; if (adjust_shared_perm(pack_tmp_name)) die_errno("unable to make temporary pack file readable"); @@ -566,11 +719,16 @@ void stage_tmp_packfiles(struct strbuf *name_buffer, hash); } + compat_map_tmp_name = write_compat_map_file(NULL, written_list, + nr_written, hash); + rename_tmp_packfile(name_buffer, pack_tmp_name, "pack"); if (rev_tmp_name) rename_tmp_packfile(name_buffer, rev_tmp_name, "rev"); if (mtimes_tmp_name) rename_tmp_packfile(name_buffer, mtimes_tmp_name, "mtimes"); + if (compat_map_tmp_name) + rename_tmp_packfile(name_buffer, compat_map_tmp_name, "compat"); free((char *)rev_tmp_name); free(mtimes_tmp_name); diff --git a/packfile.c b/packfile.c index 1fae0fcdd9e7..c1a6bd9bc6b3 100644 --- a/packfile.c +++ b/packfile.c @@ -349,6 +349,17 @@ static 
void close_pack_mtimes(struct packed_git *p) p->mtimes_map = NULL; } +static void close_pack_compat_map(struct packed_git *p) +{ + if (!p->compat_mapping) + return; + + munmap((void *)p->compat_mapping, p->compat_mapping_size); + p->compat_mapping = NULL; + p->hash_map = NULL; + p->compat_hash_map = NULL; +} + void close_pack(struct packed_git *p) { close_pack_windows(p); @@ -356,6 +367,7 @@ void close_pack(struct packed_git *p) close_pack_index(p); close_pack_revindex(p); close_pack_mtimes(p); + close_pack_compat_map(p); oidset_clear(&p->bad_objects); } From patchwork Fri Sep 8 23:10:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. Biederman" X-Patchwork-Id: 13377902 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AB3A0EEB570 for ; Fri, 8 Sep 2023 23:12:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344725AbjIHXMc (ORCPT ); Fri, 8 Sep 2023 19:12:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37858 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344545AbjIHXMb (ORCPT ); Fri, 8 Sep 2023 19:12:31 -0400 Received: from out02.mta.xmission.com (out02.mta.xmission.com [166.70.13.232]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3A7471FF2 for ; Fri, 8 Sep 2023 16:12:25 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:39822) by out02.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekeG-006MyC-CU; Fri, 08 Sep 2023 17:12:24 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 
4.93) (envelope-from ) id 1qekeE-009u13-SU; Fri, 08 Sep 2023 17:12:24 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. Biederman" Date: Fri, 8 Sep 2023 18:10:43 -0500 Message-Id: <20230908231049.2035003-26-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekeE-009u13-SU;;;mid=<20230908231049.2035003-26-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX1+0rJW20o6ZkD0VvNQkSkqaxqQ3p6FDhaA= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 26/32] object-file-convert: Implement convert_object_file_{begin,step,end} X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When converting trees, commits, and tags, the objects they reference need to be converted before the objects themselves can be converted. Split convert_object_file into a couple of pieces that are effectively an iterator over the oids that need to be converted. This allows the objects to be processed depth first when being converted, and it allows changing the logic used to map oids. In cases like "git index-pack" none of the oids will be mapped in any of the existing mapping tables, so an in-memory table needs to be converted and consulted, and this allows that. Not having to update the existing object id mapping mechanisms is particularly nice, as it makes it easy to avoid having to introduce new locks to synchronize the update of internal mapping mechanisms. This was inspired by a similar change by "brian m. carlson" where he modified convert_object_file to return the unmapped oids. Inspired-by: brian m. carlson Signed-off-by: "Eric W. 
Biederman" --- object-file-convert.c | 226 ++++++++++++++++++++++++++++++------------ object-file-convert.h | 21 ++++ 2 files changed, 186 insertions(+), 61 deletions(-) diff --git a/object-file-convert.c b/object-file-convert.c index 7978aa63dfa9..3fd080ebc112 100644 --- a/object-file-convert.c +++ b/object-file-convert.c @@ -67,55 +67,74 @@ static int decode_tree_entry_raw(struct object_id *oid, const char **path, return 0; } -static int convert_tree_object(struct strbuf *out, - const struct git_hash_algo *from, - const struct git_hash_algo *to, - const char *buffer, size_t size) +static int convert_tree_object_step(struct object_file_convert_state *state) { - const char *p = buffer, *end = buffer + size; + const char *buf = state->buf, *p, *end = buf + state->buf_len; + const struct git_hash_algo *from = state->from; + const struct git_hash_algo *to = state->to; + struct strbuf *out = state->outbuf; + + /* The current position */ + p = buf + state->buf_pos; while (p < end) { - struct object_id entry_oid, mapped_oid; + struct object_id entry_oid; const char *path = NULL; size_t pathlen; if (decode_tree_entry_raw(&entry_oid, &path, &pathlen, from, p, end - p)) return error(_("failed to decode tree entry")); - if (repo_oid_to_algop(the_repository, &entry_oid, to, &mapped_oid)) - return error(_("failed to map tree entry for %s"), oid_to_hex(&entry_oid)); + + if (!state->mapped_oid.algo) { + oidcpy(&state->oid, &entry_oid); + return 1; + } + else if (!oideq(&entry_oid, &state->oid)) + return error(_("bad object_file_convert_state oid")); + strbuf_add(out, p, path - p); strbuf_add(out, path, pathlen); - strbuf_add(out, mapped_oid.hash, to->rawsz); + strbuf_add(out, state->mapped_oid.hash, to->rawsz); + state->mapped_oid.algo = 0; p = path + pathlen + from->rawsz; + state->buf_pos = p - buf; } return 0; } -static int convert_commit_object(struct strbuf *out, - const struct git_hash_algo *from, - const struct git_hash_algo *to, - const char *buffer, size_t size) +static 
int convert_commit_object_step(struct object_file_convert_state *state) { - const char *tail = buffer; - const char *bufptr = buffer; + const struct git_hash_algo *from = state->from; + struct strbuf *out = state->outbuf; + const char *buf = state->buf; + const char *tail = buf + state->buf_len; + const char *bufptr = buf + state->buf_pos; const int tree_entry_len = from->hexsz + 5; const int parent_entry_len = from->hexsz + 7; - struct object_id oid, mapped_oid; + struct object_id oid; const char *p; - tail += size; - if (tail <= bufptr + tree_entry_len + 1 || memcmp(bufptr, "tree ", 5) || - bufptr[tree_entry_len] != '\n') - return error("bogus commit object"); - if (parse_oid_hex_algop(bufptr + 5, &oid, &p, from) < 0) - return error("bad tree pointer"); + if (state->buf_pos == 0) { + if (tail <= bufptr + tree_entry_len + 1 || memcmp(bufptr, "tree ", 5) || + bufptr[tree_entry_len] != '\n') + return error("bogus commit object"); + + if (parse_oid_hex_algop(bufptr + 5, &oid, &p, from) < 0) + return error("bad tree pointer"); - if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid)) - return error("unable to map tree %s in commit object", - oid_to_hex(&oid)); - strbuf_addf(out, "tree %s\n", oid_to_hex(&mapped_oid)); - bufptr = p + 1; + if (!state->mapped_oid.algo) { + oidcpy(&state->oid, &oid); + return 1; + } + else if (!oideq(&oid, &state->oid)) + return error(_("bad object_file_convert_state oid")); + + strbuf_addf(out, "tree %s\n", oid_to_hex(&state->mapped_oid)); + state->mapped_oid.algo = 0; + bufptr = p + 1; + state->buf_pos = bufptr - buf; + } while (bufptr + parent_entry_len < tail && !memcmp(bufptr, "parent ", 7)) { if (tail <= bufptr + parent_entry_len + 1 || @@ -123,26 +142,44 @@ static int convert_commit_object(struct strbuf *out, *p != '\n') return error("bad parents in commit"); - if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid)) - return error("unable to map parent %s in commit object", - oid_to_hex(&oid)); + if 
(!state->mapped_oid.algo) { + oidcpy(&state->oid, &oid); + return 1; + } + else if (!oideq(&oid, &state->oid)) + return error(_("bad object_file_convert_state oid")); - strbuf_addf(out, "parent %s\n", oid_to_hex(&mapped_oid)); + strbuf_addf(out, "parent %s\n", oid_to_hex(&state->mapped_oid)); + state->mapped_oid.algo = 0; bufptr = p + 1; + state->buf_pos = bufptr - buf; } strbuf_add(out, bufptr, tail - bufptr); return 0; } -static int convert_tag_object(struct strbuf *out, - const struct git_hash_algo *from, - const struct git_hash_algo *to, - const char *buffer, size_t size) +static int convert_tag_object_step(struct object_file_convert_state *state) { struct strbuf payload = STRBUF_INIT, temp = STRBUF_INIT, oursig = STRBUF_INIT, othersig = STRBUF_INIT; - size_t payload_size; - struct object_id oid, mapped_oid; + const struct git_hash_algo *from = state->from; + const struct git_hash_algo *to = state->to; + struct strbuf *out = state->outbuf; + const char *buffer = state->buf; + size_t payload_size, size = state->buf_len; + struct object_id oid; const char *p; + int ret = 0; + + if (!state->mapped_oid.algo) { + if (strncmp(buffer, "object ", 7) || + buffer[from->hexsz + 7] != '\n') + return error("bogus tag object"); + if (parse_oid_hex_algop(buffer + 7, &oid, &p, from) < 0) + return error("bad tag object ID"); + + oidcpy(&state->oid, &oid); + return 1; + } /* Add some slop for longer signature header in the new algorithm. */ strbuf_grow(out, size + 7); @@ -165,52 +202,119 @@ static int convert_tag_object(struct strbuf *out, * Our payload is now in payload and we may have up to two signatrures * in oursig and othersig. 
*/ - if (strncmp(payload.buf, "object ", 7) || payload.buf[from->hexsz + 7] != '\n') - return error("bogus tag object"); - if (parse_oid_hex_algop(payload.buf + 7, &oid, &p, from) < 0) - return error("bad tag object ID"); - if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid)) - return error("unable to map tree %s in tag object", - oid_to_hex(&oid)); - strbuf_addf(out, "object %s\n", oid_to_hex(&mapped_oid)); + if (strncmp(payload.buf, "object ", 7) || payload.buf[from->hexsz + 7] != '\n') { + ret = error("bogus tag object"); + goto out; + } + if (parse_oid_hex_algop(payload.buf + 7, &oid, &p, from) < 0) { + ret = error("bad tag object ID"); + goto out; + } + if (!oideq(&oid, &state->oid)) { + ret = error(_("bad object_file_convert_state oid")); + goto out; + } + + strbuf_addf(out, "object %s\n", oid_to_hex(&state->mapped_oid)); strbuf_add(out, p, payload.len - (p - payload.buf)); strbuf_addbuf(out, &othersig); if (oursig.len) add_header_signature(out, &oursig, from); - return 0; +out: + strbuf_release(&oursig); + strbuf_release(&othersig); + strbuf_release(&payload); + return ret; } -int convert_object_file(struct strbuf *outbuf, - const struct git_hash_algo *from, - const struct git_hash_algo *to, - const void *buf, size_t len, - enum object_type type, - int gentle) +void convert_object_file_begin(struct object_file_convert_state *state, + struct strbuf *outbuf, + const struct git_hash_algo *from, + const struct git_hash_algo *to, + const void *buf, size_t len, + enum object_type type) { - int ret; + memset(state, 0, sizeof(*state)); + state->outbuf = outbuf; + state->from = from; + state->to = to; + state->buf = buf; + state->buf_len = len; + state->buf_pos = 0; + state->type = type; + /* Don't call this function when no conversion is necessary */ if ((from == to) || (type == OBJ_BLOB)) - die("Refusing noop object file conversion"); + BUG("Attempting noop object file conversion"); switch (type) { case OBJ_TREE: - ret = convert_tree_object(outbuf, from, 
to, buf, len); + case OBJ_COMMIT: + case OBJ_TAG: + break; + default: + /* Not implemented yet, so fail. */ + BUG("Unknown object file type found in conversion"); + } +} + +int convert_object_file_step(struct object_file_convert_state *state) +{ + int ret; + + switch(state->type) { + case OBJ_TREE: + ret = convert_tree_object_step(state); break; case OBJ_COMMIT: - ret = convert_commit_object(outbuf, from, to, buf, len); + ret = convert_commit_object_step(state); break; case OBJ_TAG: - ret = convert_tag_object(outbuf, from, to, buf, len); + ret = convert_tag_object_step(state); break; default: - /* Not implemented yet, so fail. */ ret = -1; break; } - if (!ret) - return 0; - if (gentle) + return ret; +} + +void convert_object_file_end(struct object_file_convert_state *state, int ret) +{ + if (ret != 0) { + strbuf_release(state->outbuf); + } + memset(state, 0, sizeof(*state)); +} + +int convert_object_file(struct strbuf *outbuf, + const struct git_hash_algo *from, + const struct git_hash_algo *to, + const void *buf, size_t len, + enum object_type type, + int gentle) +{ + struct object_file_convert_state state; + int ret; + + convert_object_file_begin(&state, outbuf, from, to, buf, len, type); + + for (;;) { + ret = convert_object_file_step(&state); + if (ret != 1) + break; + ret = repo_oid_to_algop(the_repository, &state.oid, state.to, + &state.mapped_oid); + if (ret) { + error(_("failed to map %s entry for %s"), + type_name(type), oid_to_hex(&state.oid)); + break; + } + } + + convert_object_file_end(&state, ret); + if (!ret || gentle) return ret; die(_("Failed to convert object from %s to %s"), from->name, to->name); diff --git a/object-file-convert.h b/object-file-convert.h index a4f802aa8eea..da032d7a91ef 100644 --- a/object-file-convert.h +++ b/object-file-convert.h @@ -10,6 +10,27 @@ struct strbuf; int repo_oid_to_algop(struct repository *repo, const struct object_id *src, const struct git_hash_algo *to, struct object_id *dest); +struct object_file_convert_state 
{ + struct strbuf *outbuf; + const struct git_hash_algo *from; + const struct git_hash_algo *to; + const void *buf; + size_t buf_len; + size_t buf_pos; + enum object_type type; + struct object_id oid; + struct object_id mapped_oid; +}; + +void convert_object_file_begin(struct object_file_convert_state *state, + struct strbuf *outbuf, + const struct git_hash_algo *from, + const struct git_hash_algo *to, + const void *buf, size_t len, + enum object_type type); +int convert_object_file_step(struct object_file_convert_state *state); +void convert_object_file_end(struct object_file_convert_state *state, int ret); + /* * Convert an object file from one hash algorithm to another algorithm. * Return -1 on failure, 0 on success. From patchwork Fri Sep 8 23:10:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. Biederman" X-Patchwork-Id: 13377903 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A3A0BEEB570 for ; Fri, 8 Sep 2023 23:12:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344745AbjIHXMi (ORCPT ); Fri, 8 Sep 2023 19:12:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37722 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344757AbjIHXMg (ORCPT ); Fri, 8 Sep 2023 19:12:36 -0400 Received: from out02.mta.xmission.com (out02.mta.xmission.com [166.70.13.232]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 765FBE45 for ; Fri, 8 Sep 2023 16:12:27 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:39866) by out02.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekeI-006MyQ-MU; Fri, 08 Sep 2023 17:12:26 -0600 Received: from 
ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekeH-009u13-FR; Fri, 08 Sep 2023 17:12:26 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. Biederman" Date: Fri, 8 Sep 2023 18:10:44 -0500 Message-Id: <20230908231049.2035003-27-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekeH-009u13-FR;;;mid=<20230908231049.2035003-27-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX19AlIWH+EqBgyrJW6jAEONR7qdP73lVEBg= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 27/32] builtin/fast-import: compute compatibility hashs for imported objects X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When the code is in dual hash mode, for every object fast-import creates, compute the standard oid and its compatibility mapping. The compatibility mapping is stored in struct pack_idx_entry so that it can be used when an index is created. For fast-import the code needs to be careful, because when a new object only refers to other newly created objects the compatibility mapping for those new objects is not stored anywhere permanently. So have the code first look for the compatibility oid in the newly created objects, and then look for the compatibility oid in the standard mapping tables. As fast-import requires objects to be specified before the objects that reference them, nothing special needs to happen to deal with out of order objects. Signed-off-by: "Eric W. 
Biederman" --- builtin/fast-import.c | 89 +++++++++++++++++++++++++++++++++++++------ 1 file changed, 77 insertions(+), 12 deletions(-) diff --git a/builtin/fast-import.c b/builtin/fast-import.c index 2c645fcfbe3f..f1c250dd3c8f 100644 --- a/builtin/fast-import.c +++ b/builtin/fast-import.c @@ -26,6 +26,8 @@ #include "commit-reach.h" #include "khash.h" #include "date.h" +#include "object-file-convert.h" +#include "pack-compat-map.h" #define PACK_ID_BITS 16 #define MAX_PACK_ID ((1<hash); + tmp.index_name = write_idx_file(NULL, idx, object_count, &pack_idx_opts, + pack_data->hash); + tmp.compat_name = write_compat_map_file(NULL, idx, object_count, + pack_data->hash); free(idx); - return tmpfile; + return tmp; } -static char *keep_pack(const char *curr_index_name) +static char *keep_pack(struct pack_index_names curr) { static const char *keep_msg = "fast-import"; struct strbuf name = STRBUF_INIT; @@ -818,9 +827,17 @@ static char *keep_pack(const char *curr_index_name) die("cannot store pack file"); odb_pack_name(&name, pack_data->hash, "idx"); - if (finalize_object_file(curr_index_name, name.buf)) + if (finalize_object_file(curr.index_name, name.buf)) die("cannot store index file"); - free((void *)curr_index_name); + + if (curr.compat_name) { + odb_pack_name(&name, pack_data->hash, "compat"); + if (finalize_object_file(curr.compat_name, name.buf)) + die("cannot store compatibility map file"); + } + + free((void *)curr.index_name); + free((void *)curr.compat_name); return strbuf_detach(&name, NULL); } @@ -943,6 +960,8 @@ static int store_object( struct object_id *oidout, uintmax_t mark) { + struct repository *repo = the_repository; + const struct git_hash_algo *compat = repo->compat_hash_algo; void *out, *delta; struct object_entry *e; unsigned char hdr[96]; @@ -966,8 +985,7 @@ static int store_object( if (e->idx.offset) { duplicate_count_by_type[type]++; return 1; - } else if (find_sha1_pack(oid.hash, - get_all_packs(the_repository))) { + } else if 
(find_sha1_pack(oid.hash, get_all_packs(repo))) { e->type = type; e->pack_id = MAX_PACK_ID; e->idx.offset = 1; /* just not zero! */ @@ -1026,6 +1044,42 @@ static int store_object( e->type = type; e->pack_id = pack_id; e->idx.offset = pack_size; + if (compat && (type == OBJ_BLOB)) { + compat->init_fn(&c); + compat->update_fn(&c, hdr, hdrlen); + compat->update_fn(&c, dat->buf, dat->len); + compat->final_oid_fn(&e->idx.compat_oid, &c); + } else if (compat) { + struct object_file_convert_state state; + struct strbuf out = STRBUF_INIT; + int ret; + + convert_object_file_begin(&state, &out, the_hash_algo, compat, + dat->buf, dat->len, type); + for (;;) { + struct object_entry *pobj; + + ret = convert_object_file_step(&state); + if (ret != 1) + break; + + ret = -1; + pobj = find_object(&state.oid); + if (pobj && pobj->idx.compat_oid.algo) + oidcpy(&state.mapped_oid, &pobj->idx.compat_oid); + else if (pobj) + break; + else if (repo_oid_to_algop(repo, &state.oid, compat, + &state.mapped_oid)) + break; + } + convert_object_file_end(&state, ret); + if (ret) + die(_("No mapping for %s to %s\n"), + oid_to_hex(&state.oid), compat->name); + hash_object_file(compat, out.buf, out.len, type, &e->idx.compat_oid); + strbuf_release(&out); + } object_count++; object_count_by_type[type]++; @@ -1084,14 +1138,15 @@ static void truncate_pack(struct hashfile_checkpoint *checkpoint) static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark) { + const struct git_hash_algo *compat = the_repository->compat_hash_algo; size_t in_sz = 64 * 1024, out_sz = 64 * 1024; unsigned char *in_buf = xmalloc(in_sz); unsigned char *out_buf = xmalloc(out_sz); struct object_entry *e; - struct object_id oid; + struct object_id oid, compat_oid; unsigned long hdrlen; off_t offset; - git_hash_ctx c; + git_hash_ctx c, compat_c; git_zstream s; struct hashfile_checkpoint checkpoint; int status = Z_OK; @@ -1109,6 +1164,10 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark) 
the_hash_algo->init_fn(&c); the_hash_algo->update_fn(&c, out_buf, hdrlen); + if (compat) { + compat->init_fn(&compat_c); + compat->update_fn(&compat_c, out_buf, hdrlen); + } crc32_begin(pack_file); @@ -1127,6 +1186,8 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark) die("EOF in data (%" PRIuMAX " bytes remaining)", len); the_hash_algo->update_fn(&c, in_buf, n); + if (compat) + compat->update_fn(&compat_c, in_buf, n); s.next_in = in_buf; s.avail_in = n; len -= n; @@ -1153,6 +1214,8 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark) } git_deflate_end(&s); the_hash_algo->final_oid_fn(&oid, &c); + if (compat) + compat->final_oid_fn(&compat_oid, &compat_c); if (oidout) oidcpy(oidout, &oid); @@ -1180,6 +1243,8 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark) e->pack_id = pack_id; e->idx.offset = offset; e->idx.crc32 = crc32_end(pack_file); + if (compat) + oidcpy(&e->idx.compat_oid, &compat_oid); object_count++; object_count_by_type[OBJ_BLOB]++; } From patchwork Fri Sep 8 23:10:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. 
Biederman" X-Patchwork-Id: 13377917 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9BCD8EEB571 for ; Fri, 8 Sep 2023 23:31:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245288AbjIHXbW (ORCPT ); Fri, 8 Sep 2023 19:31:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59348 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237902AbjIHXbV (ORCPT ); Fri, 8 Sep 2023 19:31:21 -0400 Received: from out03.mta.xmission.com (out03.mta.xmission.com [166.70.13.233]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 430B9E46 for ; Fri, 8 Sep 2023 16:31:14 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:37576) by out03.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekeK-00FHSi-Ov; Fri, 08 Sep 2023 17:12:28 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekeJ-009u13-Os; Fri, 08 Sep 2023 17:12:28 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. 
Biederman" Date: Fri, 8 Sep 2023 18:10:45 -0500 Message-Id: <20230908231049.2035003-28-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekeJ-009u13-Os;;;mid=<20230908231049.2035003-28-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX1/tfzKy/AuVG0IDHhV1dEGvNMzsi2+Edfs= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 28/32] builtin/index-pack: Add a simple oid index X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org To support computing the compatibility hash, a way to look up objects by their oid is needed. This adds a simple hash table to enable looking up objects by their oid. The implementation is inspired by the hash table for looking up object_entries by their oid in struct packing_data, which is implemented in pack-objects.c. Signed-off-by: "Eric W. 
Biederman" --- builtin/index-pack.c | 68 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 67 insertions(+), 1 deletion(-) diff --git a/builtin/index-pack.c b/builtin/index-pack.c index 006ffdc9c550..75c2113e455c 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -126,6 +126,9 @@ static int ref_deltas_alloc; static int nr_resolved_deltas; static int nr_threads; +static int32_t *oid_index; +static uint32_t oid_index_size; + static int from_stdin; static int strict; static int do_fsck_object; @@ -183,6 +186,62 @@ static inline void unlock_mutex(pthread_mutex_t *mutex) pthread_mutex_unlock(mutex); } +static uint32_t locate_oid_index(const struct object_id *oid, int *found) +{ + uint32_t i, mask = (oid_index_size - 1); + + i = oidhash(oid) & mask; + + while (oid_index[i] > 0) { + uint32_t pos = oid_index[i] - 1; + + if (oideq(oid, &objects[pos].idx.oid)) { + *found = 1; + return i; + } + + i = (i + 1) & mask; + } + + *found = 0; + return i; +} + +static void place_in_oid_index(struct object_entry *obj) +{ + int found; + uint32_t pos = locate_oid_index(&obj->idx.oid, &found); + + /* Ignore duplicates */ + if (found) + return; + + oid_index[pos] = (obj - objects) + 1; +} + +static struct object_entry *find_in_oid_index(struct object_id *oid) +{ + uint32_t i; + int found; + + i = locate_oid_index(oid, &found); + if (!found) + return NULL; + + return &objects[oid_index[i] - 1]; +} + +static inline uint32_t closest_pow2(uint32_t v) +{ + v = v - 1; + v |= v >> 1; + v |= v >> 2; + v |= v >> 4; + v |= v >> 8; + v |= v >> 16; + return v + 1; +} + /* * Mutex and conditional variable can't be statically-initialized on Windows. 
*/ @@ -987,6 +1046,7 @@ static struct base_data *resolve_delta(struct object_entry *delta_obj, bad_object(delta_obj->idx.offset, _("failed to apply delta")); hash_object_file(the_hash_algo, result_data, result_size, delta_obj->real_type, &delta_obj->idx.oid); + place_in_oid_index(delta_obj); sha1_object(result_data, NULL, result_size, delta_obj->real_type, &delta_obj->idx.oid); @@ -1188,12 +1248,16 @@ static void parse_pack_objects(unsigned char *hash) ref_deltas[nr_ref_deltas].obj_no = i; nr_ref_deltas++; } else if (!data) { + place_in_oid_index(obj); + /* large blobs, check later */ obj->real_type = OBJ_BAD; nr_delays++; - } else + } else { + place_in_oid_index(obj); sha1_object(data, NULL, obj->size, obj->type, &obj->idx.oid); + } free(data); display_progress(progress, i+1); } @@ -1918,6 +1982,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix) if (show_stat) CALLOC_ARRAY(obj_stat, st_add(nr_objects, 1)); CALLOC_ARRAY(ofs_deltas, nr_objects); + oid_index_size = closest_pow2(nr_objects * 3); + CALLOC_ARRAY(oid_index, oid_index_size); parse_pack_objects(pack_hash); if (report_end_of_input) write_in_full(2, "\0", 1); From patchwork Fri Sep 8 23:10:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. 
Biederman" X-Patchwork-Id: 13377904 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1BD51EEB56E for ; Fri, 8 Sep 2023 23:12:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344761AbjIHXMj (ORCPT ); Fri, 8 Sep 2023 19:12:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37710 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344752AbjIHXMi (ORCPT ); Fri, 8 Sep 2023 19:12:38 -0400 X-Greylist: delayed 66 seconds by postgrey-1.37 at lindbergh.monkeyblade.net; Fri, 08 Sep 2023 16:12:32 PDT Received: from out01.mta.xmission.com (out01.mta.xmission.com [166.70.13.231]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 90825210B for ; Fri, 8 Sep 2023 16:12:32 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:35036) by out01.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekeN-007R4u-IF; Fri, 08 Sep 2023 17:12:31 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekeL-009u13-Qw; Fri, 08 Sep 2023 17:12:31 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. 
Biederman" Date: Fri, 8 Sep 2023 18:10:46 -0500 Message-Id: <20230908231049.2035003-29-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekeL-009u13-Qw;;;mid=<20230908231049.2035003-29-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX1+RIArhl2iJZ4XaH6HK0Y0jJdmrgsXanEE= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 29/32] builtin/index-pack: Compute the compatibility hash X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When a pack is received encoded with the same algorithm as our repository, it is necessary to compute its hash values and create its indexes. That is the job of "git index-pack". To compute the primary hash values of the objects, the objects must be loaded in memory. With the objects loaded into memory, this is the perfect time to also compute the compatibility hash values of the objects, as loading the objects into memory is the primary cost of that operation. This is limited by the fact that to compute the compatibility hash for tree objects, commit objects, and tag objects, the objects need to be encoded into their compatibility form, which requires replacing references to objects encoded with the primary hash with references to the same objects encoded with the compatibility hash. This means that before the compatibility hash for a tree object, commit object, or tag object can be computed, the compatibility hash for all objects to which they refer must be computed first. In general this requires an extra pass so that the dependencies between objects can be resolved. Signed-off-by: "Eric W. 
Biederman" --- builtin/index-pack.c | 335 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 328 insertions(+), 7 deletions(-) diff --git a/builtin/index-pack.c b/builtin/index-pack.c index 75c2113e455c..f5da671ed82d 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -18,12 +18,15 @@ #include "thread-utils.h" #include "packfile.h" #include "pack-revindex.h" +#include "pack-compat-map.h" #include "object-file.h" #include "object-store-ll.h" #include "oid-array.h" #include "replace-object.h" #include "promisor-remote.h" #include "setup.h" +#include "strbuf.h" +#include "object-file-convert.h" static const char index_pack_usage[] = "git index-pack [-v] [-o ] [--keep | --keep=] [--[no-]rev-index] [--verify] [--strict] ( | --stdin [--fix-thin] [])"; @@ -124,6 +127,8 @@ static int nr_ofs_deltas; static int nr_ref_deltas; static int ref_deltas_alloc; static int nr_resolved_deltas; +static int nr_pending_mappings; +static int nr_resolved_mappings; static int nr_threads; static int32_t *oid_index; @@ -505,28 +510,76 @@ static void prune_base_data(struct base_data *retain) } } +static int compat_hash_object_file(const void *buf, size_t len, enum object_type type, + struct object_id *oid) +{ + struct repository *repo = the_repository; + const struct git_hash_algo *algo = repo->hash_algo; + const struct git_hash_algo *compat = repo->compat_hash_algo; + struct object_file_convert_state state; + struct strbuf out = STRBUF_INIT; + int ret; + + convert_object_file_begin(&state, &out, algo, compat, + buf, len, type); + for (;;) { + struct object_entry *pobj; + ret = convert_object_file_step(&state); + if (ret != 1) + break; + + pobj = find_in_oid_index(&state.oid); + + ret = -1; + if (pobj && pobj->idx.compat_oid.algo) + oidcpy(&state.mapped_oid, &pobj->idx.compat_oid); + else if (pobj) + break; + else if (repo_oid_to_algop(repo, &state.oid, compat, + &state.mapped_oid)) + break; + } + convert_object_file_end(&state, ret); + if (ret == 0) { + 
hash_object_file(compat, out.buf, out.len, type, oid); + strbuf_release(&out); + } + return ret; +} + static int is_delta_type(enum object_type type) { return (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA); } static void *unpack_entry_data(off_t offset, unsigned long size, - enum object_type type, struct object_id *oid) + enum object_type type, struct object_id *oid, + struct object_id *compat_oid) { + const struct git_hash_algo *compat = the_repository->compat_hash_algo; static char fixed_buf[8192]; int status; git_zstream stream; void *buf; - git_hash_ctx c; + git_hash_ctx c, compat_c; char hdr[32]; int hdrlen; + if (!compat) + compat_oid = NULL; + if (!is_delta_type(type)) { hdrlen = format_object_header(hdr, sizeof(hdr), type, size); the_hash_algo->init_fn(&c); the_hash_algo->update_fn(&c, hdr, hdrlen); - } else + if (compat_oid && (type == OBJ_BLOB)) { + compat->init_fn(&compat_c); + compat->update_fn(&compat_c, hdr, hdrlen); + } + } else { oid = NULL; + compat_oid = NULL; + } if (type == OBJ_BLOB && size > big_file_threshold) buf = fixed_buf; else @@ -545,6 +598,8 @@ static void *unpack_entry_data(off_t offset, unsigned long size, use(input_len - stream.avail_in); if (oid) the_hash_algo->update_fn(&c, last_out, stream.next_out - last_out); + if (compat_oid && (type == OBJ_BLOB)) + compat->update_fn(&compat_c, last_out, stream.next_out - last_out); if (buf == fixed_buf) { stream.next_out = buf; stream.avail_out = sizeof(fixed_buf); @@ -555,13 +610,20 @@ static void *unpack_entry_data(off_t offset, unsigned long size, git_inflate_end(&stream); if (oid) the_hash_algo->final_oid_fn(oid, &c); + if (compat_oid && (type == OBJ_BLOB)) + compat->final_oid_fn(compat_oid, &compat_c); + else if (compat_oid && + compat_hash_object_file(buf, size, type, compat_oid)) { + nr_pending_mappings++; + } return buf == fixed_buf ? 
NULL : buf; } static void *unpack_raw_entry(struct object_entry *obj, off_t *ofs_offset, struct object_id *ref_oid, - struct object_id *oid) + struct object_id *oid, + struct object_id *compat_oid) { unsigned char *p; unsigned long size, c; @@ -620,7 +682,8 @@ static void *unpack_raw_entry(struct object_entry *obj, } obj->hdr_size = consumed_bytes - obj->idx.offset; - data = unpack_entry_data(obj->idx.offset, obj->size, obj->type, oid); + data = unpack_entry_data(obj->idx.offset, obj->size, obj->type, oid, + compat_oid); obj->idx.crc32 = input_crc32; return data; } @@ -1023,9 +1086,11 @@ static struct base_data *make_base(struct object_entry *obj, static struct base_data *resolve_delta(struct object_entry *delta_obj, struct base_data *base) { + const struct git_hash_algo *compat = the_repository->compat_hash_algo; void *delta_data, *result_data; struct base_data *result; unsigned long result_size; + int pending_map = 0; if (show_stat) { int i = delta_obj - objects; @@ -1046,6 +1111,16 @@ static struct base_data *resolve_delta(struct object_entry *delta_obj, bad_object(delta_obj->idx.offset, _("failed to apply delta")); hash_object_file(the_hash_algo, result_data, result_size, delta_obj->real_type, &delta_obj->idx.oid); + if (compat && (delta_obj->real_type == OBJ_BLOB)) + hash_object_file(compat, result_data, result_size, + delta_obj->real_type, &delta_obj->idx.compat_oid); + else if (compat && + compat_hash_object_file(result_data, result_size, + delta_obj->real_type, + &delta_obj->idx.compat_oid)) { + pending_map = 1; + } + place_in_oid_index(delta_obj); sha1_object(result_data, NULL, result_size, delta_obj->real_type, &delta_obj->idx.oid); @@ -1056,6 +1131,8 @@ static struct base_data *resolve_delta(struct object_entry *delta_obj, counter_lock(); nr_resolved_deltas++; + if (pending_map) + nr_pending_mappings++; counter_unlock(); return result; @@ -1236,7 +1313,8 @@ static void parse_pack_objects(unsigned char *hash) struct object_entry *obj = &objects[i]; void 
*data = unpack_raw_entry(obj, &ofs_delta->offset, &ref_delta_oid, - &obj->idx.oid); + &obj->idx.oid, + &obj->idx.compat_oid); obj->real_type = obj->type; if (obj->type == OBJ_OFS_DELTA) { nr_ofs_deltas++; @@ -1578,6 +1656,7 @@ static void rename_tmp_packfile(const char **final_name, static void final(const char *final_pack_name, const char *curr_pack_name, const char *final_index_name, const char *curr_index_name, const char *final_rev_index_name, const char *curr_rev_index_name, + const char *final_compat_index_name, const char *curr_compat_index_name, const char *keep_msg, const char *promisor_msg, unsigned char *hash) { @@ -1585,6 +1664,7 @@ static void final(const char *final_pack_name, const char *curr_pack_name, struct strbuf pack_name = STRBUF_INIT; struct strbuf index_name = STRBUF_INIT; struct strbuf rev_index_name = STRBUF_INIT; + struct strbuf compat_index_name = STRBUF_INIT; int err; if (!from_stdin) { @@ -1608,6 +1688,9 @@ static void final(const char *final_pack_name, const char *curr_pack_name, if (curr_rev_index_name) rename_tmp_packfile(&final_rev_index_name, curr_rev_index_name, &rev_index_name, hash, "rev", 1); + if (curr_compat_index_name) + rename_tmp_packfile(&final_compat_index_name, curr_compat_index_name, + &compat_index_name, hash, "compat", 1); rename_tmp_packfile(&final_index_name, curr_index_name, &index_name, hash, "idx", 1); @@ -1640,6 +1723,7 @@ static void final(const char *final_pack_name, const char *curr_pack_name, } } + strbuf_release(&compat_index_name); strbuf_release(&rev_index_name); strbuf_release(&index_name); strbuf_release(&pack_name); @@ -1789,16 +1873,236 @@ static void show_pack_info(int stat_only) free(chain_histogram); } +static int compare_ofs_delta_entry_obj_no(const void *a, const void *b) +{ + const struct ofs_delta_entry *delta_a = a; + const struct ofs_delta_entry *delta_b = b; + + return delta_a->obj_no < delta_b->obj_no ? -1 : + delta_a->obj_no > delta_b->obj_no ? 
1 : + 0; +} + +static int compare_ref_delta_entry_obj_no(const void *a, const void *b) +{ + const struct ref_delta_entry *delta_a = a; + const struct ref_delta_entry *delta_b = b; + + return delta_a->obj_no < delta_b->obj_no ? -1 : + delta_a->obj_no > delta_b->obj_no ? 1 : + 0; +} + +static struct ofs_delta_entry *find_ofs_delta_obj_no(int obj_no) +{ + int first = 0, last = nr_ofs_deltas; + + while (first < last) { + int next = first + (last - first) / 2; + struct ofs_delta_entry *entry = &ofs_deltas[next]; + + if (obj_no == entry->obj_no) + return entry; + if (obj_no < entry->obj_no) { + last = next; + continue; + } + first = next + 1; + } + return NULL; +} + +static struct ref_delta_entry *find_ref_delta_obj_no(int obj_no) +{ + int first = 0, last = nr_ref_deltas; + + while (first < last) { + int next = first + (last - first) / 2; + struct ref_delta_entry *entry = &ref_deltas[next]; + + if (obj_no == entry->obj_no) + return entry; + if (obj_no < entry->obj_no) { + last = next; + continue; + } + first = next + 1; + } + return NULL; +} + +static struct object_entry *find_obj_offset(off_t offset) +{ + int first = 0, last = nr_objects; + + while (first < last) { + int next = first + (last - first) / 2; + struct object_entry *entry = &objects[next]; + + if (offset == entry->idx.offset) + return entry; + if (offset < entry->idx.offset) { + last = next; + continue; + } + first = next + 1; + } + return NULL; +} + +static void *get_object_data(struct object_entry *obj, size_t *result_size) +{ + /* Allow random reading objects */ + void *data; + + if (!is_delta_type(obj->type)) { + data = get_data_from_pack(obj); + *result_size = obj->size; + return data; + } + if (obj->type == OBJ_OFS_DELTA) { + struct ofs_delta_entry *delta; + struct object_entry *bobj; + size_t base_size; + void *base, *raw; + + delta = find_ofs_delta_obj_no(obj - objects); + if (!delta) + BUG("Delta object without ofs_delta entry"); + + bobj = find_obj_offset(delta->offset); + if (!bobj) + BUG("Delta 
object without object entry"); + + base = get_object_data(bobj, &base_size); + raw = get_data_from_pack(obj); + data = patch_delta( + base, base_size, + raw, obj->size, + result_size); + if (!data) + BUG("patch_delta failed"); + free(raw); + free(base); + return data; + } + if (obj->type == OBJ_REF_DELTA) { + struct ref_delta_entry *delta; + enum object_type base_type; + size_t base_size; + void *base, *raw; + + delta = find_ref_delta_obj_no(obj - objects); + if (!delta) + BUG("Delta object without ref_delta entry"); + + base = repo_read_object_file(the_repository, &delta->oid, + &base_type, &base_size); + if (!base) + BUG("ref_delta oid %s not present in repository", + oid_to_hex(&delta->oid)); + raw = get_data_from_pack(obj); + data = patch_delta( + base, base_size, + raw, obj->size, + result_size); + if (!data) + BUG("patch_delta failed"); + free(raw); + free(base); + return data; + } + return NULL; /* The code never reaches here */ +} + +static void compute_compat_oid(struct object_entry *obj) +{ + struct repository *repo = the_repository; + const struct git_hash_algo *algo = repo->hash_algo; + const struct git_hash_algo *compat = repo->compat_hash_algo; + struct object_file_convert_state state; + struct strbuf out = STRBUF_INIT; + size_t data_size; + void *data; + int ret; + + if (obj->idx.compat_oid.algo) + return; + + if (obj->real_type == OBJ_BLOB) + die("Blob object not converted"); + + data = get_object_data(obj, &data_size); + + convert_object_file_begin(&state, &out, algo, compat, + data, data_size, obj->real_type); + + for (;;) { + struct object_entry *pobj; + ret = convert_object_file_step(&state); + if (ret != 1) + break; + /* Does it name an object in the pack? 
*/ + pobj = find_in_oid_index(&state.oid); + if (pobj) { + compute_compat_oid(pobj); + oidcpy(&state.mapped_oid, &pobj->idx.compat_oid); + } else if (repo_oid_to_algop(repo, &state.oid, compat, + &state.mapped_oid)) + die(_("No mapping for oid %s to %s\n"), + oid_to_hex(&state.oid), compat->name); + } + convert_object_file_end(&state, ret); + if (ret != 0) + die(_("Bad object %s\n"), oid_to_hex(&obj->idx.oid)); + hash_object_file(compat, out.buf, out.len, obj->real_type, + &obj->idx.compat_oid); + strbuf_release(&out); + + free(data); + + nr_resolved_mappings++; + display_progress(progress, nr_resolved_mappings); +} + +static void compute_compat_oids(void) +{ + unsigned i; + + if (verbose) + progress = start_progress(_("Mapping objects"), + nr_pending_mappings); + + /* Sort deltas by obj_no for fast searching */ + QSORT(ofs_deltas, nr_ofs_deltas, compare_ofs_delta_entry_obj_no); + QSORT(ref_deltas, nr_ref_deltas, compare_ref_delta_entry_obj_no); + + for (i = 0; i < nr_objects; i++) { + struct object_entry *obj = &objects[i]; + if (obj->idx.compat_oid.algo) + continue; + if (is_delta_type(obj->real_type)) + continue; + compute_compat_oid(obj); + } + + stop_progress(&progress); +} + int cmd_index_pack(int argc, const char **argv, const char *prefix) { + const struct git_hash_algo *compat = the_repository->compat_hash_algo; int i, fix_thin_pack = 0, verify = 0, stat_only = 0, rev_index; const char *curr_index; const char *curr_rev_index = NULL; + const char *curr_compat_index = NULL; const char *index_name = NULL, *pack_name = NULL, *rev_index_name = NULL; + const char *compat_index_name = NULL; const char *keep_msg = NULL; const char *promisor_msg = NULL; struct strbuf index_name_buf = STRBUF_INIT; struct strbuf rev_index_name_buf = STRBUF_INIT; + struct strbuf compat_index_name_buf = STRBUF_INIT; struct pack_idx_entry **idx_objects; struct pack_idx_option opts; unsigned char pack_hash[GIT_MAX_RAWSZ]; @@ -1946,6 +2250,12 @@ int cmd_index_pack(int argc, const char 
**argv, const char *prefix) "idx", "rev", &rev_index_name_buf); } + if (compat) { + if (index_name) + compat_index_name = derive_filename(index_name, + "idx", "compat", + &compat_index_name_buf); + } if (verify) { if (!index_name) @@ -1989,6 +2299,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix) write_in_full(2, "\0", 1); resolve_deltas(); conclude_pack(fix_thin_pack, curr_pack, pack_hash); + if (compat) + compute_compat_oids(); free(ofs_deltas); free(ref_deltas); if (strict) @@ -1999,18 +2311,24 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix) ALLOC_ARRAY(idx_objects, nr_objects); for (i = 0; i < nr_objects; i++) - idx_objects[i] = &objects[i].idx; + idx_objects[i] = (struct pack_idx_entry *)&objects[i].idx; curr_index = write_idx_file(index_name, idx_objects, nr_objects, &opts, pack_hash); if (rev_index) curr_rev_index = write_rev_file(rev_index_name, idx_objects, nr_objects, pack_hash, opts.flags); + + if (compat) + curr_compat_index = write_compat_map_file( + (opts.flags & WRITE_IDX_VERIFY) ? compat_index_name : NULL, + idx_objects, nr_objects, pack_hash); free(idx_objects); if (!verify) final(pack_name, curr_pack, index_name, curr_index, rev_index_name, curr_rev_index, + compat_index_name, curr_compat_index, keep_msg, promisor_msg, pack_hash); else @@ -2023,12 +2341,15 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix) free(objects); strbuf_release(&index_name_buf); strbuf_release(&rev_index_name_buf); + strbuf_release(&compat_index_name_buf); if (!pack_name) free((void *) curr_pack); if (!index_name) free((void *) curr_index); if (!rev_index_name) free((void *) curr_rev_index); + if (!compat_index_name) + free((void *) curr_compat_index); /* * Let the caller know this pack is not self contained From patchwork Fri Sep 8 23:10:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. 
Biederman" X-Patchwork-Id: 13377916 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 879C8EEB571 for ; Fri, 8 Sep 2023 23:31:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344310AbjIHXbS (ORCPT ); Fri, 8 Sep 2023 19:31:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59206 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243773AbjIHXbQ (ORCPT ); Fri, 8 Sep 2023 19:31:16 -0400 Received: from out03.mta.xmission.com (out03.mta.xmission.com [166.70.13.233]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5F74EE46 for ; Fri, 8 Sep 2023 16:31:06 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:37636) by out03.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekeP-00FHT2-Pq; Fri, 08 Sep 2023 17:12:33 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekeO-009u13-KK; Fri, 08 Sep 2023 17:12:33 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. 
Biederman" Date: Fri, 8 Sep 2023 18:10:47 -0500 Message-Id: <20230908231049.2035003-30-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekeO-009u13-KK;;;mid=<20230908231049.2035003-30-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX19R7ltaLW1xXtFciYsER75FXlQa1q2Njt0= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 30/32] builtin/index-pack: Make the stack in compute_compat_oid explicit X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Testing index-pack generating the compatibility hashes on a large repository (in this case the Linux kernel) resulted in a stack overflow. I confirmed this by using ulimit -s to force the stack to a much larger size and rerunning the test, after which the code succeeded. Still, it is not a good look to overflow the stack in the default configuration. Ideally the objects would be ordered such that no object has any references to any object that comes after it. With such an ordering, convert_object_file followed by hash_object_file could just be run on every object in order to compute the compatibility hashes for every object. Unfortunately the work to compute such an order is roughly equivalent to the depth-first processing compute_compat_oid is doing. The objects have to be loaded to get which other objects they reference. Knowing which objects reference which others is necessary to compute such an order. 
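To make the depth-first processing above concrete, here is a minimal sketch of resolving every node only after the nodes it refers to, using a heap-allocated stack in place of the C call stack. It is an illustration under stated assumptions, not the code in this patch: struct node, struct frame, push and resolve are hypothetical names, each node refers to at most two others, and the reference graph is assumed to be acyclic.

```c
#include <assert.h>
#include <stdlib.h>

#define NREFS 2

/* Hypothetical stand-in for a pack object that refers to other objects. */
struct node {
	int refs[NREFS];  /* indexes of referenced nodes, -1 when unused */
	int resolved;     /* nonzero once value is valid */
	unsigned value;   /* stand-in for the compatibility hash */
};

/* One frame of the explicit stack; lives on the heap, not the C stack. */
struct frame {
	struct frame *prev;
	int node;         /* index of the node being processed */
	int next_ref;     /* next reference of that node to examine */
	unsigned acc;     /* accumulates the values of resolved references */
};

static struct frame *push(struct frame *prev, int node)
{
	struct frame *f = malloc(sizeof(*f));

	assert(f);
	f->prev = prev;
	f->node = node;
	f->next_ref = 0;
	f->acc = 17;
	return f;
}

/* Resolve start and everything it refers to, without recursion. */
static void resolve(struct node *nodes, int start)
{
	struct frame *top;

	if (nodes[start].resolved)
		return;
	top = push(NULL, start);

	while (top) {
		struct node *n = &nodes[top->node];
		struct frame *prev;

		if (top->next_ref < NREFS) {
			int r = n->refs[top->next_ref];

			if (r >= 0 && !nodes[r].resolved) {
				/* Dependency first; this reference is retried later. */
				top = push(top, r);
				continue;
			}
			if (r >= 0)
				top->acc = top->acc * 31 + nodes[r].value;
			top->next_ref++;
			continue;
		}

		/* Every reference is resolved: finish this node and pop. */
		n->value = top->acc + (unsigned)top->node;
		n->resolved = 1;
		prev = top->prev;
		free(top);
		top = prev;
	}
}
```

With this shape, a chain of a hundred thousand nodes that each refer to the next costs one small heap allocation per level, so the depth of the chain no longer depends on the process stack limit.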
Long story short, I can see how to move the depth-first traversal into a topological sort, but that just moves the problem that caused the deep recursion into another function, and makes everything more expensive by requiring reading the objects yet another time. Avoid stack overflow by using an explicit stack made of heap-allocated objects instead of using the C call stack. To get a feel for how much this explicit stack consumes, I instrumented the code and tested against a 2.16GiB Linux kernel packfile. This packfile had 9,033,248 objects, and 7,470,317 deltas. There were 6,543,758 mappings that could not be computed opportunistically when the data was first read. In the function compute_compat_oid I measured a maximum cco stack depth of 66,415. I measured a maximum memory consumption of 103,783,520 bytes, or about 1563 bytes per level of the stack. In short, call it 100MiB extra to compute the mappings in a 2GiB packfile. Signed-off-by: "Eric W. Biederman" --- builtin/index-pack.c | 106 +++++++++++++++++++++++++++++-------------- 1 file changed, 71 insertions(+), 35 deletions(-) diff --git a/builtin/index-pack.c b/builtin/index-pack.c index f5da671ed82d..6827d14b91ce 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -2015,54 +2015,90 @@ static void *get_object_data(struct object_entry *obj, size_t *result_size) return NULL; /* The code never reaches here */ } -static void compute_compat_oid(struct object_entry *obj) -{ - struct repository *repo = the_repository; - const struct git_hash_algo *algo = repo->hash_algo; - const struct git_hash_algo *compat = repo->compat_hash_algo; +struct cco { + struct cco *prev; + struct object_entry *obj; struct object_file_convert_state state; - struct strbuf out = STRBUF_INIT; + struct strbuf out; size_t data_size; void *data; - int ret; +}; - if (obj->idx.compat_oid.algo) - return; +static struct cco *cco_push(struct cco *prev, struct object_entry *obj) +{ + struct repository *repo = the_repository; + const 
struct git_hash_algo *algo = repo->hash_algo; + const struct git_hash_algo *compat = repo->compat_hash_algo; + struct cco *cco; if (obj->real_type == OBJ_BLOB) - die("Blob object not converted"); + BUG("Blob object not converted"); - data = get_object_data(obj, &data_size); + cco = xmallocz(sizeof(*cco)); + cco->prev = prev; + cco->obj = obj; + strbuf_init(&cco->out, 0); - convert_object_file_begin(&state, &out, algo, compat, - data, data_size, obj->real_type); + cco->data = get_object_data(obj, &cco->data_size); - for (;;) { - struct object_entry *pobj; - ret = convert_object_file_step(&state); - if (ret != 1) - break; - /* Does it name an object in the pack? */ - pobj = find_in_oid_index(&state.oid); - if (pobj) { - compute_compat_oid(pobj); - oidcpy(&state.mapped_oid, &pobj->idx.compat_oid); - } else if (repo_oid_to_algop(repo, &state.oid, compat, - &state.mapped_oid)) - die(_("No mapping for oid %s to %s\n"), - oid_to_hex(&state.oid), compat->name); - } - convert_object_file_end(&state, ret); - if (ret != 0) - die(_("Bad object %s\n"), oid_to_hex(&obj->idx.oid)); - hash_object_file(compat, out.buf, out.len, obj->real_type, - &obj->idx.compat_oid); - strbuf_release(&out); + convert_object_file_begin(&cco->state, &cco->out, algo, compat, + cco->data, cco->data_size, obj->real_type); + return cco; +} - free(data); +static struct cco *cco_pop(struct cco *cco, int ret) +{ + struct repository *repo = the_repository; + const struct git_hash_algo *compat = repo->compat_hash_algo; + struct cco *prev = cco->prev; + + convert_object_file_end(&cco->state, ret); + if (ret != 0) + die(_("Bad object %s\n"), oid_to_hex(&cco->obj->idx.oid)); + hash_object_file(compat, cco->out.buf, cco->out.len, + cco->obj->real_type, &cco->obj->idx.compat_oid); + strbuf_release(&cco->out); + if (prev) + oidcpy(&prev->state.mapped_oid, &cco->obj->idx.compat_oid); nr_resolved_mappings++; display_progress(progress, nr_resolved_mappings); + + free(cco->data); + free(cco); + + return prev; +} + 
+static void compute_compat_oid(struct object_entry *obj) +{ + struct repository *repo = the_repository; + const struct git_hash_algo *compat = repo->compat_hash_algo; + struct cco *cco; + + cco = cco_push(NULL, obj); + for (;cco;) { + struct object_entry *pobj; + + int ret = convert_object_file_step(&cco->state); + if (ret != 1) { + cco = cco_pop(cco, ret); + continue; + } + + /* Does it name an object in the pack? */ + pobj = find_in_oid_index(&cco->state.oid); + if (pobj && pobj->idx.compat_oid.algo) + oidcpy(&cco->state.mapped_oid, &pobj->idx.compat_oid); + else if (pobj) + cco = cco_push(cco, pobj); + else if (repo_oid_to_algop(repo, &cco->state.oid, compat, + &cco->state.mapped_oid)) + die(_("When converting %s no mapping for oid %s to %s\n"), + oid_to_hex(&cco->obj->idx.oid), + oid_to_hex(&cco->state.oid), + compat->name); + } } static void compute_compat_oids(void) From patchwork Fri Sep 8 23:10:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. 
Biederman" X-Patchwork-Id: 13377905 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6FEADEEB570 for ; Fri, 8 Sep 2023 23:12:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344767AbjIHXMr (ORCPT ); Fri, 8 Sep 2023 19:12:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33036 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344775AbjIHXMn (ORCPT ); Fri, 8 Sep 2023 19:12:43 -0400 Received: from out02.mta.xmission.com (out02.mta.xmission.com [166.70.13.232]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AEB78E46 for ; Fri, 8 Sep 2023 16:12:36 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:39970) by out02.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekeR-006Mz5-Rm; Fri, 08 Sep 2023 17:12:35 -0600 Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekeQ-009u13-SE; Fri, 08 Sep 2023 17:12:35 -0600 From: "Eric W. Biederman" To: git@vger.kernel.org Cc: Junio C Hamano , "brian m. carlson" , "Eric W. 
Biederman" Date: Fri, 8 Sep 2023 18:10:48 -0500 Message-Id: <20230908231049.2035003-31-ebiederm@xmission.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 X-XM-SPF: eid=1qekeQ-009u13-SE;;;mid=<20230908231049.2035003-31-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass X-XM-AID: U2FsdGVkX1/8pbldQrsM7Fw6wO8gSYSrsAIUxrJLnEM= X-SA-Exim-Connect-IP: 68.227.168.167 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH 31/32] unpack-objects: Update to compute and write the compatibility hashes X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org To properly generate the compatibility hash, objects that are referred to must be written before the objects that refer to them. When --strict is set, unpack-objects already writes objects in that order. When a compatibility hash is desired, force use of the same code path that --strict uses. If --strict is not wanted, don't actually fsck the object buffers, just use fsck_walk to walk to the parents of the objects recursively. Unlike in index-pack, nothing special needs to be done when an object is written. The guarantee that referred-to objects are written to the loose object store before their referrers ensures that the object mappings are in the loose object map. The object mappings being in the loose object map guarantees that the call to convert_object_file can find all of the mappings of the referred-to objects. Signed-off-by: "Eric W. 
Biederman" --- builtin/unpack-objects.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c index 32505255a009..834551142cd8 100644 --- a/builtin/unpack-objects.c +++ b/builtin/unpack-objects.c @@ -241,7 +241,8 @@ static int check_object(struct object *obj, enum object_type type, obj_buf = lookup_object_buffer(obj); if (!obj_buf) die("Whoops! Cannot find object '%s'", oid_to_hex(&obj->oid)); - if (fsck_object(obj, obj_buf->buffer, obj_buf->size, &fsck_options)) + if (strict && + fsck_object(obj, obj_buf->buffer, obj_buf->size, &fsck_options)) die("fsck error in packed object"); fsck_options.walk = check_object; if (fsck_walk(obj, NULL, &fsck_options)) @@ -270,7 +271,7 @@ static void added_object(unsigned nr, enum object_type type, static void write_object(unsigned nr, enum object_type type, void *buf, unsigned long size) { - if (!strict) { + if (!strict && !the_repository->compat_hash_algo) { if (write_object_file(buf, size, type, &obj_list[nr].oid) < 0) die("failed to write object"); @@ -409,7 +410,7 @@ static void stream_blob(unsigned long size, unsigned nr) die(_("inflate returned (%d)"), data.status); git_inflate_end(&zstream); - if (strict) { + if (strict || the_repository->compat_hash_algo) { struct blob *blob = lookup_blob(the_repository, &info->oid); if (!blob) @@ -670,11 +671,10 @@ int cmd_unpack_objects(int argc, const char **argv, const char *prefix UNUSED) unpack_all(); the_hash_algo->update_fn(&ctx, buffer, offset); the_hash_algo->final_oid_fn(&oid, &ctx); - if (strict) { + if (strict || the_repository->compat_hash_algo) write_rest(); - if (fsck_finish(&fsck_options)) - die(_("fsck error in pack objects")); - } + if (strict && fsck_finish(&fsck_options)) + die(_("fsck error in pack objects")); if (!hasheq(fill(the_hash_algo->rawsz), oid.hash)) die("final sha1 did not match"); use(the_hash_algo->rawsz); From patchwork Fri Sep 8 23:10:49 2023 Content-Type: text/plain; 
charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Eric W. Biederman"
X-Patchwork-Id: 13377915
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A76EBEEB571 for ; Fri, 8 Sep 2023 23:31:07 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238962AbjIHXbJ (ORCPT ); Fri, 8 Sep 2023 19:31:09 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33886 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236914AbjIHXbI (ORCPT ); Fri, 8 Sep 2023 19:31:08 -0400
Received: from out03.mta.xmission.com (out03.mta.xmission.com [166.70.13.233]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2FDE8212A for ; Fri, 8 Sep 2023 16:30:54 -0700 (PDT)
Received: from in02.mta.xmission.com ([166.70.13.52]:37694) by out03.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekeU-00FHTK-3m; Fri, 08 Sep 2023 17:12:38 -0600
Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:54328 helo=localhost.localdomain) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1qekeS-009u13-UB; Fri, 08 Sep 2023 17:12:37 -0600
From: "Eric W. Biederman"
To: git@vger.kernel.org
Cc: Junio C Hamano , "brian m. carlson" , "Eric W.
Biederman"
Date: Fri, 8 Sep 2023 18:10:49 -0500
Message-Id: <20230908231049.2035003-32-ebiederm@xmission.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org>
References: <87sf7ol0z3.fsf@email.froward.int.ebiederm.org>
MIME-Version: 1.0
X-XM-SPF: eid=1qekeS-009u13-UB;;;mid=<20230908231049.2035003-32-ebiederm@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass
X-XM-AID: U2FsdGVkX19I/XbGBI/Kx8ojoZqRVwW7X0RIbQiXenc=
X-SA-Exim-Connect-IP: 68.227.168.167
X-SA-Exim-Mail-From: ebiederm@xmission.com
Subject: [PATCH 32/32] object-file-convert: Implement repo_submodule_oid_to_algop
X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000)
X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com)
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org

From time to time git tree objects contain gitlinks. These gitlinks
contain the oid of an object in another git repository. To
successfully translate these oids it is necessary to look at the
mapping tables in the submodules where the mapping tables live.

Limiting myself to the submodule interfaces I can see in the code,
repo_submodule_oid_to_algop is the best I can figure out how to do
for a gitlink-agnostic implementation.

The big downsides are that the code as implemented is not thread-safe,
it depends upon a worktree, and it always walks through all of the
submodules.

There are interfaces in the code to look up the submodule for an
individual gitlink. As such, iterating over all of the submodules
could be avoided if care was taken to compute the path to the gitlink
and to recognize that the code is translating a gitlink.

I do not see a solution to the dependency on a worktree or to the
thread safety issues short of reworking how git deals with submodules.

For now repo_oid_to_algop does not call repo_submodule_oid_to_algop,
which allows callers to avoid the thread safety issues.
Update callers of repo_oid_to_algop that can benefit from a submodule
translation to also call repo_submodule_oid_to_algop.

Signed-off-by: "Eric W. Biederman"
---
 builtin/fast-import.c |  5 ++++-
 builtin/index-pack.c  |  4 +++-
 object-file-convert.c | 45 +++++++++++++++++++++++++++++++++++++++++++
 object-file-convert.h |  5 +++++
 4 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index f1c250dd3c8f..66c471bc730e 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -1070,7 +1070,10 @@ static int store_object(
 			else if (pobj)
 				break;
 			else if (repo_oid_to_algop(repo, &state.oid, compat,
-						   &state.mapped_oid))
+						   &state.mapped_oid) &&
+				 repo_submodule_oid_to_algop(repo, &state.oid,
+							     compat,
+							     &state.mapped_oid))
 				break;
 		}
 		convert_object_file_end(&state, ret);
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 6827d14b91ce..4100fd56a845 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -2093,7 +2093,9 @@ static void compute_compat_oid(struct object_entry *obj)
 			else if (pobj)
 				cco = cco_push(cco, pobj);
 			else if (repo_oid_to_algop(repo, &cco->state.oid, compat,
-						   &cco->state.mapped_oid))
+						   &cco->state.mapped_oid) &&
+				 repo_submodule_oid_to_algop(repo, &cco->state.oid, compat,
+							     &cco->state.mapped_oid))
 				die(_("When converting %s no mapping for oid %s to %s\n"),
 				    oid_to_hex(&cco->obj->idx.oid),
 				    oid_to_hex(&cco->state.oid),
diff --git a/object-file-convert.c b/object-file-convert.c
index 3fd080ebc112..2306e17dd57e 100644
--- a/object-file-convert.c
+++ b/object-file-convert.c
@@ -11,6 +11,45 @@
 #include "gpg-interface.h"
 #include "pack-compat-map.h"
 #include "object-file-convert.h"
+#include "read-cache.h"
+#include "submodule-config.h"
+
+int repo_submodule_oid_to_algop(struct repository *repo,
+				const struct object_id *src,
+				const struct git_hash_algo *to,
+				struct object_id *dest)
+{
+	int i;
+
+	if (repo_read_index(repo) < 0)
+		die(_("index file corrupt"));
+
+	for (i = 0; i <
repo->index->cache_nr; i++) {
+		const struct cache_entry *ce = repo->index->cache[i];
+		struct repository subrepo = {};
+		int ret;
+
+		if (!S_ISGITLINK(ce->ce_mode))
+			continue;
+
+		while (i + 1 < repo->index->cache_nr &&
+		       !strcmp(ce->name, repo->index->cache[i + 1]->name))
+			/*
+			 * Skip entries with the same name in different stages
+			 * to make sure an entry is returned only once.
+			 */
+			i++;
+
+		if (repo_submodule_init(&subrepo, repo, ce->name, null_oid()))
+			continue;
+
+		ret = repo_oid_to_algop(&subrepo, src, to, dest);
+		repo_clear(&subrepo);
+		if (ret == 0)
+			return 0;
+	}
+	return -1;
+}
 
 int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
 		      const struct git_hash_algo *to, struct object_id *dest)
@@ -34,6 +73,7 @@ int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
 		 */
 		if (!repo_packed_oid_to_algop(repo, src, to, dest))
 			return 0;
+
 		/*
 		 * We may have loaded the object map at repo initialization but
 		 * another process (perhaps upstream of a pipe from us) may have
@@ -306,6 +346,11 @@ int convert_object_file(struct strbuf *outbuf,
 			break;
 		ret = repo_oid_to_algop(the_repository, &state.oid, state.to,
 					&state.mapped_oid);
+		if (ret)
+			ret = repo_submodule_oid_to_algop(the_repository,
+							  &state.oid,
+							  state.to,
+							  &state.mapped_oid);
 		if (ret) {
 			error(_("failed to map %s entry for %s"),
 			      type_name(type), oid_to_hex(&state.oid));
diff --git a/object-file-convert.h b/object-file-convert.h
index da032d7a91ef..7a19feda5f0c 100644
--- a/object-file-convert.h
+++ b/object-file-convert.h
@@ -10,6 +10,11 @@ struct strbuf;
 int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
 		      const struct git_hash_algo *to, struct object_id *dest);
 
+int repo_submodule_oid_to_algop(struct repository *repo,
+				const struct object_id *src,
+				const struct git_hash_algo *to,
+				struct object_id *dest);
+
 struct object_file_convert_state {
 	struct strbuf *outbuf;
 	const struct git_hash_algo *from;