From patchwork Mon Apr 17 16:21:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13214337 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 89BD7C77B7A for ; Mon, 17 Apr 2023 16:21:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230015AbjDQQVu (ORCPT ); Mon, 17 Apr 2023 12:21:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42508 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229977AbjDQQVr (ORCPT ); Mon, 17 Apr 2023 12:21:47 -0400 Received: from mail-wm1-x32e.google.com (mail-wm1-x32e.google.com [IPv6:2a00:1450:4864:20::32e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 085AB4ECE for ; Mon, 17 Apr 2023 09:21:46 -0700 (PDT) Received: by mail-wm1-x32e.google.com with SMTP id 5b1f17b1804b1-3f0aabd1040so45820355e9.1 for ; Mon, 17 Apr 2023 09:21:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1681748504; x=1684340504; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=9/kM71VJN5RJKFSYvsOZB4R+xGsWdJ66FfZktqdO5m0=; b=pQ7rHxcOwm9fV1ZFcU7CLaDJ8MVd22ppcCTlMMBgECFG9Vf6KYcWJ/shZ/Ce/I+99v OnXF6Yj6k7PsWtuNa3gdVvpVRK4OqaFRkhl4hlfWzY0lssxKa49BLjLANVXrxI38QwWN DgO+QCBIHiAkCFc1SGZPIA/4y3L16a6XY2ryWzspv7fG7M9k+P/0exD4XWx+sSHbnI6h rBwgNeFmsaloUIXgE3ap7uEj2D9LUDVUNWHBMFU92lkfOanGDVvSAvyg1MYlWbj2dtpW Zrdpy5BgFRkSfTlYfta8GOkkcrhI6Z+z3y7lUUnuasfuCtCzCbFXrd4HbpPffMa6pj3g o16w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681748504; x=1684340504; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=9/kM71VJN5RJKFSYvsOZB4R+xGsWdJ66FfZktqdO5m0=; b=jTn1F/mO2r8AlPcls+6CmmLoaqDF1u5ROee53Mu+ED6FcPIldt41SR7w+ZhtojeLRx fLSUj19/bDXchIsd0RWGGSOxRf4yQKcBTcHLg4Q4/flv+i7zYirJJIhrxhfAu531Gnis hvv+Gp1JViAQMG/tfGAkRtaX7rdTzvZI6LO7mKpHZ85cYATMmHu8d3Fzk9mC2pe90Jar gN9Oz75XZnRMrmiltuXk5Kib7D6FNxrtAAN2XB6Z4FIsKACak7ZAAm4RfBcmq8txPByQ hMG7FSKXDIrFWYhYATgVdTMdmmemNoU0Pjmns0VbIDIVyhcu4/n+m3gMHRrMjuOtB8mf GAwA== X-Gm-Message-State: AAQBX9efyWmEWiWZAia5VlkydEmuHebSNNpZ+Xo8rIiFkfbTq2YL/Ij5 0JuZuY+XJLAA37L4lYXLTgUOXQByLP0= X-Google-Smtp-Source: AKy350Z9jrNygJb7q8mYO9iTTUryY6DEgmzA1y+UEL9dapYxlaIBdv4p+SEeVWqADjjvKJ2G46awUw== X-Received: by 2002:a05:6000:51:b0:2ef:21da:d611 with SMTP id k17-20020a056000005100b002ef21dad611mr6081240wrx.22.1681748504080; Mon, 17 Apr 2023 09:21:44 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id d17-20020a056000115100b002fa6929eb83sm3086551wrx.21.2023.04.17.09.21.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 17 Apr 2023 09:21:43 -0700 (PDT) Message-Id: <6bc3c56453ee2d0263210c233dbc946b5dbdcb4d.1681748502.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 17 Apr 2023 16:21:38 +0000 Subject: [PATCH 1/4] fsck: create scaffolding for rev-index checks Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The 'fsck' builtin checks many of Git's on-disk data structures, but does not currently validate the pack rev-index files (a .rev file to pair with a .pack and .idx file). Before doing a more-involved check process, create the scaffolding within builtin/fsck.c to have a new error type and add that error type when the API method verify_pack_revindex() returns an error. That method does nothing currently, but we will add checks to it in later changes. For now, check that 'git fsck' succeeds without any errors in the normal case. Future checks will be paired with tests that corrupt the .rev file appropriately. Signed-off-by: Derrick Stolee --- builtin/fsck.c | 30 ++++++++++++++++++++++++++++++ pack-revindex.c | 11 +++++++++++ pack-revindex.h | 8 ++++++++ t/t5325-reverse-index.sh | 14 ++++++++++++++ 4 files changed, 63 insertions(+) diff --git a/builtin/fsck.c b/builtin/fsck.c index 095b39d3980..2ab78129bde 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -24,6 +24,7 @@ #include "resolve-undo.h" #include "run-command.h" #include "worktree.h" +#include "pack-revindex.h" #define REACHABLE 0x0001 #define SEEN 0x0002 @@ -53,6 +54,7 @@ static int name_objects; #define ERROR_REFS 010 #define ERROR_COMMIT_GRAPH 020 #define ERROR_MULTI_PACK_INDEX 040 +#define ERROR_PACK_REV_INDEX 0100 static const char *describe_object(const struct object_id *oid) { @@ -856,6 +858,32 @@ static int mark_packed_for_connectivity(const struct object_id *oid, return 0; } +static int check_pack_rev_indexes(struct repository *r, int show_progress) +{ + struct progress *progress = NULL; + uint32_t pack_count = 0; + int res = 0; + + if (show_progress) { + for (struct packed_git *p = get_all_packs(the_repository); p; p = p->next) + pack_count++; + progress = start_delayed_progress("Verifying reverse pack-indexes", pack_count); + pack_count = 0; + } + + for (struct packed_git *p = get_all_packs(the_repository); p; p = p->next) { + if (!load_pack_revindex(the_repository, p) && + verify_pack_revindex(p)) { + error(_("invalid rev-index for pack '%s'"), p->pack_name); + res = ERROR_PACK_REV_INDEX; + } + display_progress(progress, ++pack_count); + } + stop_progress(&progress); + + return res; +} + static char const * const fsck_usage[] = { N_("git fsck [--tags] [--root] [--unreachable] [--cache] [--no-reflogs]\n" " [--[no-]full] [--strict] [--verbose] [--lost-found]\n" @@ -1019,6 +1047,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) free_worktrees(worktrees); } + errors_found |= check_pack_rev_indexes(the_repository, show_progress); + check_connectivity(); if (the_repository->settings.core_commit_graph) { diff --git a/pack-revindex.c b/pack-revindex.c index 29f5358b256..c3f2aaa3fea 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -301,6 +301,17 @@ int load_pack_revindex(struct repository *r, struct packed_git *p) return -1; } +/* + * verify_pack_revindex verifies that the on-disk rev-index for the given + * pack-file is the same that would be created if written from scratch. + * + * A negative number is returned on error. + */ +int verify_pack_revindex(struct packed_git *p) +{ + return 0; +} + int load_midx_revindex(struct multi_pack_index *m) { struct strbuf revindex_name = STRBUF_INIT; diff --git a/pack-revindex.h b/pack-revindex.h index 46e834064e1..c8861873b02 100644 --- a/pack-revindex.h +++ b/pack-revindex.h @@ -51,6 +51,14 @@ struct repository; */ int load_pack_revindex(struct repository *r, struct packed_git *p); +/* + * verify_pack_revindex verifies that the on-disk rev-index for the given + * pack-file is the same that would be created if written from scratch. + * + * A negative number is returned on error. + */ +int verify_pack_revindex(struct packed_git *p); + /* * load_midx_revindex loads the '.rev' file corresponding to the given * multi-pack index by mmap-ing it and assigning pointers in the diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh index 0548fce1aa6..206c412f50b 100755 --- a/t/t5325-reverse-index.sh +++ b/t/t5325-reverse-index.sh @@ -131,4 +131,18 @@ test_expect_success 'revindex in-memory vs on-disk' ' test_cmp on-disk in-core ) ' + +test_expect_success 'fsck succeeds on good rev-index' ' + test_when_finished rm -fr repo && + git init repo && + ( + cd repo && + + test_commit commit && + git -c pack.writeReverseIndex=true repack -ad && + git fsck 2>err && + test_must_be_empty err + ) +' + test_done From patchwork Mon Apr 17 16:21:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13214338 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 20438C77B76 for ; Mon, 17 Apr 2023 16:21:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230046AbjDQQVv (ORCPT ); Mon, 17 Apr 2023 12:21:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42522 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230002AbjDQQVs (ORCPT ); Mon, 17 Apr 2023 12:21:48 -0400 Received: from mail-wm1-x32c.google.com (mail-wm1-x32c.google.com [IPv6:2a00:1450:4864:20::32c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C7044524A for ; Mon, 17 Apr 2023 09:21:46 -0700 (PDT) Received: by mail-wm1-x32c.google.com with SMTP id hg25-20020a05600c539900b003f05a99a841so21323725wmb.3 for ; Mon, 17 Apr 2023 09:21:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1681748505; x=1684340505; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=NBAxlDoDIUCetTwSWTjui5SaY8g5aiwMN4pSeJeGQW4=; b=FkI467A8fRIlGFgUL2e4ld+mb61h34a66RawE3dNAIdLf1XbeBMPcMc2v3VbBbutUv kz/g7Tng1ZZmxo4DLlzpJKECCnyMR63RozwPrfe3bEpDclbICgD5fX7KgCDrSaA1NaOv QLYukrnDDP+Kew68N1RsZY+JJnqajPESjWD8osWuBj4l3BN6k/uPlX6XLDUPAD/UNA83 cRJj6s7fPXDFr+qAMp8F7I4c+SE7BerMEbhIxEKXb8ESKrYmo0pncjmmEgOubDfn0JtC tITuRGknB84YLmPx6qqx1BGMIM/CWrBWjw4kW4vn6XkD3kw4q5GYyCngXIX62YYkGbo8 GIrg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681748505; x=1684340505; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=NBAxlDoDIUCetTwSWTjui5SaY8g5aiwMN4pSeJeGQW4=; b=ENVaRBkn05EUqeQ+CMHS3pVkHKo6aRkNaGD/zgmxMSTzmjAKpJH19mqe8WMmR9dsIV 77Ak64bJ1GV1xN6inudkAKGLIpDS4p8hJgrv5XjEP81MLAz7CEj3hxdAoJIDaUzWuInI KVAaetvizdxqvPQHw/+j+Re36p2HMVImhFShQ4bjLo2kofdRfMdAgZdV0WqwvNmNnmQ4 37n+fsh6GfOZHXkisvmmPoCLXwUnFS7Bt+177h7kLuXUy7uSm1fJ50IPJijenuZ/MP/y FTllGodFt7I3W0XnDqDqL9CmXj6OJsiVCA03z9kJw03zU5CkZRsoROJ3sWP8RJPRmWH8 h3FA== X-Gm-Message-State: AAQBX9cg996MKhy7CNOTWgWEaLpvRyjzQJNuuwpPOiv2LYMQ55Ru/dtB vgRx35RTXvBqTv8P6wsIxwiJIiD3My0= X-Google-Smtp-Source: AKy350ab2+3gv18Dgg81uI9nPNFsXlBWXJBh7VsdadVM3iw4kFEn3IpVn/KnLuY7GrqS8I1aRg6lvQ== X-Received: by 2002:a05:600c:4688:b0:3f1:65d4:30f7 with SMTP id p8-20020a05600c468800b003f165d430f7mr6054265wmo.10.1681748505120; Mon, 17 Apr 2023 09:21:45 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id k16-20020adff290000000b002f53fa16239sm10851664wro.103.2023.04.17.09.21.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 17 Apr 2023 09:21:44 -0700 (PDT) Message-Id: <7db4ec3e327ed3695f4f5409cb2dc80c72688758.1681748502.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 17 Apr 2023 16:21:39 +0000 Subject: [PATCH 2/4] fsck: check rev-index checksums Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The previous change added calls to verify_pack_revindex() in builtin/fsck.c, but the implementation of the method was left empty. Add the first and most-obvious check to this method: checksum verification. While here, create a helper method in the test script that makes it easy to adjust the .rev file and check that 'git fsck' reports the correct error message. Signed-off-by: Derrick Stolee --- pack-revindex.c | 10 ++++++++++ t/t5325-reverse-index.sh | 40 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 50 insertions(+) diff --git a/pack-revindex.c b/pack-revindex.c index c3f2aaa3fea..007a806994f 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -5,6 +5,7 @@ #include "packfile.h" #include "config.h" #include "midx.h" +#include "csum-file.h" struct revindex_entry { off_t offset; @@ -309,6 +310,15 @@ int load_pack_revindex(struct repository *r, struct packed_git *p) */ int verify_pack_revindex(struct packed_git *p) { + /* Do not bother checking if not initialized. */ + if (!p->revindex_map) + return 0; + + if (!hashfile_checksum_valid((const unsigned char *)p->revindex_map, p->revindex_size)) { + error(_("invalid checksum")); + return -1; + } + return 0; } diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh index 206c412f50b..6b7c709a1f6 100755 --- a/t/t5325-reverse-index.sh +++ b/t/t5325-reverse-index.sh @@ -145,4 +145,44 @@ test_expect_success 'fsck succeeds on good rev-index' ' ) ' +test_expect_success 'set up rev-index corruption tests' ' + git init corrupt && + ( + cd corrupt && + + test_commit commit && + git -c pack.writeReverseIndex=true repack -ad && + + revfile=$(ls .git/objects/pack/pack-*.rev) && + chmod a+w $revfile && + cp $revfile $revfile.bak + ) +' + +corrupt_rev_and_verify () { + ( + pos="$1" && + value="$2" && + error="$3" && + + cd corrupt && + revfile=$(ls .git/objects/pack/pack-*.rev) && + + # Reset to original rev-file. + cp $revfile.bak $revfile && + + printf "$value" | dd of=$revfile bs=1 seek="$pos" conv=notrunc && + test_must_fail git fsck 2>err && + grep "$error" err + ) +} + +test_expect_success 'fsck catches invalid checksum' ' + revfile=$(ls corrupt/.git/objects/pack/pack-*.rev) && + orig_size=$(wc -c <$revfile) && + hashpos=$((orig_size - 10)) && + corrupt_rev_and_verify $hashpos bogus \ + "invalid checksum" +' + test_done From patchwork Mon Apr 17 16:21:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13214339 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 921FEC77B7A for ; Mon, 17 Apr 2023 16:22:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230109AbjDQQVz (ORCPT ); Mon, 17 Apr 2023 12:21:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42522 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229959AbjDQQVt (ORCPT ); Mon, 17 Apr 2023 12:21:49 -0400 Received: from mail-wm1-x329.google.com (mail-wm1-x329.google.com [IPv6:2a00:1450:4864:20::329]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C83107A9E for ; Mon, 17 Apr 2023 09:21:47 -0700 (PDT) Received: by mail-wm1-x329.google.com with SMTP id 5b1f17b1804b1-3f09b9ac51dso45789435e9.0 for ; Mon, 17 Apr 2023 09:21:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1681748506; x=1684340506; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=/I+TJb3UF85e7M46snjurYlU1bJrW9+bpGYaoMDA3Sc=; b=O4hluSS/zRNmU31X5clANad0dmsgvpEefM/wIzzcjNpPmuk0bzde9xg4md1dVdYu6n R+FmkQN0gtf6/CtGJ2AacWro2SfFKsYDb+9B+a8hv+rKmBujZ1/aNvmOTvLsZ4kk5vlL U7YWMB2Pu8kUoX0eEdsLsWKMLbrJTpyK38baD4xtD8J+d5Eo1d8a4yAANwb+JcxTTW69 BWZXIRLRUGqLrCpyJ+hSQEfRzl3NLkwVp/3jsZPzgmsZ1UadvtF483fN3JqKvSf7PEI7 tVgSAB7UbygyOMWOOssFUmfb1W98BbtpGogrB3U2lDKvQ/dZ43K5e4NyVk4tvI4Rvjvf tZ8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681748506; x=1684340506; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=/I+TJb3UF85e7M46snjurYlU1bJrW9+bpGYaoMDA3Sc=; b=fC+6nxek48quCLIzu7Pxc5rQR0Vd6wk1rmtWjyE7ZNQp7CSwuPHoZGRpqk6+B4EkUa WNFXsp0Vjwv5XMNY0hQC++BPCU/9qyT9Iqog4kmGxGfANNbi5BcDKC9HZh/iip/yrxJv lqamhdjVAcMJ9jm0CuatfpVZ1uKehp6v4/BcozDs9XQo0hf+b9q5+M/6ISVcTeePEtp3 kFpq19kkqltIK1qq8Xu/IkJ5Sb+DQuawc/9ORRp+oDR9Zq1ilN8Y86VuhPcOljCJe9KD 9JiERTntOOBv/HzsIY0JpU/h4z8SkasLCZKpkaU4ERwJ2L1EfTIXbNVk3yUGHOOq5U0g 1nQg== X-Gm-Message-State: AAQBX9c24pO6f0y1c6YpAATXZQDOycOOZaVNeE8XJf1JDID4fr2dFhka 2+DNBYTBlQuXqWYCGa4IpHhYaIKUQRc= X-Google-Smtp-Source: AKy350YQYPH+4Sogr9awf8IW4rXDQHEuM9v0Dp2WJcU3abMahySXevHopKc2+Ud99RdjyKIgaiDfmA== X-Received: by 2002:adf:eec8:0:b0:2ef:afe0:727a with SMTP id a8-20020adfeec8000000b002efafe0727amr5438253wrp.12.1681748505971; Mon, 17 Apr 2023 09:21:45 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id y14-20020adffa4e000000b002f5fbc6ffb2sm10732490wrr.23.2023.04.17.09.21.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 17 Apr 2023 09:21:45 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Mon, 17 Apr 2023 16:21:40 +0000 Subject: [PATCH 3/4] fsck: check rev-index position values Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee When checking a rev-index file, it may be helpful to identify exactly which positions are incorrect. Compare the rev-index to a freshly-computed in-memory rev-index and report the comparison failures. This additional check (on top of the checksum validation) can help find files that were corrupt by a single bit flip on-disk or perhaps were written incorrectly due to a bug in Git. Signed-off-by: Derrick Stolee --- pack-revindex.c | 25 +++++++++++++++++++++---- t/t5325-reverse-index.sh | 5 +++++ 2 files changed, 26 insertions(+), 4 deletions(-) diff --git a/pack-revindex.c b/pack-revindex.c index 007a806994f..62a9846470c 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -310,16 +310,33 @@ int load_pack_revindex(struct repository *r, struct packed_git *p) */ int verify_pack_revindex(struct packed_git *p) { + int res = 0; + /* Do not bother checking if not initialized. */ - if (!p->revindex_map) - return 0; + if (!p->revindex_map || !p->revindex_data) + return res; if (!hashfile_checksum_valid((const unsigned char *)p->revindex_map, p->revindex_size)) { error(_("invalid checksum")); - return -1; + res = -1; } - return 0; + /* This may fail due to a broken .idx. */ + if (create_pack_revindex_in_memory(p)) + return res; + + for (size_t i = 0; i < p->num_objects; i++) { + uint32_t nr = p->revindex[i].nr; + uint32_t rev_val = get_be32(p->revindex_data + i); + + if (nr != rev_val) { + error(_("invalid rev-index position at %"PRIu64": %"PRIu32" != %"PRIu32""), + (uint64_t)i, nr, rev_val); + res = -1; + } + } + + return res; } int load_midx_revindex(struct multi_pack_index *m) diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh index 6b7c709a1f6..5c3c80f88f0 100755 --- a/t/t5325-reverse-index.sh +++ b/t/t5325-reverse-index.sh @@ -185,4 +185,9 @@ test_expect_success 'fsck catches invalid checksum' ' "invalid checksum" ' +test_expect_success 'fsck catches invalid row position' ' + corrupt_rev_and_verify 14 "\07" \ + "invalid rev-index position" +' + test_done From patchwork Mon Apr 17 16:21:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13214340 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 20739C77B76 for ; Mon, 17 Apr 2023 16:22:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230019AbjDQQWI (ORCPT ); Mon, 17 Apr 2023 12:22:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42630 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230035AbjDQQVv (ORCPT ); Mon, 17 Apr 2023 12:21:51 -0400 Received: from mail-wm1-x32d.google.com (mail-wm1-x32d.google.com [IPv6:2a00:1450:4864:20::32d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AB29DAD11 for ; Mon, 17 Apr 2023 09:21:48 -0700 (PDT) Received: by mail-wm1-x32d.google.com with SMTP id n9-20020a05600c4f8900b003f05f617f3cso20109958wmq.2 for ; Mon, 17 Apr 2023 09:21:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1681748507; x=1684340507; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=QhyBznFkIqvGexVnZFVC2f2hnlMta/niQqiKb/ru0HM=; b=o9nSnc3sptsZ3U4PBLB2tY5J5V9ywnzGo4DOVU1kkCsmZgSoeYAp8PmmJgO6VjVcXK zo0RUvzAv+0VMTo5DY0Mxho2EVH/h4hC1eaQriDrQNVoMhaQn5wv+7UYJ+xo2i7UW/tP rUq1RwaOZX9XAhPXtQ/Qr+U5LpoAyRedcTWfyNOB89uwCAdf/AXfmfiwYSjm4uL0E278 VDVakRUn7GJD6tLcgLpaDSj6CrrEPAMnV9qJFDZXrbqvco/5/XkEtcrun7D61jIvoqWV /fx+UN5tBq8BVmTtm+SnU+JzECheLOhpUVXVRUw6PvWGbtJJjL4XxAky3MFSY2V6R904 zCXg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681748507; x=1684340507; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=QhyBznFkIqvGexVnZFVC2f2hnlMta/niQqiKb/ru0HM=; b=PBF5qTWRle3C2jEUab4Iv+xykGSLXC55x2AS8TcDiVuCgMvAzezrTbgAqzYTq0Rt+a 9I8ANDL9+mmc9+WCEPqwZxrZCkv/VI+7opkFX+eZjlvcDjBuEJRJEjdf4bn0plHAZXge 06zvX+cZ7UCMvfGBojMA1Gon9TepHl+UHYDPIga+eTWsBniOKcY0TyrVUn6yfKCyj28Z 9y+0jeU8tPyPgPiQl1sWD9anVGAunK9WlfdAyq6qf8VcTHiECYMv/Mdbu+q4LUMpvIQ4 bg/+3UtjDrcNXQsHppFpPywO9E87gc6MPVU8yt6XHthYPA/qJkV6uccJen5GMHAiQ9BV Dn7Q== X-Gm-Message-State: AAQBX9dlz47ORa/IZFD0xURb+kk5G9fhCbqAETlhFqt8gSw3LK3au3XG 1nDnr3Lh25HSja/Jm+h/Tf3JYrXc4NM= X-Google-Smtp-Source: AKy350ZjCWAlrdHGGZj+yjD8uY9ZMTPiPepigw66M77IjecGvluxc4RfI9T62X4A6pa+fE/JyCqZ0g== X-Received: by 2002:a7b:cb07:0:b0:3f0:5519:9049 with SMTP id u7-20020a7bcb07000000b003f055199049mr11463983wmj.8.1681748506837; Mon, 17 Apr 2023 09:21:46 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id j4-20020a5d5644000000b002f02df4c7a3sm10831162wrw.30.2023.04.17.09.21.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 17 Apr 2023 09:21:46 -0700 (PDT) Message-Id: <7d894d859ef109091b86bc4e2e4f6cea0e808370.1681748502.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 17 Apr 2023 16:21:41 +0000 Subject: [PATCH 4/4] fsck: validate .rev file header Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee While parsing a .rev file, we check the header information to be sure it makes sense. This happens before doing any additional validation such as a checksum or value check. In order to differentiate between a bad header and a non-existent file, we need to update the API for loading a reverse index. Make load_pack_revindex_from_disk() non-static and specify that a positive value means "the file does not exist" while other errors during parsing are negative values. Since an invalid header prevents setting up the structures we would use for further validations, we can stop at that point. The place where we can distinguish between a missing file and a corrupt file is inside load_revindex_from_disk(), which is used both by pack rev-indexes and multi-pack-index rev-indexes. Some tests in t5326 demonstrate that it is critical to take some conditions to allow positive error signals. Add tests that check the three header values. Signed-off-by: Derrick Stolee --- builtin/fsck.c | 10 ++++++++-- pack-bitmap.c | 4 ++-- pack-revindex.c | 5 +++-- pack-revindex.h | 8 ++++++++ t/t5325-reverse-index.sh | 15 +++++++++++++++ 5 files changed, 36 insertions(+), 6 deletions(-) diff --git a/builtin/fsck.c b/builtin/fsck.c index 2ab78129bde..2414190c049 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -872,8 +872,14 @@ static int check_pack_rev_indexes(struct repository *r, int show_progress) } for (struct packed_git *p = get_all_packs(the_repository); p; p = p->next) { - if (!load_pack_revindex(the_repository, p) && - verify_pack_revindex(p)) { + int load_error = load_pack_revindex_from_disk(p); + + if (load_error < 0) { + error(_("unable to load rev-index for pack '%s'"), p->pack_name); + res = ERROR_PACK_REV_INDEX; + } else if (!load_error && + !load_pack_revindex(the_repository, p) && + verify_pack_revindex(p)) { error(_("invalid rev-index for pack '%s'"), p->pack_name); res = ERROR_PACK_REV_INDEX; } diff --git a/pack-bitmap.c b/pack-bitmap.c index 38b35c48237..3828aab612a 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -379,7 +379,7 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git, goto cleanup; } - if (load_midx_revindex(bitmap_git->midx) < 0) { + if (load_midx_revindex(bitmap_git->midx)) { warning(_("multi-pack bitmap is missing required reverse index")); goto cleanup; } @@ -2140,7 +2140,7 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git, if (!bitmap_is_midx(bitmap_git)) load_reverse_index(r, bitmap_git); - else if (load_midx_revindex(bitmap_git->midx) < 0) + else if (load_midx_revindex(bitmap_git->midx)) BUG("rebuild_existing_bitmaps: missing required rev-cache " "extension"); diff --git a/pack-revindex.c b/pack-revindex.c index 62a9846470c..146334e2c96 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -212,7 +212,8 @@ static int load_revindex_from_disk(char *revindex_name, fd = git_open(revindex_name); if (fd < 0) { - ret = -1; + /* "No file" means return 1. */ + ret = 1; goto cleanup; } if (fstat(fd, &st)) { @@ -264,7 +265,7 @@ cleanup: return ret; } -static int load_pack_revindex_from_disk(struct packed_git *p) +int load_pack_revindex_from_disk(struct packed_git *p) { char *revindex_name; int ret; diff --git a/pack-revindex.h b/pack-revindex.h index c8861873b02..6dd47efea10 100644 --- a/pack-revindex.h +++ b/pack-revindex.h @@ -51,6 +51,14 @@ struct repository; */ int load_pack_revindex(struct repository *r, struct packed_git *p); +/* + * Specifically load a pack revindex from disk. + * + * Returns 0 on success, 1 on "no .rev file", and -1 when there is an + * error parsing the .rev file. + */ +int load_pack_revindex_from_disk(struct packed_git *p); + /* * verify_pack_revindex verifies that the on-disk rev-index for the given * pack-file is the same that would be created if written from scratch. diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh index 5c3c80f88f0..431a603ca0e 100755 --- a/t/t5325-reverse-index.sh +++ b/t/t5325-reverse-index.sh @@ -190,4 +190,19 @@ test_expect_success 'fsck catches invalid row position' ' "invalid rev-index position" ' +test_expect_success 'fsck catches invalid header: magic number' ' + corrupt_rev_and_verify 1 "\07" \ + "reverse-index file .* has unknown signature" +' + +test_expect_success 'fsck catches invalid header: version' ' + corrupt_rev_and_verify 7 "\02" \ + "reverse-index file .* has unsupported version" +' + +test_expect_success 'fsck catches invalid header: hash function' ' + corrupt_rev_and_verify 11 "\03" \ + "reverse-index file .* has unsupported hash id" +' + test_done