[v9,17/17] fsck: report invalid object type-path combinations

Improve the error that's emitted in cases where we find a loose object
that we can parse, but which isn't at the location we expect it to be.

Before this change we'd prefix the error with a not-a-OID derived from
the path at which the object was found, due to an emergent behavior in
how we'd end up with an "OID" in these codepaths.

Now we'll instead say what object we hashed, and what path it was
found at. Before this patch series e.g.:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ mv objects/e6/ objects/e7

Would emit ("[...]" used to abbreviate the OIDs):

    $ git fsck
    error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
    error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]

Now we'll instead emit:

    error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]
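
To illustrate the relationship between an OID and the loose object path
that the new message relies on, here is a minimal shell sketch. The
helpers oid_to_path and path_to_oid are hypothetical names made up for
this illustration, they are not part of git, and the real check lives in
the C code touched by this patch:

```shell
# Hypothetical helpers (not part of git): map between an OID and the
# loose-object path it is expected to be found at.
oid_to_path () {
	# The first two hex digits name the directory, the rest the file.
	printf 'objects/%s/%s\n' \
		"$(printf '%s' "$1" | cut -c1-2)" \
		"$(printf '%s' "$1" | cut -c3-)"
}

path_to_oid () {
	# Join the last two path components back into a would-be OID.
	printf '%s\n' "$1" |
	sed -e 's|.*/\([0-9a-f][0-9a-f]\)/\([0-9a-f]*\)$|\1\2|'
}

# The OID we hashed, and the path the object was moved to above.
real=e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
moved=./objects/e7/9de29bb2d1d6434b8b29ae775ad8c2e48c5391

if test "$(path_to_oid "$moved")" != "$real"
then
	echo "error: $real: hash-path mismatch, found at: $moved"
fi
```

The name derived from the moved path starts with "e79d", which is the
not-a-OID the old message was prefixed with.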

Furthermore, we'll do the right thing when both the object type and
its location are bad. I.e. this case:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ mv objects/83 objects/84

As noted in earlier commits we'd simply die early in those cases,
until preceding commits fixed the hard die on invalid object type:

    $ git fsck
    fatal: invalid object type

Now we'll instead emit sensible error messages:

    $ git fsck
    error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
    error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]
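
For reference, the OIDs in these examples follow from how git names an
object: the hash of a "<type> <size>" header, a NUL byte, and then the
content. A rough sketch, assuming a SHA-1 repository and that sha1sum
from GNU coreutils is available; empty_oid is a hypothetical helper, not
a git command:

```shell
# Hypothetical helper: OID of an empty object of the given type,
# i.e. the SHA-1 of "<type> 0" followed by a NUL byte.
empty_oid () {
	printf '%s 0\000' "$1" | sha1sum | cut -d' ' -f1
}

empty_oid blob      # what "git hash-object -t blob" hashes for empty input
empty_oid garbage   # same header layout, with the bogus type name
```

Since the type name is part of the hashed header, a bogus type still
yields a well-defined OID, which is why fsck can report one for it.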

In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.
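
The sentinel logic can be sketched as follows. This is only an
illustration of the decision described above, written in shell with a
hypothetical function name (report_loose_object), and is not the C code
in fsck.c or object-file.c:

```shell
null_oid=0000000000000000000000000000000000000000

# Usage: report_loose_object <oid-from-path> <real-oid> <path>
# <real-oid> is expected to still be the null OID if hashing never
# completed.
report_loose_object () {
	if test "$2" = "$null_oid"
	then
		# We never got far enough to know the real OID.
		echo "error: $1: object corrupt or missing: $3"
	elif test "$2" != "$1"
	then
		# We hashed the object, and it is not where it should be.
		echo "error: $2: hash-path mismatch, found at: $3"
	fi
}

report_loose_object \
	e79de29bb2d1d6434b8b29ae775ad8c2e48c5391 \
	e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 \
	./objects/e7/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Starting out with the null OID and only overwriting it once hashing has
completed lets the caller tell the two failure modes apart without a
separate status flag.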

We need to add the "object corrupt or missing" special-case to deal
with cases where read_loose_object() will return an error before
completing check_object_signature(), e.g. if we have an error in
unpack_loose_rest() because we find garbage after the valid gzip
content:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ chmod 755 objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ echo garbage >>objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ git fsck
    error: garbage at end of loose object 'e69d[...]'
    error: unable to unpack contents of ./objects/e6/9d[...]
    error: e69d[...]: object corrupt or missing: ./objects/e6/9d[...]

There is currently some weird messaging in the edge case when the two
are combined, i.e. because we're not explicitly passing along an error
state about this specific scenario from check_stream_oid() via
read_loose_object() we'll end up printing the null OID if an object is
of an unknown type *and* it can't be unpacked by zlib, e.g.:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ chmod 755 objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    $ echo garbage >>objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    $ /usr/bin/git fsck
    fatal: invalid object type
    $ ~/g/git/git fsck
    error: garbage at end of loose object '8315a83d2acc4c174aed59430f9a9c4ed926440f'
    error: unable to unpack contents of ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    error: 8315a83d2acc4c174aed59430f9a9c4ed926440f: object corrupt or missing: ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    error: 0000000000000000000000000000000000000000: object is of unknown type 'garbage': ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    [...]

I think it's OK to leave that for future improvements, which would
involve enum-ifying more error state as we've done with "enum
unpack_loose_header_result" in preceding commits. In these
increasingly obscure cases the worst that can happen is that we'll
get slightly nonsensical or inapplicable error messages.

There are other such potential edge cases, all of which might produce
some confusing messaging, but will still be handled correctly as far
as passing along errors goes. E.g. check_object_signature() may return
with oideq(real_oid, null_oid()) being true, which could happen if it
returns -1 due to the read_istream() call having failed.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fast-export.c |  2 +-
 builtin/fsck.c        | 23 +++++++++++++++--------
 builtin/index-pack.c  |  2 +-
 builtin/mktag.c       |  3 ++-
 cache.h               |  3 ++-
 object-file.c         | 21 ++++++++++-----------
 object-store.h        |  1 +
 object.c              |  4 ++--
 pack-check.c          |  3 ++-
 t/t1006-cat-file.sh   |  2 +-
 t/t1450-fsck.sh       |  8 +++++---
 11 files changed, 42 insertions(+), 30 deletions(-)

Message ID	patch-v9-17.17-8d926e41fc3-20210930T133300Z-avarab@gmail.com
State	Superseded
From	Ævar Arnfjörð Bjarmason <avarab@gmail.com>
To	git@vger.kernel.org
Cc	Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>, Jonathan Tan <jonathantanmy@google.com>, Andrei Rybak <rybak.a.v@gmail.com>, Taylor Blau <me@ttaylorr.com>, Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Subject	[PATCH v9 17/17] fsck: report invalid object type-path combinations
Date	Thu, 30 Sep 2021 15:37:22 +0200
Series	fsck: lib-ify object-file.c & better fsck "invalid object" error reporting

 [v9,00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting
 [v9,01/17] fsck tests: add test for fsck-ing an unknown type
 [v9,02/17] fsck tests: refactor one test to use a sub-repo
 [v9,03/17] fsck tests: test current hash/type mismatch behavior
 [v9,04/17] fsck tests: test for garbage appended to a loose object
 [v9,05/17] cat-file tests: move bogus_* variable declarations earlier
 [v9,06/17] cat-file tests: test for missing/bogus object with -t, -s and -p
 [v9,07/17] cat-file tests: add corrupt loose object test
 [v9,08/17] cat-file tests: test for current --allow-unknown-type behavior
 [v9,09/17] object-file.c: don't set "typep" when returning non-zero
 [v9,10/17] object-file.c: return -1, not "status" from unpack_loose_header()
 [v9,11/17] object-file.c: make parse_loose_header_extended() public
 [v9,12/17] object-file.c: simplify unpack_loose_short_header()
 [v9,13/17] object-file.c: use "enum" return type for unpack_loose_header()
 [v9,14/17] object-file.c: return ULHR_TOO_LONG on "header too long"
 [v9,15/17] object-file.c: stop dying in parse_loose_header()
 [v9,16/17] fsck: don't hard die on invalid object types
 [v9,17/17] fsck: report invalid object type-path combinations
