From patchwork Wed Oct 27 07:49:08 2021
X-Patchwork-Submitter: Matt Cooper
X-Patchwork-Id: 12586537
Message-Id: <449eb5c205e139e21b619c4eb975afc3d47427f3.1635320952.git.gitgitgadget@gmail.com>
Date: Wed, 27 Oct 2021 07:49:08 +0000
Subject: [PATCH 1/5] t1051: introduce a smudge filter test for extremely large files
To: git@vger.kernel.org
Cc: Johannes Schindelin, Matt Cooper
From: Matt Cooper

The filter system allows for alterations to file contents when they're
added to the database or workdir. ("Smudge" when moving to the workdir;
"clean" when moving to the database.) This is used natively to handle CRLF
to LF conversions. It's also employed by Git-LFS to replace large files
from the workdir with small tracking files in the repo and vice versa.

Git pulls the entire smudged file into memory. While this is inefficient,
there's a more insidious problem on some platforms due to inconsistency
between using unsigned long and size_t for the same type of data (size of
a file in bytes). On most 64-bit platforms, unsigned long is 64 bits, and
size_t is typedef'd to unsigned long. On Windows, however, unsigned long is
only 32 bits (and therefore on 64-bit Windows, size_t is typedef'd to
unsigned long long in order to be 64 bits).

Practically speaking, this means 64-bit Windows users of Git-LFS can't
handle files larger than 2^32 bytes. Other 64-bit platforms don't suffer
this limitation.

This commit introduces a test exposing the issue; future commits make it
pass. The test simulates the way Git-LFS works by having a tiny file
checked into the repository and expanding it to a huge file on checkout.

Helped-by: Johannes Schindelin
Signed-off-by: Matt Cooper
Signed-off-by: Johannes Schindelin
---
 t/t1051-large-conversion.sh | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/t/t1051-large-conversion.sh b/t/t1051-large-conversion.sh
index 8b7640b3ba8..684ba5bd0a5 100755
--- a/t/t1051-large-conversion.sh
+++ b/t/t1051-large-conversion.sh
@@ -83,4 +83,16 @@ test_expect_success 'ident converts on output' '
 	test_cmp small.clean large.clean
 '
 
+# This smudge filter prepends 5GB of zeros to the file it checks out. This
+# ensures that smudging doesn't mangle large files on 64-bit Windows.
+test_expect_failure EXPENSIVE,!LONG_IS_64BIT 'files over 4GB convert on output' '
+	test_commit test small "a small file" &&
+	test_config filter.makelarge.smudge "dd if=/dev/zero bs=$((1024*1024)) count=$((5*1024)) && cat" &&
+	echo "small filter=makelarge" >.gitattributes &&
+	rm small &&
+	git checkout -- small &&
+	size=$(test_file_size small) &&
+	test "$size" -ge $((5 * 1024 * 1024 * 1024))
+'
+
 test_done

From patchwork Wed Oct 27 07:49:09 2021
X-Patchwork-Submitter: Matt Cooper
X-Patchwork-Id: 12586539
Message-Id: <5b9d149ba23be06c70262128616e80f45a053a84.1635320952.git.gitgitgadget@gmail.com>
Date: Wed, 27 Oct 2021 07:49:09 +0000
Subject: [PATCH 2/5] odb: teach read_blob_entry to use size_t
To: git@vger.kernel.org
Cc: Johannes Schindelin, Matt Cooper
From: Matt Cooper

There is mixed use of size_t and unsigned long to deal with sizes in the
codebase. Recall that Windows defines unsigned long as 32 bits even on
64-bit platforms, meaning that converting size_t to unsigned long narrows
the range. This mostly doesn't cause a problem since Git rarely deals with
files larger than 2^32 bytes.

But adjunct systems such as Git LFS, which use smudge/clean filters to
keep huge files out of the repository, may have huge file contents passed
through some of the functions in entry.c and convert.c. On Windows, this
results in a truncated file being written to the workdir. I traced this to
one specific use of unsigned long in write_entry (and a similar instance
in write_pc_item_to_fd for parallel checkout). That appeared to be for
the call to read_blob_entry, which expects a pointer to unsigned long.

By altering the signature of read_blob_entry to expect a pointer to
size_t, write_entry can be switched to use size_t internally (which all
of its callers and most of its callees already used). To avoid touching
dozens of additional files, read_blob_entry uses a local unsigned long to
call a chain of functions which aren't prepared to accept size_t.

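For illustration only (not part of this patch), here is a minimal sketch
of the narrowing described above. It assumes uint32_t as a stand-in for
the 32-bit unsigned long of 64-bit Windows, so that it behaves the same
on every platform; win_ulong, narrow_like_windows_ulong and is_truncated
are hypothetical names that do not exist in Git.

```c
#include <stdint.h>

/*
 * Assumption: a hypothetical stand-in for a 32-bit `unsigned long`,
 * as found on 64-bit Windows. Not a type used by Git.
 */
typedef uint32_t win_ulong;

/* Converting a 64-bit size to a 32-bit type keeps only the low 32 bits. */
static win_ulong narrow_like_windows_ulong(uint64_t size)
{
	return (win_ulong)size;
}

/* Returns nonzero if `size` does not survive the narrowing unchanged. */
static int is_truncated(uint64_t size)
{
	return (uint64_t)narrow_like_windows_ulong(size) != size;
}
```

With these definitions, a size of 5 GiB (5 * 2^30 bytes) narrows to
2^30 bytes, because 5 * 2^30 mod 2^32 = 2^30. Any size up to 2^32 - 1
passes through unchanged.
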
Helped-by: Johannes Schindelin
Signed-off-by: Matt Cooper
Signed-off-by: Johannes Schindelin
---
 entry.c                     | 8 +++++---
 entry.h                     | 2 +-
 parallel-checkout.c         | 2 +-
 t/t1051-large-conversion.sh | 2 +-
 4 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/entry.c b/entry.c
index 711ee0693c7..4cb3942dbdc 100644
--- a/entry.c
+++ b/entry.c
@@ -82,11 +82,13 @@ static int create_file(const char *path, unsigned int mode)
 	return open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
 }
 
-void *read_blob_entry(const struct cache_entry *ce, unsigned long *size)
+void *read_blob_entry(const struct cache_entry *ce, size_t *size)
 {
 	enum object_type type;
-	void *blob_data = read_object_file(&ce->oid, &type, size);
+	unsigned long ul;
+	void *blob_data = read_object_file(&ce->oid, &type, &ul);
 
+	*size = ul;
 	if (blob_data) {
 		if (type == OBJ_BLOB)
 			return blob_data;
@@ -270,7 +272,7 @@ static int write_entry(struct cache_entry *ce, char *path, struct conv_attrs *ca
 	int fd, ret, fstat_done = 0;
 	char *new_blob;
 	struct strbuf buf = STRBUF_INIT;
-	unsigned long size;
+	size_t size;
 	ssize_t wrote;
 	size_t newsize = 0;
 	struct stat st;
diff --git a/entry.h b/entry.h
index b8c0e170dc7..61ee8c17604 100644
--- a/entry.h
+++ b/entry.h
@@ -51,7 +51,7 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
  */
 void unlink_entry(const struct cache_entry *ce);
 
-void *read_blob_entry(const struct cache_entry *ce, unsigned long *size);
+void *read_blob_entry(const struct cache_entry *ce, size_t *size);
 int fstat_checkout_output(int fd, const struct checkout *state, struct stat *st);
 void update_ce_after_write(const struct checkout *state, struct cache_entry *ce,
 			   struct stat *st);
diff --git a/parallel-checkout.c b/parallel-checkout.c
index 6b1af32bb3d..b6f4a25642e 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -261,7 +261,7 @@ static int write_pc_item_to_fd(struct parallel_checkout_item *pc_item, int fd,
 	struct stream_filter *filter;
 	struct strbuf buf = STRBUF_INIT;
 	char *blob;
-	unsigned long size;
+	size_t size;
 	ssize_t wrote;
 
 	/* Sanity check */
diff --git a/t/t1051-large-conversion.sh b/t/t1051-large-conversion.sh
index 684ba5bd0a5..38aa0d8a075 100755
--- a/t/t1051-large-conversion.sh
+++ b/t/t1051-large-conversion.sh
@@ -85,7 +85,7 @@ test_expect_success 'ident converts on output' '
 
 # This smudge filter prepends 5GB of zeros to the file it checks out. This
 # ensures that smudging doesn't mangle large files on 64-bit Windows.
-test_expect_failure EXPENSIVE,!LONG_IS_64BIT 'files over 4GB convert on output' '
+test_expect_success EXPENSIVE,!LONG_IS_64BIT 'files over 4GB convert on output' '
 	test_commit test small "a small file" &&
 	test_config filter.makelarge.smudge "dd if=/dev/zero bs=$((1024*1024)) count=$((5*1024)) && cat" &&
 	echo "small filter=makelarge" >.gitattributes &&

From patchwork Wed Oct 27 07:49:10 2021
X-Patchwork-Submitter: Johannes Schindelin
X-Patchwork-Id: 12586541
Date: Wed, 27 Oct 2021 07:49:10 +0000
Subject: [PATCH 3/5] git-compat-util: introduce more size_t helpers
To: git@vger.kernel.org
Cc: Johannes Schindelin, Johannes Schindelin
From: Johannes Schindelin

Add st_left_shift() and cast_size_t_to_ulong(), together with the
unsigned_left_shift_overflows() macro. We will use them in the next
commit.

Signed-off-by: Johannes Schindelin
Signed-off-by: Matt Cooper
Signed-off-by: Johannes Schindelin
---
 git-compat-util.h | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/git-compat-util.h b/git-compat-util.h
index a508dbe5a35..7977720655c 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -113,6 +113,14 @@
 #define unsigned_mult_overflows(a, b) \
     ((a) && (b) > maximum_unsigned_value_of_type(a) / (a))
 
+/*
+ * Returns true if the left shift of "a" by "shift" bits will
+ * overflow. The type of "a" must be unsigned.
+ * Note that this macro evaluates "a" twice!
+ */
+#define unsigned_left_shift_overflows(a, shift) \
+    ((a) > maximum_unsigned_value_of_type(a) >> (shift))
+
 #ifdef __GNUC__
 #define TYPEOF(x) (__typeof__(x))
 #else
@@ -859,6 +867,23 @@ static inline size_t st_sub(size_t a, size_t b)
 	return a - b;
 }
 
+static inline size_t st_left_shift(size_t a, unsigned shift)
+{
+	if (unsigned_left_shift_overflows(a, shift))
+		die("size_t overflow: %"PRIuMAX" << %u",
+		    (uintmax_t)a, shift);
+	return a << shift;
+}
+
+static inline unsigned long cast_size_t_to_ulong(size_t a)
+{
+	if (a != (unsigned long)a)
+		die("object too large to read on this platform: %"
+		    PRIuMAX" is cut off to %lu",
+		    (uintmax_t)a, (unsigned long)a);
+	return (unsigned long)a;
+}
+
 #ifdef HAVE_ALLOCA_H
 # include <alloca.h>
 # define xalloca(size)      (alloca(size))

From patchwork Wed Oct 27 07:49:11 2021
X-Patchwork-Submitter: Matt Cooper
X-Patchwork-Id: 12586545
Date: Wed, 27 Oct 2021 07:49:11 +0000
Subject: [PATCH 4/5] odb: guard against data loss checking out a huge file
To: git@vger.kernel.org
Cc: Johannes Schindelin, Matt Cooper
From: Matt Cooper

This introduces an additional guard for platforms where `unsigned long`
and `size_t` are not of the same size. If the size of an object in the
database would overflow `unsigned long`, we now exit with an error
instead of silently cutting the size off.

A complete fix will have to update _many_ other functions throughout the
codebase to use `size_t` instead of `unsigned long`. It will have to be
implemented at some stage.

This commit puts in a stop-gap for the time being.

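For illustration only (not part of this patch), a minimal sketch of how
such a guard behaves while decoding a size stored 7 bits per byte, the
encoding read by get_delta_hdr_size(). The assumptions: uint32_t stands
in for a 32-bit `unsigned long`, errors are reported through a return
code rather than die(), running out of input is treated as an error,
and decode_size is a hypothetical name that does not exist in Git.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode a size stored least-significant group first, 7 payload bits
 * per byte, with the high bit set on every byte except the last.
 * The result is stored in a 32-bit value (the assumed stand-in for a
 * 32-bit `unsigned long`).
 * Returns 0 on success, -1 if the input is incomplete or the value
 * does not fit.
 */
static int decode_size(const unsigned char *data, size_t len, uint32_t *out)
{
	uint64_t size = 0;
	unsigned shift = 0;
	size_t i = 0;
	unsigned char c;

	do {
		uint64_t bits;

		if (i >= len)
			return -1; /* ran out of input */
		c = data[i++];
		bits = c & 0x7f;
		if (bits) {
			/* Refuse a shift that would drop bits. */
			if (shift >= 64 || bits > (UINT64_MAX >> shift))
				return -1;
			size |= bits << shift;
		}
		shift += 7;
	} while (c & 0x80);

	/* Refuse to cut the value off when narrowing to 32 bits. */
	if (size != (uint32_t)size)
		return -1;
	*out = (uint32_t)size;
	return 0;
}
```

In this sketch the bytes 0x80 0x01 decode to 128, while the bytes
0x80 0x80 0x80 0x80 0x14 encode 5 GiB and are rejected, since that
value needs more than 32 bits.
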
Helped-by: Johannes Schindelin
Signed-off-by: Matt Cooper
Signed-off-by: Johannes Schindelin
---
 delta.h       | 6 +++---
 object-file.c | 6 +++---
 packfile.c    | 6 +++---
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/delta.h b/delta.h
index 2df5fe13d95..8a56ec07992 100644
--- a/delta.h
+++ b/delta.h
@@ -90,15 +90,15 @@ static inline unsigned long get_delta_hdr_size(const unsigned char **datap,
 					       const unsigned char *top)
 {
 	const unsigned char *data = *datap;
-	unsigned long cmd, size = 0;
+	size_t cmd, size = 0;
 	int i = 0;
 	do {
 		cmd = *data++;
-		size |= (cmd & 0x7f) << i;
+		size |= st_left_shift(cmd & 0x7f, i);
 		i += 7;
 	} while (cmd & 0x80 && data < top);
 	*datap = data;
-	return size;
+	return cast_size_t_to_ulong(size);
 }
 
 #endif
diff --git a/object-file.c b/object-file.c
index f233b440b22..70e456fc2a3 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1344,7 +1344,7 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 				       unsigned int flags)
 {
 	const char *type_buf = hdr;
-	unsigned long size;
+	size_t size;
 	int type, type_len = 0;
 
 	/*
@@ -1388,12 +1388,12 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 			if (c > 9)
 				break;
 			hdr++;
-			size = size * 10 + c;
+			size = st_add(st_mult(size, 10), c);
 		}
 	}
 
 	if (oi->sizep)
-		*oi->sizep = size;
+		*oi->sizep = cast_size_t_to_ulong(size);
 
 	/*
 	 * The length must be followed by a zero byte
diff --git a/packfile.c b/packfile.c
index 755aa7aec5e..3ccea004396 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1059,7 +1059,7 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf,
 		unsigned long len, enum object_type *type, unsigned long *sizep)
 {
 	unsigned shift;
-	unsigned long size, c;
+	size_t size, c;
 	unsigned long used = 0;
 
 	c = buf[used++];
@@ -1073,10 +1073,10 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf,
 			break;
 		}
 		c = buf[used++];
-		size += (c & 0x7f) << shift;
+		size = st_add(size, st_left_shift(c & 0x7f, shift));
 		shift += 7;
 	}
-	*sizep = size;
+	*sizep = cast_size_t_to_ulong(size);
 	return used;
 }
 

From patchwork Wed Oct 27 07:49:12 2021
X-Patchwork-Submitter: Matt Cooper
X-Patchwork-Id: 12586543
Message-Id: <20387ce355759629f9456d8b02226ba2600e2d36.1635320952.git.gitgitgadget@gmail.com>
Date: Wed, 27 Oct 2021 07:49:12 +0000
Subject: [PATCH 5/5] clean/smudge: allow clean filters to process extremely large files
To: git@vger.kernel.org
Cc: Johannes Schindelin, Matt Cooper
From: Matt Cooper

The filter system allows for alterations to file contents when they're
moved between the database and the worktree. We already made sure that it
is possible for smudge filters to produce contents that are larger than
`unsigned long` can represent (which matters on systems where `unsigned
long` is narrower than `size_t`, most notably 64-bit Windows). Now we make
sure that clean filters can _consume_ contents that are larger than that.

Note that this commit only allows clean filters' _input_ to be larger than
can be represented by `unsigned long`.

This change makes only a very minute dent into the much larger project to
teach Git to use `size_t` instead of `unsigned long` wherever appropriate.

Helped-by: Johannes Schindelin
Signed-off-by: Matt Cooper
Signed-off-by: Johannes Schindelin
---
 convert.c                   |  2 +-
 t/t1051-large-conversion.sh | 10 ++++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/convert.c b/convert.c
index fd9c84b0257..5ad6dfc08a0 100644
--- a/convert.c
+++ b/convert.c
@@ -613,7 +613,7 @@ static int crlf_to_worktree(const char *src, size_t len, struct strbuf *buf,
 
 struct filter_params {
 	const char *src;
-	unsigned long size;
+	size_t size;
 	int fd;
 	const char *cmd;
 	const char *path;
diff --git a/t/t1051-large-conversion.sh b/t/t1051-large-conversion.sh
index 38aa0d8a075..c488850bee4 100755
--- a/t/t1051-large-conversion.sh
+++ b/t/t1051-large-conversion.sh
@@ -95,4 +95,14 @@ test_expect_success EXPENSIVE,!LONG_IS_64BIT 'files over 4GB convert on output'
 	test "$size" -ge $((5 * 1024 * 1024 * 1024))
 '
 
+# This clean filter writes down the size of input it receives. By checking against
+# the actual size, we ensure that cleaning doesn't mangle large files on 64-bit Windows.
+test_expect_success EXPENSIVE,!LONG_IS_64BIT 'files over 4GB convert on input' '
+	dd if=/dev/zero bs=$((1024*1024)) count=$((5*1024)) >big &&
+	test_config filter.checklarge.clean "wc -c >big.size" &&
+	echo "big filter=checklarge" >.gitattributes &&
+	git add big &&
+	test $(test_file_size big) -eq $(cat big.size)
+'
+
 test_done