From patchwork Thu Oct 28 20:50:31 2021
From: Carlo Marcelo Arenas Belón
Date: Thu, 28 Oct 2021 20:50:31 +0000
Subject: [PATCH v2 1/7] test-genzeros: allow more than 2G zeros in Windows

d5cfd142ec (tests: teach the test-tool to generate NUL bytes and use it,
2019-02-14) added a way to generate zeroes portably, without using
/dev/zero (which HP NonStop lacks), but it used a long variable, which
is limited to 2^31 - 1 on Windows.

Instead, use a (POSIX/C99) intmax_t, which is guaranteed to be at least
64 bits wide (including on 64-bit Windows), so that a future test can
ask for more zeros than that.

Signed-off-by: Carlo Marcelo Arenas Belón
Signed-off-by: Johannes Schindelin
---
 t/helper/test-genzeros.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/helper/test-genzeros.c b/t/helper/test-genzeros.c
index 9532f5bac97..b1197e91a89 100644
--- a/t/helper/test-genzeros.c
+++ b/t/helper/test-genzeros.c
@@ -3,14 +3,14 @@
 
 int cmd__genzeros(int argc, const char **argv)
 {
-	long count;
+	intmax_t count;
 
 	if (argc > 2) {
 		fprintf(stderr, "usage: %s []\n", argv[0]);
 		return 1;
 	}
 
-	count = argc > 1 ? strtol(argv[1], NULL, 0) : -1L;
+	count = argc > 1 ? strtoimax(argv[1], NULL, 0) : -1;
 
 	while (count < 0 || count--) {
 		if (putchar(0) == EOF)

From patchwork Thu Oct 28 20:50:32 2021
From: Johannes Schindelin
Date: Thu, 28 Oct 2021 20:50:32 +0000
Subject: [PATCH v2 2/7] test-tool genzeros: generate large amounts of data more efficiently

In this developer's tests, producing one gigabyte worth of NULs in a
busy loop that writes out individual bytes, unbuffered, took ~27sec.
Writing chunked 256kB buffers instead took only ~0.6sec.

This matters because we are about to introduce a pair of test cases
that want to be able to produce 5GB of NULs, and we cannot use
`/dev/zero` because of the HP NonStop platform's lack of support for
that device.

Signed-off-by: Johannes Schindelin
---
 t/helper/test-genzeros.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/t/helper/test-genzeros.c b/t/helper/test-genzeros.c
index b1197e91a89..c061c429da9 100644
--- a/t/helper/test-genzeros.c
+++ b/t/helper/test-genzeros.c
@@ -3,7 +3,10 @@
 
 int cmd__genzeros(int argc, const char **argv)
 {
+	/* static, so that it is NUL-initialized */
+	static char zeros[256 * 1024];
 	intmax_t count;
+	ssize_t n;
 
 	if (argc > 2) {
 		fprintf(stderr, "usage: %s []\n", argv[0]);
@@ -12,9 +15,19 @@ int cmd__genzeros(int argc, const char **argv)
 
 	count = argc > 1 ? strtoimax(argv[1], NULL, 0) : -1;
 
-	while (count < 0 || count--) {
-		if (putchar(0) == EOF)
+	/* Writing out individual NUL bytes is slow... */
+	while (count < 0)
+		if (write(1, zeros, ARRAY_SIZE(zeros)) < 0)
 			return -1;
+
+	while (count > 0) {
+		n = write(1, zeros, count < ARRAY_SIZE(zeros) ?
+			  count : ARRAY_SIZE(zeros));
+
+		if (n < 0)
+			return -1;
+
+		count -= n;
 	}
 
 	return 0;

From patchwork Thu Oct 28 20:50:33 2021
From: Matt Cooper
Date: Thu, 28 Oct 2021 20:50:33 +0000
Subject: [PATCH v2 3/7] t1051: introduce a smudge filter test for extremely large files

The filter system allows for alterations to file contents when they're
added to the database or workdir. ("Smudge" when moving to the workdir;
"clean" when moving to the database.) This is used natively to handle
CRLF to LF conversions.
It's also employed by Git-LFS to replace large files from the workdir
with small tracking files in the repo and vice versa.

Git pulls the entire smudged file into memory. While this is
inefficient, there's a more insidious problem on some platforms due to
inconsistency between using unsigned long and size_t for the same type
of data (size of a file in bytes). On most 64-bit platforms, unsigned
long is 64 bits, and size_t is typedef'd to unsigned long. On Windows,
however, unsigned long is only 32 bits (and therefore on 64-bit Windows,
size_t is typedef'd to unsigned long long in order to be 64 bits).

Practically speaking, this means 64-bit Windows users of Git-LFS can't
handle files of 4GiB (2^32 bytes) or larger. Other 64-bit platforms
don't suffer this limitation.

This commit introduces a test exposing the issue; future commits make it
pass. The test simulates the way Git-LFS works by having a tiny file
checked into the repository and expanding it to a huge file on checkout.

Helped-by: Johannes Schindelin
Signed-off-by: Matt Cooper
Signed-off-by: Johannes Schindelin
---
 t/t1051-large-conversion.sh | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/t/t1051-large-conversion.sh b/t/t1051-large-conversion.sh
index 8b7640b3ba8..7c1a2845005 100755
--- a/t/t1051-large-conversion.sh
+++ b/t/t1051-large-conversion.sh
@@ -83,4 +83,17 @@ test_expect_success 'ident converts on output' '
 	test_cmp small.clean large.clean
 '
 
+# This smudge filter prepends 5GB of zeros to the file it checks out. This
+# ensures that smudging doesn't mangle large files on 64-bit Windows.
+test_expect_failure EXPENSIVE,!LONG_IS_64BIT 'files over 4GB convert on output' '
+	test_commit test small "a small file" &&
+	test_config filter.makelarge.smudge \
+		"test-tool genzeros $((5*1024*1024*1024)) && cat" &&
+	echo "small filter=makelarge" >.gitattributes &&
+	rm small &&
+	git checkout -- small &&
+	size=$(test_file_size small) &&
+	test "$size" -ge $((5 * 1024 * 1024 * 1024))
+'
+
 test_done

From patchwork Thu Oct 28 20:50:34 2021
From: Matt Cooper
Date: Thu, 28 Oct 2021 20:50:34 +0000
Subject: [PATCH v2 4/7] odb: teach read_blob_entry to use size_t

There is mixed use of size_t and unsigned long to deal with sizes in the
codebase. Recall that Windows defines unsigned long as 32 bits even on
64-bit platforms, meaning that converting size_t to unsigned long
narrows the range. This mostly doesn't cause a problem since Git rarely
deals with files larger than 2^32 bytes.

But adjunct systems such as Git LFS, which use smudge/clean filters to
keep huge files out of the repository, may have huge file contents
passed through some of the functions in entry.c and convert.c. On
Windows, this results in a truncated file being written to the workdir.
I traced this to one specific use of unsigned long in write_entry (and a
similar instance in write_pc_item_to_fd for parallel checkout). That
type appeared to be dictated by the call to read_blob_entry, which
expects a pointer to unsigned long.

By altering the signature of read_blob_entry to expect a size_t,
write_entry can be switched to use size_t internally (which all of its
callers and most of its callees already used). To avoid touching dozens
of additional files, read_blob_entry uses a local unsigned long to call
a chain of functions that aren't prepared to accept size_t.
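The narrowing workaround described above can be illustrated with a small standalone sketch. Everything in it is hypothetical: `legacy_read_size()` stands in for the chain of `unsigned long`-based functions behind `read_object_file()`, and `read_sized()` plays the role of the new `read_blob_entry()` signature. It only shows the shape of the wrapper, not Git's actual code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for an older API that, like read_object_file(),
 * reports the object size through an unsigned long pointer.
 */
static const char *legacy_read_size(const char *data, unsigned long *size)
{
	*size = (unsigned long)strlen(data);
	return data;
}

/*
 * Hypothetical size_t-facing wrapper in the style of the new
 * read_blob_entry(): call the legacy API through a local unsigned long,
 * then widen the result. Widening is lossless on LP64 (size_t is
 * unsigned long) and on 64-bit Windows (size_t is unsigned long long).
 * The legacy API itself still cannot report sizes above ULONG_MAX;
 * only the callers of this wrapper are converted.
 */
static const char *read_sized(const char *data, size_t *size)
{
	unsigned long ul;
	const char *result = legacy_read_size(data, &ul);

	*size = ul;
	return result;
}
```

With this shape, callers such as `write_entry()` can declare `size_t size;` and pass `&size` directly. Only the wrapper deals with `unsigned long`.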
Helped-by: Johannes Schindelin
Signed-off-by: Matt Cooper
Signed-off-by: Johannes Schindelin
---
 entry.c                     | 8 +++++---
 entry.h                     | 2 +-
 parallel-checkout.c         | 2 +-
 t/t1051-large-conversion.sh | 2 +-
 4 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/entry.c b/entry.c
index 711ee0693c7..4cb3942dbdc 100644
--- a/entry.c
+++ b/entry.c
@@ -82,11 +82,13 @@ static int create_file(const char *path, unsigned int mode)
 	return open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
 }
 
-void *read_blob_entry(const struct cache_entry *ce, unsigned long *size)
+void *read_blob_entry(const struct cache_entry *ce, size_t *size)
 {
 	enum object_type type;
-	void *blob_data = read_object_file(&ce->oid, &type, size);
+	unsigned long ul;
+	void *blob_data = read_object_file(&ce->oid, &type, &ul);
 
+	*size = ul;
 	if (blob_data) {
 		if (type == OBJ_BLOB)
 			return blob_data;
@@ -270,7 +272,7 @@ static int write_entry(struct cache_entry *ce, char *path, struct conv_attrs *ca
 	int fd, ret, fstat_done = 0;
 	char *new_blob;
 	struct strbuf buf = STRBUF_INIT;
-	unsigned long size;
+	size_t size;
 	ssize_t wrote;
 	size_t newsize = 0;
 	struct stat st;
diff --git a/entry.h b/entry.h
index b8c0e170dc7..61ee8c17604 100644
--- a/entry.h
+++ b/entry.h
@@ -51,7 +51,7 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
  */
 void unlink_entry(const struct cache_entry *ce);
 
-void *read_blob_entry(const struct cache_entry *ce, unsigned long *size);
+void *read_blob_entry(const struct cache_entry *ce, size_t *size);
 int fstat_checkout_output(int fd, const struct checkout *state, struct stat *st);
 void update_ce_after_write(const struct checkout *state, struct cache_entry *ce,
 			   struct stat *st);
diff --git a/parallel-checkout.c b/parallel-checkout.c
index 6b1af32bb3d..b6f4a25642e 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -261,7 +261,7 @@ static int write_pc_item_to_fd(struct parallel_checkout_item *pc_item, int fd,
 	struct stream_filter *filter;
 	struct strbuf buf = STRBUF_INIT;
 	char *blob;
-	unsigned long size;
+	size_t size;
 	ssize_t wrote;
 
 	/* Sanity check */
diff --git a/t/t1051-large-conversion.sh b/t/t1051-large-conversion.sh
index 7c1a2845005..5ba03d02682 100755
--- a/t/t1051-large-conversion.sh
+++ b/t/t1051-large-conversion.sh
@@ -85,7 +85,7 @@ test_expect_success 'ident converts on output' '
 
 # This smudge filter prepends 5GB of zeros to the file it checks out. This
 # ensures that smudging doesn't mangle large files on 64-bit Windows.
-test_expect_failure EXPENSIVE,!LONG_IS_64BIT 'files over 4GB convert on output' '
+test_expect_success EXPENSIVE,!LONG_IS_64BIT 'files over 4GB convert on output' '
 	test_commit test small "a small file" &&
 	test_config filter.makelarge.smudge \
 		"test-tool genzeros $((5*1024*1024*1024)) && cat" &&

From patchwork Thu Oct 28 20:50:35 2021
From: Johannes Schindelin
Date: Thu, 28 Oct 2021 20:50:35 +0000
Subject: [PATCH v2 5/7] git-compat-util: introduce more size_t helpers

We will use them in the next commit.

Signed-off-by: Johannes Schindelin
---
 git-compat-util.h | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/git-compat-util.h b/git-compat-util.h
index a508dbe5a35..7977720655c 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -113,6 +113,14 @@
 #define unsigned_mult_overflows(a, b) \
     ((a) && (b) > maximum_unsigned_value_of_type(a) / (a))
 
+/*
+ * Returns true if the left shift of "a" by "shift" bits will
+ * overflow. The type of "a" must be unsigned.
+ * Note that this macro evaluates "a" twice!
+ */
+#define unsigned_left_shift_overflows(a, shift) \
+    ((a) > maximum_unsigned_value_of_type(a) >> (shift))
+
 #ifdef __GNUC__
 #define TYPEOF(x) (__typeof__(x))
 #else
@@ -859,6 +867,23 @@ static inline size_t st_sub(size_t a, size_t b)
 	return a - b;
 }
 
+static inline size_t st_left_shift(size_t a, unsigned shift)
+{
+	if (unsigned_left_shift_overflows(a, shift))
+		die("size_t overflow: %"PRIuMAX" << %u",
+		    (uintmax_t)a, shift);
+	return a << shift;
+}
+
+static inline unsigned long cast_size_t_to_ulong(size_t a)
+{
+	if (a != (unsigned long)a)
+		die("object too large to read on this platform: %"
+		    PRIuMAX" is cut off to %lu",
+		    (uintmax_t)a, (unsigned long)a);
+	return (unsigned long)a;
+}
+
 #ifdef HAVE_ALLOCA_H
 # include
 # define xalloca(size) (alloca(size))

From patchwork Thu Oct 28 20:50:36 2021
From: Matt Cooper
Date: Thu, 28 Oct 2021 20:50:36 +0000
Subject: [PATCH v2 6/7] odb: guard against data loss checking out a huge file

This introduces an additional guard for platforms where `unsigned long`
and `size_t` are not of the same size. If the size of an object in the
database would overflow `unsigned long`, we now exit with an error
instead.

A complete fix will have to update _many_ other functions throughout the
codebase to use `size_t` instead of `unsigned long`. That will have to
be implemented at some stage; this commit puts in a stop-gap for the
time being.
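The stop-gap depends on two guards: a left shift that refuses to lose bits, and a `size_t` to `unsigned long` cast that refuses to truncate. The sketch below shows both in plain C. It is a hypothetical, error-returning variant written only for illustration: `checked_shl()`, `checked_to_ulong()` and `decode_size()` are not Git functions. Git's `st_left_shift()` and `cast_size_t_to_ulong()` call `die()`, whereas these return -1. `decode_size()` imitates the 7-bits-per-byte size header that `get_delta_hdr_size()` parses. Unlike that function, it also rejects input that ends while the continuation bit is still set.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define SIZE_T_BITS (sizeof(size_t) * CHAR_BIT)

/* Store a << shift in *out and return 0, or return -1 if bits would be lost. */
static int checked_shl(size_t a, unsigned shift, size_t *out)
{
	if (shift >= SIZE_T_BITS) {
		/*
		 * Shifting by the full width or more is undefined in C; a
		 * nonzero value would lose all its bits, while 0 stays 0.
		 */
		if (a)
			return -1;
		*out = 0;
		return 0;
	}
	if (a > (SIZE_MAX >> shift))
		return -1;
	*out = a << shift;
	return 0;
}

/* Store a in *out and return 0, or return -1 if unsigned long is too narrow. */
static int checked_to_ulong(size_t a, unsigned long *out)
{
	if (a != (unsigned long)a)
		return -1;
	*out = (unsigned long)a;
	return 0;
}

/*
 * Decode a size stored as little-endian groups of 7 bits, where a set
 * high bit means "another byte follows". Returns 0 on success, or -1 on
 * truncated input, shift overflow, or a size that does not fit in
 * unsigned long.
 */
static int decode_size(const unsigned char *p, size_t len, unsigned long *out)
{
	size_t size = 0, part, i = 0;
	unsigned shift = 0;
	unsigned char c;

	do {
		if (i >= len)
			return -1;
		c = p[i++];
		if (checked_shl(c & 0x7f, shift, &part))
			return -1;
		size |= part;
		shift += 7;
	} while (c & 0x80);

	return checked_to_ulong(size, out);
}
```

For example, the bytes `0x96 0x01` decode to 22 + (1 << 7) = 150. A run of ten `0xff` bytes is rejected before any bits are silently dropped. Returning an error instead of dying is only a design choice for this sketch.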
Helped-by: Johannes Schindelin
Signed-off-by: Matt Cooper
Signed-off-by: Johannes Schindelin
---
 delta.h       | 6 +++---
 object-file.c | 6 +++---
 packfile.c    | 6 +++---
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/delta.h b/delta.h
index 2df5fe13d95..8a56ec07992 100644
--- a/delta.h
+++ b/delta.h
@@ -90,15 +90,15 @@ static inline unsigned long get_delta_hdr_size(const unsigned char **datap,
 					       const unsigned char *top)
 {
 	const unsigned char *data = *datap;
-	unsigned long cmd, size = 0;
+	size_t cmd, size = 0;
 	int i = 0;
 	do {
 		cmd = *data++;
-		size |= (cmd & 0x7f) << i;
+		size |= st_left_shift(cmd & 0x7f, i);
 		i += 7;
 	} while (cmd & 0x80 && data < top);
 	*datap = data;
-	return size;
+	return cast_size_t_to_ulong(size);
 }
 
 #endif
diff --git a/object-file.c b/object-file.c
index f233b440b22..70e456fc2a3 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1344,7 +1344,7 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 					unsigned int flags)
 {
 	const char *type_buf = hdr;
-	unsigned long size;
+	size_t size;
 	int type, type_len = 0;
 
 	/*
@@ -1388,12 +1388,12 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 			if (c > 9)
 				break;
 			hdr++;
-			size = size * 10 + c;
+			size = st_add(st_mult(size, 10), c);
 		}
 	}
 
 	if (oi->sizep)
-		*oi->sizep = size;
+		*oi->sizep = cast_size_t_to_ulong(size);
 
 	/*
 	 * The length must be followed by a zero byte
diff --git a/packfile.c b/packfile.c
index 755aa7aec5e..3ccea004396 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1059,7 +1059,7 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf,
 		unsigned long len, enum object_type *type, unsigned long *sizep)
 {
 	unsigned shift;
-	unsigned long size, c;
+	size_t size, c;
 	unsigned long used = 0;
 
 	c = buf[used++];
@@ -1073,10 +1073,10 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf,
 			break;
 		}
 		c = buf[used++];
-		size += (c & 0x7f) << shift;
+		size = st_add(size, st_left_shift(c & 0x7f, shift));
 		shift += 7;
 	}
-	*sizep = size;
+	*sizep = cast_size_t_to_ulong(size);
 	return used;
 }
 

From patchwork Thu Oct 28 20:50:37 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matt Cooper
X-Patchwork-Id: 12591113
Date: Thu, 28 Oct 2021 20:50:37 +0000
Subject: [PATCH v2 7/7] clean/smudge: allow clean filters to process extremely large files
To: git@vger.kernel.org
Cc: Carlo Arenas, Johannes Schindelin, Matt Cooper
X-Mailing-List: git@vger.kernel.org
From: Matt Cooper

The filter system allows for alterations to file contents when they are
moved between the object database and the worktree. We already made sure
that smudge filters can produce contents that are larger than `unsigned
long` can represent (which matters on systems where `unsigned long` is
narrower than `size_t`, most notably 64-bit Windows). Now we make sure
that clean filters can _consume_ contents that are larger than that.
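On 64-bit Windows (an LLP64 platform) `unsigned long` is 32 bits wide while `size_t` is 64 bits, so a 5 GiB size like the one the new test uses cannot survive a round trip through `unsigned long`. The C sketch below illustrates the kind of checked narrowing that the `cast_size_t_to_ulong()` calls in the preceding patch stand for. The helper name `size_fits_ulong()` and its return-a-flag interface are hypothetical; this is not Git's implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not Git's cast_size_t_to_ulong(): narrow a size_t
 * to unsigned long, reporting failure instead of silently truncating.
 * On LP64 platforms (e.g. 64-bit Linux) both types are 64 bits wide and
 * the check always passes; on LLP64 (64-bit Windows) unsigned long is
 * only 32 bits, so any size of 4 GiB or more fails.
 */
static bool size_fits_ulong(size_t n, unsigned long *out)
{
	unsigned long narrowed = (unsigned long)n;

	/* If the round trip changes the value, high bits were lost. */
	if ((size_t)narrowed != n)
		return false;
	*out = narrowed;
	return true;
}
```

Because the check never fails where `unsigned long` is 64 bits, truncation bugs of this kind only surface on platforms like 64-bit Windows, which is presumably why the new test carries the `!LONG_IS_64BIT` prerequisite.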
Note that this commit only allows the _input_ of clean filters to be
larger than `unsigned long` can represent. It makes only a small dent in
the much larger project of teaching Git to use `size_t` instead of
`unsigned long` wherever appropriate.

Helped-by: Johannes Schindelin
Signed-off-by: Matt Cooper
Signed-off-by: Johannes Schindelin
---
 convert.c                   |  2 +-
 t/t1051-large-conversion.sh | 10 ++++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/convert.c b/convert.c
index fd9c84b0257..5ad6dfc08a0 100644
--- a/convert.c
+++ b/convert.c
@@ -613,7 +613,7 @@ static int crlf_to_worktree(const char *src, size_t len, struct strbuf *buf,
 
 struct filter_params {
 	const char *src;
-	unsigned long size;
+	size_t size;
 	int fd;
 	const char *cmd;
 	const char *path;
diff --git a/t/t1051-large-conversion.sh b/t/t1051-large-conversion.sh
index 5ba03d02682..83d9cf485a3 100755
--- a/t/t1051-large-conversion.sh
+++ b/t/t1051-large-conversion.sh
@@ -96,4 +96,14 @@ test_expect_success EXPENSIVE,!LONG_IS_64BIT 'files over 4GB convert on output'
 	test "$size" -ge $((5 * 1024 * 1024 * 1024))
 '
 
+# This clean filter writes down the size of input it receives. By checking against
+# the actual size, we ensure that cleaning doesn't mangle large files on 64-bit Windows.
+test_expect_success EXPENSIVE,!LONG_IS_64BIT 'files over 4GB convert on input' '
+	test-tool genzeros $((5*1024*1024*1024)) >big &&
+	test_config filter.checklarge.clean "wc -c >big.size" &&
+	echo "big filter=checklarge" >.gitattributes &&
+	git add big &&
+	test $(test_file_size big) -eq $(cat big.size)
+'
+
 test_done
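The `get_delta_hdr_size()` change in `delta.h` above decodes a little-endian base-128 size: each byte contributes its low 7 bits, and bit 0x80 says that another byte follows. Below is a minimal C sketch of that decoding with an explicit overflow check, in the spirit of the `st_left_shift()` call the patch introduces. The function `decode_size_varint()` and its return-0-on-error convention are hypothetical and not part of Git.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch modeled on get_delta_hdr_size(): decode a
 * little-endian base-128 size (7 payload bits per byte, bit 0x80 set
 * means "another byte follows") into a size_t, refusing values that do
 * not fit instead of silently dropping high bits.
 * Returns the number of bytes consumed, or 0 if the input is truncated
 * or the value overflows size_t.
 */
static size_t decode_size_varint(const unsigned char *data, size_t len,
				 size_t *sizep)
{
	const unsigned width = sizeof(size_t) * CHAR_BIT;
	size_t size = 0, used = 0;
	unsigned shift = 0;
	unsigned char c;

	do {
		size_t bits;

		if (used >= len)
			return 0; /* continuation bit set but no more input */
		c = data[used++];
		bits = c & 0x7f;

		/* Reject any payload bit that would land at or above bit `width`. */
		if (shift >= width ? bits != 0 : bits > (SIZE_MAX >> shift))
			return 0;
		if (shift < width) {
			/* Earlier groups only occupy bits below `shift`, so OR cannot overflow. */
			size |= bits << shift;
			shift += 7;
		}
	} while (c & 0x80);

	*sizep = size;
	return used;
}
```

Unlike the loop in the patch, which stops at `top` even if the continuation bit is still set, this sketch treats a truncated sequence as an error; that is a design choice of the sketch, not Git's behavior.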