From patchwork Thu Jan 13 16:43:46 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elijah Newren
X-Patchwork-Id: 12712929
Date: Thu, 13 Jan 2022 16:43:46 +0000
Subject: [PATCH 1/5] t1011: add testcase demonstrating accidental loss of user modifications
To: git@vger.kernel.org
Cc: Victoria Dye, Derrick Stolee, Lessley Dennington, Elijah Newren
X-Mailing-List: git@vger.kernel.org
From: Elijah Newren

If a user has a file with local modifications that is not marked as
SKIP_WORKTREE, but the sparsity patterns are such that it should be
marked that way, and the user then invokes a command like

  * git checkout -q HEAD^

or

  * git read-tree -mu HEAD^

then the file will be deleted along with all the user's modifications.
Add a testcase demonstrating this problem.

Note: This bug only triggers if something other than 'HEAD' is given;
if the commands above had specified 'HEAD', then the user's file would
be left alone.

Signed-off-by: Elijah Newren
---
 t/t1011-read-tree-sparse-checkout.sh | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/t/t1011-read-tree-sparse-checkout.sh b/t/t1011-read-tree-sparse-checkout.sh
index 24092c09a95..1b2395b8a82 100755
--- a/t/t1011-read-tree-sparse-checkout.sh
+++ b/t/t1011-read-tree-sparse-checkout.sh
@@ -187,6 +187,27 @@ test_expect_success 'read-tree updates worktree, absent case' '
 	test ! -f init.t
 '
 
+test_expect_success 'read-tree will not throw away dirty changes, non-sparse' '
+	echo "/*" >.git/info/sparse-checkout &&
+	read_tree_u_must_succeed -m -u HEAD &&
+
+	echo dirty >init.t &&
+	read_tree_u_must_fail -m -u HEAD^ &&
+	test_path_is_file init.t &&
+	grep -q dirty init.t
+'
+
+test_expect_failure 'read-tree will not throw away dirty changes, sparse' '
+	echo "/*" >.git/info/sparse-checkout &&
+	read_tree_u_must_succeed -m -u HEAD &&
+
+	echo dirty >init.t &&
+	echo sub/added >.git/info/sparse-checkout &&
+	read_tree_u_must_fail -m -u HEAD^ &&
+	test_path_is_file init.t &&
+	grep -q dirty init.t
+'
+
 test_expect_success 'read-tree updates worktree, dirty case' '
 	echo sub/added >.git/info/sparse-checkout &&
 	git checkout -f top &&

From patchwork Thu Jan 13 16:43:47 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elijah Newren
X-Patchwork-Id: 12712930
Message-Id: <1e3958576e2a132180f8c5d20acc1315d3062119.1642092230.git.gitgitgadget@gmail.com>
Date: Thu, 13 Jan 2022 16:43:47 +0000
Subject: [PATCH 2/5] unpack-trees: fix accidental loss of user changes
To: git@vger.kernel.org
Cc: Victoria Dye, Derrick Stolee, Lessley Dennington, Elijah Newren
X-Mailing-List: git@vger.kernel.org
From: Elijah Newren

For sparse-checkouts, we don't want unpack-trees to error out on files
that are missing from the worktree, so there has traditionally been
logic to make it skip the verify_uptodate() check for these.
Unfortunately, it was skipping the verify_uptodate() check for files
that were expected to *become* SKIP_WORKTREE. For files that were not
already SKIP_WORKTREE, that can cause us to later delete the file in
apply_sparse_checkout(). Only skip the check for files that were
already SKIP_WORKTREE as well to avoid lightly discarding important
changes users may have made to files.

Note 1: unpack-trees.c is already a bit complex, and the logic around
CE_SKIP_WORKTREE and CE_NEW_SKIP_WORKTREE in that file is no exception.
I also tried just replacing CE_NEW_SKIP_WORKTREE with CE_SKIP_WORKTREE
in the verify_uptodate() check instead of checking for both flags, and
found that it also fixed this bug and passed all the tests. I also
attempted to devise a few testcases that might trip either variant of
my fix and was unable to find any problems. It may be that just
checking CE_SKIP_WORKTREE is a better fix, but I'm not sure. I thought
it was a bit safer to strictly reduce the number of cases where we skip
the up-to-date check rather than just toggling which kind of cases skip
it, and thus went with the current variant of the fix.

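The change to the skip condition described above can be sketched as two
small predicates. This is only an illustration, not Git's code: the
SKETCH_* flag values and both function names are hypothetical stand-ins,
and the real check lives in verify_uptodate() in unpack-trees.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in bits; Git's real flag values are defined elsewhere. */
#define SKETCH_CE_SKIP_WORKTREE     (1u << 0) /* entry is skipped right now */
#define SKETCH_CE_NEW_SKIP_WORKTREE (1u << 1) /* entry should be skipped afterwards */

/*
 * Old behavior as described in the commit message: the up-to-date check
 * is skipped whenever the entry is expected to *become* SKIP_WORKTREE.
 */
static inline bool skips_check_before(unsigned flags, bool skip_sparse_checkout)
{
	return !skip_sparse_checkout &&
	       (flags & SKETCH_CE_NEW_SKIP_WORKTREE);
}

/*
 * New behavior: the check is skipped only if the entry is already
 * SKIP_WORKTREE and is also expected to stay that way.
 */
static inline bool skips_check_after(unsigned flags, bool skip_sparse_checkout)
{
	return !skip_sparse_checkout &&
	       (flags & SKETCH_CE_SKIP_WORKTREE) &&
	       (flags & SKETCH_CE_NEW_SKIP_WORKTREE);
}
```

Under these assumptions, an entry with only the "new" bit set (a file
that is present and possibly dirty, but about to become sparse) skips
the check in the old predicate and no longer skips it in the new one.
That is the case the t1011 testcase exercises.
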
Note 2: I also wondered if verify_absent() might have a similar bug, but
despite my attempts to try to devise a testcase that would trigger such
a thing, I couldn't find any problematic testcases. Thus, this patch
makes no attempt to apply similar changes to verify_absent() and
verify_absent_if_directory().

Signed-off-by: Elijah Newren
---
 t/t1011-read-tree-sparse-checkout.sh | 2 +-
 unpack-trees.c                       | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/t/t1011-read-tree-sparse-checkout.sh b/t/t1011-read-tree-sparse-checkout.sh
index 1b2395b8a82..4ed0885bf2f 100755
--- a/t/t1011-read-tree-sparse-checkout.sh
+++ b/t/t1011-read-tree-sparse-checkout.sh
@@ -197,7 +197,7 @@ test_expect_success 'read-tree will not throw away dirty changes, non-sparse' '
 	grep -q dirty init.t
 '
 
-test_expect_failure 'read-tree will not throw away dirty changes, sparse' '
+test_expect_success 'read-tree will not throw away dirty changes, sparse' '
 	echo "/*" >.git/info/sparse-checkout &&
 	read_tree_u_must_succeed -m -u HEAD &&
 
diff --git a/unpack-trees.c b/unpack-trees.c
index 98e2f2e0e6f..6d9d89c662a 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -2059,7 +2059,9 @@ static int verify_uptodate_1(const struct cache_entry *ce,
 int verify_uptodate(const struct cache_entry *ce,
 		    struct unpack_trees_options *o)
 {
-	if (!o->skip_sparse_checkout && (ce->ce_flags & CE_NEW_SKIP_WORKTREE))
+	if (!o->skip_sparse_checkout &&
+	    (ce->ce_flags & CE_SKIP_WORKTREE) &&
+	    (ce->ce_flags & CE_NEW_SKIP_WORKTREE))
 		return 0;
 	return verify_uptodate_1(ce, o, ERROR_NOT_UPTODATE_FILE);
 }

From patchwork Thu Jan 13 16:43:48 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elijah Newren
X-Patchwork-Id: 12712931
Date: Thu, 13 Jan 2022 16:43:48 +0000
Subject: [PATCH 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree
To: git@vger.kernel.org
Cc: Victoria Dye, Derrick Stolee, Lessley Dennington, Elijah Newren
X-Mailing-List: git@vger.kernel.org
From: Elijah Newren

The fix is short (~30 lines), but the description is not. Sorry.

There is a set of problems caused by files in what I'll refer to as the
"present-despite-SKIP_WORKTREE" state. This commit aims to not just fix
these problems, but remove the entire class as a possibility -- for
those using sparse checkouts. But first, we need to understand the
problems this class presents. A quick outline:

  * Problems
    * User facing issues
    * Problem space complexity
    * Maintenance and code correctness challenges
  * SKIP_WORKTREE expectations in Git
  * Suggested solution
  * Pros/Cons of suggested solution
  * Notes on testcase modifications

=== User facing issues ===

There are various ways for users to get files to be present in the
working copy despite having the SKIP_WORKTREE bit set for that file in
the index.
This may come from:

  * various git commands not really supporting the SKIP_WORKTREE bit[1,2]
  * users grabbing files from elsewhere and writing them to the worktree
    (perhaps even cached in their editor)
  * users attempting to "abort" a sparse-checkout operation with a
    not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the
    working tree is not atomic)[3].

Once users have present-despite-SKIP_WORKTREE files, any modifications
users make to these files will be ignored, possibly to users' confusion.

Further:

  * these files will degrade performance for the sparse-index case due
    to requiring the index to be expanded (see commit 55dfcf9591
    ("sparse-checkout: clear tracked sparse dirs", 2021-09-08) for why
    we try to delete entire directories outside the sparse cone).
  * these files will not be updated by standard commands
    (switch/checkout/pull/merge/rebase will leave them alone unless
    conflicts happen -- and even then, the conflicted file may be
    written somewhere else to avoid overwriting the SKIP_WORKTREE file
    that is present and in the way)
  * there is nothing in Git that users can use to discover such files
    (status, diff, grep, etc. all ignore them)
  * there is no reasonable mechanism to "recover" from such a condition
    (neither `git sparse-checkout reapply` nor `git reset --hard` will
    correct it).

So, not only are users' modifications ignored, but the files get
progressively more stale over time. At some point in the future, they
may change their sparseness specification or disable sparse-checkouts.
At that time, all present-despite-SKIP_WORKTREE files will show up as
having lots of modifications because they represent a version from a
different branch or commit. These might include user-made local changes
from days before, but the only way to tell is to have users look
through them all closely.

If these users come to others for help, there will be no logs that
explain the issue; it's just a mysterious list of changes. Users might
adamantly claim (correctly, as it turns out) that they didn't modify
these files, while others presume they did.

[1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/
[2] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/
[3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/

=== Problem space complexity ===

SKIP_WORKTREE has been part of Git for over a decade. Duy did lots of
work on it initially, and several others have since come along and put
lots of work into it. Stolee spent most of 2021 on the sparse-index,
with lots of bugfixes along the way including to non-sparse-index cases
as we are still trying to get sparse checkouts to behave reasonably.

Basically every codepath throughout the tree needs to be aware of an
additional type of file: tracked-but-not-present. The extra type
results in lots of extra testcases and lots of extra code everywhere.

But, the sad thing is that we actually have more than one extra type.
We have tracked, tracked-but-not-present (SKIP_WORKTREE), and
tracked-but-promised-to-not-be-present-but-is-present-anyway
(present-despite-SKIP_WORKTREE). Two types is a monumental amount of
effort to support, and adding a third feels a bit like insanity[4].

[4] Some examples of which can be seen at
    https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/

=== Maintenance and code correctness challenges ===

Matheus' patches to grep stalled for nearly a year, in part because of
complications of how to handle sparse-checkouts appropriately in all
cases[5][6] (with trying to figure out how to sanely handle
present-despite-SKIP_WORKTREE files being one of the complications).
His rm/add follow-ups also took months because of those kinds of
issues[7].
The corner cases with things like submodules and SKIP_WORKTREE with the
addition of present-despite-SKIP_WORKTREE start becoming really
complex[8]. We've had to add ugly logic to merge-ort to attempt to
handle present-despite-SKIP_WORKTREE files[9], and basically just been
forced to give up in merge-recursive knowing full well that we'll
sometimes silently discard user modifications. Despite stash
essentially being a merge, it needed extra code (beyond what was in
merge-ort and merge-recursive) to manually tweak SKIP_WORKTREE bits in
order to avoid a few different bugs that'd result in an early abort
with a partial stash application[10].

[5] See https://lore.kernel.org/git/5f3f7ac77039d41d1692ceae4b0c5df3bb45b74a.1612901326.git.matheus.bernardino@usp.br/#t
    and the dates on the thread; also Matheus and I had several
    conversations off-list trying to resolve the issues over that time
[6] ...it finally kind of got unstuck after
    https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/
[7] See for example
    https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/#t
    and quotes like "The core functionality of sparse-checkout has
    always been only partially implemented", a statement I still
    believe is true today.
[8] https://lore.kernel.org/git/pull.809.git.git.1592356884310.gitgitgadget@gmail.com/
[9] See commit 66b209b86a ("merge-ort: implement CE_SKIP_WORKTREE
    handling with conflicted entries", 2021-03-20)
[10] See commit ba359fd507 ("stash: fix stash application in
     sparse-checkouts", 2020-12-01)

=== SKIP_WORKTREE expectations in Git ===

A couple quotes:

  * From [11] (before the "sparse-checkout" command existed):

      If it needs too many special cases, hacks, and conditionals, then
      it is not worth the complexity---if it is easier to write a
      correct code by allowing Git to populate working tree files, it
      is perfectly fine to do so.

      In a sense, the sparse checkout "feature" itself is a hack by
      itself, and that is why I think this part should be "best effort"
      as well.

  * From the git-sparse-checkout manual (still present today):

      THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF
      OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY
      CHANGE IN THE FUTURE.

[11] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/

=== Suggested solution ===

SKIP_WORKTREE was written to allow sparse-checkouts, in particular, as
the name of the option implies, to allow the file to NOT be in the
worktree but consider it to be unchanged rather than deleted.

This suggests a simple solution: present-despite-SKIP_WORKTREE files
should not exist, for those using sparse-checkouts.

Enforce this at index loading time by checking if core.sparseCheckout
is true; if so, check files in the index with the SKIP_WORKTREE bit set
to verify that they are absent from the working tree. If they are
present, unset the bit (in memory, though any commands that write to
the index will record the update).

Users can, of course, get the SKIP_WORKTREE bit back, such as by
running `git sparse-checkout reapply` (if they have ensured the file is
unmodified and doesn't match the specified sparsity patterns).

=== Pros/Cons of suggested solution ===

Pros:

  * Solves the user visible problems reported above, which I've been
    complaining about for nearly a year but couldn't find a solution
    to.
  * Helps prevent slow performance degradation with a sparse-index.
  * Much easier behavior in sparse-checkouts for users to reason about
  * Very simple, ~30 lines of code.
  * Significantly simplifies some ugly testcases, and obviates the need
    to test an entire class of potential issues.
  * Reduces code complexity, reasoning, and maintenance. Avoids
    disagreements about weird corner cases[12].
  * It has been reported that some users might be (ab)using
    SKIP_WORKTREE as a let-me-modify-but-keep-the-file-in-the-worktree
    mechanism[13, and a few other similar references]. These users
    know of multiple caveats and shortcomings in doing so; perhaps not
    surprising given the "SKIP_WORKTREE expectations" section above.
    However, these users use `git update-index --skip-worktree`, and
    not `git sparse-checkout` or core.sparseCheckout=true. As such,
    these users would be unaffected by this change and can continue
    abusing the system as before.

[12] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/
[13] https://stackoverflow.com/questions/13630849/git-difference-between-assume-unchanged-and-skip-worktree

Cons:

  * When core.sparseCheckout is enabled, this adds a performance cost
    to reading the index. I'll defer discussion of this cost to a
    subsequent patch, since I have some optimizations to add.

=== Notes on testcase modifications ===

The good:

  * t1011: Compare to two cases above it ('read-tree will not throw
    away dirty changes, non-sparse'); since the file is present, it
    should match the non-sparse case now
  * t1092: sparse-index & sparse-checkout now match full-worktree
    behavior in more cases! Yaay for consistency!
  * t6428, t7012: look at how much simpler the tests become! Merge and
    stash can just fail early telling the user there's a file in the
    way, instead of not noticing until it's about to write a file and
    then have to implement sudden crash avoidance. Hurray for sanity!
  * t7817: sparse behavior better matches full tree behavior. Hurray
    for sanity!

The confusing:

  * t3705: These changes were ONLY needed on Windows, but they don't
    hurt other platforms. Let's discuss each individually:

    * core.sparseCheckout should be false by default. Nothing in this
      testcase toggles that until many, many tests later.
      However, early tests (#5 in particular) were testing
      `update-index --skip-worktree` behavior in a non-sparse-checkout,
      but the Windows tests in CI were behaving as if
      core.sparseCheckout=true had been specified somewhere. I do not
      have access to a Windows machine. But I just manually did what
      should have been a no-op and turned the config off. And it fixed
      the test.
    * I have no idea why the leftover .gitattributes file from this
      test was causing failures for test #18 on Windows, but only with
      these changes of mine. Test #18 was checking for empty stderr,
      and specifically wanted to know that some error completely
      unrelated to file endings did not appear. The leftover
      .gitattributes file thus caused some spurious stderr unrelated to
      the thing being checked. Since other tests did not intend to
      test normalization, just proactively remove the .gitattributes
      file. I'm certain this is cleaner and better, I'm just unsure
      why/how this didn't trigger problems before.

Signed-off-by: Elijah Newren
---
 repository.c                             |  7 ++++
 sparse-index.c                           | 21 +++++++++++
 sparse-index.h                           |  1 +
 t/t1011-read-tree-sparse-checkout.sh     |  2 +-
 t/t1092-sparse-checkout-compatibility.sh | 16 ++++-----
 t/t3705-add-sparse-checkout.sh           |  2 ++
 t/t6428-merge-conflicts-sparse.sh        | 23 +++----------
 t/t7012-skip-worktree-writing.sh         | 44 ++++--------------------
 t/t7817-grep-sparse-checkout.sh          | 11 ++++--
 9 files changed, 61 insertions(+), 66 deletions(-)

diff --git a/repository.c b/repository.c
index 34610c5a33e..dddee32258f 100644
--- a/repository.c
+++ b/repository.c
@@ -301,6 +301,13 @@ int repo_read_index(struct repository *repo)
 	if (repo->settings.command_requires_full_index)
 		ensure_full_index(repo->index);
 
+	/*
+	 * If sparse checkouts are in use, check whether paths with the
+	 * SKIP_WORKTREE attribute are missing from the worktree; if not,
+	 * clear that attribute for that path.
+	 */
+	clear_skip_worktree_from_present_files(repo->index);
+
 	return res;
 }
 
diff --git a/sparse-index.c b/sparse-index.c
index a1d505d50e9..b82648b10ee 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -341,6 +341,27 @@ void ensure_correct_sparsity(struct index_state *istate)
 		ensure_full_index(istate);
 }
 
+void clear_skip_worktree_from_present_files(struct index_state *istate)
+{
+	int i;
+	if (!core_apply_sparse_checkout)
+		return;
+
+restart:
+	for (i = 0; i < istate->cache_nr; i++) {
+		struct cache_entry *ce = istate->cache[i];
+		struct stat st;
+
+		if (ce_skip_worktree(ce) && !lstat(ce->name, &st)) {
+			if (S_ISSPARSEDIR(ce->ce_mode)) {
+				ensure_full_index(istate);
+				goto restart;
+			}
+			ce->ce_flags &= ~CE_SKIP_WORKTREE;
+		}
+	}
+}
+
 /*
  * This static global helps avoid infinite recursion between
  * expand_to_path() and index_file_exists().
diff --git a/sparse-index.h b/sparse-index.h
index 656bd835b25..633d4fb7e31 100644
--- a/sparse-index.h
+++ b/sparse-index.h
@@ -5,6 +5,7 @@ struct index_state;
 #define SPARSE_INDEX_MEMORY_ONLY (1 << 0)
 int convert_to_sparse(struct index_state *istate, int flags);
 void ensure_correct_sparsity(struct index_state *istate);
+void clear_skip_worktree_from_present_files(struct index_state *istate);
 
 /*
  * Some places in the codebase expect to search for a specific path.
diff --git a/t/t1011-read-tree-sparse-checkout.sh b/t/t1011-read-tree-sparse-checkout.sh
index 4ed0885bf2f..dd957be1b78 100755
--- a/t/t1011-read-tree-sparse-checkout.sh
+++ b/t/t1011-read-tree-sparse-checkout.sh
@@ -212,7 +212,7 @@ test_expect_success 'read-tree updates worktree, dirty case' '
 	echo sub/added >.git/info/sparse-checkout &&
 	git checkout -f top &&
 	echo dirty >init.t &&
-	read_tree_u_must_succeed -m -u HEAD^ &&
+	read_tree_u_must_fail -m -u HEAD^ &&
 	grep -q dirty init.t &&
 	rm init.t
 '
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 53f84881de7..df2839e8e2c 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -370,7 +370,7 @@ test_expect_success 'status/add: outside sparse cone' '
 	write_script edit-contents <<-\EOF &&
 	echo text >>$1
 	EOF
-	run_on_sparse ../edit-contents folder1/a &&
+	run_on_all ../edit-contents folder1/a &&
 	run_on_all ../edit-contents folder1/new &&
 
 	test_sparse_match git status --porcelain=v2 &&
@@ -379,8 +379,8 @@ test_expect_success 'status/add: outside sparse cone' '
 	test_sparse_match test_must_fail git add folder1/a &&
 	grep "Disable or modify the sparsity rules" sparse-checkout-err &&
 	test_sparse_unstaged folder1/a &&
-	test_sparse_match test_must_fail git add --refresh folder1/a &&
-	grep "Disable or modify the sparsity rules" sparse-checkout-err &&
+	test_all_match git add --refresh folder1/a &&
+	test_must_be_empty sparse-checkout-err &&
 	test_sparse_unstaged folder1/a &&
 	test_sparse_match test_must_fail git add folder1/new &&
 	grep "Disable or modify the sparsity rules" sparse-checkout-err &&
@@ -646,11 +646,11 @@ test_expect_success 'update-index modify outside sparse definition' '
 	run_on_sparse cp ../initial-repo/folder1/a folder1/a &&
 	run_on_all ../edit-contents folder1/a &&
 
-	# If file has skip-worktree enabled, update-index does not modify the
-	# index entry
-	test_sparse_match git update-index folder1/a &&
-	test_sparse_match git status --porcelain=v2 &&
-	test_must_be_empty sparse-checkout-out &&
+	# If file has skip-worktree enabled, but the file is present, it is
+	# treated the same as if skip-worktree is disabled
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git update-index folder1/a &&
+	test_all_match git status --porcelain=v2 &&
 
 	# When skip-worktree is disabled (even on files outside sparse cone), file
 	# is updated in the index
diff --git a/t/t3705-add-sparse-checkout.sh b/t/t3705-add-sparse-checkout.sh
index f3143c92908..61506c1d7ce 100755
--- a/t/t3705-add-sparse-checkout.sh
+++ b/t/t3705-add-sparse-checkout.sh
@@ -19,6 +19,7 @@ setup_sparse_entry () {
 	fi &&
 	git add sparse_entry &&
 	git update-index --skip-worktree sparse_entry &&
+	git config core.sparseCheckout false &&
 	git commit --allow-empty -m "ensure sparse_entry exists at HEAD" &&
 	SPARSE_ENTRY_BLOB=$(git rev-parse :sparse_entry)
 }
@@ -126,6 +127,7 @@ test_expect_success 'git add --chmod does not update sparse entries' '
 '
 
 test_expect_success 'git add --renormalize does not update sparse entries' '
+	test_when_finished rm .gitattributes &&
 	test_config core.autocrlf false &&
 	setup_sparse_entry "LINEONE\r\nLINETWO\r\n" &&
 	echo "sparse_entry text=auto" >.gitattributes &&
diff --git a/t/t6428-merge-conflicts-sparse.sh b/t/t6428-merge-conflicts-sparse.sh
index 7e8bf497f82..142c9aaabc5 100755
--- a/t/t6428-merge-conflicts-sparse.sh
+++ b/t/t6428-merge-conflicts-sparse.sh
@@ -112,7 +112,7 @@ test_expect_success 'conflicting entries written to worktree even if sparse' '
 	)
 '
 
-test_expect_merge_algorithm failure success 'present-despite-SKIP_WORKTREE handled reasonably' '
+test_expect_success 'present-despite-SKIP_WORKTREE handled reasonably' '
 	test_setup_numerals in_the_way &&
 	(
 		cd numerals_in_the_way &&
@@ -132,26 +132,13 @@ test_expect_merge_algorithm failure success 'present-despite-SKIP_WORKTREE handl
 
 		test_must_fail git merge -s recursive B^0 &&
 
-		git ls-files -t >index_files &&
-		test_cmp expected-index index_files &&
+		test_path_is_missing .git/MERGE_HEAD &&
 
-		test_path_is_file README &&
 		test_path_is_file numerals &&
 
-		test_cmp expected-merge numerals &&
-
-		# There should still be a file with "foobar" in it
-		grep foobar * &&
-
-		# 5 other files:
-		# * expected-merge
-		# * expected-index
-		# * index_files
-		# * others
-		# * whatever name was given to the numerals file that had
-		# "foobar" in it
-		git ls-files -o >others &&
-		test_line_count = 5 others
+		# numerals should still have "foobar" in it
+		echo foobar >expect &&
+		test_cmp expect numerals
 	)
 '
 
diff --git a/t/t7012-skip-worktree-writing.sh b/t/t7012-skip-worktree-writing.sh
index a1080b94e38..cb9f1a6981e 100755
--- a/t/t7012-skip-worktree-writing.sh
+++ b/t/t7012-skip-worktree-writing.sh
@@ -171,50 +171,20 @@ test_expect_success 'stash restore in sparse checkout' '
 
 		# Put a file in the working directory in the way
 		echo in the way >modified &&
-		git stash apply &&
+		test_must_fail git stash apply 2>error&&
 
-		# Ensure stash vivifies modifies paths...
-		cat >expect <<-EOF &&
-		H addme
-		H modified
-		H removeme
-		H subdir/A
-		S untouched
-		EOF
-		git ls-files -t >actual &&
-		test_cmp expect actual &&
+		grep "changes.*would be overwritten by merge" error &&
 
-		# ...and that the paths show up in status as changed...
-		cat >expect <<-EOF &&
-		A addme
-		M modified
-		D removeme
-		M subdir/A
-		?? actual
-		?? expect
-		?? modified.stash.XXXXXX
-		EOF
-		git status --porcelain | \
-		sed -e s/stash......./stash.XXXXXX/ >actual &&
-		test_cmp expect actual &&
+		echo in the way >expect &&
+		test_cmp expect modified &&
+		git diff --quiet HEAD ":!modified" &&
 
 		# ...and that working directory reflects the files correctly
-		test_path_is_file addme &&
+		test_path_is_missing addme &&
 		test_path_is_file modified &&
 		test_path_is_missing removeme &&
 		test_path_is_file subdir/A &&
-		test_path_is_missing untouched &&
-
-		# ...including that we have the expected "modified" file...
-		cat >expect <<-EOF &&
-		modified
-		tweaked
-		EOF
-		test_cmp expect modified &&
-
-		# ...and that the other "modified" file is still present...
-		echo in the way >expect &&
-		test_cmp expect modified.stash.*
+		test_path_is_missing untouched
 	)
 '
 
diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh
index 590b99bbb6f..eb595645657 100755
--- a/t/t7817-grep-sparse-checkout.sh
+++ b/t/t7817-grep-sparse-checkout.sh
@@ -83,10 +83,13 @@ test_expect_success 'setup' '
 
 # The test below covers a special case: the sparsity patterns exclude '/b' and
 # sparse checkout is enabled, but the path exists in the working tree (e.g.
-# manually created after `git sparse-checkout init`). git grep should skip it.
+# manually created after `git sparse-checkout init`). Although b is marked
+# as SKIP_WORKTREE, git grep should notice it IS present in the worktree and
+# report it.
 test_expect_success 'working tree grep honors sparse checkout' '
 	cat >expect <<-EOF &&
 	a:text
+	b:new-text
 	EOF
 	test_when_finished "rm -f b" &&
 	echo "new-text" >b &&
@@ -126,12 +129,16 @@ test_expect_success 'grep --cached searches entries with the SKIP_WORKTREE bit'
 '
 
 # Note that sub2/ is present in the worktree but it is excluded by the sparsity
-# patterns, so grep should not recurse into it.
+# patterns. We also explicitly mark it as SKIP_WORKTREE in case it got cleared
+# by previous git commands. Thus sub2 starts as SKIP_WORKTREE but since it is
+# present in the working tree, grep should recurse into it.
test_expect_success 'grep --recurse-submodules honors sparse checkout in submodule' ' cat >expect <<-EOF && a:text sub/B/b:text + sub2/a:text EOF + git update-index --skip-worktree sub2 && git grep --recurse-submodules "text" >actual && test_cmp expect actual ' From patchwork Thu Jan 13 16:43:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12712932 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3E053C433EF for ; Thu, 13 Jan 2022 16:43:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236864AbiAMQn6 (ORCPT ); Thu, 13 Jan 2022 11:43:58 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38962 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236855AbiAMQn5 (ORCPT ); Thu, 13 Jan 2022 11:43:57 -0500 Received: from mail-wm1-x329.google.com (mail-wm1-x329.google.com [IPv6:2a00:1450:4864:20::329]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C416CC06161C for ; Thu, 13 Jan 2022 08:43:56 -0800 (PST) Received: by mail-wm1-x329.google.com with SMTP id v123so4309051wme.2 for ; Thu, 13 Jan 2022 08:43:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=B8sYjWF/iiVOI9HKxp3FHaiKzKBQ6qAlUF3+nG4HV5w=; b=FrVMNS0Dv8JX5dhk9o7jJgoQGjbEsqGTICFQm4fN0VNSD/ICd2oe3Jgd2hsfnXncb8 4n5k6CIlxuRTpsXNMGsLjcVlvSqx9i9M7e5SIaK7k0tgcP3WeNZn2wRIS4nfDZcjyCJp UlwxCkvneKoC//V21NF7vYMse+2WfAiB5aW7/dzvE4vlpoEZ/imw8nRbnMeIX3ZQh5VG R3tKHqa7ybXvwQRZ1crLvG2QdXoAQmTE1JM+Tdm9x6i3KBzl9GqzobQQLuqK9XiIDJhV oHAsasH9W0ZaUIycjFXFNhxB/7lb8O1PEgF7ifok35neSK0WJQ/elId31uFYM0lQmYVb hJOQ== 
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=B8sYjWF/iiVOI9HKxp3FHaiKzKBQ6qAlUF3+nG4HV5w=; b=q1fvwxwjfVRq1jPumT7a+rWbfC8B5eISU4+3FwC+nx6G474Sa3q1IPTptOmiofy0XY k9nuBcKxVEKtOkLdyWV2YWgTZ6IJzFtMaeTJ/eEhs4ZlGgWSVEppL5wqqzjA6kC345aV ct5QBi0EVI00mbPJEXWkXu/247YnrlyNg9nOOJSZTsExX9NaHOMOfOCsZExo3w8W/JBN zwUuC0m6jZS+10kEd6HBUEvLYl/2C9BycuY/eX93p1iO9mxAHxpVutWj2WyX6r6qX7MY I3T5yCyZp5FU5rJJUGyMogzC2mG9m9CtX8v5spIjsAaNd/JriqIDVThPkZwImbN6VoGl S75Q== X-Gm-Message-State: AOAM533wV5O6AR5KHBXvfd3btRoWUxvXQGvze9eyS2pntny9Di23aX8L EnoR9iyOXm3wJNBrL+vLYC1I97/Tfjw= X-Google-Smtp-Source: ABdhPJxV1Hq3rSI9dQ435ck+exGBFUgrD3E+sihvbZ+vPGQqV0FWQ40VlZe+cU38QOKwT0uB31/Lsw== X-Received: by 2002:a05:600c:364b:: with SMTP id y11mr2291549wmq.156.1642092235027; Thu, 13 Jan 2022 08:43:55 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id l6sm4762232wry.18.2022.01.13.08.43.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 13 Jan 2022 08:43:54 -0800 (PST) Message-Id: In-Reply-To: References: Date: Thu, 13 Jan 2022 16:43:49 +0000 Subject: [PATCH 4/5] Update documentation related to sparsity and the skip-worktree bit Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Victoria Dye , Derrick Stolee , Lessley Dennington , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Make several small updates, to address a few documentation issues I spotted: * sparse-checkout focused on "patterns" even though the inputs (and outputs in the case of `list`) are directories in cone-mode * The description section of the sparse-checkout documentation was a bit sparse (no pun intended), and focused more on internal mechanics rather than end user usage. 
This made sense in the early days when the command was even more experimental, but let's adjust a bit to try to make it more approachable to end users who may want to consider using it. Keep the scary backward compatibility warning, though; we're still hard at work trying to fix up commands to behave reasonably in sparse checkouts.
* Both read-tree and update-index tried to describe how to use the skip-worktree bit, but both predated the sparse-checkout command. The sparse-checkout command is a far easier mechanism to use, and for users trying to reduce the size of their working tree, we should recommend that they look at it instead.
* The update-index documentation pointed out that assume-unchanged and skip-worktree sounded similar but had different purposes. However, it made no attempt to explain the differences, only to point out that they were different. Explain the differences.
* The update-index documentation focused much more on (internal?) implementation details than on end-user usage. Try to explain its purpose better for users of update-index, rather than fellow developers trying to work with the SKIP_WORKTREE bit.
* Clarify that when core.sparseCheckout=true, we treat a file's presence in the working tree as being an override to the SKIP_WORKTREE bit (i.e. in sparse checkouts when the file is present we ignore the SKIP_WORKTREE bit).

Note that this commit, like many touching documentation, is best viewed with the `--color-words` option to diff/log.
Signed-off-by: Elijah Newren --- Documentation/git-read-tree.txt | 12 +++-- Documentation/git-sparse-checkout.txt | 76 ++++++++++++++++----------- Documentation/git-update-index.txt | 57 +++++++++++++++----- 3 files changed, 98 insertions(+), 47 deletions(-) diff --git a/Documentation/git-read-tree.txt b/Documentation/git-read-tree.txt index 8c3aceb8324..99bb387134d 100644 --- a/Documentation/git-read-tree.txt +++ b/Documentation/git-read-tree.txt @@ -375,9 +375,14 @@ have finished your work-in-progress), attempt the merge again. SPARSE CHECKOUT --------------- +Note: The `update-index` and `read-tree` primitives for supporting the +skip-worktree bit predated the introduction of +linkgit:git-sparse-checkout[1]. Users are encouraged to use +`sparse-checkout` in preference to these low-level primitives. + "Sparse checkout" allows populating the working directory sparsely. -It uses the skip-worktree bit (see linkgit:git-update-index[1]) to tell -Git whether a file in the working directory is worth looking at. +It uses the skip-worktree bit (see linkgit:git-update-index[1]) to +tell Git whether a file in the working directory is worth looking at. 'git read-tree' and other merge-based commands ('git merge', 'git checkout'...) can help maintaining the skip-worktree bitmap and working @@ -385,7 +390,8 @@ directory update. `$GIT_DIR/info/sparse-checkout` is used to define the skip-worktree reference bitmap. When 'git read-tree' needs to update the working directory, it resets the skip-worktree bit in the index based on this file, which uses the same syntax as .gitignore files. -If an entry matches a pattern in this file, skip-worktree will not be +If an entry matches a pattern in this file, or the entry corresponds to +a file present in the working tree, then skip-worktree will not be set on that entry. Otherwise, skip-worktree will be set. Then it compares the new skip-worktree value with the previous one. 
If diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt index b81dbe06543..3da3d5a1007 100644 --- a/Documentation/git-sparse-checkout.txt +++ b/Documentation/git-sparse-checkout.txt @@ -3,9 +3,7 @@ git-sparse-checkout(1) NAME ---- -git-sparse-checkout - Initialize and modify the sparse-checkout -configuration, which reduces the checkout to a set of paths -given by a list of patterns. +git-sparse-checkout - Reduce your working tree to a subset of tracked files SYNOPSIS @@ -17,8 +15,20 @@ SYNOPSIS DESCRIPTION ----------- -Initialize and modify the sparse-checkout configuration, which reduces -the checkout to a set of paths given by a list of patterns. +This command is used to create sparse checkouts, which means that it +changes the working tree from having all tracked files present, to only +have a subset of them. It can also switch which subset of files are +present, or undo and go back to having all tracked files present in the +working copy. + +The subset of files is chosen by providing a list of directories in +cone mode (which is recommended), or by providing a list of patterns +in non-cone mode. + +When in a sparse-checkout, other Git commands behave a bit differently. +For example, switching branches will not update paths outside the +sparse-checkout directories/patterns, and `git commit -a` will not record +paths outside the sparse-checkout directories/patterns as deleted. THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN @@ -28,7 +38,7 @@ THE FUTURE. COMMANDS -------- 'list':: - Describe the patterns in the sparse-checkout file. + Describe the directories or patterns in the sparse-checkout file. 'set':: Enable the necessary config settings @@ -38,20 +48,26 @@ COMMANDS list of arguments following the 'set' subcommand. Update the working directory to match the new patterns. 
+ -When the `--stdin` option is provided, the patterns are read from -standard in as a newline-delimited list instead of from the arguments. +When the `--stdin` option is provided, the directories or patterns are +read from standard in as a newline-delimited list instead of from the +arguments. + When `--cone` is passed or `core.sparseCheckoutCone` is enabled, the -input list is considered a list of directories instead of -sparse-checkout patterns. This allows for better performance with a -limited set of patterns (see 'CONE PATTERN SET' below). Note that the -set command will write patterns to the sparse-checkout file to include -all files contained in those directories (recursively) as well as -files that are siblings of ancestor directories. The input format -matches the output of `git ls-tree --name-only`. This includes -interpreting pathnames that begin with a double quote (") as C-style -quoted strings. This may become the default in the future; --no-cone -can be passed to request non-cone mode. +input list is considered a list of directories. This allows for +better performance with a limited set of patterns (see 'CONE PATTERN +SET' below). The input format matches the output of `git ls-tree +--name-only`. This includes interpreting pathnames that begin with a +double quote (") as C-style quoted strings. Note that the set command +will write patterns to the sparse-checkout file to include all files +contained in those directories (recursively) as well as files that are +siblings of ancestor directories. This may become the default in the +future; --no-cone can be passed to request non-cone mode. ++ +When `--no-cone` is passed or `core.sparseCheckoutCone` is not enabled, +the input list is considered a list of patterns. This mode is harder +to use and less performant, and is thus not recommended. See the +"Sparse Checkout" section of linkgit:git-read-tree[1] and the "Pattern +Set" sections below for more details. 
+ Use the `--[no-]sparse-index` option to use a sparse index (the default is to not use it). A sparse index reduces the size of the @@ -69,11 +85,10 @@ understand the sparse directory entries index extension and may fail to interact with your repository until it is disabled. 'add':: - Update the sparse-checkout file to include additional patterns. - By default, these patterns are read from the command-line arguments, - but they can be read from stdin using the `--stdin` option. When - `core.sparseCheckoutCone` is enabled, the given patterns are interpreted - as directory names as in the 'set' subcommand. + Update the sparse-checkout file to include additional directories + (in cone mode) or patterns (in non-cone mode). By default, these + directories or patterns are read from the command-line arguments, + but they can be read from stdin using the `--stdin` option. 'reapply':: Reapply the sparsity pattern rules to paths in the working tree. @@ -117,13 +132,14 @@ decreased in utility. SPARSE CHECKOUT --------------- -"Sparse checkout" allows populating the working directory sparsely. -It uses the skip-worktree bit (see linkgit:git-update-index[1]) to tell -Git whether a file in the working directory is worth looking at. If -the skip-worktree bit is set, then the file is ignored in the working -directory. Git will avoid populating the contents of those files, which -makes a sparse checkout helpful when working in a repository with many -files, but only a few are important to the current user. +"Sparse checkout" allows populating the working directory sparsely. It +uses the skip-worktree bit (see linkgit:git-update-index[1]) to tell Git +whether a file in the working directory is worth looking at. If the +skip-worktree bit is set, and the file is not present in the working tree, +then its absence is ignored. 
Git will avoid populating the contents of +those files, which makes a sparse checkout helpful when working in a +repository with many files, but only a few are important to the current +user. The `$GIT_DIR/info/sparse-checkout` file is used to define the skip-worktree reference bitmap. When Git updates the working diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt index 2853f168d97..568dbfe76b8 100644 --- a/Documentation/git-update-index.txt +++ b/Documentation/git-update-index.txt @@ -351,6 +351,10 @@ unchanged". Note that "assume unchanged" bit is *not* set if the index (use `git update-index --really-refresh` if you want to mark them as "assume unchanged"). +Sometimes users confuse the assume-unchanged bit with the +skip-worktree bit. See the final paragraph in the "Skip-worktree bit" +section below for an explanation of the differences. + EXAMPLES -------- @@ -392,22 +396,47 @@ M foo.c SKIP-WORKTREE BIT ----------------- -Skip-worktree bit can be defined in one (long) sentence: When reading -an entry, if it is marked as skip-worktree, then Git pretends its -working directory version is up to date and read the index version -instead. - -To elaborate, "reading" means checking for file existence, reading -file attributes or file content. The working directory version may be -present or absent. If present, its content may match against the index -version or not. Writing is not affected by this bit, content safety -is still first priority. Note that Git _can_ update working directory -file, that is marked skip-worktree, if it is safe to do so (i.e. -working directory version matches index version) +Skip-worktree bit can be defined in one (long) sentence: Tell git to +avoid writing the file to the working directory when reasonably +possible, and treat the file as unchanged when it is not +present in the working directory. + +Note that not all git commands will pay attention to this bit, and +some only partially support it. 
+ +The update-index flags and the read-tree capabilities relating to the +skip-worktree bit predated the introduction of the +linkgit:git-sparse-checkout[1] command, which provides a much easier +way to configure and handle the skip-worktree bits. If you want to +reduce your working tree to only deal with a subset of the files in +the repository, we strongly encourage the use of +linkgit:git-sparse-checkout[1] in preference to the low-level +update-index and read-tree primitives. + +The primary purpose of the skip-worktree bit is to enable sparse +checkouts, i.e. to have working directories with only a subset of +paths present. When the skip-worktree bit is set, Git commands (such +as `switch`, `pull`, `merge`) will avoid writing these files. +However, these commands will sometimes write these files anyway in +important cases such as conflicts during a merge or rebase. Git +commands will also avoid treating the lack of such files as an +intentional deletion; for example `git add -u` will not stage a +deletion for these files and `git commit -a` will not make a commit +deleting them either. Although this bit looks similar to assume-unchanged bit, its goal is -different from assume-unchanged bit's. Skip-worktree also takes -precedence over assume-unchanged bit when both are set. +different. The assume-unchanged bit is for leaving the file in the +working tree but having Git omit checking it for changes and presuming +that the file has not been changed (though if it can determine without +stat'ing the file that it has changed, it is free to record the +changes). skip-worktree tells Git to ignore the absence of the file, +avoid updating it when possible with commands that normally update +much of the working directory (e.g. `checkout`, `switch`, `pull`, +etc.), and not have its absence be recorded in commits.
Note that in +sparse checkouts (set up by `git sparse-checkout` or by configuring +core.sparseCheckout to true), if a file is marked as skip-worktree in +the index but is found in the working tree, Git will clear the +skip-worktree bit for that file. SPLIT INDEX ----------- From patchwork Thu Jan 13 16:43:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12712933 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0D690C4332F for ; Thu, 13 Jan 2022 16:44:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236865AbiAMQn7 (ORCPT ); Thu, 13 Jan 2022 11:43:59 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38966 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236862AbiAMQn6 (ORCPT ); Thu, 13 Jan 2022 11:43:58 -0500 Received: from mail-wm1-x32c.google.com (mail-wm1-x32c.google.com [IPv6:2a00:1450:4864:20::32c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 77D37C061574 for ; Thu, 13 Jan 2022 08:43:57 -0800 (PST) Received: by mail-wm1-x32c.google.com with SMTP id ay4-20020a05600c1e0400b0034a81a94607so3124678wmb.1 for ; Thu, 13 Jan 2022 08:43:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=8tktbsfgdwLJuLd8ONiKhRO06je4eOmcmCGWb6d79ds=; b=ogmkrtTnNmyKX4DinL4JSio/xU9HLfH/0hQFCE+Pw1cM7X5r3omaX46xZDv7j4mvhB IbVSMluC4BigVy367gdyqyCdaFaJSZ3feS7tElVoJKJxzWHrMCmmtyRrV9dbItfXhd2E 1Av+dZXTcgULlDBGMTj4bsQXiSTcoRMVjD6uXY1iG1h+aG8T1QdpqnhlMZjopvRBcXUL sx1nuZqrM67bmSWWMnMaZIVyxGvTnBVQRKZojO4VqTuZcQyzEmO9JTqDY525sqvKeiIw
VygbES9HduK2i34ntku2PUS0XgchonotOiw/8TdGMS0eZJHsIQE7UaoGjkI4Rj29VujU jrvg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=8tktbsfgdwLJuLd8ONiKhRO06je4eOmcmCGWb6d79ds=; b=AEB4mRWkGDGpSZAWarFdz0WWHRPiQuy1MJCjtCT3oz5RVU5jNbVJtOsAaojdUx26GQ VVK3JfMT7RN9dSCfQt9bJx36bICom9bsKRscSAfxLvPudY2e5+6w7dBZDutvYXeIDQjA 8BAGmDldxrWPDw70HlVsnzEOq+IbL8yKlBT5XPIkxdTPxYzzc898ij7tYtc8S4hKQFpR kwVxR+yJ/IJVWl82JYQaOrvcS553i5XFhsx7RPiFojd8MA18fvFRCp0znxIN8lH7SlGW 2LrSTwWaQmpEuFz/XI56ZSZSidndIRuF9NG94iRRLquhRLmFT8rXCIcXayNn0pBj6A0A ufRw== X-Gm-Message-State: AOAM531aQFrjvLn53iuxRo9H71Dvzw/wsIodZ1E4XZIayetA9VKHIds6 JBIjoDqw58mj+xV7ygnj59vUw0XHb20= X-Google-Smtp-Source: ABdhPJwRP9nS3UGDZldsj83HHS7RG6H+MRlpVs0jlkJfhO6TdwjRwlyXcZ2/ngoH1wTKS9/H7RNHSw== X-Received: by 2002:a7b:c094:: with SMTP id r20mr4593141wmh.157.1642092235839; Thu, 13 Jan 2022 08:43:55 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id az6sm2917384wmb.48.2022.01.13.08.43.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 13 Jan 2022 08:43:55 -0800 (PST) Message-Id: In-Reply-To: References: Date: Thu, 13 Jan 2022 16:43:50 +0000 Subject: [PATCH 5/5] Accelerate clear_skip_worktree_from_present_files() by caching Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Victoria Dye , Derrick Stolee , Lessley Dennington , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Trying to clear the skip-worktree bit from files that are present does present some computational overhead, for sparse-checkouts. (We do not do the bit clearing in non-sparse-checkouts.) 
Optimize it as follows: Rather than lstat()'ing every SKIP_WORKTREE path, take advantage of the fact that entire directories will often be missing, especially for cone mode and even more so ever since commit 55dfcf9591 ("sparse-checkout: clear tracked sparse dirs", 2021-09-08). If we have already determined that the parent directory of a file (or other previous ancestor) does not exist, then the file cannot exist either so we do not need to lstat() it separately. Timings for p2000 included below, reformatted to fit in normal commit message line lengths, which compare three things: * Timings before this series * Timings of the unoptimized version of clear_skip_worktree_from_present_files() from a few commits ago * Timings after the optimization in this commit (NOTE: t/perf/ appears to have timing resolution only down to 0.01 s, which presents significant measurement error when timings only differ by 0.01s. I don't trust any such timings below, and yet all the optimized results differ by at most 0.01s.) 

Test             Before Series     Unoptimized                Optimized
-------------------------------------------------------------------------------------
*git status*
    full-v3      0.15(0.10+0.06)   0.32(0.16+0.17) +113.3%    0.16(0.10+0.07)   +6.7%
    full-v4      0.15(0.11+0.05)   0.32(0.17+0.16) +113.3%    0.16(0.11+0.05)   +6.7%
    sparse-v3    0.04(0.03+0.04)   0.04(0.02+0.05)   +0.0%    0.04(0.02+0.05)   +0.0%
    sparse-v4    0.04(0.03+0.04)   0.04(0.02+0.05)   +0.0%    0.04(0.03+0.05)   +0.0%
*git add -A*
    full-v3      0.40(0.30+0.07)   0.56(0.36+0.17)  +40.0%    0.39(0.30+0.07)   -2.5%
    full-v4      0.37(0.28+0.07)   0.54(0.37+0.16)  +45.9%    0.38(0.29+0.07)   +2.7%
    sparse-v3    0.06(0.04+0.05)   0.08(0.05+0.05)  +33.3%    0.06(0.05+0.04)   +0.0%
    sparse-v4    0.05(0.03+0.05)   0.05(0.04+0.04)   +0.0%    0.06(0.04+0.05)  +20.0%
*git add .*
    full-v3      0.40(0.31+0.07)   0.57(0.37+0.17)  +42.5%    0.41(0.30+0.08)   +2.5%
    full-v4      0.38(0.30+0.06)   0.55(0.37+0.16)  +44.7%    0.38(0.30+0.06)   +0.0%
    sparse-v3    0.06(0.04+0.05)   0.06(0.05+0.04)   +0.0%    0.06(0.03+0.05)   +0.0%
    sparse-v4    0.06(0.05+0.05)   0.06(0.04+0.05)   +0.0%    0.06(0.04+0.06)   +0.0%
*git commit -a -m A*
    full-v3      0.41(0.32+0.06)   0.58(0.39+0.17)  +41.5%    0.42(0.32+0.07)   +2.4%
    full-v4      0.39(0.30+0.07)   0.56(0.38+0.17)  +43.6%    0.40(0.31+0.07)   +2.6%
    sparse-v3    0.04(0.03+0.04)   0.04(0.03+0.04)   +0.0%    0.04(0.03+0.04)   +0.0%
    sparse-v4    0.04(0.03+0.05)   0.04(0.03+0.05)   +0.0%    0.04(0.03+0.04)   +0.0%
*git checkout -f -*
    full-v3      0.56(0.46+0.07)   0.73(0.55+0.16)  +30.4%    0.57(0.47+0.08)   +1.8%
    full-v4      0.54(0.45+0.07)   0.71(0.53+0.17)  +31.5%    0.55(0.45+0.07)   +1.9%
    sparse-v3    0.06(0.04+0.04)   0.06(0.04+0.05)   +0.0%    0.06(0.04+0.05)   +0.0%
    sparse-v4    0.05(0.05+0.04)   0.05(0.04+0.05)   +0.0%    0.06(0.04+0.05)  +20.0%
*git reset*
    full-v3      0.34(0.26+0.05)   0.51(0.34+0.15)  +50.0%    0.34(0.26+0.06)   +0.0%
    full-v4      0.32(0.24+0.06)   0.49(0.32+0.15)  +53.1%    0.33(0.25+0.06)   +3.1%
    sparse-v3    0.04(0.03+0.04)   0.04(0.03+0.04)   +0.0%    0.04(0.03+0.04)   +0.0%
    sparse-v4    0.03(0.03+0.04)   0.03(0.02+0.04)   +0.0%    0.03(0.03+0.04)   +0.0%
*git reset --hard*
    full-v3      0.57(0.46+0.07)   0.90(0.61+0.25)  +57.9%    0.57(0.45+0.08)   +0.0%
    full-v4      0.54(0.46+0.05)   0.88(0.59+0.26)  +63.0%    0.55(0.45+0.07)   +1.9%
    sparse-v3    0.07(0.03+0.03)   0.07(0.04+0.03)   +0.0%    0.07(0.03+0.03)   +0.0%
    sparse-v4    0.06(0.03+0.03)   0.06(0.04+0.02)   +0.0%    0.06(0.03+0.03)   +0.0%
*git reset -- does-not-exist*
    full-v3      0.35(0.27+0.06)   0.52(0.32+0.17)  +48.6%    0.35(0.27+0.06)   +0.0%
    full-v4      0.33(0.26+0.05)   0.50(0.33+0.15)  +51.5%    0.33(0.26+0.06)   +0.0%
    sparse-v3    0.04(0.03+0.04)   0.04(0.03+0.04)   +0.0%    0.04(0.03+0.04)   +0.0%
    sparse-v4    0.04(0.02+0.04)   0.03(0.02+0.04)  -25.0%    0.03(0.02+0.04)  -25.0%
*git diff*
    full-v3      0.07(0.04+0.04)   0.24(0.11+0.14) +242.9%    0.07(0.04+0.04)   +0.0%
    full-v4      0.07(0.03+0.05)   0.24(0.13+0.12) +242.9%    0.08(0.04+0.05)  +14.3%
    sparse-v3    0.02(0.01+0.04)   0.02(0.01+0.04)   +0.0%    0.02(0.01+0.05)   +0.0%
    sparse-v4    0.02(0.02+0.03)   0.02(0.01+0.04)   +0.0%    0.02(0.01+0.04)   +0.0%
*git diff --cached*
    full-v3      0.05(0.03+0.02)   0.22(0.12+0.09) +340.0%    0.05(0.03+0.01)   +0.0%
    full-v4      0.05(0.03+0.01)   0.23(0.12+0.11) +360.0%    0.05(0.03+0.02)   +0.0%
    sparse-v3    0.01(0.00+0.00)   0.01(0.00+0.00)   +0.0%    0.01(0.00+0.00)   +0.0%
    sparse-v4    0.01(0.00+0.00)   0.01(0.00+0.00)   +0.0%    0.01(0.00+0.00)   +0.0%
*git blame f2/f4/a*
    full-v3      0.18(0.13+0.05)   0.52(0.29+0.23) +188.9%    0.19(0.15+0.04)   +5.6%
    full-v4      0.19(0.15+0.04)   0.52(0.28+0.23) +173.7%    0.19(0.14+0.04)   +0.0%
    sparse-v3    0.10(0.08+0.02)   0.10(0.09+0.01)   +0.0%    0.10(0.09+0.01)   +0.0%
    sparse-v4    0.10(0.08+0.02)   0.10(0.08+0.02)   +0.0%    0.10(0.08+0.02)   +0.0%
*git blame f2/f4/f3/a*
    full-v3      0.45(0.36+0.08)   0.78(0.51+0.27)  +73.3%    0.45(0.37+0.08)   +0.0%
    full-v4      0.45(0.37+0.08)   0.78(0.51+0.26)  +73.3%    0.45(0.37+0.08)   +0.0%
    sparse-v3    0.36(0.32+0.04)   0.36(0.31+0.05)   +0.0%    0.36(0.31+0.04)   +0.0%
    sparse-v4    0.36(0.31+0.05)   0.36(0.31+0.05)   +0.0%    0.36(0.31+0.04)   +0.0%
*git checkout-index -f --all*
    full-v3      0.07(0.02+0.05)   0.24(0.12+0.12) +242.9%    0.08(0.04+0.04)  +14.3%
    full-v4      0.07(0.03+0.04)   0.24(0.11+0.13) +242.9%    0.08(0.03+0.04)  +14.3%
    sparse-v3    0.04(0.01+0.03)   0.04(0.00+0.03)   +0.0%    0.04(0.01+0.03)   +0.0%
    sparse-v4    0.04(0.01+0.02)   0.04(0.01+0.03)   +0.0%    0.04(0.01+0.02)   +0.0%
*git update-index --add --remove f2/f4/a*
    full-v3      0.29(0.23+0.02)   0.46(0.30+0.12)  +58.6%    0.30(0.24+0.02)   +3.4%
    full-v4      0.27(0.22+0.02)   0.45(0.29+0.12)  +66.7%    0.28(0.22+0.03)   +3.7%
    sparse-v3    0.02(0.02+0.00)   0.02(0.01+0.00)   +0.0%    0.02(0.01+0.00)   +0.0%
    sparse-v4    0.02(0.02+0.00)   0.02(0.02+0.00)   +0.0%    0.02(0.02+0.00)   +0.0%

So, with the optimization, the extra work appears to be essentially 0 for sparse-checkouts that are also using sparse-indexes (even before my optimization), and the extra work appears to be just marginally more than 0 for sparse-checkouts that are using full indexes. Signed-off-by: Elijah Newren --- sparse-index.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 54 insertions(+), 2 deletions(-) diff --git a/sparse-index.c b/sparse-index.c index b82648b10ee..eed170cd8f7 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -341,18 +341,70 @@ void ensure_correct_sparsity(struct index_state *istate) ensure_full_index(istate); } +static int path_found(const char *path, const char **dirname, size_t *dir_len, + int *dir_found) +{ + struct stat st; + char *newdir; + char *tmp; + + /* + * If dirname corresponds to a directory that doesn't exist, and this + * path starts with dirname, then path can't exist. + */ + if (!*dir_found && !memcmp(path, *dirname, *dir_len)) + return 0; + + /* + * If path itself exists, return 1. + */ + if (!lstat(path, &st)) + return 1; + + /* + * Otherwise, path does not exist so we'll return 0...but we'll first + * determine some info about its parent directory so we can avoid + * lstat calls for future cache entries. + */ + newdir = strrchr(path, '/'); + if (!newdir) + return 0; /* Didn't find a parent dir; just return 0 now. */ + + /* + * If path starts with directory (which we already lstat'ed and found), + * then no need to lstat parent directory again.
+ */ + if (*dir_found && *dirname && !memcmp(path, *dirname, *dir_len)) + return 0; + + /* Cache path's dirname */ + *dirname = path; + *dir_len = newdir - path + 1; + + tmp = xstrndup(path, *dir_len); + *dir_found = !lstat(tmp, &st); + free(tmp); + + return 0; +} + void clear_skip_worktree_from_present_files(struct index_state *istate) { + const char *last_dirname = NULL; + size_t dir_len = 0; + int dir_found = 1; + + int i; + + if (!core_apply_sparse_checkout) return; restart: for (i = 0; i < istate->cache_nr; i++) { struct cache_entry *ce = istate->cache[i]; - struct stat st; - if (ce_skip_worktree(ce) && !lstat(ce->name, &st)) { + if (ce_skip_worktree(ce) && + path_found(ce->name, &last_dirname, &dir_len, &dir_found)) { if (S_ISSPARSEDIR(ce->ce_mode)) { ensure_full_index(istate); goto restart;