From patchwork Thu Mar 26 21:27:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee via GitGitGadget X-Patchwork-Id: 11461247 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B917417EA for ; Thu, 26 Mar 2020 21:27:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 8906420714 for ; Thu, 26 Mar 2020 21:27:47 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="e2I41Jjk" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727702AbgCZV1r (ORCPT ); Thu, 26 Mar 2020 17:27:47 -0400 Received: from mail-ed1-f65.google.com ([209.85.208.65]:45544 "EHLO mail-ed1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726260AbgCZV1q (ORCPT ); Thu, 26 Mar 2020 17:27:46 -0400 Received: by mail-ed1-f65.google.com with SMTP id u59so8654498edc.12 for ; Thu, 26 Mar 2020 14:27:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=QI3gH0FzLV7g0yB+cY6fjEbNythXyD8cgG7YB5sfWw8=; b=e2I41JjkhvfSt7wQqasFgVeFJr+ZfbYxISQgjw229iVWP9RLbubHW6Kqefz/UdE3ic jk87C/gWyJg/JNAuVh0UQwZSVtJdFK7W+CHlYdNjDiIA62Z0IkzbV0jyCuKh9+D9T8my SVIwm/YGb87i3BCcu78HeyDeXnVVpc7TKXBMiDKqkdY73deIyn+/6NFKYcHBwbKtKKv/ zzoOmZ9ULqjhZfG3slD8rZbmGZoeviZmKx4onatbn8eKc/G0PSz9pKrrH8XtPqSrJa2n n002aJ95+sMO+aS2hLlOm/KIniD8dkobEAPhyzvBg4qZx+ONJpVkd8L/VCA20ABwd9R+ fnLQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=QI3gH0FzLV7g0yB+cY6fjEbNythXyD8cgG7YB5sfWw8=; b=PGjN0Yl9YmN2WX3in8mvJdu1/t4G70TF5zV7JINKoeM6LcPpF1MnZwe8p9/ykR8R5x 8z5vzBcchQTK/QVzBWI3QedO7CkDDZ6/gZgRzCdQhH2X216w1SPLl+Fo9BSJFsLbbG9L DbalOn7jp8ua6YaQjfLVnv+CGJf4b+3ATy14MDyMmHXCnwrAoebGB2pDVbiTmNV2F/Jq IgNYEgX9TeWhjRUEvuB3otStd485/DZpkRvIaatgTfjkaQw4URlcOc7gI69bAGt5sroS LgdjG2GKXdWAbRv0xUsj8KDPmsFRSHzruxf3Q87afoH76mqTtH627dlX5mhgMbw6DXsv uCuQ== X-Gm-Message-State: ANhLgQ3PifR9ua9D7DmLlWLjCsIFR4e3RO0cxkv0wqnjYPU7UYptv/l7 d2bfZoz0/hbWEwxkulTx4t+HlgUJ X-Google-Smtp-Source: ADFU+vu+P3pTDCvXzZ07FDVfLmg2i0wA+XOpRnilakfV9ZHX9JSbBDvhh4iHfQbLKyy73Ruc02utsg== X-Received: by 2002:a17:906:1993:: with SMTP id g19mr9872323ejd.70.1585258063408; Thu, 26 Mar 2020 14:27:43 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id p20sm552471edw.31.2020.03.26.14.27.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 26 Mar 2020 14:27:42 -0700 (PDT) Message-Id: <752403e339bae098104f41c541c2b0e684586c1f.1585258061.git.gitgitgadget@gmail.com> In-Reply-To: References: From: "Elijah Newren via GitGitGadget" Date: Thu, 26 Mar 2020 21:27:34 +0000 Subject: [PATCH v4 1/7] t7063: more thorough status checking Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Martin Melka , SZEDER =?utf-8?b?R8OhYm9y?= , Samuel Lijin , =?utf-8?b?Tmd1eeG7hW4gVGjDoWkgTmfhu41j?= Duy , Derrick Stolee , Elijah Newren , Elijah Newren Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren It turns out the t7063 has some testcases that even without using the untracked cache cover situations that nothing else in the testsuite handles. Checking the results of git status --porcelain both with and without the untracked cache, and comparing both against our expected results helped uncover a critical bug in some dir.c restructuring. Unfortunately, it's not easy to run status and tell it to ignore the untracked cache; the only knob we have it to instruct it to *delete* (and ignore) the untracked cache. Create a simple helper that will create a clone of the index that is missing the untracked cache bits, and use it to compare that the results with the untracked cache match the results we get without the untracked cache. Signed-off-by: Elijah Newren --- t/t7063-status-untracked-cache.sh | 52 +++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh index 190ae149cf3..156d06c34e8 100755 --- a/t/t7063-status-untracked-cache.sh +++ b/t/t7063-status-untracked-cache.sh @@ -30,6 +30,30 @@ status_is_clean() { test_must_be_empty ../status.actual } +# Ignore_Untracked_Cache, abbreviated to 3 letters because then people can +# compare commands side-by-side, e.g. +# iuc status --porcelain >expect && +# git status --porcelain >actual && +# test_cmp expect actual +iuc() { + git ls-files -s >../current-index-entries + git ls-files -t | grep ^S | sed -e s/^S.// >../current-sparse-entries + + GIT_INDEX_FILE=.git/tmp_index + export GIT_INDEX_FILE + git update-index --index-info <../current-index-entries + git update-index --skip-worktree $(cat ../current-sparse-entries) + + git -c core.untrackedCache=false "$@" + ret=$? + + rm ../current-index-entries + rm $GIT_INDEX_FILE + unset GIT_INDEX_FILE + + return $ret +} + test_lazy_prereq UNTRACKED_CACHE ' { git update-index --test-untracked-cache; ret=$?; } && test $ret -ne 1 @@ -95,6 +119,8 @@ test_expect_success 'status first time (empty cache)' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && + iuc status --porcelain >../status.iuc && + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && cat >../trace.expect <../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && + iuc status --porcelain >../status.iuc && + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && cat >../trace.expect <../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <../trace.expect <../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <../trace.expect <../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <../trace.expect <../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <../trace.expect <../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <../trace.expect <../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <../trace.expect <../status.actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <../trace.expect <../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../status.actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <../trace.expect <../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../status.actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <../trace.expect <../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../status.actual && + iuc status --porcelain >../status.iuc && + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && cat >../trace.expect <../status.actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <../status.actual && + iuc status --porcelain >../status.iuc && cat >../status.expect < X-Patchwork-Id: 11461245 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8053217D4 for ; Thu, 26 Mar 2020 21:27:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 614B220714 for ; Thu, 26 Mar 2020 21:27:47 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="b1jbr288" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727690AbgCZV1q (ORCPT ); Thu, 26 Mar 2020 17:27:46 -0400 Received: from mail-ed1-f68.google.com ([209.85.208.68]:36577 "EHLO mail-ed1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726954AbgCZV1q (ORCPT ); Thu, 26 Mar 2020 17:27:46 -0400 Received: by mail-ed1-f68.google.com with SMTP id i7so3754372edq.3 for ; Thu, 26 Mar 2020 14:27:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=p9Pr9vcHIBAGv0RtCZmYZyBzvmoQaEM7VyyaJVVCmuc=; b=b1jbr288R/0i3UfHj6WdAgxQlS3E4H9RLCCoJjxF72Yq0Lo99IPCRzW4eBqarXPXc2 aKYCp9HC1cJKzVMK4do6s9knroJ5lU8Za6MJ18VTH1iUFD7mSpNcxBkJF3ZoViJCKdJC eNW5unzXn2ibtUx470re00xyJpuOfSkYKQJAAD1ul06ZW7/wpKpb8D0DwgfTFD1Ry0hN HVHp22hwS7H8oML6Z2sQ3Q099jeFAikTacDCRZwQ+VKkSojsoIPjMibglB8JzCsH70uG kU4gFsEwok0sEE7K3Dq8U1bo3stFWwvAHfgBHZHygPYEe5hBvYKIR7g82pNkGT2k3w7J wmrg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=p9Pr9vcHIBAGv0RtCZmYZyBzvmoQaEM7VyyaJVVCmuc=; b=Oi8Iy6K8QV+n3WpelcnyxPmZF7QpEWDzsxmV8hFHWiOIi7EOyq8UXBqbW1CqTl8uIz E7oKOgWRRghZNQEj5c74xmChGFbz8DuKQS6Zv4+Mp0uHgJAzidCpGRHIj6Y7NbL5SuCt A9TH2ocTtrpC45MtaUAdWLKkrg5aHarFY0NF0RuLwyYOOZL3IVQvQBqwjuywVWi9fazu TpZr+BIjKb5OTaT4Q1rUlaj/taAkBgFAOD0QVRnFuY6mew0PiJlK9IKAQbNmIWjGP0T1 okMoSCSG/S9ndl2BvixD4S/FxpTPvRm0LWR2WA0s74ma9tYES8Nsp+MZv4KS+FMlIQvR eEuQ== X-Gm-Message-State: ANhLgQ0exzgA+WuQigX5dh1FWgph4oyh8wNExwbKIdFUO8DT3VdPZh/y VHy71TsobBu5izUvsyuys3ZjeVZF X-Google-Smtp-Source: ADFU+vvceTpCa/BffDAED4NPxAPLoxG9KCpFT+nG4JK1ii/zEW0MgIUZVMeFNW/GT5Pg2Ndfens3dg== X-Received: by 2002:a17:906:694d:: with SMTP id c13mr7911807ejs.268.1585258064114; Thu, 26 Mar 2020 14:27:44 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id s4sm548423edw.19.2020.03.26.14.27.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 26 Mar 2020 14:27:43 -0700 (PDT) Message-Id: In-Reply-To: References: From: "Elijah Newren via GitGitGadget" Date: Thu, 26 Mar 2020 21:27:35 +0000 Subject: [PATCH v4 2/7] dir: fix simple typo in comment Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Martin Melka , SZEDER =?utf-8?b?R8OhYm9y?= , Samuel Lijin , =?utf-8?b?Tmd1eeG7hW4gVGjDoWkgTmfhu41j?= Duy , Derrick Stolee , Elijah Newren , Elijah Newren Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren Signed-off-by: Elijah Newren --- dir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dir.c b/dir.c index b460211e614..b505ba747bb 100644 --- a/dir.c +++ b/dir.c @@ -2174,7 +2174,7 @@ static void add_path_to_appropriate_result_list(struct dir_struct *dir, * If 'stop_at_first_file' is specified, 'path_excluded' is returned * to signal that a file was found. This is the least significant value that * indicates that a file was encountered that does not depend on the order of - * whether an untracked or exluded path was encountered first. + * whether an untracked or excluded path was encountered first. * * Returns the most significant path_treatment value encountered in the scan. * If 'stop_at_first_file' is specified, `path_excluded` is the most From patchwork Thu Mar 26 21:27:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee via GitGitGadget X-Patchwork-Id: 11461251 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B0D41913 for ; Thu, 26 Mar 2020 21:27:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 895CA2074D for ; Thu, 26 Mar 2020 21:27:50 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="P4ycirxk" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727719AbgCZV1t (ORCPT ); Thu, 26 Mar 2020 17:27:49 -0400 Received: from mail-ed1-f65.google.com ([209.85.208.65]:42375 "EHLO mail-ed1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727456AbgCZV1s (ORCPT ); Thu, 26 Mar 2020 17:27:48 -0400 Received: by mail-ed1-f65.google.com with SMTP id cw6so7980660edb.9 for ; Thu, 26 Mar 2020 14:27:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=KOdFBrHGWNs2atkOc59kMAuhffvFz5pBGVjQsL0Jl2A=; b=P4ycirxkoqWM5hWmcHHnLNzc14Ume7oE+KJtwmy6g/egqnmDWqStIpdKPIE4hhePh3 gJH64EPWTcX6tlrhAXGdSqFeLSGvm6JqU07mrCEGMdabu0mQkgnfEkGxsCLVb8lG5BJP AF8r2oa7M1PxUg0vJHnghS8SrbxJaIS8fysycUrczvnU+C4TiucdKaPFXhctb70LASYZ cY1KJjIYxPdp6tVJ1wRbPp0vci1uexC1uncE+Xvx79cTo2QryG55U3zCQx8tjr1S1U5w yPH+6SRQ/KnxmA6j1dRydHA40zXLK4Q4q2/GO2J4XMlis905UQ6kdZaQGO16CHatSN1l aKXw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=KOdFBrHGWNs2atkOc59kMAuhffvFz5pBGVjQsL0Jl2A=; b=UmdqMy41vb3CWJ7e7B3llNYAzKL3OJcjj78Q8BK4Qyyb2kVuE8G0x9qQrpF8S+53u6 vUU2GDGPN06sgsfuv72uR0cHtRrkoJd5nuifNtCUTsabZKibXaje55EzZJ92cEEzr/8n 54j24NCHhgRg6soTen8FD0iTKlyeuSL79b9ZTmtJLBimN8oxhV0Y7Rlmvbo6erxsmaho CNlK50dZj/p3DntfLrEVOa0r+W5Kksu9n0xkQubVlaBjTuz0J2Vg5gWevDQ6a+PL9WOy Io0ZnC9rPJcOrMRe5GcnOI26GfGyk5TQ4l1L4ox8whkSRMJ//7RUCSJOIv1iFIAYs89B 4Sfw== X-Gm-Message-State: ANhLgQ3ZbuAUrMq334BxHjVk9nA/XjneOriLXrCTclwNG7If4g7t++i9 AfBFaRHfIRsTbUXkWEb8xoQIjsZR X-Google-Smtp-Source: ADFU+vuZoTO8SDtRnGIixmvOsXJMsGnkwN7FtuzlxdQk61m7XyXCCpRd2AvT1dhOMHvbYGZlQUQ0CQ== X-Received: by 2002:a17:906:aac3:: with SMTP id kt3mr9410847ejb.16.1585258065040; Thu, 26 Mar 2020 14:27:45 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id p4sm445319eju.57.2020.03.26.14.27.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 26 Mar 2020 14:27:44 -0700 (PDT) Message-Id: <48f37e5b11476d2b59dcd7496b44c6a862311f57.1585258061.git.gitgitgadget@gmail.com> In-Reply-To: References: From: "Elijah Newren via GitGitGadget" Date: Thu, 26 Mar 2020 21:27:36 +0000 Subject: [PATCH v4 3/7] dir: consolidate treat_path() and treat_one_path() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Martin Melka , SZEDER =?utf-8?b?R8OhYm9y?= , Samuel Lijin , =?utf-8?b?Tmd1eeG7hW4gVGjDoWkgTmfhu41j?= Duy , Derrick Stolee , Elijah Newren , Elijah Newren Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren Commit 16e2cfa90993 ("read_directory(): further split treat_path()", 2010-01-08) split treat_one_path() out of treat_path(), because treat_leading_path() would not have access to a dirent but wanted to re-use as much of treat_path() as possible. Not re-using all of treat_path() caused other bugs, as noted in commit b9670c1f5e6b ("dir: fix checks on common prefix directory", 2019-12-19). Finally, in commit ad6f2157f951 ("dir: restructure in a way to avoid passing around a struct dirent", 2020-01-16), dirents were removed from treat_path() and other functions entirely. Since the only reason for splitting these functions was the lack of a dirent -- which no longer applies to either function -- and since the split caused problems in the past resulting in us not using treat_one_path() separately anymore, just undo the split. Signed-off-by: Elijah Newren --- dir.c | 121 ++++++++++++++++++++++++++-------------------------------- 1 file changed, 55 insertions(+), 66 deletions(-) diff --git a/dir.c b/dir.c index b505ba747bb..d0f3d660850 100644 --- a/dir.c +++ b/dir.c @@ -1863,21 +1863,65 @@ static int resolve_dtype(int dtype, struct index_state *istate, return dtype; } -static enum path_treatment treat_one_path(struct dir_struct *dir, - struct untracked_cache_dir *untracked, - struct index_state *istate, - struct strbuf *path, - int baselen, - const struct pathspec *pathspec, - int dtype) -{ - int exclude; - int has_path_in_index = !!index_file_exists(istate, path->buf, path->len, ignore_case); +static enum path_treatment treat_path_fast(struct dir_struct *dir, + struct untracked_cache_dir *untracked, + struct cached_dir *cdir, + struct index_state *istate, + struct strbuf *path, + int baselen, + const struct pathspec *pathspec) +{ + strbuf_setlen(path, baselen); + if (!cdir->ucd) { + strbuf_addstr(path, cdir->file); + return path_untracked; + } + strbuf_addstr(path, cdir->ucd->name); + /* treat_one_path() does this before it calls treat_directory() */ + strbuf_complete(path, '/'); + if (cdir->ucd->check_only) + /* + * check_only is set as a result of treat_directory() getting + * to its bottom. Verify again the same set of directories + * with check_only set. + */ + return read_directory_recursive(dir, istate, path->buf, path->len, + cdir->ucd, 1, 0, pathspec); + /* + * We get path_recurse in the first run when + * directory_exists_in_index() returns index_nonexistent. We + * are sure that new changes in the index does not impact the + * outcome. Return now. + */ + return path_recurse; +} + +static enum path_treatment treat_path(struct dir_struct *dir, + struct untracked_cache_dir *untracked, + struct cached_dir *cdir, + struct index_state *istate, + struct strbuf *path, + int baselen, + const struct pathspec *pathspec) +{ + int has_path_in_index, dtype, exclude; enum path_treatment path_treatment; - dtype = resolve_dtype(dtype, istate, path->buf, path->len); + if (!cdir->d_name) + return treat_path_fast(dir, untracked, cdir, istate, path, + baselen, pathspec); + if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git")) + return path_none; + strbuf_setlen(path, baselen); + strbuf_addstr(path, cdir->d_name); + if (simplify_away(path->buf, path->len, pathspec)) + return path_none; + + dtype = resolve_dtype(cdir->d_type, istate, path->buf, path->len); /* Always exclude indexed files */ + has_path_in_index = !!index_file_exists(istate, path->buf, path->len, + ignore_case); if (dtype != DT_DIR && has_path_in_index) return path_none; @@ -1942,61 +1986,6 @@ static enum path_treatment treat_one_path(struct dir_struct *dir, } } -static enum path_treatment treat_path_fast(struct dir_struct *dir, - struct untracked_cache_dir *untracked, - struct cached_dir *cdir, - struct index_state *istate, - struct strbuf *path, - int baselen, - const struct pathspec *pathspec) -{ - strbuf_setlen(path, baselen); - if (!cdir->ucd) { - strbuf_addstr(path, cdir->file); - return path_untracked; - } - strbuf_addstr(path, cdir->ucd->name); - /* treat_one_path() does this before it calls treat_directory() */ - strbuf_complete(path, '/'); - if (cdir->ucd->check_only) - /* - * check_only is set as a result of treat_directory() getting - * to its bottom. Verify again the same set of directories - * with check_only set. - */ - return read_directory_recursive(dir, istate, path->buf, path->len, - cdir->ucd, 1, 0, pathspec); - /* - * We get path_recurse in the first run when - * directory_exists_in_index() returns index_nonexistent. We - * are sure that new changes in the index does not impact the - * outcome. Return now. - */ - return path_recurse; -} - -static enum path_treatment treat_path(struct dir_struct *dir, - struct untracked_cache_dir *untracked, - struct cached_dir *cdir, - struct index_state *istate, - struct strbuf *path, - int baselen, - const struct pathspec *pathspec) -{ - if (!cdir->d_name) - return treat_path_fast(dir, untracked, cdir, istate, path, - baselen, pathspec); - if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git")) - return path_none; - strbuf_setlen(path, baselen); - strbuf_addstr(path, cdir->d_name); - if (simplify_away(path->buf, path->len, pathspec)) - return path_none; - - return treat_one_path(dir, untracked, istate, path, baselen, pathspec, - cdir->d_type); -} - static void add_untracked(struct untracked_cache_dir *dir, const char *name) { if (!dir) From patchwork Thu Mar 26 21:27:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee via GitGitGadget X-Patchwork-Id: 11461249 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0E4CE913 for ; Thu, 26 Mar 2020 21:27:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E44CF2074D for ; Thu, 26 Mar 2020 21:27:48 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="LytDxeZF" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727711AbgCZV1s (ORCPT ); Thu, 26 Mar 2020 17:27:48 -0400 Received: from mail-ed1-f65.google.com ([209.85.208.65]:35016 "EHLO mail-ed1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726954AbgCZV1r (ORCPT ); Thu, 26 Mar 2020 17:27:47 -0400 Received: by mail-ed1-f65.google.com with SMTP id a20so8740031edj.2 for ; Thu, 26 Mar 2020 14:27:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=ip1sdkCAke+L9RTot1ew8WG61s7QEX3alK/LLGEt0Fk=; b=LytDxeZFrUPMIRK0eZbLnSX0a3aKsEkML949NS+giPsNNaahqWRerMbqUbLSTLc/6n rf41UQar2XtB25GuR1Lom+ltPtDPtSeLbxqSOJjGR+IVs1QoRGYEj6xe4pafVbNbtvUn BwByflbHaSwXFqZuFuYwk1Ku50Q/E8mnaiLQSWxeNZxuqb6+++QA4rWf4LkB2OlMM3I9 DbaCZsqlJ6ackU9kr0KS5xHs8MSBwYCZo64NEv/g+6Ysmo4gfbLX07B7kiZDMvjzgJPc /F9ksMaDE6wR+G1U3rGS7cCh5EMTu0I8Vl+wCosciVclLON5TSv2XDgKf2VlbaxodoEA uM+Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=ip1sdkCAke+L9RTot1ew8WG61s7QEX3alK/LLGEt0Fk=; b=rdSii8bsgxiCqftAX1JHD2Bie+6MeMlVU1br64BGEepqn7l4zqSeWyhL9a0XEo5H10 XIvJZdXne15yBuQ81YjzCYLSP1URHxOTqTq+HL1cjLrwrzIjPntswEqHBifFyqHTFzBo RoKAO14bHfk5D2l96INFibToPJENOgRY1rFnEsb4AGPjUIWsIBUtF1u67/4BC9tbqVRN imGlHYEDz02gOw4BCEQazsmo44nV9rj0C1gdwpIL+mwffuHvhw+L7WTWe5WnfnY7mRHu YLHEDg1GIrROHHHCGPzKSjd4jTAZhET1kaZNZLklJe8RCENUW6eHLlosTdZoiB+9frbn yw6A== X-Gm-Message-State: ANhLgQ13hY2wI/TEnhPoBwcUVZpYl/d9yz4VCBYPLvilSMG1WT/bCAtU 7J8DeaCyIL1ZRhfyYMMFE9ND/v6t X-Google-Smtp-Source: ADFU+vtmVUl9PawTvVQQoPoigFsOQoH1fcTuMLpBgOib2HYqSuHc64FTT76hL+3Bh57mnKhpNpF6JQ== X-Received: by 2002:a17:906:2cd1:: with SMTP id r17mr9974315ejr.280.1585258066027; Thu, 26 Mar 2020 14:27:46 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id u29sm523960edd.47.2020.03.26.14.27.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 26 Mar 2020 14:27:45 -0700 (PDT) Message-Id: In-Reply-To: References: From: "Elijah Newren via GitGitGadget" Date: Thu, 26 Mar 2020 21:27:37 +0000 Subject: [PATCH v4 4/7] dir: fix broken comment Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Martin Melka , SZEDER =?utf-8?b?R8OhYm9y?= , Samuel Lijin , =?utf-8?b?Tmd1eeG7hW4gVGjDoWkgTmfhu41j?= Duy , Derrick Stolee , Elijah Newren , Elijah Newren Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren Signed-off-by: Elijah Newren --- dir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dir.c b/dir.c index d0f3d660850..3a367683661 100644 --- a/dir.c +++ b/dir.c @@ -2259,7 +2259,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, add_untracked(untracked, path.buf + baselen); break; } - /* skip the dir_add_* part */ + /* skip the add_path_to_appropriate_result_list() */ continue; } From patchwork Thu Mar 26 21:27:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee via GitGitGadget X-Patchwork-Id: 11461253 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B50A517EA for ; Thu, 26 Mar 2020 21:27:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 978BB2074D for ; Thu, 26 Mar 2020 21:27:52 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="lxWj3fFP" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727724AbgCZV1v (ORCPT ); Thu, 26 Mar 2020 17:27:51 -0400 Received: from mail-ed1-f65.google.com ([209.85.208.65]:39626 "EHLO mail-ed1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726260AbgCZV1t (ORCPT ); Thu, 26 Mar 2020 17:27:49 -0400 Received: by mail-ed1-f65.google.com with SMTP id a43so8699002edf.6 for ; Thu, 26 Mar 2020 14:27:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=GEY9TL7lTJrR/Du82wrLRvpKVZBjDrenRLjdBqlYW3k=; b=lxWj3fFPWsBGZDw8cEZTBoxVH7B3TpOc6l0js9CBV77dpMjDn66z4iyvaL7MqrMP/1 s6s8ffdISOI+MLmnACYs16VeRWW9JB9ZxlDWofoeveXr2NSmkDUu3MaA8wKG6rHAFZZJ 95Gk2AvI7Et/ccnDwh7+J5eSBZ1ts9v08WZ4VclayiCeIbpyGXyb+i1vZXjEM6W7MwbF W5rU7b7UzItnSuv14WYzZHYeJJgPw39X5OqfjR5NmgU3M10T5AQ4u2n6vpDfXi9vhGkW sFWQXjNrcEVEi6fb1987b2UzT5JHk8tHxw4CzI0d4otqKIhrDWCsI55y8Nm46ZM6KaT9 TzmQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=GEY9TL7lTJrR/Du82wrLRvpKVZBjDrenRLjdBqlYW3k=; b=nhYlE/0KNsluGQyjces493fJai7IO8sV4To4IrVOqij7MaWcRyK//9DKre0z9Y2GUo QEj4upJwtgkXm9D3YMD9FvVb9Qibk4Ui1AFO90Qs+iRRxfKK6pKhX1X8gIUMfdv6w/fM Czpm/PQ2K20xukxQZ4UXxrGg4N7R+YaJs5df8R28WTPjgmT6AemDMd6LENDDWTOlia8d 2pzEwJF5UQ7kJjVEOJPRxCfPeAOjIdjjXzHQ9OQK3y611R6bJLmZXiqtvumo0SoNIc8Q IKmKPZLGVwqFmpo+OrSAitPzVnM9+dzCyHldzINZdE+/fJTd8KASFez4rVaBnbJEl/vT KMJg== X-Gm-Message-State: ANhLgQ1SQKoZcLabqfHl6UZWZSnKIK5GGhXPDNqX2WfEcFT2gYzpwtMm Glgj8r3mPTEfVoseugD5uXV+z/9V X-Google-Smtp-Source: ADFU+vsSBNkUkH98xuxGUCL4gAY3ex/Zl2OmN9sOjoYlL5zuM77C3SdQsEQ8QIq5AYaC5czBNTNwkg== X-Received: by 2002:a17:906:7d02:: with SMTP id u2mr4918408ejo.71.1585258066838; Thu, 26 Mar 2020 14:27:46 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id m21sm520319edb.90.2020.03.26.14.27.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 26 Mar 2020 14:27:46 -0700 (PDT) Message-Id: <2603c1a9d13a2ff8c6d719a0f5858211116ef007.1585258061.git.gitgitgadget@gmail.com> In-Reply-To: References: From: "Elijah Newren via GitGitGadget" Date: Thu, 26 Mar 2020 21:27:38 +0000 Subject: [PATCH v4 5/7] dir: fix confusion based on variable tense Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Martin Melka , SZEDER =?utf-8?b?R8OhYm9y?= , Samuel Lijin , =?utf-8?b?Tmd1eeG7hW4gVGjDoWkgTmfhu41j?= Duy , Derrick Stolee , Elijah Newren , Elijah Newren Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren Despite having contributed several fixes in this area, I have for months (years?) assumed that the "exclude" variable was a directive; this caused me to think of it as a different mode we operate in and left me confused as I tried to build up a mental model around why we'd need such a directive. I mostly tried to ignore it while focusing on the pieces I was trying to understand. Then I finally traced this variable all back to a call to is_excluded(), meaning it was actually functioning as an adjective. In particular, it was a checked property ("Does this path match a rule in .gitignore?"), rather than a mode passed in from the caller. Change the variable name to match the part of speech used by the function called to define it, which will hopefully make these bits of code slightly clearer to the next reader. Signed-off-by: Elijah Newren --- dir.c | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/dir.c b/dir.c index 3a367683661..8074e651e6f 100644 --- a/dir.c +++ b/dir.c @@ -1656,7 +1656,7 @@ static enum exist_status directory_exists_in_index(struct index_state *istate, static enum path_treatment treat_directory(struct dir_struct *dir, struct index_state *istate, struct untracked_cache_dir *untracked, - const char *dirname, int len, int baselen, int exclude, + const char *dirname, int len, int baselen, int excluded, const struct pathspec *pathspec) { int nested_repo = 0; @@ -1679,13 +1679,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir, } if (nested_repo) return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none : - (exclude ? path_excluded : path_untracked)); + (excluded ? path_excluded : path_untracked)); if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES) break; - if (exclude && - (dir->flags & DIR_SHOW_IGNORED_TOO) && - (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) { + if (excluded && + (dir->flags & DIR_SHOW_IGNORED_TOO) && + (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) { /* * This is an excluded directory and we are @@ -1713,7 +1713,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, /* This is the "show_other_directories" case */ if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) - return exclude ? path_excluded : path_untracked; + return excluded ? path_excluded : path_untracked; untracked = lookup_untracked(dir->untracked, untracked, dirname + baselen, len - baselen); @@ -1723,7 +1723,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, * the directory contains any files. */ return read_directory_recursive(dir, istate, dirname, len, - untracked, 1, exclude, pathspec); + untracked, 1, excluded, pathspec); } /* @@ -1904,7 +1904,7 @@ static enum path_treatment treat_path(struct dir_struct *dir, int baselen, const struct pathspec *pathspec) { - int has_path_in_index, dtype, exclude; + int has_path_in_index, dtype, excluded; enum path_treatment path_treatment; if (!cdir->d_name) @@ -1949,13 +1949,13 @@ static enum path_treatment treat_path(struct dir_struct *dir, (directory_exists_in_index(istate, path->buf, path->len) == index_nonexistent)) return path_none; - exclude = is_excluded(dir, istate, path->buf, &dtype); + excluded = is_excluded(dir, istate, path->buf, &dtype); /* * Excluded? If we don't explicitly want to show * ignored files, ignore it */ - if (exclude && !(dir->flags & (DIR_SHOW_IGNORED|DIR_SHOW_IGNORED_TOO))) + if (excluded && !(dir->flags & (DIR_SHOW_IGNORED|DIR_SHOW_IGNORED_TOO))) return path_excluded; switch (dtype) { @@ -1965,7 +1965,7 @@ static enum path_treatment treat_path(struct dir_struct *dir, strbuf_addch(path, '/'); path_treatment = treat_directory(dir, istate, untracked, path->buf, path->len, - baselen, exclude, pathspec); + baselen, excluded, pathspec); /* * If 1) we only want to return directories that * match an exclude pattern and 2) this directory does @@ -1974,7 +1974,7 @@ static enum path_treatment treat_path(struct dir_struct *dir, * recurse into this directory (instead of marking the * directory itself as an ignored path). */ - if (!exclude && + if (!excluded && path_treatment == path_excluded && (dir->flags & DIR_SHOW_IGNORED_TOO) && (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) @@ -1982,7 +1982,7 @@ static enum path_treatment treat_path(struct dir_struct *dir, return path_treatment; case DT_REG: case DT_LNK: - return exclude ? path_excluded : path_untracked; + return excluded ? path_excluded : path_untracked; } } From patchwork Thu Mar 26 21:27:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee via GitGitGadget X-Patchwork-Id: 11461255 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 21970913 for ; Thu, 26 Mar 2020 21:27:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 03C932074D for ; Thu, 26 Mar 2020 21:27:53 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="OQELjf/f" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727726AbgCZV1w (ORCPT ); Thu, 26 Mar 2020 17:27:52 -0400 Received: from mail-ed1-f67.google.com ([209.85.208.67]:41407 "EHLO mail-ed1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727716AbgCZV1u (ORCPT ); Thu, 26 Mar 2020 17:27:50 -0400 Received: by mail-ed1-f67.google.com with SMTP id v1so8687961edq.8 for ; Thu, 26 Mar 2020 14:27:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=xkC+IVvzcmDAOU11a56Rp9tRjmkMcpKNCWI68pKm8dU=; b=OQELjf/fI/F/pWrhhYfCYHfJzMBxMWd1FCifO0ReICThHc/+FhGOawdD306WqGxCDg MDBthWXtBo8gQ0PjATz7GmBwcSTPidQN3XvK6bzm4jbw+V6VJq1NFlOgU/aqN0XATBay aTqdZUo7TA3aNm2UYtqea7aoHSI2aS/2OiYZ8UUF4oZgdYtoHK/rGFUsgCXTOgjV4jkc nwBPZr+y/ZPH50MejGqwJsqMVCqRPudhJVq1eiQLBwwyeYLuBT2WxCOMxuahj+IAcGAN mXB0BN2MiavjbtjCQJdygkivlM+7Nrv03PcMYrvACbTqP6Ixv9+HWLb1vn/8grFtOYDZ qb9g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=xkC+IVvzcmDAOU11a56Rp9tRjmkMcpKNCWI68pKm8dU=; b=OYA7ohNkKqkMFPj090oTQKh5vMKfDcCXJdv5aNzPzgmrwlzHShSOgy6Q/KO6zAx18q WEtrL6D4pnvRSsMDnIGfuXH/5VdILtDltuYThjIk1gkNM9gkaymP6k7ZiV8se+9MBrxH uOHI+p2gGtsHcxYGnanueK0KEAuJdSyKUt1JuEoFlO/iEV411BvN9wmOel7IpfMwM6c1 a6vcoLxH8Oy8ybcREv0grYnJOGvtAAJcIo0oYaMLGrsUz5LTz5omIeuyeryEJeWptXWd htKCv1yWaDhbamhUPRqMbr50oAaaZFZVbjRCX8cNJCVdfEMHVo73uOp8u+6Byxv8LcIz LrPg== X-Gm-Message-State: ANhLgQ3J4zwgWWbqIUuPnedwxvc0uk6TnKRgAjQY6g3nv1uqdmoVQGNr c/lmGgZdNoDXZLqV7DXIkYGAYROf X-Google-Smtp-Source: ADFU+vuIyyLKesib/YlGqimPo212SjNgc5I6AYFeH+g56fLbtha3aZP7ARTKBVen8dadKdfBB1WxLQ== X-Received: by 2002:aa7:c559:: with SMTP id s25mr3830303edr.2.1585258067899; Thu, 26 Mar 2020 14:27:47 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id s7sm330843ejx.28.2020.03.26.14.27.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 26 Mar 2020 14:27:47 -0700 (PDT) Message-Id: <576f364329dba507e9976b775a4b48dd1257b077.1585258061.git.gitgitgadget@gmail.com> In-Reply-To: References: From: "Derrick Stolee via GitGitGadget" Date: Thu, 26 Mar 2020 21:27:39 +0000 Subject: [PATCH v4 6/7] dir: refactor treat_directory to clarify control flow Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Martin Melka , SZEDER =?utf-8?b?R8OhYm9y?= , Samuel Lijin , =?utf-8?b?Tmd1eeG7hW4gVGjDoWkgTmfhu41j?= Duy , Derrick Stolee , Elijah Newren , Derrick Stolee Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The logic in treat_directory() is handled by a multi-case switch statement, but this switch is very asymmetrical, as the first two cases are simple but the third is more complicated than the rest of the method. In fact, the third case includes a "break" statement that leads to the block of code outside the switch statement. That is the only way to reach that block, as the switch handles all possible values from directory_exists_in_index(); Extract the switch statement into a series of "if" statements. This simplifies the trivial cases, while clarifying how to reach the "show_other_directories" case. This is particularly important as the "show_other_directories" case will expand in a later change. Helped-by: Elijah Newren Signed-off-by: Derrick Stolee Signed-off-by: Elijah Newren --- dir.c | 35 +++++++++++++++++------------------ 1 file changed, 17 insertions(+), 18 deletions(-) diff --git a/dir.c b/dir.c index 8074e651e6f..d9bcb7e19b6 100644 --- a/dir.c +++ b/dir.c @@ -1660,29 +1660,28 @@ static enum path_treatment treat_directory(struct dir_struct *dir, const struct pathspec *pathspec) { int nested_repo = 0; - /* The "len-1" is to strip the final '/' */ - switch (directory_exists_in_index(istate, dirname, len-1)) { - case index_directory: - return path_recurse; + enum exist_status status = directory_exists_in_index(istate, dirname, len-1); - case index_gitdir: + if (status == index_directory) + return path_recurse; + if (status == index_gitdir) return path_none; + if (status != index_nonexistent) + BUG("Unhandled value for directory_exists_in_index: %d\n", status); - case index_nonexistent: - if ((dir->flags & DIR_SKIP_NESTED_GIT) || - !(dir->flags & DIR_NO_GITLINKS)) { - struct strbuf sb = STRBUF_INIT; - strbuf_addstr(&sb, dirname); - nested_repo = is_nonbare_repository_dir(&sb); - strbuf_release(&sb); - } - if (nested_repo) - return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none : - (excluded ? path_excluded : path_untracked)); + if ((dir->flags & DIR_SKIP_NESTED_GIT) || + !(dir->flags & DIR_NO_GITLINKS)) { + struct strbuf sb = STRBUF_INIT; + strbuf_addstr(&sb, dirname); + nested_repo = is_nonbare_repository_dir(&sb); + strbuf_release(&sb); + } + if (nested_repo) + return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none : + (excluded ? path_excluded : path_untracked)); - if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES) - break; + if (!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES)) { if (excluded && (dir->flags & DIR_SHOW_IGNORED_TOO) && (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) { From patchwork Thu Mar 26 21:27:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee via GitGitGadget X-Patchwork-Id: 11461259 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 724C417D4 for ; Thu, 26 Mar 2020 21:27:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 40EDA2074D for ; Thu, 26 Mar 2020 21:27:56 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="fb5cGpSB" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727731AbgCZV1z (ORCPT ); Thu, 26 Mar 2020 17:27:55 -0400 Received: from mail-ed1-f67.google.com ([209.85.208.67]:36590 "EHLO mail-ed1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727720AbgCZV1x (ORCPT ); Thu, 26 Mar 2020 17:27:53 -0400 Received: by mail-ed1-f67.google.com with SMTP id i7so3754643edq.3 for ; Thu, 26 Mar 2020 14:27:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=+KJm8A8bLNMN812JVxLRcJcKTDPWwOxSeyDnj/O8i7k=; b=fb5cGpSBDDVean72JKQaPItoL3Dv9GTByj3o6ELqqUdvq2Z4RP2ISFINaLNiQ1kRX1 Z+6hETEJ7xbcIhnF0kqeExTsIjTx0wUs2XnHunV4YTpvMQM6/Djwqn/MBZNZTzvblwdl kJJJYLhBugmVLaX3R3KSOmOujRT6FGL6TV1vm3+MM68+H8QXZPiXG1Fpp511oBu8cTYq 6Acfo4KdCFPSk6fi6CBShoIYW6I9OEPTCsdW9Tns3knEq+K9pUYsxck9GACpq55Im9Pf iImWA1XgNLSYOztiINGMuoKwiPZqTaRnr14GFMkmhiP7ZpcxLlH9uXhGTa9vFntGDCFw J4yw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=+KJm8A8bLNMN812JVxLRcJcKTDPWwOxSeyDnj/O8i7k=; b=FuAmh2XVId7dYWQI7NmAZWLU+gWIgnSGZE21mR1bsB/RnXMhae+Sxh1FWPb86vSxze qoWFc/coIDKXHv1/rx9LYTPbHY036akNDLDHnTcG3F9K04PipwtqCiwdj4EO+a8jaJtC PFsL5ueOhwHuB79Jj+pFv5S+nguw4V2GmkL0iUpXjlEngYDa99V9TBrjOCCNPn7MNtHs HNnfjgxh6sLrfb8I9iEpGdt+3KQVt2dU6lSRJWXvUJtR9gO7tZyOZoy5Fhc9EuqqiF/e QKYQuNbHAhhiIEUa15Ms9cq7Ynl/OZbbpAwq9HkIMT+NE1fIp2eTeUnBlnPtvgB4f1Eq EaAQ== X-Gm-Message-State: ANhLgQ3b6QJIVGX3KIWAq+cuc/JV5azHJNG8wD7d5/0lmfxpfY0Rvsxn UEe9MwQ8HNlrBzoqrMsUTa52DVkl X-Google-Smtp-Source: ADFU+vsj4eqRqfIw6AQyXi66QyTHtbyY2k1p6N76Ywzn54RB3vKy6iMr22SItO7IjkDnwTXkJsb2xA== X-Received: by 2002:aa7:c2d8:: with SMTP id m24mr9931876edp.223.1585258068815; Thu, 26 Mar 2020 14:27:48 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id yc5sm441583ejb.66.2020.03.26.14.27.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 26 Mar 2020 14:27:48 -0700 (PDT) Message-Id: In-Reply-To: References: From: "Elijah Newren via GitGitGadget" Date: Thu, 26 Mar 2020 21:27:40 +0000 Subject: [PATCH v4 7/7] dir: replace exponential algorithm with a linear one Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Martin Melka , SZEDER =?utf-8?b?R8OhYm9y?= , Samuel Lijin , =?utf-8?b?Tmd1eeG7hW4gVGjDoWkgTmfhu41j?= Duy , Derrick Stolee , Elijah Newren , Elijah Newren Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren dir's read_directory_recursive() naturally operates recursively in order to walk the directory tree. Treating of directories is sometimes weird because there are so many different permutations about how to handle directories. Some examples: * 'git ls-files -o --directory' only needs to know that a directory itself is untracked; it doesn't need to recurse into it to see what is underneath. * 'git status' needs to recurse into an untracked directory, but only to determine whether or not it is empty. If there are no files underneath, the directory itself will be omitted from the output. If it is not empty, only the directory will be listed. * 'git status --ignored' needs to recurse into untracked directories and report all the ignored entries and then report the directory as untracked -- UNLESS all the entries under the directory are ignored, in which case we don't print any of the entries under the directory and just report the directory itself as ignored. (Note that although this forces us to walk all untracked files underneath the directory as well, we strip them from the output, except for users like 'git clean' who also set DIR_KEEP_TRACKED_CONTENTS.) * For 'git clean', we may need to recurse into a directory that doesn't match any specified pathspecs, if it's possible that there is an entry underneath the directory that can match one of the pathspecs. In such a case, we need to be careful to omit the directory itself from the list of paths (see commit 404ebceda01c ("dir: also check directories for matching pathspecs", 2019-09-17)) Part of the tension noted above is that the treatment of a directory can change based on the files within it, and based on the various settings in dir->flags. Trying to keep this in mind while reading over the code, it is easy to think in terms of "treat_directory() tells us what to do with a directory, and read_directory_recursive() is the thing that recurses". Since we need to look into a directory to know how to treat it, though, it is quite easy to decide to (also) recurse into the directory from treat_directory() by adding a read_directory_recursive() call. Adding such a call is actually fine, IF we make sure that read_directory_recursive() does not also recurse into that same directory. Unfortunately, commit df5bcdf83aeb ("dir: recurse into untracked dirs for ignored files", 2017-05-18), added exactly such a case to the code, meaning we'd have two calls to read_directory_recursive() for an untracked directory. So, if we had a file named one/two/three/four/five/somefile.txt and nothing in one/ was tracked, then 'git status --ignored' would call read_directory_recursive() twice on the directory 'one/', and each of those would call read_directory_recursive() twice on the directory 'one/two/', and so on until read_directory_recursive() was called 2^5 times for 'one/two/three/four/five/'. Avoid calling read_directory_recursive() twice per level by moving a lot of the special logic into treat_directory(). Since dir.c is somewhat complex, extra cruft built up around this over time. While trying to unravel it, I noticed several instances where the first call to read_directory_recursive() would return e.g. path_untracked for some directory and a later one would return e.g. path_none, despite the fact that the directory clearly should have been considered untracked. The code happened to work due to the side-effect from the first invocation of adding untracked entries to dir->entries; this allowed it to get the correct output despite the supposed override in return value by the later call. I am somewhat concerned that there are still bugs and maybe even testcases with the wrong expectation. I have tried to carefully document treat_directory() since it becomes more complex after this change (though much of this complexity came from elsewhere that probably deserved better comments to begin with). However, much of my work felt more like a game of whackamole while attempting to make the code match the existing regression tests than an attempt to create an implementation that matched some clear design. That seems wrong to me, but the rules of existing behavior had so many special cases that I had a hard time coming up with some overarching rules about what correct behavior is for all cases, forcing me to hope that the regression tests are correct and sufficient. Such a hope seems likely to be ill-founded, given my experience with dir.c-related testcases in the last few months: Examples where the documentation was hard to parse or even just wrong: * 3aca58045f4f (git-clean.txt: do not claim we will delete files with -n/--dry-run, 2019-09-17) * 09487f2cbad3 (clean: avoid removing untracked files in a nested git repository, 2019-09-17) * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17) Examples where testcases were declared wrong and changed: * 09487f2cbad3 (clean: avoid removing untracked files in a nested git repository, 2019-09-17) * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17) * a2b13367fe55 (Revert "dir.c: make 'git-status --ignored' work within leading directories", 2019-12-10) Examples where testcases were clearly inadequate: * 502c386ff944 (t7300-clean: demonstrate deleting nested repo with an ignored file breakage, 2019-08-25) * 7541cc530239 (t7300: add testcases showing failure to clean specified pathspecs, 2019-09-17) * a5e916c7453b (dir: fix off-by-one error in match_pathspec_item, 2019-09-17) * 404ebceda01c (dir: also check directories for matching pathspecs, 2019-09-17) * 09487f2cbad3 (clean: avoid removing untracked files in a nested git repository, 2019-09-17) * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17) * 452efd11fbf6 (t3011: demonstrate directory traversal failures, 2019-12-10) * b9670c1f5e6b (dir: fix checks on common prefix directory, 2019-12-19) Examples where "correct behavior" was unclear to everyone: https://lore.kernel.org/git/20190905154735.29784-1-newren@gmail.com/ Other commits of note: * 902b90cf42bc (clean: fix theoretical path corruption, 2019-09-17) However, on the positive side, it does make the code much faster. For the following simple shell loop in an empty repository: for depth in $(seq 10 25) do dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done) rm -rf dir mkdir -p $dirs >$dirs/untracked-file /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null done I saw the following timings, in seconds (note that the numbers are a little noisy from run-to-run, but the trend is very clear with every run): 10: 0.03 11: 0.05 12: 0.08 13: 0.19 14: 0.29 15: 0.50 16: 1.05 17: 2.11 18: 4.11 19: 8.60 20: 17.55 21: 33.87 22: 68.71 23: 140.05 24: 274.45 25: 551.15 For the above run, using strace I can look for the number of untracked directories opened and can verify that it matches the expected 2^($depth+1)-2 (the sum of 2^1 + 2^2 + 2^3 + ... + 2^$depth). After this fix, with strace I can verify that the number of untracked directories that are opened drops to just $depth, and the timings all drop to 0.00. In fact, it isn't until a depth of 190 nested directories that it sometimes starts reporting a time of 0.01 seconds and doesn't consistently report 0.01 seconds until there are 240 nested directories. The previous code would have taken 17.55 * 2^220 / (60*60*24*365) = 9.4 * 10^59 YEARS to have completed the 240 nested directories case. It's not often that you get to speed something up by a factor of 3*10^69. Signed-off-by: Elijah Newren --- dir.c | 167 ++++++++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 121 insertions(+), 46 deletions(-) diff --git a/dir.c b/dir.c index d9bcb7e19b6..29283fc2588 100644 --- a/dir.c +++ b/dir.c @@ -1659,7 +1659,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir, const char *dirname, int len, int baselen, int excluded, const struct pathspec *pathspec) { - int nested_repo = 0; + /* + * WARNING: From this function, you can return path_recurse or you + * can call read_directory_recursive() (or neither), but + * you CAN'T DO BOTH. + */ + enum path_treatment state; + int nested_repo = 0, old_ignored_nr, check_only, stop_early; /* The "len-1" is to strip the final '/' */ enum exist_status status = directory_exists_in_index(istate, dirname, len-1); @@ -1711,18 +1717,117 @@ static enum path_treatment treat_directory(struct dir_struct *dir, /* This is the "show_other_directories" case */ - if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) + /* + * We only need to recurse into untracked/ignored directories if + * either of the following bits is set: + * - DIR_SHOW_IGNORED_TOO (because then we need to determine if + * there are ignored directories below) + * - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if + * the directory is empty) + */ + if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES))) return excluded ? path_excluded : path_untracked; + /* + * If we have we don't want to know the all the paths under an + * untracked or ignored directory, we still need to go into the + * directory to determine if it is empty (because an empty directory + * should be path_none instead of path_excluded or path_untracked). + */ + check_only = ((dir->flags & DIR_HIDE_EMPTY_DIRECTORIES) && + !(dir->flags & DIR_SHOW_IGNORED_TOO)); + + /* + * However, there's another optimization possible as a subset of + * check_only, based on the cases we have to consider: + * A) Directory matches no exclude patterns: + * * Directory is empty => path_none + * * Directory has an untracked file under it => path_untracked + * * Directory has only ignored files under it => path_excluded + * B) Directory matches an exclude pattern: + * * Directory is empty => path_none + * * Directory has an untracked file under it => path_excluded + * * Directory has only ignored files under it => path_excluded + * In case A, we can exit as soon as we've found an untracked + * file but otherwise have to walk all files. In case B, though, + * we can stop at the first file we find under the directory. + */ + stop_early = check_only && excluded; + + /* + * If /every/ file within an untracked directory is ignored, then + * we want to treat the directory as ignored (for e.g. status + * --porcelain), without listing the individual ignored files + * underneath. To do so, we'll save the current ignored_nr, and + * pop all the ones added after it if it turns out the entire + * directory is ignored. + */ + old_ignored_nr = dir->ignored_nr; + + /* Actually recurse into dirname now, we'll fixup the state later. */ untracked = lookup_untracked(dir->untracked, untracked, dirname + baselen, len - baselen); + state = read_directory_recursive(dir, istate, dirname, len, untracked, + check_only, stop_early, pathspec); + + /* There are a variety of reasons we may need to fixup the state... */ + if (state == path_excluded) { + int i; + + /* + * When stop_early is set, read_directory_recursive() will + * never return path_untracked regardless of whether + * underlying paths were untracked or ignored (because + * returning early means it excluded some paths, or + * something like that -- see commit 5aaa7fd39aaf ("Improve + * performance of git status --ignored", 2017-09-18)). + * However, we're not really concerned with the status of + * files under the directory, we just wanted to know + * whether the directory was empty (state == path_none) or + * not (state == path_excluded), and if not, we'd return + * our original status based on whether the untracked + * directory matched an exclusion pattern. + */ + if (stop_early) + state = excluded ? path_excluded : path_untracked; + + else { + /* + * When + * !stop_early && state == path_excluded + * then all paths under dirname were ignored. For + * this case, git status --porcelain wants to just + * list the directory itself as ignored and not + * list the individual paths underneath. Remove + * the individual paths underneath. + */ + for (i = old_ignored_nr + 1; iignored_nr; ++i) + free(dir->ignored[i]); + dir->ignored_nr = old_ignored_nr; + } + } + + /* + * If there is nothing under the current directory and we are not + * hiding empty directories, then we need to report on the + * untracked or ignored status of the directory itself. + */ + if (state == path_none && !(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) + state = excluded ? path_excluded : path_untracked; /* - * If this is an excluded directory, then we only need to check if - * the directory contains any files. + * We can recurse into untracked directories that don't match any + * of the given pathspecs when some file underneath the directory + * might match one of the pathspecs. If so, we should make sure + * to note that the directory itself did not match. */ - return read_directory_recursive(dir, istate, dirname, len, - untracked, 1, excluded, pathspec); + if (pathspec && + !match_pathspec(istate, pathspec, dirname, len, + 0 /* prefix */, NULL, + 0 /* do NOT special case dirs */)) + state = path_none; + + return state; } /* @@ -1870,6 +1975,11 @@ static enum path_treatment treat_path_fast(struct dir_struct *dir, int baselen, const struct pathspec *pathspec) { + /* + * WARNING: From this function, you can return path_recurse or you + * can call read_directory_recursive() (or neither), but + * you CAN'T DO BOTH. + */ strbuf_setlen(path, baselen); if (!cdir->ucd) { strbuf_addstr(path, cdir->file); @@ -2175,14 +2285,10 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, int stop_at_first_file, const struct pathspec *pathspec) { /* - * WARNING WARNING WARNING: - * - * Any updates to the traversal logic here may need corresponding - * updates in treat_leading_path(). See the commit message for the - * commit adding this warning as well as the commit preceding it - * for details. + * WARNING: Do NOT recurse unless path_recurse is returned from + * treat_path(). Recursing on any other return value + * can result in exponential slowdown. */ - struct cached_dir cdir; enum path_treatment state, subdir_state, dir_state = path_none; struct strbuf path = STRBUF_INIT; @@ -2204,13 +2310,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, dir_state = state; /* recurse into subdir if instructed by treat_path */ - if ((state == path_recurse) || - ((state == path_untracked) && - (resolve_dtype(cdir.d_type, istate, path.buf, path.len) == DT_DIR) && - ((dir->flags & DIR_SHOW_IGNORED_TOO) || - (pathspec && - do_match_pathspec(istate, pathspec, path.buf, path.len, - baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) { + if (state == path_recurse) { struct untracked_cache_dir *ud; ud = lookup_untracked(dir->untracked, untracked, path.buf + baselen, @@ -2294,15 +2394,6 @@ static int treat_leading_path(struct dir_struct *dir, const char *path, int len, const struct pathspec *pathspec) { - /* - * WARNING WARNING WARNING: - * - * Any updates to the traversal logic here may need corresponding - * updates in read_directory_recursive(). See 777b420347 (dir: - * synchronize treat_leading_path() and read_directory_recursive(), - * 2019-12-19) and its parent commit for details. - */ - struct strbuf sb = STRBUF_INIT; struct strbuf subdir = STRBUF_INIT; int prevlen, baselen; @@ -2353,23 +2444,7 @@ static int treat_leading_path(struct dir_struct *dir, strbuf_reset(&subdir); strbuf_add(&subdir, path+prevlen, baselen-prevlen); cdir.d_name = subdir.buf; - state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen, - pathspec); - if (state == path_untracked && - resolve_dtype(cdir.d_type, istate, sb.buf, sb.len) == DT_DIR && - (dir->flags & DIR_SHOW_IGNORED_TOO || - do_match_pathspec(istate, pathspec, sb.buf, sb.len, - baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)) { - if (!match_pathspec(istate, pathspec, sb.buf, sb.len, - 0 /* prefix */, NULL, - 0 /* do NOT special case dirs */)) - state = path_none; - add_path_to_appropriate_result_list(dir, NULL, &cdir, - istate, - &sb, baselen, - pathspec, state); - state = path_recurse; - } + state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen, pathspec); if (state != path_recurse) break; /* do not recurse into it */