From patchwork Tue Jun 22 08:04:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12336563 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BAED9C48BE5 for ; Tue, 22 Jun 2021 08:04:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A016261108 for ; Tue, 22 Jun 2021 08:04:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230439AbhFVIHG (ORCPT ); Tue, 22 Jun 2021 04:07:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41446 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230380AbhFVIHA (ORCPT ); Tue, 22 Jun 2021 04:07:00 -0400 Received: from mail-wr1-x432.google.com (mail-wr1-x432.google.com [IPv6:2a00:1450:4864:20::432]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1D175C061756 for ; Tue, 22 Jun 2021 01:04:45 -0700 (PDT) Received: by mail-wr1-x432.google.com with SMTP id a11so22501284wrt.13 for ; Tue, 22 Jun 2021 01:04:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=WaNi0PNToLyqrzeralRfEUb73qsmawlAv79BHtK3meU=; b=MiFBgzMzD5zGUTcakBbF6Gptwyx6u4Tncgu3VbOzprib3Ev47nuIBhRHDJ3VB/JVnt QjSXgzV2wguLLNdKi10iRjOLNgITYzEgz5f5YfNYvu0pCCDj5O8ryvqrkHxy3IykcsR2 BfyutSkVNh6TG/MrHZ3VFBOJR1Q+OUKzuuRz5KNQkiZA6Rs3zm70gcRLlmFBm9YrtX0F ZXEDhITQJ20udGUAancIiCtixQPCnIx6QKewKtFFW126/lH6xi6hvjtgiBVP546z0IQl E/s/H9U0KTaByhd3C5P+JXrC8Bg1q0YfKgWCtBtixJ0Yj/ChagysiRmuEipk5L3bok/g oGVQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=WaNi0PNToLyqrzeralRfEUb73qsmawlAv79BHtK3meU=; b=Hp5oPaF8/2jpEHqzc1DYbyM8V7lR7HJxB8g3bLzwWaUBwP7y0XXDRpd8VkDJpRv2WJ Zi2UGuYV2u+DEOT2qP5aR5wAvIlEqntNm4Dd9VLZDLScKg53/ZWqGvn+9PKCYhhXhR44 quSiJ0RTFxp5mLYb4ehHjeo56ngepdNmIzQx77LEZD8dPjH94JljgHoJbNc6BRiGOPHm T7vhR3F9y6rMl/97/9rJvlVDMRaKiwhM7QcLM2aPYURoI6+ub7hGH9sd6ReCwiuJUW2x JGs12cH2CTi+Z+8/kqj7G43IYuhL/HAXIxoRSsu9h0z1nlXTNhrj+CxPSM3tinwBJNNT +UUQ== X-Gm-Message-State: AOAM531/So6zlWgA6Ee5KSziwOMZQGhJPZtcZx5itPwoIqbSSLInOAOI 5ztOuu3UTeoOfsN4IdiGbmkjEueD7Dg= X-Google-Smtp-Source: ABdhPJwpMjPJcf0q6pfb5qAufdF+R2qpHW/z/z9bpUUn3Zw6BfFGfTz7WHbRFUHI9321lPymRcoxOQ== X-Received: by 2002:adf:b354:: with SMTP id k20mr3056955wrd.136.1624349083784; Tue, 22 Jun 2021 01:04:43 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id d4sm1494632wmd.42.2021.06.22.01.04.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 22 Jun 2021 01:04:43 -0700 (PDT) Message-Id: <04f5ebdabe147b6652c5487ad2a440ba7b7389d5.1624349082.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 22 Jun 2021 08:04:37 +0000 Subject: [PATCH v3 1/5] promisor-remote: output trace2 statistics for number of objects fetched Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Jonathan Tan , Derrick Stolee , Taylor Blau , Derrick Stolee , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Signed-off-by: Elijah Newren --- promisor-remote.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/promisor-remote.c b/promisor-remote.c index da3f2ca2615e..d465377d7d32 100644 --- a/promisor-remote.c +++ b/promisor-remote.c @@ -12,7 +12,8 @@ void set_repository_format_partial_clone(char *partial_clone) repository_format_partial_clone = xstrdup_or_null(partial_clone); } -static int fetch_objects(const char *remote_name, +static int fetch_objects(struct repository *repo, + const char *remote_name, const struct object_id *oids, int oid_nr) { @@ -30,6 +31,8 @@ static int fetch_objects(const char *remote_name, die(_("promisor-remote: unable to fork off fetch subprocess")); child_in = xfdopen(child.in, "w"); + trace2_data_intmax("promisor", repo, "fetch_count", oid_nr); + for (i = 0; i < oid_nr; i++) { if (fputs(oid_to_hex(&oids[i]), child_in) < 0) die_errno(_("promisor-remote: could not write to fetch subprocess")); @@ -238,7 +241,7 @@ int promisor_remote_get_direct(struct repository *repo, promisor_remote_init(); for (r = promisors; r; r = r->next) { - if (fetch_objects(r->name, remaining_oids, remaining_nr) < 0) { + if (fetch_objects(repo, r->name, remaining_oids, remaining_nr) < 0) { if (remaining_nr == 1) continue; remaining_nr = remove_fetched_oids(repo, &remaining_oids, From patchwork Tue Jun 22 08:04:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12336567 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 377CDC2B9F4 for ; Tue, 22 Jun 2021 08:04:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1CEBB6128A for ; Tue, 22 Jun 2021 08:04:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230452AbhFVIHJ (ORCPT ); Tue, 22 Jun 2021 04:07:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41452 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230392AbhFVIHC (ORCPT ); Tue, 22 Jun 2021 04:07:02 -0400 Received: from mail-wm1-x32b.google.com (mail-wm1-x32b.google.com [IPv6:2a00:1450:4864:20::32b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C4007C06175F for ; Tue, 22 Jun 2021 01:04:45 -0700 (PDT) Received: by mail-wm1-x32b.google.com with SMTP id j11-20020a05600c1c0bb02901e23d4c0977so476281wms.0 for ; Tue, 22 Jun 2021 01:04:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=0rornULiPb4aocIuNRBO1gWiUJldsp1W22PEYrTdgX0=; b=ItkxamHaUUwTF7yPECt8T3yNNcdALaUVs+DJbY8/wxkEcXNNxXy+hPuXksKHyKZMyr 9wHui+0ZGykYGi0oO6vqum+mkYrmvtUcIEjzSoiEFaOoOgAwRN7FqQPnnFGKkehSllBQ Ev4GLV5RRMs33S5d3epT/mlV8p8oU5cdoG9d0Wzqept/BPOkD/pYCeTodAzDm0Ps0a7i aIWm8LaKGjfp5+Eb8h2j4T+FVSQjTj1U4uPAJX86oJ/8yg+EX17V9ZbKvXy4/tRhIBzN 6glAhQgRHjwssOkuXr4Dc8hDaAUty0d6Zsj1M9w7VlsJuNqYG4Zaa+O/I/5aPwepd11v vfGQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=0rornULiPb4aocIuNRBO1gWiUJldsp1W22PEYrTdgX0=; b=EIsnfkMDTHrZPp0LsdQx7tCgymOYocXdkOTWiu3udA5t0Mv6oNKpy3Jt8zHByjoBBz WDOW0z6tHVZYj2kmwXSu4VM6n5wADXTGf4lM+YMAtUKb1Wbbq09JBpMcjJ8Qp+91mkRN vFNdZjWqQ1JmSzPq7NZoqF4uW2zMK7oOq8jMmHWtL6Y2V++Q7KCFB2tVh4DbA1tdynoS 2phM1E6M/1W9ec2uYJfRYa86jkMVyhXInRgb+Fsl2IYMJ5+5KcGG7sTMDi7ksUL5PRO5 NsllGLZihK+evHmm953O+NSCEn9V/zAwXUfnoIO0k2ZNv8yWo0ZR8nSdYHGPaJ5JFRSj gknA== X-Gm-Message-State: AOAM5300icTTH31j6Ho4oQ+fhQahlKGGmzNrngmLlHOJrmjvg5UfYVvE vRfUk5m8s4sm9msgk7NZ3yd3zjRH7s8= X-Google-Smtp-Source: ABdhPJztpNkZKLDEf7e6yAWiNblWaL4BzE8MTs0e+nySBjFnubovIPIwqGOpEHsDW9lFa09JP7wIyA== X-Received: by 2002:a05:600c:190f:: with SMTP id j15mr2806742wmq.37.1624349084386; Tue, 22 Jun 2021 01:04:44 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id q11sm20283168wrx.80.2021.06.22.01.04.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 22 Jun 2021 01:04:44 -0700 (PDT) Message-Id: <4796e096fdb4f4ef8513c0e9214754d41b4db5d7.1624349082.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 22 Jun 2021 08:04:38 +0000 Subject: [PATCH v3 2/5] t6421: add tests checking for excessive object downloads during merge Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Jonathan Tan , Derrick Stolee , Taylor Blau , Derrick Stolee , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Signed-off-by: Elijah Newren --- t/t6421-merge-partial-clone.sh | 440 +++++++++++++++++++++++++++++++++ 1 file changed, 440 insertions(+) create mode 100755 t/t6421-merge-partial-clone.sh diff --git a/t/t6421-merge-partial-clone.sh b/t/t6421-merge-partial-clone.sh new file mode 100755 index 000000000000..3dcffc15f801 --- /dev/null +++ b/t/t6421-merge-partial-clone.sh @@ -0,0 +1,440 @@ +#!/bin/sh + +test_description="limiting blob downloads when merging with partial clones" +# Uses a methodology similar to +# t6042: corner cases with renames but not criss-cross merges +# t6036: corner cases with both renames and criss-cross merges +# t6423: directory rename detection +# +# The setup for all of them, pictorially, is: +# +# A +# o +# / \ +# O o ? +# \ / +# o +# B +# +# To help make it easier to follow the flow of tests, they have been +# divided into sections and each test will start with a quick explanation +# of what commits O, A, and B contain. +# +# Notation: +# z/{b,c} means files z/b and z/c both exist +# x/d_1 means file x/d exists with content d1. (Purpose of the +# underscore notation is to differentiate different +# files that might be renamed into each other's paths.) + +. ./test-lib.sh +. "$TEST_DIRECTORY"/lib-merge.sh + +test_setup_repo () { + test -d server && return + test_create_repo server && + ( + cd server && + + git config uploadpack.allowfilter 1 && + git config uploadpack.allowanysha1inwant 1 && + + mkdir -p general && + test_seq 2 9 >general/leap1 && + cp general/leap1 general/leap2 && + echo leap2 >>general/leap2 && + + mkdir -p basename && + cp general/leap1 basename/numbers && + cp general/leap1 basename/sequence && + cp general/leap1 basename/values && + echo numbers >>basename/numbers && + echo sequence >>basename/sequence && + echo values >>basename/values && + + mkdir -p dir/unchanged && + mkdir -p dir/subdir/tweaked && + echo a >dir/subdir/a && + echo b >dir/subdir/b && + echo c >dir/subdir/c && + echo d >dir/subdir/d && + echo e >dir/subdir/e && + cp general/leap1 dir/subdir/Makefile && + echo toplevel makefile >>dir/subdir/Makefile && + echo f >dir/subdir/tweaked/f && + echo g >dir/subdir/tweaked/g && + echo h >dir/subdir/tweaked/h && + echo subdirectory makefile >dir/subdir/tweaked/Makefile && + for i in $(test_seq 1 88) + do + echo content $i >dir/unchanged/file_$i + done && + git add . && + git commit -m "O" && + + git branch O && + git branch A && + git branch B-single && + git branch B-dir && + git branch B-many && + + git switch A && + + git rm general/leap* && + mkdir general/ && + test_seq 1 9 >general/jump1 && + cp general/jump1 general/jump2 && + echo leap2 >>general/jump2 && + + rm basename/numbers basename/sequence basename/values && + mkdir -p basename/subdir/ + cp general/jump1 basename/subdir/numbers && + cp general/jump1 basename/subdir/sequence && + cp general/jump1 basename/subdir/values && + echo numbers >>basename/subdir/numbers && + echo sequence >>basename/subdir/sequence && + echo values >>basename/subdir/values && + + git rm dir/subdir/tweaked/f && + echo more >>dir/subdir/e && + echo more >>dir/subdir/Makefile && + echo more >>dir/subdir/tweaked/Makefile && + mkdir dir/subdir/newsubdir && + echo rust code >dir/subdir/newsubdir/newfile.rs && + git mv dir/subdir/e dir/subdir/newsubdir/ && + git mv dir folder && + git add . && + git commit -m "A" && + + git switch B-single && + echo new first line >dir/subdir/Makefile && + cat general/leap1 >>dir/subdir/Makefile && + echo toplevel makefile >>dir/subdir/Makefile && + echo perl code >general/newfile.pl && + git add . && + git commit -m "B-single" && + + git switch B-dir && + echo java code >dir/subdir/newfile.java && + echo scala code >dir/subdir/newfile.scala && + echo groovy code >dir/subdir/newfile.groovy && + git add . && + git commit -m "B-dir" && + + git switch B-many && + test_seq 2 10 >general/leap1 && + rm general/leap2 && + cp general/leap1 general/leap2 && + echo leap2 >>general/leap2 && + + rm basename/numbers basename/sequence basename/values && + mkdir -p basename/subdir/ + cp general/leap1 basename/subdir/numbers && + cp general/leap1 basename/subdir/sequence && + cp general/leap1 basename/subdir/values && + echo numbers >>basename/subdir/numbers && + echo sequence >>basename/subdir/sequence && + echo values >>basename/subdir/values && + + mkdir dir/subdir/newsubdir/ && + echo c code >dir/subdir/newfile.c && + echo python code >dir/subdir/newsubdir/newfile.py && + git add . && + git commit -m "B-many" && + + git switch A + ) +} + +# Testcase: Objects downloaded for single relevant rename +# Commit O: +# general/{leap1_O, leap2_O} +# basename/{numbers_O, sequence_O, values_O} +# dir/subdir/{a,b,c,d,e_O,Makefile_TOP_O} +# dir/subdir/tweaked/{f,g,h,Makefile_SUB_O} +# dir/unchanged/ +# Commit A: +# (Rename leap->jump, rename basename/ -> basename/subdir/, rename dir/ +# -> folder/, move e into newsubdir, add newfile.rs, remove f, modify +# both both Makefiles and jumps) +# general/{jump1_A, jump2_A} +# basename/subdir/{numbers_A, sequence_A, values_A} +# folder/subdir/{a,b,c,d,Makefile_TOP_A} +# folder/subdir/newsubdir/{e_A,newfile.rs} +# folder/subdir/tweaked/{g,h,Makefile_SUB_A} +# folder/unchanged/ +# Commit B(-single): +# (add newfile.pl, tweak Makefile_TOP) +# general/{leap1_O, leap2_O,newfile.pl} +# basename/{numbers_O, sequence_O, values_O} +# dir/{a,b,c,d,e_O,Makefile_TOP_B} +# dir/tweaked/{f,g,h,Makefile_SUB_O} +# dir/unchanged/ +# Expected: +# general/{jump1_A, jump2_A,newfile.pl} +# basename/subdir/{numbers_A, sequence_A, values_A} +# folder/subdir/{a,b,c,d,Makefile_TOP_Merged} +# folder/subdir/newsubdir/{e_A,newfile.rs} +# folder/subdir/tweaked/{g,h,Makefile_SUB_A} +# folder/unchanged/ +# +# Objects that need to be fetched: +# Rename detection: +# Side1 (O->A): +# Basename-matches rename detection only needs to fetch these objects: +# Makefile_TOP_O, Makefile_TOP_A +# (Despite many renames, all others are content irrelevant. They +# are also location irrelevant because newfile.rs was added on +# the side doing the directory rename, and newfile.pl was added to +# a directory that was not renamed on either side.) +# General rename detection only needs to fetch these objects: +# +# (Even though newfile.rs, jump[12], basename/subdir/*, and e +# could all be used as destinations in rename detection, the +# basename detection for Makefile matches up all relevant +# sources, so these other files never end up needing to be +# used) +# Side2 (O->B): +# Basename-matches rename detection only needs to fetch these objects: +# +# (there are no deleted files, so no possible sources) +# General rename detection only needs to fetch these objects: +# +# (there are no deleted files, so no possible sources) +# Merge: +# 3-way content merge needs to grab these objects: +# Makefile_TOP_B +# Nothing else needs to fetch objects +# +# Summary: 2 fetches (1 for 2 objects, 1 for 1 object) +# +test_expect_merge_algorithm failure failure 'Objects downloaded for single relevant rename' ' + test_setup_repo && + git clone --sparse --filter=blob:none "file://$(pwd)/server" objects-single && + ( + cd objects-single && + + git rev-list --objects --all --missing=print | + grep "^?" | sort >missing-objects-before && + + git checkout -q origin/A && + + GIT_TRACE2_PERF="$(pwd)/trace.output" git \ + -c merge.directoryRenames=true merge --no-stat \ + --no-progress origin/B-single && + + # Check the number of objects we reported we would fetch + cat >expect <<-EOF && + fetch_count:2 + fetch_count:1 + EOF + grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual && + test_cmp expect actual && + + # Check the number of fetch commands exec-ed + grep d0.*fetch.negotiationAlgorithm trace.output >fetches && + test_line_count = 2 fetches && + + git rev-list --objects --all --missing=print | + grep "^?" | sort >missing-objects-after && + comm -2 -3 missing-objects-before missing-objects-after >old && + comm -1 -3 missing-objects-before missing-objects-after >new && + # No new missing objects + test_must_be_empty new && + # Fetched 2 + 1 = 3 objects + test_line_count = 3 old + ) +' + +# Testcase: Objects downloaded for directory rename +# Commit O: +# general/{leap1_O, leap2_O} +# basename/{numbers_O, sequence_O, values_O} +# dir/subdir/{a,b,c,d,e_O,Makefile_TOP_O} +# dir/subdir/tweaked/{f,g,h,Makefile_SUB_O} +# dir/unchanged/ +# Commit A: +# (Rename leap->jump, rename basename/ -> basename/subdir/, rename dir/ -> +# folder/, move e into newsubdir, add newfile.rs, remove f, modify +# both Makefiles and jumps) +# general/{jump1_A, jump2_A} +# basename/subdir/{numbers_A, sequence_A, values_A} +# folder/subdir/{a,b,c,d,Makefile_TOP_A} +# folder/subdir/newsubdir/{e_A,newfile.rs} +# folder/subdir/tweaked/{g,h,Makefile_SUB_A} +# folder/unchanged/ +# Commit B(-dir): +# (add dir/subdir/newfile.{java,scala,groovy} +# general/{leap1_O, leap2_O} +# basename/{numbers_O, sequence_O, values_O} +# dir/subdir/{a,b,c,d,e_O,Makefile_TOP_O, +# newfile.java,newfile.scala,newfile.groovy} +# dir/subdir/tweaked/{f,g,h,Makefile_SUB_O} +# dir/unchanged/ +# Expected: +# general/{jump1_A, jump2_A} +# basename/subdir/{numbers_A, sequence_A, values_A} +# folder/subdir/{a,b,c,d,Makefile_TOP_A, +# newfile.java,newfile.scala,newfile.groovy} +# folder/subdir/newsubdir/{e_A,newfile.rs} +# folder/subdir/tweaked/{g,h,Makefile_SUB_A} +# folder/unchanged/ +# +# Objects that need to be fetched: +# Makefile_TOP_O, Makefile_TOP_A +# Makefile_SUB_O, Makefile_SUB_A +# e_O, e_A +# * Despite A's rename of jump->leap, those renames are irrelevant. +# * Despite A's rename of basename/ -> basename/subdir/, those renames are +# irrelevant. +# * Because of A's rename of dir/ -> folder/ and B-dir's addition of +# newfile.* into dir/subdir/, we need to determine directory renames. +# (Technically, there are enough exact renames to determine directory +# rename detection, but the current implementation always does +# basename searching before directory rename detection. Running it +# also before basename searching would mean doing directory rename +# detection twice, but it's a bit expensive to do that and cases like +# this are not all that common.) +# Summary: 1 fetches for 6 objects +# +test_expect_merge_algorithm failure failure 'Objects downloaded when a directory rename triggered' ' + test_setup_repo && + git clone --sparse --filter=blob:none "file://$(pwd)/server" objects-dir && + ( + cd objects-dir && + + git rev-list --objects --all --missing=print | + grep "^?" | sort >missing-objects-before && + + git checkout -q origin/A && + + GIT_TRACE2_PERF="$(pwd)/trace.output" git \ + -c merge.directoryRenames=true merge --no-stat \ + --no-progress origin/B-dir && + + # Check the number of objects we reported we would fetch + cat >expect <<-EOF && + fetch_count:6 + EOF + grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual && + test_cmp expect actual && + + # Check the number of fetch commands exec-ed + grep d0.*fetch.negotiationAlgorithm trace.output >fetches && + test_line_count = 1 fetches && + + git rev-list --objects --all --missing=print | + grep "^?" | sort >missing-objects-after && + comm -2 -3 missing-objects-before missing-objects-after >old && + comm -1 -3 missing-objects-before missing-objects-after >new && + # No new missing objects + test_must_be_empty new && + # Fetched 6 objects + test_line_count = 6 old + ) +' + +# Testcase: Objects downloaded with lots of renames and modifications +# Commit O: +# general/{leap1_O, leap2_O} +# basename/{numbers_O, sequence_O, values_O} +# dir/subdir/{a,b,c,d,e_O,Makefile_TOP_O} +# dir/subdir/tweaked/{f,g,h,Makefile_SUB_O} +# dir/unchanged/ +# Commit A: +# (Rename leap->jump, rename basename/ -> basename/subdir/, rename dir/ +# -> folder/, move e into newsubdir, add newfile.rs, remove f, modify +# both both Makefiles and jumps) +# general/{jump1_A, jump2_A} +# basename/subdir/{numbers_A, sequence_A, values_A} +# folder/subdir/{a,b,c,d,Makefile_TOP_A} +# folder/subdir/newsubdir/{e_A,newfile.rs} +# folder/subdir/tweaked/{g,h,Makefile_SUB_A} +# folder/unchanged/ +# Commit B(-minimal): +# (modify both leaps, rename basename/ -> basename/subdir/, add +# newfile.{c,py}) +# general/{leap1_B, leap2_B} +# basename/subdir/{numbers_B, sequence_B, values_B} +# dir/{a,b,c,d,e_O,Makefile_TOP_O,newfile.c} +# dir/tweaked/{f,g,h,Makefile_SUB_O,newfile.py} +# dir/unchanged/ +# Expected: +# general/{jump1_Merged, jump2_Merged} +# basename/subdir/{numbers_Merged, sequence_Merged, values_Merged} +# folder/subdir/{a,b,c,d,Makefile_TOP_A,newfile.c} +# folder/subdir/newsubdir/e_A +# folder/subdir/tweaked/{g,h,Makefile_SUB_A,newfile.py} +# folder/unchanged/ +# +# Objects that need to be fetched: +# Rename detection: +# Side1 (O->A): +# Basename-matches rename detection only needs to fetch these objects: +# numbers_O, numbers_A +# sequence_O, sequence_A +# values_O, values_A +# Makefile_TOP_O, Makefile_TOP_A +# Makefile_SUB_O, Makefile_SUB_A +# e_O, e_A +# General rename detection only needs to fetch these objects: +# leap1_O, leap2_O +# jump1_A, jump2_A, newfile.rs +# (only need remaining relevant sources, but any relevant sources need +# to be matched against all possible unpaired destinations) +# Side2 (O->B): +# Basename-matches rename detection only needs to fetch these objects: +# numbers_B +# sequence_B +# values_B +# (because numbers_O, sequence_O, and values_O already fetched above) +# General rename detection only needs to fetch these objects: +# +# Merge: +# 3-way content merge needs to grab these objects: +# leap1_B +# leap2_B +# Nothing else needs to fetch objects +# +# Summary: 4 fetches (1 for 6 objects, 1 for 8, 1 for 3, 1 for 2) +# +test_expect_merge_algorithm failure failure 'Objects downloaded with lots of renames and modifications' ' + test_setup_repo && + git clone --sparse --filter=blob:none "file://$(pwd)/server" objects-many && + ( + cd objects-many && + + git rev-list --objects --all --missing=print | + grep "^?" | sort >missing-objects-before && + + git checkout -q origin/A && + + GIT_TRACE2_PERF="$(pwd)/trace.output" git \ + -c merge.directoryRenames=true merge --no-stat \ + --no-progress origin/B-many && + + # Check the number of objects we reported we would fetch + cat >expect <<-EOF && + fetch_count:12 + fetch_count:5 + fetch_count:3 + fetch_count:2 + EOF + grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual && + test_cmp expect actual && + + # Check the number of fetch commands exec-ed + grep d0.*fetch.negotiationAlgorithm trace.output >fetches && + test_line_count = 4 fetches && + + git rev-list --objects --all --missing=print | + grep "^?" | sort >missing-objects-after && + comm -2 -3 missing-objects-before missing-objects-after >old && + comm -1 -3 missing-objects-before missing-objects-after >new && + # No new missing objects + test_must_be_empty new && + # Fetched 12 + 5 + 3 + 2 = 22 objects + test_line_count = 22 old + ) +' + +test_done From patchwork Tue Jun 22 08:04:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12336565 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DD335C49EA2 for ; Tue, 22 Jun 2021 08:04:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C17A261351 for ; Tue, 22 Jun 2021 08:04:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230357AbhFVIHI (ORCPT ); Tue, 22 Jun 2021 04:07:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41454 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230330AbhFVIHC (ORCPT ); Tue, 22 Jun 2021 04:07:02 -0400 Received: from mail-wr1-x433.google.com (mail-wr1-x433.google.com [IPv6:2a00:1450:4864:20::433]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4CD10C061760 for ; Tue, 22 Jun 2021 01:04:46 -0700 (PDT) Received: by mail-wr1-x433.google.com with SMTP id b3so12236494wrm.6 for ; Tue, 22 Jun 2021 01:04:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=yjljHMsI83jMnc6ysuMNAloovg3NdkTceuIBTZHhyB0=; b=p9NMNXL0MbPfxo7wLm++l3fnsb/NJH/JVvLtVCqOLCtNhWfkd+qbAxUwZIB4dRvN0g KNxbGaCVnsWWCUfcAX5TLl1QG9tE2CbpHzIVuWdFPjUdERBA3tWuFo3FvVlfo2hSXgN6 7JWDc5lU5M4fkhwSo/iOAhZeYTxbxXVYQrt4t0t2jtR3Mg2qwlLtMcrRqSLbZxhcMkcK ntwRvHxLKHd2RANSMroInS8DI6CMvdM1duJKV2+B/7bHHimkRBAanG8PXDNxCCt6v9+E ImglbFfq06hCN6Q97FPrhGFfWVOM5Leojf2CfEyMfWsEiTTumobzzr5uC3ueRH0UORYM DTSQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=yjljHMsI83jMnc6ysuMNAloovg3NdkTceuIBTZHhyB0=; b=kf92ojd9rTWvmcAyBkLKEAubVheEbsSH8CA3SdMaDOPnJmvzZpvNYN1eUgdaOu03G+ qO+By2Y3+w+f0AFQZ1+zDX20qiDQ4o5rauFq3P51SqrlkZ+WUl5/t4KDsKCC1PHSyzqy +9cqAbdM9jOq3ULIvI+RJZoLmY+5M7vv202gEAoaaWeC659oKrfLExWj9j93L+udblRn ZiT5yS7DnM2fZBPuXA52lSKhT7J+jRzFe/N3DQyVFQNAWO6Xu+nWUrWNP6SruglOtl8O MOlgETdzcEG25BvVkVD+iY8lkWO2rze9QtdXGDIQRxE+bsg3yh33nnchPKy/gl5dyorZ AErw== X-Gm-Message-State: AOAM532rzlZCAYo2SMWzdfzd0TudF5VyWWrb/Mw84dyYt5qHSm+JTey9 PwBCFMbOk2f7qdqJVjXQhvJArHSeNnk= X-Google-Smtp-Source: ABdhPJyOa/331FMl/Me2NXOdfxRIQihwC4Au9bVDudO7KDCJwcj1U/Cr4Ks0ymBU1oQ84XiwPdeutg== X-Received: by 2002:adf:e3c7:: with SMTP id k7mr3174651wrm.31.1624349084926; Tue, 22 Jun 2021 01:04:44 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id z10sm1426489wmp.39.2021.06.22.01.04.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 22 Jun 2021 01:04:44 -0700 (PDT) Message-Id: <7ed0162cdb4e3fe20c122bffa093d366c30df9c7.1624349082.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 22 Jun 2021 08:04:39 +0000 Subject: [PATCH v3 3/5] diffcore-rename: allow different missing_object_cb functions Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Jonathan Tan , Derrick Stolee , Taylor Blau , Derrick Stolee , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren estimate_similarity() was setting up a diff_populate_filespec_options every time it was called, requiring the caller of estimate_similarity() to pass in some data needed to set up this option. Currently the needed data consisted of a single variable (skip_unmodified), but we want to also have the different estimate_similarity() callsites start using different missing_object_cb functions as well. Rather than also passing that data in, just have the caller pass in the whole diff_populate_filespec_options, and reduce the number of times we need to set it up. As a side note, this also drops the number of calls to has_promisor_remote() dramatically. If L is the number of basename paths to compare, M is the number of inexact sources, and N is the number of inexact destinations, then the number of calls to has_promisor_remote() drops from L+M*N down to at most 2 -- one for each of the sites that calls estimate_similarity(). has_promisor_remote() is a very fast function so this almost certainly has no measurable performance impact, but it seems cleaner to avoid calling that function so many times. Signed-off-by: Elijah Newren --- diffcore-rename.c | 58 +++++++++++++++++++++++++++++++---------------- 1 file changed, 39 insertions(+), 19 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 35378d84e8f1..e13e52046026 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -126,7 +126,7 @@ static int estimate_similarity(struct repository *r, struct diff_filespec *src, struct diff_filespec *dst, int minimum_score, - int skip_unmodified) + struct diff_populate_filespec_options *dpf_opt) { /* src points at a file that existed in the original tree (or * optionally a file in the destination tree) and dst points @@ -143,15 +143,6 @@ static int estimate_similarity(struct repository *r, */ unsigned long max_size, delta_size, base_size, src_copied, literal_added; int score; - struct diff_populate_filespec_options dpf_options = { - .check_size_only = 1 - }; - struct prefetch_options prefetch_options = {r, skip_unmodified}; - - if (r == the_repository && has_promisor_remote()) { - dpf_options.missing_object_cb = prefetch; - dpf_options.missing_object_data = &prefetch_options; - } /* We deal only with regular files. Symlink renames are handled * only when they are exact matches --- in other words, no edits @@ -169,11 +160,13 @@ static int estimate_similarity(struct repository *r, * is a possible size - we really should have a flag to * say whether the size is valid or not!) */ + dpf_opt->check_size_only = 1; + if (!src->cnt_data && - diff_populate_filespec(r, src, &dpf_options)) + diff_populate_filespec(r, src, dpf_opt)) return 0; if (!dst->cnt_data && - diff_populate_filespec(r, dst, &dpf_options)) + diff_populate_filespec(r, dst, dpf_opt)) return 0; max_size = ((src->size > dst->size) ? src->size : dst->size); @@ -191,11 +184,11 @@ static int estimate_similarity(struct repository *r, if (max_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE) return 0; - dpf_options.check_size_only = 0; + dpf_opt->check_size_only = 0; - if (!src->cnt_data && diff_populate_filespec(r, src, &dpf_options)) + if (!src->cnt_data && diff_populate_filespec(r, src, dpf_opt)) return 0; - if (!dst->cnt_data && diff_populate_filespec(r, dst, &dpf_options)) + if (!dst->cnt_data && diff_populate_filespec(r, dst, dpf_opt)) return 0; if (diffcore_count_changes(r, src, dst, @@ -862,7 +855,11 @@ static int find_basename_matches(struct diff_options *options, int i, renames = 0; struct strintmap sources; struct strintmap dests; - + struct diff_populate_filespec_options dpf_options = { + .check_binary = 0, + .missing_object_cb = NULL, + .missing_object_data = NULL + }; /* * The prefeteching stuff wants to know if it can skip prefetching * blobs that are unmodified...and will then do a little extra work @@ -873,7 +870,10 @@ static int find_basename_matches(struct diff_options *options, * the extra work necessary to check if rename_src entries are * unmodified would be a small waste. */ - int skip_unmodified = 0; + struct prefetch_options prefetch_options = { + .repo = options->repo, + .skip_unmodified = 0 + }; /* * Create maps of basename -> fullname(s) for remaining sources and @@ -910,6 +910,11 @@ static int find_basename_matches(struct diff_options *options, strintmap_set(&dests, base, i); } + if (options->repo == the_repository && has_promisor_remote()) { + dpf_options.missing_object_cb = prefetch; + dpf_options.missing_object_data = &prefetch_options; + } + /* Now look for basename matchups and do similarity estimation */ for (i = 0; i < rename_src_nr; ++i) { char *filename = rename_src[i].p->one->path; @@ -953,7 +958,7 @@ static int find_basename_matches(struct diff_options *options, one = rename_src[src_index].p->one; two = rename_dst[dst_index].p->two; score = estimate_similarity(options->repo, one, two, - minimum_score, skip_unmodified); + minimum_score, &dpf_options); /* If sufficiently similar, record as rename pair */ if (score < minimum_score) @@ -1272,6 +1277,14 @@ void diffcore_rename_extended(struct diff_options *options, int num_sources, want_copies; struct progress *progress = NULL; struct dir_rename_info info; + struct diff_populate_filespec_options dpf_options = { + .check_binary = 0, + .missing_object_cb = NULL, + .missing_object_data = NULL + }; + struct prefetch_options prefetch_options = { + .repo = options->repo + }; trace2_region_enter("diff", "setup", options->repo); info.setup = 0; @@ -1433,6 +1446,13 @@ void diffcore_rename_extended(struct diff_options *options, (uint64_t)num_destinations * (uint64_t)num_sources); } + /* Finish setting up dpf_options */ + prefetch_options.skip_unmodified = skip_unmodified; + if (options->repo == the_repository && has_promisor_remote()) { + dpf_options.missing_object_cb = prefetch; + dpf_options.missing_object_data = &prefetch_options; + } + CALLOC_ARRAY(mx, st_mult(NUM_CANDIDATE_PER_DST, num_destinations)); for (dst_cnt = i = 0; i < rename_dst_nr; i++) { struct diff_filespec *two = rename_dst[i].p->two; @@ -1458,7 +1478,7 @@ void diffcore_rename_extended(struct diff_options *options, this_src.score = estimate_similarity(options->repo, one, two, minimum_score, - skip_unmodified); + &dpf_options); this_src.name_score = basename_same(one, two); this_src.dst = i; this_src.src = j; From patchwork Tue Jun 22 08:04:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12336569 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DE365C48BDF for ; Tue, 22 Jun 2021 08:04:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C639461351 for ; Tue, 22 Jun 2021 08:04:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230384AbhFVIHL (ORCPT ); Tue, 22 Jun 2021 04:07:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41474 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230377AbhFVIHG (ORCPT ); Tue, 22 Jun 2021 04:07:06 -0400 Received: from mail-wr1-x433.google.com (mail-wr1-x433.google.com [IPv6:2a00:1450:4864:20::433]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E65BEC0617A6 for ; Tue, 22 Jun 2021 01:04:46 -0700 (PDT) Received: by mail-wr1-x433.google.com with SMTP id j2so11950232wrs.12 for ; Tue, 22 Jun 2021 01:04:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=XINKupP9EZI+poFCqnlKAsUbuPBBJzJF9ke1oIyiHBk=; b=DUpYHHMWy4Nwzic71EXxxCkdOcvJflgpeNfqLRkvDkH4JJcw6ITFApyok3lflATXPe 4kA+yd9mNGUXsUKHS+GpgXt+7pY2ViYDLSpay/62PlNSwHvTrbCsXpGJjLpaA2l//xuu iHb4xaY97AdFOgpUnJA85V9LI1Mx4UdDR6ppBQF18Dt4bQiu7rR0cjW2zR2NgKOWlmpq +knJtrBALaYmv/9tnWr9T4qU1KZyY8Ps9ZMGqMyyQYZJU0kSHDmDGdzBaz0Hqiyxscj6 uvhFj0mjYfrQdgnwfMXMgJrFB1JboqRgkR0kVwtYR0VPiSfpH+wAijIOCeyDu2eA/FCE 4L/A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=XINKupP9EZI+poFCqnlKAsUbuPBBJzJF9ke1oIyiHBk=; b=o7b9dZ+CzvT03bXwpRZ6hHJ2p9akW8lvmEsbgTduyJDHJ7LXT5ClzycD3tjCqKfjSV pH2gy8ZzzgzaGWb7UMuJVDX4lArdCCyRU5RMZO2GHGuU+WGMIHA+h/wCHbJaZuUCH5MG pdb8oyxjTR23GmYj7MlkNmflaa5zWCvADgo6e9hhnADZQMWbHjq3ldWWvB7p5PNSlZg1 ZfdsrDmVfTY9ty4qDev6NDKsSW8mRw3edAybwqu5L9+lYPryU4zUvHYOISlghg32ynsU 0lg2tEpGoX2jNfly+6Ywbu9UG+YYFz2h7H+rZW0IrXLwblZ+/gEgjzT6RkifHd/KJb4x LUsA== X-Gm-Message-State: AOAM533WkCKVmYTwX72+nDTXqhqRlq+O7Bg2OwvD3YVyI51ysUZQVx0z TfowcElPExB1VItOIH+aRwo0vr5mm/s= X-Google-Smtp-Source: ABdhPJxsXPaSXMaRHI46izaVMQfBM4QVkhV1gSEgz4Z1YdyUi0xMNV5XfI0Tdi9WpBcnFHscFUWLQQ== X-Received: by 2002:a05:6000:1207:: with SMTP id e7mr3158601wrx.77.1624349085524; Tue, 22 Jun 2021 01:04:45 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id w9sm14632303wru.3.2021.06.22.01.04.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 22 Jun 2021 01:04:45 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Tue, 22 Jun 2021 08:04:40 +0000 Subject: [PATCH v3 4/5] diffcore-rename: use a different prefetch for basename comparisons Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Jonathan Tan , Derrick Stolee , Taylor Blau , Derrick Stolee , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren merge-ort was designed to minimize the amount of data needed and used, and several changes were made to diffcore-rename to take advantage of extra metadata to enable this data minimization (particularly the relevant_sources variable for skipping "irrelevant" renames). This effort obviously succeeded in drastically reducing computation times, but should also theoretically allow partial clones to download much less information. Previously, though, the "prefetch" command used in diffcore-rename had never been modified and downloaded many blobs that were unnecessary for merge-ort. This commit corrects that. When doing basename comparisons, we want to fetch only the objects that will be used for basename comparisons. If after basename fetching this leaves us with no more relevant sources (or no more destinations), then we won't need to do the full inexact rename detection and can skip downloading additional source and destination files. Even if we have to do that later full inexact rename detection, irrelevant sources are culled after basename matching and before the full inexact rename detection, so we can still avoid downloading the blobs for irrelevant sources. Rename prefetch() to inexact_prefetch(), and introduce a new basename_prefetch() to take advantage of this. If we modify the testcase from commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2021-01-23) to pass --sparse --filter=blob:none to the clone command, and use the new trace2 "fetch_count" output from a few commits ago to track both the number of fetch subcommands invoked and the number of objects fetched across all those fetches, then for the mega-renames testcase we observe the following: BEFORE this commit, rebasing 35 patches: strategy # of fetches total # of objects fetched --------- ------------ -------------------------- recursive 62 11423 ort 30 11391 AFTER this commit, rebasing the same 35 patches: ort 32 63 This means that the new code only needs to download less than 2 blobs per patch being rebased. That is especially interesting given that the repository at the start only had approximately half a dozen TOTAL blobs downloaded to start with (because the default sparse-checkout of just the toplevel directory was in use). So, for this particular linux kernel testcase that involved ~26,000 renames on the upstream side (drivers/ -> pilots/) across which 35 patches were being rebased, this change reduces the number of blobs that need to be downloaded by a factor of ~180. Signed-off-by: Elijah Newren --- diffcore-rename.c | 101 +++++++++++++++++++++++++++------ t/t6421-merge-partial-clone.sh | 4 +- 2 files changed, 85 insertions(+), 20 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index e13e52046026..4ef0459cfb50 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -87,13 +87,13 @@ struct diff_score { short name_score; }; -struct prefetch_options { +struct inexact_prefetch_options { struct repository *repo; int skip_unmodified; }; -static void prefetch(void *prefetch_options) +static void inexact_prefetch(void *prefetch_options) { - struct prefetch_options *options = prefetch_options; + struct inexact_prefetch_options *options = prefetch_options; int i; struct oid_array to_fetch = OID_ARRAY_INIT; @@ -816,6 +816,78 @@ static int idx_possible_rename(char *filename, struct dir_rename_info *info) return idx; } +struct basename_prefetch_options { + struct repository *repo; + struct strintmap *relevant_sources; + struct strintmap *sources; + struct strintmap *dests; + struct dir_rename_info *info; +}; +static void basename_prefetch(void *prefetch_options) +{ + struct basename_prefetch_options *options = prefetch_options; + struct strintmap *relevant_sources = options->relevant_sources; + struct strintmap *sources = options->sources; + struct strintmap *dests = options->dests; + struct dir_rename_info *info = options->info; + int i; + struct oid_array to_fetch = OID_ARRAY_INIT; + + /* + * TODO: The following loops mirror the code/logic from + * find_basename_matches(), though not quite exactly. Maybe + * abstract the iteration logic out somehow? + */ + for (i = 0; i < rename_src_nr; ++i) { + char *filename = rename_src[i].p->one->path; + const char *base = NULL; + intptr_t src_index; + intptr_t dst_index; + + /* Skip irrelevant sources */ + if (relevant_sources && + !strintmap_contains(relevant_sources, filename)) + continue; + + /* + * If the basename is unique among remaining sources, then + * src_index will equal 'i' and we can attempt to match it + * to a unique basename in the destinations. Otherwise, + * use directory rename heuristics, if possible. + */ + base = get_basename(filename); + src_index = strintmap_get(sources, base); + assert(src_index == -1 || src_index == i); + + if (strintmap_contains(dests, base)) { + struct diff_filespec *one, *two; + + /* Find a matching destination, if possible */ + dst_index = strintmap_get(dests, base); + if (src_index == -1 || dst_index == -1) { + src_index = i; + dst_index = idx_possible_rename(filename, info); + } + if (dst_index == -1) + continue; + + /* Ignore this dest if already used in a rename */ + if (rename_dst[dst_index].is_rename) + continue; /* already used previously */ + + one = rename_src[src_index].p->one; + two = rename_dst[dst_index].p->two; + + /* Add the pairs */ + diff_add_if_missing(options->repo, &to_fetch, two); + diff_add_if_missing(options->repo, &to_fetch, one); + } + } + + promisor_remote_get_direct(options->repo, to_fetch.oid, to_fetch.nr); + oid_array_clear(&to_fetch); +} + static int find_basename_matches(struct diff_options *options, int minimum_score, struct dir_rename_info *info, @@ -860,19 +932,12 @@ static int find_basename_matches(struct diff_options *options, .missing_object_cb = NULL, .missing_object_data = NULL }; - /* - * The prefeteching stuff wants to know if it can skip prefetching - * blobs that are unmodified...and will then do a little extra work - * to verify that the oids are indeed different before prefetching. - * Unmodified blobs are only relevant when doing copy detection; - * when limiting to rename detection, diffcore_rename[_extended]() - * will never be called with unmodified source paths fed to us, so - * the extra work necessary to check if rename_src entries are - * unmodified would be a small waste. - */ - struct prefetch_options prefetch_options = { + struct basename_prefetch_options prefetch_options = { .repo = options->repo, - .skip_unmodified = 0 + .relevant_sources = relevant_sources, + .sources = &sources, + .dests = &dests, + .info = info }; /* @@ -911,7 +976,7 @@ static int find_basename_matches(struct diff_options *options, } if (options->repo == the_repository && has_promisor_remote()) { - dpf_options.missing_object_cb = prefetch; + dpf_options.missing_object_cb = basename_prefetch; dpf_options.missing_object_data = &prefetch_options; } @@ -1282,7 +1347,7 @@ void diffcore_rename_extended(struct diff_options *options, .missing_object_cb = NULL, .missing_object_data = NULL }; - struct prefetch_options prefetch_options = { + struct inexact_prefetch_options prefetch_options = { .repo = options->repo }; @@ -1449,7 +1514,7 @@ void diffcore_rename_extended(struct diff_options *options, /* Finish setting up dpf_options */ prefetch_options.skip_unmodified = skip_unmodified; if (options->repo == the_repository && has_promisor_remote()) { - dpf_options.missing_object_cb = prefetch; + dpf_options.missing_object_cb = inexact_prefetch; dpf_options.missing_object_data = &prefetch_options; } diff --git a/t/t6421-merge-partial-clone.sh b/t/t6421-merge-partial-clone.sh index 3dcffc15f801..aad975d24ddb 100755 --- a/t/t6421-merge-partial-clone.sh +++ b/t/t6421-merge-partial-clone.sh @@ -207,7 +207,7 @@ test_setup_repo () { # # Summary: 2 fetches (1 for 2 objects, 1 for 1 object) # -test_expect_merge_algorithm failure failure 'Objects downloaded for single relevant rename' ' +test_expect_merge_algorithm failure success 'Objects downloaded for single relevant rename' ' test_setup_repo && git clone --sparse --filter=blob:none "file://$(pwd)/server" objects-single && ( @@ -296,7 +296,7 @@ test_expect_merge_algorithm failure failure 'Objects downloaded for single relev # this are not all that common.) # Summary: 1 fetches for 6 objects # -test_expect_merge_algorithm failure failure 'Objects downloaded when a directory rename triggered' ' +test_expect_merge_algorithm failure success 'Objects downloaded when a directory rename triggered' ' test_setup_repo && git clone --sparse --filter=blob:none "file://$(pwd)/server" objects-dir && ( From patchwork Tue Jun 22 08:04:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12336571 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AF368C2B9F4 for ; Tue, 22 Jun 2021 08:05:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 939996128C for ; Tue, 22 Jun 2021 08:05:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230380AbhFVIHT (ORCPT ); Tue, 22 Jun 2021 04:07:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41480 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230436AbhFVIHG (ORCPT ); Tue, 22 Jun 2021 04:07:06 -0400 Received: from mail-wr1-x431.google.com (mail-wr1-x431.google.com [IPv6:2a00:1450:4864:20::431]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7C5D8C0617A8 for ; Tue, 22 Jun 2021 01:04:47 -0700 (PDT) Received: by mail-wr1-x431.google.com with SMTP id y7so22403563wrh.7 for ; Tue, 22 Jun 2021 01:04:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=cPouRtX0ywbuQRmrWfNTHyz7mMmKhchMKzvO6gSIOns=; b=UiYq37/sEafzLwTa1qfGpa9Amad05KrKUe1iK86lcxoi1bMWeLmJkJjVCaMl3s1gkl 2JrtKiEnDxhS7yuDgiFhDbrPgoPq/mf+49xFHCIwfn7BTVyRYcm7YJ/vOk8GqBf8zASd qYhHz/m8/n8n0CNZyHyESMQoIUOra70qwQ3L6iT2FsnWV0swied5vYgZnWMlY/0RdZyx QkTjxYa8o9gO+Va2CzRiyw0NvguIztyviVykUNUj6M9fsUcglR513xIeBnJZB7NlMc2N v5G8sEHOCHOr7sbveWeGdyzqaym5B/GG/zzz++BGlA4gmQXJK8I6yRFtxGN7O8nC7qN1 emkA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=cPouRtX0ywbuQRmrWfNTHyz7mMmKhchMKzvO6gSIOns=; b=aasuEdgfwSGzIVUVjbgXXyrda6lsBtoQpPz+xmyS+2B6I8RWwP1yJAZ4MKwAV+KIZg y/Nkrs1FogLpsxqmoFBugDWkice36SJBwTwteYFD//WqXO+S+lk5gzLtLGFJPo4jJkn5 czJOnhWoYgJRwAtntk+s5OXqmEuam6oWNvmRs4Job27c9a6yK/nHkcNnbp1q82UOOepn 73JwO5akn3/FrxGzKomP4dMziwrI4Z+fD1h3acwmaJ0BO4cXkygm42n+e1w8gKPR6V3C 8Y9FmH4ejUkc1JVI+KOzIXK2Xwr3yfRDRKGCT6i24ZEc0ye8GmeV6cV4sQoddzPX+CZK UqhQ== X-Gm-Message-State: AOAM531Xlst1Obpkz0G03Z19BIUONpycQyitjhq7wQF8EhdwPGXV3Ra1 xtNdlVuWNP1ffyo4beF2F/AOYAIrZdY= X-Google-Smtp-Source: ABdhPJz5f9I5rlXU3IU9R5NjFF1Q1VIWveBHfM8aokNe+w45X9Hi+heXiJXhQY1IoGURI6qGJFzXKg== X-Received: by 2002:a5d:538f:: with SMTP id d15mr3150405wrv.408.1624349086102; Tue, 22 Jun 2021 01:04:46 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id v22sm1427214wmh.40.2021.06.22.01.04.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 22 Jun 2021 01:04:45 -0700 (PDT) Message-Id: <69011cfe9fae1410213aae7a5599a3cd3c4207ce.1624349082.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 22 Jun 2021 08:04:41 +0000 Subject: [PATCH v3 5/5] merge-ort: add prefetching for content merges Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Jonathan Tan , Derrick Stolee , Taylor Blau , Derrick Stolee , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Commit 7fbbcb21b1 ("diff: batch fetching of missing blobs", 2019-04-05) introduced batching of fetching missing blobs, so that the diff machinery would have one fetch subprocess grab N blobs instead of N processes each grabbing 1. However, the diff machinery is not the only thing in a merge that needs to work on blobs. The 3-way content merges need them as well. Rather than download all the blobs 1 at a time, prefetch all the blobs needed for regular content merges. This does not cover all possible paths in merge-ort that might need to download blobs. Others include: - The blob_unchanged() calls to avoid modify/delete conflicts (when blob renormalization results in an "unchanged" file) - Preliminary content merges needed for rename/add and rename/rename(2to1) style conflicts. (Both of these types of conflicts can result in nested conflict markers from the need to do two levels of content merging; the first happens before our new prefetch_for_content_merges() function.) The first of these wouldn't be an extreme amount of work to support, and even the second could be theoretically supported in batching, but all of these cases seem unusual to me, and this is a minor performance optimization anyway; in the worst case we only get some of the fetches batched and have a few additional one-off fetches. So for now, just handle the regular 3-way content merges in our prefetching. For the testcase from the previous commit, the number of downloaded objects remains at 63, but this drops the number of fetches needed from 32 down to 20, a sizeable reduction. Signed-off-by: Elijah Newren --- merge-ort.c | 50 ++++++++++++++++++++++++++++++++++ t/t6421-merge-partial-clone.sh | 2 +- 2 files changed, 51 insertions(+), 1 deletion(-) diff --git a/merge-ort.c b/merge-ort.c index cfa751053b01..e3a5dfc7b312 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -29,6 +29,7 @@ #include "entry.h" #include "ll-merge.h" #include "object-store.h" +#include "promisor-remote.h" #include "revision.h" #include "strmap.h" #include "submodule.h" @@ -3485,6 +3486,54 @@ static void process_entry(struct merge_options *opt, record_entry_for_tree(dir_metadata, path, &ci->merged); } +static void prefetch_for_content_merges(struct merge_options *opt, + struct string_list *plist) +{ + struct string_list_item *e; + struct oid_array to_fetch = OID_ARRAY_INIT; + + if (opt->repo != the_repository || !has_promisor_remote()) + return; + + for (e = &plist->items[plist->nr-1]; e >= plist->items; --e) { + /* char *path = e->string; */ + struct conflict_info *ci = e->util; + int i; + + /* Ignore clean entries */ + if (ci->merged.clean) + continue; + + /* Ignore entries that don't need a content merge */ + if (ci->match_mask || ci->filemask < 6 || + !S_ISREG(ci->stages[1].mode) || + !S_ISREG(ci->stages[2].mode) || + oideq(&ci->stages[1].oid, &ci->stages[2].oid)) + continue; + + /* Also don't need content merge if base matches either side */ + if (ci->filemask == 7 && + S_ISREG(ci->stages[0].mode) && + (oideq(&ci->stages[0].oid, &ci->stages[1].oid) || + oideq(&ci->stages[0].oid, &ci->stages[2].oid))) + continue; + + for (i = 0; i < 3; i++) { + unsigned side_mask = (1 << i); + struct version_info *vi = &ci->stages[i]; + + if ((ci->filemask & side_mask) && + S_ISREG(vi->mode) && + oid_object_info_extended(opt->repo, &vi->oid, NULL, + OBJECT_INFO_FOR_PREFETCH)) + oid_array_append(&to_fetch, &vi->oid); + } + } + + promisor_remote_get_direct(opt->repo, to_fetch.oid, to_fetch.nr); + oid_array_clear(&to_fetch); +} + static void process_entries(struct merge_options *opt, struct object_id *result_oid) { @@ -3531,6 +3580,7 @@ static void process_entries(struct merge_options *opt, * the way when it is time to process the file at the same path). */ trace2_region_enter("merge", "processing", opt->repo); + prefetch_for_content_merges(opt, &plist); for (entry = &plist.items[plist.nr-1]; entry >= plist.items; --entry) { char *path = entry->string; /* diff --git a/t/t6421-merge-partial-clone.sh b/t/t6421-merge-partial-clone.sh index aad975d24ddb..36bcd7c32801 100755 --- a/t/t6421-merge-partial-clone.sh +++ b/t/t6421-merge-partial-clone.sh @@ -397,7 +397,7 @@ test_expect_merge_algorithm failure success 'Objects downloaded when a directory # # Summary: 4 fetches (1 for 6 objects, 1 for 8, 1 for 3, 1 for 2) # -test_expect_merge_algorithm failure failure 'Objects downloaded with lots of renames and modifications' ' +test_expect_merge_algorithm failure success 'Objects downloaded with lots of renames and modifications' ' test_setup_repo && git clone --sparse --filter=blob:none "file://$(pwd)/server" objects-many && (