
[v3,24/34] t/perf/p7519: speed up test using "test-tool touch"

Message ID f1ef9656fc3adf079c8e40a74baeb5356bcf1586.1625150864.git.gitgitgadget@gmail.com
State New, archived
Series Builtin FSMonitor Feature

Commit Message

Jeff Hostetler July 1, 2021, 2:47 p.m. UTC
From: Jeff Hostetler <jeffhost@microsoft.com>

Change p7519 to use a single "test-tool touch" command to update
the mtime on a series of (thousands of) files instead of invoking
thousands of commands that each update a single file.

This is primarily for Windows, where process creation is very
slow; the change reduces the test run time by minutes.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 t/perf/p7519-fsmonitor.sh | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

Comments

Ævar Arnfjörð Bjarmason July 1, 2021, 11:09 p.m. UTC | #1
On Thu, Jul 01 2021, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Change p7519 to use a single "test-tool touch" command to update
> the mtime on a series of (thousands) files instead of invoking
> thousands of commands to update a single file.
>
> This is primarily for Windows where process creation is so
> very slow and reduces the test run time by minutes.
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  t/perf/p7519-fsmonitor.sh | 14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/t/perf/p7519-fsmonitor.sh b/t/perf/p7519-fsmonitor.sh
> index 5eb5044a103..f74e6014a0a 100755
> --- a/t/perf/p7519-fsmonitor.sh
> +++ b/t/perf/p7519-fsmonitor.sh
> @@ -119,10 +119,11 @@ test_expect_success "one time repo setup" '
>  	fi &&
>  
>  	mkdir 1_file 10_files 100_files 1000_files 10000_files &&
> -	for i in $(test_seq 1 10); do touch 10_files/$i; done &&
> -	for i in $(test_seq 1 100); do touch 100_files/$i; done &&
> -	for i in $(test_seq 1 1000); do touch 1000_files/$i; done &&
> -	for i in $(test_seq 1 10000); do touch 10000_files/$i; done &&
> +	test-tool touch sequence --pattern="10_files/%d" --start=1 --count=10 &&
> +	test-tool touch sequence --pattern="100_files/%d" --start=1 --count=100 &&
> +	test-tool touch sequence --pattern="1000_files/%d" --start=1 --count=1000 &&
> +	test-tool touch sequence --pattern="10000_files/%d" --start=1 --count=10000 &&
> +
>  	git add 1_file 10_files 100_files 1000_files 10000_files &&
>  	git commit -qm "Add files" &&
>  
> @@ -200,15 +201,12 @@ test_fsmonitor_suite() {
>  	# Update the mtimes on upto 100k files to make status think
>  	# that they are dirty.  For simplicity, omit any files with
>  	# LFs (i.e. anything that ls-files thinks it needs to dquote).
> -	# Then fully backslash-quote the paths to capture any
> -	# whitespace so that they pass thru xargs properly.
>  	#
>  	test_perf_w_drop_caches "status (dirty) ($DESC)" '
>  		git ls-files | \
>  			head -100000 | \
>  			grep -v \" | \
> -			sed '\''s/\(.\)/\\\1/g'\'' | \
> -			xargs test-tool chmtime -300 &&
> +			test-tool touch stdin &&
>  		git status
>  	'

Did you try to replace this with some variant of:

    test_seq 1 10000 | xargs touch

Which (depending on your xargs version) would invoke "touch" commands
with however many argv items it thinks you can handle.
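For illustration, a sketch of what that could look like in the setup step
above (the directory prefix is an assumption on my part; the suggestion
itself only shows the bare sequence):

	for n in 10 100 1000 10000
	do
		# one pipeline per directory instead of one touch per file
		test_seq 1 $n | sed "s|^|${n}_files/|" | xargs touch || return 1
	done &&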
Jeff Hostetler July 13, 2021, 5:06 p.m. UTC | #2
On 7/1/21 7:09 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Jul 01 2021, Jeff Hostetler via GitGitGadget wrote:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Change p7519 to use a single "test-tool touch" command to update
>> the mtime on a series of (thousands) files instead of invoking
>> thousands of commands to update a single file.
>>
>> This is primarily for Windows where process creation is so
>> very slow and reduces the test run time by minutes.
>>
>> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
>> ---
>>   t/perf/p7519-fsmonitor.sh | 14 ++++++--------
>>   1 file changed, 6 insertions(+), 8 deletions(-)
>>
>> diff --git a/t/perf/p7519-fsmonitor.sh b/t/perf/p7519-fsmonitor.sh
>> index 5eb5044a103..f74e6014a0a 100755
>> --- a/t/perf/p7519-fsmonitor.sh
>> +++ b/t/perf/p7519-fsmonitor.sh
>> @@ -119,10 +119,11 @@ test_expect_success "one time repo setup" '
>>   	fi &&
>>   
>>   	mkdir 1_file 10_files 100_files 1000_files 10000_files &&
>> -	for i in $(test_seq 1 10); do touch 10_files/$i; done &&
>> -	for i in $(test_seq 1 100); do touch 100_files/$i; done &&
>> -	for i in $(test_seq 1 1000); do touch 1000_files/$i; done &&
>> -	for i in $(test_seq 1 10000); do touch 10000_files/$i; done &&
>> +	test-tool touch sequence --pattern="10_files/%d" --start=1 --count=10 &&
>> +	test-tool touch sequence --pattern="100_files/%d" --start=1 --count=100 &&
>> +	test-tool touch sequence --pattern="1000_files/%d" --start=1 --count=1000 &&
>> +	test-tool touch sequence --pattern="10000_files/%d" --start=1 --count=10000 &&
>> +
>>   	git add 1_file 10_files 100_files 1000_files 10000_files &&
>>   	git commit -qm "Add files" &&
>>   
>> @@ -200,15 +201,12 @@ test_fsmonitor_suite() {
>>   	# Update the mtimes on upto 100k files to make status think
>>   	# that they are dirty.  For simplicity, omit any files with
>>   	# LFs (i.e. anything that ls-files thinks it needs to dquote).
>> -	# Then fully backslash-quote the paths to capture any
>> -	# whitespace so that they pass thru xargs properly.
>>   	#
>>   	test_perf_w_drop_caches "status (dirty) ($DESC)" '
>>   		git ls-files | \
>>   			head -100000 | \
>>   			grep -v \" | \
>> -			sed '\''s/\(.\)/\\\1/g'\'' | \
>> -			xargs test-tool chmtime -300 &&
>> +			test-tool touch stdin &&
>>   		git status
>>   	'
> 
> Did you try to replace this with some variant of:
> 
>      test_seq 1 10000 | xargs touch
> 
> Which (depending on your xargs version) would invoke "touch" commands
> with however many argv items it thinks you can handle.
> 

a quick test on my Windows machine shows that

	test_seq 1 10000 | xargs touch

takes 3.1 seconds.

just a simple

	test_seq 1 10000 >/dev/null

takes 0.2 seconds.

using my test-tool helper cuts that time in half.

Jeff
Elijah Newren July 13, 2021, 5:36 p.m. UTC | #3
On Tue, Jul 13, 2021 at 10:10 AM Jeff Hostetler <git@jeffhostetler.com> wrote:
>
> On 7/1/21 7:09 PM, Ævar Arnfjörð Bjarmason wrote:
> >
> > Did you try to replace this with some variant of:
> >
> >      test_seq 1 10000 | xargs touch
> >
> > Which (depending on your xargs version) would invoke "touch" commands
> > with however many argv items it thinks you can handle.
>
> a quick test on my Windows machine shows that
>
>         test_seq 1 10000 | xargs touch
>
> takes 3.1 seconds.
>
> just a simple
>
>         test_seq 1 10000 >/dev/null
>
> take 0.2 seconds.
>
> using my test-tool helper cuts that time in half.

Yeah, test_seq is pretty bad; it's just a loop in shell.  Is there a
'seq' on windows, and does using it instead of test_seq make things
faster with Ævar's suggested command?

I'd really like to modify test_seq to use seq when it's available and
fall back to the looping-in-shell when we need to for various
platforms.

Maybe it'd even make sense to write a 'test-tool seq' and make
test_seq use that just so we can rip out that super lame shell
looping.
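For what it's worth, a minimal sketch of that fallback idea, simplified
to the two-argument "test_seq <start> <end>" form (the real helper would
also have to keep its current one-argument handling):

	test_seq () {
		if command -v seq >/dev/null 2>&1
		then
			seq "$1" "$2"
		else
			# fall back to the existing loop-in-shell behaviour
			test_seq_counter=$1
			while test "$test_seq_counter" -le "$2"
			do
				echo "$test_seq_counter"
				test_seq_counter=$((test_seq_counter + 1))
			done
		fi
	}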
Junio C Hamano July 13, 2021, 5:47 p.m. UTC | #4
Elijah Newren <newren@gmail.com> writes:

> On Tue, Jul 13, 2021 at 10:10 AM Jeff Hostetler <git@jeffhostetler.com> wrote:
>>
>> a quick test on my Windows machine shows that
>>
>>         test_seq 1 10000 | xargs touch
>>
>> takes 3.1 seconds.
>>
>> just a simple
>>
>>         test_seq 1 10000 >/dev/null
>>
>> take 0.2 seconds.
>>
>> using my test-tool helper cuts that time in half.
>
> Yeah, test_seq is pretty bad; it's just a loop in shell.  Is there a
> 'seq' on windows, and does using it instead of test_seq make things
> faster with Ævar's suggested command?

Unless I am misreading Jeff's message, I do not think that makes
sense.  Counting to 10000 in a shell loop is trivial (0.2 seconds),
but letting touch be invoked 10000 times to create (or smudge the
mtime of, but I suspect that is not what is going on here) 10000
files takes 3.1 seconds, and of course a native binary that creates
10000 files with a single invocation would be faster.

> I'd really like to modify test_seq to use seq when it's available and
> fall back to the looping-in-shell when we need to for various
> platforms.

So, if I am reading Jeff correctly, that optimizes something that is
not a bottleneck.

> Maybe it'd even make sense to write a 'test-tool seq' and make
> test_seq use that just so we can rip out that super lame shell
> looping.
Elijah Newren July 13, 2021, 5:50 p.m. UTC | #5
On Tue, Jul 13, 2021 at 10:47 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > On Tue, Jul 13, 2021 at 10:10 AM Jeff Hostetler <git@jeffhostetler.com> wrote:
> >>
> >> a quick test on my Windows machine shows that
> >>
> >>         test_seq 1 10000 | xargs touch
> >>
> >> takes 3.1 seconds.
> >>
> >> just a simple
> >>
> >>         test_seq 1 10000 >/dev/null
> >>
> >> take 0.2 seconds.
> >>
> >> using my test-tool helper cuts that time in half.
> >
> > Yeah, test_seq is pretty bad; it's just a loop in shell.  Is there a
> > 'seq' on windows, and does using it instead of test_seq make things
> > faster with Ævar's suggested command?
>
> Unless I am misreading Jeff's message, I do not think that makes
> sense.  Counting to 10000 in shell loop is trivial (0.2 seconds),
> but letting touch invoked 10000 times to create (or smudge mtime of,
> but I suspect that is not what is going on here) 10000 files takes
> 3.1 seconds, and of course a native binary that creates 10000 files
> with a single invocation would be faster.
>
> > I'd really like to modify test_seq to use seq when it's available and
> > fall back to the looping-in-shell when we need to for various
> > platforms.
>
> So, if I am reading Jeff correctly, that optimizes something that is
> not a bottleneck.

Oh, indeed.  Sorry, I misread.
Jeff Hostetler July 13, 2021, 5:58 p.m. UTC | #6
On 7/13/21 1:36 PM, Elijah Newren wrote:
> On Tue, Jul 13, 2021 at 10:10 AM Jeff Hostetler <git@jeffhostetler.com> wrote:
>>
>> On 7/1/21 7:09 PM, Ævar Arnfjörð Bjarmason wrote:
>>>
>>> Did you try to replace this with some variant of:
>>>
>>>       test_seq 1 10000 | xargs touch
>>>
>>> Which (depending on your xargs version) would invoke "touch" commands
>>> with however many argv items it thinks you can handle.
>>
>> a quick test on my Windows machine shows that
>>
>>          test_seq 1 10000 | xargs touch
>>
>> takes 3.1 seconds.
>>
>> just a simple
>>
>>          test_seq 1 10000 >/dev/null
>>
>> take 0.2 seconds.
>>
>> using my test-tool helper cuts that time in half.
> 
> Yeah, test_seq is pretty bad; it's just a loop in shell.  Is there a
> 'seq' on windows, and does using it instead of test_seq make things
> faster with Ævar's suggested command?
> 

The Git for Windows SDK bash environment does have a /usr/bin/seq
which appears to be from GNU coreutils 8.32.  (This is different
from the version that I have on my Mac (which doesn't have a version
number).)

Using it:

	seq 1 10000 >/dev/null

takes 0.04 seconds instead of 0.2.

However, it doesn't help the touch.

	seq 1 10000 | xargs touch

still takes ~3.1 seconds.

FWIW, the xargs is clustering the 10,000 files into ~4 command lines,
so there is a little bit of Windows process overhead, but not that
much.

	seq 1 10000 | xargs wc -l | grep total
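
Another way to count the batches directly, assuming echo batches the
same way touch does (the limit is on command-line length):

	seq 1 10000 | xargs echo | wc -l

which prints the number of commands xargs would spawn for that input.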

> I'd really like to modify test_seq to use seq when it's available and
> fall back to the looping-in-shell when we need to for various
> platforms.
> 
> Maybe it'd even make sense to write a 'test-tool seq' and make
> test_seq use that just so we can rip out that super lame shell
> looping.
>
Jeff Hostetler July 13, 2021, 6:04 p.m. UTC | #7
On 7/1/21 10:47 AM, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhost@microsoft.com>
> 
> Change p7519 to use a single "test-tool touch" command to update
> the mtime on a series of (thousands) files instead of invoking
> thousands of commands to update a single file.
> 
> This is primarily for Windows where process creation is so
> very slow and reduces the test run time by minutes.
> 
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>   t/perf/p7519-fsmonitor.sh | 14 ++++++--------
>   1 file changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/t/perf/p7519-fsmonitor.sh b/t/perf/p7519-fsmonitor.sh
> index 5eb5044a103..f74e6014a0a 100755
> --- a/t/perf/p7519-fsmonitor.sh
> +++ b/t/perf/p7519-fsmonitor.sh
> @@ -119,10 +119,11 @@ test_expect_success "one time repo setup" '
>   	fi &&
>   
>   	mkdir 1_file 10_files 100_files 1000_files 10000_files &&
> -	for i in $(test_seq 1 10); do touch 10_files/$i; done &&
> -	for i in $(test_seq 1 100); do touch 100_files/$i; done &&
> -	for i in $(test_seq 1 1000); do touch 1000_files/$i; done &&
> -	for i in $(test_seq 1 10000); do touch 10000_files/$i; done &&
> +	test-tool touch sequence --pattern="10_files/%d" --start=1 --count=10 &&
> +	test-tool touch sequence --pattern="100_files/%d" --start=1 --count=100 &&
> +	test-tool touch sequence --pattern="1000_files/%d" --start=1 --count=1000 &&
> +	test-tool touch sequence --pattern="10000_files/%d" --start=1 --count=10000 &&

The big win in taking *minutes* off of the run time of this
test was getting rid of the `for` loops and one `touch` invocation
per file.

So whether we keep my `test-tool touch` command or switch to
`test_seq` or `seq` is open for debate.  Mine seems quicker, but
it is more or less round off error in the larger picture considering
what we started with.

I'll play with this a bit.
Jeff
Junio C Hamano July 13, 2021, 6:07 p.m. UTC | #8
Jeff Hostetler <git@jeffhostetler.com> writes:

> FWIW, the xargs is clustering the 10,000 files into ~4 command lines,
> so there is a little bit of Windows process overhead, but not that
> much.
>
> 	seq 1 10000 | xargs wc -l | grep total
>
>> I'd really like to modify test_seq to use seq when it's available and
>> fall back to the looping-in-shell when we need to for various
>> platforms.
>> Maybe it'd even make sense to write a 'test-tool seq' and make
>> test_seq use that just so we can rip out that super lame shell
>> looping.
>> 

So what's lame in this picture is not the shell, or process overhead,
but I/O performance.

I've seen some noise about Windows file creation performance raised
as an issue during the initial checkout that follows "git clone", and
an idea floated to create a bunch of open file handles for writing
in threads when checkout (really the caller that repeatedly calls
entry.c:write_entry() by iterating the in-core index) starts, and
write out the contents in parallel, as a workaround.  When I heard
it, I somehow thought it was meant as a not-so-funny joke, but from
the sound of it, the I/O performance may be so horrible as to require
such a hack to be usable there.  Sigh...
Ævar Arnfjörð Bjarmason July 13, 2021, 6:18 p.m. UTC | #9
On Tue, Jul 13 2021, Jeff Hostetler wrote:

> On 7/1/21 7:09 PM, Ævar Arnfjörð Bjarmason wrote:
>> On Thu, Jul 01 2021, Jeff Hostetler via GitGitGadget wrote:
>> 
>>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>>
>>> Change p7519 to use a single "test-tool touch" command to update
>>> the mtime on a series of (thousands) files instead of invoking
>>> thousands of commands to update a single file.
>>>
>>> This is primarily for Windows where process creation is so
>>> very slow and reduces the test run time by minutes.
>>>
>>> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
>>> ---
>>>   t/perf/p7519-fsmonitor.sh | 14 ++++++--------
>>>   1 file changed, 6 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/t/perf/p7519-fsmonitor.sh b/t/perf/p7519-fsmonitor.sh
>>> index 5eb5044a103..f74e6014a0a 100755
>>> --- a/t/perf/p7519-fsmonitor.sh
>>> +++ b/t/perf/p7519-fsmonitor.sh
>>> @@ -119,10 +119,11 @@ test_expect_success "one time repo setup" '
>>>   	fi &&
>>>     	mkdir 1_file 10_files 100_files 1000_files 10000_files &&
>>> -	for i in $(test_seq 1 10); do touch 10_files/$i; done &&
>>> -	for i in $(test_seq 1 100); do touch 100_files/$i; done &&
>>> -	for i in $(test_seq 1 1000); do touch 1000_files/$i; done &&
>>> -	for i in $(test_seq 1 10000); do touch 10000_files/$i; done &&
>>> +	test-tool touch sequence --pattern="10_files/%d" --start=1 --count=10 &&
>>> +	test-tool touch sequence --pattern="100_files/%d" --start=1 --count=100 &&
>>> +	test-tool touch sequence --pattern="1000_files/%d" --start=1 --count=1000 &&
>>> +	test-tool touch sequence --pattern="10000_files/%d" --start=1 --count=10000 &&
>>> +
>>>   	git add 1_file 10_files 100_files 1000_files 10000_files &&
>>>   	git commit -qm "Add files" &&
>>>   @@ -200,15 +201,12 @@ test_fsmonitor_suite() {
>>>   	# Update the mtimes on upto 100k files to make status think
>>>   	# that they are dirty.  For simplicity, omit any files with
>>>   	# LFs (i.e. anything that ls-files thinks it needs to dquote).
>>> -	# Then fully backslash-quote the paths to capture any
>>> -	# whitespace so that they pass thru xargs properly.
>>>   	#
>>>   	test_perf_w_drop_caches "status (dirty) ($DESC)" '
>>>   		git ls-files | \
>>>   			head -100000 | \
>>>   			grep -v \" | \
>>> -			sed '\''s/\(.\)/\\\1/g'\'' | \
>>> -			xargs test-tool chmtime -300 &&
>>> +			test-tool touch stdin &&
>>>   		git status
>>>   	'
>> Did you try to replace this with some variant of:
>>      test_seq 1 10000 | xargs touch
>> Which (depending on your xargs version) would invoke "touch"
>> commands
>> with however many argv items it thinks you can handle.
>> 
>
> a quick test on my Windows machine shows that
>
> 	test_seq 1 10000 | xargs touch
>
> takes 3.1 seconds.
>
> just a simple
>
> 	test_seq 1 10000 >/dev/null
>
> take 0.2 seconds.
>
> using my test-tool helper cuts that time in half.

There's what Elijah mentioned about test_seq, so maybe it's just that.

But what I was suggesting was using the xargs mode where it does N
arguments at a time.

Does this work for you, and does it cause xargs to invoke "touch" with
the relevant N number of arguments, and does it help with the
performance?

    test_seq 1 10000 | xargs touch
    test_seq 1 10000 | xargs -n 10 touch
    test_seq 1 10000 | xargs -n 100 touch
    test_seq 1 10000 | xargs -n 1000 touch

etc.

Also I didn't notice this before, but the -300 part of "chmtime -300"
was redundant before then? I.e. you're implicitly changing it to "=+0"
instead with your "touch" helper, are you not?
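For reference, the difference being pointed out, as I read test-tool
chmtime's offset syntax:

	test-tool chmtime -300 <file>   # shift the file's mtime 300 seconds into the past
	test-tool chmtime =+0 <file>    # set the mtime to the current time, like a plain touch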
Jeff Hostetler July 13, 2021, 6:19 p.m. UTC | #10
On 7/13/21 2:07 PM, Junio C Hamano wrote:
> Jeff Hostetler <git@jeffhostetler.com> writes:
> 
>> FWIW, the xargs is clustering the 10,000 files into ~4 command lines,
>> so there is a little bit of Windows process overhead, but not that
>> much.
>>
>> 	seq 1 10000 | xargs wc -l | grep total
>>
>>> I'd really like to modify test_seq to use seq when it's available and
>>> fall back to the looping-in-shell when we need to for various
>>> platforms.
>>> Maybe it'd even make sense to write a 'test-tool seq' and make
>>> test_seq use that just so we can rip out that super lame shell
>>> looping.
>>>
> 
> So what lame in this picture is not shell, or process overhead, but
> I/O performance.
> 
> I've seen some noises about Windows file creation performance raised
> as an issue when doing initial checkout followed by "git clone", and
> an idea floated to create a bunch of open file handles for writing
> in threads when checkout (really the caller that repeatedly calls
> entry.c:write_entry() by iterating the in-core index) starts, and
> write out the contents in parallel, as a workaround.  When I heard
> it, I somehow thought it was meant as a not-so-funny joke, but from
> the sounds of it, the I/O performance may be so horrible to require
> such a hack to be usable there.  Sigh...
> 

Yes, there are some things here (that I believe to be I/O related)
on Windows that I want to look at when I wrap up FSMonitor.  And
yes, some of them sound pretty stupid.

Jeff
Jeff Hostetler July 13, 2021, 7:05 p.m. UTC | #11
On 7/13/21 2:18 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Jul 13 2021, Jeff Hostetler wrote:
> 
>> On 7/1/21 7:09 PM, Ævar Arnfjörð Bjarmason wrote:
>>> On Thu, Jul 01 2021, Jeff Hostetler via GitGitGadget wrote:
>>>
>>>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>>>
>>>> Change p7519 to use a single "test-tool touch" command to update
>>>> the mtime on a series of (thousands) files instead of invoking
>>>> thousands of commands to update a single file.
>>>>
>>>> This is primarily for Windows where process creation is so
>>>> very slow and reduces the test run time by minutes.
>>>>
>>>> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
>>>> ---
>>>>    t/perf/p7519-fsmonitor.sh | 14 ++++++--------
>>>>    1 file changed, 6 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/t/perf/p7519-fsmonitor.sh b/t/perf/p7519-fsmonitor.sh
>>>> index 5eb5044a103..f74e6014a0a 100755
>>>> --- a/t/perf/p7519-fsmonitor.sh
>>>> +++ b/t/perf/p7519-fsmonitor.sh
>>>> @@ -119,10 +119,11 @@ test_expect_success "one time repo setup" '
>>>>    	fi &&
>>>>      	mkdir 1_file 10_files 100_files 1000_files 10000_files &&
>>>> -	for i in $(test_seq 1 10); do touch 10_files/$i; done &&
>>>> -	for i in $(test_seq 1 100); do touch 100_files/$i; done &&
>>>> -	for i in $(test_seq 1 1000); do touch 1000_files/$i; done &&
>>>> -	for i in $(test_seq 1 10000); do touch 10000_files/$i; done &&
>>>> +	test-tool touch sequence --pattern="10_files/%d" --start=1 --count=10 &&
>>>> +	test-tool touch sequence --pattern="100_files/%d" --start=1 --count=100 &&
>>>> +	test-tool touch sequence --pattern="1000_files/%d" --start=1 --count=1000 &&
>>>> +	test-tool touch sequence --pattern="10000_files/%d" --start=1 --count=10000 &&
>>>> +
>>>>    	git add 1_file 10_files 100_files 1000_files 10000_files &&
>>>>    	git commit -qm "Add files" &&
>>>>    @@ -200,15 +201,12 @@ test_fsmonitor_suite() {
>>>>    	# Update the mtimes on upto 100k files to make status think
>>>>    	# that they are dirty.  For simplicity, omit any files with
>>>>    	# LFs (i.e. anything that ls-files thinks it needs to dquote).
>>>> -	# Then fully backslash-quote the paths to capture any
>>>> -	# whitespace so that they pass thru xargs properly.
>>>>    	#
>>>>    	test_perf_w_drop_caches "status (dirty) ($DESC)" '
>>>>    		git ls-files | \
>>>>    			head -100000 | \
>>>>    			grep -v \" | \
>>>> -			sed '\''s/\(.\)/\\\1/g'\'' | \
>>>> -			xargs test-tool chmtime -300 &&
>>>> +			test-tool touch stdin &&
>>>>    		git status
>>>>    	'
>>> Did you try to replace this with some variant of:
>>>       test_seq 1 10000 | xargs touch
>>> Which (depending on your xargs version) would invoke "touch"
>>> commands
>>> with however many argv items it thinks you can handle.
>>>
>>
>> a quick test on my Windows machine shows that
>>
>> 	test_seq 1 10000 | xargs touch
>>
>> takes 3.1 seconds.
>>
>> just a simple
>>
>> 	test_seq 1 10000 >/dev/null
>>
>> take 0.2 seconds.
>>
>> using my test-tool helper cuts that time in half.
> 
> There's what Elijah mentioned about test_seq, so maybe it's just that.
> 
> But what I was suggesting was using the xargs mode where it does N
> arguments at a time.
> 
> Does this work for you, and does it cause xargs to invoke "touch" with
> the relevant N number of arguments, and does it help with the
> performance?
> 
>      test_seq 1 10000 | xargs touch
>      test_seq 1 10000 | xargs -n 10 touch
>      test_seq 1 10000 | xargs -n 100 touch
>      test_seq 1 10000 | xargs -n 1000 touch

The GFW SDK version of xargs does have `-n N` and it does work as
advertised.  And it does slow things down considerably.  Letting
xargs do ~2500 per command in 4 commands took the 3.1 seconds listed
above.

Adding -n 100 to it takes 5.7 seconds, so process creation overhead
is a factor here.
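
A comparison along those lines can be reproduced with something like
the following (it creates files named 1..10000 in the current
directory, and the absolute numbers will of course vary by machine):

	time sh -c 'seq 1 10000 | xargs touch'          # default batching (~4 commands here)
	time sh -c 'seq 1 10000 | xargs -n 100 touch'   # 100 files per command, 100 commands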


> 
> etc.
> 
> Also I didn't notice this before, but the -300 part of "chmtime -300"
> was redundant before then? I.e. you're implicitly changing it to "=+0"
> instead with your "touch" helper, are you not?
> 

Right. I'm changing it to the current time.

Jeff
Ævar Arnfjörð Bjarmason July 20, 2021, 7:18 p.m. UTC | #12
On Tue, Jul 13 2021, Jeff Hostetler wrote:

> On 7/13/21 2:18 PM, Ævar Arnfjörð Bjarmason wrote:
>> On Tue, Jul 13 2021, Jeff Hostetler wrote:
>> 
>>> On 7/1/21 7:09 PM, Ævar Arnfjörð Bjarmason wrote:
>>>> On Thu, Jul 01 2021, Jeff Hostetler via GitGitGadget wrote:
>>>>
>>>>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>>>>
>>>>> Change p7519 to use a single "test-tool touch" command to update
>>>>> the mtime on a series of (thousands) files instead of invoking
>>>>> thousands of commands to update a single file.
>>>>>
>>>>> This is primarily for Windows where process creation is so
>>>>> very slow and reduces the test run time by minutes.
>>>>>
>>>>> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
>>>>> ---
>>>>>    t/perf/p7519-fsmonitor.sh | 14 ++++++--------
>>>>>    1 file changed, 6 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/t/perf/p7519-fsmonitor.sh b/t/perf/p7519-fsmonitor.sh
>>>>> index 5eb5044a103..f74e6014a0a 100755
>>>>> --- a/t/perf/p7519-fsmonitor.sh
>>>>> +++ b/t/perf/p7519-fsmonitor.sh
>>>>> @@ -119,10 +119,11 @@ test_expect_success "one time repo setup" '
>>>>>    	fi &&
>>>>>      	mkdir 1_file 10_files 100_files 1000_files 10000_files &&
>>>>> -	for i in $(test_seq 1 10); do touch 10_files/$i; done &&
>>>>> -	for i in $(test_seq 1 100); do touch 100_files/$i; done &&
>>>>> -	for i in $(test_seq 1 1000); do touch 1000_files/$i; done &&
>>>>> -	for i in $(test_seq 1 10000); do touch 10000_files/$i; done &&
>>>>> +	test-tool touch sequence --pattern="10_files/%d" --start=1 --count=10 &&
>>>>> +	test-tool touch sequence --pattern="100_files/%d" --start=1 --count=100 &&
>>>>> +	test-tool touch sequence --pattern="1000_files/%d" --start=1 --count=1000 &&
>>>>> +	test-tool touch sequence --pattern="10000_files/%d" --start=1 --count=10000 &&
>>>>> +
>>>>>    	git add 1_file 10_files 100_files 1000_files 10000_files &&
>>>>>    	git commit -qm "Add files" &&
>>>>>    @@ -200,15 +201,12 @@ test_fsmonitor_suite() {
>>>>>    	# Update the mtimes on upto 100k files to make status think
>>>>>    	# that they are dirty.  For simplicity, omit any files with
>>>>>    	# LFs (i.e. anything that ls-files thinks it needs to dquote).
>>>>> -	# Then fully backslash-quote the paths to capture any
>>>>> -	# whitespace so that they pass thru xargs properly.
>>>>>    	#
>>>>>    	test_perf_w_drop_caches "status (dirty) ($DESC)" '
>>>>>    		git ls-files | \
>>>>>    			head -100000 | \
>>>>>    			grep -v \" | \
>>>>> -			sed '\''s/\(.\)/\\\1/g'\'' | \
>>>>> -			xargs test-tool chmtime -300 &&
>>>>> +			test-tool touch stdin &&
>>>>>    		git status
>>>>>    	'
>>>> Did you try to replace this with some variant of:
>>>>       test_seq 1 10000 | xargs touch
>>>> Which (depending on your xargs version) would invoke "touch"
>>>> commands
>>>> with however many argv items it thinks you can handle.
>>>>
>>>
>>> a quick test on my Windows machine shows that
>>>
>>> 	test_seq 1 10000 | xargs touch
>>>
>>> takes 3.1 seconds.
>>>
>>> just a simple
>>>
>>> 	test_seq 1 10000 >/dev/null
>>>
>>> take 0.2 seconds.
>>>
>>> using my test-tool helper cuts that time in half.
>> There's what Elijah mentioned about test_seq, so maybe it's just
>> that.
>> But what I was suggesting was using the xargs mode where it does N
>> arguments at a time.
>> Does this work for you, and does it cause xargs to invoke "touch"
>> with
>> the relevant N number of arguments, and does it help with the
>> performance?
>>      test_seq 1 10000 | xargs touch
>>      test_seq 1 10000 | xargs -n 10 touch
>>      test_seq 1 10000 | xargs -n 100 touch
>>      test_seq 1 10000 | xargs -n 1000 touch
>
> The GFW SDK version of xargs does have `-n N` and it does work as
> advertised.  And it does slow down things considerably.  Letting it
> do ~2500 per command in 4 commands took the 3.1 seconds listed above.
>
> Add a -n 100 to it takes 5.7 seconds, so process creation overhead
> is a factor here.

Doesn't -n 2500 being faster than -n 100 suggest the opposite of process
overhead being the deciding factor? With -n 2500 you'll invoke 4 touch
processes, so one takes 3.1/4 =~ 0.8s to run, whereas with -n 100 you
invoke 100 of them, so if the overall time is then 5.7 seconds that's
5.7/100 =~ 0.06s per invocation.

Or am I misunderstanding you, or does some implicit parallelism kick in
with that version of xargs depending on -n?
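
Spelling out the arithmetic with the numbers quoted above:

	  4 invocations (~2500 files each): 3.1s total -> ~0.8s  per xargs batch
	100 invocations (100 files each):   5.7s total -> ~0.06s per xargs batch

	going from 4 to 100 processes adds 5.7 - 3.1 = 2.6s, i.e. roughly
	2.6 / 96 =~ 0.027s of extra cost for each additional process.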

>> etc.
>> Also I didn't notice this before, but the -300 part of "chmtime
>> -300"
>> was redundant before then? I.e. you're implicitly changing it to "=+0"
>> instead with your "touch" helper, are you not?
>> 
>
> Right. I'm changing it to the current time.

If that "while we're at it, change the behavior of the test" is wanted,
I think it should be called out in the commit message. Right now it
looks like it might be an unintentional regression in the test.

Patch

diff --git a/t/perf/p7519-fsmonitor.sh b/t/perf/p7519-fsmonitor.sh
index 5eb5044a103..f74e6014a0a 100755
--- a/t/perf/p7519-fsmonitor.sh
+++ b/t/perf/p7519-fsmonitor.sh
@@ -119,10 +119,11 @@  test_expect_success "one time repo setup" '
 	fi &&
 
 	mkdir 1_file 10_files 100_files 1000_files 10000_files &&
-	for i in $(test_seq 1 10); do touch 10_files/$i; done &&
-	for i in $(test_seq 1 100); do touch 100_files/$i; done &&
-	for i in $(test_seq 1 1000); do touch 1000_files/$i; done &&
-	for i in $(test_seq 1 10000); do touch 10000_files/$i; done &&
+	test-tool touch sequence --pattern="10_files/%d" --start=1 --count=10 &&
+	test-tool touch sequence --pattern="100_files/%d" --start=1 --count=100 &&
+	test-tool touch sequence --pattern="1000_files/%d" --start=1 --count=1000 &&
+	test-tool touch sequence --pattern="10000_files/%d" --start=1 --count=10000 &&
+
 	git add 1_file 10_files 100_files 1000_files 10000_files &&
 	git commit -qm "Add files" &&
 
@@ -200,15 +201,12 @@  test_fsmonitor_suite() {
 	# Update the mtimes on upto 100k files to make status think
 	# that they are dirty.  For simplicity, omit any files with
 	# LFs (i.e. anything that ls-files thinks it needs to dquote).
-	# Then fully backslash-quote the paths to capture any
-	# whitespace so that they pass thru xargs properly.
 	#
 	test_perf_w_drop_caches "status (dirty) ($DESC)" '
 		git ls-files | \
 			head -100000 | \
 			grep -v \" | \
-			sed '\''s/\(.\)/\\\1/g'\'' | \
-			xargs test-tool chmtime -300 &&
+			test-tool touch stdin &&
 		git status
 	'