[v2,01/10] t: add helper to convert object IDs to paths

Message ID	20190616185330.549436-2-sandals@crustytoothpaste.net (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> From: "brian m. carlson" <sandals@crustytoothpaste.net> To: <git@vger.kernel.org> Cc: Jeff King <peff@peff.net>, Duy Nguyen <pclouds@gmail.com>, Johannes Schindelin <Johannes.Schindelin@gmx.de>, Jonathan Tan <jonathantanmy@google.com>, Eric Sunshine <sunshine@sunshineco.com>, Junio C Hamano <gitster@pobox.com> Subject: [PATCH v2 01/10] t: add helper to convert object IDs to paths Date: Sun, 16 Jun 2019 18:53:21 +0000 Message-Id: <20190616185330.549436-2-sandals@crustytoothpaste.net> In-Reply-To: <20190616185330.549436-1-sandals@crustytoothpaste.net> References: <20190616185330.549436-1-sandals@crustytoothpaste.net> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: git-owner@vger.kernel.org Precedence: bulk
Series	Hash-independent tests, part 4 \| expand [v2,00/10] Hash-independent tests, part 4 [v2,01/10] t: add helper to convert object IDs to paths [v2,02/10] t1410: make hash size independent [v2,03/10] t1450: make hash size independent [v2,04/10] t5000: make hash independent [v2,05/10] t6030: make test work with SHA-256 [v2,06/10] t0027: make hash size independent [v2,07/10] t0090: make test pass with SHA-256 [v2,08/10] t1007: remove SHA1 prerequisites [v2,09/10] t1710: make hash independent [v2,10/10] t2203: avoid hard-coded object ID values

brian m. carlson June 16, 2019, 6:53 p.m. UTC

There are several places in our testsuite where we want to insert a
slash after an object ID to make it into a path we can reference under
.git/objects, and we have various ways of doing so.  Add a helper to
provide a standard way of doing this that works for all size hashes.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/test-lib-functions.sh | 6 ++++++
 1 file changed, 6 insertions(+)

Johannes Schindelin June 17, 2019, 7:05 p.m. UTC | #1

Hi brian,

On Sun, 16 Jun 2019, brian m. carlson wrote:

> diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
> index 8270de74be..11a6abca2e 100644
> --- a/t/test-lib-functions.sh
> +++ b/t/test-lib-functions.sh
> @@ -1314,6 +1314,12 @@ test_oid () {
>  	eval "printf '%s' \"\${$var}\""
>  }
>
> +# Insert a slash into an object ID so it can be used to reference a location
> +# under ".git/objects".  For example, "deadbeef..." becomes "de/adbeef..".
> +test_oid_to_path () {
> +	echo "$1" | sed -e 's!^..!&/!'
> +}

I guess it does not *really* matter all that much, but this does spawn a
new process (and I think it actually spawns 4 on Windows, for reasons, and
spawning processes is super expensive on Windows).

We might actually want to think about using something like this instead
(which admittedly looks a bit like gobbledygook to the uninitiated, but it
definitely avoids any spawned process):

test_oid_to_path () {
	echo "${1%${1#??}}/${1#??}"
}

Ciao,
Dscho

brian m. carlson June 18, 2019, 1:29 a.m. UTC | #2

On 2019-06-17 at 19:05:03, Johannes Schindelin wrote:
> I guess it does not *really* matter all that much, but this does spawn a
> new process (and I think it actually spawns 4 on Windows, for reasons, and
> spawning processes is super expensive on Windows).
> 
> We might actually want to think about using something like this instead
> (which admittedly looks a bit like gobbledygook to the uninitiated, but it
> definitely avoids any spawned process):
> 
> test_oid_to_path () {
> 	echo "${1%${1#??}}/${1#??}"
> }

I'm fine making that change. The original design was because we had
other code that used that technique and I didn't see an obviously better
solution. Now you've provided one and a good justification.

I think the complexity isn't too terrible since it's a one-line function
and the comment adequately explains the behavior so people don't have to
parse it to understand it.

Johannes Sixt June 18, 2019, 6:14 a.m. UTC | #3

Am 18.06.19 um 03:29 schrieb brian m. carlson:
> On 2019-06-17 at 19:05:03, Johannes Schindelin wrote:
>> I guess it does not *really* matter all that much, but this does spawn a
>> new process (and I think it actually spawns 4 on Windows, for reasons, and
>> spawning processes is super expensive on Windows).
>>
>> We might actually want to think about using something like this instead
>> (which admittedly looks a bit like gobbledygook to the uninitiated, but it
>> definitely avoids any spawned process):
>>
>> test_oid_to_path () {
>> 	echo "${1%${1#??}}/${1#??}"
>> }
> 
> I'm fine making that change. The original design was because we had
> other code that used that technique and I didn't see an obviously better
> solution. Now you've provided one and a good justification.

Regardless of how it is implemented, I have another gripe with this
helper: the way it must be used requires a process: $(test_out_to_path $foo)

And looking through this patch series, I see a gazillion of *new*
process substitutions $(test_something...) and $(basename $whatever).
Can't we do something about it?

-- Hannes

Johannes Schindelin June 18, 2019, 4:15 p.m. UTC | #4

Hi Hannes,

On Tue, 18 Jun 2019, Johannes Sixt wrote:

> Am 18.06.19 um 03:29 schrieb brian m. carlson:
> > On 2019-06-17 at 19:05:03, Johannes Schindelin wrote:
> >> I guess it does not *really* matter all that much, but this does spawn a
> >> new process (and I think it actually spawns 4 on Windows, for reasons, and
> >> spawning processes is super expensive on Windows).
> >>
> >> We might actually want to think about using something like this instead
> >> (which admittedly looks a bit like gobbledygook to the uninitiated, but it
> >> definitely avoids any spawned process):
> >>
> >> test_oid_to_path () {
> >> 	echo "${1%${1#??}}/${1#??}"
> >> }
> >
> > I'm fine making that change. The original design was because we had
> > other code that used that technique and I didn't see an obviously better
> > solution. Now you've provided one and a good justification.
>
> Regardless of how it is implemented, I have another gripe with this
> helper: the way it must be used requires a process: $(test_out_to_path
> $foo)

Indeed.

> And looking through this patch series, I see a gazillion of *new*
> process substitutions $(test_something...) and $(basename $whatever).
> Can't we do something about it?

I wish there was. Unix shell scripting has not evolved much in the past,
what, 3 decades? So I don't really see a way to "pass variables by
reference" to shell functions, short of calling `eval` (which buys
preciously little as it _also_ has to spawn a new process [*1*]).

Of course, if we would slowly get away from depending so much on shell
scripting, then we would reap quite a few other benefits, too, not only
speed, but also much easier debugging, code coverage, etc.

Ciao,
Dscho

Footnote *1*: Theoretically, it could be a *ton* faster by using threads
on Windows. But threads are pretty much an afterthought on Unix/Linux, so
no mainstream POSIX shell supports this. They all `fork()` to interpret an
`eval` as far as I can tell.

SZEDER Gábor June 18, 2019, 4:55 p.m. UTC | #5

On Tue, Jun 18, 2019 at 06:15:46PM +0200, Johannes Schindelin wrote:
> > Regardless of how it is implemented, I have another gripe with this
> > helper: the way it must be used requires a process: $(test_out_to_path
> > $foo)
> 
> Indeed.
> 
> > And looking through this patch series, I see a gazillion of *new*
> > process substitutions $(test_something...) and $(basename $whatever).
> > Can't we do something about it?
> 
> I wish there was. Unix shell scripting has not evolved much in the past,
> what, 3 decades? So I don't really see a way to "pass variables by
> reference" to shell functions, short of calling `eval` (which buys
> preciously little as it _also_ has to spawn a new process [*1*]).

> Footnote *1*: Theoretically, it could be a *ton* faster by using threads
> on Windows. But threads are pretty much an afterthought on Unix/Linux, so
> no mainstream POSIX shell supports this. They all `fork()` to interpret an
> `eval` as far as I can tell.

'eval' doesn't fork().  It can't possibly fork(), because if it did,
then any variables set in the eval-ed code snippet couldn't be visible
outside the 'eval'.

Jeff King June 18, 2019, 5:04 p.m. UTC | #6

On Tue, Jun 18, 2019 at 06:15:46PM +0200, Johannes Schindelin wrote:

> > And looking through this patch series, I see a gazillion of *new*
> > process substitutions $(test_something...) and $(basename $whatever).
> > Can't we do something about it?
> 
> I wish there was. Unix shell scripting has not evolved much in the past,
> what, 3 decades? So I don't really see a way to "pass variables by
> reference" to shell functions, short of calling `eval` (which buys
> preciously little as it _also_ has to spawn a new process [*1*]).

Really? An eval can impact the caller's state, so it _can't_ happen in a
sub-process in most cases.

E.g., if I run this:

-- >8 --
#!/bin/sh

# usage: test_oid_to_path <var> <oid>
# to set the variable <var> in the caller's environment to the path of <oid>
test_oid_to_path() {
	path="${2%${2#??}}/${2#??}"
	eval "$1=\$path"
}

test_oid_to_path foo 1234abcd
echo foo: $foo
-- >8 --

it all happens in a single process, under both bash and dash.

-Peff

brian m. carlson June 18, 2019, 11:03 p.m. UTC | #7

On 2019-06-18 at 06:14:03, Johannes Sixt wrote:
> Am 18.06.19 um 03:29 schrieb brian m. carlson:
> > On 2019-06-17 at 19:05:03, Johannes Schindelin wrote:
> > I'm fine making that change. The original design was because we had
> > other code that used that technique and I didn't see an obviously better
> > solution. Now you've provided one and a good justification.
> 
> Regardless of how it is implemented, I have another gripe with this
> helper: the way it must be used requires a process: $(test_out_to_path $foo)
> 
> And looking through this patch series, I see a gazillion of *new*
> process substitutions $(test_something...) and $(basename $whatever).
> Can't we do something about it?

A command substitution need not invoke another process. In fact, all of
the test_oid calls operate only within the shell. The test_oid_cache and
test_oid_init calls require spawning only expr, and they are typically
limited to one or two per test.

Within reason, I'm happy to try to make things easier for Windows folks
if I can, but it's still the case that process creation is more
expensive on Windows and shell scripting is designed around process
creation. My hope is that Microsoft's work on WSL will help them improve
their Win32 process creation time so it's faster and this becomes less
of an issue.

Johannes Schindelin June 19, 2019, 7:55 p.m. UTC | #8

Hi Peff,

On Tue, 18 Jun 2019, Jeff King wrote:

> On Tue, Jun 18, 2019 at 06:15:46PM +0200, Johannes Schindelin wrote:
>
> > > And looking through this patch series, I see a gazillion of *new*
> > > process substitutions $(test_something...) and $(basename $whatever).
> > > Can't we do something about it?
> >
> > I wish there was. Unix shell scripting has not evolved much in the past,
> > what, 3 decades? So I don't really see a way to "pass variables by
> > reference" to shell functions, short of calling `eval` (which buys
> > preciously little as it _also_ has to spawn a new process [*1*]).
>
> Really? An eval can impact the caller's state, so it _can't_ happen in a
> sub-process in most cases.
>
> E.g., if I run this:
>
> -- >8 --
> #!/bin/sh
>
> # usage: test_oid_to_path <var> <oid>
> # to set the variable <var> in the caller's environment to the path of <oid>
> test_oid_to_path() {
> 	path="${2%${2#??}}/${2#??}"
> 	eval "$1=\$path"
> }
>
> test_oid_to_path foo 1234abcd
> echo foo: $foo
> -- >8 --
>
> it all happens in a single process, under both bash and dash.

Oops. I think I may have read too much into the name `eval.c` in Dash's
source code, I assumed that it was all about the `eval` command. But now
that I look at this again, I see:

/*
 * Execute a command inside back quotes.  If it's a builtin command, we
 * want to save its output in a block obtained from malloc.  Otherwise
 * we fork off a subprocess and get the output of the command via a pipe.
 * Should be called with interrupts off.
 */

void
evalbackcmd(union node *n, struct backcmd *result)
{
	[...]

I saw this at
https://git.kernel.org/pub/scm/utils/dash/dash.git/tree/src/eval.c#n608

So this is actually about `$(...)` (or the old, non-nestable backtick
version of it) and not about `eval`.

And of course I was also making another incorrect connection to the Perl
command `eval` which allows you to catch code that `die()`s (which *must*
run in a subprocess).

So yeah, I guess `eval` would work here to avoid the `$(...)` constructs.

Sorry for the noise,
Dscho

Johannes Schindelin June 19, 2019, 7:56 p.m. UTC | #9

Hi,

On Tue, 18 Jun 2019, SZEDER Gábor wrote:

> On Tue, Jun 18, 2019 at 06:15:46PM +0200, Johannes Schindelin wrote:
> > > Regardless of how it is implemented, I have another gripe with this
> > > helper: the way it must be used requires a process: $(test_out_to_path
> > > $foo)
> >
> > Indeed.
> >
> > > And looking through this patch series, I see a gazillion of *new*
> > > process substitutions $(test_something...) and $(basename $whatever).
> > > Can't we do something about it?
> >
> > I wish there was. Unix shell scripting has not evolved much in the past,
> > what, 3 decades? So I don't really see a way to "pass variables by
> > reference" to shell functions, short of calling `eval` (which buys
> > preciously little as it _also_ has to spawn a new process [*1*]).
>
> > Footnote *1*: Theoretically, it could be a *ton* faster by using threads
> > on Windows. But threads are pretty much an afterthought on Unix/Linux, so
> > no mainstream POSIX shell supports this. They all `fork()` to interpret an
> > `eval` as far as I can tell.
>
> 'eval' doesn't fork().  It can't possibly fork(), because if it did,
> then any variables set in the eval-ed code snippet couldn't be visible
> outside the 'eval'.

Good point. My brain must have made a couple of incorrect associations
there.

Sorry,
Dscho

[v2,01/10] t: add helper to convert object IDs to paths

Commit Message

Comments

Patch