[v2,3/5] scalar: enable built-in FSMonitor on `register`

Message ID	5fdf8337972d7092aba06a9c750f42cd5868e630.1660694290.git.gitgitgadget@gmail.com (mailing list archive)
State	New, archived
Headers
	Return-Path: <git-owner@kernel.org>
	Message-Id: <5fdf8337972d7092aba06a9c750f42cd5868e630.1660694290.git.gitgitgadget@gmail.com>
	In-Reply-To: <pull.1324.v2.git.1660694290.gitgitgadget@gmail.com>
	References: <pull.1324.git.1660673269.gitgitgadget@gmail.com> <pull.1324.v2.git.1660694290.gitgitgadget@gmail.com>
	Date: Tue, 16 Aug 2022 23:58:07 +0000
	Subject: [PATCH v2 3/5] scalar: enable built-in FSMonitor on `register`
	Fcc: Sent
	Content-Type: text/plain; charset=UTF-8
	Content-Transfer-Encoding: 8bit
	MIME-Version: 1.0
	To: git@vger.kernel.org
	Cc: johannes.schindelin@gmx.de, mjcheetham@outlook.com, gitster@pobox.com, Victoria Dye <vdye@github.com>, Matthew John Cheetham <mjcheetham@outlook.com>
	Precedence: bulk
	From: Matthew John Cheetham <mjcheetham@outlook.com>
Series
	[v2,1/5] scalar-unregister: handle error codes greater than 0
	[v2,2/5] scalar-[un]register: clearly indicate source of error
	[v2,3/5] scalar: enable built-in FSMonitor on `register`
	[v2,4/5] scalar unregister: stop FSMonitor daemon
	[v2,5/5] scalar: update technical doc roadmap with FSMonitor support

Matthew John Cheetham Aug. 16, 2022, 11:58 p.m. UTC

From: Matthew John Cheetham <mjcheetham@outlook.com>

Using the built-in FSMonitor makes many common commands quite a bit
faster. So let's teach the `scalar register` command to enable the
built-in FSMonitor and kick-start the fsmonitor--daemon process (for
convenience).

For simplicity, we only support the built-in FSMonitor (and no external
file system monitor such as e.g. Watchman).

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 contrib/scalar/scalar.c          | 21 +++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh | 11 +++++++++++
 2 files changed, 32 insertions(+)

Derrick Stolee Aug. 17, 2022, 2:34 p.m. UTC | #1

On 8/16/2022 7:58 PM, Matthew John Cheetham via GitGitGadget wrote:

> +#ifdef HAVE_FSMONITOR_DAEMON_BACKEND
> +		/*
> +		 * Enable the built-in FSMonitor on supported platforms.
> +		 */
> +		{ "core.fsmonitor", "true" },
> +#endif
> +	if (fsmonitor_ipc__is_supported() && start_fsmonitor_daemon())
> +		return error(_("could not start the FSMonitor daemon"));
> +

I initially worried if fsmonitor_ipc__is_supported() could use some
run-time information to detect if FS Monitor is supported (say, existence
of a network share or something). However, that implementation is
currently defined as a constant depending on
HAVE_FSMONITOR_DAEMON_BACKEND.

The reason I was worried is that we could enable core.fsmonitor=true based
on the compile-time macro, but then avoid starting the daemon based on the
run-time results. If we get into this state, would the user's 'git status'
calls start complaining about the core.fsmonitor=true config because it is
not supported?

The most future-proof thing to do might be to move the config write out of
the set_recommended_config() and into start_fsmonitor_daemon(). Perhaps
rename it to enable_fsmonitor() so it can fail due to writing the config
_or_ for starting the daemon. The error message would change, then, too.
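A minimal, self-contained sketch of that suggestion, assuming hypothetical stand-ins (`fsmonitor_supported`, `write_fsmonitor_config()`, and a stub `start_fsmonitor_daemon()`) in place of the real support check, config write, and `git fsmonitor--daemon start`. It only shows the ordering: decide support once at run time, and write `core.fsmonitor` only when that check passes.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for Git internals: a run-time support check,
 * a write of core.fsmonitor=true, and starting the daemon. Here they
 * only record what happened.
 */
static bool fsmonitor_supported = true;
static bool config_written;
static bool daemon_started;

static int write_fsmonitor_config(void)
{
	config_written = true;	/* stands in for setting core.fsmonitor=true */
	return 0;
}

static int start_fsmonitor_daemon(void)
{
	assert(fsmonitor_supported);	/* callers must check support first */
	daemon_started = true;	/* stands in for starting the daemon */
	return 0;
}

/* Distinct results so the caller can report which step failed. */
enum enable_fsmonitor_result {
	ENABLE_FSMONITOR_OK = 0,
	ENABLE_FSMONITOR_CONFIG_FAILED = -1,
	ENABLE_FSMONITOR_DAEMON_FAILED = -2,
};

/*
 * Check support first, then write the config and start the daemon in
 * one place, so core.fsmonitor is never written when support is missing.
 */
static enum enable_fsmonitor_result enable_fsmonitor(void)
{
	if (!fsmonitor_supported)
		return ENABLE_FSMONITOR_OK;	/* nothing to enable, not an error */
	if (write_fsmonitor_config())
		return ENABLE_FSMONITOR_CONFIG_FAILED;
	if (start_fsmonitor_daemon())
		return ENABLE_FSMONITOR_DAEMON_FAILED;
	return ENABLE_FSMONITOR_OK;
}
```

A caller such as register_dir() would then need a single check like `if (enable_fsmonitor() < 0)` and could choose its error message from the returned value.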

Or maybe I'm making a mountain out of a molehill and what exists here is
perfectly fine.

> +test_lazy_prereq BUILTIN_FSMONITOR '
> +	git version --build-options | grep -q "feature:.*fsmonitor--daemon"
> +'

It looks like we already have a FSMONITOR_DAEMON prereq in test-lib.sh.
Should we use that instead?

Thanks,
-Stolee

Junio C Hamano Aug. 17, 2022, 2:43 p.m. UTC | #2

"Matthew John Cheetham via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> +static int start_fsmonitor_daemon(void)
> +{
> +	assert(fsmonitor_ipc__is_supported());
> +
> +	if (fsmonitor_ipc__get_state() != IPC_STATE__LISTENING)
> +		return run_git("fsmonitor--daemon", "start", NULL);
> +
> +	return 0;
> +}

The function got ultra simple ;-).

> @@ -247,6 +265,9 @@ static int register_dir(void)
>  	if (toggle_maintenance(1))
>  		return error(_("could not turn on maintenance"));
>  
> +	if (fsmonitor_ipc__is_supported() && start_fsmonitor_daemon())
> +		return error(_("could not start the FSMonitor daemon"));
> +
>  	return 0;
>  }

As long as it is done consistently, I do not think it makes a huge
difference between the "call it only when supported" and "when asked
to do what we do not support, silently succeed without doing
anything".  It however makes the code appear to be more in control
to do it this way, I think, which is good.
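To make the two conventions concrete, here is a small sketch with hypothetical names (`supported` stands in for fsmonitor_ipc__is_supported(), and `daemon_runs` counts simulated daemon starts). In the first style the callee asserts support and the caller gates the call, as in the patch; in the second the callee silently succeeds when unsupported. Both give the same observable result.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for fsmonitor_ipc__is_supported(). */
static bool supported;
static int daemon_runs;	/* counts simulated daemon starts */

/* Style 1: the callee assumes support; the caller must check. */
static int start_daemon_checked(void)
{
	assert(supported);	/* the caller's responsibility */
	daemon_runs++;
	return 0;
}

static int register_caller_gated(void)
{
	if (supported && start_daemon_checked())
		return -1;
	return 0;
}

/* Style 2: the callee silently succeeds when unsupported. */
static int start_daemon_lenient(void)
{
	if (!supported)
		return 0;	/* nothing to do, and not an error */
	daemon_runs++;
	return 0;
}

static int register_callee_gated(void)
{
	return start_daemon_lenient() ? -1 : 0;
}
```

The first style keeps the "is this supported?" decision visible at the call site, which is the sense in which the code appears "more in control".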

Junio C Hamano Aug. 17, 2022, 3:54 p.m. UTC | #3

Derrick Stolee <derrickstolee@github.com> writes:

> On 8/16/2022 7:58 PM, Matthew John Cheetham via GitGitGadget wrote:
>
>> +#ifdef HAVE_FSMONITOR_DAEMON_BACKEND
>> +		/*
>> +		 * Enable the built-in FSMonitor on supported platforms.
>> +		 */
>> +		{ "core.fsmonitor", "true" },
>> +#endif
>> +	if (fsmonitor_ipc__is_supported() && start_fsmonitor_daemon())
>> +		return error(_("could not start the FSMonitor daemon"));
>> +
>
> I initially worried if fsmonitor_ipc__is_supported() could use some
> run-time information to detect if FS Monitor is supported (say, existence
> of a network share or something). However, that implementation is
> currently defined as a constant depending on
> HAVE_FSMONITOR_DAEMON_BACKEND.
>
> The reason I was worried is that we could enable core.fsmonitor=true based
> on the compile-time macro, but then avoid starting the daemon based on the
> run-time results. If we get into this state, would the user's 'git status'
> calls start complaining about the core.fsmonitor=true config because it is
> not supported?

Ah, I didn't consider the possibility where the user uses the
configuration to say "enable it if you are able, but it is OK if you
cannot".  Whether the "is supported" is dynamic or compiled-in, that
may be a valid issue to consider.  An easy way out may be to declare
that the value "true" for "core.fsmonitor" variable means exactly
that, i.e. the user asks to run it, but it is not an error if it
cannot run.

A variant that may need slightly more work would be to introduce a
separate value (perhaps "when-able") that means that, while keeping
the "true" to mean "run the built-in one, or error out to let me
know otherwise" as before.
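A sketch of how the two meanings could be told apart, assuming the hypothetical "when-able" value proposed above. It is simplified: Git's real parsing of `core.fsmonitor` also accepts other boolean spellings and a hook path, which this sketch treats as invalid.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum fsmonitor_mode {
	FSMONITOR_OFF,
	FSMONITOR_REQUIRED,	/* "true": run it, or report an error */
	FSMONITOR_WHEN_ABLE,	/* hypothetical "when-able": best effort */
	FSMONITOR_INVALID,
};

static enum fsmonitor_mode parse_fsmonitor(const char *value)
{
	if (!value || !strcmp(value, "false"))
		return FSMONITOR_OFF;	/* unset or explicitly off */
	if (!strcmp(value, "true"))
		return FSMONITOR_REQUIRED;
	if (!strcmp(value, "when-able"))
		return FSMONITOR_WHEN_ABLE;
	return FSMONITOR_INVALID;
}

/* Returns 1 to use FSMonitor, 0 to skip it quietly, -1 for an error. */
static int fsmonitor_decision(const char *value, bool supported)
{
	enum fsmonitor_mode mode = parse_fsmonitor(value);

	switch (mode) {
	case FSMONITOR_OFF:
		return 0;
	case FSMONITOR_REQUIRED:
		return supported ? 1 : -1;
	case FSMONITOR_WHEN_ABLE:
		return supported ? 1 : 0;
	case FSMONITOR_INVALID:
		break;
	}
	assert(mode == FSMONITOR_INVALID);
	return -1;	/* unrecognized value: report it */
}
```

The only difference between the two modes is the unsupported case: "true" turns it into an error, "when-able" into a quiet no-op.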

Thanks.

Victoria Dye Aug. 17, 2022, 11:47 p.m. UTC | #4

Derrick Stolee wrote:
> On 8/16/2022 7:58 PM, Matthew John Cheetham via GitGitGadget wrote:
> 
>> +#ifdef HAVE_FSMONITOR_DAEMON_BACKEND
>> +		/*
>> +		 * Enable the built-in FSMonitor on supported platforms.
>> +		 */
>> +		{ "core.fsmonitor", "true" },
>> +#endif
>> +	if (fsmonitor_ipc__is_supported() && start_fsmonitor_daemon())
>> +		return error(_("could not start the FSMonitor daemon"));
>> +
> 
> I initially worried if fsmonitor_ipc__is_supported() could use some
> run-time information to detect if FS Monitor is supported (say, existence
> of a network share or something). However, that implementation is
> currently defined as a constant depending on
> HAVE_FSMONITOR_DAEMON_BACKEND.
> 
> The reason I was worried is that we could enable core.fsmonitor=true based
> on the compile-time macro, but then avoid starting the daemon based on the
> run-time results. If we get into this state, would the user's 'git status'
> calls start complaining about the core.fsmonitor=true config because it is
> not supported?
> 
> The most future-proof thing to do might be to move the config write out of
> the set_recommended_config() and into start_fsmonitor_daemon(). Perhaps
> rename it to enable_fsmonitor() so it can fail due to writing the config
> _or_ for starting the daemon. The error message would change, then, too.

I spent some time digging into this, and I think gating both the config and
subsequent 'git fsmonitor--daemon start' on having platform *and* repository
support is a good idea. I'll update the next version to both set the
'core.fsmonitor' config and start the daemon only if the built-in FSMonitor
is fully supported.
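One way to picture "platform *and* repository support" is a single check that reports why FSMonitor cannot be used. The struct, enum, and the network-filesystem field below are hypothetical illustrations (the latter borrows Stolee's "network share" example); Git's own incompatibility checks may use different names and reasons.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the caller knows about a repository. */
struct repo_info {
	bool is_bare;		/* no worktree for the daemon to watch */
	bool on_network_fs;	/* example of another run-time reason */
};

enum fsmonitor_support {
	FSMONITOR_SUPPORTED,
	FSMONITOR_NO_PLATFORM_BACKEND,
	FSMONITOR_BARE_REPO,
	FSMONITOR_REMOTE_FS,
};

/* Only FSMONITOR_SUPPORTED should lead to writing config or starting the daemon. */
static enum fsmonitor_support check_fsmonitor_support(bool platform_has_backend,
						      const struct repo_info *repo)
{
	assert(repo);
	if (!platform_has_backend)
		return FSMONITOR_NO_PLATFORM_BACKEND;	/* compile-time gate */
	if (repo->is_bare)
		return FSMONITOR_BARE_REPO;	/* per-repository gate */
	if (repo->on_network_fs)
		return FSMONITOR_REMOTE_FS;
	return FSMONITOR_SUPPORTED;
}
```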

(warning: long-winded tangent mostly unrelated to FSMonitor)

In the process of testing FSMonitor behavior, I think I found other issues
with Scalar registration. Specifically, the test I wrote attempted to
'scalar register' a bare repo, since bare directories are incompatible with
FSMonitor. After seeing that FSMonitor was *not* incompatible with the
repository, I found that Scalar was 1) ignoring the bare repository and, as
a result, 2) identifying my Git clone (way above GIT_CEILING_DIRECTORIES) as
the "enlistment root". I think 1) might be fine as-is - uniformly ignoring
bare repos seems like a reasonable choice - but 2) seems like more of a
problem. 

Right now, 'setup_enlistment_directory()' searches for the repo root
beginning at directory '<dir>', which is either a user-provided path or
current working directory. It checks whether '<dir>' or '<dir>/src' is a
repo root: if so, it sets the enlistment info; otherwise, it repeats the
process with the parent of '<dir>' until the repo root is found. For
example, given the following directory structure:

somedir
└── enlistment
    ├── src
    │   └── .git
    └── test
        └── data

'scalar register somedir/enlistment/test/data' will search:

  * somedir/enlistment/test/data/src
  * somedir/enlistment/test/data
  * somedir/enlistment/test/src
  * somedir/enlistment/test
  * somedir/enlistment/src
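The search order listed above can be reproduced with a small simulation. This is a sketch under stated assumptions, not Scalar's setup_enlistment_directory(): a hypothetical in-memory `repo_roots` list replaces real `.git` checks, and `is_repo_root()` records every candidate it is asked about.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PATH_LEN 256
#define MAX_VISITED 32

/* Hypothetical: pretend only this directory contains a .git. */
static const char *repo_roots[] = { "somedir/enlistment/src", NULL };

static char visited[MAX_VISITED][MAX_PATH_LEN];
static int nr_visited;

/* Record each candidate, then report whether it is a repository root. */
static int is_repo_root(const char *path)
{
	if (nr_visited < MAX_VISITED)
		snprintf(visited[nr_visited++], MAX_PATH_LEN, "%s", path);
	for (int i = 0; repo_roots[i]; i++)
		if (!strcmp(path, repo_roots[i]))
			return 1;
	return 0;
}

/*
 * Walk upward from dir, testing <dir>/src before <dir> at each level.
 * On success, copy the enlistment root into out and return 0.
 */
static int find_enlistment(const char *dir, char *out, size_t outlen)
{
	char cur[MAX_PATH_LEN], cand[MAX_PATH_LEN];

	assert(outlen > 0);
	snprintf(cur, sizeof(cur), "%s", dir);
	for (;;) {
		snprintf(cand, sizeof(cand), "%s/src", cur);
		if (is_repo_root(cand) || is_repo_root(cur)) {
			snprintf(out, outlen, "%s", cur);
			return 0;
		}
		char *slash = strrchr(cur, '/');
		if (!slash)
			return -1;	/* ran out of parents in a relative path */
		*slash = '\0';	/* move up to the parent directory */
	}
}
```

Calling `find_enlistment("somedir/enlistment/test/data", ...)` visits the five candidates in the order listed and settles on `somedir/enlistment` as the enlistment root.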

The current usage of GIT_CEILING_DIRECTORIES relies on the fact that, when
invoking a normal 'git' command, 'setup_git_directory()' only searches
upwards from the current working directory to find the repo root; it's a
clear "yes" or "no" as to whether that search passes a ceiling directory.
Scalar isn't as clear, since it searches for the repo root both "downwards"
into '<dir>/src' *and* upwards through the parents of '<dir>'. It's not
totally clear to me what the "right" behavior for Scalar is, but my current
thought is to follow the same rules as 'setup_git_directory()', but for the
*enlistment* root rather than the repository root. It's more restrictive
than GIT_CEILING_DIRECTORIES on a normal git repo, e.g.:

1. 'GIT_CEILING_DIRECTORIES=somedir/enlistment git -C somedir/enlistment/src status' 
   is valid.
2. 'GIT_CEILING_DIRECTORIES=somedir/enlistment scalar register somedir/enlistment/src'
   is not valid.

but since Scalar works on the entire enlistment (not just the repo inside of
it), I think it makes sense to prevent it from crossing a ceiling directory
boundary.
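A sketch of the proposed rule, assuming normalized relative paths without trailing slashes and a hypothetical helper name. It mirrors the "do not move up *into* a ceiling directory" semantics of GIT_CEILING_DIRECTORIES, applied to the enlistment root: a root found above `<dir>` is rejected if it is at or above any ceiling that is a proper ancestor of `<dir>`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if a is a proper ancestor of b (normalized, no trailing '/'). */
static bool is_proper_ancestor(const char *a, const char *b)
{
	size_t len = strlen(a);

	return !strncmp(a, b, len) && b[len] == '/';
}

/*
 * Hypothetical rule: reaching the enlistment root from <dir> must never
 * move up into a ceiling directory.
 */
static bool enlistment_root_allowed(const char *dir, const char *root,
				    const char *const *ceilings, size_t nr)
{
	assert(!strcmp(root, dir) || is_proper_ancestor(root, dir));
	for (size_t i = 0; i < nr; i++) {
		const char *c = ceilings[i];

		if (!is_proper_ancestor(c, dir))
			continue;	/* this ceiling does not bound the walk */
		/* root and c are both ancestors of dir, so compare depth */
		if (strlen(root) <= strlen(c))
			return false;
	}
	return true;
}
```

With `somedir/enlistment` as the ceiling, registering `somedir/enlistment/src` is rejected (the root would be the ceiling itself, reached from below), while registering `somedir/enlistment` directly is still allowed, matching examples 1 and 2 above in spirit.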

What do you think? Hopefully my rambling wasn't too confusing (if it is,
please let me know what I can clarify). 

> 
> Or maybe I'm making a mountain out of a molehill and what exists here is
> perfectly fine.
> 
>> +test_lazy_prereq BUILTIN_FSMONITOR '
>> +	git version --build-options | grep -q "feature:.*fsmonitor--daemon"
>> +'
> 
> It looks like we already have a FSMONITOR_DAEMON prereq in test-lib.sh.
> Should we use that instead?

Works for me, happy to reuse code wherever possible. :)

> 
> Thanks,
> -Stolee

Derrick Stolee Aug. 18, 2022, 1:19 p.m. UTC | #5

On 8/17/2022 7:47 PM, Victoria Dye wrote:

> (warning: long-winded tangent mostly unrelated to FSMonitor)
> 
> In the process of testing FSMonitor behavior, I think I found other issues
> with Scalar registration. Specifically, the test I wrote attempted to
> 'scalar register' a bare repo, since bare directories are incompatible with
> FSMonitor. After seeing that FSMonitor was *not* incompatible with the
> repository, I found that Scalar was 1) ignoring the bare repository and, as

This is interesting, that Scalar doesn't recognize a bare repo. There are
definitely some config settings that it recommends that don't make sense
in a bare repo, but it's interesting that it completely ignores it. Good
find.

I'm not sure there is anything to 'fix' except maybe error out when the
discovered Git repository is bare. Add a warning, at minimum.

> a result, 2) identifying my Git clone (way above GIT_CEILING_DIRECTORIES) as
> the "enlistment root". I think 1) might be fine as-is - uniformly ignoring
> bare repos seems like a reasonable choice - but 2) seems like more of a
> problem. 

...

> The current usage of GIT_CEILING_DIRECTORIES relies on the fact that, when
> invoking a normal 'git' command, 'setup_git_directory()' only searches
> upwards from the current working directory to find the repo root; it's a
> clear "yes" or "no" as to whether that search passes a ceiling directory.
> Scalar isn't as clear, since it searches for the repo root both "downwards"
> into '<dir>/src' *and* upwards through the parents of '<dir>'. It's not
> totally clear to me what the "right" behavior for Scalar is, but my current
> thought is to follow the same rules as 'setup_git_directory()', but for the
> *enlistment* root rather than the repository root. It's more restrictive
> than GIT_CEILING_DIRECTORIES on a normal git repo, e.g.:
> 
> 1. 'GIT_CEILING_DIRECTORIES=somedir/enlistment git -C somedir/enlistment/src status' 
>    is valid.
> 2. 'GIT_CEILING_DIRECTORIES=somedir/enlistment scalar register somedir/enlistment/src'
>    is not valid.

This is interesting, that we can't recognize the ceiling as the root.

> but since Scalar works on the entire enlistment (not just the repo inside of
> it), I think it makes sense to prevent it from crossing a ceiling directory
> boundary.

I think the enlistment root was something that was inherited from VFS for
Git, and we can mostly abandon it. The things we need to do are all based
on the Git repository itself, not the parent. The only thing we need to
keep is to allow a user to specify the repo by pointing to the directory
immediately above the 'src' directory.

> 'scalar register somedir/enlistment/test/data' will search:
> 
>   * somedir/enlistment/test/data/src
>   * somedir/enlistment/test/data
>   * somedir/enlistment/test/src
>   * somedir/enlistment/test
>   * somedir/enlistment/src

Instead, we could do the following on a specified <dir>:

 * If <dir>/src exists, find the Git directory by finding the first Git
   repository containing <dir>/src.
 * Otherwise, find the first Git repository containing <dir>.
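A minimal sketch of that two-step rule, assuming hypothetical in-memory lists (`existing_dirs`, `repo_roots`) in place of stat() and real `.git` discovery. `discover_repo()` only imitates the upward walk that setup_git_directory() performs; it does not model GIT_CEILING_DIRECTORIES, which the real function would honor after a chdir.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_PATH_LEN 256

/* Hypothetical stand-ins for the filesystem and for .git detection. */
static const char *existing_dirs[] = { "somedir/enlistment/src", NULL };
static const char *repo_roots[] = { "somedir/enlistment/src", NULL };

static bool in_list(const char *const *list, const char *path)
{
	for (size_t i = 0; list[i]; i++)
		if (!strcmp(list[i], path))
			return true;
	return false;
}

/* Find the first repository containing start by walking upward. */
static int discover_repo(const char *start, char *out, size_t outlen)
{
	char cur[MAX_PATH_LEN];

	assert(outlen > 0);
	snprintf(cur, sizeof(cur), "%s", start);
	for (;;) {
		if (in_list(repo_roots, cur)) {
			snprintf(out, outlen, "%s", cur);
			return 0;
		}
		char *slash = strrchr(cur, '/');
		if (!slash)
			return -1;	/* no repository found */
		*slash = '\0';
	}
}

/* Prefer <dir>/src when it exists; otherwise start from <dir> itself. */
static int find_repo_for_register(const char *dir, char *out, size_t outlen)
{
	char src[MAX_PATH_LEN];

	snprintf(src, sizeof(src), "%s/src", dir);
	if (in_list(existing_dirs, src))
		return discover_repo(src, out, outlen);
	return discover_repo(dir, out, outlen);
}
```

Unlike the current search, `somedir/enlistment/test/data` no longer finds `somedir/enlistment/src`, because discovery only walks upward from the given directory.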

Is there an easy way to discover a Git repository at a specific directory?
Or, do we do something simpler, like changing directories then calling
setup_git_directory()? I think simplifying the logic that way should
respect GIT_CEILING_DIRECTORIES correctly.

Thanks,
-Stolee
