diff mbox series

[04/15] scalar: 'register' sets recommended config and starts maintenance

Message ID 3786f4c597fffc13f638efd26875dcb257d54ab4.1630359290.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Upstreaming the Scalar command | expand

Commit Message

Derrick Stolee Aug. 30, 2021, 9:34 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

Let's start implementing the `register` command. With this commit,
recommended settings are configured upon `scalar register`, and Git's
background maintenance is started.

The recommended config settings may very well change in the future. For
example, once the built-in FSMonitor is available, we will want to
enable it upon `scalar register`. For that reason, we explicitly support
running `scalar register` in an already-registered enlistment.

Co-authored-by: Victoria Dye <vdye@github.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c   | 256 ++++++++++++++++++++++++++++++++++++++
 contrib/scalar/scalar.txt |  14 ++-
 2 files changed, 269 insertions(+), 1 deletion(-)

Comments

Ævar Arnfjörð Bjarmason Aug. 31, 2021, 8:11 a.m. UTC | #1
On Mon, Aug 30 2021, Derrick Stolee via GitGitGadget wrote:

> [...]
> +#ifndef WIN32
> +		{ "core.untrackedCache", "true" },
> +#else
> +		/*
> +		 * Unfortunately, Scalar's Functional Tests demonstrated
> +		 * that the untracked cache feature is unreliable on Windows
> +		 * (which is a bummer because that platform would benefit the
> +		 * most from it). For some reason, freshly created files seem
> +		 * not to update the directory's `lastModified` time
> +		 * immediately, but the untracked cache would need to rely on
> +		 * that.
> +		 *
> +		 * Therefore, with a sad heart, we disable this very useful
> +		 * feature on Windows.
> +		 */
> +		{ "core.untrackedCache", "false" },
> +#endif
> [...]

Ok, but why the need to set it to "false" explicitly? Does it need to be
so opinionated as to overwrite existing user-set config in these cases?

> +		{ "core.bare", "false" },

Shouldn't this be set by "git init" already?

> [...]
> +		{ "core.logAllRefUpdates", "true" },

An opinionated thing unrelated to performance?

> [...]
> +		{ "feature.manyFiles", "false" },
> +		{ "feature.experimental", "false" },

Ditto the question about the need to set this, these are false by
default, right?

> [...]
> +		if (git_config_get_string(config[i].key, &value)) {
> +			trace2_data_string("scalar", the_repository, config[i].key, "created");
> +			if (git_config_set_gently(config[i].key,
> +						  config[i].value) < 0)
> +				return error(_("could not configure %s=%s"),
> +					     config[i].key, config[i].value);
> +		} else {
> +			trace2_data_string("scalar", the_repository, config[i].key, "exists");
> +			free(value);
> +		}

The commit message doesn't discuss these trace2 additions, these in
particular seem like they might be useful, but better done as as some
more general trace2 intergration in config.c, i.e. if the functions
being called here did the same logging on config set/get.
Derrick Stolee Aug. 31, 2021, 2:22 p.m. UTC | #2
On 8/31/2021 4:11 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Aug 30 2021, Derrick Stolee via GitGitGadget wrote:
> 
>> [...]
>> +#ifndef WIN32
>> +		{ "core.untrackedCache", "true" },
>> +#else
>> +		/*
>> +		 * Unfortunately, Scalar's Functional Tests demonstrated
>> +		 * that the untracked cache feature is unreliable on Windows
>> +		 * (which is a bummer because that platform would benefit the
>> +		 * most from it). For some reason, freshly created files seem
>> +		 * not to update the directory's `lastModified` time
>> +		 * immediately, but the untracked cache would need to rely on
>> +		 * that.
>> +		 *
>> +		 * Therefore, with a sad heart, we disable this very useful
>> +		 * feature on Windows.
>> +		 */
>> +		{ "core.untrackedCache", "false" },
>> +#endif
>> [...]
> 
> Ok, but why the need to set it to "false" explicitly? Does it need to be
> so opinionated as to overwrite existing user-set config in these cases?

Users can overwrite this local config value, but this is placed to avoid
a global config value from applying specifically within Scalar-created
repos.
 
>> +		{ "core.bare", "false" },
> 
> Shouldn't this be set by "git init" already?

This one is probably a bit _too_ defensive. It can be removed.

>> [...]
>> +		{ "core.logAllRefUpdates", "true" },
> 
> An opinionated thing unrelated to performance?

It's an opinionated thing related to supporting monorepo users. It helps
us diagnose issues they have by recreating a sequence of events.

>> [...]
>> +		{ "feature.manyFiles", "false" },
>> +		{ "feature.experimental", "false" },
> 
> Ditto the question about the need to set this, these are false by
> default, right?

But if a user has them on globally, then we don't want them to apply
locally (in favor of the settings that we set explicitly).

>> [...]
>> +		if (git_config_get_string(config[i].key, &value)) {
>> +			trace2_data_string("scalar", the_repository, config[i].key, "created");
>> +			if (git_config_set_gently(config[i].key,
>> +						  config[i].value) < 0)
>> +				return error(_("could not configure %s=%s"),
>> +					     config[i].key, config[i].value);
>> +		} else {
>> +			trace2_data_string("scalar", the_repository, config[i].key, "exists");
>> +			free(value);
>> +		}
> 
> The commit message doesn't discuss these trace2 additions, these in
> particular seem like they might be useful, but better done as as some
> more general trace2 intergration in config.c, i.e. if the functions
> being called here did the same logging on config set/get.

If we want to do such a tracing change within git_config_set*(), then
that would be an appropriate replacement. The biggest reason to include
them here is to trace that an existing value already exists, for the
case of running 'scalar reconfigure' during an upgrade. That part
doesn't make much sense to put into config.c.

Thanks,
-Stolee
Junio C Hamano Sept. 1, 2021, 4:16 p.m. UTC | #3
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +static void setup_enlistment_directory(int argc, const char **argv,
> +				       const char * const *usagestr,
> +				       const struct option *options,
> +				       struct strbuf *enlistment_root)
> +{
> +	struct strbuf path = STRBUF_INIT;
> +	char *root;
> +	int enlistment_found = 0;
> +
> +	if (startup_info->have_repository)
> +		BUG("gitdir already set up?!?");
> +
> +	if (argc > 1)
> +		usage_with_options(usagestr, options);
> +
> +	/* find the worktree, determine its corresponding root */
> +	if (argc == 1)
> +		strbuf_add_absolute_path(&path, argv[0]);
> +	else if (strbuf_getcwd(&path) < 0)
> +		die(_("need a working directory"));
> +
> +	strbuf_trim_trailing_dir_sep(&path);
> +	do {
> +		const size_t len = path.len;
> +
> +		/* check if currently in enlistment root with src/ workdir */
> +		strbuf_addstr(&path, "/src/.git");
> +		if (is_git_directory(path.buf)) {
> +			strbuf_strip_suffix(&path, "/.git");
> +
> +			if (enlistment_root)
> +				strbuf_add(enlistment_root, path.buf, len);
> +
> +			enlistment_found = 1;
> +			break;
> +		}

This special casing of "normally the top of the working tree is
enlisted, but if the repository is called src/, then we enslist
one level up" is a bit of eyesore because

 (1) it is unclear why such a directory with 'src/' subdirectory is
     so special, and

 (2) it fails to serve those who has the same need but named their
     source subdirectory differently (like 'source/').

"The design decisions we made are all part of being opinionated" can
all explain it away, but at least we should let the users know where
the opinionated choices scalar makes want to lead them to, and this
"src/" stuff needs a bit of clarification.  Perhaps a documentation
will be added in later steps?

> +	for (i = 0; config[i].key; i++) {
> +		if (git_config_get_string(config[i].key, &value)) {
> +			trace2_data_string("scalar", the_repository, config[i].key, "created");
> +			if (git_config_set_gently(config[i].key,
> +						  config[i].value) < 0)
> +				return error(_("could not configure %s=%s"),
> +					     config[i].key, config[i].value);
> +		} else {
> +			trace2_data_string("scalar", the_repository, config[i].key, "exists");
> +			free(value);
> +		}

I wonder if we should have a table of configuration variables and
their default values.  The above code implements a skewed "we only
avoid overriding what is explicitly configured".  A variable that
the user left unconfigured because the user found its default
satisfactory will be overridden, and if the value scalar wants to
use happens to be the default value, we leave an explicit
configuration to that default value in the resulting configuration
file.

But I think the above is the best we can do without such a central
registry of configuration variables.
Johannes Schindelin Sept. 3, 2021, 3:41 p.m. UTC | #4
Hi Junio,

On Wed, 1 Sep 2021, Junio C Hamano wrote:

> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > +static void setup_enlistment_directory(int argc, const char **argv,
> > +				       const char * const *usagestr,
> > +				       const struct option *options,
> > +				       struct strbuf *enlistment_root)
> > +{
> > +	struct strbuf path = STRBUF_INIT;
> > +	char *root;
> > +	int enlistment_found = 0;
> > +
> > +	if (startup_info->have_repository)
> > +		BUG("gitdir already set up?!?");
> > +
> > +	if (argc > 1)
> > +		usage_with_options(usagestr, options);
> > +
> > +	/* find the worktree, determine its corresponding root */
> > +	if (argc == 1)
> > +		strbuf_add_absolute_path(&path, argv[0]);
> > +	else if (strbuf_getcwd(&path) < 0)
> > +		die(_("need a working directory"));
> > +
> > +	strbuf_trim_trailing_dir_sep(&path);
> > +	do {
> > +		const size_t len = path.len;
> > +
> > +		/* check if currently in enlistment root with src/ workdir */
> > +		strbuf_addstr(&path, "/src/.git");
> > +		if (is_git_directory(path.buf)) {
> > +			strbuf_strip_suffix(&path, "/.git");
> > +
> > +			if (enlistment_root)
> > +				strbuf_add(enlistment_root, path.buf, len);
> > +
> > +			enlistment_found = 1;
> > +			break;
> > +		}
>
> This special casing of "normally the top of the working tree is
> enlisted, but if the repository is called src/, then we enslist
> one level up" is a bit of eyesore because
>
>  (1) it is unclear why such a directory with 'src/' subdirectory is
>      so special, and
>
>  (2) it fails to serve those who has the same need but named their
>      source subdirectory differently (like 'source/').

All true. I wish we had come up with a better way, or with a way to
override this via an option.

Unfortunately, we are now bound by the fact that there are already users
out there...

> "The design decisions we made are all part of being opinionated" can
> all explain it away, but at least we should let the users know where
> the opinionated choices scalar makes want to lead them to, and this
> "src/" stuff needs a bit of clarification.  Perhaps a documentation
> will be added in later steps?

I had hoped that the initial blurb of the manual page was sufficient, but
you're right, the `register` subcommand is particular in that it allows to
force Scalar to consider the worktree to be identical to the Scalar
enlistment. I added this:

	diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
	index 1593da45eae..568987064b2 100644
	--- a/contrib/scalar/scalar.txt
	+++ b/contrib/scalar/scalar.txt
	@@ -40,6 +40,10 @@ register [<enlistment>]::
		and starts background maintenance. If `<enlistment>` is not provided,
		then the enlistment associated with the current working directory is
		registered.
	++
	+Note: when this subcommand is called in a worktree that is called `src/`, its
	+parent directory is considered to be the Scalar enlistment. If the worktree is
	+_not_ called `src/`, it itself will be considered to be the Scalar enlistment.

> > +	for (i = 0; config[i].key; i++) {
> > +		if (git_config_get_string(config[i].key, &value)) {
> > +			trace2_data_string("scalar", the_repository, config[i].key, "created");
> > +			if (git_config_set_gently(config[i].key,
> > +						  config[i].value) < 0)
> > +				return error(_("could not configure %s=%s"),
> > +					     config[i].key, config[i].value);
> > +		} else {
> > +			trace2_data_string("scalar", the_repository, config[i].key, "exists");
> > +			free(value);
> > +		}
>
> I wonder if we should have a table of configuration variables and
> their default values.  The above code implements a skewed "we only
> avoid overriding what is explicitly configured".  A variable that
> the user left unconfigured because the user found its default
> satisfactory will be overridden, and if the value scalar wants to
> use happens to be the default value, we leave an explicit
> configuration to that default value in the resulting configuration
> file.
>
> But I think the above is the best we can do without such a central
> registry of configuration variables.

Even with such a central registry, there would still be the question
whether the user, by staying with the default, wanted Git (or in this
instance, Scalar) to keep using the old default. The intention is
unfortunately not clear just from setting the variable.

So I think this is the best we can do.

Ciao,
Dscho
Junio C Hamano Sept. 3, 2021, 5:35 p.m. UTC | #5
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> "The design decisions we made are all part of being opinionated" can
>> all explain it away, but at least we should let the users know where
>> the opinionated choices scalar makes want to lead them to, and this
>> "src/" stuff needs a bit of clarification.  Perhaps a documentation
>> will be added in later steps?
>
> I had hoped that the initial blurb of the manual page was sufficient, but
> you're right, the `register` subcommand is particular in that it allows to
> force Scalar to consider the worktree to be identical to the Scalar
> enlistment. I added this:

Sorry, if it weren't clear that I was commenting on each step as I
read along without peeking later steps.  I think I saw it was
written somewhere that this was to encourage use of read-only
directory that keeps the sources with build artifacts and crufts
created outside it (so forests of projects will not have the source
directories, each of which has its own .git/, next to each other---
instead we would have shell directories, each with its own src/ and
src/.git, next to each other).  The additional documentation below
is a good thing to have handy when readers learn how to use
"register" (or more generally, what an "enlistment" is).  As long as
the motivation behind that design is given somewhere (not necessarily
here) for readers to discover, I am OK with the design.

> 	diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
> 	index 1593da45eae..568987064b2 100644
> 	--- a/contrib/scalar/scalar.txt
> 	+++ b/contrib/scalar/scalar.txt
> 	@@ -40,6 +40,10 @@ register [<enlistment>]::
> 		and starts background maintenance. If `<enlistment>` is not provided,
> 		then the enlistment associated with the current working directory is
> 		registered.
> 	++
> 	+Note: when this subcommand is called in a worktree that is called `src/`, its
> 	+parent directory is considered to be the Scalar enlistment. If the worktree is
> 	+_not_ called `src/`, it itself will be considered to be the Scalar enlistment.

Thanks.
diff mbox series

Patch

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 7cff29e0fcd..7660327c27b 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -5,11 +5,267 @@ 
 #include "cache.h"
 #include "gettext.h"
 #include "parse-options.h"
+#include "config.h"
+#include "run-command.h"
+
+/*
+ * Remove the deepest subdirectory in the provided path string. Path must not
+ * include a trailing path separator. Returns 1 if parent directory found,
+ * otherwise 0.
+ */
+static int strbuf_parent_directory(struct strbuf *buf)
+{
+	size_t len = buf->len;
+	size_t offset = offset_1st_component(buf->buf);
+	char *path_sep = find_last_dir_sep(buf->buf + offset);
+	strbuf_setlen(buf, path_sep ? path_sep - buf->buf : offset);
+
+	return buf->len < len;
+}
+
+static void setup_enlistment_directory(int argc, const char **argv,
+				       const char * const *usagestr,
+				       const struct option *options,
+				       struct strbuf *enlistment_root)
+{
+	struct strbuf path = STRBUF_INIT;
+	char *root;
+	int enlistment_found = 0;
+
+	if (startup_info->have_repository)
+		BUG("gitdir already set up?!?");
+
+	if (argc > 1)
+		usage_with_options(usagestr, options);
+
+	/* find the worktree, determine its corresponding root */
+	if (argc == 1)
+		strbuf_add_absolute_path(&path, argv[0]);
+	else if (strbuf_getcwd(&path) < 0)
+		die(_("need a working directory"));
+
+	strbuf_trim_trailing_dir_sep(&path);
+	do {
+		const size_t len = path.len;
+
+		/* check if currently in enlistment root with src/ workdir */
+		strbuf_addstr(&path, "/src/.git");
+		if (is_git_directory(path.buf)) {
+			strbuf_strip_suffix(&path, "/.git");
+
+			if (enlistment_root)
+				strbuf_add(enlistment_root, path.buf, len);
+
+			enlistment_found = 1;
+			break;
+		}
+
+		/* reset to original path */
+		strbuf_setlen(&path, len);
+
+		/* check if currently in workdir */
+		strbuf_addstr(&path, "/.git");
+		if (is_git_directory(path.buf)) {
+			strbuf_setlen(&path, len);
+
+			if (enlistment_root) {
+				/*
+				 * If the worktree's directory's name is `src`, the enlistment is the
+				 * parent directory, otherwise it is identical to the worktree.
+				 */
+				root = strip_path_suffix(path.buf, "src");
+				strbuf_addstr(enlistment_root, root ? root : path.buf);
+				free(root);
+			}
+
+			enlistment_found = 1;
+			break;
+		}
+
+		strbuf_setlen(&path, len);
+	} while (strbuf_parent_directory(&path));
+
+	if (!enlistment_found)
+		die(_("could not find enlistment root"));
+
+	if (chdir(path.buf) < 0)
+		die_errno(_("could not switch to '%s'"), path.buf);
+
+	strbuf_release(&path);
+	setup_git_directory();
+}
+
+static int run_git(const char *arg, ...)
+{
+	struct strvec argv = STRVEC_INIT;
+	va_list args;
+	const char *p;
+	int res;
+
+	va_start(args, arg);
+	strvec_push(&argv, arg);
+	while ((p = va_arg(args, const char *)))
+		strvec_push(&argv, p);
+	va_end(args);
+
+	res = run_command_v_opt(argv.v, RUN_GIT_CMD);
+
+	strvec_clear(&argv);
+	return res;
+}
+
+static int set_recommended_config(void)
+{
+	struct {
+		const char *key;
+		const char *value;
+	} config[] = {
+		{ "am.keepCR", "true" },
+		{ "core.FSCache", "true" },
+		{ "core.multiPackIndex", "true" },
+		{ "core.preloadIndex", "true" },
+#ifndef WIN32
+		{ "core.untrackedCache", "true" },
+#else
+		/*
+		 * Unfortunately, Scalar's Functional Tests demonstrated
+		 * that the untracked cache feature is unreliable on Windows
+		 * (which is a bummer because that platform would benefit the
+		 * most from it). For some reason, freshly created files seem
+		 * not to update the directory's `lastModified` time
+		 * immediately, but the untracked cache would need to rely on
+		 * that.
+		 *
+		 * Therefore, with a sad heart, we disable this very useful
+		 * feature on Windows.
+		 */
+		{ "core.untrackedCache", "false" },
+#endif
+		{ "core.bare", "false" },
+		{ "core.logAllRefUpdates", "true" },
+		{ "credential.https://dev.azure.com.useHttpPath", "true" },
+		{ "credential.validate", "false" }, /* GCM4W-only */
+		{ "gc.auto", "0" },
+		{ "gui.GCWarning", "false" },
+		{ "index.threads", "true" },
+		{ "index.version", "4" },
+		{ "merge.stat", "false" },
+		{ "merge.renames", "false" },
+		{ "pack.useBitmaps", "false" },
+		{ "pack.useSparse", "true" },
+		{ "receive.autoGC", "false" },
+		{ "reset.quiet", "true" },
+		{ "feature.manyFiles", "false" },
+		{ "feature.experimental", "false" },
+		{ "fetch.unpackLimit", "1" },
+		{ "fetch.writeCommitGraph", "false" },
+#ifdef WIN32
+		{ "http.sslBackend", "schannel" },
+#endif
+		{ "status.aheadBehind", "false" },
+		{ "commitGraph.generationVersion", "1" },
+		{ "core.autoCRLF", "false" },
+		{ "core.safeCRLF", "false" },
+		{ NULL, NULL },
+	};
+	int i;
+	char *value;
+
+	for (i = 0; config[i].key; i++) {
+		if (git_config_get_string(config[i].key, &value)) {
+			trace2_data_string("scalar", the_repository, config[i].key, "created");
+			if (git_config_set_gently(config[i].key,
+						  config[i].value) < 0)
+				return error(_("could not configure %s=%s"),
+					     config[i].key, config[i].value);
+		} else {
+			trace2_data_string("scalar", the_repository, config[i].key, "exists");
+			free(value);
+		}
+	}
+
+	/*
+	 * The `log.excludeDecoration` setting is special because it allows
+	 * for multiple values.
+	 */
+	if (git_config_get_string("log.excludeDecoration", &value)) {
+		trace2_data_string("scalar", the_repository,
+				   "log.excludeDecoration", "created");
+		if (git_config_set_multivar_gently("log.excludeDecoration",
+						   "refs/prefetch/*",
+						   CONFIG_REGEX_NONE, 0))
+			return error(_("could not configure "
+				       "log.excludeDecoration"));
+	} else {
+		trace2_data_string("scalar", the_repository,
+				   "log.excludeDecoration", "exists");
+		free(value);
+	}
+
+	return 0;
+}
+
+static int start_maintenance(void)
+{
+	return run_git("maintenance", "start", NULL);
+}
+
+static int add_enlistment(void)
+{
+	int res;
+
+	if (!the_repository->worktree)
+		die(_("Scalar enlistments require a worktree"));
+
+	res = run_git("config", "--global", "--get", "--fixed-value",
+		      "scalar.repo", the_repository->worktree, NULL);
+
+	/*
+	 * If the setting is already there, then do nothing.
+	 */
+	if (!res)
+		return 0;
+
+	return run_git("config", "--global", "--add",
+		       "scalar.repo", the_repository->worktree, NULL);
+}
+
+static int register_dir(void)
+{
+	int res = add_enlistment();
+
+	if (!res)
+		res = set_recommended_config();
+
+	if (!res)
+		res = start_maintenance();
+
+	return res;
+}
+
+static int cmd_register(int argc, const char **argv)
+{
+	struct option options[] = {
+		OPT_END(),
+	};
+	const char * const usage[] = {
+		N_("scalar register [<enlistment>]"),
+		NULL
+	};
+
+	argc = parse_options(argc, argv, NULL, options,
+			     usage, 0);
+
+	setup_enlistment_directory(argc, argv, usage, options, NULL);
+
+	return register_dir();
+}
 
 static struct {
 	const char *name;
 	int (*fn)(int, const char **);
 } builtins[] = {
+	{ "register", cmd_register },
 	{ NULL, NULL},
 };
 
diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
index 5f7131861a5..41429db7990 100644
--- a/contrib/scalar/scalar.txt
+++ b/contrib/scalar/scalar.txt
@@ -8,7 +8,7 @@  scalar - an opinionated repository management tool
 SYNOPSIS
 --------
 [verse]
-scalar <command> [<options>]
+scalar register [<enlistment>]
 
 DESCRIPTION
 -----------
@@ -29,6 +29,18 @@  will be identical to the worktree.
 The `scalar` command implements various subcommands, and different options
 depending on the subcommand.
 
+COMMANDS
+--------
+
+Register
+~~~~~~~~
+
+register [<enlistment>]::
+    Adds the enlistment's repository to the list of registered repositories
+    and starts background maintenance. If `<enlistment>` is not provided,
+    then the enlistment associated with the current working directory is
+    registered.
+
 SEE ALSO
 --------
 linkgit:git-maintenance[1].