diff mbox series

[v4,04/15] scalar: 'register' sets recommended config and starts maintenance

Message ID bbbc4c3339043bcd718dd2defcbaaaac2092227a.1631630356.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Upstreaming the Scalar command | expand

Commit Message

Derrick Stolee Sept. 14, 2021, 2:39 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

Let's start implementing the `register` command. With this commit,
recommended settings are configured upon `scalar register`, and Git's
background maintenance is started.

The recommended config settings may very well change in the future. For
example, once the built-in FSMonitor is available, we will want to
enable it upon `scalar register`. For that reason, we explicitly support
running `scalar register` in an already-registered enlistment.

Co-authored-by: Victoria Dye <vdye@github.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c   | 255 ++++++++++++++++++++++++++++++++++++++
 contrib/scalar/scalar.txt |  18 ++-
 2 files changed, 272 insertions(+), 1 deletion(-)

Comments

Elijah Newren Sept. 28, 2021, 5:01 a.m. UTC | #1
On Tue, Sep 14, 2021 at 7:39 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
...
> +static int set_recommended_config(void)
> +{
> +       struct {
> +               const char *key;
> +               const char *value;
> +       } config[] = {
> +               { "am.keepCR", "true" },
> +               { "core.FSCache", "true" },
> +               { "core.multiPackIndex", "true" },
> +               { "core.preloadIndex", "true" },
> +#ifndef WIN32
> +               { "core.untrackedCache", "true" },
> +#else
> +               /*
> +                * Unfortunately, Scalar's Functional Tests demonstrated
> +                * that the untracked cache feature is unreliable on Windows
> +                * (which is a bummer because that platform would benefit the
> +                * most from it). For some reason, freshly created files seem
> +                * not to update the directory's `lastModified` time
> +                * immediately, but the untracked cache would need to rely on
> +                * that.
> +                *
> +                * Therefore, with a sad heart, we disable this very useful
> +                * feature on Windows.
> +                */
> +               { "core.untrackedCache", "false" },
> +#endif

Interesting.  (I'm somewhat leery of the untrackedCache just knowing
that it used to operate despite an exponential number of visits to
files (exponential in depth of directories) and getting different
answers with different visits, making me feel like it was black magic
that it ever worked and wondering what kind of corner case issues
still lurk with it.  See e.g.
https://lore.kernel.org/git/CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@mail.gmail.com/)

> +               { "core.logAllRefUpdates", "true" },
> +               { "credential.https://dev.azure.com.useHttpPath", "true" },

Not only opinionated, but special configuration for certain sites?
I'm not complaining, just slightly surprised.

> +               { "credential.validate", "false" }, /* GCM4W-only */
> +               { "gc.auto", "0" },
> +               { "gui.GCWarning", "false" },
> +               { "index.threads", "true" },
> +               { "index.version", "4" },

I take it your users don't make use of jgit?  (Users aren't using jgit
directly here, at least not to my knowledge, but multiple gradle
plugins do.)  I tried turning this on a while back, and quickly got
multiple reports of problems because jgit didn't understand the index.
I had to turn it off and send out various PSAs on how to recover.

> +               { "merge.stat", "false" },
> +               { "merge.renames", "false" },

Is this just historical and not needed anymore, is it here just for a
little longer and you are planning on transitioning away from this, or
are you still set on this setting?

> +               { "pack.useBitmaps", "false" },

I don't understand anything bitmap related, but I thought they were
performance related, so I'm surprised by this one.  Is there a reason
for this one?  (Is it handled by maintenance instead?)

> +               { "pack.useSparse", "true" },
> +               { "receive.autoGC", "false" },
> +               { "reset.quiet", "true" },
> +               { "feature.manyFiles", "false" },

If you simply set core.untrackedCache to false _after_ setting
feature.manyFiles to true, would it make sense to switch this?  (Or
does it matter, since you've already individually set all the config
settings that this one would set?)

> +               { "feature.experimental", "false" },
> +               { "fetch.unpackLimit", "1" },
> +               { "fetch.writeCommitGraph", "false" },
> +#ifdef WIN32
> +               { "http.sslBackend", "schannel" },
> +#endif
> +               { "status.aheadBehind", "false" },
> +               { "commitGraph.generationVersion", "1" },
> +               { "core.autoCRLF", "false" },
> +               { "core.safeCRLF", "false" },
> +               { NULL, NULL },
> +       };

Are there easy-ish ways for other groups of users to adopt scalar but
change the list of config settings (e.g. index.version and
merge.renames) in some common way for all those users?
Elijah Newren Sept. 28, 2021, 5:05 a.m. UTC | #2
Sorry, one more thing...

On Tue, Sep 14, 2021 at 7:39 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
...
> +               /* check if currently in enlistment root with src/ workdir */
> +               strbuf_addstr(&path, "/src/.git");
> +               if (is_git_directory(path.buf)) {

...and...

> +               /* check if currently in workdir */
> +               strbuf_addstr(&path, "/.git");
> +               if (is_git_directory(path.buf)) {

Do these two checks suggest that only a primary worktree can be
enlisted with scalar?  (Is git-worktree generally incompatible?)
Ævar Arnfjörð Bjarmason Sept. 28, 2021, 7:27 a.m. UTC | #3
On Mon, Sep 27 2021, Elijah Newren wrote:

> On Tue, Sep 14, 2021 at 7:39 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> ...
>> +               { "pack.useBitmaps", "false" },
>
> I don't understand anything bitmap related, but I thought they were
> performance related, so I'm surprised by this one.  Is there a reason
> for this one?  (Is it handled by maintenance instead?)

I don't know why Derrick did this, but there's still AFAIK cases where
bitmaps are worse than not in the context of a client (the scalar
use-case), see the rabbit hole starting at:
https://lore.kernel.org/git/878s6nfq54.fsf@evledraar.gmail.com/
Johannes Schindelin Oct. 6, 2021, 8:32 p.m. UTC | #4
Hi Elijah,

On Mon, 27 Sep 2021, Elijah Newren wrote:

> On Tue, Sep 14, 2021 at 7:39 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> ...
> > +static int set_recommended_config(void)
> > +{
> > +       struct {
> > +               const char *key;
> > +               const char *value;
> > +       } config[] = {
> > +               { "am.keepCR", "true" },
> > +               { "core.FSCache", "true" },
> > +               { "core.multiPackIndex", "true" },
> > +               { "core.preloadIndex", "true" },
> > +#ifndef WIN32
> > +               { "core.untrackedCache", "true" },
> > +#else
> > +               /*
> > +                * Unfortunately, Scalar's Functional Tests demonstrated
> > +                * that the untracked cache feature is unreliable on Windows
> > +                * (which is a bummer because that platform would benefit the
> > +                * most from it). For some reason, freshly created files seem
> > +                * not to update the directory's `lastModified` time
> > +                * immediately, but the untracked cache would need to rely on
> > +                * that.
> > +                *
> > +                * Therefore, with a sad heart, we disable this very useful
> > +                * feature on Windows.
> > +                */
> > +               { "core.untrackedCache", "false" },
> > +#endif
>
> Interesting.  (I'm somewhat leery of the untrackedCache just knowing
> that it used to operate despite an exponential number of visits to
> files (exponential in depth of directories) and getting different
> answers with different visits, making me feel like it was black magic
> that it ever worked and wondering what kind of corner case issues
> still lurk with it.  See e.g.
> https://lore.kernel.org/git/CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@mail.gmail.com/)

The implementation of the untracked cache certainly is quite a challenge
to wrap one's head around, for sure. However, it does manage to speed up
operations substantially (when it works).

The real fun starts when you turn on the FSMonitor, though. Then it is
reliable, all of a sudden! The reason seems to be some sort of delayed
lastModified (AKA mtime) evaluation which is somehow triggered by
FSMonitor ;-)

So in microsoft/git, where we include FSMonitor and turn it on as part of
`scalar clone`, we also enable the untracked cache, for noticeably happier
users.

> > +               { "core.logAllRefUpdates", "true" },
> > +               { "credential.https://dev.azure.com.useHttpPath", "true" },
>
> Not only opinionated, but special configuration for certain sites?
> I'm not complaining, just slightly surprised.

Yes. I am not aware of other sites where you would want to use different
credentials depending on the URL path, but Azure DevOps definitely is such
a site, and therefore needs `useHttpPath`. Rather than requiring users to
know this, we set it for them.

> > +               { "credential.validate", "false" }, /* GCM4W-only */
> > +               { "gc.auto", "0" },
> > +               { "gui.GCWarning", "false" },
> > +               { "index.threads", "true" },
> > +               { "index.version", "4" },
>
> I take it your users don't make use of jgit?

Nope ;-) I doubt that the features we use to make Git scalable are
implemented in JGit.

> (Users aren't using jgit directly here, at least not to my knowledge,
> but multiple gradle plugins do.)  I tried turning this on a while back,
> and quickly got multiple reports of problems because jgit didn't
> understand the index. I had to turn it off and send out various PSAs on
> how to recover.

TBH it gives me shivers of dread thinking about large
repositories/worktrees being handled within a Java VM. The amount of,
let's call it "non-canonical" code, required by JGit to make it somewhat
performant, is staggering. Just think about the way you have to emulate
mmap()ing part of a packfile and interpreting it as a packed C struct. I
forgot the details, of course, and I am quite glad that I did.

> > +               { "merge.stat", "false" },
> > +               { "merge.renames", "false" },
>
> Is this just historical and not needed anymore, is it here just for a
> little longer and you are planning on transitioning away from this, or
> are you still set on this setting?

It is here mostly for historical reasons.

> > +               { "pack.useBitmaps", "false" },
>
> I don't understand anything bitmap related, but I thought they were
> performance related, so I'm surprised by this one.  Is there a reason
> for this one?  (Is it handled by maintenance instead?)

Again, this is here for historical reasons. Scalar sets this, and my goal
with this patch series is to port it from .NET to C. So I did not question
the reasoning.

My _guess_ however is that bitmaps really only work well when everything
is in one single pack. Which is rather not the case with Scalar
enlistments: they are way too large to be repacked all the time.

> > +               { "pack.useSparse", "true" },
> > +               { "receive.autoGC", "false" },
> > +               { "reset.quiet", "true" },
> > +               { "feature.manyFiles", "false" },
>
> If you simply set core.untrackedCache to false _after_ setting
> feature.manyFiles to true, would it make sense to switch this?  (Or
> does it matter, since you've already individually set all the config
> settings that this one would set?)

Frankly, I was a bit puzzled why `feature.manyFiles` was set to `false`.
The rationale is explained in
https://github.com/microsoft/scalar/commit/2fc84dba9c95:

	The feature.* config settings change the defaults for some other
	config settings. We already monitor config settings pretty carefully,
	so let's disable these.

As to switching this, it shouldn't matter. The idea of `feature.*` is to
set defaults, but not override any explicitly configured settings.

> > +               { "feature.experimental", "false" },
> > +               { "fetch.unpackLimit", "1" },
> > +               { "fetch.writeCommitGraph", "false" },
> > +#ifdef WIN32
> > +               { "http.sslBackend", "schannel" },
> > +#endif
> > +               { "status.aheadBehind", "false" },
> > +               { "commitGraph.generationVersion", "1" },
> > +               { "core.autoCRLF", "false" },
> > +               { "core.safeCRLF", "false" },
> > +               { NULL, NULL },
> > +       };
>
> Are there easy-ish ways for other groups of users to adopt scalar but
> change the list of config settings (e.g. index.version and
> merge.renames) in some common way for all those users?

Not in Scalar.

I would hope, however, that we could figure out ways to make this more
configurable when re-implementing this functionality in core Git. I have a
couple ideas, but nothing fleshed out, and besides, I do not want to think
too far ahead, I already made that mistake and then got bogged down in
discussions about minimal vs non-minimal changes in the top-level Makefile
;-)

So yeah, good point, but it's probably not a good time yet to discuss this
tangent.

Thank you for reviewing,
Dscho
Johannes Schindelin Oct. 6, 2021, 8:38 p.m. UTC | #5
Hi Elijah,

On Mon, 27 Sep 2021, Elijah Newren wrote:

> Sorry, one more thing...
>
> On Tue, Sep 14, 2021 at 7:39 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> ...
> > +               /* check if currently in enlistment root with src/ workdir */
> > +               strbuf_addstr(&path, "/src/.git");
> > +               if (is_git_directory(path.buf)) {
>
> ...and...
>
> > +               /* check if currently in workdir */
> > +               strbuf_addstr(&path, "/.git");
> > +               if (is_git_directory(path.buf)) {
>
> Do these two checks suggest that only a primary worktree can be
> enlisted with scalar?  (Is git-worktree generally incompatible?)

Good point! I think we'll need to use `is_nonbare_repository_dir()`
instead.

This also has the additional benefit of doing away with quite a bit of
`/.git` appending and undoing it. I.e. it simplifies the code
dramatically.

Ciao,
Dscho
diff mbox series

Patch

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 7cff29e0fcd..0e627bb100e 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -5,11 +5,266 @@ 
 #include "cache.h"
 #include "gettext.h"
 #include "parse-options.h"
+#include "config.h"
+#include "run-command.h"
+
+/*
+ * Remove the deepest subdirectory in the provided path string. Path must not
+ * include a trailing path separator. Returns 1 if parent directory found,
+ * otherwise 0.
+ */
+static int strbuf_parent_directory(struct strbuf *buf)
+{
+	size_t len = buf->len;
+	size_t offset = offset_1st_component(buf->buf);
+	char *path_sep = find_last_dir_sep(buf->buf + offset);
+	strbuf_setlen(buf, path_sep ? path_sep - buf->buf : offset);
+
+	return buf->len < len;
+}
+
+static void setup_enlistment_directory(int argc, const char **argv,
+				       const char * const *usagestr,
+				       const struct option *options,
+				       struct strbuf *enlistment_root)
+{
+	struct strbuf path = STRBUF_INIT;
+	char *root;
+	int enlistment_found = 0;
+
+	if (startup_info->have_repository)
+		BUG("gitdir already set up?!?");
+
+	if (argc > 1)
+		usage_with_options(usagestr, options);
+
+	/* find the worktree, determine its corresponding root */
+	if (argc == 1)
+		strbuf_add_absolute_path(&path, argv[0]);
+	else if (strbuf_getcwd(&path) < 0)
+		die(_("need a working directory"));
+
+	strbuf_trim_trailing_dir_sep(&path);
+	do {
+		const size_t len = path.len;
+
+		/* check if currently in enlistment root with src/ workdir */
+		strbuf_addstr(&path, "/src/.git");
+		if (is_git_directory(path.buf)) {
+			strbuf_strip_suffix(&path, "/.git");
+
+			if (enlistment_root)
+				strbuf_add(enlistment_root, path.buf, len);
+
+			enlistment_found = 1;
+			break;
+		}
+
+		/* reset to original path */
+		strbuf_setlen(&path, len);
+
+		/* check if currently in workdir */
+		strbuf_addstr(&path, "/.git");
+		if (is_git_directory(path.buf)) {
+			strbuf_setlen(&path, len);
+
+			if (enlistment_root) {
+				/*
+				 * If the worktree's directory's name is `src`, the enlistment is the
+				 * parent directory, otherwise it is identical to the worktree.
+				 */
+				root = strip_path_suffix(path.buf, "src");
+				strbuf_addstr(enlistment_root, root ? root : path.buf);
+				free(root);
+			}
+
+			enlistment_found = 1;
+			break;
+		}
+
+		strbuf_setlen(&path, len);
+	} while (strbuf_parent_directory(&path));
+
+	if (!enlistment_found)
+		die(_("could not find enlistment root"));
+
+	if (chdir(path.buf) < 0)
+		die_errno(_("could not switch to '%s'"), path.buf);
+
+	strbuf_release(&path);
+	setup_git_directory();
+}
+
+static int run_git(const char *arg, ...)
+{
+	struct strvec argv = STRVEC_INIT;
+	va_list args;
+	const char *p;
+	int res;
+
+	va_start(args, arg);
+	strvec_push(&argv, arg);
+	while ((p = va_arg(args, const char *)))
+		strvec_push(&argv, p);
+	va_end(args);
+
+	res = run_command_v_opt(argv.v, RUN_GIT_CMD);
+
+	strvec_clear(&argv);
+	return res;
+}
+
+static int set_recommended_config(void)
+{
+	struct {
+		const char *key;
+		const char *value;
+	} config[] = {
+		{ "am.keepCR", "true" },
+		{ "core.FSCache", "true" },
+		{ "core.multiPackIndex", "true" },
+		{ "core.preloadIndex", "true" },
+#ifndef WIN32
+		{ "core.untrackedCache", "true" },
+#else
+		/*
+		 * Unfortunately, Scalar's Functional Tests demonstrated
+		 * that the untracked cache feature is unreliable on Windows
+		 * (which is a bummer because that platform would benefit the
+		 * most from it). For some reason, freshly created files seem
+		 * not to update the directory's `lastModified` time
+		 * immediately, but the untracked cache would need to rely on
+		 * that.
+		 *
+		 * Therefore, with a sad heart, we disable this very useful
+		 * feature on Windows.
+		 */
+		{ "core.untrackedCache", "false" },
+#endif
+		{ "core.logAllRefUpdates", "true" },
+		{ "credential.https://dev.azure.com.useHttpPath", "true" },
+		{ "credential.validate", "false" }, /* GCM4W-only */
+		{ "gc.auto", "0" },
+		{ "gui.GCWarning", "false" },
+		{ "index.threads", "true" },
+		{ "index.version", "4" },
+		{ "merge.stat", "false" },
+		{ "merge.renames", "false" },
+		{ "pack.useBitmaps", "false" },
+		{ "pack.useSparse", "true" },
+		{ "receive.autoGC", "false" },
+		{ "reset.quiet", "true" },
+		{ "feature.manyFiles", "false" },
+		{ "feature.experimental", "false" },
+		{ "fetch.unpackLimit", "1" },
+		{ "fetch.writeCommitGraph", "false" },
+#ifdef WIN32
+		{ "http.sslBackend", "schannel" },
+#endif
+		{ "status.aheadBehind", "false" },
+		{ "commitGraph.generationVersion", "1" },
+		{ "core.autoCRLF", "false" },
+		{ "core.safeCRLF", "false" },
+		{ NULL, NULL },
+	};
+	int i;
+	char *value;
+
+	for (i = 0; config[i].key; i++) {
+		if (git_config_get_string(config[i].key, &value)) {
+			trace2_data_string("scalar", the_repository, config[i].key, "created");
+			if (git_config_set_gently(config[i].key,
+						  config[i].value) < 0)
+				return error(_("could not configure %s=%s"),
+					     config[i].key, config[i].value);
+		} else {
+			trace2_data_string("scalar", the_repository, config[i].key, "exists");
+			free(value);
+		}
+	}
+
+	/*
+	 * The `log.excludeDecoration` setting is special because it allows
+	 * for multiple values.
+	 */
+	if (git_config_get_string("log.excludeDecoration", &value)) {
+		trace2_data_string("scalar", the_repository,
+				   "log.excludeDecoration", "created");
+		if (git_config_set_multivar_gently("log.excludeDecoration",
+						   "refs/prefetch/*",
+						   CONFIG_REGEX_NONE, 0))
+			return error(_("could not configure "
+				       "log.excludeDecoration"));
+	} else {
+		trace2_data_string("scalar", the_repository,
+				   "log.excludeDecoration", "exists");
+		free(value);
+	}
+
+	return 0;
+}
+
+static int start_maintenance(void)
+{
+	return run_git("maintenance", "start", NULL);
+}
+
+static int add_enlistment(void)
+{
+	int res;
+
+	if (!the_repository->worktree)
+		die(_("Scalar enlistments require a worktree"));
+
+	res = run_git("config", "--global", "--get", "--fixed-value",
+		      "scalar.repo", the_repository->worktree, NULL);
+
+	/*
+	 * If the setting is already there, then do nothing.
+	 */
+	if (!res)
+		return 0;
+
+	return run_git("config", "--global", "--add",
+		       "scalar.repo", the_repository->worktree, NULL);
+}
+
+static int register_dir(void)
+{
+	int res = add_enlistment();
+
+	if (!res)
+		res = set_recommended_config();
+
+	if (!res)
+		res = start_maintenance();
+
+	return res;
+}
+
+static int cmd_register(int argc, const char **argv)
+{
+	struct option options[] = {
+		OPT_END(),
+	};
+	const char * const usage[] = {
+		N_("scalar register [<enlistment>]"),
+		NULL
+	};
+
+	argc = parse_options(argc, argv, NULL, options,
+			     usage, 0);
+
+	setup_enlistment_directory(argc, argv, usage, options, NULL);
+
+	return register_dir();
+}
 
 static struct {
 	const char *name;
 	int (*fn)(int, const char **);
 } builtins[] = {
+	{ "register", cmd_register },
 	{ NULL, NULL},
 };
 
diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
index 5f7131861a5..568987064b2 100644
--- a/contrib/scalar/scalar.txt
+++ b/contrib/scalar/scalar.txt
@@ -8,7 +8,7 @@  scalar - an opinionated repository management tool
 SYNOPSIS
 --------
 [verse]
-scalar <command> [<options>]
+scalar register [<enlistment>]
 
 DESCRIPTION
 -----------
@@ -29,6 +29,22 @@  will be identical to the worktree.
 The `scalar` command implements various subcommands, and different options
 depending on the subcommand.
 
+COMMANDS
+--------
+
+Register
+~~~~~~~~
+
+register [<enlistment>]::
+	Adds the enlistment's repository to the list of registered repositories
+	and starts background maintenance. If `<enlistment>` is not provided,
+	then the enlistment associated with the current working directory is
+	registered.
++
+Note: when this subcommand is called in a worktree that is called `src/`, its
+parent directory is considered to be the Scalar enlistment. If the worktree is
+_not_ called `src/`, it itself will be considered to be the Scalar enlistment.
+
 SEE ALSO
 --------
 linkgit:git-maintenance[1].