Message ID | bbbc4c3339043bcd718dd2defcbaaaac2092227a.1631630356.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Upstreaming the Scalar command | expand |
On Tue, Sep 14, 2021 at 7:39 AM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: ... > +static int set_recommended_config(void) > +{ > + struct { > + const char *key; > + const char *value; > + } config[] = { > + { "am.keepCR", "true" }, > + { "core.FSCache", "true" }, > + { "core.multiPackIndex", "true" }, > + { "core.preloadIndex", "true" }, > +#ifndef WIN32 > + { "core.untrackedCache", "true" }, > +#else > + /* > + * Unfortunately, Scalar's Functional Tests demonstrated > + * that the untracked cache feature is unreliable on Windows > + * (which is a bummer because that platform would benefit the > + * most from it). For some reason, freshly created files seem > + * not to update the directory's `lastModified` time > + * immediately, but the untracked cache would need to rely on > + * that. > + * > + * Therefore, with a sad heart, we disable this very useful > + * feature on Windows. > + */ > + { "core.untrackedCache", "false" }, > +#endif Interesting. (I'm somewhat leery of the untrackedCache just knowing that it used to operate despite an exponential number of visits to files (exponential in depth of directories) and getting different answers with different visits, making me feel like it was black magic that it ever worked and wondering what kind of corner case issues still lurk with it. See e.g. https://lore.kernel.org/git/CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@mail.gmail.com/) > + { "core.logAllRefUpdates", "true" }, > + { "credential.https://dev.azure.com.useHttpPath", "true" }, Not only opinionated, but special configuration for certain sites? I'm not complaining, just slightly surprised. > + { "credential.validate", "false" }, /* GCM4W-only */ > + { "gc.auto", "0" }, > + { "gui.GCWarning", "false" }, > + { "index.threads", "true" }, > + { "index.version", "4" }, I take it your users don't make use of jgit? (Users aren't using jgit directly here, at least not to my knowledge, but multiple gradle plugins do.) I tried turning this on a while back, and quickly got multiple reports of problems because jgit didn't understand the index. I had to turn it off and send out various PSAs on how to recover. > + { "merge.stat", "false" }, > + { "merge.renames", "false" }, Is this just historical and not needed anymore, is it here just for a little longer and you are planning on transitioning away from this, or are you still set on this setting? > + { "pack.useBitmaps", "false" }, I don't understand anything bitmap related, but I thought they were performance related, so I'm surprised by this one. Is there a reason for this one? (Is it handled by maintenance instead?) > + { "pack.useSparse", "true" }, > + { "receive.autoGC", "false" }, > + { "reset.quiet", "true" }, > + { "feature.manyFiles", "false" }, If you simply set core.untrackedCache to false _after_ setting feature.manyFiles to true, would it make sense to switch this? (Or does it matter, since you've already individually set all the config settings that this one would set?) > + { "feature.experimental", "false" }, > + { "fetch.unpackLimit", "1" }, > + { "fetch.writeCommitGraph", "false" }, > +#ifdef WIN32 > + { "http.sslBackend", "schannel" }, > +#endif > + { "status.aheadBehind", "false" }, > + { "commitGraph.generationVersion", "1" }, > + { "core.autoCRLF", "false" }, > + { "core.safeCRLF", "false" }, > + { NULL, NULL }, > + }; Are there easy-ish ways for other groups of users to adopt scalar but change the list of config settings (e.g. index.version and merge.renames) in some common way for all those users?
Sorry, one more thing... On Tue, Sep 14, 2021 at 7:39 AM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: ... > + /* check if currently in enlistment root with src/ workdir */ > + strbuf_addstr(&path, "/src/.git"); > + if (is_git_directory(path.buf)) { ...and... > + /* check if currently in workdir */ > + strbuf_addstr(&path, "/.git"); > + if (is_git_directory(path.buf)) { Do these two checks suggest that only a primary worktree can be enlisted with scalar? (Is git-worktree generally incompatible?)
On Mon, Sep 27 2021, Elijah Newren wrote: > On Tue, Sep 14, 2021 at 7:39 AM Derrick Stolee via GitGitGadget > <gitgitgadget@gmail.com> wrote: > ... >> + { "pack.useBitmaps", "false" }, > > I don't understand anything bitmap related, but I thought they were > performance related, so I'm surprised by this one. Is there a reason > for this one? (Is it handled by maintenance instead?) I don't know why Derrick did this, but there's still AFAIK cases where bitmaps are worse than not in the context of a client (the scalar use-case), see the rabbit hole starting at: https://lore.kernel.org/git/878s6nfq54.fsf@evledraar.gmail.com/
Hi Elijah, On Mon, 27 Sep 2021, Elijah Newren wrote: > On Tue, Sep 14, 2021 at 7:39 AM Derrick Stolee via GitGitGadget > <gitgitgadget@gmail.com> wrote: > ... > > +static int set_recommended_config(void) > > +{ > > + struct { > > + const char *key; > > + const char *value; > > + } config[] = { > > + { "am.keepCR", "true" }, > > + { "core.FSCache", "true" }, > > + { "core.multiPackIndex", "true" }, > > + { "core.preloadIndex", "true" }, > > +#ifndef WIN32 > > + { "core.untrackedCache", "true" }, > > +#else > > + /* > > + * Unfortunately, Scalar's Functional Tests demonstrated > > + * that the untracked cache feature is unreliable on Windows > > + * (which is a bummer because that platform would benefit the > > + * most from it). For some reason, freshly created files seem > > + * not to update the directory's `lastModified` time > > + * immediately, but the untracked cache would need to rely on > > + * that. > > + * > > + * Therefore, with a sad heart, we disable this very useful > > + * feature on Windows. > > + */ > > + { "core.untrackedCache", "false" }, > > +#endif > > Interesting. (I'm somewhat leery of the untrackedCache just knowing > that it used to operate despite an exponential number of visits to > files (exponential in depth of directories) and getting different > answers with different visits, making me feel like it was black magic > that it ever worked and wondering what kind of corner case issues > still lurk with it. See e.g. > https://lore.kernel.org/git/CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@mail.gmail.com/) The implementation of the untracked cache certainly is quite a challenge to wrap one's head around, for sure. However, it does manage to speed up operations substantially (when it works). The real fun starts when you turn on the FSMonitor, though. Then it is reliable, all of a sudden! The reason seems to be some sort of delayed lastModified (AKA mtime) evaluation which is somehow triggered by FSMonitor ;-) So in microsoft/git, where we include FSMonitor and turn it on as part of `scalar clone`, we also enable the untracked cache, for noticeably happier users. > > + { "core.logAllRefUpdates", "true" }, > > + { "credential.https://dev.azure.com.useHttpPath", "true" }, > > Not only opinionated, but special configuration for certain sites? > I'm not complaining, just slightly surprised. Yes. I am not aware of other sites where you would want to use different credentials depending on the URL path, but Azure DevOps definitely is such a site, and therefore needs `useHttpPath`. Rather than requiring users to know this, we set it for them. > > + { "credential.validate", "false" }, /* GCM4W-only */ > > + { "gc.auto", "0" }, > > + { "gui.GCWarning", "false" }, > > + { "index.threads", "true" }, > > + { "index.version", "4" }, > > I take it your users don't make use of jgit? Nope ;-) I doubt that the features we use to make Git scalable are implemented in JGit. > (Users aren't using jgit directly here, at least not to my knowledge, > but multiple gradle plugins do.) I tried turning this on a while back, > and quickly got multiple reports of problems because jgit didn't > understand the index. I had to turn it off and send out various PSAs on > how to recover. TBH it gives me shivers of dread thinking about large repositories/worktrees being handled within a Java VM. The amount of, let's call it "non-canonical" code, required by JGit to make it somewhat performant, is staggering. Just think about the way you have to emulate mmap()ing part of a packfile and interpreting it as a packed C struct. I forgot the details, of course, and I am quite glad that I did. > > + { "merge.stat", "false" }, > > + { "merge.renames", "false" }, > > Is this just historical and not needed anymore, is it here just for a > little longer and you are planning on transitioning away from this, or > are you still set on this setting? It is here mostly for historical reasons. > > + { "pack.useBitmaps", "false" }, > > I don't understand anything bitmap related, but I thought they were > performance related, so I'm surprised by this one. Is there a reason > for this one? (Is it handled by maintenance instead?) Again, this is here for historical reasons. Scalar sets this, and my goal with this patch series is to port it from .NET to C. So I did not question the reasoning. My _guess_ however is that bitmaps really only work well when everything is in one single pack. Which is rather not the case with Scalar enlistments: they are way too large to be repacked all the time. > > + { "pack.useSparse", "true" }, > > + { "receive.autoGC", "false" }, > > + { "reset.quiet", "true" }, > > + { "feature.manyFiles", "false" }, > > If you simply set core.untrackedCache to false _after_ setting > feature.manyFiles to true, would it make sense to switch this? (Or > does it matter, since you've already individually set all the config > settings that this one would set?) Frankly, I was a bit puzzled why `feature.manyFiles` was set to `false`. The rationale is explained in https://github.com/microsoft/scalar/commit/2fc84dba9c95: The feature.* config settings change the defaults for some other config settings. We already monitor config settings pretty carefully, so let's disable these. As to switching this, it shouldn't matter. The idea of `feature.*` is to set defaults, but not override any explicitly configured settings. > > + { "feature.experimental", "false" }, > > + { "fetch.unpackLimit", "1" }, > > + { "fetch.writeCommitGraph", "false" }, > > +#ifdef WIN32 > > + { "http.sslBackend", "schannel" }, > > +#endif > > + { "status.aheadBehind", "false" }, > > + { "commitGraph.generationVersion", "1" }, > > + { "core.autoCRLF", "false" }, > > + { "core.safeCRLF", "false" }, > > + { NULL, NULL }, > > + }; > > Are there easy-ish ways for other groups of users to adopt scalar but > change the list of config settings (e.g. index.version and > merge.renames) in some common way for all those users? Not in Scalar. I would hope, however, that we could figure out ways to make this more configurable when re-implementing this functionality in core Git. I have a couple ideas, but nothing fleshed out, and besides, I do not want to think too far ahead, I already made that mistake and then got bogged down in discussions about minimal vs non-minimal changes in the top-level Makefile ;-) So yeah, good point, but it's probably not a good time yet to discuss this tangent. Thank you for reviewing, Dscho
Hi Elijah, On Mon, 27 Sep 2021, Elijah Newren wrote: > Sorry, one more thing... > > On Tue, Sep 14, 2021 at 7:39 AM Derrick Stolee via GitGitGadget > <gitgitgadget@gmail.com> wrote: > ... > > + /* check if currently in enlistment root with src/ workdir */ > > + strbuf_addstr(&path, "/src/.git"); > > + if (is_git_directory(path.buf)) { > > ...and... > > > + /* check if currently in workdir */ > > + strbuf_addstr(&path, "/.git"); > > + if (is_git_directory(path.buf)) { > > Do these two checks suggest that only a primary worktree can be > enlisted with scalar? (Is git-worktree generally incompatible?) Good point! I think we'll need to use `is_nonbare_repository_dir()` instead. This also has the additional benefit of doing away with quite a bit of `/.git` appending and undoing it. I.e. it simplifies the code dramatically. Ciao, Dscho
diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c index 7cff29e0fcd..0e627bb100e 100644 --- a/contrib/scalar/scalar.c +++ b/contrib/scalar/scalar.c @@ -5,11 +5,266 @@ #include "cache.h" #include "gettext.h" #include "parse-options.h" +#include "config.h" +#include "run-command.h" + +/* + * Remove the deepest subdirectory in the provided path string. Path must not + * include a trailing path separator. Returns 1 if parent directory found, + * otherwise 0. + */ +static int strbuf_parent_directory(struct strbuf *buf) +{ + size_t len = buf->len; + size_t offset = offset_1st_component(buf->buf); + char *path_sep = find_last_dir_sep(buf->buf + offset); + strbuf_setlen(buf, path_sep ? path_sep - buf->buf : offset); + + return buf->len < len; +} + +static void setup_enlistment_directory(int argc, const char **argv, + const char * const *usagestr, + const struct option *options, + struct strbuf *enlistment_root) +{ + struct strbuf path = STRBUF_INIT; + char *root; + int enlistment_found = 0; + + if (startup_info->have_repository) + BUG("gitdir already set up?!?"); + + if (argc > 1) + usage_with_options(usagestr, options); + + /* find the worktree, determine its corresponding root */ + if (argc == 1) + strbuf_add_absolute_path(&path, argv[0]); + else if (strbuf_getcwd(&path) < 0) + die(_("need a working directory")); + + strbuf_trim_trailing_dir_sep(&path); + do { + const size_t len = path.len; + + /* check if currently in enlistment root with src/ workdir */ + strbuf_addstr(&path, "/src/.git"); + if (is_git_directory(path.buf)) { + strbuf_strip_suffix(&path, "/.git"); + + if (enlistment_root) + strbuf_add(enlistment_root, path.buf, len); + + enlistment_found = 1; + break; + } + + /* reset to original path */ + strbuf_setlen(&path, len); + + /* check if currently in workdir */ + strbuf_addstr(&path, "/.git"); + if (is_git_directory(path.buf)) { + strbuf_setlen(&path, len); + + if (enlistment_root) { + /* + * If the worktree's directory's name is `src`, the enlistment is the + * parent directory, otherwise it is identical to the worktree. + */ + root = strip_path_suffix(path.buf, "src"); + strbuf_addstr(enlistment_root, root ? root : path.buf); + free(root); + } + + enlistment_found = 1; + break; + } + + strbuf_setlen(&path, len); + } while (strbuf_parent_directory(&path)); + + if (!enlistment_found) + die(_("could not find enlistment root")); + + if (chdir(path.buf) < 0) + die_errno(_("could not switch to '%s'"), path.buf); + + strbuf_release(&path); + setup_git_directory(); +} + +static int run_git(const char *arg, ...) +{ + struct strvec argv = STRVEC_INIT; + va_list args; + const char *p; + int res; + + va_start(args, arg); + strvec_push(&argv, arg); + while ((p = va_arg(args, const char *))) + strvec_push(&argv, p); + va_end(args); + + res = run_command_v_opt(argv.v, RUN_GIT_CMD); + + strvec_clear(&argv); + return res; +} + +static int set_recommended_config(void) +{ + struct { + const char *key; + const char *value; + } config[] = { + { "am.keepCR", "true" }, + { "core.FSCache", "true" }, + { "core.multiPackIndex", "true" }, + { "core.preloadIndex", "true" }, +#ifndef WIN32 + { "core.untrackedCache", "true" }, +#else + /* + * Unfortunately, Scalar's Functional Tests demonstrated + * that the untracked cache feature is unreliable on Windows + * (which is a bummer because that platform would benefit the + * most from it). For some reason, freshly created files seem + * not to update the directory's `lastModified` time + * immediately, but the untracked cache would need to rely on + * that. + * + * Therefore, with a sad heart, we disable this very useful + * feature on Windows. + */ + { "core.untrackedCache", "false" }, +#endif + { "core.logAllRefUpdates", "true" }, + { "credential.https://dev.azure.com.useHttpPath", "true" }, + { "credential.validate", "false" }, /* GCM4W-only */ + { "gc.auto", "0" }, + { "gui.GCWarning", "false" }, + { "index.threads", "true" }, + { "index.version", "4" }, + { "merge.stat", "false" }, + { "merge.renames", "false" }, + { "pack.useBitmaps", "false" }, + { "pack.useSparse", "true" }, + { "receive.autoGC", "false" }, + { "reset.quiet", "true" }, + { "feature.manyFiles", "false" }, + { "feature.experimental", "false" }, + { "fetch.unpackLimit", "1" }, + { "fetch.writeCommitGraph", "false" }, +#ifdef WIN32 + { "http.sslBackend", "schannel" }, +#endif + { "status.aheadBehind", "false" }, + { "commitGraph.generationVersion", "1" }, + { "core.autoCRLF", "false" }, + { "core.safeCRLF", "false" }, + { NULL, NULL }, + }; + int i; + char *value; + + for (i = 0; config[i].key; i++) { + if (git_config_get_string(config[i].key, &value)) { + trace2_data_string("scalar", the_repository, config[i].key, "created"); + if (git_config_set_gently(config[i].key, + config[i].value) < 0) + return error(_("could not configure %s=%s"), + config[i].key, config[i].value); + } else { + trace2_data_string("scalar", the_repository, config[i].key, "exists"); + free(value); + } + } + + /* + * The `log.excludeDecoration` setting is special because it allows + * for multiple values. + */ + if (git_config_get_string("log.excludeDecoration", &value)) { + trace2_data_string("scalar", the_repository, + "log.excludeDecoration", "created"); + if (git_config_set_multivar_gently("log.excludeDecoration", + "refs/prefetch/*", + CONFIG_REGEX_NONE, 0)) + return error(_("could not configure " + "log.excludeDecoration")); + } else { + trace2_data_string("scalar", the_repository, + "log.excludeDecoration", "exists"); + free(value); + } + + return 0; +} + +static int start_maintenance(void) +{ + return run_git("maintenance", "start", NULL); +} + +static int add_enlistment(void) +{ + int res; + + if (!the_repository->worktree) + die(_("Scalar enlistments require a worktree")); + + res = run_git("config", "--global", "--get", "--fixed-value", + "scalar.repo", the_repository->worktree, NULL); + + /* + * If the setting is already there, then do nothing. + */ + if (!res) + return 0; + + return run_git("config", "--global", "--add", + "scalar.repo", the_repository->worktree, NULL); +} + +static int register_dir(void) +{ + int res = add_enlistment(); + + if (!res) + res = set_recommended_config(); + + if (!res) + res = start_maintenance(); + + return res; +} + +static int cmd_register(int argc, const char **argv) +{ + struct option options[] = { + OPT_END(), + }; + const char * const usage[] = { + N_("scalar register [<enlistment>]"), + NULL + }; + + argc = parse_options(argc, argv, NULL, options, + usage, 0); + + setup_enlistment_directory(argc, argv, usage, options, NULL); + + return register_dir(); +} static struct { const char *name; int (*fn)(int, const char **); } builtins[] = { + { "register", cmd_register }, { NULL, NULL}, }; diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt index 5f7131861a5..568987064b2 100644 --- a/contrib/scalar/scalar.txt +++ b/contrib/scalar/scalar.txt @@ -8,7 +8,7 @@ scalar - an opinionated repository management tool SYNOPSIS -------- [verse] -scalar <command> [<options>] +scalar register [<enlistment>] DESCRIPTION ----------- @@ -29,6 +29,22 @@ will be identical to the worktree. The `scalar` command implements various subcommands, and different options depending on the subcommand. +COMMANDS +-------- + +Register +~~~~~~~~ + +register [<enlistment>]:: + Adds the enlistment's repository to the list of registered repositories + and starts background maintenance. If `<enlistment>` is not provided, + then the enlistment associated with the current working directory is + registered. ++ +Note: when this subcommand is called in a worktree that is called `src/`, its +parent directory is considered to be the Scalar enlistment. If the worktree is +_not_ called `src/`, it itself will be considered to be the Scalar enlistment. + SEE ALSO -------- linkgit:git-maintenance[1].