Message ID | 20201007074538.25891-3-shouryashukla.oo@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | submodule: port subcommand add from shell to C | expand |
Shourya Shukla <shouryashukla.oo@gmail.com> writes: > From: Prathamesh Chavan <pc44800@gmail.com> > > Convert submodule subcommand 'add' to a builtin and call it via > 'git-submodule.sh'. > > Also, since the command die()s out in case of absence of commits in the > submodule, the keyword 'fatal' is prefixed in the error messages. > Therefore, prepend the keyword in the expected output of test t7400.6. > > While at it, eliminate the extra preprocessor directive > `#include "dir.h"` at the start of 'submodule--helper.c'. > > Mentored-by: Christian Couder <chriscool@tuxfamily.org> > Mentored-by: Stefan Beller <stefanbeller@gmail.com> > Signed-off-by: Prathamesh Chavan <pc44800@gmail.com> > Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com> > Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com> > --- > builtin/submodule--helper.c | 391 +++++++++++++++++++++++++++++++++++- > git-submodule.sh | 161 +-------------- > t/t7400-submodule-basic.sh | 2 +- > 3 files changed, 392 insertions(+), 162 deletions(-) Whoa. That looks like a huge change. Makes me wonder if we want this split into multiple pieces, but let's read on. > diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c > index de5ad73bb8..ec0a50d032 100644 > --- a/builtin/submodule--helper.c > +++ b/builtin/submodule--helper.c > @@ -19,7 +19,6 @@ > #include "diffcore.h" > #include "diff.h" > #include "object-store.h" > -#include "dir.h" > #include "advice.h" > > #define OPT_QUIET (1 << 0) > @@ -2744,6 +2743,395 @@ static int module_set_branch(int argc, const char **argv, const char *prefix) > return !!ret; > } > > +struct add_data { > + const char *prefix; > + const char *branch; > + const char *reference_path; > + const char *sm_path; > + const char *sm_name; > + const char *repo; > + const char *realrepo; > + int depth; > + unsigned int force: 1; > + unsigned int quiet: 1; > + unsigned int progress: 1; > + unsigned int dissociate: 1; > +}; > +#define ADD_DATA_INIT { NULL, NULL, NULL, NULL, NULL, NULL, NULL, 0, 0, 0, 0, 0 } This is used in a context like this: struct add_data data = ADD_DATA_INIT; It is a tangent, but wouldn't #define ADD_DATA_INIT { 0 } be a more appropriate way to express that there is nothing other than the initialization to zero values going on? > +/* > + * Guess dir name from repository: strip leading '.*[/:]', > + * strip trailing '[:/]*.git'. > + */ The original also strips trailing '/'. The original does these in order: - if $repo ends with '/', remove that. The above description does not mention it. - if the result of the above ends with zero or more ':', followed by zero or more '/', followed by ".git", drop the matching part. The above description sounds as if it will remove ":/:/:.git" from the end (and the code seems to have the same bug, as after_slash_or_colon won't allow the code to know if the previous character before ".git" was slash or colon). - if the result of the above has '/' or ':' in it, remove everything before it and '/' or ':' itself. > +static char *guess_dir_name(const char *repo) > +{ > + const char *p, *start, *end, *limit; > + int after_slash_or_colon; > + > + after_slash_or_colon = 0; > + limit = repo + strlen(repo); > + start = repo; > + end = limit; > + for (p = repo; p < limit; p++) { > + if (starts_with(p, ".git")) { > + /* strip trailing '[:/]*.git' */ > + if (!after_slash_or_colon) > + end = p; > + p += 3; > + } else if (*p == '/' || *p == ':') { > + /* strip leading '.*[/:]' */ > + if (end == limit) > + end = p; > + after_slash_or_colon = 1; > + } else if (after_slash_or_colon) { > + start = p; > + end = limit; > + after_slash_or_colon = 0; > + } > + } > + return xstrndup(start, end - start); So, this looks quite bogus and unnatural. Checking for ".git" at every position in the string is meaningless. I wonder if something along the following (beware: not even compile tested or checked for off-by-ones) would be easier to follow and more faithful conversion to the original. ep = repo + strlen(repo); /* * eat trailing slashes - a conversion less faithful to * the original may want to loop to cull duplicated trailing * slashes, but we can leave it as user-error for now. */ if (repo < ep - 1 && ep[-1] == '/') ep--; /* eat ":*/*\.git" at the tail */ if (repo < ep - 4 && !memcmp(".git", ep - 4, 4)) { ep -= 4; while (repo < ep - 1 && ep[-1] == '/') ep--; while (repo < ep - 1 && ep[-1] == ':') ep--; } /* find the last ':' or '/' */ for (sp = ep - 1; repo <= sp; sp--) { if (*sp == '/' || *sp == ':') break; } sp++; /* exclude '/' or ':' itself */ /* sp point at the beginning, and ep points at the end */ return xmemdupz(sp, ep - sp); > +} That's it for now; I didn't look at the remainder of this patch during this sitting before I have to move on, but I may revisit the rest at some other time. Thanks.
Shourya Shukla <shouryashukla.oo@gmail.com> writes: > +static void fprintf_submodule_remote(const char *str) The fact that the helper happens to use fprintf() to do its job is much less important than it writes to the standard error stream. Name it after what it does than how it does so. Is there a word that explains at a higher-level concept than "print to stderr" that this function tries to achieve? Same question for the name of the only caller of this function, modify_remote_v(). That name does not mean anything to readers other than that it futz with output from "remote -v" command, which is the least interesting piece of information. What does it try to achieve by using "remote -v"? Can we name the function after that? > +{ > + const char *p = str; > + const char *start; > + const char *end; > + char *name, *url; > + > + start = p; > + while (*p != ' ') > + p++; > + end = p; > + name = xstrndup(start, end - start); > + while(*p == ' ') > + p++; Perhaps make a small helper out of these seven lines, so that the caller can say something like p = str; name = parse_token(&p); url = parse_token(&p); This one you should be able to do without any extra allocation, though. Just write a parse_token() that finds start and length, prepare "char *name; int namelen" and the same pair for URL, and then fprintf(stderr, " %.*s\t%.*s\n", namelen, name, urllen, url); > + fprintf(stderr, " %s\t%s\n", name, url); > + free(name); > + free(url); > +} > +static int check_sm_exists(unsigned int force, const char *path) { > + > + int cache_pos, dir_in_cache = 0; Have a blank line here to separate decl and the first statement. > + if (read_cache() < 0) > + die(_("index file corrupt")); > + > + cache_pos = cache_name_pos(path, strlen(path)); > + if(cache_pos < 0 && (directory_exists_in_index(&the_index, > + path, strlen(path)) == index_directory)) > + dir_in_cache = 1; Funny line wrapping. Try to cut long line at an operator as close to the root of the parse tree (in this case, &&) as possible, i.e. if (cache_pos < 0 && directory_exists_in_index(&the_index, path, strlen(path)) == index_directory) It is OK to further wrap after == if the second line bothers you. A bigger question. Can the path be a regular file but at a higher stage because we are in the middle of a conflicted merge? We'd get cache_pos that is negative in that case, too, and we definitely would want to say the path already exists in the index in such a case, but ... > + > + if (!force) { > + if (cache_pos >= 0 || dir_in_cache) > + die(_("'%s' already exists in the index"), path); ... the current code may not trigger this die() in such a case, no? > + } else { > + struct cache_entry *ce = NULL; > + if (cache_pos >= 0) > + ce = the_index.cache[cache_pos]; > + if (dir_in_cache || (ce && !S_ISGITLINK(ce->ce_mode))) > + die(_("'%s' already exists in the index and is not a " > + "submodule"), path); Likewise here. cache_pos < 0 does not automatically mean it does not exist. It tells you that it does not exist as a merged entry. > + } > + return 0; > +} > + > +static void modify_remote_v(struct strbuf *sb) This roughly corresponds to this part of the original grep '(fetch)' | sed -e s,^," ", -e s,' (fetch)',, I actualy would suggest moving the "git remote -v" invocation and capturing of its output to this helper function and name it after what it does, which seems to be "show fetch remotes" to me. > +{ > + int i; > + for (i = 0; i < sb->len; i++) { > + const char *start = sb->buf + i; > + const char *end = start; > + while (sb->buf[i++] != '\n') > + end++; > + if (!strcmp("fetch", xstrndup(end - 6, 5))) The original makes sure the 'fetch' appears inside "()" but this does not. Any reason why we want to do it differently? > + fprintf_submodule_remote(xstrndup(start, end - start - 7)); The result of xstrndup() is leaking here. In any case, with a helper function like parse_token() suggested before, you can get rid of the fprintf_submodule_remote() helper and open code it here, without any temporary allocation and freeing. You'd have the start of each line of "git remote -v" output (so you know where it starts and it ends), and a parser that roughly does this: /* * cp points at the current location, and ep points the * end of the buffer. find the tail of the current string * and store its length in *len, skip over whitespaces and * return the location to be used as the new cp. */ const char *parse_token(char *cp, char *ep, int *len); and make the latter half of this function (former half would be spawning "remote -v" and capturing its output in sb) a loop whose body may look like { char *end, *name, *url, *tail; int namelen, urllen; end = strchrnul(start, '\n'); name = start; url = parse_token(name, end, &namelen); tail = parse_token(url, end, &urllen); if (!memcmp(tail, "(fetch)", 7)) fprintf(stderr, ...); /* see above */ start = *end ? end + 1 : end; } > +static int add_submodule(struct add_data *info) > +{ > + /* perhaps the path exists and is already a git repo, else clone it */ > + if (is_directory(info->sm_path)) { > + char *sub_git_path = xstrfmt("%s/.git", info->sm_path); > + if (is_directory(sub_git_path) || file_exists(sub_git_path)) > + printf(_("Adding existing repo at '%s' to the index\n"), > + info->sm_path); > + else > + die(_("'%s' already exists and is not a valid git repo"), > + info->sm_path); > + free(sub_git_path); > + } else { > + struct strvec clone_args = STRVEC_INIT; > + struct child_process cp = CHILD_PROCESS_INIT; > + char *submodule_git_dir = xstrfmt(".git/modules/%s", info->sm_name); > + > + if (is_directory(submodule_git_dir)) { > + if (!info->force) { As I said, it would be a better organization to have a helper function that does what is done from here ... > + struct child_process cp_rem = CHILD_PROCESS_INIT; > + struct strbuf sb_rem = STRBUF_INIT; > + cp_rem.git_cmd = 1; > + fprintf(stderr, _("A git directory for '%s' is " > + "found locally with remote(s):\n"), > + info->sm_name); > + strvec_pushf(&cp_rem.env_array, > + "GIT_DIR=%s", submodule_git_dir); > + strvec_push(&cp_rem.env_array, "GIT_WORK_TREE=."); > + strvec_pushl(&cp_rem.args, "remote", "-v", NULL); > + if (!capture_command(&cp_rem, &sb_rem, 0)) { > + modify_remote_v(&sb_rem); > + } ... up to here. Can you say what the purpose of that helper function? I'd say it is for a given git repository (specified by submodule_git_dir, which we will pass as the parameter to that helper), report the names and URLs of fetch remotes defined in that repository. So, perhaps its signature might be: static void report_fetch_remotes(FILE *output, const char *git_dir); where we would make a call to it from here like so: report_fetch_remotes(stderr, submodule_git_dir); > + error(_("If you want to reuse this local git " > + "directory instead of cloning again from\n " > + " %s\n" > + "use the '--force' option. If the local " > + "git directory is not the correct repo\n" > + "or you are unsure what this means choose " > + "another name with the '--name' option."), > + info->realrepo); > + return 1; > + } else { > + printf(_("Reactivating local git directory for " > + "submodule '%s'."), info->sm_path); > + } > + } > + free(submodule_git_dir); I think I've reviewed up to this point this time around. Thanks.
Shourya Shukla <shouryashukla.oo@gmail.com> writes: > +static void config_added_submodule(struct add_data *info) > +{ This one I may take a look at later, but won't review in this message. > +} > + > +static int module_add(int argc, const char **argv, const char *prefix) > +{ > + const char *branch = NULL, *custom_name = NULL, *realrepo = NULL; > + const char *reference_path = NULL, *repo = NULL, *name = NULL; > + char *path; > + int force = 0, quiet = 0, depth = -1, progress = 0, dissociate = 0; > + struct add_data info = ADD_DATA_INIT; > + struct strbuf sb = STRBUF_INIT; > + > + struct option options[] = { > + OPT_STRING('b', "branch", &branch, N_("branch"), > + N_("branch of repository to add as submodule")), > + OPT_BOOL('f', "force", &force, N_("allow adding an otherwise " > + "ignored submodule path")), > + OPT__QUIET(&quiet, N_("print only error messages")), > + OPT_BOOL(0, "progress", &progress, N_("force cloning progress")), > + OPT_STRING(0, "reference", &reference_path, N_("repository"), > + N_("reference repository")), > + OPT_BOOL(0, "dissociate", &dissociate, N_("borrow the objects from reference repositories")), > + OPT_STRING(0, "name", &custom_name, N_("name"), > + N_("sets the submodule’s name to the given string " > + "instead of defaulting to its path")), > + OPT_INTEGER(0, "depth", &depth, N_("depth for shallow clones")), > + OPT_END() > + }; > + > + const char *const usage[] = { > + N_("git submodule--helper add [<options>] [--] [<path>]"), > + NULL > + }; > + > + argc = parse_options(argc, argv, prefix, options, usage, 0); > + > + if (!is_writing_gitmodules_ok()) > + die(_("please make sure that the .gitmodules file is in the working tree")); > + > + if (reference_path && !is_absolute_path(reference_path) && prefix) Checking "*prefix" lets us avoid an unnecessary allocation, i.e. if (prefix && *prefix && reference_path && !is_absolute_path(reference_path)) > + reference_path = xstrfmt("%s%s", prefix, reference_path); > + > + if (argc == 0 || argc > 2) { Nice that you are checking excess args, which the original didn't do. > + usage_with_options(usage, options); > + } else if (argc == 1) { > + repo = argv[0]; > + path = guess_dir_name(repo); We've reviewed the function already. Good. > + } else { > + repo = argv[0]; > + path = xstrdup(argv[1]); OK. So after this if/else if/else cascade, path is an allocated piece of memory we could later free() whichever branch is taken. > + } > + > + if (!is_absolute_path(path) && prefix) > + path = xstrfmt("%s%s", prefix, path); This also makes path freeable, but the original path is leaked. if (prefix && *prefix && !is_absolute_path(path)) { free(path); path = xstrfmt(...); } Is there a reason (does not have to be a strong reason) why we use 'path', not 'sm_path', as the variable name that corresponds to $sm_path in the original, by the way? > + /* assure repo is absolute or relative to parent */ > + if (starts_with_dot_dot_slash(repo) || starts_with_dot_slash(repo)) { > + char *remote = get_default_remote(); > + char *remoteurl; > + struct strbuf sb = STRBUF_INIT; > + > + if (prefix) > + die(_("relative path can only be used from the toplevel " > + "of the working tree")); This is 'git submodule--helper resolve-relative-url "$repo"' in the original. > + /* dereference source url relative to parent's url */ > + strbuf_addf(&sb, "remote.%s.url", remote); > + if (git_config_get_string(sb.buf, &remoteurl)) > + remoteurl = xgetcwd(); > + realrepo = relative_url(remoteurl, repo, NULL); And this is copied-and-pasted from resolve_relative_url() function found in builtins/submodule--helper.c. relative_url() returns an allocated memory so we can free() realrepo if we took this branch. > + free(remoteurl); > + free(remote); > + } else if (is_dir_sep(repo[0]) || strchr(repo, ':')) { > + realrepo = repo; This repo came from argv[0] so we cannot free realrepo if we took this branch. Are we willing to leak realrepo we obtained from the other branch? > + } else { > + die(_("repo URL: '%s' must be absolute or begin with ./|../"), > + repo); > + } > + > + /* > + * normalize path: > + * multiple //; leading ./; /./; /../; > + */ > + normalize_path_copy(path, path); It's nice that a handy (almost) equivalent helper is already available ;-) > + /* strip trailing '/' */ > + if (is_dir_sep(path[strlen(path) -1])) > + path[strlen(path) - 1] = '\0'; The original dealt with multiple trailing '/' but this one does not. Shouldn't it loop starting at the end? > + if (check_sm_exists(force, path)) > + return 1; OK. I think we reviewed the function. Seeing it in the context of the calling site makes us realize that it has a wrong name. "check submodule exists" sounds as if we expect a submodule to exist at the path, and it is an error for a submodule not to be there, but that is not what this caller (which is the only caller of the helper) wants to check. And more importantly, the helper reacts to anything sitting at the path, not just submoudle. So what does the helper really do? I think it checks if it is OK to create a submodule there. IOW, "exists" part of the name is what makes it a misnomer. Perhaps "can_create_submodule()"? > + strbuf_addstr(&sb, path); > + if (is_nonbare_repository_dir(&sb)) { > + struct object_id oid; > + if (resolve_gitlink_ref(path, "HEAD", &oid) < 0) > + die(_("'%s' does not have a commit checked out"), path); > + } > + > + if (!force) { > + struct strbuf sb = STRBUF_INIT; > + struct child_process cp = CHILD_PROCESS_INIT; > + cp.git_cmd = 1; > + cp.no_stdout = 1; > + strvec_pushl(&cp.args, "add", "--dry-run", "--ignore-missing", > + "--no-warn-embedded-repo", path, NULL); > + if (pipe_command(&cp, NULL, 0, NULL, 0, &sb, 0)) { > + fprintf(stderr, _("%s"), sb.buf); Sorry, but I cannot guess what this _("%s") is trying to achieve. Shouldn't it be strbuf_complete_line(&sb); fputs(sb.buf, stderr); instead? > + return 1; The original honors the exit code from the dry-run and relays it to the user. Is this a regression, or nobody care what exit status they get as long as it is not zero? > + } > + strbuf_release(&sb); > + } > + > + name = custom_name ? custom_name : path; > + if (check_submodule_name(name)) > + die(_("'%s' is not a valid submodule name"), name); > + > + info.prefix = prefix; > + info.sm_name = name; > + info.sm_path = path; > + info.repo = repo; > + info.realrepo = realrepo; > + info.reference_path = reference_path; > + info.branch = branch; > + info.depth = depth; > + info.progress = !!progress; > + info.dissociate = !!dissociate; > + info.force = !!force; > + info.quiet = !!quiet; > + > + if (add_submodule(&info)) > + return 1; I think we've reviewed this funciton already. > + config_added_submodule(&info); > + > + free(path); Looking a bit uneven wrt to leak handling. > + return 0; > +} Thanks.
Shourya Shukla <shouryashukla.oo@gmail.com> writes: > +static void config_added_submodule(struct add_data *info) > +{ > + char *key, *var = NULL; > + struct child_process cp = CHILD_PROCESS_INIT; > + > + key = xstrfmt("submodule.%s.url", info->sm_name); > + git_config_set_gently(key, info->realrepo); > + free(key); > + > + cp.git_cmd = 1; > + strvec_pushl(&cp.args, "add", "--no-warn-embedded-repo", NULL); > + if (info->force) > + strvec_push(&cp.args, "--force"); > + strvec_pushl(&cp.args, "--", info->sm_path, ".gitmodules", NULL); Hmph, you lost me. I think this ought to correspond to this part of the original: git add --no-warn-embedded-repo $force "$sm_path" || die "$(eval_gettext "Failed to add submodule '\$sm_path'")" I can see that adding "--" before $sm_path may be an improvement, but why do we also add .gitmodules here, and ... > + key = xstrfmt("submodule.%s.path", info->sm_name); > + git_config_set_in_file_gently(".gitmodules", key, info->sm_path); > + free(key); > + key = xstrfmt("submodule.%s.url", info->sm_name); > + git_config_set_in_file_gently(".gitmodules", key, info->repo); > + free(key); > + key = xstrfmt("submodule.%s.branch", info->sm_name); > + if (info->branch) > + git_config_set_in_file_gently(".gitmodules", key, info->branch); > + free(key); ... perform quite a lot of configuration writing, before actually spawning the "git add" and make sure it succeeds? The original won't futz with any of these .gitmodules entries if "git add" of the $sm_path fails and that is a good discipline to follow. > + if (run_command(&cp)) > + die(_("failed to add submodule '%s'"), info->sm_path); > + > + /* > + * NEEDSWORK: In a multi-working-tree world, this needs to be > + * set in the per-worktree config. > + */ > + if (!git_config_get_string("submodule.active", &var) && var) { What if this were a valueless true ("[submodule] active\n" without "= true")? Wouldn't get_string() fail? > + /* > + * If the submodule being adding isn't already covered by the > + * current configured pathspec, set the submodule's active flag > + */ > + if (!is_submodule_active(the_repository, info->sm_path)) { > + key = xstrfmt("submodule.%s.active", info->sm_name); > + git_config_set_gently(key, "true"); > + free(key); > + } > + } else { > + key = xstrfmt("submodule.%s.active", info->sm_name); > + git_config_set_gently(key, "true"); > + free(key); > + } > +} > + > + ... > + info.prefix = prefix; > + info.sm_name = name; > + info.sm_path = path; > + info.repo = repo; > + info.realrepo = realrepo; > + info.reference_path = reference_path; > + info.branch = branch; > + info.depth = depth; > + info.progress = !!progress; > + info.dissociate = !!dissociate; > + info.force = !!force; > + info.quiet = !!quiet; > + > + if (add_submodule(&info)) > + return 1; > + config_added_submodule(&info); Whew. This was way too big to be reviewed in a single sitting. I do not know offhand if there is a better way to structure the changes into a more digestible pieces to help prevent reviewers from overlooking potential mistakes, though. Thanks.
> Whew. > > This was way too big to be reviewed in a single sitting. I do not > know offhand if there is a better way to structure the changes into > a more digestible pieces to help prevent reviewers from overlooking > potential mistakes, though. > > Thanks. I just took a look at this, and one thing that would have helped is if you ported the end of the function first in a commit, and work your way backwards (in one or more commits). After reading through the whole thing, I saw that this is mostly a straightforward start-to-finish port (besides factoring out code into functions), but it would be much easier for reviewers to conceptualize and discuss the different parts if they were already divided.
On Thu, Nov 19 2020, Jonathan Tan wrote: >> Whew. >> >> This was way too big to be reviewed in a single sitting. I do not >> know offhand if there is a better way to structure the changes into >> a more digestible pieces to help prevent reviewers from overlooking >> potential mistakes, though. >> >> Thanks. > > I just took a look at this, and one thing that would have helped is if > you ported the end of the function first in a commit, and work your way > backwards (in one or more commits). > > After reading through the whole thing, I saw that this is mostly a > straightforward start-to-finish port (besides factoring out code into > functions), but it would be much easier for reviewers to conceptualize > and discuss the different parts if they were already divided. Having done some minor changes to git-submodule.sh recently, I wondered if we weren't at the point where it would be a nice approach to invert the C/sh helper relationship. I.e. write git-submodule.c, which would be the small entry point, it would then mostly dispatch to a submodule--helper, which would in turn mostly dispatch to a new submodule--helper-sh (containing most of the current git-submodule.sh code), which in turn would re-dispatch to the C submodule--helper (which as an aside, then sometimes calls itself via process invocation). It's quite a bit of spaghetti code, but means that there's a straighter path to porting some of the setup code such as the "--check-writeable", is_absolute_path() etc. being changed at the start of the change here to git-submodule.sh.
Hi Ævar, On Thu, 19 Nov 2020, Ævar Arnfjörð Bjarmason wrote: > > On Thu, Nov 19 2020, Jonathan Tan wrote: > > >> Whew. > >> > >> This was way too big to be reviewed in a single sitting. I do not > >> know offhand if there is a better way to structure the changes into > >> a more digestible pieces to help prevent reviewers from overlooking > >> potential mistakes, though. > >> > >> Thanks. > > > > I just took a look at this, and one thing that would have helped is if > > you ported the end of the function first in a commit, and work your way > > backwards (in one or more commits). > > > > After reading through the whole thing, I saw that this is mostly a > > straightforward start-to-finish port (besides factoring out code into > > functions), but it would be much easier for reviewers to conceptualize > > and discuss the different parts if they were already divided. > > Having done some minor changes to git-submodule.sh recently, I wondered > if we weren't at the point where it would be a nice approach to invert > the C/sh helper relationship. > > I.e. write git-submodule.c, which would be the small entry point, it > would then mostly dispatch to a submodule--helper, which would in turn > mostly dispatch to a new submodule--helper-sh (containing most of the > current git-submodule.sh code), which in turn would re-dispatch to the C > submodule--helper (which as an aside, then sometimes calls itself via > process invocation). > > It's quite a bit of spaghetti code, but means that there's a straighter > path to porting some of the setup code such as the "--check-writeable", > is_absolute_path() etc. being changed at the start of the change here to > git-submodule.sh. Looking at https://github.com/gitgitgadget/git/blob/ss/submodule-add-in-c/git-submodule.sh, I see that while there are still 794 lines, most of it is just mostly redundant option parsing. The only function left to convert is `cmd_update()`, and the first half was already converted to C long ago, via the `git submodule--helper update-clone` subcommand: https://github.com/gitgitgadget/git/blob/ss/submodule-add-in-c/git-submodule.sh#L269-L530 At this stage I suspect that having a little more patience would make more sense. After `cmd_update` is converted, we can simply finish the conversion, without having to keep the shell script around. Ciao, Dscho
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > At this stage I suspect that having a little more patience would make more > sense. After `cmd_update` is converted, we can simply finish the > conversion, without having to keep the shell script around. That matches how I view the current state. We seem to be reviewer bandwidth limited and folks who took a look at this series to help moving it forward certainly deserve a lot of gratitude. Thanks.
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c index de5ad73bb8..ec0a50d032 100644 --- a/builtin/submodule--helper.c +++ b/builtin/submodule--helper.c @@ -19,7 +19,6 @@ #include "diffcore.h" #include "diff.h" #include "object-store.h" -#include "dir.h" #include "advice.h" #define OPT_QUIET (1 << 0) @@ -2744,6 +2743,395 @@ static int module_set_branch(int argc, const char **argv, const char *prefix) return !!ret; } +struct add_data { + const char *prefix; + const char *branch; + const char *reference_path; + const char *sm_path; + const char *sm_name; + const char *repo; + const char *realrepo; + int depth; + unsigned int force: 1; + unsigned int quiet: 1; + unsigned int progress: 1; + unsigned int dissociate: 1; +}; +#define ADD_DATA_INIT { NULL, NULL, NULL, NULL, NULL, NULL, NULL, 0, 0, 0, 0, 0 } + +/* + * Guess dir name from repository: strip leading '.*[/:]', + * strip trailing '[:/]*.git'. + */ +static char *guess_dir_name(const char *repo) +{ + const char *p, *start, *end, *limit; + int after_slash_or_colon; + + after_slash_or_colon = 0; + limit = repo + strlen(repo); + start = repo; + end = limit; + for (p = repo; p < limit; p++) { + if (starts_with(p, ".git")) { + /* strip trailing '[:/]*.git' */ + if (!after_slash_or_colon) + end = p; + p += 3; + } else if (*p == '/' || *p == ':') { + /* strip leading '.*[/:]' */ + if (end == limit) + end = p; + after_slash_or_colon = 1; + } else if (after_slash_or_colon) { + start = p; + end = limit; + after_slash_or_colon = 0; + } + } + return xstrndup(start, end - start); +} + +static void fprintf_submodule_remote(const char *str) +{ + const char *p = str; + const char *start; + const char *end; + char *name, *url; + + start = p; + while (*p != ' ') + p++; + end = p; + name = xstrndup(start, end - start); + + while(*p == ' ') + p++; + start = p; + while (*p != ' ') + p++; + end = p; + url = xstrndup(start, end - start); + + fprintf(stderr, " %s\t%s\n", name, url); + free(name); + free(url); +} + +static int check_sm_exists(unsigned int force, const char *path) { + + int cache_pos, dir_in_cache = 0; + if (read_cache() < 0) + die(_("index file corrupt")); + + cache_pos = cache_name_pos(path, strlen(path)); + if(cache_pos < 0 && (directory_exists_in_index(&the_index, + path, strlen(path)) == index_directory)) + dir_in_cache = 1; + + if (!force) { + if (cache_pos >= 0 || dir_in_cache) + die(_("'%s' already exists in the index"), path); + } else { + struct cache_entry *ce = NULL; + if (cache_pos >= 0) + ce = the_index.cache[cache_pos]; + if (dir_in_cache || (ce && !S_ISGITLINK(ce->ce_mode))) + die(_("'%s' already exists in the index and is not a " + "submodule"), path); + } + return 0; +} + +static void modify_remote_v(struct strbuf *sb) +{ + int i; + for (i = 0; i < sb->len; i++) { + const char *start = sb->buf + i; + const char *end = start; + while (sb->buf[i++] != '\n') + end++; + if (!strcmp("fetch", xstrndup(end - 6, 5))) + fprintf_submodule_remote(xstrndup(start, end - start - 7)); + } +} + +static int add_submodule(struct add_data *info) +{ + /* perhaps the path exists and is already a git repo, else clone it */ + if (is_directory(info->sm_path)) { + char *sub_git_path = xstrfmt("%s/.git", info->sm_path); + if (is_directory(sub_git_path) || file_exists(sub_git_path)) + printf(_("Adding existing repo at '%s' to the index\n"), + info->sm_path); + else + die(_("'%s' already exists and is not a valid git repo"), + info->sm_path); + free(sub_git_path); + } else { + struct strvec clone_args = STRVEC_INIT; + struct child_process cp = CHILD_PROCESS_INIT; + char *submodule_git_dir = xstrfmt(".git/modules/%s", info->sm_name); + + if (is_directory(submodule_git_dir)) { + if (!info->force) { + struct child_process cp_rem = CHILD_PROCESS_INIT; + struct strbuf sb_rem = STRBUF_INIT; + cp_rem.git_cmd = 1; + fprintf(stderr, _("A git directory for '%s' is " + "found locally with remote(s):\n"), + info->sm_name); + strvec_pushf(&cp_rem.env_array, + "GIT_DIR=%s", submodule_git_dir); + strvec_push(&cp_rem.env_array, "GIT_WORK_TREE=."); + strvec_pushl(&cp_rem.args, "remote", "-v", NULL); + if (!capture_command(&cp_rem, &sb_rem, 0)) { + modify_remote_v(&sb_rem); + } + error(_("If you want to reuse this local git " + "directory instead of cloning again from\n " + " %s\n" + "use the '--force' option. If the local " + "git directory is not the correct repo\n" + "or you are unsure what this means choose " + "another name with the '--name' option."), + info->realrepo); + return 1; + } else { + printf(_("Reactivating local git directory for " + "submodule '%s'."), info->sm_path); + } + } + free(submodule_git_dir); + + strvec_push(&clone_args, "clone"); + + if (info->quiet) + strvec_push(&clone_args, "--quiet"); + + if (info->progress) + strvec_push(&clone_args, "--progress"); + + if (info->prefix) + strvec_pushl(&clone_args, "--prefix", info->prefix, NULL); + strvec_pushl(&clone_args, "--path", info->sm_path, "--name", + info->sm_name, "--url", info->realrepo, NULL); + if (info->reference_path) + strvec_pushl(&clone_args, "--reference", + info->reference_path, NULL); + if (info->dissociate) + strvec_push(&clone_args, "--dissociate"); + + if (info->depth >= 0) + strvec_pushf(&clone_args, "--depth=%d", info->depth); + + if (module_clone(clone_args.nr, clone_args.v, info->prefix)) { + strvec_clear(&clone_args); + return -1; + } + + prepare_submodule_repo_env(&cp.env_array); + cp.git_cmd = 1; + cp.dir = info->sm_path; + strvec_pushl(&cp.args, "checkout", "-f", "-q", NULL); + + if (info->branch) { + strvec_pushl(&cp.args, "-B", info->branch, NULL); + strvec_pushf(&cp.args, "origin/%s", info->branch); + } + + if (run_command(&cp)) + die(_("unable to checkout submodule '%s'"), info->sm_path); + } + return 0; +} + +static void config_added_submodule(struct add_data *info) +{ + char *key, *var = NULL; + struct child_process cp = CHILD_PROCESS_INIT; + + key = xstrfmt("submodule.%s.url", info->sm_name); + git_config_set_gently(key, info->realrepo); + free(key); + + cp.git_cmd = 1; + strvec_pushl(&cp.args, "add", "--no-warn-embedded-repo", NULL); + if (info->force) + strvec_push(&cp.args, "--force"); + strvec_pushl(&cp.args, "--", info->sm_path, ".gitmodules", NULL); + + key = xstrfmt("submodule.%s.path", info->sm_name); + git_config_set_in_file_gently(".gitmodules", key, info->sm_path); + free(key); + key = xstrfmt("submodule.%s.url", info->sm_name); + git_config_set_in_file_gently(".gitmodules", key, info->repo); + free(key); + key = xstrfmt("submodule.%s.branch", info->sm_name); + if (info->branch) + git_config_set_in_file_gently(".gitmodules", key, info->branch); + free(key); + + if (run_command(&cp)) + die(_("failed to add submodule '%s'"), info->sm_path); + + /* + * NEEDSWORK: In a multi-working-tree world, this needs to be + * set in the per-worktree config. + */ + if (!git_config_get_string("submodule.active", &var) && var) { + + /* + * If the submodule being adding isn't already covered by the + * current configured pathspec, set the submodule's active flag + */ + if (!is_submodule_active(the_repository, info->sm_path)) { + key = xstrfmt("submodule.%s.active", info->sm_name); + git_config_set_gently(key, "true"); + free(key); + } + } else { + key = xstrfmt("submodule.%s.active", info->sm_name); + git_config_set_gently(key, "true"); + free(key); + } +} + +static int module_add(int argc, const char **argv, const char *prefix) +{ + const char *branch = NULL, *custom_name = NULL, *realrepo = NULL; + const char *reference_path = NULL, *repo = NULL, *name = NULL; + char *path; + int force = 0, quiet = 0, depth = -1, progress = 0, dissociate = 0; + struct add_data info = ADD_DATA_INIT; + struct strbuf sb = STRBUF_INIT; + + struct option options[] = { + OPT_STRING('b', "branch", &branch, N_("branch"), + N_("branch of repository to add as submodule")), + OPT_BOOL('f', "force", &force, N_("allow adding an otherwise " + "ignored submodule path")), + OPT__QUIET(&quiet, N_("print only error messages")), + OPT_BOOL(0, "progress", &progress, N_("force cloning progress")), + OPT_STRING(0, "reference", &reference_path, N_("repository"), + N_("reference repository")), + OPT_BOOL(0, "dissociate", &dissociate, N_("borrow the objects from reference repositories")), + OPT_STRING(0, "name", &custom_name, N_("name"), + N_("sets the submodule’s name to the given string " + "instead of defaulting to its path")), + OPT_INTEGER(0, "depth", &depth, N_("depth for shallow clones")), + OPT_END() + }; + + const char *const usage[] = { + N_("git submodule--helper add [<options>] [--] [<path>]"), + NULL + }; + + argc = parse_options(argc, argv, prefix, options, usage, 0); + + if (!is_writing_gitmodules_ok()) + die(_("please make sure that the .gitmodules file is in the working tree")); + + if (reference_path && !is_absolute_path(reference_path) && prefix) + reference_path = xstrfmt("%s%s", prefix, reference_path); + + if (argc == 0 || argc > 2) { + usage_with_options(usage, options); + } else if (argc == 1) { + repo = argv[0]; + path = guess_dir_name(repo); + } else { + repo = argv[0]; + path = xstrdup(argv[1]); + } + + if (!is_absolute_path(path) && prefix) + path = xstrfmt("%s%s", prefix, path); + + /* assure repo is absolute or relative to parent */ + if (starts_with_dot_dot_slash(repo) || starts_with_dot_slash(repo)) { + char *remote = get_default_remote(); + char *remoteurl; + struct strbuf sb = STRBUF_INIT; + + if (prefix) + die(_("relative path can only be used from the toplevel " + "of the working tree")); + /* dereference source url relative to parent's url */ + strbuf_addf(&sb, "remote.%s.url", remote); + if (git_config_get_string(sb.buf, &remoteurl)) + remoteurl = xgetcwd(); + realrepo = relative_url(remoteurl, repo, NULL); + + free(remoteurl); + free(remote); + } else if (is_dir_sep(repo[0]) || strchr(repo, ':')) { + realrepo = repo; + } else { + die(_("repo URL: '%s' must be absolute or begin with ./|../"), + repo); + } + + /* + * normalize path: + * multiple //; leading ./; /./; /../; + */ + normalize_path_copy(path, path); + /* strip trailing '/' */ + if (is_dir_sep(path[strlen(path) -1])) + path[strlen(path) - 1] = '\0'; + + if (check_sm_exists(force, path)) + return 1; + + strbuf_addstr(&sb, path); + if (is_nonbare_repository_dir(&sb)) { + struct object_id oid; + if (resolve_gitlink_ref(path, "HEAD", &oid) < 0) + die(_("'%s' does not have a commit checked out"), path); + } + + if (!force) { + struct strbuf sb = STRBUF_INIT; + struct child_process cp = CHILD_PROCESS_INIT; + cp.git_cmd = 1; + cp.no_stdout = 1; + strvec_pushl(&cp.args, "add", "--dry-run", "--ignore-missing", + "--no-warn-embedded-repo", path, NULL); + if (pipe_command(&cp, NULL, 0, NULL, 0, &sb, 0)) { + fprintf(stderr, _("%s"), sb.buf); + return 1; + } + strbuf_release(&sb); + } + + name = custom_name ? custom_name : path; + if (check_submodule_name(name)) + die(_("'%s' is not a valid submodule name"), name); + + info.prefix = prefix; + info.sm_name = name; + info.sm_path = path; + info.repo = repo; + info.realrepo = realrepo; + info.reference_path = reference_path; + info.branch = branch; + info.depth = depth; + info.progress = !!progress; + info.dissociate = !!dissociate; + info.force = !!force; + info.quiet = !!quiet; + + if (add_submodule(&info)) + return 1; + config_added_submodule(&info); + + free(path); + + return 0; +} + #define SUPPORT_SUPER_PREFIX (1<<0) struct cmd_struct { @@ -2777,6 +3165,7 @@ static struct cmd_struct commands[] = { {"config", module_config, 0}, {"set-url", module_set_url, 0}, {"set-branch", module_set_branch, 0}, + {"add", module_add, SUPPORT_SUPER_PREFIX}, }; int cmd_submodule__helper(int argc, const char **argv, const char *prefix) diff --git a/git-submodule.sh b/git-submodule.sh index 7ce52872b7..f1cbe4934a 100755 --- a/git-submodule.sh +++ b/git-submodule.sh @@ -146,166 +146,7 @@ cmd_add() shift done - if ! git submodule--helper config --check-writeable >/dev/null 2>&1 - then - die "$(eval_gettext "please make sure that the .gitmodules file is in the working tree")" - fi - - if test -n "$reference_path" - then - is_absolute_path "$reference_path" || - reference_path="$wt_prefix$reference_path" - - reference="--reference=$reference_path" - fi - - repo=$1 - sm_path=$2 - - if test -z "$sm_path"; then - sm_path=$(printf '%s\n' "$repo" | - sed -e 's|/$||' -e 's|:*/*\.git$||' -e 's|.*[/:]||g') - fi - - if test -z "$repo" || test -z "$sm_path"; then - usage - fi - - is_absolute_path "$sm_path" || sm_path="$wt_prefix$sm_path" - - # assure repo is absolute or relative to parent - case "$repo" in - ./*|../*) - test -z "$wt_prefix" || - die "$(gettext "Relative path can only be used from the toplevel of the working tree")" - - # dereference source url relative to parent's url - realrepo=$(git submodule--helper resolve-relative-url "$repo") || exit - ;; - *:*|/*) - # absolute url - realrepo=$repo - ;; - *) - die "$(eval_gettext "repo URL: '\$repo' must be absolute or begin with ./|../")" - ;; - esac - - # normalize path: - # multiple //; leading ./; /./; /../; trailing / - sm_path=$(printf '%s/\n' "$sm_path" | - sed -e ' - s|//*|/|g - s|^\(\./\)*|| - s|/\(\./\)*|/|g - :start - s|\([^/]*\)/\.\./|| - tstart - s|/*$|| - ') - if test -z "$force" - then - git ls-files --error-unmatch "$sm_path" > /dev/null 2>&1 && - die "$(eval_gettext "'\$sm_path' already exists in the index")" - else - git ls-files -s "$sm_path" | sane_grep -v "^160000" > /dev/null 2>&1 && - die "$(eval_gettext "'\$sm_path' already exists in the index and is not a submodule")" - fi - - if test -d "$sm_path" && - test -z $(git -C "$sm_path" rev-parse --show-cdup 2>/dev/null) - then - git -C "$sm_path" rev-parse --verify -q HEAD >/dev/null || - die "$(eval_gettext "'\$sm_path' does not have a commit checked out")" - fi - - if test -z "$force" - then - dryerr=$(git add --dry-run --ignore-missing --no-warn-embedded-repo "$sm_path" 2>&1 >/dev/null) - res=$? - if test $res -ne 0 - then - echo >&2 "$dryerr" - exit $res - fi - fi - - if test -n "$custom_name" - then - sm_name="$custom_name" - else - sm_name="$sm_path" - fi - - if ! git submodule--helper check-name "$sm_name" - then - die "$(eval_gettext "'$sm_name' is not a valid submodule name")" - fi - - # perhaps the path exists and is already a git repo, else clone it - if test -e "$sm_path" - then - if test -d "$sm_path"/.git || test -f "$sm_path"/.git - then - eval_gettextln "Adding existing repo at '\$sm_path' to the index" - else - die "$(eval_gettext "'\$sm_path' already exists and is not a valid git repo")" - fi - - else - if test -d ".git/modules/$sm_name" - then - if test -z "$force" - then - eval_gettextln >&2 "A git directory for '\$sm_name' is found locally with remote(s):" - GIT_DIR=".git/modules/$sm_name" GIT_WORK_TREE=. git remote -v | grep '(fetch)' | sed -e s,^," ", -e s,' (fetch)',, >&2 - die "$(eval_gettextln "\ -If you want to reuse this local git directory instead of cloning again from - \$realrepo -use the '--force' option. If the local git directory is not the correct repo -or you are unsure what this means choose another name with the '--name' option.")" - else - eval_gettextln "Reactivating local git directory for submodule '\$sm_name'." - fi - fi - git submodule--helper clone ${GIT_QUIET:+--quiet} ${progress:+"--progress"} --prefix "$wt_prefix" --path "$sm_path" --name "$sm_name" --url "$realrepo" ${reference:+"$reference"} ${dissociate:+"--dissociate"} ${depth:+"$depth"} || exit - ( - sanitize_submodule_env - cd "$sm_path" && - # ash fails to wordsplit ${branch:+-b "$branch"...} - case "$branch" in - '') git checkout -f -q ;; - ?*) git checkout -f -q -B "$branch" "origin/$branch" ;; - esac - ) || die "$(eval_gettext "Unable to checkout submodule '\$sm_path'")" - fi - git config submodule."$sm_name".url "$realrepo" - - git add --no-warn-embedded-repo $force "$sm_path" || - die "$(eval_gettext "Failed to add submodule '\$sm_path'")" - - git submodule--helper config submodule."$sm_name".path "$sm_path" && - git submodule--helper config submodule."$sm_name".url "$repo" && - if test -n "$branch" - then - git submodule--helper config submodule."$sm_name".branch "$branch" - fi && - git add --force .gitmodules || - die "$(eval_gettext "Failed to register submodule '\$sm_path'")" - - # NEEDSWORK: In a multi-working-tree world, this needs to be - # set in the per-worktree config. - if git config --get submodule.active >/dev/null - then - # If the submodule being adding isn't already covered by the - # current configured pathspec, set the submodule's active flag - if ! git submodule--helper is-active "$sm_path" - then - git config submodule."$sm_name".active "true" - fi - else - git config submodule."$sm_name".active "true" - fi + git ${wt_prefix:+-C "$wt_prefix"} ${prefix:+--super-prefix "$prefix"} submodule--helper add ${force:+--force} ${GIT_QUIET:+--quiet} ${progress:+--progress} ${branch:+--branch "$branch"} ${reference_path:+--reference "$reference_path"} ${dissociate:+--dissociate} ${custom_name:+--name "$custom_name"} ${depth:+"$depth"} -- "$@" } # diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh index fec7e0299d..4ab8298385 100755 --- a/t/t7400-submodule-basic.sh +++ b/t/t7400-submodule-basic.sh @@ -48,7 +48,7 @@ test_expect_success 'submodule update aborts on missing gitmodules url' ' test_expect_success 'add aborts on repository with no commits' ' cat >expect <<-\EOF && - '"'repo-no-commits'"' does not have a commit checked out + fatal: '"'repo-no-commits'"' does not have a commit checked out EOF git init repo-no-commits && test_must_fail git submodule add ../a ./repo-no-commits 2>actual &&