Message ID | 20201005122646.27994-3-alban.gruin@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Rewrite the remaining merge strategies from shell to C | expand |
Alban Gruin <alban.gruin@gmail.com> writes: > This rewrites `git merge-one-file' from shell to C. This port is not > completely straightforward: to save precious cycles by avoiding reading > and flushing the index repeatedly, write temporary files when an > operation can be performed in-memory, or allow other function to use the > rewrite without forking nor worrying about the index,... So, the in-core index is still used, but when the contents of the in-core index does not have to be written out disk, we just don't? Makes sense. > diff --git a/builtin/merge-one-file.c b/builtin/merge-one-file.c > new file mode 100644 > index 0000000000..598338ba16 > --- /dev/null > +++ b/builtin/merge-one-file.c > @@ -0,0 +1,92 @@ > +/* > + * Builtin "git merge-one-file" > + * > + * Copyright (c) 2020 Alban Gruin > + * > + * Based on git-merge-one-file.sh, written by Linus Torvalds. > + * > + * This is the git per-file merge utility, called with > + * > + * argv[1] - original file SHA1 (or empty) > + * argv[2] - file in branch1 SHA1 (or empty) > + * argv[3] - file in branch2 SHA1 (or empty) Let's modernize this comment while we are at it. SHA1 -> "object name" (or "blob object name") > + * argv[4] - pathname in repository > + * argv[5] - original file mode (or empty) > + * argv[6] - file in branch1 mode (or empty) > + * argv[7] - file in branch2 mode (or empty) > + * > + * Handle some trivial cases. The _really_ trivial cases have been > + * handled already by git read-tree, but that one doesn't do any merges > + * that might change the tree layout. > + */ > + > +#define USE_THE_INDEX_COMPATIBILITY_MACROS > +#include "cache.h" > +#include "builtin.h" > +#include "lockfile.h" > +#include "merge-strategies.h" > + > +static const char builtin_merge_one_file_usage[] = > + "git merge-one-file <orig blob> <our blob> <their blob> <path> " > + "<orig mode> <our mode> <their mode>\n\n" > + "Blob ids and modes should be empty for missing files."; > + > +static int read_mode(const char *name, const char *arg, unsigned int *mode) > +{ > + char *last; > + int ret = 0; > + > + *mode = strtol(arg, &last, 8); > + > + if (*last) > + ret = error(_("invalid '%s' mode: expected nothing, got '%c'"), name, *last); > + else if (!(S_ISREG(*mode) || S_ISDIR(*mode) || S_ISLNK(*mode))) > + ret = error(_("invalid '%s' mode: %o"), name, *mode); > + > + return ret; > +} > + > +int cmd_merge_one_file(int argc, const char **argv, const char *prefix) > +{ > + struct object_id orig_blob, our_blob, their_blob, > + *p_orig_blob = NULL, *p_our_blob = NULL, *p_their_blob = NULL; > + unsigned int orig_mode = 0, our_mode = 0, their_mode = 0, ret = 0; > + struct lock_file lock = LOCK_INIT; > + > + if (argc != 8) > + usage(builtin_merge_one_file_usage); > + > + if (read_cache() < 0) > + die("invalid index"); > + > + hold_locked_index(&lock, LOCK_DIE_ON_ERROR); > + > + if (!get_oid(argv[1], &orig_blob)) { > + p_orig_blob = &orig_blob; > + ret = read_mode("orig", argv[5], &orig_mode); > + } argv[1] is defined as "either the object name of the blob in the common ancestor, or an empty string". So you need to distinguish three cases here, but you are only catching two. - argv[1] is an empty string; p_orig_blob can legitimately be left NULL. - argv[1] is a valid blob object name. orig_blob should be populated and p_orig_blob should point at it. - argv[1] is garbage, names a non-blob object, or there is no such object with that name. Don't we want to catch it as a mistake? Also, when argv[1] is an empty string, argv[5] must also be an empty string, or we got a wrong input---don't we want to catch it as a mistake? The third case needs a bit of thought. For example, if $1 and $2 are the same and points at a non-existent object, we know we won't care because we only care about $3. In a lazily-cloned repository, that may matter---we would not want to fail even if we not have blob $1 and $2, as long as they are reasonably spelled a full hexadecimal object name. But we would want to fail if blob object named by $3 is missing. One way to achieve semantics closer to the above than the posted patch may be to tighten the parsing. Instead of using "anything goes" get_oid(), use get_oid_hex(), perhaps. > + if (!get_oid(argv[2], &our_blob)) { > + p_our_blob = &our_blob; > + ret = read_mode("our", argv[6], &our_mode); > + } > + > + if (!get_oid(argv[3], &their_blob)) { > + p_their_blob = &their_blob; > + ret = read_mode("their", argv[7], &their_mode); > + } > + > + if (ret) > + return ret; > + > + ret = merge_strategies_one_file(the_repository, > + p_orig_blob, p_our_blob, p_their_blob, argv[4], > + orig_mode, our_mode, their_mode); That's a funny function name. It's not like the function will be taught different strategy to handle the three-way merge, no? It probably makes sense to name it after what it does, which is "three way merge". > + if (ret) { > + rollback_lock_file(&lock); > + return !!ret; > + } > + > + return write_locked_index(&the_index, &lock, COMMIT_LOCK); > +} > diff --git a/merge-strategies.c b/merge-strategies.c > new file mode 100644 > index 0000000000..bbe6f48698 > --- /dev/null > +++ b/merge-strategies.c > @@ -0,0 +1,214 @@ > +#include "cache.h" > +#include "dir.h" > +#include "ll-merge.h" > +#include "merge-strategies.h" > +#include "xdiff-interface.h" > + > +static int add_to_index_cacheinfo(struct index_state *istate, > + unsigned int mode, > + const struct object_id *oid, const char *path) > +{ > + struct cache_entry *ce; > + int len, option; > + > + if (!verify_path(path, mode)) > + return error(_("Invalid path '%s'"), path); > + > + len = strlen(path); > + ce = make_empty_cache_entry(istate, len); > + > + oidcpy(&ce->oid, oid); > + memcpy(ce->name, path, len); > + ce->ce_flags = create_ce_flags(0); > + ce->ce_namelen = len; > + ce->ce_mode = create_ce_mode(mode); > + if (assume_unchanged) > + ce->ce_flags |= CE_VALID; > + option = ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE; > + if (add_index_entry(istate, ce, option)) > + return error(_("%s: cannot add to the index"), path); > + > + return 0; > +} The above correctly does 'git update-index --add --cacheinfo "$6" "$2" "$4"' but don't copy-and-paste existing code to do so. Add one preliminary patch before everything else in the series to massage and extract add_cacheinfo() function out of builtin/update-index.c, move it to somewhere common like read-cache.c and so that we can call it from here. > +static int checkout_from_index(struct index_state *istate, const char *path) > +{ > + struct checkout state = CHECKOUT_INIT; > + struct cache_entry *ce; > + > + state.istate = istate; > + state.force = 1; > + state.base_dir = ""; > + state.base_dir_len = 0; > + > + ce = index_file_exists(istate, path, strlen(path), 0); This call is unfortunate for the reasons I mention later. But if you must have this call, then you need to sanity check what you get from index_file_exists(). ce must be a merged cache entry, so if (!ce || ce_stage(ce)) BUG(...); > + if (checkout_entry(ce, &state, NULL, NULL) < 0) > + return error(_("%s: cannot checkout file"), path); > + return 0; > +} > + > +static int merge_one_file_deleted(struct index_state *istate, > + const struct object_id *orig_blob, > + const struct object_id *our_blob, > + const struct object_id *their_blob, const char *path, > + unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode) > +{ > + if ((our_blob && orig_mode != our_mode) || > + (their_blob && orig_mode != their_mode)) > + return error(_("File %s deleted on one branch but had its " > + "permissions changed on the other."), path); > + > + if (our_blob) { > + printf(_("Removing %s\n"), path); > + > + if (file_exists(path)) > + remove_path(path); > + } > + > + if (remove_file_from_index(istate, path)) > + return error("%s: cannot remove from the index", path); > + return 0; If the side that did not remove changed the mode, we don't silently remove but fail and give a chance to inspect the situation to the end user. If we had the blob and it is removed by them, we give a message and only in that case we remove the file from the working tree, together with any leading directory that has become empty. And after that we make sure that the path is no longer in the index. The function removes entries for the path at all the stages, which is exactly what we want. OK. > +} > + > +static int do_merge_one_file(struct index_state *istate, > + const struct object_id *orig_blob, > + const struct object_id *our_blob, > + const struct object_id *their_blob, const char *path, > + unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode) > +{ > + int ret, i, dest; > + ssize_t written; > + mmbuffer_t result = {NULL, 0}; > + mmfile_t mmfs[3]; > + struct ll_merge_options merge_opts = {0}; > + struct cache_entry *ce; > + > + if (our_mode == S_IFLNK || their_mode == S_IFLNK) > + return error(_("%s: Not merging symbolic link changes."), path); > + else if (our_mode == S_IFGITLINK || their_mode == S_IFGITLINK) > + return error(_("%s: Not merging conflicting submodule changes."), path); > + > + read_mmblob(mmfs + 1, our_blob); > + read_mmblob(mmfs + 2, their_blob); > + > + if (orig_blob) { > + printf(_("Auto-merging %s\n"), path); > + read_mmblob(mmfs + 0, orig_blob); > + } else { > + printf(_("Added %s in both, but differently.\n"), path); > + read_mmblob(mmfs + 0, &null_oid); > + } > + > + merge_opts.xdl_opts = XDL_MERGE_ZEALOUS_ALNUM; > + ret = ll_merge(&result, path, > + mmfs + 0, "orig", > + mmfs + 1, "our", > + mmfs + 2, "their", > + istate, &merge_opts); Is it correct to call into ll_merge() here? The original used to call "git merge-file" which called into xdl_merge(). Calling into ll_merge() means the path is used to look up the attributes and use the custom merge driver, which I am not offhand sure is what we want to see at this low level (and if it turns out to be a good idea, we definitely should explain the change of semantics in the proposed log message for this commit). > + for (i = 0; i < 3; i++) > + free(mmfs[i].ptr); > + > + if (ret < 0) { > + free(result.ptr); > + return error(_("Failed to execute internal merge")); > + } > + > + /* > + * Create the working tree file, using "our tree" version from > + * the index, and then store the result of the merge. > + */ The above is copied from the original, to explain what it did after the comment, but it does not seem to match what the new code does. > + ce = index_file_exists(istate, path, strlen(path), 0); > + if (!ce) > + BUG("file is not present in the cache?"); > + > + unlink(path); > + if ((dest = open(path, O_WRONLY | O_CREAT, ce->ce_mode)) < 0) { > + free(result.ptr); > + return error_errno(_("failed to open file '%s'"), path); > + } > + > + written = write_in_full(dest, result.ptr, result.size); > + close(dest); > + > + free(result.ptr); > + > + if (written < 0) > + return error_errno(_("failed to write to '%s'"), path); > + This open(..., ce->ce_mode) call is way insufficient. The comment we have above this part of the code talks about the difficulty of doing this correctly in scripted version. Creating a file by 'git checkout-index -f --stage=2 -- "$4"' and reusing it to store the merged contents was the cleanest and easiest way without having direct access to adjust_shared_perm() to create a working tree file with the correct permission bits. We are writing in C, so we should be able to do much better than the scripted version, as we can later call adjust_shared_perm(). > + if (ret != 0 || !orig_blob) > + ret = error(_("content conflict in %s"), path); > + if (our_mode != their_mode) > + return error(_("permission conflict: %o->%o,%o in %s"), > + orig_mode, our_mode, their_mode, path); > + if (ret) > + return -1; > + > + return add_file_to_index(istate, path, 0); > +} > + > +int merge_strategies_one_file(struct repository *r, > + const struct object_id *orig_blob, > + const struct object_id *our_blob, > + const struct object_id *their_blob, const char *path, > + unsigned int orig_mode, unsigned int our_mode, > + unsigned int their_mode) > +{ In a long if/else if/else if/.../else cascade, enclose all bodies in braces, if any one of them has a multi-statement body, to avoid being distracting. > + if (orig_blob && > + ((!their_blob && our_blob && oideq(orig_blob, our_blob)) || > + (!our_blob && their_blob && oideq(orig_blob, their_blob)))) > + /* Deleted in both or deleted in one and unchanged in the other. */ > + return merge_one_file_deleted(r->index, > + orig_blob, our_blob, their_blob, path, > + orig_mode, our_mode, their_mode); OK, we've already reviewed that function. > + else if (!orig_blob && our_blob && !their_blob) { > + /* > + * Added in one. The other side did not add and we > + * added so there is nothing to be done, except making > + * the path merged. > + */ > + return add_to_index_cacheinfo(r->index, our_mode, our_blob, path); OK, we've already reviewed that function. > + } else if (!orig_blob && !our_blob && their_blob) { > + printf(_("Adding %s\n"), path); > + > + if (file_exists(path)) > + return error(_("untracked %s is overwritten by the merge."), path); > + > + if (add_to_index_cacheinfo(r->index, their_mode, their_blob, path)) > + return -1; > + return checkout_from_index(r->index, path); You did "add_to_index_cacheinfo()", so you MUST know which ce is to be checked out. Consider if it is worth to teach add_to_index_cacheinfo() to give you ce back and pass it to checkout_from_index(); that way, you do not have to call index_file_exists() based on path in the function. > + } else if (!orig_blob && our_blob && their_blob && > + oideq(our_blob, their_blob)) { > + /* Added in both, identically (check for same permissions). */ > + if (our_mode != their_mode) > + return error(_("File %s added identically in both branches, " > + "but permissions conflict %o->%o."), > + path, our_mode, their_mode); > + > + printf(_("Adding %s\n"), path); > + > + if (add_to_index_cacheinfo(r->index, our_mode, our_blob, path)) > + return -1; > + return checkout_from_index(r->index, path); Ditto. > + } else if (our_blob && their_blob) > + /* Modified in both, but differently. */ > + return do_merge_one_file(r->index, > + orig_blob, our_blob, their_blob, path, > + orig_mode, our_mode, their_mode); > + else { > + char orig_hex[GIT_MAX_HEXSZ] = {0}, our_hex[GIT_MAX_HEXSZ] = {0}, > + their_hex[GIT_MAX_HEXSZ] = {0}; > + > + if (orig_blob) > + oid_to_hex_r(orig_hex, orig_blob); > + if (our_blob) > + oid_to_hex_r(our_hex, our_blob); > + if (their_blob) > + oid_to_hex_r(their_hex, their_blob); > + > + return error(_("%s: Not handling case %s -> %s -> %s"), > + path, orig_hex, our_hex, their_hex); > + } > + > + return 0; > +} I can see that this does go in the right direction. With a bit more attention to details it would soon be production-ready quality. Thanks.
Hi Junio, On 07/10/2020 00:01, Junio C Hamano wrote: > Alban Gruin <alban.gruin@gmail.com> writes: > >> This rewrites `git merge-one-file' from shell to C. This port is not >> completely straightforward: to save precious cycles by avoiding reading >> and flushing the index repeatedly, write temporary files when an >> operation can be performed in-memory, or allow other function to use the >> rewrite without forking nor worrying about the index,... > > So, the in-core index is still used, but when the contents of the in-core > index does not have to be written out disk, we just don't? Makes sense. > >> diff --git a/builtin/merge-one-file.c b/builtin/merge-one-file.c >> new file mode 100644 >> index 0000000000..598338ba16 >> --- /dev/null >> +++ b/builtin/merge-one-file.c >> @@ -0,0 +1,92 @@ >> +/* >> + * Builtin "git merge-one-file" >> + * >> + * Copyright (c) 2020 Alban Gruin >> + * >> + * Based on git-merge-one-file.sh, written by Linus Torvalds. >> + * >> + * This is the git per-file merge utility, called with >> + * >> + * argv[1] - original file SHA1 (or empty) >> + * argv[2] - file in branch1 SHA1 (or empty) >> + * argv[3] - file in branch2 SHA1 (or empty) > > Let's modernize this comment while we are at it. > > SHA1 -> "object name" (or "blob object name") > >> + * argv[4] - pathname in repository >> + * argv[5] - original file mode (or empty) >> + * argv[6] - file in branch1 mode (or empty) >> + * argv[7] - file in branch2 mode (or empty) >> + * >> + * Handle some trivial cases. The _really_ trivial cases have been >> + * handled already by git read-tree, but that one doesn't do any merges >> + * that might change the tree layout. >> + */ >> + >> +#define USE_THE_INDEX_COMPATIBILITY_MACROS >> +#include "cache.h" >> +#include "builtin.h" >> +#include "lockfile.h" >> +#include "merge-strategies.h" >> + >> +static const char builtin_merge_one_file_usage[] = >> + "git merge-one-file <orig blob> <our blob> <their blob> <path> " >> + "<orig mode> <our mode> <their mode>\n\n" >> + "Blob ids and modes should be empty for missing files."; >> + >> +static int read_mode(const char *name, const char *arg, unsigned int *mode) >> +{ >> + char *last; >> + int ret = 0; >> + >> + *mode = strtol(arg, &last, 8); >> + >> + if (*last) >> + ret = error(_("invalid '%s' mode: expected nothing, got '%c'"), name, *last); >> + else if (!(S_ISREG(*mode) || S_ISDIR(*mode) || S_ISLNK(*mode))) >> + ret = error(_("invalid '%s' mode: %o"), name, *mode); >> + >> + return ret; >> +} >> + >> +int cmd_merge_one_file(int argc, const char **argv, const char *prefix) >> +{ >> + struct object_id orig_blob, our_blob, their_blob, >> + *p_orig_blob = NULL, *p_our_blob = NULL, *p_their_blob = NULL; >> + unsigned int orig_mode = 0, our_mode = 0, their_mode = 0, ret = 0; >> + struct lock_file lock = LOCK_INIT; >> + >> + if (argc != 8) >> + usage(builtin_merge_one_file_usage); >> + >> + if (read_cache() < 0) >> + die("invalid index"); >> + >> + hold_locked_index(&lock, LOCK_DIE_ON_ERROR); >> + >> + if (!get_oid(argv[1], &orig_blob)) { >> + p_orig_blob = &orig_blob; >> + ret = read_mode("orig", argv[5], &orig_mode); >> + } > > argv[1] is defined as "either the object name of the blob in the > common ancestor, or an empty string". So you need to distinguish > three cases here, but you are only catching two. > > - argv[1] is an empty string; p_orig_blob can legitimately be left > NULL. > > - argv[1] is a valid blob object name. orig_blob should be > populated and p_orig_blob should point at it. > > - argv[1] is garbage, names a non-blob object, or there is no such > object with that name. Don't we want to catch it as a mistake? > > Also, when argv[1] is an empty string, argv[5] must also be an empty > string, or we got a wrong input---don't we want to catch it as a > mistake? > > The third case needs a bit of thought. For example, if $1 and $2 > are the same and points at a non-existent object, we know we won't > care because we only care about $3. In a lazily-cloned repository, > that may matter---we would not want to fail even if we not have blob > $1 and $2, as long as they are reasonably spelled a full hexadecimal > object name. But we would want to fail if blob object named by $3 > is missing. > > One way to achieve semantics closer to the above than the posted > patch may be to tighten the parsing. Instead of using "anything > goes" get_oid(), use get_oid_hex(), perhaps. > >> + if (!get_oid(argv[2], &our_blob)) { >> + p_our_blob = &our_blob; >> + ret = read_mode("our", argv[6], &our_mode); >> + } >> + >> + if (!get_oid(argv[3], &their_blob)) { >> + p_their_blob = &their_blob; >> + ret = read_mode("their", argv[7], &their_mode); >> + } >> + >> + if (ret) >> + return ret; >> + >> + ret = merge_strategies_one_file(the_repository, >> + p_orig_blob, p_our_blob, p_their_blob, argv[4], >> + orig_mode, our_mode, their_mode); > > That's a funny function name. It's not like the function will be > taught different strategy to handle the three-way merge, no? It > probably makes sense to name it after what it does, which is "three > way merge". > Okay. There's already a function called threeway_merge() in unpack_trees() that does something different. merge_strategies_threeway() should be good? >> + if (ret) { >> + rollback_lock_file(&lock); >> + return !!ret; >> + } >> + >> + return write_locked_index(&the_index, &lock, COMMIT_LOCK); >> +} > >> diff --git a/merge-strategies.c b/merge-strategies.c >> new file mode 100644 >> index 0000000000..bbe6f48698 >> --- /dev/null >> +++ b/merge-strategies.c >> @@ -0,0 +1,214 @@ >> +#include "cache.h" >> +#include "dir.h" >> +#include "ll-merge.h" >> +#include "merge-strategies.h" >> +#include "xdiff-interface.h" >> + > >> +static int add_to_index_cacheinfo(struct index_state *istate, >> + unsigned int mode, >> + const struct object_id *oid, const char *path) >> +{ >> + struct cache_entry *ce; >> + int len, option; >> + >> + if (!verify_path(path, mode)) >> + return error(_("Invalid path '%s'"), path); >> + >> + len = strlen(path); >> + ce = make_empty_cache_entry(istate, len); >> + >> + oidcpy(&ce->oid, oid); >> + memcpy(ce->name, path, len); >> + ce->ce_flags = create_ce_flags(0); >> + ce->ce_namelen = len; >> + ce->ce_mode = create_ce_mode(mode); >> + if (assume_unchanged) >> + ce->ce_flags |= CE_VALID; >> + option = ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE; >> + if (add_index_entry(istate, ce, option)) >> + return error(_("%s: cannot add to the index"), path); >> + >> + return 0; >> +} > > The above correctly does 'git update-index --add --cacheinfo "$6" > "$2" "$4"' but don't copy-and-paste existing code to do so. Add one > preliminary patch before everything else in the series to massage > and extract add_cacheinfo() function out of builtin/update-index.c, > move it to somewhere common like read-cache.c and so that we can > call it from here. > Hmm, I’d really like to do this, but I have one remark/question about it. In builtin/update-index.c, when add_cache_entry() fails, this message is printed: cannot add to the index - missing --add option? Obviously, this is not what we want to show in git-merge when add_index_entry() fails. But then, verify_path() can also fail, and will show a sensible message for any situation: Invalid path '%s' Should I return error when verify_path() fails, but eg. -2 in the case of add_index_entry(), and if this new add_cacheinfo() returns -2 but not -1, print the correct message? Or let the caller verify the path so it cannot fail because of this? >> +static int checkout_from_index(struct index_state *istate, const char *path) >> +{ >> + struct checkout state = CHECKOUT_INIT; >> + struct cache_entry *ce; >> + >> + state.istate = istate; >> + state.force = 1; >> + state.base_dir = ""; >> + state.base_dir_len = 0; >> + >> + ce = index_file_exists(istate, path, strlen(path), 0); > > This call is unfortunate for the reasons I mention later. > > But if you must have this call, then you need to sanity check what > you get from index_file_exists(). ce must be a merged cache entry, > so > > if (!ce || ce_stage(ce)) > BUG(...); > That’s ok, I managed to remove it following your advice. >> + if (checkout_entry(ce, &state, NULL, NULL) < 0) >> + return error(_("%s: cannot checkout file"), path); >> + return 0; >> +} >> + >> +static int merge_one_file_deleted(struct index_state *istate, >> + const struct object_id *orig_blob, >> + const struct object_id *our_blob, >> + const struct object_id *their_blob, const char *path, >> + unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode) >> +{ >> + if ((our_blob && orig_mode != our_mode) || >> + (their_blob && orig_mode != their_mode)) >> + return error(_("File %s deleted on one branch but had its " >> + "permissions changed on the other."), path); >> + >> + if (our_blob) { >> + printf(_("Removing %s\n"), path); >> + >> + if (file_exists(path)) >> + remove_path(path); >> + } >> + >> + if (remove_file_from_index(istate, path)) >> + return error("%s: cannot remove from the index", path); >> + return 0; > > If the side that did not remove changed the mode, we don't silently > remove but fail and give a chance to inspect the situation to the > end user. If we had the blob and it is removed by them, we give a > message and only in that case we remove the file from the working > tree, together with any leading directory that has become empty. > > And after that we make sure that the path is no longer in the > index. The function removes entries for the path at all the stages, > which is exactly what we want. > > OK. > >> +} >> + >> +static int do_merge_one_file(struct index_state *istate, >> + const struct object_id *orig_blob, >> + const struct object_id *our_blob, >> + const struct object_id *their_blob, const char *path, >> + unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode) >> +{ >> + int ret, i, dest; >> + ssize_t written; >> + mmbuffer_t result = {NULL, 0}; >> + mmfile_t mmfs[3]; >> + struct ll_merge_options merge_opts = {0}; >> + struct cache_entry *ce; >> + >> + if (our_mode == S_IFLNK || their_mode == S_IFLNK) >> + return error(_("%s: Not merging symbolic link changes."), path); >> + else if (our_mode == S_IFGITLINK || their_mode == S_IFGITLINK) >> + return error(_("%s: Not merging conflicting submodule changes."), path); >> + >> + read_mmblob(mmfs + 1, our_blob); >> + read_mmblob(mmfs + 2, their_blob); >> + >> + if (orig_blob) { >> + printf(_("Auto-merging %s\n"), path); >> + read_mmblob(mmfs + 0, orig_blob); >> + } else { >> + printf(_("Added %s in both, but differently.\n"), path); >> + read_mmblob(mmfs + 0, &null_oid); >> + } >> + >> + merge_opts.xdl_opts = XDL_MERGE_ZEALOUS_ALNUM; >> + ret = ll_merge(&result, path, >> + mmfs + 0, "orig", >> + mmfs + 1, "our", >> + mmfs + 2, "their", >> + istate, &merge_opts); > > Is it correct to call into ll_merge() here? The original used to > call "git merge-file" which called into xdl_merge(). Calling into > ll_merge() means the path is used to look up the attributes and use > the custom merge driver, which I am not offhand sure is what we want > to see at this low level (and if it turns out to be a good idea, we > definitely should explain the change of semantics in the proposed > log message for this commit). > >> + for (i = 0; i < 3; i++) >> + free(mmfs[i].ptr); >> + >> + if (ret < 0) { >> + free(result.ptr); >> + return error(_("Failed to execute internal merge")); >> + } >> + >> + /* >> + * Create the working tree file, using "our tree" version from >> + * the index, and then store the result of the merge. >> + */ > > The above is copied from the original, to explain what it did after > the comment, but it does not seem to match what the new code does. > >> + ce = index_file_exists(istate, path, strlen(path), 0); >> + if (!ce) >> + BUG("file is not present in the cache?"); >> + >> + unlink(path); >> + if ((dest = open(path, O_WRONLY | O_CREAT, ce->ce_mode)) < 0) { >> + free(result.ptr); >> + return error_errno(_("failed to open file '%s'"), path); >> + } >> + >> + written = write_in_full(dest, result.ptr, result.size); >> + close(dest); >> + >> + free(result.ptr); >> + >> + if (written < 0) >> + return error_errno(_("failed to write to '%s'"), path); >> + > > This open(..., ce->ce_mode) call is way insufficient. > > The comment we have above this part of the code talks about the > difficulty of doing this correctly in scripted version. Creating a > file by 'git checkout-index -f --stage=2 -- "$4"' and reusing it to > store the merged contents was the cleanest and easiest way without > having direct access to adjust_shared_perm() to create a working > tree file with the correct permission bits. > > We are writing in C, so we should be able to do much better than the > scripted version, as we can later call adjust_shared_perm(). > I'm not sure I understand the issue correctly. Is this because I fetch an entry from the index to have the mode of the file, instead of using `our_mode'? So I should move the error handling of ll_merge()/xdl_merge() and the detection of the permission conflict before writing in the file, and call open(…, our_mode)? I'm also not sure why we need adjust_shared_perm() here. >> + if (ret != 0 || !orig_blob) >> + ret = error(_("content conflict in %s"), path); >> + if (our_mode != their_mode) >> + return error(_("permission conflict: %o->%o,%o in %s"), >> + orig_mode, our_mode, their_mode, path); >> + if (ret) >> + return -1; >> + >> + return add_file_to_index(istate, path, 0); >> +} >> + >> +int merge_strategies_one_file(struct repository *r, >> + const struct object_id *orig_blob, >> + const struct object_id *our_blob, >> + const struct object_id *their_blob, const char *path, >> + unsigned int orig_mode, unsigned int our_mode, >> + unsigned int their_mode) >> +{ > > In a long if/else if/else if/.../else cascade, enclose all bodies in > braces, if any one of them has a multi-statement body, to avoid > being distracting. > >> + if (orig_blob && >> + ((!their_blob && our_blob && oideq(orig_blob, our_blob)) || >> + (!our_blob && their_blob && oideq(orig_blob, their_blob)))) >> + /* Deleted in both or deleted in one and unchanged in the other. */ >> + return merge_one_file_deleted(r->index, >> + orig_blob, our_blob, their_blob, path, >> + orig_mode, our_mode, their_mode); > > OK, we've already reviewed that function. > >> + else if (!orig_blob && our_blob && !their_blob) { >> + /* >> + * Added in one. The other side did not add and we >> + * added so there is nothing to be done, except making >> + * the path merged. >> + */ >> + return add_to_index_cacheinfo(r->index, our_mode, our_blob, path); > > OK, we've already reviewed that function. > >> + } else if (!orig_blob && !our_blob && their_blob) { >> + printf(_("Adding %s\n"), path); >> + >> + if (file_exists(path)) >> + return error(_("untracked %s is overwritten by the merge."), path); >> + >> + if (add_to_index_cacheinfo(r->index, their_mode, their_blob, path)) >> + return -1; >> + return checkout_from_index(r->index, path); > > You did "add_to_index_cacheinfo()", so you MUST know which ce is to > be checked out. > > Consider if it is worth to teach add_to_index_cacheinfo() to give > you ce back and pass it to checkout_from_index(); that way, you do > not have to call index_file_exists() based on path in the function. > OK, this is doable. >> + } else if (!orig_blob && our_blob && their_blob && >> + oideq(our_blob, their_blob)) { >> + /* Added in both, identically (check for same permissions). */ >> + if (our_mode != their_mode) >> + return error(_("File %s added identically in both branches, " >> + "but permissions conflict %o->%o."), >> + path, our_mode, their_mode); >> + >> + printf(_("Adding %s\n"), path); >> + >> + if (add_to_index_cacheinfo(r->index, our_mode, our_blob, path)) >> + return -1; >> + return checkout_from_index(r->index, path); > > Ditto. > >> + } else if (our_blob && their_blob) >> + /* Modified in both, but differently. */ >> + return do_merge_one_file(r->index, >> + orig_blob, our_blob, their_blob, path, >> + orig_mode, our_mode, their_mode); >> + else { >> + char orig_hex[GIT_MAX_HEXSZ] = {0}, our_hex[GIT_MAX_HEXSZ] = {0}, >> + their_hex[GIT_MAX_HEXSZ] = {0}; >> + >> + if (orig_blob) >> + oid_to_hex_r(orig_hex, orig_blob); >> + if (our_blob) >> + oid_to_hex_r(our_hex, our_blob); >> + if (their_blob) >> + oid_to_hex_r(their_hex, their_blob); >> + >> + return error(_("%s: Not handling case %s -> %s -> %s"), >> + path, orig_hex, our_hex, their_hex); >> + } >> + >> + return 0; >> +} > > I can see that this does go in the right direction. With a bit more > attention to details it would soon be production-ready quality. > > Thanks. > Thank you, Alban
Alban Gruin <alban.gruin@gmail.com> writes: >>> + /* >>> + * Create the working tree file, using "our tree" version from >>> + * the index, and then store the result of the merge. >>> + */ >> >> The above is copied from the original, to explain what it did after >> the comment, but it does not seem to match what the new code does. >> >>> + ce = index_file_exists(istate, path, strlen(path), 0); >>> + if (!ce) >>> + BUG("file is not present in the cache?"); >>> + >>> + unlink(path); >>> + if ((dest = open(path, O_WRONLY | O_CREAT, ce->ce_mode)) < 0) { >>> + free(result.ptr); >>> + return error_errno(_("failed to open file '%s'"), path); >>> + } >>> + >>> + written = write_in_full(dest, result.ptr, result.size); >>> + close(dest); >>> + >>> + free(result.ptr); >>> + >>> + if (written < 0) >>> + return error_errno(_("failed to write to '%s'"), path); >>> + >> >> This open(..., ce->ce_mode) call is way insufficient. >> >> The comment we have above this part of the code talks about the >> difficulty of doing this correctly in scripted version. Creating a >> file by 'git checkout-index -f --stage=2 -- "$4"' and reusing it to >> store the merged contents was the cleanest and easiest way without >> having direct access to adjust_shared_perm() to create a working >> tree file with the correct permission bits. The original that the comment applies to does this git checkout-index -f --stage=2 -- "$4" && cat "$src1" >"$4" || exit 1 to create path "$4" with the correct mode bits, instead of a naïve mv "$src1" "$4" because the filemode 'git checkout-index -f --stage=2 -- "$4"' gives to file "$4" is by definition the most correct one for the path. The command knows how user's umask and type bits in the index should interact and produce the final mode bits, but "$src1" was created without any regard to the mode bits---the 'git merge-file' command only cares about the contents and not filemode. We can even lose the executable bit that way. And preparing "$4" and then catting the computed contents into it was a roundabout way (it wastes the entire writing-out of the contents from the index), and that was what the comment was about. But all that is unnecessary once you port this to C. So the comment does not apply to the code you wrote, I think, and should just be dropped.
Alban Gruin <alban.gruin@gmail.com> writes: >>> + int ret, i, dest; >>> + ssize_t written; >>> + mmbuffer_t result = {NULL, 0}; >>> + mmfile_t mmfs[3]; >>> + struct ll_merge_options merge_opts = {0}; >>> + struct cache_entry *ce; >>> + >>> + if (our_mode == S_IFLNK || their_mode == S_IFLNK) >>> + return error(_("%s: Not merging symbolic link changes."), path); >>> + else if (our_mode == S_IFGITLINK || their_mode == S_IFGITLINK) >>> + return error(_("%s: Not merging conflicting submodule changes."), path); >>> + >>> + read_mmblob(mmfs + 1, our_blob); >>> + read_mmblob(mmfs + 2, their_blob); >>> + >>> + if (orig_blob) { >>> + printf(_("Auto-merging %s\n"), path); >>> + read_mmblob(mmfs + 0, orig_blob); >>> + } else { >>> + printf(_("Added %s in both, but differently.\n"), path); >>> + read_mmblob(mmfs + 0, &null_oid); >>> + } >>> + >>> + merge_opts.xdl_opts = XDL_MERGE_ZEALOUS_ALNUM; >>> + ret = ll_merge(&result, path, >>> + mmfs + 0, "orig", >>> + mmfs + 1, "our", >>> + mmfs + 2, "their", >>> + istate, &merge_opts); >> >> Is it correct to call into ll_merge() here? The original used to >> call "git merge-file" which called into xdl_merge(). Calling into >> ll_merge() means the path is used to look up the attributes and use >> the custom merge driver, which I am not offhand sure is what we want >> to see at this low level (and if it turns out to be a good idea, we >> definitely should explain the change of semantics in the proposed >> log message for this commit). I am still not sure if it is correct to call ll_merge() and not the xdl_merge() from here. We need to highlight this change in the log message, if we were still going to do this. Thanks.
Junio C Hamano <gitster@pobox.com> writes: >>> This open(..., ce->ce_mode) call is way insufficient. >>> ... > But all that is unnecessary once you port this to C. So the comment > does not apply to the code you wrote, I think, and should just be > dropped. Sorry, forgot to mention one thing. Using ce->ce_mode to create the output file is the way how helpers in entry.c check out paths from the index to the working tree, so the code is OK. It's just the copied comment was about the issue that your code did not even have to worry about. Thanks.
diff --git a/Makefile b/Makefile index de53954590..6dfdb33cb2 100644 --- a/Makefile +++ b/Makefile @@ -601,7 +601,6 @@ SCRIPT_SH += git-bisect.sh SCRIPT_SH += git-difftool--helper.sh SCRIPT_SH += git-filter-branch.sh SCRIPT_SH += git-merge-octopus.sh -SCRIPT_SH += git-merge-one-file.sh SCRIPT_SH += git-merge-resolve.sh SCRIPT_SH += git-mergetool.sh SCRIPT_SH += git-quiltimport.sh @@ -909,6 +908,7 @@ LIB_OBJS += match-trees.o LIB_OBJS += mem-pool.o LIB_OBJS += merge-blobs.o LIB_OBJS += merge-recursive.o +LIB_OBJS += merge-strategies.o LIB_OBJS += merge.o LIB_OBJS += mergesort.o LIB_OBJS += midx.o @@ -1094,6 +1094,7 @@ BUILTIN_OBJS += builtin/mailsplit.o BUILTIN_OBJS += builtin/merge-base.o BUILTIN_OBJS += builtin/merge-file.o BUILTIN_OBJS += builtin/merge-index.o +BUILTIN_OBJS += builtin/merge-one-file.o BUILTIN_OBJS += builtin/merge-ours.o BUILTIN_OBJS += builtin/merge-recursive.o BUILTIN_OBJS += builtin/merge-tree.o diff --git a/builtin.h b/builtin.h index 53fb290963..4d2cd78856 100644 --- a/builtin.h +++ b/builtin.h @@ -178,6 +178,7 @@ int cmd_merge_base(int argc, const char **argv, const char *prefix); int cmd_merge_index(int argc, const char **argv, const char *prefix); int cmd_merge_ours(int argc, const char **argv, const char *prefix); int cmd_merge_file(int argc, const char **argv, const char *prefix); +int cmd_merge_one_file(int argc, const char **argv, const char *prefix); int cmd_merge_recursive(int argc, const char **argv, const char *prefix); int cmd_merge_tree(int argc, const char **argv, const char *prefix); int cmd_mktag(int argc, const char **argv, const char *prefix); diff --git a/builtin/merge-one-file.c b/builtin/merge-one-file.c new file mode 100644 index 0000000000..598338ba16 --- /dev/null +++ b/builtin/merge-one-file.c @@ -0,0 +1,92 @@ +/* + * Builtin "git merge-one-file" + * + * Copyright (c) 2020 Alban Gruin + * + * Based on git-merge-one-file.sh, written by Linus Torvalds. + * + * This is the git per-file merge utility, called with + * + * argv[1] - original file SHA1 (or empty) + * argv[2] - file in branch1 SHA1 (or empty) + * argv[3] - file in branch2 SHA1 (or empty) + * argv[4] - pathname in repository + * argv[5] - original file mode (or empty) + * argv[6] - file in branch1 mode (or empty) + * argv[7] - file in branch2 mode (or empty) + * + * Handle some trivial cases. The _really_ trivial cases have been + * handled already by git read-tree, but that one doesn't do any merges + * that might change the tree layout. + */ + +#define USE_THE_INDEX_COMPATIBILITY_MACROS +#include "cache.h" +#include "builtin.h" +#include "lockfile.h" +#include "merge-strategies.h" + +static const char builtin_merge_one_file_usage[] = + "git merge-one-file <orig blob> <our blob> <their blob> <path> " + "<orig mode> <our mode> <their mode>\n\n" + "Blob ids and modes should be empty for missing files."; + +static int read_mode(const char *name, const char *arg, unsigned int *mode) +{ + char *last; + int ret = 0; + + *mode = strtol(arg, &last, 8); + + if (*last) + ret = error(_("invalid '%s' mode: expected nothing, got '%c'"), name, *last); + else if (!(S_ISREG(*mode) || S_ISDIR(*mode) || S_ISLNK(*mode))) + ret = error(_("invalid '%s' mode: %o"), name, *mode); + + return ret; +} + +int cmd_merge_one_file(int argc, const char **argv, const char *prefix) +{ + struct object_id orig_blob, our_blob, their_blob, + *p_orig_blob = NULL, *p_our_blob = NULL, *p_their_blob = NULL; + unsigned int orig_mode = 0, our_mode = 0, their_mode = 0, ret = 0; + struct lock_file lock = LOCK_INIT; + + if (argc != 8) + usage(builtin_merge_one_file_usage); + + if (read_cache() < 0) + die("invalid index"); + + hold_locked_index(&lock, LOCK_DIE_ON_ERROR); + + if (!get_oid(argv[1], &orig_blob)) { + p_orig_blob = &orig_blob; + ret = read_mode("orig", argv[5], &orig_mode); + } + + if (!get_oid(argv[2], &our_blob)) { + p_our_blob = &our_blob; + ret = read_mode("our", argv[6], &our_mode); + } + + if (!get_oid(argv[3], &their_blob)) { + p_their_blob = &their_blob; + ret = read_mode("their", argv[7], &their_mode); + } + + if (ret) + return ret; + + ret = merge_strategies_one_file(the_repository, + p_orig_blob, p_our_blob, p_their_blob, argv[4], + orig_mode, our_mode, their_mode); + + if (ret) { + rollback_lock_file(&lock); + return !!ret; + } + + return write_locked_index(&the_index, &lock, COMMIT_LOCK); +} diff --git a/git-merge-one-file.sh b/git-merge-one-file.sh deleted file mode 100755 index f6d9852d2f..0000000000 --- a/git-merge-one-file.sh +++ /dev/null @@ -1,167 +0,0 @@ -#!/bin/sh -# -# Copyright (c) Linus Torvalds, 2005 -# -# This is the git per-file merge script, called with -# -# $1 - original file SHA1 (or empty) -# $2 - file in branch1 SHA1 (or empty) -# $3 - file in branch2 SHA1 (or empty) -# $4 - pathname in repository -# $5 - original file mode (or empty) -# $6 - file in branch1 mode (or empty) -# $7 - file in branch2 mode (or empty) -# -# Handle some trivial cases.. The _really_ trivial cases have -# been handled already by git read-tree, but that one doesn't -# do any merges that might change the tree layout. - -USAGE='<orig blob> <our blob> <their blob> <path>' -USAGE="$USAGE <orig mode> <our mode> <their mode>" -LONG_USAGE="usage: git merge-one-file $USAGE - -Blob ids and modes should be empty for missing files." - -SUBDIRECTORY_OK=Yes -. git-sh-setup -cd_to_toplevel -require_work_tree - -if test $# != 7 -then - echo "$LONG_USAGE" - exit 1 -fi - -case "${1:-.}${2:-.}${3:-.}" in -# -# Deleted in both or deleted in one and unchanged in the other -# -"$1.." | "$1.$1" | "$1$1.") - if { test -z "$6" && test "$5" != "$7"; } || - { test -z "$7" && test "$5" != "$6"; } - then - echo "ERROR: File $4 deleted on one branch but had its" >&2 - echo "ERROR: permissions changed on the other." >&2 - exit 1 - fi - - if test -n "$2" - then - echo "Removing $4" - else - # read-tree checked that index matches HEAD already, - # so we know we do not have this path tracked. - # there may be an unrelated working tree file here, - # which we should just leave unmolested. Make sure - # we do not have it in the index, though. - exec git update-index --remove -- "$4" - fi - if test -f "$4" - then - rm -f -- "$4" && - rmdir -p "$(expr "z$4" : 'z\(.*\)/')" 2>/dev/null || : - fi && - exec git update-index --remove -- "$4" - ;; - -# -# Added in one. -# -".$2.") - # the other side did not add and we added so there is nothing - # to be done, except making the path merged. - exec git update-index --add --cacheinfo "$6" "$2" "$4" - ;; -"..$3") - echo "Adding $4" - if test -f "$4" - then - echo "ERROR: untracked $4 is overwritten by the merge." >&2 - exit 1 - fi - git update-index --add --cacheinfo "$7" "$3" "$4" && - exec git checkout-index -u -f -- "$4" - ;; - -# -# Added in both, identically (check for same permissions). -# -".$3$2") - if test "$6" != "$7" - then - echo "ERROR: File $4 added identically in both branches," >&2 - echo "ERROR: but permissions conflict $6->$7." >&2 - exit 1 - fi - echo "Adding $4" - git update-index --add --cacheinfo "$6" "$2" "$4" && - exec git checkout-index -u -f -- "$4" - ;; - -# -# Modified in both, but differently. -# -"$1$2$3" | ".$2$3") - - case ",$6,$7," in - *,120000,*) - echo "ERROR: $4: Not merging symbolic link changes." >&2 - exit 1 - ;; - *,160000,*) - echo "ERROR: $4: Not merging conflicting submodule changes." >&2 - exit 1 - ;; - esac - - src1=$(git unpack-file $2) - src2=$(git unpack-file $3) - case "$1" in - '') - echo "Added $4 in both, but differently." - orig=$(git unpack-file $(git hash-object /dev/null)) - ;; - *) - echo "Auto-merging $4" - orig=$(git unpack-file $1) - ;; - esac - - git merge-file "$src1" "$orig" "$src2" - ret=$? - msg= - if test $ret != 0 || test -z "$1" - then - msg='content conflict' - ret=1 - fi - - # Create the working tree file, using "our tree" version from the - # index, and then store the result of the merge. - git checkout-index -f --stage=2 -- "$4" && cat "$src1" >"$4" || exit 1 - rm -f -- "$orig" "$src1" "$src2" - - if test "$6" != "$7" - then - if test -n "$msg" - then - msg="$msg, " - fi - msg="${msg}permissions conflict: $5->$6,$7" - ret=1 - fi - - if test $ret != 0 - then - echo "ERROR: $msg in $4" >&2 - exit 1 - fi - exec git update-index -- "$4" - ;; - -*) - echo "ERROR: $4: Not handling case $1 -> $2 -> $3" >&2 - ;; -esac -exit 1 diff --git a/git.c b/git.c index f1e8b56d99..a4d3f98094 100644 --- a/git.c +++ b/git.c @@ -540,6 +540,7 @@ static struct cmd_struct commands[] = { { "merge-file", cmd_merge_file, RUN_SETUP_GENTLY }, { "merge-index", cmd_merge_index, RUN_SETUP | NO_PARSEOPT }, { "merge-ours", cmd_merge_ours, RUN_SETUP | NO_PARSEOPT }, + { "merge-one-file", cmd_merge_one_file, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT }, { "merge-recursive", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT }, { "merge-recursive-ours", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT }, { "merge-recursive-theirs", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT }, diff --git a/merge-strategies.c b/merge-strategies.c new file mode 100644 index 0000000000..bbe6f48698 --- /dev/null +++ b/merge-strategies.c @@ -0,0 +1,214 @@ +#include "cache.h" +#include "dir.h" +#include "ll-merge.h" +#include "merge-strategies.h" +#include "xdiff-interface.h" + +static int add_to_index_cacheinfo(struct index_state *istate, + unsigned int mode, + const struct object_id *oid, const char *path) +{ + struct cache_entry *ce; + int len, option; + + if (!verify_path(path, mode)) + return error(_("Invalid path '%s'"), path); + + len = strlen(path); + ce = make_empty_cache_entry(istate, len); + + oidcpy(&ce->oid, oid); + memcpy(ce->name, path, len); + ce->ce_flags = create_ce_flags(0); + ce->ce_namelen = len; + ce->ce_mode = create_ce_mode(mode); + if (assume_unchanged) + ce->ce_flags |= CE_VALID; + option = ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE; + if (add_index_entry(istate, ce, option)) + return error(_("%s: cannot add to the index"), path); + + return 0; +} + +static int checkout_from_index(struct index_state *istate, const char *path) +{ + struct checkout state = CHECKOUT_INIT; + struct cache_entry *ce; + + state.istate = istate; + state.force = 1; + state.base_dir = ""; + state.base_dir_len = 0; + + ce = index_file_exists(istate, path, strlen(path), 0); + if (checkout_entry(ce, &state, NULL, NULL) < 0) + return error(_("%s: cannot checkout file"), path); + return 0; +} + +static int merge_one_file_deleted(struct index_state *istate, + const struct object_id *orig_blob, + const struct object_id *our_blob, + const struct object_id *their_blob, const char *path, + unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode) +{ + if ((our_blob && orig_mode != our_mode) || + (their_blob && orig_mode != their_mode)) + return error(_("File %s deleted on one branch but had its " + "permissions changed on the other."), path); + + if (our_blob) { + printf(_("Removing %s\n"), path); + + if (file_exists(path)) + remove_path(path); + } + + if (remove_file_from_index(istate, path)) + return error("%s: cannot remove from the index", path); + return 0; +} + +static int do_merge_one_file(struct index_state *istate, + const struct object_id *orig_blob, + const struct object_id *our_blob, + const struct object_id *their_blob, const char *path, + unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode) +{ + int ret, i, dest; + ssize_t written; + mmbuffer_t result = {NULL, 0}; + mmfile_t mmfs[3]; + struct ll_merge_options merge_opts = {0}; + struct cache_entry *ce; + + if (our_mode == S_IFLNK || their_mode == S_IFLNK) + return error(_("%s: Not merging symbolic link changes."), path); + else if (our_mode == S_IFGITLINK || their_mode == S_IFGITLINK) + return error(_("%s: Not merging conflicting submodule changes."), path); + + read_mmblob(mmfs + 1, our_blob); + read_mmblob(mmfs + 2, their_blob); + + if (orig_blob) { + printf(_("Auto-merging %s\n"), path); + read_mmblob(mmfs + 0, orig_blob); + } else { + printf(_("Added %s in both, but differently.\n"), path); + read_mmblob(mmfs + 0, &null_oid); + } + + merge_opts.xdl_opts = XDL_MERGE_ZEALOUS_ALNUM; + ret = ll_merge(&result, path, + mmfs + 0, "orig", + mmfs + 1, "our", + mmfs + 2, "their", + istate, &merge_opts); + + for (i = 0; i < 3; i++) + free(mmfs[i].ptr); + + if (ret < 0) { + free(result.ptr); + return error(_("Failed to execute internal merge")); + } + + /* + * Create the working tree file, using "our tree" version from + * the index, and then store the result of the merge. + */ + ce = index_file_exists(istate, path, strlen(path), 0); + if (!ce) + BUG("file is not present in the cache?"); + + unlink(path); + if ((dest = open(path, O_WRONLY | O_CREAT, ce->ce_mode)) < 0) { + free(result.ptr); + return error_errno(_("failed to open file '%s'"), path); + } + + written = write_in_full(dest, result.ptr, result.size); + close(dest); + + free(result.ptr); + + if (written < 0) + return error_errno(_("failed to write to '%s'"), path); + + if (ret != 0 || !orig_blob) + ret = error(_("content conflict in %s"), path); + if (our_mode != their_mode) + return error(_("permission conflict: %o->%o,%o in %s"), + orig_mode, our_mode, their_mode, path); + if (ret) + return -1; + + return add_file_to_index(istate, path, 0); +} + +int merge_strategies_one_file(struct repository *r, + const struct object_id *orig_blob, + const struct object_id *our_blob, + const struct object_id *their_blob, const char *path, + unsigned int orig_mode, unsigned int our_mode, + unsigned int their_mode) +{ + if (orig_blob && + ((!their_blob && our_blob && oideq(orig_blob, our_blob)) || + (!our_blob && their_blob && oideq(orig_blob, their_blob)))) + /* Deleted in both or deleted in one and unchanged in the other. */ + return merge_one_file_deleted(r->index, + orig_blob, our_blob, their_blob, path, + orig_mode, our_mode, their_mode); + else if (!orig_blob && our_blob && !their_blob) { + /* + * Added in one. The other side did not add and we + * added so there is nothing to be done, except making + * the path merged. + */ + return add_to_index_cacheinfo(r->index, our_mode, our_blob, path); + } else if (!orig_blob && !our_blob && their_blob) { + printf(_("Adding %s\n"), path); + + if (file_exists(path)) + return error(_("untracked %s is overwritten by the merge."), path); + + if (add_to_index_cacheinfo(r->index, their_mode, their_blob, path)) + return -1; + return checkout_from_index(r->index, path); + } else if (!orig_blob && our_blob && their_blob && + oideq(our_blob, their_blob)) { + /* Added in both, identically (check for same permissions). */ + if (our_mode != their_mode) + return error(_("File %s added identically in both branches, " + "but permissions conflict %o->%o."), + path, our_mode, their_mode); + + printf(_("Adding %s\n"), path); + + if (add_to_index_cacheinfo(r->index, our_mode, our_blob, path)) + return -1; + return checkout_from_index(r->index, path); + } else if (our_blob && their_blob) + /* Modified in both, but differently. */ + return do_merge_one_file(r->index, + orig_blob, our_blob, their_blob, path, + orig_mode, our_mode, their_mode); + else { + char orig_hex[GIT_MAX_HEXSZ] = {0}, our_hex[GIT_MAX_HEXSZ] = {0}, + their_hex[GIT_MAX_HEXSZ] = {0}; + + if (orig_blob) + oid_to_hex_r(orig_hex, orig_blob); + if (our_blob) + oid_to_hex_r(our_hex, our_blob); + if (their_blob) + oid_to_hex_r(their_hex, their_blob); + + return error(_("%s: Not handling case %s -> %s -> %s"), + path, orig_hex, our_hex, their_hex); + } + + return 0; +} diff --git a/merge-strategies.h b/merge-strategies.h new file mode 100644 index 0000000000..b527d145c7 --- /dev/null +++ b/merge-strategies.h @@ -0,0 +1,13 @@ +#ifndef MERGE_STRATEGIES_H +#define MERGE_STRATEGIES_H + +#include "object.h" + +int merge_strategies_one_file(struct repository *r, + const struct object_id *orig_blob, + const struct object_id *our_blob, + const struct object_id *their_blob, const char *path, + unsigned int orig_mode, unsigned int our_mode, + unsigned int their_mode); + +#endif /* MERGE_STRATEGIES_H */ diff --git a/t/t6415-merge-dir-to-symlink.sh b/t/t6415-merge-dir-to-symlink.sh index 2eddcc7664..5fb74e39a0 100755 --- a/t/t6415-merge-dir-to-symlink.sh +++ b/t/t6415-merge-dir-to-symlink.sh @@ -94,7 +94,7 @@ test_expect_success SYMLINKS 'a/b was resolved as symlink' ' test -h a/b ' -test_expect_failure 'do not lose untracked in merge (resolve)' ' +test_expect_success 'do not lose untracked in merge (resolve)' ' git reset --hard && git checkout baseline^0 && >a/b/c/e &&
This rewrites `git merge-one-file' from shell to C. This port is not completely straightforward: to save precious cycles by avoiding reading and flushing the index repeatedly, write temporary files when an operation can be performed in-memory, or allow other function to use the rewrite without forking nor worrying about the index, the calls to external processes are replaced by calls to functions in libgit.a: - calls to `update-index --add --cacheinfo' are replaced by calls to add_cache_entry(); - calls to `update-index --remove' are replaced by calls to remove_file_from_cache(); - calls to `checkout-index -u -f' are replaced by calls to checkout_entry(); - calls to `unpack-file' and `merge-files' are replaced by calls to read_mmblob() and ll_merge(), respectively, to merge files in-memory; - calls to `checkout-index -f --stage=2' are replaced by calls to cache_file_exists(); - calls to `update-index' are replaced by calls to add_file_to_cache(). The bulk of the rewrite is done in a new file in libgit.a, merge-strategies.c. This will enable the resolve and octopus strategies to directly call it instead of forking. This also fixes a bug present in the original script: instead of checking if a _regular_ file exists when a file exists in the branch to merge, but not in our branch, the rewritten version checks if a file of any kind (ie. a directory, ...) exists. This fixes the tests t6035.14, where the branch to merge had a new file, `a/b', but our branch had a directory there; it should have failed because a directory exists, but it did not because there was no regular file called `a/b'. This test is now marked as successful. Signed-off-by: Alban Gruin <alban.gruin@gmail.com> --- Makefile | 3 +- builtin.h | 1 + builtin/merge-one-file.c | 92 ++++++++++++++ git-merge-one-file.sh | 167 ------------------------- git.c | 1 + merge-strategies.c | 214 ++++++++++++++++++++++++++++++++ merge-strategies.h | 13 ++ t/t6415-merge-dir-to-symlink.sh | 2 +- 8 files changed, 324 insertions(+), 169 deletions(-) create mode 100644 builtin/merge-one-file.c delete mode 100755 git-merge-one-file.sh create mode 100644 merge-strategies.c create mode 100644 merge-strategies.h