Message ID | 20201124115315.13311-7-alban.gruin@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Rewrite the remaining merge strategies from shell to C | expand |
On 11/24/2020 6:53 AM, Alban Gruin wrote: > + > pgm = argv[i++]; > + setup_work_tree(); > + > + if (!strcmp(pgm, "git-merge-one-file")) { This stood out to me as possibly fragile. What if we call the non-dashed form "git merge-one-file"? Shouldn't we be doing so? Or, is this something that is handled higher in the builtin machinery to take the non-dashed version and change it to the dashed version for historical reasons? > + merge_action = merge_one_file_func; > + hold_locked_index(&lock, LOCK_DIE_ON_ERROR); > + } else { > + merge_action = merge_one_file_spawn; > + data = (void *)pgm; > + } > + ... > + if (merge_action == merge_one_file_func) { nit: This made me think it would be better to check the 'lock' itself to see if it was initialized or not. Perhaps if (lock.tempfile) { would be the appropriate way to check this? For now, this is equivalent behavior, but it might be helpful if we add more cases that take the lock in the future. > + if (err) { > + rollback_lock_file(&lock); > + return err; > + } > + > + return write_locked_index(&the_index, &lock, COMMIT_LOCK); > } > return err; nit: this could be simplified. In total, I recommend: if (lock.tempfile) { if (err) rollback_lock_file(&lock); else return write_locked_index(&the_index, &lock, COMMIT_LOCK); } return err; > } > diff --git a/merge-strategies.c b/merge-strategies.c > index 6f27e66dfe..542cefcf3d 100644 > --- a/merge-strategies.c > +++ b/merge-strategies.c > @@ -178,6 +178,18 @@ int merge_three_way(struct repository *r, > return 0; > } > > +int merge_one_file_func(struct repository *r, > + const struct object_id *orig_blob, > + const struct object_id *our_blob, > + const struct object_id *their_blob, const char *path, > + unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode, > + void *data) > +{ > + return merge_three_way(r, > + orig_blob, our_blob, their_blob, path, > + orig_mode, our_mode, their_mode); > +} > + Again, I don't recommend making this callback in the library. Instead, keep it in the builtin and then use merge_three_way() which is in the library. > int merge_one_file_spawn(struct repository *r, > const struct object_id *orig_blob, > const struct object_id *our_blob, > @@ -261,17 +273,22 @@ int merge_all_index(struct repository *r, int oneshot, int quiet, > merge_fn fn, void *data) > { > int err = 0, ret; > - unsigned int i; > + unsigned int i, prev_nr; > > for (i = 0; i < r->index->cache_nr; i++) { > const struct cache_entry *ce = r->index->cache[i]; > if (!ce_stage(ce)) > continue; > > + prev_nr = r->index->cache_nr; > ret = merge_entry(r, quiet || oneshot, i, ce->name, &err, fn, data); > - if (ret > 0) > - i += ret - 1; > - else if (ret == -1) > + if (ret > 0) { > + /* Don't bother handling an index that has > + grown, since merge_one_file_func() can't grow > + it, and merge_one_file_spawn() can't change > + it. */ multi-line comment style is as follows: /* * Don't bother handling an index that has * grown, since merge_one_file_func() can't grow * it, and merge_one_file_spawn() can't change it. */ Thanks, -Stolee
On Tue, 5 Jan 2021 at 17:13, Derrick Stolee <stolee@gmail.com> wrote: > > On 11/24/2020 6:53 AM, Alban Gruin wrote: > > + if (merge_action == merge_one_file_func) { > > nit: This made me think it would be better to check the 'lock' > itself to see if it was initialized or not. Perhaps > > if (lock.tempfile) { > > would be the appropriate way to check this? > nit: this could be simplified. In total, I recommend: > > if (lock.tempfile) { > if (err) > rollback_lock_file(&lock); > else > return write_locked_index(&the_index, &lock, COMMIT_LOCK); > } > return err; FWIW, I also find that way of writing it easier to grok. Although, rather than peeking at `lock.tempfile`, I suggest using `is_lock_file_locked(&lock)`. Martin
Le 05/01/2021 à 17:11, Derrick Stolee a écrit : > On 11/24/2020 6:53 AM, Alban Gruin wrote: >> + >> pgm = argv[i++]; >> + setup_work_tree(); >> + >> + if (!strcmp(pgm, "git-merge-one-file")) { > > This stood out to me as possibly fragile. What if we call the > non-dashed form "git merge-one-file"? Shouldn't we be doing so? > > Or, is this something that is handled higher in the builtin > machinery to take the non-dashed version and change it to the > dashed version for historical reasons? > We had the same discussion with Phillip, who pointed out this previous discussion about this topic: https://lore.kernel.org/git/xmqqblv5kr9u.fsf@gitster-ct.c.googlers.com/ So, it's probably OK to do that. >> + merge_action = merge_one_file_func; >> + hold_locked_index(&lock, LOCK_DIE_ON_ERROR); >> + } else { >> + merge_action = merge_one_file_spawn; >> + data = (void *)pgm; >> + } >> + > > ... > >> + if (merge_action == merge_one_file_func) { > > nit: This made me think it would be better to check the 'lock' > itself to see if it was initialized or not. Perhaps > > if (lock.tempfile) { > > would be the appropriate way to check this? > > For now, this is equivalent behavior, but it might be helpful if > we add more cases that take the lock in the future. > >> + if (err) { >> + rollback_lock_file(&lock); >> + return err; >> + } >> + >> + return write_locked_index(&the_index, &lock, COMMIT_LOCK); >> } >> return err; > > nit: this could be simplified. In total, I recommend: > > if (lock.tempfile) { > if (err) > rollback_lock_file(&lock); > else > return write_locked_index(&the_index, &lock, COMMIT_LOCK); > } > return err; > Sure, looks better than mine. :) > >> } >> diff --git a/merge-strategies.c b/merge-strategies.c >> index 6f27e66dfe..542cefcf3d 100644 >> --- a/merge-strategies.c >> +++ b/merge-strategies.c >> @@ -178,6 +178,18 @@ int merge_three_way(struct repository *r, >> return 0; >> } >> >> +int merge_one_file_func(struct repository *r, >> + const struct object_id *orig_blob, >> + const struct object_id *our_blob, >> + const struct object_id *their_blob, const char *path, >> + unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode, >> + void *data) >> +{ >> + return merge_three_way(r, >> + orig_blob, our_blob, their_blob, path, >> + orig_mode, our_mode, their_mode); >> +} >> + > > Again, I don't recommend making this callback in the library. Instead, keep > it in the builtin and then use merge_three_way() which is in the library. > This is not possible with this callback, as it will be used later by merge_strategies_resolve() and indirectly by merge_strategies_octopus(). >> int merge_one_file_spawn(struct repository *r, >> const struct object_id *orig_blob, >> const struct object_id *our_blob, >> @@ -261,17 +273,22 @@ int merge_all_index(struct repository *r, int oneshot, int quiet, >> merge_fn fn, void *data) >> { >> int err = 0, ret; >> - unsigned int i; >> + unsigned int i, prev_nr; >> >> for (i = 0; i < r->index->cache_nr; i++) { >> const struct cache_entry *ce = r->index->cache[i]; >> if (!ce_stage(ce)) >> continue; >> >> + prev_nr = r->index->cache_nr; >> ret = merge_entry(r, quiet || oneshot, i, ce->name, &err, fn, data); >> - if (ret > 0) >> - i += ret - 1; >> - else if (ret == -1) >> + if (ret > 0) { >> + /* Don't bother handling an index that has >> + grown, since merge_one_file_func() can't grow >> + it, and merge_one_file_spawn() can't change >> + it. */ > > multi-line comment style is as follows: > > /* > * Don't bother handling an index that has > * grown, since merge_one_file_func() can't grow > * it, and merge_one_file_spawn() can't change it. > */ > > Thanks, > -Stolee >
Hi Martin & Derrick, Le 05/01/2021 à 18:35, Martin Ågren a écrit : > On Tue, 5 Jan 2021 at 17:13, Derrick Stolee <stolee@gmail.com> wrote: >> >> On 11/24/2020 6:53 AM, Alban Gruin wrote: >>> + if (merge_action == merge_one_file_func) { >> >> nit: This made me think it would be better to check the 'lock' >> itself to see if it was initialized or not. Perhaps >> >> if (lock.tempfile) { >> >> would be the appropriate way to check this? > >> nit: this could be simplified. In total, I recommend: >> >> if (lock.tempfile) { >> if (err) >> rollback_lock_file(&lock); >> else >> return write_locked_index(&the_index, &lock, COMMIT_LOCK); >> } >> return err; > > FWIW, I also find that way of writing it easier to grok. Although, > rather than peeking at `lock.tempfile`, I suggest using > `is_lock_file_locked(&lock)`. > OK, this looks good to me. > Martin > Alban
Alban Gruin <alban.gruin@gmail.com> writes: > We had the same discussion with Phillip, who pointed out this previous > discussion about this topic: > https://lore.kernel.org/git/xmqqblv5kr9u.fsf@gitster-ct.c.googlers.com/ > > So, it's probably OK to do that. These days, there exists an optional installation option exists that won't even install built-in commands in $GIT_EXEC_PATH, which invalidates the assessment made in 2019 in the article you cited above, so the code might still be OK, but the old justification no longer would apply. In any case, if two people who reviewed a patch found the same thing in it fishy, it is an indication that the reason why the apparently fishy code is OK needs to be better explained so that future readers of the code do not have to be puzzled about the same thing. Thanks.
Hi Junio, Le 06/01/2021 à 03:04, Junio C Hamano a écrit : > Alban Gruin <alban.gruin@gmail.com> writes: > >> We had the same discussion with Phillip, who pointed out this previous >> discussion about this topic: >> https://lore.kernel.org/git/xmqqblv5kr9u.fsf@gitster-ct.c.googlers.com/ >> >> So, it's probably OK to do that. > > These days, there exists an optional installation option exists that > won't even install built-in commands in $GIT_EXEC_PATH, which > invalidates the assessment made in 2019 in the article you cited > above, so the code might still be OK, but the old justification no > longer would apply. > > In any case, if two people who reviewed a patch found the same thing > in it fishy, it is an indication that the reason why the apparently > fishy code is OK needs to be better explained so that future readers > of the code do not have to be puzzled about the same thing. > > Thanks. > Perhaps we could try to check if the provided command exists (with locate_in_PATH()), if it does, run it through merge_one_file_spawn(), else, use merge_one_file_func()? Alban
Alban Gruin <alban.gruin@gmail.com> writes: >> These days, there exists an optional installation option exists that >> won't even install built-in commands in $GIT_EXEC_PATH, which >> invalidates the assessment made in 2019 in the article you cited >> above, so the code might still be OK, but the old justification no >> longer would apply. >> >> In any case, if two people who reviewed a patch found the same thing >> in it fishy, it is an indication that the reason why the apparently >> fishy code is OK needs to be better explained so that future readers >> of the code do not have to be puzzled about the same thing. > > Perhaps we could try to check if the provided command exists (with > locate_in_PATH()), if it does, run it through merge_one_file_spawn(), > else, use merge_one_file_func()? So you think your current implementation will be broken if the "no dashed git binary on disk" installation option is used? I do not think "first check if an on-disk command exists and use it, otherwise check its name" alone would work well in practice. Both the 'cat' example that appears in the manual page, and the typical invocation of git-merge-one-file from merge-resolve: git merge-index cat MM git merge-index git-merge-one-file -a would work just as well as before, but does not give you a way to bypass fork() for the latter. And changing the order of checks would mean the users won't have a way to override a buggy builtin implementation of merge_one_file function. Besides, using the name of the binary feels like a bad hack. As the invocation from merge-resolve is purely an internal matter, it may make more sense to introduce a new option and explicitly tell merge-index that the command line is not asking for an external program to be spawned, e.g. git merge-index --use=merge-one-file -a You'd prepare a table of internally implemented "take info on a single path that is being merged and give an automated resolution" functions, which begins with a single entry that maps the string "merge-one-file" to your merge_one_file_func function. Any value to the "--use" option that names a function not in the table would cause an error. Note that in the above the "table of functions" is merely conceptual. It is perfectly OK to implement the single entry table by codeflow (i.e. "if (!strcmp()) ... else error();"). But thinking in terms of "a table of functions the user can choose from" helps to form the right mental picture. Hmm?
Hi Junio, Le 10/01/2021 à 21:51, Junio C Hamano a écrit : > Alban Gruin <alban.gruin@gmail.com> writes: > >>> These days, there exists an optional installation option exists that >>> won't even install built-in commands in $GIT_EXEC_PATH, which >>> invalidates the assessment made in 2019 in the article you cited >>> above, so the code might still be OK, but the old justification no >>> longer would apply. >>> >>> In any case, if two people who reviewed a patch found the same thing >>> in it fishy, it is an indication that the reason why the apparently >>> fishy code is OK needs to be better explained so that future readers >>> of the code do not have to be puzzled about the same thing. >> >> Perhaps we could try to check if the provided command exists (with >> locate_in_PATH()), if it does, run it through merge_one_file_spawn(), >> else, use merge_one_file_func()? > > So you think your current implementation will be broken if the "no > dashed git binary on disk" installation option is used? > > I do not think "first check if an on-disk command exists and use it, > otherwise check its name" alone would work well in practice. Both > the 'cat' example that appears in the manual page, and the typical > invocation of git-merge-one-file from merge-resolve: > > git merge-index cat MM > git merge-index git-merge-one-file -a > > would work just as well as before, but does not give you a way to > bypass fork() for the latter. And changing the order of checks > would mean the users won't have a way to override a buggy builtin > implementation of merge_one_file function. Besides, using the name > of the binary feels like a bad hack. > > As the invocation from merge-resolve is purely an internal matter, > it may make more sense to introduce a new option and explicitly tell > merge-index that the command line is not asking for an external > program to be spawned, e.g. > > git merge-index --use=merge-one-file -a > > You'd prepare a table of internally implemented "take info on a > single path that is being merged and give an automated resolution" > functions, which begins with a single entry that maps the string > "merge-one-file" to your merge_one_file_func function. Any value to > the "--use" option that names a function not in the table would > cause an error. > > Note that in the above the "table of functions" is merely > conceptual. It is perfectly OK to implement the single entry table > by codeflow (i.e. "if (!strcmp()) ... else error();"). But thinking > in terms of "a table of functions the user can choose from" helps to > form the right mental picture. > > Hmm? > Yes, this should work. To achieve this, I think I have to reorder this series a bit. Currently, this is what it looks like: 1. convert git-merge-one-file; 2. libify git-merge-index, add the ability to call merge-one-file directly; 3. convert the resolve strategy; 4. convert the octopus strategy. After the reorder, the series would look like this: 1. libify git-merge-index, add `--use=merge-one-file', change git-merge-resolve.sh, -octopus.sh, and t6060 to use this new parameter; 2. convert git-merge-one-file, add the ability for merge-index to call it directly; 3. convert the resolve strategy; 4. convert the octopus strategy. Alban
diff --git a/builtin/merge-index.c b/builtin/merge-index.c index d5e5713b25..60fcde579f 100644 --- a/builtin/merge-index.c +++ b/builtin/merge-index.c @@ -1,11 +1,15 @@ #define USE_THE_INDEX_COMPATIBILITY_MACROS #include "builtin.h" +#include "lockfile.h" #include "merge-strategies.h" int cmd_merge_index(int argc, const char **argv, const char *prefix) { int i, force_file = 0, err = 0, one_shot = 0, quiet = 0; const char *pgm; + void *data = NULL; + merge_fn merge_action; + struct lock_file lock = LOCK_INIT; /* Without this we cannot rely on waitpid() to tell * what happened to our children. @@ -26,7 +30,18 @@ int cmd_merge_index(int argc, const char **argv, const char *prefix) quiet = 1; i++; } + pgm = argv[i++]; + setup_work_tree(); + + if (!strcmp(pgm, "git-merge-one-file")) { + merge_action = merge_one_file_func; + hold_locked_index(&lock, LOCK_DIE_ON_ERROR); + } else { + merge_action = merge_one_file_spawn; + data = (void *)pgm; + } + for (; i < argc; i++) { const char *arg = argv[i]; if (!force_file && *arg == '-') { @@ -36,13 +51,22 @@ int cmd_merge_index(int argc, const char **argv, const char *prefix) } if (!strcmp(arg, "-a")) { err |= merge_all_index(the_repository, one_shot, quiet, - merge_one_file_spawn, (void *)pgm); + merge_action, data); continue; } die("git merge-index: unknown option %s", arg); } err |= merge_index_path(the_repository, one_shot, quiet, arg, - merge_one_file_spawn, (void *)pgm); + merge_action, data); + } + + if (merge_action == merge_one_file_func) { + if (err) { + rollback_lock_file(&lock); + return err; + } + + return write_locked_index(&the_index, &lock, COMMIT_LOCK); } return err; } diff --git a/merge-strategies.c b/merge-strategies.c index 6f27e66dfe..542cefcf3d 100644 --- a/merge-strategies.c +++ b/merge-strategies.c @@ -178,6 +178,18 @@ int merge_three_way(struct repository *r, return 0; } +int merge_one_file_func(struct repository *r, + const struct object_id *orig_blob, + const struct object_id *our_blob, + const struct object_id *their_blob, const char *path, + unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode, + void *data) +{ + return merge_three_way(r, + orig_blob, our_blob, their_blob, path, + orig_mode, our_mode, their_mode); +} + int merge_one_file_spawn(struct repository *r, const struct object_id *orig_blob, const struct object_id *our_blob, @@ -261,17 +273,22 @@ int merge_all_index(struct repository *r, int oneshot, int quiet, merge_fn fn, void *data) { int err = 0, ret; - unsigned int i; + unsigned int i, prev_nr; for (i = 0; i < r->index->cache_nr; i++) { const struct cache_entry *ce = r->index->cache[i]; if (!ce_stage(ce)) continue; + prev_nr = r->index->cache_nr; ret = merge_entry(r, quiet || oneshot, i, ce->name, &err, fn, data); - if (ret > 0) - i += ret - 1; - else if (ret == -1) + if (ret > 0) { + /* Don't bother handling an index that has + grown, since merge_one_file_func() can't grow + it, and merge_one_file_spawn() can't change + it. */ + i += ret - (prev_nr - r->index->cache_nr) - 1; + } else if (ret == -1) return -1; if (err && !oneshot) diff --git a/merge-strategies.h b/merge-strategies.h index 94c40635c4..0b74d45431 100644 --- a/merge-strategies.h +++ b/merge-strategies.h @@ -16,6 +16,13 @@ typedef int (*merge_fn)(struct repository *r, unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode, void *data); +int merge_one_file_func(struct repository *r, + const struct object_id *orig_blob, + const struct object_id *our_blob, + const struct object_id *their_blob, const char *path, + unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode, + void *data); + int merge_one_file_spawn(struct repository *r, const struct object_id *orig_blob, const struct object_id *our_blob,
Since `git-merge-one-file' has been rewritten and libified, this teaches `merge-index' to call merge_three_way() without forking using a new callback, merge_one_file_func(). To avoid any issue with a shrinking index because of the merge function used (directly in the process or by forking), as described earlier, the iterator of the loop of merge_all_index() is increased by the number of entries with the same name, minus the difference between the number of entries in the index before and after the merge. This should handle a shrinking index correctly, but could lead to issues with a growing index. However, this case is not treated, as there is no callback that can produce such a case. Signed-off-by: Alban Gruin <alban.gruin@gmail.com> --- builtin/merge-index.c | 28 ++++++++++++++++++++++++++-- merge-strategies.c | 25 +++++++++++++++++++++---- merge-strategies.h | 7 +++++++ 3 files changed, 54 insertions(+), 6 deletions(-)