diff mbox series

[v6,06/13] merge-index: don't fork if the requested program is `git-merge-one-file'

Message ID 20201124115315.13311-7-alban.gruin@gmail.com (mailing list archive)
State New, archived
Headers show
Series Rewrite the remaining merge strategies from shell to C | expand

Commit Message

Alban Gruin Nov. 24, 2020, 11:53 a.m. UTC
Since `git-merge-one-file' has been rewritten and libified, this teaches
`merge-index' to call merge_three_way() without forking using a new
callback, merge_one_file_func().

To avoid any issue with a shrinking index because of the merge function
used (directly in the process or by forking), as described earlier, the
iterator of the loop of merge_all_index() is increased by the number of
entries with the same name, minus the difference between the number of
entries in the index before and after the merge.

This should handle a shrinking index correctly, but could lead to issues
with a growing index.  However, this case is not treated, as there is no
callback that can produce such a case.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/merge-index.c | 28 ++++++++++++++++++++++++++--
 merge-strategies.c    | 25 +++++++++++++++++++++----
 merge-strategies.h    |  7 +++++++
 3 files changed, 54 insertions(+), 6 deletions(-)

Comments

Derrick Stolee Jan. 5, 2021, 4:11 p.m. UTC | #1
On 11/24/2020 6:53 AM, Alban Gruin wrote:
> +
>  	pgm = argv[i++];
> +	setup_work_tree();
> +
> +	if (!strcmp(pgm, "git-merge-one-file")) {

This stood out to me as possibly fragile. What if we call the
non-dashed form "git merge-one-file"? Shouldn't we be doing so?

Or, is this something that is handled higher in the builtin
machinery to take the non-dashed version and change it to the
dashed version for historical reasons?

> +		merge_action = merge_one_file_func;
> +		hold_locked_index(&lock, LOCK_DIE_ON_ERROR);
> +	} else {
> +		merge_action = merge_one_file_spawn;
> +		data = (void *)pgm;
> +	}
> +

...

> +	if (merge_action == merge_one_file_func) {

nit: This made me think it would be better to check the 'lock'
itself to see if it was initialized or not. Perhaps

	if (lock.tempfile) {

would be the appropriate way to check this?

For now, this is equivalent behavior, but it might be helpful if
we add more cases that take the lock in the future.

> +		if (err) {
> +			rollback_lock_file(&lock);
> +			return err;
> +		}
> +
> +		return write_locked_index(&the_index, &lock, COMMIT_LOCK);
>  	}
>  	return err;

nit: this could be simplified. In total, I recommend:

	if (lock.tempfile) {
		if (err)
			rollback_lock_file(&lock);
		else
			return write_locked_index(&the_index, &lock, COMMIT_LOCK);
	}
	return err;


>  }
> diff --git a/merge-strategies.c b/merge-strategies.c
> index 6f27e66dfe..542cefcf3d 100644
> --- a/merge-strategies.c
> +++ b/merge-strategies.c
> @@ -178,6 +178,18 @@ int merge_three_way(struct repository *r,
>  	return 0;
>  }
>  
> +int merge_one_file_func(struct repository *r,
> +			const struct object_id *orig_blob,
> +			const struct object_id *our_blob,
> +			const struct object_id *their_blob, const char *path,
> +			unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode,
> +			void *data)
> +{
> +	return merge_three_way(r,
> +			       orig_blob, our_blob, their_blob, path,
> +			       orig_mode, our_mode, their_mode);
> +}
> +

Again, I don't recommend making this callback in the library. Instead, keep
it in the builtin and then use merge_three_way() which is in the library.

>  int merge_one_file_spawn(struct repository *r,
>  			 const struct object_id *orig_blob,
>  			 const struct object_id *our_blob,
> @@ -261,17 +273,22 @@ int merge_all_index(struct repository *r, int oneshot, int quiet,
>  		    merge_fn fn, void *data)
>  {
>  	int err = 0, ret;
> -	unsigned int i;
> +	unsigned int i, prev_nr;
>  
>  	for (i = 0; i < r->index->cache_nr; i++) {
>  		const struct cache_entry *ce = r->index->cache[i];
>  		if (!ce_stage(ce))
>  			continue;
>  
> +		prev_nr = r->index->cache_nr;
>  		ret = merge_entry(r, quiet || oneshot, i, ce->name, &err, fn, data);
> -		if (ret > 0)
> -			i += ret - 1;
> -		else if (ret == -1)
> +		if (ret > 0) {
> +			/* Don't bother handling an index that has
> +			   grown, since merge_one_file_func() can't grow
> +			   it, and merge_one_file_spawn() can't change
> +			   it. */

multi-line comment style is as follows:

	/*
	 * Don't bother handling an index that has
	 * grown, since merge_one_file_func() can't grow
	 * it, and merge_one_file_spawn() can't change it.
	 */

Thanks,
-Stolee
Martin Ågren Jan. 5, 2021, 5:35 p.m. UTC | #2
On Tue, 5 Jan 2021 at 17:13, Derrick Stolee <stolee@gmail.com> wrote:
>
> On 11/24/2020 6:53 AM, Alban Gruin wrote:
> > +     if (merge_action == merge_one_file_func) {
>
> nit: This made me think it would be better to check the 'lock'
> itself to see if it was initialized or not. Perhaps
>
>         if (lock.tempfile) {
>
> would be the appropriate way to check this?

> nit: this could be simplified. In total, I recommend:
>
>         if (lock.tempfile) {
>                 if (err)
>                         rollback_lock_file(&lock);
>                 else
>                         return write_locked_index(&the_index, &lock, COMMIT_LOCK);
>         }
>         return err;

FWIW, I also find that way of writing it easier to grok. Although,
rather than peeking at `lock.tempfile`, I suggest using
`is_lock_file_locked(&lock)`.

Martin
Alban Gruin Jan. 5, 2021, 11:20 p.m. UTC | #3
Le 05/01/2021 à 17:11, Derrick Stolee a écrit :
> On 11/24/2020 6:53 AM, Alban Gruin wrote:
>> +
>>  	pgm = argv[i++];
>> +	setup_work_tree();
>> +
>> +	if (!strcmp(pgm, "git-merge-one-file")) {
> 
> This stood out to me as possibly fragile. What if we call the
> non-dashed form "git merge-one-file"? Shouldn't we be doing so?
> 
> Or, is this something that is handled higher in the builtin
> machinery to take the non-dashed version and change it to the
> dashed version for historical reasons?
> 

We had the same discussion with Phillip, who pointed out this previous
discussion about this topic:
https://lore.kernel.org/git/xmqqblv5kr9u.fsf@gitster-ct.c.googlers.com/

So, it's probably OK to do that.

>> +		merge_action = merge_one_file_func;
>> +		hold_locked_index(&lock, LOCK_DIE_ON_ERROR);
>> +	} else {
>> +		merge_action = merge_one_file_spawn;
>> +		data = (void *)pgm;
>> +	}
>> +
> 
> ...
> 
>> +	if (merge_action == merge_one_file_func) {
> 
> nit: This made me think it would be better to check the 'lock'
> itself to see if it was initialized or not. Perhaps
> 
> 	if (lock.tempfile) {
> 
> would be the appropriate way to check this?
> 
> For now, this is equivalent behavior, but it might be helpful if
> we add more cases that take the lock in the future.
> 
>> +		if (err) {
>> +			rollback_lock_file(&lock);
>> +			return err;
>> +		}
>> +
>> +		return write_locked_index(&the_index, &lock, COMMIT_LOCK);
>>  	}
>>  	return err;
> 
> nit: this could be simplified. In total, I recommend:
> 
> 	if (lock.tempfile) {
> 		if (err)
> 			rollback_lock_file(&lock);
> 		else
> 			return write_locked_index(&the_index, &lock, COMMIT_LOCK);
> 	}
> 	return err;
> 

Sure, looks better than mine.  :)

> 
>>  }
>> diff --git a/merge-strategies.c b/merge-strategies.c
>> index 6f27e66dfe..542cefcf3d 100644
>> --- a/merge-strategies.c
>> +++ b/merge-strategies.c
>> @@ -178,6 +178,18 @@ int merge_three_way(struct repository *r,
>>  	return 0;
>>  }
>>  
>> +int merge_one_file_func(struct repository *r,
>> +			const struct object_id *orig_blob,
>> +			const struct object_id *our_blob,
>> +			const struct object_id *their_blob, const char *path,
>> +			unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode,
>> +			void *data)
>> +{
>> +	return merge_three_way(r,
>> +			       orig_blob, our_blob, their_blob, path,
>> +			       orig_mode, our_mode, their_mode);
>> +}
>> +
> 
> Again, I don't recommend making this callback in the library. Instead, keep
> it in the builtin and then use merge_three_way() which is in the library.
> 

This is not possible with this callback, as it will be used later by
merge_strategies_resolve() and indirectly by merge_strategies_octopus().

>>  int merge_one_file_spawn(struct repository *r,
>>  			 const struct object_id *orig_blob,
>>  			 const struct object_id *our_blob,
>> @@ -261,17 +273,22 @@ int merge_all_index(struct repository *r, int oneshot, int quiet,
>>  		    merge_fn fn, void *data)
>>  {
>>  	int err = 0, ret;
>> -	unsigned int i;
>> +	unsigned int i, prev_nr;
>>  
>>  	for (i = 0; i < r->index->cache_nr; i++) {
>>  		const struct cache_entry *ce = r->index->cache[i];
>>  		if (!ce_stage(ce))
>>  			continue;
>>  
>> +		prev_nr = r->index->cache_nr;
>>  		ret = merge_entry(r, quiet || oneshot, i, ce->name, &err, fn, data);
>> -		if (ret > 0)
>> -			i += ret - 1;
>> -		else if (ret == -1)
>> +		if (ret > 0) {
>> +			/* Don't bother handling an index that has
>> +			   grown, since merge_one_file_func() can't grow
>> +			   it, and merge_one_file_spawn() can't change
>> +			   it. */
> 
> multi-line comment style is as follows:
> 
> 	/*
> 	 * Don't bother handling an index that has
> 	 * grown, since merge_one_file_func() can't grow
> 	 * it, and merge_one_file_spawn() can't change it.
> 	 */
> 
> Thanks,
> -Stolee
>
Alban Gruin Jan. 5, 2021, 11:20 p.m. UTC | #4
Hi Martin & Derrick,

Le 05/01/2021 à 18:35, Martin Ågren a écrit :
> On Tue, 5 Jan 2021 at 17:13, Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 11/24/2020 6:53 AM, Alban Gruin wrote:
>>> +     if (merge_action == merge_one_file_func) {
>>
>> nit: This made me think it would be better to check the 'lock'
>> itself to see if it was initialized or not. Perhaps
>>
>>         if (lock.tempfile) {
>>
>> would be the appropriate way to check this?
> 
>> nit: this could be simplified. In total, I recommend:
>>
>>         if (lock.tempfile) {
>>                 if (err)
>>                         rollback_lock_file(&lock);
>>                 else
>>                         return write_locked_index(&the_index, &lock, COMMIT_LOCK);
>>         }
>>         return err;
> 
> FWIW, I also find that way of writing it easier to grok. Although,
> rather than peeking at `lock.tempfile`, I suggest using
> `is_lock_file_locked(&lock)`.
> 

OK, this looks good to me.

> Martin
> 

Alban
Junio C Hamano Jan. 6, 2021, 2:04 a.m. UTC | #5
Alban Gruin <alban.gruin@gmail.com> writes:

> We had the same discussion with Phillip, who pointed out this previous
> discussion about this topic:
> https://lore.kernel.org/git/xmqqblv5kr9u.fsf@gitster-ct.c.googlers.com/
>
> So, it's probably OK to do that.

These days, there exists an optional installation option exists that
won't even install built-in commands in $GIT_EXEC_PATH, which
invalidates the assessment made in 2019 in the article you cited
above, so the code might still be OK, but the old justification no
longer would apply.

In any case, if two people who reviewed a patch found the same thing
in it fishy, it is an indication that the reason why the apparently
fishy code is OK needs to be better explained so that future readers
of the code do not have to be puzzled about the same thing.

Thanks.
Alban Gruin Jan. 10, 2021, 5:15 p.m. UTC | #6
Hi Junio,

Le 06/01/2021 à 03:04, Junio C Hamano a écrit :
> Alban Gruin <alban.gruin@gmail.com> writes:
> 
>> We had the same discussion with Phillip, who pointed out this previous
>> discussion about this topic:
>> https://lore.kernel.org/git/xmqqblv5kr9u.fsf@gitster-ct.c.googlers.com/
>>
>> So, it's probably OK to do that.
> 
> These days, there exists an optional installation option exists that
> won't even install built-in commands in $GIT_EXEC_PATH, which
> invalidates the assessment made in 2019 in the article you cited
> above, so the code might still be OK, but the old justification no
> longer would apply.
> 
> In any case, if two people who reviewed a patch found the same thing
> in it fishy, it is an indication that the reason why the apparently
> fishy code is OK needs to be better explained so that future readers
> of the code do not have to be puzzled about the same thing.
> 
> Thanks.
> 

Perhaps we could try to check if the provided command exists (with
locate_in_PATH()), if it does, run it through merge_one_file_spawn(),
else, use merge_one_file_func()?

Alban
Junio C Hamano Jan. 10, 2021, 8:51 p.m. UTC | #7
Alban Gruin <alban.gruin@gmail.com> writes:

>> These days, there exists an optional installation option exists that
>> won't even install built-in commands in $GIT_EXEC_PATH, which
>> invalidates the assessment made in 2019 in the article you cited
>> above, so the code might still be OK, but the old justification no
>> longer would apply.
>> 
>> In any case, if two people who reviewed a patch found the same thing
>> in it fishy, it is an indication that the reason why the apparently
>> fishy code is OK needs to be better explained so that future readers
>> of the code do not have to be puzzled about the same thing.
>
> Perhaps we could try to check if the provided command exists (with
> locate_in_PATH()), if it does, run it through merge_one_file_spawn(),
> else, use merge_one_file_func()?

So you think your current implementation will be broken if the "no
dashed git binary on disk" installation option is used?

I do not think "first check if an on-disk command exists and use it,
otherwise check its name" alone would work well in practice.  Both
the 'cat' example that appears in the manual page, and the typical
invocation of git-merge-one-file from merge-resolve:

	git merge-index cat MM
	git merge-index git-merge-one-file -a

would work just as well as before, but does not give you a way to
bypass fork() for the latter.  And changing the order of checks
would mean the users won't have a way to override a buggy builtin
implementation of merge_one_file function.  Besides, using the name
of the binary feels like a bad hack.  

As the invocation from merge-resolve is purely an internal matter,
it may make more sense to introduce a new option and explicitly tell
merge-index that the command line is not asking for an external
program to be spawned, e.g.

	git merge-index --use=merge-one-file -a

You'd prepare a table of internally implemented "take info on a
single path that is being merged and give an automated resolution"
functions, which begins with a single entry that maps the string
"merge-one-file" to your merge_one_file_func function.  Any value to
the "--use" option that names a function not in the table would
cause an error.

Note that in the above the "table of functions" is merely
conceptual.  It is perfectly OK to implement the single entry table
by codeflow (i.e. "if (!strcmp()) ... else error();").  But thinking
in terms of "a table of functions the user can choose from" helps to
form the right mental picture.

Hmm?
Alban Gruin March 8, 2021, 8:32 p.m. UTC | #8
Hi Junio,

Le 10/01/2021 à 21:51, Junio C Hamano a écrit :
> Alban Gruin <alban.gruin@gmail.com> writes:
> 
>>> These days, there exists an optional installation option exists that
>>> won't even install built-in commands in $GIT_EXEC_PATH, which
>>> invalidates the assessment made in 2019 in the article you cited
>>> above, so the code might still be OK, but the old justification no
>>> longer would apply.
>>>
>>> In any case, if two people who reviewed a patch found the same thing
>>> in it fishy, it is an indication that the reason why the apparently
>>> fishy code is OK needs to be better explained so that future readers
>>> of the code do not have to be puzzled about the same thing.
>>
>> Perhaps we could try to check if the provided command exists (with
>> locate_in_PATH()), if it does, run it through merge_one_file_spawn(),
>> else, use merge_one_file_func()?
> 
> So you think your current implementation will be broken if the "no
> dashed git binary on disk" installation option is used?
> 
> I do not think "first check if an on-disk command exists and use it,
> otherwise check its name" alone would work well in practice.  Both
> the 'cat' example that appears in the manual page, and the typical
> invocation of git-merge-one-file from merge-resolve:
> 
> 	git merge-index cat MM
> 	git merge-index git-merge-one-file -a
> 
> would work just as well as before, but does not give you a way to
> bypass fork() for the latter.  And changing the order of checks
> would mean the users won't have a way to override a buggy builtin
> implementation of merge_one_file function.  Besides, using the name
> of the binary feels like a bad hack.  
> 
> As the invocation from merge-resolve is purely an internal matter,
> it may make more sense to introduce a new option and explicitly tell
> merge-index that the command line is not asking for an external
> program to be spawned, e.g.
> 
> 	git merge-index --use=merge-one-file -a
> 
> You'd prepare a table of internally implemented "take info on a
> single path that is being merged and give an automated resolution"
> functions, which begins with a single entry that maps the string
> "merge-one-file" to your merge_one_file_func function.  Any value to
> the "--use" option that names a function not in the table would
> cause an error.
> 
> Note that in the above the "table of functions" is merely
> conceptual.  It is perfectly OK to implement the single entry table
> by codeflow (i.e. "if (!strcmp()) ... else error();").  But thinking
> in terms of "a table of functions the user can choose from" helps to
> form the right mental picture.
> 
> Hmm?
> 

Yes, this should work.

To achieve this, I think I have to reorder this series a bit.
Currently, this is what it looks like:

 1. convert git-merge-one-file;
 2. libify git-merge-index, add the ability to call merge-one-file directly;
 3. convert the resolve strategy;
 4. convert the octopus strategy.

After the reorder, the series would look like this:

 1. libify git-merge-index, add `--use=merge-one-file', change
git-merge-resolve.sh, -octopus.sh, and t6060 to use this new parameter;
 2. convert git-merge-one-file, add the ability for merge-index to call
it directly;
 3. convert the resolve strategy;
 4. convert the octopus strategy.

Alban
diff mbox series

Patch

diff --git a/builtin/merge-index.c b/builtin/merge-index.c
index d5e5713b25..60fcde579f 100644
--- a/builtin/merge-index.c
+++ b/builtin/merge-index.c
@@ -1,11 +1,15 @@ 
 #define USE_THE_INDEX_COMPATIBILITY_MACROS
 #include "builtin.h"
+#include "lockfile.h"
 #include "merge-strategies.h"
 
 int cmd_merge_index(int argc, const char **argv, const char *prefix)
 {
 	int i, force_file = 0, err = 0, one_shot = 0, quiet = 0;
 	const char *pgm;
+	void *data = NULL;
+	merge_fn merge_action;
+	struct lock_file lock = LOCK_INIT;
 
 	/* Without this we cannot rely on waitpid() to tell
 	 * what happened to our children.
@@ -26,7 +30,18 @@  int cmd_merge_index(int argc, const char **argv, const char *prefix)
 		quiet = 1;
 		i++;
 	}
+
 	pgm = argv[i++];
+	setup_work_tree();
+
+	if (!strcmp(pgm, "git-merge-one-file")) {
+		merge_action = merge_one_file_func;
+		hold_locked_index(&lock, LOCK_DIE_ON_ERROR);
+	} else {
+		merge_action = merge_one_file_spawn;
+		data = (void *)pgm;
+	}
+
 	for (; i < argc; i++) {
 		const char *arg = argv[i];
 		if (!force_file && *arg == '-') {
@@ -36,13 +51,22 @@  int cmd_merge_index(int argc, const char **argv, const char *prefix)
 			}
 			if (!strcmp(arg, "-a")) {
 				err |= merge_all_index(the_repository, one_shot, quiet,
-						       merge_one_file_spawn, (void *)pgm);
+						       merge_action, data);
 				continue;
 			}
 			die("git merge-index: unknown option %s", arg);
 		}
 		err |= merge_index_path(the_repository, one_shot, quiet, arg,
-					merge_one_file_spawn, (void *)pgm);
+					merge_action, data);
+	}
+
+	if (merge_action == merge_one_file_func) {
+		if (err) {
+			rollback_lock_file(&lock);
+			return err;
+		}
+
+		return write_locked_index(&the_index, &lock, COMMIT_LOCK);
 	}
 	return err;
 }
diff --git a/merge-strategies.c b/merge-strategies.c
index 6f27e66dfe..542cefcf3d 100644
--- a/merge-strategies.c
+++ b/merge-strategies.c
@@ -178,6 +178,18 @@  int merge_three_way(struct repository *r,
 	return 0;
 }
 
+int merge_one_file_func(struct repository *r,
+			const struct object_id *orig_blob,
+			const struct object_id *our_blob,
+			const struct object_id *their_blob, const char *path,
+			unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode,
+			void *data)
+{
+	return merge_three_way(r,
+			       orig_blob, our_blob, their_blob, path,
+			       orig_mode, our_mode, their_mode);
+}
+
 int merge_one_file_spawn(struct repository *r,
 			 const struct object_id *orig_blob,
 			 const struct object_id *our_blob,
@@ -261,17 +273,22 @@  int merge_all_index(struct repository *r, int oneshot, int quiet,
 		    merge_fn fn, void *data)
 {
 	int err = 0, ret;
-	unsigned int i;
+	unsigned int i, prev_nr;
 
 	for (i = 0; i < r->index->cache_nr; i++) {
 		const struct cache_entry *ce = r->index->cache[i];
 		if (!ce_stage(ce))
 			continue;
 
+		prev_nr = r->index->cache_nr;
 		ret = merge_entry(r, quiet || oneshot, i, ce->name, &err, fn, data);
-		if (ret > 0)
-			i += ret - 1;
-		else if (ret == -1)
+		if (ret > 0) {
+			/* Don't bother handling an index that has
+			   grown, since merge_one_file_func() can't grow
+			   it, and merge_one_file_spawn() can't change
+			   it. */
+			i += ret - (prev_nr - r->index->cache_nr) - 1;
+		} else if (ret == -1)
 			return -1;
 
 		if (err && !oneshot)
diff --git a/merge-strategies.h b/merge-strategies.h
index 94c40635c4..0b74d45431 100644
--- a/merge-strategies.h
+++ b/merge-strategies.h
@@ -16,6 +16,13 @@  typedef int (*merge_fn)(struct repository *r,
 			unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode,
 			void *data);
 
+int merge_one_file_func(struct repository *r,
+			const struct object_id *orig_blob,
+			const struct object_id *our_blob,
+			const struct object_id *their_blob, const char *path,
+			unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode,
+			void *data);
+
 int merge_one_file_spawn(struct repository *r,
 			 const struct object_id *orig_blob,
 			 const struct object_id *our_blob,