diff mbox series

[v2,5/5] fetch: Make --jobs control submodules and remotes

Message ID 20190812213448.2649-6-palmer@sifive.com (mailing list archive)
State New, archived
Headers show
Series fetch: Extend --jobs to multiple remotes | expand

Commit Message

Palmer Dabbelt Aug. 12, 2019, 9:34 p.m. UTC
The existing --jobs argument was defined to control the number of jobs
used for submodule fetching, but it makes more sense to have this
argument control the number of jobs to be used when fetching from
multiple remotes as well.

This patch simply changes the --jobs argument parsing code to set both
max_children_for_{submodules,fetch}, as well as noting this new behavior
in the documentation.

Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
---
 Documentation/fetch-options.txt |  4 ++++
 builtin/fetch.c                 | 21 ++++++++++++++++++---
 2 files changed, 22 insertions(+), 3 deletions(-)

Comments

Johannes Schindelin Aug. 13, 2019, 8:16 p.m. UTC | #1
Hi,


On Mon, 12 Aug 2019, Palmer Dabbelt wrote:

> The existing --jobs argument was defined to control the number of jobs
> used for submodule fetching, but it makes more sense to have this
> argument control the number of jobs to be used when fetching from
> multiple remotes as well.
>
> This patch simply changes the --jobs argument parsing code to set both
> max_children_for_{submodules,fetch}, as well as noting this new behavior
> in the documentation.
>
> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
> ---

I very much miss in this description a reflection of my analysis in
https://public-inbox.org/git/nycvar.QRO.7.76.6.1907191507420.47@tvgsbejvaqbjf.bet/

Given that analysis, combined with the fact that the `--jobs` option
tries to control both the `--multiple` and `--recursive-submodules` code
paths in the end, anyway, I do doubt that it makes sense to even
introduce the `--fetch-jobs` and the `--submodule-fetch-jobs` options;
They are probably only confusing and do not add much benefit to the end
user.

Ciao,
Johannes

>  Documentation/fetch-options.txt |  4 ++++
>  builtin/fetch.c                 | 21 ++++++++++++++++++---
>  2 files changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
> index 5836024f1934..0915fd4ed6d5 100644
> --- a/Documentation/fetch-options.txt
> +++ b/Documentation/fetch-options.txt
> @@ -160,6 +160,10 @@ ifndef::git-pull[]
>
>  -j::
>  --jobs=<n>::
> +	Number of parallel children to be used for all forms of fetching.
> +	This is the same as passing `--submodule-fetch-jobs=<n>` and
> +	`--fetch-jobs=<n>`.
> +
>  --submodule-fetch-jobs=<n>::
>  	Number of parallel children to be used for fetching submodules.
>  	Each will fetch from different submodules, such that fetching many
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 67d001f3f78b..41498e9efb3b 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -114,6 +114,20 @@ static int git_fetch_config(const char *k, const char *v, void *cb)
>  	return git_default_config(k, v, cb);
>  }
>
> +static int parse_jobs_arg(const struct option *opt, const char *arg, int unset)
> +{
> +	int jobs;
> +
> +	jobs = atoi(arg);
> +	if (jobs < 1)
> +		die(_("There must be a positive number of jobs"));
> +
> +	max_children_for_submodules = jobs;
> +	max_children_for_fetch = jobs;
> +
> +	return 0;
> +}
> +
>  static int parse_refmap_arg(const struct option *opt, const char *arg, int unset)
>  {
>  	BUG_ON_OPT_NEG(unset);
> @@ -142,12 +156,13 @@ static struct option builtin_fetch_options[] = {
>  		    N_("fetch all tags and associated objects"), TAGS_SET),
>  	OPT_SET_INT('n', NULL, &tags,
>  		    N_("do not fetch all tags (--no-tags)"), TAGS_UNSET),
> -	OPT_INTEGER('j', "jobs", &max_children_for_submodules,
> +	{ OPTION_CALLBACK, 'j', "jobs", NULL, N_("jobs"),
> +		    N_("number of parallel tasks to run while fetching"),
> +		    PARSE_OPT_OPTARG, &parse_jobs_arg },
> +	OPT_INTEGER(0, "submodule-fetch-jobs", &max_children_for_submodules,
>  		    N_("number of submodules fetched in parallel")),
>  	OPT_INTEGER(0, "fetch-jobs", &max_children_for_fetch,
>  		    N_("number of remotes fetched in parallel")),
> -	OPT_INTEGER(0, "submodule-fetch-jobs", &max_children_for_submodules,
> -		    N_("number of submodules fetched in parallel")),
>  	OPT_BOOL('p', "prune", &prune,
>  		 N_("prune remote-tracking branches no longer on remote")),
>  	OPT_BOOL('P', "prune-tags", &prune_tags,
> --
> 2.21.0
>
>
Junio C Hamano Aug. 13, 2019, 10 p.m. UTC | #2
Palmer Dabbelt <palmer@sifive.com> writes:

> The existing --jobs argument was defined to control the number of jobs
> used for submodule fetching, but it makes more sense to have this
> argument control the number of jobs to be used when fetching from
> multiple remotes as well.
>
> This patch simply changes the --jobs argument parsing code to set both
> max_children_for_{submodules,fetch}, as well as noting this new behavior
> in the documentation.

That's a sensible, if overly careful, transition plan.  This patch
cannot be queued together with the other four, though, for the plan
to be practical.  It probably needs to come a few releases after the
other four hits a release.

A less involved and much more careless transition plan may be to
just declare that "--jobs that only controls submodule fetches is a
bug and it must also affect how fetches from multiple remote
repositories are done" and come directly to this step, without
necessarily introducing options that control them independently.

I have a suspicion that we can afford to go the careless route for
this particular one, if we wanted to, as not may people would care
about controlling these independently.
Palmer Dabbelt Aug. 13, 2019, 10:06 p.m. UTC | #3
On Tue, 13 Aug 2019 13:16:11 PDT (-0700), Johannes.Schindelin@gmx.de wrote:
> Hi,
>
>
> On Mon, 12 Aug 2019, Palmer Dabbelt wrote:
>
>> The existing --jobs argument was defined to control the number of jobs
>> used for submodule fetching, but it makes more sense to have this
>> argument control the number of jobs to be used when fetching from
>> multiple remotes as well.
>>
>> This patch simply changes the --jobs argument parsing code to set both
>> max_children_for_{submodules,fetch}, as well as noting this new behavior
>> in the documentation.
>>
>> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
>> ---
>
> I very much miss in this description a reflection of my analysis in
> https://public-inbox.org/git/nycvar.QRO.7.76.6.1907191507420.47@tvgsbejvaqbjf.bet/
>
> Given that analysis, combined with the fact that the `--jobs` option
> tries to control both the `--multiple` and `--recursive-submodules` code
> paths in the end, anyway, I do doubt that it makes sense to even
> introduce the `--fetch-jobs` and the `--submodule-fetch-jobs` options;
> They are probably only confusing and do not add much benefit to the end
> user.

The cover letter at least attempts to describe this.  I figured I'd have to 
pick one option for a v2, so I went with the more complicated one under the 
assumption it would be easy to re-spin a v3 that drops the extra arguments.  
I'm happy to do so.

>
> Ciao,
> Johannes
>
>>  Documentation/fetch-options.txt |  4 ++++
>>  builtin/fetch.c                 | 21 ++++++++++++++++++---
>>  2 files changed, 22 insertions(+), 3 deletions(-)
>>
>> diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
>> index 5836024f1934..0915fd4ed6d5 100644
>> --- a/Documentation/fetch-options.txt
>> +++ b/Documentation/fetch-options.txt
>> @@ -160,6 +160,10 @@ ifndef::git-pull[]
>>
>>  -j::
>>  --jobs=<n>::
>> +	Number of parallel children to be used for all forms of fetching.
>> +	This is the same as passing `--submodule-fetch-jobs=<n>` and
>> +	`--fetch-jobs=<n>`.
>> +
>>  --submodule-fetch-jobs=<n>::
>>  	Number of parallel children to be used for fetching submodules.
>>  	Each will fetch from different submodules, such that fetching many
>> diff --git a/builtin/fetch.c b/builtin/fetch.c
>> index 67d001f3f78b..41498e9efb3b 100644
>> --- a/builtin/fetch.c
>> +++ b/builtin/fetch.c
>> @@ -114,6 +114,20 @@ static int git_fetch_config(const char *k, const char *v, void *cb)
>>  	return git_default_config(k, v, cb);
>>  }
>>
>> +static int parse_jobs_arg(const struct option *opt, const char *arg, int unset)
>> +{
>> +	int jobs;
>> +
>> +	jobs = atoi(arg);
>> +	if (jobs < 1)
>> +		die(_("There must be a positive number of jobs"));
>> +
>> +	max_children_for_submodules = jobs;
>> +	max_children_for_fetch = jobs;
>> +
>> +	return 0;
>> +}
>> +
>>  static int parse_refmap_arg(const struct option *opt, const char *arg, int unset)
>>  {
>>  	BUG_ON_OPT_NEG(unset);
>> @@ -142,12 +156,13 @@ static struct option builtin_fetch_options[] = {
>>  		    N_("fetch all tags and associated objects"), TAGS_SET),
>>  	OPT_SET_INT('n', NULL, &tags,
>>  		    N_("do not fetch all tags (--no-tags)"), TAGS_UNSET),
>> -	OPT_INTEGER('j', "jobs", &max_children_for_submodules,
>> +	{ OPTION_CALLBACK, 'j', "jobs", NULL, N_("jobs"),
>> +		    N_("number of parallel tasks to run while fetching"),
>> +		    PARSE_OPT_OPTARG, &parse_jobs_arg },
>> +	OPT_INTEGER(0, "submodule-fetch-jobs", &max_children_for_submodules,
>>  		    N_("number of submodules fetched in parallel")),
>>  	OPT_INTEGER(0, "fetch-jobs", &max_children_for_fetch,
>>  		    N_("number of remotes fetched in parallel")),
>> -	OPT_INTEGER(0, "submodule-fetch-jobs", &max_children_for_submodules,
>> -		    N_("number of submodules fetched in parallel")),
>>  	OPT_BOOL('p', "prune", &prune,
>>  		 N_("prune remote-tracking branches no longer on remote")),
>>  	OPT_BOOL('P', "prune-tags", &prune_tags,
>> --
>> 2.21.0
>>
>>
Palmer Dabbelt Aug. 13, 2019, 10:06 p.m. UTC | #4
On Tue, 13 Aug 2019 15:00:13 PDT (-0700), gitster@pobox.com wrote:
> Palmer Dabbelt <palmer@sifive.com> writes:
>
>> The existing --jobs argument was defined to control the number of jobs
>> used for submodule fetching, but it makes more sense to have this
>> argument control the number of jobs to be used when fetching from
>> multiple remotes as well.
>>
>> This patch simply changes the --jobs argument parsing code to set both
>> max_children_for_{submodules,fetch}, as well as noting this new behavior
>> in the documentation.
>
> That's a sensible, if overly careful, transition plan.  This patch
> cannot be queued together with the other four, though, for the plan
> to be practical.  It probably needs to come a few releases after the
> other four hits a release.
>
> A less involved and much more careless transition plan may be to
> just declare that "--jobs that only controls submodule fetches is a
> bug and it must also affect how fetches from multiple remote
> repositories are done" and come directly to this step, without
> necessarily introducing options that control them independently.
>
> I have a suspicion that we can afford to go the careless route for
> this particular one, if we wanted to, as not may people would care
> about controlling these independently.

This was brought up as part of the v1, and the cover letter lays out a plan to 
do so.  I'm happy to re-spin the patch set to just have --jobs control 
everything.
SZEDER Gábor Aug. 14, 2019, 8:32 a.m. UTC | #5
On Mon, Aug 12, 2019 at 02:34:48PM -0700, Palmer Dabbelt wrote:
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 67d001f3f78b..41498e9efb3b 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -114,6 +114,20 @@ static int git_fetch_config(const char *k, const char *v, void *cb)
>  	return git_default_config(k, v, cb);
>  }
>  
> +static int parse_jobs_arg(const struct option *opt, const char *arg, int unset)
> +{
> +	int jobs;
> +
> +	jobs = atoi(arg);
> +	if (jobs < 1)
> +		die(_("There must be a positive number of jobs"));
> +
> +	max_children_for_submodules = jobs;
> +	max_children_for_fetch = jobs;
> +
> +	return 0;
> +}
> +
>  static int parse_refmap_arg(const struct option *opt, const char *arg, int unset)
>  {
>  	BUG_ON_OPT_NEG(unset);
> @@ -142,12 +156,13 @@ static struct option builtin_fetch_options[] = {
>  		    N_("fetch all tags and associated objects"), TAGS_SET),
>  	OPT_SET_INT('n', NULL, &tags,
>  		    N_("do not fetch all tags (--no-tags)"), TAGS_UNSET),
> -	OPT_INTEGER('j', "jobs", &max_children_for_submodules,
> +	{ OPTION_CALLBACK, 'j', "jobs", NULL, N_("jobs"),
> +		    N_("number of parallel tasks to run while fetching"),
> +		    PARSE_OPT_OPTARG, &parse_jobs_arg },

These changes result segmentation faults in the tests '--quiet
propagates to parallel submodules' and 'fetching submodules respects
parallel settings' in 't5526-fetch-submodules.sh'.

If the number of jobs is specified as '-j 2' or '--jobs 7', i.e. as an
unstuck argument of the option, as opposed to '-j2' or '--jobs=7',
then 'arg' in the parse_jobs_arg() callback is NULL, which then causes
the segfault somewhere inside that atoi() call.
Junio C Hamano Aug. 14, 2019, 3:54 p.m. UTC | #6
SZEDER Gábor <szeder.dev@gmail.com> writes:

> On Mon, Aug 12, 2019 at 02:34:48PM -0700, Palmer Dabbelt wrote:
>> diff --git a/builtin/fetch.c b/builtin/fetch.c
>> index 67d001f3f78b..41498e9efb3b 100644
>> --- a/builtin/fetch.c
>> +++ b/builtin/fetch.c
>> @@ -114,6 +114,20 @@ static int git_fetch_config(const char *k, const char *v, void *cb)
>>  	return git_default_config(k, v, cb);
>>  }
>>  
>> +static int parse_jobs_arg(const struct option *opt, const char *arg, int unset)
>> +{
>> +	int jobs;
>> +
>> +	jobs = atoi(arg);
>> +	if (jobs < 1)
>> +		die(_("There must be a positive number of jobs"));
>> +
>> +	max_children_for_submodules = jobs;
>> +	max_children_for_fetch = jobs;
>> +
>> +	return 0;
>> +}
>> +
>>  static int parse_refmap_arg(const struct option *opt, const char *arg, int unset)
>>  {
>>  	BUG_ON_OPT_NEG(unset);
>> @@ -142,12 +156,13 @@ static struct option builtin_fetch_options[] = {
>>  		    N_("fetch all tags and associated objects"), TAGS_SET),
>>  	OPT_SET_INT('n', NULL, &tags,
>>  		    N_("do not fetch all tags (--no-tags)"), TAGS_UNSET),
>> -	OPT_INTEGER('j', "jobs", &max_children_for_submodules,
>> +	{ OPTION_CALLBACK, 'j', "jobs", NULL, N_("jobs"),
>> +		    N_("number of parallel tasks to run while fetching"),
>> +		    PARSE_OPT_OPTARG, &parse_jobs_arg },
>
> These changes result segmentation faults in the tests '--quiet
> propagates to parallel submodules' and 'fetching submodules respects
> parallel settings' in 't5526-fetch-submodules.sh'.
>
> If the number of jobs is specified as '-j 2' or '--jobs 7', i.e. as an
> unstuck argument of the option, as opposed to '-j2' or '--jobs=7',
> then 'arg' in the parse_jobs_arg() callback is NULL, which then causes
> the segfault somewhere inside that atoi() call.

True.  

An easier and more readable way would be to set another "nr-jobs"
variable using plain vanilla OPT_INTEGER() and override the other
two with it when they are not set after parse_options() returns, I
guess.
Palmer Dabbelt Aug. 14, 2019, 6:33 p.m. UTC | #7
On Wed, 14 Aug 2019 01:32:45 PDT (-0700), szeder.dev@gmail.com wrote:
> On Mon, Aug 12, 2019 at 02:34:48PM -0700, Palmer Dabbelt wrote:
>> diff --git a/builtin/fetch.c b/builtin/fetch.c
>> index 67d001f3f78b..41498e9efb3b 100644
>> --- a/builtin/fetch.c
>> +++ b/builtin/fetch.c
>> @@ -114,6 +114,20 @@ static int git_fetch_config(const char *k, const char *v, void *cb)
>>  	return git_default_config(k, v, cb);
>>  }
>>
>> +static int parse_jobs_arg(const struct option *opt, const char *arg, int unset)
>> +{
>> +	int jobs;
>> +
>> +	jobs = atoi(arg);
>> +	if (jobs < 1)
>> +		die(_("There must be a positive number of jobs"));
>> +
>> +	max_children_for_submodules = jobs;
>> +	max_children_for_fetch = jobs;
>> +
>> +	return 0;
>> +}
>> +
>>  static int parse_refmap_arg(const struct option *opt, const char *arg, int unset)
>>  {
>>  	BUG_ON_OPT_NEG(unset);
>> @@ -142,12 +156,13 @@ static struct option builtin_fetch_options[] = {
>>  		    N_("fetch all tags and associated objects"), TAGS_SET),
>>  	OPT_SET_INT('n', NULL, &tags,
>>  		    N_("do not fetch all tags (--no-tags)"), TAGS_UNSET),
>> -	OPT_INTEGER('j', "jobs", &max_children_for_submodules,
>> +	{ OPTION_CALLBACK, 'j', "jobs", NULL, N_("jobs"),
>> +		    N_("number of parallel tasks to run while fetching"),
>> +		    PARSE_OPT_OPTARG, &parse_jobs_arg },
>
> These changes result segmentation faults in the tests '--quiet
> propagates to parallel submodules' and 'fetching submodules respects
> parallel settings' in 't5526-fetch-submodules.sh'.
>
> If the number of jobs is specified as '-j 2' or '--jobs 7', i.e. as an
> unstuck argument of the option, as opposed to '-j2' or '--jobs=7',
> then 'arg' in the parse_jobs_arg() callback is NULL, which then causes
> the segfault somewhere inside that atoi() call.

Thanks, I'll fix that in the v3.
diff mbox series

Patch

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index 5836024f1934..0915fd4ed6d5 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -160,6 +160,10 @@  ifndef::git-pull[]
 
 -j::
 --jobs=<n>::
+	Number of parallel children to be used for all forms of fetching.
+	This is the same as passing `--submodule-fetch-jobs=<n>` and
+	`--fetch-jobs=<n>`.
+
 --submodule-fetch-jobs=<n>::
 	Number of parallel children to be used for fetching submodules.
 	Each will fetch from different submodules, such that fetching many
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 67d001f3f78b..41498e9efb3b 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -114,6 +114,20 @@  static int git_fetch_config(const char *k, const char *v, void *cb)
 	return git_default_config(k, v, cb);
 }
 
+static int parse_jobs_arg(const struct option *opt, const char *arg, int unset)
+{
+	int jobs;
+
+	jobs = atoi(arg);
+	if (jobs < 1)
+		die(_("There must be a positive number of jobs"));
+
+	max_children_for_submodules = jobs;
+	max_children_for_fetch = jobs;
+
+	return 0;
+}
+
 static int parse_refmap_arg(const struct option *opt, const char *arg, int unset)
 {
 	BUG_ON_OPT_NEG(unset);
@@ -142,12 +156,13 @@  static struct option builtin_fetch_options[] = {
 		    N_("fetch all tags and associated objects"), TAGS_SET),
 	OPT_SET_INT('n', NULL, &tags,
 		    N_("do not fetch all tags (--no-tags)"), TAGS_UNSET),
-	OPT_INTEGER('j', "jobs", &max_children_for_submodules,
+	{ OPTION_CALLBACK, 'j', "jobs", NULL, N_("jobs"),
+		    N_("number of parallel tasks to run while fetching"),
+		    PARSE_OPT_OPTARG, &parse_jobs_arg },
+	OPT_INTEGER(0, "submodule-fetch-jobs", &max_children_for_submodules,
 		    N_("number of submodules fetched in parallel")),
 	OPT_INTEGER(0, "fetch-jobs", &max_children_for_fetch,
 		    N_("number of remotes fetched in parallel")),
-	OPT_INTEGER(0, "submodule-fetch-jobs", &max_children_for_submodules,
-		    N_("number of submodules fetched in parallel")),
 	OPT_BOOL('p', "prune", &prune,
 		 N_("prune remote-tracking branches no longer on remote")),
 	OPT_BOOL('P', "prune-tags", &prune_tags,