diff mbox series

ansible: Increase default task parallelism

Message ID 20250404204827.34941-1-cel@kernel.org (mailing list archive)
State New
Headers show
Series ansible: Increase default task parallelism | expand

Checks

Context Check Description
mcgrof/vmtest-main-VM_Test-2 success Logs for Setup and Run Make Targets (debian:testing)
mcgrof/vmtest-main-VM_Test-3 success Logs for Setup and Run Make Targets (debian:testing)
mcgrof/vmtest-main-VM_Test-5 success Logs for Setup and Run Make Targets (fedora:latest)
mcgrof/vmtest-main-VM_Test-4 success Logs for Setup and Run Make Targets (fedora:latest)
mcgrof/vmtest-main-VM_Test-6 success Logs for Setup and Run Make Targets (opensuse/tumbleweed)
mcgrof/vmtest-main-VM_Test-7 success Logs for Setup and Run Make Targets (opensuse/tumbleweed)
mcgrof/vmtest-main-VM_Test-1 fail Logs for Run kdevops CI
mcgrof/vmtest-main-PR fail PR summary
mcgrof/vmtest-main-VM_Test-0 fail Logs for Run kdevops CI

Commit Message

Chuck Lever April 4, 2025, 8:48 p.m. UTC
From: Chuck Lever <chuck.lever@oracle.com>

Enable workflows with more than 5 target nodes to run concurrently.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 kconfigs/Kconfig.ansible_cfg                         | 9 +++++++++
 playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 | 1 +
 2 files changed, 10 insertions(+)

This will certainly conflict with other recent patches posted here,
but I thought I would post for comments. I've been running with the
"forks" setting elevated for a while, but the new ansible_cfg logic
added here is untested.

Comments

Luis Chamberlain April 4, 2025, 11:41 p.m. UTC | #1
On Fri, Apr 04, 2025 at 04:48:27PM -0400, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> Enable workflows with more than 5 target nodes to run concurrently.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

We have a few places that try to increase the size too, might be nice
to remove that if we have a central place that does this nicely now.
In spirit with Daniel's proposed changes to fold the inteprter for
python, so we can simplify and remove special casing it everwhere.

Other than that:

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>

 Luis
Daniel Gomez April 8, 2025, 1:15 p.m. UTC | #2
On Fri, Apr 04, 2025 at 04:48:27PM +0100, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> Enable workflows with more than 5 target nodes to run concurrently.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  kconfigs/Kconfig.ansible_cfg                         | 9 +++++++++
>  playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 | 1 +
>  2 files changed, 10 insertions(+)
> 
> This will certainly conflict with other recent patches posted here,
> but I thought I would post for comments. I've been running with the
> "forks" setting elevated for a while, but the new ansible_cfg logic
> added here is untested.
> 
> diff --git a/kconfigs/Kconfig.ansible_cfg b/kconfigs/Kconfig.ansible_cfg
> index 7286b0fe5025..34c8c93fadd6 100644
> --- a/kconfigs/Kconfig.ansible_cfg
> +++ b/kconfigs/Kconfig.ansible_cfg
> @@ -108,6 +108,15 @@ config ANSIBLE_CFG_DEPRECATION_WARNINGS
>  	  Toggle to control the showing of deprecation warnings
>  	  https://docs.ansible.com/ansible/latest/reference_appendices/config.html#deprecation-warnings
>  
> +config ANSIBLE_CFG_FORKS
> +	int "forks"
> +	otuput yaml

Typo here.

> +	default 10

I'm thinking if we want to optimize this number a bit more.

What about a script that returns min(nproc, number of running/configured
guests). But I'm not sure if nproc should account for the cores we assign to the
guests or not. This may require to set up a sensible value first, and update the
value later once we know the number of guests.

What do you think?

> +	help
> +	  Set the default maximum number of concurrent Ansible
> +	  operations (forks).
> +	  https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_strategies.html#setting-the-number-of-forks
> +
>  if DISTRO_OPENSUSE
>  
>  config ANSIBLE_CFG_RECONNECTION_RETRIES
> diff --git a/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 b/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2
> index e13931b5ce97..04ae67782c20 100644
> --- a/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2
> +++ b/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2
> @@ -8,6 +8,7 @@ display_skipped_hosts = {{ ansible_cfg_callback_plugin_display_skipped_hosts }}
>  show_custom_stats = {{ ansible_cfg_callback_plugin_show_custom_stats }}
>  show_per_host_start = {{ ansible_cfg_callback_plugin_show_per_host_start }}
>  show_task_path_on_failure = {{ ansible_cfg_callback_plugin_show_task_path_on_failure }}
> +forks = {{ ansible_cfg_forks }}

I think it's best practice to define fallback variable values in defaults/
main.yaml. If you agree, could we add this variable there as well?

>  {% if ansible_facts['distribution'] == 'openSUSE' %}
>  [connection]
>  retries = {{ ansible_cfg_reconnection_retries }}
> -- 
> 2.49.0
> 

I'd also suggest adding the _CLI option so we can override this value in CI.
This would make it easier to run multiple kdevops instances on the same machine.
Daniel Gomez April 8, 2025, 1:23 p.m. UTC | #3
On Fri, Apr 04, 2025 at 04:41:44PM +0100, Luis Chamberlain wrote:
> On Fri, Apr 04, 2025 at 04:48:27PM -0400, cel@kernel.org wrote:
> > From: Chuck Lever <chuck.lever@oracle.com>
> > 
> > Enable workflows with more than 5 target nodes to run concurrently.
> > 
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> 
> We have a few places that try to increase the size too, might be nice
> to remove that if we have a central place that does this nicely now.
> In spirit with Daniel's proposed changes to fold the inteprter for
> python, so we can simplify and remove special casing it everwhere.

Agree with this. I found it is actually used everywhere:
 
git grep "\-f" | grep "\-i" | grep hosts | wc -l
      76
git grep 30 | grep Makefile | grep "\-f" | wc -l
      74

We can probably add ANSIBLE_INVENTORY [1] to the ansible.cfg TODO list too.

[1]
https://docs.ansible.com/ansible/latest/reference_appendices/config.html#envvar-ANSIBLE_INVENTORY

> 
> Other than that:
> 
> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
> 
>  Luis
Chuck Lever April 8, 2025, 1:51 p.m. UTC | #4
Hi Daniel -

On 4/8/25 9:15 AM, Daniel Gomez wrote:
> On Fri, Apr 04, 2025 at 04:48:27PM +0100, cel@kernel.org wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> Enable workflows with more than 5 target nodes to run concurrently.
>>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>  kconfigs/Kconfig.ansible_cfg                         | 9 +++++++++
>>  playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 | 1 +
>>  2 files changed, 10 insertions(+)
>>
>> This will certainly conflict with other recent patches posted here,
>> but I thought I would post for comments. I've been running with the
>> "forks" setting elevated for a while, but the new ansible_cfg logic
>> added here is untested.
>>
>> diff --git a/kconfigs/Kconfig.ansible_cfg b/kconfigs/Kconfig.ansible_cfg
>> index 7286b0fe5025..34c8c93fadd6 100644
>> --- a/kconfigs/Kconfig.ansible_cfg
>> +++ b/kconfigs/Kconfig.ansible_cfg
>> @@ -108,6 +108,15 @@ config ANSIBLE_CFG_DEPRECATION_WARNINGS
>>  	  Toggle to control the showing of deprecation warnings
>>  	  https://docs.ansible.com/ansible/latest/reference_appendices/config.html#deprecation-warnings
>>  
>> +config ANSIBLE_CFG_FORKS
>> +	int "forks"
>> +	otuput yaml
> 
> Typo here.

Thanks!


>> +	default 10
> 
> I'm thinking if we want to optimize this number a bit more.
> 
> What about a script that returns min(nproc, number of running/configured
> guests). But I'm not sure if nproc should account for the cores we assign to the
> guests or not. This may require to set up a sensible value first, and update the
> value later once we know the number of guests.
> 
> What do you think?

I have systems with multiple buildbot workers, each running multiple
libvirt guests with multiple vCPUs. I'm not certain how one of these
workers would be able to guess how many total guests with how many
provisioned vCPUs are active on the host system.

Also, except when doing something CPU intensive, I don't think the
guest count is very interesting. The number of provisioned vCPUs might
be a better match, but again, it's hard to ascertain that total.

At best, any heuristic is going to be a wild miss in some scenarios.

I'm open to discussion, though.


>> +	help
>> +	  Set the default maximum number of concurrent Ansible
>> +	  operations (forks).
>> +	  https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_strategies.html#setting-the-number-of-forks
>> +
>>  if DISTRO_OPENSUSE
>>  
>>  config ANSIBLE_CFG_RECONNECTION_RETRIES
>> diff --git a/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 b/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2
>> index e13931b5ce97..04ae67782c20 100644
>> --- a/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2
>> +++ b/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2
>> @@ -8,6 +8,7 @@ display_skipped_hosts = {{ ansible_cfg_callback_plugin_display_skipped_hosts }}
>>  show_custom_stats = {{ ansible_cfg_callback_plugin_show_custom_stats }}
>>  show_per_host_start = {{ ansible_cfg_callback_plugin_show_per_host_start }}
>>  show_task_path_on_failure = {{ ansible_cfg_callback_plugin_show_task_path_on_failure }}
>> +forks = {{ ansible_cfg_forks }}
> 
> I think it's best practice to define fallback variable values in defaults/
> main.yaml. If you agree, could we add this variable there as well?

For values that come from Kconfig, that means we possibly have a default
value in the Kconfig file and one in one or more defaults/main.yml
files. (Even worse for terraform when you have default values in vars.tf
as well...)

Frequently these default values are different from one another. Even
when they are the same, it's easy to forget and change one and not the
others.

Thus, in this context I generally prefer to let Kconfig decide the
default value, and don't set up defaults further down the chain.

But I can add a default/main.yml if you like. The above is just my
rationale for why I didn't add one here.


>>  {% if ansible_facts['distribution'] == 'openSUSE' %}
>>  [connection]
>>  retries = {{ ansible_cfg_reconnection_retries }}
>> -- 
>> 2.49.0
>>
> 
> I'd also suggest adding the _CLI option so we can override this value in CI.
> This would make it easier to run multiple kdevops instances on the same machine.

Will look into it, and post a v2. Thanks for the review!
Daniel Gomez April 9, 2025, 6:52 p.m. UTC | #5
> >> +	default 10
> > 
> > I'm thinking if we want to optimize this number a bit more.
> > 
> > What about a script that returns min(nproc, number of running/configured
> > guests). But I'm not sure if nproc should account for the cores we assign to the
> > guests or not. This may require to set up a sensible value first, and update the
> > value later once we know the number of guests.
> > 
> > What do you think?
> 
> I have systems with multiple buildbot workers, each running multiple
> libvirt guests with multiple vCPUs. I'm not certain how one of these
> workers would be able to guess how many total guests with how many
> provisioned vCPUs are active on the host system.

Right, in this case we need to specify a custom Ansible fork value, since
(AFAIK) there's no mechanism to share information between buildbot workers. For
this case, I'd suggest defining how many buildbots workers will run in parallel,
and how many forks each one is allowed to use. And probably if all workers
perform the same pipeline/workflow/build, this value would be the same for all
of them.

That said, I wonder if we should actually use this value also for host builds?

> 
> Also, except when doing something CPU intensive, I don't think the
> guest count is very interesting. The number of provisioned vCPUs might
> be a better match, but again, it's hard to ascertain that total.
> 
> At best, any heuristic is going to be a wild miss in some scenarios.

I agree with this. Unless it's assumed that only one kdevops instance is run per
machine (in the default case), then we can use nproc as default value here. Users
can then override this value via CLI or menuconfig.

> 
> I'm open to discussion, though.
> 
> 
> >> +	help
> >> +	  Set the default maximum number of concurrent Ansible
> >> +	  operations (forks).
> >> +	  https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_strategies.html#setting-the-number-of-forks
> >> +
> >>  if DISTRO_OPENSUSE
> >>  
> >>  config ANSIBLE_CFG_RECONNECTION_RETRIES
> >> diff --git a/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 b/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2
> >> index e13931b5ce97..04ae67782c20 100644
> >> --- a/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2
> >> +++ b/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2
> >> @@ -8,6 +8,7 @@ display_skipped_hosts = {{ ansible_cfg_callback_plugin_display_skipped_hosts }}
> >>  show_custom_stats = {{ ansible_cfg_callback_plugin_show_custom_stats }}
> >>  show_per_host_start = {{ ansible_cfg_callback_plugin_show_per_host_start }}
> >>  show_task_path_on_failure = {{ ansible_cfg_callback_plugin_show_task_path_on_failure }}
> >> +forks = {{ ansible_cfg_forks }}
> > 
> > I think it's best practice to define fallback variable values in defaults/
> > main.yaml. If you agree, could we add this variable there as well?
> 
> For values that come from Kconfig, that means we possibly have a default
> value in the Kconfig file and one in one or more defaults/main.yml
> files. (Even worse for terraform when you have default values in vars.tf
> as well...)
> 
> Frequently these default values are different from one another. Even
> when they are the same, it's easy to forget and change one and not the
> others.
> 
> Thus, in this context I generally prefer to let Kconfig decide the
> default value, and don't set up defaults further down the chain.
> 
> But I can add a default/main.yml if you like. The above is just my
> rationale for why I didn't add one here.

I completely agree with you. The only reason I created the file was what I
mentioned earlier, + in the case where we want to run a playbook without kdevops
infrastructure.

I'll follow up with a patch to remove the file after these changes are in.

> 
> 
> >>  {% if ansible_facts['distribution'] == 'openSUSE' %}
> >>  [connection]
> >>  retries = {{ ansible_cfg_reconnection_retries }}
> >> -- 
> >> 2.49.0
> >>
> > 
> > I'd also suggest adding the _CLI option so we can override this value in CI.
> > This would make it easier to run multiple kdevops instances on the same machine.
> 
> Will look into it, and post a v2. Thanks for the review!

FYI, commit e71f8e8 ("ansible_cfg: add callback plugin cli support") can be used
as reference.

> 
> 
> -- 
> Chuck Lever
diff mbox series

Patch

diff --git a/kconfigs/Kconfig.ansible_cfg b/kconfigs/Kconfig.ansible_cfg
index 7286b0fe5025..34c8c93fadd6 100644
--- a/kconfigs/Kconfig.ansible_cfg
+++ b/kconfigs/Kconfig.ansible_cfg
@@ -108,6 +108,15 @@  config ANSIBLE_CFG_DEPRECATION_WARNINGS
 	  Toggle to control the showing of deprecation warnings
 	  https://docs.ansible.com/ansible/latest/reference_appendices/config.html#deprecation-warnings
 
+config ANSIBLE_CFG_FORKS
+	int "forks"
+	otuput yaml
+	default 10
+	help
+	  Set the default maximum number of concurrent Ansible
+	  operations (forks).
+	  https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_strategies.html#setting-the-number-of-forks
+
 if DISTRO_OPENSUSE
 
 config ANSIBLE_CFG_RECONNECTION_RETRIES
diff --git a/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 b/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2
index e13931b5ce97..04ae67782c20 100644
--- a/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2
+++ b/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2
@@ -8,6 +8,7 @@  display_skipped_hosts = {{ ansible_cfg_callback_plugin_display_skipped_hosts }}
 show_custom_stats = {{ ansible_cfg_callback_plugin_show_custom_stats }}
 show_per_host_start = {{ ansible_cfg_callback_plugin_show_per_host_start }}
 show_task_path_on_failure = {{ ansible_cfg_callback_plugin_show_task_path_on_failure }}
+forks = {{ ansible_cfg_forks }}
 {% if ansible_facts['distribution'] == 'openSUSE' %}
 [connection]
 retries = {{ ansible_cfg_reconnection_retries }}