Message ID | 20250404204827.34941-1-cel@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | ansible: Increase default task parallelism | expand |
Context | Check | Description |
---|---|---|
mcgrof/vmtest-main-VM_Test-2 | success | Logs for Setup and Run Make Targets (debian:testing) |
mcgrof/vmtest-main-VM_Test-3 | success | Logs for Setup and Run Make Targets (debian:testing) |
mcgrof/vmtest-main-VM_Test-5 | success | Logs for Setup and Run Make Targets (fedora:latest) |
mcgrof/vmtest-main-VM_Test-4 | success | Logs for Setup and Run Make Targets (fedora:latest) |
mcgrof/vmtest-main-VM_Test-6 | success | Logs for Setup and Run Make Targets (opensuse/tumbleweed) |
mcgrof/vmtest-main-VM_Test-7 | success | Logs for Setup and Run Make Targets (opensuse/tumbleweed) |
mcgrof/vmtest-main-VM_Test-1 | fail | Logs for Run kdevops CI |
mcgrof/vmtest-main-PR | fail | PR summary |
mcgrof/vmtest-main-VM_Test-0 | fail | Logs for Run kdevops CI |
On Fri, Apr 04, 2025 at 04:48:27PM -0400, cel@kernel.org wrote: > From: Chuck Lever <chuck.lever@oracle.com> > > Enable workflows with more than 5 target nodes to run concurrently. > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> We have a few places that try to increase the size too, might be nice to remove that if we have a central place that does this nicely now. In spirit with Daniel's proposed changes to fold the inteprter for python, so we can simplify and remove special casing it everwhere. Other than that: Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Luis
On Fri, Apr 04, 2025 at 04:48:27PM +0100, cel@kernel.org wrote: > From: Chuck Lever <chuck.lever@oracle.com> > > Enable workflows with more than 5 target nodes to run concurrently. > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > --- > kconfigs/Kconfig.ansible_cfg | 9 +++++++++ > playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 | 1 + > 2 files changed, 10 insertions(+) > > This will certainly conflict with other recent patches posted here, > but I thought I would post for comments. I've been running with the > "forks" setting elevated for a while, but the new ansible_cfg logic > added here is untested. > > diff --git a/kconfigs/Kconfig.ansible_cfg b/kconfigs/Kconfig.ansible_cfg > index 7286b0fe5025..34c8c93fadd6 100644 > --- a/kconfigs/Kconfig.ansible_cfg > +++ b/kconfigs/Kconfig.ansible_cfg > @@ -108,6 +108,15 @@ config ANSIBLE_CFG_DEPRECATION_WARNINGS > Toggle to control the showing of deprecation warnings > https://docs.ansible.com/ansible/latest/reference_appendices/config.html#deprecation-warnings > > +config ANSIBLE_CFG_FORKS > + int "forks" > + otuput yaml Typo here. > + default 10 I'm thinking if we want to optimize this number a bit more. What about a script that returns min(nproc, number of running/configured guests). But I'm not sure if nproc should account for the cores we assign to the guests or not. This may require to set up a sensible value first, and update the value later once we know the number of guests. What do you think? > + help > + Set the default maximum number of concurrent Ansible > + operations (forks). > + https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_strategies.html#setting-the-number-of-forks > + > if DISTRO_OPENSUSE > > config ANSIBLE_CFG_RECONNECTION_RETRIES > diff --git a/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 b/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 > index e13931b5ce97..04ae67782c20 100644 > --- a/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 > +++ b/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 > @@ -8,6 +8,7 @@ display_skipped_hosts = {{ ansible_cfg_callback_plugin_display_skipped_hosts }} > show_custom_stats = {{ ansible_cfg_callback_plugin_show_custom_stats }} > show_per_host_start = {{ ansible_cfg_callback_plugin_show_per_host_start }} > show_task_path_on_failure = {{ ansible_cfg_callback_plugin_show_task_path_on_failure }} > +forks = {{ ansible_cfg_forks }} I think it's best practice to define fallback variable values in defaults/ main.yaml. If you agree, could we add this variable there as well? > {% if ansible_facts['distribution'] == 'openSUSE' %} > [connection] > retries = {{ ansible_cfg_reconnection_retries }} > -- > 2.49.0 > I'd also suggest adding the _CLI option so we can override this value in CI. This would make it easier to run multiple kdevops instances on the same machine.
On Fri, Apr 04, 2025 at 04:41:44PM +0100, Luis Chamberlain wrote: > On Fri, Apr 04, 2025 at 04:48:27PM -0400, cel@kernel.org wrote: > > From: Chuck Lever <chuck.lever@oracle.com> > > > > Enable workflows with more than 5 target nodes to run concurrently. > > > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > > We have a few places that try to increase the size too, might be nice > to remove that if we have a central place that does this nicely now. > In spirit with Daniel's proposed changes to fold the inteprter for > python, so we can simplify and remove special casing it everwhere. Agree with this. I found it is actually used everywhere: git grep "\-f" | grep "\-i" | grep hosts | wc -l 76 git grep 30 | grep Makefile | grep "\-f" | wc -l 74 We can probably add ANSIBLE_INVENTORY [1] to the ansible.cfg TODO list too. [1] https://docs.ansible.com/ansible/latest/reference_appendices/config.html#envvar-ANSIBLE_INVENTORY > > Other than that: > > Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> > > Luis
Hi Daniel - On 4/8/25 9:15 AM, Daniel Gomez wrote: > On Fri, Apr 04, 2025 at 04:48:27PM +0100, cel@kernel.org wrote: >> From: Chuck Lever <chuck.lever@oracle.com> >> >> Enable workflows with more than 5 target nodes to run concurrently. >> >> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> >> --- >> kconfigs/Kconfig.ansible_cfg | 9 +++++++++ >> playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 | 1 + >> 2 files changed, 10 insertions(+) >> >> This will certainly conflict with other recent patches posted here, >> but I thought I would post for comments. I've been running with the >> "forks" setting elevated for a while, but the new ansible_cfg logic >> added here is untested. >> >> diff --git a/kconfigs/Kconfig.ansible_cfg b/kconfigs/Kconfig.ansible_cfg >> index 7286b0fe5025..34c8c93fadd6 100644 >> --- a/kconfigs/Kconfig.ansible_cfg >> +++ b/kconfigs/Kconfig.ansible_cfg >> @@ -108,6 +108,15 @@ config ANSIBLE_CFG_DEPRECATION_WARNINGS >> Toggle to control the showing of deprecation warnings >> https://docs.ansible.com/ansible/latest/reference_appendices/config.html#deprecation-warnings >> >> +config ANSIBLE_CFG_FORKS >> + int "forks" >> + otuput yaml > > Typo here. Thanks! >> + default 10 > > I'm thinking if we want to optimize this number a bit more. > > What about a script that returns min(nproc, number of running/configured > guests). But I'm not sure if nproc should account for the cores we assign to the > guests or not. This may require to set up a sensible value first, and update the > value later once we know the number of guests. > > What do you think? I have systems with multiple buildbot workers, each running multiple libvirt guests with multiple vCPUs. I'm not certain how one of these workers would be able to guess how many total guests with how many provisioned vCPUs are active on the host system. Also, except when doing something CPU intensive, I don't think the guest count is very interesting. The number of provisioned vCPUs might be a better match, but again, it's hard to ascertain that total. At best, any heuristic is going to be a wild miss in some scenarios. I'm open to discussion, though. >> + help >> + Set the default maximum number of concurrent Ansible >> + operations (forks). >> + https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_strategies.html#setting-the-number-of-forks >> + >> if DISTRO_OPENSUSE >> >> config ANSIBLE_CFG_RECONNECTION_RETRIES >> diff --git a/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 b/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 >> index e13931b5ce97..04ae67782c20 100644 >> --- a/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 >> +++ b/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 >> @@ -8,6 +8,7 @@ display_skipped_hosts = {{ ansible_cfg_callback_plugin_display_skipped_hosts }} >> show_custom_stats = {{ ansible_cfg_callback_plugin_show_custom_stats }} >> show_per_host_start = {{ ansible_cfg_callback_plugin_show_per_host_start }} >> show_task_path_on_failure = {{ ansible_cfg_callback_plugin_show_task_path_on_failure }} >> +forks = {{ ansible_cfg_forks }} > > I think it's best practice to define fallback variable values in defaults/ > main.yaml. If you agree, could we add this variable there as well? For values that come from Kconfig, that means we possibly have a default value in the Kconfig file and one in one or more defaults/main.yml files. (Even worse for terraform when you have default values in vars.tf as well...) Frequently these default values are different from one another. Even when they are the same, it's easy to forget and change one and not the others. Thus, in this context I generally prefer to let Kconfig decide the default value, and don't set up defaults further down the chain. But I can add a default/main.yml if you like. The above is just my rationale for why I didn't add one here. >> {% if ansible_facts['distribution'] == 'openSUSE' %} >> [connection] >> retries = {{ ansible_cfg_reconnection_retries }} >> -- >> 2.49.0 >> > > I'd also suggest adding the _CLI option so we can override this value in CI. > This would make it easier to run multiple kdevops instances on the same machine. Will look into it, and post a v2. Thanks for the review!
> >> + default 10 > > > > I'm thinking if we want to optimize this number a bit more. > > > > What about a script that returns min(nproc, number of running/configured > > guests). But I'm not sure if nproc should account for the cores we assign to the > > guests or not. This may require to set up a sensible value first, and update the > > value later once we know the number of guests. > > > > What do you think? > > I have systems with multiple buildbot workers, each running multiple > libvirt guests with multiple vCPUs. I'm not certain how one of these > workers would be able to guess how many total guests with how many > provisioned vCPUs are active on the host system. Right, in this case we need to specify a custom Ansible fork value, since (AFAIK) there's no mechanism to share information between buildbot workers. For this case, I'd suggest defining how many buildbots workers will run in parallel, and how many forks each one is allowed to use. And probably if all workers perform the same pipeline/workflow/build, this value would be the same for all of them. That said, I wonder if we should actually use this value also for host builds? > > Also, except when doing something CPU intensive, I don't think the > guest count is very interesting. The number of provisioned vCPUs might > be a better match, but again, it's hard to ascertain that total. > > At best, any heuristic is going to be a wild miss in some scenarios. I agree with this. Unless it's assumed that only one kdevops instance is run per machine (in the default case), then we can use nproc as default value here. Users can then override this value via CLI or menuconfig. > > I'm open to discussion, though. > > > >> + help > >> + Set the default maximum number of concurrent Ansible > >> + operations (forks). > >> + https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_strategies.html#setting-the-number-of-forks > >> + > >> if DISTRO_OPENSUSE > >> > >> config ANSIBLE_CFG_RECONNECTION_RETRIES > >> diff --git a/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 b/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 > >> index e13931b5ce97..04ae67782c20 100644 > >> --- a/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 > >> +++ b/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 > >> @@ -8,6 +8,7 @@ display_skipped_hosts = {{ ansible_cfg_callback_plugin_display_skipped_hosts }} > >> show_custom_stats = {{ ansible_cfg_callback_plugin_show_custom_stats }} > >> show_per_host_start = {{ ansible_cfg_callback_plugin_show_per_host_start }} > >> show_task_path_on_failure = {{ ansible_cfg_callback_plugin_show_task_path_on_failure }} > >> +forks = {{ ansible_cfg_forks }} > > > > I think it's best practice to define fallback variable values in defaults/ > > main.yaml. If you agree, could we add this variable there as well? > > For values that come from Kconfig, that means we possibly have a default > value in the Kconfig file and one in one or more defaults/main.yml > files. (Even worse for terraform when you have default values in vars.tf > as well...) > > Frequently these default values are different from one another. Even > when they are the same, it's easy to forget and change one and not the > others. > > Thus, in this context I generally prefer to let Kconfig decide the > default value, and don't set up defaults further down the chain. > > But I can add a default/main.yml if you like. The above is just my > rationale for why I didn't add one here. I completely agree with you. The only reason I created the file was what I mentioned earlier, + in the case where we want to run a playbook without kdevops infrastructure. I'll follow up with a patch to remove the file after these changes are in. > > > >> {% if ansible_facts['distribution'] == 'openSUSE' %} > >> [connection] > >> retries = {{ ansible_cfg_reconnection_retries }} > >> -- > >> 2.49.0 > >> > > > > I'd also suggest adding the _CLI option so we can override this value in CI. > > This would make it easier to run multiple kdevops instances on the same machine. > > Will look into it, and post a v2. Thanks for the review! FYI, commit e71f8e8 ("ansible_cfg: add callback plugin cli support") can be used as reference. > > > -- > Chuck Lever
diff --git a/kconfigs/Kconfig.ansible_cfg b/kconfigs/Kconfig.ansible_cfg index 7286b0fe5025..34c8c93fadd6 100644 --- a/kconfigs/Kconfig.ansible_cfg +++ b/kconfigs/Kconfig.ansible_cfg @@ -108,6 +108,15 @@ config ANSIBLE_CFG_DEPRECATION_WARNINGS Toggle to control the showing of deprecation warnings https://docs.ansible.com/ansible/latest/reference_appendices/config.html#deprecation-warnings +config ANSIBLE_CFG_FORKS + int "forks" + otuput yaml + default 10 + help + Set the default maximum number of concurrent Ansible + operations (forks). + https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_strategies.html#setting-the-number-of-forks + if DISTRO_OPENSUSE config ANSIBLE_CFG_RECONNECTION_RETRIES diff --git a/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 b/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 index e13931b5ce97..04ae67782c20 100644 --- a/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 +++ b/playbooks/roles/ansible_cfg/templates/ansible.cfg.j2 @@ -8,6 +8,7 @@ display_skipped_hosts = {{ ansible_cfg_callback_plugin_display_skipped_hosts }} show_custom_stats = {{ ansible_cfg_callback_plugin_show_custom_stats }} show_per_host_start = {{ ansible_cfg_callback_plugin_show_per_host_start }} show_task_path_on_failure = {{ ansible_cfg_callback_plugin_show_task_path_on_failure }} +forks = {{ ansible_cfg_forks }} {% if ansible_facts['distribution'] == 'openSUSE' %} [connection] retries = {{ ansible_cfg_reconnection_retries }}