[3/2] padata: initialize usable masks to reflect offlined CPU

Message ID: 20190812210200.13653-1-daniel.m.jordan@oracle.com
State: Changes Requested
Delegated to: Herbert Xu
Series: [v2] padata: validate cpumask without removed CPU during offline

Commit Message

Daniel Jordan Aug. 12, 2019, 9:02 p.m. UTC
__padata_remove_cpu clears the offlined CPU from the usable masks after
padata_alloc_pd has initialized pd->cpu, which means pd->cpu could be
initialized to this CPU, causing padata to wait indefinitely for the
next job in padata_get_next.

Make the usable masks reflect the offline CPU when they're established
in padata_setup_cpumasks so pd->cpu is initialized properly.

Fixes: 6fc4dbcf0276 ("padata: Replace delayed timer with immediate workqueue in padata_reorder")
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---

Hi, one more edge case.  All combinations of CPUs among
parallel_cpumask, serial_cpumask, and CPU hotplug have now been tested
in a 4-CPU VM, and an 8-CPU VM has run with random combinations of these
settings for over an hour.

 kernel/padata.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)
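
The ordering problem described in the commit message can be modelled in userspace. The sketch below is an illustration only, not kernel code: plain `unsigned long` bitmasks stand in for `struct cpumask`, and `first_cpu()`, `usable_mask()`, `pd_cpu_clear_after()` and `pd_cpu_clear_before()` are hypothetical helpers. It assumes pd->cpu starts out as the first CPU set in the usable parallel mask.

```c
#include <assert.h>

/* Index of the lowest set bit, or -1 if the mask is empty. */
int first_cpu(unsigned long mask)
{
	int cpu;

	for (cpu = 0; cpu < (int)(8 * sizeof(mask)); cpu++)
		if (mask & (1UL << cpu))
			return cpu;
	return -1;
}

/* Usable mask: CPUs the user asked for that are also online. */
unsigned long usable_mask(unsigned long user, unsigned long online)
{
	return user & online;
}

/*
 * Old ordering: pd->cpu is chosen while @cpu is still in the usable
 * mask, and @cpu is cleared from the mask only afterwards.
 */
int pd_cpu_clear_after(unsigned long user, unsigned long online, int cpu)
{
	unsigned long usable = usable_mask(user, online);
	int pd_cpu;

	assert(cpu >= 0 && cpu < (int)(8 * sizeof(user)));
	pd_cpu = first_cpu(usable);	/* may pick the departing CPU */
	usable &= ~(1UL << cpu);	/* too late to affect pd_cpu */
	(void)usable;
	return pd_cpu;
}

/* Patched ordering: @cpu is excluded before pd->cpu is chosen. */
int pd_cpu_clear_before(unsigned long user, unsigned long online, int cpu)
{
	assert(cpu >= 0 && cpu < (int)(8 * sizeof(user)));
	return first_cpu(usable_mask(user & ~(1UL << cpu), online));
}
```

With CPUs 0-3 both online and in the parallel mask, offlining CPU 0 makes `pd_cpu_clear_after(0xf, 0xf, 0)` return 0, the departing CPU, while `pd_cpu_clear_before(0xf, 0xf, 0)` returns 1.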

Comments

Herbert Xu Aug. 22, 2019, 3:51 a.m. UTC | #1
On Mon, Aug 12, 2019 at 05:02:00PM -0400, Daniel Jordan wrote:
> __padata_remove_cpu clears the offlined CPU from the usable masks after
> padata_alloc_pd has initialized pd->cpu, which means pd->cpu could be
> initialized to this CPU, causing padata to wait indefinitely for the
> next job in padata_get_next.
> 
> Make the usable masks reflect the offline CPU when they're established
> in padata_setup_cpumasks so pd->cpu is initialized properly.
> 
> Fixes: 6fc4dbcf0276 ("padata: Replace delayed timer with immediate workqueue in padata_reorder")
> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Steffen Klassert <steffen.klassert@secunet.com>
> Cc: linux-crypto@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
> 
> Hi, one more edge case.  All combinations of CPUs among
> parallel_cpumask, serial_cpumask, and CPU hotplug have now been tested
> in a 4-CPU VM, and an 8-CPU VM has run with random combinations of these
> settings for over an hour.
> 
>  kernel/padata.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)

If we modify patch 2/2 by calling this after cpu_online_mask
has been updated then this problem should go away because we
can then remove the cpumask_clear_cpu calls.

Cheers,

Daniel Jordan Aug. 22, 2019, 10:11 p.m. UTC | #2
On 8/21/19 11:51 PM, Herbert Xu wrote:
> On Mon, Aug 12, 2019 at 05:02:00PM -0400, Daniel Jordan wrote:
>> __padata_remove_cpu clears the offlined CPU from the usable masks after
>> padata_alloc_pd has initialized pd->cpu, which means pd->cpu could be
>> initialized to this CPU, causing padata to wait indefinitely for the
>> next job in padata_get_next.
>>
>> Make the usable masks reflect the offline CPU when they're established
>> in padata_setup_cpumasks so pd->cpu is initialized properly.
>>
>> Fixes: 6fc4dbcf0276 ("padata: Replace delayed timer with immediate workqueue in padata_reorder")
>> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
>> Cc: Herbert Xu <herbert@gondor.apana.org.au>
>> Cc: Steffen Klassert <steffen.klassert@secunet.com>
>> Cc: linux-crypto@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> ---
>>
>> Hi, one more edge case.  All combinations of CPUs among
>> parallel_cpumask, serial_cpumask, and CPU hotplug have now been tested
>> in a 4-CPU VM, and an 8-CPU VM has run with random combinations of these
>> settings for over an hour.
>>
>>   kernel/padata.c | 18 ++++++++++++++----
>>   1 file changed, 14 insertions(+), 4 deletions(-)
> 
> If we modify patch 2/2 by calling this after cpu_online_mask
> has been updated then this problem should go away because we
> can then remove the cpumask_clear_cpu calls.

Yep, agreed.
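
A rough userspace sketch of why the explicit clears become unnecessary under this suggestion. It is a model, not kernel code: `unsigned long` bitmasks stand in for `struct cpumask`, both function names are hypothetical, and it assumes the usable masks are the user-supplied masks ANDed with cpu_online_mask.

```c
#include <assert.h>

/*
 * Rebuild before the online mask is updated: @cpu is still online,
 * so it leaks into the usable mask and must be cleared by hand.
 */
unsigned long usable_before_update(unsigned long user, unsigned long online)
{
	return user & online;
}

/*
 * Rebuild after the online mask is updated: @cpu is already gone
 * from the online mask, so the AND alone excludes it.
 */
unsigned long usable_after_update(unsigned long user, unsigned long online,
				  int cpu)
{
	assert(cpu >= 0 && cpu < (int)(8 * sizeof(online)));
	online &= ~(1UL << cpu);	/* models the updated online mask */
	return user & online;		/* no separate clear needed */
}
```

For a 4-CPU system with all CPUs requested, `usable_before_update(0xf, 0xf)` still contains CPU 0 while it is going offline, whereas `usable_after_update(0xf, 0xf, 0)` yields 0xe without any extra step.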

Patch

diff --git a/kernel/padata.c b/kernel/padata.c
index 01460ea1d160..c1002ac4720c 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -702,17 +702,27 @@  static int __padata_remove_cpu(struct padata_instance *pinst, int cpu)
 	struct parallel_data *pd = NULL;
 
 	if (cpumask_test_cpu(cpu, cpu_online_mask)) {
+		cpumask_var_t pcpu, cbcpu;
+
 		__padata_stop(pinst);
 
-		pd = padata_alloc_pd(pinst, pinst->cpumask.pcpu,
-				     pinst->cpumask.cbcpu);
+		/*
+		 * padata_alloc_pd uses cpu_online_mask to get the usable
+		 * masks, but @cpu hasn't been removed from it yet, so use
+		 * temporary masks that exclude @cpu so the usable masks show
+		 * @cpu as offline for pd->cpu's initialization.
+		 */
+		cpumask_copy(pcpu, pinst->cpumask.pcpu);
+		cpumask_copy(cbcpu, pinst->cpumask.cbcpu);
+		cpumask_clear_cpu(cpu, cbcpu);
+		cpumask_clear_cpu(cpu, pcpu);
+
+		pd = padata_alloc_pd(pinst, pcpu, cbcpu);
 		if (!pd)
 			return -ENOMEM;
 
 		padata_replace(pinst, pd);
 
-		cpumask_clear_cpu(cpu, pd->cpumask.cbcpu);
-		cpumask_clear_cpu(cpu, pd->cpumask.pcpu);
 		if (padata_validate_cpumask(pinst, pd->cpumask.pcpu) &&
 		    padata_validate_cpumask(pinst, pd->cpumask.cbcpu))
 			__padata_start(pinst);