[kvm-unit-tests,1/1] s390x: sclp: consider monoprocessor on read_info error

Message ID 20230424174218.64145-2-pmorel@linux.ibm.com (mailing list archive)
State New, archived
Series Fixing infinite loop on SCLP READ SCP INFO error

Commit Message

Pierre Morel April 24, 2023, 5:42 p.m. UTC
When we cannot read SCP information we cannot abort during
sclp_get_cpu_num() because this function is called during exit
and calling it will lead to an infinite loop.

The loop is:
abort() -> exit() -> smp_teardown() -> smp_query_num_cpus() ->
sclp_get_cpu_num() -> assert() -> abort()

Since smp_setup() is done after sclp_read_info() inside setup() this
loop happens when only the start processor is running.
Let sclp_get_cpu_num() return 1 in this case.

Fixes: 52076a63d569 ("s390x: Consolidate sclp read info")
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
---
 lib/s390x/sclp.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Claudio Imbrenda April 25, 2023, 8:26 a.m. UTC | #1
On Mon, 24 Apr 2023 19:42:18 +0200
Pierre Morel <pmorel@linux.ibm.com> wrote:

> When we cannot read SCP information we cannot abort during
> sclp_get_cpu_num() because this function is called during exit
> and calling it will lead to an infinite loop.
> 
> The loop is:
> abort() -> exit() -> smp_teardown() -> smp_query_num_cpus() ->
> sclp_get_cpu_num() -> assert() -> abort()
> 
> Since smp_setup() is done after sclp_read_info() inside setup() this
> loop happens when only the start processor is running.
> Let sclp_get_cpu_num() return 1 in this case.

looks good to me, but please add a comment to explain that this is only
supposed to happen in exceptional circumstances

> 
> Fixes: 52076a63d569 ("s390x: Consolidate sclp read info")
> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> ---
>  lib/s390x/sclp.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/s390x/sclp.c b/lib/s390x/sclp.c
> index acdc8a9..c09360d 100644
> --- a/lib/s390x/sclp.c
> +++ b/lib/s390x/sclp.c
> @@ -119,8 +119,9 @@ void sclp_read_info(void)
>  
>  int sclp_get_cpu_num(void)
>  {
> -	assert(read_info);
> -	return read_info->entries_cpu;
> +    if (read_info)
> +	    return read_info->entries_cpu;
> +    return 1;
>  }
>  
>  CPUEntry *sclp_get_cpu_entries(void)
Pierre Morel April 25, 2023, 10:53 a.m. UTC | #2
On 4/25/23 10:26, Claudio Imbrenda wrote:
> On Mon, 24 Apr 2023 19:42:18 +0200
> Pierre Morel <pmorel@linux.ibm.com> wrote:
>
>> When we cannot read SCP information we cannot abort during
>> sclp_get_cpu_num() because this function is called during exit
>> and calling it will lead to an infinite loop.
>>
>> The loop is:
>> abort() -> exit() -> smp_teardown() -> smp_query_num_cpus() ->
>> sclp_get_cpu_num() -> assert() -> abort()
>>
>> Since smp_setup() is done after sclp_read_info() inside setup() this
>> loop happens when only the start processor is running.
>> Let sclp_get_cpu_num() return 1 in this case.
> looks good to me, but please add a comment to explain that this is only
> supposed to happen in exceptional circumstances


Is this ok like this:

"
Read SCP Information can fail if the SCLP buffer length is too small
for the information to return, which happens for example when defining
248 CPUs.

When SCLP Read SCP Information fails during setup, we currently cannot
abort because the function sclp_get_cpu_num(), called during exit,
asserts on the previous success of SCLP Read SCP Information.

The loop is:
abort() -> exit() -> smp_teardown() -> smp_query_num_cpus() ->
sclp_get_cpu_num() -> assert() -> abort()

Since smp_setup() is done after sclp_read_info() inside setup() this
loop happens when only the start processor is running.

Since only one processor is running and we know it, we do not
need to make the SCLP call in sclp_get_cpu_num() and can safely return 1.
"

>
>> Fixes: 52076a63d569 ("s390x: Consolidate sclp read info")
>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>> ---
>>   lib/s390x/sclp.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/s390x/sclp.c b/lib/s390x/sclp.c
>> index acdc8a9..c09360d 100644
>> --- a/lib/s390x/sclp.c
>> +++ b/lib/s390x/sclp.c
>> @@ -119,8 +119,9 @@ void sclp_read_info(void)
>>   
>>   int sclp_get_cpu_num(void)
>>   {
>> -	assert(read_info);
>> -	return read_info->entries_cpu;
>> +    if (read_info)
>> +	    return read_info->entries_cpu;
>> +    return 1;
>>   }
>>   
>>   CPUEntry *sclp_get_cpu_entries(void)
Janosch Frank April 25, 2023, 11:33 a.m. UTC | #3
On 4/25/23 12:53, Pierre Morel wrote:
> 
> On 4/25/23 10:26, Claudio Imbrenda wrote:
>> On Mon, 24 Apr 2023 19:42:18 +0200
>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>

How is this considered to be a fix and not a workaround?


Set the variable response bit in the control mask and vary the length 
based on stfle 140. See __init sclp_early_read_info() in 
drivers/s390/char/sclp_early_core.c
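
A rough sketch of that suggestion follows, as a standalone model rather
than the real kvm-unit-tests or Linux code. Every name in it is
hypothetical. The three-page extended length and the 0x80 mask value are
assumptions modelled on the referenced Linux function and should be
checked against that source before use.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u
/* Assumption: the extended Read SCP Info buffer spans three pages. */
#define MODEL_EXT_SCCB_LEN (3u * MODEL_PAGE_SIZE)
/* Assumption: bit value that requests a variable-length response. */
#define MODEL_VARIABLE_LENGTH_RESPONSE 0x80u

/* Hypothetical, simplified SCCB header; not the real layout definition. */
struct sccb_header_model {
	uint16_t length;
	uint8_t function_code;
	uint8_t control_mask[3];
	uint16_t response_code;
};

/* Pick the request length depending on whether facility 140 is present. */
static size_t read_info_len_model(bool has_facility_140)
{
	return has_facility_140 ? MODEL_EXT_SCCB_LEN : MODEL_PAGE_SIZE;
}

/* Fill in the header fields that the suggestion above talks about. */
static void prepare_read_info_model(struct sccb_header_model *h,
				    bool has_facility_140)
{
	memset(h, 0, sizeof(*h));
	h->length = (uint16_t)read_info_len_model(has_facility_140);
	h->control_mask[2] = MODEL_VARIABLE_LENGTH_RESPONSE;
}
```

In a real implementation the facility check would come from the
library's existing stfle helper, and the buffer backing the SCCB would
have to be at least as large as the chosen length.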


>>
>>> Fixes: 52076a63d569 ("s390x: Consolidate sclp read info")
>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>> ---
>>>    lib/s390x/sclp.c | 5 +++--
>>>    1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/lib/s390x/sclp.c b/lib/s390x/sclp.c
>>> index acdc8a9..c09360d 100644
>>> --- a/lib/s390x/sclp.c
>>> +++ b/lib/s390x/sclp.c
>>> @@ -119,8 +119,9 @@ void sclp_read_info(void)
>>>    
>>>    int sclp_get_cpu_num(void)
>>>    {
>>> -	assert(read_info);
>>> -	return read_info->entries_cpu;
>>> +    if (read_info)
>>> +	    return read_info->entries_cpu;
>>> +    return 1;
>>>    }
>>>    
>>>    CPUEntry *sclp_get_cpu_entries(void)
Pierre Morel April 25, 2023, 11:45 a.m. UTC | #4
On 4/25/23 13:33, Janosch Frank wrote:
> On 4/25/23 12:53, Pierre Morel wrote:
>>
>> On 4/25/23 10:26, Claudio Imbrenda wrote:
>>> On Mon, 24 Apr 2023 19:42:18 +0200
>>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>>
>
> How is this considered to be a fix and not a workaround?
>
>
> Set the variable response bit in the control mask and vary the length 
> based on stfle 140. See __init sclp_early_read_info() in 
> drivers/s390/char/sclp_early_core.c


Yes, it is something to do anyway.

Still, in case of error, we will need this fix or workaround.


>
>
>>>
>>>> Fixes: 52076a63d569 ("s390x: Consolidate sclp read info")
>>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>>> ---
>>>>    lib/s390x/sclp.c | 5 +++--
>>>>    1 file changed, 3 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/lib/s390x/sclp.c b/lib/s390x/sclp.c
>>>> index acdc8a9..c09360d 100644
>>>> --- a/lib/s390x/sclp.c
>>>> +++ b/lib/s390x/sclp.c
>>>> @@ -119,8 +119,9 @@ void sclp_read_info(void)
>>>>       int sclp_get_cpu_num(void)
>>>>    {
>>>> -    assert(read_info);
>>>> -    return read_info->entries_cpu;
>>>> +    if (read_info)
>>>> +        return read_info->entries_cpu;
>>>> +    return 1;
>>>>    }
>>>>       CPUEntry *sclp_get_cpu_entries(void)
>
Claudio Imbrenda April 25, 2023, 12:16 p.m. UTC | #5
On Tue, 25 Apr 2023 13:45:13 +0200
Pierre Morel <pmorel@linux.ibm.com> wrote:

> On 4/25/23 13:33, Janosch Frank wrote:
> > On 4/25/23 12:53, Pierre Morel wrote:  
> >>
> >> On 4/25/23 10:26, Claudio Imbrenda wrote:  
> >>> On Mon, 24 Apr 2023 19:42:18 +0200
> >>> Pierre Morel <pmorel@linux.ibm.com> wrote:
> >>>  
> >
> > How is this considered to be a fix and not a workaround?
> >
> >
> > Set the variable response bit in the control mask and vary the length 
> > based on stfle 140. See __init sclp_early_read_info() in 
> > drivers/s390/char/sclp_early_core.c  

I agree that the SCLP needs to be fixed

> 
> 
> Yes it is something to do anyway.
> 
> Still in case of error we will need this fix or workaround.

and I agree that we need this fix anyway

therefore the comment should be more generic and just mention the fact
that the test would hang if an abort happens before SCLP Read SCP
Information has completed.

> 
> 
> >
> >  
> >>>  
> >>>> Fixes: 52076a63d569 ("s390x: Consolidate sclp read info")
> >>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> >>>> ---
> >>>>    lib/s390x/sclp.c | 5 +++--
> >>>>    1 file changed, 3 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/lib/s390x/sclp.c b/lib/s390x/sclp.c
> >>>> index acdc8a9..c09360d 100644
> >>>> --- a/lib/s390x/sclp.c
> >>>> +++ b/lib/s390x/sclp.c
> >>>> @@ -119,8 +119,9 @@ void sclp_read_info(void)
> >>>>       int sclp_get_cpu_num(void)
> >>>>    {
> >>>> -    assert(read_info);
> >>>> -    return read_info->entries_cpu;
> >>>> +    if (read_info)
> >>>> +        return read_info->entries_cpu;
> >>>> +    return 1;
> >>>>    }
> >>>>       CPUEntry *sclp_get_cpu_entries(void)  
> >
Pierre Morel April 25, 2023, 1:24 p.m. UTC | #6
On 4/25/23 14:16, Claudio Imbrenda wrote:
> On Tue, 25 Apr 2023 13:45:13 +0200
> Pierre Morel <pmorel@linux.ibm.com> wrote:
>
>> On 4/25/23 13:33, Janosch Frank wrote:
>>> On 4/25/23 12:53, Pierre Morel wrote:
>>>> On 4/25/23 10:26, Claudio Imbrenda wrote:
>>>>> On Mon, 24 Apr 2023 19:42:18 +0200
>>>>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>>>>   
>>> How is this considered to be a fix and not a workaround?
>>>
>>>
>>> Set the variable response bit in the control mask and vary the length
>>> based on stfle 140. See __init sclp_early_read_info() in
>>> drivers/s390/char/sclp_early_core.c
> I agree that the SCLP needs to be fixed
>
>>
>> Yes it is something to do anyway.
>>
>> Still in case of error we will need this fix or workaround.
> and I agree that we need this fix anyway
>
> therefore the comment should be more generic and just mention the fact
> that the test would hang if an abort happens before SCLP Read SCP
> Information has completed.

OK


>
>>
>>>   
>>>>>   
>>>>>> Fixes: 52076a63d569 ("s390x: Consolidate sclp read info")
>>>>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>>>>> ---
>>>>>>     lib/s390x/sclp.c | 5 +++--
>>>>>>     1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/lib/s390x/sclp.c b/lib/s390x/sclp.c
>>>>>> index acdc8a9..c09360d 100644
>>>>>> --- a/lib/s390x/sclp.c
>>>>>> +++ b/lib/s390x/sclp.c
>>>>>> @@ -119,8 +119,9 @@ void sclp_read_info(void)
>>>>>>        int sclp_get_cpu_num(void)
>>>>>>     {
>>>>>> -    assert(read_info);
>>>>>> -    return read_info->entries_cpu;
>>>>>> +    if (read_info)
>>>>>> +        return read_info->entries_cpu;
>>>>>> +    return 1;
>>>>>>     }
>>>>>>        CPUEntry *sclp_get_cpu_entries(void)
>>>
Pierre Morel April 25, 2023, 1:48 p.m. UTC | #7
On 4/25/23 13:33, Janosch Frank wrote:
> On 4/25/23 12:53, Pierre Morel wrote:
>>
>> On 4/25/23 10:26, Claudio Imbrenda wrote:
>>> On Mon, 24 Apr 2023 19:42:18 +0200
>>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>>
>
> How is this considered to be a fix and not a workaround?


For me it was a fix because in the previous version the SCLP call failed
but the abort still worked; this patch does not fix SCLP, it fixes the abort.

Patch

diff --git a/lib/s390x/sclp.c b/lib/s390x/sclp.c
index acdc8a9..c09360d 100644
--- a/lib/s390x/sclp.c
+++ b/lib/s390x/sclp.c
@@ -119,8 +119,9 @@  void sclp_read_info(void)
 
 int sclp_get_cpu_num(void)
 {
-	assert(read_info);
-	return read_info->entries_cpu;
+	if (read_info)
+		return read_info->entries_cpu;
+	return 1;
 }
 
 CPUEntry *sclp_get_cpu_entries(void)
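
The behaviour change in the patch above can be illustrated with a small
standalone model. This is only a sketch: the struct and the _model
suffixed names are hypothetical stand-ins for the real definitions in
lib/s390x/sclp.c, and a NULL pointer is assumed to represent "SCLP Read
SCP Information has not completed".

```c
#include <stddef.h>

/* Hypothetical stand-in for the read info block kept by the library. */
struct read_info_model {
	int entries_cpu;
};

/* NULL models the case where Read SCP Information never succeeded. */
static struct read_info_model *read_info;

/*
 * Patched behaviour: instead of asserting (which would re-enter the
 * abort path during exit), report a single CPU when no info is there.
 */
static int sclp_get_cpu_num_model(void)
{
	if (read_info)
		return read_info->entries_cpu;
	return 1;
}
```

With read_info still NULL the function reports one CPU, so a teardown
path that queries the CPU count can finish instead of asserting again.
Once read_info points at valid data, the reported count comes from
entries_cpu as before.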