Message ID | 20230424174218.64145-2-pmorel@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Series | Fixing infinite loop on SCLP READ SCP INFO error |
On Mon, 24 Apr 2023 19:42:18 +0200
Pierre Morel <pmorel@linux.ibm.com> wrote:

> When we can not read SCP information we can not abort during
> sclp_get_cpu_num() because this function is called during exit
> and calling it will lead to an infinite loop.
>
> The loop is:
> abort() -> exit() -> smp_teardown() -> smp_query_num_cpus() ->
> sclp_get_cpu_num() -> assert() -> abort()
>
> Since smp_setup() is done after sclp_read_info() inside setup() this
> loop happens when only the start processor is running.
> Let sclp_get_cpu_num() return 1 in this case.

looks good to me, but please add a comment to explain that this is only
supposed to happen in exceptional circumstances

>
> Fixes: 52076a63d569 ("s390x: Consolidate sclp read info")
> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> ---
>  lib/s390x/sclp.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/lib/s390x/sclp.c b/lib/s390x/sclp.c
> index acdc8a9..c09360d 100644
> --- a/lib/s390x/sclp.c
> +++ b/lib/s390x/sclp.c
> @@ -119,8 +119,9 @@ void sclp_read_info(void)
>
>  int sclp_get_cpu_num(void)
>  {
> -	assert(read_info);
> -	return read_info->entries_cpu;
> +	if (read_info)
> +		return read_info->entries_cpu;
> +	return 1;
>  }
>
>  CPUEntry *sclp_get_cpu_entries(void)
On 4/25/23 10:26, Claudio Imbrenda wrote:
> On Mon, 24 Apr 2023 19:42:18 +0200
> Pierre Morel <pmorel@linux.ibm.com> wrote:
>
>> When we can not read SCP information we can not abort during
>> sclp_get_cpu_num() because this function is called during exit
>> and calling it will lead to an infinite loop.
>>
>> The loop is:
>> abort() -> exit() -> smp_teardown() -> smp_query_num_cpus() ->
>> sclp_get_cpu_num() -> assert() -> abort()
>>
>> Since smp_setup() is done after sclp_read_info() inside setup() this
>> loop happens when only the start processor is running.
>> Let sclp_get_cpu_num() return 1 in this case.
> looks good to me, but please add a comment to explain that this is only
> supposed to happen in exceptional circumstances

Is this OK like this:

"
Read SCP information can fail if the SCLP buffer length is too small for
the information to return, which happens for example when defining 248 CPUs.
When SCLP read SCP information has failed during setup, we currently cannot
abort, because the function sclp_get_cpu_num(), called during exit, asserts
on the previous success of SCLP read SCP information.

The loop is:
abort() -> exit() -> smp_teardown() -> smp_query_num_cpus() ->
sclp_get_cpu_num() -> assert() -> abort()

Since smp_setup() is done after sclp_read_info() inside setup(), this
loop happens when only the start processor is running.
Since only one processor is running and we know it, we do not need to
make the SCLP call in sclp_get_cpu_num() and can safely return 1.
"

[...]
On 4/25/23 12:53, Pierre Morel wrote:
>
> On 4/25/23 10:26, Claudio Imbrenda wrote:
>> On Mon, 24 Apr 2023 19:42:18 +0200
>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>

How is this considered to be a fix and not a workaround?

Set the variable response bit in the control mask and vary the length
based on stfle 140. See __init sclp_early_read_info() in
drivers/s390/char/sclp_early_core.c

[...]
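To make the suggestion above concrete, here is a rough, self-contained sketch of what "set the variable response bit and vary the length based on stfle 140" could look like. Everything in it is an assumption for illustration: the reduced header struct, the `SKETCH_*` constants (in particular the extended size and the bit position in the control mask) and the `prepare_read_info_sketch()` helper are hypothetical and are not taken from this patch. The authoritative values are in sclp_early_read_info() in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All values below are assumptions for this sketch, not verified constants. */
#define SKETCH_PAGE_SIZE         4096u
#define SKETCH_EXT_SCCB_SIZE     (3u * SKETCH_PAGE_SIZE) /* assumed extended length */
#define SKETCH_VARIABLE_RESPONSE 0x80u                   /* assumed bit in control_mask[2] */

/* Hypothetical, reduced SCCB header: only the fields this sketch touches. */
struct sccb_header_sketch {
	uint16_t length;
	uint8_t function_code;
	uint8_t control_mask[3];
	uint16_t response_code;
};

/*
 * Prepare a READ SCP INFO request: ask for a variable-length response and
 * use the larger buffer only when facility 140 is reported as available.
 */
static void prepare_read_info_sketch(struct sccb_header_sketch *h,
				     bool has_facility_140)
{
	h->length = has_facility_140 ? SKETCH_EXT_SCCB_SIZE : SKETCH_PAGE_SIZE;
	h->control_mask[2] |= SKETCH_VARIABLE_RESPONSE;
}
```

The caller would pass in the result of its facility test for bit 140. Even with this in place, the read can still fail for other reasons, so the exit path must cope with missing read info.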
On 4/25/23 13:33, Janosch Frank wrote:
> On 4/25/23 12:53, Pierre Morel wrote:
>>
>> On 4/25/23 10:26, Claudio Imbrenda wrote:
>>> On Mon, 24 Apr 2023 19:42:18 +0200
>>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>>
>
> How is this considered to be a fix and not a workaround?
>
>
> Set the variable response bit in the control mask and vary the length
> based on stfle 140. See __init sclp_early_read_info() in
> drivers/s390/char/sclp_early_core.c

Yes it is something to do anyway.

Still in case of error we will need this fix or workaround.

[...]
On Tue, 25 Apr 2023 13:45:13 +0200
Pierre Morel <pmorel@linux.ibm.com> wrote:

> On 4/25/23 13:33, Janosch Frank wrote:
> > On 4/25/23 12:53, Pierre Morel wrote:
> >>
> >> On 4/25/23 10:26, Claudio Imbrenda wrote:
> >>> On Mon, 24 Apr 2023 19:42:18 +0200
> >>> Pierre Morel <pmorel@linux.ibm.com> wrote:
> >>>
> >
> > How is this considered to be a fix and not a workaround?
> >
> >
> > Set the variable response bit in the control mask and vary the length
> > based on stfle 140. See __init sclp_early_read_info() in
> > drivers/s390/char/sclp_early_core.c

I agree that the SCLP needs to be fixed

>
>
> Yes it is something to do anyway.
>
> Still in case of error we will need this fix or workaround.

and I agree that we need this fix anyway

therefore the comment should be more generic and just mention the fact
that the test would hang if an abort happens before SCLP Read SCP
Information has completed.

[...]
On 4/25/23 14:16, Claudio Imbrenda wrote:
> On Tue, 25 Apr 2023 13:45:13 +0200
> Pierre Morel <pmorel@linux.ibm.com> wrote:
>
>> On 4/25/23 13:33, Janosch Frank wrote:
>>> On 4/25/23 12:53, Pierre Morel wrote:
>>>> On 4/25/23 10:26, Claudio Imbrenda wrote:
>>>>> On Mon, 24 Apr 2023 19:42:18 +0200
>>>>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>>>>
>>> How is this considered to be a fix and not a workaround?
>>>
>>>
>>> Set the variable response bit in the control mask and vary the length
>>> based on stfle 140. See __init sclp_early_read_info() in
>>> drivers/s390/char/sclp_early_core.c
> I agree that the SCLP needs to be fixed
>
>>
>> Yes it is something to do anyway.
>>
>> Still in case of error we will need this fix or workaround.
> and I agree that we need this fix anyway
>
> therefore the comment should be more generic and just mention the fact
> that the test would hang if an abort happens before SCLP Read SCP
> Information has completed.

OK

[...]
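For reference, one possible shape of sclp_get_cpu_num() with the more generic comment agreed on above. This is only a sketch: the comment wording is a guess at what was asked for, not the text of a later revision, and `struct read_info_sketch` is a hypothetical stand-in that carries only the one field used here so that the snippet compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the library's read-info block (one field only). */
struct read_info_sketch {
	int entries_cpu;
};

/* Stays NULL until SCLP Read SCP Information has completed successfully. */
static struct read_info_sketch *read_info;

int sclp_get_cpu_num(void)
{
	if (read_info)
		return read_info->entries_cpu;
	/*
	 * Only reached if an abort happens before SCLP Read SCP Information
	 * has completed. This function is called on the exit path, so an
	 * assert here would make the test hang. Only the boot CPU is running
	 * at that point, so report a single CPU.
	 */
	return 1;
}
```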
On 4/25/23 13:33, Janosch Frank wrote:
> On 4/25/23 12:53, Pierre Morel wrote:
>>
>> On 4/25/23 10:26, Claudio Imbrenda wrote:
>>> On Mon, 24 Apr 2023 19:42:18 +0200
>>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>>
>
> How is this considered to be a fix and not a workaround?

For me it was a fix because in the previous version the SCLP call failed
but the abort did work. This patch does not fix the SCLP call, it fixes
the abort.
diff --git a/lib/s390x/sclp.c b/lib/s390x/sclp.c
index acdc8a9..c09360d 100644
--- a/lib/s390x/sclp.c
+++ b/lib/s390x/sclp.c
@@ -119,8 +119,9 @@ void sclp_read_info(void)
 
 int sclp_get_cpu_num(void)
 {
-	assert(read_info);
-	return read_info->entries_cpu;
+	if (read_info)
+		return read_info->entries_cpu;
+	return 1;
 }
 
 CPUEntry *sclp_get_cpu_entries(void)
When we cannot read SCP information we cannot abort during
sclp_get_cpu_num(), because this function is called during exit
and calling it will lead to an infinite loop.

The loop is:
abort() -> exit() -> smp_teardown() -> smp_query_num_cpus() ->
sclp_get_cpu_num() -> assert() -> abort()

Since smp_setup() is done after sclp_read_info() inside setup(), this
loop happens when only the start processor is running.
Let sclp_get_cpu_num() return 1 in this case.

Fixes: 52076a63d569 ("s390x: Consolidate sclp read info")
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
---
 lib/s390x/sclp.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
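
The call chain in the commit message can be modelled in a few lines of plain C. This is a toy illustration only: all `*_sketch` names, the two flags and the recursion cap are hypothetical and exist solely so that the unpatched case terminates here. In the real library nothing stops the cycle, which is the reported hang.

```c
#include <assert.h>
#include <stdbool.h>

static int abort_calls;     /* how often the abort path was entered */
static bool info_available; /* false models a failed READ SCP INFO */
static bool use_guard;      /* true models the patched sclp_get_cpu_num() */

static void abort_sketch(void);

static int get_cpu_num_sketch(void)
{
	if (info_available)
		return 4;   /* arbitrary CPU count for the illustration */
	if (use_guard)
		return 1;   /* patched behaviour: fall back to the boot CPU */
	abort_sketch();     /* old behaviour: the failed assert aborts again */
	return 0;
}

/* exit() -> smp_teardown() -> smp_query_num_cpus(), collapsed into one step */
static void exit_sketch(void)
{
	(void)get_cpu_num_sketch();
}

static void abort_sketch(void)
{
	if (++abort_calls >= 3) /* cap added so the sketch itself terminates */
		return;
	exit_sketch();
}
```

With `use_guard` cleared, one call to abort_sketch() re-enters itself until the cap is reached. With `use_guard` set, the abort path is entered exactly once and exit completes.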