Message ID | 20220330150551.2573938-12-cristian.marussi@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | SCMIv3.1 Miscellaneous changes | expand |
On 3/30/2022 5:05 PM, Cristian Marussi wrote: > Using the common protocol helper implementation add support for all new > SCMIv3.1 extended names commands related to all protocols with the > exception of SENSOR_AXIS_GET_NAME. > > Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> This causes the following splat on a platform where regulators fail to initialize: [ 0.603737] ------------[ cut here ]------------ [ 0.603752] WARNING: CPU: 1 PID: 1 at mm/page_alloc.c:5402 __alloc_pages+0x6c/0x184 [ 0.603797] Modules linked in: [ 0.603809] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc1-g44dbdf3bb3f4 #42 [ 0.603818] Hardware name: BCX972160SV (DT) [ 0.603825] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 0.603834] pc : __alloc_pages+0x6c/0x184 [ 0.603841] lr : kmalloc_order+0x40/0x88 [ 0.603851] sp : ffffffc00a40b850 [ 0.603856] x29: ffffffc00a40b850 x28: 0000000000000000 x27: ffffffc008d60404 [ 0.603867] x26: ffffff80c1e3e1a8 x25: ffffffc00877bd78 x24: 0000000000000058 [ 0.603878] x23: ffffffc0081921a8 x22: ffffffc008cb04b0 x21: 0000000000000000 [ 0.603889] x20: 000000000000000b x19: 000000000000000b x18: 0000000000000000 [ 0.603900] x17: 0000000000000001 x16: 0000000100000000 x15: 000000000000000a [ 0.603911] x14: 0000000000000000 x13: ffffff80c1e3c20a x12: ffffffffffffffff [ 0.603922] x11: 0000000000000020 x10: 0000000000000880 x9 : ffffffc008159dac [ 0.603932] x8 : ffffff80c02708e0 x7 : 0000000000000004 x6 : 000000000041a880 [ 0.603943] x5 : 0000000000000001 x4 : ffffff8000000000 x3 : 0000000000000000 [ 0.603954] x2 : 0000000000000000 x1 : 0000000000000001 x0 : ffffffc00a32d3f2 [ 0.603965] Call trace: [ 0.603970] __alloc_pages+0x6c/0x184 [ 0.603977] kmalloc_order+0x40/0x88 [ 0.603984] kmalloc_order_trace+0x30/0xd0 [ 0.603992] __kmalloc_track_caller+0x64/0x19c [ 0.603999] devm_kmalloc+0x5c/0xe0 [ 0.604009] scmi_voltage_protocol_init+0x14c/0x2f4 [ 0.604020] scmi_get_protocol_instance+0x128/0x1f4 [ 0.604030] scmi_devm_protocol_get+0x64/0xc8 [ 0.604037] scmi_regulator_probe+0x5c/0x42c [ 0.604049] scmi_dev_probe+0x28/0x38 [ 0.604056] really_probe+0x1b8/0x380 [ 0.604065] __driver_probe_device+0x14c/0x164 [ 0.604073] driver_probe_device+0x48/0xe0 [ 0.604080] __driver_attach+0x160/0x170 [ 0.604087] bus_for_each_dev+0x78/0xb8 [ 0.604095] driver_attach+0x28/0x30 [ 0.604101] bus_add_driver+0xf4/0x208 [ 0.604108] driver_register+0xb4/0xf0 [ 0.604116] scmi_driver_register+0x5c/0xa4 [ 0.604123] scmi_drv_init+0x28/0x30 [ 0.604132] do_one_initcall+0x80/0x1a4 [ 0.604141] kernel_init_freeable+0x220/0x23c [ 0.604149] kernel_init+0x28/0x128 [ 0.604158] ret_from_fork+0x10/0x20 [ 0.604166] ---[ end trace 0000000000000000 ]--- [ 0.604194] scmi-regulator: probe of scmi_dev.2 failed with error -12 [ 0.604792] arm-scmi brcm_scmi@0: Failed. SCMI protocol 22 not active.
On Wed, Jun 15, 2022 at 05:45:11AM +0200, Florian Fainelli wrote: > > > On 3/30/2022 5:05 PM, Cristian Marussi wrote: > > Using the common protocol helper implementation add support for all new > > SCMIv3.1 extended names commands related to all protocols with the > > exception of SENSOR_AXIS_GET_NAME. > > > > Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> > > This causes the following splat on a platform where regulators fail to > initialize: > Hi Florian, thanks for the report. It seems a memory error while allocating so it was not meant to be solved by the fixes, anyway, I've never seen this splat in my testing and at first sight I cannot see anything wrong in the devm_k* calls inside scmi_voltage_protocol_init...is there any particular config in your setup ? Moreover, the WARNING line 5402 seems to match v5.19-rc1 and it has slightly changed with -rc-1, so I'll try rebasing on that at first and see if I can reproduce the issue locally. Thanks, Cristian > [ 0.603737] ------------[ cut here ]------------ > [ 0.603752] WARNING: CPU: 1 PID: 1 at mm/page_alloc.c:5402 > __alloc_pages+0x6c/0x184 > [ 0.603797] Modules linked in: > [ 0.603809] CPU: 1 PID: 1 Comm: swapper/0 Not tainted > 5.19.0-rc1-g44dbdf3bb3f4 #42 > [ 0.603818] Hardware name: BCX972160SV (DT) > [ 0.603825] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS > BTYPE=--) > [ 0.603834] pc : __alloc_pages+0x6c/0x184 > [ 0.603841] lr : kmalloc_order+0x40/0x88 > [ 0.603851] sp : ffffffc00a40b850 > [ 0.603856] x29: ffffffc00a40b850 x28: 0000000000000000 x27: > ffffffc008d60404 > [ 0.603867] x26: ffffff80c1e3e1a8 x25: ffffffc00877bd78 x24: > 0000000000000058 > [ 0.603878] x23: ffffffc0081921a8 x22: ffffffc008cb04b0 x21: > 0000000000000000 > [ 0.603889] x20: 000000000000000b x19: 000000000000000b x18: > 0000000000000000 > [ 0.603900] x17: 0000000000000001 x16: 0000000100000000 x15: > 000000000000000a > [ 0.603911] x14: 0000000000000000 x13: ffffff80c1e3c20a x12: > ffffffffffffffff > [ 0.603922] x11: 0000000000000020 x10: 0000000000000880 x9 : > ffffffc008159dac > [ 0.603932] x8 : ffffff80c02708e0 x7 : 0000000000000004 x6 : > 000000000041a880 > [ 0.603943] x5 : 0000000000000001 x4 : ffffff8000000000 x3 : > 0000000000000000 > [ 0.603954] x2 : 0000000000000000 x1 : 0000000000000001 x0 : > ffffffc00a32d3f2 > [ 0.603965] Call trace: > [ 0.603970] __alloc_pages+0x6c/0x184 > [ 0.603977] kmalloc_order+0x40/0x88 > [ 0.603984] kmalloc_order_trace+0x30/0xd0 > [ 0.603992] __kmalloc_track_caller+0x64/0x19c > [ 0.603999] devm_kmalloc+0x5c/0xe0 > [ 0.604009] scmi_voltage_protocol_init+0x14c/0x2f4 > [ 0.604020] scmi_get_protocol_instance+0x128/0x1f4 > [ 0.604030] scmi_devm_protocol_get+0x64/0xc8 > [ 0.604037] scmi_regulator_probe+0x5c/0x42c > [ 0.604049] scmi_dev_probe+0x28/0x38 > [ 0.604056] really_probe+0x1b8/0x380 > [ 0.604065] __driver_probe_device+0x14c/0x164 > [ 0.604073] driver_probe_device+0x48/0xe0 > [ 0.604080] __driver_attach+0x160/0x170 > [ 0.604087] bus_for_each_dev+0x78/0xb8 > [ 0.604095] driver_attach+0x28/0x30 > [ 0.604101] bus_add_driver+0xf4/0x208 > [ 0.604108] driver_register+0xb4/0xf0 > [ 0.604116] scmi_driver_register+0x5c/0xa4 > [ 0.604123] scmi_drv_init+0x28/0x30 > [ 0.604132] do_one_initcall+0x80/0x1a4 > [ 0.604141] kernel_init_freeable+0x220/0x23c > [ 0.604149] kernel_init+0x28/0x128 > [ 0.604158] ret_from_fork+0x10/0x20 > [ 0.604166] ---[ end trace 0000000000000000 ]--- > [ 0.604194] scmi-regulator: probe of scmi_dev.2 failed with error -12 > [ 0.604792] arm-scmi brcm_scmi@0: Failed. SCMI protocol 22 not active. > -- > Florian
On Wed, Jun 15, 2022 at 09:18:03AM +0100, Cristian Marussi wrote: > On Wed, Jun 15, 2022 at 05:45:11AM +0200, Florian Fainelli wrote: > > > > > > On 3/30/2022 5:05 PM, Cristian Marussi wrote: > > > Using the common protocol helper implementation add support for all new > > > SCMIv3.1 extended names commands related to all protocols with the > > > exception of SENSOR_AXIS_GET_NAME. > > > > > > Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> > > > > This causes the following splat on a platform where regulators fail to > > initialize: > > > > Hi Florian, > > thanks for the report. > > It seems a memory error while allocating so it was not meant to be > solved by the fixes, anyway, I've never seen this splat in my testing > and at first sight I cannot see anything wrong in the devm_k* calls > inside scmi_voltage_protocol_init...is there any particular config in > your setup ? > > Moreover, the WARNING line 5402 seems to match v5.19-rc1 and it has > slightly changed with -rc-1, so I'll try rebasing on that at first and > see if I can reproduce the issue locally. > I just re-tested the series rebased on v519-rc1 plus fixes and I cannot reproduce in my setup with a few (~9) good and bad voltage domains. How many voltage domains are advertised by the platform in your setup ? Thanks, Cristian > > > [ 0.603737] ------------[ cut here ]------------ > > [ 0.603752] WARNING: CPU: 1 PID: 1 at mm/page_alloc.c:5402 > > __alloc_pages+0x6c/0x184 > > [ 0.603797] Modules linked in: > > [ 0.603809] CPU: 1 PID: 1 Comm: swapper/0 Not tainted > > 5.19.0-rc1-g44dbdf3bb3f4 #42 > > [ 0.603818] Hardware name: BCX972160SV (DT) > > [ 0.603825] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS > > BTYPE=--) > > [ 0.603834] pc : __alloc_pages+0x6c/0x184 > > [ 0.603841] lr : kmalloc_order+0x40/0x88 > > [ 0.603851] sp : ffffffc00a40b850 > > [ 0.603856] x29: ffffffc00a40b850 x28: 0000000000000000 x27: > > ffffffc008d60404 > > [ 0.603867] x26: ffffff80c1e3e1a8 x25: ffffffc00877bd78 x24: > > 0000000000000058 > > [ 0.603878] x23: ffffffc0081921a8 x22: ffffffc008cb04b0 x21: > > 0000000000000000 > > [ 0.603889] x20: 000000000000000b x19: 000000000000000b x18: > > 0000000000000000 > > [ 0.603900] x17: 0000000000000001 x16: 0000000100000000 x15: > > 000000000000000a > > [ 0.603911] x14: 0000000000000000 x13: ffffff80c1e3c20a x12: > > ffffffffffffffff > > [ 0.603922] x11: 0000000000000020 x10: 0000000000000880 x9 : > > ffffffc008159dac > > [ 0.603932] x8 : ffffff80c02708e0 x7 : 0000000000000004 x6 : > > 000000000041a880 > > [ 0.603943] x5 : 0000000000000001 x4 : ffffff8000000000 x3 : > > 0000000000000000 > > [ 0.603954] x2 : 0000000000000000 x1 : 0000000000000001 x0 : > > ffffffc00a32d3f2 > > [ 0.603965] Call trace: > > [ 0.603970] __alloc_pages+0x6c/0x184 > > [ 0.603977] kmalloc_order+0x40/0x88 > > [ 0.603984] kmalloc_order_trace+0x30/0xd0 > > [ 0.603992] __kmalloc_track_caller+0x64/0x19c > > [ 0.603999] devm_kmalloc+0x5c/0xe0 > > [ 0.604009] scmi_voltage_protocol_init+0x14c/0x2f4 > > [ 0.604020] scmi_get_protocol_instance+0x128/0x1f4 > > [ 0.604030] scmi_devm_protocol_get+0x64/0xc8 > > [ 0.604037] scmi_regulator_probe+0x5c/0x42c > > [ 0.604049] scmi_dev_probe+0x28/0x38 > > [ 0.604056] really_probe+0x1b8/0x380 > > [ 0.604065] __driver_probe_device+0x14c/0x164 > > [ 0.604073] driver_probe_device+0x48/0xe0 > > [ 0.604080] __driver_attach+0x160/0x170 > > [ 0.604087] bus_for_each_dev+0x78/0xb8 > > [ 0.604095] driver_attach+0x28/0x30 > > [ 0.604101] bus_add_driver+0xf4/0x208 > > [ 0.604108] driver_register+0xb4/0xf0 > > [ 0.604116] scmi_driver_register+0x5c/0xa4 > > [ 0.604123] scmi_drv_init+0x28/0x30 > > [ 0.604132] do_one_initcall+0x80/0x1a4 > > [ 0.604141] kernel_init_freeable+0x220/0x23c > > [ 0.604149] kernel_init+0x28/0x128 > > [ 0.604158] ret_from_fork+0x10/0x20 > > [ 0.604166] ---[ end trace 0000000000000000 ]--- > > [ 0.604194] scmi-regulator: probe of scmi_dev.2 failed with error -12 > > [ 0.604792] arm-scmi brcm_scmi@0: Failed. SCMI protocol 22 not active. > > -- > > Florian
On 6/15/22 02:40, Cristian Marussi wrote: > On Wed, Jun 15, 2022 at 09:18:03AM +0100, Cristian Marussi wrote: >> On Wed, Jun 15, 2022 at 05:45:11AM +0200, Florian Fainelli wrote: >>> >>> >>> On 3/30/2022 5:05 PM, Cristian Marussi wrote: >>>> Using the common protocol helper implementation add support for all new >>>> SCMIv3.1 extended names commands related to all protocols with the >>>> exception of SENSOR_AXIS_GET_NAME. >>>> >>>> Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> >>> >>> This causes the following splat on a platform where regulators fail to >>> initialize: >>> >> >> Hi Florian, >> >> thanks for the report. >> >> It seems a memory error while allocating so it was not meant to be >> solved by the fixes, anyway, I've never seen this splat in my testing >> and at first sight I cannot see anything wrong in the devm_k* calls >> inside scmi_voltage_protocol_init...is there any particular config in >> your setup ? >> >> Moreover, the WARNING line 5402 seems to match v5.19-rc1 and it has >> slightly changed with -rc-1, so I'll try rebasing on that at first and >> see if I can reproduce the issue locally. >> > > I just re-tested the series rebased on v519-rc1 plus fixes and I cannot > reproduce in my setup with a few (~9) good and bad voltage domains. > > How many voltage domains are advertised by the platform in your setup ? There are 11 voltage regulators on this platform, and of course, now that I am trying to reproduce the splat I reported I just cannot anymore... I will let you know if there is anything that needs to be done. Thanks for being responsive as usual!
On Wed, Jun 15, 2022 at 09:10:03AM -0700, Florian Fainelli wrote: > On 6/15/22 02:40, Cristian Marussi wrote: > > On Wed, Jun 15, 2022 at 09:18:03AM +0100, Cristian Marussi wrote: > > > On Wed, Jun 15, 2022 at 05:45:11AM +0200, Florian Fainelli wrote: > > > > > > > > > > > > On 3/30/2022 5:05 PM, Cristian Marussi wrote: > > > > > Using the common protocol helper implementation add support for all new > > > > > SCMIv3.1 extended names commands related to all protocols with the > > > > > exception of SENSOR_AXIS_GET_NAME. > > > > > > > > > > Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> > > > > > > > > This causes the following splat on a platform where regulators fail to > > > > initialize: > > > > > > > > > > Hi Florian, > > > > > > thanks for the report. > > > > > > It seems a memory error while allocating so it was not meant to be > > > solved by the fixes, anyway, I've never seen this splat in my testing > > > and at first sight I cannot see anything wrong in the devm_k* calls > > > inside scmi_voltage_protocol_init...is there any particular config in > > > your setup ? > > > > > > Moreover, the WARNING line 5402 seems to match v5.19-rc1 and it has > > > slightly changed with -rc-1, so I'll try rebasing on that at first and > > > see if I can reproduce the issue locally. > > > > > > > I just re-tested the series rebased on v519-rc1 plus fixes and I cannot > > reproduce in my setup with a few (~9) good and bad voltage domains. > > > > How many voltage domains are advertised by the platform in your setup ? > > There are 11 voltage regulators on this platform, and of course, now that I > am trying to reproduce the splat I reported I just cannot anymore... I will > let you know if there is anything that needs to be done. Thanks for being > responsive as usual! ... you're welcome... I'm trying to figure out where an abnormal mem request could happen... can you try adding this (for brutal debugging) when you try ? (...just to rule out funny fw replies.... :D) Thanks, Cristian --->8---- diff --git a/drivers/firmware/arm_scmi/voltage.c b/drivers/firmware/arm_scmi/voltage.c index 895741b66f5a..fd841292df5c 100644 --- a/drivers/firmware/arm_scmi/voltage.c +++ b/drivers/firmware/arm_scmi/voltage.c @@ -108,6 +108,9 @@ static int scmi_init_voltage_levels(struct device *dev, return -EINVAL; } + dev_info(dev, "num_returned:%d num_remaining:%d\n", + num_returned, num_remaining); + v->levels_uv = devm_kcalloc(dev, num_levels, sizeof(u32), GFP_KERNEL); if (!v->levels_uv) return -ENOMEM; > -- > Florian
On 6/15/22 09:29, Cristian Marussi wrote: > On Wed, Jun 15, 2022 at 09:10:03AM -0700, Florian Fainelli wrote: >> On 6/15/22 02:40, Cristian Marussi wrote: >>> On Wed, Jun 15, 2022 at 09:18:03AM +0100, Cristian Marussi wrote: >>>> On Wed, Jun 15, 2022 at 05:45:11AM +0200, Florian Fainelli wrote: >>>>> >>>>> >>>>> On 3/30/2022 5:05 PM, Cristian Marussi wrote: >>>>>> Using the common protocol helper implementation add support for all new >>>>>> SCMIv3.1 extended names commands related to all protocols with the >>>>>> exception of SENSOR_AXIS_GET_NAME. >>>>>> >>>>>> Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> >>>>> >>>>> This causes the following splat on a platform where regulators fail to >>>>> initialize: >>>>> >>>> >>>> Hi Florian, >>>> >>>> thanks for the report. >>>> >>>> It seems a memory error while allocating so it was not meant to be >>>> solved by the fixes, anyway, I've never seen this splat in my testing >>>> and at first sight I cannot see anything wrong in the devm_k* calls >>>> inside scmi_voltage_protocol_init...is there any particular config in >>>> your setup ? >>>> >>>> Moreover, the WARNING line 5402 seems to match v5.19-rc1 and it has >>>> slightly changed with -rc-1, so I'll try rebasing on that at first and >>>> see if I can reproduce the issue locally. >>>> >>> >>> I just re-tested the series rebased on v519-rc1 plus fixes and I cannot >>> reproduce in my setup with a few (~9) good and bad voltage domains. >>> >>> How many voltage domains are advertised by the platform in your setup ? >> >> There are 11 voltage regulators on this platform, and of course, now that I >> am trying to reproduce the splat I reported I just cannot anymore... I will >> let you know if there is anything that needs to be done. Thanks for being >> responsive as usual! > > ... you're welcome... > > I'm trying to figure out where an abnormal mem request could happen... I think the problem is/was with the number of voltage domains being reported which was way too big and passed directly, without any clamping to devm_kcalloc() resulting the splat indicating that the allocation was beyond the MAX_ORDER. The specification allows for up to 2^16 domains which would still be way too much to allocate using kmalloc() so we could/should consider vmalloc() here eventually? In all likelihood though we probably won't find a system with 65k voltage domains. > > can you try adding this (for brutal debugging) when you try ? > (...just to rule out funny fw replies.... :D) Sure, nothing weird coming out and it succeeded in enumerating all of the regulators, I smell a transient issue with our firmware implementation, maybe... [ 0.560544] arm-scmi brcm_scmi@0: num_returned:1 num_remaining:0 [ 0.560617] arm-scmi brcm_scmi@0: num_returned:1 num_remaining:0 [ 0.560673] arm-scmi brcm_scmi@0: num_returned:1 num_remaining:0 [ 0.560730] arm-scmi brcm_scmi@0: num_returned:1 num_remaining:0 [ 0.560881] arm-scmi brcm_scmi@0: num_returned:1 num_remaining:0 [ 0.560940] arm-scmi brcm_scmi@0: num_returned:1 num_remaining:0 [ 0.560996] arm-scmi brcm_scmi@0: num_returned:1 num_remaining:0 [ 0.561054] arm-scmi brcm_scmi@0: num_returned:1 num_remaining:0 [ 0.561110] arm-scmi brcm_scmi@0: num_returned:1 num_remaining:0 [ 0.561168] arm-scmi brcm_scmi@0: num_returned:1 num_remaining:0 [ 0.561225] arm-scmi brcm_scmi@0: num_returned:1 num_remaining:0 [ 0.561652] scmi-regulator scmi_dev.2: Regulator stb_vreg_2 registered for domain [2] [ 0.561858] scmi-regulator scmi_dev.2: Regulator stb_vreg_3 registered for domain [3] [ 0.562030] scmi-regulator scmi_dev.2: Regulator stb_vreg_4 registered for domain [4] [ 0.562190] scmi-regulator scmi_dev.2: Regulator stb_vreg_5 registered for domain [5] [ 0.564427] scmi-regulator scmi_dev.2: Regulator stb_vreg_6 registered for domain [6] [ 0.564638] scmi-regulator scmi_dev.2: Regulator stb_vreg_7 registered for domain [7] [ 0.564817] scmi-regulator scmi_dev.2: Regulator stb_vreg_8 registered for domain [8] [ 0.565030] scmi-regulator scmi_dev.2: Regulator stb_vreg_9 registered for domain [9] [ 0.565191] scmi-regulator scmi_dev.2: Regulator stb_vreg_10 registered for domain [10]
On Wed, Jun 15, 2022 at 10:19:53AM -0700, Florian Fainelli wrote: > On 6/15/22 09:29, Cristian Marussi wrote: > > On Wed, Jun 15, 2022 at 09:10:03AM -0700, Florian Fainelli wrote: > > > On 6/15/22 02:40, Cristian Marussi wrote: > > > > On Wed, Jun 15, 2022 at 09:18:03AM +0100, Cristian Marussi wrote: > > > > > On Wed, Jun 15, 2022 at 05:45:11AM +0200, Florian Fainelli wrote: > > > > > > > > > > > > > > > > > > On 3/30/2022 5:05 PM, Cristian Marussi wrote: > > > > > > > Using the common protocol helper implementation add support for all new > > > > > > > SCMIv3.1 extended names commands related to all protocols with the > > > > > > > exception of SENSOR_AXIS_GET_NAME. > > > > > > > > > > > > > > Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> > > > > > > > > > > > > This causes the following splat on a platform where regulators fail to > > > > > > initialize: > > > > > > > > > > > > > > > > Hi Florian, > > > > > > > > > > thanks for the report. > > > > > > > > > > It seems a memory error while allocating so it was not meant to be > > > > > solved by the fixes, anyway, I've never seen this splat in my testing > > > > > and at first sight I cannot see anything wrong in the devm_k* calls > > > > > inside scmi_voltage_protocol_init...is there any particular config in > > > > > your setup ? > > > > > > > > > > Moreover, the WARNING line 5402 seems to match v5.19-rc1 and it has > > > > > slightly changed with -rc-1, so I'll try rebasing on that at first and > > > > > see if I can reproduce the issue locally. > > > > > > > > > > > > > I just re-tested the series rebased on v519-rc1 plus fixes and I cannot > > > > reproduce in my setup with a few (~9) good and bad voltage domains. > > > > > > > > How many voltage domains are advertised by the platform in your setup ? > > > > > > There are 11 voltage regulators on this platform, and of course, now that I > > > am trying to reproduce the splat I reported I just cannot anymore... I will > > > let you know if there is anything that needs to be done. Thanks for being > > > responsive as usual! > > > > ... you're welcome... > > > > I'm trying to figure out where an abnormal mem request could happen... > > I think the problem is/was with the number of voltage domains being reported > which was way too big and passed directly, without any clamping to > devm_kcalloc() resulting the splat indicating that the allocation was beyond > the MAX_ORDER. The specification allows for up to 2^16 domains which would > still be way too much to allocate using kmalloc() so we could/should > consider vmalloc() here eventually? > Yes I was suspicious about the same and I was starting to think about vmalloc in general across all domain enumerations even if this is may not be the case for your splat... > In all likelihood though we probably won't find a system with 65k voltage > domains. > > > > > can you try adding this (for brutal debugging) when you try ? > > (...just to rule out funny fw replies.... :D) > > Sure, nothing weird coming out and it succeeded in enumerating all of the > regulators, I smell a transient issue with our firmware implementation, > maybe... > > [ 0.560544] arm-scmi brcm_scmi@0: num_returned:1 num_remaining:0 > [ 0.560617] arm-scmi brcm_scmi@0: num_returned:1 num_remaining:0 > [ 0.560673] arm-scmi brcm_scmi@0: num_returned:1 num_remaining:0 > [ 0.560730] arm-scmi brcm_scmi@0: num_returned:1 num_remaining:0 > [ 0.560881] arm-scmi brcm_scmi@0: num_returned:1 num_remaining:0 > [ 0.560940] arm-scmi brcm_scmi@0: num_returned:1 num_remaining:0 > [ 0.560996] arm-scmi brcm_scmi@0: num_returned:1 num_remaining:0 > [ 0.561054] arm-scmi brcm_scmi@0: num_returned:1 num_remaining:0 > [ 0.561110] arm-scmi brcm_scmi@0: num_returned:1 num_remaining:0 > [ 0.561168] arm-scmi brcm_scmi@0: num_returned:1 num_remaining:0 > [ 0.561225] arm-scmi brcm_scmi@0: num_returned:1 num_remaining:0 > [ 0.561652] scmi-regulator scmi_dev.2: Regulator stb_vreg_2 registered > for domain [2] > [ 0.561858] scmi-regulator scmi_dev.2: Regulator stb_vreg_3 registered > for domain [3] > [ 0.562030] scmi-regulator scmi_dev.2: Regulator stb_vreg_4 registered > for domain [4] > [ 0.562190] scmi-regulator scmi_dev.2: Regulator stb_vreg_5 registered > for domain [5] > [ 0.564427] scmi-regulator scmi_dev.2: Regulator stb_vreg_6 registered > for domain [6] > [ 0.564638] scmi-regulator scmi_dev.2: Regulator stb_vreg_7 registered > for domain [7] > [ 0.564817] scmi-regulator scmi_dev.2: Regulator stb_vreg_8 registered > for domain [8] > [ 0.565030] scmi-regulator scmi_dev.2: Regulator stb_vreg_9 registered > for domain [9] > [ 0.565191] scmi-regulator scmi_dev.2: Regulator stb_vreg_10 registered > for domain [10] > Ok, this was another place where a reply carrying a consistent number of non-segmented entries could trigger an jumbo-request to kmalloc... ..maybe this is where a transient fw issue can reply with a dramatic numbers of possible (non-segmented) levels (num_returned+num_remaining) (I filter out replies carrying descriptors for segmented voltage-levels that does not come in triplets (returned:3 remaining:0) since they are out of spec...but I just hit today on another fw sending such out of spec reply for clocks and I will relax such req probably not to break too many out-of-spec blobs out in the wild :P) So let me know if got new splats... Thanks, Cristian
On 6/15/22 10:32, Cristian Marussi wrote: > > Ok, this was another place where a reply carrying a consistent number of > non-segmented entries could trigger an jumbo-request to kmalloc... > > ..maybe this is where a transient fw issue can reply with a dramatic > numbers of possible (non-segmented) levels (num_returned+num_remaining) > > (I filter out replies carrying descriptors for segmented voltage-levels > that does not come in triplets (returned:3 remaining:0) since they are out > of spec...but I just hit today on another fw sending such out of spec > reply for clocks and I will relax such req probably not to break too > many out-of-spec blobs out in the wild :P) > > So let me know if got new splats... This turned out to be a nice firmware bug, someone *cough* *cough* decided to add some extra "protection" using a temporary buffer and they completely forgot the newly added voltage domain service, so we would just return stale information, wonderful. This is now fixed. Thanks a lot for your responsiveness and support.
diff --git a/drivers/firmware/arm_scmi/clock.c b/drivers/firmware/arm_scmi/clock.c index d78339dc2fdc..6dd4150b761b 100644 --- a/drivers/firmware/arm_scmi/clock.c +++ b/drivers/firmware/arm_scmi/clock.c @@ -16,6 +16,7 @@ enum scmi_clock_protocol_cmd { CLOCK_RATE_SET = 0x5, CLOCK_RATE_GET = 0x6, CLOCK_CONFIG_SET = 0x7, + CLOCK_NAME_GET = 0x8, }; struct scmi_msg_resp_clock_protocol_attributes { @@ -27,7 +28,8 @@ struct scmi_msg_resp_clock_protocol_attributes { struct scmi_msg_resp_clock_attributes { __le32 attributes; #define CLOCK_ENABLE BIT(0) - u8 name[SCMI_MAX_STR_SIZE]; +#define SUPPORTS_EXTENDED_NAMES(x) ((x) & BIT(29)) + u8 name[SCMI_SHORT_NAME_MAX_SIZE]; __le32 clock_enable_latency; }; @@ -108,9 +110,11 @@ scmi_clock_protocol_attributes_get(const struct scmi_protocol_handle *ph, } static int scmi_clock_attributes_get(const struct scmi_protocol_handle *ph, - u32 clk_id, struct scmi_clock_info *clk) + u32 clk_id, struct scmi_clock_info *clk, + u32 version) { int ret; + u32 attributes; struct scmi_xfer *t; struct scmi_msg_resp_clock_attributes *attr; @@ -124,6 +128,7 @@ static int scmi_clock_attributes_get(const struct scmi_protocol_handle *ph, ret = ph->xops->do_xfer(ph, t); if (!ret) { + attributes = le32_to_cpu(attr->attributes); strlcpy(clk->name, attr->name, SCMI_MAX_STR_SIZE); /* Is optional field clock_enable_latency provided ? */ if (t->rx.len == sizeof(*attr)) @@ -132,6 +137,16 @@ static int scmi_clock_attributes_get(const struct scmi_protocol_handle *ph, } ph->xops->xfer_put(ph, t); + + /* + * If supported overwrite short name with the extended one; + * on error just carry on and use already provided short name. + */ + if (!ret && PROTOCOL_REV_MAJOR(version) >= 0x2 && + SUPPORTS_EXTENDED_NAMES(attributes)) + ph->hops->extended_name_get(ph, CLOCK_NAME_GET, clk_id, + clk->name, SCMI_MAX_STR_SIZE); + return ret; } @@ -401,7 +416,7 @@ static int scmi_clock_protocol_init(const struct scmi_protocol_handle *ph) for (clkid = 0; clkid < cinfo->num_clocks; clkid++) { struct scmi_clock_info *clk = cinfo->clk + clkid; - ret = scmi_clock_attributes_get(ph, clkid, clk); + ret = scmi_clock_attributes_get(ph, clkid, clk, version); if (!ret) scmi_clock_describe_rates_get(ph, clkid, clk); } diff --git a/drivers/firmware/arm_scmi/perf.c b/drivers/firmware/arm_scmi/perf.c index 0e9703310758..9e046fd121b9 100644 --- a/drivers/firmware/arm_scmi/perf.c +++ b/drivers/firmware/arm_scmi/perf.c @@ -32,6 +32,7 @@ enum scmi_performance_protocol_cmd { PERF_NOTIFY_LIMITS = 0x9, PERF_NOTIFY_LEVEL = 0xa, PERF_DESCRIBE_FASTCHANNEL = 0xb, + PERF_DOMAIN_NAME_GET = 0xc, }; struct scmi_opp { @@ -56,10 +57,11 @@ struct scmi_msg_resp_perf_domain_attributes { #define SUPPORTS_PERF_LIMIT_NOTIFY(x) ((x) & BIT(29)) #define SUPPORTS_PERF_LEVEL_NOTIFY(x) ((x) & BIT(28)) #define SUPPORTS_PERF_FASTCHANNELS(x) ((x) & BIT(27)) +#define SUPPORTS_EXTENDED_NAMES(x) ((x) & BIT(26)) __le32 rate_limit_us; __le32 sustained_freq_khz; __le32 sustained_perf_level; - u8 name[SCMI_MAX_STR_SIZE]; + u8 name[SCMI_SHORT_NAME_MAX_SIZE]; }; struct scmi_msg_perf_describe_levels { @@ -209,9 +211,11 @@ static int scmi_perf_attributes_get(const struct scmi_protocol_handle *ph, static int scmi_perf_domain_attributes_get(const struct scmi_protocol_handle *ph, - u32 domain, struct perf_dom_info *dom_info) + u32 domain, struct perf_dom_info *dom_info, + u32 version) { int ret; + u32 flags; struct scmi_xfer *t; struct scmi_msg_resp_perf_domain_attributes *attr; @@ -225,7 +229,7 @@ scmi_perf_domain_attributes_get(const struct scmi_protocol_handle *ph, ret = ph->xops->do_xfer(ph, t); if (!ret) { - u32 flags = le32_to_cpu(attr->flags); + flags = le32_to_cpu(attr->flags); dom_info->set_limits = SUPPORTS_SET_LIMITS(flags); dom_info->set_perf = SUPPORTS_SET_PERF_LVL(flags); @@ -248,6 +252,16 @@ scmi_perf_domain_attributes_get(const struct scmi_protocol_handle *ph, } ph->xops->xfer_put(ph, t); + + /* + * If supported overwrite short name with the extended one; + * on error just carry on and use already provided short name. + */ + if (!ret && PROTOCOL_REV_MAJOR(version) >= 0x3 && + SUPPORTS_EXTENDED_NAMES(flags)) + ph->hops->extended_name_get(ph, PERF_DOMAIN_NAME_GET, domain, + dom_info->name, SCMI_MAX_STR_SIZE); + return ret; } @@ -902,7 +916,7 @@ static int scmi_perf_protocol_init(const struct scmi_protocol_handle *ph) for (domain = 0; domain < pinfo->num_domains; domain++) { struct perf_dom_info *dom = pinfo->dom_info + domain; - scmi_perf_domain_attributes_get(ph, domain, dom); + scmi_perf_domain_attributes_get(ph, domain, dom, version); scmi_perf_describe_levels_get(ph, domain, dom); if (dom->perf_fastchannels) diff --git a/drivers/firmware/arm_scmi/power.c b/drivers/firmware/arm_scmi/power.c index e378a3eb0d07..964882cc8747 100644 --- a/drivers/firmware/arm_scmi/power.c +++ b/drivers/firmware/arm_scmi/power.c @@ -18,6 +18,7 @@ enum scmi_power_protocol_cmd { POWER_STATE_SET = 0x4, POWER_STATE_GET = 0x5, POWER_STATE_NOTIFY = 0x6, + POWER_DOMAIN_NAME_GET = 0x8, }; struct scmi_msg_resp_power_attributes { @@ -33,7 +34,8 @@ struct scmi_msg_resp_power_domain_attributes { #define SUPPORTS_STATE_SET_NOTIFY(x) ((x) & BIT(31)) #define SUPPORTS_STATE_SET_ASYNC(x) ((x) & BIT(30)) #define SUPPORTS_STATE_SET_SYNC(x) ((x) & BIT(29)) - u8 name[SCMI_MAX_STR_SIZE]; +#define SUPPORTS_EXTENDED_NAMES(x) ((x) & BIT(27)) + u8 name[SCMI_SHORT_NAME_MAX_SIZE]; }; struct scmi_power_set_state { @@ -97,9 +99,11 @@ static int scmi_power_attributes_get(const struct scmi_protocol_handle *ph, static int scmi_power_domain_attributes_get(const struct scmi_protocol_handle *ph, - u32 domain, struct power_dom_info *dom_info) + u32 domain, struct power_dom_info *dom_info, + u32 version) { int ret; + u32 flags; struct scmi_xfer *t; struct scmi_msg_resp_power_domain_attributes *attr; @@ -113,15 +117,26 @@ scmi_power_domain_attributes_get(const struct scmi_protocol_handle *ph, ret = ph->xops->do_xfer(ph, t); if (!ret) { - u32 flags = le32_to_cpu(attr->flags); + flags = le32_to_cpu(attr->flags); dom_info->state_set_notify = SUPPORTS_STATE_SET_NOTIFY(flags); dom_info->state_set_async = SUPPORTS_STATE_SET_ASYNC(flags); dom_info->state_set_sync = SUPPORTS_STATE_SET_SYNC(flags); strlcpy(dom_info->name, attr->name, SCMI_MAX_STR_SIZE); } - ph->xops->xfer_put(ph, t); + + /* + * If supported overwrite short name with the extended one; + * on error just carry on and use already provided short name. + */ + if (!ret && PROTOCOL_REV_MAJOR(version) >= 0x3 && + SUPPORTS_EXTENDED_NAMES(flags)) { + ph->hops->extended_name_get(ph, POWER_DOMAIN_NAME_GET, + domain, dom_info->name, + SCMI_MAX_STR_SIZE); + } + return ret; } @@ -308,7 +323,7 @@ static int scmi_power_protocol_init(const struct scmi_protocol_handle *ph) for (domain = 0; domain < pinfo->num_domains; domain++) { struct power_dom_info *dom = pinfo->dom_info + domain; - scmi_power_domain_attributes_get(ph, domain, dom); + scmi_power_domain_attributes_get(ph, domain, dom, version); } pinfo->version = version; diff --git a/drivers/firmware/arm_scmi/protocols.h b/drivers/firmware/arm_scmi/protocols.h index 5461fa333152..60ea880b3855 100644 --- a/drivers/firmware/arm_scmi/protocols.h +++ b/drivers/firmware/arm_scmi/protocols.h @@ -24,6 +24,8 @@ #include <asm/unaligned.h> +#define SCMI_SHORT_NAME_MAX_SIZE 16 + #define PROTOCOL_REV_MINOR_MASK GENMASK(15, 0) #define PROTOCOL_REV_MAJOR_MASK GENMASK(31, 16) #define PROTOCOL_REV_MAJOR(x) ((u16)(FIELD_GET(PROTOCOL_REV_MAJOR_MASK, (x)))) diff --git a/drivers/firmware/arm_scmi/reset.c b/drivers/firmware/arm_scmi/reset.c index cc465632aa1a..a420a9102094 100644 --- a/drivers/firmware/arm_scmi/reset.c +++ b/drivers/firmware/arm_scmi/reset.c @@ -17,6 +17,7 @@ enum scmi_reset_protocol_cmd { RESET_DOMAIN_ATTRIBUTES = 0x3, RESET = 0x4, RESET_NOTIFY = 0x5, + RESET_DOMAIN_NAME_GET = 0x6, }; #define NUM_RESET_DOMAIN_MASK 0xffff @@ -26,8 +27,9 @@ struct scmi_msg_resp_reset_domain_attributes { __le32 attributes; #define SUPPORTS_ASYNC_RESET(x) ((x) & BIT(31)) #define SUPPORTS_NOTIFY_RESET(x) ((x) & BIT(30)) +#define SUPPORTS_EXTENDED_NAMES(x) ((x) & BIT(29)) __le32 latency; - u8 name[SCMI_MAX_STR_SIZE]; + u8 name[SCMI_SHORT_NAME_MAX_SIZE]; }; struct scmi_msg_reset_domain_reset { @@ -89,9 +91,11 @@ static int scmi_reset_attributes_get(const struct scmi_protocol_handle *ph, static int scmi_reset_domain_attributes_get(const struct scmi_protocol_handle *ph, - u32 domain, struct reset_dom_info *dom_info) + u32 domain, struct reset_dom_info *dom_info, + u32 version) { int ret; + u32 attributes; struct scmi_xfer *t; struct scmi_msg_resp_reset_domain_attributes *attr; @@ -105,7 +109,7 @@ scmi_reset_domain_attributes_get(const struct scmi_protocol_handle *ph, ret = ph->xops->do_xfer(ph, t); if (!ret) { - u32 attributes = le32_to_cpu(attr->attributes); + attributes = le32_to_cpu(attr->attributes); dom_info->async_reset = SUPPORTS_ASYNC_RESET(attributes); dom_info->reset_notify = SUPPORTS_NOTIFY_RESET(attributes); @@ -116,6 +120,16 @@ scmi_reset_domain_attributes_get(const struct scmi_protocol_handle *ph, } ph->xops->xfer_put(ph, t); + + /* + * If supported overwrite short name with the extended one; + * on error just carry on and use already provided short name. + */ + if (!ret && PROTOCOL_REV_MAJOR(version) >= 0x3 && + SUPPORTS_EXTENDED_NAMES(attributes)) + ph->hops->extended_name_get(ph, RESET_DOMAIN_NAME_GET, domain, + dom_info->name, SCMI_MAX_STR_SIZE); + return ret; } @@ -320,7 +334,7 @@ static int scmi_reset_protocol_init(const struct scmi_protocol_handle *ph) for (domain = 0; domain < pinfo->num_domains; domain++) { struct reset_dom_info *dom = pinfo->dom_info + domain; - scmi_reset_domain_attributes_get(ph, domain, dom); + scmi_reset_domain_attributes_get(ph, domain, dom, version); } pinfo->version = version; diff --git a/drivers/firmware/arm_scmi/sensors.c b/drivers/firmware/arm_scmi/sensors.c index 07c28e249f0c..6fd8b3a874ea 100644 --- a/drivers/firmware/arm_scmi/sensors.c +++ b/drivers/firmware/arm_scmi/sensors.c @@ -27,6 +27,7 @@ enum scmi_sensor_protocol_cmd { SENSOR_CONFIG_GET = 0x9, SENSOR_CONFIG_SET = 0xA, SENSOR_CONTINUOUS_UPDATE_NOTIFY = 0xB, + SENSOR_NAME_GET = 0xC, }; struct scmi_msg_resp_sensor_attributes { @@ -71,6 +72,7 @@ struct scmi_msg_resp_sensor_description { __le32 attributes_low; /* Common attributes_low macros */ #define SUPPORTS_ASYNC_READ(x) FIELD_GET(BIT(31), (x)) +#define SUPPORTS_EXTENDED_NAMES(x) FIELD_GET(BIT(29), (x)) #define NUM_TRIP_POINTS(x) FIELD_GET(GENMASK(7, 0), (x)) __le32 attributes_high; /* Common attributes_high macros */ @@ -78,7 +80,7 @@ struct scmi_msg_resp_sensor_description { #define SENSOR_SCALE_SIGN BIT(4) #define SENSOR_SCALE_EXTEND GENMASK(31, 5) #define SENSOR_TYPE(x) FIELD_GET(GENMASK(7, 0), (x)) - u8 name[SCMI_MAX_STR_SIZE]; + u8 name[SCMI_SHORT_NAME_MAX_SIZE]; /* only for version > 2.0 */ __le32 power; __le32 resolution; @@ -519,6 +521,17 @@ static int scmi_sensor_description_get(const struct scmi_protocol_handle *ph, SCMI_MAX_NUM_SENSOR_AXIS); strlcpy(s->name, sdesc->name, SCMI_MAX_STR_SIZE); + /* + * If supported overwrite short name with the extended + * one; on error just carry on and use already provided + * short name. + */ + if (PROTOCOL_REV_MAJOR(si->version) >= 0x3 && + SUPPORTS_EXTENDED_NAMES(attrl)) + ph->hops->extended_name_get(ph, SENSOR_NAME_GET, + s->id, s->name, + SCMI_MAX_STR_SIZE); + if (s->extended_scalar_attrs) { s->sensor_power = le32_to_cpu(sdesc->power); dsize += sizeof(sdesc->power); diff --git a/drivers/firmware/arm_scmi/voltage.c b/drivers/firmware/arm_scmi/voltage.c index 7aa887a7cbd2..5d58ba724eeb 100644 --- a/drivers/firmware/arm_scmi/voltage.c +++ b/drivers/firmware/arm_scmi/voltage.c @@ -21,13 +21,15 @@ enum scmi_voltage_protocol_cmd { VOLTAGE_CONFIG_GET = 0x6, VOLTAGE_LEVEL_SET = 0x7, VOLTAGE_LEVEL_GET = 0x8, + VOLTAGE_DOMAIN_NAME_GET = 0x09, }; #define NUM_VOLTAGE_DOMAINS(x) ((u16)(FIELD_GET(VOLTAGE_DOMS_NUM_MASK, (x)))) struct scmi_msg_resp_domain_attributes { __le32 attr; - u8 name[SCMI_MAX_STR_SIZE]; +#define SUPPORTS_EXTENDED_NAMES(x) ((x) & BIT(30)) + u8 name[SCMI_SHORT_NAME_MAX_SIZE]; }; struct scmi_msg_cmd_describe_levels { @@ -149,6 +151,16 @@ static int scmi_voltage_descriptors_get(const struct scmi_protocol_handle *ph, v->attributes = le32_to_cpu(resp_dom->attr); strlcpy(v->name, resp_dom->name, SCMI_MAX_STR_SIZE); + /* + * If supported overwrite short name with the extended one; + * on error just carry on and use already provided short name. + */ + if (PROTOCOL_REV_MAJOR(vinfo->version) >= 0x2 && + SUPPORTS_EXTENDED_NAMES(v->attributes)) + ph->hops->extended_name_get(ph, VOLTAGE_DOMAIN_NAME_GET, + v->id, v->name, + SCMI_MAX_STR_SIZE); + cmd = tl->tx.buf; /* ...then retrieve domain levels descriptions */ do { diff --git a/include/linux/scmi_protocol.h b/include/linux/scmi_protocol.h index ced37d1de1fe..56e6f13355b8 100644 --- a/include/linux/scmi_protocol.h +++ b/include/linux/scmi_protocol.h @@ -13,7 +13,7 @@ #include <linux/notifier.h> #include <linux/types.h> -#define SCMI_MAX_STR_SIZE 16 +#define SCMI_MAX_STR_SIZE 64 #define SCMI_MAX_NUM_RATES 16 /**
Using the common protocol helper implementation add support for all new SCMIv3.1 extended names commands related to all protocols with the exception of SENSOR_AXIS_GET_NAME. Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> --- drivers/firmware/arm_scmi/clock.c | 21 ++++++++++++++++++--- drivers/firmware/arm_scmi/perf.c | 22 ++++++++++++++++++---- drivers/firmware/arm_scmi/power.c | 25 ++++++++++++++++++++----- drivers/firmware/arm_scmi/protocols.h | 2 ++ drivers/firmware/arm_scmi/reset.c | 22 ++++++++++++++++++---- drivers/firmware/arm_scmi/sensors.c | 15 ++++++++++++++- drivers/firmware/arm_scmi/voltage.c | 14 +++++++++++++- include/linux/scmi_protocol.h | 2 +- 8 files changed, 104 insertions(+), 19 deletions(-)