diff mbox series

[v1] ASoC: tlv320aic32x4: handle regmap_read error gracefully

Message ID 20191227152056.9903-1-ps.report@gmx.net (mailing list archive)
State New, archived
Headers show
Series [v1] ASoC: tlv320aic32x4: handle regmap_read error gracefully | expand

Commit Message

Peter Seiderer Dec. 27, 2019, 3:20 p.m. UTC
Fixes:

[    5.169310] Division by zero in kernel.
[    5.200998] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.3.18-20191021-1+ #14
[    5.203049] cdc_acm 2-1.6:1.0: ttyACM0: USB ACM device
[    5.208198] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[    5.220084] Backtrace:
[    5.222628] [<8010f60c>] (dump_backtrace) from [<8010f9a8>] (show_stack+0x20/0x24)
[    5.230348]  r7:810a1c6c r6:00000000 r5:60000013 r4:810a1c6c
[    5.236097] [<8010f988>] (show_stack) from [<809e06a0>] (dump_stack+0xac/0xd8)
[    5.243469] [<809e05f4>] (dump_stack) from [<8010f780>] (__div0+0x24/0x28)
[    5.250431]  r9:8111adc8 r8:ae631180 r7:aebd27c0 r6:ae631e40 r5:00000000 r4:81006508
[    5.258325] [<8010f75c>] (__div0) from [<809de7ac>] (Ldiv0+0x8/0x10)
[    5.264841] [<8085c7e0>] (clk_aic32x4_div_recalc_rate) from [<805ba70c>] (__clk_register+0x2f8/0x7e4)
[    5.274141]  r5:80dd065c r4:ae6bd480
[    5.277869] [<805ba414>] (__clk_register) from [<805bace0>] (devm_clk_register+0x58/0x8c)
[    5.286130]  r10:81006508 r9:810946d4 r8:00000000 r7:ae8de1c0 r6:ae631ac0 r5:ae631e40
[    5.294103]  r4:ae8d8020
[    5.296724] [<805bac88>] (devm_clk_register) from [<8085cea8>] (aic32x4_register_clocks+0x120/0x14c)
[    5.306004]  r7:ae8de1c0 r6:ae8d8020 r5:ae631e40 r4:810946c0
[    5.311818] [<8085cd88>] (aic32x4_register_clocks) from [<8085bf60>] (aic32x4_probe+0x94/0x468)
[    5.320602]  r10:81094730 r9:00000000 r8:af361fc0 r7:bfd6d040 r6:00000000 r5:ae8d8020
[    5.328574]  r4:af361e40
[    5.331195] [<8085becc>] (aic32x4_probe) from [<8085cf60>] (aic32x4_i2c_probe+0x6c/0x88)
[    5.339434]  r8:00000000 r7:ae8d8000 r6:81094730 r5:ae8d8000 r4:81006508
[    5.346288] [<8085cef4>] (aic32x4_i2c_probe) from [<807554b0>] (i2c_device_probe+0x2ac/0x2f0)
[    5.354894]  r5:8085cef4 r4:ae8d8020
[    5.358625] [<80755204>] (i2c_device_probe) from [<80678e34>] (really_probe+0x11c/0x428)
[    5.366802]  r9:00000000 r8:810b3e78 r7:00000000 r6:8111e020 r5:ae8d8020 r4:8111e01c
[    5.374694] [<80678d18>] (really_probe) from [<80679388>] (driver_probe_device+0x88/0x1e0)
[    5.383106]  r10:80f63860 r9:ffffe000 r8:ffffe000 r7:80679794 r6:81094730 r5:81094730
[    5.391080]  r4:ae8d8020
[    5.393702] [<80679300>] (driver_probe_device) from [<8067978c>] (device_driver_attach+0x68/0x70)
[    5.402724]  r9:ffffe000 r8:ffffe000 r7:80679794 r6:81094730 r5:00000000 r4:ae8d8020
[    5.410555] [<80679724>] (device_driver_attach) from [<80679858>] (__driver_attach+0xc4/0x164)
[    5.419313]  r7:80679794 r6:ae8d8020 r5:81094730 r4:00000000
[    5.425123] [<80679794>] (__driver_attach) from [<80676a14>] (bus_for_each_dev+0x84/0xc4)
[    5.433384]  r7:80679794 r6:81094730 r5:81006508 r4:ae8dc0c0
[    5.439192] [<80676990>] (bus_for_each_dev) from [<80678668>] (driver_attach+0x2c/0x30)
[    5.447279]  r7:00000000 r6:af361500 r5:8107fd94 r4:81094730
[    5.453087] [<8067863c>] (driver_attach) from [<80677fc4>] (bus_add_driver+0x1d0/0x210)
[    5.461240] [<80677df4>] (bus_add_driver) from [<80679f34>] (driver_register+0x84/0x118)
[    5.469414]  r7:00000000 r6:80f4ac9c r5:81006508 r4:81094730
[    5.475224] [<80679eb0>] (driver_register) from [<80755dfc>] (i2c_register_driver+0x4c/0xb8)
[    5.483807]  r5:81006508 r4:81094714
[    5.487472] [<80755db0>] (i2c_register_driver) from [<80f4acc0>] (aic32x4_i2c_driver_init+0x24/0x28)
[    5.496750]  r5:81006508 r4:810a7180
[    5.500415] [<80f4ac9c>] (aic32x4_i2c_driver_init) from [<80103288>] (do_one_initcall+0x64/0x2d0)
[    5.509442] [<80103224>] (do_one_initcall) from [<80f014a8>] (kernel_init_freeable+0x300/0x390)
[    5.518287]  r8:810c7300 r7:810c7300 r6:00000007 r5:80f920c4 r4:80f63840
[    5.525079] [<80f011a8>] (kernel_init_freeable) from [<809f892c>] (kernel_init+0x18/0x124)
[    5.533490]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:809f8914
[    5.541461]  r4:00000000
[    5.544084] [<809f8914>] (kernel_init) from [<801010b4>] (ret_from_fork+0x14/0x20)
[    5.551800] Exception stack(0xaf115fb0 to 0xaf115ff8)
[    5.556935] 5fa0:                                     00000000 00000000 00000000 00000000
[    5.565262] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    5.573522] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    5.580283]  r5:809f8914 r4:00000000

Signed-off-by: Peter Seiderer <ps.report@gmx.net>
---
 sound/soc/codecs/tlv320aic32x4-clk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Mark Brown Dec. 27, 2019, 10:52 p.m. UTC | #1
On Fri, Dec 27, 2019 at 04:20:56PM +0100, Peter Seiderer wrote:
> Fixes:
> 
> [    5.169310] Division by zero in kernel.
> [    5.200998] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.3.18-20191021-1+ #14
> [    5.203049] cdc_acm 2-1.6:1.0: ttyACM0: USB ACM device
> [    5.208198] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [    5.220084] Backtrace:
> [    5.222628] [<8010f60c>] (dump_backtrace) from [<8010f9a8>] (show_stack+0x20/0x24)

Please think hard before including complete backtraces in upstream
reports, they are very large and contain almost no useful information
relative to their size so often obscure the relevant content in your
message. If part of the backtrace is usefully illustrative then it's
usually better to pull out the relevant sections.
Peter Seiderer Jan. 6, 2020, 8:45 p.m. UTC | #2
Hello Mark,

On Fri, 27 Dec 2019 22:52:04 +0000, Mark Brown <broonie@kernel.org> wrote:

> On Fri, Dec 27, 2019 at 04:20:56PM +0100, Peter Seiderer wrote:
> > Fixes:
> >
> > [    5.169310] Division by zero in kernel.
> > [    5.200998] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.3.18-20191021-1+ #14
> > [    5.203049] cdc_acm 2-1.6:1.0: ttyACM0: USB ACM device
> > [    5.208198] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> > [    5.220084] Backtrace:
> > [    5.222628] [<8010f60c>] (dump_backtrace) from [<8010f9a8>] (show_stack+0x20/0x24)
>
> Please think hard before including complete backtraces in upstream
> reports, they are very large and contain almost no useful information
> relative to their size so often obscure the relevant content in your
> message. If part of the backtrace is usefully illustrative then it's
> usually better to pull out the relevant sections.

Thanks for review..., but a little disagree here, do not know much which
is more informative than a complete back trace for a division by zero (and
which is the complete information/starting point for investigating the
reason therefore) and it would be a pity to loose this valuable information?

Maybe I should have added more information about why and how the failing
regmap_read() call leads to a division by zero?

Any hint for a better commit message is welcome ;-)

Regards,
Peter
Mark Brown Jan. 9, 2020, 8:35 p.m. UTC | #3
On Mon, Jan 06, 2020 at 09:45:34PM +0100, Peter Seiderer wrote:
> On Fri, 27 Dec 2019 22:52:04 +0000, Mark Brown <broonie@kernel.org> wrote:
> > On Fri, Dec 27, 2019 at 04:20:56PM +0100, Peter Seiderer wrote:

> > Please think hard before including complete backtraces in upstream
> > reports, they are very large and contain almost no useful information
> > relative to their size so often obscure the relevant content in your
> > message. If part of the backtrace is usefully illustrative then it's
> > usually better to pull out the relevant sections.

> Thanks for review..., but a little disagree here, do not know much which
> is more informative than a complete back trace for a division by zero (and
> which is the complete information/starting point for investigating the
> reason therefore) and it would be a pity to loose this valuable information?

Right, some backtrace is definitely often useful for understanding where
things broke and helping people search for problems - it's just
providing the *full* backtrace that can be an issue as a lot of it can
end up being redundant.  As a rule of thumb I'd tend to say that once
you get out of the driver or subsystem you're starting to loose
relevance.  For example with a probe failure like this once you get out
of the driver probe function it almost never matters exactly what the
stack in the device model core is, it's not adding anything and it's
usually more than a screenful that needs to be paged through.  Cutting
that out helps people focus on the bits that matter.

> Maybe I should have added more information about why and how the failing
> regmap_read() call leads to a division by zero?

Any analysis you've done about why things got confused and broken is
definitely good to include!
Mark Brown Jan. 9, 2020, 8:38 p.m. UTC | #4
On Fri, Dec 27, 2019 at 04:20:56PM +0100, Peter Seiderer wrote:
> @@ -338,7 +338,8 @@ static unsigned long clk_aic32x4_div_recalc_rate(struct clk_hw *hw,

>  	unsigned int val;

> -	regmap_read(div->regmap, div->reg, &val);
> +	if (regmap_read(div->regmap, div->reg, &val))
> +		return 0;

Is this the best fix - shouldn't we be returning an error here?  We
don't know what the value programmed into the device actually is so zero
might be wrong, and we still have the risk that the value we read from
the device may be zero if the device is misprogrammed.
Peter Seiderer Jan. 9, 2020, 10:21 p.m. UTC | #5
Hello Mark,

On Thu, 9 Jan 2020 20:38:19 +0000, Mark Brown <broonie@kernel.org> wrote:

> On Fri, Dec 27, 2019 at 04:20:56PM +0100, Peter Seiderer wrote:
> > @@ -338,7 +338,8 @@ static unsigned long clk_aic32x4_div_recalc_rate(struct clk_hw *hw,
>
> >  	unsigned int val;
>
> > -	regmap_read(div->regmap, div->reg, &val);
> > +	if (regmap_read(div->regmap, div->reg, &val))

Unrelated to your question, but I would change this line (on next patch
iteration) to (as all other return value checked places for regmap_read
in the same file):

	if (regmap_read(div->regmap, div->reg, &val) < 0)

> > +		return 0;
>
> Is this the best fix - shouldn't we be returning an error here?  We
> don't know what the value programmed into the device actually is so zero
> might be wrong, and we still have the risk that the value we read from
> the device may be zero if the device is misprogrammed.

clk_aic32x4_div_recalc_rate() is used for clk_ops aic32x4_div_ops.recalc_rate,
did not check/or see on first sight if there is a error handling on the usage
of recalc_rate, but did take a look at some other places where the error
handling seems to be to return zero, e.g. sound/soc/codecs/da7219.c
sound/soc/intel/skylake/skl-ssp-clk.c, etc.

The error occurred with linux-5.3.18, with the earlier versions on regmap_read
failure val was (by chance) near 2^31 and evaluated with (val & AIC32X4_DIV_MASK)
to 96 (or similar)...but with 5.3.18 evaluated to 0 and the line

	return DIV_ROUND_UP(parent_rate, val & AIC32X4_DIV_MASK);

produced the 'Division by zero'...

Regards,
Peter
diff mbox series

Patch

diff --git a/sound/soc/codecs/tlv320aic32x4-clk.c b/sound/soc/codecs/tlv320aic32x4-clk.c
index 156c153c12ab..7a82e3448780 100644
--- a/sound/soc/codecs/tlv320aic32x4-clk.c
+++ b/sound/soc/codecs/tlv320aic32x4-clk.c
@@ -338,7 +338,8 @@  static unsigned long clk_aic32x4_div_recalc_rate(struct clk_hw *hw,
 
 	unsigned int val;
 
-	regmap_read(div->regmap, div->reg, &val);
+	if (regmap_read(div->regmap, div->reg, &val))
+		return 0;
 
 	return DIV_ROUND_UP(parent_rate, val & AIC32X4_DIV_MASK);
 }