[1/2] media: si2168: request caching of firmware to make it available on resume

Message ID 20200813214538.8474-1-kernel@tuxforce.de (mailing list archive)
State New, archived

Commit Message

Lukas Middendorf Aug. 13, 2020, 9:45 p.m. UTC
even though request_firmware() is supposed to be safe to call during
resume, it might fail (or even hang the system) when the firmware
has not been loaded previously. Use firmware_request_cache() to
have it cached so it is available reliably on resume.

Signed-off-by: Lukas Middendorf <kernel@tuxforce.de>
---
 drivers/media/dvb-frontends/si2168.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Luis Chamberlain Aug. 13, 2020, 9:54 p.m. UTC | #1
On Thu, Aug 13, 2020 at 11:45:37PM +0200, Lukas Middendorf wrote:
> even though request_firmware() is supposed to be safe to call during
> resume, it might fail (or even hang the system) when the firmware
> has not been loaded previously. Use firmware_request_cache() to
> have it cached so it is available reliably on resume.
> 
> Signed-off-by: Lukas Middendorf <kernel@tuxforce.de>

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>

  Luis
Lukas Middendorf April 1, 2021, 2:42 p.m. UTC | #2
Hi,

I see this (or a similar fix) has not yet been included in 5.12-rc5.
Any further problems or comments regarding this patch? It still applies 
cleanly to current git master and the problem is still relevant.

Best regards
Lukas

On 13/08/2020 23:45, Lukas Middendorf wrote:
> even though request_firmware() is supposed to be safe to call during
> resume, it might fail (or even hang the system) when the firmware
> has not been loaded previously. Use firmware_request_cache() to
> have it cached so it is available reliably on resume.
> 
> Signed-off-by: Lukas Middendorf <kernel@tuxforce.de>
> ---
>   drivers/media/dvb-frontends/si2168.c | 11 +++++++++++
>   1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/media/dvb-frontends/si2168.c b/drivers/media/dvb-frontends/si2168.c
> index 14b93a7d3358..ea4b2d91697e 100644
> --- a/drivers/media/dvb-frontends/si2168.c
> +++ b/drivers/media/dvb-frontends/si2168.c
> @@ -757,6 +757,17 @@ static int si2168_probe(struct i2c_client *client,
>   		 dev->version >> 24 & 0xff, dev->version >> 16 & 0xff,
>   		 dev->version >> 8 & 0xff, dev->version >> 0 & 0xff);
>   
> +	/* request caching of the firmware so it is available on resume after suspend.
> +	 * The actual caching of the firmware file only occurs during suspend
> +	 * The return value does not show whether the firmware file exists
> +	 */
> +	ret = firmware_request_cache(&client->dev, dev->firmware_name);
> +	if (ret) {
> +		dev_err(&client->dev,
> +				"firmware caching for '%s' failed\n",
> +				dev->firmware_name);
> +	}
> +
>   	return 0;
>   err_kfree:
>   	kfree(dev);
>
Luis Chamberlain April 2, 2021, 6:04 p.m. UTC | #3
On Thu, Apr 01, 2021 at 04:42:26PM +0200, Lukas Middendorf wrote:
> Hi,
> 
> I see this (or a similar fix) has not yet been included in 5.12-rc5.
> Any further problems or comments regarding this patch? It still applies
> cleanly to current git master and the problem is still relevant.

Working on it. Also while at it, take a look at commit d723522b0be49
("mt7601u: use firmware_request_cache() to address cache on reboot"),
and so in the meantime it would be nice to know if this device has
a similar optimization, perhaps on the dev->warm case. Would any of
you know?

  Luis
Mauro Carvalho Chehab April 9, 2021, 11:29 a.m. UTC | #4
Em Thu, 1 Apr 2021 16:42:26 +0200
Lukas Middendorf <kernel@tuxforce.de> escreveu:

> Hi,
> 
> I see this (or a similar fix) has not yet been included in 5.12-rc5.
> Any further problems or comments regarding this patch? It still applies 
> cleanly to current git master and the problem is still relevant.

Well, I fail to see why si2168 is so special that it would require it...

on a quick check, it sounds that there's just a single driver using this
kAPI:

	drivers/net/wireless/mediatek/mt7601u/mcu.c:            return firmware_request_cache(dev->dev, MT7601U_FIRMWARE);

while there are several drivers on media that require firmware.

Btw, IMHO, the better would be to reload the firmware at resume
time, instead of caching it, just like other media drivers.



> 
> Best regards
> Lukas
> 
> On 13/08/2020 23:45, Lukas Middendorf wrote:
> > even though request_firmware() is supposed to be safe to call during
> > resume, it might fail (or even hang the system) when the firmware
> > has not been loaded previously. Use firmware_request_cache() to
> > have it cached so it is available reliably on resume.
> > 
> > Signed-off-by: Lukas Middendorf <kernel@tuxforce.de>
> > ---
> >   drivers/media/dvb-frontends/si2168.c | 11 +++++++++++
> >   1 file changed, 11 insertions(+)
> > 
> > diff --git a/drivers/media/dvb-frontends/si2168.c b/drivers/media/dvb-frontends/si2168.c
> > index 14b93a7d3358..ea4b2d91697e 100644
> > --- a/drivers/media/dvb-frontends/si2168.c
> > +++ b/drivers/media/dvb-frontends/si2168.c
> > @@ -757,6 +757,17 @@ static int si2168_probe(struct i2c_client *client,
> >   		 dev->version >> 24 & 0xff, dev->version >> 16 & 0xff,
> >   		 dev->version >> 8 & 0xff, dev->version >> 0 & 0xff);
> >   
> > +	/* request caching of the firmware so it is available on resume after suspend.
> > +	 * The actual caching of the firmware file only occurs during suspend
> > +	 * The return value does not show whether the firmware file exists
> > +	 */
> > +	ret = firmware_request_cache(&client->dev, dev->firmware_name);
> > +	if (ret) {
> > +		dev_err(&client->dev,
> > +				"firmware caching for '%s' failed\n",
> > +				dev->firmware_name);
> > +	}
> > +
> >   	return 0;
> >   err_kfree:
> >   	kfree(dev);
> >   



Thanks,
Mauro
Luis Chamberlain April 9, 2021, 4:58 p.m. UTC | #5
On Fri, Apr 09, 2021 at 01:29:57PM +0200, Mauro Carvalho Chehab wrote:
> Em Thu, 1 Apr 2021 16:42:26 +0200
> Lukas Middendorf <kernel@tuxforce.de> escreveu:
> 
> > Hi,
> > 
> > I see this (or a similar fix) has not yet been included in 5.12-rc5.
> > Any further problems or comments regarding this patch? It still applies 
> > cleanly to current git master and the problem is still relevant.
> 
> Well, I fail to see why si2168 is so special that it would require it...
> 
> on a quick check, it sounds that there's just a single driver using this
> kAPI:
> 
> 	drivers/net/wireless/mediatek/mt7601u/mcu.c:            return firmware_request_cache(dev->dev, MT7601U_FIRMWARE);
> 
> while there are several drivers on media that require firmware.
> 
> Btw, IMHO, the better would be to reload the firmware at resume
> time, instead of caching it, just like other media drivers.

Mauro,

Here is the thing. If we have a race to a filesystem (it calls
submit_bio()) after resume but before thaw you can end up in
a situation where async read waits forever as the read never
hit hardware.

Fixing this is part of the work I had tried long ago by removing
the kthread freezer from filesystems [0], which allows proper
filesystem freeze/thaw during suspend / resume. I am picking
this work up in the meantime.

The firmware cache resolves these races by caching firmware
in case it's needed on resume. However, if a driver never
actually had called request_firmware() upon bootup, then
the firmware was never cached and the call to request_firmware()
on resume will actually trigger a submit_bio().

In my tests the race does trigger a forever wait on XFS and btrfs, but
not on ext4. But in any case, I can put a stop gap to these issues
by issuing a try lock on the usermode helper lock prior to a direct
fs read, however that's just a hack, and preference is to just resolve
this by getting drivers to properly call request_firmware() before
thaw. The commit log for the one user you mentioned explains well why
that driver needed it, commit d723522b0be4 ("mt7601u: use
firmware_request_cache() to address cache on reboot") was added
since the device may sometimes retain the firmware on the hardware
device upon reboot, and in such case not trigger a request_firmware()
call on reboot on the driver side.

If such cases happen on other drivers, they can use that.

It's not clear to me from looking at the media APIs whether or not
all drivers are always properly calling the request_firmware() API
on suspend, prior to resume. If not that needs to be fixed.

  Luis
Lukas Middendorf April 9, 2021, 10:02 p.m. UTC | #6
On 09/04/2021 13:29, Mauro Carvalho Chehab wrote:
> Well, I fail to see why si2168 is so special that it would require it...

The special case here is that si2168 does (try to) load the firmware for 
the first time during resume. Most other drivers that use firmware do it 
for the first time at boot (or when connecting the device) and therefore 
will automatically have their firmware cached for use on resume.

> on a quick check, it sounds that there's just a single driver using this
> kAPI:
> 
> 	drivers/net/wireless/mediatek/mt7601u/mcu.c:            return firmware_request_cache(dev->dev, MT7601U_FIRMWARE);
> 
> while there are several drivers on media that require firmware.

Any other driver that might load the firmware for the first time during 
resume also has to be fixed. On a quick glance it looks like the si2165 
for example might have the same problem. I think that at least all dvb 
frontends which load the firmware in init callback but not during probe 
are problematic.

The possible patch with the usermode helper lock by Luis causes uncached 
firmware loading on resume to fail very noisily instead of just stalling 
the system. That would expose other non-conformant drivers. There 
likely would be some more bug reports coming in from users who dislike 
the backtraces coming up in dmesg. You will likely want to fix the 
drivers before that happens.
The fact that this bug is only exposed now that btrfs is seeing more 
widespread adoption does not make it less of a bug.

> Btw, IMHO, the better would be to reload the firmware at resume
> time, instead of caching it, just like other media drivers.

Loading the firmware on resume without it being cached is exactly what 
causes problems (see Luis' explanation). The caching is set up 
implicitly if the normal request_firmware() is used before suspend. The 
firmware does not stay in cache permanently. The firmware is just cached 
by the firmware loader api during suspend and cleaned again at the end 
of resume when proper file system access is possible again.

A better solution would be to not load the firmware on resume in 
case it has not been previously loaded to the device (or not load it at 
all on resume since playback has to be restarted after suspend anyway). 
But it seems like the same init callback of the si2168 driver is called 
both at resume and when the device is being used and therefore does not 
easily allow for this. Likely the dvb_frontend api would have to be 
extended to have a separate callback for resume.
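
The caching behaviour described in this thread can be illustrated with a
small userspace model. This is a sketch only, not kernel code: the fw_model
type and every function name below are hypothetical, and it encodes just
three statements made above: a normal request while the system is running
registers the name implicitly, firmware_request_cache() registers a name
without reading the file, and the cache itself is filled at suspend and
dropped at the end of resume. Per-device lifetime handling in the real
firmware loader is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy userspace model of the behaviour discussed in this thread.
 * This is NOT kernel code: every name below is hypothetical.
 */
#define FW_MODEL_MAX 8

enum fw_result {
	FW_FROM_FS,         /* system running: read from the filesystem */
	FW_FROM_CACHE,      /* resume: served from the suspend-time cache */
	FW_NEEDS_FROZEN_FS, /* resume: uncached, would need a frozen fs */
};

struct fw_model {
	bool resuming;                    /* true between suspend and thaw */
	const char *wanted[FW_MODEL_MAX]; /* names registered while running */
	size_t n_wanted;
	const char *cached[FW_MODEL_MAX]; /* names held across suspend */
	size_t n_cached;
};

static bool fw_list_has(const char *const *list, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(list[i], name) == 0)
			return true;
	return false;
}

static void fw_model_init(struct fw_model *m)
{
	memset(m, 0, sizeof(*m));
}

/* Models firmware_request_cache(): remember the name, read nothing. */
static void fw_model_request_cache(struct fw_model *m, const char *name)
{
	if (fw_list_has(m->wanted, m->n_wanted, name))
		return;
	assert(m->n_wanted < FW_MODEL_MAX);
	m->wanted[m->n_wanted++] = name;
}

/* Models request_firmware() in the two phases the thread distinguishes. */
static enum fw_result fw_model_request(struct fw_model *m, const char *name)
{
	if (!m->resuming) {
		fw_model_request_cache(m, name); /* implicit registration */
		return FW_FROM_FS;
	}
	if (fw_list_has(m->cached, m->n_cached, name))
		return FW_FROM_CACHE;
	return FW_NEEDS_FROZEN_FS;
}

/* Suspend: every registered name is loaded into the cache. */
static void fw_model_suspend(struct fw_model *m)
{
	memcpy(m->cached, m->wanted, sizeof(m->wanted));
	m->n_cached = m->n_wanted;
	m->resuming = true;
}

/* End of resume: filesystems are usable again, so the cache is dropped. */
static void fw_model_resume_done(struct fw_model *m)
{
	m->n_cached = 0;
	m->resuming = false;
}
```

In this model, a driver that first calls fw_model_request() after
fw_model_suspend() gets FW_NEEDS_FROZEN_FS, which corresponds to the
si2168 situation reported here. Calling fw_model_request_cache() beforehand
(the analogue of doing so in probe, as the patch does) turns the same
request into FW_FROM_CACHE.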
Patch

diff --git a/drivers/media/dvb-frontends/si2168.c b/drivers/media/dvb-frontends/si2168.c
index 14b93a7d3358..ea4b2d91697e 100644
--- a/drivers/media/dvb-frontends/si2168.c
+++ b/drivers/media/dvb-frontends/si2168.c
@@ -757,6 +757,17 @@  static int si2168_probe(struct i2c_client *client,
 		 dev->version >> 24 & 0xff, dev->version >> 16 & 0xff,
 		 dev->version >> 8 & 0xff, dev->version >> 0 & 0xff);
 
+	/* request caching of the firmware so it is available on resume after suspend.
+	 * The actual caching of the firmware file only occurs during suspend
+	 * The return value does not show whether the firmware file exists
+	 */
+	ret = firmware_request_cache(&client->dev, dev->firmware_name);
+	if (ret) {
+		dev_err(&client->dev,
+				"firmware caching for '%s' failed\n",
+				dev->firmware_name);
+	}
+
 	return 0;
 err_kfree:
 	kfree(dev);