Message ID | 20200813214538.8474-1-kernel@tuxforce.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] media: si2168: request caching of firmware to make it available on resume | expand |
On Thu, Aug 13, 2020 at 11:45:37PM +0200, Lukas Middendorf wrote: > even though request_firmware() is supposed to be safe to call during > resume, it might fail (or even hang the system) when the firmware > has not been loaded previously. Use firmware_request_cache() to > have it cached so it is available reliably on resume. > > Signed-off-by: Lukas Middendorf <kernel@tuxforce.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Luis
Hi, I see this (or a similar fix) has not yet been included in 5.12-rc5. Any further problems or comments regarding this patch? It still applies cleanly to current git master and the problem is still relevant. Best regards Lukas On 13/08/2020 23:45, Lukas Middendorf wrote: > even though request_firmware() is supposed to be safe to call during > resume, it might fail (or even hang the system) when the firmware > has not been loaded previously. Use firmware_request_cache() to > have it cached so it is available reliably on resume. > > Signed-off-by: Lukas Middendorf <kernel@tuxforce.de> > --- > drivers/media/dvb-frontends/si2168.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/drivers/media/dvb-frontends/si2168.c b/drivers/media/dvb-frontends/si2168.c > index 14b93a7d3358..ea4b2d91697e 100644 > --- a/drivers/media/dvb-frontends/si2168.c > +++ b/drivers/media/dvb-frontends/si2168.c > @@ -757,6 +757,17 @@ static int si2168_probe(struct i2c_client *client, > dev->version >> 24 & 0xff, dev->version >> 16 & 0xff, > dev->version >> 8 & 0xff, dev->version >> 0 & 0xff); > > + /* request caching of the firmware so it is available on resume after suspend. > + * The actual caching of the firmware file only occurs during suspend > + * The return value does not show whether the firmware file exists > + */ > + ret = firmware_request_cache(&client->dev, dev->firmware_name); > + if (ret) { > + dev_err(&client->dev, > + "firmware caching for '%s' failed\n", > + dev->firmware_name); > + } > + > return 0; > err_kfree: > kfree(dev); >
On Thu, Apr 01, 2021 at 04:42:26PM +0200, Lukas Middendorf wrote: > Hi, > > I see this (or a similar fix) has not yet been included in 5.12-rc5. > Any further problems or comments regarding this patch? It still applies > cleanly to current git master and the problem is still relevant. Working on it. Also while at it, take a look at commit d723522b0be49 ("mt7601u: use firmware_request_cache() to address cache on reboot"), and so in the meantime it would be nice to know if this device has a similar optimization, perhaps on the dev->warm case. Would any of you know? Luis
Em Thu, 1 Apr 2021 16:42:26 +0200 Lukas Middendorf <kernel@tuxforce.de> escreveu: > Hi, > > I see this (or a similar fix) has not yet been included in 5.12-rc5. > Any further problems or comments regarding this patch? It still applies > cleanly to current git master and the problem is still relevant. Well, I fail to see why si2168 is so special that it would require it... on a quick check, it sounds that there's just a single driver using this kAPI: drivers/net/wireless/mediatek/mt7601u/mcu.c: return firmware_request_cache(dev->dev, MT7601U_FIRMWARE); while there are several drivers on media that require firmware. Btw, IMHO, the better would be to reload the firmware at resume time, instead of caching it, just like other media drivers. > > Best regards > Lukas > > On 13/08/2020 23:45, Lukas Middendorf wrote: > > even though request_firmware() is supposed to be safe to call during > > resume, it might fail (or even hang the system) when the firmware > > has not been loaded previously. Use firmware_request_cache() to > > have it cached so it is available reliably on resume. > > > > Signed-off-by: Lukas Middendorf <kernel@tuxforce.de> > > --- > > drivers/media/dvb-frontends/si2168.c | 11 +++++++++++ > > 1 file changed, 11 insertions(+) > > > > diff --git a/drivers/media/dvb-frontends/si2168.c b/drivers/media/dvb-frontends/si2168.c > > index 14b93a7d3358..ea4b2d91697e 100644 > > --- a/drivers/media/dvb-frontends/si2168.c > > +++ b/drivers/media/dvb-frontends/si2168.c > > @@ -757,6 +757,17 @@ static int si2168_probe(struct i2c_client *client, > > dev->version >> 24 & 0xff, dev->version >> 16 & 0xff, > > dev->version >> 8 & 0xff, dev->version >> 0 & 0xff); > > > > + /* request caching of the firmware so it is available on resume after suspend. > > + * The actual caching of the firmware file only occurs during suspend > > + * The return value does not show whether the firmware file exists > > + */ > > + ret = firmware_request_cache(&client->dev, dev->firmware_name); > > + if (ret) { > > + dev_err(&client->dev, > > + "firmware caching for '%s' failed\n", > > + dev->firmware_name); > > + } > > + > > return 0; > > err_kfree: > > kfree(dev); > > Thanks, Mauro
On Fri, Apr 09, 2021 at 01:29:57PM +0200, Mauro Carvalho Chehab wrote: > Em Thu, 1 Apr 2021 16:42:26 +0200 > Lukas Middendorf <kernel@tuxforce.de> escreveu: > > > Hi, > > > > I see this (or a similar fix) has not yet been included in 5.12-rc5. > > Any further problems or comments regarding this patch? It still applies > > cleanly to current git master and the problem is still relevant. > > Well, I fail to see why si2168 is so special that it would require it... > > on a quick check, it sounds that there's just a single driver using this > kAPI: > > drivers/net/wireless/mediatek/mt7601u/mcu.c: return firmware_request_cache(dev->dev, MT7601U_FIRMWARE); > > while there are several drivers on media that require firmware. > > Btw, IMHO, the better would be to reload the firmware at resume > time, instead of caching it, just like other media drivers. Mauro, Here is the thing. If we have a race to a filesystem (it calls submit_bio()) after resume but before thaw you can end up in a situation where async read waits forever as the read never hit hardware. Fixing this is part of the work I had tried long ago by removing the kthread freezer from filesystems [0] which allow proper filesystem freeze/thaw during suspend / resume. I am picking this work up in the meantime. The firmware cache resolves these races by caching firmware in case its needed on resume. However, if a driver never actually had called request_firmware() upon bootup, then the firmware was never cached and the call to request_firmware() on resume will actually trigger a submit_bio(). In my tests the race does trigger a forever wait on XFS and btrfs, but not on ext4. But in any case, I can put a stop gap to these issues by issuing a try lock on the usermode helper lock prior to a direct fs read, however that's just a hack, and preference is to just resolve this by getting drivers to properly call request_firmware() before thaw. The commit log for the one user you mentioned explains well why that driver needed it, commit d723522b0be4 ("mt7601u: use firmware_request_cache() to address cache on reboot") was added since the device may sometimes retain the firmware on the hardware device upon reboot, and in such case not trigger a request_firmware() call on reboot on the driver side. If such cases happen on other drivers, they can use that. Its not clear to me from looking at the media APIs whether or not all drivers are always properly calling the request_firmware() API on suspend, prior to resume. If not that needs to be fixed. Luis
On 09/04/2021 13:29, Mauro Carvalho Chehab wrote: > Well, I fail to see why si2168 is so special that it would require it... The special case here is that si2168 does (try to) load the firmware for the first time during resume. Most other drivers that use firmware do it for the first time at boot (or when connecting the device) and therefore will automatically have their firmware cached for use on resume. > on a quick check, it sounds that there's just a single driver using this > kAPI: > > drivers/net/wireless/mediatek/mt7601u/mcu.c: return firmware_request_cache(dev->dev, MT7601U_FIRMWARE); > > while there are several drivers on media that require firmware. Any other driver that might load the firmware for the first time during resume also has to be fixed. On a quick glance it looks like the si2165 for example might have the same problem. I think that at least all dvb frontends which load the firmware in init callback but not during probe are problematic. The possible patch with the usermode helper lock by Luis causes uncached firmware loading on resume to fail very noisily instead of just stalling the system. That would show up other non-conformant drivers. There likely would be some more bug reports coming in from users which dislike the backtraces coming up in dmesg. You will likely want to fix the drivers before that happens. The fact that this bug is only exposed now that btrfs is seeing more wide spread adoption does not make it less of a bug. > Btw, IMHO, the better would be to reload the firmware at resume > time, instead of caching it, just like other media drivers. Loading the firmware on resume without it being cached is exactly what causes problems (see Luis' explanation). The caching is set up implicitly if the normal request_firmware() is used before suspend. The firmware does not stay in cache permanently. The firmware is just cached by the firmware loader api during suspend and cleaned again at the end of resume when proper file system access is possible again. A really better solution would be to not load the firmware on resume in case it has not been previously loaded to the device (or not load it at all on resume since playback has to be restarted after suspend anyway). But it seems like the same init callback of the si2168 driver is called both at resume and when the device is being used and therefore does not easily allow for this. Likely the dvb_frontend api would have to be extended to have a separate callback for resume.
diff --git a/drivers/media/dvb-frontends/si2168.c b/drivers/media/dvb-frontends/si2168.c index 14b93a7d3358..ea4b2d91697e 100644 --- a/drivers/media/dvb-frontends/si2168.c +++ b/drivers/media/dvb-frontends/si2168.c @@ -757,6 +757,17 @@ static int si2168_probe(struct i2c_client *client, dev->version >> 24 & 0xff, dev->version >> 16 & 0xff, dev->version >> 8 & 0xff, dev->version >> 0 & 0xff); + /* request caching of the firmware so it is available on resume after suspend. + * The actual caching of the firmware file only occurs during suspend + * The return value does not show whether the firmware file exists + */ + ret = firmware_request_cache(&client->dev, dev->firmware_name); + if (ret) { + dev_err(&client->dev, + "firmware caching for '%s' failed\n", + dev->firmware_name); + } + return 0; err_kfree: kfree(dev);
even though request_firmware() is supposed to be safe to call during resume, it might fail (or even hang the system) when the firmware has not been loaded previously. Use firmware_request_cache() to have it cached so it is available reliably on resume. Signed-off-by: Lukas Middendorf <kernel@tuxforce.de> --- drivers/media/dvb-frontends/si2168.c | 11 +++++++++++ 1 file changed, 11 insertions(+)