
[5.17,127/298] driver core: Fix wait_for_device_probe() & deferred_probe_timeout interaction

Message ID 20220613094928.793712131@linuxfoundation.org (mailing list archive)
State Not Applicable, archived

Commit Message

Greg KH June 13, 2022, 10:10 a.m. UTC
From: Saravana Kannan <saravanak@google.com>

[ Upstream commit 5ee76c256e928455212ab759c51d198fedbe7523 ]

Mounting NFS rootfs was timing out when deferred_probe_timeout was
non-zero [1].  This was because ip_auto_config() initcall times out
waiting for the network interfaces to show up when
deferred_probe_timeout was non-zero. While ip_auto_config() calls
wait_for_device_probe() to make sure any currently running deferred
probe work or asynchronous probe finishes, that wasn't sufficient to
account for devices being deferred until deferred_probe_timeout.

Commit 35a672363ab3 ("driver core: Ensure wait_for_device_probe() waits
until the deferred_probe_timeout fires") tried to fix that by making
sure wait_for_device_probe() waits for deferred_probe_timeout to expire
before returning.

However, if wait_for_device_probe() is called from the kernel_init()
context:

- Before deferred_probe_initcall() [2], it causes the boot process to
  hang due to a deadlock.

- After deferred_probe_initcall() [3], it blocks kernel_init() from
  continuing until deferred_probe_timeout expires, which defeats the
  purpose of deferred_probe_timeout: giving userspace time to load
  modules.

Neither of these is good. So revert the changes to
wait_for_device_probe().

[1] - https://lore.kernel.org/lkml/TYAPR01MB45443DF63B9EF29054F7C41FD8C60@TYAPR01MB4544.jpnprd01.prod.outlook.com/
[2] - https://lore.kernel.org/lkml/YowHNo4sBjr9ijZr@dev-arch.thelio-3990X/
[3] - https://lore.kernel.org/lkml/Yo3WvGnNk3LvLb7R@linutronix.de/

Fixes: 35a672363ab3 ("driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires")
Cc: John Stultz <jstultz@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Basil Eljuse <Basil.Eljuse@arm.com>
Cc: Ferry Toth <fntoth@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: linux-pm@vger.kernel.org
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: John Stultz <jstultz@google.com>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20220526034609.480766-2-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/base/dd.c | 5 -----
 1 file changed, 5 deletions(-)
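
The two failure modes named in the commit message can be modelled in
userspace. The sketch below is only an illustration, not driver-core code:
the struct, enum, and function names are hypothetical, and it assumes the
caller runs in kernel_init() context and that the timeout work is scheduled
only once deferred_probe_initcall() has run.

```c
#include <stdbool.h>

/* Hypothetical outcomes for a caller in kernel_init() context. */
enum wait_result {
	WAIT_RETURNS_PROMPTLY,     /* only flushes pending probe work */
	WAIT_BLOCKS_UNTIL_TIMEOUT, /* stalls boot for the whole timeout */
	WAIT_DEADLOCKS             /* nothing can ever clear the timeout */
};

/* Hypothetical snapshot of the state the commit message talks about. */
struct boot_state {
	int deferred_probe_timeout;  /* seconds; 0 = disabled or already fired */
	bool timeout_work_scheduled; /* true after deferred_probe_initcall() */
};

/* Models wait_for_device_probe() with the wait_event() from 35a672363ab3. */
enum wait_result wait_with_timeout_check(const struct boot_state *s)
{
	if (s->deferred_probe_timeout == 0)
		return WAIT_RETURNS_PROMPTLY;
	if (!s->timeout_work_scheduled)
		return WAIT_DEADLOCKS; /* case [2]: the initcall cannot run yet */
	return WAIT_BLOCKS_UNTIL_TIMEOUT; /* case [3] */
}

/* Models wait_for_device_probe() after this patch removes the wait_event(). */
enum wait_result wait_after_revert(const struct boot_state *s)
{
	(void)s; /* the timeout no longer influences the wait */
	return WAIT_RETURNS_PROMPTLY;
}
```

In this model the revert trades both hangs for the original behaviour, where
devices still deferred at that point are not waited for.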

Comments

Shreeya Patel Aug. 16, 2023, 9:39 a.m. UTC | #1
On 13/06/22 15:40, Greg Kroah-Hartman wrote:
> From: Saravana Kannan<saravanak@google.com>
>
> [ Upstream commit 5ee76c256e928455212ab759c51d198fedbe7523 ]
>
> Mounting NFS rootfs was timing out when deferred_probe_timeout was
> non-zero [1].  This was because ip_auto_config() initcall times out
> waiting for the network interfaces to show up when
> deferred_probe_timeout was non-zero. While ip_auto_config() calls
> wait_for_device_probe() to make sure any currently running deferred
> probe work or asynchronous probe finishes, that wasn't sufficient to
> account for devices being deferred until deferred_probe_timeout.
>
> Commit 35a672363ab3 ("driver core: Ensure wait_for_device_probe() waits
> until the deferred_probe_timeout fires") tried to fix that by making
> sure wait_for_device_probe() waits for deferred_probe_timeout to expire
> before returning.
>
> However, if wait_for_device_probe() is called from the kernel_init()
> context:
>
> - Before deferred_probe_initcall() [2], it causes the boot process to
>    hang due to a deadlock.
>
> - After deferred_probe_initcall() [3], it blocks kernel_init() from
>    continuing till deferred_probe_timeout expires and beats the point of
>    deferred_probe_timeout that's trying to wait for userspace to load
>    modules.
>
> Neither of this is good. So revert the changes to
> wait_for_device_probe().
>
> [1] -https://lore.kernel.org/lkml/TYAPR01MB45443DF63B9EF29054F7C41FD8C60@TYAPR01MB4544.jpnprd01.prod.outlook.com/
> [2] -https://lore.kernel.org/lkml/YowHNo4sBjr9ijZr@dev-arch.thelio-3990X/
> [3] -https://lore.kernel.org/lkml/Yo3WvGnNk3LvLb7R@linutronix.de/

Hi Saravana, Greg,


KernelCI found that this patch causes the baseline.bootrr.deferred-probe-empty
test to fail on r8a77960-ulcb. See the details below for more information.

KernelCI dashboard link:
https://linux.kernelci.org/test/plan/id/64d2a6be8c1a8435e535b264/

Error messages from the logs :-

+ UUID=11236495_1.5.2.4.5
+ set +x
+ export 'PATH=/opt/bootrr/libexec/bootrr/helpers:/lava-11236495/1/../bin:/sbin:/usr/sbin:/bin:/usr/bin'
+ cd /opt/bootrr/libexec/bootrr
+ sh helpers/bootrr-auto
e6800000.ethernet	
e6700000.dma-controller	
e7300000.dma-controller	
e7310000.dma-controller	
ec700000.dma-controller	
ec720000.dma-controller	
fea20000.vsp	
feb00000.display	
fea28000.vsp	
fea30000.vsp	
fe9a0000.vsp	
fe9af000.fcp	
fea27000.fcp	
fea2f000.fcp	
fea37000.fcp	
sound	
ee100000.mmc	
ee140000.mmc	
ec500000.sound	
/lava-11236495/1/../bin/lava-test-case
<8>[   17.476741] <LAVA_SIGNAL_TESTCASE TEST_CASE_ID=deferred-probe-empty RESULT=fail>

Test case failing :-
Baseline Bootrr deferred-probe-empty test - https://github.com/kernelci/bootrr/blob/main/helpers/bootrr-generic-tests
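
For readers unfamiliar with the check, here is a rough sketch of what
"deferred-probe-empty" amounts to. It assumes the test passes only when the
kernel's list of deferred devices (exposed through debugfs, typically as
/sys/kernel/debug/devices_deferred) has no entries, and that each entry is
one line holding a device name optionally followed by a tab and a reason.
The function names are hypothetical and this is not the bootrr
implementation.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count entries in text formatted like the deferred-device list:
 * one device per line, lines containing only whitespace are ignored.
 */
size_t count_deferred_devices(const char *text)
{
	size_t count = 0;

	while (*text) {
		size_t len = strcspn(text, "\n"); /* length of this line */

		/* The line holds a device if it has a non-whitespace character. */
		if (strspn(text, " \t\r") < len)
			count++;

		text += len;
		if (*text == '\n')
			text++;
	}
	return count;
}

/* Returns 1 when no device is pending, which is the passing condition. */
int deferred_probe_empty(const char *text)
{
	return count_deferred_devices(text) == 0;
}
```

Fed the device list from the log above, such a check would report a
non-empty list and therefore a failure.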

Regression Reproduced :-

Lava job after reverting the commit 5ee76c256e92
https://lava.collabora.dev/scheduler/job/11292890


Bisection report from KernelCI can be found at the bottom of the email.

Thanks,
Shreeya Patel

#regzbot introduced: 5ee76c256e92
#regzbot title: KernelCI: Multiple devices deferring on r8a77960-ulcb

---------------------------------------------------------------------------------------------------------------------------------------------------

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * **
* If you do send a fix, please include this trailer: *
* Reported-by: "kernelci.org bot" <bot@...> *
* *
* Hope this helps! *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

stable-rc/linux-5.10.y bisection: baseline.bootrr.deferred-probe-empty on r8a77960-ulcb

Summary:
Start: 686c84f2f136 Linux 5.10.189-rc1
Plain log: https://storage.kernelci.org/stable-rc/linux-5.10.y/v5.10.188-183-g686c84f2f1364/arm64/defconfig/gcc-10/lab-collabora/baseline-r8a77960-ulcb.txt
HTML log: https://storage.kernelci.org/stable-rc/linux-5.10.y/v5.10.188-183-g686c84f2f1364/arm64/defconfig/gcc-10/lab-collabora/baseline-r8a77960-ulcb.html
Result: 71cbce75031a driver core: Fix wait_for_device_probe() & deferred_probe_timeout interaction

Checks:
revert: PASS
verify: PASS

Parameters:
Tree: stable-rc
URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Branch: linux-5.10.y
Target: r8a77960-ulcb
CPU arch: arm64
Lab: lab-collabora
Compiler: gcc-10
Config: defconfig
Test case: baseline.bootrr.deferred-probe-empty

Breaking commit found:

-------------------------------------------------------------------------------
commit 71cbce75031aed26c72c2dc8a83111d181685f1b
Author: Saravana Kannan <saravanak@...>
Date: Fri Jun 3 13:31:37 2022 +0200

driver core: Fix wait_for_device_probe() & deferred_probe_timeout interaction

[ Upstream commit 5ee76c256e928455212ab759c51d198fedbe7523 ]

Mounting NFS rootfs was timing out when deferred_probe_timeout was
non-zero [1]. This was because ip_auto_config() initcall times out
waiting for the network interfaces to show up when
deferred_probe_timeout was non-zero. While ip_auto_config() calls
wait_for_device_probe() to make sure any currently running deferred
probe work or asynchronous probe finishes, that wasn't sufficient to
account for devices being deferred until deferred_probe_timeout.

Commit 35a672363ab3 ("driver core: Ensure wait_for_device_probe() waits
until the deferred_probe_timeout fires") tried to fix that by making
sure wait_for_device_probe() waits for deferred_probe_timeout to expire
before returning.

However, if wait_for_device_probe() is called from the kernel_init()
context:

- Before deferred_probe_initcall() [2], it causes the boot process to
  hang due to a deadlock.

- After deferred_probe_initcall() [3], it blocks kernel_init() from
  continuing till deferred_probe_timeout expires and beats the point of
  deferred_probe_timeout that's trying to wait for userspace to load
  modules.

Neither of this is good. So revert the changes to
wait_for_device_probe().

[1] - https://lore.kernel.org/lkml/TYAPR01MB45443DF63B9EF29054F7C41FD8C60@TYAPR01MB4544.jpnprd01.prod.outlook.com/
[2] - https://lore.kernel.org/lkml/YowHNo4sBjr9ijZr@dev-arch.thelio-3990X/
[3] - https://lore.kernel.org/lkml/Yo3WvGnNk3LvLb7R@linutronix.de/

Fixes: 35a672363ab3 ("driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires")
Cc: John Stultz <jstultz@...>
Cc: "David S. Miller" <davem@...>
Cc: Alexey Kuznetsov <kuznet@...>
Cc: Hideaki YOSHIFUJI <yoshfuji@...>
Cc: Jakub Kicinski <kuba@...>
Cc: Rob Herring <robh@...>
Cc: Geert Uytterhoeven <geert@...>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@...>
Cc: Robin Murphy <robin.murphy@...>
Cc: Andy Shevchenko <andy.shevchenko@...>
Cc: Sudeep Holla <sudeep.holla@...>
Cc: Andy Shevchenko <andriy.shevchenko@...>
Cc: Naresh Kamboju <naresh.kamboju@...>
Cc: Basil Eljuse <Basil.Eljuse@...>
Cc: Ferry Toth <fntoth@...>
Cc: Arnd Bergmann <arnd@...>
Cc: Anders Roxell <anders.roxell@...>
Cc: linux-pm@...
Reported-by: Nathan Chancellor <nathan@...>
Reported-by: Sebastian Andrzej Siewior <bigeasy@...>
Tested-by: Geert Uytterhoeven <geert+renesas@...>
Acked-by: John Stultz <jstultz@...>
Signed-off-by: Saravana Kannan <saravanak@...>
Link: https://lore.kernel.org/r/20220526034609.480766-2-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@...>
Reviewed-by: Rafael J. Wysocki <rafael@...>
Signed-off-by: Linus Torvalds <torvalds@...>
Signed-off-by: Sasha Levin <sashal@...>

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 4f4e8aedbd2c..f9d9f1ad9215 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -250,7 +250,6 @@ DEFINE_SHOW_ATTRIBUTE(deferred_devs);
 
 int driver_deferred_probe_timeout;
 EXPORT_SYMBOL_GPL(driver_deferred_probe_timeout);
-static DECLARE_WAIT_QUEUE_HEAD(probe_timeout_waitqueue);
 
 static int __init deferred_probe_timeout_setup(char *str)
 {
@@ -302,7 +301,6 @@ static void deferred_probe_timeout_work_func(struct work_struct *work)
 	list_for_each_entry(p, &deferred_probe_pending_list, deferred_probe)
 		dev_info(p->device, "deferred probe pending\n");
 	mutex_unlock(&deferred_probe_mutex);
-	wake_up_all(&probe_timeout_waitqueue);
 }
 static DECLARE_DELAYED_WORK(deferred_probe_timeout_work, deferred_probe_timeout_work_func);
 
@@ -706,9 +704,6 @@ int driver_probe_done(void)
  */
 void wait_for_device_probe(void)
 {
-	/* wait for probe timeout */
-	wait_event(probe_timeout_waitqueue, !driver_deferred_probe_timeout);
-
 	/* wait for the deferred probe workqueue to finish */
 	flush_work(&deferred_probe_work);
-------------------------------------------------------------------------------


Git bisection log:

-------------------------------------------------------------------------------
git bisect start
# good: [2c85ebc57b3e1817b6ce1a6b703928e113a90442] Linux 5.10
git bisect good 2c85ebc57b3e1817b6ce1a6b703928e113a90442
# bad: [686c84f2f136412631eb684b064def993a96a8cc] Linux 5.10.189-rc1
git bisect bad 686c84f2f136412631eb684b064def993a96a8cc
# good: [88f1b613c37fbd3c4171f5a9decdcd12ae704637] Bluetooth: cmtp: fix possible panic when cmtp_init_sockets() fails
git bisect good 88f1b613c37fbd3c4171f5a9decdcd12ae704637
# bad: [6c5742372b2d5d36de129439e26eda05aab54652] Input: snvs_pwrkey - fix SNVS_HPVIDR1 register address
git bisect bad 6c5742372b2d5d36de129439e26eda05aab54652
# good: [07280d2c3f33d47741f42411eb8c976b70c6657a] random: make more consistent use of integer types
git bisect good 07280d2c3f33d47741f42411eb8c976b70c6657a
# bad: [2fc7f18ba2f98d15f174ce8e25a5afa46926eb55] tools headers: Remove broken definition of __LITTLE_ENDIAN
git bisect bad 2fc7f18ba2f98d15f174ce8e25a5afa46926eb55
# bad: [c2ae49a113a5344232f1ebb93bcf18bbd11e9c39] net: dsa: lantiq_gswip: Fix refcount leak in gswip_gphy_fw_list
git bisect bad c2ae49a113a5344232f1ebb93bcf18bbd11e9c39
# good: [c1b08aa568e829b743affe5d3231e6de28b7609e] ASoC: samsung: Use dev_err_probe() helper
git bisect good c1b08aa568e829b743affe5d3231e6de28b7609e
# good: [97a9ec86ccb4e336ecde46db42b59b2ff7e0d719] drm/nouveau/clk: Fix an incorrect NULL check on list iterator
git bisect good 97a9ec86ccb4e336ecde46db42b59b2ff7e0d719
# good: [572211d631d7665c6690b5a6cb80436f8c368dc1] pwm: lp3943: Fix duty calculation in case period was clamped
git bisect good 572211d631d7665c6690b5a6cb80436f8c368dc1
# good: [8f49e1694cbc29e76d5028267c1978cc2630e494] bpf: Fix probe read error in ___bpf_prog_run()
git bisect good 8f49e1694cbc29e76d5028267c1978cc2630e494
# bad: [3660db29b0305f9a1d95979c7af0f5db6ea99f5d] iommu/arm-smmu: fix possible null-ptr-deref in arm_smmu_device_probe()
git bisect bad 3660db29b0305f9a1d95979c7af0f5db6ea99f5d
# good: [04622d631826ba483ae3a0b8a71c745d8e21453d] gpio: pca953x: use the correct register address to do regcache sync
git bisect good 04622d631826ba483ae3a0b8a71c745d8e21453d
# bad: [32be2b805a1a13ccc68bd209ec3ae198dd3ba5d6] perf c2c: Fix sorting in percent_rmt_hitm_cmp()
git bisect bad 32be2b805a1a13ccc68bd209ec3ae198dd3ba5d6
# good: [c1f0187025905e9981000d44a92e159468b561a8] scsi: sd: Fix potential NULL pointer dereference
git bisect good c1f0187025905e9981000d44a92e159468b561a8
# bad: [71cbce75031aed26c72c2dc8a83111d181685f1b] driver core: Fix wait_for_device_probe() & deferred_probe_timeout interaction
git bisect bad 71cbce75031aed26c72c2dc8a83111d181685f1b
# good: [b8fac8e321044a9ac50f7185b4e9d91a7745e4b0] tipc: check attribute length for bearer name
git bisect good b8fac8e321044a9ac50f7185b4e9d91a7745e4b0
# first bad commit: [71cbce75031aed26c72c2dc8a83111d181685f1b] driver core: Fix wait_for_device_probe() & deferred_probe_timeout interaction
-------------------------------------------------------------------------------
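
The narrowing visible in the log above is essentially a binary search
between a known-good and a known-bad commit. The sketch below shows that
idea under a simplifying assumption: history is treated as a linear array
of commits, whereas git bisect works on the real commit graph. All names
are hypothetical, and the test callback stands in for booting a kernel and
running the test.

```c
#include <stddef.h>

/* Hypothetical callback: returns nonzero if the commit at index fails the test. */
typedef int (*is_bad_fn)(size_t index, void *ctx);

/*
 * Find the first bad commit between a known-good and a known-bad index.
 * Requires good_idx < bad_idx and that every commit after the first bad
 * one is also bad.
 */
size_t first_bad_commit(size_t good_idx, size_t bad_idx,
			is_bad_fn is_bad, void *ctx)
{
	while (bad_idx - good_idx > 1) {
		size_t mid = good_idx + (bad_idx - good_idx) / 2;

		if (is_bad(mid, ctx))
			bad_idx = mid;  /* regression is at mid or earlier */
		else
			good_idx = mid; /* regression is after mid */
	}
	return bad_idx;
}

/* Example callback: commits at or after the threshold in ctx are bad. */
int bad_from_threshold(size_t index, void *ctx)
{
	return index >= *(const size_t *)ctx;
}
```

Each step halves the remaining range, which is why a history of thousands
of commits needs only a dozen or so test boots.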


> Fixes: 35a672363ab3 ("driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires")
> Cc: John Stultz<jstultz@google.com>
> Cc: "David S. Miller"<davem@davemloft.net>
> Cc: Alexey Kuznetsov<kuznet@ms2.inr.ac.ru>
> Cc: Hideaki YOSHIFUJI<yoshfuji@linux-ipv6.org>
> Cc: Jakub Kicinski<kuba@kernel.org>
> Cc: Rob Herring<robh@kernel.org>
> Cc: Geert Uytterhoeven<geert@linux-m68k.org>
> Cc: Yoshihiro Shimoda<yoshihiro.shimoda.uh@renesas.com>
> Cc: Robin Murphy<robin.murphy@arm.com>
> Cc: Andy Shevchenko<andy.shevchenko@gmail.com>
> Cc: Sudeep Holla<sudeep.holla@arm.com>
> Cc: Andy Shevchenko<andriy.shevchenko@linux.intel.com>
> Cc: Naresh Kamboju<naresh.kamboju@linaro.org>
> Cc: Basil Eljuse<Basil.Eljuse@arm.com>
> Cc: Ferry Toth<fntoth@gmail.com>
> Cc: Arnd Bergmann<arnd@arndb.de>
> Cc: Anders Roxell<anders.roxell@linaro.org>
> Cc:linux-pm@vger.kernel.org
> Reported-by: Nathan Chancellor<nathan@kernel.org>
> Reported-by: Sebastian Andrzej Siewior<bigeasy@linutronix.de>
> Tested-by: Geert Uytterhoeven<geert+renesas@glider.be>
> Acked-by: John Stultz<jstultz@google.com>
> Signed-off-by: Saravana Kannan<saravanak@google.com>
> Link:https://lore.kernel.org/r/20220526034609.480766-2-saravanak@google.com
> Signed-off-by: Greg Kroah-Hartman<gregkh@linuxfoundation.org>
> Reviewed-by: Rafael J. Wysocki<rafael@kernel.org>
> Signed-off-by: Linus Torvalds<torvalds@linux-foundation.org>
> Signed-off-by: Sasha Levin<sashal@kernel.org>
> ---
>   drivers/base/dd.c | 5 -----
>   1 file changed, 5 deletions(-)
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 977e94cf669e..86fd2ea35656 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -257,7 +257,6 @@ DEFINE_SHOW_ATTRIBUTE(deferred_devs);
>   
>   int driver_deferred_probe_timeout;
>   EXPORT_SYMBOL_GPL(driver_deferred_probe_timeout);
> -static DECLARE_WAIT_QUEUE_HEAD(probe_timeout_waitqueue);
>   
>   static int __init deferred_probe_timeout_setup(char *str)
>   {
> @@ -312,7 +311,6 @@ static void deferred_probe_timeout_work_func(struct work_struct *work)
>   	list_for_each_entry(p, &deferred_probe_pending_list, deferred_probe)
>   		dev_info(p->device, "deferred probe pending\n");
>   	mutex_unlock(&deferred_probe_mutex);
> -	wake_up_all(&probe_timeout_waitqueue);
>   }
>   static DECLARE_DELAYED_WORK(deferred_probe_timeout_work, deferred_probe_timeout_work_func);
>   
> @@ -720,9 +718,6 @@ int driver_probe_done(void)
>    */
>   void wait_for_device_probe(void)
>   {
> -	/* wait for probe timeout */
> -	wait_event(probe_timeout_waitqueue, !driver_deferred_probe_timeout);
> -
>   	/* wait for the deferred probe workqueue to finish */
>   	flush_work(&deferred_probe_work);
>
Geert Uytterhoeven Aug. 16, 2023, 10:10 a.m. UTC | #2
Hi Shreeya,

On Wed, Aug 16, 2023 at 11:39 AM Shreeya Patel
<shreeya.patel@collabora.com> wrote:
> On 13/06/22 15:40, Greg Kroah-Hartman wrote:
> > From: Saravana Kannan<saravanak@google.com>
> >
> > [ Upstream commit 5ee76c256e928455212ab759c51d198fedbe7523 ]
> >
> > Mounting NFS rootfs was timing out when deferred_probe_timeout was
> > non-zero [1].  This was because ip_auto_config() initcall times out
> > waiting for the network interfaces to show up when
> > deferred_probe_timeout was non-zero. While ip_auto_config() calls
> > wait_for_device_probe() to make sure any currently running deferred
> > probe work or asynchronous probe finishes, that wasn't sufficient to
> > account for devices being deferred until deferred_probe_timeout.
> >
> > Commit 35a672363ab3 ("driver core: Ensure wait_for_device_probe() waits
> > until the deferred_probe_timeout fires") tried to fix that by making
> > sure wait_for_device_probe() waits for deferred_probe_timeout to expire
> > before returning.
> >
> > However, if wait_for_device_probe() is called from the kernel_init()
> > context:
> >
> > - Before deferred_probe_initcall() [2], it causes the boot process to
> >    hang due to a deadlock.
> >
> > - After deferred_probe_initcall() [3], it blocks kernel_init() from
> >    continuing till deferred_probe_timeout expires and beats the point of
> >    deferred_probe_timeout that's trying to wait for userspace to load
> >    modules.
> >
> > Neither of this is good. So revert the changes to
> > wait_for_device_probe().
> >
> > [1] -https://lore.kernel.org/lkml/TYAPR01MB45443DF63B9EF29054F7C41FD8C60@TYAPR01MB4544.jpnprd01.prod.outlook.com/
> > [2] -https://lore.kernel.org/lkml/YowHNo4sBjr9ijZr@dev-arch.thelio-3990X/
> > [3] -https://lore.kernel.org/lkml/Yo3WvGnNk3LvLb7R@linutronix.de/
>
> Hi Saravana, Greg,
>
>
> KernelCI found this patch causes the baseline.bootrr.deferred-probe-empty test to fail on r8a77960-ulcb,
> see the following details for more information.

Commit 9be4cbd09da820a2 ("driver core: Set default deferred_probe_timeout
back to 0.") in v5.19 contains a reference to the same commit as
mentioned in the Fixes tag.  Does backporting that help?

Gr{oetje,eeting}s,

                        Geert
Geert Uytterhoeven Aug. 16, 2023, 10:15 a.m. UTC | #3
On Wed, Aug 16, 2023 at 12:10 PM Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
> On Wed, Aug 16, 2023 at 11:39 AM Shreeya Patel
> <shreeya.patel@collabora.com> wrote:
> > On 13/06/22 15:40, Greg Kroah-Hartman wrote:
> > > From: Saravana Kannan<saravanak@google.com>
> > >
> > > [ Upstream commit 5ee76c256e928455212ab759c51d198fedbe7523 ]
> > >
> > > Mounting NFS rootfs was timing out when deferred_probe_timeout was
> > > non-zero [1].  This was because ip_auto_config() initcall times out
> > > waiting for the network interfaces to show up when
> > > deferred_probe_timeout was non-zero. While ip_auto_config() calls
> > > wait_for_device_probe() to make sure any currently running deferred
> > > probe work or asynchronous probe finishes, that wasn't sufficient to
> > > account for devices being deferred until deferred_probe_timeout.
> > >
> > > Commit 35a672363ab3 ("driver core: Ensure wait_for_device_probe() waits
> > > until the deferred_probe_timeout fires") tried to fix that by making
> > > sure wait_for_device_probe() waits for deferred_probe_timeout to expire
> > > before returning.
> > >
> > > However, if wait_for_device_probe() is called from the kernel_init()
> > > context:
> > >
> > > - Before deferred_probe_initcall() [2], it causes the boot process to
> > >    hang due to a deadlock.
> > >
> > > - After deferred_probe_initcall() [3], it blocks kernel_init() from
> > >    continuing till deferred_probe_timeout expires and beats the point of
> > >    deferred_probe_timeout that's trying to wait for userspace to load
> > >    modules.
> > >
> > > Neither of this is good. So revert the changes to
> > > wait_for_device_probe().
> > >
> > > [1] -https://lore.kernel.org/lkml/TYAPR01MB45443DF63B9EF29054F7C41FD8C60@TYAPR01MB4544.jpnprd01.prod.outlook.com/
> > > [2] -https://lore.kernel.org/lkml/YowHNo4sBjr9ijZr@dev-arch.thelio-3990X/
> > > [3] -https://lore.kernel.org/lkml/Yo3WvGnNk3LvLb7R@linutronix.de/
> >
> > Hi Saravana, Greg,
> >
> >
> > KernelCI found this patch causes the baseline.bootrr.deferred-probe-empty test to fail on r8a77960-ulcb,
> > see the following details for more information.
>
> Commit 9be4cbd09da820a2 ("driver core: Set default deferred_probe_timeout
> back to 0.") in v5.19 contains a reference to the same commit as
> mentioned in the Fixes tag.  Does backporting that help?

Anyway, remembering the days (weeks?) spent in investigating
subtle issues with fw_devlinks and deferred probe, collecting all the
fixes for backporting to stable may be a very hard job...

Gr{oetje,eeting}s,

                        Geert
Greg KH Aug. 16, 2023, 3:03 p.m. UTC | #4
On Wed, Aug 16, 2023 at 03:09:27PM +0530, Shreeya Patel wrote:
> On 13/06/22 15:40, Greg Kroah-Hartman wrote:
> > From: Saravana Kannan<saravanak@google.com>
> > 
> > [ Upstream commit 5ee76c256e928455212ab759c51d198fedbe7523 ]
> > 
> > Mounting NFS rootfs was timing out when deferred_probe_timeout was
> > non-zero [1].  This was because ip_auto_config() initcall times out
> > waiting for the network interfaces to show up when
> > deferred_probe_timeout was non-zero. While ip_auto_config() calls
> > wait_for_device_probe() to make sure any currently running deferred
> > probe work or asynchronous probe finishes, that wasn't sufficient to
> > account for devices being deferred until deferred_probe_timeout.
> > 
> > Commit 35a672363ab3 ("driver core: Ensure wait_for_device_probe() waits
> > until the deferred_probe_timeout fires") tried to fix that by making
> > sure wait_for_device_probe() waits for deferred_probe_timeout to expire
> > before returning.
> > 
> > However, if wait_for_device_probe() is called from the kernel_init()
> > context:
> > 
> > - Before deferred_probe_initcall() [2], it causes the boot process to
> >    hang due to a deadlock.
> > 
> > - After deferred_probe_initcall() [3], it blocks kernel_init() from
> >    continuing till deferred_probe_timeout expires and beats the point of
> >    deferred_probe_timeout that's trying to wait for userspace to load
> >    modules.
> > 
> > Neither of this is good. So revert the changes to
> > wait_for_device_probe().
> > 
> > [1] -https://lore.kernel.org/lkml/TYAPR01MB45443DF63B9EF29054F7C41FD8C60@TYAPR01MB4544.jpnprd01.prod.outlook.com/
> > [2] -https://lore.kernel.org/lkml/YowHNo4sBjr9ijZr@dev-arch.thelio-3990X/
> > [3] -https://lore.kernel.org/lkml/Yo3WvGnNk3LvLb7R@linutronix.de/
> 
> Hi Saravana, Greg,
> 
> 
> KernelCI found this patch causes the baseline.bootrr.deferred-probe-empty test to fail on r8a77960-ulcb,
> see the following details for more information.
> 
> KernelCI dashboard link:
> https://linux.kernelci.org/test/plan/id/64d2a6be8c1a8435e535b264/
> 
> Error messages from the logs :-
> 
> + UUID=11236495_1.5.2.4.5
> + set +x
> + export 'PATH=/opt/bootrr/libexec/bootrr/helpers:/lava-11236495/1/../bin:/sbin:/usr/sbin:/bin:/usr/bin'
> + cd /opt/bootrr/libexec/bootrr
> + sh helpers/bootrr-auto
> e6800000.ethernet	
> e6700000.dma-controller	
> e7300000.dma-controller	
> e7310000.dma-controller	
> ec700000.dma-controller	
> ec720000.dma-controller	
> fea20000.vsp	
> feb00000.display	
> fea28000.vsp	
> fea30000.vsp	
> fe9a0000.vsp	
> fe9af000.fcp	
> fea27000.fcp	
> fea2f000.fcp	
> fea37000.fcp	
> sound	
> ee100000.mmc	
> ee140000.mmc	
> ec500000.sound	
> /lava-11236495/1/../bin/lava-test-case
> <8>[   17.476741] <LAVA_SIGNAL_TESTCASE TEST_CASE_ID=deferred-probe-empty RESULT=fail>
> 
> Test case failing :-
> Baseline Bootrr deferred-probe-empty test -https://github.com/kernelci/bootrr/blob/main/helpers/bootrr-generic-tests
> 
> Regression Reproduced :-
> 
> Lava job after reverting the commit 5ee76c256e92
> https://lava.collabora.dev/scheduler/job/11292890
> 
> 
> Bisection report from KernelCI can be found at the bottom of the email.
> 
> Thanks,
> Shreeya Patel
> 
> #regzbot introduced: 5ee76c256e92
> #regzbot title: KernelCI: Multiple devices deferring on r8a77960-ulcb
> 
> ---------------------------------------------------------------------------------------------------------------------------------------------------
> 
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * **
> * If you do send a fix, please include this trailer: *
> * Reported-by: "kernelci.org bot" <bot@...> *
> * *
> * Hope this helps! *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> stable-rc/linux-5.10.y bisection: baseline.bootrr.deferred-probe-empty on
> r8a77960-ulcb

You are testing 5.10.y, yet the subject says 5.17?

Which is it here?

confused,

greg k-h
Shreeya Patel Aug. 17, 2023, 11:36 a.m. UTC | #5
Hi Greg,

On 16/08/23 20:33, Greg Kroah-Hartman wrote:
> On Wed, Aug 16, 2023 at 03:09:27PM +0530, Shreeya Patel wrote:
>> On 13/06/22 15:40, Greg Kroah-Hartman wrote:
>>> From: Saravana Kannan<saravanak@google.com>
>>>
>>> [ Upstream commit 5ee76c256e928455212ab759c51d198fedbe7523 ]
>>>
>>> Mounting NFS rootfs was timing out when deferred_probe_timeout was
>>> non-zero [1].  This was because ip_auto_config() initcall times out
>>> waiting for the network interfaces to show up when
>>> deferred_probe_timeout was non-zero. While ip_auto_config() calls
>>> wait_for_device_probe() to make sure any currently running deferred
>>> probe work or asynchronous probe finishes, that wasn't sufficient to
>>> account for devices being deferred until deferred_probe_timeout.
>>>
>>> Commit 35a672363ab3 ("driver core: Ensure wait_for_device_probe() waits
>>> until the deferred_probe_timeout fires") tried to fix that by making
>>> sure wait_for_device_probe() waits for deferred_probe_timeout to expire
>>> before returning.
>>>
>>> However, if wait_for_device_probe() is called from the kernel_init()
>>> context:
>>>
>>> - Before deferred_probe_initcall() [2], it causes the boot process to
>>>     hang due to a deadlock.
>>>
>>> - After deferred_probe_initcall() [3], it blocks kernel_init() from
>>>     continuing till deferred_probe_timeout expires and beats the point of
>>>     deferred_probe_timeout that's trying to wait for userspace to load
>>>     modules.
>>>
>>> Neither of this is good. So revert the changes to
>>> wait_for_device_probe().
>>>
>>> [1] -https://lore.kernel.org/lkml/TYAPR01MB45443DF63B9EF29054F7C41FD8C60@TYAPR01MB4544.jpnprd01.prod.outlook.com/
>>> [2] -https://lore.kernel.org/lkml/YowHNo4sBjr9ijZr@dev-arch.thelio-3990X/
>>> [3] -https://lore.kernel.org/lkml/Yo3WvGnNk3LvLb7R@linutronix.de/
>> Hi Saravana, Greg,
>>
>>
>> KernelCI found this patch causes the baseline.bootrr.deferred-probe-empty test to fail on r8a77960-ulcb,
>> see the following details for more information.
>>
>> KernelCI dashboard link:
>> https://linux.kernelci.org/test/plan/id/64d2a6be8c1a8435e535b264/
>>
>> Error messages from the logs :-
>>
>> + UUID=11236495_1.5.2.4.5
>> + set +x
>> + export 'PATH=/opt/bootrr/libexec/bootrr/helpers:/lava-11236495/1/../bin:/sbin:/usr/sbin:/bin:/usr/bin'
>> + cd /opt/bootrr/libexec/bootrr
>> + sh helpers/bootrr-auto
>> e6800000.ethernet	
>> e6700000.dma-controller	
>> e7300000.dma-controller	
>> e7310000.dma-controller	
>> ec700000.dma-controller	
>> ec720000.dma-controller	
>> fea20000.vsp	
>> feb00000.display	
>> fea28000.vsp	
>> fea30000.vsp	
>> fe9a0000.vsp	
>> fe9af000.fcp	
>> fea27000.fcp	
>> fea2f000.fcp	
>> fea37000.fcp	
>> sound	
>> ee100000.mmc	
>> ee140000.mmc	
>> ec500000.sound	
>> /lava-11236495/1/../bin/lava-test-case
>> <8>[   17.476741] <LAVA_SIGNAL_TESTCASE TEST_CASE_ID=deferred-probe-empty RESULT=fail>
>>
>> Test case failing :-
>> Baseline Bootrr deferred-probe-empty test -https://github.com/kernelci/bootrr/blob/main/helpers/bootrr-generic-tests
>>
>> Regression Reproduced :-
>>
>> Lava job after reverting the commit 5ee76c256e92
>> https://lava.collabora.dev/scheduler/job/11292890
>>
>>
>> Bisection report from KernelCI can be found at the bottom of the email.
>>
>> Thanks,
>> Shreeya Patel
>>
>> #regzbot introduced: 5ee76c256e92
>> #regzbot title: KernelCI: Multiple devices deferring on r8a77960-ulcb
>>
>> ---------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * **
>> * If you do send a fix, please include this trailer: *
>> * Reported-by: "kernelci.org bot" <bot@...> *
>> * *
>> * Hope this helps! *
>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>>
>> stable-rc/linux-5.10.y bisection: baseline.bootrr.deferred-probe-empty on
>> r8a77960-ulcb
> You are testing 5.10.y, yet the subject says 5.17?
>
> Which is it here?

Sorry, I accidentally used the lore link for 5.17 while reporting this
issue, but this test does fail on all the stable releases from 5.10
onwards.

stable 5.15: https://linux.kernelci.org/test/case/id/64dd156a5ac58d0cf335b1ea/
mainline: https://linux.kernelci.org/test/case/id/64dc13d55cb51357a135b209/

Thanks,
Shreeya Patel

>
> confused,
>
> greg k-h
>
Saravana Kannan Aug. 17, 2023, 6:33 p.m. UTC | #6
On Thu, Aug 17, 2023 at 4:37 AM Shreeya Patel
<shreeya.patel@collabora.com> wrote:
>
> Hi Greg,
>
> Sorry, I accidentally used the lore link for 5.17 while reporting this
> issue,
> but this test does fail on all the stable releases from 5.10 onwards.
>
> stable 5.15 :-
> https://linux.kernelci.org/test/case/id/64dd156a5ac58d0cf335b1ea/
> mainline :-
> https://linux.kernelci.org/test/case/id/64dc13d55cb51357a135b209/
>

Shreeya, can you try the patch Geert suggested and let us know if it
helps? If not, then I can try to take a closer look.

-Saravana
Shreeya Patel Aug. 17, 2023, 11:13 p.m. UTC | #7
Hi Geert, Saravana,

On 18/08/23 00:03, Saravana Kannan wrote:
> On Thu, Aug 17, 2023 at 4:37 AM Shreeya Patel
> <shreeya.patel@collabora.com> wrote:
>> Hi Greg,
>>
>> Sorry, I accidentally used the lore link for 5.17 while reporting this
>> issue,
>> but this test does fail on all the stable releases from 5.10 onwards.
>>
>> stable 5.15 :-
>> https://linux.kernelci.org/test/case/id/64dd156a5ac58d0cf335b1ea/
>> mainline :-
>> https://linux.kernelci.org/test/case/id/64dc13d55cb51357a135b209/
>>
> Shreeya, can you try the patch Geert suggested and let us know if it
> helps? If not, then I can try to take a closer look.

I tested the kernel with 9be4cbd09da8 applied, but it didn't change the
result.
https://lava.collabora.dev/scheduler/job/11311615

Also, I am not sure whether this changes things, but just FYI: KernelCI
adds some kernel parameters when running these tests, and one of the
parameters is deferred_probe_timeout=60.
You can check this in the definition details given in the Lava job. I
also tried removing this parameter and rerunning the test, but I got
the same result.
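
For anyone reproducing this, here is a minimal Python sketch of how a test
harness could read the timeout it was booted with. The helper name
parse_deferred_probe_timeout is hypothetical and is not part of bootrr or
KernelCI; the only assumption is that the parameter appears on the kernel
command line as deferred_probe_timeout=<seconds>, as in the Lava job
definition.

```python
from typing import Optional


def parse_deferred_probe_timeout(cmdline: str) -> Optional[int]:
    """Return the last deferred_probe_timeout=<n> value found on a kernel
    command line, or None if the parameter is absent or not an integer.

    Hypothetical helper for illustration only.
    """
    value = None
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        if key == "deferred_probe_timeout" and sep:
            try:
                value = int(raw)
            except ValueError:
                # Ignore malformed values rather than guessing.
                pass
    return value


if __name__ == "__main__":
    # On a running system the command line is available in /proc/cmdline.
    with open("/proc/cmdline") as f:
        print(parse_deferred_probe_timeout(f.read()))
```

A harness could use the returned value to decide how long it is reasonable
to wait before treating a non-empty deferred list as a failure.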

I will try adding 9be4cbd09da8 to the mainline kernel and see what results I
get.


Thanks,
Shreeya Patel

>
> -Saravana
>
Saravana Kannan Aug. 18, 2023, 8:19 p.m. UTC | #8
On Thu, Aug 17, 2023 at 4:13 PM Shreeya Patel
<shreeya.patel@collabora.com> wrote:
>
> Hi Geert, Saravana,
>
> On 18/08/23 00:03, Saravana Kannan wrote:
> > On Thu, Aug 17, 2023 at 4:37 AM Shreeya Patel
> > <shreeya.patel@collabora.com> wrote:
> >> Hi Greg,
> >>
> >> Sorry, I accidentally used the lore link for 5.17 while reporting this
> >> issue,
> >> but this test does fail on all the stable releases from 5.10 onwards.
> >>
> >> stable 5.15 :-
> >> https://linux.kernelci.org/test/case/id/64dd156a5ac58d0cf335b1ea/
> >> mainline :-
> >> https://linux.kernelci.org/test/case/id/64dc13d55cb51357a135b209/
> >>
> > Shreeya, can you try the patch Geert suggested and let us know if it
> > helps? If not, then I can try to take a closer look.
>
> I tried to test the kernel with 9be4cbd09da8 but it didn't change the
> result.
> https://lava.collabora.dev/scheduler/job/11311615
>
> Also, I am not sure if this can change things but just FYI, KernelCI
> adds some kernel parameters when running these tests and one of the
> parameter is deferred_probe_timeout=60.

Ah this is good to know.

> You can check this in the definition details given in the Lava job. I
> also tried to remove this parameter and rerun the test but again I got
> the same result.

How long does the test wait after boot before checking for the
deferred devices list?

> I will try to add 9be4cbd09da8 to mainline kernel and see what results I
> get.

Now I'm confused. What do you mean by mainline? Are you saying the tip
of Linus's tree is also hitting this issue?

-Saravana
Shreeya Patel Aug. 21, 2023, 11:35 a.m. UTC | #9
On 19/08/23 01:49, Saravana Kannan wrote:
> On Thu, Aug 17, 2023 at 4:13 PM Shreeya Patel
> <shreeya.patel@collabora.com> wrote:
>> Hi Geert, Saravana,
>>
>> On 18/08/23 00:03, Saravana Kannan wrote:
>>> On Thu, Aug 17, 2023 at 4:37 AM Shreeya Patel
>>> <shreeya.patel@collabora.com> wrote:
>>>> Hi Greg,
>>>>
>>>> Sorry, I accidentally used the lore link for 5.17 while reporting this
>>>> issue,
>>>> but this test does fail on all the stable releases from 5.10 onwards.
>>>>
>>>> stable 5.15 :-
>>>> https://linux.kernelci.org/test/case/id/64dd156a5ac58d0cf335b1ea/
>>>> mainline :-
>>>> https://linux.kernelci.org/test/case/id/64dc13d55cb51357a135b209/
>>>>
>>> Shreeya, can you try the patch Geert suggested and let us know if it
>>> helps? If not, then I can try to take a closer look.
>> I tried to test the kernel with 9be4cbd09da8 but it didn't change the
>> result.
>> https://lava.collabora.dev/scheduler/job/11311615
>>
>> Also, I am not sure if this can change things but just FYI, KernelCI
>> adds some kernel parameters when running these tests and one of the
>> parameter is deferred_probe_timeout=60.
> Ah this is good to know.
>
>> You can check this in the definition details given in the Lava job. I
>> also tried to remove this parameter and rerun the test but again I got
>> the same result.
> How long does the test wait after boot before checking for the
> deferred devices list?
>

AFAIK, the script for running the tests is run immediately after the boot
process completes, so there is no wait time.

>> I will try to add 9be4cbd09da8 to mainline kernel and see what results I
>> get.
> Now I'm confused. What do you mean by mainline? Are you saying the tip
> of tree of Linus's tree is also hitting this issue?


KernelCI runs tests on different kernel branches and trees; we also have
this same test running on the mainline tree.
The following is the link to the dashboard for it and, as you can see, it
does fail there too.


https://linux.kernelci.org/test/case/id/64dc13d55cb51357a135b209/


> -Saravana
>
Robin Murphy Aug. 21, 2023, 12:39 p.m. UTC | #10
On 2023-08-21 12:35, Shreeya Patel wrote:
> 
> On 19/08/23 01:49, Saravana Kannan wrote:
>> On Thu, Aug 17, 2023 at 4:13 PM Shreeya Patel
>> <shreeya.patel@collabora.com> wrote:
>>> Hi Geert, Saravana,
>>>
>>> On 18/08/23 00:03, Saravana Kannan wrote:
>>>> On Thu, Aug 17, 2023 at 4:37 AM Shreeya Patel
>>>> <shreeya.patel@collabora.com> wrote:
>>>>> Hi Greg,
>>>>>
>>>>> Sorry, I accidentally used the lore link for 5.17 while reporting this
>>>>> issue,
>>>>> but this test does fail on all the stable releases from 5.10 onwards.
>>>>>
>>>>> stable 5.15 :-
>>>>> https://linux.kernelci.org/test/case/id/64dd156a5ac58d0cf335b1ea/
>>>>> mainline :-
>>>>> https://linux.kernelci.org/test/case/id/64dc13d55cb51357a135b209/
>>>>>
>>>> Shreeya, can you try the patch Geert suggested and let us know if it
>>>> helps? If not, then I can try to take a closer look.
>>> I tried to test the kernel with 9be4cbd09da8 but it didn't change the
>>> result.
>>> https://lava.collabora.dev/scheduler/job/11311615
>>>
>>> Also, I am not sure if this can change things but just FYI, KernelCI
>>> adds some kernel parameters when running these tests and one of the
>>> parameter is deferred_probe_timeout=60.
>> Ah this is good to know.
>>
>>> You can check this in the definition details given in the Lava job. I
>>> also tried to remove this parameter and rerun the test but again I got
>>> the same result.
>> How long does the test wait after boot before checking for the
>> deferred devices list?
>>
> 
> AFAIK, script for running the tests is immediately ran after the boot 
> process is complete so there is no wait time.

Regardless of what the kernel is doing, it seems like a fundamentally 
dumb test to specifically ask deferred probe to wait for up to a minute 
then complain that it hasn't finished after 11 seconds :/

If anything, it seems plausible that the "regression" might actually be 
the correct behaviour, and it was wrong before. I can't manage to pull 
up a boot log for a pre-5.10 kernel since all the async stuff on the 
KernelCI dashboard always just times out for me with a helpful "Error 
while loading data from the server (error code: 0)", but what would be 
interesting is whether those devices on the list are expected to 
successfully probe anyway - the mainline log below also shows other 
stuff failing to probe and CPUs failing to come online, so it's clearly 
not a very happy platform to begin with.

Robin.

>>> I will try to add 9be4cbd09da8 to mainline kernel and see what results I
>>> get.
>> Now I'm confused. What do you mean by mainline? Are you saying the tip
>> of tree of Linus's tree is also hitting this issue?
> 
> 
> KernelCI runs tests on different kernel branches and trees, we also have 
> this same test running on mainline tree.
> Following is the link to the dashboard for it and as you can see, it 
> does fail there too.
> 
> 
> https://linux.kernelci.org/test/case/id/64dc13d55cb51357a135b209/
> 
> 
>> -Saravana
>>
Mark Brown Aug. 21, 2023, 1:11 p.m. UTC | #11
On Mon, Aug 21, 2023 at 01:39:11PM +0100, Robin Murphy wrote:
> On 2023-08-21 12:35, Shreeya Patel wrote:

> > AFAIK, script for running the tests is immediately ran after the boot
> > process is complete so there is no wait time.

> Regardless of what the kernel is doing, it seems like a fundamentally dumb
> test to specifically ask deferred probe to wait for up to a minute then
> complain that it hasn't finished after 11 seconds :/

IIRC that stuff is expecting the modules to be loaded from the initramfs
and checking from the main system which is a bit more sensible (at least
in the case where there is a main filesystem).  It's vulnerable to races
but less so, especially given the time a Debian rootfs typically takes
to boot over NFS.
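
A rough sketch of a less race-prone check, in Python: instead of reading
the deferred list once right after boot, poll it until it is empty or
until the configured timeout has elapsed. This is not how bootrr does it.
The function names are hypothetical, and the sketch assumes debugfs is
mounted at /sys/kernel/debug so that the list is readable at
/sys/kernel/debug/devices_deferred.

```python
import time
from typing import Callable, List


def read_devices_deferred(path: str = "/sys/kernel/debug/devices_deferred") -> List[str]:
    """Return the device names currently on the deferred probe list.

    Assumes debugfs is mounted at /sys/kernel/debug. Each non-empty line
    starts with the device name, optionally followed by a reason.
    """
    with open(path) as f:
        return [line.split()[0] for line in f if line.strip()]


def wait_for_empty_deferred_list(
    read_deferred: Callable[[], List[str]],
    timeout_s: float,
    poll_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll until the deferred list is empty or timeout_s has elapsed.

    Returns the devices still deferred at that point; an empty list
    means the check passed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        pending = read_deferred()
        if not pending or time.monotonic() >= deadline:
            return pending
        sleep(poll_s)


if __name__ == "__main__":
    # 60 matches the deferred_probe_timeout=60 mentioned in this thread.
    leftover = wait_for_empty_deferred_list(read_devices_deferred, timeout_s=60)
    print("pass" if not leftover else "fail: " + " ".join(leftover))
```

With a wait bounded by the same value passed as deferred_probe_timeout, a
failure would mean devices are still deferred after the kernel has given
up waiting, rather than merely that the check ran early.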
Shreeya Patel Aug. 22, 2023, 2:10 p.m. UTC | #12
Hi Robin,

On 21/08/23 18:09, Robin Murphy wrote:
> On 2023-08-21 12:35, Shreeya Patel wrote:
>>
>> On 19/08/23 01:49, Saravana Kannan wrote:
>>> On Thu, Aug 17, 2023 at 4:13 PM Shreeya Patel
>>> <shreeya.patel@collabora.com> wrote:
>>>> Hi Geert, Saravana,
>>>>
>>>> On 18/08/23 00:03, Saravana Kannan wrote:
>>>>> On Thu, Aug 17, 2023 at 4:37 AM Shreeya Patel
>>>>> <shreeya.patel@collabora.com> wrote:
>>>>>> Hi Greg,
>>>>>>
>>>>>> On 16/08/23 20:33, Greg Kroah-Hartman wrote:
>>>>>>> On Wed, Aug 16, 2023 at 03:09:27PM +0530, Shreeya Patel wrote:
>>>>>>>> On 13/06/22 15:40, Greg Kroah-Hartman wrote:
>>>>>>>>> From: Saravana Kannan<saravanak@google.com>
>>>>>>>>>
>>>>>>>>> [ Upstream commit 5ee76c256e928455212ab759c51d198fedbe7523 ]
>>>>>>>>>
>>>>>>>>> Mounting NFS rootfs was timing out when deferred_probe_timeout 
>>>>>>>>> was
>>>>>>>>> non-zero [1].  This was because ip_auto_config() initcall 
>>>>>>>>> times out
>>>>>>>>> waiting for the network interfaces to show up when
>>>>>>>>> deferred_probe_timeout was non-zero. While ip_auto_config() calls
>>>>>>>>> wait_for_device_probe() to make sure any currently running 
>>>>>>>>> deferred
>>>>>>>>> probe work or asynchronous probe finishes, that wasn't 
>>>>>>>>> sufficient to
>>>>>>>>> account for devices being deferred until deferred_probe_timeout.
>>>>>>>>>
>>>>>>>>> Commit 35a672363ab3 ("driver core: Ensure 
>>>>>>>>> wait_for_device_probe() waits
>>>>>>>>> until the deferred_probe_timeout fires") tried to fix that by 
>>>>>>>>> making
>>>>>>>>> sure wait_for_device_probe() waits for deferred_probe_timeout 
>>>>>>>>> to expire
>>>>>>>>> before returning.
>>>>>>>>>
>>>>>>>>> However, if wait_for_device_probe() is called from the 
>>>>>>>>> kernel_init()
>>>>>>>>> context:
>>>>>>>>>
>>>>>>>>> - Before deferred_probe_initcall() [2], it causes the boot 
>>>>>>>>> process to
>>>>>>>>>       hang due to a deadlock.
>>>>>>>>>
>>>>>>>>> - After deferred_probe_initcall() [3], it blocks kernel_init() 
>>>>>>>>> from
>>>>>>>>>       continuing till deferred_probe_timeout expires and beats 
>>>>>>>>> the point of
>>>>>>>>>       deferred_probe_timeout that's trying to wait for 
>>>>>>>>> userspace to load
>>>>>>>>>       modules.
>>>>>>>>>
>>>>>>>>> Neither of this is good. So revert the changes to
>>>>>>>>> wait_for_device_probe().
>>>>>>>>>
>>>>>>>>> [1] 
>>>>>>>>> -https://lore.kernel.org/lkml/TYAPR01MB45443DF63B9EF29054F7C41FD8C60@TYAPR01MB4544.jpnprd01.prod.outlook.com/
>>>>>>>>> [2] 
>>>>>>>>> -https://lore.kernel.org/lkml/YowHNo4sBjr9ijZr@dev-arch.thelio-3990X/ 
>>>>>>>>>
>>>>>>>>> [3] -https://lore.kernel.org/lkml/Yo3WvGnNk3LvLb7R@linutronix.de/
>>>>>>>> Hi Saravana, Greg,
>>>>>>>>
>>>>>>>>
>>>>>>>> KernelCI found this patch causes the 
>>>>>>>> baseline.bootrr.deferred-probe-empty test to fail on 
>>>>>>>> r8a77960-ulcb,
>>>>>>>> see the following details for more information.
>>>>>>>>
>>>>>>>> KernelCI dashboard link:
>>>>>>>> https://linux.kernelci.org/test/plan/id/64d2a6be8c1a8435e535b264/
>>>>>>>>
>>>>>>>> Error messages from the logs :-
>>>>>>>>
>>>>>>>> + UUID=11236495_1.5.2.4.5
>>>>>>>> + set +x
>>>>>>>> + export 
>>>>>>>> 'PATH=/opt/bootrr/libexec/bootrr/helpers:/lava-11236495/1/../bin:/sbin:/usr/sbin:/bin:/usr/bin'
>>>>>>>> + cd /opt/bootrr/libexec/bootrr
>>>>>>>> + sh helpers/bootrr-auto
>>>>>>>> e6800000.ethernet
>>>>>>>> e6700000.dma-controller
>>>>>>>> e7300000.dma-controller
>>>>>>>> e7310000.dma-controller
>>>>>>>> ec700000.dma-controller
>>>>>>>> ec720000.dma-controller
>>>>>>>> fea20000.vsp
>>>>>>>> feb00000.display
>>>>>>>> fea28000.vsp
>>>>>>>> fea30000.vsp
>>>>>>>> fe9a0000.vsp
>>>>>>>> fe9af000.fcp
>>>>>>>> fea27000.fcp
>>>>>>>> fea2f000.fcp
>>>>>>>> fea37000.fcp
>>>>>>>> sound
>>>>>>>> ee100000.mmc
>>>>>>>> ee140000.mmc
>>>>>>>> ec500000.sound
>>>>>>>> /lava-11236495/1/../bin/lava-test-case
>>>>>>>> <8>[   17.476741] <LAVA_SIGNAL_TESTCASE 
>>>>>>>> TEST_CASE_ID=deferred-probe-empty RESULT=fail>
>>>>>>>>
>>>>>>>> Test case failing :-
>>>>>>>> Baseline Bootrr deferred-probe-empty test 
>>>>>>>> -https://github.com/kernelci/bootrr/blob/main/helpers/bootrr-generic-tests
>>>>>>>>
>>>>>>>> Regression Reproduced :-
>>>>>>>>
>>>>>>>> Lava job after reverting the commit 5ee76c256e92
>>>>>>>> https://lava.collabora.dev/scheduler/job/11292890
>>>>>>>>
>>>>>>>>
>>>>>>>> Bisection report from KernelCI can be found at the bottom of 
>>>>>>>> the email.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Shreeya Patel
>>>>>>>>
>>>>>>>> #regzbot introduced: 5ee76c256e92
>>>>>>>> #regzbot title: KernelCI: Multiple devices deferring on 
>>>>>>>> r8a77960-ulcb
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------------------------------------------------------------------------------- 
>>>>>>>>
>>>>>>>>
>>>>>>>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * **
>>>>>>>> * If you do send a fix, please include this trailer: *
>>>>>>>> * Reported-by: "kernelci.org bot" <bot@...> *
>>>>>>>> * *
>>>>>>>> * Hope this helps! *
>>>>>>>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>>>>>>>>
>>>>>>>> stable-rc/linux-5.10.y bisection: 
>>>>>>>> baseline.bootrr.deferred-probe-empty on
>>>>>>>> r8a77960-ulcb
>>>>>>> You are testing 5.10.y, yet the subject says 5.17?
>>>>>>>
>>>>>>> Which is it here?
>>>>>> Sorry, I accidentally used the lore link for 5.17 while reporting 
>>>>>> this
>>>>>> issue,
>>>>>> but this test does fail on all the stable releases from 5.10 
>>>>>> onwards.
>>>>>>
>>>>>> stable 5.15 :-
>>>>>> https://linux.kernelci.org/test/case/id/64dd156a5ac58d0cf335b1ea/
>>>>>> mainline :-
>>>>>> https://linux.kernelci.org/test/case/id/64dc13d55cb51357a135b209/
>>>>>>
>>>>> Shreeya, can you try the patch Geert suggested and let us know if it
>>>>> helps? If not, then I can try to take a closer look.
>>>> I tried to test the kernel with 9be4cbd09da8 but it didn't change the
>>>> result.
>>>> https://lava.collabora.dev/scheduler/job/11311615
>>>>
>>>> Also, I am not sure if this can change things but just FYI, KernelCI
>>>> adds some kernel parameters when running these tests and one of the
>>>> parameter is deferred_probe_timeout=60.
>>> Ah this is good to know.
>>>
>>>> You can check this in the definition details given in the Lava job. I
>>>> also tried to remove this parameter and rerun the test but again I got
>>>> the same result.
>>> How long does the test wait after boot before checking for the
>>> deferred devices list?
>>>
>>
>> AFAIK, script for running the tests is immediately ran after the boot 
>> process is complete so there is no wait time.
>
> Regardless of what the kernel is doing, it seems like a fundamentally 
> dumb test to specifically ask deferred probe to wait for up to a 
> minute then complain that it hasn't finished after 11 seconds :/
>
> If anything, it seems plausible that the "regression" might actually 
> be the correct behaviour, and it was wrong before. I can't manage to 
> pull up a boot log for a pre-5.10 kernel since all the async stuff on 
> the KernelCI dashboard always just times out for me with a helpful 
> "Error while loading data from the server (error code: 0)", but what 
> would be interesting is whether those devices on the list are expected 
> to successfully probe anyway - the mainline log below also shows other 
> stuff failing to probe and CPUs failing to come online, so it's 
> clearly not a very happy platform to begin with.
>

Sorry about the dashboard issues you are facing. The KernelCI team is 
working on a new dashboard which will fix all of these issues, but it 
will take some more time before it is ready.


Your point makes sense, which is why we tried adding a sleep of 60-65 
seconds before running the tests, and it actually fixed the problem. 
There are no more deferred devices, as you can see in the following job.
https://lava.collabora.dev/scheduler/job/11330931
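
A minimal sketch of such a delay, assuming the check reads the debugfs
file /sys/kernel/debug/devices_deferred and that waiting until uptime
exceeds deferred_probe_timeout plus a small margin is sufficient. The
function names, the 5 second margin and the fallback of 0 when the
parameter is absent are all hypothetical; none of this is taken from
bootrr itself.

```sh
#!/bin/sh
# Hypothetical helpers, not part of bootrr: delay the deferred-probe
# check until deferred_probe_timeout has had a chance to expire.

# Print the deferred_probe_timeout value found in the command line
# string $1, or the fallback $2 if the parameter is absent. If the
# parameter is repeated, the last occurrence wins. The body runs in a
# subshell so that "set -f" (no globbing) does not leak to the caller.
cmdline_timeout() (
    set -f
    value=$2
    for arg in $1; do
        case "$arg" in
            deferred_probe_timeout=*) value=${arg#deferred_probe_timeout=} ;;
        esac
    done
    echo "$value"
)

# Print the whole seconds left to wait, given the current uptime $1,
# the timeout $2 and a safety margin $3 (all integers, in seconds).
remaining_wait() {
    left=$(($2 + $3 - $1))
    [ "$left" -gt 0 ] || left=0
    echo "$left"
}

# Sleep for the remaining time, then succeed only if no device is
# still listed as deferred. The margin of 5 seconds and the fallback
# of 0 are assumptions chosen for illustration.
check_deferred_empty() {
    deferred_file=${1:-/sys/kernel/debug/devices_deferred}
    timeout_s=$(cmdline_timeout "$(cat /proc/cmdline)" 0)
    uptime_s=$(cut -d. -f1 /proc/uptime)
    sleep "$(remaining_wait "$uptime_s" "$timeout_s" 5)"
    # An empty or missing file means nothing is left on the list.
    [ ! -s "$deferred_file" ]
}
```

With deferred_probe_timeout=60 and the test starting about 17 seconds
into boot, as in the log above, remaining_wait 17 60 5 gives 48 seconds
of extra sleep. Uptime is only an approximation of when the timeout
fires, so the margin may need tuning per platform.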

The deferred_probe_timeout=60 kernel parameter was recently added in 
KernelCI because a number of devices were failing to probe on different 
platforms, specifically Chromebooks.
Unfortunately, no one realized that the start time of the test also 
needed to change, and hence these issues are now seen on different 
platforms.

Thanks for pointing it out; it will help us eliminate quite a few of 
the deferred probe test failures that we are seeing.

Saravana, thanks to you as well for asking the valid questions about 
when the test starts running.

Thanks,
Shreeya Patel


#regzbot resolve: Nothing is broken due to the patch, a fix is needed on 
the KernelCI side.

> Robin.
>
>>>> I will try to add 9be4cbd09da8 to mainline kernel and see what 
>>>> results I
>>>> get.
>>> Now I'm confused. What do you mean by mainline? Are you saying the tip
>>> of tree of Linus's tree is also hitting this issue?
>>
>>
>> KernelCI runs tests on different kernel branches and trees, we also 
>> have this same test running on mainline tree.
>> Following is the link to the dashboard for it and as you can see, it 
>> does fail there too.
>>
>>
>> https://linux.kernelci.org/test/case/id/64dc13d55cb51357a135b209/
>>
>>
>>> -Saravana
>>>
>
Saravana Kannan Aug. 23, 2023, 8:59 p.m. UTC | #13
On Tue, Aug 22, 2023 at 7:10 AM Shreeya Patel
<shreeya.patel@collabora.com> wrote:
>
> Hi Robin,
>
> On 21/08/23 18:09, Robin Murphy wrote:
> > On 2023-08-21 12:35, Shreeya Patel wrote:
> >>
> >> On 19/08/23 01:49, Saravana Kannan wrote:
> >>> On Thu, Aug 17, 2023 at 4:13 PM Shreeya Patel
> >>> <shreeya.patel@collabora.com> wrote:
> >>>> Hi Geert, Saravana,
> >>>>
> >>>> On 18/08/23 00:03, Saravana Kannan wrote:
> >>>>> On Thu, Aug 17, 2023 at 4:37 AM Shreeya Patel
> >>>>> <shreeya.patel@collabora.com> wrote:
> >>>>>> Hi Greg,
> >>>>>>
> >>>>>> On 16/08/23 20:33, Greg Kroah-Hartman wrote:
> >>>>>>> On Wed, Aug 16, 2023 at 03:09:27PM +0530, Shreeya Patel wrote:
> >>>>>>>> On 13/06/22 15:40, Greg Kroah-Hartman wrote:
> >>>>>>>>> From: Saravana Kannan<saravanak@google.com>
> >>>>>>>>>
> >>>>>>>>> [ Upstream commit 5ee76c256e928455212ab759c51d198fedbe7523 ]
> >>>>>>>>>
> >>>>>>>>> Mounting NFS rootfs was timing out when deferred_probe_timeout
> >>>>>>>>> was
> >>>>>>>>> non-zero [1].  This was because ip_auto_config() initcall
> >>>>>>>>> times out
> >>>>>>>>> waiting for the network interfaces to show up when
> >>>>>>>>> deferred_probe_timeout was non-zero. While ip_auto_config() calls
> >>>>>>>>> wait_for_device_probe() to make sure any currently running
> >>>>>>>>> deferred
> >>>>>>>>> probe work or asynchronous probe finishes, that wasn't
> >>>>>>>>> sufficient to
> >>>>>>>>> account for devices being deferred until deferred_probe_timeout.
> >>>>>>>>>
> >>>>>>>>> Commit 35a672363ab3 ("driver core: Ensure
> >>>>>>>>> wait_for_device_probe() waits
> >>>>>>>>> until the deferred_probe_timeout fires") tried to fix that by
> >>>>>>>>> making
> >>>>>>>>> sure wait_for_device_probe() waits for deferred_probe_timeout
> >>>>>>>>> to expire
> >>>>>>>>> before returning.
> >>>>>>>>>
> >>>>>>>>> However, if wait_for_device_probe() is called from the
> >>>>>>>>> kernel_init()
> >>>>>>>>> context:
> >>>>>>>>>
> >>>>>>>>> - Before deferred_probe_initcall() [2], it causes the boot
> >>>>>>>>> process to
> >>>>>>>>>       hang due to a deadlock.
> >>>>>>>>>
> >>>>>>>>> - After deferred_probe_initcall() [3], it blocks kernel_init()
> >>>>>>>>> from
> >>>>>>>>>       continuing till deferred_probe_timeout expires and beats
> >>>>>>>>> the point of
> >>>>>>>>>       deferred_probe_timeout that's trying to wait for
> >>>>>>>>> userspace to load
> >>>>>>>>>       modules.
> >>>>>>>>>
> >>>>>>>>> Neither of this is good. So revert the changes to
> >>>>>>>>> wait_for_device_probe().
> >>>>>>>>>
> >>>>>>>>> [1]
> >>>>>>>>> -https://lore.kernel.org/lkml/TYAPR01MB45443DF63B9EF29054F7C41FD8C60@TYAPR01MB4544.jpnprd01.prod.outlook.com/
> >>>>>>>>> [2]
> >>>>>>>>> -https://lore.kernel.org/lkml/YowHNo4sBjr9ijZr@dev-arch.thelio-3990X/
> >>>>>>>>>
> >>>>>>>>> [3] -https://lore.kernel.org/lkml/Yo3WvGnNk3LvLb7R@linutronix.de/
> >>>>>>>> Hi Saravana, Greg,
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> KernelCI found this patch causes the
> >>>>>>>> baseline.bootrr.deferred-probe-empty test to fail on
> >>>>>>>> r8a77960-ulcb,
> >>>>>>>> see the following details for more information.
> >>>>>>>>
> >>>>>>>> KernelCI dashboard link:
> >>>>>>>> https://linux.kernelci.org/test/plan/id/64d2a6be8c1a8435e535b264/
> >>>>>>>>
> >>>>>>>> Error messages from the logs :-
> >>>>>>>>
> >>>>>>>> + UUID=11236495_1.5.2.4.5
> >>>>>>>> + set +x
> >>>>>>>> + export
> >>>>>>>> 'PATH=/opt/bootrr/libexec/bootrr/helpers:/lava-11236495/1/../bin:/sbin:/usr/sbin:/bin:/usr/bin'
> >>>>>>>> + cd /opt/bootrr/libexec/bootrr
> >>>>>>>> + sh helpers/bootrr-auto
> >>>>>>>> e6800000.ethernet
> >>>>>>>> e6700000.dma-controller
> >>>>>>>> e7300000.dma-controller
> >>>>>>>> e7310000.dma-controller
> >>>>>>>> ec700000.dma-controller
> >>>>>>>> ec720000.dma-controller
> >>>>>>>> fea20000.vsp
> >>>>>>>> feb00000.display
> >>>>>>>> fea28000.vsp
> >>>>>>>> fea30000.vsp
> >>>>>>>> fe9a0000.vsp
> >>>>>>>> fe9af000.fcp
> >>>>>>>> fea27000.fcp
> >>>>>>>> fea2f000.fcp
> >>>>>>>> fea37000.fcp
> >>>>>>>> sound
> >>>>>>>> ee100000.mmc
> >>>>>>>> ee140000.mmc
> >>>>>>>> ec500000.sound
> >>>>>>>> /lava-11236495/1/../bin/lava-test-case
> >>>>>>>> <8>[   17.476741] <LAVA_SIGNAL_TESTCASE
> >>>>>>>> TEST_CASE_ID=deferred-probe-empty RESULT=fail>
> >>>>>>>>
> >>>>>>>> Test case failing :-
> >>>>>>>> Baseline Bootrr deferred-probe-empty test
> >>>>>>>> -https://github.com/kernelci/bootrr/blob/main/helpers/bootrr-generic-tests
> >>>>>>>>
> >>>>>>>> Regression Reproduced :-
> >>>>>>>>
> >>>>>>>> Lava job after reverting the commit 5ee76c256e92
> >>>>>>>> https://lava.collabora.dev/scheduler/job/11292890
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Bisection report from KernelCI can be found at the bottom of
> >>>>>>>> the email.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Shreeya Patel
> >>>>>>>>
> >>>>>>>> #regzbot introduced: 5ee76c256e92
> >>>>>>>> #regzbot title: KernelCI: Multiple devices deferring on
> >>>>>>>> r8a77960-ulcb
> >>>>>>>>
> >>>>>>>> ---------------------------------------------------------------------------------------------------------------------------------------------------
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * **
> >>>>>>>> * If you do send a fix, please include this trailer: *
> >>>>>>>> * Reported-by: "kernelci.org bot" <bot@...> *
> >>>>>>>> * *
> >>>>>>>> * Hope this helps! *
> >>>>>>>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> >>>>>>>>
> >>>>>>>> stable-rc/linux-5.10.y bisection:
> >>>>>>>> baseline.bootrr.deferred-probe-empty on
> >>>>>>>> r8a77960-ulcb
> >>>>>>> You are testing 5.10.y, yet the subject says 5.17?
> >>>>>>>
> >>>>>>> Which is it here?
> >>>>>> Sorry, I accidentally used the lore link for 5.17 while reporting
> >>>>>> this
> >>>>>> issue,
> >>>>>> but this test does fail on all the stable releases from 5.10
> >>>>>> onwards.
> >>>>>>
> >>>>>> stable 5.15 :-
> >>>>>> https://linux.kernelci.org/test/case/id/64dd156a5ac58d0cf335b1ea/
> >>>>>> mainline :-
> >>>>>> https://linux.kernelci.org/test/case/id/64dc13d55cb51357a135b209/
> >>>>>>
> >>>>> Shreeya, can you try the patch Geert suggested and let us know if it
> >>>>> helps? If not, then I can try to take a closer look.
> >>>> I tried to test the kernel with 9be4cbd09da8 but it didn't change the
> >>>> result.
> >>>> https://lava.collabora.dev/scheduler/job/11311615
> >>>>
> >>>> Also, I am not sure if this can change things but just FYI, KernelCI
> >>>> adds some kernel parameters when running these tests and one of the
> >>>> parameter is deferred_probe_timeout=60.
> >>> Ah this is good to know.
> >>>
> >>>> You can check this in the definition details given in the Lava job. I
> >>>> also tried to remove this parameter and rerun the test but again I got
> >>>> the same result.
> >>> How long does the test wait after boot before checking for the
> >>> deferred devices list?
> >>>
> >>
> >> AFAIK, script for running the tests is immediately ran after the boot
> >> process is complete so there is no wait time.
> >
> > Regardless of what the kernel is doing, it seems like a fundamentally
> > dumb test to specifically ask deferred probe to wait for up to a
> > minute then complain that it hasn't finished after 11 seconds :/
> >
> > If anything, it seems plausible that the "regression" might actually
> > be the correct behaviour, and it was wrong before. I can't manage to
> > pull up a boot log for a pre-5.10 kernel since all the async stuff on
> > the KernelCI dashboard always just times out for me with a helpful
> > "Error while loading data from the server (error code: 0)", but what
> > would be interesting is whether those devices on the list are expected
> > to successfully probe anyway - the mainline log below also shows other
> > stuff failing to probe and CPUs failing to come online, so it's
> > clearly not a very happy platform to begin with.
> >
>
> Sorry about the dashboard issues you are facing. The KernelCI team is
> working on a new dashboard which will fix all of these issues, but it
> will take some more time before it is ready.
>
>
> Your point makes sense, which is why we tried adding a sleep of 60-65
> seconds before running the tests, and it actually fixed the problem.
> There are no more deferred devices, as you can see in the following job.
> https://lava.collabora.dev/scheduler/job/11330931
>
> The deferred_probe_timeout=60 kernel parameter was recently added in
> KernelCI because a number of devices were failing to probe on different
> platforms, specifically Chromebooks.
> Unfortunately, no one realized that the start time of the test also
> needed to change, and hence these issues are now seen on different
> platforms.
>
> Thanks for pointing it out; it will help us eliminate quite a few of
> the deferred probe test failures that we are seeing.
>
> Saravana, thanks to you as well for asking the valid questions about
> when the test starts running.

Glad we sorted this out. Thanks for maintaining the tests.

-Saravana

Patch

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 977e94cf669e..86fd2ea35656 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -257,7 +257,6 @@  DEFINE_SHOW_ATTRIBUTE(deferred_devs);
 
 int driver_deferred_probe_timeout;
 EXPORT_SYMBOL_GPL(driver_deferred_probe_timeout);
-static DECLARE_WAIT_QUEUE_HEAD(probe_timeout_waitqueue);
 
 static int __init deferred_probe_timeout_setup(char *str)
 {
@@ -312,7 +311,6 @@  static void deferred_probe_timeout_work_func(struct work_struct *work)
 	list_for_each_entry(p, &deferred_probe_pending_list, deferred_probe)
 		dev_info(p->device, "deferred probe pending\n");
 	mutex_unlock(&deferred_probe_mutex);
-	wake_up_all(&probe_timeout_waitqueue);
 }
 static DECLARE_DELAYED_WORK(deferred_probe_timeout_work, deferred_probe_timeout_work_func);
 
@@ -720,9 +718,6 @@  int driver_probe_done(void)
  */
 void wait_for_device_probe(void)
 {
-	/* wait for probe timeout */
-	wait_event(probe_timeout_waitqueue, !driver_deferred_probe_timeout);
-
 	/* wait for the deferred probe workqueue to finish */
 	flush_work(&deferred_probe_work);